#305 In this paper we show how two standard outputs from information extraction (IE) systems - named entity annotations and scenario templates - can be used to enhance access to text collections via a standard text browser.
tech,36-1-H01-1040,ak
text collections
</term>
via a standard
<term>
text
browser
</term>
. We describe how this information
#310 In this paper we show how two standard outputs from information extraction (IE) systems - named entity annotations and scenario templates - can be used to enhance access to text collections via a standard text browser.
tech,38-3-H01-1040,ak
increased potential of
<term>
IE-enhanced
text
browsers
</term>
. At MIT Lincoln Laboratory
#383 We also report results of a preliminary, qualitative user evaluation of the system, which while broadly positive indicates further work needs to be done on the interface to make users aware of the increased potential of IE-enhanced text browsers.
up to six extracts of translated newswire
text
. Some of the extracts were
<term>
expert
#695 Subjects were given a set of up to six extracts of translated newswire text.
other,20-2-P01-1008,ak
translations
</term>
of the same
<term>
source
text
</term>
. Our approach yields
<term>
phrasal
#1799 We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text.
tech,29-1-N03-1018,ak
progressing from
<term>
generation of true
text
</term>
through its transformation into the
#2700 In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system.
other,38-3-N03-1018,ak
translation lexicons
</term>
from printed
<term>
text
</term>
. We present an application of
<term>
#2783 We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text.
other,13-2-N03-2003,ak
data
</term>
can be supplemented with
<term>
text
</term>
from the
<term>
web
</term>
filtered
#3042 In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
other,24-1-N03-4010,ak
answering capability
</term>
on
<term>
free
text
</term>
. The demonstration will focus on
#3661 The JAVELIN system integrates a flexible, planning-based architecture with a variety of language processing modules to provide an open-domain question answering capability on free text.
lr,19-2-N03-4010,ak
answer candidates
</term>
from the given
<term>
text
corpus
</term>
. The operation of the
<term>
#3682 The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus.
lr,1-3-P03-1050,ak
training resources
</term>
. No
<term>
parallel
text
</term>
is needed after the
<term>
training
#4480 No parallel text is needed after the training phase.
lr,0-4-P03-1050,ak
phase
</term>
.
<term>
Monolingual , unannotated
text
</term>
can be used to further improve the
#4491 Monolingual, unannotated text can be used to further improve the stemmer by allowing it to adapt to a desired domain or genre.
lr,26-6-P03-1050,ak
affix lists
</term>
, and
<term>
human annotated
text
</term>
, in addition to an
<term>
unsupervised
#4562 Our resource-frugal approach results in 87.5% agreement with a state of the art, proprietary Arabic stemmer built using rules, affix lists, and human annotated text, in addition to an unsupervised component.
other,16-7-P03-1050,ak
average precision
</term>
over
<term>
unstemmed
text
</term>
, and 96 % of the performance of
#4588 Task-based evaluation using Arabic information retrieval indicates an improvement of 22-38% in average precision over unstemmed text, and 96% of the performance of the proprietary stemmer above.
tech,7-1-H05-1032,ak
presents a
<term>
Bayesian model
</term>
for
<term>
text
summarization
</term>
, which explicitly
#5370 The paper presents a Bayesian model for text summarization, which explicitly encodes and exploits information on how human judgments are distributed over the text.
other,24-1-H05-1032,ak
judgments
</term>
are distributed over the
<term>
text
</term>
. Comparison is made against
<term>
#5387 The paper presents a Bayesian model for text summarization, which explicitly encodes and exploits information on how human judgments are distributed over the text.
other,12-2-H05-1032,ak
test data
</term>
from
<term>
Japanese news
texts
</term>
. It is found that the
<term>
Bayesian
#5403 Comparison is made against non-Bayesian summarizers, using test data from Japanese news texts.
other,13-1-I05-2013,ak
which takes as
<term>
input
</term>
a
<term>
raw
text
</term>
in
<term>
French
</term>
and produces
#6093 We present a tool, called ILIMP, which takes as input a raw text in French and produces as output the same text in which every occurrence of the pronoun il is tagged either with tag [ANA] for anaphoric or [IMP] for impersonal or expletive.
other,23-1-I05-2013,ak
produces as
<term>
output
</term>
the same
<term>
text
</term>
in which every
<term>
occurrence
</term>
#6102 We present a tool, called ILIMP, which takes as input a raw text in French and produces as output the same text in which every occurrence of the pronoun il is tagged either with tag [ANA] for anaphoric or [IMP] for impersonal or expletive.
commercial systems outputting unsegmented
texts
with , for instance ,
<term>
statistical
#6314 The use of BLEU at the character level eliminates the word segmentation problem: it makes it possible to directly compare commercial systems outputting unsegmented texts with, for instance, statistical MT systems which usually segment their outputs.