#283 In this paper we show how two standard outputs from information extraction (IE) systems - named entity annotations and scenario templates - can be used to enhance access to text collections via a standard text browser.
#638 A language learning experiment showed that assessors can differentiate native from non-native language essays in less than 100 words.
#673 We tested this to see if similar criteria could be elicited from duplicating the experiment using machine translation output.
#1095 The oracle knows the reference word string and selects the word string with the best performance (typically, word or semantic error rate) from a list of word strings, where each word string has been obtained by using a different LM.
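The oracle selection described in #1095 can be sketched in a few lines. This is a minimal illustrative implementation, not code from the paper: all function names are ours, and we assume word error rate (word-level Levenshtein distance, normalised by reference length) as the selection criterion.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance, normalised by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / max(len(ref), 1)

def oracle_select(reference, candidates):
    """Pick the candidate word string with the lowest WER against the reference."""
    return min(candidates, key=lambda h: word_error_rate(reference, h))
```

Each candidate in `candidates` would correspond to the output obtained with a different LM; the oracle simply needs the reference to score them.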
#1429 The SPR uses ranking rules automatically learned from training data.
#1788 We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text.
#1980 Our logical definition leads to a neat relation to categorial grammar (yielding a treatment of Montague semantics), a parsing-as-deduction in a resource sensitive logic, and a learning algorithm from structured data (based on a typing-algorithm and type-unification).
#2150 These models, which are built from shallow linguistic features of questions, are employed to predict target variables which represent a user's informational goals.
#2342 Motivated by the success of ensemble methods in machine learning and other areas of natural language processing, we developed a multi-strategy and multi-source approach to question answering which is based on combining the results from different answering agents searching for answers in multiple corpora.
#2383 We present our multi-level answer resolution algorithm that combines results from the answering agents at the question, passage, and/or answer levels.
#2621 Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations.
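The heuristic learning of phrase translations from word-based alignments mentioned in #2621 can be illustrated with a toy sketch of the standard consistency criterion: a phrase pair is extracted when no alignment link crosses the phrase boundary. This is our own simplified illustration (minimal target spans only, no handling of unaligned boundary words), not the paper's implementation.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """src, tgt: token lists; alignment: set of (src_index, tgt_index) links.
    Returns the set of consistent phrase pairs up to max_len words per side."""
    pairs = set()
    n = len(src)
    for i1 in range(n):
        for i2 in range(i1, min(i1 + max_len, n)):
            # target positions linked to words inside the source span
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue  # require at least one alignment link
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > max_len:
                continue
            # consistency: no link may connect the target span to a source
            # word outside the source span
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs
```

Lexical weighting, the other ingredient named in the sentence, would then score each extracted pair by the word-translation probabilities of its internal alignment links.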
#2642 Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models does not have a strong impact on performance.
#2696 In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system.
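The noisy channel framework in #2696 amounts to choosing the true text t that maximises P(t) * P(o | t) for the observed OCR output o. The toy decoder below illustrates only that argmax; the one-character-substitution channel and the two-word language model are invented for this example and bear no relation to the paper's finite-state models.

```python
import math

def decode(observed, candidates, lm_logprob, channel_logprob):
    """Noisy-channel decoding: argmax over t of log P(t) + log P(observed | t)."""
    return max(candidates,
               key=lambda t: lm_logprob(t) + channel_logprob(observed, t))

def toy_lm(t):
    # invented unigram probabilities: "cat" is far more likely than "cot"
    return math.log({"cat": 0.9, "cot": 0.001}.get(t, 1e-9))

def toy_channel(observed, true):
    # crude equal-length channel: each character mismatch costs log(0.1)
    if len(observed) != len(true):
        return math.log(1e-9)
    mismatches = sum(o != c for o, c in zip(observed, true))
    return mismatches * math.log(0.1)
```

Here the channel model plays the role of the OCR noise process and the language model that of true-text generation; with these toy numbers, an observed "cot" is decoded as the more probable "cat".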
#2781 We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text.
#3043 In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
#3070 In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
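The class-dependent interpolation of N-grams mentioned in #3043/#3070 can be sketched as a linear mixture of an in-domain model and a web-text model in which the interpolation weight depends on the class of the predicted word rather than being a single global constant. The function and the example weights below are ours, purely for illustration.

```python
def interpolate(p_in, p_web, word_class, lambdas, default=0.5):
    """Class-dependent linear interpolation:
    P(w | h) = lambda_c * P_in(w | h) + (1 - lambda_c) * P_web(w | h),
    where c is the class of the predicted word w."""
    lam = lambdas.get(word_class, default)  # fall back to an even mix
    return lam * p_in + (1 - lam) * p_web
```

A global interpolation is the special case where `lambdas` assigns the same weight to every class; letting the weight vary by class is what allows, say, content words to lean more on the in-domain model than function words do.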
#3452 During training, the blocks are learned from source interval projections using an underlying word alignment.
#3595 The TAP-XL Automated Analyst's Assistant is an application designed to help an English-speaking analyst write a topical report, culling information from a large inflow of multilingual, multimedia data.
#3679 The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus.
#3898 Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data.