#283
In this paper we show how two standard outputs from information extraction (IE) systems - named entity annotations and scenario templates - can be used to enhance access to text collections via a standard text browser.
other,9-4-H01-1042,bq
#638
A language learning experiment showed that assessors can differentiate native from non-native language essays in less than 100 words.
#673
We tested this to see if similar criteria could be elicited from duplicating the experiment using machine translation output.
#1095
The oracle knows the reference word string and selects the word string with the best performance (typically, word or semantic error rate) from a list of word strings, where each word string has been obtained by using a different LM.
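The oracle selection in #1095 is easy to make concrete. Below is a minimal sketch (not from the cited paper, and using plain word error rate only): score each candidate word string against the reference and keep the one with the lowest error; the candidate list stands in for outputs obtained with different LMs.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def oracle_select(reference, candidates):
    """The oracle knows the reference and picks the lowest-WER candidate."""
    return min(candidates, key=lambda c: wer(reference, c))
```

With reference "the cat sat", the oracle prefers a one-substitution candidate over candidates needing more edits.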
#1429
The SPR uses ranking rules automatically learned from training data.
#1787
We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text.
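As a toy illustration of the idea in #1787 (this is not the paper's algorithm): when two English translations of the same source sentence match almost everywhere, the spans where they differ can be paired as paraphrase candidates. The sketch below uses `difflib` to find the differing spans.

```python
import difflib

def paraphrase_candidates(translation_a, translation_b):
    """Pair up word spans that differ between two translations of one source."""
    a, b = translation_a.split(), translation_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # differing spans between shared context words
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs

candidates = paraphrase_candidates(
    "he began to talk very fast",
    "he started to speak very fast")
```

Here the shared context ("he ... to ... very fast") anchors the alignment, so "began"/"started" and "talk"/"speak" fall out as candidate pairs.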
#1979
Our logical definition leads to a neat relation to categorial grammar (yielding a treatment of Montague semantics), a parsing-as-deduction in a resource sensitive logic, and a learning algorithm from structured data (based on a typing-algorithm and type-unification).
#2149
These models, which are built from shallow linguistic features of questions, are employed to predict target variables which represent a user's informational goals.
#2341
Motivated by the success of ensemble methods in machine learning and other areas of natural language processing, we developed a multi-strategy and multi-source approach to question answering which is based on combining the results from different answering agents searching for answers in multiple corpora.
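One simple way to combine results from several answering agents, as described in #2341, is reciprocal-rank voting; this sketch is illustrative only and not the authors' resolution algorithm. Each agent contributes 1/(rank+1) credit to each answer it returns, and answers are re-ranked by total credit.

```python
from collections import defaultdict

def combine_answers(agent_rankings):
    """agent_rankings: one ranked answer list per agent, best answer first."""
    scores = defaultdict(float)
    for ranking in agent_rankings:
        for rank, answer in enumerate(ranking):
            scores[answer] += 1.0 / (rank + 1)  # reciprocal-rank credit
    return sorted(scores, key=scores.get, reverse=True)

merged = combine_answers([
    ["1492", "1498", "1500"],   # agent searching corpus A
    ["1492", "1500"],           # agent searching corpus B
    ["1498", "1492"],           # agent searching corpus C
])
```

An answer ranked highly by several agents outscores one ranked first by a single agent.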
#2382
We present our multi-level answer resolution algorithm that combines results from the answering agents at the question, passage, and/or answer levels.
#2620
Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations.
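The heuristic learning of phrase translations from word-based alignments mentioned in #2620 can be sketched compactly: a source span and target span form a phrase pair when no alignment link leaves their rectangle. This is a minimal sketch of the standard consistency criterion, not the paper's exact procedure.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Extract consistent phrase pairs; alignment is a set of (src, tgt) links."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span
            tgt_pos = {j for (i, j) in alignment if i1 <= i <= i2}
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            # consistency: every link into the target span stays in the source span
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

pairs = extract_phrases(["the", "white", "house"], ["la", "maison", "blanche"],
                        {(0, 0), (1, 2), (2, 1)})
```

Note that the crossing links for "white"/"blanche" and "house"/"maison" block the inconsistent pair ("the white", "la"), while the reordered two-word pair is still extracted.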
#2641
Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models does not have a strong impact on performance.
#2695
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system.
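The noisy channel framework in #2695 amounts to choosing the true text t that maximizes P(t) * P(o | t) for the observed OCR output o. The sketch below is schematic: the language model and channel model are toy dictionaries invented for illustration, not the paper's finite-state models.

```python
def decode(observed, candidates, lm_prob, channel_prob):
    """Noisy-channel decoding: argmax over candidate true texts of P(t)*P(o|t)."""
    return max(candidates, key=lambda t: lm_prob[t] * channel_prob[(observed, t)])

# Toy models: the LM strongly prefers "the cat"; the channel model says the
# OCR system sometimes renders "h" as "n".
lm_prob = {"the cat": 0.6, "tne cat": 0.01}
channel_prob = {("tne cat", "the cat"): 0.3, ("tne cat", "tne cat"): 0.5}
best = decode("tne cat", ["the cat", "tne cat"], lm_prob, channel_prob)
```

Even though the verbatim reading is more likely under the channel model alone, the language model prior overrules it.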
#2780
We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text.
#3042
In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
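Class-dependent interpolation, as in #3042, mixes an in-domain N-gram model with a web-data model using a weight that depends on the class of the predicted word. The sketch below is a minimal illustration; the classes, weights, and probabilities are all invented.

```python
def interpolate(context, word, p_in, p_web, word_class, lambdas):
    """P(word|context) as a class-dependent linear mixture of two N-gram LMs."""
    lam = lambdas[word_class[word]]  # per-class interpolation weight
    return lam * p_in[(context, word)] + (1 - lam) * p_web[(context, word)]

word_class = {"flight": "content", "the": "function"}   # hypothetical classes
lambdas = {"content": 0.8, "function": 0.5}             # hypothetical weights
p_in = {("book a", "flight"): 0.2}
p_web = {("book a", "flight"): 0.1}
p = interpolate("book a", "flight", p_in, p_web, word_class, lambdas)
```

Content words lean on the in-domain model here, while function words would be mixed more evenly.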
#3069
In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
#3451
During training, the blocks are learned from source interval projections using an underlying word alignment.
#3594
The TAP-XL Automated Analyst's Assistant is an application designed to help an English-speaking analyst write a topical report, culling information from a large inflow of multilingual, multimedia data.
#3678
The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus.
#3897
Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data.
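The clustering idea in #3897 can be illustrated with a toy example (not the cited work's method): represent each verb by its subcategorization-frame counts and group verbs whose count vectors point in similar directions. The verbs, features, and threshold below are invented.

```python
def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0

def cluster_verbs(verbs, threshold=0.8):
    """Greedy clustering: a verb joins the first cluster whose seed it resembles."""
    clusters = []
    for verb, vec in verbs.items():
        for cl in clusters:
            if cosine(vec, verbs[cl[0]]) >= threshold:
                cl.append(verb)
                break
        else:
            clusters.append([verb])
    return clusters

# Counts of (NP-object frame, that-clause frame) per verb, purely illustrative.
clusters = cluster_verbs({"eat": (9, 1), "devour": (8, 0),
                          "say": (1, 9), "claim": (0, 8)})
```

Verbs taking mostly NP objects end up in one class and verbs taking mostly that-clauses in another, without any sense disambiguation of the underlying corpus data.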