identification of paraphrases
</term>
from a
<term>
corpus
of multiple English translations
</term>
#1790We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text.
lr,44-1-N03-1004,ak
for
<term>
answers
</term>
in multiple
<term>
corpora
</term>
. The
<term>
answering agents
</term>
#2351Motivated by the success of ensemble methods in machine learning and other areas of natural language processing, we developed a multi-strategy and multi-source approach to question answering which is based on combining the results from different answering agents searching for answers in multiple corpora.
lr,19-4-N03-1012,ak
successfully classifies 73.2 % in a
<term>
German
corpus
</term>
of 2.284
<term>
SRHs
</term>
as either
#2522An evaluation of our system against the annotated data shows that it successfully classifies 73.2% in a German corpus of 2.284 SRHs as either coherent or incoherent (given a baseline of 54.55%).
lr,12-1-N03-2006,ak
</term>
based on a
<term>
small-sized bilingual
corpus
</term>
, we use an
<term>
out-of-domain bilingual
#3094In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus.
lr,19-1-N03-2006,ak
, we use an
<term>
out-of-domain bilingual
corpus
</term>
and , in addition , the
<term>
language
#3101In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus.
lr,32-1-N03-2006,ak
model
</term>
of an
<term>
in-domain monolingual
corpus
</term>
. We conducted experiments with an
#3114In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus.
lr,18-3-N03-2006,ak
of using an
<term>
out-of-domain bilingual
corpus
</term>
and the possibility of using the
<term>
#3144The two evaluation measures of the BLEU score and the NIST score demonstrated the effect of using an out-of-domain bilingual corpus and the possibility of using the language model.
lr,10-5-N03-2025,ak
Markov Model
</term>
is trained on a
<term>
corpus
</term>
automatically tagged by the first
#3369Then, a Hidden Markov Model is trained on a corpus automatically tagged by the first learner.
lr,19-2-N03-4010,ak
candidates
</term>
from the given
<term>
text
corpus
</term>
. The operation of the
<term>
system
#3683The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus.
other,15-1-P03-1009,ak
classes
</term>
from undisambiguated
<term>
corpus
data
</term>
. We describe a new approach
#3900Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data.
lr,15-5-P03-1031,ak
information
</term>
obtained from
<term>
dialogue
corpora
</term>
. Unlike conventional methods that
#4234This paper proposes a method for resolving this ambiguity based on statistical information obtained from dialogue corpora.
lr,17-2-P03-1050,ak
a
<term>
small ( 10K sentences ) parallel
corpus
</term>
as its sole
<term>
training resources
#4471The stemming model is based on statistical machine translation and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources.
lr,6-2-P03-1051,ak
by a
<term>
small manually segmented Arabic
corpus
</term>
and uses it to bootstrap an
<term>
#4650Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus.
lr,27-2-P03-1051,ak
</term>
from a
<term>
large unsegmented Arabic
corpus
</term>
. The
<term>
algorithm
</term>
uses a
#4670Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus.
lr,8-4-P03-1051,ak
estimated from a
<term>
small manually segmented
corpus
</term>
of about 110,000
<term>
words
</term>
#4702The language model is initially estimated from a small manually segmented corpus of about 110,000 words.
lr,18-5-P03-1051,ak
from a
<term>
155 million word unsegmented
corpus
</term>
, and re-estimate the
<term>
model
#4730To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus.
lr,34-5-P03-1051,ak
<term>
vocabulary
</term>
and
<term>
training
corpus
</term>
. The resulting
<term>
Arabic word
#4743To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus.
lr,15-6-P03-1051,ak
exact match accuracy
</term>
on a
<term>
test
corpus
</term>
containing 28,449
<term>
word tokens
#4761The resulting Arabic word segmentation system achieves around 97% exact match accuracy on a test corpus containing 28,449 word tokens.
lr,24-7-P03-1051,ak
can create a
<term>
small manually segmented
corpus
</term>
of the
<term>
language
</term>
of interest
#4794We believe this is a state-of-the-art performance and the algorithm can be used for many highly inflected languages provided that one can create a small manually segmented corpus of the language of interest.
lr,15-2-P03-1058,ak
</term>
from
<term>
English-Chinese parallel
corpora
</term>
, which are then used for disambiguating
#4840In this paper, we evaluate an approach to automatically acquire sense-tagged training data from English-Chinese parallel corpora, which are then used for disambiguating the nouns in the SENSEVAL-2 English lexical sample task.