lr,12-2-P01-1008,bq |
identification of paraphrases
</term>
from a
<term>
|
corpus
|
of multiple English translations
</term>
|
#1789
We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. |
lr,19-4-N03-1012,bq |
successfully classifies 73.2 % in a
<term>
German
|
corpus
|
</term>
of 2.284
<term>
SRHs
</term>
as either
|
#2521
An evaluation of our system against the annotated data shows that it successfully classifies 73.2% in a German corpus of 2.284 SRHs as either coherent or incoherent (given a baseline of 54.55%). |
lr,13-1-N03-2006,bq |
</term>
based on a small-sized
<term>
bilingual
|
corpus
|
</term>
, we use an out-of-domain
<term>
bilingual
|
#3093
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,20-1-N03-2006,bq |
, we use an out-of-domain
<term>
bilingual
|
corpus
|
</term>
and , in addition , the
<term>
language
|
#3100
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,33-1-N03-2006,bq |
model
</term>
of an in-domain
<term>
monolingual
|
corpus
|
</term>
. We conducted experiments with an
|
#3113
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,19-3-N03-2006,bq |
of using an out-of-domain
<term>
bilingual
|
corpus
|
</term>
and the possibility of using the
<term>
|
#3143
The two evaluation measures of the BLEU score and the NIST score demonstrated the effect of using an out-of-domain bilingual corpus and the possibility of using the language model. |
lr,10-5-N03-2025,bq |
Markov Model
</term>
is trained on a
<term>
|
corpus
|
</term>
automatically tagged by the first
|
#3368
Then, a Hidden Markov Model is trained on a corpus automatically tagged by the first learner. |
lr,19-2-N03-4010,bq |
candidates
</term>
from the given
<term>
text
|
corpus
|
</term>
. The operation of the
<term>
system
|
#3682
The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus. |
other,15-1-P03-1009,bq |
classes
</term>
from undisambiguated
<term>
|
corpus
|
data
</term>
. We describe a new approach
|
#3899
Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. |
lr,22-2-P03-1050,bq |
a small ( 10K sentences )
<term>
parallel
|
corpus
|
</term>
as its sole
<term>
training resources
|
#4469
The stemming model is based on statistical machine translation, and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources. |
lr,7-2-P03-1051,bq |
by a small
<term>
manually segmented Arabic
|
corpus
|
</term>
and uses it to bootstrap an
<term>
|
#4648
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,28-2-P03-1051,bq |
</term>
from a large
<term>
unsegmented Arabic
|
corpus
|
</term>
. The
<term>
algorithm
</term>
uses a
|
#4668
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,9-4-P03-1051,bq |
estimated from a small
<term>
manually segmented
|
corpus
|
</term>
of about 110,000
<term>
words
</term>
|
#4700
The language model is initially estimated from a small manually segmented corpus of about 110,000 words. |
lr,21-5-P03-1051,bq |
million
<term>
word
</term><term>
unsegmented
|
corpus
|
</term>
, and re-estimate the
<term>
model
|
#4728
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,34-5-P03-1051,bq |
<term>
vocabulary
</term>
and
<term>
training
|
corpus
|
</term>
. The resulting
<term>
Arabic word
|
#4741
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,15-6-P03-1051,bq |
exact match accuracy
</term>
on a
<term>
test
|
corpus
|
</term>
containing 28,449
<term>
word tokens
|
#4759
The resulting Arabic word segmentation system achieves around 97% exact match accuracy on a test corpus containing 28,449 word tokens. |
lr,25-7-P03-1051,bq |
can create a small
<term>
manually segmented
|
corpus
|
</term>
of the
<term>
language
</term>
of interest
|
#4792
We believe this is a state-of-the-art performance and the algorithm can be used for many highly inflected languages provided that one can create a small manually segmented corpus of the language of interest. |
lr,9-1-P03-1068,bq |
of a large ,
<term>
semantically annotated
|
corpus
|
</term>
resource as reliable basis for the
|
#4943
We describe the ongoing construction of a large, semantically annotated corpus resource as reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. |
lr,6-3-C04-1106,bq |
experiments conducted on a
<term>
multilingual
|
corpus
|
</term>
to estimate the number of
<term>
analogies
|
#5916
We report experiments conducted on a multilingual corpus to estimate the number of analogies among the sentences that it contains. |
lr,23-2-C04-1116,bq |
each author 's text as a coherent
<term>
|
corpus
|
</term>
. Our approach is based on the idea
|
#6137
This paper proposes a new methodology to improve the accuracy of a term aggregation system using each author's text as a coherent corpus. |