lr,12-2-P01-1008,bq |
We present an
<term>
unsupervised learning algorithm
</term>
for
<term>
identification of paraphrases
</term>
from a
<term>
corpus
of multiple English translations
</term>
of the same
<term>
source text
</term>
.
|
#1789
We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. |
lr,19-4-N03-1012,bq |
An evaluation of our
<term>
system
</term>
against the
<term>
annotated data
</term>
shows that it successfully classifies 73.2 % in a
<term>
German
corpus
</term>
of 2.284
<term>
SRHs
</term>
as either coherent or incoherent ( given a
<term>
baseline
</term>
of 54.55 % ) .
|
#2521
An evaluation of our system against the annotated data shows that it successfully classifies 73.2% in a German corpus of 2.284 SRHs as either coherent or incoherent (given a baseline of 54.55%). |
lr,13-1-N03-2006,bq |
In order to boost the
<term>
translation quality
</term>
of
<term>
EBMT
</term>
based on a small-sized
<term>
bilingual
corpus
</term>
, we use an out-of-domain
<term>
bilingual corpus
</term>
and , in addition , the
<term>
language model
</term>
of an in-domain
<term>
monolingual corpus
</term>
.
|
#3093
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,20-1-N03-2006,bq |
In order to boost the
<term>
translation quality
</term>
of
<term>
EBMT
</term>
based on a small-sized
<term>
bilingual corpus
</term>
, we use an out-of-domain
<term>
bilingual
corpus
</term>
and , in addition , the
<term>
language model
</term>
of an in-domain
<term>
monolingual corpus
</term>
.
|
#3100
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,33-1-N03-2006,bq |
In order to boost the
<term>
translation quality
</term>
of
<term>
EBMT
</term>
based on a small-sized
<term>
bilingual corpus
</term>
, we use an out-of-domain
<term>
bilingual corpus
</term>
and , in addition , the
<term>
language model
</term>
of an in-domain
<term>
monolingual
corpus
</term>
.
|
#3113
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,19-3-N03-2006,bq |
The two
<term>
evaluation measures
</term>
of the
<term>
BLEU score
</term>
and the
<term>
NIST score
</term>
demonstrated the effect of using an out-of-domain
<term>
bilingual
corpus
</term>
and the possibility of using the
<term>
language model
</term>
.
|
#3143
The two evaluation measures of the BLEU score and the NIST score demonstrated the effect of using an out-of-domain bilingual corpus and the possibility of using the language model. |
lr,10-5-N03-2025,bq |
Then , a
<term>
Hidden Markov Model
</term>
is trained on a
<term>
corpus
</term>
automatically tagged by the first
<term>
learner
</term>
.
|
#3368
Then, a Hidden Markov Model is trained on a corpus automatically tagged by the first learner. |
lr,19-2-N03-4010,bq |
The demonstration will focus on how
<term>
JAVELIN
</term>
processes
<term>
questions
</term>
and retrieves the most likely
<term>
answer candidates
</term>
from the given
<term>
text
corpus
</term>
.
|
#3682
The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus. |
other,15-1-P03-1009,bq |
Previous research has demonstrated the utility of
<term>
clustering
</term>
in inducing
<term>
semantic verb classes
</term>
from undisambiguated
<term>
corpus
data
</term>
.
|
#3899
Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. |
lr,22-2-P03-1050,bq |
The
<term>
stemming model
</term>
is based on
<term>
statistical machine translation
</term>
and it uses an
<term>
English stemmer
</term>
and a small ( 10K sentences )
<term>
parallel
corpus
</term>
as its sole
<term>
training resources
</term>
.
|
#4469
The stemming model is based on statistical machine translation and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources. |
lr,7-2-P03-1051,bq |
Our method is seeded by a small
<term>
manually segmented Arabic
corpus
</term>
and uses it to bootstrap an
<term>
unsupervised algorithm
</term>
to build the
<term>
Arabic word segmenter
</term>
from a large
<term>
unsegmented Arabic corpus
</term>
.
|
#4648
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,28-2-P03-1051,bq |
Our method is seeded by a small
<term>
manually segmented Arabic corpus
</term>
and uses it to bootstrap an
<term>
unsupervised algorithm
</term>
to build the
<term>
Arabic word segmenter
</term>
from a large
<term>
unsegmented Arabic
corpus
</term>
.
|
#4668
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,9-4-P03-1051,bq |
The
<term>
language model
</term>
is initially estimated from a small
<term>
manually segmented
corpus
</term>
of about 110,000
<term>
words
</term>
.
|
#4700
The language model is initially estimated from a small manually segmented corpus of about 110,000 words. |
lr,21-5-P03-1051,bq |
To improve the
<term>
segmentation
</term><term>
accuracy
</term>
, we use an
<term>
unsupervised algorithm
</term>
for automatically acquiring new
<term>
stems
</term>
from a 155 million
<term>
word
</term><term>
unsegmented
corpus
</term>
, and re-estimate the
<term>
model parameters
</term>
with the expanded
<term>
vocabulary
</term>
and
<term>
training corpus
</term>
.
|
#4728
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,34-5-P03-1051,bq |
To improve the
<term>
segmentation
</term><term>
accuracy
</term>
, we use an
<term>
unsupervised algorithm
</term>
for automatically acquiring new
<term>
stems
</term>
from a 155 million
<term>
word
</term><term>
unsegmented corpus
</term>
, and re-estimate the
<term>
model parameters
</term>
with the expanded
<term>
vocabulary
</term>
and
<term>
training
corpus
</term>
.
|
#4741
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,15-6-P03-1051,bq |
The resulting
<term>
Arabic word segmentation system
</term>
achieves around 97 %
<term>
exact match accuracy
</term>
on a
<term>
test
corpus
</term>
containing 28,449
<term>
word tokens
</term>
.
|
#4759
The resulting Arabic word segmentation system achieves around 97% exact match accuracy on a test corpus containing 28,449 word tokens. |
lr,25-7-P03-1051,bq |
We believe this is a state-of-the-art performance and the
<term>
algorithm
</term>
can be used for many
<term>
highly inflected languages
</term>
provided that one can create a small
<term>
manually segmented
corpus
</term>
of the
<term>
language
</term>
of interest .
|
#4792
We believe this is a state-of-the-art performance and the algorithm can be used for many highly inflected languages provided that one can create a small manually segmented corpus of the language of interest. |
lr,9-1-P03-1068,bq |
We describe the ongoing construction of a large ,
<term>
semantically annotated
corpus
</term>
resource as a reliable basis for the large-scale
<term>
acquisition of word-semantic information
</term>
, e.g. the construction of
<term>
domain-independent lexica
</term>
.
|
#4943
We describe the ongoing construction of a large, semantically annotated corpus resource as a reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. |
lr,6-3-C04-1106,bq |
We report experiments conducted on a
<term>
multilingual
corpus
</term>
to estimate the number of
<term>
analogies
</term>
among the
<term>
sentences
</term>
that it contains .
|
#5916
We report experiments conducted on a multilingual corpus to estimate the number of analogies among the sentences that it contains. |
lr,23-2-C04-1116,bq |
This paper proposes a new methodology to improve the
<term>
accuracy
</term>
of a
<term>
term aggregation system
</term>
using each author 's text as a coherent
<term>
corpus
</term>
.
|
#6137
This paper proposes a new methodology to improve the accuracy of a term aggregation system using each author's text as a coherent corpus. |