lr,1-2-H92-1074,bq | of the art in <term> CSR </term> . This <term> | corpus | </term> essentially supersedes the now old | #19554
This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. |
lr,10-5-N03-2025,bq | Markov Model </term> is trained on a <term> | corpus | </term> automatically tagged by the first | #3368
Then, a Hidden Markov Model is trained on a corpus automatically tagged by the first learner. |
lr,11-4-P05-1074,bq | extracted from a <term> bilingual parallel | corpus | </term> to be ranked using <term> translation | #9729
We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. |
lr,12-2-P01-1008,bq | identification of paraphrases </term> from a <term> | corpus | of multiple English translations </term> | #1789
We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. |
lr,12-4-C92-1055,bq | possible variations between the <term> training | corpus | </term> and the real tasks are also taken | #17893
To make the proposed algorithm robust, the possible variations between the training corpus and the real tasks are also taken into consideration by enlarging the separation margin between the correct candidate and its competing members. |
lr,13-1-N03-2006,bq | </term> based on a small-sized <term> bilingual | corpus | </term> , we use an out-of-domain <term> bilingual | #3093
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,15-2-C90-3063,bq | co-occurrence patterns </term> in a large <term> | corpus | </term> . To a large extent , these <term> | #16631
This paper presents an automatic scheme for collecting statistics on co-occurrence patterns in a large corpus. |
lr,15-6-P03-1051,bq | exact match accuracy </term> on a <term> test | corpus | </term> containing 28,449 <term> word tokens | #4759
The resulting Arabic word segmentation system achieves around 97% exact match accuracy on a test corpus containing 28,449 word tokens. |
lr,16-6-H90-1060,bq | adaptation ( SA ) </term> using the new <term> SI | corpus | </term> and a small amount of <term> speech | #17136
Second, we show a significant improvement for speaker adaptation (SA) using the new SI corpus and a small amount of speech from the new (target) speaker. |
lr,17-4-C04-1116,bq | context features </term> in each author 's <term> | corpus | </term> tend not to be <term> synonymous expressions | #6175
According to our assumption, most of the words with similar context features in each author's corpus tend not to be synonymous expressions. |
lr,18-4-P06-2001,bq | using a bigger and a more homogeneous <term> | corpus | </term> to train , that is , a bigger <term> | #11300
Finally, we have shown that these results can be improved using a bigger and a more homogeneous corpus to train, that is, a bigger corpus written by one unique author. |
lr,19-2-N03-4010,bq | candidates </term> from the given <term> text | corpus | </term> . The operation of the <term> system | #3682
The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus. |
lr,19-3-N03-2006,bq | of using an out-of-domain <term> bilingual | corpus | </term> and the possibility of using the <term> | #3143
The two evaluation measures of the BLEU score and the NIST score demonstrated the effect of using an out-of-domain bilingual corpus and the possibility of using the language model. |
lr,19-4-N03-1012,bq | successfully classifies 73.2 % in a <term> German | corpus | </term> of 2,284 <term> SRHs </term> as either | #2521
An evaluation of our system against the annotated data shows that it successfully classifies 73.2% in a German corpus of 2,284 SRHs as either coherent or incoherent (given a baseline of 54.55%). |
lr,19-5-C90-3063,bq | that were randomly selected from the <term> | corpus | </term> . The results of the experiment show | #16689
An experiment was performed to resolve references of the pronoun "it" in sentences that were randomly selected from the corpus. |
lr,19-5-J05-4003,bq | starting with a very small <term> parallel | corpus | </term> ( 100,000 <term> words </term> ) and | #9088
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,2-3-I05-4010,bq | in detail . The resultant <term> bilingual | corpus | </term> , 10.4 M <term> English words </term> | #8255
The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws. |
lr,20-1-N03-2006,bq | , we use an out-of-domain <term> bilingual | corpus | </term> and , in addition , the <term> language | #3100
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,21-5-P03-1051,bq | million <term> word </term> <term> unsegmented | corpus | </term> , and re-estimate the <term> model | #4728
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,22-2-P03-1050,bq | a small ( 10K sentences ) <term> parallel | corpus | </term> as its sole <term> training resources | #4469
The stemming model is based on statistical machine translation and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources. |