lr,33-1-N03-2006,bq |
model
</term>
of an in-domain
<term>
monolingual
|
corpus
|
</term>
. We conducted experiments with an
|
#3113
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,19-5-J05-4003,bq |
starting with a very small
<term>
parallel
|
corpus
|
</term>
( 100,000
<term>
words
</term>
) and
|
#9088
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
tech,4-2-N06-4001,bq |
InfoMagnets
</term>
aims at making
<term>
exploratory
|
corpus
|
analysis
</term>
accessible to researchers
|
#10881
InfoMagnets aims at making exploratory corpus analysis accessible to researchers who are not experts in text mining. |
lr,20-1-N03-2006,bq |
, we use an out-of-domain
<term>
bilingual
|
corpus
|
</term>
and , in addition , the
<term>
language
|
#3100
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,16-6-H90-1060,bq |
adaptation ( SA )
</term>
using the new
<term>
SI
|
corpus
|
</term>
and a small amount of
<term>
speech
|
#17136
Second, we show a significant improvement for speaker adaptation (SA) using the new SI corpus and a small amount of speech from the new (target) speaker. |
lr,19-3-N03-2006,bq |
of using an out-of-domain
<term>
bilingual
|
corpus
|
</term>
and the possibility of using the
<term>
|
#3143
The two evaluation measures of the BLEU score and the NIST score demonstrated the effect of using an out-of-domain bilingual corpus and the possibility of using the language model. |
lr,12-4-C92-1055,bq |
possible variations between the
<term>
training
|
corpus
|
</term>
and the real tasks are also taken
|
#17893
To make the proposed algorithm robust, the possible variations between the training corpus and the real tasks are also taken into consideration by enlarging the separation margin between the correct candidate and its competing members. |
lr,7-2-P03-1051,bq |
by a small
<term>
manually segmented Arabic
|
corpus
|
</term>
and uses it to bootstrap an
<term>
|
#4648
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,22-2-P03-1050,bq |
a small ( 10K sentences )
<term>
parallel
|
corpus
|
</term>
as its sole
<term>
training resources
|
#4469
The stemming model is based on statistical machine translation and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources. |
lr,10-5-N03-2025,bq |
Markov Model
</term>
is trained on a
<term>
|
corpus
|
</term>
automatically tagged by the first
|
#3368
Then, a Hidden Markov Model is trained on a corpus automatically tagged by the first learner. |
lr,9-5-P06-2059,bq |
experiment , the method could construct a
<term>
|
corpus
|
</term>
consisting of 126,610
<term>
sentences
|
#11464
In our experiment, the method could construct a corpus consisting of 126,610 sentences. |
lr,15-6-P03-1051,bq |
exact match accuracy
</term>
on a
<term>
test
|
corpus
|
</term>
containing 28,449
<term>
word tokens
|
#4759
The resulting Arabic word segmentation system achieves around 97% exact match accuracy on a test corpus containing 28,449 word tokens. |
other,15-1-P03-1009,bq |
classes
</term>
from undisambiguated
<term>
|
corpus
|
data
</term>
. We describe a new approach
|
#3899
Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. |
lr,1-2-H92-1074,bq |
of the art in
<term>
CSR
</term>
. This
<term>
|
corpus
|
</term>
essentially supersedes the now old
|
#19554
This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. |
tech,4-1-N06-4001,bq |
strategies . We introduce a new
<term>
interactive
|
corpus
|
exploration tool
</term>
called
<term>
InfoMagnets
|
#10870
We introduce a new interactive corpus exploration tool called InfoMagnets. |
lr,6-1-H92-1003,bq |
recently collected
<term>
spoken language
|
corpus
|
</term>
for the
<term>
ATIS ( Air Travel Information
|
#18532
This paper describes a recently collected spoken language corpus for the ATIS (Air Travel Information System) domain. |
lr,8-1-P06-2059,bq |
method of building
<term>
polarity-tagged
|
corpus
|
</term>
from
<term>
HTML documents
</term>
.
|
#11401
This paper proposes a novel method of building polarity-tagged corpus from HTML documents. |
lr,9-2-P06-2001,bq |
experiments , and trained with a little
<term>
|
corpus
|
</term>
of 100,000
<term>
words
</term>
, the
|
#11236
After several experiments, and trained with a little corpus of 100,000 words, the system guesses correctly not placing commas with a precision of 96% and a recall of 98%. |
lr,19-4-N03-1012,bq |
successfully classifies 73.2 % in a
<term>
German
|
corpus
|
</term>
of 2.284
<term>
SRHs
</term>
as either
|
#2521
An evaluation of our system against the annotated data shows that, it successfully classifies 73.2% in a German corpus of 2.284 SRHs as either coherent or incoherent (given a baseline of 54.55%). |
lr,9-4-P03-1051,bq |
estimated from a small
<term>
manually segmented
|
corpus
|
</term>
of about 110,000
<term>
words
</term>
|
#4700
The language model is initially estimated from a small manually segmented corpus of about 110,000 words. |