lr,33-1-N03-2006,bq |
model
</term>
of an in-domain
<term>
monolingual
|
corpus
|
</term>
. We conducted experiments with an
|
#3113
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr-prod,17-4-H92-1074,bq |
definition and development of the
<term>
CSR pilot
|
corpus
|
</term>
, and examines the dynamic challenge
|
#19620
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,28-2-P03-1051,bq |
</term>
from a large
<term>
unsegmented Arabic
|
corpus
|
</term>
. The
<term>
algorithm
</term>
uses a
|
#4668
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,15-6-P03-1051,bq |
exact match accuracy
</term>
on a
<term>
test
|
corpus
|
</term>
containing 28,449
<term>
word tokens
|
#4759
The resulting Arabic word segmentation system achieves around 97% exact match accuracy on a test corpus containing 28,449 word tokens. |
lr,2-3-I05-4010,bq |
in detail . The resultant
<term>
bilingual
|
corpus
|
</term>
, 10.4 M
<term>
English words
</term>
|
#8255
The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws. |
lr,27-4-P06-2001,bq |
</term>
to train , that is , a bigger
<term>
|
corpus
|
</term>
written by one unique
<term>
author
|
#11309
Finally, we have shown that these results can be improved using a bigger and a more homogeneous corpus to train, that is, a bigger corpus written by one unique author. |
lr,9-1-P03-1068,bq |
of a large ,
<term>
semantically annotated
|
corpus
|
</term>
resource as reliable basis for the
|
#4943
We describe the ongoing construction of a large, semantically annotated corpus resource as reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. |
lr,8-1-P06-2059,bq |
method of building
<term>
polarity-tagged
|
corpus
|
</term>
from
<term>
HTML documents
</term>
.
|
#11401
This paper proposes a novel method of building polarity-tagged corpus from HTML documents. |
lr,29-5-J05-4003,bq |
and exploiting a large
<term>
non-parallel
|
corpus
|
</term>
. Thus , our method can be applied
|
#9098
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr-prod,7-4-H92-1074,bq |
paper presents an overview of the
<term>
CSR
|
corpus
|
</term>
, reviews the definition and development
|
#19609
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,9-2-P06-2001,bq |
experiments , and trained with a little
<term>
|
corpus
|
</term>
of 100,000
<term>
words
</term>
, the
|
#11236
After several experiments, and trained with a little corpus of 100,000 words, the system guesses correctly not placing commas with a precision of 96% and a recall of 98%. |
lr,15-2-C90-3063,bq |
co-occurrence patterns
</term>
in a large
<term>
|
corpus
|
</term>
. To a large extent , these
<term>
|
#16631
This paper presents an automatic scheme for collecting statistics on co-occurrence patterns in a large corpus. |
lr,23-2-C04-1116,bq |
each author 's text as a coherent
<term>
|
corpus
|
</term>
. Our approach is based on the idea
|
#6137
This paper proposes a new methodology to improve the accuracy of a term aggregation system using each author's text as a coherent corpus. |
lr,13-1-N03-2006,bq |
</term>
based on a small-sized
<term>
bilingual
|
corpus
|
</term>
, we use an out-of-domain
<term>
bilingual
|
#3093
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,1-2-H92-1074,bq |
of the art in
<term>
CSR
</term>
. This
<term>
|
corpus
|
</term>
essentially supersedes the now old
|
#19554
This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. |
lr-prod,2-3-H92-1074,bq |
for the past 5 years . The new
<term>
CSR
|
corpus
|
</term>
supports research on major new problems
|
#19583
The new CSR corpus supports research on major new problems including unlimited vocabulary, natural grammar, and spontaneous speech. |
lr,19-5-J05-4003,bq |
starting with a very small
<term>
parallel
|
corpus
|
</term>
( 100,000
<term>
words
</term>
) and
|
#9088
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,6-3-C04-1106,bq |
experiments conducted on a
<term>
multilingual
|
corpus
|
</term>
to estimate the number of
<term>
analogies
|
#5916
We report experiments conducted on a multilingual corpus to estimate the number of analogies among the sentences that it contains. |
lr,6-3-P06-1052,bq |
</term>
. We evaluate the algorithm on a
<term>
|
corpus
|
</term>
, and show that it reduces the degree
|
#11183
We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime. |
lr,19-5-C90-3063,bq |
that were randomly selected from the
<term>
|
corpus
|
</term>
. The results of the experiment show
|
#16689
An experiment was performed to resolve references of the pronoun it in sentences that were randomly selected from the corpus. |