lr,12-4-C92-1055,bq |
possible variations between the
<term>
training
|
corpus
|
</term>
and the real tasks are also taken
|
#17893
To make the proposed algorithm robust, the possible variations between the training corpus and the real tasks are also taken into consideration by enlarging the separation margin between the correct candidate and its competing members. |
lr-prod,7-4-H92-1074,bq |
paper presents an overview of the
<term>
CSR
|
corpus
|
</term>
, reviews the definition and development
|
#19609
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,19-2-N03-4010,bq |
candidates
</term>
from the given
<term>
text
|
corpus
|
</term>
. The operation of the
<term>
system
|
#3682
The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus. |
lr,19-5-J05-4003,bq |
starting with a very small
<term>
parallel
|
corpus
|
</term>
( 100,000
<term>
words
</term>
) and
|
#9088
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,29-2-C88-2130,bq |
</term>
derived through analysis of our
<term>
|
corpus
|
</term>
.
<term>
Chart parsing
</term>
is
<term>
|
#15495
The model is embodied in a program, APT, that can reproduce segments of actual tape-recorded descriptions, using organizational and discourse strategies derived through analysis of our corpus. |
lr,6-3-P06-1052,bq |
</term>
. We evaluate the algorithm on a
<term>
|
corpus
|
</term>
, and show that it reduces the degree
|
#11183
We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime. |
lr,20-1-N03-2006,bq |
, we use an out-of-domain
<term>
bilingual
|
corpus
|
</term>
and , in addition , the
<term>
language
|
#3100
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,7-2-P03-1051,bq |
by a small
<term>
manually segmented Arabic
|
corpus
|
</term>
and uses it to bootstrap an
<term>
|
#4648
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,3-3-H92-1026,bq |
process
</term>
in a novel way . We use a
<term>
|
corpus
|
of bracketed sentences
</term>
, called a
|
#18946
We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. |
lr-prod,7-2-H92-1074,bq |
now old
<term>
Resource Management ( RM )
|
corpus
|
</term>
that has fueled
<term>
DARPA speech
|
#19565
This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. |
lr,6-1-H92-1003,bq |
recently collected
<term>
spoken language
|
corpus
|
</term>
for the
<term>
ATIS ( Air Travel Information
|
#18532
This paper describes a recently collected spoken language corpus for the ATIS (Air Travel Information System) domain. |
lr,8-1-P06-2059,bq |
method of building
<term>
polarity-tagged
|
corpus
|
</term>
from
<term>
HTML documents
</term>
.
|
#11401
This paper proposes a novel method of building polarity-tagged corpus from HTML documents. |
lr,22-2-P03-1050,bq |
a small ( 10K sentences )
<term>
parallel
|
corpus
|
</term>
as its sole
<term>
training resources
|
#4469
The stemming model is based on statistical machine translation and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources. |
lr,11-4-P05-1074,bq |
extracted from a
<term>
bilingual parallel
|
corpus
|
</term>
to be ranked using
<term>
translation
|
#9729
We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. |
lr-prod,2-3-H92-1074,bq |
for the past 5 years . The new
<term>
CSR
|
corpus
|
</term>
supports research on major new problems
|
#19583
The new CSR corpus supports research on major new problems including unlimited vocabulary, natural grammar, and spontaneous speech. |
lr-prod,29-4-H92-1074,bq |
dynamic challenge of extending the
<term>
CSR
|
corpus
|
</term>
to meet future needs .
<term>
Language
|
#19631
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,29-5-J05-4003,bq |
and exploiting a large
<term>
non-parallel
|
corpus
|
</term>
. Thus , our method can be applied
|
#9098
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,21-5-P03-1051,bq |
million
<term>
word
</term><term>
unsegmented
|
corpus
|
</term>
, and re-estimate the
<term>
model
|
#4728
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,52-3-A94-1011,bq |
, and does not require a
<term>
pre-tagged
|
corpus
|
</term>
to fit . One of the distinguishing
|
#19998
A novel method for adding linguistic annotation to corpora is presented which involves using a statistical POS tagger in conjunction with unsupervised structure finding methods to derive notions of noun group, verb group, and so on which is inherently extensible to more sophisticated annotation, and does not require a pre-tagged corpus to fit. |
lr,9-1-P03-1068,bq |
of a large ,
<term>
semantically annotated
|
corpus
|
</term>
resource as reliable basis for the
|
#4943
We describe the ongoing construction of a large, semantically annotated corpus resource as reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. |