lr-prod,1-1-H92-1074,bq |
<term>
CSR ( Connected Speech Recognition )
|
corpus
|
</term>
represents a new
<term>
DARPA speech
|
#19533
The CSR (Connected Speech Recognition) corpus represents a new DARPA speech recognition technology development initiative to advance the state of the art in CSR. |
lr-prod,7-2-H92-1074,bq |
now old
<term>
Resource Management ( RM )
|
corpus
|
</term>
that has fueled
<term>
DARPA speech
|
#19565
This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. |
lr,9-5-P06-2059,bq |
experiment , the method could construct a
<term>
|
corpus
|
</term>
consisting of 126,610
<term>
sentences
|
#11464
In our experiment, the method could construct a corpus consisting of 126,610 sentences. |
lr,12-2-P01-1008,bq |
identification of paraphrases
</term>
from a
<term>
|
corpus
|
of multiple English translations
</term>
|
#1789
We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. |
lr,6-3-P06-1052,bq |
</term>
. We evaluate the algorithm on a
<term>
|
corpus
|
</term>
, and show that it reduces the degree
|
#11183
We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime. |
lr,10-5-N03-2025,bq |
Markov Model
</term>
is trained on a
<term>
|
corpus
|
</term>
automatically tagged by the first
|
#3368
Then, a Hidden Markov Model is trained on a corpus automatically tagged by the first learner. |
lr,3-3-H92-1026,bq |
process
</term>
in a novel way . We use a
<term>
|
corpus
|
of bracketed sentences
</term>
, called a
|
#18946
We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. |
lr,9-1-P03-1068,bq |
of a large ,
<term>
semantically annotated
|
corpus
|
</term>
resource as reliable basis for the
|
#4943
We describe the ongoing construction of a large, semantically annotated corpus resource as a reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. |
lr,7-2-P03-1051,bq |
by a small
<term>
manually segmented Arabic
|
corpus
|
</term>
and uses it to bootstrap an
<term>
|
#4648
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,28-2-P03-1051,bq |
</term>
from a large
<term>
unsegmented Arabic
|
corpus
|
</term>
. The
<term>
algorithm
</term>
uses a
|
#4668
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,27-4-P06-2001,bq |
</term>
to train , that is , a bigger
<term>
|
corpus
|
</term>
written by one unique
<term>
author
|
#11309
Finally, we have shown that these results can be improved using a bigger and a more homogeneous corpus to train, that is, a bigger corpus written by one unique author. |
lr,20-1-N03-2006,bq |
, we use an out-of-domain
<term>
bilingual
|
corpus
|
</term>
and , in addition , the
<term>
language
|
#3100
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,19-3-N03-2006,bq |
of using an out-of-domain
<term>
bilingual
|
corpus
|
</term>
and the possibility of using the
<term>
|
#3143
The two evaluation measures of the BLEU score and the NIST score demonstrated the effect of using an out-of-domain bilingual corpus and the possibility of using the language model. |
lr,2-3-I05-4010,bq |
in detail . The resultant
<term>
bilingual
|
corpus
|
</term>
, 10.4 M
<term>
English words
</term>
|
#8255
The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws. |
lr,13-1-N03-2006,bq |
</term>
based on a small-sized
<term>
bilingual
|
corpus
|
</term>
, we use an out-of-domain
<term>
bilingual
|
#3093
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,23-2-C04-1116,bq |
each author 's text as a coherent
<term>
|
corpus
|
</term>
. Our approach is based on the idea
|
#6137
This paper proposes a new methodology to improve the accuracy of a term aggregation system using each author's text as a coherent corpus. |
lr-prod,2-3-H92-1074,bq |
for the past 5 years . The new
<term>
CSR
|
corpus
|
</term>
supports research on major new problems
|
#19583
The new CSR corpus supports research on major new problems including unlimited vocabulary, natural grammar, and spontaneous speech. |
lr-prod,29-4-H92-1074,bq |
dynamic challenge of extending the
<term>
CSR
|
corpus
|
</term>
to meet future needs .
<term>
Language
|
#19631
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr-prod,7-4-H92-1074,bq |
paper presents an overview of the
<term>
CSR
|
corpus
|
</term>
, reviews the definition and development
|
#19609
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
tech,4-2-N06-4001,bq |
InfoMagnets
</term>
aims at making
<term>
exploratory
|
corpus
|
analysis
</term>
accessible to researchers
|
#10881
InfoMagnets aims at making exploratory corpus analysis accessible to researchers who are not experts in text mining. |