lr,3-3-H92-1026,bq |
process
</term>
in a novel way . We use a
<term>
|
corpus
|
of bracketed sentences
</term>
, called a
|
#18946
We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. |
lr,12-2-P01-1008,bq |
identification of paraphrases
</term>
from a
<term>
|
corpus
|
of multiple English translations
</term>
|
#1789
We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. |
lr,25-7-P03-1051,bq |
can create a small
<term>
manually segmented
|
corpus
|
</term>
of the
<term>
language
</term>
of interest
|
#4792
We believe this is a state-of-the-art performance and the algorithm can be used for many highly inflected languages provided that one can create a small manually segmented corpus of the language of interest. |
lr-prod,1-1-H92-1074,bq |
<term>
CSR ( Connected Speech Recognition )
|
corpus
|
</term>
represents a new
<term>
DARPA speech
|
#19533
The CSR (Connected Speech Recognition) corpus represents a new DARPA speech recognition technology development initiative to advance the state of the art in CSR. |
lr,9-1-P03-1068,bq |
of a large ,
<term>
semantically annotated
|
corpus
|
</term>
resource as reliable basis for the
|
#4943
We describe the ongoing construction of a large, semantically annotated corpus resource as reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. |
lr-prod,2-3-H92-1074,bq |
for the past 5 years . The new
<term>
CSR
|
corpus
|
</term>
supports research on major new problems
|
#19583
The new CSR corpus supports research on major new problems including unlimited vocabulary, natural grammar, and spontaneous speech. |
lr,17-4-C04-1116,bq |
context features
</term>
in each author 's
<term>
|
corpus
|
</term>
tend not to be
<term>
synonymous expressions
|
#6175
According to our assumption, most of the words with similar context features in each author's corpus tend not to be synonymous expressions. |
lr-prod,7-2-H92-1074,bq |
now old
<term>
Resource Management ( RM )
|
corpus
|
</term>
that has fueled
<term>
DARPA speech
|
#19565
This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. |
lr,7-5-C04-1147,bq |
apply it in combination with a
<term>
terabyte
|
corpus
|
</term>
to answer
<term>
natural language tests
|
#6425
We apply it in combination with a terabyte corpus to answer natural language tests, achieving encouraging results. |
lr,11-4-P05-1074,bq |
extracted from a
<term>
bilingual parallel
|
corpus
|
</term>
to be ranked using
<term>
translation
|
#9729
We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. |
lr,6-3-C04-1106,bq |
experiments conducted on a
<term>
multilingual
|
corpus
|
</term>
to estimate the number of
<term>
analogies
|
#5916
We report experiments conducted on a multilingual corpus to estimate the number of analogies among the sentences that it contains. |
lr,52-3-A94-1011,bq |
, and does not require a
<term>
pre-tagged
|
corpus
|
</term>
to fit . One of the distinguishing
|
#19998
A novel method for adding linguistic annotation to corpora is presented which involves using a statistical POS tagger in conjunction with unsupervised structure finding methods to derive notions of noun group, verb group, and so on which is inherently extensible to more sophisticated annotation, and does not require a pre-tagged corpus to fit. |
lr-prod,29-4-H92-1074,bq |
dynamic challenge of extending the
<term>
CSR
|
corpus
|
</term>
to meet future needs .
<term>
Language
|
#19631
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,18-4-P06-2001,bq |
using a bigger and a more homogeneous
<term>
|
corpus
|
</term>
to train , that is , a bigger
<term>
|
#11300
Finally, we have shown that these results can be improved using a bigger and a more homogeneous corpus to train, that is, a bigger corpus written by one unique author. |
lr,27-4-P06-2001,bq |
</term>
to train , that is , a bigger
<term>
|
corpus
|
</term>
written by one unique
<term>
author
|
#11309
Finally, we have shown that these results can be improved using a bigger and a more homogeneous corpus to train, that is, a bigger corpus written by one unique author. |