lr,3-3-P05-1034,bq |
component
</term>
. We align a
<term>
parallel
|
corpus
|
</term>
, project the
<term>
source dependency
|
#9248
We align a parallel corpus, project the source dependency parse onto the target sentence, extract dependency treelet translation pairs, and train a tree-based ordering model. |
lr,52-3-A94-1011,bq |
, and does not require a
<term>
pre-tagged
|
corpus
|
</term>
to fit . One of the distinguishing
|
#19998
A novel method for adding linguistic annotation to corpora is presented which involves using a statistical POS tagger in conjunction with unsupervised structure finding methods to derive notions of noun group, verb group, and so on which is inherently extensible to more sophisticated annotation, and does not require a pre-tagged corpus to fit. |
lr,50-3-C04-1147,bq |
phrases
</term>
at any distance in the
<term>
|
corpus
|
</term>
. The framework is flexible , allowing
|
#6400
In comparison with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, in this paper we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus. |
lr,30-2-C04-1192,bq |
for the
<term>
languages
</term>
in the
<term>
|
corpus
|
</term>
. The
<term>
wordnets
</term>
are aligned
|
#6480
The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents and being supported by available aligned wordnets for the languages in the corpus. |
lr,16-6-H90-1060,bq |
adaptation ( SA )
</term>
using the new
<term>
SI
|
corpus
|
</term>
and a small amount of
<term>
speech
|
#17136
Second, we show a significant improvement for speaker adaptation (SA) using the new SI corpus and a small amount of speech from the new (target) speaker. |
lr,6-1-H92-1003,bq |
recently collected
<term>
spoken language
|
corpus
|
</term>
for the
<term>
ATIS ( Air Travel Information
|
#18532
This paper describes a recently collected spoken language corpus for the ATIS (Air Travel Information System) domain. |
lr,29-5-J05-4003,bq |
and exploiting a large
<term>
non-parallel
|
corpus
|
</term>
. Thus , our method can be applied
|
#9098
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,8-1-P06-2059,bq |
method of building
<term>
polarity-tagged
|
corpus
|
</term>
from
<term>
HTML documents
</term>
.
|
#11401
This paper proposes a novel method of building polarity-tagged corpus from HTML documents. |
lr,12-4-C92-1055,bq |
possible variations between the
<term>
training
|
corpus
|
</term>
and the real tasks are also taken
|
#17893
To make the proposed algorithm robust, the possible variations between the training corpus and the real tasks are also taken into consideration by enlarging the separation margin between the correct candidate and its competing members. |
lr,6-3-C04-1106,bq |
experiments conducted on a
<term>
multilingual
|
corpus
|
</term>
to estimate the number of
<term>
analogies
|
#5916
We report experiments conducted on a multilingual corpus to estimate the number of analogies among the sentences that it contains. |
lr,1-2-H92-1074,bq |
of the art in
<term>
CSR
</term>
. This
<term>
|
corpus
|
</term>
essentially supersedes the now old
|
#19554
This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. |
lr,9-4-P03-1051,bq |
estimated from a small
<term>
manually segmented
|
corpus
|
</term>
of about 110,000
<term>
words
</term>
|
#4700
The language model is initially estimated from a small manually segmented corpus of about 110,000 words. |
lr-prod,15-3-H94-1014,bq |
word
</term><term>
Wall Street Journal text
|
corpus
|
</term>
. Using the
<term>
BU recognition system
|
#21261
The models were constructed using a 5K vocabulary and trained using a 76 million word Wall Street Journal text corpus. |
lr,21-5-P03-1051,bq |
million
<term>
word
</term><term>
unsegmented
|
corpus
|
</term>
, and re-estimate the
<term>
model
|
#4728
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr-prod,7-2-H92-1074,bq |
now old
<term>
Resource Management ( RM )
|
corpus
|
</term>
that has fueled
<term>
DARPA speech
|
#19565
This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. |
lr,11-4-P05-1074,bq |
extracted from a
<term>
bilingual parallel
|
corpus
|
</term>
to be ranked using
<term>
translation
|
#9729
We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. |
lr,19-3-N03-2006,bq |
of using an out-of-domain
<term>
bilingual
|
corpus
|
</term>
and the possibility of using the
<term>
|
#3143
The two evaluation measures of the BLEU score and the NIST score demonstrated the effect of using an out-of-domain bilingual corpus and the possibility of using the language model. |
lr,15-2-C90-3063,bq |
co-occurrence patterns
</term>
in a large
<term>
|
corpus
|
</term>
. To a large extent , these
<term>
|
#16631
This paper presents an automatic scheme for collecting statistics on co-occurrence patterns in a large corpus. |
other,15-1-P03-1009,bq |
classes
</term>
from undisambiguated
<term>
|
corpus
|
data
</term>
. We describe a new approach
|
#3899
Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. |
lr,3-3-H92-1026,bq |
process
</term>
in a novel way . We use a
<term>
|
corpus
|
of bracketed sentences
</term>
, called a
|
#18946
We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. |