lr,17-4-C04-1116,bq |
context features
</term>
in each author 's
<term>
|
corpus
|
</term>
tend not to be
<term>
synonymous expressions
|
#6175
According to our assumption, most of the words with similar context features in each author's corpus tend not to be synonymous expressions. |
lr,50-3-C04-1147,bq |
phrases
</term>
at any distance in the
<term>
|
corpus
|
</term>
. The framework is flexible , allowing
|
#6400
In comparison with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, in this paper we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus. |
lr,7-5-C04-1147,bq |
apply it in combination with a
<term>
terabyte
|
corpus
|
</term>
to answer
<term>
natural language tests
|
#6425
We apply it in combination with a terabyte corpus to answer natural language tests, achieving encouraging results. |
lr,30-2-C04-1192,bq |
for the
<term>
languages
</term>
in the
<term>
|
corpus
|
</term>
. The
<term>
wordnets
</term>
are aligned
|
#6480
The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents, supported by available aligned wordnets for the languages in the corpus. |
lr,2-3-I05-4010,bq |
in detail . The resultant
<term>
bilingual
|
corpus
|
</term>
, 10.4 M
<term>
English words
</term>
|
#8255
The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws. |
lr,19-5-J05-4003,bq |
starting with a very small
<term>
parallel
|
corpus
|
</term>
( 100,000
<term>
words
</term>
) and
|
#9088
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,29-5-J05-4003,bq |
and exploiting a large
<term>
non-parallel
|
corpus
|
</term>
. Thus , our method can be applied
|
#9098
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,3-3-P05-1034,bq |
component
</term>
. We align a
<term>
parallel
|
corpus
|
</term>
, project the
<term>
source dependency
|
#9248
We align a parallel corpus, project the source dependency parse onto the target sentence, extract dependency treelet translation pairs, and train a tree-based ordering model. |
lr,11-4-P05-1074,bq |
extracted from a
<term>
bilingual parallel
|
corpus
|
</term>
to be ranked using
<term>
translation
|
#9729
We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. |
lr,7-2-P05-2016,bq |
required is a
<term>
sentence-aligned parallel
|
corpus
|
</term>
. All other
<term>
resources
</term>
|
#9803
The only bilingual resource required is a sentence-aligned parallel corpus. |
tech,4-1-N06-4001,bq |
strategies . We introduce a new
<term>
interactive
|
corpus
|
exploration tool
</term>
called
<term>
InfoMagnets
|
#10870
We introduce a new interactive corpus exploration tool called InfoMagnets. |
tech,4-2-N06-4001,bq |
InfoMagnets
</term>
aims at making
<term>
exploratory
|
corpus
|
analysis
</term>
accessible to researchers
|
#10881
InfoMagnets aims at making exploratory corpus analysis accessible to researchers who are not experts in text mining. |
lr,6-3-P06-1052,bq |
</term>
. We evaluate the algorithm on a
<term>
|
corpus
|
</term>
, and show that it reduces the degree
|
#11183
We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime. |
lr,9-2-P06-2001,bq |
experiments , and trained with a little
<term>
|
corpus
|
</term>
of 100,000
<term>
words
</term>
, the
|
#11236
After several experiments, and trained with a little corpus of 100,000 words, the system correctly guesses where not to place commas, with a precision of 96% and a recall of 98%. |
lr,18-4-P06-2001,bq |
using a bigger and a more homogeneous
<term>
|
corpus
|
</term>
to train , that is , a bigger
<term>
|
#11300
Finally, we have shown that these results can be improved using a bigger and a more homogeneous corpus to train, that is, a bigger corpus written by one unique author. |
lr,27-4-P06-2001,bq |
</term>
to train , that is , a bigger
<term>
|
corpus
|
</term>
written by one unique
<term>
author
|
#11309
Finally, we have shown that these results can be improved using a bigger and a more homogeneous corpus to train, that is, a bigger corpus written by one unique author. |
lr,8-1-P06-2059,bq |
method of building
<term>
polarity-tagged
|
corpus
|
</term>
from
<term>
HTML documents
</term>
.
|
#11401
This paper proposes a novel method of building a polarity-tagged corpus from HTML documents. |
lr,9-5-P06-2059,bq |
experiment , the method could construct a
<term>
|
corpus
|
</term>
consisting of 126,610
<term>
sentences
|
#11464
In our experiment, the method could construct a corpus consisting of 126,610 sentences. |
lr,29-2-C88-2130,bq |
</term>
derived through analysis of our
<term>
|
corpus
|
</term>
.
<term>
Chart parsing
</term>
is
<term>
|
#15495
The model is embodied in a program, APT, that can reproduce segments of actual tape-recorded descriptions, using organizational and discourse strategies derived through analysis of our corpus. |
lr,15-2-C90-3063,bq |
co-occurrence patterns
</term>
in a large
<term>
|
corpus
|
</term>
. To a large extent , these
<term>
|
#16631
This paper presents an automatic scheme for collecting statistics on co-occurrence patterns in a large corpus. |