other,18-4-H01-1042,bq | language essays </term> in less than 100 <term> | words | </term> . Even more illuminating was the | #646 | A language learning experiment showed that assessors can differentiate native from non-native language essays in less than 100 words.
other,11-1-P01-1009,bq | analysis </term> for a large class of <term> | words | </term> called <term> alternative markers </term> | #1826 | This paper presents a formal analysis for a large class of words called alternative markers, which includes other (than), such (as), and besides.
other,1-2-P01-1009,bq | </term> , and <term> besides </term> . These <term> | words | </term> appear frequently enough in <term> | #1847 | These words appear frequently enough in dialog to warrant serious attention, yet present natural language search engines perform poorly on queries containing them.
other,7-4-N03-1017,bq | <term> phrases </term> longer than three <term> | words | </term> and learning <term> phrases </term> from | #2637 | Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models does not have a strong impact on performance.
tech,40-1-N03-1033,bq | jointly conditioning on multiple consecutive | words | </term> , ( iii ) effective use of <term> priors | #2954 | We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features.
other,15-4-P03-1051,bq | segmented corpus </term> of about 110,000 <term> | words | </term> . To improve the <term> segmentation | #4704 | The language model is initially estimated from a small manually segmented corpus of about 110,000 words.
other,3-1-C04-1106,bq | . The reality of <term> analogies between | words | </term> is refuted by noone ( e.g. , I walked | #5847 | The reality of analogies between words is refuted by noone (e.g., I walked is to to walk as I laughed is to to laugh, noted I walked : to walk :: I laughed : to laugh).
| According to our assumption , most of the | words | with similar <term> context features </term> | #6166 | According to our assumption, most of the words with similar context features in each author's corpus tend not to be synonymous expressions.
other,15-3-C04-1147,bq | compute <term> similarity </term> between <term> | words | </term> or use <term> lexical affinity </term> | #6365 | In comparison with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, in this paper we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus.
other,42-3-C04-1147,bq | co-occurrence patterns </term> of any pair of <term> | words | </term> or <term> phrases </term> at any distance | #6392 | In comparison with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, in this paper we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus.
other,16-2-P04-2005,bq | <term> topic signature </term> is a set of <term> | words | </term> that tend to co-occur with it . <term> | #6921 | Given a particular concept, or word sense, a topic signature is a set of words that tend to co-occur with it.
other,32-3-I05-2021,bq | right <term> translation </term> of the <term> | words | </term> in <term> source language sentences | #7888 | At the same time, the recent improvements in the BLEU scores of statistical machine translation (SMT) suggests that SMT models are good at predicting the right translation of the words in source language sentences.
other,7-3-I05-4010,bq | bilingual corpus </term> , 10.4 M <term> English | words | </term> and 18.3 M <term> Chinese characters | #8260 | The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws.
other,18-3-I05-5003,bq | of speech information </term> of the <term> | words | </term> contributing to the <term> word matches | #8385 | We also introduce a novel classification method based on PER which leverages part of speech information of the words contributing to the word matches and non-matches in the sentence.
| encodes <term> honorifics </term> ( respectful | words | ) . <term> Honorifics </term> are used extensively | #8576 | This paper proposes an annotating scheme that encodes honorifics (respectful words).
other,23-5-J05-4003,bq | <term> parallel corpus </term> ( 100,000 <term> | words | </term> ) and exploiting a large <term> non-parallel | #9091 | We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus.
other,25-4-E06-1018,bq | observation </term> by using triplets of <term> | words | </term> instead of pairs . The combination | #10161 | This approach differs from other approaches to WSI in that it enhances the effect of the one sense per collocation observation by using triplets of words instead of pairs.
other,12-2-P06-2001,bq | little <term> corpus </term> of 100,000 <term> | words | </term> , the system guesses correctly not | #11239 | After several experiments, and trained with a little corpus of 100,000 words, the system guesses correctly not placing commas with a precision of 96% and a recall of 98%.
other,8-1-P06-2110,bq | kind of <term> similarity </term> between <term> | words | </term> can be represented by what kind of | #11478 | This paper examines what kind of similarity between words can be represented by what kind of word vectors in the vector space model.
| ungrammatically , missing out or repeating | words | , breaking-off and restarting , speaking | #12685 | When people use natural language in natural settings, they often use it ungrammatically, missing out or repeating words, breaking-off and restarting, speaking in fragments, etc.