#646 A language learning experiment showed that assessors can differentiate native from non-native language essays in less than 100 words.
other,11-1-P01-1009,ak
formal analysis for a large class of
<term>
words
</term>
called
<term>
alternative markers
</term>
#1827 This paper presents a formal analysis for a large class of words called alternative markers, which includes other (than), such (as), and besides.
other,1-2-P01-1009,ak
such ( as ) , and besides . These
<term>
words
</term>
appear frequently enough in
<term>
#1848 These words appear frequently enough in dialog to warrant serious attention, yet present natural language search engines perform poorly on queries containing them.
other,7-4-N03-1017,ak
<term>
phrases
</term>
longer than three
<term>
words
</term>
and learning
<term>
phrases
</term>
from
#2638 Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models does not have a strong impact on performance.
jointly conditioning on multiple consecutive
words
, ( iii ) effective use of
<term>
priors
</term>
#2955 We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features.
other,15-4-P03-1051,ak
segmented corpus
</term>
of about 110,000
<term>
words
</term>
. To improve the
<term>
segmentation
#4706 The language model is initially estimated from a small manually segmented corpus of about 110,000 words.
the right
<term>
translation
</term>
of the
words
in
<term>
source language sentences
</term>
#6431 At the same time, the recent improvements in the BLEU scores of statistical machine translation (SMT) suggests that SMT models are good at predicting the right translation of the words in source language sentences.
other,9-3-I05-4008,ak
<term>
corpus
</term>
is about 1.6 million
<term>
words
</term>
. In this paper , we describe
<term>
#7222 The size of the corpus is about 1.6 million words.
<term>
bilingual corpus
</term>
, 10.4 M English
words
and 18.3 M Chinese characters , is an authoritative
#7310 The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws.
<term>
part of speech information
</term>
of the
words
contributing to the
<term>
word matches
</term>
#7435 We also introduce a novel classification method based on PER which leverages part of speech information of the words contributing to the word matches and non-matches in the sentence.
encodes
<term>
honorifics
</term>
( respectful
words
) .
<term>
Honorifics
</term>
are used extensively
#7941 This paper proposes an annotating scheme that encodes honorifics (respectful words).
small
<term>
parallel corpus
</term>
( 100,000
words
) and exploiting a largenon-parallel
<term>
#8451 We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus.
performance of 86.6 % ( F1 , sentences ≤ 40
words
) , which is comparable to that of an
<term>
#8572 In experiments using the Penn WSJ corpus, our automatically trained model gave a performance of 86.6% (F1, sentences ≤ 40 words), which is comparable to that of an unlexicalized PCFG parser created using extensive manual feature selection.
other,23-4-E06-1018,ak
observation
</term>
by using
<term>
triplets of
words
</term>
instead of pairs . The combination
#11098 This approach differs from other approaches to WSI in that it enhances the effect of the one sense per collocation observation by using triplets of words instead of pairs.
with a little
<term>
corpus
</term>
of 100,000
words
, the system guesses correctly not placing
#12176 After several experiments, and trained with a little corpus of 100,000 words, the system guesses correctly not placing commas with a precision of 96% and a recall of 98%.
other,8-1-P06-2110,ak
kind of
<term>
similarity
</term>
between
<term>
words
</term>
can be represented by what kind of
#12415 This paper examines what kind of similarity between words can be represented by what kind of word vectors in the vector space model.
other,19-1-P80-1026,ak
ungrammatically , missing out or repeating
<term>
words
</term>
, breaking-off and restarting , speaking
#13622 When people use natural language in natural settings, they often use it ungrammatically, missing out or repeating words, breaking-off and restarting, speaking in fragments, etc.
other,38-2-P82-1035,ak
problems for readers , such as misspelled
<term>
words
</term>
, missing
<term>
words
</term>
, poor
#14301 However, a great deal of natural language texts e.g., memos, rough drafts, conversation transcripts etc., have features that differ significantly from neat texts, posing special problems for readers, such as misspelled words, missing words, poor syntactic construction, missing periods, etc.
other,41-2-P82-1035,ak
misspelled
<term>
words
</term>
, missing
<term>
words
</term>
, poor
<term>
syntactic construction
#14304 However, a great deal of natural language texts e.g., memos, rough drafts, conversation transcripts etc., have features that differ significantly from neat texts, posing special problems for readers, such as misspelled words, missing words, poor syntactic construction, missing periods, etc.
</term>
can be used to figure out unknown
words
from
<term>
context
</term>
, constrain the
#14356These syntactic and semantic expectations can be used to figure out unknown words from context, constrain the possible word-senses of words with multiple meanings (ambiguity), fill in missing words (ellipsis), and resolve referents (anaphora).