|
According to our assumption , most of the
|
words
|
with similar
<term>
context features
</term>
|
#6166
According to our assumption, most of the words with similar context features in each author's corpus tend not to be synonymous expressions. |
|
encodes
<term>
honorifics
</term>
( respectful
|
words
|
) .
<term>
Honorifics
</term>
are used extensively
|
#8576
This paper proposes an annotating scheme that encodes honorifics (respectful words). |
|
ungrammatically , missing out or repeating
|
words
|
, breaking-off and restarting , speaking
|
#12685
When people use natural language in natural settings, they often use it ungrammatically, missing out or repeating words, breaking-off and restarting, speaking in fragments, etc. |
lr,21-2-C90-3072,bq |
dictionaries of word forms
</term>
instead of
<term>
|
words
|
</term>
. This approach is sufficient for
|
#16755
For different reasons, among which the speed of processing prevails, they are usually based on dictionaries of word forms instead of words. |
other,1-2-P01-1009,bq |
</term>
, and
<term>
besides
</term>
. These
<term>
|
words
|
</term>
appear frequently enough in
<term>
|
#1847
These words appear frequently enough in dialog to warrant serious attention, yet present natural language search engines perform poorly on queries containing them. |
other,1-8-C92-3165,bq |
practical systems . Detected
<term>
unknown
|
words
|
</term>
can be incrementally incorporated
|
#18244
Detected unknown words can be incrementally incorporated into the dictionary after the interaction with the user. |
other,11-1-P01-1009,bq |
analysis
</term>
for a large class of
<term>
|
words
|
</term>
called
<term>
alternative markers
</term>
|
#1826
This paper presents a formal analysis for a large class of words called alternative markers, which includes other (than), such (as), and besides. |
other,11-4-P82-1035,bq |
</term>
can be used to figure out
<term>
unknown
|
words
|
</term>
from
<term>
context
</term>
, constrain
|
#13067
These syntactic and semantic expectations can be used to figure out unknown words from context, constrain the possible word-senses of words with multiple meanings (ambiguity), fill in missing words (ellipsis), and resolve referents (anaphora). |
other,12-2-P06-2001,bq |
little
<term>
corpus
</term>
of 100,000
<term>
|
words
|
</term>
, the system guesses correctly not
|
#11239
After several experiments, and trained with a little corpus of 100,000 words, the system guesses correctly not placing commas with a precision of 96% and a recall of 98%. |
other,15-3-C04-1147,bq |
compute
<term>
similarity
</term>
between
<term>
|
words
|
</term>
or use
<term>
lexical affinity
</term>
|
#6365
In comparison with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, in this paper we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus. |
other,15-4-P03-1051,bq |
segmented corpus
</term>
of about 110,000
<term>
|
words
|
</term>
. To improve the
<term>
segmentation
|
#4704
The language model is initially estimated from a small manually segmented corpus of about 110,000 words. |
other,15-5-A92-1027,bq |
</term>
based on the placement of
<term>
function
|
words
|
</term>
, and by
<term>
heuristic rules
</term>
|
#17681
This is facilitated through the use of phrase boundary heuristics based on the placement of function words, and by heuristic rules that permit certain kinds of phrases to be deduced despite the presence of unknown words. |
other,16-2-P04-2005,bq |
<term>
topic signature
</term>
is a set of
<term>
|
words
|
</term>
that tend to co-occur with it .
<term>
|
#6921
Given a particular concept, or word sense, a topic signature is a set of words that tend to co-occur with it. |
other,18-3-I05-5003,bq |
of speech information
</term>
of the
<term>
|
words
|
</term>
contributing to the
<term>
word matches
|
#8385
We also introduce a novel classification method based on PER which leverages part of speech information of the words contributing to the word matches and non-matches in the sentence. |
other,18-4-H01-1042,bq |
language essays
</term>
in less than 100
<term>
|
words
|
</term>
. Even more illuminating was the
|
#646
A language learning experiment showed that assessors can differentiate native from non-native language essays in less than 100 words. |
other,19-2-C92-4199,bq |
is proposed for identifying
<term>
unknown
|
words
|
</term>
, especially
<term>
personal names
</term>
|
#18305
In this paper, a new mechanism, based on the concept of sublanguage, is proposed for identifying unknown words, especially personal names, in Chinese newspapers. |
other,21-4-P82-1035,bq |
possible
<term>
word-senses
</term>
of
<term>
|
words
|
with multiple meanings
</term>
(
<term>
ambiguity
|
#13076
These syntactic and semantic expectations can be used to figure out unknown words from context, constrain the possible word-senses of words with multiple meanings (ambiguity), fill in missing words (ellipsis), and resolve referents (anaphora). |
other,21-6-A94-1026,bq |
semantic categories
</term>
of the
<term>
adjoining
|
words
|
</term>
. The method accurately determines
|
#20482
The basic idea of this method is that a compound noun component places some restrictions on the semantic categories of the adjoining words. |
other,22-1-A94-1007,bq |
<term>
but
</term>
and the equivalent
<term>
|
words
|
</term>
.
<term>
Syntactic analysis of the
|
#19697
The authors propose a model for analyzing English sentences including coordinate conjunctions such as and, or, but and the equivalent words. |
other,23-5-J05-4003,bq |
<term>
parallel corpus
</term>
( 100,000
<term>
|
words
|
</term>
) and exploiting a large
<term>
non-parallel
|
#9091
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000words) and exploiting a large non-parallel corpus. |