#646 A language learning experiment showed that assessors can differentiate native from non-native language essays in less than 100 words.
other,11-1-P01-1009,ak
formal analysis for a large class of
<term>
words
</term>
called
<term>
alternative markers
</term>
#1827 This paper presents a formal analysis for a large class of words called alternative markers, which includes other (than), such (as), and besides.
other,1-2-P01-1009,ak
such ( as ) , and besides . These
<term>
words
</term>
appear frequently enough in
<term>
#1848 These words appear frequently enough in dialog to warrant serious attention, yet present natural language search engines perform poorly on queries containing them.
other,7-4-N03-1017,ak
<term>
phrases
</term>
longer than three
<term>
words
</term>
and learning
<term>
phrases
</term>
from
#2638 Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models does not have a strong impact on performance.
jointly conditioning on multiple consecutive
words
, ( iii ) effective use of
<term>
priors
</term>
#2955 We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features.
other,15-4-P03-1051,ak
segmented corpus
</term>
of about 110,000
<term>
words
</term>
. To improve the
<term>
segmentation
#4706 The language model is initially estimated from a small manually segmented corpus of about 110,000 words.
the right
<term>
translation
</term>
of the
words
in
<term>
source language sentences
</term>
#6431 At the same time, the recent improvements in the BLEU scores of statistical machine translation (SMT) suggests that SMT models are good at predicting the right translation of the words in source language sentences.
other,9-3-I05-4008,ak
<term>
corpus
</term>
is about 1.6 million
<term>
words
</term>
. In this paper , we describe
<term>
#7222 The size of the corpus is about 1.6 million words.
<term>
bilingual corpus
</term>
, 10.4 M English
words
and 18.3 M Chinese characters , is an authoritative
#7310 The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws.
<term>
part of speech information
</term>
of the
words
contributing to the
<term>
word matches
</term>
#7435 We also introduce a novel classification method based on PER which leverages part of speech information of the words contributing to the word matches and non-matches in the sentence.
encodes
<term>
honorifics
</term>
( respectful
words
) .
<term>
Honorifics
</term>
are used extensively
#7941 This paper proposes an annotating scheme that encodes honorifics (respectful words).
small
<term>
parallel corpus
</term>
( 100,000
words
) and exploiting a largenon-parallel
<term>
#8451 We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus.
performance of 86.6 % ( F1 , sentences ≤ 40
words
) , which is comparable to that of an
<term>
#8572 In experiments using the Penn WSJ corpus, our automatically trained model gave a performance of 86.6% (F1, sentences ≤ 40 words), which is comparable to that of an unlexicalized PCFG parser created using extensive manual feature selection.
other,23-4-E06-1018,ak
observation
</term>
by using
<term>
triplets of
words
</term>
instead of pairs . The combination
#11098 This approach differs from other approaches to WSI in that it enhances the effect of the one sense per collocation observation by using triplets of words instead of pairs.
with a little
<term>
corpus
</term>
of 100,000
words
, the system guesses correctly not placing
#12176 After several experiments, and trained with a little corpus of 100,000 words, the system guesses correctly not placing commas with a precision of 96% and a recall of 98%.
other,8-1-P06-2110,ak
kind of
<term>
similarity
</term>
between
<term>
words
</term>
can be represented by what kind of
#12415 This paper examines what kind of similarity between words can be represented by what kind of word vectors in the vector space model.
other,19-1-P80-1026,ak
ungrammatically , missing out or repeating
<term>
words
</term>
, breaking-off and restarting , speaking
#13622 When people use natural language in natural settings, they often use it ungrammatically, missing out or repeating words, breaking-off and restarting, speaking in fragments, etc.
other,38-2-P82-1035,ak
problems for readers , such as misspelled
<term>
words
</term>
, missing
<term>
words
</term>
, poor
#14301 However, a great deal of natural language texts e.g., memos, rough drafts, conversation transcripts etc., have features that differ significantly from neat texts, posing special problems for readers, such as misspelled words, missing words, poor syntactic construction, missing periods, etc.
other,41-2-P82-1035,ak
misspelled
<term>
words
</term>
, missing
<term>
words
</term>
, poor
<term>
syntactic construction
#14304 However, a great deal of natural language texts e.g., memos, rough drafts, conversation transcripts etc., have features that differ significantly from neat texts, posing special problems for readers, such as misspelled words, missing words, poor syntactic construction, missing periods, etc.
</term>
can be used to figure out unknown
words
from
<term>
context
</term>
, constrain the
#14356These syntactic and semantic expectations can be used to figure out unknown words from context, constrain the possible word-senses of words with multiple meanings (ambiguity), fill in missing words (ellipsis), and resolve referents (anaphora).