W03-1731 of our chunking-based Chinese word tokenization in the competition . The most
S14-2018 linguistic processing , not even word tokenization . For each of the ( proprietary
S14-2017 linguistic processing , not even word tokenization . For each of the ( proprietary
P15-1004 Habash et al. , 2013 ) for Arabic word tokenization . For Chinese tokenization ,
W03-1731 Language Processing . Normally , word tokenization is implemented through word segmentation
W03-1731 cope with unknown words in Chinese word tokenization . The unknown word detection
W03-1731 unknown word problem in Chinese word tokenization .
W03-1731 This paper introduces a Chinese word tokenization system through HMM-based chunking
W02-1804 many possible segmentations in the word tokenization stage . Figure 1 shows the tokenization
S15-2002 all substrings ( or n-grams in word tokenization ) . In total we computed 22 metrics
Q14-1016 annotators used the sentence and word tokenizations supplied by the treebank .2 Annotation
D13-1146 segmentation task is split into word tokenization and sentence boundary detection
P12-3007 performing the vectorization , word tokenization was conducted . In this step
W03-1731 problem with Chunking-based Chinese word tokenization is how to effectively approximate
K15-2001 propagate to the end , and other than word tokenization , all input to the participating
W03-1731 Chunking-based Chinese Word Tokenization . GuoDong . Abstract : This
P98-2189 removal of stop words • word tokenization , e.g. lemmatization . 3.2 Indexing
W03-1731 1 Introduction Word Tokenization is regarded as one of the major bottlenecks
W03-1731 has become a major bottleneck in word tokenization . This paper proposes an HMM-based
M95-1012 words ) and zoners ( e.g. , for word tokenization , sentence boundary determination
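The snippets above repeatedly treat word tokenization as a preprocessing stage, in some cases split into word tokenization plus sentence boundary detection (e.g., D13-1146). As a purely illustrative sketch of that two-stage split, and not the method of any paper cited here (W03-1731's HMM-based chunking, in particular, is a statistical model and is not shown), a minimal regex-based tokenizer in Python might look as follows; all function names are hypothetical:

# Illustrative sketch only: naive sentence splitting and word tokenization.
import re

def split_sentences(text: str) -> list[str]:
    # Naive sentence boundary detection: split after ., !, ? followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def tokenize_words(sentence: str) -> list[str]:
    # Naive word tokenization: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

if __name__ == "__main__":
    sample = "Word tokenization precedes tagging. It is a known bottleneck!"
    for sent in split_sentences(sample):
        print(tokenize_words(sent))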