W03-1731 | of our chunking-based Chinese | word tokenization | in the competition . The most
S14-2018 | linguistic processing , not even | word tokenization | . For each of the ( proprietary
S14-2017 | linguistic processing , not even | word tokenization | . For each of the ( proprietary
P15-1004 | Habash et al. , 2013 ) for Arabic | word tokenization | . For Chinese tokenization ,
W03-1731 | Language Processing . Normally , | word tokenization | is implemented through word segmentation
W03-1731 | cope with unkown words in Chinese | word tokenization | . The unknown word detection
W03-1731 | unknown word problem in Chinese | word tokenization | . <figurecaption> log ( | ) log
W03-1731 | This paper introduces a Chinese | word tokenization | system through HMM-based chunking
W02-1804 | many possible segmentation in the | word tokenization | stage . Figure 1 shows the tokenization
S15-2002 | all substrings ( or n-grams in | word tokenization | ) . In total we computed 22 metrics
Q14-1016 | annotators used the sentence and | word tokenizations | supplied by the treebank .2 Annotation
D13-1146 | segmentation task is split into | word tokenization | and sentence boundary detection
P12-3007 | performing the vectorization , | word tokenization | was conducted . In this step
W03-1731 | problem with Chunking-based Chinese | word tokenization | is how to effectively approximate
K15-2001 | propagate to the end , and other than | word tokenization | , all input to the participating
W03-1731 | <title> Chunking-based Chinese | Word Tokenization | </title> GuoDong abstract This
P98-2189 | removal of stop words • | word tokenization | , e.g. lemmatization . 3.2 Indexing
W03-1731 | </figurecaption> 1 Introduction | Word Tokenization | is regarded as one of major bottlenecks
W03-1731 | has become major bottleneck in | word tokenization | . This paper proposes a HMM-based
M95-1012 | words ) and zoners ( e.g. , for | word tokenization | , sentence boundary determination