D14-1092 |
evaluating the performance of Chinese
|
word segmentation systems
|
. Two of the four datasets are
|
C04-1067 |
matching method . Character Tagging A
|
word segmentation system
|
using the character tagging method
|
C04-1067 |
, 2003 ) . Maximum Matching A
|
word segmentation system
|
using the well-known maximum
|
C04-1067 |
other methods . Many practical
|
word segmentation systems
|
add candidates of unknown words
|
C96-1039 |
three single characters by our
|
word segmentation system
|
. From the viewpoint of personal
|
C96-1039 |
) . It is just segmented by a
|
word segmentation system
|
without checking manually . Although
|
D13-1005 |
begin by evaluating our model as a
|
word segmentation system
|
. ( Table 1 gives segmentation
|
D11-1089 |
order to show how well existing
|
word segmentation systems
|
perform this task . Although
|
D08-1111 |
considered as the best Chinese
|
word segmentation systems
|
. We chose ICTCLAS as the comparison
|
C04-1081 |
fortunately , building a Chinese
|
word segmentation system
|
is complicated by the fact that
|
D14-1092 |
bound for any unsupervised Chinese
|
word segmentation systems
|
. We also use it as the topline
|
D14-1092 |
different types of unsupervised
|
word segmentation systems
|
. This paper is organized as
|
D11-1089 |
, even for the state-of-the-art
|
word segmentation systems
|
. On the other hand , PROPOSED
|
D11-1089 |
by , typically , using existent
|
word segmentation systems
|
. This is , however , not appropriate
|
D14-1092 |
influencing accuracy of Chinese
|
word segmentation systems
|
( Huang and Zhao , 2007 ) . We
|
D14-1092 |
testing set T0 to test several
|
word segmentation systems
|
, there are N testing examples
|
C94-2209 |
-- 8 ] ) . Many automatic
|
word segmentation systems
|
adopting the above models have
|
H01-1057 |
researchers had implemented Thai
|
word segmentation systems
|
based on using a dictionary (
|
D14-1092 |
and comparison for unsupervised
|
word segmentation systems
|
, an important issue is what
|
C96-1039 |
corpus . It is segmented by a
|
word segmentation system
|
and is checked manually . In
|