H92-1041 |
of proportional assignment with
|
word-based indexing
|
languages . Figure 3 shows results
|
P01-1004 |
character-based indexing over
|
word-based indexing
|
is that there is no pre-processing
|
C00-1006 |
with character overlap . With
|
word-based indexing
|
, this would only be possible
|
H92-1041 |
The optimal feature set size for
|
word-based indexing
|
was found to be surprisingly
|
P01-1004 |
indexing performs comparably to
|
word-based indexing
|
. In analogous research , Baldwin
|
C00-1006 |
partitioned off into character-based and
|
word-based indexing
|
for the various similarity
|
P01-1004 |
indexing is consistently superior to
|
word-based indexing
|
. Furthermore , the bag-of-words
|
C00-1006 |
produces a superior match accuracy to
|
word-based indexing
|
for all similarity metrics ,
|
C00-1006 |
2 . O3 for character-based and
|
word-based indexing
|
, respectively . All methods
|
C00-1006 |
not stem or lemmatise words in
|
word-based indexing
|
. Having said this , the output
|
C00-1006 |
ious similarity methods . For
|
word-based indexing
|
, segmentation was carried out
|
C00-1006 |
sequential correspondence for
|
word-based indexing
|
, but the word order-based methods
|
C00-1006 |
indexing performs comparably with
|
word-based indexing
|
in Japanese information retrieval
|
C00-1006 |
conservatively for character-based than
|
word-based indexing
|
. The most robust method is (
|
C00-1006 |
methods for both character- and
|
word-based indexing
|
, peaking at just over 50 % for
|
P01-1004 |
( 2000 ) compared character- and
|
word-based indexing
|
within a Japanese -- English
|
C00-1006 |
" , but would not match under
|
word-based indexing
|
. Character-based indexing
|
C00-1006 |
number of string comparisons in
|
word-based indexing
|
evaluation for VSM , token in
|
P01-1004 |
corpus , under both character- and
|
word-based indexing
|
, and with each of unigrams ,
|
C00-1006 |
performance for both character-based and
|
word-based indexing
|
. As such , this side-effect
|