ACL RD-TEC 1.0 Summarization of W06-0136
Paper Title:
N-GRAM BASED TWO-STEP ALGORITHM FOR WORD SEGMENTATION
Authors: Dong-Hee Lim, Kyu-Baek Hwang, and Seung-Shik Kang
Primarily assigned technology terms:
- algorithm
- automatic word segmentation
- chinese language processing
- chinese word segmentation
- computational linguistics
- cross validation
- error correction
- information retrieval
- information retrieval system
- language processing
- loose segmentation
- name recognition
- pos-tagging
- postprocessing
- processing
- proper name recognition
- recognition
- recognizer
- retrieval system
- search
- search engine
- segmentation
- segmentation algorithm
- segmentation system
- smoothing
- tagging
- term extraction
- two-step word segmentation
- unknown word processing
- validation
- word processing
- word segmentation
- word segmentation bakeoff
- word segmentation system
- word space
Other assigned terms:
- ambiguous word
- approach
- association for computational linguistics
- bigram
- character sequence
- characters
- chinese language
- chinese word
- chinese words
- corpora
- data sparseness
- data sparseness problem
- dictionaries
- dictionary
- error rate
- experimental results
- f-measure
- fact
- feature
- index
- korean language
- lexicon
- linguistics
- memory space
- method
- methodology
- n-gram
- n-grams
- names
- probability
- proper name
- proper names
- query
- segmentation bakeoff
- sentence
- small-sized training
- small-sized training corpora
- sparseness problem
- statistics
- tag information
- tagging problem
- tags
- term
- training
- training corpora
- training corpus
- trigram
- unigram
- word
- word boundaries
- words