ACL RD-TEC 1.0 Summarization of W03-1731
Paper Title:
CHUNKING-BASED CHINESE WORD TOKENIZATION
Primarily assigned technology terms:
- algorithm
- bracketing
- chinese language processing
- chinese word segmentation
- chinese word tokenization
- chunking
- error-driven learning
- language processing
- learning
- learning approach
- modeling
- ngram modeling
- processing
- segmentation
- skimming
- tagger
- tokenization
- tokenization system
- training process
- unknown word detection
- viterbi
- viterbi algorithm
- word detection
- word segmentation
- word tokenization
Other assigned terms:
- ambiguity
- ambiguity problem
- approach
- chinese language
- chinese word
- chunk
- chunk tag
- context-dependent information
- contextual information
- ctb corpus
- dictionary
- experimental results
- feature
- implementation
- information independence
- lattice
- lexical entries
- lexical entry
- lexicon
- mutual information
- mutual information independence
- ngram
- probabilities
- process
- sentence
- tag sequence
- tags
- term
- training
- training corpus
- training data
- word
- word category
- word category information
- word formation
- word formation pattern
- word pair
- word sequence
- word type
- word types
- words