ACL RD-TEC 1.0 Summarization of C04-1067
Paper Title:
CHINESE AND JAPANESE WORD SEGMENTATION USING WORD-LEVEL AND CHARACTER-LEVEL INFORMATION
Primarily assigned technology terms:
- algorithm
- baum-welch algorithm
- character tagging
- chinese word segmentation
- chunking
- entity recognition
- generalized iterative scaling
- good-turing smoothing
- hidden markov
- hidden markov models
- hybrid method
- identification
- iterative scaling
- japanese word segmentation
- learning
- learning techniques
- machine learning
- machine learning techniques
- matching
- maximum entropy
- maximum matching
- maximum-likelihood
- model-based method
- named entity recognition
- parameter estimation
- pos tagging
- pos-tagging
- processing
- recognition
- segmentation
- segmentation system
- segmenter
- smoothing
- smoothing method
- support vector machines
- tagging
- tagging method
- unknown word processing
- viterbi
- viterbi algorithm
- word bigram
- word processing
- word segmentation
- word segmentation bakeoff
- word segmentation system
- word segmentation task
- word segmenter
- word-based approach
Other assigned terms:
- alphabet
- approach
- bigram
- case
- characters
- chinese word
- city university corpus
- corpora
- dictionaries
- dictionary
- distribution
- english part-of-speech
- entropy
- estimation
- events
- experimental results
- f-measure
- heuristic
- heuristic rules
- interpolation
- ipadic
- japanese word
- knowledge
- lattice
- markov chain
- markov models
- method
- named entity
- part-of-speech
- parts-of-speech
- pos sequence
- probabilities
- probability
- process
- segmentation bakeoff
- segments
- sentence
- sentences
- sinica corpus
- statistical information
- statistics
- substring
- support vector
- tag sequence
- tag set
- tagging task
- tags
- technology
- test data
- theorem
- training
- training corpus
- training data
- unigram
- word
- word boundaries
- word dictionary
- word sequence
- words