ACL RD-TEC 1.0 Summarization of W06-0134
Paper Title:
A Pragmatic Chinese Word Segmentation System
Authors: Wei Jiang and Yi Guan and Xiao-Long Wang
Primarily assigned technology terms:
- algorithm
- automaton
- chinese language processing
- chinese word segmentation
- classification
- classification algorithm
- compiler
- computational linguistics
- conditional random fields
- detection algorithm
- disambiguation
- disambiguation algorithm
- disambiguation processing
- entity recognition
- finite state
- finite state automaton
- information extraction
- language processing
- maximum entropy
- maximum entropy model
- named entity recognition
- new word detection
- nlp
- parser
- pos tagger
- processing
- recognition
- recognition algorithm
- rule compiler
- search
- searching
- segmentation
- segmentation system
- smoothing
- state automaton
- tagger
- viterbi
- viterbi algorithm
- word detection
- word disambiguation
- word recognition
- word recognition algorithm
- word segmentation
- word segmentation bakeoff
- word segmentation system
- word segmentation task
Other assigned terms:
- ambiguous segmentation
- ambiguous words
- annotated corpora
- association for computational linguistics
- case
- character sequence
- characters
- chinese language
- chinese word
- conditional probability
- context feature
- context features
- corpora
- data structure
- dictionaries
- dictionary
- entity recognition module
- entity recognition task
- entropy
- estimation
- f score
- feature
- fmeasure
- generative rule
- information gain
- lattice
- lexicon
- linguist
- linguistic
- linguistic features
- linguistics
- measure
- method
- mutual information
- n-grams
- named entities
- named entity
- natural language
- nlp tasks
- open test
- out-of-vocabulary word
- precision
- probability
- recognition module
- recognition task
- regular expressions
- segmentation bakeoff
- semantic
- sentence
- sentences
- sparse data
- sparse data problem
- statistics
- suffix
- symbol
- syntactic unit
- system description
- system performance
- tags
- target word
- terms
- test corpus
- training
- training corpora
- training corpus
- training data
- trigram
- trigram model
- word
- word boundaries
- word lattice
- word sequence
- word-based model
- words
- xml format