ACL RD-TEC 1.0 Summarization of W06-0120
Paper Title:
ON CLOSED TASK OF CHINESE WORD SEGMENTATION: AN IMPROVED CRF MODEL COUPLED WITH CHARACTER CLUSTERING AND AUTOMATICALLY GENERATED TEMPLATE MATCHING
Authors: Richard Tzong-Han Tsai and Hsieh-Chuan Hung and Cheng-Lung Sung and Hong-Jie Dai and Wen-Lian Hsu
Primarily assigned technology terms:
- algorithm
- character clustering
- chinese language processing
- chinese text processing
- chinese word segmentation
- cluster selection
- clustering
- clustering algorithm
- computational linguistics
- conditional random fields
- crfs
- identification
- indexing
- k-means
- k-means clustering
- language processing
- matching
- normalization
- post-processing
- postprocessing
- processing
- relative distance
- segmentation
- segmentation system
- sequence training
- tagger
- taggers
- tagging
- template generation
- template matching
- text processing
- word segmentation
- word segmentation system
Other assigned terms:
- alphabet
- approach
- association for computational linguistics
- bigram
- character sequence
- characters
- chinese characters
- chinese corpora
- chinese corpus
- chinese language
- chinese text
- chinese word
- chinese words
- class information
- cluster
- cluster centroid
- clusters
- co-occurrence
- conditional probability
- context window
- corpora
- cosine distance
- crf model
- data sparseness
- data sparseness problem
- development set
- feature
- generation
- linguistics
- measure
- method
- named entity
- normalization factor
- phrase
- precision
- probability
- sentence
- sentences
- simplified chinese
- sparseness problem
- substring
- symbol
- test data
- text
- tokens
- training
- training corpus
- training data
- training set
- wildcard
- window size
- word
- word sequences
- words