ACL RD-TEC 1.0 Summarization of P06-1125
Paper Title:
A PHONETIC-BASED APPROACH TO CHINESE CHAT TEXT NORMALIZATION
Authors: Yunqing Xia, Kam-Fai Wong, and Wenjie Li
Primarily assigned technology terms:
- algorithm
- backoff smoothing
- backoff smoothing technique
- computational linguistics
- dictionary-based method
- error analysis
- estimation method
- feature analysis
- internet
- katz backoff smoothing
- language normalization
- language processing
- language training
- letter matching
- likelihood estimation
- likelihood estimation method
- machine translation
- matching
- maximum likelihood
- maximum likelihood estimation
- modeling
- nlp
- normalization
- parameter estimation
- phonetic mapping
- phonetic transcription
- processing
- recognition
- smoothing
- smoothing technique
- speech recognition
- term normalization
- term translation
- text normalization
- text understanding
- transcription
- viterbi
- viterbi algorithm
- xscm method
Other assigned terms:
- approach
- association for computational linguistics
- backoff
- case
- characters
- chinese characters
- chinese corpus
- chinese language
- chinese language corpus
- chinese text
- corpora
- data sparseness
- data sparseness problem
- dictionaries
- dictionary
- discourse
- distribution
- estimation
- experimental results
- f-1 measure
- fact
- feature
- formalism
- implementation
- intention
- language corpora
- language corpus
- language model
- likelihood
- linguistics
- mapping
- mapping model
- mappings
- meanings
- measure
- method
- methodology
- natural language
- phonetic mapping model
- phonetic similarity
- pinyin
- precision
- probabilities
- probability
- sentence
- sentences
- sentential context
- simplified chinese
- source channel model
- sparse data
- sparse data problem
- sparseness problem
- statistical approach
- statistics
- technique
- term
- term distribution
- terms
- test set
- text
- text corpus
- training
- training data
- training samples
- translation model
- trigram
- trigram model
- understanding
- word
- words