ACL RD-TEC 1.0 Summarization of P06-2005
Paper Title:
A PHRASE-BASED STATISTICAL MODEL FOR SMS TEXT NORMALIZATION
Authors: AiTi Aw and Min Zhang and Juan Xiao and Jian Su
Primarily assigned technology terms:
- 5-fold cross validation
- algorithm
- alignment learning
- alignment process
- approximation
- backoff smoothing
- bootstrap
- bootstrapping
- computational linguistics
- computing
- consensus translation
- context understanding
- cross validation
- decoding
- dictionary look-up
- disambiguation
- dynamic programming
- em algorithm
- english-to-chinese translation
- error analysis
- expectation-maximization
- expectation-maximization algorithm
- five-fold cross validation
- global optimization
- instant messaging
- internet
- language modeling
- learning
- learning process
- levenshtein
- lexical disambiguation
- lexical mapping
- likelihood approach
- machine translation
- matching
- matching technique
- maximum approximation
- maximum likelihood
- maximum likelihood approach
- messaging
- model training
- modeling
- monotone search
- mt system
- noisy channel model
- normalization
- optimization
- paraphrasing
- phonetic spelling
- phrase alignment
- phrase mapping
- phrase segmentation
- phrasing
- pre-processing
- preprocessing
- processing
- pronunciation modeling
- search
- segmentation
- short messaging service
- smoothing
- sms normalization
- sms text normalization
- spelling
- spelling correction
- statistical machine translation
- statistical method
- statistical mt
- statistical translation
- string matching
- text normalization
- text-to-speech
- tokenization
- translation system
- translation systems
- validation
- viterbi
- viterbi search
Other assigned terms:
- abbreviations
- ambiguity
- approach
- association for computational linguistics
- backoff
- baseline performance
- baseline score
- bleu
- bleu score
- bleu scores
- case
- characters
- convergence
- conversation
- copula verb
- corpora
- customization
- data set
- data sparseness
- derivations
- dictionary
- discourse
- distribution
- edit distance
- english language
- english sentence
- english text
- grammar
- heuristics
- joint probability
- language expression
- language model
- language modeling toolkit
- lexical ambiguity
- lexical unit
- lexicon
- likelihood
- linguistic
- linguistics
- machine translation model
- main verb
- mapping
- mapping model
- mappings
- meaning
- measure
- measures
- message
- method
- modeling toolkit
- morpho-syntactic information
- n-gram
- noisy channel
- normalization model
- nouns
- orthographic variation
- parallel corpora
- parallel corpus
- paraphrases
- particles
- phrase
- phrase level
- phrase-based model
- prior distribution
- probabilities
- probability
- process
- pronoun
- pronunciation
- punctuation
- reordering
- representations
- semantic
- semantic information
- sentence
- sentence boundaries
- sentence pair
- sentences
- slang
- sms text
- source channel model
- source text
- statistical model
- statistical translation model
- statistics
- style
- target text
- technique
- text
- text collection
- text corpus
- text structure
- text style
- tokens
- toolkit
- training
- training corpus
- training data
- transformation
- translation accuracy
- translation model
- translation output
- translation problem
- translation quality
- understanding
- unigram
- verb
- vocabulary
- word
- word-based language model
- word-based model
- words
- written texts