ACL RD-TEC 1.0 Summarization of W06-0103
Paper Title:
MINING ATOMIC CHINESE ABBREVIATION PAIRS: A PROBABILISTIC MODEL FOR SINGLE CHARACTER WORD RECOVERY
Authors: Jing-Shin Chang and Wei-Lun Teng
Primarily assigned technology terms:
- abbreviation identification
- algorithm
- chinese language processing
- chinese word segmentation
- computational linguistics
- em algorithm
- error correction
- error recovery
- hidden markov
- hidden markov model
- identification
- identification process
- information retrieval
- information retrieval system
- iterative training
- keyword-based information retrieval
- language processing
- markov model
- matching
- mining
- parameter estimation
- parameter training
- processing
- query expansion
- re-estimation
- retrieval system
- root word recovery
- sampling
- segmentation
- segmentation process
- single character recovery
- smoothing
- training method
- training process
- translation system
- unsupervised training
- weighting
- word bigram
- word recovery
- word segmentation
Other assigned terms:
- abbreviation
- abbreviations
- analogy
- association for computational linguistics
- bigram
- bigram model
- break
- case
- character sequence
- characters
- chinese characters
- chinese compound
- chinese language
- chinese lexical
- chinese word
- chunk
- class-based model
- composition
- compound words
- compounds
- contextual information
- convergence
- corpora
- data sparseness
- dictionary
- distribution
- estimation
- fact
- frequency counts
- generation
- generation model
- heuristics
- hmm model
- hmm parameter
- hmm-based model
- language model
- language model score
- large corpora
- large corpus
- lattice
- lexical translation
- lexical unit
- lexicon
- likelihood
- linguistics
- local context
- method
- model parameters
- named entities
- parallel corpus
- performance evaluation
- precision
- probabilistic model
- probabilistic models
- probabilities
- probability
- process
- query
- seed
- segmented corpus
- sentence
- sentences
- surface form
- symbols
- syntactic structure
- term
- terms
- test set
- text
- text corpus
- theories
- tokens
- training
- training corpus
- training set
- transition probabilities
- transition probability
- translation pair
- web documents
- word
- word bigram model
- word formation
- word lattice
- word order
- word sequence
- words