ACL RD-TEC 1.0 Summarization of P02-1024
Paper Title:
EXPLORING ASYMMETRIC CLUSTERING FOR STATISTICAL LANGUAGE MODELING
Authors: Jianfeng Gao, Joshua Goodman, Guihong Cao, and Hang Li
Primarily assigned technology terms:
- agglomerative clustering
- algorithm
- approximation
- asian language modeling
- asymmetric clustering
- backoff bigram
- binary branching
- chinese language modeling
- clustering
- clustering algorithm
- clustering technique
- computational linguistics
- computer science
- computing
- cutoff
- decomposition
- factoring
- hard clustering
- kana-kanji conversion
- language model training
- language modeling
- machine translation
- model construction
- model parameter optimization
- model training
- modeling
- normalization
- optimization
- parameter estimation
- parameter optimization
- pruning
- pruning method
- recognition
- search
- smoothing
- soft clustering
- speech recognition
- splitting
- statistical clustering
- statistical language modeling
- top-down algorithm
Other assigned terms:
- ambiguity
- approach
- array
- asian language
- asian language text
- backoff
- bigram
- case
- character error rate
- characters
- chinese language
- chinese text
- cluster
- cluster number
- clusters
- comparative study
- conditional probability
- convergence
- corpora
- data sets
- data sparseness
- data sparseness problem
- entropy
- error rate
- estimation
- experimental results
- ibm model
- ibm models
- independence assumption
- japanese text
- kanji
- language model
- language model probability
- language models
- leaf
- lexicon
- linguistics
- meaning
- method
- methodology
- model parameter
- model parameters
- model performance
- model probability
- model size
- mutual information
- n-gram
- n-gram model
- n-gram models
- n-grams
- newspaper corpus
- orthography
- parameter settings
- perplexity
- probabilities
- probability
- pruning threshold
- research topic
- root node
- search space
- sparseness problem
- stochastic model
- symbol
- technique
- terms
- test set
- testing data
- text
- text corpora
- theory
- training
- training corpora
- training data
- training instance
- transcript
- tree
- tree structure
- tree structures
- trees
- trigram
- trigram model
- unigram
- word
- word n-gram model
- word sequence
- word string
- word strings
- word trigram
- word trigram model
- words