ACL RD-TEC 1.0 Summarization of W04-3242
Paper Title:
RANDOM FORESTS IN LANGUAGE MODELING
Authors: Peng Xu and Frederick Jelinek
Primarily assigned technology terms:
- algorithm
- approximation
- automatic speech recognition
- backoff smoothing
- classification
- clustering
- decision tree
- decision trees
- dt construction
- entity recognition
- greedy approach
- interpolated kneser-ney smoothing
- kneser-ney smoothing
- language model smoothing
- language modeling
- language modeling approach
- language processing
- large vocabulary speech recognition
- linear interpolation
- machine translation
- maximum likelihood
- model smoothing
- modeling
- named entity recognition
- natural language processing
- natural language system
- neural network
- node splitting
- parser
- parsing
- pos tagging
- probability estimation
- processing
- pruning
- pruning strategy
- random forest
- re-scoring
- recognition
- recognition system
- regression
- rf modeling
- sampling
- searching
- smoothing
- smoothing method
- smoothing technique
- smoothing techniques
- speech recognition
- speech recognition system
- splitting
- statistical machine translation
- statistical system
- tagging
Other assigned terms:
- approach
- backoff
- case
- cluster
- clusters
- cross entropy
- data sparseness
- data sparseness problem
- data structure
- dimensionality
- distribution
- entropy
- error rate
- estimation
- events
- experimental results
- fact
- forest
- hypothesis
- hypothesis space
- interpolation
- knowledge
- language model
- language model probability
- language models
- large vocabulary speech
- lattices
- leaf
- likelihood
- log-likelihood
- measure
- measures
- method
- methodology
- model probability
- named entity
- natural language
- natural speech
- nist
- noun phrase
- perplexity
- phrase
- probabilities
- probability
- probability distribution
- procedure
- random sample
- sentence
- sentences
- sparseness problem
- statistics
- sub-tree
- syntactic information
- technique
- test data
- test set
- text
- toolkit
- training
- training corpus
- training data
- tree
- treebank
- trees
- trigram
- trigram language model
- trigram model
- upenn treebank
- utterance
- vocabulary
- vocabulary size
- word
- word error rate
- word sequence
- word string
- words
- wsj corpus