ACL RD-TEC 1.0 Summarization of J96-2003
Paper Title:
IMPROVING STATISTICAL LANGUAGE MODEL PERFORMANCE WITH AUTOMATICALLY GENERATED WORD HIERARCHIES
Authors: John G. McMahon and Francis J. Smith
Primarily assigned technology terms:
- algorithm
- approximation
- automatic classification
- bootstrapping
- bootstrapping process
- case analysis
- classification
- classification system
- classification trees
- clustering
- clustering algorithm
- computational linguistics
- computer science
- computing
- database
- disambiguation
- estimator
- evaluation system
- hardware
- illustration
- indirect evaluation
- language acquisition
- language modeling
- markov model
- maximum likelihood
- modeling
- parameter setting
- part-of-speech tagger
- processing
- pseudorandom number generator
- re-estimation
- re-estimation algorithm
- reasoning
- recognition
- recognition systems
- search
- searching
- speech recognition
- speech recognition systems
- statistical clustering
- statistical language modeling
- tagger
- taggers
- text compression
- top-down algorithm
- top-down approach
- top-down automatic word-classification algorithm
- top-down classification
- top-down clustering
- weighting
- word classification
- word clustering
- word-classification
- word-classification algorithm
- word-classification system
- word-sense disambiguation
Other assigned terms:
- anaphoric reference
- approach
- association for computational linguistics
- baseline model
- benchmark
- bigram
- bigram model
- binary tree
- bottom-up approach
- brown corpus
- case
- characters
- class information
- class membership
- classification hierarchy
- cluster
- cluster evaluation
- clusters
- co-occurrence
- cognitive
- collocation
- community
- conditional probability
- contour
- corpora
- distribution
- entropy
- evaluation method
- evaluations
- events
- feature
- grammar
- implementation
- interpolation
- lambda
- language data
- language model
- language model performance
- language models
- lexical structure
- likelihood
- likelihood probability
- linguistic
- linguistic phenomena
- linguistics
- lob corpus
- mapping
- markov model theory
- method
- model performance
- model probability
- model theory
- mutual information
- n-gram
- n-gram models
- n-grams
- natural language
- nouns
- parse
- part of speech
- part-of-speech
- parts of speech
- performance comparison
- perplexity
- phoneme
- phoneme string
- pos information
- probabilistic language model
- probabilities
- probability
- probability estimate
- probability estimates
- process
- pronoun
- punctuation
- representations
- research topic
- search space
- semantic
- semantic information
- semantic structure
- sentence
- sentences
- sparse data
- sparse data problem
- statistical language model
- statistics
- tag model
- tagged corpus
- tags
- terms
- test set
- text
- theory
- tokens
- training
- training corpus
- training data
- training set
- training text
- transformation
- tree
- tree representation
- trees
- trigram
- trigram language model
- trigram model
- unigram
- untagged corpora
- utterance
- verb
- vocabulary
- vocabulary size
- weighted average language model
- word
- word behavior
- word classes
- word frequencies
- word strings
- word types
- word-based language model
- word-class information
- words