ACL RD-TEC 1.0 Summarization of W97-0309
Paper Title:
AGGREGATE AND MIXED-ORDER MARKOV MODELS FOR STATISTICAL LANGUAGE PROCESSING
Authors: Lawrence Saul and Fernando Pereira
Primarily assigned technology terms:
- algorithm
- baum-welch algorithm
- class-based language modeling
- clustering
- decomposition
- density estimation
- em algorithm
- expectation-maximization
- hidden markov
- hidden markov models
- hmms
- interpolation algorithm
- iterative algorithm
- iterative procedure
- language modeling
- language processing
- large-vocabulary language modeling
- learning
- learning algorithms
- likelihood estimation
- markov model
- maximum entropy
- maximum entropy approach
- maximum entropy method
- maximum likelihood
- maximum likelihood estimation
- modeling
- n-gram modeling
- processing
- processor
- smoothing
- statistical language processing
- truncation
- tuning
- validation
- weighting
Other assigned terms:
- approach
- assignment probability
- backoff
- backoff model
- baseline model
- bias
- bigram
- bigram model
- case
- convergence
- correlations
- distribution
- entropy
- estimation
- events
- fact
- histogram
- interpolation
- interpretation
- language model
- language models
- learning problem
- likelihood
- log-likelihood
- mapping
- markov models
- measure
- method
- model parameters
- n-gram
- n-gram models
- n-grams
- natural language
- noun phrases
- perplexity
- posterior
- posterior probability
- prepositions
- probabilities
- probability
- probability distribution
- probability estimates
- procedure
- process
- punctuation
- punctuation marks
- sentence
- sentence boundaries
- sentences
- sparse data
- statistical language model
- statistics
- terms
- test data
- test set
- tokens
- training
- training corpus
- training data
- training efficiency
- training set
- training time
- transition matrix
- transition probabilities
- trigram
- trigram model
- understanding
- unigram
- unigram model
- vocabulary
- vocabulary size
- vowel
- wall street journal corpus
- word
- word classes
- word sequences
- words