ACL RD-TEC 1.0 Summarization of W95-0108
Paper Title:
BEYOND WORD N-GRAMS
Authors: Fernando Pereira and Yoram Singer and Naftali Tishby
Primarily assigned technology terms:
- adaptive language modeling
- algorithm
- automaton
- batch training
- Bayesian approach
- coding
- estimation method
- finite-state automaton
- Good-Turing method
- handwriting recognition
- language modeling
- language processing
- learning
- learning algorithm
- learning procedure
- machine learning
- machine translation
- matching
- model construction
- modeling
- n-gram estimation
- online adaptation
- online learning
- probabilistic finite-state
- probability function
- processing
- pruning
- PST learning
- PST learning algorithm
- random walk
- recognition
- recognition systems
- search
- sentence recognition
- speech recognition
- statistical modeling
- suffix tree
- training algorithm
- word prediction
- word-sequence prediction
Other assigned terms:
- alphabet
- approach
- array
- backoff
- backoff model
- bag of words
- Bayesian framework
- bias
- bigram
- bigram model
- Brown corpus
- case
- characters
- conditional probability
- corpora
- correlations
- data structure
- data structures
- derivation
- distribution
- estimation
- events
- fact
- feature
- finite alphabet
- formalism
- frequency counts
- grammatical structure
- handwriting
- implementation
- information theory
- interpolation
- interpretation
- language model
- language processing research
- large training
- large training corpora
- leaf
- likelihood
- method
- mixture models
- model structure
- n-gram
- n-gram model
- n-gram models
- n-grams
- names
- natural language
- natural languages
- perplexity
- phrase
- posterior
- posterior probability
- predictive power
- prior probability
- probabilistic model
- probabilities
- probability
- probability distribution
- probability estimates
- procedure
- process
- pruning threshold
- recursion
- recursive structure
- relation
- root node
- semantic
- semantic relations
- semantic relationships
- sentence
- sentences
- sentiment
- subtree
- subtrees
- suffix
- suffixes
- symbol
- symbols
- syntactic structure
- technical terms
- terms
- test corpora
- test data
- test material
- test set
- text
- text length
- theory
- training
- training corpora
- training corpus
- training data
- training phase
- training set
- tree
- trees
- trigram
- trigram model
- vocabulary
- vocabulary size
- wildcard
- word
- word sequences
- words