ACL RD-TEC 1.0 Summarization of P94-1038
Paper Title:
SIMILARITY-BASED ESTIMATION OF WORD COOCCURRENCE PROBABILITIES
Authors: Ido Dagan, Fernando Pereira, and Lillian Lee
Primarily assigned technology terms:
- clustering
- cooccurrence smoothing
- discounting method
- estimator
- good-turing method
- k-nearest neighbor
- language modeling
- language processing
- likelihood estimate
- likelihood estimator
- linear interpolation
- maximum likelihood
- maximum likelihood estimator
- measuring
- model smoothing
- modeling
- natural language processing
- nearest neighbors
- nlp
- normalization
- parsing
- pattern recognition
- probabilistic parsing
- probability redistribution
- processing
- recognition
- recognizer
- similarity-based estimation
- smoothing
- smoothing method
- smoothing technique
- speech modeling
- speech recognition
- speech recognizer
- speech-recognition
- statistical methods
- statistical nlp
- weighting
Other assigned terms:
- acoustic model
- acoustic score
- analogy
- approach
- back-off model
- baseline model
- bigram
- bigram model
- case
- clusters
- conditional distribution
- conditional probabilities
- conditional probability
- confusion probability
- context words
- corpora
- data sparseness
- distribution
- entropy
- error rate
- estimation
- events
- experimental results
- fact
- frequency counts
- function word
- grammars
- hypotheses
- hypothesis
- independence assumption
- information sources
- interpolation
- language model
- language models
- lattice
- lattices
- lexicon
- likelihood
- linguistic
- measure
- method
- model parameters
- n-gram
- n-gram model
- n-grams
- natural language
- normalization factor
- parallelism
- parameter values
- perplexity
- predictive power
- probabilistic framework
- probabilities
- probability
- probability estimate
- probability estimates
- probability model
- process
- punctuation
- semantic
- semantic parallelism
- sentence
- sentences
- similarity measure
- similarity metric
- similarity metrics
- similarity model
- sparse data
- sparse data problem
- statistics
- syntactic constructions
- technique
- tense form
- terms
- test data
- test set
- text
- training
- training corpus
- unigram
- unigram probability
- wall street journal text
- word
- word association
- word classes
- word lattices
- word sequences
- word similarity
- words