ACL RD-TEC 1.0 Summarization of P97-1056
Paper Title:
MEMORY-BASED LEARNING: USING SIMILARITY FOR SMOOTHING
Authors: Jakub Zavrel and Walter Daelemans
Primarily assigned technology terms:
- algorithm
- approximation
- back-off algorithm
- back-off estimation
- back-off smoothing
- bayesian classifier
- capitalization
- classification
- classifier
- classifier algorithm
- clustering
- computing
- cross-validation
- decision trees
- disambiguation
- feature weighting
- forward-backward algorithm
- indexing
- induction
- inductive generalization
- interpolation method
- iterative training
- k-nearest neighbor
- k-nn
- language modeling
- language processing
- learning
- linear interpolation
- machine learning
- machine-learning
- matching
- maximum likelihood
- measuring
- memory-based language processing
- memory-based learning
- modeling
- n-gram language modeling
- naive back-off algorithm
- nearest neighbors
- nlp
- part of speech tagging
- pos-tagging
- pp-attachment disambiguation
- probabilistic classification
- processing
- re-estimation
- re-estimation smoothing
- smoothing
- speech tagging
- statistical approaches
- statistical language modeling
- tagging
- term selection
- validation
- vector representation
- voting
- weighted voting
- weighting
Other assigned terms:
- 10-fold cross-validation
- affix
- ambiguity
- analogy
- annotator
- approach
- automata
- back-off model
- bias
- case
- characters
- co-occurrences
- composition
- conditional distribution
- conditional probability
- contextual information
- cross-validation experiment
- data set
- distribution
- entropy
- estimation
- events
- fact
- feature
- feature set
- feature value
- feature vectors
- generalisation
- head noun
- heuristic
- information gain
- information sources
- information theory
- interpolation
- knowledge
- language processing tasks
- lexical categories
- lexicon
- likelihood
- linguistic
- linguistic knowledge
- logic
- manual intervention
- measure
- measures
- method
- methodology
- n-gram
- nlp tasks
- noun phrase
- parallelism
- parse
- part of speech
- penn treebank
- phrase
- pp-attachment
- prefixes and suffixes
- preposition
- prepositional phrase
- priori
- probabilities
- probability
- probability estimates
- processing tasks
- relation
- relative frequency
- representations
- schema
- semantic
- semantic knowledge
- sentences
- similarity metric
- similarity metrics
- sparse data
- sparse data problem
- statistical framework
- statistical information
- statistics
- style
- suffix
- suffixes
- tags
- term
- terms
- test material
- text
- theory
- training
- training corpus
- training data
- training material
- training set
- treebank
- treebank wsj corpus
- trees
- verb
- wall street journal corpus
- weighting scheme
- wildcard
- word
- word form
- words
- wsj corpus