ACL RD-TEC 1.0 Summarization of H01-1052
Paper Title:
MITIGATING THE PAUCITY-OF-DATA PROBLEM: EXPLORING THE EFFECT OF TRAINING CORPUS SIZE ON CLASSIFIER PERFORMANCE FOR NATURAL LANGUAGE PROCESSING
Authors: M. Banko and E. Brill
Primarily assigned technology terms:
- annotation system
- classification
- classifier
- classifiers
- clustering
- decision trees
- disambiguation
- disambiguation problem
- entity recognition
- error identification
- identification
- language disambiguation
- language processing
- language technology
- latent semantic analysis
- learner
- learning
- learning algorithms
- learning methods
- learning techniques
- machine learning
- machine learning algorithms
- machine learning techniques
- machine translation
- maximum-entropy
- maximum-entropy parsing
- memory-based learner
- memory-based learning
- named entity recognition
- natural language disambiguation
- natural language processing
- natural language technology
- nlp
- parameter tuning
- parsers
- parsing
- part of speech tagging
- perceptron
- phrase labeling
- processing
- recognition
- sample selection
- semantic analysis
- semantic understanding
- sense disambiguation
- set disambiguation
- speech tagging
- spelling
- spelling correction
- supervised learning
- supervised training
- tagger
- tagging
- transformation-based learning
- tuning
- unsupervised learning
- word sense disambiguation
Other assigned terms:
- ambiguity
- annotated corpus
- annotation
- approach
- base noun
- base noun phrase
- brown corpus
- case
- classification accuracy
- community
- corpora
- corpus size
- determiner
- distribution
- error rate
- feature
- feature set
- feature sets
- feature space
- grammars
- knowledge
- labeled training data
- labeling
- language disambiguation problem
- large corpora
- latent semantic
- lexical features
- linguistic
- linguistic knowledge
- method
- named entity
- natural language
- noun phrase
- paragraphs
- parse
- part of speech
- part of speech tags
- penn treebank
- phrase
- pronoun
- pronoun case
- scalability
- semantic
- sentence
- sentence structure
- sentences
- set size
- small training corpora
- sparse data
- style
- syntactic context
- system performance
- tags
- technology
- term
- terms
- test corpus
- test set
- text
- text corpora
- tokens
- training
- training corpora
- training corpus
- training data
- training material
- training set
- training set size
- transcripts
- treebank
- trees
- understanding
- wall street journal text
- word
- word corpus
- word sense
- words