ACL RD-TEC 1.0 Summarization of P01-1005
Paper Title:
SCALING TO VERY VERY LARGE CORPORA FOR NATURAL LANGUAGE DISAMBIGUATION
Authors: Michele Banko and Eric Brill
Primarily assigned technology terms:
- active learning
- active learning algorithm
- algorithm
- algorithm development
- bagging
- bayes classifier
- chunking
- classification
- classifier
- classifiers
- committee-based sampling
- corpus development
- cutoff
- disambiguation
- grammar checker
- language classification
- language disambiguation
- language processing
- learner
- learning
- learning algorithm
- learning algorithms
- learning approaches
- learning methods
- learning techniques
- machine learning
- machine learning algorithm
- machine learning algorithms
- machine learning methods
- machine learning techniques
- memory-based learner
- mining
- naive bayes
- naive bayes classifier
- naive bayes classifiers
- natural language classification
- natural language disambiguation
- natural language processing
- nlp
- parser
- parsing
- part of speech tagging
- part-of-speech tagger
- part-of-speech tagging
- perceptron
- processing
- sample selection
- sampling
- sense disambiguation
- sequential sampling
- set disambiguation
- single classifier
- speech tagger
- speech tagging
- standardization
- supervised learning
- tagger
- tagger training
- taggers
- tagging
- topic classifier
- unsupervised learning
- unsupervised training
- voting
- weakly supervised learning
- word sense disambiguation
Other assigned terms:
- ambiguity
- annotated corpus
- annotation
- approach
- bias
- case
- checker
- classification accuracy
- classification task
- community
- corpora
- corpus size
- data set
- data sets
- disambiguation task
- entropy
- fact
- grammar
- human annotation
- labeled training data
- labeling
- language classification task
- language disambiguation task
- large corpora
- large training
- large training corpora
- linguistic
- linguistic information
- manual annotation
- measure
- method
- natural language
- nlp community
- parse
- part of speech
- part-of-speech
- probabilities
- probability
- representations
- seed
- sentence
- sentences
- set size
- small training corpora
- tags
- target word
- technique
- test set
- text
- text corpora
- training
- training corpora
- training corpus
- training data
- training instance
- training material
- training samples
- training set
- training set size
- training size
- training time
- transcripts
- trees
- unlabeled corpus
- unlabeled examples
- wall street journal text
- word
- word sense
- words