ACL RD-TEC 1.0 Summarization of P96-1042
Paper Title:
MINIMIZING MANUAL ANNOTATION COST IN SUPERVISED TRAINING FROM CORPORA
Authors: Sean P. Engelson and Ido Dagan
Primarily assigned technology terms:
- algorithm
- approximation
- batch selection
- batch selection method
- bigram tagging
- bigram training
- categorization
- cd-rom
- classification
- classifier
- classifiers
- committee-based sample selection
- committee-based sampling
- committee-based selection
- comparative evaluation
- concept learning
- disambiguation
- estimator
- example selection
- heterogeneous uncertainty sampling
- hidden markov
- hidden markov model
- hidden markov models
- instantiation
- language processing
- learner
- learning
- learning algorithm
- learning method
- learning program
- likelihood estimate
- markov model
- maximum likelihood
- measuring
- member selection
- member sequential selection
- modeling
- natural language processing
- nlp
- parsing
- part-of-speech tagging
- probabilistic classification
- probabilistic classifier
- probability function
- processing
- randomized selection
- reasoning
- sample selection
- sampling
- selection algorithm
- selection method
- sense disambiguation
- sentence ordering
- smoothing
- statistical methods
- statistical nlp
- statistical parsing
- supervised training
- tagger
- tagging
- terminology
- text categorization
- tuning
- uncertainty sampling
- unsupervised training
- word sense disambiguation
Other assigned terms:
- ambiguous word
- ambiguous words
- annotated corpora
- annotated corpus
- annotation
- annotation effort
- approach
- bigram
- bigram model
- break
- case
- category label
- classification accuracy
- composition
- concept
- conditional probability
- corpora
- data sparseness
- derivation
- derivation process
- dirichlet distribution
- distribution
- document
- entropy
- estimation
- events
- experimental results
- feature
- frequency counts
- generation
- head word
- hmm model
- implementation
- information gain
- interpolation
- labeling
- language models
- lexicon
- likelihood
- linguistic
- linguistic structure
- manual annotation
- markov models
- maximum likelihood estimate
- measure
- measures
- method
- model parameter
- model parameters
- model size
- natural language
- nlp tasks
- parameter settings
- parameter values
- parse
- parse tree
- part of speech
- part-of-speech
- parts of speech
- posterior
- posterior distribution
- posterior probability
- preposition
- prior distribution
- probabilistic model
- probabilities
- probability
- probability distribution
- probability estimate
- probability model
- procedure
- process
- sentence
- sentences
- size of the corpus
- statistics
- syntactic structure
- tag sequence
- tagged corpus
- tagging accuracy
- tags
- term
- terms
- test set
- text
- textual corpora
- trained model
- training
- training corpora
- training corpus
- training data
- training example
- training examples
- training set
- transition probabilities
- transition probability
- tree
- uniform probability
- unlabeled examples
- user
- verb
- word
- word sense
- word senses
- word sequence
- words