ACL RD-TEC 1.0 Summarization of H01-1052

Paper Title:
MITIGATING THE PAUCITY-OF-DATA PROBLEM: EXPLORING THE EFFECT OF TRAINING CORPUS SIZE ON CLASSIFIER PERFORMANCE FOR NATURAL LANGUAGE PROCESSING

Authors: M. Banko and E. Brill

Primarily assigned technology terms:

Other assigned terms:

ambiguity
annotated corpus
annotation
approach
base noun
base noun phrase
brown corpus
case
classification accuracy
community
corpora
corpus size
determiner
distribution
error rate
feature
feature set
feature sets
feature space
grammars
knowledge
labeled training data
labeling
language disambiguation problem
large corpora
latent semantic
lexical features
linguistic
linguistic knowledge
method
named entity
natural language
noun phrase
paragraphs
parse
part of speech
part of speech tags
penn treebank
phrase
pronoun
pronoun case
scalability
semantic
sentence
sentence structure
sentences
set size
small training corpora
sparse data
style
syntactic context
system performance
tags
technology
term
terms
test corpus
test set
text
text corpora
tokens
training
training corpora
training corpus
training data
training material
training set
training set size
transcripts
treebank
trees
understanding
wall street journal text
word
word corpus
word sense
words

Extracted Section Types:
