ACL RD-TEC 1.0 Summarization of H01-1052

Paper Title:
MITIGATING THE PAUCITY-OF-DATA PROBLEM: EXPLORING THE EFFECT OF TRAINING CORPUS SIZE ON CLASSIFIER PERFORMANCE FOR NATURAL LANGUAGE PROCESSING

Authors: M. Banko and E. Brill

Other assigned terms:

  • ambiguity
  • annotated corpus
  • annotation
  • approach
  • base noun
  • base noun phrase
  • brown corpus
  • case
  • classification accuracy
  • community
  • corpora
  • corpus size
  • determiner
  • distribution
  • error rate
  • feature
  • feature set
  • feature sets
  • feature space
  • grammars
  • knowledge
  • labeled training data
  • labeling
  • language disambiguation problem
  • large corpora
  • latent semantic
  • lexical features
  • linguistic
  • linguistic knowledge
  • method
  • named entity
  • natural language
  • noun phrase
  • paragraphs
  • parse
  • part of speech
  • part of speech tags
  • penn treebank
  • phrase
  • pronoun
  • pronoun case
  • scalability
  • semantic
  • sentence
  • sentence structure
  • sentences
  • set size
  • small training corpora
  • sparse data
  • style
  • syntactic context
  • system performance
  • tags
  • technology
  • term
  • terms
  • test corpus
  • test set
  • text
  • text corpora
  • tokens
  • training
  • training corpora
  • training corpus
  • training data
  • training material
  • training set
  • training set size
  • transcripts
  • treebank
  • trees
  • understanding
  • wall street journal text
  • word
  • word corpus
  • word sense
  • words

Extracted Section Types:


This page last edited on 10 May 2017.
