ACL RD-TEC 1.0 Summarization of W97-0118

Paper Title:
THE EFFECTS OF CORPUS SIZE AND HOMOGENEITY ON LANGUAGE MODEL QUALITY

Other assigned terms:

  • approach
  • background corpus
  • bigram
  • bottom-up approach
  • british national corpus
  • case
  • chunks
  • classification scheme
  • coefficient
  • contingency table
  • corpora
  • corpus size
  • correlation
  • correlation coefficient
  • correlations
  • data sets
  • distribution
  • domain corpus
  • domain information
  • domain-specific corpora
  • electronic information
  • email
  • entropy
  • error rate
  • evaluation method
  • evaluation metric
  • evaluations
  • events
  • fact
  • frequency list
  • function words
  • genre
  • handwriting
  • human judgement
  • knowledge
  • language data
  • language model
  • language model quality
  • language models
  • large corpus
  • linguistic
  • linguistic phenomena
  • manual intervention
  • measure
  • measures
  • method
  • methodology
  • n-grams
  • noise
  • normal distribution
  • part-of-speech
  • perplexity
  • polarity
  • probabilities
  • probability
  • process
  • rank correlation
  • seed
  • similarity measure
  • similarity measures
  • similarity metric
  • sparse data
  • speech data
  • spoken email
  • standard deviation
  • statistic
  • subcorpus
  • sublanguage
  • tags
  • technique
  • terms
  • test data
  • text
  • text encoding initiative
  • textual similarity
  • toolkit
  • training
  • training corpus
  • training data
  • training text
  • transcriptions
  • trigram
  • unigram
  • utterance
  • vocabulary
  • word
  • word error rate
  • word frequencies
  • word frequency
  • word types
  • words

Extracted Section Types:


This page last edited on 10 May 2017.
