ACL RD-TEC 1.0 Summarization of E06-1028

Paper Title:
A Figure of Merit for the Evaluation of Web-Corpus Randomness

Authors: Massimiliano Ciaramita and Marco Baroni

Other assigned terms:

  • alphabet
  • american english
  • approach
  • bias
  • british english
  • british national corpus
  • brown corpus
  • case
  • composition
  • corpora
  • data sparseness
  • dictionary
  • discipline
  • distance matrix
  • distribution
  • document
  • document collection
  • email
  • empirical evaluation
  • entropy
  • estimation
  • experimental setting
  • fact
  • finite alphabet
  • frequency distribution
  • frequency list
  • function word
  • function words
  • genre
  • heuristic
  • hypothesis
  • interpretation
  • language model
  • language models
  • lexical resource
  • linguistic
  • linguistic corpora
  • linguistic data
  • linguistics
  • linguists
  • manual intervention
  • measure
  • measures
  • method
  • methodology
  • n-grams
  • navigational information
  • pairs of words
  • priori
  • probabilities
  • procedure
  • qualitative analysis
  • queries
  • query
  • random order
  • random sample
  • relative frequency
  • russian
  • search strategy
  • seed
  • seed words
  • similarity measure
  • sociology
  • specialized corpora
  • statistics
  • sub-language
  • subcorpus
  • tags
  • target language
  • technical terms
  • technique
  • terms
  • text
  • tokens
  • topics
  • unigram
  • vocabulary
  • web corpus
  • web documents
  • web pages
  • web-based corpora
  • web-corpus randomness
  • word
  • word frequency
  • word lists
  • word model
  • word types
  • wordnet
  • words

Extracted Section Types:


This page last edited on 10 May 2017.
