ACL RD-TEC 1.0 Summarization of H92-1073

Paper Title:
THE DESIGN FOR THE WALL STREET JOURNAL-BASED CSR CORPUS

Authors: Douglas B. Paul and Janet M. Baker

Other assigned terms:

  • abbreviation
  • abbreviations
  • acoustic models
  • ambiguity
  • array
  • benchmark
  • bias
  • bigram
  • case
  • community
  • complex sentence
  • continuous speech
  • corpora
  • corpus design
  • csr corpora
  • csr corpus
  • data set
  • dictionary
  • distribution
  • evaluation test
  • fact
  • french
  • french language
  • frequency distribution
  • labeling
  • language model
  • language models
  • lexical items
  • mapping
  • meaning
  • method
  • names
  • natural language
  • nist
  • noise
  • paragraph
  • paragraphs
  • penn treebank
  • perplexity
  • preprocessor
  • procedure
  • process
  • pronunciation
  • punctuation
  • punctuation mark
  • punctuation marks
  • recognition errors
  • sentence
  • sentence punctuation
  • sentences
  • speech corpus
  • speech data
  • spoken language
  • technology
  • test data
  • test material
  • test set
  • text
  • training
  • training data
  • training set
  • treebank
  • understanding
  • unigram
  • user
  • utterance
  • vocabulary
  • vocabulary size
  • vocabulary test
  • word
  • word frequency
  • word sequence
  • words
  • wsj corpus

Extracted Section Types:


This page last edited on 10 May 2017.
