ACL RD-TEC 1.0 Summarization of H92-1073
Paper Title:
THE DESIGN FOR THE WALL STREET JOURNAL-BASED CSR CORPUS
Authors: Douglas B. Paul and Janet M. Baker
Primarily assigned technology terms:
- acoustic modeling
- automatic adaptation
- cd-rom
- comparative evaluation
- database
- dictating
- information system
- interactive system
- language model training
- language modeling
- language processing
- language technology
- model training
- modeling
- natural language processing
- preprocessing
- processing
- processor
- recognition
- resource management
- scoring
- speaker-adaptation
- speech recognition
- spelling
- spontaneous dictation
- statistical language modeling
- structuring
- text preprocessing
- text processing
- text selection
- text-to-speech
- text-to-speech system
- transcription
- word processing
Other assigned terms:
- abbreviation
- abbreviations
- acoustic models
- ambiguity
- array
- benchmark
- bias
- bigram
- case
- community
- complex sentence
- continuous speech
- corpora
- corpus design
- csr corpora
- csr corpus
- data set
- dictionary
- distribution
- evaluation test
- fact
- french
- french language
- frequency distribution
- labeling
- language model
- language models
- lexical items
- mapping
- meaning
- method
- names
- natural language
- nist
- noise
- paragraph
- paragraphs
- penn treebank
- perplexity
- preprocessor
- procedure
- process
- pronunciation
- punctuation
- punctuation mark
- punctuation marks
- recognition errors
- sentence
- sentence punctuation
- sentences
- speech corpus
- speech data
- spoken language
- technology
- test data
- test material
- test set
- text
- training
- training data
- training set
- treebank
- understanding
- unigram
- user
- utterance
- vocabulary
- vocabulary size
- vocabulary test
- word
- word frequency
- word sequence
- words
- wsj corpus