ACL RD-TEC 1.0 Summarization of N03-2003
Paper Title:
GETTING MORE MILEAGE FROM WEB TEXT SOURCES FOR CONVERSATIONAL SPEECH LANGUAGE MODELING USING CLASS-DEPENDENT MIXTURES
Authors: Ivan Bulyko and Mari Ostendorf and Andreas Stolcke
Primarily assigned technology terms:
- algorithm
- clustering
- cross-domain language modeling
- decoding
- decomposition
- information retrieval
- language model adaptation
- language modeling
- large vocabulary speech recognizer
- latent semantic analysis
- linear interpolation
- maximum entropy
- meeting transcription
- mixture modeling
- model adaptation
- modeling
- n-best rescoring
- normalization
- pruning
- recognition
- recognition systems
- recognizer
- rescoring
- search
- search engine
- searching
- semantic analysis
- speech language modeling
- speech recognition
- speech recognition systems
- speech recognizer
- target recognition
- text normalization
- transcription
- web search
- web search engine
- world wide web
Other assigned terms:
- approach
- baseline model
- bigram
- bigram language model
- broadcast news
- case
- content words
- conversational speech
- conversational speech language
- conversational telephone speech
- corpora
- entropy
- exact match
- frequency counts
- function words
- hypotheses
- interpolation
- language model
- language model perplexity
- language models
- large vocabulary speech
- latent semantic
- linear combination
- mapping
- method
- model perplexity
- n-gram
- n-gram language model
- n-grams
- part-of-speech
- part-of-speech tags
- pauses
- perplexity
- posterior
- posterior probability
- probabilities
- probability
- queries
- recognition task
- search strategy
- semantic
- sentence
- sentence boundary
- speaking style
- statistics
- style
- switchboard training corpus
- syntactic structure
- tags
- technique
- terms
- test set
- text
- text corpora
- tokens
- topics
- training
- training corpus
- training data
- training material
- transcripts
- trigram
- user
- user utterances
- vocabulary
- web corpus
- web pages
- web text
- word
- word error rates
- word frequency
- words