ACL RD-TEC 1.0 Summarization of W03-1003
Paper Title:
CROSS-LINGUAL LEXICAL TRIGGERS IN STATISTICAL LANGUAGE MODELING
Authors: Woosung Kim and Sanjeev Khudanpur
Primarily assigned technology terms:
- algorithm
- asr system
- automatic speech recognition
- bootstrap
- character recognition
- clustering
- computing
- cross-lingual information retrieval
- electronic translation
- giza
- identification
- illustration
- information retrieval
- k-means
- language model adaptation
- language modeling
- language processing
- likelihood approach
- linear interpolation
- lm adaptation
- machine translation
- matching
- maximum likelihood
- maximum likelihood approach
- model adaptation
- modeling
- morphological analyzers
- mt systems
- natural language processing
- nlp
- optical character recognition
- processing
- pronunciation modeling
- query translation
- recognition
- rescoring
- search
- segmentation
- speech recognition
- statistical machine translation
- statistical techniques
- taggers
- transcription
- vector-based information retrieval
Other assigned terms:
- acoustic model
- acoustic models
- approach
- asr output
- asr task
- benchmark
- bigram
- bilingual dictionary
- broadcast news
- cache
- chinese language
- chinese language model
- chinese text
- chinese translation
- chinese word
- chinese words
- cluster
- clusters
- comparable corpus
- conditional distribution
- corpora
- data flow
- data sparseness
- dictionary
- distribution
- document
- document frequency
- english corpus
- english text
- estimating trigger lm
- fact
- interpolation
- language information
- language model
- language models
- large corpus
- large text corpora
- lattices
- lexicon
- likelihood
- mandarin chinese
- mandarin pronunciation
- measure
- measures
- method
- mutual information
- natural language
- news corpus
- nist
- paragraph
- parallel corpus
- parallel text
- perplexity
- probabilistic model
- probabilities
- probability
- pronunciation
- query
- recognition errors
- relative frequency
- seed
- sentence
- statistic
- statistical language model
- statistical model
- statistics
- test corpora
- test data
- test set
- text
- text corpora
- text corpus
- tokens
- topics
- training
- training corpus
- training data
- training text
- transcriptions
- translation dictionary
- translation lexicon
- translation model
- translation models
- translation probability
- translations
- trigram
- trigram model
- unigram
- unigram model
- vocabulary
- word
- word error rates
- word frequencies
- word pair
- words