ACL RD-TEC 1.0 Summarization of C04-1089
Paper Title:
MINING NEW WORD TRANSLATIONS FROM COMPARABLE CORPORA
Authors: Li Shao and Hwee Tou Ng
Primarily assigned technology terms:
- algorithm
- chinese character-to-pinyin mapping
- chunker
- direct translation
- document retrieval
- em algorithm
- entity recognizer
- entity translation
- expectation maximization
- information retrieval
- language modeling
- language modeling approach
- linear interpolation
- machine translation
- machine translation system
- machine transliteration
- maximum entropy
- measuring
- mining
- modeling
- named entity recognizer
- named entity translation
- part-of-speech tagging
- preprocessing
- probability estimation
- pruning
- ranking
- recognizer
- romanization
- search
- search space pruning
- searching
- segmentation
- segmenter
- sentence segmentation
- spelling
- tagging
- translation system
- transliteration
- vector space model
- weighting
- word segmenter
Other assigned terms:
- approach
- backoff
- bilingual dictionary
- bilingual lexicon
- bilingual lexicons
- candidate translation
- case
- characters
- chinese characters
- chinese corpus
- chinese gigaword corpus
- chinese text
- chinese translation
- chinese word
- chinese words
- comparable corpora
- comparable corpus
- context information
- context similarity
- corpora
- data consortium
- dictionary
- distribution
- document
- document collection
- english corpus
- english translation
- english translation candidate
- english translations
- entropy
- estimation
- events
- information source
- interpolation
- language model
- language pairs
- lexicon
- linguistic
- linguistic data
- linguistic data consortium
- mapping
- mappings
- maps
- method
- multinomial distribution
- named entity
- names
- noun phrase
- noun phrases
- organization names
- parallel corpora
- parallel texts
- part-of-speech
- part-of-speech tag
- person names
- phrase
- pinyin
- precision
- probabilistic model
- probabilities
- probability
- pronunciation
- query
- retrieval performance
- russian
- search space
- semantic
- semantic information
- semantic similarity
- sentence
- similarity model
- source language
- source language word
- sources of information
- syllables
- tag information
- target language
- technical terms
- term
- terms
- test set
- text
- training
- training data
- translation candidate
- translation candidates
- translation problem
- translations
- transliteration model
- unigram
- vector space
- window size
- word
- words