ACL RD-TEC 1.0 Summarization of P06-1010
Paper Title:
NAMED ENTITY TRANSLITERATION WITH COMPARABLE CORPORA
Authors: Richard Sproat and Tao Tao and ChengXiang Zhai
Primarily assigned technology terms:
- algorithm
- alignment algorithm
- candidate generation
- candidate scoring
- candidate selection
- computational linguistics
- english-chinese transliteration
- entity recognizer
- entity transliteration
- error analysis
- festival text-to-speech system
- generation method
- information retrieval
- learning
- machine learning
- machine translation
- machine translation system
- matching
- maximum-entropy
- mi method
- mining
- mutual information method
- name transliteration
- named entity transliteration
- named-entity recognizer
- pearson correlation
- phonetic alignment
- phonetic scoring
- phonetic transliteration
- propagation algorithm
- ranking
- ranking algorithm
- recognizer
- score propagation
- scoring
- scoring method
- selection process
- text-to-speech
- text-to-speech system
- translation system
- transliteration
- web retrieval
Other assigned terms:
- approach
- association for computational linguistics
- character sequence
- characters
- chinese named-entity
- cluster
- co-occurrence
- co-occurrence relation
- coefficient
- comparable corpora
- corpora
- correlation
- correlation coefficient
- correlations
- dictionaries
- dictionary
- distribution
- document
- english language
- english phone string
- english text
- english-chinese corpus
- estimation
- fact
- frequency correlation
- frequency distribution
- generation
- good-turing estimation
- hindi
- interpolation
- jensen-shannon divergence
- language pairs
- linguistics
- measure
- measures
- method
- mutual information
- n-grams
- named entities
- named entity
- named-entity
- names
- noise
- pagerank
- pearson correlation coefficient
- pinyin
- probabilities
- probability
- process
- pronunciation
- relation
- russian
- similarity measures
- source-channel model
- syllables
- technique
- terms
- test set
- text
- text documents
- tone
- toolkit
- topics
- topology
- training
- training data
- transition probabilities
- translations
- transliteration candidate
- web pages
- word
- words