ACL RD-TEC 1.0 Summarization of W04-1118
Paper Title:
DO WE NEED CHINESE WORD SEGMENTATION FOR STATISTICAL MACHINE TRANSLATION?
Authors: Jia Xu and Richard Zens and Hermann Ney
Primarily assigned technology terms:
- algorithm
- chinese word segmentation
- chinese-english translation
- decomposition
- dictionary learning
- explicit word segmentation
- extraction algorithm
- giza
- language processing
- learning
- learning method
- machine translation
- machine translation approach
- machine translation system
- modeling
- natural language processing
- parallel training
- processing
- search
- segmentation
- segmentation method
- segmentation tool
- segmenter
- statistical machine translation
- statistical machine translation system
- statistical translation
- training method
- training procedure
- translation system
- translation systems
- tuning
- viterbi
- viterbi alignment
- word alignment
- word segmentation
- word segmenter
Other assigned terms:
- aligned sentence
- alignment information
- alignment model
- alignment models
- alignment template
- ambiguous segmentation
- approach
- bayes decision rule
- bilingual corpora
- bilingual corpus
- bilingual dictionary
- bilingual training corpus
- bleu
- bleu score
- bleu scores
- case
- character sequence
- characters
- chinese characters
- chinese corpus
- chinese text
- chinese treebank
- chinese word
- chinese words
- corpora
- data consortium
- decision rule
- dictionary
- distribution
- english sentence
- english translation
- english translations
- error rate
- evaluations
- experimental results
- french
- generation
- idiomatic expressions
- knowledge
- language model
- language processing tasks
- lexicon
- lexicon model
- linguistic
- linguistic data
- linguistic data consortium
- machine translation research
- mapping
- measure
- measures
- method
- monolingual dictionary
- mutual information
- named entities
- named entity
- natural language
- natural language processing tasks
- nist
- parallel training corpus
- precision
- prior distribution
- prior probability
- probability
- probability distributions
- procedure
- processing tasks
- punctuation
- punctuation marks
- reference translation
- search problem
- segmented corpus
- sentence
- sentence pair
- sentences
- source language
- source language sentence
- source sentence
- statistics
- symbol
- symbols
- target language
- target language model
- target language sentence
- target sentence
- term
- test corpus
- text
- training
- training corpus
- training data
- training text
- translation model
- translation models
- translation quality
- translation research
- translation task
- translations
- treebank
- web pages
- word
- word boundaries
- word dictionary
- word error rate
- word frequency
- word order
- words