ACL RD-TEC 1.0 Summarization of W06-3111
Paper Title:
PARTITIONING PARALLEL DOCUMENTS USING BINARY SEGMENTATION
Authors: Jia Xu, Richard Zens, and Hermann Ney
Primarily assigned technology terms:
- algorithm
- beam search
- beam search algorithm
- binary segmentation
- binary sentence segmentation
- decomposition
- dynamic programming
- dynamic programming algorithm
- giza
- machine translation
- machine translation system
- machine translation systems
- modeling
- paragraph alignment
- parallel sentence extraction
- partitioning
- phrase extraction
- phrase-based translation
- phrase-based translation approach
- programming algorithm
- search
- search algorithm
- segmentation
- segmentation algorithm
- segmentation method
- segmentation system
- sentence alignment
- sentence extraction
- sentence segmentation
- splitting
- statistical alignment
- statistical machine translation
- statistical machine translation system
- translation system
- translation systems
- viterbi
- weighting
- word alignment
Other assigned terms:
- aligned corpus
- alignment accuracy
- alignment model
- alignment models
- anchor
- anchors
- approach
- beam
- bilingual corpora
- bilingual phrase
- bilingual sentence
- bilingual training corpus
- bleu
- bleu score
- break
- case
- corpora
- data consortium
- distribution
- document
- error rate
- evaluation set
- experimental results
- feature
- hypothesis
- ibm model
- interpolation
- knowledge
- language model
- language resources
- lexicon
- lexicon entries
- linguistic
- linguistic data
- linguistic data consortium
- log-linear model
- mapping
- measure
- method
- model parameters
- n-gram
- n-grams
- nist
- paragraph
- paragraphs
- parallel corpora
- parallel sentence
- parallel text
- phrase
- probability
- probability distributions
- procedure
- process
- punctuation
- punctuation marks
- recursion
- reordering
- search space
- segmentation accuracy
- segments
- sentence
- sentence pair
- sentences
- source language
- source language sentence
- source sentence
- statistics
- symbol
- target language
- target language model
- target language sentence
- target sentence
- text
- training
- training corpora
- training corpus
- training data
- training time
- translation hypothesis
- translation model
- translation models
- translation quality
- uniform distribution
- user
- word
- word alignment model
- word alignments
- word error rate
- word order
- words