ACL RD-TEC 1.0 Summarization of W03-0315
Paper Title:
EFFICIENT OPTIMIZATION FOR BILINGUAL SENTENCE ALIGNMENT BASED ON LINEAR REGRESSION
Authors: Bing Zhao, Klaus Zechner, Stephen Vogel, and Alex Waibel
Primarily assigned technology terms:
- algorithm
- character encoding
- chinese-english sentence alignment
- classification
- crawler
- decomposition
- document alignment
- dynamic programming
- dynamic programming implementation
- encoding
- expectation-maximization
- genetic programming
- html parsing
- internet
- linear regression
- machine translation
- machine translation systems
- measuring
- mining
- model training
- modeling
- mt system
- natural language systems
- optimization
- parallel text mining
- parallel training
- parameter reestimation
- parser
- parsing
- preprocessing
- processing
- re-estimation
- re-scoring
- re-training
- reestimation
- regression
- score prediction
- scoring
- segmentation
- segmenter
- sentence alignment
- sentence alignment program
- smt system
- statistical machine translation
- statistical translation
- text mining
- text mining system
- translation systems
- viterbi
- viterbi alignment
- word segmenter
- word translation
Other assigned terms:
- agreement score
- aligned sentence
- alignment model
- alignment models
- alignment probability
- annotator
- annotators
- approach
- bilingual dictionary
- bilingual sentence
- boundary information
- break
- case
- characters
- chinese sentence
- chinese word
- chinese words
- chinese-english language pair
- comparable document
- conditional probability
- confidence score
- corpora
- correlation
- correlations
- data consortium
- dictionary
- distribution
- document
- english sentence
- english translation
- english vocabulary
- estimation
- exact match
- fact
- french
- gaussian distribution
- generation
- genre
- html document
- human annotator
- human annotators
- human judgment
- implementation
- interpolation
- language pair
- language pairs
- lexical features
- lexical information
- lexicon
- likelihood
- linear regression model
- linguistic
- linguistic data
- linguistic data consortium
- measures
- modeling power
- natural language
- noise
- paragraphs
- parallel corpora
- parallel corpus
- parallel sentence
- parallel text
- parallel training corpus
- parse
- perplexity
- probability
- procedure
- programming implementation
- punctuation
- quality judgment
- regression model
- sentence
- sentence boundary
- sentence length model
- sentence pair
- sentences
- source language
- source sentence
- statistics
- target language
- target word
- term
- text
- text genre
- training
- training corpus
- training data
- training set
- transformation
- translation candidates
- translation lexicon
- translation quality
- translations
- viterbi path
- vocabulary
- vocabulary size
- web pages
- word
- word boundary
- word frequency
- word level
- word pair
- words