ACL RD-TEC 1.0 Summarization of N06-2013
Paper Title:
ARABIC PREPROCESSING SCHEMES FOR STATISTICAL MACHINE TRANSLATION
Authors: Nizar Habash and Fatiha Sadat
Primarily assigned technology terms:
- abstracting
- algorithm
- analyzer
- arabic disambiguation
- arabic morphological analyzer
- automatic alignment
- beam search
- beam search algorithm
- bootstrap
- bootstrap resampling
- classifiers
- decoder
- decoding
- disambiguation
- dynamic-programming beam search
- english-like tokenization
- learning
- lemmatization
- machine translation
- matching
- model training
- morphological analysis
- morphological analyzer
- morphological disambiguation
- morphology
- normalization
- optimization
- phrase translation
- pos tagging
- preprocessing
- reading
- regular expression
- regular expression matching
- resampling
- search
- search algorithm
- smt system
- splitting
- statistical machine translation
- tagging
- tokenization
- translation model training
- transliteration
- weight optimization
- word alignment
- word analysis
Other assigned terms:
- affixes
- alignment models
- ambiguity
- arabic-english parallel corpus
- beam
- bleu
- bleu metric
- bleu score
- catalan
- corpora
- data consortium
- disambiguation system
- english language
- english language model
- evaluation metric
- evaluation test
- fact
- feature
- genre
- knowledge
- language model
- language models
- lexeme
- linguistic
- linguistic data
- linguistic data consortium
- linguistic knowledge
- log-linear model
- method
- morphemes
- morphological features
- mt evaluation
- nist
- oracle
- orthography
- parallel corpus
- part-of-speech
- part-of-speech tags
- particle
- particles
- phrase
- phrase translation model
- process
- punctuation
- sentence
- sentences
- serbian
- stem
- stems
- syntactic knowledge
- tags
- technique
- terms
- test data
- test set
- text
- toolkit
- training
- training and test data
- training corpus
- training data
- translation model
- translation quality
- trigram
- verb
- word
- word form
- word-length feature
- words