ACL RD-TEC 1.0 Summarization of P06-1001
Paper Title:
COMBINATION OF ARABIC PREPROCESSING SCHEMES FOR STATISTICAL MACHINE TRANSLATION
Authors: Fatiha Sadat and Nizar Habash
Primarily assigned technology terms:
- abstracting
- algorithm
- analyzer
- arabic disambiguation
- arabic morphological analyzer
- automatic alignment
- beam search
- beam search algorithm
- bootstrap
- bootstrap resampling
- coding
- computational linguistics
- concatenative morphology
- decoder
- decoding
- disambiguation
- document alignment
- dynamic-programming beam search
- encoding
- feature representation
- learning
- lemmatization
- machine translation
- matching
- model training
- morphological analysis
- morphological analyzer
- morphological generation
- morphology
- normalization
- one-hot coding
- optimization
- orthographic normalization
- phrase translation
- phrase-based statistical machine translation
- pos tagging
- pre-processing
- preprocessing
- reading
- resampling
- rescoring
- scheme combination
- search
- search algorithm
- smt system
- spelling
- splitting
- statistical machine translation
- synthesis
- tagging
- tokenization
- translation model training
- weight optimization
- word alignment
- word analysis
- word matching
- word-sense disambiguation
Other assigned terms:
- affixes
- alignment models
- ambiguity
- approach
- arabic treebank
- arabic-english parallel corpus
- association for computational linguistics
- beam
- binary features
- bleu
- bleu metric
- bleu score
- brevity penalty
- case
- catalan
- corpora
- correlation
- data consortium
- data sets
- derivational morphology
- determiner
- development set
- disambiguation system
- document
- english language
- english language model
- english translation
- evaluation metric
- evaluation test
- experimental results
- fact
- feature
- generation
- generation system
- heuristic
- hypotheses
- ibm model
- inflection
- knowledge
- language model
- language models
- large training
- lemma
- lexeme
- lexemes
- linguistic
- linguistic data
- linguistic data consortium
- linguistics
- log-linear combination
- log-linear model
- mood
- morph
- morpheme
- morphemes
- morphological features
- morphological information
- morphological rule
- mt evaluation
- nist
- noun phrases
- nouns
- nunation
- oracle
- orthography
- parallel corpus
- particle
- particles
- perplexity
- phrase
- phrase translation model
- pos tag
- preposition
- prepositions
- probabilities
- process
- pronoun
- punctuation
- reference translations
- sentence
- sentences
- serbian
- source language
- source sentence
- statistical significance
- statistics
- stem
- stems
- syntactic knowledge
- syntax
- tag set
- tags
- technique
- templatic morphology
- terms
- test data
- test set
- text
- tokens
- toolkit
- training
- training corpus
- training data
- training size
- translation model
- translation quality
- translations
- treebank
- treebank tag set
- trigram
- trigram language model
- word
- word sense
- word-length feature
- words