ACL RD-TEC 1.0 Summarization of P06-1073
Paper Title:
MAXIMUM ENTROPY BASED RESTORATION OF ARABIC DIACRITICS
Authors: Imed Zitouni and Jeffrey S. Sorensen and Ruhi Sarikaya
Primarily assigned technology terms:
- algorithm
- analyzer
- arabic segmentation system
- beam-search
- classification
- classifier
- classifiers
- computational linguistics
- computing
- conditional random fields
- decoding
- diacritic restoration
- dynamic programming
- dynamic programming search
- entity recognition
- entropy classifier
- example-based classification
- expectation maximization
- finite state
- finite state machine
- finite state transducers
- generalized iterative scaling
- hidden markov
- hidden markov models
- hmm-based diacritization
- iterative scaling
- language modeling
- language processing
- machine modeling
- matching
- maxent
- maxent classifier
- maximum entropy
- maximum entropy approach
- maximum entropy classifier
- maximum entropy classifiers
- maximum entropy framework
- maximum entropy model
- modeling
- modeling technique
- morphological analysis
- morphological analyzer
- named entity recognition
- natural language processing
- normalization
- parsing
- pos tagging
- processing
- recognition
- regularization
- reporting
- rule-based system
- search
- search algorithm
- segmentation
- segmentation system
- sequence classification
- shallow parsing
- speech recognition
- syllabification
- tagging
- tagging system
- top-down approach
- transducers
- unsupervised tagging
- viterbi
- viterbi search
- vocalization
- vowel restoration
Other assigned terms:
- acoustic signal
- affixes
- alphabet
- ambiguity
- approach
- arabic text
- arabic treebank
- array
- association for computational linguistics
- binary features
- case
- character sequence
- characters
- class probability
- classification problem
- comparative study
- conditional probability
- context information
- contextual information
- diacritization error rate
- distribution
- document
- english translation
- entropy
- error rate
- experimental results
- feature
- feature sets
- feature space
- feature types
- formal speech
- grammar
- grapheme
- heuristic
- hmm model
- hypotheses
- inflection
- information sources
- input text
- interpretation
- knowledge
- labeling
- language processing tasks
- lattice
- lexical features
- lexicon
- likelihood
- linguistics
- markov models
- markov sequence
- maxent model
- meaning
- method
- modern standard arabic
- morphological information
- n-gram
- n-gram model
- n-gram models
- n-grams
- named entity
- natural language
- natural language processing tasks
- normalization factor
- opinion
- parse
- parsing model
- part-of-speech
- part-of-speech tag
- pause
- probabilities
- probability
- probability distribution
- processing tasks
- pronoun
- runtime
- search space
- search task
- segment-based information
- segments
- sentence
- sentence meaning
- signal
- sources of information
- standard arabic
- statistical model
- statistical models
- stem
- suffix
- suffixes
- symbols
- syntactic information
- system performance
- tag sequence
- tagging problem
- tags
- technique
- term
- terms
- testing data
- testing set
- text
- training
- training and testing data
- training corpus
- training data
- training examples
- training phase
- training set
- treebank
- treebank corpus
- trigram
- utterance
- verb
- vowel
- word
- word error rate
- word level
- words