ACL RD-TEC 1.0 Summarization of W06-1648
Paper Title:
ARABIC OCR ERROR CORRECTION USING CHARACTER SEGMENT CORRECTION, LANGUAGE MODELING, AND SHALLOW MORPHOLOGY
Authors: Walid Magdy and Kareem Darwish
Primarily assigned technology terms:
- algorithm
- arabic information retrieval
- arabic ocr
- arabic ocr error correction
- classifier
- clustering
- clustering technique
- computational linguistics
- dictionary lookup
- dynamic programming
- edit distance algorithm
- error correction
- finite state
- frequency analysis
- good-turing smoothing
- information retrieval
- language modeling
- language processing
- levenshtein
- listing
- modeling
- morphological analysis
- morphological processing
- morphology
- n-gram frequency analysis
- natural language processing
- noisy channel model
- ocr error correction
- parsing
- post-processing
- probabilistic relaxation
- processing
- ranking
- reading
- recognition
- recognizer
- smoothing
- spell checker
- spell checking
- spelling
- spelling correction
- stemmer
- word clustering
- word prediction
Other assigned terms:
- acronym
- anchors
- approach
- arabic morphology
- association for computational linguistics
- backoff
- character error rate
- characters
- checker
- cluster
- clusters
- compound words
- confusion model
- data sparseness
- dictionaries
- dictionary
- document
- edit distance
- english grammar
- english language
- error rate
- experimental results
- factored language model
- grammar
- heuristics
- language model
- large corpus
- levenshtein edit distance
- likelihood
- linguistic
- linguistic context
- linguistic features
- linguistics
- mapping
- meanings
- method
- methodology
- morphemes
- morphological information
- n-gram
- n-grams
- named entities
- named entity
- natural language
- noisy channel
- parse
- part of speech
- part of speech tags
- passage
- prefixes and suffixes
- prior probability
- probabilities
- probability
- probability estimates
- process
- recognition errors
- segments
- sentence
- sentences
- stem
- stems
- suffix
- suffixes
- surface form
- tags
- technique
- term
- term list
- text
- text corpus
- text documents
- tokens
- toolkit
- training
- training corpus
- training data
- training examples
- trigram
- trigram language model
- uniform probability
- visual context
- word
- word error rate
- word frequency
- word level
- word sequence
- word trigram
- words