ACL RD-TEC 1.0 Summarization of W06-1648

Paper Title:
Arabic OCR Error Correction Using Character Segment Correction, Language Modeling, and Shallow Morphology

Authors: Walid Magdy and Kareem Darwish

Other assigned terms:

  • acronym
  • anchors
  • approach
  • arabic morphology
  • association for computational linguistics
  • backoff
  • character error rate
  • characters
  • checker
  • cluster
  • clusters
  • compound words
  • confusion model
  • data sparseness
  • dictionaries
  • dictionary
  • document
  • edit distance
  • english grammar
  • english language
  • error rate
  • experimental results
  • factored language model
  • grammar
  • heuristics
  • language model
  • large corpus
  • levenshtein edit distance
  • likelihood
  • linguistic
  • linguistic context
  • linguistic features
  • linguistics
  • mapping
  • meanings
  • method
  • methodology
  • morphemes
  • morphological information
  • n-gram
  • n-grams
  • named entities
  • named entity
  • natural language
  • noisy channel
  • parse
  • part of speech
  • part of speech tags
  • passage
  • prefixes and suffixes
  • prior probability
  • probabilities
  • probability
  • probability estimates
  • process
  • recognition errors
  • segments
  • sentence
  • sentences
  • stem
  • stems
  • suffix
  • suffixes
  • surface form
  • tags
  • technique
  • term
  • term list
  • text
  • text corpus
  • text documents
  • tokens
  • toolkit
  • training
  • training corpus
  • training data
  • training examples
  • trigram
  • trigram language model
  • uniform probability
  • visual context
  • word
  • word error rate
  • word frequency
  • word level
  • word sequence
  • word trigram
  • words

Extracted Section Types:


This page last edited on 10 May 2017.
