ACL RD-TEC 1.0 Summarization of W96-0108
Paper Title:
A STATISTICAL APPROACH TO AUTOMATIC OCR ERROR CORRECTION IN CONTEXT
Authors: Xiang Tong and David A. Evans
Primarily assigned technology terms:
- algorithm
- approximation
- bayesian classifier
- character recognition
- classifier
- computer science
- context-based word-error correction
- context-sensitive spelling-error correction
- cutoff
- database
- decomposition
- dynamic programming
- dynamic programming method
- error correction
- error reduction
- estimator
- information retrieval
- information retrieval systems
- information retrieval technique
- interfaces
- language modeling
- learning
- modeling
- natural-language processing
- non-word error correction
- ocr error correction
- ocr software
- office automation
- optical character recognition
- post-processing
- postprocessing
- processing
- processor
- programming method
- querying
- real-word error correction
- recognition
- retrieval systems
- retrieval technique
- scoring
- spelling
- spelling correction
- spelling-error correction
- statistical language modeling
- tagging
- tagging method
- text retrieval
- viterbi
- viterbi algorithm
- word bigram
- word correction
- word error correction
- word-correction
- word-error correction
Other assigned terms:
- approach
- automatic correction
- back-off model
- bigram
- character sequence
- characters
- conditional probabilities
- conditional probability
- confusion probability
- confusion probability table
- context information
- device
- dictionary
- dictionary entries
- discourse
- discourse structures
- edit distance
- error rate
- error reduction rate
- evaluations
- events
- fact
- feature
- generation
- heuristics
- index
- input string
- language model
- language models
- lexicon
- lexicon entries
- lexicon entry
- meaning
- method
- n-gram
- n-gram vector
- n-grams
- natural-language
- part-of-speech
- prior probability
- probabilities
- probability
- process
- processing tasks
- query
- query vector
- sentence
- sentences
- source text
- statistical approach
- statistics
- substring
- system performance
- tags
- target string
- technique
- technology
- term
- term frequency
- test corpus
- test set
- text
- training
- training corpus
- training data
- training set
- training text
- transposition
- trigram
- vector space
- word
- word boundaries
- word boundary
- word error rate
- word meaning
- word sequence
- word trigram
- words