ACL RD-TEC 1.0 Summarization of P98-2152
Paper Title:
JAPANESE OCR ERROR CORRECTION USING CHARACTER SHAPE SIMILARITY AND STATISTICAL LANGUAGE MODEL
Primarily assigned technology terms:
- algorithm
- approximate string match
- back-off smoothing
- beam search
- character clustering
- character recognition
- classification
- clustering
- computing
- context-independent approximate word match
- context-sensitive spelling
- context-sensitive spelling correction
- correction method
- dynamic programming
- editing
- error correction
- error correction method
- feature extraction
- feature selection
- good-turing method
- information retrieval
- japanese character recognition
- language modeling
- matching
- modeling
- nlp
- noisy channel model
- ocr error correction
- office automation
- partial matching
- recognition
- search
- segmentation
- segmentation algorithm
- similarity method
- simulator
- smoothing
- smoothing method
- speech recognition
- spelling
- spelling correction
- statistical language modeling
- statistical modeling
- string match
- text compression
- vector quantization
- viterbi-like word segmentation
- word bigram
- word error correction
- word matching
- word segmentation
Other assigned terms:
- approximate word match
- beam
- bigram
- bigram model
- boundary marker
- case
- character bigram model
- character sequence
- characters
- cluster
- confusion matrix
- confusion probability
- corpora
- dictionaries
- dictionary
- distance metric
- distribution
- document
- edit distance
- edr corpus
- electric engineering
- english speech
- events
- fact
- feature
- feature vector
- feature vectors
- foreign words
- geometric distribution
- handwriting
- heuristic
- hypotheses
- hypothesis
- index
- input string
- inverted index
- japanese corpus
- japanese sentences
- joint probability
- language model
- language models
- likelihood
- linguistic
- measures
- method
- ngram
- noisy channel
- part of speech
- perplexity
- poisson distribution
- priori
- probabilities
- probability
- procedure
- process
- pronunciation
- rank order
- recognition accuracy
- recognition errors
- sentence
- sentences
- statistical language model
- substring
- symbol
- symbols
- technique
- technology
- test data
- test set
- text
- training
- training corpus
- training data
- training set
- unigram
- unigram probability
- vocabulary
- word
- word bigram model
- word boundaries
- word boundary
- word model
- word perplexity
- word sequence
- words