ACL RD-TEC 1.0 Summarization of N03-1018

Paper Title:
A GENERATIVE PROBABILISTIC OCR MODEL FOR NLP APPLICATIONS

Authors: Okan Kolak, William Byrne, and Philip Resnik

Other assigned terms:

  • alphabet
  • ambiguous word
  • approach
  • bigram
  • boundary marker
  • case
  • character error rate
  • character sequence
  • characters
  • chunks
  • co-occurrence
  • concept
  • confusion model
  • dictionary
  • distribution
  • document
  • electronic form
  • error rate
  • estimation
  • evaluation metrics
  • evaluations
  • experimental results
  • fact
  • finite state model
  • foreign language
  • french
  • french text
  • generation
  • generative model
  • glossary
  • heuristics
  • implementation
  • joint probability
  • knowledge
  • labeling
  • language information
  • language model
  • language resources
  • latin alphabet
  • lattice
  • lexicon
  • lexicon entries
  • likelihood
  • likelihood ratio
  • machine translation model
  • mapping
  • method
  • names
  • ngram
  • ngram language model
  • nlp applications
  • nlp task
  • nlp tasks
  • noisy channel
  • ocr performance
  • parallel text
  • parse
  • parse structure
  • precision
  • probabilistic models
  • probabilities
  • probability
  • process
  • punctuation
  • recognition errors
  • retrieval performance
  • rewrite rules
  • search space
  • segment boundaries
  • segment boundary
  • segments
  • style
  • symbols
  • technique
  • technology
  • test data
  • test set
  • text
  • tokens
  • toolkit
  • training
  • training and test data
  • training corpus
  • training data
  • training size
  • transformation
  • translation lexicon
  • translation model
  • trigram
  • trigram language model
  • trigram model
  • unigram
  • unigram language model
  • usability
  • user
  • vocabulary
  • vocabulary size
  • word
  • word boundaries
  • word boundary
  • word co-occurrence
  • word error rate
  • word level
  • word sequence
  • word sequences
  • words

This page last edited on 10 May 2017.
