ACL RD-TEC 1.0 Summarization of A94-1004
Paper Title:
MODELING CONTENT IDENTIFICATION FROM DOCUMENT IMAGES
Primarily assigned technology terms:
- automatic document categorization
- capitalization
- categorization
- character recognition
- classification
- code generation
- content identification
- document categorization
- identification
- information retrieval
- language determination
- language processing
- matching
- modeling
- natural language processing
- optical character recognition
- processing
- ranking
- recognition
- reporting
- statistical method
- text categorization
Other assigned terms:
- ambiguity
- approach
- character code
- characters
- content words
- determiners
- distribution
- document
- function word
- function words
- generation
- hypothesis
- key words
- lexicon
- linguistic
- linguistic information
- mapping
- method
- natural language
- nouns
- prepositions
- process
- pronouns
- punctuation
- punctuation marks
- sentence
- technique
- text
- tokens
- training
- training data
- word
- word frequencies
- words