ACL RD-TEC 1.0 Summarization of C96-2138
Paper Title:
CONTENT-ORIENTED CATEGORIZATION OF DOCUMENT IMAGES
Primarily assigned technology terms:
- algorithm
- categorization
- character recognition
- classification
- classifier
- comparative evaluation
- content characterization
- content-oriented categorization
- database
- downstream processing
- encoding
- european language identification
- extraction system
- identification
- information extraction
- information extraction system
- information retrieval
- information retrieval systems
- internet
- language identification
- language processing
- laser printer
- natural language processing
- nlp
- optical character recognition
- processing
- recognition
- retrieval systems
- segmentation
- statistical categorization
- statistical techniques
- token processing
- word recognition
- word stemming
- word-spotting
Other assigned terms:
- ambiguity
- approach
- categorization task
- characters
- correlation
- cosine measure
- data set
- dictionaries
- dictionary
- document
- experimental results
- feature
- generation
- hypothesis
- lexical information
- lexicon
- mapping
- measure
- measures
- natural language
- nouns
- process
- processing time
- punctuation
- punctuation marks
- recipe
- recognition accuracy
- recognition errors
- relative frequency
- standard deviation
- suffix
- suffixes
- sun microsystems
- technique
- technology
- terms
- text
- tokens
- topics
- training
- training data
- vector space
- word
- word boundary
- words