ACL RD-TEC 1.0 Summarization of C94-2108
Paper Title:
CONTENT CHARACTERIZATION USING WORD SHAPE TOKENS
Authors: Penelope Sibun and David S. Farrar
Primarily assigned technology terms:
- algorithm
- approximation
- baum-welch algorithm
- character recognition
- classification
- computing
- content characterization
- database
- document classification
- document processing
- document recognition
- document understanding
- encoding
- grammatical function assignment
- hidden markov
- identification
- language identification
- markov model
- matching
- noun phrase recognition
- optical character recognition
- part-of-speech tagging
- phrase recognition
- processing
- recognition
- recognizer
- speech tagger
- spelling
- structuring
- tagger
- taggers
- tagging
- text processing
- text tagging
- tile
- tokenizer
- topic identification
- training process
- viterbi
- viterbi algorithm
- word shape tagging
- word tagging
Other assigned terms:
- adjective
- ambiguity
- approach
- case
- characters
- determiner
- document
- document content
- document text
- dutch
- fact
- french
- genre
- grammatical function
- index
- large corpus
- lexicon
- linguistics
- mapping
- maps
- method
- noun phrase
- noun phrases
- nouns
- part of speech
- part-of-speech
- part-of-speech information
- parts of speech
- phrase
- plural noun
- probabilities
- probability
- process
- processing tasks
- punctuation
- punctuation marks
- queries
- sentence
- sentence boundaries
- sentence boundary
- sentences
- stems
- style
- suffixes
- surface form
- surface form word
- syntactic information
- tag set
- tags
- tagset
- technique
- technology
- terms
- text
- text database
- the lexicon
- the word
- tokens
- topics
- training
- training corpus
- understanding
- verb
- verb forms
- verb tag
- word
- words