ACL RD-TEC 1.0 Summarization of W03-0504
Paper Title:
SUMMARIZATION OF NOISY DOCUMENTS: A PILOT STUDY
Authors: Hongyan Jing, Daniel Lopresti, and Chilin Shih
Primarily assigned technology terms:
- abstract generation
- automatic summarization
- boundary detection
- capitalization
- character recognition
- chunking
- co-reference resolution
- decision tree
- decision tree approach
- deep understanding
- document summarization
- editing
- entity extraction
- extraction system
- extraction systems
- extraction technology
- extractor
- extrinsic evaluation
- finite state
- finite state modeling
- image-based summarization
- indexing
- information extraction
- information retrieval
- internet
- intrinsic evaluation
- language processing
- measuring
- modeling
- modeling speech
- named entity extraction
- natural language processing
- noisy document summarization
- optical character recognition
- parser
- parsers
- parsing
- part-of-speech tagger
- part-of-speech tagging
- phrase chunking
- preprocessing
- processing
- ranking
- recognition
- rule-based system
- segmentation
- sentence boundary detection
- sentence combination
- sentence extraction
- sentence extraction system
- sentence reduction
- smoothing
- speech recognition
- spelling
- statistical parser
- summarization
- summarization process
- summarization system
- summarization systems
- summarizer
- syntactic parser
- syntactic parsing
- tagger
- tagging
- text summarization
- tokenization
- tokenizer
- topic segmentation
Other assigned terms:
- approach
- bigram
- boundary information
- break
- broadcast news
- broadcast news speech
- case
- characters
- co-reference
- cohesion
- complete parse
- confidence score
- confidence scores
- contextual information
- cue phrases
- dictionary
- document
- document layout
- document set
- document vectors
- duration
- duration information
- edit distance
- english text
- estimation
- experimental results
- generation
- good-turing estimation
- grammar
- heuristic
- heuristic rules
- index
- information sources
- input text
- intention
- knowledge
- language models
- large corpus
- lexical cohesion
- likelihood
- linguistic
- main verb
- mappings
- measure
- measures
- method
- n-gram
- named entity
- natural language
- noise
- noise rate
- noisy input
- noun phrases
- nouns
- ocr performance
- opinion
- paragraphs
- parse
- parse tree
- part-of-speech
- part-of-speech tag
- pause
- pause duration
- phrase
- precision
- probability
- process
- punctuation
- punctuation marks
- query
- recognition errors
- sentence
- sentence boundaries
- sentence boundary
- sentence level
- sentences
- slot
- sources of information
- sparse data
- speech recognition errors
- statistics
- symbol
- symbols
- syntactic information
- synthetic noise
- synthetic noise rate
- technique
- technology
- test set
- text
- text documents
- tokens
- training
- transcripts
- trec corpus
- tree
- trees
- trigram
- understanding
- unigram
- user
- user interaction
- verb
- word
- word error rates
- word frequency
- words