ACL RD-TEC 1.0 Summarization of P05-3031
Paper Title:
REFORMATTING WEB DOCUMENTS VIA HEADER TREES
Authors: Minoru Yoshida and Hiroshi Nakagawa
Primarily assigned technology terms:
Other assigned terms:
- approach
- association for computational linguistics
- bigram
- case
- characters
- cluster
- concepts
- data sparseness
- document
- document set
- f-measure
- fact
- heuristic
- heuristic rules
- heuristics
- html document
- joint probability
- likelihood
- linguistics
- measure
- method
- noise
- parameter space
- precision
- probabilistic model
- probabilistic models
- probability
- relation
- representations
- semantic
- semantic structures
- server
- symbols
- tag information
- tags
- terms
- training
- training data
- training examples
- tree
- tree representation
- trees
- unigram
- user
- web documents
- web pages
- words