ACL RD-TEC 1.0 Summarization of W06-1710
Paper Title:
WEB CORPUS MINING BY INSTANCE OF WIKIPEDIA
Authors: Rüdiger Gleim and Alexander Mehler and Matthias Dehmer
Primarily assigned technology terms:
- 5-fold cross validation
- agglomerative clustering
- algorithm
- automatic classification
- automatic webgenre tagging
- binary categorization
- categorization
- category assignment
- class detection
- classification
- classifiers
- cluster analysis
- clustering
- compiler
- corpus analysis
- corpus linguistics
- cross validation
- disambiguation
- distance measurement
- document classification
- dynamic programming
- feature selection
- hierarchical clustering
- k-means
- k-means clustering
- kernel
- kernels
- learning
- linearization
- linking
- measuring
- mining
- optimization
- parameter selection
- radial basis function
- rating
- sampling
- sequence alignment
- structure analysis
- structure learning
- structuring
- support vector machine
- tagging
- text categorization
- tree alignment
- tree edit distance
- tree linearization
- validation
- web mining
- webgenre tagging
Other assigned terms:
- ambiguity
- approach
- argumentation
- bag of words
- case
- categorization task
- cluster
- clusters
- collocation
- comparative study
- computational complexity
- corpora
- distance matrix
- document
- document object model
- document structure
- edit distance
- f-measure
- fact
- feature
- feature vectors
- genre
- human reader
- hypothesis
- implementation
- information gain
- interpretation
- large corpora
- lexical content
- linguistic
- linguistics
- mapping
- maps
- markup
- measure
- measures
- method
- polymorphism
- probability
- random order
- relation
- representations
- segments
- signal
- support vector
- svms
- tags
- test corpus
- test set
- text
- text structure
- textual unit
- textual units
- tokens
- topics
- training
- training examples
- training set
- tree
- tree node
- trees
- web content
- web corpus
- web documents
- web page
- web pages
- webgenre
- wikipedia
- words