ACL RD-TEC 1.0 Summarization of W00-1214
Paper Title:
MACHINE LEARNING METHODS FOR CHINESE WEB PAGE CATEGORIZATION
Authors: Ji He and Ah-Hwee Tan and Chew-Lim Tan
Primarily assigned technology terms:
- algorithm
- automatic feature selection
- binary classification
- categorization
- chinese information processing
- chinese segmentation
- chinese text categorization
- chinese web
- chinese word segmentation
- class assignment
- classification
- classifier
- classifiers
- computational learning
- disambiguation
- document classification
- document processing
- encoding
- english text categorization
- feature ranking
- feature representation
- feature selection
- feature selection and extraction
- incremental learning
- information processing
- information retrieval
- k ranging
- keyword extraction
- knn
- knowledge representation
- lazy learning
- learning
- learning method
- learning methods
- learning process
- learning techniques
- machine learning
- machine learning methods
- machine learning techniques
- macro-averaging
- nearest neighbors
- neural networks
- page classification
- pattern abstraction
- pattern recognition
- processing
- quadratic programming
- ranking
- rating
- recognition
- recognition algorithm
- risk minimization
- rule representation
- scoring
- search
- search process
- segmentation
- statistical pattern recognition
- supervised learning
- support vector machines
- svm learning
- svm light
- svm problem
- term weighting
- text categorization
- voting
- web page classification
- weighting
- weighting method
- word segmentation
Other assigned terms:
- benchmark
- case
- chinese corpus
- chinese text
- chinese word
- classification tasks
- corpora
- data sets
- distance metric
- document
- document feature
- document frequency
- document length
- document set
- domain knowledge
- domain theory
- duration
- empirical evaluation
- english text
- euclidean distance
- evaluation paradigm
- feature
- feature vector
- index
- information gain
- information retrieval research
- k value
- keyword
- knowledge
- lexicon
- likelihood
- mapping
- maps
- measure
- measures
- mechanisms
- method
- mutual information
- norm
- parameter values
- precision
- process
- representations
- risk minimization principle
- statistics
- stems
- style
- support vector
- system architecture
- term
- term frequency
- terms
- test corpus
- testing data
- text
- text documents
- theory
- tokens
- topics
- training
- training and testing data
- training corpus
- training data
- training document
- training documents
- training examples
- training set
- user
- web corpus
- web page
- web pages
- web site
- word
- words