ACL RD-TEC 1.0 Summarization of W00-1214

Paper Title:
MACHINE LEARNING METHODS FOR CHINESE WEB PAGE CATEGORIZATION

Authors: Ji He and Ah-Hwee Tan and Chew-Lim Tan

Other assigned terms:

  • benchmark
  • case
  • chinese corpus
  • chinese text
  • chinese word
  • classification tasks
  • corpora
  • data sets
  • distance metric
  • document
  • document feature
  • document frequency
  • document length
  • document set
  • domain knowledge
  • domain theory
  • duration
  • empirical evaluation
  • english text
  • euclidean distance
  • evaluation paradigm
  • feature
  • feature vector
  • index
  • information gain
  • information retrieval research
  • k value
  • keyword
  • knowledge
  • lexicon
  • likelihood
  • mapping
  • maps
  • measure
  • measures
  • mechanisms
  • method
  • mutual information
  • norm
  • parameter values
  • precision
  • process
  • representations
  • risk minimization principle
  • statistics
  • stems
  • style
  • support vector
  • system architecture
  • term
  • term frequency
  • terms
  • test corpus
  • testing data
  • text
  • text documents
  • theory
  • tokens
  • topics
  • training
  • training and testing data
  • training corpus
  • training data
  • training document
  • training documents
  • training examples
  • training set
  • user
  • web corpus
  • web page
  • web pages
  • web site
  • word
  • words

This page last edited on 10 May 2017.
