ACL RD-TEC 1.0 Summarization of P06-1069
Paper Title:
A COMPARISON AND SEMI-QUANTITATIVE ANALYSIS OF WORDS AND CHARACTER-BIGRAMS AS FEATURES IN CHINESE TEXT CATEGORIZATION
Authors: Jingyang Li, Maosong Sun, and Xian Zhang
Primarily assigned technology terms:
- algorithm
- categorization
- centroid-based classifier
- chi-tfidf
- chinese text categorization
- chinese text processing
- classification
- classifier
- classifiers
- computational linguistics
- data collection
- dimensionality reduction
- document indexing
- feature selection
- indexing
- kernel
- latent semantic indexing
- learning
- machine learning
- maximum match
- multi-class classification
- processing
- segmentation
- semantic indexing
- statistical analysis
- support vector machine
- svm classifier
- term weighting
- text categorization
- text classification
- text processing
- weighting
- word segmentation
- word-indexing
Other assigned terms:
- ambiguity
- approach
- association for computational linguistics
- bigram
- category label
- chinese language
- chinese text
- chinese words
- classification performance
- classification task
- complementation
- dimensionality
- document
- document collection
- document collections
- encyclopedia
- fact
- feature
- feature list
- feature selection criterion
- feature selection scheme
- feature space
- implementation
- information quantity
- information theory
- latent semantic
- linguistics
- meaning
- meanings
- measure
- method
- noise
- part-of-speech
- performance comparison
- phrase
- precision
- probability
- probability density
- processing tasks
- qualitative analysis
- scalability
- semantic
- semantic information
- sparseness problem
- statistics
- support vector
- svm implementation
- tags
- term
- term weighting scheme
- terms
- test set
- text
- text categorization evaluation
- theory
- training
- training data
- training document
- training phase
- training set
- training time
- weighting scheme
- word
- word features
- words