ACL RD-TEC 1.0 Summarization of W03-1106
Paper Title:
TEXT CLASSIFICATION IN ASIAN LANGUAGES WITHOUT WORD SEGMENTATION
Authors: Fuchun Peng and Xiangji Huang and Dale Schuurmans and Shaojun Wang
Primarily assigned technology terms:
- algorithm
- binary classification
- categorization
- chinese text categorization
- chinese text retrieval
- classification
- classifier
- cutoff
- explicit word segmentation
- feature selection
- feature selection process
- good-turing smoothing
- greedy search
- heuristic search
- information retrieval
- information scoring
- japanese text retrieval
- k-nearest neighbor
- language modeling
- language modeling approach
- language processing
- laplace smoothing
- learning
- learning algorithm
- learning approaches
- learning techniques
- machine learning
- machine learning techniques
- matching
- measuring
- modeling
- naive bayes
- natural language processing
- neural networks
- partial matching
- preprocessing
- processing
- ranking
- re-ranking
- scoring
- search
- segmentation
- selection process
- smoothing
- smoothing method
- smoothing technique
- smoothing techniques
- statistical language modeling
- support vector machine
- support vector machines
- svm approach
- text categorization
- text classification
- text retrieval
- topic detection
- witten-bell smoothing
- word segmentation
Other assigned terms:
- approach
- asian language
- asian language text
- bayesian decision theory
- benchmark
- case
- characters
- chinese text
- classification accuracy
- confusion matrix
- contextual information
- culture
- data consortium
- data set
- data sets
- decision theory
- dimensionality
- distribution
- document
- empirical evaluation
- empirical results
- english text
- entropy
- events
- experimental results
- f-measure
- fact
- feature
- feature space
- heuristic
- independence assumption
- information retrieval research
- japanese text
- knowledge
- language model
- language models
- likelihood
- linguistic
- linguistic data
- linguistic data consortium
- measure
- method
- mutual information
- natural language
- news corpus
- passage
- perplexity
- perplexity reduction
- positive and negative examples
- precision
- probability
- probability estimates
- procedure
- process
- retrieval performance
- semantic
- sparse data
- standard benchmark
- support vector
- svms
- technique
- terms
- test corpus
- testing set
- text
- theory
- topics
- training
- training corpus
- training data
- training set
- vocabulary
- word
- word level
- word model
- word sequence
- word sequences
- words