ACL RD-TEC 1.0 Summarization of N03-1025
Paper Title:
LANGUAGE AND TASK INDEPENDENT TEXT CATEGORIZATION WITH SIMPLE LANGUAGE MODELS
Authors: Fuchun Peng and Dale Schuurmans and Shaojun Wang
Primarily assigned technology terms:
- automated text categorization
- categorization
- categorization learning
- chinese text categorization
- chinese text retrieval
- classification
- classifier
- classifiers
- coding
- estimator
- explicit word segmentation
- feature construction
- feature engineering
- feature selection
- feature selection process
- genre classification
- good-turing smoothing
- greedy search
- greek genre classification
- identification
- independent text categorization
- information retrieval
- japanese text categorization
- japanese text retrieval
- k-nearest neighbor
- language identification
- language modeling
- language modeling approach
- language processing
- laplace smoothing
- learning
- learning algorithms
- learning method
- learning process
- learning techniques
- likelihood estimate
- linear interpolation
- machine learning
- machine learning techniques
- maximum likelihood
- modeling
- morphology
- morphology analysis
- n-gram language modeling
- natural language processing
- neural networks
- nlp
- nlp analysis
- normalization
- output coding
- pre-processing
- processing
- ranking
- recognition
- search
- segmentation
- selection process
- sentiment classification
- smoothing
- smoothing method
- smoothing technique
- smoothing techniques
- speech recognition
- statistical language modeling
- stop-word removal
- support vector machines
- svm approach
- text categorization
- text classification
- text classifier
- text compression
- text genre classification
- text retrieval
- topic detection
- topic identification
- witten-bell smoothing
- word segmentation
- word-based approach
Other assigned terms:
- approach
- author attribution
- authorship
- authorship attribution
- bag of words
- baseline performance
- bayesian decision theory
- benchmark
- case
- categorization problem
- characters
- chinese characters
- chinese text
- classification accuracy
- classification performance
- classification problem
- data set
- data sets
- decision theory
- distribution
- document
- entropy
- events
- experimental results
- f-measure
- fact
- feature
- feature sets
- feature space
- fmeasure
- french
- genre
- heuristic
- index
- interpolation
- japanese text
- knowledge
- language model
- language model quality
- language models
- likelihood
- linguistic
- maximum likelihood estimate
- meaning
- measure
- measures
- method
- methodology
- mutual information
- n-gram
- n-gram model
- n-gram models
- n-grams
- natural language
- paragraphs
- perplexity
- perplexity reduction
- posterior
- posterior probability
- precision
- probabilities
- probability
- probability estimates
- process
- relation
- semantic
- sentences
- sentiment
- sparse data
- sparse data problem
- style
- support vector
- technique
- terms
- test corpus
- testing set
- text
- text categorization problem
- text genre
- theory
- training
- training corpus
- training data
- vocabulary
- vocabulary size
- word
- word model
- word sequence
- word sequences
- words