ACL RD-TEC 1.0 Summarization of W04-1610
Paper Title:
AUTOMATIC ARABIC DOCUMENT CATEGORIZATION BASED ON THE NAÏVE BAYES ALGORITHM
Authors: Mohamed El Kourdi and Amine Bensaid and Tajje-eddine Rachidi
Primarily assigned technology terms:
- algorithm
- arabic document categorization
- arabic root extraction
- arabic text categorization
- artificial neural networks
- automatic categorization
- automatic classification
- automatic text categorization
- bayes algorithm
- bayesian learning
- categorization
- classification
- classifier
- classifiers
- clustering
- cross validation
- cross-validation
- decision tree
- decision tree learning
- digital library
- document categorization
- document classification
- document management
- document preprocessing
- extraction algorithm
- extraction technique
- feature extraction
- feature selection
- grouping
- indexing
- information filtering
- information retrieval
- information retrieval tasks
- language processing
- learning
- learning algorithm
- learning algorithms
- machine learning
- machine learning algorithm
- machine learning algorithms
- management systems
- measuring
- morphology
- naive bayes
- natural language processing
- nb learning
- neural networks
- preprocessing
- processing
- root extraction
- scoring
- search
- search engines
- smoothing
- text categorization
- text classification
- text representation
- tree learning
- validation
- weighting
Other assigned terms:
- ambiguity
- arabic language
- arabic morphology
- arabic text
- canonical form
- case
- classification accuracy
- classification error
- classification error rate
- classification performance
- confusion matrix
- correlations
- culture
- data set
- data sets
- disjunction
- document
- document frequency
- error rate
- evaluation set
- experimental setting
- expert knowledge
- extraction process
- fact
- feature
- feature selection criterion
- information gain
- interpretation
- knowledge
- labeling
- learning module
- marketing
- measure
- method
- minimum description length
- natural language
- non-concatenative language
- paragraphs
- posteriori probability
- precision
- probabilities
- probability
- process
- statistic
- statistics
- stem
- stems
- technical documentation
- technique
- television
- term
- terms
- test set
- testing set
- text
- text documents
- theorem
- training
- training documents
- training set
- transformation
- tree
- vocabulary
- web documents
- web site
- web text
- word
- word morphology
- words