ACL RD-TEC 1.0 Summarization of N03-1025

Paper Title:
LANGUAGE AND TASK INDEPENDENT TEXT CATEGORIZATION WITH SIMPLE LANGUAGE MODELS

Authors: Fuchun Peng and Dale Schuurmans and Shaojun Wang

Other assigned terms:

  • approach
  • author attribution
  • authorship
  • authorship attribution
  • bag of words
  • baseline performance
  • bayesian decision theory
  • benchmark
  • case
  • categorization problem
  • characters
  • chinese characters
  • chinese text
  • classification accuracy
  • classification performance
  • classification problem
  • data set
  • data sets
  • decision theory
  • distribution
  • document
  • entropy
  • events
  • experimental results
  • f-measure
  • fact
  • feature
  • feature sets
  • feature space
  • fmeasure
  • french
  • genre
  • heuristic
  • index
  • interpolation
  • japanese text
  • knowledge
  • language model
  • language model quality
  • language models
  • likelihood
  • linguistic
  • maximum likelihood estimate
  • meaning
  • measure
  • measures
  • method
  • methodology
  • mutual information
  • n-gram
  • n-gram model
  • n-gram models
  • n-grams
  • natural language
  • paragraphs
  • perplexity
  • perplexity reduction
  • posterior
  • posterior probability
  • precision
  • probabilities
  • probability
  • probability estimates
  • process
  • relation
  • semantic
  • sentences
  • sentiment
  • sparse data
  • sparse data problem
  • style
  • support vector
  • technique
  • terms
  • test corpus
  • testing set
  • text
  • text categorization problem
  • text genre
  • theory
  • training
  • training corpus
  • training data
  • vocabulary
  • vocabulary size
  • word
  • word model
  • word sequence
  • word sequences
  • words

Extracted Section Types:


This page last edited on 10 May 2017.
