ACL RD-TEC 1.0 Summarization of J00-3004

Paper Title:
A COMPRESSION-BASED ALGORITHM FOR CHINESE WORD SEGMENTATION

Authors: W. J. Teahan, Rodger McNab, Yingying Wen, and Ian H. Witten

Other assigned terms:

  • ambiguity
  • annotators
  • approach
  • automata
  • bigram
  • bigram model
  • brown corpus
  • case
  • character sequence
  • characters
  • chinese characters
  • chinese language
  • chinese text
  • chinese word
  • chinese words
  • coding scheme
  • contextual information
  • corpora
  • corpus size
  • dictionary
  • distribution
  • document
  • document frequency
  • english text
  • error rate
  • evaluations
  • f-measure
  • fact
  • frequency distribution
  • gold standard
  • heuristic
  • heuristics
  • human judgment
  • index
  • input string
  • input text
  • interpretation
  • keyphrase
  • knowledge
  • language model
  • language thesaurus
  • lexicon
  • linguistic
  • linguistic information
  • linguistics
  • mandarin chinese
  • manual segmentation
  • meaning
  • measures
  • method
  • names
  • natural language
  • paragraphs
  • ph corpus
  • phrase
  • precision
  • probabilities
  • probability
  • probability estimates
  • procedure
  • process
  • punctuation
  • punctuation marks
  • queries
  • query
  • relative frequency
  • segmentation problem
  • segmented corpus
  • segments
  • semantic
  • semantic knowledge
  • sentence
  • sentences
  • source text
  • standard deviation
  • statistics
  • stem
  • suffix
  • technique
  • technologies
  • technology
  • terms
  • test data
  • test material
  • testing data
  • text
  • theory
  • thesaurus
  • tipster collection
  • topics
  • training
  • training and test data
  • training and testing data
  • training corpus
  • training data
  • training text
  • tree
  • trigram
  • typographical errors
  • user
  • word
  • word boundaries
  • word boundary
  • word frequencies
  • word meaning
  • words

This page last edited on 10 May 2017.
