ACL RD-TEC 1.0 Summarization of W03-1701
Paper Title:
UNSUPERVISED TRAINING FOR OVERLAPPING AMBIGUITY RESOLUTION IN CHINESE WORD SEGMENTATION
Authors: Mu Li, Jianfeng Gao, Chang-Ning Huang, and Jianfeng Li
Primarily assigned technology terms:
- ambiguity resolution
- bayesian classifier
- binary classification
- chinese word segmentation
- classification
- classification process
- classifier
- classifiers
- disambiguation
- ensemble learning
- hybrid method
- learning
- likelihood estimation
- matching
- maximum likelihood
- maximum likelihood estimation
- maximum matching
- naive bayesian
- overlapping ambiguity resolution
- rule-based approach
- rule-based system
- search
- search engine
- search process
- segmentation
- segmentation tool
- statistical approaches
- statistical methods
- supervised training
- support vector machine
- tokenization
- training method
- training procedure
- training process
- unsupervised training
- word segmentation
Other assigned terms:
- ambiguity
- ambiguous word
- approach
- binary classification problem
- case
- character sequence
- characters
- chinese characters
- chinese text
- chinese text corpus
- chinese word
- classification problem
- classification task
- co-occurrence
- context feature
- context features
- context information
- context window
- context words
- contextual information
- data set
- distribution
- estimation
- evaluations
- experimental results
- fact
- feature
- feature set
- joint probability
- labeled training data
- language model
- lexicon
- likelihood
- measures
- method
- mutual information
- natural language
- open test
- oracle
- precision
- probabilities
- probability
- procedure
- process
- rule set
- search space
- sentence
- sentences
- statistical information
- statistical language model
- statistics
- substring
- support vector
- test set
- text
- text corpus
- tokens
- toolkit
- training
- training corpus
- training data
- training data set
- training set
- trigram
- trigram language model
- unigram
- unigram language model
- unigram probability
- window size
- word
- word boundaries
- word co-occurrence
- word sequence
- word sequences
- word trigram
- words