ACL RD-TEC 1.0 Summarization of C04-1081
Paper Title:
CHINESE SEGMENTATION AND NEW WORD DETECTION USING CONDITIONAL RANDOM FIELDS
Authors: Fuchun Peng and Fangfang Feng and Andrew McCallum
Primarily assigned technology terms:
- algorithm
- chinese information processing
- chinese segmentation
- chinese word segmentation
- classifier
- classifiers
- conditional random fields
- crfs
- cross-validation
- detection method
- encoding
- entity extractor
- entropy classifier
- entropy learning
- error analysis
- exact inference
- extractor
- feature selection
- finite state
- finite state machine
- finitestate
- forward-backward algorithm
- hidden markov
- hidden markov model
- hidden markov models
- identification
- information processing
- information retrieval
- intelligent information retrieval
- internet
- language modeling
- language processing
- learning
- learning algorithms
- learning approaches
- learning methods
- learning procedure
- machine learning
- machine learning approaches
- machine learning methods
- markov model
- maximum entropy
- maximum entropy classifier
- maximum entropy classifiers
- maximum entropy model
- maximum likelihood
- modeling
- new word detection
- normalization
- optimization
- probabilistic new word detection
- processing
- regularization
- robust chinese word segmentation
- scoring
- scoring program
- search
- segmentation
- segmentation system
- sequence labeling
- sequence modeling
- sequence tagging
- statistical inference
- supervised learning
- tagger
- tagging
- unsupervised learning
- validation
- viterbi
- viterbi algorithm
- word detection
- word identification
- word segmentation
- word segmentation system
Other assigned terms:
- approach
- benchmark
- case
- character sequence
- characters
- chinese characters
- chinese sentence
- chinese word
- chinese words
- conditional probability
- ctb dataset
- data sets
- data sparseness
- dictionary
- distribution
- domain knowledge
- entropy
- entropy models
- fact
- feature
- gaussian prior
- generative models
- graph structure
- heuristic
- implementation
- input text
- intelligence
- knowledge
- labeling
- language models
- language processing tasks
- lexical knowledge
- lexicon
- likelihood
- likelihood function
- log-likelihood
- log-linear models
- markov models
- maximum entropy models
- measure
- method
- model structure
- modeling problem
- n-best list
- n-gram
- named entity
- names
- open test
- part-of-speech
- precision
- prior distribution
- probability
- procedure
- process
- processing tasks
- proper names
- segmentation accuracy
- segmentation problem
- segments
- sentence
- sentences
- sequence modeling problem
- tags
- terms
- test data
- test set
- text
- training
- training data
- training set
- training time
- vocabulary
- word
- word boundaries
- word category
- word category information
- word segmentation accuracy
- words