ACL RD-TEC 1.0 Summarization of W06-0129
Paper Title:
CHARACTER LANGUAGE MODELS FOR CHINESE WORD SEGMENTATION AND NAMED ENTITY RECOGNITION
Primarily assigned technology terms:
- bio coding
- capitalization
- chinese language processing
- chinese word segmentation
- chunker
- chunking
- coding
- computational linguistics
- confidence estimation
- decoder
- decoding
- encoding
- entity recognition
- estimator
- hidden markov
- hidden markov model
- language modeling
- language processing
- language processing toolkit
- likelihood estimate
- likelihood estimator
- linear interpolation
- markov model
- matching
- maximum likelihood
- maximum likelihood estimator
- modeling
- n-best chunker
- named entity recognition
- named-entity extraction
- natural language processing
- noisy channel spelling
- noisy-channel model
- parameter tuning
- processing
- ranking
- recognition
- rescoring
- segmentation
- segmentation system
- smoothing
- spelling
- spelling correction
- tagging
- tuning
- viterbi
- witten-bell smoothing
- word segmentation
- word segmentation system
- word segmentation task
Other assigned terms:
- alphabet
- approach
- association for computational linguistics
- baseline performance
- case
- characters
- chinese language
- chinese word
- chunk
- corpora
- distribution
- edit distance
- entity corpora
- estimation
- f-measure
- fact
- generative model
- hmm model
- hypotheses
- hypothesis
- implementation
- inflection
- interpolation
- joint probability
- joint probability distribution
- language model
- language models
- likelihood
- linguistics
- maximum likelihood estimate
- message
- n-gram
- n-grams
- named entities
- named entity
- named entity corpora
- named-entity
- named-entity task
- natural language
- ne corpus
- noisy channel
- out-of-vocabulary rate
- phrase
- precision
- prefixes and suffixes
- probabilities
- probability
- probability distribution
- process
- proper noun
- segmentation corpora
- sentence
- sentence boundaries
- sentences
- signal
- suffixes
- symbol
- symbols
- tagging problem
- tags
- test corpus
- test data
- text
- tokens
- toolkit
- training
- training corpora
- training data
- transposition
- weighted edit distance
- word
- words