ACL RD-TEC 1.0 Summarization of W97-0120
Paper Title:
A SELF-ORGANIZING JAPANESE WORD SEGMENTER USING HEURISTIC WORD IDENTIFICATION AND RE-ESTIMATION
Primarily assigned technology terms:
- algorithm
- approximation
- bracketing
- chinese word segmentation
- classification
- classifier
- estimation method
- estimation procedure
- frequency estimation
- frequency method
- greedy algorithm
- heuristic initial word identification
- heuristic word identification
- identification
- identification method
- japanese word segmentation
- learning
- lexical acquisition
- morphological process
- nlp
- re-estimation
- re-estimation procedure
- segmentation
- segmentation algorithm
- segmenter
- speech tagger
- spelling
- spelling correction
- statistical method
- tagger
- text retrieval
- unsupervised learning
- unsupervised word segmentation
- viterbi
- viterbi algorithm
- word bigram
- word frequency estimation
- word identification
- word segmentation
- word segmentation task
- word segmenter
Other assigned terms:
- alphabet
- ambiguity
- approach
- array
- bigram
- break
- character sequence
- character type
- characters
- chinese characters
- chinese word
- community
- corpora
- data structure
- dictionaries
- dictionary
- distribution
- estimation
- evaluation measures
- f-measure
- fact
- foreign words
- function words
- grammatical function
- heuristic
- heuristic rule
- heuristics
- human intervention
- hypotheses
- hypothesis
- input string
- japanese corpus
- japanese sentences
- japanese text
- japanese word
- joint probability
- kanji
- katakana
- language data
- language model
- large training
- lexical rules
- lexicon
- linguistics
- manual segmentation
- meaning
- measures
- method
- n-gram
- names
- nlp application
- out-of-vocabulary rate
- part of speech
- partial parses
- particles
- personal names
- phrase
- plural noun
- poisson distribution
- precision
- probabilities
- probability
- procedure
- process
- pronunciation
- punctuation
- relation
- roman alphabet
- seed
- segmentation accuracy
- segmented corpus
- semantic
- sentence
- sentences
- speech tag
- statistical language model
- substring
- suffix
- term
- terms
- text
- tokens
- training
- training corpus
- training set
- training text
- unigram
- unigram model
- unknown word model
- word
- word boundaries
- word boundary
- word formation
- word frequencies
- word frequency
- word lists
- word model
- word segmentation accuracy
- word sequence
- word types
- word-based language model
- word-based statistical language model
- words
- writing system