ACL RD-TEC 1.0 Summarization of J96-3004
Paper Title:
A STOCHASTIC FINITE-STATE WORD-SEGMENTATION ALGORITHM FOR CHINESE
Authors: Richard Sproat, William Gale, Chilin Shih, and Nancy Chang
Primarily assigned technology terms:
- algorithm
- approximation
- automatic segmentation
- character coding
- chinese segmentation
- chinese word segmentation
- classification
- classifier
- classifiers
- coding
- computational linguistics
- computing
- crawling
- database
- decomposition
- electronic dictionary
- encoding
- final state
- finite state
- finite state transducer
- finite-state acceptor
- finite-state framework
- finite-state transducer
- finite-state transducers
- good-turing method
- greedy algorithm
- grouping
- identification
- information retrieval
- likelihood estimate
- listing
- machine translation
- mandarin tts
- matching
- matching algorithm
- maximum likelihood
- maximum matching
- modeling
- morphological analysis
- morphological decomposition
- multidimensional scaling
- name identification
- name recognition
- natural language system
- nlp
- parsers
- part-of-speech assignment
- partial evaluation
- re-estimation
- re-estimation procedure
- reading
- recognition
- recognition system
- romanization
- rule-based system
- scoring
- segmentation
- segmentation algorithm
- segmenter
- speech recognition
- speech synthesis
- splitting
- statistical approaches
- statistical method
- statistical methods
- syntactic analysis
- synthesis
- tagging
- text analysis
- text-to-speech
- text-to-speech synthesis
- tokenization
- transducer
- transducers
- transduction
- translation system
- transliteration
- tts system
- tuning
- unification-based approach
- viterbi
- viterbi algorithm
- weighted finite-state transducer
- weighted finite-state transducers
- word segmentation
- word segmenter
- word-segmentation
Other assigned terms:
- abbreviation
- abbreviations
- acronym
- adjective
- adverb
- affix
- affixation
- affixes
- ambiguity
- approach
- arithmetic mean
- backoff
- backoff model
- bias
- bigram
- case
- chinese sentence
- chinese text
- chinese word
- class-based model
- cluster
- constraint satisfaction
- contextual information
- corpora
- correlation
- data consortium
- dictionaries
- dictionary
- dictionary entries
- dictionary entry
- discourse
- discourse context
- distance matrix
- empty string
- english sentence
- english text
- essay
- evaluation method
- evaluations
- fact
- foreign words
- foreign-name
- frame
- genre
- grammar
- grammars
- grammatical features
- grammatical information
- heuristics
- human judgments
- human performance
- hypotheses
- implementation
- independence model
- interpretation
- intonational phrase
- knowledge
- language model
- language models
- lattice
- lexical information
- lexical relations
- lexical rules
- lexicon
- likelihood
- linguistic
- linguistic constraints
- linguistic data
- linguistic data consortium
- linguistic information
- linguistic knowledge
- linguistic work
- linguistics
- machine-readable dictionary
- main verb
- mandarin chinese
- mapping
- mappings
- maps
- maximum likelihood estimate
- meaning
- meanings
- measure
- measures
- method
- modal verb
- morpheme
- morphemes
- morphological information
- morphological rules
- mutual information
- names
- natural language
- nlp application
- nlp task
- nouns
- orthography
- paragraphs
- part of speech
- part-of-speech
- part-of-speech information
- pause
- personal names
- phrase
- pinyin
- plural noun
- precision
- precision measure
- probabilities
- probability
- probability estimate
- procedure
- process
- pronunciation
- proper noun
- punctuation
- question formation
- relation
- relaxation technique
- segmentation problem
- segments
- semantic
- semantic class
- semantic classes
- semantic features
- semantic interpretation
- sentence
- sentences
- similarity matrix
- similarity measures
- singular noun
- source language
- statistical information
- stem
- stems
- style
- suffix
- suffixes
- syllables
- symbols
- technique
- term
- terms
- test corpora
- test corpus
- test set
- text
- text database
- theorem
- tokens
- tone
- toolkit
- topics
- training
- training corpus
- transcriptions
- transitive closure
- trees
- unigram
- unigram model
- verb
- vocabulary
- word
- word boundaries
- word classes
- word corpus
- word frequencies
- word frequency
- word level
- word meaning
- words
- writing system