ACL RD-TEC 1.0 Summarization of W96-0205
Paper Title:
AUTOMATIC EXTRACTION OF NEW WORDS FROM JAPANESE TEXTS USING GENERALIZED FORWARD-BACKWARD SEARCH
Primarily assigned technology terms:
- algorithm
- approximation
- automatic extraction
- back-off smoothing
- capitalization
- classifier
- computational linguistics
- computing
- corpus segmentation
- dictionary construction
- dictionary construction method
- dynamic programming
- dynamic programming search
- electronic dictionary
- extraction method
- finite-state transducer
- forward dynamic programming search
- forward-backward algorithm
- forward-backward search
- frequency counting
- hyphenation
- identification
- illustration
- information retrieval
- japanese morphological analysis
- japanese word segmentation
- matching
- maximum likelihood
- morphological analysis
- morphology
- n-best word segmentation
- nlp
- probabilistic word segmentation
- pruning
- reestimation
- search
- segmentation
- segmentation algorithm
- segmentation method
- segmenter
- smoothing
- spelling
- tagger
- taggers
- tagging
- tile
- transducer
- transliteration
- tree-trellis search
- truncation
- unknown word extraction
- viterbi
- viterbi algorithm
- viterbi reestimation
- viterbi search
- weighted finite-state transducer
- word bigram
- word extraction
- word extraction method
- word segmentation
- word segmentation task
- word segmenter
Other assigned terms:
- ambiguity
- annotation
- approach
- bigram
- bigram model
- boundary marker
- case
- character bigram model
- character sequence
- character type
- characters
- corpus size
- derivation
- dictionaries
- dictionary
- distribution
- edr corpus
- evaluation measures
- evaluation method
- events
- f-measure
- fact
- frequency distribution
- hypotheses
- hypothesis
- interpolation
- japanese corpus
- japanese sentences
- japanese text
- japanese word
- joint probability
- language data
- language model
- large corpus
- lexicon
- likelihood
- linguistics
- maps
- meaning
- measure
- measures
- method
- model probability
- morpheme
- n-gram
- names
- nlp applications
- noun phrase
- orthography
- out-of-vocabulary rate
- part of speech
- part-of-speech
- part-of-speech trigram
- part-of-speech trigram model
- particle
- parts of speech
- phrase
- poisson distribution
- precision
- probabilities
- probability
- procedure
- pronunciation
- pruning threshold
- relative frequency
- seed
- segmentation accuracy
- segmentation ambiguity
- segmented corpus
- semantic
- sentence
- sentences
- sparse data
- sparse data problem
- spelling model
- statistical language model
- statistical model
- statistics
- substring
- suffix
- suffixes
- symbol
- syntax
- tag sequence
- tagging model
- tags
- target language
- terms
- test set
- text
- theory
- tokens
- training
- training corpus
- training data
- training set
- transition probability
- trigram
- trigram model
- unigram
- unigram model
- unigram probability
- untagged corpus
- verb
- word
- word bigram model
- word boundary
- word frequencies
- word frequency
- word model
- word segmentation accuracy
- word segmentation ambiguity
- word sequence
- word tag
- word trigram
- word types
- words
- writing system