ACL RD-TEC 1.0 Summarization of C04-1066
Paper Title:
JAPANESE UNKNOWN WORD IDENTIFICATION BY CHARACTER-BASED CHUNKING
Authors: Masayuki Asahara and Yuji Matsumoto
Primarily assigned technology terms:
- algorithm
- analyzer
- analyzer development
- character-based chunking
- character-based tagging
- chunker
- chunking
- classifier
- classifiers
- computing
- corpus annotation
- cross validation
- identification
- japanese morphological analysis
- kernel
- likelihood estimation
- markov model
- markov model estimation
- maximum entropy
- maximum likelihood
- maximum likelihood estimation
- measuring
- model estimation
- morphological analysis
- morphological analyzer
- n-best word segmentation
- polynomial kernel
- pos estimation
- position tagging
- processing
- recognition
- search
- search engine
- segmentation
- statistical method
- statistical methods
- statistical morphological analyzer development
- support vector machine
- support vector machines
- svm-based chunking
- tagging
- tagging method
- unknown word identification
- unknown word processing
- validation
- viterbi
- viterbi algorithm
- web search
- web search engine
- word identification
- word processing
- word segmentation
Other assigned terms:
- alphabet
- annotation
- auxiliary verbs
- baseline model
- case
- case particle
- character type
- characters
- chinese language
- chunking model
- chunking procedure
- compound words
- compounds
- contextual feature
- contextual information
- corpora
- corpus size
- csj corpus
- data set
- data sets
- dependency structure
- dictionary
- distribution
- edr corpus
- entropy
- estimation
- events
- experimental setting
- f-measure
- feature
- feature vector
- gold standard
- input string
- ipadic
- japanese corpus
- japanese language
- japanese text
- kanji
- katakana
- kernel function
- keyword
- kyoto university corpus
- kyoto university text corpus
- lattice
- lexicon
- likelihood
- maps
- method
- n-gram
- n-gram model
- names
- organization names
- part-of-speech
- particle
- patent
- person names
- pos information
- pos tag
- pos tag information
- precision
- probabilities
- probability
- procedure
- proper noun
- rwcp text corpus
- segmentation accuracy
- sentence
- sentences
- stem
- suffixes
- support vector
- svms
- tag information
- tagged corpora
- tags
- tagset
- test corpus
- test data
- text
- text corpus
- tokens
- training
- training and test data
- training corpus
- training data
- verb
- word
- word boundaries
- word definition
- word form
- word segmentation accuracy
- word sequences
- word tag
- word types
- words