ACL RD-TEC 1.0 Summarization of C00-2116
Paper Title:
AUTOMATIC CORPUS-BASED THAI WORD EXTRACTION WITH THE C4.5 LEARNING ALGORITHM
Authors: VIRACH SORNLERTLAMVANICH and TANAPONG POTIPITI and THATSANEE CHAROENPORN
Primarily assigned technology terms:
- algorithm
- automatic word-extraction algorithm
- compound extraction
- conversational system
- decision tree
- decision tree induction
- disambiguation
- disambiguation process
- extraction procedure
- induction
- induction algorithm
- information retrieval
- knowledge bases
- language processing
- learning
- learning algorithm
- learning procedure
- machine translation
- measuring
- nlp
- parsing
- probability function
- processing
- pruning
- recognition
- segmentation
- sentence boundary disambiguation
- spelling
- thai word extraction
- tile
- tree induction
- word extraction
- word extraction procedure
- word segmentation
- word-extraction
- word-extraction algorithm
- word/non-word disambiguation
Other assigned terms:
- alphabet
- approach
- characters
- compounds
- corpora
- dictionaries
- dictionary
- entropy
- error rate
- experimental results
- explicit word boundary
- extraction problem
- extraction process
- heuristic
- human judgement
- information gain
- japanese language
- knowledge
- language processing tasks
- large corpora
- leaf
- lexical entries
- lexical knowledge
- linear time
- measure
- measures
- method
- mutual information
- n-gram
- precision
- probability
- procedure
- process
- processing tasks
- sentence
- sentence boundary
- statistics
- substring
- subtree
- tags
- test corpus
- test set
- thai language
- thai word
- training
- training data
- training example
- training set
- tree
- word
- word boundary
- word frequency
- word strings
- words