ACL RD-TEC 1.0 Summarization of P06-2045
Paper Title:
A COLLABORATIVE FRAMEWORK FOR COLLECTING THAI UNKNOWN WORDS FROM THE WEB
Authors: Choochart Haruechaiyasak and Chatchawal Sangkeettrakarn and Pornpimon Palingoon and Sarawoot Kongyoung and Chaianun Damrongrat
Primarily assigned technology terms:
- algorithm
- analyzer
- boundary identification
- boundary identification approach
- chunking
- computational linguistics
- crawler
- database
- decision trees
- decision-tree
- dictionary-based word-segmentation
- disambiguation
- editing
- extraction system
- identification
- identification process
- information retrieval
- internet
- iterative method
- longest matching
- longest-matching word segmentation
- machine translation
- matching
- morphological analysis
- morphological analyzer
- nlp
- nlp systems
- parser
- pattern matching
- pattern-matching
- pattern-matching technique
- pos disambiguation
- post-processing
- segmentation
- segmentation algorithm
- segmentation process
- statistical analysis
- string matching
- svm-based chunking
- transliteration
- unknown word extraction
- unknown-word boundary identification
- unknown-word detection
- word extraction
- word extraction system
- word guessing
- word segmentation
- word-segmentation
Other assigned terms:
- approach
- association for computational linguistics
- case
- character sequence
- characters
- chinese text
- chinese text corpus
- contextual information
- detection rate
- detection task
- dictionary
- distribution
- edit distance
- fact
- foreign words
- frequency distribution
- generation
- heuristic
- identification accuracy
- identification task
- information agent
- knowledge
- lexicon
- linguistics
- linguists
- measure
- method
- morphological rules
- n-gram
- named entities
- names
- nlp tasks
- parse
- part-of-speech
- precision
- process
- rule set
- rule sets
- segments
- statistical model
- statistics
- substring
- tags
- technique
- text
- text corpus
- thai language
- trees
- unknown-word boundary
- user
- web pages
- web site
- word
- word boundaries
- words