ACL RD-TEC 1.0 Summarization of I05-3017
Paper Title:
THE SECOND INTERNATIONAL CHINESE WORD SEGMENTATION BAKEOFF
Primarily assigned technology terms:
- algorithm
- bracketing
- categorization
- character encoding
- chinese word segmentation
- data analysis
- encoding
- hardware
- left-to-right maximal matching algorithm
- matching
- matching algorithm
- maximal matching
- oov handling
- perl script
- recognition
- scoring
- scoring script
- search
- search engine
- segmentation
- sentence interpretation
- text segmentation
- unknown word recognition
- word recognition
- word segmentation
- word segmentation bakeoff
Other assigned terms:
- binomial distribution
- case
- characters
- chinese characters
- chinese corpora
- chinese translation
- chinese treebank
- chinese word
- corpora
- data sets
- distribution
- document
- evaluations
- f score
- fact
- generation
- genre
- heuristic
- human intervention
- interpretation
- knowledge
- lexica
- measures
- open test
- part-of-speech
- part-of-speech information
- penn chinese treebank
- precision
- probability
- process
- punctuation
- queries
- runtime
- segmentation accuracy
- segmentation bakeoff
- sentence
- simplified chinese
- sinica corpus
- stress
- system description
- system evaluation
- technology
- terms
- test corpora
- test corpus
- test set
- testing data
- text
- theorem
- tokens
- training
- training and testing data
- training corpus
- training data
- training document
- treebank
- vocabulary
- web site
- word
- word lists
- words