ACL RD-TEC 1.0 Summarization of W06-3325
Paper Title:
THE DIFFICULTIES OF TAXONOMIC NAME EXTRACTION AND A SOLUTION
Authors: Guido Sautter and Klemens Böhm
Primarily assigned technology terms:
- active learning
- algorithm
- automated extraction
- automatic extraction
- bionlp
- bootstrapping
- bootstrapping algorithm
- c++
- classification
- classification process
- classifier
- combined classifier
- computational linguistics
- decomposition
- entity recognition
- entity recognizer
- extractor
- hidden markov
- hidden markov models
- incremental learning
- java
- language processing
- language recognition
- language recognizer
- learning
- learning techniques
- linking
- matching
- modern biology
- name extraction
- name recognition
- named entity recognition
- named entity recognizer
- named-entity recognition
- natural language processing
- nlp
- part-of-speech tagger
- pos-tagging
- processing
- protein name extraction
- protein name recognition
- recognition
- recognizer
- regular expression
- regular expression matching
- statistical methods
- structure recognition
- tagger
- taxonomic name extraction
- word-level classifier
Other assigned terms:
- abbreviations
- approach
- association for computational linguistics
- biology
- case
- classification quality
- composition
- corpora
- dictionaries
- dictionary
- distribution
- document
- domain-specific vocabulary
- english dictionary
- english text
- experimental results
- f-measure
- formal representation
- formal structure
- grammars
- heuristics
- knowledge
- labeled training data
- large labeled training corpora
- lexica
- lexicon
- linguistics
- markov models
- meaning
- measure
- measures
- mechanisms
- n-gram
- n-grams
- name lexicon
- named entities
- named entity
- named-entity
- names
- natural language
- part-of-speech
- phrase
- precision
- probability
- process
- proper names
- protein names
- regular expressions
- runtime
- sentence
- statistics
- svms
- syntax
- taxonomic name
- technique
- technology
- term
- text
- text documents
- thesaurus
- training
- training corpora
- training data
- training phase
- tree
- unlabeled corpus
- user
- user interaction
- vocabulary
- word
- word sequence
- word sequences
- words