ACL RD-TEC 1.0 Summarization of W06-2202
Paper Title:
SIMPLE INFORMATION EXTRACTION (SIE): A PORTABLE AND EFFECTIVE IE SYSTEM
Authors: Claudio Giuliano, Alberto Lavelli, and Lorenza Romano
Primarily assigned technology terms:
- 5-fold cross-validation
- algorithm
- capitalization
- categorization
- classification
- classifier
- classifiers
- cross-validation
- document indexing
- encoding
- entity recognition
- expression recognition
- extraction system
- feature extraction
- feature selection
- ie system
- indexing
- information extraction
- information extraction system
- java
- kernel
- language-independent named entity recognition
- learning
- learning algorithm
- learning algorithms
- machine learning
- machine learning algorithm
- machine learning algorithms
- matching
- matching algorithm
- named entity recognition
- normalization
- recognition
- semantic analysis
- semantic web
- supervised system
- term filtering
- text categorization
- tuning
- validation
Other assigned terms:
- approach
- classification problem
- classification tasks
- coefficient
- context window
- corpora
- correlation
- correlation coefficient
- data set
- data sets
- data structures
- distribution
- document
- document frequency
- domain-specific information
- domain-specific knowledge
- dutch
- english corpus
- english text
- entity types
- experimental results
- feature
- feature description
- feature set
- implementation
- index
- information content
- information extraction task
- knowledge
- learning module
- length distribution
- lexical resources
- lexicon
- mapping
- maps
- measure
- measures
- medline
- modular architecture
- named entities
- named entity
- names
- part of speech
- portability
- pos tag
- precision
- prediction accuracy
- probability
- probability distribution
- process
- recognition phase
- representations
- semantic
- sentence
- set size
- statistic
- stop word list
- system architecture
- tags
- technique
- temporal expressions
- term
- terms
- test set
- text
- tokens
- training
- training corpora
- training corpus
- training phase
- training set
- word
- words