ACL RD-TEC 1.0 Summarization of W06-2206
Paper Title:
SPOTTING THE 'ODD-ONE-OUT': DATA-DRIVEN ERROR DETECTION AND CORRECTION IN TEXTUAL DATABASES
Authors: Caroline Sporleder and Marieke van Erp and Tijn Porcelijn and Antal van den Bosch
Primarily assigned technology terms:
- algorithm
- automatic error detection
- classification
- classifier
- classifiers
- clustering
- clustering algorithm
- database
- databases
- detection method
- encoding
- error checking
- error correction
- error detection
- error detection and correction
- feature selection
- horizontal error detection
- identification
- information extraction
- information retrieval
- k-nearest neighbor
- language identification
- learner
- learning
- learning methods
- machine learning
- machine learning methods
- memory-based learner
- nearest neighbors
- parameter setting
- pre-processing
- semi-automatic error correction
- supervised machine learning
- taggers
- text classification
- tokenisation
- weighting
Other assigned terms:
- abbreviations
- annotation
- annotator
- approach
- background knowledge
- bibliographical information
- case
- classification problem
- classification task
- clusters
- community
- content words
- data set
- data sets
- database record
- development set
- document
- document frequency
- dutch
- fact
- feature
- feature set
- feature vectors
- french
- function words
- human annotator
- information gain
- inverse document frequency
- knowledge
- likelihood
- manual annotation
- method
- names
- noun phrases
- parameter settings
- person names
- precision
- prediction accuracy
- prepositions
- probability
- probability distributions
- process
- proper names
- proper noun
- punctuation
- query
- rule set
- similarity metric
- stem
- synonyms
- taxonomy
- term
- term frequency
- terms
- test set
- text
- text classification task
- textual information
- theory
- tokens
- training
- training data
- training set
- tree
- uniform probability
- user
- word
- word lists
- words