ACL RD-TEC 1.0 Summarization of W04-3234
Paper Title:
TRAINED NAMED ENTITY RECOGNITION USING DISTRIBUTIONAL CLUSTERS
Primarily assigned technology terms:
- active learning
- adaboost
- agglomerative clustering
- algorithm
- approximate maximization
- boosting
- bootstrapping
- bootstrapping algorithm
- bootstrapping approach
- bootstrapping technique
- boundary detection
- classification
- classifier
- clustering
- coclustering
- computing
- corpus analysis
- corpus annotation
- crfs
- distributional analysis
- encoding
- entity recognition
- evaluation framework
- extractor
- hierarchical clustering
- induction
- inference process
- information extraction
- learner
- learning
- learning algorithm
- learning framework
- machine learning
- machine learning algorithm
- maximum entropy
- maximum entropy model
- modeling
- named entity recognition
- noise reduction
- post-processing
- quantitative evaluation
- recognition
- search
- simulated annealing
- statistical approaches
- string comparison
- supervised learner
- supervised learning
- syntactic analysis
- tagging
- tokenizer
- validation
- weak learner
- weighting
- wrapper induction
Other assigned terms:
- american news corpus
- annotation
- approach
- case
- cluster
- clusters
- co-occurrence
- co-occurrence count
- co-occurrences
- confidence scores
- data set
- data sets
- discourse
- distribution
- document
- domain knowledge
- entity type
- entity types
- entropy
- events
- feature
- feature set
- gazetteer
- histogram
- hypotheses
- hypothesis
- invocation
- knowledge
- labeling
- learning problem
- likelihood
- linguistic
- linguistic resources
- mark-up
- meaning
- mechanisms
- method
- mutual information
- named entity
- names
- ner problem
- news corpus
- noise
- opinions
- phrase
- portability
- precision
- probability
- procedure
- process
- seed
- seed words
- semantic
- semantic categories
- sentence
- set size
- statistical significance
- symbol
- symbols
- tags
- technique
- temporal expression
- term
- terms
- test set
- text
- text corpus
- tokens
- training
- training data
- training set
- training set size
- training time
- unlabeled text
- verb
- vocabulary
- wildcard
- words
- wrapper