ACL RD-TEC 1.0 Summarization of P06-1046
Paper Title:
SCALING DISTRIBUTIONAL SIMILARITY TO LARGE CORPORA
Authors: James Gorman and James R. Curran
Primarily assigned technology terms:
- algorithm
- approximation
- attribute indexing
- backtracking
- chunker
- cluster analysis
- clustering
- comparison function
- computational linguistics
- context extraction
- dimensionality reduction
- dimensionality reduction technique
- extractor
- feature selection
- incremental sampling
- indexing
- k-nn
- latent semantic analysis
- matching
- maximum entropy
- measuring
- nearest neighbors
- pos tagger
- random indexing
- ranking
- ri algorithm
- sampling
- search
- searching
- semantic analysis
- smoothing
- tagger
- thesaurus extraction
- tuning
- vector comparison
- weighting
Other assigned terms:
- accuracy/efficiency trade-off
- approach
- association for computational linguistics
- bilingual lexicons
- binary tree
- british national corpus
- case
- cluster
- computational complexity
- concreteness
- context information
- context vector
- context vectors
- corpora
- corpus size
- cosine distance
- cosine measure
- data set
- data structure
- data structures
- dimensionality
- distance measure
- distributional similarity
- entropy
- evaluation measures
- feature
- foreign language
- frequency cut-off
- gold standard
- grammatical relation
- hamming distance
- heuristic
- heuristics
- index
- information gain
- knowledge
- large corpora
- latent semantic
- linguistics
- measure
- measures
- method
- mutual information
- nouns
- parallel corpora
- permutation
- probability
- process
- relation
- reordering
- reuters corpus
- root node
- scalability
- search time
- semantic
- semantic similarity
- sentence
- similarity function
- similarity measure
- similarity scores
- statistics
- sub-tree
- subtree
- synonym
- synonyms
- synonymy
- technique
- term
- terms
- text
- thesaurus
- time complexity
- tree
- verb
- vocabulary
- vocabulary size
- word
- wordnet
- words