ACL RD-TEC 1.0 Summarization of P06-1144
Paper Title:
MULTILINGUAL DOCUMENT CLUSTERING: AN HEURISTIC APPROACH BASED ON COGNATE NAMED ENTITIES
Authors: Soto Montalvo and Raquel Martínez and Arantza Casillas and Víctor Fresno
Primarily assigned technology terms:
- algorithm
- classification
- classification system
- clustering
- clustering algorithm
- cognate identification
- computational linguistics
- cross-lingual information retrieval
- dictionary lookup
- disambiguation
- document clustering
- document representation
- extractor
- feature selection
- feature translation
- grouping
- heuristic method
- identification
- information retrieval
- knowledge representation
- learning
- learning method
- learning techniques
- levenshtein
- linguistic analysis
- machine learning
- machine learning techniques
- machine translation
- machine translation system
- machine translation systems
- mapping process
- monolingual clustering
- multilingual document clustering
- multilingual news summarizer
- ne identification
- neural network
- partitioning
- recognition
- retrieving
- summarizer
- text representation
- translation process
- translation system
- translation systems
- translation technology
- word-sense disambiguation
Other assigned terms:
- adjective
- anchor
- approach
- association for computational linguistics
- bilingual dictionaries
- bilingual dictionary
- case
- cluster
- cluster similarity
- clusters
- comparable corpora
- comparable corpus
- corpora
- customization
- dictionaries
- dictionary
- document
- document frequency
- entity types
- evaluation measure
- evaluation metric
- events
- f-measure
- fact
- feature
- grammatical categories
- grammatical category
- heuristic
- knowledge
- levenshtein edit-distance
- levenshtein edit-distance function
- linear combination
- linguistic
- linguistic resources
- linguistics
- mapping
- maps
- measure
- measures
- method
- methodology
- monolingual corpus
- multilingual corpus
- multilingual document
- named entities
- named entity
- names
- news corpus
- noise
- nouns
- paragraph
- parallel corpora
- parallel corpus
- precision
- procedure
- process
- regular expressions
- representations
- russian
- semantic
- semantic similarity
- statistic
- statistics
- style
- technologies
- technology
- terms
- text
- thesaurus
- training
- user
- verb
- word
- words