ACL RD-TEC 1.0 Summarization of W03-1108
Paper Title:
LEARNING BILINGUAL TRANSLATIONS FROM COMPARABLE CORPORA TO CROSS-LANGUAGE INFORMATION RETRIEVAL: HYBRID STATISTICS-BASED AND LINGUISTICS-BASED APPROACH
Authors: Fatiha Sadat, Masatoshi Yoshikawa, and Shunsuke Uemura
Primarily assigned technology terms:
- analyzer
- bilingual lexicon extraction
- bilingual terminology
- bilingual terminology acquisition
- bilingual terminology extraction
- bootstrapping
- bootstrapping approach
- computational linguistics
- cross-language information retrieval
- data collection
- disambiguation
- indexing
- information retrieval
- information retrieval system
- knowledge acquisition
- language processing
- learning
- lexicon extraction
- linguistics-based pruning
- linguistics-based technique
- machine translation
- morphological analysis
- morphological analyzer
- morphological analyzers
- natural language processing
- nlp
- normalization
- phrasal translation
- preprocessing
- processing
- pruning
- query expansion
- re-scoring
- retrieval system
- retrieving
- romanization
- spelling
- statistical methods
- statistical t-test
- terminology
- terminology acquisition
- terminology extraction
- transliteration
- vector normalization
- vector space model
- weighting
- word translation
- world wide web
Other assigned terms:
- alphabet
- approach
- bilingual dictionaries
- bilingual dictionary
- bilingual lexicon
- bilingual lexicons
- case
- characters
- collocation
- comparable corpora
- compounds
- concept
- concepts
- content words
- context vectors
- corpora
- culture
- dictionaries
- dictionary
- document
- document frequency
- english translation
- english translations
- english verbs
- evaluations
- feature
- foreign words
- inverse document frequency
- inverted document frequency
- katakana
- knowledge
- language corpus
- language pair
- language pairs
- large corpora
- large text corpora
- lexical resources
- lexicon
- linguistic
- linguistic knowledge
- linguistic resources
- linguistics
- log-likelihood
- log-likelihood ratio
- measure
- method
- morphological knowledge
- names
- natural language
- nouns
- parallel corpora
- part-of-speech
- phonetic alphabet
- precision
- procedure
- pronunciation
- queries
- query
- query vector
- retrieval performance
- seed
- source language
- tags
- target language
- target language corpus
- target languages
- technical terms
- technique
- technology
- term
- term frequency
- terms
- test collection
- text
- text corpora
- topics
- translation candidates
- translation model
- translation models
- translations
- two-stages model
- two-stages translation model
- vector space
- vocabulary
- web pages
- weighting scheme
- word
- word frequencies
- words