ACL RD-TEC 1.0 Summarization of P04-1068
Paper Title:
CREATING MULTILINGUAL TRANSLATION LEXICONS WITH REGIONAL VARIATIONS USING WEB CORPORA
Authors: Pu-Jen Cheng, Wen-Hsiang Lu, Jei-Wen Teng, and Lee-Feng Chien
Primarily assigned technology terms:
- algorithm
- alignment method
- automatic construction
- automatic extraction
- browser
- character encoding
- chi-square test
- chinese-to-english translation
- co-occurrence analysis
- computer science
- corpus-based translation
- cross-language information retrieval
- direct translation
- dynamic programming
- encoding
- english-to-chinese translation
- english-to-japanese translation
- feature selection
- identification
- information management
- information retrieval
- internet
- internet technology
- language identification
- linking
- machine translation
- mining
- multilingual text alignment
- multilingual translation
- noun phrase recognition
- part-of-speech tagging
- phrase recognition
- ranking
- recognition
- search
- search engine
- search engines
- searching
- statistical techniques
- statistical translation
- tagging
- text alignment
- tf-idf weighting
- translation extraction
- translation method
- translation process
- transliteration
- web search
- web-based translation
- weighting
- word alignment
- word translation
Other assigned terms:
- anchor
- approach
- association measure
- bilingual corpora
- bilingual corpus
- bilingual lexicon
- case
- chinese words
- chinese-english lexicon
- co-occurrence
- coefficient
- cohesion
- collocation
- comparable corpora
- context vectors
- corpora
- correlations
- cosine measure
- data consortium
- data set
- data sets
- data sparseness
- dice
- dice coefficient
- dictionaries
- document
- domain-specific corpora
- encoding scheme
- entropy
- estimation
- experimental results
- fact
- feature
- feature set
- feature vectors
- generation
- geographic information
- identification accuracy
- information science
- knowledge
- language model
- language models
- language pairs
- lexemes
- lexicon
- linguistic
- linguistic data
- linguistic data consortium
- local maxima
- log-likelihood
- log-likelihood ratio
- many-to-many mapping
- mapping
- mappings
- measure
- measures
- method
- multilingual text
- mutual information
- n-gram
- names
- ngram
- non-parallel corpora
- noun phrase
- nouns
- ordered list
- parallel corpora
- parallel texts
- part-of-speech
- performance evaluation
- personal names
- phrase
- probability
- process
- proper names
- queries
- query
- query term
- seed
- seed words
- sentence
- similarity measure
- simplified chinese
- statistical information
- target language
- technical terms
- technology
- term
- terms
- text
- training
- training data
- translation accuracy
- translation candidate
- translation candidates
- translation equivalents
- translation model
- translation models
- translation problem
- translations
- unigram
- unigram model
- web documents
- web page
- web pages
- weighting scheme
- word
- words