ACL RD-TEC 1.0 Summarization of P04-1024
Paper Title:
FINDING IDEOGRAPHIC REPRESENTATIONS OF JAPANESE NAMES WRITTEN IN LATIN SCRIPT VIA LANGUAGE IDENTIFICATION AND CORPUS VALIDATION
Authors: Yan Qu and Gregory Grefenstette
Primarily assigned technology terms:
- a statistical part-of-speech
- character conversion
- corpus validation
- corpus-based validation
- cross-language retrieval
- cutoff
- database
- english-to-chinese transliteration
- error analysis
- finite-state transducers
- fuzzy matching
- generation method
- google search engine
- greedy segmentation
- identification
- japanese retrieval
- japanese term identification
- katakana transliteration
- language identification
- machine translation
- matching
- name translation
- part-of-speech tagger
- reading
- search
- search engine
- segmentation
- segmentation method
- statistical part-of-speech tagger
- tagger
- terminology
- thresholding
- transcription
- transducers
- translation method
- transliteration
- trigram-based language identifier
- validation
- weighted finite-state transducers
Other assigned terms:
- alphabet
- approach
- back-transliteration
- bigram
- bigram model
- bilingual lexicon
- bilingual lexicons
- characters
- chinese characters
- dictionary
- distribution
- document
- document collection
- english-japanese dictionary
- evaluation measures
- f measure
- f-measure
- feature
- generation
- gold standard
- hypothesis
- hypothesis space
- japanese corpus
- kanji
- katakana
- language model
- language pairs
- latin alphabet
- lexicon
- mapping
- mapping table
- mappings
- meaning
- measure
- measures
- method
- monolingual corpus
- name lexicon
- names
- part-of-speech
- person names
- phoneme
- phonemes
- pinyin
- precision
- procedure
- process
- pronunciation
- pronunciation dictionary
- proper names
- representations
- segments
- specialized terminology
- statistics
- syllables
- technique
- term
- term list
- terms
- test set
- text
- theory
- training
- training data
- translation pairs
- translation problem
- translations
- trigram
- unihan database
- web pages
- word
- words
- writing system