ACL RD-TEC 1.0 Summarization of W03-0314
Paper Title:
LEARNING SEQUENCE-TO-SEQUENCE CORRESPONDENCES FROM PARALLEL CORPORA VIA SEQUENTIAL PATTERN MINING
Authors: Kaoru Yamamoto and Taku Kudo and Yuta Tsuboi and Yuji Matsumoto
Primarily assigned technology terms:
- algorithm
- bilingual lexicon extraction
- c++
- candidate generation
- chunking
- cross-validation
- cutoff
- database
- databases
- depth-first search
- em training
- extraction method
- greedy algorithm
- hypothesizing
- identification
- iterative method
- japanese morphological analysis
- japanese word segmentation
- learning
- lexicon acquisition
- lexicon extraction
- machine translation
- mining
- morphological analysis
- multi-word translation
- nlp
- one-to-one mapping
- parsers
- part-of-speech tagging
- pattern mining
- pos tagging
- preprocessing
- probabilistic translation
- processor
- search
- searching
- segmentation
- sequence alignment
- sequential pattern mining
- single-word translation
- statistical machine translation
- tagging
- translation lexicon acquisition
- translation memory
- tuning
- unsupervised extraction
- unsupervised learning
- viterbi
- word alignment
- word segmentation
Other assigned terms:
- ambiguity
- annotation
- approach
- array
- association score
- bilingual corpora
- bilingual dictionary
- bilingual lexicon
- case
- chunk
- collocation
- comparable corpora
- compounds
- concepts
- content words
- contingency table
- corpora
- data structure
- dictionary
- empirical results
- english-japanese dictionary
- evaluation metrics
- events
- experimental results
- extraction process
- french
- french translation
- functional word
- generation
- genre
- heuristics
- implementation
- japanese word
- joint probability
- joint probability model
- language pair
- lexicon
- linguistic
- linguistic constraints
- log-likelihood
- log-likelihood ratio
- many-to-many mapping
- mapping
- method
- n-gram
- named entities
- noise
- noun phrase
- paragraph
- parallel corpora
- parallel sentence
- part-of-speech
- phrase
- possible translation
- precision
- probabilistic model
- probability
- probability model
- process
- projection
- punctuation
- search space
- segmentation ambiguity
- sentence
- sentence pair
- sentence similarity
- sentences
- sequence database
- similarity score
- statistics
- symbols
- terms
- text
- tokens
- training
- translation candidate
- translation candidates
- translation lexicon
- translation pair
- translation pairs
- translations
- word
- word segmentation ambiguity
- word sequences
- words