#8355We present a novel method for discovering parallel sentences in comparable, non-parallel corpora.
lr,10-1-J05-4003,ak
discovering
<term>
parallel sentences
</term>
in
<term>
comparable , non-parallel corpora
</term>
. We train a
<term>
maximum entropy
#8358We present a novel method for discovering parallel sentences in comparable, non-parallel corpora.
other,10-2-J05-4003,ak
entropy classifier
</term>
that , given a
<term>
pair of sentences
</term>
, can reliably determinewhether or
#8373We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other.
other,21-2-J05-4003,ak
reliably determinewhether or not they are
<term>
translations
</term>
of each other . Using this approach
#8384We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other.
other,6-3-J05-4003,ak
. Using this approach , we extract
<term>
parallel datafrom
</term>
large Chinese , Arabic , and English
#8395Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora.
lr,15-3-J05-4003,ak
large Chinese , Arabic , and English
<term>
non-parallel newspaper corpora
</term>
. We evaluate the qualityof the extracted
#8404Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora.
tech,17-4-J05-4003,ak
performance of a state-of-the-art
<term>
statisticalmachine translation system
</term>
. We also show that a good-quality
#8425We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
tech,6-5-J05-4003,ak
. We also show that a good-quality
<term>
MT system
</term>
can be built fromscratch by starting
#8435We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus.
lr,18-5-J05-4003,ak
fromscratch by starting with a very small
<term>
parallel corpus
</term>
( 100,000 words ) and exploiting
#8447We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus.
lr,28-5-J05-4003,ak
and exploiting a largenon-parallel
<term>
corpus
</term>
. Thus , our method can be applied
#8457We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus.
other,11-6-J05-4003,ak
can be applied with great benefit to
<term>
language pairs
</term>
forwhich only scarce resources are
#8470Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available.