other,7-1-J05-4003,ak present a novel method for discovering <term> parallel sentences </term> in <term> comparable , non-parallel
lr,10-1-J05-4003,ak discovering <term> parallel sentences </term> in <term> comparable , non-parallel corpora </term> . We train a <term> maximum entropy
tech,3-2-J05-4003,ak non-parallel corpora </term> . We train a <term> maximum entropy classifier </term> that , given a <term> pair of sentences
other,10-2-J05-4003,ak entropy classifier </term> that , given a <term> pair of sentences </term> , can reliably determine whether or
other,21-2-J05-4003,ak reliably determine whether or not they are <term> translations </term> of each other . Using this approach
other,6-3-J05-4003,ak . Using this approach , we extract <term> parallel data from </term> large Chinese , Arabic , and English
lr,15-3-J05-4003,ak large Chinese , Arabic , and English <term> non-parallel newspaper corpora </term> . We evaluate the quality of the extracted
tech,17-4-J05-4003,ak performance of a state-of-the-art <term> statistical machine translation system </term> . We also show that a good-quality
tech,6-5-J05-4003,ak . We also show that a good-quality <term> MT system </term> can be built from scratch by starting
lr,18-5-J05-4003,ak from scratch by starting with a very small <term> parallel corpus </term> ( 100,000 words ) and exploiting
lr,28-5-J05-4003,ak and exploiting a large non-parallel <term> corpus </term> . Thus , our method can be applied
other,11-6-J05-4003,ak can be applied with great benefit to <term> language pairs </term> for which only scarce resources are