D09-1008 | trained on the English side of the | parallel training | data. For that purpose, we |
D08-1078 | 2000 sentences from each of the | parallel training | corpora. From this subset we |
D09-1040 | come to mind is to use larger | parallel training | corpora. However, current state-of-the-art |
C96-1033 | independently of each other, allowing for | parallel training | on several CPUs. In run mode |
D09-1074 | sentence-aligned, word-aligned | parallel training | data, one could extract various |
D08-1066 | word-level alignments for the | parallel training | corpus using GIZA++. We use |
D09-1074 | is easy to see that most of the | parallel training | data are either newswire or from |
D08-1066 | function of the word-aligned, | parallel training | corpus. Earlier efforts on devising |
D09-1040 | are limited by the quantity of | parallel training | texts. Augmenting the training |
D09-1123 | to parse the target side of the | parallel training | data. Each sentence is associated |
D09-1074 | probabilities. Typically, a | parallel training | corpus is comprised of collections |
D10-1041 | covered by the phrase table and the | parallel training | data. Section 3 describes our |
D08-1090 | 4.2 used only 5 million words of | parallel training | , 230 million words of parallel |
D08-1078 | Each language pair has a separate | parallel training | corpus, but the target vocabulary |
D09-1074 | weight for each sentence in a | parallel training | corpus so as to optimize MT performance |
D08-1090 | condition involved a small amount of | parallel training | , such as one might find when |
D09-1040 | source and target sides of the | parallel training | sets. When the baseline system |
D09-1074 | identify, for each sentence in the | parallel training | data, a set of features that |
D08-1090 | versus 0.68% BLEU. 4.3 Full | Parallel Training | Results While the simulation |
D09-1040 | rate concentrates on increasing | parallel training | set size without using more dedicated |