Paper ID | Left context | Keyword | Right context
D15-1040 | representation automatically during | joint training | . The performance results for
D13-1054 | parameters with respect to the | joint training | objective. Given a set of parameters
D10-1037 | using a novel EM-based method for | joint training | . We evaluate our approach on
D15-1040 | our model can easily be used for | joint training | over k > 2 languages. We
D15-1064 | embeddings, combined with our | joint training | objective, provide a large improvement
D14-1017 | results using geometric means. The | joint training | method (Liang et al., 2006
D15-1040 | This shows the superiority of | joint training | compared with single language
D13-1074 | described in Sec. 4. We show that | joint training | produces an even stronger gain
D15-1040 | the tagset mapping as part of | joint training | . Beyond 15k tokens, the joint
D15-1064 | performed the best on dev data for | joint training | . General Results Table 2 shows
D10-1102 | second property is desirable since | joint training | avoids error propagation that
D15-1040 | Regularization Parameter Tuning | Joint training | with a dictionary (see equation
D10-1019 | data set. • c = number of | joint training | iterations. • cs = number
D14-1015 | local and global context via a | joint training | objective. Much of the research
D15-1053 | propose to introduce a degree of | joint training | of parameters is to incorporate
D10-1019 | using virtual nodes, and performs | joint training | and decoding in the factorized
D15-1121 | efforts along these lines is the | joint training | of the CTM and the log-linear
D15-1064 | text. Finally, we propose a | joint training | objective for the embeddings
D15-1040 | future work, we plan to extend | joint training | to several languages, and further
D10-1019 | graph structure that exploits | joint training | and decoding in the factorized