P07-1040 Currently, the most widely used automatic MT evaluation metric is the NIST BLEU-4 (Papineni
N04-1022 AMTA, 2003). We expect new automatic MT evaluation metrics to emerge frequently
P08-1007 is invaluable. Among all the automatic MT evaluation metrics, BLEU (Papineni et
N13-3005 . Crucially, current standard automatic MT evaluation metrics also lack any diagnostic
P05-1067 evaluated using the NIST and Bleu automatic MT evaluation software. The evaluation shows
P05-1067 system using the NIST and Bleu automatic MT evaluation software. The result shows that
N07-1005 system parameters based on the automatic MT evaluation measures. Acknowledgments The
N06-1057 translation (MT) community for automatic MT evaluation. A problem with ROUGE is that
P08-1007 et al., 2007), a total of 11 automatic MT evaluation metrics were evaluated for correlation
N06-1058 correlate with its utility for automatic MT evaluation. Our results suggest that researchers
P07-1038 an alternative perspective on automatic MT evaluation that may be informative in its
N04-1036 translation quality is measured by the automatic MT evaluation metrics, such as NIST and Bleu
P04-1077 framework that automatically evaluated automatic MT evaluation metrics using only manual translations
P08-1007 judgements than all of the 11 automatic MT evaluation metrics evaluated during the
P08-1007 , having a robust and accurate automatic MT evaluation metric that correlates well with
N09-2006 riddled error surface computed by automatic MT evaluation metrics. We showed, empirically
P08-1007 with human judgements than all 11 automatic MT evaluation metrics that were evaluated during
P06-2003 translator. Most approaches to automatic MT evaluation implicitly assume that both criteria
P05-1067 in section 5 with the NIST/Bleu automatic MT evaluation software and the results are
I05-5003 translations. Fortunately, the automatic MT evaluation techniques commonly in use do