Following recent developments in the <term> automatic evaluation </term> of <term> machine translation </term> and <term> document summarization </term> , we present a similar approach , implemented in a measure called <term> POURPRE </term> , for <term> automatically evaluating answers to definition questions </term> . Until now , the only way to assess [...] 's response . The lack of automatic <term> methods </term> for <term> scoring system output </term> is an impediment to progress in the [...] with this work . Experiments with the <term> TREC 2003 and TREC 2004 QA tracks </term> indicate that <term> rankings </term> produced by our <term> metric </term> correlate highly with <term> official rankings </term> , and that <term> POURPRE </term> outperforms direct application of existing <term> metrics </term> . We describe a <term> method </term> [...]
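The claim that rankings produced by the automatic metric "correlate highly" with official rankings is typically quantified with a rank-correlation statistic such as Kendall's tau. A minimal sketch of that comparison, assuming two hypothetical lists of per-run scores (one from human assessors, one from an automatic metric; the numbers below are illustrative, not from the paper):

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall's tau rank correlation between two equal-length score lists.

    Counts concordant vs. discordant pairs: a pair of runs is concordant
    when both scorings order the two runs the same way.
    """
    assert len(a) == len(b) and len(a) > 1
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical scores for five systems: official (human) vs. automatic metric.
official = [0.46, 0.40, 0.33, 0.29, 0.21]
automatic = [0.51, 0.44, 0.30, 0.32, 0.19]
print(kendall_tau(official, automatic))  # one swapped pair out of ten -> 0.8
```

A tau of 1.0 means the two scorings rank every pair of systems identically, -1.0 means they are fully reversed; values near 1.0 are what "correlate highly" amounts to in this setting.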