Following recent developments in the <term> automatic evaluation </term> of <term> machine translation </term> and <term> document summarization </term> , we present a similar approach , implemented in a measure called <term> POURPRE </term> , for automatically evaluating answers to <term> definition questions </term> . Until now , the only way to assess answers to <term> definition questions </term> has been manual determination of whether an <term> information nugget </term> appears in a system 's response . The lack of <term> automatic methods for scoring system output </term> is an impediment to progress in the field , which we address with this work . Experiments with the <term> TREC 2003 and TREC 2004 QA tracks </term> indicate that <term> rankings </term> produced by our metric correlate highly with official <term> rankings </term> , and that <term> POURPRE </term> outperforms direct application of existing metrics .
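The claim that metric-produced rankings "correlate highly" with official rankings can be illustrated with a rank-correlation computation. The sketch below uses Kendall's tau, one common choice for comparing two system rankings; the system names and scores are invented for the example and are not taken from the paper's data.

```python
# Hypothetical illustration: comparing a metric's system ranking against an
# official ranking via Kendall's tau. All rankings below are invented.

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items.

    Each ranking maps item -> position (1 = best). Returns a value in
    [-1, 1]: 1 for identical orderings, -1 for fully reversed ones.
    """
    items = list(rank_a)
    n = len(items)
    concordant = discordant = 0
    # Count pairs ordered the same way (concordant) vs. swapped (discordant).
    for i in range(n):
        for j in range(i + 1, n):
            a = rank_a[items[i]] - rank_a[items[j]]
            b = rank_b[items[i]] - rank_b[items[j]]
            if a * b > 0:
                concordant += 1
            elif a * b < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Invented rankings for four hypothetical QA systems.
official = {"sysA": 1, "sysB": 2, "sysC": 3, "sysD": 4}
metric   = {"sysA": 1, "sysB": 3, "sysC": 2, "sysD": 4}

# One swapped pair out of six -> tau = (5 - 1) / 6
print(kendall_tau(official, metric))
```

A high tau (close to 1) would support using the automatic metric in place of manual judgments for ranking purposes.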