Following recent developments in the
<term>
automatic evaluation
</term>
of
<term>
machine translation
</term>
and
<term>
document summarization
</term>
, we present a similar approach , implemented in a measure called
<term>
POURPRE
</term>
, for automatically evaluating answers to
<term>
definition questions
</term>
.
#5917Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, for automatically evaluating answers to definition questions.
measure(ment),25-4-H05-1117,ak
Experiments with the
<term>
TREC 2003 and TREC 2004 QA tracks
</term>
indicate that
<term>
rankings
</term>
produced by our metric correlate highly with official
<term>
rankings
</term>
, and that
<term>
POURPRE
</term>
outperforms direct application of existing metrics .
#6021Experiments with the TREC 2003 and TREC 2004 QA tracks indicate that rankings produced by our metric correlate highly with official rankings, and that POURPRE outperforms direct application of existing metrics.
tech,8-1-H05-1117,ak
Following recent developments in the
<term>
automatic evaluation
</term>
of
<term>
machine translation
</term>
and
<term>
document summarization
</term>
, we present a similar approach , implemented in a measure called
<term>
POURPRE
</term>
, for automatically evaluating answers to
<term>
definition questions
</term>
.
#5914Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, for automatically evaluating answers to definition questions.
other,21-2-H05-1117,ak
Until now , the only way to assess the correctness of answers to such questions involves manual determination of whether an
<term>
information nugget
</term>
appears in a system 's response .
#5962Until now, the only way to assess the correctness of answers to such questions involves manual determination of whether an information nugget appears in a system's response.
other,32-1-H05-1117,ak
Following recent developments in the
<term>
automatic evaluation
</term>
of
<term>
machine translation
</term>
and
<term>
document summarization
</term>
, we present a similar approach , implemented in a measure called
<term>
POURPRE
</term>
, for automatically evaluating answers to
<term>
definition questions
</term>
.
#5938Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, for automatically evaluating answers to definition questions.
other,3-4-H05-1117,ak
Experiments with the
<term>
TREC 2003 and TREC 2004 QA tracks
</term>
indicate that
<term>
rankings
</term>
produced by our metric correlate highly with official
<term>
rankings
</term>
, and that
<term>
POURPRE
</term>
outperforms direct application of existing metrics .
#5999Experiments with the TREC 2003 and TREC 2004 QA tracks indicate that rankings produced by our metric correlate highly with official rankings, and that POURPRE outperforms direct application of existing metrics.
tech,5-1-H05-1117,ak
Following recent developments in the
<term>
automatic evaluation
</term>
of
<term>
machine translation
</term>
and
<term>
document summarization
</term>
, we present a similar approach , implemented in a measure called
<term>
POURPRE
</term>
, for automatically evaluating answers to
<term>
definition questions
</term>
.
#5911Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, for automatically evaluating answers to definition questions.
other,12-4-H05-1117,ak
Experiments with the
<term>
TREC 2003 and TREC 2004 QA tracks
</term>
indicate that
<term>
rankings
</term>
produced by our metric correlate highly with official
<term>
rankings
</term>
, and that
<term>
POURPRE
</term>
outperforms direct application of existing metrics .
#6008Experiments with the TREC 2003 and TREC 2004 QA tracks indicate that rankings produced by our metric correlate highly with official rankings, and that POURPRE outperforms direct application of existing metrics.
tech,3-3-H05-1117,ak
The lack of
<term>
automatic methods for scoring system output
</term>
is an impediment to progress in the field , which we address with this work .
#5974The lack of automatic methods for scoring system output is an impediment to progress in the field, which we address with this work.
other,21-4-H05-1117,ak
Experiments with the
<term>
TREC 2003 and TREC 2004 QA tracks
</term>
indicate that
<term>
rankings
</term>
produced by our metric correlate highly with official
<term>
rankings
</term>
, and that
<term>
POURPRE
</term>
outperforms direct application of existing metrics .
#6017Experiments with the TREC 2003 and TREC 2004 QA tracks indicate that rankings produced by our metric correlate highly with official rankings, and that POURPRE outperforms direct application of existing metrics.
measure(ment),25-1-H05-1117,ak
Following recent developments in the
<term>
automatic evaluation
</term>
of
<term>
machine translation
</term>
and
<term>
document summarization
</term>
, we present a similar approach , implemented in a measure called
<term>
POURPRE
</term>
, for automatically evaluating answers to
<term>
definition questions
</term>
.
#5931Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, for automatically evaluating answers to definition questions.