D14-1188 |
fourth column is the p-value for
|
statistical significance testing
|
against the baseline . The first
|
D12-1091 |
demonstrated some limitations of
|
statistical significance testing
|
for NLP . In particular , while
|
M92-1001 |
. Second , a method of doing
|
statistical significance testing
|
was incorporated into the test
|
N06-1058 |
based on WordNet . The results of
|
statistical significance testing
|
are summarized in Table 5 . All
|
D14-1102 |
F1 to emphasize precision . For
|
statistical significance testing
|
, we use the sign test with bootstrap
|
D12-1052 |
system edits are computed . For
|
statistical significance testing
|
, we use sign-test with bootstrap
|
D12-1091 |
considered a good practice to include
|
statistical significance testing
|
results with empirical evaluations
|
D10-1003 |
tion . We conduct χ2 tests for
|
statistical significance testing
|
. We analyze the Penn Treebank
|
J08-1003 |
too small to support reliable
|
statistical significance testing
|
of the performance ranking of
|
M98-1024 |
metrics , scoring algorithms , and
|
statistical significance testing
|
. The first column in the report
|
E09-1048 |
caused by chance , we applied
|
statistical significance testing
|
. As we did not want to make
|
M92-1003 |
systems influence the outcome of the
|
statistical significance testing
|
more than the actual test statistics
|
M92-1043 |
recall . We intend to conduct
|
statistical significance testing
|
at least for the version of the
|
J08-1003 |
Below we give those results and
|
statistical significance testing
|
for the PARC 700 and CBS 500
|
J93-3001 |
the MUC-3 data . 4.1 Review of
|
Statistical Significance Testing
|
A statistical significance test
|
J12-2005 |
samples that form the basis of the
|
statistical significance testing
|
is less straightforward for the
|
M95-1002 |
defined for this task , and no
|
statistical significance testing
|
was performed on the scores .
|
J93-3001 |
we review the main concepts in
|
statistical significance testing
|
and describe our approach to
|
D15-1278 |
bootstrap test is adopted for
|
statistical significance testing
|
( Efron and Tibshirani , 1994
|
M98-1001 |
are given and included in the
|
statistical significance testing
|
because the systems can achieve
|