A00-1013 |
presuppositions identified by the human
|
ratings
|
were detected by DP . The false
|
A00-1013 |
restrictive measures based on the
|
ratings
|
to evaluate the performance of
|
A00-1013 |
agreement , the raters with correlated
|
ratings
|
agreed in 67 % of the cases .
|
A00-1013 |
questions were identified based on
|
ratings
|
by three human expert raters
|
A00-1042 |
1995 ] receives the highest
|
rating
|
. However , in terms of a
|
A00-1013 |
human raters . Some of the human
|
ratings
|
diverged substantially . Therefore
|
A00-1013 |
similar results for analyses of the
|
ratings
|
based on the four-point and the
|
A00-1013 |
and 2 ( " no problem " ) versus
|
ratings
|
of 3 and 4 ( " problem " ) .
|
A00-1013 |
measures we computed based on these
|
ratings
|
to evaluate DP 's performance
|
A00-1013 |
presupposition for the question (
|
rating
|
of 3 or 4 ) . We call this measure
|
A00-1013 |
) , resulting in agreement of
|
ratings
|
in 63 % and 66 % of the questions
|
A00-1013 |
present , as reported by the human
|
ratings
|
. flaj PCOMp All measures , except
|
A00-1013 |
Pcomp ( 0.35 ) . DP and Pconv
|
ratings
|
were in agreement for 67 % of
|
A00-1013 |
acknowledge three colleagues for
|
rating
|
the questions in our evaluation
|
A00-1013 |
. 2.3 Evaluation of the DP DP
|
ratings
|
were significantly correlated
|
A00-1013 |
DP 's performance . 2.1 Human
|
ratings
|
We used human ratings as the
|
A00-1013 |
other words , the agreement of
|
ratings
|
provided by the system and by
|
A00-1013 |
presuppositions reported by the human
|
rating
|
criterion ( computed as hits
|
A00-1013 |
into Boolean ratings by combining
|
ratings
|
of 1 and 2 ( " no problem " )
|
A00-1013 |
transformed the ratings into Boolean
|
ratings
|
by combining ratings of 1 and
|