A00-1013 presuppositions identified by the human ratings were detected by DP . The false
A00-1013 restrictive measures based on the ratings to evaluate the performance of
A00-1013 agreement , the raters with correlated ratings agreed in 67 % of the cases .
A00-1013 questions were identified based on ratings by three human expert raters
A00-1042 1995 ] receives the highest rating . However , in terms of a
A00-1013 human raters . Some of the human ratings diverged substantially . Therefore
A00-1013 similar results for analyses of the ratings based on the four-point and the
A00-1013 and 2 ( " no problem " ) versus ratings of 3 and 4 ( " problem " ) .
A00-1013 measures we computed based on these ratings to evaluate DP 's performance
A00-1013 presupposition for the question ( rating of 3 or 4 ) . We call this measure
A00-1013 ) , resulting in agreement of ratings in 63 % and 66 % of the questions
A00-1013 present , as reported by the human ratings . Pcomp All measures , except
A00-1013 Pcomp ( 0.35 ) . DP and Pconv ratings were in agreement for 67 % of
A00-1013 acknowledge three colleagues for rating the questions in our evaluation
A00-1013 . 2.3 Evaluation of the DP DP ratings were significantly correlated
A00-1013 DP 's performance . 2.1 Human ratings We used human ratings as the
A00-1013 other words , the agreement of ratings provided by the system and by
A00-1013 presuppositions reported by the human rating criterion ( computed as hits
A00-1013 into Boolean ratings by combining ratings of 1 and 2 ( " no problem " )
A00-1013 transformed the ratings into Boolean ratings by combining ratings of 1 and