#613, #618 [other,1-4-H01-1042,ak]
This, the first experiment in a series of experiments, looks at the <term>intelligibility</term> of MT output.
#631
A <term>language learning experiment</term> showed that <term>assessors</term> can differentiate native from non-native language essays in less than 100 words.

#676
We tested this to see if similar criteria could be elicited from duplicating the experiment using <term>machine translation output</term>.
#759
The results of this experiment, along with a preliminary analysis of the factors involved in the decision making process, will be presented here.
#2133 [tech,5-1-P01-1070,ak]
We describe a set of <term>supervised machine learning experiments</term> centering on the construction of statistical models of WH-questions.

#2397
We present our multi-level answer resolution algorithm that combines results from the answering agents at the question, passage, and/or answer levels. Experiments evaluating the effectiveness of our answer resolution algorithm show a 35.0% relative improvement over our baseline system in the number of questions correctly answered, and a 32.8% improvement according to the average precision metric.
#2484 [other,3-3-N03-1012,ak]
We conducted an <term>annotation experiment</term> and showed that <term>human annotators</term> can reliably differentiate between semantically coherent and incoherent speech recognition hypotheses.
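Sentence #2484 turns on annotator reliability, which is conventionally quantified with a chance-corrected agreement statistic such as Cohen's kappa. A minimal sketch, with invented coherent/incoherent judgements (the labels and data below are illustrative, not from the paper):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both annotators drew labels at random
    # from their own observed label distributions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical judgements of six recognition hypotheses.
ann1 = ["coherent", "coherent", "incoherent", "incoherent", "coherent", "incoherent"]
ann2 = ["coherent", "coherent", "incoherent", "coherent", "coherent", "incoherent"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.667
```

A kappa well above chance (0 means chance-level agreement, 1 perfect agreement) is what "reliably differentiate" cashes out to in practice.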
#2576
Within our framework, we carry out a large number of experiments to understand better and explain why phrase-based models outperform word-based models.
#3118
We conducted experiments with an <term>EBMT system</term>.
#3869
The results of the experiments demonstrate that the <term>HDAG Kernel</term> is superior to other kernel functions and baseline methods.
#4256
Unlike conventional methods that use hand-crafted rules, the proposed method enables easy design of the <term>discourse understanding process</term>. Experiment results have shown that a <term>system</term> that exploits the proposed method performs sufficiently and that holding multiple candidates for understanding results is effective.
#5562
Although our experiments are focused on <term>parsing</term>, the techniques described generalize naturally to NLP structures other than parse trees.
#5888
In our experiments, the method achieves a <term>TRDR score</term> that is significantly higher than that of the baseline.
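For context on sentence #5888: TRDR (total reciprocal document rank) credits a retrieval run with 1/rank for each relevant document it returns, so higher is better. A sketch under that reading of the metric, with an invented ranking:

```python
def trdr(ranked_docs, relevant):
    """Total reciprocal document rank: sum of 1/rank over relevant hits."""
    return sum(1.0 / rank
               for rank, doc in enumerate(ranked_docs, start=1)
               if doc in relevant)

# Hypothetical run: the two relevant documents appear at ranks 1 and 3.
score = trdr(["d2", "d9", "d5", "d7"], relevant={"d2", "d5"})
print(score)  # 1/1 + 1/3
```

Unlike plain reciprocal rank, the total variant rewards retrieving every relevant document near the top, not just the first one.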
#5996
The lack of automatic methods for scoring system output is an impediment to progress in the field, which we address with this work. Experiments with the TREC 2003 and TREC 2004 QA tracks indicate that rankings produced by our metric correlate highly with official rankings, and that POURPRE outperforms direct application of existing metrics.
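The claim in #5996 that metric-induced rankings "correlate highly with official rankings" is the kind of statement usually backed by a rank correlation such as Kendall's tau. A minimal tau-a sketch over four hypothetical systems (all scores invented for illustration):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = sum((x[i] - x[j]) * (y[i] - y[j]) > 0 for i, j in pairs)
    discordant = sum((x[i] - x[j]) * (y[i] - y[j]) < 0 for i, j in pairs)
    return (concordant - discordant) / len(pairs)

official = [0.9, 0.7, 0.5, 0.2]   # hypothetical official scores
metric   = [0.8, 0.75, 0.4, 0.3]  # hypothetical automatic-metric scores
print(kendall_tau(official, metric))  # 1.0: identical orderings
```

A tau near 1.0 means the automatic metric orders the systems the same way the official evaluation does; a tau near 0 means the orderings are unrelated.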
#6463
We present controlled experiments showing the <term>WSD accuracy</term> of current typical SMT models to be significantly lower than that of all the dedicated WSD models considered.
#7489
Our technique gives a substantial improvement in paraphrase classification accuracy over all of the other models used in the experiments.
#7534
Our preliminary experiments on building a <term>paraphrase corpus</term> have so far been producing promising results, which we have evaluated according to cost-efficiency, exhaustiveness, and reliability.
#7600
We measured the quality of the <term>paraphrases</term> produced in an experiment, i.e., (i) their <term>grammaticality</term>: at least 99% correct sentences; (ii) their equivalence in meaning: at least 96% correct paraphrases either by meaning equivalence or entailment; and, (iii) the amount of internal lexical and syntactical variation in a set of paraphrases: slightly superior to that of hand-produced sets.
#8252
The article also introduces a new algorithm for the boosting approach which takes advantage of the sparsity of the feature space in the <term>parsing data</term>. Experiments show significant efficiency gains for the new algorithm over the obvious implementation of the boosting approach.
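Sentence #8252 alludes to exploiting feature sparsity in boosting. The article's own algorithm is not given here; as a generic illustration of why sparsity helps, the weak-learner search can precompute an inverted index from features to examples and then score each feature by touching only the examples on which it fires, rather than scanning the full dataset per feature (all names and data below are invented):

```python
from collections import defaultdict

def build_index(examples):
    """Inverted index: feature -> indices of the examples containing it."""
    index = defaultdict(list)
    for i, (features, _label) in enumerate(examples):
        for f in features:
            index[f].append(i)
    return index

def best_feature(examples, weights, index):
    """Score each binary feature using only the examples where it fires."""
    best, best_gain = None, 0.0
    for f, idxs in index.items():
        pos = sum(weights[i] for i in idxs if examples[i][1] == +1)
        neg = sum(weights[i] for i in idxs if examples[i][1] == -1)
        if abs(pos - neg) > best_gain:  # weighted label imbalance on f's support
            best, best_gain = f, abs(pos - neg)
    return best

# Toy dataset: (active features, label), with current boosting weights.
examples = [(["a", "b"], +1), (["a"], +1), (["c"], -1), (["b", "c"], -1)]
weights = [0.4, 0.2, 0.2, 0.2]
print(best_feature(examples, weights, build_index(examples)))  # "a"
```

With this layout, the cost of one boosting round is proportional to the total number of active feature occurrences, not to (number of features) × (number of examples), which is where the efficiency gain on sparse parsing data comes from.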