#354We also report results of a preliminary, qualitative user evaluation of the system, which, while broadly positive, indicates that further work needs to be done on the interface to make users aware of the increased potential of IE-enhanced text browsers.
tech,12-1-H01-1042,ak
the efficacy of applying
<term>
automated
evaluation
techniques
</term>
, originally devised for
#557The purpose of this research is to test the efficacy of applying automated evaluation techniques, originally devised for the evaluation of human language learners, to the output of machine translation (MT) systems.
tech,20-1-H01-1042,ak
</term>
, originally devised for the
<term>
evaluation
</term>
of
<term>
human language learners
</term>
#564The purpose of this research is to test the efficacy of applying automated evaluation techniques, originally devised for the evaluation of human language learners, to the output of machine translation (MT) systems.
tech,4-2-H01-1042,ak
systems
</term>
. We believe that these
<term>
evaluation
techniques
</term>
will provide information
#585We believe that these evaluation techniques will provide information about both the human language learning process, the translation process and the development of machine translation systems.
tech,6-1-H01-1068,ak
describe a three-tiered approach for
<term>
evaluation
</term>
of
<term>
spoken dialogue systems
</term>
#1201We describe a three-tiered approach for evaluation of spoken dialogue systems.
</term>
of L . The results of a practical
evaluation
of this method on a
<term>
wide coverage
#1741The results of a practical evaluation of this method on a wide coverage English grammar are given.
speech recognition hypotheses
</term>
. An
evaluation
of our
<term>
system
</term>
against the
<term>
#2503An evaluation of our system against the annotated data shows that it successfully classifies 73.2% in a German corpus of 2,284 SRHs as either coherent or incoherent (given a baseline of 54.55%).
and word error rate
</term>
, and provide
evaluation
results involving
<term>
automatic extraction
#2773We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text.
tech,8-3-N03-1026,ak
propose the use of standard
<term>
parser
evaluation
methods
</term>
for automatically evaluating
#2848Furthermore, we propose the use of standard parser evaluation methods for automatically evaluating the summarization quality of sentence condensation systems.
tech,1-4-N03-1026,ak
condensation systems
</term>
. An
<term>
experimental
evaluation
</term>
of
<term>
summarization quality
</term>
#2863An experimental evaluation of summarization quality shows a close correlation between the automatic parse-based evaluation and a manual evaluation of generated strings.
tech,12-4-N03-1026,ak
correlation between the
<term>
automatic parse-based
evaluation
</term>
and a
<term>
manual evaluation
</term>
#2875An experimental evaluation of summarization quality shows a close correlation between the automatic parse-based evaluation and a manual evaluation of generated strings.
tech,17-4-N03-1026,ak
parse-based evaluation
</term>
and a
<term>
manual
evaluation
</term>
of generated strings . Overall
<term>
#2879An experimental evaluation of summarization quality shows a close correlation between the automatic parse-based evaluation and a manual evaluation of generated strings.
measure(ment),2-3-N03-2006,ak
an
<term>
EBMT system
</term>
. The two
<term>
evaluation
measures
</term>
of the
<term>
BLEU score
</term>
#3126The two evaluation measures of the BLEU score and the NIST score demonstrated the effect of using an out-of-domain bilingual corpus and the possibility of using the language model.
tech,2-4-P03-1009,ak
<term>
polysemic verbs
</term>
. A novel
<term>
evaluation
scheme
</term>
is proposed which accounts
#3942A novel evaluation scheme is proposed which accounts for the effect of polysemy on the clusters, offering us a good insight into the potential and limitations of semantically classifying undisambiguated SCF data.
developed at our laboratory . Experimental
evaluation
shows that the
<term>
cooperative responses
#4406Experimental evaluation shows that the cooperative responses adaptive to individual users serve as good guidance for novice users without increasing the dialogue duration for skilled users.
#4572Task-based evaluation using Arabic information retrieval indicates an improvement of 22-38% in average precision over unstemmed text, and 96% of the performance of the proprietary stemmer above.
measure(ment),30-3-H05-1095,ak
accuracy
</term>
, as measured with the
<term>
NIST
evaluation
metric
</term>
.
<term>
Translations
</term>
#5645A statistical translation model is also presented that deals with such phrases, as well as a training method based on the maximization of translation accuracy, as measured with the NIST evaluation metric.
tech,5-1-H05-1117,ak
recent developments in the
<term>
automatic
evaluation
</term>
of
<term>
machine translation
</term>
#5912Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, for automatically evaluating answers to definition questions.
measure(ment),0-1-I05-2014,ak
syntactic analysis system
</term>
.
<term>
Automatic
evaluation
metrics
</term>
for
<term>
Machine Translation
#6222Automatic evaluation metrics for Machine Translation (MT) systems, such as BLEU or NIST, are now well established.
tech,30-1-I05-2021,ak
performance
</term>
, using standard
<term>
WSD
evaluation
methodology
</term>
and datasets from the
#6360We present the first known empirical test of an increasingly common speculative claim, by evaluating a representative Chinese-to-English SMT model directly on word sense disambiguation performance, using standard WSD evaluation methodology and datasets from the Senseval-3 Chinese lexical sample task.