#9Oral communication is ubiquitous and carries important information, yet it is also time-consuming to document.
the
<term>
annotated data
</term>
shows that
it
successfully classifies 73.2 % in a
<term>
#2514An evaluation of our system against the annotated data shows that it successfully classifies 73.2% in a German corpus of 2.284 SRHs as either coherent or incoherent (given a baseline of 54.55%).
black-box OCR systems
</term>
in order to make
it
more useful for
<term>
NLP tasks
</term>
.
#2738The model is designed for use in error correction, with a focus on post-processing the output of black-box OCR systems in order to make it more useful for NLP tasks.
target recognition task
</term>
, but also that
it
is possible to get bigger performance gains
#3062In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
create a
<term>
word-trie
</term>
, transform
it
into a
<term>
minimal DFA
</term>
, then identify
#3199We create a word-trie, transform it into a minimal DFA, then identify hubs.
as the
<term>
cohesion constraint
</term>
.
It
requires disjoint
<term>
English phrases
</term>
#3244We present a syntax-based constraint for word alignment, known as the cohesion constraint. It requires disjoint English phrases to be mapped to non-overlapping intervals in the French sentence.
algorithms
</term>
. The results show that
it
can provide a significant improvement in
#3276The results show that it can provide a significant improvement in alignment quality.
inflow of multilingual , multimedia data .
It
gives users the ability to spend their
#3605The TAP-XL Automated Analyst's Assistant is an application designed to help an English-speaking analyst write a topical report, culling information from a large inflow of multilingual, multimedia data. It gives users the ability to spend their time finding more data relevant to their task, and gives them translingual reach into other languages by leveraging human language technology.
central to our
<term>
IE paradigm
</term>
.
It
is based on : ( 1 ) an extended set of
<term>
#3751We also introduce a new way of automatically identifying predicate argument structures, which is central to our IE paradigm. It is based on: (1) an extended set of features; and (2) inductive decision tree learning.
Switchboard dialogues
</term>
and show that
it
compares well to Byron 's ( 2002 ) manually
#4030We evaluate the system on twenty Switchboard dialogues and show that it compares well to Byron's (2002) manually tuned system.
</term>
of
<term>
speech understanding
</term>
,
it
is not appropriate to decide on a single
#4178Since multiple candidates for the understanding result can be obtained for a user utterance due to the ambiguity of speech understanding, it is not appropriate to decide on a single understanding result after each user utterance.
statistical machine translation
</term>
and
it
uses an
<term>
English stemmer
</term>
and
#4458The stemming model is based on statistical machine translation and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources.
improve the
<term>
stemmer
</term>
by allowing
it
to adapt to a desired
<term>
domain
</term>
#4502Monolingual, unannotated text can be used to further improve the stemmer by allowing it to adapt to a desired domain or genre.
manually segmented Arabic corpus
</term>
and uses
it
to bootstrap an
<term>
unsupervised algorithm
#4653Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus.
English . Typically , information that makes
it
to a summary appears in many different
<term>
#5207Typically, information that makes it to a summary appears in many different lexical-syntactic forms in the input documents.
training data
</term>
. We demonstrate that
it
is feasible to create
<term>
training material
#5295We demonstrate that it is feasible to create training material for problems in machine translation and that a mixture of supervised and unsupervised methods yields superior performance.
</term>
from
<term>
Japanese news texts
</term>
.
It
is found that the
<term>
Bayesian approach
#5405Comparison is made against non-Bayesian summarizers, using test data from Japanese news texts. It is found that the Bayesian approach generally leverages performance of a summarizer, at times giving it a significant lead over non-Bayesian models.
<term>
summarizer
</term>
, at times giving
it
a significant lead over
<term>
non-Bayesian
#5422It is found that the Bayesian approach generally leverages performance of a summarizer, at times giving it a significant lead over non-Bayesian models.
version of our method and hypothesize that
it
can outperform a competitive
<term>
baseline
#5863Currently, we present a topic-sensitive version of our method and hypothesize that it can outperform a competitive baseline, which compares the similarity of each sentence to the input question via IDF-weighted word overlap.
</term>
of this
<term>
pronoun
</term>
, for which
it
does not make sense to look for an
<term>
#6167This tool is therefore designed to distinguish between the anaphoric occurrences of il, for which an anaphora resolution system has to look for an antecedent, and the expletive occurrences of this pronoun, for which it does not make sense to look for an antecedent.