|
For many reasons , it is highly desirable
|
to
|
accurately estimate the
<term>
confidence
|
#6795
For many reasons, it is highly desirable to accurately estimate the confidence the system has in the correctness of each extracted field. |
|
lexicons
</term>
and
<term>
grammars
</term>
|
to
|
achieve complex
<term>
natural language processing
|
#17267
If we want valuable lexicons and grammars to achieve complex natural language processing, we must provide very powerful tools to help create and ensure the validity of such complex linguistic databases. |
|
<term>
maximum likelihood method
</term>
, fail
|
to
|
achieve high
<term>
performance
</term>
in
|
#17855
Owing to the problem of insufficient training data and approximation error introduced by the language model, traditional statistical approaches, which resolve ambiguities by indirectly and implicitly using maximum likelihood method, fail to achieve high performance in real applications. |
|
topics that must be addressed in order
|
to
|
achieve powerful , general
<term>
user modeling
|
#16199
Finally, the current state of research in user modeling is summarized, and future research topics that must be addressed in order to achieve powerful, general user modeling systems are assessed. |
|
being discussed and/or evaluated : Similar
|
to
|
activities one can define subsets of larger
|
#139
Several extensions of this basic idea are being discussed and/or evaluated: Similar to activities one can define subsets of larger database and detect those automatically which is shown on a large database of TV shows. |
|
improve the
<term>
stemmer
</term>
by allowing it
|
to
|
adapt to a desired
<term>
domain
</term>
or
|
#4501
Monolingual, unannotated text can be used to further improve the stemmer by allowing it to adapt to a desired domain or genre. |
|
question-answering ( Q/A ) system
</term>
designed
|
to
|
address the challenges of integrating
<term>
|
#11641
This paper describes FERRET, an interactive question-answering (Q/A) system designed to address the challenges of integrating automatic Q/A applications into real-world environments. |
|
technology
</term>
development initiative
|
to
|
advance the state of the art in
<term>
CSR
|
#19543
The CSR (Connected Speech Recognition) corpus represents a new DARPA speech recognition technology development initiative to advance the state of the art in CSR. |
|
method of using
<term>
expectations
</term>
|
to
|
aid the understanding of
<term>
scruffy texts
|
#13104
This method of using expectations to aid the understanding of scruffy texts has been incorporated into a working computer program called NOMAD, which understands scruffy texts in the domain of Navy messages. |
|
detected
<term>
homophone errors
</term>
.
|
To
|
align
<term>
bilingual texts
</term>
becomes
|
#20532
Also, the method successfully indicates the correct candidates for the detected homophone errors. To align bilingual texts becomes a crucial issue recently. |
|
quantifiers
</term>
which are approximations
|
to
|
all and always , e.g. , almost all , almost
|
#13521
An idea which underlies the theory described in this paper is that a disposition may be viewed as a proposition with implicit fuzzy quantifiers which are approximations to all and always, e.g., almost all, almost always, most, frequently, etc. |
|
language
</term>
; in summary , we intend it
|
to
|
allow
<term>
TAGs
</term>
to be used beyond
|
#16520
The formalism's intended usage is to relate expressions of natural languages to their associated semantics represented in a logical form language, or to their translates in another natural language; in summary, we intend it to allow TAGs to be used beyond their role in syntax proper. |
|
<term>
monolingual
</term>
. We also refer
|
to
|
an
<term>
evaluation method
</term>
and plan
|
#9814
We also refer to an evaluation method and plan to compare our system's output with a benchmark system. |
|
i.e. , retrieving examples most similar
|
to
|
an input expression , is the most dominant
|
#20282
In TDMT, example-retrieval (ER), i.e., retrieving examples most similar to an input expression, is the most dominant part of the total processing time. |
|
human annotated text
</term>
, in addition
|
to
|
an
<term>
unsupervised component
</term>
.
<term>
|
#4564
Our resource-frugal approach results in 87.5% agreement with a state of the art, proprietary Arabic stemmer built using rules, affix lists, and human annotated text, in addition to an unsupervised component. |
|
<term>
domain independent features
</term>
|
to
|
annotate an input
<term>
dataset
</term>
,
|
#5198
We then use the predicates of such clauses to create a set of domain independent features to annotate an input dataset, and run two different machine learning algorithms: SLIPPER, a rule-based learning algorithm, and TiMBL, a memory-based system. |
|
Service ) system
</term>
, which allows it
|
to
|
answer a
<term>
question
</term>
when a full
|
#19353
This paper describes an extension to the MIT ATIS (Air Travel Information Service) system, which allows it to answer a question when a full linguistic analysis fails. |
|
the robust
<term>
parser
</term>
allowed us
|
to
|
answer many more
<term>
questions
</term>
correctly
|
#19477
It was clear that the robust parser allowed us to answer many more questions correctly, as over a third of the sentences were not covered by the grammar. |
|
combination with a
<term>
terabyte corpus
</term>
|
to
|
answer
<term>
natural language tests
</term>
|
#6426
We apply it in combination with a terabyte corpus to answer natural language tests, achieving encouraging results. |
|
</term>
, but the approach is applicable
|
to
|
any
<term>
language
</term>
that needs
<term>
|
#4524
Examples and results will be given for Arabic, but the approach is applicable to any language that needs affix removal. |