#4638
We approximate Arabic's rich morphology by a model in which a word consists of a sequence of morphemes in the pattern prefix*-stem-suffix* (* denotes zero or more occurrences of a morpheme). Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm that builds the Arabic word segmenter from a large unsegmented Arabic corpus.
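The prefix*-stem-suffix* pattern can be made concrete with a small sketch. The affix lists below are illustrative assumptions (transliterated), not the segmenter's actual inventories; the sketch simply enumerates every split of a word into zero or more prefixes, a stem, and zero or more suffixes. In the bootstrapping setting described, the seed corpus would supply statistics for ranking these candidates.

```python
# Sketch of the prefix*-stem-suffix* word model. Affix lists are
# illustrative assumptions, not the paper's inventories.
PREFIXES = {"al", "wa", "bi"}
SUFFIXES = {"at", "wn", "ha"}

def segmentations(word):
    """Enumerate all prefix*-stem-suffix* splits of a word."""
    def strip(chunk, affixes, from_front):
        # Yield (affix_list, remainder) for zero or more affix matches.
        yield [], chunk
        for a in affixes:
            if from_front and chunk.startswith(a) and len(chunk) > len(a):
                for more, rem in strip(chunk[len(a):], affixes, True):
                    yield [a] + more, rem
            elif not from_front and chunk.endswith(a) and len(chunk) > len(a):
                for more, rem in strip(chunk[:-len(a)], affixes, False):
                    yield more + [a], rem

    for prefixes, rest in strip(word, PREFIXES, True):
        for suffixes, stem in strip(rest, SUFFIXES, False):
            yield prefixes, stem, suffixes

for split in segmentations("wakitabat"):   # e.g. (['wa'], 'kitab', ['at'])
    print(split)
```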
#14498
Such mistakes can slow, and possibly break down, communication. Our goal is to recognize and isolate such miscommunications and circumvent them.
#2808
We present an application of ambiguity packing and stochastic disambiguation techniques for Lexical-Functional Grammars (LFG) to the domain of sentence condensation. Our system incorporates a linguistic parser/generator for LFG, a transfer component for parse reduction operating on packed parse forests, and a maximum-entropy model for stochastic output selection.
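The stochastic output selection step can be sketched as a log-linear (maximum-entropy) ranker over candidate condensations. The feature representation and selection interface below are assumptions for illustration, not the system's actual API:

```python
# Hypothetical sketch of maximum-entropy output selection: each candidate
# condensation is scored by a log-linear combination of feature values.
def select(candidates, weights):
    """candidates: list of (sentence, {feature_name: value}) pairs."""
    def score(feats):
        return sum(weights.get(f, 0.0) * v for f, v in feats.items())
    # The conditional probability is exp(score) normalized over the
    # candidate set; for selecting the argmax, the raw score suffices.
    return max(candidates, key=lambda c: score(c[1]))
```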
#6139
This paper proposes a new methodology to improve the accuracy of a term aggregation system using each author's text as a coherent corpus. Our approach is based on the idea that one person tends to use one expression for one meaning.
#4915
On a subset of the most difficult SENSEVAL-2 nouns, the accuracy difference between the two approaches is only 14.0%, and the difference could narrow further to 6.5% if we disregard the advantage that manually sense-tagged data have in their sense coverage. Our analysis also highlights the importance of the issue of domain dependence in evaluating WSD programs.
#17293
If we want valuable lexicons and grammars to achieve complex natural language processing, we must provide very powerful tools to help create and ensure the validity of such complex linguistic databases. Our most important task in building the editor was to define a set of coherence rules that could be computationally applied to ensure the validity of lexical entries.
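A coherence rule of the kind described can be thought of as a predicate over a lexical entry. The toy rule set below is purely an assumption, meant only to illustrate how such rules could be computationally applied:

```python
# Illustrative sketch of coherence rules over lexical entries; the rules
# and entry schema are assumptions, not the editor's actual rule set.
RULES = [
    ("verb entries need a subcategorization frame",
     lambda e: e.get("pos") != "verb" or "subcat" in e),
    ("every entry needs a citation form",
     lambda e: "lemma" in e),
]

def validate(entry):
    """Return the messages of all violated coherence rules."""
    return [msg for msg, ok in RULES if not ok(entry)]

print(validate({"pos": "verb", "lemma": "run"}))
# -> ['verb entries need a subcategorization frame']
```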
#3987
We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents.
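A minimal sketch of such a decision-tree setup, using scikit-learn: each (pronoun, candidate-antecedent) pair becomes a feature vector labeled by whether the candidate is the true antecedent. The feature set here is an assumption, not the paper's:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per (pronoun, candidate) pair:
# [distance_in_utterances, gender_agree, number_agree, candidate_is_NP]
X = [
    [0, 1, 1, 1],
    [2, 0, 1, 1],
    [1, 1, 1, 0],  # non-NP antecedent, e.g. a clause or event
]
y = [1, 0, 1]      # 1 = candidate is the true antecedent

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[0, 1, 1, 0]]))
```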
#14751
This semantics for feature structures extends the ideas of Pereira and Shieber [11], by providing an interpretation for values which are specified by disjunctions and path values embedded within disjunctions. Our interpretation differs from that of Pereira and Shieber by using a logical model in place of a denotational semantics.
#8420
Our results show that MT evaluation techniques are able to produce useful features for paraphrase classification and, to a lesser extent, entailment. Our technique gives a substantial improvement in paraphrase classification accuracy over all of the other models used in the experiments.
#13026
However, a great deal of natural language text, e.g., memos, rough drafts, and conversation transcripts, has features that differ significantly from neat texts, posing special problems for readers, such as misspelled words, missing words, poor syntactic construction, missing periods, etc. Our solution to these problems is to make use of expectations, based both on knowledge of surface English and on world knowledge of the situation being described.
#6183
According to our assumption, most of the words with similar context features in each author's corpus tend not to be synonymous expressions. Our proposed method improves the accuracy of our term aggregation system, showing that our approach is successful.
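A sketch of that filter, assuming cosine similarity over context-feature vectors and an illustrative threshold: candidate synonym pairs whose contexts within one author's corpus are too similar are discarded, since the same author would not use two expressions for one meaning.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_candidates(pairs, contexts, threshold=0.8):
    """Drop candidate synonym pairs whose same-author context vectors
    are too similar (threshold is an illustrative assumption)."""
    return [(w1, w2) for w1, w2 in pairs
            if cosine(contexts[w1], contexts[w2]) < threshold]
```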
#9626
We use a maximum likelihood criterion to train a log-linear block bigram model which uses real-valued features (e.g. a language model score) as well as binary features based on the block identities themselves, e.g. block bigram features. Our training algorithm can easily handle millions of features.
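For intuition, the score of one block bigram under such a log-linear model mixes both feature types; the feature names below are assumptions for illustration. Binary identity features explain why the feature space runs into the millions: there is one per observed block bigram.

```python
# Sketch of a log-linear block bigram score combining a real-valued
# language model feature with a binary block-identity feature.
def block_bigram_score(prev_block, block, weights, lm_score):
    feats = {
        "lm": lm_score,                            # real-valued feature
        f"bigram={prev_block}|{block}": 1.0,       # binary identity feature
    }
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())
```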
#14805
Unification is attractive because of its generality, but it is often computationally inefficient. Our model allows a careful examination of the computational complexity of unification.
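For intuition, here is a naive copy-based unification over nested feature structures; exactly this copy-everything formulation is a well-known source of inefficiency, which is the kind of cost such a model makes explicit. The dict representation is an assumption for illustration:

```python
# Minimal sketch of feature-structure unification over nested dicts
# (a simplification; real systems use structure sharing to avoid copying).
FAIL = object()

def unify(a, b):
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)                    # copy, the naive (costly) part
        for k, v in b.items():
            if k in out:
                r = unify(out[k], v)     # recurse on shared features
                if r is FAIL:
                    return FAIL
                out[k] = r
            else:
                out[k] = v
        return out
    return a if a == b else FAIL         # atomic values must match

print(unify({"cat": "NP", "agr": {"num": "sg"}},
             {"agr": {"num": "sg", "per": 3}}))
```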
#2588
Within our framework, we carry out a large number of experiments to understand better and explain why phrase-based models outperform word-based models. Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations.
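The lexical weighting mentioned is commonly computed from word translation probabilities and a word alignment; the sketch below follows the standard formulation (details beyond the abstract, such as NULL handling, are assumptions):

```python
# Sketch of lexical weighting for a phrase pair: each source word is scored
# by the average word translation probability w(f|e) over its aligned
# target words.
def lexical_weight(f_phrase, e_phrase, alignment, w):
    """alignment: set of (i, j) pairs linking f_phrase[i] to e_phrase[j]."""
    score = 1.0
    for i, f in enumerate(f_phrase):
        links = [j for (i2, j) in alignment if i2 == i]
        if links:
            score *= sum(w.get((f, e_phrase[j]), 0.0) for j in links) / len(links)
        else:
            score *= w.get((f, None), 0.0)   # unaligned: NULL translation
    return score
```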
#10375
In this paper, we will present a new evaluation measure which explicitly models block reordering as an edit operation. Our measure can be exactly calculated in quadratic time.
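One way to realize a quadratic-time measure of this kind is a Levenshtein-style dynamic program extended with a unit-cost "jump" that models a block move; the exact operation set below is an assumption, not necessarily the paper's measure:

```python
# Sketch of an edit distance where block reordering is a unit-cost jump
# to any reference position. Runs in O(len(hyp) * len(ref)).
def block_edit_distance(hyp, ref):
    INF = float("inf")
    prev = list(range(len(ref) + 1))          # row for zero hypothesis words
    for h in hyp:
        row = [prev[0] + 1] + [INF] * len(ref)
        for j, r in enumerate(ref, 1):
            row[j] = min(prev[j - 1] + (h != r),  # match / substitution
                         prev[j] + 1,             # deletion
                         row[j - 1] + 1)          # insertion
        # block reordering: continue from anywhere in the reference, cost 1
        best = min(prev) + 1
        row = [min(v, best) for v in row]
        prev = row
    return prev[-1]

print(block_edit_distance("abcd", "cdab"))  # cheaper than plain Levenshtein
```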
#5819
We evaluate the proposed methods through several transliteration/back transliteration experiments for English/Chinese and English/Japanese language pairs. Our study reveals that the proposed method not only reduces an extensive system development effort but also improves the transliteration accuracy significantly.
#1946
We provide a logical definition of Minimalist grammars, which are Stabler's formalization of Chomsky's minimalist program. Our logical definition leads to a neat relation to categorial grammar (yielding a treatment of Montague semantics), a parsing-as-deduction in a resource-sensitive logic, and a learning algorithm from structured data (based on a typing algorithm and type unification).
#4532
Examples and results will be given for Arabic, but the approach is applicable to any language that needs affix removal. Our resource-frugal approach results in 87.5% agreement with a state of the art, proprietary Arabic stemmer built using rules, affix lists, and human annotated text, in addition to an unsupervised component.
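A resource-frugal affix-removal step can be as small as the sketch below; the affix lists are short illustrative assumptions (transliterated), not the stemmer's actual lists:

```python
# Sketch of light affix removal: strip at most one prefix and one suffix,
# keeping a minimum stem length. Longer affixes are tried first.
PREFIXES = ("wal", "al", "wa", "bi", "li")
SUFFIXES = ("at", "wn", "yn", "ha", "h")

def light_stem(word, min_stem=2):
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_stem:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_stem:
            word = word[:-len(s)]
            break
    return word

print(light_stem("walkitabat"))  # -> 'kitab'
```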
#5480
We tested the clustering and filtering processes on electronic newsgroup discussions, and evaluated their performance by means of two experiments: coarse-level clustering and simple information retrieval. Our evaluation shows that our filtering mechanism has a significant positive effect on both tasks.
#1277
The paper also proposes a rule-reduction algorithm that applies mutual information to reduce the error-correction rules. Our algorithm reported more than 99% accuracy in both language identification and key prediction.
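Rule reduction via mutual information can be sketched as scoring each error-correction rule by the pointwise mutual information between its triggering context and its proposed correction, then pruning low-scoring rules. The statistics interface and threshold below are assumptions for illustration:

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw co-occurrence counts."""
    p_xy, p_x, p_y = count_xy / total, count_x / total, count_y / total
    return math.log2(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

def reduce_rules(rules, stats, total, threshold=1.0):
    """Keep rules whose context/correction association is strong enough.
    stats[rule] holds 'joint', 'ctx', and 'cor' counts (hypothetical)."""
    return [r for r in rules
            if pmi(stats[r]["joint"], stats[r]["ctx"],
                   stats[r]["cor"], total) >= threshold]
```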