other,27-3-H01-1041,bq |
case markers
</term>
, relatively
<term>
free
|
word
|
order
</term>
, and frequent omissions of
|
#467
The key features of the system include: (i) Robust efficient parsing of Korean (a verb final language with overt case markers, relatively free word order, and frequent omissions of arguments). |
tech,7-4-H01-1041,bq |
quality
<term>
translation
</term>
via
<term>
|
word
|
sense disambiguation
</term>
and accurate
|
#484
(ii) High quality translation via word sense disambiguation and accurate word order generation of the target language. |
tech,12-4-H01-1041,bq |
disambiguation
</term>
and accurate
<term>
|
word
|
order generation
</term>
of the
<term>
target
|
#489
(ii) High quality translation via word sense disambiguation and accurate word order generation of the target language. |
other,8-10-H01-1042,bq |
Additionally , they were asked to mark the
<term>
|
word
|
</term>
at which they made this decision
|
#747
Additionally, they were asked to mark the word at which they made this decision. |
other,4-3-H01-1058,bq |
<term>
oracle
</term>
knows the
<term>
reference
|
word
|
string
</term>
and selects the
<term>
word
|
#1075
The oracle knows the reference word string and selects the word string with the best performance (typically, word or semantic error rate) from a list of word strings, where each word string has been obtained by using a different LM. |
other,10-3-H01-1058,bq |
word string
</term>
and selects the
<term>
|
word
|
string
</term>
with the best
<term>
performance
|
#1080
The oracle knows the reference word string and selects the word string with the best performance (typically, word or semantic error rate) from a list of word strings, where each word string has been obtained by using a different LM. |
measure(ment),19-3-H01-1058,bq |
<term>
performance
</term>
( typically ,
<term>
|
word
|
or semantic error rate
</term>
) from a list
|
#1089
The oracle knows the reference word string and selects the word string with the best performance (typically, word or semantic error rate) from a list of word strings, where each word string has been obtained by using a different LM. |
other,29-3-H01-1058,bq |
error rate
</term>
) from a list of
<term>
|
word
|
strings
</term>
, where each
<term>
word string
|
#1099
The oracle knows the reference word string and selects the word string with the best performance (typically, word or semantic error rate) from a list of word strings, where each word string has been obtained by using a different LM. |
other,34-3-H01-1058,bq |
<term>
word strings
</term>
, where each
<term>
|
word
|
string
</term>
has been obtained by using
|
#1104
The oracle knows the reference word string and selects the word string with the best performance (typically, word or semantic error rate) from a list of word strings, where each word string has been obtained by using a different LM. |
model,24-3-P01-1004,bq |
</term>
superior to any of the tested
<term>
|
word
|
N-gram models
</term>
. Further , in their
|
#1555
Over two distinct datasets, we find that indexing according to simple character bigrams produces a retrieval accuracy superior to any of the tested word N-gram models. |
other,3-3-P01-1008,bq |
approach yields
<term>
phrasal and single
|
word
|
lexical paraphrases
</term>
as well as
<term>
|
#1806
Our approach yields phrasal and single word lexical paraphrases as well as syntactic paraphrases. |
measure(ment),20-3-N03-1018,bq |
significantly reduce
<term>
character and
|
word
|
error rate
</term>
, and provide evaluation
|
#2766
We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text. |
other,66-1-N03-1033,bq |
) fine-grained modeling of
<term>
unknown
|
word
|
features
</term>
. Using these ideas together
|
#2976
We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features. |
tech,6-1-N03-2017,bq |
<term>
syntax-based constraint
</term>
for
<term>
|
word
|
alignment
</term>
, known as the
<term>
cohesion
|
#3234
We present a syntax-based constraint for word alignment, known as the cohesion constraint. |
model,14-4-N03-2036,bq |
projections
</term>
using an underlying
<term>
|
word
|
alignment
</term>
. We show experimental
|
#3458
During training, the blocks are learned from source interval projections using an underlying word alignment. |
other,11-1-P03-1051,bq |
</term>
by a
<term>
model
</term>
that a
<term>
|
word
|
</term>
consists of a sequence of
<term>
morphemes
|
#4611
We approximate Arabic's rich morphology by a model that a word consists of a sequence of morphemes in the pattern prefix*-stem-suffix* (* denotes zero or more occurrences of a morpheme). |
tech,22-2-P03-1051,bq |
algorithm
</term>
to build the
<term>
Arabic
|
word
|
segmenter
</term>
from a large
<term>
unsegmented
|
#4661
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
other,20-5-P03-1051,bq |
<term>
stems
</term>
from a 155 million
<term>
|
word
|
</term><term>
unsegmented corpus
</term>
,
|
#4726
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
tech,2-6-P03-1051,bq |
corpus
</term>
. The resulting
<term>
Arabic
|
word
|
segmentation system
</term>
achieves around
|
#4746
The resulting Arabic word segmentation system achieves around 97% exact match accuracy on a test corpus containing 28,449 word tokens. |
other,19-6-P03-1051,bq |
test corpus
</term>
containing 28,449
<term>
|
word
|
tokens
</term>
. We believe this is a state-of-the-art
|
#4762
The resulting Arabic word segmentation system achieves around 97% exact match accuracy on a test corpus containing 28,449 word tokens. |