lr,12-2-P01-1008,bq |
identification of paraphrases
</term>
from a
<term>
|
corpus
|
of multiple English translations
</term>
|
#1789
We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. |
lr,19-4-N03-1012,bq |
successfully classifies 73.2 % in a
<term>
German
|
corpus
|
</term>
of 2,284
<term>
SRHs
</term>
as either
|
#2521
An evaluation of our system against the annotated data shows that it successfully classifies 73.2% in a German corpus of 2,284 SRHs as either coherent or incoherent (given a baseline of 54.55%). |
lr,13-1-N03-2006,bq |
</term>
based on a small-sized
<term>
bilingual
|
corpus
|
</term>
, we use an out-of-domain
<term>
bilingual
|
#3093
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,10-5-N03-2025,bq |
Markov Model
</term>
is trained on a
<term>
|
corpus
|
</term>
automatically tagged by the first
|
#3368
Then, a Hidden Markov Model is trained on a corpus automatically tagged by the first learner. |
lr,19-2-N03-4010,bq |
candidates
</term>
from the given
<term>
text
|
corpus
|
</term>
. The operation of the
<term>
system
|
#3682
The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus. |
other,15-1-P03-1009,bq |
classes
</term>
from undisambiguated
<term>
|
corpus
|
data
</term>
. We describe a new approach
|
#3899
Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. |
lr,22-2-P03-1050,bq |
a small ( 10K sentences )
<term>
parallel
|
corpus
|
</term>
as its sole
<term>
training resources
|
#4469
The stemming model is based on statistical machine translation and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources. |
lr,7-2-P03-1051,bq |
by a small
<term>
manually segmented Arabic
|
corpus
|
</term>
and uses it to bootstrap an
<term>
|
#4648
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,9-1-P03-1068,bq |
of a large ,
<term>
semantically annotated
|
corpus
|
</term>
resource as a reliable basis for the
|
#4943
We describe the ongoing construction of a large, semantically annotated corpus resource as a reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. |
lr,6-3-C04-1106,bq |
experiments conducted on a
<term>
multilingual
|
corpus
|
</term>
to estimate the number of
<term>
analogies
|
#5916
We report experiments conducted on a multilingual corpus to estimate the number of analogies among the sentences that it contains. |
lr,23-2-C04-1116,bq |
each author 's text as a coherent
<term>
|
corpus
|
</term>
. Our approach is based on the idea
|
#6137
This paper proposes a new methodology to improve the accuracy of a term aggregation system using each author's text as a coherent corpus. |
lr,50-3-C04-1147,bq |
phrases
</term>
at any distance in the
<term>
|
corpus
|
</term>
. The framework is flexible , allowing
|
#6400
In comparison with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, in this paper we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus. |
lr,30-2-C04-1192,bq |
for the
<term>
languages
</term>
in the
<term>
|
corpus
|
</term>
. The
<term>
wordnets
</term>
are aligned
|
#6480
The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents and supported by available aligned wordnets for the languages in the corpus. |
lr,2-3-I05-4010,bq |
in detail . The resultant
<term>
bilingual
|
corpus
|
</term>
, 10.4 M
<term>
English words
</term>
|
#8255
The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws. |
lr,19-5-J05-4003,bq |
starting with a very small
<term>
parallel
|
corpus
|
</term>
( 100,000
<term>
words
</term>
) and
|
#9088
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,3-3-P05-1034,bq |
component
</term>
. We align a
<term>
parallel
|
corpus
|
</term>
, project the
<term>
source dependency
|
#9248
We align a parallel corpus, project the source dependency parse onto the target sentence, extract dependency treelet translation pairs, and train a tree-based ordering model. |
lr,11-4-P05-1074,bq |
extracted from a
<term>
bilingual parallel
|
corpus
|
</term>
to be ranked using
<term>
translation
|
#9729
We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. |
lr,7-2-P05-2016,bq |
required is a
<term>
sentence-aligned parallel
|
corpus
|
</term>
. All other
<term>
resources
</term>
|
#9803
The only bilingual resource required is a sentence-aligned parallel corpus. |
tech,4-1-N06-4001,bq |
strategies . We introduce a new
<term>
interactive
|
corpus
|
exploration tool
</term>
called
<term>
InfoMagnets
|
#10870
We introduce a new interactive corpus exploration tool called InfoMagnets. |
lr,6-3-P06-1052,bq |
</term>
. We evaluate the algorithm on a
<term>
|
corpus
|
</term>
, and show that it reduces the degree
|
#11183
We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime. |