#4465The stemming model is based on statistical machine translation and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources.
lr,26-6-P03-1050,ak
</term>
,
<term>
affix lists
</term>
, and
<term>
human annotated text
</term>
, in addition to an
<term>
unsupervised
#4560Our resource-frugal approach results in 87.5% agreement with a state of the art, proprietary Arabic stemmer built using rules, affix lists, and human annotated text, in addition to an unsupervised component.
lr,1-3-P03-1050,ak
<term>
training resources
</term>
. No
<term>
parallel text
</term>
is needed after the
<term>
training
#4479No parallel text is needed after the training phase.
other,22-4-P03-1050,ak
to a desired
<term>
domain
</term>
or
<term>
genre
</term>
. Examples and results will be given
#4510Monolingual, unannotated text can be used to further improve the stemmer by allowing it to adapt to a desired domain or genre.
other,16-5-P03-1050,ak
the approach is applicable to any
<term>
language
</term>
that needs
<term>
affix removal
</term>
#4528Examples and results will be given for Arabic, but the approach is applicable to any language that needs affix removal.
lr,22-6-P03-1050,ak
</term>
built using
<term>
rules
</term>
,
<term>
affix lists
</term>
, and
<term>
human annotated text
</term>
#4556Our resource-frugal approach results in 87.5% agreement with a state of the art, proprietary Arabic stemmer built using rules, affix lists, and human annotated text, in addition to an unsupervised component.
tech,28-7-P03-1050,ak
the performance of the proprietary
<term>
stemmer
</term>
above . We approximate
<term>
Arabic
#4599Task-based evaluation using Arabic information retrieval indicates an improvement of 22-38% in average precision over unstemmed text, and 96% of the performance of the proprietary stemmer above.
lr,27-2-P03-1050,ak
parallel corpus
</term>
as its sole
<term>
training resources
</term>
. No
<term>
parallel text
</term>
is
#4475The stemming model is based on statistical machine translation and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources.