other,31-1-H01-1040,bq |
In this paper we show how two standard outputs from
<term>
information extraction ( IE ) systems
</term>
-
<term>
named entity annotations
</term>
and
<term>
scenario templates
</term>
- can be used to enhance access to
<term>
text
collections
</term>
via a standard
<term>
text browser
</term>
.
|
#305
In this paper we show how two standard outputs from information extraction (IE) systems - named entity annotations and scenario templates - can be used to enhance access to text collections via a standard text browser. |
tech,36-1-H01-1040,bq |
In this paper we show how two standard outputs from
<term>
information extraction ( IE ) systems
</term>
-
<term>
named entity annotations
</term>
and
<term>
scenario templates
</term>
- can be used to enhance access to
<term>
text collections
</term>
via a standard
<term>
text
browser
</term>
.
|
#310
In this paper we show how two standard outputs from information extraction (IE) systems - named entity annotations and scenario templates - can be used to enhance access to text collections via a standard text browser. |
tech,38-3-H01-1040,bq |
We also report results of a preliminary ,
<term>
qualitative user evaluation
</term>
of the
<term>
system
</term>
, which while broadly positive indicates further work needs to be done on the
<term>
interface
</term>
to make
<term>
users
</term>
aware of the increased potential of
<term>
IE-enhanced
text
browsers
</term>
.
|
#383
We also report results of a preliminary, qualitative user evaluation of the system, which while broadly positive indicates further work needs to be done on the interface to make users aware of the increased potential of IE-enhanced text browsers. |
other,11-7-H01-1042,bq |
Subjects were given a set of up to six extracts of
<term>
translated newswire
text
</term>
.
|
#695
Subjects were given a set of up to six extracts of translated newswire text. |
other,20-2-P01-1008,bq |
We present an
<term>
unsupervised learning algorithm
</term>
for
<term>
identification of paraphrases
</term>
from a
<term>
corpus of multiple English translations
</term>
of the same
<term>
source
text
</term>
.
|
#1798
We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. |
other,31-1-N03-1018,bq |
In this paper , we introduce a
<term>
generative probabilistic optical character recognition ( OCR ) model
</term>
that describes an end-to-end process in the
<term>
noisy channel framework
</term>
, progressing from generation of
<term>
true
text
</term>
through its transformation into the
<term>
noisy output
</term>
of an
<term>
OCR system
</term>
.
|
#2699
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system. |
other,37-3-N03-1018,bq |
We present an implementation of the
<term>
model
</term>
based on
<term>
finite-state models
</term>
, demonstrate the
<term>
model
</term>
's ability to significantly reduce
<term>
character and word error rate
</term>
, and provide evaluation results involving
<term>
automatic extraction
</term>
of
<term>
translation lexicons
</term>
from
<term>
printed
text
</term>
.
|
#2782
We present an implementation of the model based on finite-state models, demonstrate the model's ability to significantly reduce character and word error rate, and provide evaluation results involving automatic extraction of translation lexicons from printed text. |
other,13-2-N03-2003,bq |
In this paper , we show how
<term>
training data
</term>
can be supplemented with
<term>
text
</term>
from the
<term>
web
</term>
filtered to match the
<term>
style
</term>
and/or
<term>
topic
</term>
of the target
<term>
recognition task
</term>
, but also that it is possible to get bigger performance gains from the
<term>
data
</term>
by using
<term>
class-dependent interpolation
</term>
of
<term>
N-grams
</term>
.
|
#3041
In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams. |
other,24-1-N03-4010,bq |
The
<term>
JAVELIN system
</term>
integrates a flexible ,
<term>
planning-based architecture
</term>
with a variety of
<term>
language processing modules
</term>
to provide an
<term>
open-domain question answering capability
</term>
on
<term>
free
text
</term>
.
|
#3660
The JAVELIN system integrates a flexible, planning-based architecture with a variety of language processing modules to provide an open-domain question answering capability on free text. |
lr,19-2-N03-4010,bq |
The demonstration will focus on how
<term>
JAVELIN
</term>
processes
<term>
questions
</term>
and retrieves the most likely
<term>
answer candidates
</term>
from the given
<term>
text
corpus
</term>
.
|
#3681
The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus. |
lr,1-3-P03-1050,bq |
No
<term>
parallel
text
</term>
is needed after the
<term>
training phase
</term>
.
|
#4478
No parallel text is needed after the training phase. |
lr,0-4-P03-1050,bq |
<term>
Monolingual , unannotated
text
</term>
can be used to further improve the
<term>
stemmer
</term>
by allowing it to adapt to a desired
<term>
domain
</term>
or
<term>
genre
</term>
.
|
#4489
Monolingual, unannotated text can be used to further improve the stemmer by allowing it to adapt to a desired domain or genre. |
lr,26-6-P03-1050,bq |
Our
<term>
resource-frugal approach
</term>
results in 87.5 %
<term>
agreement
</term>
with a state of the art , proprietary
<term>
Arabic stemmer
</term>
built using
<term>
rules
</term>
,
<term>
affix lists
</term>
, and
<term>
human annotated
text
</term>
, in addition to an
<term>
unsupervised component
</term>
.
|
#4560
Our resource-frugal approach results in 87.5% agreement with a state of the art, proprietary Arabic stemmer built using rules, affix lists, and human annotated text, in addition to an unsupervised component. |
other,16-7-P03-1050,bq |
<term>
Task-based evaluation
</term>
using
<term>
Arabic information retrieval
</term>
indicates an improvement of 22-38 % in
<term>
average precision
</term>
over
<term>
unstemmed
text
</term>
, and 96 % of the performance of the proprietary
<term>
stemmer
</term>
above .
|
#4586
Task-based evaluation using Arabic information retrieval indicates an improvement of 22-38% in average precision over unstemmed text, and 96% of the performance of the proprietary stemmer above. |
tech,3-1-C04-1116,bq |
We present a
<term>
text
mining method
</term>
for finding
<term>
synonymous expressions
</term>
based on the
<term>
distributional hypothesis
</term>
in a set of coherent
<term>
corpora
</term>
.
|
#6095
We present a text mining method for finding synonymous expressions based on the distributional hypothesis in a set of coherent corpora. |
|
This paper proposes a new methodology to improve the
<term>
accuracy
</term>
of a
<term>
term aggregation system
</term>
using each author 's
text
as a coherent
<term>
corpus
</term>
.
|
#6133
This paper proposes a new methodology to improve the accuracy of a term aggregation system using each author's text as a coherent corpus. |
other,26-4-P04-2005,bq |
Our method takes advantage of the different way in which
<term>
word senses
</term>
are lexicalised in
<term>
English
</term>
and
<term>
Chinese
</term>
, and also exploits the large amount of
<term>
Chinese
text
</term>
available in
<term>
corpora
</term>
and on the
<term>
Web
</term>
.
|
#6985
Our method takes advantage of the different way in which word senses are lexicalised in English and Chinese, and also exploits the large amount of Chinese text available in corpora and on the Web. |
lr,11-4-P04-2010,bq |
Furthermore , we present a standalone system that resolves
<term>
pronouns
</term>
in
<term>
unannotated
text
</term>
by using a fully automatic sequence of
<term>
preprocessing modules
</term>
that mimics the manual
<term>
annotation process
</term>
.
|
#7081
Furthermore, we present a standalone system that resolves pronouns in unannotated text by using a fully automatic sequence of preprocessing modules that mimics the manual annotation process. |
tech,24-5-P04-2010,bq |
Although the system performs well within a limited textual domain , further research is needed to make it effective for
<term>
open-domain question answering
</term>
and
<term>
text
summarisation
</term>
.
|
#7122
Although the system performs well within a limited textual domain, further research is needed to make it effective for open-domain question answering and text summarisation. |
other,35-1-I05-4010,bq |
In this paper we present our recent work on harvesting
<term>
English-Chinese bitexts
</term>
of the laws of Hong Kong from the
<term>
Web
</term>
and aligning them to the
<term>
subparagraph
</term>
level via utilizing the
<term>
numbering system
</term>
in the
<term>
legal
text
hierarchy
</term>
.
|
#8239
In this paper we present our recent work on harvesting English-Chinese bitexts of the laws of Hong Kong from the Web and aligning them to the subparagraph level via utilizing the numbering system in the legal text hierarchy. |