other,31-1-H01-1040,bq |
- can be used to enhance access to
<term>
|
text
|
collections
</term>
via a standard
<term>
text
|
#305
In this paper we show how two standard outputs from information extraction (IE) systems - named entity annotations and scenario templates - can be used to enhance access to text collections via a standard text browser. |
other,11-7-H01-1042,bq |
six extracts of
<term>
translated newswire
|
text
|
</term>
. Some of the extracts were
<term>
|
#695
Subjects were given a set of up to six extracts of translated newswire text. |
other,20-2-P01-1008,bq |
translations
</term>
of the same
<term>
source
|
text
|
</term>
. Our approach yields
<term>
phrasal
|
#1798
We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. |
other,31-1-N03-1018,bq |
progressing from generation of
<term>
true
|
text
|
</term>
through its transformation into the
|
#2699
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system. |
other,13-2-N03-2003,bq |
data
</term>
can be supplemented with
<term>
|
text
|
</term>
from the
<term>
web
</term>
filtered
|
#3041
In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams. |
other,24-1-N03-4010,bq |
answering capability
</term>
on
<term>
free
|
text
|
</term>
. The demonstration will focus on
|
#3660
The JAVELIN system integrates a flexible, planning-based architecture with a variety of language processing modules to provide an open-domain question answering capability on free text. |
lr,1-3-P03-1050,bq |
training resources
</term>
. No
<term>
parallel
|
text
|
</term>
is needed after the
<term>
training
|
#4478
No parallel text is needed after the training phase. |
tech,3-1-C04-1116,bq |
smaller and more robust . We present a
<term>
|
text
|
mining method
</term>
for finding
<term>
synonymous
|
#6095
We present a text mining method for finding synonymous expressions based on the distributional hypothesis in a set of coherent corpora. |
tech,26-3-P04-2005,bq |
Sense Disambiguation ( WSD )
</term>
and
<term>
|
Text
|
Summarisation
</term>
. Our method takes
|
#6955
Topic signatures can be useful in a number of Natural Language Processing (NLP) applications, such as Word Sense Disambiguation (WSD) and Text Summarisation. |
lr,11-4-P04-2010,bq |
<term>
pronouns
</term>
in
<term>
unannotated
|
text
|
</term>
by using a fully automatic sequence
|
#7081
Furthermore, we present a standalone system that resolves pronouns in unannotated text by using a fully automatic sequence of preprocessing modules that mimics the manual annotation process. |
other,24-4-I05-2014,bq |
systems
</term>
outputting
<term>
unsegmented
|
texts
|
</term>
with , for instance ,
<term>
statistical
|
#7771
The use of BLEU at the character level eliminates the word segmentation problem: it makes it possible to directly compare commercial systems outputting unsegmented texts with, for instance, statistical MT systems which usually segment their outputs. |
other,35-1-I05-4010,bq |
numbering system
</term>
in the
<term>
legal
|
text
|
hierarchy
</term>
. Basic methodology and
|
#8239
In this paper we present our recent work on harvesting English-Chinese bitexts of the laws of Hong Kong from the Web and aligning them to the subparagraph level via utilizing the numbering system in the legal text hierarchy. |
tech,15-2-N06-4001,bq |
researchers who are not experts in
<term>
|
text
|
mining
</term>
. As evidence of its usefulness
|
#10891
InfoMagnets aims at making exploratory corpus analysis accessible to researchers who are not experts in text mining. |
other,12-4-P06-1013,bq |
are derived automatically from
<term>
raw
|
text
|
</term>
. Experiments using the
<term>
SemCor
|
#11022
Our combination methods rely on predominant senses which are derived automatically from raw text. |
|
papers in English , many systems to run off
|
texts
|
have been developed . In this paper , we
|
#12231
In order to meet the needs of a publication of papers in English, many systems to run off texts have been developed. |
other,13-1-P82-1035,bq |
under the assumption that the input
<term>
|
text
|
</term>
will be in reasonably neat form ,
|
#12957
Most large text-understanding systems have been designed under the assumption that the input text will be in reasonably neat form, e.g., newspaper stories and other edited texts. |
tech,6-1-P84-1078,bq |
describes
<term>
Paul
</term>
, a
<term>
computer
|
text
|
generation system
</term>
designed to create
|
#13751
This report describes Paul, a computer text generation system designed to create cohesive text through the use of lexical substitutions. |
other,28-1-C86-1132,bq |
sublanguages
</term>
with
<term>
stereotyped
|
text
|
structure
</term>
.
<term>
RAREAS
</term>
draws
|
#13943
This paper describes a system (RAREAS) which synthesizes marine weather forecasts directly from formatted weather data. Such synthesis appears feasible in certain natural sublanguages with stereotyped text structure. |
other,10-2-A88-1001,bq |
heuristically-produced complete
<term>
sentences
</term>
in
<term>
|
text
|
</term>
or
<term>
text-to-speech form
</term>
|
#14892
Multimedia answers include videodisc images and heuristically-produced complete sentences in text or text-to-speech form. |
other,6-2-C88-1044,bq |
</term>
. We examine a broad range of
<term>
|
texts
|
</term>
to show how the distribution of
<term>
|
#15199
We examine a broad range of texts to show how the distribution of demonstrative forms and functions is genre dependent. |