papers in English , many systems to run off | texts | have been developed . In this paper , we
#12231
In order to meet the needs of a publication of papers in English, many systems to run off texts have been developed.
lr,1-3-P03-1050,bq
training resources</term> . No <term>parallel | text | </term> is needed after the <term>training
#4478
No parallel text is needed after the training phase.
lr,11-4-P04-2010,bq
<term>pronouns</term> in <term>unannotated | text | </term> by using a fully automatic sequence
#7081
Furthermore, we present a standalone system that resolves pronouns in unannotated text by using a fully automatic sequence of preprocessing modules that mimics the manual annotation process.
lr-prod,15-3-H94-1014,bq
<term>word</term><term>Wall Street Journal | text | corpus</term> . Using the <term>BU recognition
#21260
The models were constructed using a 5K vocabulary and trained using a 76 million word Wall Street Journal text corpus.
other,0-1-A94-1026,bq
language translation</term> . <term>Japanese | texts | </term> frequently suffer from the <term>homophone
#20367
Japanese texts frequently suffer from the homophone errors caused by the KANA-KANJI conversion needed to input the text.
other,10-2-A88-1001,bq
heuristically-produced complete <term>sentences</term> in <term> | text | </term> or <term>text-to-speech form</term>
#14892
Multimedia answers include videodisc images and heuristically-produced complete sentences in text or text-to-speech form.
other,11-7-H01-1042,bq
six extracts of <term>translated newswire | text | </term> . Some of the extracts were <term>
#695
Subjects were given a set of up to six extracts of translated newswire text.
other,12-3-C92-4207,bq
</term> , which takes <term>natural language | texts | </term> and produces a <term>model</term> of
#18444
It is done by an experimental computer program SPRINT, which takes natural language texts and produces a model of the described world.
other,12-4-P06-1013,bq
are derived automatically from <term>raw | text | </term> . Experiments using the <term>SemCor
#11022
Our combination methods rely on predominant senses which are derived automatically from raw text.
other,13-1-P82-1035,bq
under the assumption that the input <term> | text | </term> will be in reasonably neat form ,
#12957
Most large text-understanding systems have been designed under the assumption that the input text will be in reasonably neat form, e.g., newspaper stories and other edited texts.
other,13-2-N03-2003,bq
data</term> can be supplemented with <term> | text | </term> from the <term>web</term> filtered
#3041
In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
other,2-1-C94-1026,bq
homophone errors</term> . To align <term>bilingual | texts | </term> becomes a crucial issue recently
#20535
To align bilingual texts becomes a crucial issue recently.
other,20-2-P01-1008,bq
translations</term> of the same <term>source | text | </term> . Our approach yields <term>phrasal
#1798
We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text.
other,24-1-A92-1027,bq
specific information from <term>unrestricted | texts | </term> where many of the <term>words</term>
#17568
We present an efficient algorithm for chart-based phrase structure parsing of natural language that is tailored to the problem of extracting specific information from unrestricted texts where many of the words are unknown and much of the text is irrelevant to the task.
other,24-1-N03-4010,bq
answering capability</term> on <term>free | text | </term> . The demonstration will focus on
#3660
The JAVELIN system integrates a flexible, planning-based architecture with a variety of language processing modules to provide an open-domain question answering capability on free text.
other,24-4-I05-2014,bq
systems</term> outputting <term>unsegmented | texts | </term> with , for instance , <term>statistical
#7771
The use of BLEU at the character level eliminates the word segmentation problem: it makes it possible to directly compare commercial systems outputting unsegmented texts with, for instance, statistical MT systems which usually segment their outputs.
other,28-1-C86-1132,bq
sublanguages</term> with <term>stereotyped | text | structure</term> . <term>RAREAS</term> draws
#13943
This paper describes a system (RAREAS) which synthesizes marine weather forecasts directly from formatted weather data. Such synthesis appears feasible in certain natural sublanguages with stereotyped text structure.
other,31-1-H01-1040,bq
- can be used to enhance access to <term> | text | collections</term> via a standard <term>text
#305
In this paper we show how two standard outputs from information extraction (IE) systems - named entity annotations and scenario templates - can be used to enhance access to text collections via a standard text browser.
other,31-1-N03-1018,bq
progressing from generation of <term>true | text | </term> through its transformation into the
#2699
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system.
other,35-1-I05-4010,bq
numbering system</term> in the <term>legal | text | hierarchy</term> . Basic methodology and
#8239
In this paper we present our recent work on harvesting English-Chinese bitexts of the laws of Hong Kong from the Web and aligning them to the subparagraph level via utilizing the numbering system in the legal text hierarchy.