lr,11-2-P03-1058,bq |
automatically acquire
<term>
sense-tagged training
|
data
|
</term>
from
<term>
English-Chinese parallel
|
#4834
In this paper, we evaluate an approach to automatically acquire sense-tagged training data from English-Chinese parallel corpora, which are then used for disambiguating the nouns in the SENSEVAL-2 English lexical sample task. |
other,16-2-P05-1032,bq |
translations
</term>
in our
<term>
suffix array-based
|
data
|
structure
</term>
. We show how
<term>
sampling
|
#9175
We detail the computational complexity and average retrieval times for looking up phrase translations in our suffix array-based data structure. |
lr,2-1-N03-2003,bq |
</term>
result . Sources of
<term>
training
|
data
|
</term>
suitable for
<term>
language modeling
|
#3017
Sources of training data suitable for language modeling of conversational speech are limited. |
lr,6-3-J05-4003,bq |
approach
</term>
, we extract
<term>
parallel
|
data
|
</term>
from large
<term>
Chinese , Arabic
|
#9032
Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. |
other,13-1-P05-1067,bq |
statistical models
</term>
to
<term>
structured
|
data
|
</term>
. In this paper , we present a
<term>
|
#9422
Syntax-based statistical machine translation (MT) aims at applying statistical models to structured data. |
other,7-1-P05-1032,bq |
In this paper we describe a novel
<term>
|
data
|
structure
</term>
for
<term>
phrase-based statistical
|
#9128
In this paper we describe a novel data structure for phrase-based statistical machine translation which allows for the retrieval of arbitrarily long phrases while simultaneously using less memory than is required by current decoder implementations. |
lr,7-4-N03-1012,bq |
<term>
system
</term>
against the
<term>
annotated
|
data
|
</term>
shows that , it successfully classifies
|
#2509
An evaluation of our system against the annotated data shows that, it successfully classifies 73.2% in a German corpus of 2.284 SRHs as either coherent or incoherent (given a baseline of 54.55%). |
other,8-1-N04-4028,bq |
structured databases
</term>
from
<term>
unstructured
|
data
|
sources
</term>
, such as the
<term>
Web
</term>
|
#6763
Information extraction techniques automatically create structured databases from unstructured data sources, such as the Web or newswire documents. |
other,15-1-P03-1009,bq |
classes
</term>
from undisambiguated
<term>
corpus
|
data
|
</term>
. We describe a new approach which
|
#3900
Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. |
other,30-4-P03-1009,bq |
classifying
</term><term>
undisambiguated SCF
|
data
|
</term>
. We apply a
<term>
decision tree based
|
#3971
A novel evaluation scheme is proposed which accounts for the effect of polysemy on the clusters, offering us a good insight into the potential and limitations of semantically classifying undisambiguated SCF data. |
other,5-2-C92-1055,bq |
the problem of
<term>
insufficient training
|
data
|
</term>
and
<term>
approximation error
</term>
|
#17828
Owing to the problem of insufficient training data and approximation error introduced by the language model, traditional statistical approaches, which resolve ambiguities by indirectly and implicitly using maximum likelihood method, fail to achieve high performance in real applications. |
tech,9-1-H01-1049,bq |
paradigm for
<term>
human interaction with
|
data
|
sources
</term>
. We integrate a
<term>
spoken
|
#792
Listen-Communicate-Show (LCS) is a new paradigm for human interaction with data sources. |
lr,9-1-H05-2007,bq |
<term>
patterns
</term>
in
<term>
translation
|
data
|
</term>
using
<term>
part-of-speech tag sequences
|
#7638
We describe a method for identifying systematic patterns in translation data using part-of-speech tag sequences. |
|
Information System ) domain
</term>
. This
|
data
|
collection effort has been co-ordinated
|
#18545
This data collection effort has been co-ordinated by MADCOW (Multi-site ATIS Data COllection Working group). |
|
ability to spend their time finding more
|
data
|
relevant to their task , and gives them
|
#3615
It gives users the ability to spend their time finding more data relevant to their task, and gives them translingual reach into other languages by leveraging human language technology. |
other,13-1-P03-1005,bq |
</term>
for
<term>
structured natural language
|
data
|
</term>
. The
<term>
HDAG Kernel
</term>
directly
|
#3805
This paper proposes the Hierarchical Directed Acyclic Graph (HDAG) Kernel for structured natural language data. |
other,15-1-C86-1132,bq |
forecasts directly from
<term>
formatted weather
|
data
|
</term>
. Such
<term>
synthesis
</term>
appears
|
#13931
This paper describes a system (RAREAS) which synthesizes marine weather forecasts directly from formatted weather data. Such synthesis appears feasible in certain natural sublanguages with stereotyped text structure. |
lr,8-6-N01-1003,bq |
automatically learned from
<term>
training
|
data
|
</term>
. We show that the trained
<term>
SPR
|
#1431
The SPR uses ranking rules automatically learned from training data. |
other,34-2-P01-1047,bq |
learning algorithm
</term>
from
<term>
structured
|
data
|
</term>
( based on a
<term>
typing-algorithm
|
#1981
Our logical definition leads to a neat relation to categorial grammar, (yielding a treatment of Montague semantics), a parsing-as-deduction in a resource sensitive logic, and a learning algorithm from structured data (based on a typing-algorithm and type-unification). |
lr,15-1-N03-1001,bq |
manual transcription
</term>
of
<term>
training
|
data
|
</term>
. The method combines
<term>
domain
|
#2221
This paper describes a method for utterance classification that does not require manual transcription of training data. |