lr,2-3-I05-4010,bq |
The resultant
<term>
bilingual
corpus
</term>
, 10.4 M
<term>
English words
</term>
and 18.3 M
<term>
Chinese characters
</term>
, is an authoritative and comprehensive
<term>
text collection
</term>
covering the specific and special domain of HK laws .
|
#8255
The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws. |
lr-prod,17-4-H92-1074,bq |
This paper presents an overview of the
<term>
CSR corpus
</term>
, reviews the definition and development of the
<term>
CSR pilot
corpus
</term>
, and examines the dynamic challenge of extending the
<term>
CSR corpus
</term>
to meet future needs .
|
#19620
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,21-5-P03-1051,bq |
To improve the
<term>
segmentation
</term><term>
accuracy
</term>
, we use an
<term>
unsupervised algorithm
</term>
for automatically acquiring new
<term>
stems
</term>
from a 155 million
<term>
word
</term><term>
unsegmented
corpus
</term>
, and re-estimate the
<term>
model parameters
</term>
with the expanded
<term>
vocabulary
</term>
and
<term>
training corpus
</term>
.
|
#4728
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,6-3-P06-1052,bq |
We evaluate the algorithm on a
<term>
corpus
</term>
, and show that it reduces the degree of
<term>
ambiguity
</term>
significantly while taking negligible runtime .
|
#11183
We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime. |
lr,3-3-P05-1034,bq |
We align a
<term>
parallel
corpus
</term>
, project the
<term>
source dependency parse
</term>
onto the target
<term>
sentence
</term>
, extract
<term>
dependency treelet translation pairs
</term>
, and train a
<term>
tree-based ordering model
</term>
.
|
#9248
We align a parallel corpus, project the source dependency parse onto the target sentence, extract dependency treelet translation pairs, and train a tree-based ordering model. |
lr-prod,7-4-H92-1074,bq |
This paper presents an overview of the
<term>
CSR
corpus
</term>
, reviews the definition and development of the
<term>
CSR pilot corpus
</term>
, and examines the dynamic challenge of extending the
<term>
CSR corpus
</term>
to meet future needs .
|
#19609
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,13-1-N03-2006,bq |
In order to boost the
<term>
translation quality
</term>
of
<term>
EBMT
</term>
based on a small-sized
<term>
bilingual
corpus
</term>
, we use an out-of-domain
<term>
bilingual corpus
</term>
and , in addition , the
<term>
language model
</term>
of an in-domain
<term>
monolingual corpus
</term>
.
|
#3093
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,7-2-P05-2016,bq |
The only
<term>
bilingual resource
</term>
required is a
<term>
sentence-aligned parallel
corpus
</term>
.
|
#9803
The only bilingual resource required is a sentence-aligned parallel corpus. |
lr,29-2-C88-2130,bq |
The
<term>
model
</term>
is embodied in a program ,
<term>
APT
</term>
, that can reproduce segments of actual tape-recorded descriptions , using
<term>
organizational and discourse strategies
</term>
derived through analysis of our
<term>
corpus
</term>
.
|
#15495
The model is embodied in a program, APT, that can reproduce segments of actual tape-recorded descriptions, using organizational and discourse strategies derived through analysis of our corpus. |
lr,23-2-C04-1116,bq |
This paper proposes a new methodology to improve the
<term>
accuracy
</term>
of a
<term>
term aggregation system
</term>
using each author 's text as a coherent
<term>
corpus
</term>
.
|
#6137
This paper proposes a new methodology to improve the accuracy of a term aggregation system using each author's text as a coherent corpus. |
lr,28-2-P03-1051,bq |
Our method is seeded by a small
<term>
manually segmented Arabic corpus
</term>
and uses it to bootstrap an
<term>
unsupervised algorithm
</term>
to build the
<term>
Arabic word segmenter
</term>
from a large
<term>
unsegmented Arabic
corpus
</term>
.
|
#4668
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,50-3-C04-1147,bq |
In comparison with previous
<term>
models
</term>
, which either use arbitrary
<term>
windows
</term>
to compute
<term>
similarity
</term>
between
<term>
words
</term>
or use
<term>
lexical affinity
</term>
to create
<term>
sequential models
</term>
, in this paper we focus on
<term>
models
</term>
intended to capture the
<term>
co-occurrence patterns
</term>
of any pair of
<term>
words
</term>
or
<term>
phrases
</term>
at any distance in the
<term>
corpus
</term>
.
|
#6400
In comparison with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, in this paper we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus. |
lr,19-2-N03-4010,bq |
The demonstration will focus on how
<term>
JAVELIN
</term>
processes
<term>
questions
</term>
and retrieves the most likely
<term>
answer candidates
</term>
from the given
<term>
text
corpus
</term>
.
|
#3682
The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus. |
lr,34-5-P03-1051,bq |
To improve the
<term>
segmentation
</term><term>
accuracy
</term>
, we use an
<term>
unsupervised algorithm
</term>
for automatically acquiring new
<term>
stems
</term>
from a 155 million
<term>
word
</term><term>
unsegmented corpus
</term>
, and re-estimate the
<term>
model parameters
</term>
with the expanded
<term>
vocabulary
</term>
and
<term>
training
corpus
</term>
.
|
#4741
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,19-5-C90-3063,bq |
An experiment was performed to resolve
<term>
references
</term>
of the
<term>
pronoun
</term><term>
it
</term>
in
<term>
sentences
</term>
that were randomly selected from the
<term>
corpus
</term>
.
|
#16689
An experiment was performed to resolve references of the pronoun it in sentences that were randomly selected from the corpus. |
lr,30-2-C04-1192,bq |
The method exploits recent advances in
<term>
word alignment
</term>
and
<term>
word clustering
</term>
based on
<term>
automatic extraction of translation equivalents
</term>
and being supported by available
<term>
aligned wordnets
</term>
for the
<term>
languages
</term>
in the
<term>
corpus
</term>
.
|
#6480
The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents and being supported by available aligned wordnets for the languages in the corpus. |
lr-prod,26-4-H90-1060,bq |
With only 12
<term>
training speakers
</term>
for
<term>
SI recognition
</term>
, we achieved a 7.5 %
<term>
word error rate
</term>
on a standard
<term>
grammar
</term>
and
<term>
test set
</term>
from the
<term>
DARPA Resource Management
corpus
</term>
.
|
#17099
With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus. |
lr,29-5-J05-4003,bq |
We also show that a good-quality
<term>
MT system
</term>
can be built from scratch by starting with a very small
<term>
parallel corpus
</term>
( 100,000
<term>
words
</term>
) and exploiting a large
<term>
non-parallel
corpus
</term>
.
|
#9098
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,15-2-C90-3063,bq |
This paper presents an
<term>
automatic scheme
</term>
for collecting
<term>
statistics
</term>
on
<term>
co-occurrence patterns
</term>
in a large
<term>
corpus
</term>
.
|
#16631
This paper presents an automatic scheme for collecting statistics on co-occurrence patterns in a large corpus. |
lr-prod,15-3-H94-1014,bq |
The models were constructed using a 5K
<term>
vocabulary
</term>
and trained using a 76 million
<term>
word
</term><term>
Wall Street Journal text
corpus
</term>
.
|
#21261
The models were constructed using a 5K vocabulary and trained using a 76 million word Wall Street Journal text corpus. |