This paper reports on two contributions to large vocabulary continuous speech recognition. First, we present a new paradigm for <term>speaker-independent (SI) training</term> of <term>hidden Markov models (HMM)</term>, which uses a large amount of <term>speech</term> from a few <term>speakers</term> instead of the traditional practice of using a little speech from many <term>speakers</term>. In addition, combination of the <term>training speakers</term> is done by averaging the statistics of independently trained <term>models</term> rather than the usual <term>pooling</term> of all the <term>speech data</term> from many <term>speakers</term> prior to <term>training</term>. With only 12 <term>training speakers</term> for <term>SI recognition</term>, we achieved a 7.5% <term>word error rate</term> on a standard <term>grammar</term> and <term>test set</term> from the <term>DARPA Resource Management corpus</term>. This performance is comparable to our best condition for this <term>test suite</term>, using 109 <term>training speakers</term>. Second, we show a significant
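The contrast between the two combination strategies can be sketched numerically. The toy below is an illustration only, not the paper's HMM training procedure: it uses 1-D Gaussian statistics (mean and variance per speaker) as a stand-in for full HMM sufficient statistics, and the speaker data is synthetic. It shows why "average the statistics of independently trained models" is not the same estimator as "pool all the speech data prior to training": with equal data per speaker the means coincide, but the pooled variance additionally absorbs between-speaker spread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-speaker data: each speaker contributes 500 frames of a
# 1-D acoustic feature (a stand-in for real HMM observation vectors).
speakers = [rng.normal(loc=mu, scale=1.0, size=500) for mu in (-1.0, 0.0, 1.0)]

# Traditional approach: pool all frames from all speakers, then estimate
# a single model from the pooled data.
pooled = np.concatenate(speakers)
pooled_mean, pooled_var = pooled.mean(), pooled.var()

# Averaging approach (sketched): estimate one model per speaker
# independently, then combine speakers by averaging their statistics.
per_speaker_means = [s.mean() for s in speakers]
per_speaker_vars = [s.var() for s in speakers]
avg_mean = float(np.mean(per_speaker_means))
avg_var = float(np.mean(per_speaker_vars))

# With equal data per speaker the two means agree; the variances differ
# because pooling also captures the spread between speaker means.
print(f"means:     pooled={pooled_mean:.4f}  averaged={avg_mean:.4f}")
print(f"variances: pooled={pooled_var:.4f}  averaged={avg_var:.4f}")
```

In this toy setting the averaged variance reflects only within-speaker variability, while the pooled variance is inflated by the differences between speaker means, which is one intuition for why the two training regimes can behave differently.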