This paper reports on two contributions to large vocabulary continuous speech recognition. First, we present a new paradigm for speaker-independent (SI) training of hidden Markov models (HMM), which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers. In addition, combination of the training speakers is done by averaging the statistics of independently trained models rather than the usual pooling of all the speech data from many speakers prior to training.
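To make the contrast concrete, here is a minimal sketch (not the authors' code) of the two combination strategies, simplified to a single Gaussian mean and variance per model; all function names and the toy data are illustrative assumptions:

```python
# Illustrative sketch only: contrasts the usual pooling of speech data
# with averaging the statistics of independently trained per-speaker
# models, reduced to one Gaussian per state for clarity.
import numpy as np

def pooled_model(per_speaker_frames):
    """Traditional SI training: pool all speakers' frames, then estimate once.
    Speakers with more data dominate the resulting statistics."""
    pooled = np.concatenate(per_speaker_frames, axis=0)
    return pooled.mean(axis=0), pooled.var(axis=0)

def averaged_model(per_speaker_frames):
    """The paper's paradigm (simplified): train a model per speaker
    independently, then average the model statistics, so each speaker
    contributes equally regardless of how much speech they provided."""
    stats = [(f.mean(axis=0), f.var(axis=0)) for f in per_speaker_frames]
    means = np.stack([m for m, _ in stats])
    varis = np.stack([v for _, v in stats])
    return means.mean(axis=0), varis.mean(axis=0)

# Toy usage: three "speakers" with unequal amounts of speech.
rng = np.random.default_rng(0)
frames = [rng.normal(loc, 1.0, size=(n, 2))
          for loc, n in [(0.0, 50), (1.0, 500), (2.0, 50)]]
print(pooled_model(frames))    # skewed toward the speaker with 500 frames
print(averaged_model(frames))  # weights the three speakers equally
```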
With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus. This performance is comparable to our best condition for this test suite, using 109 training speakers. Second, we show a significant