This paper reports on two contributions to
<term>
large vocabulary continuous speech recognition
</term>
.
First , we present a new paradigm for
<term>
speaker-independent ( SI ) training
</term>
of
<term>
hidden Markov models ( HMM )
</term>
, which uses a large amount of
<term>
speech
</term>
from a few
<term>
speakers
</term>
instead of the traditional practice of using a little speech from many
<term>
speakers
</term>
.
In addition , combination of the
<term>
training speakers
</term>
is done by averaging the statistics of independently trained
<term>
models
</term>
rather than the usual
<term>
pooling
</term>
of all the
<term>
speech data
</term>
from many
<term>
speakers
</term>
prior to
<term>
training
</term>
.
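The averaging-versus-pooling idea above can be sketched numerically. This is a minimal illustration, not the paper's estimator: the per-speaker "models" here are just hypothetical mean statistics for one HMM state, estimated from synthetic feature frames.

```python
import numpy as np

# Hypothetical data: 3 speakers, each contributing 50 frames of
# 3-dimensional features for one HMM state.
rng = np.random.default_rng(0)
speakers = [rng.normal(loc=mu, scale=1.0, size=(50, 3))
            for mu in (0.0, 0.5, 1.0)]

# Traditional practice: pool all speech data first, then estimate once.
pooled = np.concatenate(speakers)
pooled_mean = pooled.mean(axis=0)

# Paradigm described in the abstract: train each speaker's statistics
# independently, then average the resulting statistics.
per_speaker_means = np.stack([s.mean(axis=0) for s in speakers])
averaged_mean = per_speaker_means.mean(axis=0)
```

With equal frame counts per speaker the two estimates coincide; they diverge when speakers contribute unequal amounts of speech, which is where averaging independently trained models weights each speaker equally rather than each frame.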
With only 12
<term>
training speakers
</term>
for
<term>
SI recognition
</term>
, we achieved a 7.5 %
<term>
word error rate
</term>
on a standard
<term>
grammar
</term>
and
<term>
test set
</term>
from the
<term>
DARPA Resource Management corpus
</term>
.
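The word error rate quoted above is conventionally computed as the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. A minimal sketch, not the scoring tool used in the paper:

```python
def word_error_rate(ref: str, hyp: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j].
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)
```

For example, `word_error_rate("show all ships", "show ships")` counts one deletion against three reference words, i.e. one third.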
This performance is comparable to our best condition for this
<term>
test suite
</term>
, using 109
<term>
training speakers
</term>
.