Using only 40
<term>
utterances
</term>
from the
<term>
target speaker
</term>
for
<term>
adaptation
</term>
, the
<term>
error rate
</term>
dropped to 4.1 % --- a 45 % reduction in error compared to the SI result .
#21255Using only 40 utterances from the target speaker for adaptation, the error rate dropped to 4.1% --- a 45% reduction in error compared to the SI result.
other,10-5-H90-1060,ak
This performance is comparable to our best condition for this
<term>
test suite
</term>
, using 109
<term>
training speakers
</term>
.
#21170This performance is comparable to our best condition for this test suite, using 109 training speakers.
tech,6-4-H90-1060,ak
With only 12
<term>
training speakers
</term>
for
<term>
SI recognition
</term>
, we achieved a 7.5 %
<term>
word error rate
</term>
on a standard
<term>
grammar
</term>
and
<term>
test set
</term>
from the
<term>
DARPA Resource Management corpus
</term>
.
#21135With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus.
model,14-2-H90-1060,ak
First , we present a new paradigm for
<term>
speaker-independent ( SI ) training
</term>
of
<term>
hidden Markov models ( HMM )
</term>
, which uses a large amount of
<term>
speech
</term>
from a few
<term>
speakers
</term>
instead of the traditional practice of using a little speech from many
<term>
speakers
</term>
.
#21062First, we present a new paradigm for speaker-independent (SI) training of hidden Markov models ( HMM ), which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers.
other,26-6-H90-1060,ak
Second , we show a significant improvement for
<term>
speaker adaptation ( SA )
</term>
using the new
<term>
SI corpus
</term>
and a small amount of
<term>
speech
</term>
from the
<term>
new ( target ) speaker
</term>
.
#21204Second, we show a significant improvement for speaker adaptation (SA) using the new SI corpus and a small amount of speech from the new ( target ) speaker.
other,16-7-H90-1060,ak
A
<term>
probabilistic spectral mapping
</term>
is estimated independently for each
<term>
training ( reference ) speaker
</term>
and the
<term>
target speaker
</term>
.
#21226A probabilistic spectral mapping is estimated independently for each training (reference) speaker and the target speaker.
tech,7-1-H90-1060,ak
This paper reports on two contributions to
<term>
large vocabulary continuous speech recognition
</term>
.
#21042This paper reports on two contributions to large vocabulary continuous speech recognition.
other,44-2-H90-1060,ak
First , we present a new paradigm for
<term>
speaker-independent ( SI ) training
</term>
of
<term>
hidden Markov models ( HMM )
</term>
, which uses a large amount of
<term>
speech
</term>
from a few
<term>
speakers
</term>
instead of the traditional practice of using a little speech from many
<term>
speakers
</term>
.
#21092First, we present a new paradigm for speaker-independent (SI) training of hidden Markov models (HMM), which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers.
other,15-5-H90-1060,ak
This performance is comparable to our best condition for this
<term>
test suite
</term>
, using 109
<term>
training speakers
</term>
.
#21175This performance is comparable to our best condition for this test suite, using 109 training speakers.
lr-prod,26-4-H90-1060,ak
With only 12
<term>
training speakers
</term>
for
<term>
SI recognition
</term>
, we achieved a 7.5 %
<term>
word error rate
</term>
on a standard
<term>
grammar
</term>
and
<term>
test set
</term>
from the
<term>
DARPA Resource Management corpus
</term>
.
#21155With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus.
tech,33-3-H90-1060,ak
In addition , combination of the
<term>
training speakers
</term>
is done by averaging the statistics of independently trained
<term>
models
</term>
rather than the usual
<term>
pooling
</term>
of all the
<term>
speech data
</term>
from many
<term>
speakers
</term>
prior to
<term>
training
</term>
.
#21127In addition, combination of the training speakers is done by averaging the statistics of independently trained models rather than the usual pooling of all the speech data from many speakers prior to training.
lr,16-6-H90-1060,ak
Second , we show a significant improvement for
<term>
speaker adaptation ( SA )
</term>
using the new
<term>
SI corpus
</term>
and a small amount of
<term>
speech
</term>
from the
<term>
new ( target ) speaker
</term>
.
#21194Second, we show a significant improvement for speaker adaptation (SA) using the new SI corpus and a small amount of speech from the new (target) speaker.
other,10-8-H90-1060,ak
Each
<term>
reference model
</term>
is transformed to the space of the
<term>
target speaker
</term>
and combined by averaging .
#21239Each reference model is transformed to the space of the target speaker and combined by averaging.
lr,20-4-H90-1060,ak
With only 12
<term>
training speakers
</term>
for
<term>
SI recognition
</term>
, we achieved a 7.5 %
<term>
word error rate
</term>
on a standard
<term>
grammar
</term>
and
<term>
test set
</term>
from the
<term>
DARPA Resource Management corpus
</term>
.
#21149With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus.
other,9-7-H90-1060,ak
A
<term>
probabilistic spectral mapping
</term>
is estimated independently for each
<term>
training ( reference ) speaker
</term>
and the
<term>
target speaker
</term>
.
#21219A probabilistic spectral mapping is estimated independently for each training ( reference ) speaker and the target speaker.
measure(ment),12-9-H90-1060,ak
Using only 40
<term>
utterances
</term>
from the
<term>
target speaker
</term>
for
<term>
adaptation
</term>
, the
<term>
error rate
</term>
dropped to 4.1 % --- a 45 % reduction in error compared to the SI result .
#21258Using only 40 utterances from the target speaker for adaptation, the error rate dropped to 4.1% --- a 45% reduction in error compared to the SI result.
other,6-9-H90-1060,ak
Using only 40
<term>
utterances
</term>
from the
<term>
target speaker
</term>
for
<term>
adaptation
</term>
, the
<term>
error rate
</term>
dropped to 4.1 % --- a 45 % reduction in error compared to the SI result .
#21252Using only 40 utterances from the target speaker for adaptation, the error rate dropped to 4.1% --- a 45% reduction in error compared to the SI result.
other,3-4-H90-1060,ak
With only 12
<term>
training speakers
</term>
for
<term>
SI recognition
</term>
, we achieved a 7.5 %
<term>
word error rate
</term>
on a standard
<term>
grammar
</term>
and
<term>
test set
</term>
from the
<term>
DARPA Resource Management corpus
</term>
.
#21132With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus.
lr,27-2-H90-1060,ak
First , we present a new paradigm for
<term>
speaker-independent ( SI ) training
</term>
of
<term>
hidden Markov models ( HMM )
</term>
, which uses a large amount of
<term>
speech
</term>
from a few
<term>
speakers
</term>
instead of the traditional practice of using a little speech from many
<term>
speakers
</term>
.
#21075First, we present a new paradigm for speaker-independent (SI) training of hidden Markov models (HMM), which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers.
lr,26-3-H90-1060,ak
In addition , combination of the
<term>
training speakers
</term>
is done by averaging the statistics of independently trained
<term>
models
</term>
rather than the usual
<term>
pooling
</term>
of all the
<term>
speech data
</term>
from many
<term>
speakers
</term>
prior to
<term>
training
</term>
.
#21120In addition, combination of the training speakers is done by averaging the statistics of independently trained models rather than the usual pooling of all the speech data from many speakers prior to training.