#21255Using only 40 utterances from the target speaker for adaptation, the error rate dropped to 4.1% --- a 45% reduction in error compared to the SI result.
lr-prod,26-4-H90-1060,ak
</term>
and
<term>
test set
</term>
from the
<term>
DARPA Resource Management corpus
</term>
. This performance is comparable
#21155With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus.
measure(ment),12-9-H90-1060,ak
</term>
for
<term>
adaptation
</term>
, the
<term>
error rate
</term>
dropped to 4.1 % --- a 45 % reduction
#21258Using only 40 utterances from the target speaker for adaptation, the error rate dropped to 4.1% --- a 45% reduction in error compared to the SI result.
lr,20-4-H90-1060,ak
word error rate
</term>
on a standard
<term>
grammar
</term>
and
<term>
test set
</term>
from the
<term>
#21149With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus.
model,14-2-H90-1060,ak
speaker-independent ( SI ) training
</term>
of
<term>
hidden Markov models ( HMM )
</term>
, which uses a large amount of
<term>
#21062First, we present a new paradigm for speaker-independent (SI) training of hidden Markov models ( HMM ), which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers.
tech,7-1-H90-1060,ak
paper reports on two contributions to
<term>
large vocabulary continuous speech recognition
</term>
. First , we present a new paradigm
#21042This paper reports on two contributions to large vocabulary continuous speech recognition.
model,17-3-H90-1060,ak
statistics of independently trained
<term>
models
</term>
rather than the usual
<term>
pooling
#21111In addition, combination of the training speakers is done by averaging the statistics of independently trained models rather than the usual pooling of all the speech data from many speakers prior to training.
other,26-6-H90-1060,ak
amount of
<term>
speech
</term>
from the
<term>
new ( target ) speaker
</term>
. A
<term>
probabilistic spectral mapping
#21204Second, we show a significant improvement for speaker adaptation (SA) using the new SI corpus and a small amount of speech from the new ( target ) speaker.
tech,22-3-H90-1060,ak
models
</term>
rather than the usual
<term>
pooling
</term>
of all the
<term>
speech data
</term>
#21116In addition, combination of the training speakers is done by averaging the statistics of independently trained models rather than the usual pooling of all the speech data from many speakers prior to training.
model,1-7-H90-1060,ak
<term>
new ( target ) speaker
</term>
. A
<term>
probabilistic spectral mapping
</term>
is estimated independently for each
#21211A probabilistic spectral mapping is estimated independently for each training (reference) speaker and the target speaker.
model,1-8-H90-1060,ak
the
<term>
target speaker
</term>
. Each
<term>
reference model
</term>
is transformed to the space of the
#21230Each reference model is transformed to the space of the target speaker and combined by averaging.
lr,16-6-H90-1060,ak
adaptation ( SA )
</term>
using the new
<term>
SI corpus
</term>
and a small amount of
<term>
speech
#21194Second, we show a significant improvement for speaker adaptation (SA) using the new SI corpus and a small amount of speech from the new (target) speaker.
tech,6-4-H90-1060,ak
12
<term>
training speakers
</term>
for
<term>
SI recognition
</term>
, we achieved a 7.5 %
<term>
word error
#21135With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus.
tech,8-6-H90-1060,ak
show a significant improvement for
<term>
speaker adaptation ( SA )
</term>
using the new
<term>
SI corpus
</term>
#21186Second, we show a significant improvement for speaker adaptation ( SA ) using the new SI corpus and a small amount of speech from the new (target) speaker.
tech,8-2-H90-1060,ak
First , we present a new paradigm for
<term>
speaker-independent ( SI ) training
</term>
of
<term>
hidden Markov models ( HMM
#21056First, we present a new paradigm for speaker-independent ( SI ) training of hidden Markov models (HMM), which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers.
other,31-2-H90-1060,ak
amount of
<term>
speech
</term>
from a few
<term>
speakers
</term>
instead of the traditional practice
#21079First, we present a new paradigm for speaker-independent (SI) training of hidden Markov models (HMM), which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers.
other,44-2-H90-1060,ak
of using a little speech from many
<term>
speakers
</term>
. In addition , combination of the
#21092First, we present a new paradigm for speaker-independent (SI) training of hidden Markov models (HMM), which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers.
other,30-3-H90-1060,ak
the
<term>
speech data
</term>
from many
<term>
speakers
</term>
prior to
<term>
training
</term>
. With
#21124In addition, combination of the training speakers is done by averaging the statistics of independently trained models rather than the usual pooling of all the speech data from many speakers prior to training.
lr,27-2-H90-1060,ak
</term>
, which uses a large amount of
<term>
speech
</term>
from a few
<term>
speakers
</term>
instead
#21075First, we present a new paradigm for speaker-independent (SI) training of hidden Markov models (HMM), which uses a large amount of speech from a few speakers instead of the traditional practice of using a little speech from many speakers.
lr,23-6-H90-1060,ak
corpus
</term>
and a small amount of
<term>
speech
</term>
from the
<term>
new ( target ) speaker
#21201Second, we show a significant improvement for speaker adaptation (SA) using the new SI corpus and a small amount of speech from the new (target) speaker.