#21120In addition, combination of the training speakers is done by averaging the statistics of independently trained models rather than the usual pooling of all the speech data from many speakers prior to training.
other,16-7-H90-1060,ak
reference ) speaker
</term>
and the
<term>
target speaker
</term>
. Each
<term>
reference model
</term>
#21226A probabilistic spectral mapping is estimated independently for each training (reference) speaker and the target speaker.
other,10-8-H90-1060,ak
is transformed to the space of the
<term>
target speaker
</term>
and combined by averaging . Using
#21239Each reference model is transformed to the space of the target speaker and combined by averaging.
other,6-9-H90-1060,ak
40
<term>
utterances
</term>
from the
<term>
target speaker
</term>
for
<term>
adaptation
</term>
, the
<term>
#21252Using only 40 utterances from the target speaker for adaptation, the error rate dropped to 4.1% --- a 45% reduction in error compared to the SI result.
other,22-4-H90-1060,ak
a standard
<term>
grammar
</term>
and
<term>
test set
</term>
from the
<term>
DARPA Resource Management
#21151With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus.
other,10-5-H90-1060,ak
comparable to our best condition for this
<term>
test suite
</term>
, using 109
<term>
training speakers
#21170This performance is comparable to our best condition for this test suite, using 109 training speakers.
tech,33-3-H90-1060,ak
many
<term>
speakers
</term>
prior to
<term>
training
</term>
. With only 12
<term>
training speakers
#21127In addition, combination of the training speakers is done by averaging the statistics of independently trained models rather than the usual pooling of all the speech data from many speakers prior to training.
other,9-7-H90-1060,ak
is estimated independently for each
<term>
training ( reference ) speaker
</term>
and the
<term>
target speaker
</term>
#21219A probabilistic spectral mapping is estimated independently for each training (reference) speaker and the target speaker.
other,6-3-H90-1060,ak
. In addition , combination of the
<term>
training speakers
</term>
is done by averaging the statistics
#21100In addition, combination of the training speakers is done by averaging the statistics of independently trained models rather than the usual pooling of all the speech data from many speakers prior to training.
other,3-4-H90-1060,ak
<term>
training
</term>
. With only 12
<term>
training speakers
</term>
for
<term>
SI recognition
</term>
, we
#21132With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus.
other,15-5-H90-1060,ak
<term>
test suite
</term>
, using 109
<term>
training speakers
</term>
. Second , we show a significant
#21175This performance is comparable to our best condition for this test suite, using 109 training speakers.
other,3-9-H90-1060,ak
combined by averaging . Using only 40
<term>
utterances
</term>
from the
<term>
target speaker
</term>
#21249Using only 40 utterances from the target speaker for adaptation, the error rate dropped to 4.1% --- a 45% reduction in error compared to the SI result.
measure(ment),14-4-H90-1060,ak
recognition
</term>
, we achieved a 7.5 %
<term>
word error rate
</term>
on a standard
<term>
grammar
</term>
#21143With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus.