tech,9-9-H90-1060,bq Using only 40 <term> utterances </term> from the <term> target speaker </term> for <term> adaptation </term> , the <term> error rate </term> dropped to 4.1 % --- a 45 % reduction in <term> error </term> compared to the <term> SI </term> result .
tech,6-4-H90-1060,bq With only 12 <term> training speakers </term> for <term> SI recognition </term> , we achieved a 7.5 % <term> word error rate </term> on a standard <term> grammar </term> and <term> test set </term> from the <term> DARPA Resource Management corpus </term> .
tech,14-2-H90-1060,bq First , we present a new paradigm for <term> speaker-independent ( SI ) training </term> of <term> hidden Markov models ( HMM ) </term> , which uses a large amount of <term> speech </term> from a few <term> speakers </term> instead of the traditional practice of using a little <term> speech </term> from many <term> speakers </term> .
other,30-6-H90-1060,bq Second , we show a significant improvement for <term> speaker adaptation ( SA ) </term> using the new <term> SI corpus </term> and a small amount of <term> speech </term> from the new ( target ) <term> speaker </term> .
other,16-7-H90-1060,bq A <term> probabilistic spectral mapping </term> is estimated independently for each <term> training ( reference ) speaker </term> and the <term> target speaker </term> .
other,7-1-H90-1060,bq This paper reports on two contributions to <term> large vocabulary continuous speech recognition </term> .
other,44-2-H90-1060,bq First , we present a new paradigm for <term> speaker-independent ( SI ) training </term> of <term> hidden Markov models ( HMM ) </term> , which uses a large amount of <term> speech </term> from a few <term> speakers </term> instead of the traditional practice of using a little <term> speech </term> from many <term> speakers </term> .
other,15-5-H90-1060,bq This <term> performance </term> is comparable to our best condition for this test suite , using 109 <term> training speakers </term> .
lr-prod,26-4-H90-1060,bq With only 12 <term> training speakers </term> for <term> SI recognition </term> , we achieved a 7.5 % <term> word error rate </term> on a standard <term> grammar </term> and <term> test set </term> from the <term> DARPA Resource Management corpus </term> .
tech,15-8-H90-1060,bq Each <term> reference model </term> is transformed to the <term> space </term> of the <term> target speaker </term> and combined by <term> averaging </term> .
tech,34-3-H90-1060,bq In addition , combination of the <term> training speakers </term> is done by averaging the <term> statistics </term> of <term> independently trained models </term> rather than the usual pooling of all the <term> speech data </term> from many <term> speakers </term> prior to <term> training </term> .
lr,16-6-H90-1060,bq Second , we show a significant improvement for <term> speaker adaptation ( SA ) </term> using the new <term> SI corpus </term> and a small amount of <term> speech </term> from the new ( target ) <term> speaker </term> .
other,10-8-H90-1060,bq Each <term> reference model </term> is transformed to the <term> space </term> of the <term> target speaker </term> and combined by <term> averaging </term> .
lr,20-4-H90-1060,bq With only 12 <term> training speakers </term> for <term> SI recognition </term> , we achieved a 7.5 % <term> word error rate </term> on a standard <term> grammar </term> and <term> test set </term> from the <term> DARPA Resource Management corpus </term> .
other,9-7-H90-1060,bq A <term> probabilistic spectral mapping </term> is estimated independently for each <term> training ( reference ) speaker </term> and the <term> target speaker </term> .
other,24-9-H90-1060,bq Using only 40 <term> utterances </term> from the <term> target speaker </term> for <term> adaptation </term> , the <term> error rate </term> dropped to 4.1 % --- a 45 % reduction in <term> error </term> compared to the <term> SI </term> result .
measure(ment),12-9-H90-1060,bq Using only 40 <term> utterances </term> from the <term> target speaker </term> for <term> adaptation </term> , the <term> error rate </term> dropped to 4.1 % --- a 45 % reduction in <term> error </term> compared to the <term> SI </term> result .
other,6-9-H90-1060,bq Using only 40 <term> utterances </term> from the <term> target speaker </term> for <term> adaptation </term> , the <term> error rate </term> dropped to 4.1 % --- a 45 % reduction in <term> error </term> compared to the <term> SI </term> result .
other,3-4-H90-1060,bq With only 12 <term> training speakers </term> for <term> SI recognition </term> , we achieved a 7.5 % <term> word error rate </term> on a standard <term> grammar </term> and <term> test set </term> from the <term> DARPA Resource Management corpus </term> .
lr,27-2-H90-1060,bq First , we present a new paradigm for <term> speaker-independent ( SI ) training </term> of <term> hidden Markov models ( HMM ) </term> , which uses a large amount of <term> speech </term> from a few <term> speakers </term> instead of the traditional practice of using a little <term> speech </term> from many <term> speakers </term> .