Term annotation contexts for document H90-1060, listed in order of occurrence in the source text:

tech,7-1-H90-1060,ak paper reports on two contributions to <term> large vocabulary continuous speech recognition </term> . First , we present a new paradigm
tech,8-2-H90-1060,ak First , we present a new paradigm for <term> speaker-independent ( SI ) training </term> of <term> hidden Markov models ( HMM
model,14-2-H90-1060,ak speaker-independent ( SI ) training </term> of <term> hidden Markov models ( HMM ) </term> , which uses a large amount of <term>
lr,27-2-H90-1060,ak </term> , which uses a large amount of <term> speech </term> from a few <term> speakers </term> instead
other,31-2-H90-1060,ak amount of <term> speech </term> from a few <term> speakers </term> instead of the traditional practice
other,44-2-H90-1060,ak of using a little speech from many <term> speakers </term> . In addition , combination of the
model,17-3-H90-1060,ak statistics of independently trained <term> models </term> rather than the usual <term> pooling
tech,22-3-H90-1060,ak models </term> rather than the usual <term> pooling </term> of all the <term> speech data </term>
other,30-3-H90-1060,ak the <term> speech data </term> from many <term> speakers </term> prior to <term> training </term> . With
tech,6-4-H90-1060,ak 12 <term> training speakers </term> for <term> SI recognition </term> , we achieved a 7.5 % <term> word error
lr,20-4-H90-1060,ak word error rate </term> on a standard <term> grammar </term> and <term> test set </term> from the <term>
lr-prod,26-4-H90-1060,ak </term> and <term> test set </term> from the <term> DARPA Resource Management corpus </term> . This performance is comparable
tech,8-6-H90-1060,ak show a significant improvement for <term> speaker adaptation ( SA ) </term> using the new <term> SI corpus </term>
lr,16-6-H90-1060,ak adaptation ( SA ) </term> using the new <term> SI corpus </term> and a small amount of <term> speech
lr,23-6-H90-1060,ak corpus </term> and a small amount of <term> speech </term> from the <term> new ( target ) speaker
other,26-6-H90-1060,ak amount of <term> speech </term> from the <term> new ( target ) speaker </term> . A <term> probabilistic spectral mapping
model,1-7-H90-1060,ak <term> new ( target ) speaker </term> . A <term> probabilistic spectral mapping </term> is estimated independently for each
model,1-8-H90-1060,ak the <term> target speaker </term> . Each <term> reference model </term> is transformed to the space of the
tech,9-9-H90-1060,ak the <term> target speaker </term> for <term> adaptation </term> , the <term> error rate </term> dropped
measure(ment),12-9-H90-1060,ak </term> for <term> adaptation </term> , the <term> error rate </term> dropped to 4.1 % --- a 45 % reduction
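The contexts above describe an SI training paradigm that combines the statistics of independently trained models by averaging, rather than pooling all the speech data from many speakers prior to training. As a rough illustration of that combination step (not the paper's exact estimator), the sketch below merges one HMM state's Gaussian output density across speaker-dependent models by moment matching; the function name, the diagonal-covariance form, and the optional per-speaker weights are assumptions.

```python
import numpy as np

def combine_si_state(means, variances, weights=None):
    """Merge the Gaussian output density of one HMM state across
    independently trained speaker-dependent models (moment matching),
    instead of re-training on the pooled speech data.

    means, variances: arrays of shape (n_speakers, feature_dim)
    weights: optional per-speaker weights (e.g. amount of training speech)
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if weights is None:
        weights = np.full(len(means), 1.0 / len(means))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    # Combined first moment: weighted average of the speaker means.
    mean = weights @ means
    # Combined second moment, then subtract the squared combined mean
    # (diagonal covariance assumed throughout).
    second_moment = weights @ (variances + means ** 2)
    variance = second_moment - mean ** 2
    return mean, variance
```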
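For speaker adaptation, the contexts name a probabilistic spectral mapping that is estimated independently for each reference speaker, after which each reference model is transformed to the space of the target speaker and the results are combined. Below is a minimal sketch of that transform-then-average pattern, assuming discrete HMM output distributions over K spectral codewords and row-stochastic mapping matrices; the representation and the function name are assumptions, not details given in the contexts.

```python
import numpy as np

def adapt_state_distribution(ref_outputs, spectral_maps, weights=None):
    """Push each reference speaker's discrete HMM output distribution
    (over K spectral codewords) through that speaker's probabilistic
    spectral mapping into the target speaker's space, then average.

    ref_outputs:   (n_ref, K)    per-speaker output distributions
    spectral_maps: (n_ref, K, K) row-stochastic maps, reference -> target
    """
    ref_outputs = np.asarray(ref_outputs, dtype=float)
    spectral_maps = np.asarray(spectral_maps, dtype=float)
    n_ref = len(ref_outputs)
    if weights is None:
        weights = np.full(n_ref, 1.0 / n_ref)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    # P(target codeword | state) = sum_k P(ref codeword k | state)
    #                              * P(target codeword | ref codeword k)
    transformed = np.einsum('rk,rkt->rt', ref_outputs, spectral_maps)
    # Combine the transformed reference models by (weighted) averaging.
    return weights @ transformed
```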
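The contexts also quote a 7.5 % word error rate for SI recognition and an error rate of 4.1 % after adaptation, described as a 45 % reduction. As a reference for how those figures relate, here is a minimal sketch of word error rate (standard Levenshtein word alignment) and the relative error reduction; the only numbers used are the ones quoted above.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# Relative error reduction implied by the quoted figures:
si_wer, sa_wer = 0.075, 0.041            # 7.5 % SI, 4.1 % after adaptation
reduction = (si_wer - sa_wer) / si_wer   # ~0.45, i.e. the reported 45 %
```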