Non-Negative Intermediate-Layer DNN Adaptation for a 10-KB Speaker Adaptation Profile

Chaojun Liu¹, Kshitiz Kumar¹, Yifan Gong²

  • ¹Microsoft
  • ²Microsoft Corporation

Details

13:30 - 15:30 | Tue 22 Mar | Poster Area H | SP-P1.4

Session: Acoustic Model Adaptation for Speech Recognition I

Abstract

Previously we demonstrated that speaker adaptation of acoustic models (AMs) can significantly improve the accuracy of large-scale speech recognition systems. In this work we discuss the numerous challenges in scaling speaker adaptation to millions of speakers, where the size of the speaker-dependent (SD) parameters is a critical constraint. We then formulate an intermediate-layer adaptation framework, upon which we build a non-negative adaptation scheme that yields a very sparse set of non-negative SD parameters. We further improve this scheme by (a) applying non-negative adaptation with a small-positive threshold, and (b) setting small-positive weights in an already trained non-negative model to zero. We also discuss effective methods for storing the non-negative SD parameters. We show that our methods reduce the SD parameters from 86 KB for our previous best adaptation approach to 8.8 KB, an approximately 90% relative reduction in the size of the SD parameters, while still retaining a 10+% word-error-rate-relative (WERR) gain over the baseline speaker-independent (SI) model.
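
The abstract's key mechanisms, non-negativity, a small-positive threshold, and zeroing of small trained weights, can be sketched compactly. The NumPy snippet below shows one plausible projected-gradient update for a non-negative SD transform inserted at an intermediate layer; the layer placement, learning rate lr, threshold tau, and the near-identity initialization are illustrative assumptions, not values or details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapt_step(S, grad, lr=0.1, tau=1e-3):
    """One projected-gradient update for a non-negative SD matrix S.

    After the gradient step, S is projected onto the non-negative
    orthant, and entries below the small-positive threshold tau are
    set to zero, which encourages a sparse adaptation profile.
    (lr and tau are illustrative values, not from the paper.)
    """
    S = S - lr * grad                 # ordinary gradient step
    S = np.maximum(S, 0.0)            # non-negativity projection
    S[S < tau] = 0.0                  # small-positive threshold -> zero
    return S

# Hypothetical SD transform at an intermediate layer: the adapted
# hidden activation is h_adapted = S @ h, with S initialized near the
# identity so that S == I recovers the speaker-independent model.
dim = 8
S = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
grad = rng.standard_normal((dim, dim))   # stand-in for a backprop gradient
S = adapt_step(S, grad)
print(f"nonzero SD parameters: {np.count_nonzero(S)} / {S.size}")
```

Clamping to zero after each step keeps the profile both non-negative and sparse, which is what makes a compact on-disk representation possible.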
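The abstract also mentions effective methods to store the non-negative SD parameters, without detailing them. A minimal sketch, assuming a hypothetical index-plus-value layout: since every weight is non-negative, no sign bit is required, and only the nonzero entries need to be kept. The pack_sd/unpack_sd helpers and the 8-bit linear quantization below are this sketch's assumptions, not the paper's actual storage scheme.

```python
import numpy as np

def pack_sd(S):
    """Pack a sparse, non-negative SD matrix as indices plus 8-bit values.

    All weights are non-negative, so no sign bit is needed; only the
    nonzero entries are kept (flat uint32 indices, linearly quantized
    uint8 magnitudes, and a single float32 scale).
    """
    idx = np.flatnonzero(S).astype(np.uint32)
    vals = S.ravel()[idx].astype(np.float32)
    scale = np.float32(vals.max() / 255.0) if idx.size else np.float32(1.0)
    q = np.round(vals / scale).astype(np.uint8)
    return idx, q, scale, S.shape

def unpack_sd(idx, q, scale, shape):
    """Reconstruct the dense SD matrix from the packed form."""
    S = np.zeros(shape, dtype=np.float32)
    S.ravel()[idx] = q.astype(np.float32) * scale
    return S

rng = np.random.default_rng(0)
S = np.maximum(rng.standard_normal((8, 8)), 0.0)  # toy sparse non-negative SD matrix
idx, q, scale, shape = pack_sd(S)
S_hat = unpack_sd(idx, q, scale, shape)
print(f"packed size: {idx.nbytes + q.nbytes + scale.nbytes} bytes, "
      f"max reconstruction error: {np.abs(S - S_hat).max():.4f}")
```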