On Combining I-Vectors and Discriminative Adaptation Methods for Unsupervised Speaker Normalization in DNN Acoustic Models

National University of Singapore

Details

13:30 - 15:30 | Tue 22 Mar | Poster Area H | SP-P1.2

Session: Acoustic Model Adaptation for Speech Recognition I

Abstract

In automatic speech recognition (ASR), adaptation and adaptive training techniques are used to perform speaker normalization. Previous work has mainly applied these techniques in isolation. In contrast, this paper investigates two approaches to improving ASR performance by combining i-vector based speaker adaptive training in deep neural network (DNN) acoustic models with discriminative adaptation techniques. First, we combine these techniques by interpolating the decoding lattices of i-vector based systems with the decoding lattices of a discriminatively adapted model. Second, we combine them by discriminatively adapting the i-vector based system in an unsupervised fashion. Our experiments on the TED-LIUM dataset show that, compared with a strong speaker-independent baseline, lattice interpolation and adaptation of the i-vector systems achieve 12.0% and 15.6% relative improvements, respectively. Moreover, compared with the i-vector based systems, lattice interpolation yields a 4.5% relative improvement, while discriminatively adapting the i-vector system yields an 8.3% relative improvement.
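The abstract quotes relative improvements against two different reference systems, so it may help to see how such figures compose. The following is a minimal sketch, assuming the improvements are relative reductions of a single error metric such as word error rate (the abstract does not name the metric). The function names are hypothetical, and the value the sketch derives for the i-vector system alone is only implied by the rounded figures above, not reported in the abstract.

```python
def relative_improvement(err_ref: float, err_new: float) -> float:
    """Relative error reduction of a new system over a reference system."""
    return (err_ref - err_new) / err_ref


def implied_intermediate_gain(gain_over_base: float, gain_over_mid: float) -> float:
    """Gain of an intermediate system over the baseline, implied by two figures.

    If err_new = err_base * (1 - gain_over_base)
    and err_new = err_mid  * (1 - gain_over_mid),
    then err_mid / err_base = (1 - gain_over_base) / (1 - gain_over_mid).
    """
    return 1.0 - (1.0 - gain_over_base) / (1.0 - gain_over_mid)


# Figures quoted in the abstract (assumed to refer to the same error metric).
from_interp = implied_intermediate_gain(0.120, 0.045)  # lattice interpolation
from_adapt = implied_intermediate_gain(0.156, 0.083)   # discriminative adaptation

print(f"i-vector gain implied by interpolation figures: {from_interp:.1%}")
print(f"i-vector gain implied by adaptation figures:    {from_adapt:.1%}")
```

Under these assumptions both pairs of figures imply that the i-vector system alone improves on the speaker-independent baseline by roughly 8% relative, which suggests the two sets of numbers are mutually consistent up to rounding.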