Speaker Adaptation of RNN-BLSTM for Speech Recognition Based on Speaker Code

Jian Tang1, Li-Rong Dai1, Shaofei Xue1, Zhiying Huang1

  • 1University of Science and Technology of China

Details

13:30 - 15:30 | Tue 22 Mar | Poster Area H | SP-P1.8

Session: Acoustic Model Adaptation for Speech Recognition I

Abstract

Recently, the recurrent neural network with bidirectional Long Short-Term Memory (RNN-BLSTM) acoustic model has been shown to give excellent performance on TIMIT [1] and other speech recognition tasks. Meanwhile, speaker code based adaptation has been demonstrated to be an effective adaptation method for Deep Neural Network (DNN) acoustic models [2]. However, to the best of our knowledge, it has not yet been reported whether speaker code based adaptation is also effective for RNN-BLSTM. In this paper, we study how to conduct effective speaker code based speaker adaptation on RNN-BLSTM and demonstrate that speaker code based adaptation is also a valid adaptation method for RNN-BLSTM. Experimental results on TIMIT show that adaptation of the RNN-BLSTM achieves over 10% relative reduction in phone error rate (PER) compared to the unadapted model. A set of comparative experiments is then carried out to analyze the respective contributions of adaptation applied to the cell input and to each gate activation function of the BLSTM. We find that adaptation of the cell input activation function is more effective than adaptation of each gate activation function.
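
The abstract contrasts adaptation applied to the cell input activation with adaptation applied to the gate activations. As a rough illustration of the idea only, and not the paper's exact formulation, the sketch below assumes a learned per-speaker code s is projected through a hypothetical matrix V_c and added to the cell-input pre-activation of a single (unidirectional) LSTM step; all variable names, shapes, and the single-direction simplification are assumptions.

```python
# Minimal sketch, assuming speaker-code adaptation of the cell input activation:
# the speaker-dependent term V_c @ s shifts only the cell-input (g) pre-activation,
# while the input, forget, and output gates are left unadapted.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_with_speaker_code(x, h_prev, c_prev, params, s, V_c):
    """One LSTM time step with a speaker-code bias on the cell input."""
    W, U, b = params["W"], params["U"], params["b"]   # weights stacked for i, f, o, g
    z = W @ x + U @ h_prev + b                        # pre-activations, shape (4*H,)
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])                               # input gate
    f = sigmoid(z[H:2*H])                             # forget gate
    o = sigmoid(z[2*H:3*H])                           # output gate
    g = np.tanh(z[3*H:4*H] + V_c @ s)                 # adapted cell input activation
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

# Toy usage with random parameters and a speaker code of dimension 8 (all illustrative).
D, H, S = 5, 4, 8
rng = np.random.default_rng(0)
params = {"W": rng.normal(size=(4 * H, D)),
          "U": rng.normal(size=(4 * H, H)),
          "b": np.zeros(4 * H)}
V_c = 0.1 * rng.normal(size=(H, S))                   # speaker-code projection (assumed)
s = rng.normal(size=S)                                # per-speaker code (assumed)
h, c = lstm_step_with_speaker_code(rng.normal(size=D),
                                   np.zeros(H), np.zeros(H), params, s, V_c)
```

Adapting each gate instead would amount to adding analogous speaker-dependent terms inside the sigmoid gate pre-activations; the paper's comparison suggests the cell-input variant is the more effective choice.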