Speaker-Aware Training of LSTM-RNNs for Acoustic Modelling

Dong Yu1, Khe Chai Sim2, Liang Lu3, Souvik Kundu2, Tian Tan4, Xiong Xiao5, Yanmin Qian4, Yu Zhang6

  • 1Microsoft Research
  • 2National University of Singapore
  • 3University of Edinburgh
  • 4Shanghai Jiao Tong University
  • 5Nanyang Technological University
  • 6Massachusetts Institute of Technology

Details

13:30 - 15:30 | Tue 22 Mar | Poster Area H | SP-P1.3

Session: Acoustic Model Adaptation for Speech Recognition I

Abstract

Long Short-Term Memory (LSTM) is a particular type of recurrent neural network (RNN) that can model long-term temporal dynamics. Recently, it has been shown that LSTM-RNNs can achieve higher recognition accuracy than deep feed-forward neural networks (DNNs) in acoustic modelling. However, speaker adaptation for LSTM-RNN based acoustic models has not been well investigated. In this paper, we study speaker-aware training of LSTM-RNNs, which incorporates speaker information during model training to normalise the speaker variability. We first present several different model architectures for this purpose, and then experimentally evaluate three different types of speaker representations, i.e., i-vectors, bottleneck speaker vectors and speaking rate. Furthermore, to factorise the variability in the acoustic signals caused by speakers and phonemes respectively, we investigate speaker-aware and phone-aware joint training in the framework of multi-task learning. On the AMI meeting speech transcription task, speaker-aware training of LSTM-RNNs reduces the word error rate by 6.5% relative over a very strong LSTM-RNN baseline that uses fMLLR features.
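As illustration (not taken from the paper itself): the simplest speaker-aware architecture of the kind described above appends a fixed per-speaker representation, such as an i-vector, to every acoustic frame before the recurrent layers. The PyTorch sketch below shows that idea; the class name and all dimensions (40-dim acoustic features, 100-dim speaker vectors, 512 hidden units, 4000 output states) are hypothetical, chosen only for the example.

```python
import torch
import torch.nn as nn

class SpeakerAwareLSTM(nn.Module):
    """Minimal sketch of a speaker-aware LSTM acoustic model:
    a speaker vector (e.g. an i-vector) is concatenated to every
    input frame. Dimensions are illustrative, not from the paper."""

    def __init__(self, feat_dim=40, spk_dim=100, hidden=512, n_states=4000):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim + spk_dim, hidden,
                            num_layers=3, batch_first=True)
        self.output = nn.Linear(hidden, n_states)  # per-frame state logits

    def forward(self, feats, spk_vec):
        # feats:   (batch, time, feat_dim) acoustic frames
        # spk_vec: (batch, spk_dim), one vector per utterance/speaker
        T = feats.size(1)
        spk = spk_vec.unsqueeze(1).expand(-1, T, -1)  # tile over time
        x = torch.cat([feats, spk], dim=-1)           # frame + speaker info
        h, _ = self.lstm(x)
        return self.output(h)

# usage: a batch of 8 utterances, 200 frames each
model = SpeakerAwareLSTM()
logits = model(torch.randn(8, 200, 40), torch.randn(8, 100))
print(logits.shape)  # torch.Size([8, 200, 4000])
```

The paper's multi-task variant would additionally attach a second output head predicting speaker or phone targets and sum the two losses during training; the single-head sketch above only captures the input-concatenation scheme.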