Sequence Summarizing Neural Network for Speaker Adaptation

Jan Černocký1, Karel Veselý1, Kateřina Žmolíková2, Lukáš Burget1, Martin Karafiát1, Shinji Watanabe3

  • 1Brno University of Technology
  • 2Brno University of Technology (BUT)
  • 3Mitsubishi Electric Research Laboratories (MERL)

Details

13:30 - 15:30 | Tue 22 Mar | Poster Area H | SP-P1.10

Session: Acoustic Model Adaptation for Speech Recognition I

Abstract

In this paper we propose a DNN adaptation technique in which the i-vector extractor is replaced by a Sequence Summarizing Neural Network (SSNN). Similarly to an i-vector extractor, the SSNN produces a “summary vector” representing an acoustic summary of an utterance. This vector is then appended to the input of the main network, and both networks are trained together, optimizing a single loss function. The i-vector and SSNN speaker adaptation methods are compared on AMI headset meeting data. The results show comparable performance of both techniques on an FBANK system with frame-classification training. Moreover, appending both the i-vector and the “summary vector” to the FBANK features leads to an additional improvement, comparable to the performance of an fMLLR feature DNN system.
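The joint architecture described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: all dimensions, weight initializations, and the use of mean pooling over frames are assumptions for the sketch. The summarizing network maps each frame through a small network and averages its outputs over the utterance to form the summary vector, which is then appended to every frame's input of the main network.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical dimensions: 40 FBANK features, a 100-frame utterance,
# a 50-dim summary vector, 10 output classes, 64 hidden units.
feat_dim, n_frames, sum_dim, n_classes, hid = 40, 100, 50, 10, 64

# Summarizing-network weights (one hidden layer, illustrative only).
W_s1 = rng.standard_normal((feat_dim, hid)) * 0.1
W_s2 = rng.standard_normal((hid, sum_dim)) * 0.1

# Main-network weights: input = frame features + appended summary vector.
W_m1 = rng.standard_normal((feat_dim + sum_dim, hid)) * 0.1
W_m2 = rng.standard_normal((hid, n_classes)) * 0.1

X = rng.standard_normal((n_frames, feat_dim))  # one utterance

# SSNN forward pass: per-frame outputs are pooled (here, averaged)
# over the whole utterance to form the "summary vector",
# playing the role the i-vector plays in the baseline.
summary_vec = (relu(X @ W_s1) @ W_s2).mean(axis=0)   # shape (sum_dim,)

# Append the summary vector to each frame and run the main network.
X_adapted = np.concatenate(
    [X, np.tile(summary_vec, (n_frames, 1))], axis=1)
posteriors = softmax(relu(X_adapted @ W_m1) @ W_m2)

print(posteriors.shape)  # → (100, 10)
```

Because the summary vector is produced by a differentiable network, gradients from the main network's frame-classification loss flow back through the pooling into the summarizing network, which is what allows both networks to be trained jointly with a single loss function.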