DNN Speaker Adaptation Using Parameterised Sigmoid and ReLU Hidden Activation Functions

Chao Zhang, Philip Woodland

  • University of Cambridge

Details

13:30 - 15:30 | Tue 22 Mar | Poster Area H | SP-P1.7

Session: Acoustic Model Adaptation for Speech Recognition I

Abstract

This paper investigates the use of parameterised sigmoid and rectified linear unit (ReLU) hidden activation functions for deep neural network (DNN) speaker adaptation. The sigmoid and ReLU parameterisation schemes from a previous study on speaker independent (SI) training are used. An adaptive linear factor associated with each sigmoid or ReLU hidden unit scales the unit's output value to create a speaker dependent (SD) model. DNN adaptation thus becomes a re-weighting of the importance of the different hidden units for each speaker. This adaptation scheme is applied both to hybrid DNN acoustic modelling and to DNN-based bottleneck (BN) feature extraction. Experiments using multi-genre British English television broadcast data show that the technique is effective both in directly adapting DNN acoustic models and in adapting the BN features, and that it combines well with other DNN adaptation techniques. Consistent reductions in word error rate are obtained when parameterised sigmoid and ReLU activation functions are used to adapt multiple hidden layers.
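To make the adaptation scheme concrete, the sketch below illustrates the core idea of a per-unit adaptive scaling factor. It is not code from the paper: the class name, the PyTorch framing, and the layer sizes are all illustrative assumptions. The point is that the SI weights stay fixed and only one scalar per hidden unit is estimated on each speaker's adaptation data.

```python
import torch
import torch.nn as nn

class PerUnitScaledActivation(nn.Module):
    """Hidden activation with one learnable output scale per unit.

    Illustrative sketch: the base network weights remain speaker
    independent (SI); updating only the scale vector on a speaker's
    data yields a speaker dependent (SD) model, i.e. adaptation
    re-weights the importance of each hidden unit for that speaker.
    """
    def __init__(self, num_units, activation="sigmoid"):
        super().__init__()
        # One adaptive linear factor per hidden unit, initialised to 1
        # so that the unadapted model reproduces the SI model exactly.
        self.scale = nn.Parameter(torch.ones(num_units))
        self.act = torch.sigmoid if activation == "sigmoid" else torch.relu

    def forward(self, x):
        # Scale each unit's activation output by its speaker-specific factor.
        return self.scale * self.act(x)

# Hypothetical adaptation setup: freeze the SI weights and update
# only the per-unit scales of one (or several) hidden layers.
layer = nn.Linear(512, 512)
act = PerUnitScaledActivation(512, activation="relu")
for p in layer.parameters():
    p.requires_grad = False
optimiser = torch.optim.SGD([act.scale], lr=0.01)
```

Since only one parameter per hidden unit is speaker dependent, the per-speaker storage cost is small compared with full network re-training, which is one reason such parameterised-activation schemes suit adaptation with limited data.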