Convolutional Neural Network Pre-Trained with Projection Matrices on Linear Discriminant Analysis

Details

13:30 - 15:30 | Tue 22 Mar | Poster Area I | SP-P2.6

Session: Speech Recognition I

Abstract

Recently, hybrid architectures combining a neural network (NN) with a hidden Markov model (HMM) have shown significant improvements in automatic speech recognition (ASR) over conventional Gaussian mixture model (GMM)-based systems. The convolutional neural network (CNN), a successful NN-based architecture, can represent local spectral variations spanning the time-frequency space. Meanwhile, spectro-temporal features have been widely studied as a way to make ASR more robust. Typically, these features are extracted from acoustic spectral patterns by 2D filtering. The convolutional layers of a CNN, with their various local windows, can likewise be regarded as efficient feature extractors that capture 2D spectral variations. In the standard procedure, the local windows of a CNN are initialized randomly before pre-training and are then iteratively updated by the backpropagation algorithm during the pre-training and fine-tuning steps. In this paper, we explore using projection matrices composed of eigenvectors estimated with the linear discriminant analysis (LDA) objective function as the initial weights of the first convolutional layer of a CNN. Analysis of the local windows trained with the proposed method shows that the LDA eigenvectors have desirable properties as initial CNN weights. The proposed method yielded an 8.1% relative improvement over a CNN whose local weights were initialized randomly.
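The abstract does not spell out the exact recipe, so what follows is only a minimal NumPy sketch of the general idea under stated assumptions. Each training sample is taken to be a flattened kh × kw spectro-temporal patch with an integer class label (for example an HMM state id). LDA directions are obtained by solving the generalized eigenproblem S_b v = λ S_w v, where S_b and S_w are the between-class and within-class scatter matrices. The leading eigenvectors are then reshaped into first-layer kernels. The helper names `lda_projection` and `lda_init_conv_weights`, the unit-norm scaling, the small ridge term `reg`, and the (n_filters, 1, kh, kw) weight layout are all hypothetical choices made for illustration. They are not taken from the paper.

```python
import numpy as np


def lda_projection(X, y, reg=1e-6):
    """Return LDA eigenvalues and eigenvectors (columns), sorted by decreasing eigenvalue.

    X: (n_samples, d) flattened spectro-temporal patches (assumed row-major flattening).
    y: (n_samples,) integer class labels (assumption: e.g. HMM state ids).
    """
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        diff = Xc - mean_c
        s_w += diff.T @ diff
        m = (mean_c - mean_all)[:, None]
        s_b += Xc.shape[0] * (m @ m.T)
    # Solve S_b v = lambda S_w v by whitening with the Cholesky factor S_w = L L^T:
    # (L^-1 S_b L^-T) u = lambda u, then v = L^-T u.
    L = np.linalg.cholesky(s_w + reg * np.eye(d))  # hypothetical ridge term for stability
    L_inv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(L_inv @ s_b @ L_inv.T)
    order = np.argsort(evals)[::-1]
    V = L_inv.T @ U[:, order]
    return evals[order], V


def lda_init_conv_weights(patches, labels, kh, kw, n_filters):
    """Hypothetical helper: reshape the top LDA eigenvectors into first-layer conv kernels."""
    n, d = patches.shape
    if d != kh * kw:
        raise ValueError("patch dimension must equal kh * kw")
    if n_filters > d:
        raise ValueError("cannot take more eigenvectors than the patch dimension")
    _, V = lda_projection(patches, labels)
    W = V[:, :n_filters].T  # one eigenvector per filter: (n_filters, kh * kw)
    # Eigenvector scale is arbitrary; unit-norm scaling is an assumption of this sketch.
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    return W.reshape(n_filters, 1, kh, kw)  # (out_channels, in_channels, height, width)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(400, 6))  # synthetic stand-in for 2x3 patches
    labels = np.repeat([0, 1], 200)
    patches[labels == 1, 2] += 3.0  # the classes differ along one patch element
    weights = lda_init_conv_weights(patches, labels, kh=2, kw=3, n_filters=4)
    print(weights.shape)  # (4, 1, 2, 3)
```

Note that S_b has rank at most C − 1 for C classes. Eigenvectors beyond that rank carry no between-class information, so in practice the number of useful discriminative kernels is bounded by the number of classes. The whitening route keeps the sketch NumPy-only. SciPy's `scipy.linalg.eigh(a, b)` solves the same generalized problem directly.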