Acoustic Data-Driven Pronunciation Lexicon Generation for Logographic Languages

Daniel Povey1, Guoguo Chen1, Sanjeev Khudanpur2

  • 1Johns Hopkins University
  • 2Johns Hopkins University

Details

13:30 - 15:30 | Tue 22 Mar | Poster Area I | SP-P2.7

Session: Speech Recognition I

Abstract

Handcrafted pronunciation lexicons are widely used in modern speech recognition systems. Designing a pronunciation lexicon, however, requires a tremendous amount of expert knowledge and effort, which is impractical when applying speech recognition techniques to low-resource languages. In this paper, we are interested in developing speech recognition systems for logographic languages with only a small expert pronunciation lexicon. An iterative framework is proposed to generate and refine the phonetic transcripts of the training data, which are then aligned with their word-level transcripts to train a grapheme-to-phoneme (G2P) model. A G2P model trained this way covers the graphemes that appear in the training transcripts (most of which are typically unseen in a small expert lexicon for a logographic language), and is therefore able to generate pronunciations for every word in the transcripts. The proposed lexicon generation procedure is evaluated on Cantonese speech recognition and keyword search tasks. Experiments show that, starting from an expert lexicon of only 1K words, we are able to generate a lexicon that performs reasonably well compared with an expert-crafted lexicon of 5K words.
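
The property that makes the G2P step of this framework tractable for logographic languages is easiest to see in code. The sketch below is a minimal, hypothetical Python illustration (not the authors' implementation): in Cantonese, each character typically corresponds to exactly one syllable, so any word whose pronunciation has as many syllables as the word has characters can be aligned one-to-one, and the resulting per-character pronunciations compose pronunciations for words never seen in the expert lexicon. The function names and the toy Jyutping data are assumptions for illustration; in the paper, the aligned pairs would come from acoustically decoded phonetic transcripts matched against the word-level training transcripts.

from collections import Counter, defaultdict

def train_char_g2p(aligned_pairs):
    """Learn a character -> syllable mapping from (word, syllables) pairs."""
    counts = defaultdict(Counter)
    for word, syllables in aligned_pairs:
        if len(word) != len(syllables):
            continue  # one-syllable-per-character assumption fails; skip pair
        for char, syl in zip(word, syllables):
            counts[char][syl] += 1
    # Keep only the most frequent syllable per character; a real system
    # would retain pronunciation variants with estimated probabilities.
    return {char: syls.most_common(1)[0][0] for char, syls in counts.items()}

def generate_lexicon(words, g2p):
    """Propose pronunciations for new words by concatenating per-character
    pronunciations; words containing unseen characters are skipped."""
    return {w: [g2p[c] for c in w] for w in words if all(c in g2p for c in w)}

# Toy aligned data (Jyutping romanization). In the paper, such pairs would
# be produced by the iterative acoustic decoding and alignment procedure.
pairs = [("香港", ["hoeng1", "gong2"]), ("港口", ["gong2", "hau2"])]
g2p = train_char_g2p(pairs)
print(generate_lexicon(["香口", "香港口"], g2p))
# -> {'香口': ['hoeng1', 'hau2'], '香港口': ['hoeng1', 'gong2', 'hau2']}

Because nearly every character is realized as a single syllable, this one-to-one alignment recovers per-character pronunciations without an explicit alignment model; the paper's full framework additionally re-decodes and refines the phonetic transcripts between rounds of G2P training.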