13:30 - 15:30 | Tue 22 Mar | Poster Area I | SP-P2.3
In our previous work, we developed a GPU-accelerated speech recognition engine optimized for faster-than-real-time decoding on heterogeneous CPU-GPU architectures. In this work, we develop a scalable client-server architecture optimized for simultaneous real-time decoding of multiple users. To support real-time speech recognition for many users efficiently, we apply a producer/consumer design pattern that decouples sub-processes running at different rates and enables efficient handling of many concurrent streams. We further divide the speech recognition pipeline across multiple consumers to maximize hardware utilization. As a result, our platform processes more than 45 concurrent real-time audio streams with an average latency of less than 0.3 seconds using one-million-word vocabulary language models.
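The producer/consumer decoupling described above can be sketched with bounded queues between pipeline stages, so that a fast stage never outruns a slow one and each stage runs at its own rate. This is only an illustrative sketch, not the authors' implementation: the stage names (`feature_stage`, `decode_stage`), queue sizes, and sentinel convention are all assumptions for the example.

```python
import queue
import threading

SENTINEL = None  # marks end of a stream (hypothetical convention)

def producer(audio_chunks, feature_q):
    # Simulates a client streaming audio chunks at its own rate.
    for chunk in audio_chunks:
        feature_q.put(chunk)       # blocks if the queue is full (back-pressure)
    feature_q.put(SENTINEL)

def feature_stage(feature_q, decode_q):
    # First consumer: stands in for feature extraction.
    while True:
        chunk = feature_q.get()
        if chunk is SENTINEL:
            decode_q.put(SENTINEL)
            break
        decode_q.put(("feat", chunk))

def decode_stage(decode_q, results):
    # Second consumer: stands in for the (GPU) decoder.
    while True:
        item = decode_q.get()
        if item is SENTINEL:
            break
        results.append(("hyp", item[1]))

def run_pipeline(audio_chunks):
    # Bounded queues decouple stages that run at different rates.
    feature_q = queue.Queue(maxsize=8)
    decode_q = queue.Queue(maxsize=8)
    results = []
    threads = [
        threading.Thread(target=producer, args=(audio_chunks, feature_q)),
        threading.Thread(target=feature_stage, args=(feature_q, decode_q)),
        threading.Thread(target=decode_stage, args=(decode_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In a multi-user server of the kind described, each connected client would act as a producer, while a small pool of stage consumers serves all streams; the bounded queues provide the rate decoupling the abstract refers to.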