Audio Word Similarity for Clustering with Zero Resources Based on Iterative HMM Classification

Amélie Royer¹, Guillaume Gravier², Vincent Claveau²

¹Institute of Science and Technology
²CNRS

Details

13:30 - 15:30 | Tue 22 Mar | Poster Area I | SP-P2.5

Session: Speech Recognition I

Abstract

Recent work on zero-resource word discovery makes intensive use of audio fragment clustering to find repeating speech patterns. In the absence of acoustic models, the clustering step traditionally relies on dynamic time warping (DTW) to compare two samples and thus suffers from the known limitations of this technique. We propose a new sample comparison method, called similarity by iterative clustering, that exploits the modeling capacities of HMMs with no supervision. The core idea is to train HMMs on randomly labeled data and to exploit the fact that similar samples are more likely than dissimilar ones to be classified together by a large number of random classifiers. This similarity measure is compared to DTW on two tasks, namely nearest neighbor retrieval and clustering, showing that the generalization capabilities of probabilistic machine learning significantly benefit audio word comparison and overcome many of the limitations of DTW-based comparison.
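
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of scoring similarity by how often random classifiers agree. It makes several assumptions that are not from the paper: samples are fixed-length feature vectors rather than variable-length audio fragments, and a nearest-class-mean classifier stands in for the HMMs. The function name `iterative_similarity` and the parameters `n_classes`, `n_iters` and `seed` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def iterative_similarity(samples, n_classes=2, n_iters=200, seed=0):
    """Return a matrix whose entry [i][j] is the fraction of random
    classifiers that put samples i and j in the same class.

    Assumption: a class mean replaces the per-class HMM of the source.
    """
    rng = random.Random(seed)
    n = len(samples)
    together = [[0] * n for _ in range(n)]

    for _ in range(n_iters):
        # 1. Assign random labels: no supervision is used.
        labels = [rng.randrange(n_classes) for _ in range(n)]

        # 2. "Train" one model per non-empty class (here: the class mean).
        means = {}
        for c in set(labels):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            means[c] = [sum(col) / len(members) for col in zip(*members)]

        # 3. Reclassify every sample with the best-fitting class model.
        classes = sorted(means)
        predicted = [
            min(classes, key=lambda c: sq_dist(s, means[c])) for s in samples
        ]

        # 4. Count the pairs that this classifier puts together.
        for i in range(n):
            for j in range(n):
                if predicted[i] == predicted[j]:
                    together[i][j] += 1

    return [[count / n_iters for count in row] for row in together]


# Toy data (made up for illustration): two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
sim = iterative_similarity(points)
print("same group:     ", sim[0][1])
print("different group:", sim[0][4])
```

On this toy data, a pair from the same group should score higher than a pair from different groups. To move closer to the setting described in the abstract, the class-mean model would be replaced by one HMM per random class, with each sample assigned to the class whose HMM gives it the highest likelihood.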