Near Optimal Representative Subset Selection from Short Sequences Generated by a Stationary Source

Ali Payani, Afshin Abdi, Faramarz Fekri1

  • 1Georgia Institute of Technology

Details

14:30 - 16:00 | Tue 5 Jul | Pentland B | R4.6

Session: Detection, estimation and filtering in sensor/wireless networks

Abstract

We consider the problem of representative subset selection for model training and identification, particularly for the applications in universal compression. We used the Kullback-Leibler divergence to measure the distance between the estimated model from the representative subset and the source model resulting from the entire data in the set. As directly solving the original projection problem is NP-hard and not practical for large data-sets, we proposed a different approximate algorithm, based on $\ell_2$ optimization with $\ell_0$ constraints. Furthermore, we derive an upper bound for the KL-Divergence that relates monotonically (increasingly) to the $\ell_2$ cost function. Hence, we can bound KL-divergence penalty by minimizing the $\ell_2$-norm. Using experimental results, we confirm the effectiveness of the proposed $\ell_2$-norm method.