Joint Action Recognition and Summarization By Sub-Modular Inference

Fairouz Hussein¹, Massimo Piccardi¹, Sari Awwad¹

¹University of Technology Sydney

Details

13:30 - 13:50 | Tue 22 Mar | Room 3E | MMSP-L1.1

Session: Action and Activity Recognition

Abstract

Action recognition and video summarization are two important multimedia tasks, useful for applications such as video indexing and retrieval, video surveillance, human-computer interaction and home intelligence. While many approaches exist in the literature for each task, to date they have always been addressed separately. In this paper we instead start from the assumption that the two tasks should be tackled as a joint objective: on the one hand, action recognition can drive the selection of meaningful and informative summaries; on the other, recognizing actions from a summary rather than from the entire video can in principle reduce noise and prove more accurate. To this end, we propose a novel approach for joint action recognition and summarization based on the latent structural SVM framework, together with an efficient algorithm that infers the action and the summary by exploiting the property of sub-modularity. Experimental results on a challenging benchmark, MSR DailyActivity3D, show that the approach achieves remarkable action recognition accuracy while providing appealing video summaries.
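As a rough illustration of why sub-modularity makes summary inference efficient, the sketch below greedily selects summary frames under a size budget. It is not the authors' algorithm: the facility-location objective, the function names `facility_location` and `greedy_summary`, and the precomputed frame-similarity matrix are all assumptions made for this example, since the abstract does not specify the scoring function. For monotone sub-modular objectives of this kind, greedy selection is a standard choice with a known approximation guarantee.

```python
from typing import List, Sequence


def facility_location(similarity: Sequence[Sequence[float]],
                      selected: Sequence[int]) -> float:
    """Hypothetical summary score: each frame is credited with its best
    similarity to any selected frame. This function is monotone and
    sub-modular in the selected set."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in similarity)


def greedy_summary(similarity: Sequence[Sequence[float]],
                   budget: int) -> List[int]:
    """Greedily add the frame with the largest marginal gain until the
    budget is reached or no frame improves the score."""
    n = len(similarity)
    selected: List[int] = []
    current = 0.0
    for _ in range(min(budget, n)):
        best_gain, best_frame = 0.0, None
        for j in range(n):
            if j in selected:
                continue
            gain = facility_location(similarity, selected + [j]) - current
            if gain > best_gain:  # strict: ties keep the earliest frame
                best_gain, best_frame = gain, j
        if best_frame is None:
            break
        selected.append(best_frame)
        current += best_gain
    return sorted(selected)


# Toy example (made-up similarities): frames 0-1 form one visual cluster,
# frames 2-4 another, with frame 3 the most central of its cluster.
toy_similarity = [
    [1.0, 0.75, 0.0, 0.0, 0.0],
    [0.75, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.75, 0.5],
    [0.0, 0.0, 0.75, 1.0, 0.75],
    [0.0, 0.0, 0.5, 0.75, 1.0],
]
summary = greedy_summary(toy_similarity, budget=2)
```

With a budget of two frames, the greedy pass picks one frame from each cluster. In a joint recognition-summarization setting, the similarity-based score would be replaced by a learned, action-dependent score, and the inference would be repeated for each candidate action label.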