Structured Prediction with Short/Long-Range Dependencies for Human Activity Recognition from Depth Skeleton Data

Mohammad Mahdi Arzani1, Mahmood Fathy1, Hamid Aghajan2, Ahmad Akbari Azirani1, Kaamran Raahemifar3, Ehsan Adeli4

  • 1Iran University of Science and Technology
  • 2Ghent University
  • 3Ryerson University
  • 4Stanford University

Details

11:15 - 11:30 | Mon 25 Sep | Room 217 | MoAT14.4

Session: Computer Vision for Automation I

Abstract

One of the main abilities that robots need is to communicate efficiently with people in a human-like manner. Human activity recognition (HAR) is therefore an integral part of any such human-robot interaction system. One of the major challenges in HAR is that individuals perform the same activity in different ways. Furthermore, robots need to understand a very wide range of activity types. Some activities are simple, quick, and short (e.g., sit down), while many others are complex, involve many details, and span a long period of time (e.g., wearing contact lenses). In this paper, we cast activity recognition as a sequence-labeling problem and propose a new probabilistic graphical model (PGM) that can recognize both short- and long-range activities by introducing a hierarchical classification model and including extra links and loopy conditions in the PGM. To optimize the PGM and obtain its parameters during training, we use a structured prediction technique, a general framework that involves latent structured support vector machines (LSSVM) and hidden-state conditional random fields (HCRF). We evaluate our method on two widely used datasets (CAD-60 and UT-Kinect) that contain both activity types. The results are promising and show that our method recognizes both types of activities effectively, whereas most previous work focused on only one of these two major types. We further explore distributed processing techniques, since our method can easily be distributed over processing nodes. We also propose an efficient divide-and-merge technique to further speed up the training step.
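
To illustrate what treating activity recognition as sequence labeling means in its simplest form, the sketch below decodes the best label sequence for a plain linear-chain model with the Viterbi algorithm. It is not the model proposed in the paper, which adds hierarchy, extra links, and loops that this chain does not have. The function name, the scores, and the two-label toy setup are all assumptions made for illustration.

```python
import numpy as np

def viterbi(unary, pairwise):
    """Return the highest-scoring label sequence for a linear-chain model.

    unary:    (T, K) array, score of label k at frame t (hypothetical values)
    pairwise: (K, K) array, score of moving from label i to label j
    """
    T, K = unary.shape
    score = unary[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in label i at t-1, then moving to j
        cand = score[:, None] + pairwise
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]
    # Trace back from the best final label
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        labels.append(int(backptr[t, labels[-1]]))
    return labels[::-1]

# Toy data (assumed): 4 frames, 2 activity labels.
unary = np.array([[2.0, 0.0],
                  [0.4, 0.6],   # this frame alone weakly prefers label 1
                  [2.0, 0.0],
                  [0.0, 3.0]])
# Staying in the same label is rewarded, switching is penalized.
pairwise = np.array([[1.0, -1.0],
                     [-1.0, 1.0]])

print(unary.argmax(axis=1).tolist())  # per-frame decision: [0, 1, 0, 1]
print(viterbi(unary, pairwise))       # joint decision:     [0, 0, 0, 1]
```

Labeling each frame independently flips the label at the second frame, while the joint decoding keeps it consistent with its neighbors. That pairwise coupling is the basic idea behind sequence labeling; longer-range dependencies would need links beyond adjacent frames.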