SMSnet: Semantic Motion Segmentation Using Deep Convolutional Neural Networks

Johan Vertens1, Abhinav Valada1, Wolfram Burgard2

  • 1University of Freiburg
  • 2University of Technology Nuremberg

Details

10:30 - 10:45 | Mon 25 Sep | Room 215 | MoAT15.1

Session: Semantic Scene Understanding

Abstract

Interpreting the semantics and motion of objects is a prerequisite for autonomous robots to reason and operate in dynamic real-world environments. Existing approaches to semantic motion segmentation consist of long multistage pipelines and typically require several seconds to process each frame. In this paper, we present a novel convolutional neural network architecture that learns to predict both the object label and motion status of each pixel in an image. Given a pair of consecutive images, the network learns to fuse features from self-generated optical flow maps and semantic segmentation kernels to yield pixel-wise semantic motion labels. We also introduce the Cityscapes-Motion dataset, which contains over 2,900 manually annotated semantic motion labels and is the largest dataset of its kind so far. We demonstrate that our network outperforms existing approaches, achieving state-of-the-art performance on the KITTI dataset as well as on the more challenging Cityscapes-Motion dataset, while being substantially faster than existing techniques.
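To make the notion of a pixel-wise semantic motion label concrete, the following minimal sketch pairs a per-pixel object class with a per-pixel motion flag. It illustrates only the output format, not the network itself. The class list, the `fuse_labels` helper, and the tuple encoding are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical class names, chosen only for illustration.
CLASSES = ["road", "car", "person"]


def fuse_labels(
    semantic: List[List[int]], moving: List[List[bool]]
) -> List[List[Tuple[str, str]]]:
    """Pair each pixel's class id with its motion status.

    `semantic` holds one class index per pixel; `moving` holds one
    boolean per pixel (True = moving). Both grids must share a shape.
    """
    if len(semantic) != len(moving):
        raise ValueError("semantic and motion maps differ in height")
    fused = []
    for sem_row, mov_row in zip(semantic, moving):
        if len(sem_row) != len(mov_row):
            raise ValueError("semantic and motion maps differ in width")
        fused.append(
            [
                (CLASSES[c], "moving" if m else "static")
                for c, m in zip(sem_row, mov_row)
            ]
        )
    return fused


# A toy 2x2 "image": class ids and motion flags per pixel.
semantic_map = [[0, 1], [1, 2]]
motion_map = [[False, True], [False, True]]
labels = fuse_labels(semantic_map, motion_map)
```

In this toy input, the two `car` pixels receive different joint labels because only one of them is flagged as moving, a distinction that a purely semantic segmentation would not express.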