Estimating Metric Scale Visual Odometry from Videos Using 3D Convolutional Networks

Alexander Koumis¹, James Preiss, Gaurav Sukhatme¹

  • ¹University of Southern California

Details

11:30 - 11:45 | Tue 5 Nov | L1-R7 | TuAT7.3

Session: Computer Vision and Applications I

Abstract

We present an end-to-end deep learning approach for performing metric scale-sensitive regression tasks, such as visual odometry, with a single camera and no additional sensors. We propose a novel 3D convolutional architecture, 3DC-VO, that can leverage temporal relationships over a short moving window of images to estimate linear and angular velocities. The network makes local predictions on stacks of images that can be integrated to form a full trajectory. We apply 3DC-VO to the KITTI visual odometry benchmark and to the task of estimating a pilot's control inputs from a first-person video of a quadrotor flight. Our method exhibits increased accuracy relative to comparable learning-based algorithms trained on monocular images. We also show promising results for quadrotor control input prediction when trained on a new dataset collected with a UAV simulator.
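
To illustrate how local velocity predictions can be integrated into a full trajectory, here is a minimal sketch. It is not the authors' implementation. It assumes planar motion, a fixed time step, and one (forward speed, yaw rate) pair per image stack. The function name `integrate_planar` and its parameters are hypothetical and do not come from the paper.

```python
import math


def integrate_planar(velocities, dt):
    """Dead-reckon a 2D trajectory from (forward_speed, yaw_rate) pairs.

    Assumptions (not from the paper): motion is planar, dt is constant,
    and each pair is one local prediction for one window of images.
    Returns a list of (x, y, heading) poses, starting at the origin.
    """
    x, y, heading = 0.0, 0.0, 0.0
    trajectory = [(x, y, heading)]
    for forward_speed, yaw_rate in velocities:
        # Forward Euler step: move along the current heading,
        # then rotate by the predicted angular velocity.
        x += forward_speed * math.cos(heading) * dt
        y += forward_speed * math.sin(heading) * dt
        heading += yaw_rate * dt
        trajectory.append((x, y, heading))
    return trajectory
```

Because each pose is built on the previous one, any error in a local prediction accumulates along the trajectory. For a monocular method, an error in the predicted metric speed rescales every subsequent displacement.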