LIDAR and Monocular Camera Fusion: On-Road Depth Completion for Autonomous Driving

¹Carnegie Mellon University

Details

12:30 - 12:45 | Mon 28 Oct | The Great Room I | MoD-T1.3

Session: Regular Session on Sensor Fusion (II)


Abstract

LIDAR and RGB cameras are commonly used sensors in autonomous vehicles, but each has limitations: LIDAR provides accurate depth but is sparse in both vertical and horizontal resolution, while RGB images provide dense texture but lack depth information. In this paper, we fuse LIDAR measurements and RGB images with a deep neural network that completes a denser pixel-wise depth map. The proposed architecture reconstructs the pixel-wise depth map by taking advantage of both the dense color features and the sparse 3D spatial features. We apply an early fusion technique and fine-tune a ResNet model as the encoder. The designed Residual Up-Projection block recovers the spatial resolution of the feature map and captures context within the depth map. We introduce a depth feature tensor that propagates context information from encoder blocks to decoder blocks. The proposed method is evaluated on the large-scale indoor NYUdepthV2 dataset and on the KITTI odometry dataset, and it outperforms state-of-the-art single RGB image and depth fusion methods. It is also evaluated on a reduced-resolution KITTI dataset that synthesizes planar LIDAR and RGB image fusion.
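The early fusion step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the preprocessing, so the function names (`sample_sparse_depth`, `early_fusion_input`), the use of zero to mark pixels without a LIDAR return, and the row-based sampling that mimics scan lines are all assumptions made for illustration. The sketch only shows the general idea of stacking a sparse depth map with the RGB channels so that an encoder receives a single four-channel input.

```python
import numpy as np


def sample_sparse_depth(dense_depth, keep_rows):
    """Keep depth only on the given rows, mimicking LIDAR scan lines.

    Assumption: pixels without a measurement are encoded as 0.
    """
    sparse = np.zeros_like(dense_depth)
    sparse[keep_rows, :] = dense_depth[keep_rows, :]
    return sparse


def early_fusion_input(rgb, sparse_depth):
    """Stack an (H, W, 3) RGB image and an (H, W) sparse depth map
    into one (H, W, 4) array, i.e. fusion at the input level."""
    if rgb.shape[:2] != sparse_depth.shape:
        raise ValueError("rgb and sparse_depth must share height and width")
    rgb_scaled = rgb.astype(np.float32) / 255.0           # colors in [0, 1]
    depth = sparse_depth.astype(np.float32)[..., None]    # add channel axis
    return np.concatenate([rgb_scaled, depth], axis=-1)


# Toy example: a 4x6 white image and a synthetic dense depth map.
rgb = np.full((4, 6, 3), 255, dtype=np.uint8)
dense = np.arange(24, dtype=np.float32).reshape(4, 6)

sparse = sample_sparse_depth(dense, keep_rows=[1])  # a single "scan line"
fused = early_fusion_input(rgb, sparse)
print(fused.shape)  # (4, 6, 4)
```

Keeping a single row, as in the toy example, loosely corresponds to the planar LIDAR setting described at the end of the abstract; a real pipeline would instead project LIDAR points into the image plane using the sensor calibration.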