Accurate Detection and 3D Localization of Humans Using a Novel YOLO-Based RGB-D Fusion Approach and Synthetic Training Data

Timm Linder1, Kilian Yutaka Pfeiffer2, Narunas Vaskevicius1, Robert Schirmer1, Kai Oliver Arras3

  • 1Robert Bosch GmbH
  • 2RWTH Aachen University
  • 3Bosch Research

Details

09:45 - 10:00 | Mon 1 Jun | Room T24 | MoA24.3

Session: Human Detection and Tracking

Abstract

While 2D object detection has made significant progress, robustly localizing objects in 3D space in the presence of occlusion remains an unresolved problem. In this work, we focus on real-time detection of human 3D centroids in RGB-D data. We propose an image-based detection approach which extends the YOLOv3 architecture with a 3D centroid loss and mid-level feature fusion to exploit complementary information from both modalities. We employ a transfer learning scheme that benefits from existing large-scale 2D object detection datasets, while at the same time learning end-to-end 3D localization from our highly randomized, diverse synthetic RGB-D dataset with precise 3D ground truth. We further propose a geometrically more accurate depth-aware crop augmentation for training on RGB-D data, which helps to improve 3D localization accuracy. In experiments on our challenging intralogistics dataset, we achieve state-of-the-art performance even when learning 3D localization solely from synthetic data.
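
To make the two architectural ideas named in the abstract more concrete, the PyTorch sketch below illustrates mid-level fusion of RGB and depth feature maps and a per-anchor regression head that additionally predicts a 3D centroid, trained with an auxiliary loss term. The backbone slices, the fusion operator (channel concatenation followed by a 1x1 convolution), the anchor count, and the smooth-L1 centroid penalty are illustrative assumptions, not the paper's exact YOLOv3-based configuration; in the paper, such a centroid term would be added to the standard detection objective.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch, stride=2):
    # Downsampling block standing in for a slice of a YOLOv3-style backbone.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )


class MidLevelFusionDetector(nn.Module):
    """Separate RGB and depth stems, fused at an intermediate feature level,
    followed by a shared head predicting 2D box terms plus a 3D centroid."""

    def __init__(self, num_anchors=3):
        super().__init__()
        self.rgb_stem = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.depth_stem = nn.Sequential(conv_block(1, 32), conv_block(32, 64))
        # Mid-level fusion: concatenate feature maps, mix with a 1x1 conv.
        self.fuse = nn.Conv2d(128, 128, kernel_size=1)
        self.shared = nn.Sequential(conv_block(128, 256), conv_block(256, 512))
        # Per anchor: 4 box parameters + 1 objectness + 3D centroid (x, y, z).
        self.head = nn.Conv2d(512, num_anchors * (4 + 1 + 3), kernel_size=1)

    def forward(self, rgb, depth):
        fused = torch.cat([self.rgb_stem(rgb), self.depth_stem(depth)], dim=1)
        return self.head(self.shared(self.fuse(fused)))


def centroid_loss(pred_xyz, gt_xyz, positive_mask):
    # Auxiliary 3D centroid term, evaluated on positive anchors only
    # (illustrative choice: smooth L1 in metric camera coordinates).
    return nn.functional.smooth_l1_loss(pred_xyz[positive_mask],
                                        gt_xyz[positive_mask])


if __name__ == "__main__":
    model = MidLevelFusionDetector()
    rgb = torch.randn(1, 3, 256, 256)
    depth = torch.randn(1, 1, 256, 256)
    print(model(rgb, depth).shape)  # torch.Size([1, 24, 16, 16])
```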
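
Similarly, one plausible reading of the depth-aware crop augmentation is sketched below: a zoom-style crop is applied to the RGB image together with a consistent adjustment of the depth channel and the 3D ground-truth label, so that image, depth map, and label still agree under a pinhole camera model. The function name, the centered crop, and the assumption that the crop center coincides with the principal point are illustrative choices, not necessarily the exact procedure from the paper.

```python
import numpy as np
import cv2


def depth_aware_zoom(rgb, depth, centroid_xyz, zoom):
    """Zoom into the image center by `zoom` (> 1) while keeping the RGB image,
    the metric depth map, and the 3D ground-truth centroid consistent."""
    h, w = depth.shape
    ch, cw = int(round(h / zoom)), int(round(w / zoom))
    y0, x0 = (h - ch) // 2, (w - cw) // 2

    rgb_out = cv2.resize(rgb[y0:y0 + ch, x0:x0 + cw], (w, h),
                         interpolation=cv2.INTER_LINEAR)
    # Nearest-neighbor for depth avoids blending foreground and background values.
    depth_out = cv2.resize(depth[y0:y0 + ch, x0:x0 + cw], (w, h),
                           interpolation=cv2.INTER_NEAREST)

    # Magnifying by `zoom` around the principal point and dividing depth by the
    # same factor keeps back-projected points consistent under a pinhole model:
    # lateral (x, y) coordinates are preserved, depth shrinks by `zoom`.
    depth_out = depth_out / zoom
    x, y, z = centroid_xyz
    return rgb_out, depth_out, (x, y, z / zoom)


if __name__ == "__main__":
    rgb = np.zeros((480, 640, 3), dtype=np.uint8)
    depth = np.full((480, 640), 4.0, dtype=np.float32)  # 4 m everywhere
    _, d, c = depth_aware_zoom(rgb, depth, (0.5, 0.1, 4.0), zoom=1.25)
    print(d[0, 0], c)  # 3.2 (0.5, 0.1, 3.2)
```

The design choice assumed here is that a scaled crop is interpreted as a change of viewing distance rather than a pure image-space transform, which is what makes the augmentation geometrically consistent for learning metric 3D centroids.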