Learning Object Placements for Relational Instructions by Hallucinating Scene Representations

Oier Mees1, Alp Emek1, Johan Vertens1, Wolfram Burgard2

  • 1University of Freiburg
  • 2University of Technology Nuremberg

Details

09:30 - 09:45 | Mon 1 Jun | Room T3 | MoA03.2

Session: Deep Learning in Robotics and Automation I

Abstract

Human-centered environments contain a wide variety of spatial relations between everyday objects. For autonomous robots to interact with humans effectively in such environments, they should be able to reason about the best way to place objects in order to follow natural language instructions based on spatial relations. In this work, we present a convolutional neural network for estimating pixelwise object placement probabilities for a set of spatial relations from a single input image. During training, our network receives the learning signal by classifying hallucinated high-level scene representations as an auxiliary task. Unlike previous approaches, our method does not require ground truth data for the pixelwise relational probabilities or 3D models of the objects, which significantly expands its applicability to practical robotics scenarios. Our results, based on both real-world data and human-robot experiments, demonstrate the effectiveness of our method in reasoning about the best way to place objects to reproduce a spatial relation. Videos of our experiments can be found at https://youtu.be/zaZkHTWFMKM.
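To make the described architecture concrete, the sketch below shows one plausible way to structure such a network: a fully convolutional encoder-decoder that maps an RGB image to per-pixel placement logits, one map per spatial relation, plus an auxiliary classification head over the same relation set that could supply the training signal from hallucinated scene representations. This is not the authors' released implementation; the backbone layout, layer sizes, and the assumed set of six relations are illustrative assumptions only.

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not the authors' code): an FCN-style network producing
# pixelwise placement logits for a fixed set of spatial relations, plus an
# auxiliary relation-classification head. All sizes below are assumptions.

NUM_RELATIONS = 6  # assumed relation set, e.g. left/right/front/behind/on top/inside


class PlacementNet(nn.Module):
    def __init__(self, num_relations: int = NUM_RELATIONS):
        super().__init__()
        # Small encoder: downsamples the input image into a feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: upsamples back to input resolution, one logit map per relation.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_relations, 4, stride=2, padding=1),
        )
        # Auxiliary head: classifies which spatial relation a scene depicts.
        self.aux_classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_relations)
        )

    def forward(self, image: torch.Tensor):
        features = self.encoder(image)
        placement_logits = self.decoder(features)      # (B, R, H, W)
        relation_logits = self.aux_classifier(features)  # (B, R)
        return placement_logits, relation_logits


if __name__ == "__main__":
    net = PlacementNet()
    img = torch.randn(1, 3, 96, 96)  # dummy RGB input
    placement_logits, relation_logits = net(img)
    # Softmax over the relation channel yields per-pixel placement probabilities.
    placement_probs = placement_logits.softmax(dim=1)
    print(placement_probs.shape, relation_logits.shape)  # (1, 6, 96, 96), (1, 6)
```

In such a setup, the pixelwise head would be queried at inference time to find high-probability placement locations for the relation named in the instruction, while the auxiliary head would only be used during training.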