Comparing Human-Centric and Robot-Centric Sampling for Robot Deep Learning from Demonstrations

Michael Laskey1, Caleb Chuck1, Jonathan Lee1, Jeffrey Mahler1, Sanjay Krishnan1, Kevin Jamieson2, Anca Dragan1, Ken Goldberg1

  • 1University of California, Berkeley
  • 2University of Washington

Details

10:30 - 10:35 | Tue 30 May | Room 4611/4612 | TUA6.8

Session: Learning and Adaptive Systems 1

Abstract

Motivated by recent advances in Deep Learning for robot control, this paper considers two learning algorithms in terms of how they acquire demonstrations from fallible human supervisors. “Human-Centric” (HC) sampling is a standard supervised learning approach, in which a human supervisor demonstrates the task by teleoperating the robot to provide trajectories consisting of state-control pairs. “Robot-Centric” (RC) sampling is an increasingly popular alternative used in algorithms such as DAgger, in which a human supervisor observes the robot execute a learned policy and provides corrective control labels for each state visited. We suggest that RC sampling can be challenging for human supervisors and prone to mislabeling. We compare HC and RC using a grid world environment and a physical robot singulation task. In the latter, the input is an image of objects on a planar worksurface and the policy generates a motion of the gripper to separate one object from the rest. We observe in simulation that for linear SVMs, policies learned with RC outperformed those learned with HC, but that with deep models this advantage disappears. We also find that with RC, the corrective control labels provided by humans can be highly inconsistent. We prove that there exists a class of examples in which, in the limit, HC is guaranteed to converge to an optimal policy while RC may fail to converge. These results suggest that a form of HC sampling may be preferable for human supervisors.
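To make the distinction between the two sampling schemes concrete, the sketch below contrasts the two data-collection loops described in the abstract. This is a minimal illustration, not the authors' implementation: `env`, `supervisor_control`, and `fit_policy` are hypothetical stand-ins for an environment, the human supervisor, and a supervised learner.

```python
def collect_hc(env, supervisor_control, num_trajectories, horizon):
    """Human-Centric (HC) sampling: the supervisor teleoperates the robot,
    so every visited state is drawn from the supervisor's own policy."""
    data = []
    for _ in range(num_trajectories):
        state = env.reset()
        for _ in range(horizon):
            control = supervisor_control(state)  # supervisor drives
            data.append((state, control))
            state = env.step(control)
    return data


def collect_rc(env, supervisor_control, fit_policy, num_iterations, horizon):
    """Robot-Centric (RC) sampling (DAgger-style): the robot executes its
    current learned policy, the supervisor labels each visited state with
    a corrective control, and the aggregated data retrains the policy."""
    data = []
    policy = None
    for _ in range(num_iterations):
        state = env.reset()
        for _ in range(horizon):
            label = supervisor_control(state)  # corrective label only
            data.append((state, label))
            # the robot, not the supervisor, chooses the executed control
            control = policy(state) if policy is not None else label
            state = env.step(control)
        policy = fit_policy(data)  # aggregate demonstrations and retrain
    return policy
```

The key difference is who selects the executed control: under HC the states visited follow the supervisor's distribution, while under RC they follow the learned policy's distribution, which is why RC requires the supervisor to label states they did not choose to visit.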