Sample and Feedback Efficient Hierarchical Reinforcement Learning from Human Preferences

Robert Pinsler1, Riad Akrour2, Takayuki Osa3, Jan Peters4, Gerhard Neumann2

  • 1University of Cambridge
  • 2TU Darmstadt
  • 3University of Tokyo
  • 4Technische Universität Darmstadt



Interactive Session


10:30 - 13:00 | Tue 22 May | podK | TuA@K


Abstract


While reinforcement learning has led to promising results in robotics, defining an informative reward function is challenging. Prior work considered including the human in the loop to jointly learn the reward function and the optimal policy. Generating samples from a physical robot and requesting human feedback are both taxing efforts for which efficiency is critical. We propose to learn reward functions from both the robot and the human perspectives to improve on both efficiency metrics. Learning a reward function from the human perspective increases feedback efficiency by assuming that humans rank trajectories according to a low-dimensional outcome space. Learning a reward function from the robot perspective circumvents the need for a dynamics model while retaining the sample efficiency of model-based approaches. We provide an algorithm that incorporates bi-perspective reward learning into a general hierarchical reinforcement learning framework and demonstrate the merits of our approach on a toy task and a simulated robot grasping task.



  • Specifying a reward function can be difficult for many tasks, e.g. robotic grasping
  • Idea: learn reward functions from both the human and the robot perspective to achieve sample and feedback efficiency
  • We incorporate this scheme into a general hierarchical reinforcement learning framework
  • The merits of our approach are demonstrated on a toy task and a simulated robot grasping task
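The human-perspective reward learning described above assumes that humans rank trajectories according to a low-dimensional outcome space. A minimal sketch of this idea is shown below; the linear reward model, the Bradley-Terry preference likelihood, and all names (`learn_outcome_reward`, `outcomes`, `prefs`) are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def learn_outcome_reward(outcomes, prefs, lr=0.5, iters=2000):
    """Fit a linear reward r(o) = w . o over low-dimensional outcomes o
    from pairwise human preferences, using a Bradley-Terry model:
    P(i preferred over j) = sigmoid(w . (o_i - o_j)).

    outcomes: (n, d) array of trajectory outcomes
    prefs: list of (i, j) pairs meaning outcome i was ranked above j
    """
    w = np.zeros(outcomes.shape[1])
    for _ in range(iters):
        grad = np.zeros_like(w)
        for i, j in prefs:
            diff = outcomes[i] - outcomes[j]
            p = 1.0 / (1.0 + np.exp(-w @ diff))
            grad += (1.0 - p) * diff  # gradient of the log-likelihood
        w += lr * grad / len(prefs)  # simple gradient ascent
    return w

# Toy example with 2-D outcomes (e.g. grasp stability, effort):
# the human prefers high stability and low effort.
outcomes = np.array([[0.9, 0.2],
                     [0.5, 0.5],
                     [0.1, 0.8]])
prefs = [(0, 1), (1, 2), (0, 2)]  # e.g. (0, 1): trajectory 0 ranked above 1
w = learn_outcome_reward(outcomes, prefs)
scores = outcomes @ w  # learned reward should reproduce the human ranking
```

Because preferences are expressed over a few outcome dimensions rather than full trajectories, each query is cheap for the human to answer, which is one way such a scheme can improve feedback efficiency.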