BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning

Boris Ivanovic1, James Harrison2, Apoorva Sharma2, Mo Chen3, Marco Pavone2

  • 1NVIDIA
  • 2Stanford University
  • 3Simon Fraser University

Details

10:45 - 12:00 | Mon 20 May | Room 220 POD 01 | MoA1-01.3

Session: Robot Learning I - 1.1.01

Abstract

Model-free Reinforcement Learning (RL) offers an attractive approach to learning control policies for high-dimensional systems, but its relatively poor sample complexity often necessitates training in simulated environments. Even in simulation, goal-directed tasks whose natural reward function is sparse remain intractable for state-of-the-art model-free algorithms for continuous control. The bottleneck in these tasks is the prohibitive amount of exploration required to obtain a learning signal from the initial state of the system. In this work, we leverage physical priors in the form of an approximate system dynamics model to design a curriculum for a model-free policy optimization algorithm. Our Backward Reachability Curriculum (BaRC) begins policy training from states that require only a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance. Its curriculum strategy is physically intuitive and easy to tune, and it allows physical priors to be incorporated to accelerate training without compromising the performance, flexibility, or applicability of the model-free RL algorithm. We evaluate our approach on two representative dynamic robotic learning problems and find substantial performance improvement relative to previous curriculum generation techniques and naive exploration strategies.
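
To make the curriculum loop described above concrete, here is a minimal Python sketch. The function names, callbacks, and success threshold are hypothetical placeholders standing in for the paper's components (the model-free policy optimizer and the backward-reachable-set computation from the approximate dynamics model); this is an illustration of the idea under those assumptions, not the authors' implementation.

```python
def barc_curriculum(train_policy, evaluate_success, sample_backward_reachable,
                    goal_states, success_threshold=0.7, max_iters=50):
    """Illustrative curriculum loop in the spirit of BaRC.

    The three callbacks are hypothetical placeholders, not the paper's API:
      - train_policy(policy, starts): run a few iterations of a model-free
        RL algorithm with episodes initialized from `starts`.
      - evaluate_success(policy, starts): fraction of rollouts from `starts`
        that reach the goal.
      - sample_backward_reachable(starts): use the approximate dynamics
        model to sample states from which `starts` can be reached within a
        short horizon (i.e., from a backward reachable set).
    """
    starts = list(goal_states)  # begin near the goal, where a learning signal is easy to obtain
    policy = None
    for _ in range(max_iters):
        policy = train_policy(policy, starts)
        if evaluate_success(policy, starts) >= success_threshold:
            # Policy is proficient on the current start set: expand the
            # start distribution backwards in a dynamically-consistent way.
            starts = starts + sample_backward_reachable(starts)
    return policy
```

Because each expansion step draws new start states from a backward reachable set of the current ones, every state added to the curriculum can, by construction, reach the already-mastered region under the approximate dynamics; this is what keeps the expansion dynamically consistent rather than a uniform perturbation of start states.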