Snapbot (a) turning left, (b) going straight, and (c) turning right

Trajectory-Based Probabilistic Policy Gradient for Learning Locomotion Behaviors

Sungjoon Choi¹, Joohyung Kim²

¹Seoul National University
²University of Illinois at Urbana-Champaign

Details

10:45 - 12:00 | Mon 20 May | Room 220 POD 01 | MoA1-01.1

Session: Robot Learning I - 1.1.01

Abstract

In this paper, we propose a trajectory-based reinforcement learning method named deep latent policy gradient (DLPG) for learning locomotion skills. We define the policy function as a probability distribution over trajectories and train the policy using a deep latent variable model to achieve sample-efficient skill learning. We first evaluate the sample efficiency of DLPG against state-of-the-art reinforcement learning methods in simulated environments. Then, we apply the proposed method to a four-legged walking robot named Snapbot to learn three basic locomotion skills: turning left, going straight, and turning right. We demonstrate that, by properly designing two reward functions for curriculum learning, Snapbot successfully learns the desired locomotion skills with moderate sample complexity.
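
To illustrate what "a probability distribution over trajectories" can mean in practice, the following is a minimal sketch and not the authors' DLPG implementation. It assumes a standard normal prior over a two-dimensional latent code and uses a fixed sinusoid as a stand-in for a trained deep decoder. The names `decode`, `sample_trajectory`, and `toy_reward`, the horizon, and the reward itself are all hypothetical and chosen only for illustration.

```python
import math
import random

HORIZON = 20    # time steps per trajectory (assumed value)
LATENT_DIM = 2  # size of the latent code (assumed value)


def decode(z, horizon=HORIZON):
    """Hypothetical decoder: map a latent code to a joint-angle trajectory.

    A trained deep network would sit here; a fixed sinusoid stands in for it.
    """
    amplitude, phase = z
    return [amplitude * math.sin(2 * math.pi * t / horizon + phase)
            for t in range(horizon)]


def sample_trajectory(rng):
    """Draw z from a standard normal prior, then decode it to a trajectory."""
    z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    return z, decode(z)


def toy_reward(trajectory):
    """Placeholder reward: range of motion (not a reward from the paper)."""
    return max(trajectory) - min(trajectory)


# Sample whole trajectories (rather than per-step actions) and score each one.
rng = random.Random(0)
rollouts = [sample_trajectory(rng) for _ in range(8)]
best_z, best_traj = max(rollouts, key=lambda r: toy_reward(r[1]))
```

In this sketch each rollout is a complete trajectory drawn in one shot, so a reward is assigned per trajectory. A learning method in this family would use such rewards to update the decoder or the latent distribution; that update step is deliberately left out here because the abstract does not specify it.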