Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning

Weihao Yuan1, Johannes Andreas Stork2, Danica Kragic2, Michael Yu Wang1, Kaiyu Hang2

  • 1Hong Kong University of Science and Technology
  • 2KTH Royal Institute of Technology

Details

Category

Interactive Session

Sessions

10:30 - 13:00 | Tue 22 May | podE | TuA@E

Manipulation - Planning 1

Abstract

Rearranging objects on a tabletop surface by means of nonprehensile manipulation is a task that requires skillful interaction with the physical world. Usually, this is achieved by precisely modeling the physical properties of the objects, the robot, and the environment for explicit planning. In contrast, as explicitly modeling the physical environment is not always feasible and involves various uncertainties, we learn a nonprehensile rearrangement strategy with deep reinforcement learning based only on visual feedback. For this, we model the task with rewards and train a deep Q-network. Our potential-field-based heuristic exploration strategy reduces the number of collisions that lead to suboptimal outcomes, and we actively balance the training set to avoid bias towards poor examples. Our training process leads to quicker learning and better performance on the task compared to uniform exploration and standard experience replay. We provide empirical evidence from simulation that our method achieves a success rate of 85%, show that our system can cope with sudden changes in the environment, and compare our performance with human-level performance.
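The paper does not include code here, but the training setup the abstract describes can be sketched as follows: a standard deep Q-network update combined with a replay buffer that actively balances successful against unsuccessful transitions so that sampling is not dominated by poor examples. All names (`BalancedReplayBuffer`, `success_fraction`, `dqn_update`) and parameter values are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F


class BalancedReplayBuffer:
    """Hypothetical buffer that stores successful and unsuccessful
    transitions in separate pools, one way to realize the "actively
    balanced training set" mentioned in the abstract."""

    def __init__(self, capacity, success_fraction=0.5):
        self.success = deque(maxlen=capacity // 2)
        self.failure = deque(maxlen=capacity // 2)
        self.success_fraction = success_fraction

    def push(self, transition, successful):
        (self.success if successful else self.failure).append(transition)

    def sample(self, batch_size):
        # Draw a fixed fraction from the success pool, the rest from failures.
        n_succ = min(int(batch_size * self.success_fraction), len(self.success))
        batch = random.sample(list(self.success), n_succ)
        n_fail = min(batch_size - n_succ, len(self.failure))
        batch += random.sample(list(self.failure), n_fail)
        return batch


def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One standard DQN step: regress Q(s, a) towards
    r + gamma * max_a' Q_target(s', a') on a balanced minibatch."""
    states, actions, rewards, next_states, dones = map(torch.stack, zip(*batch))
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        best_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * best_next
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Splitting the buffer into two pools is only one plausible reading of the active balancing described above; prioritized sampling over a single pool would serve the same purpose.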

Summary

  • The robot is tasked with pushing an object (blue) around obstacles (red) to a goal region (green), relying only on visual feedback.
  • Our learning system handles perception and planning in a single end-to-end process.
  • We improve the training process with active replay buffer control and informed exploration based on a Gaussian potential field (see the sketch below).
  • The trained system can react to sudden changes in the environment, e.g., the positions of the object, obstacles, or target, the friction coefficients, and a distracting novel object.
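
A minimal sketch of the informed exploration idea from the summary above, assuming a 2-D workspace and a small discrete set of pushing directions: a Gaussian potential attracts the pusher towards the goal and repels it from obstacles, and exploratory actions are sampled in proportion to the potential of the state they would reach rather than uniformly. Function names, the softmax weighting, and all parameter values are hypothetical, not taken from the paper.

```python
import numpy as np


def gaussian_potential(pos, goal, obstacles, sigma_goal=0.2, sigma_obs=0.1):
    """Attractive Gaussian centred on the goal minus repulsive Gaussians
    centred on each obstacle; higher values are more promising."""
    attract = np.exp(-np.sum((pos - goal) ** 2) / (2 * sigma_goal ** 2))
    repel = sum(np.exp(-np.sum((pos - obs) ** 2) / (2 * sigma_obs ** 2))
                for obs in obstacles)
    return attract - repel


def informed_action(pos, goal, obstacles, action_deltas, temperature=0.1):
    """Exploration step: instead of picking a uniformly random action,
    sample one with probability proportional to the potential of the
    state it would (approximately) lead to."""
    scores = np.array([gaussian_potential(pos + delta, goal, obstacles)
                       for delta in action_deltas])
    probs = np.exp((scores - scores.max()) / temperature)  # stable softmax
    probs /= probs.sum()
    return np.random.choice(len(action_deltas), p=probs)
```

During epsilon-greedy training, a call like `informed_action(pos, goal, obstacles, action_deltas)` would replace the uniform random choice on exploration steps, steering exploration away from collisions, while greedy steps still query the Q-network.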