Learning Dynamic Locomotion Skills for Terrains with Obstacles
published: July 28, 2015, recorded: June 2015, views: 1534
Using reinforcement learning to develop motor skills for articulated figures is challenging because the state and action spaces are high-dimensional and continuous. In this work, we learn control policies for dynamic gaits across terrains having sequences of gaps, walls, and steps. Results are demonstrated using physics-based simulations of a 21-link planar dog and a 7-link planar biped. Our approach is characterized by a number of features, including: nonparametric representation of the value function and the control policy; value iteration using batched positive-TD updates; localized epsilon-greedy exploration; and an action parameterization that is tailored for the problem domain. In support of the nonparametric representation, we further optimize for a task-specific distance metric. The policies are computed offline using repeated iterations of epsilon-greedy exploration and value iteration. The final control policies then run in real time over novel terrains. We evaluate the impact of the key features of our skill learning pipeline on the resulting performance.
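To make the ingredients named in the abstract more concrete, the following is a minimal Python sketch of how a nearest-neighbor value function, a batched positive-TD update, and a locally perturbed epsilon-greedy action might fit together. It is an illustration under stated assumptions, not the authors' implementation: the class `NearestNeighborValue`, the functions `positive_td_batch` and `local_epsilon_greedy`, the per-dimension `metric_weights`, the discount `gamma`, and the Gaussian noise model are all hypothetical choices made here for clarity. In particular, the abstract does not say how the distance metric is optimized or what "localized" exploration means precisely; the sketch simply fixes the metric weights and perturbs the current policy action.

```python
import random


class NearestNeighborValue:
    """Hypothetical nonparametric value store: V(s) is the value of the
    nearest stored state under a weighted squared Euclidean metric."""

    def __init__(self, metric_weights):
        self.w = list(metric_weights)  # assumed fixed; the paper optimizes a metric
        self.states = []
        self.values = []

    def _dist2(self, a, b):
        return sum(w * (x - y) ** 2 for w, x, y in zip(self.w, a, b))

    def _nearest(self, s):
        d2 = [self._dist2(s, q) for q in self.states]
        i = min(range(len(d2)), key=d2.__getitem__)
        return i, d2[i]

    def value(self, s):
        if not self.states:
            return 0.0  # assumed default for an empty store
        i, _ = self._nearest(s)
        return self.values[i]

    def set(self, s, v, tol=1e-12):
        # Overwrite an existing sample at (numerically) the same state,
        # otherwise add a new sample.
        if self.states:
            i, d2 = self._nearest(s)
            if d2 <= tol:
                self.values[i] = float(v)
                return
        self.states.append(tuple(s))
        self.values.append(float(v))


def positive_td_batch(vf, transitions, gamma=0.9):
    """Apply one batched update from (state, reward, next_state) tuples,
    keeping only transitions whose TD error is positive."""
    accepted = []
    for s, r, s_next in transitions:
        target = r + gamma * vf.value(s_next)
        if target - vf.value(s) > 0.0:  # positive temporal difference
            accepted.append((s, target))
    # Targets are computed against the old values, then applied together.
    for s, target in accepted:
        vf.set(s, target)
    return len(accepted)


def local_epsilon_greedy(action, rng, epsilon=0.2, noise_scale=0.1):
    """With probability epsilon, perturb the policy action locally
    (assumed Gaussian noise); otherwise return it unchanged."""
    if rng.random() < epsilon:
        return [a + rng.gauss(0.0, noise_scale) for a in action]
    return list(action)
```

In this sketch a transition that improves on the current estimate raises the stored value at its state, while a transition with a negative TD error is discarded, so poor exploratory outcomes do not lower the estimate. Queries at unseen states fall back to the closest stored sample under the weighted metric.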