Free Energy and Relative Entropy Dualities: Connections to Path Integral Control and Applications to Robotics
published: Oct. 16, 2012, recorded: September 2012, views: 5538
While optimal control and reinforcement learning are fundamental frameworks for learning and control applications, their application to high-dimensional control systems of the complexity of humanoid and biomimetic robots has largely been impossible so far. A key problem is that classical value function-based approaches run into severe limitations in continuous state-action spaces due to issues of value function approximation. Additionally, the computational cost and time needed to explore high-dimensional state-action spaces quickly exceed practical feasibility. As an alternative, researchers have turned to trajectory-based reinforcement learning, which sacrifices global optimality in favor of being applicable to high-dimensional state-action spaces. Model-based algorithms, inspired by ideas of differential dynamic programming, have demonstrated some success if models are accurate. Model-free trajectory-based reinforcement learning has been limited by problems of slow learning and the need to tune many open parameters. Recently, reinforcement learning has moved towards combining classical techniques from stochastic optimal control and dynamic programming with learning techniques from statistical estimation theory and the connection between SDEs and PDEs via the Feynman-Kac lemma. In this talk, I will discuss theoretical developments and extensions of path integral control to iterative cases and present algorithms for policy improvement in continuous state-action spaces. I will provide information-theoretic interpretations and extensions based on the fundamental relationship between free energy and relative entropy. This relationship provides an alternative view of stochastic optimal control theory that does not rely on the Bellman principle.
I will demonstrate the applicability of the proposed algorithms to control and learning of humanoid, manipulator, and tendon-driven robots, and propose future directions in terms of theory and applications.
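The relationship between free energy and relative entropy mentioned in the abstract can be illustrated numerically. The sketch below is not taken from the talk or its slides; it assumes the standard form of the duality on a finite sample space, namely that -(1/rho) log E_P[exp(-rho J)] equals the minimum over Q of E_Q[J] + (1/rho) KL(Q || P), attained at Q* proportional to P exp(-rho J). The names `cost`, `p`, `rho`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def free_energy(cost, p, rho):
    """-(1/rho) * log E_P[exp(-rho * J)], shifted by the max for stability."""
    z = -rho * cost
    m = z.max()
    log_partition = m + np.log(np.sum(p * np.exp(z - m)))
    return -log_partition / rho


def gibbs_measure(cost, p, rho):
    """Q* proportional to p * exp(-rho * J), the minimizer in the duality."""
    z = -rho * cost
    w = p * np.exp(z - z.max())
    return w / w.sum()


def relative_entropy(q, p):
    """KL(Q || P) for discrete distributions, skipping zero-mass terms of Q."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))


def variational_objective(cost, q, p, rho):
    """E_Q[J] + (1/rho) * KL(Q || P), the quantity minimized over Q."""
    return float(np.dot(q, cost)) + relative_entropy(q, p) / rho


# Assumed toy setup: 50 outcomes (stand-ins for sampled trajectories),
# a random cost per outcome, and a random reference distribution P.
rng = np.random.default_rng(0)
n = 50
cost = rng.uniform(0.0, 10.0, size=n)
p = rng.dirichlet(np.ones(n))
rho = 0.7

F = free_energy(cost, p, rho)
q_star = gibbs_measure(cost, p, rho)

# The objective at Q* should match the free energy up to rounding error.
gap_at_optimum = variational_objective(cost, q_star, p, rho) - F

# Any other distribution Q should give an objective no smaller than F.
gaps_random = [
    variational_objective(cost, rng.dirichlet(np.ones(n)), p, rho) - F
    for _ in range(200)
]

print("gap at Q*:", gap_at_optimum)
print("smallest gap over random Q:", min(gaps_random))
```

In this toy setting the minimizing distribution reweights outcomes by their exponentiated negative cost, which is the kind of weighting that sampling-based path integral methods rely on; how the talk's algorithms use it is not specified here.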
Download slides: cyberstat2012_theodorou_free_energy_01.pdf (12.2 MB)