On-line Trading of Exploration and Exploitation

Finite horizon exploration for path integral control problems

author: Bert Kappen, Radboud University Nijmegen

Description

We have recently developed a path integral method for solving a class of non-linear stochastic control problems in the continuous domain [1, 2]. Path integral (PI) control can be applied to time-dependent finite-horizon tasks (motor control, coordination between agents) and to static tasks (which behave similarly to discounted-reward reinforcement learning). In this control formalism, the cost-to-go or value function can be solved explicitly as a function of the environment and rewards (as a path integral). Thus, PI control does not require solving the Bellman equation. The path integral itself can still be expensive to compute, but methods and concepts from statistical physics, such as Monte Carlo sampling or the Laplace approximation, yield efficient approximations. The formalism also generalizes to multiple agents that jointly solve a task. In this case the agents must coordinate their actions not only through time but also with each other. It was recently shown that this problem can be mapped onto a graphical-model inference problem and solved using the junction tree algorithm. Depending on the complexity of the cost function, exact control solutions can be computed for, for example, hundreds of agents [3].
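To make the Monte Carlo idea concrete, here is a minimal sketch under assumptions that the description above does not spell out. It assumes the standard one-dimensional PI control setup with zero passive drift: dynamics dx = u dt + dξ with ⟨dξ²⟩ = ν dt, cost φ(x_T) + ∫(R u²/2 + V(x, t)) dt, and the noise-to-cost coupling λ = νR. Under these assumptions the cost-to-go is J(x, t) = −λ log E[exp(−S/λ)], where the expectation is over uncontrolled paths and S is their accumulated state and end cost, and the optimal control can be estimated as the weighted mean of the first noise increment divided by dt. The function names `pi_control_mc` and `lq_exact` are hypothetical. The quadratic end cost is chosen only because it has a closed-form answer to check against.

```python
import numpy as np


def pi_control_mc(x0, t0, T, phi, V, nu=1.0, R=1.0, n_steps=20, n_paths=200_000, seed=0):
    """Hypothetical Monte Carlo path integral estimator (1-D, passive drift f = 0 assumed).

    Returns estimates of the cost-to-go J(x0, t0) and the optimal control u(x0, t0)
    for dx = u dt + dxi, <dxi^2> = nu dt, cost phi(x_T) + int (R u^2 / 2 + V) dt,
    assuming lambda = nu * R.
    """
    rng = np.random.default_rng(seed)
    lam = nu * R
    dt = (T - t0) / n_steps
    x = np.full(n_paths, float(x0))
    S = np.zeros(n_paths)  # state + end cost accumulated along each uncontrolled path
    first_dxi = None
    for k in range(n_steps):
        t = t0 + k * dt
        S += V(x, t) * dt  # left Riemann sum of the state cost
        dxi = rng.normal(0.0, np.sqrt(nu * dt), size=n_paths)
        if k == 0:
            first_dxi = dxi  # needed for the control estimate
        x = x + dxi  # uncontrolled diffusion: u = 0, f = 0
    S += phi(x)
    # Path weights exp(-S / lam), computed with a log-sum-exp shift for stability.
    a = -S / lam
    a_max = a.max()
    w = np.exp(a - a_max)
    J = -lam * (a_max + np.log(w.mean()))
    u = (w @ first_dxi) / w.sum() / dt  # u dt = weighted mean of the first noise increment
    return J, u


def lq_exact(x, tau, alpha, R, nu):
    """Closed-form J and u for end cost alpha * x^2 / 2, V = 0, horizon tau = T - t."""
    lam = nu * R
    g = 1.0 + alpha * tau / R
    J = 0.5 * lam * np.log(g) + alpha * x**2 / (2.0 * g)
    u = -alpha * x / (R + alpha * tau)
    return J, u


alpha, R, nu = 1.0, 1.0, 1.0
phi = lambda x: 0.5 * alpha * x**2    # assumed quadratic end cost
V = lambda x, t: np.zeros_like(x)     # assumed zero state cost
J_mc, u_mc = pi_control_mc(1.0, 0.0, 1.0, phi, V, nu=nu, R=R)
J_ex, u_ex = lq_exact(1.0, 1.0, alpha, R, nu)
print(J_mc, J_ex, u_mc, u_ex)
```

With a fixed seed, the sampled estimates should agree with the closed-form values to within Monte Carlo error. The sampler never uses the closed form, so the same loop applies unchanged to non-quadratic φ or V, such as the double-slit example in the slides, where no analytic solution is available and sampling or a Laplace approximation is needed.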

Slides
0:01 Finite horizon exploration for path integral control problems
0:51 Stochastic optimal control theory
3:03 Exploration in RL
4:46 Exploration in receding horizon problems
4:59 Finite horizon exploration
6:07 Outline
6:32 Stochastic optimal control
8:19 The optimal cost-to-go
9:52 The diffusion process
10:45 The path integral formulation
11:40 An example: double slit
12:02 MC sampling on double slit
12:24 Planning: RL vs PI control
13:14 Planning: RL vs PI control (cont.)
13:48 Some observations
14:21 Fixed horizon exploration
16:49 Fixed horizon exploration (cont.)
17:07 Fixed horizon exploration (cont.)
17:49 Fixed horizon exploration (cont.)
18:10 Fixed horizon exploration (cont.)
19:00 Fixed horizon exploration (cont.)
20:23 Receding horizon exploration
22:11 Receding horizon exploration (cont.)
23:49 Receding horizon exploration (cont.)
24:06 Summary
25:25 Further reading
