Finite horizon exploration for path integral control problems
Description
We have recently developed a path integral method for solving a
class of non-linear stochastic control problems in the continuous
domain [1, 2]. Path integral (PI) control can be applied to
time-dependent finite-horizon tasks (motor control, coordination between
agents) and static tasks (which behave similarly to discounted reward
reinforcement learning). In this control formalism, the cost-to-go or
value function can be solved explicitly as a function of the
environment and rewards (as a path integral). Thus, for PI control one
does not need to solve the Bellman equation. The computation of the
path integral can also be complex, but one can use methods and concepts
from statistical physics, such as Monte Carlo sampling or the Laplace
approximation to obtain efficient approximations. One can also
generalize this control formalism to multiple agents that jointly solve
a task. In this case the agents need to coordinate their actions not
only through time, but also among each other. It was recently shown
that the problem can be mapped onto a graphical model inference problem
and solved using the junction tree algorithm. Exact control
solutions can be computed, for instance, for hundreds of agents,
depending on the complexity of the cost function [3].
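The Monte Carlo route mentioned above can be illustrated with a minimal sketch. It is not taken from the lecture; it assumes the standard one-dimensional PI control setting, in which the uncontrolled dynamics are a pure diffusion with noise variance `nu`, the control cost weight is `R`, and the cost-to-go is J(x, t) = -lambda log E[exp(-S / lambda)] with lambda = nu * R, where S is the path cost accumulated along uncontrolled trajectories. The function name `cost_to_go_mc` and all of its parameters are hypothetical names chosen for illustration.

```python
import numpy as np

def cost_to_go_mc(x0, t0, T, nu, R, end_cost, path_cost,
                  n_paths=20000, n_steps=50, seed=0):
    """Monte Carlo estimate of the PI cost-to-go J(x0, t0).

    Assumed setting (not from the source): dx = u dt + dxi, <dxi^2> = nu dt,
    control cost (R/2) u^2, state cost path_cost(x, t), end cost end_cost(x).
    """
    rng = np.random.default_rng(seed)
    lam = nu * R                      # lambda = nu * R
    dt = (T - t0) / n_steps
    x = np.full(n_paths, float(x0))   # all sample paths start at x0
    S = np.zeros(n_paths)             # accumulated cost per path
    for k in range(n_steps):
        S += path_cost(x, t0 + k * dt) * dt
        # Uncontrolled diffusion step (u = 0).
        x += np.sqrt(nu * dt) * rng.standard_normal(n_paths)
    S += end_cost(x)
    # J = -lambda * log mean(exp(-S / lambda)), computed stably.
    a = -S / lam
    m = a.max()
    log_psi = m + np.log(np.mean(np.exp(a - m)))
    return -lam * log_psi

def quadratic_end_cost(x, alpha=1.0):
    # Hypothetical end cost phi(x) = alpha * x^2 / 2.
    return 0.5 * alpha * x ** 2

def zero_path_cost(x, t):
    # No state cost along the path (V = 0).
    return np.zeros_like(x)

J_est = cost_to_go_mc(1.0, 0.0, 1.0, nu=1.0, R=1.0,
                      end_cost=quadratic_end_cost,
                      path_cost=zero_path_cost)
```

With zero path cost and a quadratic end cost the expectation is a Gaussian integral, so the estimate can be checked against a closed form. Naive sampling from the uncontrolled diffusion becomes inefficient when low-cost paths are rare, which is one reason the Laplace approximation or importance sampling may be preferred in practice.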
| Time | Slide |
| 0:01 | Finite horizon exploration for path integral control problems |
| 0:51 | Stochastic optimal control theory |
| 3:03 | Exploration in RL |
| 4:46 | Exploration in receding horizon problems |
| 4:59 | Finite horizon exploration |
| 6:07 | Outline |
| 6:32 | Stochastic optimal control |
| 8:19 | The optimal cost-to-go |
| 9:52 | The diffusion process |
| 10:45 | The path integral formulation |
| 11:40 | An example: double slit |
| 12:02 | MC sampling on double slit |
| 12:24 | Planning: RL vs PI control |
| 13:14 | Planning: RL vs PI control01 |
| 13:48 | Some observations |
| 14:21 | Fixed horizon exploration |
| 16:49 | Fixed horizon exploration01 |
| 17:07 | Fixed horizon exploration02 |
| 17:49 | Fixed horizon exploration03 |
| 18:10 | Fixed horizon exploration04 |
| 19:00 | Fixed horizon exploration05 |
| 20:23 | Receding horizon exploration |
| 22:11 | Receding horizon exploration01 |
| 23:49 | Receding horizon exploration02 |
| 24:06 | Summary |
| 25:25 | Further reading |