Reinforcement Learning

Space-indexed Dynamic Programming: Learning to Follow Trajectories

Author: J. Zico Kolter, Computer Science Department, Stanford University

Description

We consider the task of learning to accurately follow a trajectory in a vehicle such as a car or helicopter. A number of dynamic programming algorithms, such as Differential Dynamic Programming (DDP) and Policy Search by Dynamic Programming (PSDP), can efficiently compute non-stationary policies for these tasks. Such policies are in general well suited to trajectory following, since they can easily generate different control actions at different times in order to follow the trajectory. However, a weakness of these algorithms is that their policies are time-indexed: they apply different policies depending on the current time. This is problematic because (1) the current time may not correspond well to where the vehicle is along the trajectory, and (2) the uncertainty over future states can prevent these algorithms from finding any good policies at all. In this paper we propose a method for space-indexed dynamic programming that overcomes both of these difficulties. We begin by showing how a dynamical system can be rewritten in terms of a spatial index variable (i.e., how far along the trajectory the vehicle is) rather than as a function of time. We then use these space-indexed dynamical systems to derive space-indexed versions of the DDP and PSDP algorithms. Finally, we show that these algorithms perform well on a variety of control tasks, both in simulation and on real systems.
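
The idea of re-indexing a dynamical system by position rather than by time can be illustrated with a small sketch. The example below is not the model or algorithm from the talk; it assumes a hypothetical constant-speed planar vehicle with state (x, y, theta), a straight reference trajectory along the x-axis (so the spatial index is simply x), and a first-order Euler integrator. The function names, the gains, and the step sizes are all illustrative assumptions. A time-indexed step advances the state by a fixed dt, whereas the space-indexed step picks dt so that the vehicle advances by a fixed distance delta along the trajectory, which lets the policy be looked up by spatial index.

```python
import math

def time_step(state, u, v=1.0, dt=0.1):
    """Euler step of a constant-speed planar vehicle over a fixed time dt.

    state = (x, y, theta); u is the turn rate. Illustrative model only.
    """
    x, y, theta = state
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + u * dt)

def space_step(state, u, v=1.0, delta=0.1):
    """Advance the vehicle by a fixed distance delta along the x-axis.

    The time step is chosen (to first order) so that progress along the
    reference trajectory equals delta. Returns (new_state, dt_used).
    """
    _, _, theta = state
    x_dot = v * math.cos(theta)  # rate of progress along the trajectory
    if x_dot <= 0.0:
        raise ValueError("vehicle is not making forward progress")
    dt = delta / x_dot
    return time_step(state, u, v=v, dt=dt), dt

def rollout(gains, start=(0.0, 0.2, 0.0), delta=0.1):
    """Run a space-indexed policy: gains[d] is used at spatial index d."""
    states = [start]
    state = start
    for k_y, k_theta in gains:
        _, y, theta = state
        u = -k_y * y - k_theta * theta  # simple linear steering law (assumed)
        state, _ = space_step(state, u, delta=delta)
        states.append(state)
    return states

if __name__ == "__main__":
    trajectory = rollout([(2.0, 2.0)] * 50)
    print("final state:", trajectory[-1])
```

Because each step ends at the next spatial index, the controller used at index d always acts at the same place along the path, regardless of how long the vehicle took to get there. This is the property that a time-indexed policy lacks.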

Slides
0:00 Space-Indexed Dynamic Programming: Learning to Follow Trajectories
0:16 Outline
0:34 Reinforcement Learning and Following Trajectories
0:51 Trajectory Following (1)
1:11 Trajectory Following (2)
1:33 Dynamic Programming (1)
1:58 Dynamic Programming (2)
2:01 Dynamic Programming (3)
2:07 Dynamic Programming (4)
2:16 Dynamic Programming (5)
2:19 Dynamic Programming (6)
2:21 Dynamic Programming (7)
2:33 Dynamic Programming (8)
3:17 Problems with Dynamic Programming (1)
3:32 Problems with Dynamic Programming (2)
3:42 Problems with Dynamic Programming (3)
3:51 Problems with Dynamic Programming (4)
4:00 Problems with Dynamic Programming (5)
4:33 Problems with Dynamic Programming (6)
4:43 Problems with Dynamic Programming (7)
5:12 Problems with Dynamic Programming (8)
5:26 Space-Indexed Dynamic Programming
5:50 Space-Indexed Dynamical Systems and Dynamic Programming
5:58 Difficulty with SIDP
6:30 Time-Indexed Dynamical System (1)
6:51 Time-Indexed Dynamical System (2)
6:54 Time-Indexed Dynamical System (3)
6:57 Time-Indexed Dynamical System (4)
7:01 Time-Indexed Dynamical System (5)
7:20 Space-Indexed Dynamical Systems (1)
7:43 Space-Indexed Dynamical Systems (2)
8:02 Space-Indexed Dynamical Systems (3)
8:32 Space-Indexed Dynamical Systems (4)
8:50 Space-Indexed Dynamic Programming (1)
9:23 Space-Indexed Dynamic Programming (2)
9:28 Space-Indexed Dynamic Programming (3)
9:34 Space-Indexed Dynamic Programming (4)
9:46 Space-Indexed Dynamic Programming (5)
9:50 Space-Indexed Dynamic Programming (6)
9:52 Space-Indexed Dynamic Programming (7)
9:55 Problems with Dynamic Programming
10:08 Space-Indexed Dynamic Programming
10:26 Problems with Dynamic Programming
10:34 Space-Indexed Dynamic Programming (1)
10:55 Space-Indexed Dynamic Programming (2)
11:25 Experiments
11:30 Experimental Domain
11:44 Experimental Setup
12:33 Time-Indexed PSDP
13:04 Time-Indexed PSDP w/ Re-indexing
14:08 Space-Indexed PSDP
15:05 Empirical Evaluation
16:06 Additional Experiments
16:20 Related Work
17:11 Summary
17:43 Thank you!
20:56 Questions
