Reinforcement Learning

An Analysis of Reinforcement Learning with Function Approximation

author: Francisco Melo, Carnegie Mellon University

Description

We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning when combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis and Van Roy, 1996) to stochastic control settings. We identify conditions under which such approximate methods converge with probability 1. We conclude with a brief discussion of the general applicability of our results and compare them with several related works.
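
As a rough illustration of the kind of method the lecture analyzes, the following is a minimal sketch of Q-learning with linear function approximation, where Q(x, a) is approximated by the inner product of a feature vector phi(x, a) and a parameter vector theta. It is not the author's code or exact setting: the Gaussian feature map, the toy dynamics on [0, 1], the uniform behaviour policy, and the step size and discount values are all assumptions chosen for illustration. Schemes of this form are not guaranteed to converge in general; the lecture is about identifying conditions under which they do.

```python
import numpy as np

def features(x, a, n_actions=2, centers=np.linspace(0.0, 1.0, 5), width=0.25):
    """Hypothetical feature map phi(x, a): Gaussian bumps over the state, one copy per action."""
    bumps = np.exp(-((x - centers) ** 2) / (2 * width ** 2))
    phi = np.zeros(n_actions * centers.size)
    phi[a * centers.size:(a + 1) * centers.size] = bumps
    return phi

def q_update(theta, x, a, r, x_next, alpha=0.1, gamma=0.9, n_actions=2):
    """One approximate Q-learning step on the parameter vector theta."""
    phi = features(x, a, n_actions)
    # Greedy value of the next state under the current approximation.
    q_next = max(features(x_next, b, n_actions) @ theta for b in range(n_actions))
    td_error = r + gamma * q_next - phi @ theta
    return theta + alpha * td_error * phi

def step(x, a, rng):
    """Toy dynamics on [0, 1] (assumed for illustration): action 1 drifts right, action 0 drifts left."""
    drift = 0.1 if a == 1 else -0.1
    x_next = float(np.clip(x + drift + 0.02 * rng.standard_normal(), 0.0, 1.0))
    reward = 1.0 if x_next > 0.9 else 0.0
    return x_next, reward

def run(n_steps=5000, seed=0):
    """Learn theta from a single trajectory generated by a fixed uniform behaviour policy."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(10)  # 2 actions x 5 Gaussian bumps
    x = 0.5
    for _ in range(n_steps):
        a = int(rng.integers(2))  # the behaviour policy ignores theta, so learning is off-policy
        x_next, r = step(x, a, rng)
        theta = q_update(theta, x, a, r, x_next, alpha=0.05)
        x = x_next
    return theta
```

Calling `run()` returns the learned parameter vector, and `features(x, a) @ theta` then gives the approximate Q-value at any state in [0, 1]. Replacing the `max` over next actions with the value of the action actually taken next would give a SARSA-style on-policy update instead.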

Slides
0:00 An analysis of RL with function approximation
0:45 Our problem:
1:55 RL with function approximation
3:03 TD(λ) with FA
4:01 TD(λ) with FA (cont.) (1)
4:08 TD(λ) with FA
4:21 TD(λ) with FA (cont.) (1)
5:14 TD(λ) with FA (cont.) (2)
6:17 What about control?
7:16 Q‐learning
7:36 Convergence of Q‐learning
8:08 Q‐learning
8:11 Convergence of Q‐learning
8:34 Sketch of the proof
8:57 What does this mean?
11:53 On‐policy vs. off‐policy
12:40 Convergence of SARSA
13:12 Sketch of the proof
13:52 Discussion

