An Analysis of Reinforcement Learning with Function Approximation
author:
Francisco Melo,
Carnegie Mellon University
Description
We address the problem of computing the optimal Q-function in Markov decision problems with infinite state space. We analyze the convergence properties of several variations of Q-learning when combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis and Van Roy, 1996) to stochastic control settings. We identify conditions under which such approximate methods converge with probability 1. We conclude with a brief discussion of the general applicability of our results and compare them with several related works.
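As a rough illustration of what "Q-learning combined with function approximation" can look like, the sketch below assumes a linear approximator Q(x, a) ≈ φ(x, a)ᵀθ and applies one stochastic update of the parameter vector θ. The function name `q_learning_step`, the feature vectors, and the step size are illustrative assumptions and are not taken from the lecture. The sketch shows only the update rule, not the conditions under which it converges.

```python
import numpy as np

def q_learning_step(theta, phi_sa, reward, phi_next_all, gamma, alpha):
    """One Q-learning update with a linear approximator (illustrative sketch).

    theta        : current parameter vector, shape (d,)
    phi_sa       : features of the visited state-action pair, shape (d,)
    reward       : observed one-step reward
    phi_next_all : features of every action in the next state, shape (n_actions, d)
    gamma        : discount factor in [0, 1)
    alpha        : step size
    """
    q_sa = phi_sa @ theta                     # current estimate Q(x, a)
    q_next = np.max(phi_next_all @ theta)     # greedy estimate max_b Q(x', b)
    td_error = reward + gamma * q_next - q_sa
    # Move theta along the features of the visited pair, scaled by the TD error.
    return theta + alpha * td_error * phi_sa

# Hypothetical two-feature example: two successive updates on the same transition.
theta = np.zeros(2)
phi_sa = np.array([1.0, 0.0])
phi_next_all = np.array([[0.0, 1.0], [1.0, 0.0]])
for _ in range(2):
    theta = q_learning_step(theta, phi_sa, 1.0, phi_next_all, gamma=0.5, alpha=0.1)
```

The maximization over next-state actions is what makes this update off-policy. An on-policy variant such as SARSA would instead use the features of the action actually taken in the next state.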
| Time | Slide |
| 0:00 | An analysis of RL with function approximation |
| 0:45 | Our problem: |
| 1:55 | RL with function approximation |
| 3:03 | TD(λ) with FA |
| 4:01 | TD(λ) with FA (cont.) (1) |
| 4:08 | TD(λ) with FA |
| 4:21 | TD(λ) with FA (cont.) (1) |
| 5:14 | TD(λ) with FA (cont.) (2) |
| 6:17 | What about control? |
| 7:16 | Q‐learning |
| 7:36 | Convergence of Q‐learning |
| 8:08 | Q‐learning |
| 8:11 | Convergence of Q‐learning |
| 8:34 | Sketch of the proof |
| 8:57 | What does this mean? |
| 11:53 | On‐policy vs. off‐policy |
| 12:40 | Convergence of SARSA |
| 13:12 | Sketch of the proof |
| 13:52 | Discussion |