A Semi-parametric Statistical Approach to Model-free Policy Evaluation
Description
Reinforcement learning (RL) methods based on least-squares temporal difference (LSTD) learning have been developed recently and have shown good practical performance. However, the statistical quality of their estimates has not been well characterized. In this article, we discuss LSTD-based policy evaluation from the viewpoint of semiparametric statistical inference. We show that the LSTD estimator can be obtained from a particular estimating function, which guarantees that the estimator converges asymptotically to the true value without requiring a model of the environment. Based on this observation, we 1) analyze the asymptotic variance of LSTD-based estimators, 2) derive the optimal estimating function, which attains the minimum asymptotic estimation variance, and 3) derive a suboptimal estimator that reduces the computational burden of obtaining the optimal estimating function.
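To make the estimating-function view concrete, here is a minimal Python sketch of plain LSTD, not the optimal or gLSTD estimators from the talk. It solves the sample estimating equation sum_t phi(s_t) * (r_t + gamma * phi(s_{t+1})^T theta - phi(s_t)^T theta) = 0 for theta. The five-state clamped random walk, the reward of 1 in the last state, the discount factor 0.9, the one-hot features, and all function names are assumptions chosen for illustration; they are not taken from the lecture's simulation.

```python
import random

N_STATES, GAMMA = 5, 0.9  # hypothetical chain size and discount factor


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def clamp(s):
    return min(max(s, 0), N_STATES - 1)


def step(s, rng):
    """Assumed dynamics: move left or right with probability 1/2, clamped at the ends."""
    return clamp(s + rng.choice((-1, 1)))


def reward(s):
    """Assumed reward: 1 in the right-most state, 0 elsewhere."""
    return 1.0 if s == N_STATES - 1 else 0.0


def phi(s):
    """One-hot features (an assumption, so the exact values are representable)."""
    return [1.0 if i == s else 0.0 for i in range(N_STATES)]


def lstd(n_samples, seed=0):
    """Accumulate A = sum phi_t (phi_t - gamma phi_{t+1})^T and b = sum phi_t r_t,
    then solve A theta = b, i.e. set the estimating function to zero."""
    rng = random.Random(seed)
    d = N_STATES
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    s = 0
    for _ in range(n_samples):
        s_next = step(s, rng)
        f, f_next, r = phi(s), phi(s_next), reward(s)
        for i in range(d):
            b[i] += f[i] * r
            for j in range(d):
                A[i][j] += f[i] * (f[j] - GAMMA * f_next[j])
        s = s_next
    return solve(A, b), A, b


def exact_values():
    """Model-based reference: solve (I - gamma P) V = r using the known dynamics."""
    d = N_STATES
    P = [[0.0] * d for _ in range(d)]
    for s in range(d):
        P[s][clamp(s - 1)] += 0.5
        P[s][clamp(s + 1)] += 0.5
    I_minus_gP = [[(1.0 if i == j else 0.0) - GAMMA * P[i][j] for j in range(d)]
                  for i in range(d)]
    return solve(I_minus_gP, [reward(s) for s in range(d)])


if __name__ == "__main__":
    theta, _, _ = lstd(20000)
    print("LSTD estimate:", [round(v, 3) for v in theta])
    print("Exact values: ", [round(v, 3) for v in exact_values()])
```

Note that `lstd` never uses the transition probabilities; only `exact_values` does, as a reference for comparison. With one-hot features the two should agree up to sampling noise, which is the model-free consistency property the abstract refers to.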
| Time | Slide |
| 0:00 | A Semiparametric Statistics Approach to Model-Free Policy Evaluation |
| 0:06 | Summary of This Talk |
| 1:19 | Model-Free Reinforcement Learning |
| 2:24 | Policy Iteration [Sutton & Barto, 1998] |
| 3:16 | Policy Evaluation Method: LSTD [Bradtke & Barto, 1996] |
| 4:11 | Least Square Temporal Difference (LSTD) (1) |
| 5:17 | Least Square Temporal Difference (LSTD) (2) |
| 6:11 | Linear Regression with Error in Variables |
| 7:18 | Instrumental Variable Method [Soderstrom and Stoica, 2002] |
| 8:24 | Least Square Temporal Difference (LSTD) |
| 9:53 | Our Approach |
| 10:32 | Semiparametric Statistics Approach |
| 11:45 | Inference of Semiparametric Model |
| 12:31 | Estimating Functions |
| 13:20 | Are There Any Other Estimating Functions? |
| 13:50 | Asymptotic Variance of LSTD-Based Estimators |
| 14:45 | The Optimal Estimating Function |
| 15:46 | gLSTD |
| 16:21 | Summary of gLSTD |
| 17:06 | Simulation (Markov Random Walk) |
| 17:58 | Simulation Result. |
| 18:39 | Conclusion |
| 19:11 | Future Work |
| 19:37 | End |
| 24:16 | Questions |