Reinforcement Learning

A Semi-parametric Statistical Approach to Model-free Policy Evaluation

author: Tsuyoshi Ueno, Graduate School of Informatics, Kyoto University

Description

Reinforcement learning (RL) methods based on least-squares temporal difference (LSTD) learning have been developed recently and have shown good practical performance. However, the statistical quality of their estimates has not been well understood. In this article, we discuss LSTD-based policy evaluation from the new viewpoint of semiparametric statistical inference. Specifically, the LSTD estimator can be obtained from a particular estimating function that guarantees asymptotic convergence to the true value without specifying a model of the environment. Based on these observations, we 1) analyze the asymptotic variance of an LSTD-based estimator, 2) derive the optimal estimating function, which attains the minimum asymptotic estimation variance, and 3) derive a suboptimal estimator that reduces the computational burden of obtaining the optimal estimating function.
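The estimating-function view of LSTD can be illustrated with the classic batch LSTD estimator for a linear value function V(s) ≈ φ(s)ᵀθ. The sketch below, using only the Python standard library, solves Σ_t φ(s_t)(r_t + γφ(s_{t+1})ᵀθ − φ(s_t)ᵀθ) = 0 for θ, where φ(s_t) plays the role of the instrumental variable. The random-walk chain, its reward, the discount γ = 0.9, the feature maps and all function names are hypothetical choices made for illustration; this is not the author's simulation setup, and it does not implement the optimal estimating function or gLSTD from the talk.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Transition = Tuple[int, float, int]  # (state s_t, reward r_t, next state s_{t+1})
Features = Callable[[int], Vector]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lstd(data: Sequence[Transition], phi: Features, dim: int, gamma: float) -> Vector:
    """Batch LSTD: solve A theta = b, A = sum phi(s)(phi(s) - gamma phi(s'))^T, b = sum phi(s) r."""
    a = [[0.0] * dim for _ in range(dim)]
    b = [0.0] * dim
    for s, r, s_next in data:
        f, f_next = phi(s), phi(s_next)
        for i in range(dim):
            b[i] += f[i] * r
            for j in range(dim):
                a[i][j] += f[i] * (f[j] - gamma * f_next[j])
    return solve(a, b)


def estimating_function(theta: Vector, data: Sequence[Transition], phi: Features, gamma: float) -> Vector:
    """Sample mean of phi(s_t) * TD error; the LSTD solution makes this (numerically) zero."""
    g = [0.0] * len(theta)
    for s, r, s_next in data:
        f, f_next = phi(s), phi(s_next)
        v = sum(x * t for x, t in zip(f, theta))
        v_next = sum(x * t for x, t in zip(f_next, theta))
        td = r + gamma * v_next - v
        for i, fi in enumerate(f):
            g[i] += fi * td
    return [gi / len(data) for gi in g]


def random_walk(n: int, steps: int, seed: int = 0) -> List[Transition]:
    """Hypothetical toy chain: move left/right w.p. 0.5, clipped at the ends; reward 1 in state n-1."""
    rng = random.Random(seed)
    s, data = n // 2, []
    for _ in range(steps):
        s_next = min(max(s + rng.choice((-1, 1)), 0), n - 1)
        data.append((s, 1.0 if s == n - 1 else 0.0, s_next))
        s = s_next
    return data


def true_values(n: int, gamma: float) -> Vector:
    """Exact V = (I - gamma P)^{-1} r for the same toy chain, used only for comparison."""
    p = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for d in (-1, 1):
            p[s][min(max(s + d, 0), n - 1)] += 0.5
    a = [[(1.0 if i == j else 0.0) - gamma * p[i][j] for j in range(n)] for i in range(n)]
    return solve(a, [1.0 if s == n - 1 else 0.0 for s in range(n)])


if __name__ == "__main__":
    n, gamma = 5, 0.9

    def one_hot(s: int) -> Vector:
        return [1.0 if i == s else 0.0 for i in range(n)]

    data = random_walk(n, 20_000)
    theta = lstd(data, one_hot, n, gamma)
    print("LSTD :", [round(v, 2) for v in theta])
    print("exact:", [round(v, 2) for v in true_values(n, gamma)])
```

With one-hot features, this LSTD estimate coincides with the certainty-equivalent estimate (I − γP̂)⁻¹r built from empirical transition frequencies, so it should approach the exact values as the trajectory grows. With fewer features than states, LSTD instead converges to the fixed point of the projected Bellman equation, and the variance of that estimate is what the estimating-function analysis addresses.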

Slides
0:00 A Semiparametric Statistics Approach to Model-Free Policy Evaluation
0:06 Summary of This Talk
1:19 Model-Free Reinforcement Learning
2:24 Policy Iteration [Sutton & Barto, 1998]
3:16 Policy Evaluation Method: LSTD [Bradtke & Barto, 1996]
4:11 Least-Squares Temporal Difference (LSTD) (1)
5:17 Least-Squares Temporal Difference (LSTD) (2)
6:11 Linear Regression with Error in Variables
7:18 Instrumental Variable Method [Söderström and Stoica, 2002]
8:24 Least-Squares Temporal Difference (LSTD)
9:53 Our Approach
10:32 Semiparametric Statistics Approach
11:45 Inference of Semiparametric Model
12:31 Estimating Functions
13:20 Are There Any Other Estimating Functions?
13:50 Asymptotic Variance of LSTD-Based Estimators
14:45 The Optimal Estimating Function
15:46 gLSTD
16:21 Summary of gLSTD
17:06 Simulation (Markov Random Walk)
17:58 Simulation Results
18:39 Conclusion
19:11 Future Work
19:37 End
24:16 Questions
