Optimal Online Learning Procedures for Model-Free Policy Evaluation

Published on 2009-10-20 · 2417 Views

Tsuyoshi Ueno

In this study, we extend the framework of semiparametric statistical inference, recently introduced to reinforcement learning (Ueno et al., 2008), to online learning procedures for policy evaluation. T
