Dyna(k): A Multi-Step Dyna Planning

author: Hengshuai Yao, School of Creative Media, City University of Hong Kong
published: Aug. 26, 2009, recorded: June 2009

Description

Dyna planning is an efficient way of learning from real and imaginary experience. Existing tabular and linear Dyna algorithms are single-step, because an "imaginary" feature is predicted only one step into the future. In this paper, we introduce multi-step Dyna planning, which predicts more steps into the future. Multi-step Dyna is able to figure out a sequence of multi-step results when a real instance happens, given that the instance itself, or a similar experience, has been imagined (i.e., simulated from the model) and planned. Our multi-step Dyna is based on a multi-step model, which we call the λ-model. The λ-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online. The multi-step Dyna algorithm, Dyna(k), uses the λ-model to generate predictions k steps ahead of the imagined feature, and applies TD to this imaginary multi-step transition.
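The idea of applying TD to an imagined k-step transition can be illustrated with a small sketch. It is not the algorithm from the paper: instead of the λ-model, it assumes a plain one-step linear model (a matrix F that predicts the next feature vector and a vector b that predicts the reward) and simply iterates that model k times. The function names, the three-state chain, and all parameter values are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in M]


def k_step_projection(F, b, phi, k, gamma):
    """Roll an assumed one-step linear model (F, b) forward k steps.

    Returns the discounted reward accumulated over the k imagined steps
    and the predicted feature vector after those k steps.
    """
    total_reward = 0.0
    discount = 1.0
    for _ in range(k):
        total_reward += discount * dot(b, phi)  # imagined reward at this step
        phi = mat_vec(F, phi)                   # imagined next feature
        discount *= gamma
    return total_reward, phi


def dyna_k_planning_step(theta, F, b, phi, k, gamma, alpha):
    """One TD update of the weights theta on an imagined k-step transition."""
    reward_k, phi_k = k_step_projection(F, b, phi, k, gamma)
    td_error = reward_k + gamma ** k * dot(theta, phi_k) - dot(theta, phi)
    return [t + alpha * td_error * p for t, p in zip(theta, phi)]


# Hypothetical chain s0 -> s1 -> s2 -> terminal with one-hot features;
# a reward of 1 is received on leaving s2.
F = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
b = [0.0, 0.0, 1.0]
phi_s0 = [1.0, 0.0, 0.0]
theta0 = [0.0, 0.0, 0.0]

one_step = dyna_k_planning_step(theta0, F, b, phi_s0, k=1, gamma=0.9, alpha=1.0)
three_step = dyna_k_planning_step(theta0, F, b, phi_s0, k=3, gamma=0.9, alpha=1.0)
print(one_step[0], three_step[0])
```

In this toy chain, a single planning update with k=1 leaves the value estimate of s0 at zero, because the reward lies three steps away and the weights start at zero. With k=3, the same single update already moves the estimate to the discounted reward, which is the kind of benefit a multi-step projection is meant to provide.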
