On Convergence of Emphatic Temporal-Difference Learning

Published on Aug 20, 2015
2155 Views

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood

Chapter list

On Convergence of Emphatic Temporal-Difference Learning (00:00)
Background: Off-Policy TD Learning (00:00)
Emphatic TD Algorithms (01:56)
Our Results: Stability and Convergence - 1 (04:00)
Our Results: Stability and Convergence - 2 (05:32)
References (06:35)