On Convergence of Emphatic Temporal-Difference Learning
Published on Aug 20, 2015 (2150 views)
We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood
Chapter list
On Convergence of Emphatic Temporal-Difference Learning (00:00)
Background: Off-Policy TD Learning (00:00)
Emphatic TD Algorithms (01:56)
Our Results: Stability and Convergence - 1 (04:00)
Our Results: Stability and Convergence - 2 (05:32)
References (06:35)