
On Convergence of Emphatic Temporal-Difference Learning
Published on Feb 4, 2025 | 2162 Views
We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood
Presentation
00:00 On Convergence of Emphatic Temporal-Difference Learning
00:00 Background: Off-Policy TD Learning
01:56 Emphatic TD Algorithms
04:00 Our Results: Stability and Convergence - 1
05:32 Our Results: Stability and Convergence - 2
06:35 References