On Convergence of Emphatic Temporal-Difference Learning

Published on Feb 4, 2025
2162 Views

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood

Presentation

00:00 On Convergence of Emphatic Temporal-Difference Learning
00:00 Background: Off-Policy TD Learning
01:56 Emphatic TD Algorithms
04:00 Our Results: Stability and Convergence - 1
05:32 Our Results: Stability and Convergence - 2
06:35 References