On Convergence of Emphatic Temporal-Difference Learning

Published on 2015-08-20 · 2172 Views

Huizhen Yu

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood

COLT 2015 - Paris

Presentation

On Convergence of Emphatic Temporal-Difference Learning (00:00)

Background: Off-Policy TD Learning (00:00)

Emphatic TD Algorithms (01:56)

Our Results: Stability and Convergence - 1 (04:00)

Our Results: Stability and Convergence - 2 (05:32)

References (06:35)