Proximal Reinforcement Learning: Learning to Act in Primal Dual Spaces

author: Sridhar Mahadevan, College of Information and Computer Sciences, University of Massachusetts Amherst
published: July 28, 2015, recorded: June 2015


In this talk, we set forth a new framework for reinforcement learning that we have developed over the past few years, one that yields mathematically rigorous solutions to fundamental questions that have remained unresolved for the past three decades: (i) how to design “safe” reinforcement learning algorithms that remain in a stable region of the parameter space; (ii) how to design true stochastic gradient temporal-difference learning algorithms and give finite-sample bounds characterizing their convergence; and (iii) more broadly, how to specify a flexible algorithmic framework that simplifies the design of reinforcement learning algorithms for various objective functions. The most important idea, which emerges as a motif throughout the solution of these three problems, is the use of primal and dual spaces connected through “mirror maps”: Legendre transforms that elegantly unify and generalize a myriad of past algorithms for solving reinforcement learning problems, from natural gradient actor-critic methods and exponentiated-gradient methods to gradient TD and sparse RL methods. We introduce mirror-descent RL, a powerful family of RL methods that uses mirror maps defined by different Legendre transforms to achieve reliability, scalability, and sparsity. Our work builds extensively on the past 50 years of advances in stochastic optimization, from the study of proximal mappings, monotone operators, and operator splitting that began in the mid-1950s to recent advances in first-order optimization and saddle-point extragradient methods for solving variational inequalities.
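To make the primal-dual idea concrete, here is a minimal sketch of a single mirror-descent step, assuming the negative-entropy mirror map on the probability simplex. With that choice the update reduces to the exponentiated-gradient rule mentioned above. This is a generic optimization illustration on a fixed linear cost, not the RL algorithms presented in the talk; the function names `mirror_descent_step` and `minimize_linear`, the step size, and the cost vector are all hypothetical.

```python
import math


def mirror_descent_step(w, grad, eta):
    """One mirror-descent step on the probability simplex.

    Assumes the negative-entropy mirror map psi(w) = sum_i w_i * log(w_i),
    whose gradient is log(w) + 1 and whose conjugate gradient is exp(theta - 1).
    """
    # Primal -> dual: apply the gradient of the mirror map.
    theta = [math.log(wi) + 1.0 for wi in w]
    # Ordinary gradient step, taken in the dual space.
    theta = [ti - eta * gi for ti, gi in zip(theta, grad)]
    # Dual -> primal: apply the gradient of the conjugate (Legendre transform).
    unnormalized = [math.exp(ti - 1.0) for ti in theta]
    # Renormalize so the weights sum to one (projection back onto the simplex).
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def minimize_linear(costs, eta=0.5, steps=200):
    """Minimize the linear objective <costs, w> over the simplex (illustrative only)."""
    n = len(costs)
    w = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(steps):
        # The gradient of a linear objective is the cost vector itself.
        w = mirror_descent_step(w, costs, eta)
    return w


# Hypothetical cost vector: the second coordinate is the cheapest.
weights = minimize_linear([0.9, 0.1, 0.5])
```

Under these assumptions the weights concentrate on the lowest-cost coordinate while remaining a valid probability vector. Swapping in a different mirror map (for example, a p-norm) changes the two mapping lines and leaves the dual-space gradient step unchanged.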

Slides: rldm2015_mahadevan_dual_spaces_01.pdf (15.1 MB)
