Proximal Reinforcement Learning: Learning to Act in Primal-Dual Spaces

Published on 2015-07-28

Sridhar Mahadevan

In this talk, we set forth a new framework for reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding fundamental questions t

RLDM 2015 - Edmonton

Presentation

Proximal Reinforcement Learning: Learning to Act in Primal-Dual Spaces [00:00]

Thanks to my collaborators [00:33]

Proximal RL Framework [01:40]

Three Level Analysis [04:02]

The Key Idea of Proximal RL [06:30]

Recent publications [08:23]

Developing a True Stochastic Gradient TD Algorithm: The End of a 30 year Quest? [08:54]

Stability of RL Algorithms [09:20]

Instability of TD-Learning [11:36]

Take the Blue Pill or the Red? [12:46]

Operator Splitting [14:52]

Big-O Complexity [15:27]

Safe Reinforcement Learning with Projected Natural Actor Critic [16:00]

Natural Actor Critic [16:11]

Conjugate Functions [17:37]

Mirror Maps [18:34]

Mirror Descent = Natural Gradient! [19:08]

Proximal Mapping generalizes projections [19:54]

Gradient Descent as proximal mapping [21:23]

Gradient Descent as proximal operator [22:53]

Primal Approach to Gradient TD [23:56]

Details of the Framework - 2 [23:56]

Details of the Framework - 1 [23:58]

Linear System Reformulation of Gradient TD [26:38]

Unified Objective for Gradient TD [27:15]

Saddle point formulation [27:37]

Lemma [28:49]

"Gradient" TD Methods [29:17]

Analysis of gradient TD [29:34]

Variational Inequality [30:48]

Extragradient Method [30:53]

Extragradient TD-Learning [31:15]

What is the "optimal" gradient TD method? [31:28]

Mirror-Prox [32:03]

Proximal Gradient TD Algorithms [32:22]

Baird MDP [33:04]

50 state chain domain [33:17]

"Safe" Reinforcement Learning [33:55]

Equivalence of Natural Gradient Descent and Mirror Descent [33:58]

Safe Robot Learning with PNAC [34:47]

Proximal Reinforcement Learning [35:05]

Ongoing Work [35:14]