TD Learning

Published on 2017-07-27, 20861 Views

Richard S. Sutton

DLSS & RLSS 2017 - Montreal

Presentation

Temporal-Difference Learning 00:00

We are entering an era of vastly increased computation 01:08

Methods that scale with computation are the future of AI 04:55

Prediction learning is scalable 08:09

Prediction learning is scalable - 1 09:58

Temporal-difference learning is a method for learning to predict 11:00

TD learning is learning a prediction from another, later, learned prediction 13:36

Example: TD-Gammon 15:09

But do I need TD learning? or can I use ordinary supervised learning? 16:10

RL + Deep Learning Performance on Atari Games 16:46

TD learning is relevant only on multi-step prediction problems 16:52

Examples of multi-step prediction 17:43

Do we need to think about multi-step predictions? 19:08

The one-step trap: Thinking that one-step predictions are sufficient 19:53

Can’t we just use our familiar one-step supervised learning methods? 21:56

New RL notation 25:17

Monte Carlo (Supervised Learning) (MC) 32:13

Simplest TD Method 32:31

cf. Dynamic Programming 33:02

TD methods bootstrap and sample 34:15

TD Prediction 35:32

Example: Driving Home 36:03

Driving Home 38:00

Advantages of TD Learning 40:54

Random Walk Example 41:44

TD and MC on the Random Walk 48:16

Batch Updating in TD and MC methods 53:06

Random Walk under Batch Updating 54:30

You are the Predictor 56:03

You are the Predictor - 1 59:36

You are the Predictor - 2 01:01:30

Summary so far 01:07:29

Unified View 01:08:50

Learning An Action-Value Function 01:11:43

Sarsa: On-Policy TD Control 01:12:27

Q-Learning: Off-Policy TD Control 01:12:31

Cliffwalking 01:13:28

Expected Sarsa 01:13:59

Performance on the cliff-walking task 01:15:17

Off-policy Expected Sarsa 01:15:22

Summary 01:16:23

4 examples of the effect of bootstrapping 01:16:45

With linear function approximation, TD converges to the TD fixed point, a biased but interesting answer 01:19:30

Frontiers of TD learning 01:24:02

TD learning is a uniquely important kind of learning, maybe ubiquitous 01:25:25