TD Learning

Published on Jul 27, 2017 · 20337 Views

Chapter list

Temporal-Difference Learning 00:00
We are entering an era of vastly increased computation 01:08
Methods that scale with computation are the future of AI 04:55
Prediction learning is scalable 08:09
Prediction learning is scalable - 1 09:58
Temporal-difference learning is a method for learning to predict 11:00
TD learning is learning a prediction from another, later, learned prediction 13:36
Example: TD-Gammon 15:09
But do I need TD learning? or can I use ordinary supervised learning? 16:10
RL + Deep Learning Performance on Atari Games 16:45
RL + Deep Learning Performance on Atari Games 16:46
TD learning is relevant only on multi-step prediction problems 16:52
Examples of multi-step prediction 17:43
Do we need to think about multi-step predictions? 19:08
The one-step trap: Thinking that one-step predictions are sufficient 19:53
Can’t we just use our familiar one-step supervised learning methods? 21:56
New RL notation 25:17
Monte Carlo (Supervised Learning) (MC) 32:13
Simplest TD Method 32:31
cf. Dynamic Programming 33:02
TD methods bootstrap and sample 34:15
TD Prediction 35:32
Example: Driving Home 36:03
Driving Home 38:00
Advantages of TD Learning 40:54
Random Walk Example 41:44
TD and MC on the Random Walk 48:16
Batch Updating in TD and MC methods 53:06
Random Walk under Batch Updating 54:30
You are the Predictor 56:03
You are the Predictor - 1 59:36
You are the Predictor - 2 01:01:30
Summary so far 01:07:29
Unified View 01:08:50
Learning An Action-Value Function 01:11:43
Sarsa: On-Policy TD Control 01:12:27
Q-Learning: Off-Policy TD Control 01:12:31
Cliffwalking 01:13:28
Expected Sarsa 01:13:59
Performance on the cliff-walking task 01:15:17
Off-policy Expected Sarsa 01:15:22
Summary 01:16:23
4 examples of the effect of bootstrapping 01:16:45
With linear function approximation, TD converges to the TD fixedpoint, a biased but interesting answer 01:19:30
Frontiers of TD learning 01:24:02
TD learning is a uniquely important kind of learning, maybe ubiquitous 01:25:25