Reinforcement Learning

Published on Jul 27, 2017
17546 Views

Chapter list

00:00 Reinforcement Learning: Basic concepts
03:45 Reinforcement learning
05:04 RL system circa 1990’s: TD-Gammon
05:31 2016: World Go Champion Beaten by Deep Learning
06:13 RL applications at RLDM 2017
07:06 When to use RL?
11:06 RL vs supervised learning
12:38 RL vs supervised learning - 1
15:31 Markov Decision Process (MDP)
16:53 The Markov property
18:14 The Markov property
20:06 The goal of RL? Maximize return!
22:24 The discount factor
24:40 Defining behavior: The policy
26:36 Example: Career Options
29:59 Value functions
31:43 Getting confused with terminology?
34:18 The value of a policy
35:48 The value of a policy - 1
36:48 The value of a policy - 2
39:43 The value of a policy - 3
41:07 The value of a policy - 4
43:07 Iterative Policy Evaluation: Fixed policy
44:35 Convergence of Iterative Policy Evaluation
45:57 Optimal policies and optimal value functions
47:11 Optimal policies and optimal value functions
49:34 Finding a good policy: Policy Iteration
52:57 Finding a good policy: Value iteration
54:08 Three related algorithms
58:58 A 4x3 gridworld example
59:51 Value Iteration (1)
01:00:00 Value Iteration (2)
01:00:55 Value Iteration (5)
01:00:58 Value Iteration (20)
01:02:18 Another example: Four Rooms
01:03:00 Asynchronous value iteration
01:03:57 Generalized Policy Iteration
01:04:37 Key challenges in RL
01:08:47 Learning online from trial & error
01:10:10 Online reinforcement learning
01:12:11 Temporal-Difference (TD) learning
01:15:47 TD-Gammon (Tesauro, 1992)
01:16:11 Several challenges in RL
01:16:16 Tabular / Function approximation
01:16:32 In large state spaces: Need approximation
01:17:35 Learning representations for RL
01:18:28 Deep Reinforcement Learning
01:19:38 Deep RL in Minecraft
01:20:26 The RL lingo
01:21:18 On-policy / Off-policy
01:24:28 Exploration / Exploitation
01:24:44 Exploration / Exploitation - 1
01:25:56 Model-based vs Model-free RL
01:27:28 Policy Optimization / Value Function
01:28:00 Quick summary
01:28:39 RL resources