Introduction to Reinforcement Learning
Published on Aug 23, 2016 · 48,696 views
Chapter list
Reinforcement Learning: From basic concepts to deep Q-networks (00:00)
Reinforcement learning (00:55)
Many applications of RL (02:53)
RL system circa 1990's: TD-Gammon (03:27)
Human-level Atari agent (2015) (05:05)
DeepMind's AlphaGo (2016) (06:03)
Adaptive neurostimulation for epilepsy suppression (06:35)
When to use RL? (07:42)
RL vs supervised learning (09:00)
Markov Decision Process (MDP) (12:44)
The Markov property (13:23)
Maximizing utility (14:13)
The discount factor, γ (16:09)
The policy (17:02)
Example: Career Options (18:03)
Value functions (19:44)
The value of a policy - 1 (20:32)
The value of a policy - 2 (21:44)
The value of a policy - 3 (22:00)
The value of a policy - 4 (22:46)
The value of a policy - 5 (23:43)
Iterative Policy Evaluation (24:23)
Convergence of Iterative Policy Evaluation (25:36)
Optimal policies and optimal value functions - 1 (26:28)
Optimal policies and optimal value functions - 2 (27:48)
Finding a good policy: Policy Iteration (29:37)
Questions? - 1 (31:47)
Finding a good policy: Value iteration (32:09) (see the code sketch after this list)
A 4x3 gridworld example (32:25)
Value Iteration (1) (33:03)
Value Iteration (2) (33:25)
Value Iteration (5) (33:45)
Value Iteration (20) (34:04)
Another example: Four Rooms (35:11)
Asynchronous value iteration (35:20)
Want to know more? (36:19)
Key challenges in RL (37:00)
The RL lingo (39:28)
Episodic / Continuing (40:18)
Tabular / Function approximation (42:19)
Batch / Online (43:24)
Online learning (44:49)
Temporal-Difference with function approx. (47:58)
TD-Gammon (Tesauro, 1992) (49:06)
Online learning with eligibility: TD(λ) (49:55)
The RL lingo (50:29)
On-policy/Off-policy (50:40)
Exploration/Exploitation - 1 (52:04)
Exploration/Exploitation - 2 (52:28)
Model-based vs Model-free RL (53:17)
Policy Optimization/Value Function (54:11)
The RL lingo – done! (55:30)
In large state spaces: Need approximation (56:50)
Untitled (57:57)
Untitled (01:02:01)
Learning representations for RL (01:02:24)
Deep Q-network (DQN) (01:02:35)
Training score - 1 (01:02:58)
Untitled (01:02:58)
Training score - 2 (01:03:38)
DQN: Useful tips for stability - 1 (01:04:05)
DQN: Useful tips for stability - 2 (01:05:29)
Double DQN: Avoiding positive bias (01:07:40)
Dueling Q-networks - 1 (01:08:31)
Dueling Q-networks - 2 (01:10:02)
Untitled (01:10:28)
Deep Q-learning in the real world? (01:11:44)
Dialogue systems (01:12:50)
Neural interpretation machine (01:13:10)
Questions? - 2 (01:13:11)
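The Value Iteration chapters (32:09–34:04) work the algorithm on a 4x3 gridworld. As a rough companion, below is a minimal sketch of the same update, V(s) ← max_a Σ_s' P(s'|s,a)[R(s,a,s') + γ V(s')], on a tiny hypothetical three-state MDP; the transition table, rewards, discount factor, and stopping threshold are illustrative assumptions and not the lecture's example.

```python
# Minimal value iteration sketch on a tiny, hypothetical MDP (not the lecture's 4x3 gridworld).
# transitions[state][action] = list of (probability, next_state, reward) tuples (assumed layout).
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)]},  # absorbing state
}
gamma = 0.9   # discount factor (the lecture's γ)
theta = 1e-6  # stop when the largest value change in a sweep falls below this

V = {s: 0.0 for s in transitions}
while True:
    delta = 0.0
    for s, actions in transitions.items():
        # Bellman optimality backup: best expected return over actions
        best = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best  # in-place update
    if delta < theta:
        break

print(V)  # approximate optimal state values
```

The in-place, sweep-as-you-go update is closer to the asynchronous flavour mentioned at 35:20; a synchronous version would compute all new values from a frozen copy of V before overwriting it.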