Introduction to Reinforcement Learning
Published on Aug 23, 2016 · 48,696 views
Chapter list
Reinforcement Learning: From basic concepts to deep Q-networks (00:00)
Reinforcement learning (00:55)
Many applications of RL (02:53)
RL system circa 1990's: TD-Gammon (03:27)
Human-level Atari agent (2015) (05:05)
DeepMind's AlphaGo (2016) (06:03)
Adaptive neurostimulation for epilepsy suppression (06:35)
When to use RL? (07:42)
RL vs supervised learning (09:00)
Markov Decision Process (MDP) (12:44)
The Markov property (13:23)
Maximizing utility (14:13)
The discount factor, γ (16:09)
The policy (17:02)
Example: Career Options (18:03)
Value functions (19:44)
The value of a policy - 1 (20:32)
The value of a policy - 2 (21:44)
The value of a policy - 3 (22:00)
The value of a policy - 4 (22:46)
The value of a policy - 5 (23:43)
Iterative Policy Evaluation (24:23)
Convergence of Iterative Policy Evaluation (25:36)
Optimal policies and optimal value functions - 1 (26:28)
Optimal policies and optimal value functions - 2 (27:48)
Finding a good policy: Policy Iteration (29:37)
Questions? - 1 (31:47)
Finding a good policy: Value iteration (32:09) (see the code sketch after this list)
A 4x3 gridworld example (32:25)
Value Iteration (1) (33:03)
Value Iteration (2) (33:25)
Value Iteration (5) (33:45)
Value Iteration (20) (34:04)
Another example: Four Rooms (35:11)
Asynchronous value iteration (35:20)
Want to know more? (36:19)
Key challenges in RL (37:00)
The RL lingo (39:28)
Episodic / Continuing (40:18)
Tabular / Function approximation (42:19)
Batch / Online (43:24)
Online learning (44:49)
Temporal-Difference with function approx. (47:58)
TD-Gammon (Tesauro, 1992) (49:06)
Online learning with eligibility: TD(λ) (49:55)
The RL lingo (50:29)
On-policy/Off-policy (50:40)
Exploration/Exploitation - 1 (52:04)
Exploration/Exploitation - 2 (52:28)
Model-based vs Model-free RL (53:17)
Policy Optimization/Value Function (54:11)
The RL lingo – done! (55:30)
In large state spaces: Need approximation (56:50)
Untitled (57:57)
Untitled (01:02:01)
Learning representations for RL (01:02:24)
Deep Q-network (DQN) (01:02:35)
Training score - 1 (01:02:58)
Untitled (01:02:58)
Training score - 2 (01:03:38)
DQN: Useful tips for stability - 1 (01:04:05)
DQN: Useful tips for stability - 2 (01:05:29)
Double DQN: Avoiding positive bias (01:07:40)
Dueling Q-networks - 1 (01:08:31)
Dueling Q-networks - 2 (01:10:02)
Untitled (01:10:28)
Deep Q-learning in the real world? (01:11:44)
Dialogue systems (01:12:50)
Neural interpretation machine (01:13:10)
Questions? - 2 (01:13:11)
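The Value Iteration chapters (32:09–34:04) work the algorithm on a 4x3 gridworld. As a rough companion, below is a minimal sketch of the same update, V(s) ← max_a Σ_s' P(s'|s,a)[R(s,a,s') + γ V(s')], on a tiny hypothetical three-state MDP; the transition table, rewards, discount factor, and stopping threshold are illustrative assumptions and not the lecture's example.

```python
# Minimal value iteration sketch on a tiny, hypothetical MDP (not the lecture's 4x3 gridworld).
# transitions[state][action] = list of (probability, next_state, reward) tuples (assumed layout).
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)]},  # absorbing state
}
gamma = 0.9   # discount factor (the lecture's γ)
theta = 1e-6  # stop when the largest value change in a sweep falls below this

V = {s: 0.0 for s in transitions}
while True:
    delta = 0.0
    for s, actions in transitions.items():
        # Bellman optimality backup: best expected return over actions
        best = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best  # in-place update
    if delta < theta:
        break

print(V)  # approximate optimal state values
```

The in-place, sweep-as-you-go update is closer to the asynchronous flavour mentioned at 35:20; a synchronous version would compute all new values from a frozen copy of V before overwriting it.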