Reinforcement Learning

Published on 2017-07-27, 17614 views

Joelle Pineau

DLSS & RLSS 2017 - Montreal

Related categories

Deep Learning Reinforcement Learning Unsupervised Learning

Presentation

00:00 Reinforcement Learning: Basic concepts

03:45 Reinforcement learning

05:04 RL system circa 1990s: TD-Gammon

05:31 2016: World Go Champion Beaten by Deep Learning

06:13 RL applications at RLDM 2017

07:06 When to use RL?

11:06 RL vs supervised learning

12:38 RL vs supervised learning - 1

15:31 Markov Decision Process (MDP)

18:14 The Markov property

20:06 The goal of RL? Maximize return!

22:24 The discount factor

24:40 Defining behavior: The policy

26:36 Example: Career Options

29:59 Value functions

31:43 Getting confused with terminology?

34:18 The value of a policy

35:48 The value of a policy - 1

36:48 The value of a policy - 2

39:43 The value of a policy - 3

41:07 The value of a policy - 4

43:07 Iterative Policy Evaluation: Fixed policy

44:35 Convergence of Iterative Policy Evaluation

45:57 Optimal policies and optimal value functions

49:34 Finding a good policy: Policy Iteration

52:57 Finding a good policy: Value iteration

54:08 Three related algorithms

58:58 A 4x3 gridworld example

59:51 Value Iteration (1)

01:00:00 Value Iteration (2)

01:00:55 Value Iteration (5)

01:00:58 Value Iteration (20)

01:02:18 Another example: Four Rooms

01:03:00 Asynchronous value iteration

01:03:57 Generalized Policy Iteration

01:04:37 Key challenges in RL

01:08:47 Learning online from trial & error

01:10:10 Online reinforcement learning

01:12:11 Temporal-Difference (TD) learning

01:15:47 TD-Gammon (Tesauro, 1992)

01:16:11 Several challenges in RL

01:16:16 Tabular / Function approximation

01:16:32 In large state spaces: Need approximation

01:17:35 Learning representations for RL

01:18:28 Deep Reinforcement Learning

01:19:38 Deep RL in Minecraft

01:20:26 The RL lingo

01:21:18 On-policy / Off-policy

01:24:28 Exploration / Exploitation

01:24:44 Exploration / Exploitation - 1

01:25:56 Model-based vs Model-free RL

01:27:28 Policy Optimization / Value Function

01:28:00 Quick summary

01:28:39 RL resources
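The outline walks through value iteration on a 4x3 gridworld (around 58:58 to 01:00:58). The sketch below is not taken from the lecture. It assumes the classic textbook version of that gridworld (Russell & Norvig): a wall at (2,2), terminal rewards of +1 at (4,3) and -1 at (4,2), a step reward of -0.04, no discounting (gamma = 1), and noisy moves that go in the intended direction with probability 0.8 and sideways with probability 0.1 each, where bumping into a wall or edge leaves the agent in place. All names (`value_iteration`, `greedy_policy`, `expected_next_value`) are hypothetical helpers for illustration.

```python
# Hypothetical sketch: value iteration on an assumed 4x3 gridworld.
# Assumed (textbook-style, not from the lecture): wall at (2,2),
# terminals +1 at (4,3) and -1 at (4,2), step reward -0.04, gamma = 1,
# 80% intended move, 10% each perpendicular move, blocked moves stay put.

COLS, ROWS = 4, 3
WALLS = {(2, 2)}
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STEP_REWARD = -0.04
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
PERPENDICULAR = {"up": ("left", "right"), "down": ("left", "right"),
                 "left": ("up", "down"), "right": ("up", "down")}
STATES = [(x, y) for x in range(1, COLS + 1) for y in range(1, ROWS + 1)
          if (x, y) not in WALLS]


def move(state, action):
    """Deterministic outcome of one move; hitting a wall or edge stays put."""
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    if nxt in WALLS or not (1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS):
        return state
    return nxt


def expected_next_value(values, state, action):
    """Expected value of the next state under the noisy action model."""
    side_a, side_b = PERPENDICULAR[action]
    return (0.8 * values[move(state, action)]
            + 0.1 * values[move(state, side_a)]
            + 0.1 * values[move(state, side_b)])


def value_iteration(gamma=1.0, tol=1e-6, max_iters=10_000):
    """Repeat Bellman optimality backups until values stop changing."""
    values = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_values = {}
        for s in STATES:
            if s in TERMINALS:
                new_values[s] = TERMINALS[s]  # terminal value is its reward
            else:
                best = max(expected_next_value(values, s, a) for a in ACTIONS)
                new_values[s] = STEP_REWARD + gamma * best
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tol:
            break
    return values


def greedy_policy(values):
    """Pick, in each non-terminal state, the action with the best backup."""
    return {s: max(ACTIONS, key=lambda a: expected_next_value(values, s, a))
            for s in STATES if s not in TERMINALS}


if __name__ == "__main__":
    V = value_iteration()
    for y in range(ROWS, 0, -1):
        print(" ".join(f"{V[(x, y)]:6.3f}" if (x, y) in V else "  wall"
                       for x in range(1, COLS + 1)))
    print(greedy_policy(V))
```

Under these assumptions the converged values should match the standard textbook table (for example roughly 0.918 next to the +1 exit and roughly 0.705 in the start corner (1,1)), with the greedy policy leading away from the -1 square. Stopping the loop after 1, 2, 5, or 20 sweeps gives intermediate value estimates of the kind the "Value Iteration (n)" chapter titles suggest.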