
Deep RL

Published on 2018-10-11 · 1670 views

Marc G. Bellemare

DLRL Summer School 2018 - Toronto

Related categories

Deep Learning, Reinforcement Learning, Unsupervised Learning

Presentation

Deep Reinforcement Learning and the Atari 2600 · 00:00

DEEP, REINFORCEMENT LEARNING · 00:14

Deep RL · 00:41

Some challenges in deep RL - 1 · 01:55

Some challenges in deep RL - 2 · 02:52

Some challenges in deep RL - 3 · 04:26

Some challenges in deep RL - 4 · 05:15

Stella - 1 · 05:59

Stella - 2 · 07:12

Stella - 3 · 08:16

General Competency · 09:12

Narrow competency · 09:55

Diverse, Interesting, Independent · 11:01

Diverse - 1 · 11:21

Diverse - 2 · 11:45

Diverse - 3 · 13:08

Interesting (to people) · 13:21

Interesting (by and for people) · 14:04

Early attempts (2010 - 2013) - 1 · 14:56

Early attempts (2010 - 2013) - 2 · 16:10

Deep Q-Networks (DQN) · 17:38

DQN - 1 · 19:29

DQN - 2 · 20:10

DQN - 3 · 21:39

DQN - 4 · 22:41

LSTM · 23:31

Dueling networks (Wang et al., 2016)... · 26:30

Prioritized replay (Schaul et al., 2016)... · 27:03

Double Q-learning (van Hasselt et al., 2015)... · 28:54
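The DQN slides and the double Q-learning slide point to two different bootstrap targets. The following is a minimal plain-Python sketch. It assumes hypothetical Q-value lists that stand in for network outputs, and the function names are illustrative rather than taken from the talk. The only difference between the two targets is which network chooses the next action.

```python
from typing import Sequence


def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float, done: bool) -> float:
    """DQN: bootstrap from the target network's own maximum at the next state."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float], gamma: float, done: bool) -> float:
    """Double Q-learning: the online network selects the action and the
    target network evaluates it, which is intended to reduce overestimation."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


# Hypothetical Q-values at a single next state with three actions.
online = [1.0, 3.0, 2.0]   # online net prefers action 1
target = [2.5, 1.5, 4.0]   # target net's own max is action 2
print(dqn_target(1.0, target, 0.99, done=False))                 # bootstraps from target[2]
print(double_dqn_target(1.0, online, target, 0.99, done=False))  # bootstraps from target[1]
```

With these numbers, plain DQN bootstraps from the larger value 4.0. Double Q-learning uses 1.5, the target network's estimate for the action the online network prefers.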

1. Distributional reinforcement learning · 30:03

1. Distributional reinforcement learning - 1 · 33:50

Bellman equation - 1 · 34:03

Ground truth, Implied model · 35:00

Implied model - 1 · 35:59

Implied model - 2 · 37:11

Implied model - 3 · 37:30

Bellman equation - 2 · 37:45

Distributional Bellman equation · 38:01

Value distribution · 38:51

$150 · 39:05

$450 - 1 · 40:37

$450 - 2 · 41:30

$300 · 41:44

Discrete distribution · 41:47

ε-greedy w.r.t. expected value · 43:02

Approximation · 44:25

From x, a, sample a transition - 1 · 44:49

From x, a, sample a transition - 2 · 45:04

From x, a, sample a transition - 3 · 49:36

Mean | Median | > H.B. | > DQN · 50:20

Seaquest · 51:35

Time - 1 · 52:16

Time - 2 · 54:30

Distributional perspective · 55:02
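The distributional slides (discrete distribution, ε-greedy w.r.t. expected value, sampling a transition from x, a) describe a categorical update. The following is a minimal pure-Python sketch. It assumes a C51-style projection in the spirit of Bellemare et al. (2017), with a fixed, evenly spaced support and a cross-entropy loss. The function names and the 5-atom example are hypothetical.

```python
import math
from typing import List, Sequence


def make_support(v_min: float, v_max: float, num_atoms: int) -> List[float]:
    """Fixed, evenly spaced atoms z_0 < ... < z_{N-1} of the discrete return distribution."""
    delta = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta for i in range(num_atoms)]


def expected_value(support: Sequence[float], probs: Sequence[float]) -> float:
    """Mean of a discrete distribution: sum_i z_i * p_i."""
    return sum(z * p for z, p in zip(support, probs))


def greedy_action(support: Sequence[float], next_probs: Sequence[Sequence[float]]) -> int:
    """Act greedily with respect to the *expected* value of each action's distribution."""
    return max(range(len(next_probs)), key=lambda a: expected_value(support, next_probs[a]))


def project_target(reward: float, gamma: float, done: bool,
                   support: Sequence[float], probs: Sequence[float]) -> List[float]:
    """Apply the sample update r + gamma * z_j to every atom, then split each
    shifted atom's probability mass between its two nearest support points."""
    n = len(support)
    v_min, v_max = support[0], support[-1]
    delta = (v_max - v_min) / (n - 1)
    target = [0.0] * n
    for z_j, p_j in zip(support, probs):
        tz = reward if done else reward + gamma * z_j
        tz = min(max(tz, v_min), v_max)                  # clip to the support range
        b = min(max((tz - v_min) / delta, 0.0), n - 1)   # fractional atom index, guarded against rounding
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:                               # landed exactly on an atom
            target[lower] += p_j
        else:                                            # linear interpolation between neighbours
            target[lower] += p_j * (upper - b)
            target[upper] += p_j * (b - lower)
    return target


def cross_entropy(target: Sequence[float], predicted: Sequence[float], eps: float = 1e-8) -> float:
    """Loss minimised against the projected target (the target is treated as fixed)."""
    return -sum(m * math.log(p + eps) for m, p in zip(target, predicted))


# Hypothetical example: 5 atoms on [-2, 2], two actions at the next state x'.
support = make_support(-2.0, 2.0, 5)            # [-2.0, -1.0, 0.0, 1.0, 2.0]
next_probs = [[0.0, 0.0, 1.0, 0.0, 0.0],        # action 0: mean 0.0
              [0.0, 0.0, 0.0, 0.5, 0.5]]        # action 1: mean 1.5
a_star = greedy_action(support, next_probs)
m = project_target(reward=0.5, gamma=1.0, done=False, support=support, probs=next_probs[a_star])
print(a_star, m)  # 1 [0.0, 0.0, 0.0, 0.25, 0.75]
```

Acting remains greedy with respect to the mean of each action's distribution, so only the learning target changes. The projection step keeps the shifted atoms r + γz on the fixed support, which makes the cross-entropy between the two distributions well defined.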

2. Exploration with pseudo-counts · 55:41

September 2017 - 1 · 56:53

September 2017 - 2 · 57:02

September 2017 - 3 · 58:04

September 2017 - 4 · 58:48

Exploration · 59:34

Exploration - 2 · 01:00:59

Most observations · 01:01:36

Generative model · 01:01:59

Density model - 1 · 01:02:15

Density model - 2 · 01:02:34

Train · 01:03:09

The “CTS” model - 1 · 01:08:17

The “CTS” model - 2 · 01:09:12

Periods without salient events · 01:09:31

Exploration - 3 · 01:10:18

Exploration - 4 · 01:11:18

Start · 01:12:00

Average Score · 01:14:24

Credit assignment issues in exploration · 01:15:42

Effect of mixed Monte Carlo update · 01:16:00

Removing extrinsic rewards - 1 · 01:17:04

Removing extrinsic rewards - 2 · 01:17:29

Removing extrinsic rewards - 3 · 01:18:00
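The pseudo-count slides turn a density model into an exploration bonus. The following is a minimal sketch that assumes the pseudo-count formula of Bellemare et al. (2016), N̂(x) = ρ(x)(1 − ρ′(x)) / (ρ′(x) − ρ(x)), where ρ is the model's probability of x before training on it and ρ′ is the "recoding" probability after one more update on x. The `EmpiricalDensity` class is a hypothetical stand-in for the CTS model. The bonus form β / sqrt(N̂ + 0.01) and the value β = 0.05 are assumptions chosen for illustration. With a plain frequency model, the pseudo-count recovers the true visit count up to floating-point error, which makes it a useful sanity check.

```python
import math
from collections import Counter
from typing import Hashable, List, Tuple


class EmpiricalDensity:
    """Hypothetical stand-in for a learned density model such as CTS:
    rho(x) is simply the empirical frequency of x observed so far."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def prob(self, x: Hashable) -> float:
        """rho(x): probability of x before observing it again."""
        return self.counts[x] / self.total if self.total else 0.0

    def recoding_prob(self, x: Hashable) -> float:
        """rho'(x): probability of x after one (uncommitted) update on x."""
        return (self.counts[x] + 1) / (self.total + 1)

    def update(self, x: Hashable) -> None:
        self.counts[x] += 1
        self.total += 1


def pseudo_count(rho: float, rho_prime: float) -> float:
    """N_hat(x) = rho * (1 - rho') / (rho' - rho), defined when rho' > rho."""
    if rho_prime <= rho:
        return float("inf")  # model did not become more confident; treat x as well visited
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def exploration_bonus(n_hat: float, beta: float = 0.05) -> float:
    """Count-based bonus beta / sqrt(N_hat + 0.01); beta and 0.01 are assumed values."""
    return beta / math.sqrt(n_hat + 0.01)


model = EmpiricalDensity()
results: List[Tuple[str, int, float, float]] = []
for x in ["a", "b", "a", "a", "c", "a"]:
    n_hat = pseudo_count(model.prob(x), model.recoding_prob(x))
    results.append((x, model.counts[x], n_hat, exploration_bonus(n_hat)))
    model.update(x)

for obs, true_count, n_hat, bonus in results:
    print(f"{obs}: true count {true_count}, pseudo-count {n_hat:.3f}, bonus {bonus:.4f}")
```

With a learned model such as CTS, ρ and ρ′ come from the model itself, so states that look alike share pseudo-counts. The bonus is added to the environment reward. It shrinks as a state becomes familiar, which is the intended exploration signal.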