Policy Search

Published on Oct 11, 2018 · 2639 views

Chapter list

Policy Search - 1 (00:00)
Policy Search - 2 (00:09)
Overview - 1 (02:49)
Overview - 2 (03:37)
Terminology & notation - 1 (03:50)
Terminology & notation - 2 (09:31)
Imitation Learning (11:55)
Reward functions (12:37)
Definitions - 1 (13:43)
Definitions - 2 (15:17)
Definitions - 3 (16:08)
Definitions - 4 (16:27)
The goal of reinforcement learning - 1 (17:56)
The goal of reinforcement learning - 2 (20:46)
Overview - 3 (21:22)
Evaluating the objective (21:42)
Direct policy differentiation - 1 (23:16)
Direct policy differentiation - 2 (26:36)
Evaluating the policy gradient - 1 (28:34)
Evaluating the policy gradient - 2 (30:15)
Comparison to maximum likelihood (31:55)
What did we just do? (32:08)
Partial observability (34:08)
What is wrong with the policy gradient? (35:29)
Baselines (43:35)
Policy gradient with automatic differentiation - 1 (49:33)
Policy gradient with automatic differentiation - 2 (52:16)
Policy gradient in practice (56:40)
Example: trust region policy optimization (59:15)
Policy gradients suggested readings (01:00:08)
Overview - 4 (01:01:35)
The reinforcement learning objective (01:01:57)
Model-free reinforcement learning (01:02:14)
What if we knew the transition dynamics? (01:02:43)
The objective (01:04:12)
The deterministic case (01:04:37)
The stochastic open-loop case (01:06:02)
Aside: terminology (01:07:16)
The stochastic closed-loop case (01:07:56)
Open loop control with stochastic optimization (01:08:49)
What else can we do? (01:08:52)
Closed loop control: policy search with a model (01:08:57)
Backpropagate directly into the policy? (01:09:21)
Model-free optimization with a model (01:12:03)
Learning policies without BPTT: Guided policy search (01:12:57)
Guided Policy Search (01:14:24)
Overview - 5 (01:14:29)
What if we don’t know the model? (01:14:51)
Does it work? Yes! (01:15:54)
Does it work? No! (01:16:38)
Can we do better? - 1 (01:19:03)
What if we make a mistake? (01:20:07)
Can we do better? - 2 (01:20:11)
How to replan? (01:20:24)
Backpropagate directly into the policy? (01:20:44)
What about observations? (01:21:13)
Summary (01:22:33)
Reducing variance (01:26:15)
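
The chapters on direct policy differentiation, baselines, and variance reduction center on the score-function (REINFORCE) gradient estimator. As a rough illustration only (a minimal sketch, not code from the lecture), the snippet below applies that estimator with a running-average baseline to a hypothetical two-armed bandit; the environment, learning rates, and variable names are all assumptions chosen for brevity.

```python
import numpy as np

# Minimal REINFORCE sketch on a hypothetical 2-armed bandit (illustrative
# only, not taken from the lecture): softmax policy over logits theta,
# score-function gradient, and a running-average baseline to reduce variance.

rng = np.random.default_rng(0)
theta = np.zeros(2)                 # logits for the 2 actions
true_means = np.array([0.2, 0.8])   # arm 1 pays more on average (assumed)
alpha, baseline = 0.1, 0.0          # step size and baseline (assumed values)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)           # sample an action from the policy
    r = rng.normal(true_means[a], 1.0)   # sample a noisy reward
    # grad of log pi(a) w.r.t. the logits of a softmax policy: one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    baseline += 0.01 * (r - baseline)    # track average reward as the baseline
    theta += alpha * (r - baseline) * grad_log_pi  # REINFORCE update

print("learned action probabilities:", softmax(theta))
```

Subtracting the baseline leaves the gradient estimate unbiased but shrinks its variance, which is the same motivation the "Baselines" and "Reducing variance" chapters develop for the full trajectory setting.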