Policy Search

Published on Oct 11, 2018 · 2639 views

Chapter list

Policy Search - 1 (00:00)
Policy Search - 2 (00:09)
Overview - 1 (02:49)
Overview - 2 (03:37)
Terminology & notation - 1 (03:50)
Terminology & notation - 2 (09:31)
Imitation Learning (11:55)
Reward functions (12:37)
Definitions - 1 (13:43)
Definitions - 2 (15:17)
Definitions - 3 (16:08)
Definitions - 4 (16:27)
The goal of reinforcement learning - 1 (17:56)
The goal of reinforcement learning - 2 (20:46)
Overview - 3 (21:22)
Evaluating the objective (21:42)
Direct policy differentiation - 1 (23:16)
Direct policy differentiation - 2 (26:36)
Evaluating the policy gradient - 1 (28:34)
Evaluating the policy gradient - 2 (30:15)
Comparison to maximum likelihood (31:55)
What did we just do? (32:08)
Partial observability (34:08)
What is wrong with the policy gradient? (35:29)
Baselines (43:35)
Policy gradient with automatic differentiation - 1 (49:33)
Policy gradient with automatic differentiation - 2 (52:16)
Policy gradient in practice (56:40)
Example: trust region policy optimization (59:15)
Policy gradients suggested readings (01:00:08)
Overview - 4 (01:01:35)
The reinforcement learning objective (01:01:57)
Model-free reinforcement learning (01:02:14)
What if we knew the transition dynamics? (01:02:43)
The objective (01:04:12)
The deterministic case (01:04:37)
The stochastic open-loop case (01:06:02)
Aside: terminology (01:07:16)
The stochastic closed-loop case (01:07:56)
Open loop control with stochastic optimization (01:08:49)
What else can we do? (01:08:52)
Closed loop control: policy search with a model (01:08:57)
Backpropagate directly into the policy? (01:09:21)
Model-free optimization with a model (01:12:03)
Learning policies without BPTT: Guided policy search (01:12:57)
Guided Policy Search (01:14:24)
Overview - 5 (01:14:29)
What if we don’t know the model? (01:14:51)
Does it work? Yes! (01:15:54)
Does it work? No! (01:16:38)
Can we do better? - 1 (01:19:03)
What if we make a mistake? (01:20:07)
Can we do better? - 2 (01:20:11)
How to replan? (01:20:24)
Backpropagate directly into the policy? (01:20:44)
What about observations? (01:21:13)
Summary (01:22:33)
Reducing variance (01:26:15)
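
The chapters on direct policy differentiation, baselines, and variance reduction center on the score-function (REINFORCE) gradient estimator. As a rough illustration only (a minimal sketch, not code from the lecture), the snippet below applies that estimator with a running-average baseline to a hypothetical two-armed bandit; the environment, learning rates, and variable names are all assumptions chosen for brevity.

```python
import numpy as np

# Minimal REINFORCE sketch on a hypothetical 2-armed bandit (illustrative
# only, not taken from the lecture): softmax policy over logits theta,
# score-function gradient, and a running-average baseline to reduce variance.

rng = np.random.default_rng(0)
theta = np.zeros(2)                 # logits for the 2 actions
true_means = np.array([0.2, 0.8])   # arm 1 pays more on average (assumed)
alpha, baseline = 0.1, 0.0          # step size and baseline (assumed values)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)           # sample an action from the policy
    r = rng.normal(true_means[a], 1.0)   # sample a noisy reward
    # grad of log pi(a) w.r.t. the logits of a softmax policy: one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    baseline += 0.01 * (r - baseline)    # track average reward as the baseline
    theta += alpha * (r - baseline) * grad_log_pi  # REINFORCE update

print("learned action probabilities:", softmax(theta))
```

Subtracting the baseline leaves the gradient estimate unbiased but shrinks its variance, which is the same motivation the "Baselines" and "Reducing variance" chapters develop for the full trajectory setting.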