Reinforcement Learning
Published on Jul 27, 2017 · 5742 Views
Chapter list
Steps Towards Continual Learning (00:00)
Some elements of Continual Learning (01:02)
A Child’s Playroom Domain (08:10)
Options (10:41)
Coverage of Continual Learning elements? (10:59)
Number of actions (13:20)
Hierarchy of Reusable Skills (15:48)
Do the Intrinsic Motivations Help? (16:25)
Discussion (17:10)
On the Optimal Reward Problem (19:37)
Autonomous Agent Problem (20:06)
Preferences-Parameters Confound (21:28)
Revised Autonomous Agent (22:18)
Approaches to designing reward (23:35)
Optimal Reward Problem (23:41)
Illustration: Fish-or-Bait (27:47)
Reward Space (29:07)
(PP Confound Matters?) Mitigation (33:43)
Policy Gradient for Reward Design (PGRD) (33:47; sketch after this list)
PGRD approximates gradient in 2 parts (35:56)
Deep Learning for Reward Design to Improve UCT in ATARI (36:19)
Forward View: From Rewards to Utility - 1 (36:21)
Backward View: From Rewards to Utility - 2 (37:40)
Main Results: improving UCT (41:47)
Repeated Inverse Reinforcement Learning (43:33)
Inverse Reinforcement Learning (44:09)
Unidentifiability of Inverse RL - 1 (44:46)
Unidentifiability of Inverse RL - 2 (45:39)
Lifelong Learning Agents (46:43)
Looking more carefully at unidentifiability (51:01)
Representational Unidentifiability (51:35)
“Experimenter” chooses tasks - 1 (53:44)
“Experimenter” chooses tasks - 2 (55:28)
The ellipsoid algorithm (57:59; sketch after this list)
Experimenter chooses tasks (58:22)
Zero-Shot Task Generalization by Learning to Compose Sub-Tasks (58:24)
Rapid generalization is key to Continual Learning (58:54)
Problem: Instruction Execution (59:35)
Challenges (01:00:55)
Overview - 1 (01:02:07)
Overview - 2 (01:02:49)
Goal Decomposition (01:02:59)
Multitask Controller Architecture (01:03:05)
Analogy Making Regularization - 1 (01:03:44; sketch after this list)
Analogy Making Regularization - 2 (01:04:16)
Multitask Controller: Training (01:04:36)
Meta Controller (01:04:58)
Meta Controller Architecture (01:05:06)
Meta Controller: Learning Temporal Abstraction - 1 (01:05:39)
Meta Controller: Learning Temporal Abstraction - 2 (01:08:08)
Meta Controller: Learning Temporal Abstraction - 3 (01:09:06)
Does it Work? (01:11:36)
Value Prediction Networks (01:12:33)
Motivation (01:13:13)
VPN: Architecture (01:15:17)
Planning in VPNs (01:18:37; sketch after this list)
Learning in VPNs (01:19:28)
Collect Domain: Results 1 (01:22:33)
Collect Domain: Results 2 (01:23:59)
Collect Domain: Comparisons (01:24:09)
VPN: Results on ATARI Games (01:24:10)
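
The PGRD chapters (33:47 and 35:56) describe updating internal reward parameters by gradient ascent on the designer's objective, with the gradient factored into two parts: how the return changes with the agent's policy, and how the policy (a softmax over planned Q-values) changes with the reward parameters. Below is a minimal runnable sketch of that idea, assuming a myopic depth-1 planner on a toy 5-state chain; the chain, temperature, and step size are illustrative assumptions, not the lecture's setup.

```python
import numpy as np

# PGRD sketch (assumed setup: myopic depth-1 planner, toy 5-state chain).
# The agent acts via softmax on an internal reward r_theta; the designer
# ascends the gradient of the *true* return with respect to theta.
rng = np.random.default_rng(0)
S, A, T, tau, alpha = 5, 2, 10, 0.5, 0.1    # states, actions, horizon, temp, step

theta = np.zeros((S, A))                    # designed internal reward parameters

def step(s, a):
    s2 = min(s + 1, S - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == S - 1)           # true reward only at the goal state

def policy(s):
    q = theta[s] / tau                      # myopic "plan": Q(s, .) = r_theta(s, .)
    p = np.exp(q - q.max())
    return p / p.sum()

for _ in range(2000):
    s, traj = 0, []
    for t in range(T):
        p = policy(s)
        a = rng.choice(A, p=p)
        s2, r = step(s, a)
        traj.append((s, a, r, p))
        s = s2
    # REINFORCE-style update; the gradient factors into d log pi / dQ
    # (softmax term) times dQ / d theta (identity here, Q being r_theta).
    G = 0.0
    for (s, a, r, p) in reversed(traj):
        G = r + G                           # undiscounted return-to-go
        grad = -p / tau
        grad[a] += 1.0 / tau                # d log pi(a|s) / d theta[s, .]
        theta[s] += alpha * grad * G

print("greedy action per state:", theta.argmax(axis=1))   # expect mostly "right"
```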
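
The ellipsoid algorithm chapter (57:59) refers to tracking the set of reward weights still consistent with the demonstrations as an ellipsoid and cutting it with a half-space each time the demonstrator corrects the agent, so its volume shrinks by a dimension-dependent constant factor per correction. Below is the standard central-cut ellipsoid update with a stand-in oracle that simply separates the current guess from a hidden weight vector; the toy oracle, dimension, and tolerance are assumptions, not the lecture's construction.

```python
import numpy as np

def ellipsoid_cut(c, Q, a):
    """Central cut: replace the ellipsoid {w : (w-c)^T Q^-1 (w-c) <= 1} with
    the smallest ellipsoid covering its half {w : a^T w <= a^T c}."""
    n = len(c)
    g = Q @ a / np.sqrt(a @ Q @ a)          # normalized cut direction
    c_new = c - g / (n + 1)
    Q_new = n**2 / (n**2 - 1.0) * (Q - 2.0 / (n + 1) * np.outer(g, g))
    return c_new, Q_new

# Toy repeated-IRL loop: the hidden reward weights w_star play the demonstrator;
# each correction reveals a half-space that still contains w_star.
rng = np.random.default_rng(0)
n = 4
w_star = rng.uniform(-1, 1, size=n)         # unknown true reward weights
c, Q = np.zeros(n), float(n) * np.eye(n)    # ball covering [-1, 1]^n
for t in range(500):
    if np.linalg.norm(c - w_star) < 1e-3:
        break                               # current guess is (nearly) correct
    a = c - w_star                          # stand-in separating hyperplane
    c, Q = ellipsoid_cut(c, Q, a)
print(f"converged to {np.round(c, 3)} after {t} cuts")
```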
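
The Analogy Making Regularization chapters (01:03:44, 01:04:16) regularize sub-task embeddings so that analogous quadruples (a : b :: c : d, e.g. the same object swap under different actions) have matching embedding differences, while non-analogous quadruples are pushed apart by a margin. A sketch in PyTorch, assuming a toy vocabulary of (action, object) sub-task ids, an 8-dimensional embedding, and a margin of 1.0; in training this term would be added to the RL loss.

```python
import torch

torch.manual_seed(0)
# Toy sub-task ids standing for (action, object) pairs, e.g.
# 0=(visit,A) 1=(visit,B) 2=(pick,A) 3=(pick,B) 4=(drop,A) 5=(drop,B).
emb = torch.nn.Embedding(6, 8)
opt = torch.optim.Adam(emb.parameters(), lr=1e-2)

# a : b :: c : d -- the same object change under different actions.
analogies     = torch.tensor([[0, 1, 2, 3], [0, 1, 4, 5], [2, 3, 4, 5]])
non_analogies = torch.tensor([[0, 1, 3, 2], [0, 1, 5, 4]])   # mismatched pairs

def diff(quads):
    g = emb(quads)                            # (n, 4, dim)
    return (g[:, 0] - g[:, 1]) - (g[:, 2] - g[:, 3])

for _ in range(500):
    l_sim = diff(analogies).pow(2).sum(dim=1).mean()          # pull analogies together
    l_dis = torch.relu(1.0 - diff(non_analogies).norm(dim=1)).pow(2).mean()  # margin push
    loss = l_sim + l_dis
    opt.zero_grad(); loss.backward(); opt.step()

print("analogy residuals:", diff(analogies).norm(dim=1).detach())   # should be near 0
```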
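
The Planning in VPNs chapter (01:18:37) performs lookahead entirely in the learned abstract model: an outcome module predicts reward and discount, a transition module predicts the next abstract state, and a value module scores states; d-step planning mixes the direct value estimate with the best rollout value using 1/d and (d-1)/d weights. The sketch below substitutes random linear stubs for the learned modules, and the exact recursion indexing is my reading of the paper, so treat both as assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, OPTIONS = 8, 3

# Random linear stand-ins for VPN's learned modules.
W_val   = rng.normal(size=DIM)
W_out   = rng.normal(size=(OPTIONS, DIM, 2))           # -> reward, discount logit
W_trans = rng.normal(size=(OPTIONS, DIM, DIM)) / np.sqrt(DIM)

def value(s):                                          # f_value: state -> V(s)
    return float(W_val @ s)

def outcome(s, o):                                     # f_out: (state, option) -> r, gamma
    r, g = W_out[o].T @ s
    return float(r), float(1.0 / (1.0 + np.exp(-g)))   # discount squashed into (0, 1)

def transition(s, o):                                  # f_trans: (state, option) -> state'
    return np.tanh(W_trans[o] @ s)

def q_plan(s, o, d):
    """Q^d(s,o) = r + gamma * V^{d-1}(s'): expand the learned model one step."""
    r, gamma = outcome(s, o)
    return r + gamma * v_plan(transition(s, o), d - 1)

def v_plan(s, d):
    """V^d(s): for d > 1, mix the direct estimate with the best rollout value."""
    if d <= 1:
        return value(s)
    best = max(q_plan(s, o, d) for o in range(OPTIONS))
    return value(s) / d + (d - 1) / d * best

s0 = np.tanh(rng.normal(size=DIM))                     # f_enc(observation), stubbed
print([round(q_plan(s0, o, d=3), 3) for o in range(OPTIONS)])
```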