
Safe RL

Published on Jul 27, 2017 · 3,727 views

Chapter list

00:00 Safe Reinforcement Learning
00:10 Overview
01:04 Background
01:57 Notation
03:13 Background - 1
04:03 Potential Application: Digital Marketing
05:51 Potential Application: Intelligent Tutoring Systems
07:48 Potential Application: Functional Electrical Stimulation
09:02 Potential Application: Diabetes Treatment
09:51 Potential Application: Diabetes Treatment - 1
10:20 Potential Application: Diabetes Treatment - 2
11:19 Potential Application: Diabetes Treatment - 3
13:02 Potential Application: Diabetes Treatment - 4
13:42 Motivation for Safe Reinforcement Learning
16:33 Learning Curves are Deceptive
19:23 What property should a safe algorithm have?
21:14 Limitations of the Safe RL Setting
22:04 Standard RL vs safe RL
23:37 Other Definitions of “Safe”
23:53 Other Definitions of “Safe”
24:53 Risk-Sensitive Criterion
26:37 Risk-Sensitive Criterion
28:03 Benefits and Limitations of Changing Objectives
28:41 Another notion of safety
28:45 Another Definition of Safety
29:04 Another Definition of Safety - 1
29:43 Overview - 1
30:20 Off-Policy Policy Evaluation (OPE)
31:59 High Confidence Off-Policy Policy Evaluation (HCOPE)
32:52 Safe Policy Improvement (SPI)
33:09 Overview - 2
33:28 Importance Sampling (Intuition)
36:28 Importance Sampling (Derivation)
39:42 Importance Sampling (Derivation) - 1
43:06 Importance Sampling (Derivation) - 2
45:16 Importance Sampling (Derivation) - 3
48:52 Importance Sampling for Reinforcement Learning
51:00 Computing the Importance Weight
52:14 Importance Sampling for Reinforcement Learning
54:02 Per-Decision Importance Sampling
55:46 Importance Sampling Range / Variance
56:54 Importance Sampling (More Intuition)
57:12 An Idea
57:30 Weighted Importance Sampling
58:14 Weighted Importance Sampling - 1
59:48 Off-Policy Policy Evaluation (OPE) Overview
01:00:43 Off-Policy Policy Evaluation (OPE) Examples
01:02:01 High Confidence Off-Policy Policy Evaluation (HCOPE)
01:02:31 Hoeffding’s Inequality
01:04:28 Applying Hoeffding’s Inequality
01:05:23 What went wrong?
01:05:40 Applying Other Concentration Inequalities
01:06:17 Approximate Confidence Intervals: t-Test
01:07:24 CI vs t-Test vs Bootstrap (non-negative rewards)
01:07:55 HCOPE: Mountain Car
01:08:27 HCOPE: Digital Marketing
01:08:41 HCOPE Using Weighted Per-Decision Importance Sampling and Student’s t-Test
01:09:08 Safe Policy Improvement (SPI)
01:09:19 Safe Policy Improvement
01:10:53 Selecting the Candidate Policy
01:12:39 Experimental Results: Mountain Car
01:14:30 Experimental Results: Mountain Car - 1
01:15:10 Experimental Results: Mountain Car - 2
01:15:56 Experimental Results: Digital Marketing
01:16:13 Experimental Results: Digital Marketing - 1
01:16:18 Experimental Results: Digital Marketing - 2
01:17:13 Experimental Results: Diabetes Treatment
01:17:51 Experimental Results: Diabetes Treatment - 1
01:20:33 Conclusion: Summary
01:21:02 Conclusion: Future Directions
01:21:09 Conclusion: References and Additional Reading
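The chapters above center on off-policy policy evaluation via importance sampling and high-confidence bounds from Hoeffding's inequality. As a rough illustration of those ideas (not the lecture's actual algorithms), here is a minimal sketch on a toy two-action bandit: all policy probabilities, reward means, and the confidence level `delta` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-step setting with two actions (assumed values for illustration).
# Behavior policy pi_b generates the data; we want the value of pi_e.
pi_b = np.array([0.5, 0.5])        # data-collecting policy
pi_e = np.array([0.9, 0.1])        # evaluation policy
true_means = np.array([0.8, 0.2])  # E[reward | action], rewards in {0, 1}

n = 100_000
actions = rng.choice(2, size=n, p=pi_b)
rewards = rng.binomial(1, true_means[actions]).astype(float)

# Ordinary importance sampling (IS): reweight each observed reward
# by the likelihood ratio pi_e(a) / pi_b(a).
weights = pi_e[actions] / pi_b[actions]
is_mean = (weights * rewards).mean()

# Weighted importance sampling (WIS): normalize by the weight sum
# instead of n, trading a small bias for much lower variance.
wis_mean = (weights * rewards).sum() / weights.sum()

# Hoeffding lower bound at confidence 1 - delta on the IS estimate.
# Each weighted return lies in [0, b], with b the largest possible weight.
delta = 0.05
b = weights.max()
hoeffding_lower = is_mean - b * np.sqrt(np.log(1.0 / delta) / (2.0 * n))

true_value = pi_e @ true_means  # ground truth for this toy problem: 0.74
```

With this much data both estimators land close to the true value 0.74, and `hoeffding_lower` gives a conservative guarantee of the kind HCOPE uses to decide whether deploying a new policy is safe. The bound's looseness (it scales with the maximum importance weight `b`) is exactly the issue the lecture raises under "What went wrong?".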