Generalization and Exploration via Value Function Randomization

Published on Jul 28, 2015 · 3593 views

Effective reinforcement learning calls for both efficient exploration and extrapolative generalization. I will discuss a new approach to exploration which combines the merits of provably efficient t…
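
The talk builds toward randomized least-squares value iteration (RLSVI; see the "Value Function Randomization" and "LSVI-Boltzmann vs. RLSVI" chapters below). As a rough reference point, here is a minimal sketch of episodic value function randomization with linear features; the helper names (`buffers`, `phi`, `actions`) and the specific Gaussian-perturbation form are illustrative assumptions, not a transcription of the speaker's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def rlsvi(buffers, phi, d, actions, H, sigma=1.0, lam=1.0):
    """Randomized least-squares value iteration (minimal sketch).

    buffers[h] holds (s, a, r, s_next) transitions observed at step h;
    phi(s, a) is a d-dimensional feature map. Returns one sampled weight
    vector per step, drawn around the regularized least-squares fit, so
    that acting greedily w.r.t. the sampled value function explores.
    """
    theta = [np.zeros(d) for _ in range(H + 1)]  # theta[H] = 0 at the horizon
    for h in reversed(range(H)):
        A = lam * np.eye(d)
        b = np.zeros(d)
        for (s, a, r, s_next) in buffers[h]:
            x = phi(s, a)
            # Bootstrapped regression target: reward plus next-step value.
            target = r + max(phi(s_next, ap) @ theta[h + 1] for ap in actions)
            A += np.outer(x, x)
            b += x * target
        A_inv = np.linalg.inv(A)
        # Randomization step: Gaussian perturbation of the fitted weights.
        theta[h] = rng.multivariate_normal(A_inv @ b, sigma**2 * A_inv)
    return theta[:H]

def greedy_action(s, h, theta, phi, actions):
    # Act greedily with respect to the randomized value estimates.
    return max(actions, key=lambda a: phi(s, a) @ theta[h])
```

Resampling the weights once per episode, rather than independently at every step, is what yields the "deep exploration" the chapter list refers to.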

Chapter list

Generalization and Exploration via Value Function Randomization (00:00)
Online Optimization - 1 (00:36)
Online Optimization - 2 (00:50)
Exploration Strategies - 1 (01:30)
Exploration Strategies - 2 (01:40)
Exploration Strategies - 3 (02:43)
Exploration Strategies - 4 (03:43)
Example: Online Linear Programming - 1 (04:28)
Example: Online Linear Programming - 2 (04:40)
Example: Online Linear Programming - 3 (04:52)
Example: Online Linear Programming - 4 (05:22)
Example: Online Linear Programming - 5 (06:23)
Is TS “better” than UCB? - 1 (06:39)
Is TS “better” than UCB? - 2 (06:45)
Is TS “better” than UCB? - 3 (07:25)
Is TS “better” than UCB? - 4 (07:39)
Is TS “better” than UCB? - 5 (08:02)
UCB is Often Computationally Intractable - 1 (09:11)
UCB is Often Computationally Intractable - 2 (09:26)
UCB is Often Computationally Intractable - 3 (09:35)
UCB is Often Computationally Intractable - 4 (09:59)
UCB is Often Computationally Intractable - 5 (10:11)
UCB is Often Computationally Intractable - 6 (10:13)
UCB is Often Computationally Intractable - 7 (10:43)
Summary on TS versus UCB - 1 (11:48)
Summary on TS versus UCB - 2 (11:52)
Summary on TS versus UCB - 3 (12:22)
Summary on TS versus UCB - 4 (14:36)
Troubling Example: Sparse Linear Bandit - 1 (15:19)
Troubling Example: Sparse Linear Bandit - 2 (18:25)
Troubling Example: Sparse Linear Bandit - 3 (19:16)
Troubling Example: Sparse Linear Bandit - 4 (19:51)
Troubling Example: Assortment Optimization - 1 (20:08)
Troubling Example: Assortment Optimization - 2 (20:39)
Troubling Example: Assortment Optimization - 3 (21:03)
Troubling Example: Assortment Optimization - 4 (21:24)
Troubling Example: Assortment Optimization - 5 (21:47)
Information-Directed Sampling (IDS) - 1 (22:09)
Information-Directed Sampling (IDS) - 2 (22:53)
Information-Directed Sampling (IDS) - 3 (23:21)
Information-Directed Sampling (IDS) - 4 (24:09)
Reinforcement Learning - 1 (24:20)
Reinforcement Learning - 2 (24:25)
Deep Exploration - 1 (24:31)
Deep Exploration - 2 (24:32)
Deep Exploration - 3 (25:06)
Deep Exploration - 4 (25:14)
Deep Exploration - 5 (25:55)
Deep Exploration - 6 (26:00)
“Efficient RL” Literature - 1 (26:10)
“Efficient RL” Literature - 2 (26:17)
“Efficient RL” Literature - 3 (26:42)
“Efficient RL” Literature - 4 (26:58)
“Efficient RL” Literature - 5 (27:12)
“Efficient RL” Literature - 6 (27:16)
Two Cultures? - 1 (28:02)
Two Cultures? - 2 (28:08)
Two Cultures? - 3 (28:29)
Toward Deep Exploration + Generalization - 1 (28:43)
Toward Deep Exploration + Generalization - 2 (28:49)
Toward Deep Exploration + Generalization - 3 (29:10)
Toward Deep Exploration + Generalization - 4 (30:11)
Toward Deep Exploration + Generalization - 5 (30:18)
Episodic RL Framework - 1 (30:20)
Episodic RL Framework - 2 (30:22)
Episodic RL Framework - 3 (30:23)
Episodic RL Framework - 4 (30:24)
Episodic RL Framework - 5 (30:25)
Episodic RL Framework - 6 (30:27)
Episodic RL Framework - 7 (30:28)
Value Function Randomization - 1 (30:29)
Value Function Randomization - 2 (30:31)
Value Function Randomization - 3 (30:32)
Value Function Randomization - 4 (30:32)
Value Function Randomization - 5 (30:34)
Value Function Randomization - 6 (30:35)
Regret Analysis - 1 (31:11)
Regret Analysis - 2 (31:31)
Regret Analysis - 3 (31:36)
Regret Analysis - 4 (31:50)
Regret Analysis - 5 (32:06)
LSVI-Boltzmann vs. RLSVI - 1 (32:14)
LSVI-Boltzmann vs. RLSVI - 2 (32:15)
LSVI-Boltzmann vs. RLSVI - 3 (33:02)
Varying the Number of Basis Functions (33:16)
Agnostic Learning (33:17)
Deeper Reinforcement Learning - 1 (33:19)
Deeper Reinforcement Learning - 2 (33:32)
Deeper Reinforcement Learning - 3 (33:34)
Deeper Reinforcement Learning - 4 (33:41)
Deeper Reinforcement Learning - 5 (33:58)