Generalization and Exploration via Value Function Randomization

Published on 2015-07-28

Ben Van Roy

Effective reinforcement learning calls for both efficient exploration and extrapolative generalization. I will discuss a new approach to exploration which combines the merits of provably efficient t
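The approach named in the title, value function randomization, is the idea behind randomized least-squares value iteration (RLSVI), which the outline compares with LSVI-Boltzmann. The block below is a minimal sketch, not the speaker's implementation. It covers only the tabular special case: with one-hot state-action features, the regularized least-squares regression at each step decouples into independent one-dimensional Gaussian posteriors. The noise scale `sigma`, the prior precision `lam`, the data layout, and the function names `rlsvi_sample_q` and `greedy_action` are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def rlsvi_sample_q(data, n_states, n_actions, horizon, sigma=1.0, lam=1.0, rng=random):
    """Draw one randomized Q-function, q[h][s][a], by backward induction.

    data[h] is a list of (s, a, r, s_next) transitions observed at step h.
    Hypothetical interface: sigma is the assumed observation-noise scale and
    lam the assumed prior precision (regularization) on each Q-value.
    """
    # q[horizon] stays all zeros: there is no value after the final step.
    q = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(horizon + 1)]
    for h in reversed(range(horizon)):
        count = defaultdict(int)
        target_sum = defaultdict(float)
        for s, a, r, s_next in data[h]:
            # The regression target uses the *sampled* next-step values, so
            # randomness at later steps propagates back to earlier ones.
            y = r + max(q[h + 1][s_next])
            count[(s, a)] += 1
            target_sum[(s, a)] += y
        for s in range(n_states):
            for a in range(n_actions):
                n = count[(s, a)]
                # Gaussian posterior for one one-hot weight with prior N(0, 1/lam)
                # and n noisy targets of variance sigma**2.
                var = 1.0 / (n / sigma**2 + lam)
                mean = var * target_sum[(s, a)] / sigma**2
                q[h][s][a] = rng.gauss(mean, var ** 0.5)
    return q


def greedy_action(q, h, s):
    """Act greedily with respect to the sampled Q-function."""
    row = q[h][s]
    return max(range(len(row)), key=row.__getitem__)
```

In use, an agent would draw one Q-function with `rlsvi_sample_q` at the start of each episode and then follow `greedy_action` for the whole episode. This differs from Boltzmann exploration, which injects fresh noise at every step. Committing to a single randomized value function for an episode is the property that randomized value functions rely on for multi-step ("deep") exploration.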

RLDM 2015 - Edmonton


Presentation

Generalization and Exploration via Value Function Randomization (00:00)
Online Optimization - 1 (00:36)
Online Optimization - 2 (00:50)
Exploration Strategies - 1 (01:30)
Exploration Strategies - 2 (01:40)
Exploration Strategies - 3 (02:43)
Exploration Strategies - 4 (03:43)
Example: Online Linear Programming - 1 (04:28)
Example: Online Linear Programming - 2 (04:40)
Example: Online Linear Programming - 3 (04:52)
Example: Online Linear Programming - 4 (05:22)
Example: Online Linear Programming - 5 (06:23)
Is TS “better” than UCB? - 1 (06:39)
Is TS “better” than UCB? - 2 (06:45)
Is TS “better” than UCB? - 3 (07:25)
Is TS “better” than UCB? - 4 (07:39)
Is TS “better” than UCB? - 5 (08:02)
UCB is Often Computationally Intractable - 1 (09:11)
UCB is Often Computationally Intractable - 2 (09:26)
UCB is Often Computationally Intractable - 3 (09:35)
UCB is Often Computationally Intractable - 4 (09:59)
UCB is Often Computationally Intractable - 5 (10:11)
UCB is Often Computationally Intractable - 6 (10:13)
UCB is Often Computationally Intractable - 7 (10:43)
Summary on TS versus UCB - 1 (11:48)
Summary on TS versus UCB - 2 (11:52)
Summary on TS versus UCB - 3 (12:22)
Summary on TS versus UCB - 4 (14:36)
Troubling Example: Sparse Linear Bandit - 1 (15:19)
Troubling Example: Sparse Linear Bandit - 2 (18:25)
Troubling Example: Sparse Linear Bandit - 3 (19:16)
Troubling Example: Sparse Linear Bandit - 4 (19:51)
Troubling Example: Assortment Optimization - 1 (20:08)
Troubling Example: Assortment Optimization - 2 (20:39)
Troubling Example: Assortment Optimization - 3 (21:03)
Troubling Example: Assortment Optimization - 4 (21:24)
Troubling Example: Assortment Optimization - 5 (21:47)
Information-Directed Sampling (IDS) - 1 (22:09)
Information-Directed Sampling (IDS) - 2 (22:53)
Information-Directed Sampling (IDS) - 3 (23:21)
Information-Directed Sampling (IDS) - 4 (24:09)
Reinforcement Learning - 1 (24:20)
Reinforcement Learning - 2 (24:25)
Deep Exploration - 1 (24:31)
Deep Exploration - 2 (24:32)
Deep Exploration - 3 (25:06)
Deep Exploration - 4 (25:14)
Deep Exploration - 5 (25:55)
Deep Exploration - 6 (26:00)
“Efficient RL” Literature - 1 (26:10)
“Efficient RL” Literature - 2 (26:17)
“Efficient RL” Literature - 3 (26:42)
“Efficient RL” Literature - 4 (26:58)
“Efficient RL” Literature - 5 (27:12)
“Efficient RL” Literature - 6 (27:16)
Two Cultures? - 1 (28:02)
Two Cultures? - 2 (28:08)
Two Cultures? - 3 (28:29)
Toward Deep Exploration + Generalization - 1 (28:43)
Toward Deep Exploration + Generalization - 2 (28:49)
Toward Deep Exploration + Generalization - 3 (29:10)
Toward Deep Exploration + Generalization - 4 (30:11)
Toward Deep Exploration + Generalization - 5 (30:18)
Episodic RL Framework - 1 (30:20)
Episodic RL Framework - 2 (30:22)
Episodic RL Framework - 3 (30:23)
Episodic RL Framework - 4 (30:24)
Episodic RL Framework - 5 (30:25)
Episodic RL Framework - 6 (30:27)
Episodic RL Framework - 7 (30:28)
Value Function Randomization - 1 (30:29)
Value Function Randomization - 2 (30:31)
Value Function Randomization - 3 (30:32)
Value Function Randomization - 4 (30:32)
Value Function Randomization - 5 (30:34)
Value Function Randomization - 6 (30:35)
Regret Analysis - 1 (31:11)
Regret Analysis - 2 (31:31)
Regret Analysis - 3 (31:36)
Regret Analysis - 4 (31:50)
Regret Analysis - 5 (32:06)
LSVI-Boltzmann vs. RLSVI - 1 (32:14)
LSVI-Boltzmann vs. RLSVI - 2 (32:15)
LSVI-Boltzmann vs. RLSVI - 3 (33:02)
Varying the Number of Basis Functions (33:16)
Agnostic Learning (33:17)
Deeper Reinforcement Learning - 1 (33:19)
Deeper Reinforcement Learning - 2 (33:32)
Deeper Reinforcement Learning - 3 (33:34)
Deeper Reinforcement Learning - 4 (33:41)
Deeper Reinforcement Learning - 5 (33:58)
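The recurring comparison of Thompson sampling (TS) and upper-confidence-bound (UCB) exploration in the outline can be made concrete on a Bernoulli bandit. The block below is a minimal sketch under stated assumptions: TS uses independent Beta(1, 1) priors, and UCB is the textbook UCB1 index (empirical mean plus sqrt(2 ln t / n)). Neither is claimed to be the exact variant discussed in the talk. The function names `thompson_sampling`, `ucb1`, and `pseudo_regret` are hypothetical.

```python
import math
import random


def thompson_sampling(means, horizon, rng):
    """Bernoulli bandit with Beta(1, 1) priors; returns pull counts per arm."""
    k = len(means)
    alpha = [1] * k  # 1 + observed successes
    beta = [1] * k   # 1 + observed failures
    pulls = [0] * k
    for _ in range(horizon):
        # Sample one plausible mean per arm from its posterior, play the best sample.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


def ucb1(means, horizon, rng):
    """UCB1: play each arm once, then maximize an optimistic index."""
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # initialization round
        else:
            # Empirical mean plus a confidence bonus that shrinks with more pulls.
            arm = max(range(k), key=lambda i: totals[i] / pulls[i]
                      + math.sqrt(2 * math.log(t) / pulls[i]))
        reward = 1 if rng.random() < means[arm] else 0
        totals[arm] += reward
        pulls[arm] += 1
    return pulls


def pseudo_regret(means, pulls):
    """Expected reward lost relative to always playing the best arm."""
    best = max(means)
    return sum((best - m) * n for m, n in zip(means, pulls))


if __name__ == "__main__":
    means = [0.3, 0.7]
    rng = random.Random(0)
    print("TS regret:", pseudo_regret(means, thompson_sampling(means, 2000, rng)))
    print("UCB1 regret:", pseudo_regret(means, ucb1(means, 2000, rng)))
```

In this two-arm case both rules are cheap, but the per-step work differs in kind. TS maximizes over a single posterior sample, while UCB maximizes an optimistic index. For structured action sets, computing that optimistic maximization can become expensive.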