
The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond

Published on Aug 02, 2011 · 3741 views

This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB and its variants; second, in the special case of Bernoulli rewards, it reaches the lower bound of Lai and Robbins.
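To make the index policy concrete, here is a minimal Python sketch of the KL-UCB index for Bernoulli rewards, following the definition in the paper: the index of an arm is the largest q with N * d(mean, q) <= log(t) + c * log(log(t)), where d is the Bernoulli KL divergence. The function names, the bisection tolerance, and the guard on log(log(t)) for small t are illustrative choices, not from the talk (the paper's analysis uses c = 3, while c = 0 is recommended in practice).

    import math

    def bernoulli_kl(p, q):
        # Bernoulli KL divergence d(p, q) = p*log(p/q) + (1-p)*log((1-p)/(1-q)),
        # clipped away from {0, 1} to avoid division by zero.
        eps = 1e-12
        p = min(max(p, eps), 1.0 - eps)
        q = min(max(q, eps), 1.0 - eps)
        return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

    def kl_ucb_index(mean, pulls, t, c=0.0, tol=1e-6):
        # Largest q in [mean, 1] with pulls * d(mean, q) <= log(t) + c*log(log(t)).
        # d(mean, .) is increasing on [mean, 1], so bisection finds it.
        if pulls == 0:
            return 1.0  # unexplored arms get the maximal index
        log_t = math.log(t)
        threshold = (log_t + c * math.log(max(log_t, 1.0))) / pulls
        lo, hi = mean, 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if bernoulli_kl(mean, mid) <= threshold:
                lo = mid
            else:
                hi = mid
        return lo

At each round t, the policy plays the arm maximizing kl_ucb_index(empirical_mean[a], pulls[a], t), after pulling each arm once to initialize.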

Chapter list

The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond (00:00)
Roadmap - 1 (00:05)
The Framework: Reinforcement Learning (00:15)
The (stochastic) Bernoulli Multi-Armed Bandit Model (01:46)
Performance Evaluation, Regret (03:52)
Roadmap - 2 (05:43)
Asymptotically Optimal Strategies (05:43)
The Bound of Lai & Robbins (06:21)
Roadmap - 3 (07:24)
Optimism in the Face of Uncertainty (07:28)
Upper Confidence Bound Strategies (08:06)
UCB in Action (09:30)
Performance of UCB (10:30)
Roadmap - 4 (12:31)
KL-UCB (12:34)
The KL Upper Confidence Bound in Picture - 1 (13:00)
The KL Upper Confidence Bound in Picture - 2 (13:28)
Why focus on Bernoulli variables? (14:24)
Regret Bound for KL-UCB (15:19)
Comparison to UCB (15:58)
Main Tool: Deviation Inequality for Self-Normalized Sums (16:39)
Results: Two-Arm Scenario (17:56)
Results: Ten-Arm Scenario with Low Rewards (19:51)
Results: Truncated Exponentials (20:32)
Roadmap - 5 (20:44)
Exponential Family Rewards (20:47)
Results: Exponential Rewards (21:28)
Roadmap - 6 (22:01)
Conclusion - 1 (22:03)
Conclusion - 2 (23:18)