The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond

Published on Aug 02, 2011 · 3739 Views

This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB and its variants; second, in the special case of Bernoulli rewards, it reaches the lower bound of Lai and Robbins. Furthermore, simple adaptations of the algorithm are optimal for rewards generated from exponential families of distributions.
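For concreteness, here is a minimal sketch of the KL-UCB index for Bernoulli rewards, following the form discussed in the talk: the index of arm a at round t is the largest mean q compatible with the arm's empirical average, in the sense that N_a * kl(mu_a, q) <= log(t) + c*log(log(t)), where kl is the Bernoulli Kullback-Leibler divergence. The function names, bisection tolerance, and clipping constant below are illustrative choices, not from the paper.

    import math

    def bernoulli_kl(p, q, eps=1e-12):
        # Bernoulli KL divergence kl(p, q) = p*log(p/q) + (1-p)*log((1-p)/(1-q)),
        # with clipping to avoid log(0).
        p = min(max(p, eps), 1 - eps)
        q = min(max(q, eps), 1 - eps)
        return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

    def kl_ucb_index(p_hat, n_pulls, t, c=0.0, tol=1e-6):
        # Largest q in [p_hat, 1] with n_pulls * kl(p_hat, q) <= log(t) + c*log(log(t)).
        # kl(p_hat, .) is increasing on [p_hat, 1], so bisection applies.
        # The paper's analysis uses c = 3; c = 0 is reported to work well in practice.
        threshold = math.log(t) + c * math.log(max(math.log(t), 1.0))
        lo, hi = p_hat, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if n_pulls * bernoulli_kl(p_hat, mid) <= threshold:
                lo = mid
            else:
                hi = mid
        return lo

    # At each round t (after pulling every arm once), play
    #   argmax over arms a of kl_ucb_index(mean[a], count[a], t)
    # and update that arm's empirical mean with the observed reward.

Unlike the classical UCB index, which adds a distribution-free confidence radius to the empirical mean, this index adapts to the shape of the Bernoulli likelihood, which is what allows KL-UCB to match the Lai and Robbins lower bound.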

Chapter list

The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond (00:00)
Roadmap - 1 (00:05)
The Framework: Reinforcement Learning (00:15)
The (stochastic) Bernoulli Multi-Armed Bandit Model (01:46)
Performance Evaluation, Regret (03:52)
Roadmap - 2 (05:43)
Asymptotically Optimal Strategies (05:43)
The Bound of Lai & Robbins (06:21)
Roadmap - 3 (07:24)
Optimism in the Face of Uncertainty (07:28)
Upper Confidence Bound Strategies (08:06)
UCB in Action (09:30)
Performance of UCB (10:30)
Roadmap - 4 (12:31)
KL-UCB (12:34)
The KL Upper Confidence Bound in Picture - 1 (13:00)
The KL Upper Confidence Bound in Picture - 2 (13:28)
Why focus on Bernoulli variables? (14:24)
Regret Bound for KL-UCB (15:19)
Comparison to UCB (15:58)
Main Tool: Deviation Inequality for Self-Normalized Sums (16:39)
Results: Two-Arm Scenario (17:56)
Results: Ten-Arm Scenario with Low Rewards (19:51)
Results: Truncated Exponentials (20:32)
Roadmap - 5 (20:44)
Exponential Family Rewards (20:47)
Results: Exponential Rewards (21:28)
Roadmap - 6 (22:01)
Conclusion - 1 (22:03)
Conclusion - 2 (23:18)