
Thompson Sampling: a provably good Bayesian heuristic for bandit problems

Published on Nov 07, 2013 · 6349 views

The multi-armed bandit problem is a basic model for managing the exploration/exploitation trade-off that arises in many situations. Thompson Sampling [Thompson 1933] is one of the earliest heuristics for this problem.
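To make the idea concrete, here is a minimal sketch of Thompson Sampling for Bernoulli-reward bandits, assuming independent Beta(1, 1) priors on each arm's success probability. The arm means and horizon below are illustrative placeholders, not values from the talk.

```python
import random

def thompson_sampling(true_means, horizon, rng=random.Random(0)):
    """Run Thompson Sampling on a Bernoulli bandit and return total reward."""
    k = len(true_means)
    successes = [0] * k   # number of observed 1-rewards per arm
    failures = [0] * k    # number of observed 0-rewards per arm
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean estimate for each arm from its Beta posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Pull the chosen arm and observe a Bernoulli reward.
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        # Update the posterior of the pulled arm only.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward

if __name__ == "__main__":
    # Hypothetical three-armed instance with means 0.3, 0.5, 0.7.
    print(thompson_sampling(true_means=[0.3, 0.5, 0.7], horizon=10_000))
```

Because arms are selected according to the posterior probability that they are optimal, the algorithm keeps exploring uncertain arms while increasingly exploiting the apparent best one.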


Chapter list

Thompson Sampling: a provably good heuristic for multi-armed bandits (00:00)
Stochastic multi-armed bandits (00:18)
Thompson Sampling - 1 (01:36)
History (03:01)
New theoretical guarantees (04:03)
Linear contextual bandits (05:58)
Thompson Sampling - 2 (09:26)
Thompson Sampling - 3 (11:58)
Thompson Sampling vs. UCB based algorithms (13:09)
Thompson Sampling for linear contextual bandits (14:57)
Regret bounds (15:52)
Proof outline (17:41)
Proof techniques (21:35)
Proof of key inequality - 1 (24:02)
Proof of key inequality - 2 (25:58)
Summary: proof of regret bound (27:34)
Conclusion (28:26)