Thompson Sampling: a provably good Bayesian heuristic for bandit problems

Published on 2013-11-076365 Views

Shipra Agrawal

Multi-armed bandit problem is a basic model for managing the exploration/exploitation trade-off that arises in many situations. Thompson Sampling [Thompson 1933] is one of the earliest heuristic for t

LSOLDM 2013 - Cumberland Lodge

Related categories

Presentation

Thompson Sampling:a provably good heuristic for multi-armed bandits00:00

Stochastic multi-armed bandits00:18

Thompson Sampling - 101:36

History03:01

New theoretical guarantees04:03

Linear contextual bandits05:58

Thompson Sampling - 209:26

Thompson Sampling - 311:58

Thompson Sampling vs. UCB based algorithms13:09

Thompson sampling for linear contextual bandits14:57

Regret bounds15:52

Proof outline17:41

Proof techniques21:35

Proof of key inequality - 124:02

Proof of key inequality - 225:58

Summary: proof of regret bound27:34

Conclusion28:26