Stochastic Regret Minimization via Thompson Sampling

Published on 2014-07-15

2444 Views

Sudipto Guha

The Thompson Sampling (TS) policy is a widely implemented algorithm for the stochastic multi-armed bandit (MAB) problem. Given a prior distribution over possible parameter settings of the underlying r

COLT 2014 - Barcelona
