Stochastic Regret Minimization via Thompson Sampling

Published on Jul 15, 2014 · 2424 Views

The Thompson Sampling (TS) policy is a widely implemented algorithm for the stochastic multi-armed bandit (MAB) problem. Given a prior distribution over possible parameter settings of the underlying r
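As a rough illustration of the kind of policy the abstract describes, here is a minimal sketch of Thompson Sampling. It is not taken from the talk. It assumes Bernoulli rewards and an independent uniform Beta(1, 1) prior on each arm's mean, and the names `thompson_sampling`, `true_means`, and `horizon` are hypothetical. At each round the policy samples one plausible mean per arm from its posterior, plays the arm with the largest sample, and updates that arm's posterior with the observed reward.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson Sampling on a simulated bandit.

    Assumptions (illustrative only): rewards are Bernoulli with the given
    true means, and each arm starts with a uniform Beta(1, 1) prior.
    Returns the pull count per arm and the cumulative (pseudo-)regret.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards equal to 1, per arm
    failures = [0] * k   # observed rewards equal to 0, per arm
    pulls = [0] * k
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)
        # Simulate a Bernoulli reward from the (unknown to the policy) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update for the played arm.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        # Regret is measured against always playing the best arm.
        regret += best_mean - true_means[arm]

    return pulls, regret


if __name__ == "__main__":
    pulls, regret = thompson_sampling([0.2, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
    print("cumulative regret:", round(regret, 2))
```

The Beta prior is chosen here only because it is conjugate to Bernoulli rewards, which keeps the posterior update to two counters per arm. Other reward models would need a different posterior sampler.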
