Thompson Sampling for the Duelling Bandits Problem

Published on 2013-05-28

Noel Welsh

In surprisingly many situations, absolute rewards are unavailable (or nonstationary), while relative preferences are easy to collect (or stable). This variation of the bandit problem is known as the duelling bandits problem.

LSOLDM 2012 - Cumberland Lodge
