Thompson Sampling for the Duelling Bandits Problem
Published on May 28, 2013 | 3325 Views
In surprisingly many situations, absolute rewards are not available (or are nonstationary), while relative preferences are easy to collect (or are stable). This variation of the bandit problem is known as th