
Thompson Sampling for the Duelling Bandits Problem

Published on May 28, 2013 · 3326 Views

In surprisingly many situations, absolute rewards are not available (or nonstationary) while relative preferences are easy to collect (or stable). This variation of the bandit problem is known at th
