
Cheap Bandits
Published on 2015-12-05, 1471 Views
We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting is interesting in many applications involving monitoring and sur