Cheap Bandits
Published on Dec 05, 20151469 Views
We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting is interesting in many applications involving monitoring and sur