From Bandits to Experts: On the Value of More Information
published: July 25, 2011, recorded: July 2011, views: 3065
Learning from Experts and Multi-armed Bandits are two of the most common settings studied in online learning. Whereas the first setting assumes that the performance of all k actions is revealed at the end of each round, the bandit setting assumes that only the performance of the chosen action is revealed, with a corresponding √k-degradation in the provable regret guarantee. In this paper, we study a natural setting which interpolates between the experts and bandits settings, where choosing an action also reveals side-information on the performance of some of the other actions. We develop practical algorithms with provable regret guarantees, as well as partially-matching lower bounds. The regret depends on non-trivial graph-theoretic properties of the information feedback structure, and exhibits an interesting trade-off between regret optimality and computational efficiency. We end by discussing some of the many open questions that remain.
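The interpolation described above can be illustrated with a minimal EXP3-style sketch in which the feedback structure is an explicit graph: playing action i reveals the losses of all actions in graph[i]. This is an assumption-laden illustration of the general side-observation idea (importance-weighted loss estimates divided by the probability of *observing* an action rather than playing it), not the paper's exact algorithm; all function and variable names are hypothetical.

```python
import math
import random

def exp3_with_side_observations(n_rounds, losses, graph, eta=0.1):
    """EXP3-style learner on a feedback graph.

    graph[i] is the set of actions whose losses are revealed when
    action i is played (it always includes i itself).  losses[t][i]
    lies in [0, 1].  Hypothetical sketch: with the complete graph this
    reduces to the experts setting, with self-loops only it is the
    bandit setting; intermediate graphs interpolate between the two.
    """
    k = len(graph)
    weights = [1.0] * k
    total_loss = 0.0
    for t in range(n_rounds):
        z = sum(weights)
        probs = [w / z for w in weights]
        # Sample an action from the exponential-weights distribution.
        action = random.choices(range(k), weights=probs)[0]
        total_loss += losses[t][action]
        # Importance-weighted estimate for every observed action:
        # divide its loss by the probability that it was observed,
        # i.e. that some action with an edge into it was played.
        for i in graph[action]:
            p_obs = sum(probs[j] for j in range(k) if i in graph[j])
            est = losses[t][i] / p_obs
            weights[i] *= math.exp(-eta * est)
    return total_loss
```

With the complete graph every action is observed with probability one, so the estimates are the true losses and the update is plain exponential weights; as edges are removed, the variance of the estimates grows, which is where the graph-theoretic quantities in the regret bound enter.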
Download slides: explorationexploitation2011_shamir_bandits_01.pdf (1.4 MB)