From Bandits to Experts : On the Value of More Information

author: Ohad Shamir, Faculty of Mathematics and Computer Science, Weizmann Institute of Science
published: July 25, 2011,   recorded: July 2011,   views: 3065


Related Open Educational Resources

Related content

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Lecture popularity: You need to login to cast your vote.


Learning from Experts and Multi-armed Bandits are two of the most common settings studied in online learning. Whereas the first setting assumes that the performance of all k actions are revealed at the end of each round, the bandit setting assumes that only the performance of the chosen action is revealed, with corresponding √k-degradation in the provable regret guarantee. In this paper, we study a natural setting which interpolates between the experts and the bandits setting, where choosing an action also reveals some side-information on the performance of some of the other actions. We develop practical algorithms with provable regret guarantees, as well as partially-matching lower bounds. The regret depends on non- trivial graph theoretic properties of the information feedback structure, and has an interesting trade-off between regret optimality and computational efficiency. We end by discussing some of the many open questions that remain.

Link this page

Would you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !

Write your own review or comment:

make sure you have javascript enabled or clear this field: