Tradeoffs in online learning under partial information feedback

author: Csaba Szepesvári, Department of Computing Science, University of Alberta
published: Jan. 16, 2013, recorded: December 2012, views: 2813



How should an online learner choose its actions to trade off exploration against exploitation and maximize the accuracy of its predictions when the choice of actions directly influences what information the learner receives? First, using the abstract framework of partial monitoring, we provide a full answer to this question for any discrete prediction problem: as it turns out, the difficulty of the problem at the optimal tradeoff depends on a novel, yet intuitive, geometric-algebraic condition. We also discuss tradeoffs and open problems concerning adaptation to benign environments, prediction with side information, a specific problem in which the learner needs to pay to access the feature values and the label, and the influence of delays in receiving the feedback.
