Toward the understanding of partial-monitoring games
published: July 25, 2011, recorded: July 2011, views: 4470
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Partial monitoring games form a common ground for problems such as learning with expert advice, the multi-armed bandit problem, dynamic pricing, the dark pool problem, label efficient prediction, channel allocation in cognitive radio, linear and convex optimization with various feedbacks. How hard is to learn to play well in a partial monitoring game can be characterized by the minimax growth rate of the learner’s regret. It is well known that some of these games are harder, while some others are easier when it comes to learning, depending on how much information one receives at what cost: Some actions in a game might give more information for the learner but might be pricier, while some might give less selective (or no), but be cheaper. When designing a strategy, it should be clear that one should take into account the global information-cost structure of the game. The question asked in this talk is how? What are the good learning strategies? That is, given the structure of a game, what is the corresponding minimax regret and what algorithm achieves it? In this talk I will review recent progress toward answering this question, as well as some open problems.
Download slides: explorationexploitation2011_szepesvari_toward_01.pdf (5.1 MB)
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !