Tradeoffs in online learning under partial information feedback

Published on 2013-01-16

2841 Views

Csaba Szepesvári

How should an online learner choose its actions to trade off exploration against exploitation and maximize the accuracy of its predictions, when the choice of actions directly influences what information…

Multi-Trade-offs in Machine Learning

Related categories

On-line Learning

Presentation

Tradeoffs in online learning under partial information feedback (00:00)

Collaborators (00:04)

Contents (00:25)

Partial monitoring - an example (00:48)

The tradeoff - 1 (04:29)

Characterization of the tradeoff (05:17)

Theorem (12:11)

Back to dynamic pricing (14:15)

An adaptive strategy (16:59)

Adaptive control of the tradeoff! (18:43)

Prediction with side information (18:57)

The regret (20:31)

The algorithm CBP-Side (21:33)

Assumption: (23:51)

Result for logistic regression (27:30)

Open problems - 1 (27:47)

Online probing (28:53)

Regret (30:47)

The tradeoff - 2 (32:22)

Free labels! (32:39)

Finite competitor set F, Lipschitz losses (33:08)

Lipschitz losses: Covering arguments (36:43)

Adding structure: Linear regression with quadratic losses (37:35)

Regret for regression (40:53)

When the label is costly.. (43:27)

The tradeoff - 3 (45:06)

Open problems - 2 (45:52)

Distributed bandit optimization (46:12)

Simplified model (47:21)

The tradeoff - 4 (48:16)

Previous work (48:38)

Our results (49:20)

Method for the adversarial setting - 1 (49:58)

Method for the adversarial setting - 2 (51:11)

Method for the adversarial setting - 3 (51:54)

Open questions (52:23)

Conclusions (53:40)