
Tradeoffs in online learning under partial information feedback

Published on Jan 16, 2013 (2824 views)

How should an online learner choose its actions to trade off between exploration and exploitation to maximize the accuracy of predictions where the choice of actions directly influences what information...


Chapter list

Tradeoffs in online learning under partial information feedback (00:00)
Collaborators (00:04)
Contents (00:25)
Partial monitoring - an example (00:48)
The tradeoff - 1 (04:29)
Characterization of the tradeoff (05:17)
Theorem (12:11)
Back to dynamic pricing (14:15)
An adaptive strategy (16:59)
Adaptive control of the tradeoff! (18:43)
Prediction with side information (18:57)
The regret (20:31)
The algorithm CBP-Side (21:33)
Assumption: (23:51)
Result for logistic regression (27:30)
Open problems - 1 (27:47)
Online probing (28:53)
Regret (30:47)
The tradeoff - 2 (32:22)
Free labels! (32:39)
Finite competitor set F, Lipschitz losses (33:08)
Lipschitz losses: Covering arguments (36:43)
Adding structure: Linear regression with quadratic losses (37:35)
Regret for regression (40:53)
When the label is costly.. (43:27)
The tradeoff - 3 (45:06)
Open problems - 2 (45:52)
Distributed bandit optimization (46:12)
Simplified model (47:21)
The tradeoff - 4 (48:16)
Previous work (48:38)
Our results (49:20)
Method for the adversarial setting - 1 (49:58)
Method for the adversarial setting - 2 (51:11)
Method for the adversarial setting - 3 (51:54)
Open questions (52:23)
Conclusions (53:40)