Contextual Bandit Algorithms with Supervised Learning Guarantees, incl. discussion by Brendan McMahan

Published on May 06, 2011 · 6116 Views

We address the problem of competing with any large set of $N$ policies in the non-stochastic bandit setting, where the learner must repeatedly select among $K$ actions but observes only the reward of

Chapter list

Contextual Bandit Algorithms with Supervised Learning Guarantees 00:00
Serving Content to Users (1) 00:10
Serving Content to Users (2) 00:24
Serving Content to Users (3) 00:35
Serving Content to Users (4) 00:41
Serving Content to Users (5) 00:44
Serving Content to Users (6) 00:45
Outline (1) 01:11
The Contextual Bandit Setting 01:18
Regret (1) 02:55
Regret (2) 04:24
Some Observations 05:18
Previous Results 05:39
Our Result 07:03
Outline (2) 07:43
First Some Failed Approaches (1) 07:54
First Some Failed Approaches (2) 08:24
First Some Failed Approaches (3) 08:58
First Some Failed Approaches (4) 09:06
epsilon-greedy 09:27
Outline (3) 10:43
Ideas Behind Exp4.P 10:50
Exponential Weight Algorithm for Exploration and Exploitation with Experts (1) 13:12
Exponential Weight Algorithm for Exploration and Exploitation with Experts (2) 14:46
Lemma 1 15:41
Lemma 2 16:38
One Problem ... 17:39
Results 18:19
Outline (4) 18:53
Infinitely Many Policies 18:58
VC Dimension 19:28
VE, an Algorithm for VC Sets 19:39
Outline of Analysis of VE 20:15
Outline (5) 21:35
Experiments on Yahoo! Data 21:44
Experimental Results 22:46
Summary 24:55
Contextual Bandits in Context (1) 26:18
Contextual Bandits in Context (2) 27:28
Applications 28:14
Two general approaches 29:16
Contributions 29:45
Experiments 30:26