Contextual Bandit Algorithms with Supervised Learning Guarantees, incl. discussion by Brendan McMahan
Published on May 06, 2011
6113 Views
We address the problem of competing with any large set of $N$ policies in the non-stochastic bandit setting, where the learner must repeatedly select among $K$ actions but observes only the reward of
Chapter list
Contextual Bandit Algorithms with Supervised Learning Guarantees 00:00
Serving Content to Users (1) 00:10
Serving Content to Users (2) 00:24
Serving Content to Users (3) 00:35
Serving Content to Users (4) 00:41
Serving Content to Users (5) 00:44
Serving Content to Users (6) 00:45
Outline (1) 01:11
The Contextual Bandit Setting 01:18
Regret (1) 02:55
Regret (2) 04:24
Some Observations 05:18
Previous Results 05:39
Our Result 07:03
Outline (2) 07:43
First Some Failed Approaches (1) 07:54
First Some Failed Approaches (2) 08:24
First Some Failed Approaches (3) 08:58
First Some Failed Approaches (4) 09:06
epsilon-greedy 09:27
Outline (3) 10:43
Ideas Behind Exp4.P 10:50
Exponential Weight Algorithm for Exploration and Exploitation with Experts (1) 13:12
Exponential Weight Algorithm for Exploration and Exploitation with Experts (2) 14:46
Lemma 1 15:41
Lemma 2 16:38
One Problem ... 17:39
Results 18:19
Outline (4) 18:53
Infinitely Many Policies 18:58
VC Dimension 19:28
VE, an Algorithm for VC Sets 19:39
Outline of Analysis of VE 20:15
Outline (5) 21:35
Experiments on Yahoo! Data 21:44
Experimental Results 22:46
Summary 24:55
Contextual Bandits in Context (1) 26:18
Contextual Bandits in Context (2) 27:28
Applications 28:14
Two general approaches 29:16
Contributions 29:45
Experiments 30:26