Contextual Bandit Algorithms with Supervised Learning Guarantees, incl. discussion by Brendan McMahan

Published on May 06, 2011
6112 Views

We address the problem of competing with any large set of $N$ policies in the non-stochastic bandit setting, where the learner must repeatedly select among $K$ actions but observes only the reward of the chosen action.
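As a rough illustration of this setting (a sketch only, not the algorithm presented in the talk), the code below runs an epsilon-greedy learner over a finite set of policies and keeps importance-weighted reward estimates so that every policy can be credited from bandit feedback. The function name, the policy representation, and the parameter epsilon are illustrative assumptions.

```python
import random

def epsilon_greedy_contextual_bandit(policies, contexts, rewards, K, epsilon=0.1):
    """Illustrative epsilon-greedy loop over a finite policy set (a sketch, not the talk's algorithm).

    policies: list of N functions mapping a context to an action in {0, ..., K-1}
    contexts: iterable of contexts x_t
    rewards:  callable rewards(t, a) returning the reward of action a at round t;
              the learner only ever sees the reward of the action it actually plays
    """
    n = len(policies)
    est_reward = [0.0] * n  # importance-weighted reward estimate per policy

    for t, x in enumerate(contexts):
        # Action recommended by the currently best-looking policy.
        best = max(range(n), key=lambda i: est_reward[i])
        greedy_a = policies[best](x)

        # Action distribution: uniform exploration mixed with the greedy choice.
        probs = [epsilon / K] * K
        probs[greedy_a] += 1.0 - epsilon

        a = random.choices(range(K), weights=probs)[0]
        r = rewards(t, a)  # bandit feedback: only the chosen action's reward is observed

        # Importance-weighted credit to every policy that agrees with the chosen action.
        for i, pi in enumerate(policies):
            if pi(x) == a:
                est_reward[i] += r / probs[a]
    return est_reward
```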

Chapter list

00:00 Contextual Bandit Algorithms with Supervised Learning Guarantees
00:10 Serving Content to Users (1)
00:24 Serving Content to Users (2)
00:35 Serving Content to Users (3)
00:41 Serving Content to Users (4)
00:44 Serving Content to Users (5)
00:45 Serving Content to Users (6)
01:11 Outline (1)
01:18 The Contextual Bandit Setting
02:55 Regret (1)
04:24 Regret (2)
05:18 Some Observations
05:39 Previous Results
07:03 Our Result
07:43 Outline (2)
07:54 First Some Failed Approaches (1)
08:24 First Some Failed Approaches (2)
08:58 First Some Failed Approaches (3)
09:06 First Some Failed Approaches (4)
09:27 epsilon-greedy
10:43 Outline (3)
10:50 Ideas Behind Exp4.P
13:12 Exponential Weight Algorithm for Exploration and Exploitation with Experts (1)
14:46 Exponential Weight Algorithm for Exploration and Exploitation with Experts (2)
15:41 Lemma 1
16:38 Lemma 2
17:39 One Problem ...
18:19 Results
18:53 Outline (4)
18:58 Infinitely Many Policies
19:28 VC Dimension
19:39 VE, an Algorithm for VC Sets
20:15 Outline of Analysis of VE
21:35 Outline (5)
21:44 Experiments on Yahoo! Data
22:46 Experimental Results
24:55 Summary
26:18 Contextual Bandits in Context (1)
27:28 Contextual Bandits in Context (2)
28:14 Applications
29:16 Two general approaches
29:45 Contributions
30:26 Experiments
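The chapters on "Ideas Behind Exp4.P" and the "Exponential Weight Algorithm for Exploration and Exploitation with Experts" concern exponential weights over a set of experts (policies) under bandit feedback. The following is a minimal sketch of one round of a generic Exp4-style update (importance-weighted reward estimates feeding a multiplicative weight update), assuming rewards in [0, 1]; it is not the Exp4.P algorithm from the talk, and the function name and the exploration parameter gamma are assumptions.

```python
import math
import random

def exp4_style_round(weights, expert_advice, rewards_for, K, gamma=0.05):
    """One round of a generic Exp4-style exponential-weights update (illustrative sketch).

    weights:       current positive weights, one per expert/policy
    expert_advice: list of probability vectors over K actions, one per expert, for this context
    rewards_for:   callable rewards_for(a) giving the observed reward (in [0, 1]) of action a
    """
    total = sum(weights)
    # Mix expert advice by weight, then blend in uniform exploration over the K actions.
    p = [
        (1 - gamma) * sum(w * advice[a] for w, advice in zip(weights, expert_advice)) / total
        + gamma / K
        for a in range(K)
    ]

    a = random.choices(range(K), weights=p)[0]
    r = rewards_for(a)  # bandit feedback: only the chosen action's reward

    # Importance-weighted reward estimate: unbiased for every action's reward.
    r_hat = [r / p[a] if j == a else 0.0 for j in range(K)]

    # Each expert is credited with the estimated reward of the actions it recommended.
    new_weights = [
        w * math.exp(gamma / K * sum(advice[j] * r_hat[j] for j in range(K)))
        for w, advice in zip(weights, expert_advice)
    ]
    return new_weights, a, r
```

The gamma/K uniform mixing keeps every action's probability bounded away from zero, so the importance-weighted estimates r / p[a] stay bounded; Exp4.P, the subject of the talk, refines this kind of scheme so that its regret guarantee holds with high probability rather than only in expectation.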