Use of variance estimation in the multi-armed bandit problem

Published on 2007-02-25, 4544 views

Jean-Yves Audibert

An important aspect of most decision making problems concerns the appropriate balance between exploitation (acting optimally according to the partial knowledge acquired so far) and exploration of th

NIPS Workshops 2006 - Whistler

Presentation

Use of variance estimation in the multi-armed bandit problem 00:01

Outline 00:20

The multi-armed bandit problem 00:41

Notation (1/2) 01:33

Notation (2/2) 02:32

UCB policies 05:45

UCB policies (part 2) 06:27

Bernstein’s type inequalities 07:51

Sketch of the proof 09:39

Sketch of the proof (part 2) 11:03

Definition 11:33

Definition (part 2) 12:10

Definition (part 3) 12:28

A deviation inequality for the number of plays of non-optimal arms 13:37

A deviation inequality for the number of plays of non-optimal arms (part 2) 14:03

A deviation inequality for the number of plays of non-optimal arms (part 3) 14:48

Cumulative regret bounds 14:56

Cumulative regret bounds (part 2) 18:16

Cumulative regret bounds (part 3) 18:55

Discussion on the 1/n-UCB policy 19:49

Discussion on the 1/n-UCB policy (part 2) 20:21

Discussion on the 1/n-UCB policy (part 3) 20:27

Expected cumulative regret bound 21:30

Expected cumulative regret bound (part 2) 22:10

Sketch of the proof (1/3) 22:47

Sketch of the proof (2/3) 23:15

Sketch of the proof (3/3) 23:53

Conclusion 24:12
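
The outline names UCB policies and Bernstein-type inequalities, but this page does not reproduce the formulas from the slides. As a rough illustration only, the sketch below shows one common way a variance estimate can enter an upper-confidence-bound index: the empirical mean plus a Bernstein-style bonus. The exact form of the bonus, the function names `ucb_v_index` and `run_bandit`, and the parameters `b` (assumed reward range), `zeta`, and `c` are assumptions made for this example, not values taken from the talk.

```python
import math
import random


def ucb_v_index(mean, var, s, t, b=1.0, zeta=1.2, c=1.0):
    """Variance-aware upper confidence bound for an arm played s times by round t.

    Assumed form (not taken from the slides): empirical mean plus a
    Bernstein-style bonus sqrt(2*var*E/s) + 3*c*b*E/s, with E = zeta*log(t).
    """
    if s == 0:
        return math.inf  # unplayed arms are tried first
    e = zeta * math.log(t)
    return mean + math.sqrt(2.0 * var * e / s) + 3.0 * c * b * e / s


def run_bandit(arm_means, n_rounds, seed=0):
    """Play Bernoulli arms with the index above; return the play count per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of plays of each arm
    sums = [0.0] * k      # sum of rewards of each arm
    sq_sums = [0.0] * k   # sum of squared rewards of each arm

    for t in range(1, n_rounds + 1):
        indices = []
        for a in range(k):
            s = counts[a]
            if s == 0:
                indices.append(math.inf)
                continue
            mean = sums[a] / s
            # Empirical variance, clipped at zero against rounding error.
            var = max(sq_sums[a] / s - mean * mean, 0.0)
            indices.append(ucb_v_index(mean, var, s, t))

        # Exploit the arm with the largest optimistic estimate.
        arm = max(range(k), key=lambda a: indices[a])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward

    return counts


if __name__ == "__main__":
    print(run_bandit([0.2, 0.8], 2000))
```

With two Bernoulli arms whose means are far apart, the play counts returned by `run_bandit` should concentrate on the better arm. The bonus shrinks faster for arms with low empirical variance, which is the intuition behind using variance estimates in the index.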