A simple multi-armed bandit algorithm with optimal variation-bounded regret

Published on 2011-08-02, 4693 views

Elad Hazan

We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of O(√(Q log T)), wh

COLT 2011 - Budapest

Presentation

A simple MAB algorithm with optimal variation bounded regret (00:00)

Multi-Armed Bandits (00:20)

State of the art (01:14)

Can we be more optimal? (03:14)

Known Variational bounds/bandits (04:45)

The Question (06:14)