A simple multi-armed bandit algorithm with optimal variation-bounded regret  thumbnail
Pause
Mute
Subtitles
Playback speed
0.25
0.5
0.75
1
1.25
1.5
1.75
2
Full screen

A simple multi-armed bandit algorithm with optimal variation-bounded regret

Published on Aug 02, 20114671 Views

We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of O(√QlogT), wh

Related categories

Chapter list

A simple MAB algorithm with optimal variation bounded regret00:00
Multi-Armed Bandits00:20
State Of the art01:14
Can we be more optimal?03:14
Known Variational bounds/bandits04:45
The Question06:14