Fast rates for the multi-armed bandit
author: Sébastien Bubeck,
Department of Operations Research and Financial Engineering, Princeton University
published: Oct. 6, 2014, recorded: December 2013, views: 2210
published: Oct. 6, 2014, recorded: December 2013, views: 2210
Related content
Report a problem or upload files
If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Description
Since the seminal work of Lai and Robbins (1985) we know bandit strategies with normalized regret of order (i) 1/sqrt(T) for any stochastic bandit, and (ii) log(T) / T for 'benign' distributions. In Bubeck and Slivkins (2012) we designed a new strategy which extends property (i) to adversarial bandits while still having the fast rate given in (ii). I will present this algorithm and I will also discuss the possibility of even faster rates of order 1/T when extra information is available.
Link this page
Would you like to put a link to this lecture on your homepage?Go ahead! Copy the HTML snippet !
Write your own review or comment: