Fast rates for the multi-armed bandit
published: Oct. 6, 2014, recorded: December 2013, views: 2209
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
Since the seminal work of Lai and Robbins (1985) we know bandit strategies with normalized regret of order (i) 1/sqrt(T) for any stochastic bandit, and (ii) log(T) / T for 'benign' distributions. In Bubeck and Slivkins (2012) we designed a new strategy which extends property (i) to adversarial bandits while still having the fast rate given in (ii). I will present this algorithm and I will also discuss the possibility of even faster rates of order 1/T when extra information is available.
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !