Fast rates for the multi-armed bandit
Published on Oct 06, 2014
Since the seminal work of Lai and Robbins (1985), bandit strategies have been known whose normalized regret is of order (i) 1/sqrt(T) for any stochastic bandit, and (ii) log(T)/T for 'benign' distributions. In B
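To make the two rates concrete, here is a minimal sketch (not from the talk itself) of UCB1, a standard strategy whose distribution-dependent regret on a 'benign' instance with a fixed gap between arms matches the log(T)/T normalized rate, while worst-case analyses give the 1/sqrt(T) rate. The arm means and horizon below are illustrative choices, not values from the lecture.

```python
# Illustrative sketch: UCB1 (Auer et al., 2002) on a Bernoulli bandit.
import numpy as np

def ucb1_normalized_regret(means, T, seed=0):
    """Run UCB1 for T rounds on Bernoulli arms and return the normalized regret."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)   # number of pulls per arm
    sums = np.zeros(K)     # cumulative reward per arm
    total_reward = 0.0

    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1    # pull each arm once to initialize
        else:
            # Optimism in the face of uncertainty: empirical mean + confidence width.
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        reward = float(rng.random() < means[arm])  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    # Normalized regret: per-round shortfall against always playing the best arm.
    return (T * max(means) - total_reward) / T

# With a fixed gap between the arms the normalized regret should shrink
# roughly like log(T)/T as the horizon grows.
for T in (1_000, 10_000, 100_000):
    print(T, ucb1_normalized_regret([0.5, 0.6], T))
```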