lil’ UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits
Published on Jul 15, 2014
The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples.
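To make the setting concrete, here is a minimal sketch of a lil' UCB-style sampling loop for fixed-confidence best-arm identification: each arm's empirical mean is inflated by a confidence radius with a log-log dependence on its pull count, the arm with the largest upper bound is sampled next, and sampling stops once one arm has been pulled far more often than the rest combined. The constants (`eps`, `beta`, `lam`, `sigma`) and the clamping inside the logarithm are illustrative assumptions, not the paper's exact recommendations.

```python
import math
import random


def lil_ucb(arms, delta=0.05, eps=0.01, beta=1.0, lam=9.0, sigma=0.5,
            max_pulls=100_000):
    """Fixed-confidence best-arm identification with a lil' UCB-style rule.

    `arms` is a list of callables, each returning one stochastic reward.
    Constants follow the general form of the paper's bound but are
    illustrative, not tuned.
    """
    n = len(arms)
    counts = [0] * n          # T_i: number of pulls of arm i
    sums = [0.0] * n          # running reward sums

    # Pull every arm once so all confidence bounds are defined.
    for i in range(n):
        sums[i] += arms[i]()
        counts[i] += 1

    def ucb(i):
        t = counts[i]
        mean = sums[i] / t
        # lil' UCB-style bonus: log-log(t) scaling inside the radius,
        # clamped below by 1.0 to keep the argument of log positive.
        log_term = math.log(max(math.log((1 + eps) * t), 1.0) / delta)
        bonus = (1 + beta) * (1 + math.sqrt(eps)) * math.sqrt(
            2 * sigma ** 2 * (1 + eps) * log_term / t)
        return mean + bonus

    total = n
    while total < max_pulls:
        # Stopping rule: declare arm i best once it has been pulled
        # much more often than all other arms combined.
        for i in range(n):
            if counts[i] >= 1 + lam * (total - counts[i]):
                return i
        # Otherwise sample the arm with the largest upper confidence bound.
        i = max(range(n), key=ucb)
        sums[i] += arms[i]()
        counts[i] += 1
        total += 1

    # Fallback if the budget runs out: return the empirically best arm.
    return max(range(n), key=lambda i: sums[i] / counts[i])


if __name__ == "__main__":
    # Three Bernoulli arms with means 0.3, 0.5, 0.6; arm 2 is best.
    means = [0.3, 0.5, 0.6]
    arms = [lambda m=m: float(random.random() < m) for m in means]
    print("identified best arm:", lil_ucb(arms))
```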