lil’ UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits

Published on 2014-07-15 · 2167 views

Kevin Jamieson

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of tot

COLT 2014 - Barcelona
