
lil’ UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits
Published on Feb 4, 2025
The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game, in the fixed-confidence setting, using a small number of total samples.
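To make the setting concrete, here is a minimal sketch of a lil'UCB-style strategy for fixed-confidence best-arm identification: repeatedly pull the arm with the largest upper confidence bound, where the confidence radius shrinks at an iterated-logarithm ("lil") rate in the arm's pull count, and stop once one arm has received a dominant share of all pulls. The constants (`epsilon`, `beta`, `lam`, `sigma`) and the exact form of the radius are illustrative placeholders, not the paper's tuned values.

```python
import math
import random


def lil_ucb_sketch(pull, n_arms, delta=0.05, epsilon=0.01, beta=1.0,
                   lam=9.0, sigma=0.5, max_pulls=100_000):
    """Fixed-confidence best-arm identification, lil'UCB-style (sketch).

    `pull(i)` draws one reward from arm i. Returns the index of the arm
    believed to have the largest mean. Constants are illustrative only.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    # Pull every arm once to initialise the empirical means.
    for i in range(n_arms):
        sums[i] += pull(i)
        counts[i] += 1
    total = n_arms

    def ucb(i):
        # Confidence radius with a log-log dependence on the pull count,
        # so it shrinks only slightly faster than the law of the iterated
        # logarithm allows (hence the name).
        t = counts[i]
        rad = (1 + beta) * (1 + math.sqrt(epsilon)) * math.sqrt(
            2 * sigma ** 2 * (1 + epsilon)
            * math.log(max(math.log((1 + epsilon) * t), 1.0) / delta) / t
        )
        return sums[i] / t + rad

    while total < max_pulls:
        # Stopping rule: one arm has received a constant fraction
        # (controlled by lam) of all pulls so far.
        for i in range(n_arms):
            if counts[i] >= 1 + lam * (total - counts[i]):
                return i
        # Otherwise pull the arm with the largest upper confidence bound.
        best = max(range(n_arms), key=ucb)
        sums[best] += pull(best)
        counts[best] += 1
        total += 1

    # Sampling budget exhausted: fall back to the empirically best arm.
    return max(range(n_arms), key=lambda i: sums[i] / counts[i])
```

For example, on two Bernoulli arms with means 0.05 and 0.95, the procedure quickly concentrates its pulls on the second arm and returns its index.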