Bandit Algorithms for Online Linear Optimization
published: Aug. 26, 2009, recorded: June 2009, views: 5653
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
In the online linear optimization problem a forecaster chooses, at each time instance, a vector x from a certain given subset S of the D-dimensional Euclidean space and suffers a time-dependent loss that is linear in x. The goal of the forecaster is to achieve that, in the long run, the accumulated loss is not much larger than that of the best possible vector in S. In this talk we consider the "bandit" setting of the linear optimization problem, in which the forecaster has only access to the losses of the chosen vectors. We survey some recent algorithms that solve this problem. For the special case in which S is a subset of the d-dimensional Boolean hypercube, we describe a new forecaster whose performance, for a variety of concrete choices of S, is better than all previously known bounds, and not improvable in general. We also point out that computationally efficient implementations for various interesting choices of S exist. Joint work with Gabor Lugosi (Barcelona).
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !