Strategic Considerations in Bandit Settings - The Price of Truthfulness in Online Ad Auctions
published: Aug. 26, 2009, recorded: June 2009, views: 3530
Research at the intersection of machine learning and economics has flourished in recent years, driven by the realization that many technological systems (such as the internet) are better understood and managed when viewed as economic systems rather than merely technological ones. In particular, there has been much recent work on learning algorithms and models in settings where the strategic considerations of the participants must be taken into account (e.g. spam detection).

This talk examines this broader issue in the setting of online ad auctions, in particular "pay-per-click" auctions, where advertisers are charged only in those rounds when their ad is clicked. Designing such an auction faces the classic explore/exploit dilemma: while gathering information about advertisers' click-through rates, the mechanism may lose revenue; however, this gleaned information may prove valuable later for a more profitable allocation. In this sense, such mechanisms are prime candidates for design using multi-armed bandit techniques. A naive application of multi-armed bandit algorithms, however, would not take into account the strategic considerations of the players, who might manipulate their bids (which determine the auction's revenue) so as to maximize their own utility. Hence, we consider the natural restriction that the auction be truthful.

This work sharply characterizes what regret is achievable under the truthfulness restriction. Interestingly, we show that this restriction imposes statistical limits on the achievable regret: it is Theta(T^(2/3)), whereas for traditional bandit algorithms (without the truthfulness restriction) the achievable regret is O(T^(1/2)), where T is the number of rounds. We term the extra T^(1/6) factor the "price of truthfulness".
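The explore/exploit tradeoff behind the T^(2/3) bound can be illustrated with a minimal explore-then-exploit bandit sketch. This is not the truthful mechanism from the talk; it is a plain (non-strategic) simulation in which ads are shown round-robin for roughly T^(2/3) exploration rounds, after which the empirically best ad is shown. All names and the click-through rates below are illustrative assumptions.

```python
import random

def explore_then_exploit(ctrs, T, seed=0):
    """Explore-first bandit sketch: show each ad round-robin for about
    T^(2/3) rounds, then greedily show the ad with the best empirical
    click-through rate. Returns the total number of clicks collected.
    Illustrative only -- not the truthful mechanism from the talk."""
    rng = random.Random(seed)
    K = len(ctrs)
    explore_rounds = int(T ** (2 / 3))  # exploration budget drives the T^(2/3) regret
    clicks = [0] * K                    # observed clicks per ad
    shows = [0] * K                     # impressions per ad
    total_clicks = 0
    for t in range(T):
        if t < explore_rounds:
            i = t % K  # round-robin exploration
        else:
            # exploit: pick the ad with the best empirical click-through rate
            i = max(range(K), key=lambda j: clicks[j] / max(shows[j], 1))
        shows[i] += 1
        if rng.random() < ctrs[i]:  # ad i is clicked this round
            clicks[i] += 1
            total_clicks += 1
    return total_clicks

# Hypothetical click-through rates; the best ad has CTR 0.5.
ctrs = [0.1, 0.3, 0.5]
T = 10000
total = explore_then_exploit(ctrs, T)
```

Showing only the best ad would yield about 0.5 * T clicks in expectation; the shortfall (the regret) comes mostly from the ~T^(2/3) exploration rounds spent on worse ads, which is the tradeoff the talk's lower bound makes precise for truthful mechanisms.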