Calibrated fairness in bandits

author: Goran Radanovic, Harvard School of Engineering and Applied Sciences, Harvard University
published: Dec. 1, 2017,   recorded: August 2017,   views: 956



We study fairness within the stochastic, multi-armed bandit (MAB) decision-making framework. We adapt the fairness framework of "treating similar individuals similarly" [5] to this setting. Here, an 'individual' corresponds to an arm, and two arms are 'similar' if they have similar quality distributions. First, we adopt a smoothness constraint: if two arms have similar quality distributions, then the probability of selecting each arm should be similar. In addition, we define the fairness regret, which corresponds to the degree to which an algorithm is not calibrated, where perfect calibration requires that the probability of selecting an arm equals the probability with which the arm has the best quality realization. We show that a variation on Thompson sampling satisfies smooth fairness for total variation distance, and give an Õ((kT)^(2/3)) bound on fairness regret. This complements prior work [12], which protects an on-average better arm from being less favored. We also explain how to extend our algorithm to the dueling bandit setting.
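As background for the calibration notion in the abstract: standard Thompson sampling already selects each arm with probability equal to the posterior probability that it is the best arm. The sketch below (not the paper's exact variant; the arm means and horizon are illustrative assumptions) shows this behavior on Bernoulli arms with Beta posteriors.

```python
import random

# Hedged sketch of Thompson sampling on Bernoulli arms. Each round, a
# plausible mean is drawn from every arm's Beta posterior and the arm with
# the highest draw is pulled, so the pull probability tracks the posterior
# probability that the arm is best -- the calibration idea in the abstract.
random.seed(0)

true_means = [0.3, 0.5, 0.7]        # hypothetical arm qualities
n_arms = len(true_means)
successes = [1] * n_arms            # Beta(1, 1) uniform priors
failures = [1] * n_arms
pulls = [0] * n_arms

for t in range(5000):
    # Posterior sample for each arm.
    samples = [random.betavariate(successes[i], failures[i])
               for i in range(n_arms)]
    arm = max(range(n_arms), key=lambda i: samples[i])
    reward = 1 if random.random() < true_means[arm] else 0
    successes[arm] += reward
    failures[arm] += 1 - reward
    pulls[arm] += 1

print(pulls)  # the highest-mean arm (index 2) accumulates most pulls
```

Note that this vanilla version can violate the paper's smoothness constraint early on, which is why the abstract's variant modifies the sampling procedure; this sketch only illustrates the calibration target itself.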
