The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables

author: Himabindu Lakkaraju, Computer Science Department, Stanford University
published: Oct. 9, 2017, recorded: August 2017, views: 925

Evaluating whether machines improve on human performance is one of the central questions of machine learning. However, in many domains the data are "selectively labeled": the observed outcomes are themselves a consequence of the choices made by human decision-makers. For instance, in the context of judicial bail decisions, we observe whether a defendant fails to return for their court appearance only if the judge decides to release the defendant on bail. Comparing the performance of humans and machines on data with this type of bias can lead to erroneous estimates and wrong conclusions. Here we propose a novel framework for evaluating the performance of predictive models on selectively labeled data. We develop an evaluation methodology that is robust to the presence of unmeasured confounders (unobservables). We propose a metric that allows us to evaluate the effectiveness of any black-box predictive model and benchmark it against the performance of human decision-makers. We also develop an approach called "contraction" which allows us to compute this metric without resorting to counterfactual inference, by exploiting the heterogeneity of human decision-makers. Experimental results on real-world datasets spanning diverse domains such as health care, insurance, and criminal justice demonstrate the utility of our evaluation metric in comparing human decisions and machine predictions. Experiments on synthetic data also show that our contraction technique produces accurate estimates of our evaluation metric.
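The contraction idea described above can be sketched in a few lines: restrict attention to the caseload of the most lenient decision-maker, let the model "re-detain" the riskiest of that decision-maker's released cases until a stricter target release rate is reached, and count the observed failures among those the model would still release. The function name and data layout below are illustrative assumptions for this sketch, not the authors' actual implementation:

```python
def contraction_failure_rate(released_outcomes, risk_scores, n_total, target_release_rate):
    """Estimate a model's failure rate at a target release rate via contraction.

    released_outcomes: observed outcomes (1 = failure, e.g. failure to appear)
        for each defendant the most lenient judge released.
    risk_scores: the model's predicted risk for those same released defendants.
    n_total: the lenient judge's total caseload (released + jailed).
    target_release_rate: the stricter release rate at which to evaluate the
        model; it must not exceed the lenient judge's own release rate,
        since we can only "shrink" the released pool, never expand it.
    """
    n_keep = int(target_release_rate * n_total)
    if n_keep > len(released_outcomes):
        raise ValueError("target release rate exceeds the lenient judge's release rate")
    # Release the n_keep lowest-risk defendants according to the model;
    # the remainder of the pool is counterfactually jailed, so their
    # (unobserved) outcomes are never needed.
    order = sorted(range(len(risk_scores)), key=lambda i: risk_scores[i])
    kept = order[:n_keep]
    failures = sum(released_outcomes[i] for i in kept)
    # The metric is failures per total caseload, so human and machine
    # decision-makers are compared at matched release rates.
    return failures / n_total
```

For example, with 8 released defendants out of a caseload of 10, evaluating the model at a release rate of 0.5 keeps only the five lowest-risk released cases and counts the failures observed among them. This avoids counterfactual inference entirely because every outcome used is one that was actually observed.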
