Estimating Rates of Rare Events with Multiple Hierarchies through Scalable Log-linear Models
published: Oct. 1, 2010, recorded: July 2010, views: 5266
Report a problem or upload filesIf you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.
We consider the problem of estimating rates of rare events for high dimensional, multivariate categorical data where several dimensions are hierarchical. Such problems are routine in several data mining applications including computational advertising, our main focus in this paper. We propose \NEWMODEL, a novel log-linear modeling method that scales to massive data applications with billions of training records and several million potential predictors in a map-reduce framework. Our method exploits correlations in aggregates observed at multiple resolutions when working with multiple hierarchies; stable estimates at coarser resolution provide informative prior information to improve estimates at finer resolutions. Other than prediction accuracy and scalability, our method has an inbuilt variable screening procedure based on a ``spike and slab prior'' that provides parsimony by removing non-informative predictors without hurting predictive accuracy. We perform large scale experiments on data from real computational advertising applications and illustrate our approach on datasets with several billion records and hundreds of millions of predictors. Extensive comparisons with other benchmark methods show significant improvements in prediction accuracy.
Download slides: kdd2010_agarwal_erre_01.pdf (701.3 KB)
Download slides: kdd2010_agarwal_erre_01.ppt (1.3 MB)
Link this pageWould you like to put a link to this lecture on your homepage?
Go ahead! Copy the HTML snippet !