We need a BIT more GUTS (Grand Unified Theory of Statistics)
published: Jan. 25, 2012, recorded: December 2011, views: 237
A remarkable variety of problems in machine learning and statistics can be recast as data compression under constraints. (1) Sequential prediction with arbitrary loss functions can be transformed into equivalent log loss (data compression) problems. The worst-case optimal regret for the original loss is determined by Vovk’s mixability, which in fact measures how many bits we lose if we are not allowed to use mixture codes in the compression formulation. (2) In classification, we can map each set of candidate classifiers C to a corresponding probability model M. Tsybakov’s condition (which determines the optimal convergence rate) turns out to measure how much more we can compress data by coding it using the convex hull of M rather than just M. (3) Hypothesis testing in the applied sciences is usually based on p-values, a brittle and much-criticized approach. Berger and Vovk independently proposed calibrated p-values, which are much more robust. Again we show that these have a data compression interpretation. (4) Bayesian nonparametric approaches usually work well, but fail dramatically in Diaconis and Freedman’s pathological cases. We show that in these cases (and only in these) the Bayesian predictive distribution does not compress the data. We speculate that all this points towards a general theory that goes beyond standard MDL and Bayes.
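To make the link between log loss and code length in point (1) concrete, here is a minimal sketch that is not taken from the lecture. It assumes a toy setting chosen purely for illustration: a finite set of Bernoulli predictors (the hypothetical `thetas`) and a short binary sequence (`data`). The cumulative log loss in bits of a predictor is the length of the code it induces, and the uniform-prior Bayes mixture code is never more than log2(K) bits longer than the code of the best single predictor among K.

```python
import math

def log_loss_bits(p_one, bit):
    """Code length in bits for one binary outcome when P(1) = p_one is predicted."""
    p = p_one if bit == 1 else 1.0 - p_one
    return -math.log2(p)

def expert_code_length(theta, data):
    """Cumulative log loss (bits) of a fixed Bernoulli(theta) predictor."""
    return sum(log_loss_bits(theta, b) for b in data)

def mixture_code_length(thetas, data):
    """Cumulative log loss (bits) of the uniform-prior Bayes mixture predictor."""
    weights = [1.0 / len(thetas)] * len(thetas)
    total = 0.0
    for bit in data:
        # Predictive probability of a 1: posterior-weighted average of the experts.
        p_one = sum(w * t for w, t in zip(weights, thetas))
        total += log_loss_bits(p_one, bit)
        # Bayes update of the posterior weights, then renormalize.
        weights = [w * (t if bit == 1 else 1.0 - t) for w, t in zip(weights, thetas)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return total

# Illustrative (assumed) experts and data; none of these values come from the lecture.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

mix_bits = mixture_code_length(thetas, data)
best_bits = min(expert_code_length(t, data) for t in thetas)
regret_bits = mix_bits - best_bits  # extra bits paid for not knowing the best expert

print(f"mixture: {mix_bits:.3f} bits, best expert: {best_bits:.3f} bits, "
      f"regret: {regret_bits:.3f} bits (bound: {math.log2(len(thetas)):.3f})")
```

The sketch covers only the log loss side of the story. How mixability relates other loss functions to this picture is the subject of the lecture itself and is not reproduced here.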