Universal Coding/Prediction and Statistical (In)consistency of Bayesian inference

author: Peter Grünwald, Centrum Wiskunde & Informatica (CWI)
published: Feb. 25, 2007,   recorded: October 2005,   views: 4602

Part of this talk is based on results of A. Barron (1986) and on recent joint work with J. Langford (2004). We introduce the information-theoretic concepts of universal coding and prediction. Under weak conditions on the prior, Bayesian sequential prediction is universal: a code based on the Bayesian predictive distribution allows one to compress the data substantially. We give a simple proof that universality implies consistency of the Bayesian posterior. It follows that Bayesian inconsistency in nonparametric settings (à la Diaconis & Freedman) can occur only with priors that do not allow for data compression. This gives a frequentist rationale for Rissanen's Minimum Description Length Principle. We also show that under misspecification, the Bayesian predictions can substantially outperform the predictions of the best distribution in the model. Ironically, this implies that the Bayesian posterior can become *inconsistent*: in some sense, good predictive performance implies inconsistency!
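
To make the notion of universality concrete, here is a minimal sketch for a toy case that is not taken from the talk itself: binary sequences, the Bernoulli model, and a uniform prior on the parameter (an assumption chosen for simplicity). With that prior, the Bayesian predictive distribution reduces to Laplace's rule of succession, and the code length it induces exceeds that of the best Bernoulli distribution chosen in hindsight by at most log2(n + 1) bits. The function names below are hypothetical and serve only as illustration.

```python
import math


def bayes_codelength_bits(bits):
    """Code length in bits of a binary sequence under sequential Bayesian
    prediction with a uniform prior on the Bernoulli parameter
    (illustrative assumption; this is Laplace's rule of succession)."""
    ones = 0
    zeros = 0
    total = 0.0
    for x in bits:
        # Predictive probability that the next symbol is 1.
        p_one = (ones + 1) / (ones + zeros + 2)
        p = p_one if x == 1 else 1.0 - p_one
        total += -math.log2(p)
        if x == 1:
            ones += 1
        else:
            zeros += 1
    return total


def best_bernoulli_codelength_bits(bits):
    """Code length in bits under the Bernoulli distribution that fits the
    sequence best in hindsight (the maximum-likelihood parameter)."""
    n = len(bits)
    k = sum(bits)
    if k == 0 or k == n:
        return 0.0  # the maximum-likelihood model assigns probability 1
    theta = k / n
    return -(k * math.log2(theta) + (n - k) * math.log2(1.0 - theta))


def regret_bits(bits):
    """Extra bits the Bayesian code spends compared with the best
    Bernoulli distribution in hindsight."""
    return bayes_codelength_bits(bits) - best_bernoulli_codelength_bits(bits)


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 10  # n = 100, 70 ones
    n = len(data)
    print("Bayes code length (bits):", bayes_codelength_bits(data))
    print("Best-in-hindsight code length (bits):", best_bernoulli_codelength_bits(data))
    print("Regret (bits):", regret_bits(data), "bound:", math.log2(n + 1))
```

In this toy setting the overhead grows only logarithmically in n while the code lengths themselves grow linearly, which is the sense in which the Bayesian code is universal for the model. The results discussed in the talk concern far more general (including nonparametric) model classes.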
