## MDL Tutorial

author: Peter Grünwald, Center for Mathematics and Computer Science - CWI
published: Aug. 12, 2008,   recorded: July 2008,   views: 1500

# Slides

- Universal Modeling: Introduction to 'Modern' MDL
- Overview
- Minimum Description Length Principle (1)
- Minimum Description Length Principle (2)
- Minimum Description Length Principle (3)
- Model Selection
- 'Modern' MDL?
- Modern MDL!
- Overview
- Codes
- Code Length & Probability
- Code Lengths 'are' probabilities…
- …and probabilities 'are' code lengths!
- The Most Important Slide! (1)
- The Most Important Slide! (2)
- Remarks
- Overview
- Universal Codes (1)
- Universal Codes (2)
- Universal Codes (3)
- Universal Models
- Terminology
- Bayesian Mixtures are universal models (1)
- Bayesian Mixtures are universal models (2)
- 2-part MDL code is a universal model (code)
- 2-part vs. Bayes universal models
- Optimal Universal Model
- Optimal Universal Model - II
- MDL Model Selection (1)
- MDL Model Selection (2)
- Four Interpretations
- Counting Interpretation of MDL (1)
- Counting Interpretation of MDL (2)
- Counting Interpretation of MDL (3)
- Parametric Model Classes
- Geometric Interpretation of MDL (1)
- Geometric Interpretation of MDL (2)
- Bayesian Model Selection vs. MDL (1)
- Bayesian Model Selection vs. MDL (2)
- Bayes and MDL, remarks
- Further topics
- Predictive Interpretation
- Predictive Interpretation, II
- Predictive Interpretation, III
- Predictive Interpretation, IV
- Comparing infinitely many models (1)
- Comparing infinitely many models (2)
- Overview
- New Developments
- Luckiness Principle (2)


# Description

We give a self-contained tutorial on the Minimum Description Length (MDL) approach to modeling, learning and prediction. We focus on the recent (post-1995) formulations of MDL, which can be quite different from the older methods that are often still called 'MDL' in the machine learning and UAI communities.

In its modern guise, MDL is based on the concept of a 'universal model'. We explain this concept at length. We show that previous versions of MDL (based on so-called two-part codes), Bayesian model selection and predictive validation (a variation of cross-validation) can all be interpreted as approximations to model selection based on 'universal models'.

Modern MDL prescribes the use of a certain 'optimal' universal model, the so-called 'normalized maximum likelihood model' or 'Shtarkov distribution'. This is related to (yet different from) Bayesian model selection with non-informative priors. It leads to a penalization of 'complex' models that can be given an intuitive differential-geometric interpretation. Roughly speaking, the complexity of a parametric model is directly related to the number of distinguishable probability distributions that it contains.

We also discuss some recent extensions such as the 'luckiness principle', which can be used if the Shtarkov distribution is undefined, and the 'switch distribution', which allows for a resolution of the AIC-BIC dilemma.
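As a small illustration (not part of the lecture itself), the normalized maximum likelihood (NML) distribution can be computed exactly for the Bernoulli model at a small sample size n: the probability of each sequence is its maximized likelihood divided by the Shtarkov normalizer, and the log of that normalizer is the model's parametric complexity in bits. The function names below are our own choices for this sketch.

```python
from math import comb, log2

def max_lik(k, n):
    # Maximized Bernoulli likelihood of any single sequence of length n
    # containing k ones: plug in the MLE theta_hat = k/n.
    if k == 0 or k == n:
        return 1.0
    p = k / n
    return p**k * (1 - p)**(n - k)

def nml_bernoulli(n):
    # Shtarkov normalizer: sum of maximized likelihoods over all 2^n
    # sequences, grouped by their count of ones k (C(n, k) sequences each).
    Z = sum(comb(n, k) * max_lik(k, n) for k in range(n + 1))
    # NML probability of one particular sequence with k ones.
    nml = {k: max_lik(k, n) / Z for k in range(n + 1)}
    # Parametric complexity COMP(n) = log2(Z), in bits: roughly the log of
    # the number of distinguishable distributions in the model.
    comp = log2(Z)
    return nml, comp

nml, comp = nml_bernoulli(10)
print(f"parametric complexity for n=10: {comp:.3f} bits")
```

The NML probabilities sum to one over all 2^n sequences by construction, and the complexity term grows like (1/2) log n, matching the familiar BIC-style penalty of one half bit-rate per parameter.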