MDL Tutorial
published: Aug. 12, 2008, recorded: July 2008, views: 437
Description
We give a self-contained tutorial on the Minimum Description Length (MDL) approach to modeling, learning and prediction. We focus on the recent (post-1995) formulations of MDL, which can differ considerably from the older methods that are often still called 'MDL' in the machine learning and UAI communities.
In its modern guise, MDL is based on the concept of a 'universal model'. We explain this concept at length. We show that previous versions of MDL (based on so-called two-part codes), Bayesian model selection and predictive validation (a variation of cross-validation) can all be interpreted as approximations to model selection based on 'universal models'. Modern MDL prescribes the use of a certain 'optimal' universal model, the so-called 'normalized maximum likelihood model' or 'Shtarkov distribution'. This is related to (yet different from) Bayesian model selection with non-informative priors. It leads to a penalization of 'complex' models that can be given an intuitive differential-geometric interpretation. Roughly speaking, the complexity of a parametric model is directly related to the number of distinguishable probability distributions that it contains. We also discuss some recent extensions such as the 'luckiness principle', which can be used if the Shtarkov distribution is undefined, and the 'switch distribution', which allows for a resolution of the AIC-BIC dilemma.
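To make the 'normalized maximum likelihood model' concrete, here is a minimal sketch for the simplest parametric case, the Bernoulli model on binary sequences. It is an illustration under stated assumptions and is not taken from the lecture: the function names are hypothetical, code lengths are measured in bits, and the Bayesian comparison assumes a uniform prior on the Bernoulli parameter. The Shtarkov distribution assigns each sequence its maximized likelihood divided by the sum of maximized likelihoods over all sequences of the same length; the logarithm of that sum is the complexity penalty.

```python
from math import comb, log2


def max_likelihood(k: int, n: int) -> float:
    """Maximized Bernoulli likelihood of one sequence with k ones among n outcomes."""
    if k == 0 or k == n:
        return 1.0  # the ML parameter is 0 or 1, so the sequence has probability 1
    p = k / n
    return p**k * (1 - p) ** (n - k)


def shtarkov_sum(n: int) -> float:
    """Sum of maximized likelihoods over all 2**n binary sequences of length n.

    Sequences with the same number of ones share the same maximized
    likelihood, so they are grouped by k (assumes n is small enough
    that the floating-point terms do not underflow).
    """
    return sum(comb(n, k) * max_likelihood(k, n) for k in range(n + 1))


def nml_code_length(k: int, n: int) -> float:
    """NML code length in bits: -log2(max likelihood) + log2(Shtarkov sum)."""
    return -log2(max_likelihood(k, n)) + log2(shtarkov_sum(n))


def bayes_code_length(k: int, n: int) -> float:
    """Bayesian code length in bits under a uniform prior (an assumption here).

    The marginal likelihood of a sequence with k ones is 1 / ((n + 1) * C(n, k)).
    """
    return log2((n + 1) * comb(n, k))


if __name__ == "__main__":
    n = 20
    print(f"complexity penalty for n={n}: {log2(shtarkov_sum(n)):.3f} bits")
    for k in (0, 5, 10):
        print(k, round(nml_code_length(k, n), 3), round(bayes_code_length(k, n), 3))
```

In this sketch the extra cost of the NML code over the best-fitting Bernoulli distribution is the same for every sequence of length n, namely `log2(shtarkov_sum(n))`, whereas the extra cost of the Bayesian code varies with k. This is one way to see why the two approaches are related yet different.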
See Also:
Download slides: icml08_grunwald_mld_01.pdf (237.6 KB)
Download slides: icml08_grunwald_mld_01.ppt (889.5 KB)