Machine Learning Summer School 2005 - Chicago
Pascal

Information Geometry

author: Sanjoy Dasgupta, University of California

Description

This tutorial will focus on entropy, exponential families, and information projection. We'll start by seeing the sense in which entropy is the only reasonable definition of randomness. We will then use entropy to motivate exponential families of distributions — which include the ubiquitous Gaussian, Poisson, and Binomial distributions, but also very general graphical models. The task of fitting such a distribution to data is a convex optimization problem with a geometric interpretation as an "information projection": the projection of a prior distribution onto a linear subspace (defined by the data) so as to minimize a particular information-theoretic distance measure. This projection operation, which is more familiar in other guises, is a core optimization task in machine learning and statistics. We'll study the geometry of this problem and discuss two popular iterative algorithms for it.
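As a small illustration of the information projection described above, the following sketch projects a uniform prior on the faces of a die onto the set of distributions with a prescribed mean, minimizing KL divergence to the prior. It is not taken from the lecture. The function names, the die example, and the target mean of 4.5 are assumptions chosen for illustration. It relies on the standard fact that the minimizer has the exponential-family form p(x) ∝ prior(x)·exp(θx). Because there is only one constraint, θ is found here by plain bisection rather than by either of the iterative algorithms the tutorial discusses.

```python
import math


def tilted(prior, values, theta):
    """Exponential-family member p(x) proportional to prior(x) * exp(theta * x)."""
    weights = [q * math.exp(theta * v) for q, v in zip(prior, values)]
    z = sum(weights)  # normalizing constant (partition function)
    return [w / z for w in weights]


def mean(p, values):
    """Expected value of `values` under distribution p."""
    return sum(pi * v for pi, v in zip(p, values))


def kl(p, q):
    """KL divergence K(p, q) in nats; terms with p(x) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def i_projection(prior, values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Find theta so the tilted distribution has the target mean.

    The mean of the tilted distribution is increasing in theta,
    so bisection on theta is enough for a single mean constraint.
    The bracket [lo, hi] is an assumption suited to small value ranges.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(tilted(prior, values, mid), values) < target_mean:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2
    return theta, tilted(prior, values, theta)


values = [1, 2, 3, 4, 5, 6]   # faces of a die (illustrative choice)
prior = [1 / 6] * 6           # uniform prior
theta, p = i_projection(prior, values, target_mean=4.5)

print("theta:", theta)
print("projection:", [round(pi, 4) for pi in p])
print("mean:", mean(p, values), "KL to prior:", kl(p, prior))
```

With a uniform prior, minimizing KL divergence to the prior is the same as maximizing entropy, so the printed distribution is also the maximum-entropy distribution with that mean.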

Categories

Top: Mathematics: Statistics

Slides
0:04 Information geometry
0:14 Learning a distribution
2:29 Outline
2:56 Part I: Entropy
3:01 Formulating the problem
5:11 What is randomness?
7:37 Entropy
10:41 Entropy is concave
11:19 Properties of entropy
13:07 Additivity
14:01 Properties of entropy, cont’d
16:04 KL divergence
18:24 Entropy and KL divergence
19:53 Another justification of entropy
21:27 Asymptotic equipartition
22:55 AEP: examples
24:40 Proof of AEP
26:53 Back to our main question
27:51 Maximum entropy
30:52 Alternative formulation
31:43 A projection operation
35:03 Solution by calculus
37:10 Form of the solution
38:50 Part II: Exponential families
39:06 Exponential families
41:35 Natural parameter space
43:17 Example: Bernoulli
46:09 Parametrization of Bernoulli
47:21 Example: Poisson
49:43 Example: Gaussian
51:41 Properties of exponential families
54:18 Maximum likelihood estimation
56:44 Maximum likelihood, cont’d
58:05 Our toy problem
58:53 The two spaces
60:20 Part III: Information projection
60:32 Back to maximum entropy
62:11 Maximum entropy example
64:08 Maximum entropy: restatement
65:11 Proof
68:36 Geometric interpretation

Videos

Part 1 (1:11:36)
Part 2 (0:23:45)

Reviews and comments:

Frank Nielsen, October 14, 2007 at 4:27 a.m.:

I highly recommend this tutorial, which makes clear the connections between entropy, I-distance, and I-projection for Bayesian estimation.

It would be good for a second talk to start fully from an axiomatization (Csiszar-Bregman'91) and also to consider Tsallis entropy, to see how things change, break down, or extend in that case.

