Machine Learning Summer School 2005 - Chicago

Information Geometry

author: Sanjoy Dasgupta, University of California

Description

This tutorial will focus on entropy, exponential families, and information projection. We'll start by seeing the sense in which entropy is the only reasonable definition of randomness. We will then use entropy to motivate exponential families of distributions — which include the ubiquitous Gaussian, Poisson, and Binomial distributions, but also very general graphical models. The task of fitting such a distribution to data is a convex optimization problem with a geometric interpretation as an "information projection": the projection of a prior distribution onto a linear subspace (defined by the data) so as to minimize a particular information-theoretic distance measure. This projection operation, which is more familiar in other guises, is a core optimization task in machine learning and statistics. We'll study the geometry of this problem and discuss two popular iterative algorithms for it.
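The code below makes "information projection" concrete on a small scale. It is a minimal sketch that assumes NumPy and uses a hypothetical 2×2 table. The algorithm is iterative proportional fitting (IPF), a standard iterative method for I-projecting a prior onto a set of distributions defined by marginal constraints. The description does not name the two algorithms the tutorial covers, so IPF is only an illustration and may not be one of them. Each half-step is an exact I-projection: rescaling the rows to match a target row marginal minimizes D(q || p) over all q with those row sums, and the same holds for columns. The names `kl` and `ipf` and all the numbers are illustrative.

```python
import numpy as np


def kl(p, q):
    """KL divergence D(p || q) in nats; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def ipf(prior, row_marginal, col_marginal, iters=500, tol=1e-12):
    """Iterative proportional fitting (hypothetical helper name).

    Approximates the I-projection of a strictly positive `prior` table onto
    the set of joint distributions whose row and column sums equal the
    given marginals, by alternating two exact I-projections.
    """
    p = np.asarray(prior, dtype=float)
    p = p / p.sum()
    r = np.asarray(row_marginal, dtype=float)
    c = np.asarray(col_marginal, dtype=float)
    for _ in range(iters):
        # Exact I-projection onto {q : row sums = r}: rescale each row.
        p = p * (r / p.sum(axis=1))[:, None]
        # Exact I-projection onto {q : column sums = c}: rescale each column.
        p = p * (c / p.sum(axis=0))[None, :]
        # Column sums now match exactly; stop once the row sums do too.
        if np.max(np.abs(p.sum(axis=1) - r)) < tol:
            break
    return p


# Illustrative prior and target marginals (not taken from the lecture).
prior = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
row = np.array([0.5, 0.5])
col = np.array([0.7, 0.3])

q = ipf(prior, row, col)
q_uniform = ipf(np.full((2, 2), 0.25), row, col)

print(q)
print("D(q || prior) =", kl(q, prior))
```

With a uniform prior, the projection is just the product of the marginals, which is the maximum-entropy table with those marginals. With a non-uniform prior, every row and column rescaling leaves the odds ratio p00 p11 / (p01 p10) unchanged. So IPF matches the marginals while keeping the prior's odds ratio, and the result has the "prior times exponential tilt" form that such a projection takes.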

Categories

Mathematics: Statistics

Slides
0:04 Information geometry
0:14 Learning a distribution
2:29 Outline
2:56 Part I: Entropy
3:01 Formulating the problem
5:11 What is randomness?
7:37 Entropy
10:41 Entropy is concave
11:19 Properties of entropy
13:07 Additivity
14:01 Properties of entropy, cont’d
16:04 KL divergence
18:24 Entropy and KL divergence
19:53 Another justification of entropy
21:27 Asymptotic equipartition
22:55 AEP: examples
24:40 Proof of AEP
26:53 Back to our main question
27:51 Maximum entropy
30:52 Alternative formulation
31:43 A projection operation
35:03 Solution by calculus
37:10 Form of the solution
38:50 Part II: Exponential families
39:06 Exponential families
41:35 Natural parameter space
43:17 Example: Bernoulli
46:09 Parametrization of Bernoulli
47:21 Example: Poisson
49:43 Example: Gaussian
51:41 Properties of exponential families
54:18 Maximum likelihood estimation
56:44 Maximum likelihood, cont’d
58:05 Our toy problem
58:53 The two spaces
60:20 Part III: Information projection
60:32 Back to maximum entropy
62:11 Maximum entropy example
64:08 Maximum entropy: restatement
65:11 Proof
68:36 Geometric interpretation
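The slides on maximum entropy, exponential families and the geometric interpretation can be illustrated with a short numerical sketch. It assumes NumPy and uses a six-sided die with a hypothetical constraint that the mean roll is 4.5. This is an illustrative choice and does not reproduce the lecture's own example. Among all distributions with that mean, the entropy maximizer has the exponential-family form p_θ(x) ∝ exp(θx). The mean of p_θ is increasing in θ, because its derivative is the variance of x, so θ can be found by bisection. For the uniform distribution u, D(p || u) = log 6 - H(p), so maximizing entropy is the same as I-projecting u onto the constraint set. The last lines check the Pythagorean identity D(q || u) = D(q || p) + D(p || u) for another distribution q with the same mean. All function names are illustrative.

```python
import numpy as np


def entropy(p):
    """Shannon entropy H(p) in nats (0 log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def kl(p, q):
    """KL divergence D(p || q) in nats; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def tilted(theta, xs):
    """Exponential-family member p_theta(x) proportional to exp(theta * x)."""
    logits = theta * xs
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()


def maxent_with_mean(xs, target, lo=-50.0, hi=50.0, iters=200):
    """Find theta with E_{p_theta}[x] = target by bisection.

    Works because d/dtheta E[x] = Var[x] > 0, so the mean is increasing
    in theta; assumes min(xs) < target < max(xs).
    """
    xs = np.asarray(xs, dtype=float)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted(mid, xs) @ xs < target:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return theta, tilted(theta, xs)


xs = np.arange(1, 7, dtype=float)
theta, p = maxent_with_mean(xs, 4.5)
u = np.full(6, 1.0 / 6.0)

print("theta =", theta)
print("p =", p, "mean =", p @ xs)

# Maximizing entropy is the same as minimizing KL to the uniform distribution.
print(kl(p, u), np.log(6) - entropy(p))

# Another distribution with mean 4.5: half its mass on 4, half on 5.
q = np.array([0.0, 0.0, 0.0, 0.5, 0.5, 0.0])
print(kl(q, u), kl(q, p) + kl(p, u))  # Pythagorean identity: the two agree
```

Bisection fits here because there is a single constraint, so the dual problem has one parameter. With several constraints, the same exponential form holds, but θ becomes a vector and an iterative method is needed.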



Videos:
Part 1 (1:11:36)
Part 2 (0:23:45)


Reviews and comments:

Comment1 Frank Nielsen, October 14, 2007 at 4:27 a.m.:

I highly recommend this tutorial, which makes clear the connections between entropy, I-distance and I-projection for Bayesian estimation.

For a second talk, it would be good to start fully from the axiomatization (Csiszar-Bregman'91) and also to consider Tsallis entropy, to see which results change, fail, or extend in that case.


Comment2 Anand, February 24, 2009 at 7:52 a.m.:

Hi!
I found the above video lecture by Prof. Dasgupta quite useful. However, the streaming is quite slow and keeps getting stuck partway through.

I would appreciate it if somebody could help.

Anand


Comment3 sparkle daves, March 10, 2009 at 12:25 a.m.:

This video is the bomb; it has helped me a lot. Please, can you send me a website where I could learn more?
