Information Geometry

author: Sanjoy Dasgupta, Department of Computer Science and Engineering, UC San Diego
published: Feb. 25, 2007,   recorded: May 2005,   views: 5248

Slides

0:04 Information geometry · Learning a distribution · Outline

Part I: Entropy
Formulating the problem · What is randomness? · Entropy · Entropy is concave · Properties of entropy · Additivity · Properties of entropy, cont'd · KL divergence · Entropy and KL divergence · Another justification of entropy · Asymptotic equipartition · AEP: examples · Proof of AEP · Back to our main question · Maximum entropy · Alternative formulation · A projection operation · Solution by calculus · Form of the solution

Part II: Exponential families
Exponential families · Natural parameter space · Example: Bernoulli · Parametrization of Bernoulli · Example: Poisson · Example: Gaussian · Properties of exponential families · Maximum likelihood estimation · Maximum likelihood, cont'd · Our toy problem · The two spaces

Part III: Information projection
Back to maximum entropy · Maximum entropy example · Maximum entropy: restatement · Proof · Geometric interpretation
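The Part I slides define entropy and KL divergence for discrete distributions. As an illustrative sketch (not code from the lecture), both quantities can be computed directly from their definitions:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL divergence D(p || q) = sum_i p_i log(p_i / q_i).
    Assumes q_i > 0 wherever p_i > 0 (absolute continuity)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# The uniform distribution maximizes entropy over a fixed support:
uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform))                # log 4, the maximum for 4 outcomes
print(entropy(skewed))                 # strictly smaller
print(kl_divergence(skewed, uniform))  # >= 0, and zero iff p == q
```

This also illustrates the concavity fact from the slides in miniature: any distribution other than the uniform one has strictly smaller entropy on the same support.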


Part 1 1:11:36

Part 2 23:45

Description

This tutorial will focus on entropy, exponential families, and information projection. We'll start by seeing the sense in which entropy is the only reasonable definition of randomness. We will then use entropy to motivate exponential families of distributions — which include the ubiquitous Gaussian, Poisson, and Binomial distributions, but also very general graphical models. The task of fitting such a distribution to data is a convex optimization problem with a geometric interpretation as an "information projection": the projection of a prior distribution onto a linear subspace (defined by the data) so as to minimize a particular information-theoretic distance measure. This projection operation, which is more familiar in other guises, is a core optimization task in machine learning and statistics. We'll study the geometry of this problem and discuss two popular iterative algorithms for it.
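The maximum-entropy problem the description alludes to can be made concrete in a minimal sketch (my own illustration, not the lecture's algorithm; the function name and bisection bounds are assumptions): maximizing entropy over a finite support subject to a mean constraint yields a distribution of exponential-family form p_i ∝ exp(λ·x_i), and the natural parameter λ can be found by bisection, since the mean is monotone increasing in λ.

```python
import math

def max_entropy_with_mean(xs, target_mean, iters=100):
    """Max-entropy distribution on support xs with a prescribed mean.
    The solution has exponential-family form p_i ∝ exp(lam * x_i);
    we solve for the natural parameter lam by bisection on the mean.
    (Illustrative sketch; assumes target_mean is strictly between
    min(xs) and max(xs), and that lam lies in [-50, 50].)"""
    def mean_for(lam):
        w = [math.exp(lam * x) for x in xs]
        z = sum(w)
        return sum(x * wi for x, wi in zip(xs, w)) / z

    lo, hi = -50.0, 50.0  # mean_for is monotone increasing in lam
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(lam * x) for x in xs]
    z = sum(w)
    return [wi / z for wi in w]

# Max-entropy distribution on {1, 2, 3, 4} with mean 2.5:
# the uniform distribution already has mean 2.5, so lam = 0 and
# the result is (approximately) uniform.
p = max_entropy_with_mean([1, 2, 3, 4], 2.5)
```

Raising the target mean to, say, 3.0 tilts the distribution exponentially toward the larger support points, which is exactly the exponential-family form the tutorial derives.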


Download slides: geometry.ppt (1.7 MB)


Reviews and comments:

1 Frank Nielsen, October 14, 2007 at 4:27 a.m.:

I highly recommend this tutorial, which makes the connections between entropy, I-distance, and I-projection for Bayesian estimation clear.

It would be good for a second talk to start fully from the axiomatization (Csiszar-Bregman '91) and also to consider Tsallis entropy, to see how the results change, break down, or extend in that case.

2 Anand, February 24, 2009 at 7:52 a.m.:

Hi!
I found the above video lecture by Prof. Dasgupta quite useful. But I found that the streaming is quite slow and gets stuck from time to time.

I would appreciate it if somebody could help.

Anand

3 sparkle daves, March 10, 2009 at 12:25 a.m.:

This video is great; it has helped me a lot. Please, could you send me a website where I could learn more?

4 Leonhard, November 17, 2012 at 2:45 p.m.:

Nice tutorial.
My PC just can't play it properly online. Where can I download it?