Information Geometry
Description
This tutorial will focus on entropy, exponential families, and information projection. We'll start by seeing the sense in which entropy is the only reasonable definition of randomness. We will then use entropy to motivate exponential families of distributions — which include the ubiquitous Gaussian, Poisson, and Binomial distributions, but also very general graphical models. The task of fitting such a distribution to data is a convex optimization problem with a geometric interpretation as an "information projection": the projection of a prior distribution onto a linear subspace (defined by the data) so as to minimize a particular information-theoretic distance measure. This projection operation, which is more familiar in other guises, is a core optimization task in machine learning and statistics. We'll study the geometry of this problem and discuss two popular iterative algorithms for it.
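As a rough illustration of the information projection described above, the following sketch projects a uniform prior on the faces of a die onto the set of distributions with a prescribed mean. It is a minimal example under stated assumptions, not the method presented in the lecture: the die, the target mean of 4.5, and the helper names `tilt`, `mean`, `i_projection`, and `kl` are all hypothetical, and plain bisection stands in for the iterative algorithms the tutorial covers. It relies on the standard fact that, for a single linear constraint, the minimizer of KL divergence to the prior is an exponentially tilted version of the prior.

```python
import math

def tilt(prior, feats, theta):
    """Reweight `prior` by exp(theta * feature) and renormalize."""
    weights = [q * math.exp(theta * f) for q, f in zip(prior, feats)]
    z = sum(weights)
    return [w / z for w in weights]

def mean(p, feats):
    """Expected value of the feature under distribution p."""
    return sum(pi * f for pi, f in zip(p, feats))

def kl(p, q):
    """KL divergence K(p, q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def i_projection(prior, feats, target, lo=-20.0, hi=20.0, iters=200):
    """Find the tilted prior whose mean equals `target` by bisection.

    Assumes min(feats) < target < max(feats), so that a solution exists
    inside the bracket [lo, hi]. The mean of the tilted distribution is
    increasing in theta, which is what makes bisection valid here.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(prior, feats, mid), feats) < target:
            lo = mid
        else:
            hi = mid
    return tilt(prior, feats, 0.5 * (lo + hi))

# Hypothetical toy problem: a die whose observed average roll is 4.5.
faces = [1, 2, 3, 4, 5, 6]
uniform = [1 / 6] * 6
p_star = i_projection(uniform, faces, target=4.5)

# Another distribution that also satisfies the mean constraint.
other = [0.0, 0.0, 0.0, 0.5, 0.5, 0.0]

print([round(p, 4) for p in p_star])
print(kl(p_star, uniform), kl(other, uniform))
```

Because the prior is uniform, minimizing KL divergence to it is the same as maximizing entropy, so `p_star` is also the maximum entropy distribution with mean 4.5. Any other distribution meeting the constraint, such as `other`, lies farther from the prior in KL divergence.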
Slides
| Time | Slide |
| 0:04 | Information geometry |
| 0:14 | Learning a distribution |
| 2:29 | Outline |
| 2:56 | Part I: Entropy |
| 3:01 | Formulating the problem |
| 5:11 | What is randomness? |
| 7:37 | Entropy |
| 10:41 | Entropy is concave |
| 11:19 | Properties of entropy |
| 13:07 | Additivity |
| 14:01 | Properties of entropy, cont’d |
| 16:04 | KL divergence |
| 18:24 | Entropy and KL divergence |
| 19:53 | Another justification of entropy |
| 21:27 | Asymptotic equipartition |
| 22:55 | AEP: examples |
| 24:40 | Proof of AEP |
| 26:53 | Back to our main question |
| 27:51 | Maximum entropy |
| 30:52 | Alternative formulation |
| 31:43 | A projection operation |
| 35:03 | Solution by calculus |
| 37:10 | Form of the solution |
| 38:50 | Part II: Exponential families |
| 39:06 | Exponential families |
| 41:35 | Natural parameter space |
| 43:17 | Example: Bernoulli |
| 46:09 | Parametrization of Bernoulli |
| 47:21 | Example: Poisson |
| 49:43 | Example: Gaussian |
| 51:41 | Properties of exponential families |
| 54:18 | Maximum likelihood estimation |
| 56:44 | Maximum likelihood, cont’d |
| 58:05 | Our toy problem |
| 58:53 | The two spaces |
| 60:20 | Part III: Information projection |
| 60:32 | Back to maximum entropy |
| 62:11 | Maximum entropy example |
| 64:08 | Maximum entropy: restatement |
| 65:11 | Proof |
| 68:36 | Geometric interpretation |
I highly recommend this tutorial, which makes clear the connections between entropy, I-distance, and I-projection for Bayesian estimation.
It would be good for a second talk to start fully from the axiomatization (Csiszar-Bregman'91) and also to consider Tsallis entropy, to see which results change, break down, or extend in that case.