Dirichlet Processes: Tutorial and Practical Course

author: Yee Whye Teh, University College London
published: Aug. 27, 2007,   recorded: August 2007,   views: 140213



The Bayesian approach provides a coherent framework for dealing with uncertainty in machine learning. By integrating out parameters, Bayesian models do not suffer from overfitting, so it is conceivable to consider models with an infinite number of parameters, also known as Bayesian nonparametric models. One example of such a model is the Gaussian process, which is a distribution over functions used in regression and classification problems. Another example is the Dirichlet process, which is a distribution over distributions. Dirichlet processes are used in density estimation, clustering, and nonparametric relaxations of parametric models. They have been gaining popularity in both the statistics and machine learning communities, due to their computational tractability and modelling flexibility.

In the tutorial I shall introduce Dirichlet processes, and describe different representations of Dirichlet processes, including the Blackwell-MacQueen urn scheme, Chinese restaurant processes, and the stick-breaking construction. I shall also go through various extensions of Dirichlet processes, and applications in machine learning, natural language processing, machine vision, computational biology and beyond.
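
As a rough illustration of two of the representations named above, the following sketch simulates Chinese restaurant process seating and a truncated stick-breaking construction. It is not taken from the lecture or its slides: the function names, the concentration parameter name `alpha`, and the `truncation` argument are assumptions made for this example, and only the Python standard library is used.

```python
import random


def crp_assignments(n, alpha, rng):
    """Seat n customers by a Chinese restaurant process (illustrative sketch).

    Customer i joins existing table k with probability proportional to the
    number of customers already there, or opens a new table with probability
    proportional to alpha (assumed > 0).
    """
    counts = []       # counts[k] = number of customers at table k
    assignments = []  # assignments[i] = table index of customer i
    for _ in range(n):
        # Unnormalised weights: one per existing table, plus alpha for a new one.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[k] += 1     # join an existing table
        assignments.append(k)
    return assignments, counts


def stick_breaking_weights(alpha, truncation, rng):
    """First `truncation` stick-breaking weights (illustrative sketch).

    Each step breaks off a Beta(1, alpha) fraction of the stick that remains.
    Returns the weights and the leftover stick length after truncation.
    """
    remaining = 1.0
    weights = []
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining
```

Calling `crp_assignments(50, 1.0, random.Random(0))` returns a table index for each of 50 customers together with the table sizes. Calling `stick_breaking_weights(2.0, 20, random.Random(0))` returns 20 mixture weights plus the unassigned remainder. The truncation is a convenience for simulation only; the construction itself continues indefinitely.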

In the practical course I shall describe inference algorithms for Dirichlet processes based on Markov chain Monte Carlo sampling, and we shall implement a Dirichlet process mixture model, hopefully applying it to discovering clusters of NIPS papers and authors.

See also:

Slides: teh_yee_whye_dp_talk.pdf (1.8 MB)

Article: teh_yee_whye_dp_article.pdf (142.3 KB)

Reviews and comments:

Comment2 Prasenjit Mukherjee, September 11, 2008 at 1:26 p.m.:

One of the best tutorials on understanding the Gaussian/Dirichlet distribution/process.

Comment3 xiaopingzhang, December 26, 2008 at 10:51 a.m.:

Thank you! The tutorial is a great help to me because I am studying the LDA model.

Comment4 Aditi Gupta, January 17, 2009 at 9:56 p.m.:

Very nice lecture. I really liked how the concepts were introduced and linked together. Very well explained. Thank You!!

Comment5 teddy, July 23, 2009 at 2:38 p.m.:

Shouldn't the formula for the posterior over parameters be

p(w|x,y) = p(w|x)p(y|x,w) / p(y|x)

instead of

p(w|x,y) = p(w)p(y|x,w) / p(y|x)

on slide 5 (time 4:37)?
If not, could anyone kindly tell me why it is OK to drop the conditioning on x from the prior?


Comment6 Cauchy, July 26, 2009 at 1:25 p.m.:

Couldn't it be that w is independent of x?

Comment7 Rajib Acharya, March 2, 2010 at 10:23 p.m.:

Very nice. Is the practical session recorded?

Comment8 Brian, April 26, 2010 at 5:15 a.m.:

p(w|x,y)p(x,y) = p(x,y,w) = p(y|x,w)p(x|w)p(w)
Using the product rule and, as the previous poster mentioned, x independent of w (so p(x|w) = p(x)), together with p(x,y) = p(y|x)p(x):
p(w|x,y) = p(y|x,w)p(x)p(w) / (p(y|x)p(x))
= p(y|x,w)p(w) / p(y|x)

As in the slides

Comment9 fjanoos, May 1, 2010 at 10:09 p.m.:

In the slide on de Finetti's theorem, he says "if there exists a sequence of thetas that are exchangeable then there exists a *random* probability measure - a random distribution - which makes the theta's iid"

My question is: what is a *random* probability measure? That is, does the measure itself depend on or vary with the underlying sample space X, and if so, how?

The Wikipedia definition of this theorem does not seem to imply any dependence on X. Any clarifications would be appreciated!

Comment10 QiangYou, November 9, 2010 at 3:43 a.m.:

Nice talk! A big help for me. ^_^

Comment11 Leigh, October 14, 2020 at 6 p.m.:

I think that this approach is very effective. Actually most of my friends are using it.

