## Bayesian inference and Gaussian processes

author: Carl Edward Rasmussen, Max Planck Institute

published: Aug. 20, 2007, recorded: August 2007, views: 77868

## Reviews and comments:

Kebi_ww, January 14, 2008 at 6:22 p.m.: The guy is actually cool. I like this talk. Thanks for uploading.

Reader, June 3, 2008 at 7:17 p.m.: Slide 6.

"Notice: the likelihood function is a probability distribution over observations, not over parameters."

The likelihood function is a function (probability distribution) of a parameter, not of the observations:

L(pi|D) or L(pi) is the likelihood function

p(D|pi) is the conditional probability

Ref:

1. R. Hogg, A. Craig, Introduction to Mathematical Statistics, 4th ed., 1978, p. 202.

2. E. Lehmann, G. Casella, Theory of Point Estimation (Springer Texts in Statistics), 2nd ed., 2003, p. 238.

3. http://en.wikipedia.org/wiki/Likelihood

Reader, June 3, 2008 at 8:03 p.m.: Slide 10.

Using the Beta distribution with alpha=beta=1 is not a correct description of the Learner B case: the probabilities p(pi=0) and p(pi=1) will never be obtained. Therefore, instead of the informative prior (Beta distribution with alpha=beta=1), the non-informative prior (Beta distribution with alpha=beta=0) has to be used.
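To make the two priors named in this comment concrete, here is a minimal sketch of the conjugate Beta-Bernoulli update. The function names and the data (two heads in two tosses) are illustrative assumptions, not taken from the slides. Beta(0, 0) is an improper prior, so the value computed for it is only the formal result of the update formula.

```python
from fractions import Fraction

def posterior_parameters(alpha, beta, heads, tosses):
    """Conjugate update: a Beta(alpha, beta) prior becomes Beta(alpha + heads, beta + tails)."""
    return alpha + heads, beta + (tosses - heads)

def posterior_mean(alpha, beta, heads, tosses):
    """Posterior mean, (alpha + heads) / (alpha + beta + tosses)."""
    a, b = posterior_parameters(alpha, beta, heads, tosses)
    return Fraction(a, a + b)

# Hypothetical data: two heads in two tosses.
uniform_prior_mean = posterior_mean(1, 1, heads=2, tosses=2)  # Beta(1, 1) prior
formal_limit_mean = posterior_mean(0, 0, heads=2, tosses=2)   # alpha = beta = 0 (improper)
print(uniform_prior_mean, formal_limit_mean)
```

With alpha=beta=1 the posterior mean after two heads is 3/4, while with alpha=beta=0 the formula reduces to k/n, which is 1 for this data.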

Reader, July 28, 2009 at 4:17 a.m.: Reader #2 is right!!! Come on, how can this guy make such a blunder!?

Markus, November 1, 2009 at 1:54 a.m.: Readers #2 and #4 have a misunderstanding here: the likelihood function really takes two arguments, observed data and model parameters. It will then give you the probability (up to a proportionality constant) of the observed data given the model parameters, i.e. you obtain a probability distribution _over observations_ given the parameters. You do _not_ get a probability distribution over parameters. This is exactly what the slides say and is perfectly consistent with the references that reader #2 provides. Reader #2 conflates "function of a parameter" and "probability distribution of a parameter", which is clearly wrong here.
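The distinction drawn in this comment can be checked numerically. The sketch below assumes a coin-toss (Bernoulli) model with parameter `pi`, as in the comments above. The function names and example values are hypothetical. For a fixed `pi`, the values p(D|pi) sum to 1 over all possible observation sequences. For fixed data, the same expression integrated over `pi` does not give 1.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def likelihood(data, pi):
    """p(D | pi) for a sequence of independent tosses, where 1 means heads."""
    heads = sum(data)
    tails = len(data) - heads
    return pi**heads * (1 - pi)**tails

def sum_over_observations(tosses, pi):
    """Sum p(D | pi) over every possible sequence D of the given length."""
    return sum(likelihood(d, pi) for d in product((0, 1), repeat=tosses))

def integral_over_parameter(data):
    """Exact integral of pi^k (1 - pi)^(n - k) over [0, 1]: k! (n - k)! / (n + 1)!."""
    heads, tosses = sum(data), len(data)
    return Fraction(factorial(heads) * factorial(tosses - heads), factorial(tosses + 1))

total = sum_over_observations(3, Fraction(3, 10))  # fixed parameter, all observations
area = integral_over_parameter((1, 1, 0))          # fixed observations, all parameters
print(total, area)
```

Here `total` is exactly 1, while `area` is 1/12, so the likelihood is normalized over observations but not over the parameter.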

Olivier Mgbra, May 18, 2010 at 1:33 a.m.: Reader 5 is right, definitely right. Very good video, I enjoyed it!

Reader, March 17, 2011 at 3:48 p.m.: I agree with reader 5. However, coming from the non-Bayesian perspective, this got me confused as well. The Bayesian approach assumes a prior over the parameters and outputs a distribution over the parameters given the data. Using this distribution, a predictive distribution can be evaluated. The problem is less about optimization than about solving an integral. The non-Bayesian approach optimizes the model and outputs just one set of parameters.
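The contrast between "solving an integral" and "optimizing" can be sketched for the coin-toss model. This is an illustration under stated assumptions, not the lecture's code: a uniform prior on `pi`, a simple midpoint-rule integration on a grid, and hypothetical data of 3 heads in 10 tosses.

```python
def predictive_head_probability(heads, tosses, grid_size=10_000):
    """Bayesian predictive p(next toss is heads | data) under a uniform prior.

    Integrates pi against the posterior using the midpoint rule.
    """
    step = 1.0 / grid_size
    grid = [(i + 0.5) * step for i in range(grid_size)]
    # Unnormalized posterior: uniform prior times the likelihood.
    weights = [p**heads * (1 - p)**(tosses - heads) for p in grid]
    evidence = sum(weights) * step
    return sum(p * w for p, w in zip(grid, weights)) * step / evidence

def maximum_likelihood_estimate(heads, tosses):
    """Non-Bayesian point estimate: the value of pi that maximizes the likelihood."""
    return heads / tosses

bayes_prediction = predictive_head_probability(3, 10)
point_estimate = maximum_likelihood_estimate(3, 10)
print(bayes_prediction, point_estimate)
```

For a uniform prior the integral has the closed form (k+1)/(n+2), which is 1/3 for this data, whereas the maximum likelihood estimate is k/n = 0.3.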

Reader, June 19, 2014 at 2:09 a.m.: I liked the Vocabulary part (Part 2, 7/10), good comments on confusing notions.

Philipp, September 2, 2019 at 9:37 p.m.: Thank you Mr. Rasmussen for this amazingly intuitive lecture about the basics of Bayesian principles!

At 29:55, the probability after two consecutive heads seems to be 1, given that 0^0 = 1. With n=2 and k=2, pi = k/n = 2/2 = 1, which yields p = 1^2 (1-1)^(2-2) = 1.

Or am I missing something here?
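The arithmetic in this comment can be checked directly. The sketch below only evaluates the expression pi^k (1-pi)^(n-k) at pi = k/n. It makes no claim about what the lecture shows at that timestamp, and the function name is hypothetical. It relies on Python's convention that `0**0` evaluates to 1.

```python
def sequence_likelihood(pi, heads, tosses):
    """p(D | pi) = pi^heads * (1 - pi)^(tosses - heads) for one specific sequence."""
    return pi**heads * (1 - pi)**(tosses - heads)

heads, tosses = 2, 2
pi_hat = heads / tosses  # k/n, which is 1.0 for two heads in two tosses
value = sequence_likelihood(pi_hat, heads, tosses)
print(0**0, 0.0**0, value)
```

With the convention 0^0 = 1, the expression evaluates to 1 at pi = 1, which agrees with the calculation in the comment.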
