Bayesian inference and Gaussian processes
author: Carl Edward Rasmussen, Max Planck Institute
published: Aug. 20, 2007, recorded: August 2007, views: 9261
Download slides:
rasmussen_carl_edward_bigp_00.pdf (1.5 MB)
Reviews and comments:
The guy is actually cool. I like this talk. Thanks for uploading.
Slide 6.
"Notice: the likelihood function is a probability distribution over observations, not over parameters."
The likelihood function is a function (probability distribution) of a parameter, not of the observations:
L(pi|D) or L(pi) is the likelihood function
p(D|pi) is the conditional probability
Ref:
1. R. Hogg, A. Craig, Introduction to Mathematical Statistics, 4th Ed., 1978, p. 202.
2. E. Lehmann, G. Casella, Theory of Point Estimation (Springer Texts in Statistics), 2nd Ed., 2003, p. 238.
3. http://en.wikipedia.org/wiki/Likelihood
Slide 10.
Using the Beta distribution with alpha=beta=1 is not a correct description of the Learner B case: the probabilities p(pi=0) and p(pi=1) will never be obtained. Therefore, instead of the informative prior (Beta distribution with alpha=beta=1), the non-informative prior (Beta distribution with alpha=beta=0) has to be used.
Reader #2 is right!!! Come on, how can this guy make such a blunder!?
Readers #2 and #4 have a misunderstanding here: the likelihood function really takes two arguments, observed data and model parameters. It will then give you the probability (up to a proportionality constant) of the observed data given the model parameters, i.e. you obtain a probability distribution _over observations_ given the parameters. You do _not_ get a probability distribution over parameters. This is exactly what the slides say and is perfectly consistent with the references that reader #2 provides. Reader #2 conflates "function of a parameter" and "probability distribution of a parameter", which is clearly wrong here.
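The distinction drawn in the comment above can be checked numerically. What follows is only a minimal sketch, assuming a binomial model with n = 3 trials; the function name `likelihood` and the particular numbers are illustrative and do not come from the slides. With the parameter held fixed, the values sum to 1 over all possible observations. With the observation held fixed, the integral over the parameter is generally not 1, so the same expression is not a probability distribution over parameters.

```python
from math import comb

def likelihood(k, n, pi):
    """Binomial probability of k successes in n trials, given success rate pi."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

n = 3  # assumed number of trials, chosen only for illustration

# Fix the parameter (pi = 0.3) and vary the data: a proper distribution over k.
sum_over_data = sum(likelihood(k, n, 0.3) for k in range(n + 1))

# Fix the data (k = 2) and vary the parameter: midpoint-rule integral over [0, 1].
steps = 10_000
integral_over_pi = sum(
    likelihood(2, n, (i + 0.5) / steps) for i in range(steps)
) / steps

print(sum_over_data)     # close to 1
print(integral_over_pi)  # close to 1/(n + 1) = 0.25, not 1
```

For this model the integral over pi equals 1/(n + 1) for every k, which is why a normalisation step (and a prior) is needed before anything can be read as a distribution over the parameter.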
Reader 5 is right, definitely right. Very good video, I enjoyed it!
I agree with reader 5. However, coming from a non-Bayesian perspective, this confused me as well. The Bayesian approach assumes a prior over the parameters and outputs a distribution over the parameters given the data. Using this distribution, a predictive distribution can be evaluated. The problem is less one of optimization than of solving an integral. The non-Bayesian approach optimizes the model and outputs just one set of parameters.
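The contrast described in the comment above can be illustrated with a coin-flip model. This is a rough sketch under stated assumptions: a Bernoulli likelihood with a Beta(alpha, beta) prior, for which the predictive integral has a closed form. The function names `mle` and `posterior_predictive` and the example counts are made up for illustration and are not taken from the lecture.

```python
from fractions import Fraction

def mle(successes, failures):
    """Non-Bayesian point estimate: the single pi that maximises the likelihood."""
    return Fraction(successes, successes + failures)

def posterior_predictive(successes, failures, alpha=1, beta=1):
    """Probability that the next trial succeeds, averaged over the posterior.

    With a Beta(alpha, beta) prior the posterior is
    Beta(alpha + successes, beta + failures), and integrating pi against it
    gives the posterior mean.
    """
    return Fraction(alpha + successes, alpha + beta + successes + failures)

# Assumed data for illustration: three successes and no failures.
print(mle(3, 0))                   # 1
print(posterior_predictive(3, 0))  # 4/5
```

With three successes and no failures, the point estimate says the next trial succeeds with certainty, while the predictive probability under a uniform Beta(1, 1) prior is 4/5, because the integral keeps some weight on smaller values of pi.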