Large-scale Bayesian Inference for Collaborative Filtering
Published: Dec. 31, 2007; recorded: December 2007
Description
The Netflix Prize problem provides an excellent testing ground for machine learning. The problem is large-scale, and the data are complex and noisy. It is therefore likely that relatively complex models with careful regularization are needed to obtain reasonable predictions. A Bayesian modeling approach seems ideal for the task, provided it can be scaled up to the size of the Netflix data set, where extremely high-dimensional Bayesian expectations will possibly have to be approximated. In this talk, an ordinal regression low-rank matrix decomposition model is presented. We use a variational Bayes (VB) inference algorithm to demonstrate that a large-scale Bayesian algorithm is feasible. This model also highlights some of the general limitations of VB. The more accurate expectation propagation/expectation consistent (EP/C) inference cannot be applied to this bilinear model without further approximations. We therefore propose a hybrid approach with EP/C-inspired modifications of the VB algorithm. We compare the different variational approximations with a Laplace approximation, a MAP approximation, and Hamiltonian MCMC. In the latter, one sample takes around 6 hours of computing time on a 1 GHz processor with fast C++ code, so there is a very clear case to be made for deterministic approximate inference. Another good feature of the Netflix data is the size of the test set, which makes even small differences in performance significant.
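To make the phrase "ordinal regression low-rank matrix decomposition" concrete, the following is a minimal sketch of what such a likelihood could look like. It is not taken from the talk: the probit link, the fixed thresholds, the noise level, and the names `rating_probs` and `expected_rating` are all assumptions chosen for illustration. A latent score is formed as the inner product of a user factor vector and a movie factor vector, and the probability of each rating from 1 to 5 is the Gaussian mass that falls between two consecutive thresholds.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rating_probs(user_factors, movie_factors, thresholds, noise_std=1.0):
    """Probability of each ordinal rating under an assumed probit model.

    The latent score is the low-rank (bilinear) term u . v. The thresholds
    split the real line into len(thresholds) + 1 rating bins.
    """
    score = sum(u * v for u, v in zip(user_factors, movie_factors))
    edges = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [normal_cdf((b - score) / noise_std) for b in edges]
    # Mass between consecutive thresholds is the probability of that rating.
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


def expected_rating(probs):
    """Mean rating when the bins are labelled 1, 2, ..., K."""
    return sum(k * p for k, p in enumerate(probs, start=1))


# Hypothetical rank-2 factors and four thresholds giving five rating levels.
thresholds = [-1.5, -0.5, 0.5, 1.5]
probs = rating_probs([1.0, 0.5], [0.2, -0.4], thresholds)
print(probs, expected_rating(probs))
```

In a full Bayesian treatment the factor vectors would carry posterior distributions rather than point values, and these probabilities would have to be averaged over that posterior. That averaging is the high-dimensional expectation the abstract says needs approximating.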