Deep Belief Networks

author: Geoffrey E. Hinton, Department of Computer Science, University of Toronto
published: Nov. 2, 2009,   recorded: September 2009,   views: 7592

Slides
0:00 Deep Belief Nets
0:16 Some things you will learn in this tutorial
1:13 A spectrum of machine learning tasks
2:21 Historical background
2:53 Second generation neural networks
4:12 A temporary digression
4:59 What is wrong with back-propagation
5:48 Overcoming the limitations of back-propagation
5:59 Belief Nets
6:58 Stochastic binary units
7:19 Learning Deep Belief Nets
8:06 The learning rule for sigmoid belief nets
9:16 Explaining away
10:55 Why is it usually very hard to learn sigmoid belief nets one layer at a time
17:02 Some methods of learning deep belief nets
17:21 The breakthrough that makes deep learning efficient
17:21 Restricted Boltzmann Machines
19:04 The Energy of a joint configuration
20:03 Weights → Energies → Probabilities
20:32 Restricted Boltzmann Machines
20:54 Weights → Energies → Probabilities
21:06 Using energies to define probabilities
21:35 A picture of the maximum likelihood learning algorithm for an RBM
23:53 A quick way to learn an RBM
23:57 A picture of the maximum likelihood learning algorithm for an RBM
24:20 A quick way to learn an RBM
25:37 How to learn a set of features that are good for reconstructing images of the digit 2
26:54 The final 50 x 256 weights
27:40 How well can we reconstruct the digit images from the binary feature activations?
29:31 Three ways to combine probability density models
32:15 Training a deep network
34:23 The generative model after learning 3 layers
36:53 Why does greedy learning work -1
38:43 Why does greedy learning work -2
40:29 Why does greedy learning work -3
41:54 Which distributions are factorial in a directed belief net?
41:55 Why does greedy learning fail in a directed module?
43:19 A model of digit recognition
44:05 Fine tuning with a contrastive version of the "wake-sleep" algorithm
44:08 Show the movie of the network generating digits
53:51 Samples
54:32 Examples
54:42 How well does it discriminate on MNIST test set with no extra information about geometric distortions?
56:11 Unsupervised "pre-training" also helps for models that have more data and better priors
56:41 Another view of why layer-by-layer learning works
57:00 An infinite sigmoid belief net that is equivalent to an RBM
59:26 Inference in a directed net with replicated weights
62:15 Picture -1
65:58 Learning a deep directed network
66:15 Picture -2
66:33 How many layers should we use and how wide should they be?
66:51 What happens when the weights in higher layers become different from the weights in the first layer?
67:11 Picture -2
68:22 How many layers should we use and how wide should they be?
68:24 What happens when the weights in higher layers become different from the weights in the first layer?
68:29 An improved version of Contrastive Divergence learning
73:20 How persistent CD moves between the modes of the model's distribution
74:56 Summary
76:06 Break

Videos

Part 1 1:26:55
Part 2 1:28:36

Reviews and comments:

Comment1 Paul Hodgkinson, November 3, 2011 at 11:50 a.m.:

Great lecture, marred slightly by poor camera work: not enough focus on the slides.


Comment2 Chris, November 3, 2011 at 11:21 p.m.:

While this is an interesting talk, some of its historical notes are not quite accurate. A lot more work was done in the 1980s than is mentioned here.

But first let me explain: we have to take into account the political and Cold War situation that existed at the time, namely the existence and proliferation of nuclear weapons. In job interviews at that time, one was often asked whether one had moral objections to working in this field.

Further to that, the technology was actually very advanced. In some ways it was perfect for message passing, because there existed a CPU called the transputer: a hybrid CPU designed to do just that, pass messages at extremely high speed. It had the significant benefit that extra CPUs could be connected with great ease, allowing very large arrays of them to be built and further enhancing the capabilities.

So why is this progress not seen today? Simply put, there was a phrase that was perhaps more firmly stuck in our minds than it is in today's culture, a quote from Oppenheimer: "I have become the destroyer of worlds".

With the military extremely interested in this technology's applications, a lot of people just left the field altogether.

I posted this because, yes, the technology is exciting and, yes, the potential is great, but the moral issues today are just as valid as the ones in the past. In fact, with financial markets using this technology, the risk of damage has merely shifted, and we can see what that has caused just recently.

So do think about the consequences of your work before you do it, unlike Oppenheimer, who only saw the consequences after Hiroshima was obliterated.
