Tutorial on Neural Network Optimization Problems
Published on Sep 13, 2015 · 23,304 views
Chapter list
Tutorial on Neural Network Optimization Problems (00:00)
In this presentation... (01:47)
Optimization (03:20)
Derivatives and Second Derivatives (05:15)
Directional Curvature (08:16)
Taylor series approximation (14:20)
How much does a gradient step improve? (16:50)
Critical points (20:35)
Newton’s method (23:11)
Newton’s method’s failure mode (25:37)
The old myth of SGD convergence (26:06)
The new myth of SGD convergence (27:22)
Some functions lack critical points (27:58)
SGD may not encounter critical points (29:07)
Poor conditioning - 1 (36:20)
Poor conditioning - 2 (37:29)
Why convergence may not happen (43:47)
Are saddle points or local minima more common? (47:10)
Do neural nets have saddle points? - 1 (49:58)
Do neural nets have saddle points? - 2 (55:13)
Gradient descent flees saddle points (01:03:35)
The state of modern optimization (01:04:52)
Why is optimization so slow? (01:08:35)
Linear view of the difficulty (01:11:28)
Factored linear loss function (01:14:43)
Attractive saddle points and plateaus (01:15:48)
Questions for visualization (01:15:54)
History written by the winners (01:15:55)
2D Subspace Visualization (01:15:58)
A Special 1-D Subspace (01:17:04)
Maxout / MNIST experiment (01:17:56)
Other activation functions (01:18:52)
Convolutional network (01:19:01)
Sequence model (LSTM) (01:19:39)
Generative model (MP-DBM) (01:19:56)
3-D Visualization (01:20:17)
3-D Visualization of MP-DBM (01:21:42)
Random walk control experiment (01:21:50)
3-D plots without obstacles (01:21:51)
3-D plot of adversarial maxout (01:22:24)
Lessons from visualizations (01:22:52)
Conclusion (01:23:22)