Tutorial on Neural Network Optimization Problems
Published on Sep 13, 2015 · 23,304 views
Chapter list
Tutorial on Neural Network Optimization Problems (00:00)
In this presentation... (01:47)
Optimization (03:20)
Derivatives and Second Derivatives (05:15)
Directional Curvature (08:16)
Taylor series approximation (14:20)
How much does a gradient step improve? (16:50)
Critical points (20:35)
Newton’s method (23:11)
Newton’s method’s failure mode (25:37)
The old myth of SGD convergence (26:06)
The new myth of SGD convergence (27:22)
Some functions lack critical points (27:58)
SGD may not encounter critical points (29:07)
Poor conditioning - 1 (36:20)
Poor conditioning - 2 (37:29)
Why convergence may not happen (43:47)
Are saddle points or local minima more common? (47:10)
Do neural nets have saddle points? - 1 (49:58)
Do neural nets have saddle points? - 2 (55:13)
Gradient descent flees saddle points (01:03:35)
The state of modern optimization (01:04:52)
Why is optimization so slow? (01:08:35)
Linear view of the difficulty (01:11:28)
Factored linear loss function (01:14:43)
Attractive saddle points and plateaus (01:15:48)
Questions for visualization (01:15:54)
History written by the winners (01:15:55)
2D Subspace Visualization (01:15:58)
A Special 1-D Subspace (01:17:04)
Maxout / MNIST experiment (01:17:56)
Other activation functions (01:18:52)
Convolutional network (01:19:01)
Sequence model (LSTM) (01:19:39)
Generative model (MP-DBM) (01:19:56)
3-D Visualization (01:20:17)
3-D Visualization of MP-DBM (01:21:42)
Random walk control experiment (01:21:50)
3-D plots without obstacles (01:21:51)
3-D plot of adversarial maxout (01:22:24)
Lessons from visualizations (01:22:52)
Conclusion (01:23:22)