Optimization I

Published on Oct 11, 2018
2967 Views

Chapter list

Tutorial on: Optimization I (00:00)
Outline (00:03)
Neural networks (00:23)
Why is learning difficult - 1 (01:22)
Why is learning difficult - 2 (03:03)
Why is learning difficult - 3 (05:44)
How to train neural networks with random search - 1 (06:23)
How to train neural networks with random search - 2 (07:23)
How to train neural networks with random search - 3 (08:27)
How to train neural networks with random search - 4 (09:23)
How well does random search work? - 1 (11:21)
How well does random search work? - 2 (13:07)
Gradient descent and back-propagation (14:08)
Gradient descent - 1 (14:40)
Gradient descent - 2 (15:15)
How well does gradient descent work? - 1 (16:56)
How well does gradient descent work? - 2 (17:49)
Momentum: smooth gradient with moving average (20:54)
Stochastic gradient descent: improve efficiency - 1 (22:34)
Stochastic gradient descent: improve efficiency - 2 (24:08)
Stochastic gradient descent: improve efficiency - 3 (24:59)
Revisit gradient descent - 1 (25:49)
Revisit gradient descent - 2 (26:11)
Natural gradient descent (27:25)
Fisher information matrix - 1 (28:30)
Fisher information matrix - 2 (30:53)
When first-order methods fail (32:42)
Second-order optimization algorithms (33:02)
Second-order method algorithms (36:03)
Find a good preconditioning matrix (38:40)
When first-order methods work well - 1 (39:31)
When first-order methods work well - 2 (40:13)
When first-order methods fail (40:45)
Learning on a single machine (41:24)
Distributed learning (42:54)
Here is the training plot of a state-of-the-art ResNet trained on 8 GPUs (43:53)
Scalability of the “black-box” optimization algorithms (44:56)
Background: Natural gradient for neural networks - 1 (46:30)
Background: Natural gradient for neural networks - 2 (47:01)
Background: Natural gradient for neural networks - 3 (47:28)
Background: Kronecker-factored natural gradient - 1 (47:36)
Background: Kronecker-factored natural gradient - 2 (47:40)
Background: Kronecker-factored natural gradient - 3 (47:54)
Distributed K-FAC natural gradient (49:40)
Scalability experiments (51:59)
Second-order method algorithms (54:02)
Thank you (56:21)