
Optimization II

Published on Oct 11, 2018 · 1595 views


Chapter list

Optimization: Part II 00:00
Many thanks to 01:01
Different perspectives on optimization 01:10
Different perspectives on nonlinear optimization 02:30
Deterministic and Stochastic Optimization 05:19
Stochastic and Deterministic Large-Scale Nonlinear Optimization Worlds 07:44
To make this concrete let’s talk about Momentum and Acceleration 12:11
Momentum (Heavy Ball Method) 12:19
Consider what momentum can do in the non-convex case 14:00
But momentum works in practice 14:48
Nesterov acceleration 17:58
Acceleration with noisy gradients 19:11
Understanding SGD 20:17
Convergence 20:30
Fixed steplength, Diminishing steplength 23:46
Efficiency of SGD 26:17
Non-convexity and SGD 28:44
Weaknesses of SGD? 30:30
Optimization panorama 32:41
SGD 32:53
Three approaches for constructing second order information 34:00
Mini-Batches 35:27
The trade-offs of larger batch sizes 36:55
Robust minimizers 39:22
Progressive sampling gradient method - 1 40:36
Progressive sampling gradient method - 2 42:00
How to use progressive sampling in practice? 43:11
Two strategies 43:52
Implementation via sample variances - 1 44:22
Implementation via sample variances - 2 44:50
On the Steplengths 45:27
Scaling the Search Direction 45:51
Scaling the Gradient Direction 46:41
Different gradient components should be scaled differently 49:25
Newton’s method 50:30
A Fundamental Equation for Newton’s method 53:00
Inexact Newton Method 56:05
Sub-sampled Hessian Newton Methods 57:46
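
The middle chapters make momentum concrete via the heavy-ball method. As a minimal illustrative sketch (not taken from the lecture slides), the heavy-ball iteration keeps a velocity term so that x_{k+1} = x_k - α∇f(x_k) + β(x_k - x_{k-1}); the toy quadratic objective and the values of α and β below are assumed example choices:

```python
import numpy as np

# Heavy-ball (momentum) method on a toy quadratic
# f(x) = 0.5 * x^T A x - b^T x. A, b, alpha, and beta are
# example values, not parameters from the lecture.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

alpha, beta = 0.1, 0.9           # steplength and momentum coefficient
x = np.zeros(2)
v = np.zeros(2)                  # "velocity" accumulating past steps
for k in range(200):
    v = beta * v - alpha * grad(x)   # velocity update
    x = x + v                        # equals x_k - alpha*grad + beta*(x_k - x_{k-1})
print(x, np.linalg.solve(A, b))      # iterate vs. exact minimizer
```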
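The progressive-sampling chapters ("Implementation via sample variances") concern tests that grow the batch size adaptively. Below is a minimal sketch of one such variance test and sampling loop, assuming a hypothetical per-example gradient function grad_i; the tolerance theta, the initial batch size, and the batch-doubling rule are illustrative assumptions, not the lecture's exact algorithm:

```python
import numpy as np

def sample_variance_test(grads, theta=0.9):
    """Check whether the sampled gradient estimate looks reliable.

    grads: (n, d) array of per-example gradients for the current sample.
    Returns True if the sample variance is small relative to the norm of
    the sampled gradient:  Var / n <= theta^2 * ||g_S||^2.
    """
    n = grads.shape[0]
    g_S = grads.mean(axis=0)                    # sampled gradient
    var = np.sum((grads - g_S) ** 2) / (n - 1)  # trace of sample covariance
    return var / n <= theta**2 * np.dot(g_S, g_S)

def progressive_sgd(grad_i, x0, n_total, batch=8, alpha=0.1, iters=100):
    """Illustrative loop: enlarge the sample when the test fails,
    otherwise take a plain gradient step with the current sample."""
    x, rng = x0.copy(), np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(n_total, size=min(batch, n_total), replace=False)
        grads = np.stack([grad_i(i, x) for i in idx])
        if not sample_variance_test(grads):
            batch = min(2 * batch, n_total)     # grow the sample size
        x = x - alpha * grads.mean(axis=0)
    return x
```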
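The closing chapters cover inexact Newton and sub-sampled Hessian Newton methods. A hedged sketch of the basic idea, assuming a hypothetical per-example Hessian-vector product hess_vec_i: solve the sub-sampled Newton system H_S p = -g only approximately, here with a plain conjugate-gradient loop:

```python
import numpy as np

def cg_solve(matvec, b, tol=1e-2, maxiter=50):
    """Conjugate gradients for matvec(x) = b (matvec assumed SPD)."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            break                            # inexact solve: stop early
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def subsampled_newton_step(hess_vec_i, grad, sample_idx, x):
    """Inexact Newton direction p ~= -H_S^{-1} g with a sub-sampled Hessian.

    hess_vec_i(i, x, v): hypothetical per-example Hessian-vector product.
    """
    def Hv(v):
        return sum(hess_vec_i(i, x, v) for i in sample_idx) / len(sample_idx)
    return cg_solve(Hv, -grad)
```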