Bayesian Neural Nets

Published on 2018-10-11, 7410 Views

Andrew Gordon Wilson

DLRL Summer School 2018 - Toronto

Presentation

Model Selection [00:58]

Bayesian or Frequentist? [03:57]

Bayesian Deep Learning [05:50]

How do we build models that learn and generalize? - 1 [09:35]

How do we build models that learn and generalize? - 2 [10:06]

How do we build models that learn and generalize? - 3 [11:13]

How do we build models that learn and generalize? - 4 [12:59]

Bayesian Inference [14:02]

Predictive Distribution [14:39]

Example: Bent Coin - 1 [16:36]

Example: Bent Coin - 2 [18:42]

Example: Bent Coin - 3 [19:59]

Beta Distribution [20:36]

Example: Bent Coin - 4 [21:32]

A Function Space View: Gaussian processes [23:18]

Gaussian processes [24:43]

Linear Basis Models - 1 [26:55]

Linear Basis Models - 2 [27:43]

Linear Basis Function Models [27:54]

Example: RBF Kernel [28:45]

Sampling from a GP with an RBF Kernel - 1 [31:41]

Sampling from a GP with an RBF Kernel - 2 [32:21]

RBF Kernel Covariance Matrix [33:06]

Learning and Predictions with Gaussian Processes [33:45]

Inference using an RBF kernel - 1 [35:24]

Inference using an RBF kernel - 2 [37:13]

Learning and Model Selection - 1 [37:53]

Learning and Model Selection - 2 [40:21]

Aside: How Do We Build Models that Generalize? [40:42]

Gaussian Processes and Neural Networks [42:59]

Deep Kernel Learning - 1 [45:42]

Deep Kernel Learning - 2 [46:47]

Scalable Gaussian Processes [47:21]

Deep Kernel Learning Results [49:37]

Face Orientation Extraction [50:17]

Learning Flexible Non-Euclidean Similarity Metrics [51:01]

Step Function [51:49]

LSTM Kernels [52:56]

GP-LSTM Predictive Distributions [53:10]

The Bayesian GAN [54:04]

Wide Optima Generalize Better [56:37]

Loss Surfaces in Deep Learning [59:06]

Mode Connectivity [01:01:09]

Example Parametrizations [01:02:09]

Connection Procedure with Tractable Loss [01:02:44]

Curve Ensembling [01:03:27]

Fast Geometric Ensembling [01:05:59]

Trajectory of SGD - 1 [01:06:44]

Trajectory of SGD - 2 [01:07:04]

Trajectory of SGD - 3 [01:07:06]

Trajectory of SGD - 4 [01:07:18]

Trajectory of SGD - 5 [01:08:14]

Following Random Paths [01:08:43]

Path from w_SWA to w_SGD [01:09:14]

Approximating an FGE Ensemble [01:10:21]

SWA Results, CIFAR [01:10:52]

SWA Results, ImageNet (Top-1 Error Rate) [01:11:40]

Sampling from a High Dimensional Gaussian [01:12:09]

High Constant LR [01:13:27]

Conclusions [01:14:49]

Deriving the RBF Kernel [01:21:10]