
The Unreasonable Effectiveness Of Deep Learning

Published on Oct 29, 2014 · 41,013 Views

Chapter list

The Unreasonable Effectiveness of Deep Learning (00:00)
55 years of hand-crafted features (01:29)
Architecture of “Classical” Recognition Systems (02:52)
Architecture of Deep Learning-Based Recognition Systems (03:44)
Future Systems (03:58)
Deep Learning = Learning Hierarchical Representations (05:45)
Trainable Feature Hierarchy (07:51)
Learning Representations: a challenge for ML, CV, AI, Neuroscience, Cognitive Science... (07:58)
The Mammalian Visual Cortex is Hierarchical (08:15)
Untitled (09:10)
Discovering the Hidden Structure in High-Dimensional Data: the manifold hypothesis (09:37)
Basic Idea for Invariant Feature Learning (09:42)
Sparse Non-Linear Expansion → Pooling (10:57)
Overall Architecture: multiple stages of Normalization → Filter Bank → Non-Linearity → Pooling (10:59)
Deep Nets with ReLUs and Max Pooling (12:40)
Supervised Training: Stochastic (Sub)Gradient Optimization (13:01)
Loss Function for a simple network (13:11)
Deep Nets with ReLUs (13:13)
Deep Convolutional Nets (and other deep neural nets) (15:42)
Deep Nets with ReLUs: Objective Function is Piecewise Polynomial (17:19)
Convolutional Networks (19:00)
Convolutional Network (19:08)
Early Hierarchical Feature Models for Vision (20:14)
The Convolutional Net Model (20:19)
Convolutional Network (ConvNet) (20:36)
Convolutional Network (vintage 1990) (20:38)
LeNet1 Demo from 1993 (20:44)
Brute Force Approach To Multiple Object Recognition (22:09)
Idea #1: Sliding Window ConvNet + Weighted FSM - 1 (22:24)
Idea #1: Sliding Window ConvNet + Weighted FSM - 2 (22:46)
Idea #1: Sliding Window ConvNet + Weighted FSM - 3 (23:22)
Idea #1: Sliding Window ConvNet + Weighted FSM - 4 (23:50)
Convolutional Networks In Visual Object Recognition (23:51)
We knew ConvNet worked well with characters and small images (23:56)
NORB Dataset (2004): 5 categories, multiple views and illuminations (24:03)
mid 2000s: state of the art results on face detection (24:07)
Simultaneous face detection and pose estimation - 1 (24:24)
Simultaneous face detection and pose estimation - 2 (24:27)
Visual Object Recognition with Convolutional Nets (24:35)
Late 2000s: we could get decent results on object recognition (25:15)
Object Recognition [Krizhevsky, Sutskever, Hinton 2012] (26:20)
Then, two things happened... (26:33)
ImageNet Large-Scale Visual Recognition Challenge (27:24)
Object Recognition [Krizhevsky, Sutskever, Hinton 2012] (27:31)
ConvNet-Based Google+ Photo Tagger (28:11)
NYU ConvNet Trained on ImageNet: OverFeat (28:13)
Kernels: Layer 1 (7x7) and Layer 2 (7x7) (29:39)
Untitled (29:41)
Untitled (29:44)
Classification + Localization: multiscale sliding window (29:51)
Applying a ConvNet on Sliding Windows is Very Cheap! (31:13)
Classification + Localization: sliding window + bounding box regression (31:16)
Detection: Examples - 1 (31:23)
Detection: Examples - 2 (31:46)
ImageNet 2013: Detection (32:01)
Results: pre-trained on ImageNet1K, fine-tuned on ImageNet Detection - 1 (32:36)
Results: pre-trained on ImageNet1K, fine-tuned on ImageNet Detection - 2 (32:40)
Results: pre-trained on ImageNet1K, fine-tuned on ImageNet Detection - 3 (32:44)
Detection: Difficult Examples (32:56)
Detection: Interesting Failures (33:01)
Detection: Bad Groundtruth (33:10)
ConvNets As Generic Feature Extractors (33:12)
Cats vs Dogs - 1 (33:29)
Cats vs Dogs - 2 (33:39)
Features are generic: Caltech 256 (33:44)
OverFeat Features -> Trained Classifier on other datasets (33:47)
Image Similarity Matching With Siamese Networks: Embedding, DrLIM (34:27)
DrLIM: Metric Learning (34:28)
Loss function (35:13)
Face Recognition: DeepFace (Facebook AI Research) - 1 (35:21)
Face Recognition: DeepFace (Facebook AI Research) - 2 (36:56)
DeepFace: performance (37:11)
Depth Estimation from Stereo Pairs (37:14)
Depth Estimation from Stereo Pairs: Results (38:33)
Body Pose Estimation (39:00)
Pose Estimation and Attribute Recovery with ConvNets (39:02)
Other Tasks for Which Deep Convolutional Nets are the Best (45:02)
Deep Learning and Convolutional Networks in Speech, Audio, and Signals (45:03)
Acoustic Modeling in Speech Recognition (Google) (45:08)
Energy-Based Unsupervised Learning (45:10)
Learning the Energy Function (45:57)
Seven Strategies to Shape the Energy Function (46:33)
#1: constant volume of low energy: Energy surface for PCA and K-means (49:54)
#2: push down of the energy of data points, push up everywhere else (50:23)
Untitled (50:24)
Dictionary Learning With Fast Approximate Inference: Sparse Auto-Encoders (50:26)
Sparse Modeling: Sparse Coding + Dictionary Learning (50:27)
Untitled (50:27)
#6: use a regularizer that limits the volume of space that has low energy (51:46)
Learning to Perform Approximate Inference: Predictive Sparse Decomposition / Sparse Auto-Encoders (51:59)
Sparse auto-encoder: Predictive Sparse Decomposition (PSD) (52:25)
Regularized Encoder-Decoder Model (auto-Encoder) for Unsupervised Feature Learning (53:14)
PSD: Basis Functions on MNIST (53:17)
Predictive Sparse Decomposition (PSD): Training (53:23)
Learned Features on natural patches: V1-like receptive fields (53:28)
Learning to Perform Approximate Inference: LISTA (53:34)
Better Idea: Give the “right” structure to the encoder (53:36)
LISTA: Train W_e and S matrices to give a good approximation quickly (55:28)
Learning ISTA (LISTA) vs ISTA/FISTA (55:44)
LISTA with partial mutual inhibition matrix (56:10)
Learning Coordinate Descent (LcoD): faster than LISTA (56:16)
Convolutional Sparse Coding (56:24)
Convolutional PSD: Encoder with a soft sh() Function (56:34)
Convolutional Sparse Auto-Encoder on Natural Images (57:28)
Using PSD to Train a Hierarchy of Features - 1 (57:34)
Using PSD to Train a Hierarchy of Features - 2 (58:01)
Using PSD to Train a Hierarchy of Features - 3 (58:05)
Using PSD to Train a Hierarchy of Features - 4 (58:07)
Using PSD to Train a Hierarchy of Features - 5 (58:12)
Unsupervised + Supervised For Pedestrian Detection (58:13)
Untitled (58:16)
Pedestrian Detection, Face Detection (58:23)
ConvNet Architecture with Multi-Stage Features (59:17)
Pedestrian Detection: INRIA Dataset. Miss rate vs false positives (59:22)
Video - 1 (59:29)
Video - 2 (59:51)
Pedestrian Detection: INRIA Dataset. Miss rate vs false positives (59:53)
Unsupervised Learning: Invariant Features (59:54)
Learning Invariant Features with L2 Group Sparsity - 1 (59:59)
Learning Invariant Features with L2 Group Sparsity - 2 (01:00:15)
Groups are local in a 2D Topographic Map (01:00:44)
Image-level training, local filters but no weight sharing - 1 (01:02:12)
Image-level training, local filters but no weight sharing - 2 (01:02:15)
Topographic Maps (01:02:51)
Image-level training, local filters but no weight sharing (01:03:02)
Invariant Features: Lateral Inhibition (01:03:06)
Invariant Features via Lateral Inhibition: Structured Sparsity (01:04:23)
Invariant Features via Lateral Inhibition: Topographic Maps (01:05:08)
Invariant Features through Temporal Constancy (01:06:00)
What-Where Auto-Encoder Architecture (01:06:59)