The Unreasonable Effectiveness Of Deep Learning

Published on Oct 29, 2014 · 41,003 views

Chapter list

The Unreasonable Effectiveness of Deep Learning (00:00)
55 years of hand-crafted features (01:29)
Architecture of “Classical” Recognition Systems (02:52)
Architecture of Deep Learning-Based Recognition Systems (03:44)
Future Systems (03:58)
Deep Learning = Learning Hierarchical Representations (05:45)
Trainable Feature Hierarchy (07:51)
Learning Representations: a challenge for ML, CV, AI, Neuroscience, Cognitive Science... (07:58)
The Mammalian Visual Cortex is Hierarchical (08:15)
Untitled (09:10)
Discovering the Hidden Structure in High-Dimensional Data: The manifold hypothesis (09:37)
Basic Idea for Invariant Feature Learning (09:42)
Sparse Non-Linear Expansion → Pooling (10:57)
Overall Architecture: multiple stages of Normalization → Filter Bank → Non-Linearity → Pooling (10:59)
Deep Nets with ReLUs and Max Pooling (12:40)
Supervised Training: Stochastic (Sub)Gradient Optimization (13:01)
Loss Function for a simple network (13:11)
Deep Nets with ReLUs (13:13)
Deep Convolutional Nets (and other deep neural nets) (15:42)
Deep Nets with ReLUs: Objective Function is Piecewise Polynomial (17:19)
Convolutional Networks (19:00)
Convolutional Network (19:08)
Early Hierarchical Feature Models for Vision (20:14)
The Convolutional Net Model (20:19)
Convolutional Network (ConvNet) (20:36)
Convolutional Network (vintage 1990) (20:38)
LeNet1 Demo from 1993 (20:44)
Brute Force Approach To Multiple Object Recognition (22:09)
Idea #1: Sliding Window ConvNet + Weighted FSM - 1 (22:24)
Idea #1: Sliding Window ConvNet + Weighted FSM - 2 (22:46)
Idea #1: Sliding Window ConvNet + Weighted FSM - 3 (23:22)
Idea #1: Sliding Window ConvNet + Weighted FSM - 4 (23:50)
Convolutional Networks In Visual Object Recognition (23:51)
We knew ConvNets worked well with characters and small images (23:56)
NORB Dataset (2004): 5 categories, multiple views and illuminations (24:03)
Mid-2000s: state-of-the-art results on face detection (24:07)
Simultaneous face detection and pose estimation - 1 (24:24)
Simultaneous face detection and pose estimation - 2 (24:27)
Visual Object Recognition with Convolutional Nets (24:35)
Late 2000s: we could get decent results on object recognition (25:15)
Object Recognition [Krizhevsky, Sutskever, Hinton 2012] (26:20)
Then, two things happened... (26:33)
ImageNet Large-Scale Visual Recognition Challenge (27:24)
Object Recognition [Krizhevsky, Sutskever, Hinton 2012] (27:31)
ConvNet-Based Google+ Photo Tagger (28:11)
NYU ConvNet Trained on ImageNet: OverFeat (28:13)
Kernels: Layer 1 (7x7) and Layer 2 (7x7) (29:39)
Untitled (29:41)
Untitled (29:44)
Classification + Localization: multiscale sliding window (29:51)
Applying a ConvNet on Sliding Windows is Very Cheap! (31:13)
Classification + Localization: sliding window + bounding box regression (31:16)
Detection: Examples - 1 (31:23)
Detection: Examples - 2 (31:46)
ImageNet 2013: Detection (32:01)
Results: pre-trained on ImageNet1K, fine-tuned on ImageNet Detection - 1 (32:36)
Results: pre-trained on ImageNet1K, fine-tuned on ImageNet Detection - 2 (32:40)
Results: pre-trained on ImageNet1K, fine-tuned on ImageNet Detection - 3 (32:44)
Detection: Difficult Examples (32:56)
Detection: Interesting Failures (33:01)
Detection: Bad Groundtruth (33:10)
ConvNets As Generic Feature Extractors (33:12)
Cats vs Dogs - 1 (33:29)
Cats vs Dogs - 2 (33:39)
Features are generic: Caltech 256 (33:44)
OverFeat Features -> Trained Classifier on other datasets (33:47)
Image Similarity Matching With Siamese Networks: Embedding, DrLIM (34:27)
DrLIM: Metric Learning (34:28)
Loss function (35:13)
Face Recognition: DeepFace (Facebook AI Research) - 1 (35:21)
Face Recognition: DeepFace (Facebook AI Research) - 2 (36:56)
DeepFace: performance (37:11)
Depth Estimation from Stereo Pairs (37:14)
Depth Estimation from Stereo Pairs: Results (38:33)
Body Pose Estimation (39:00)
Pose Estimation and Attribute Recovery with ConvNets (39:02)
Other Tasks for Which Deep Convolutional Nets are the Best (45:02)
Deep Learning and Convolutional Networks in Speech, Audio, and Signals (45:03)
Acoustic Modeling in Speech Recognition (Google) (45:08)
Energy-Based Unsupervised Learning (45:10)
Learning the Energy Function (45:57)
Seven Strategies to Shape the Energy Function (46:33)
#1: constant volume of low energy: Energy surface for PCA and K-means (49:54)
#2: push down of the energy of data points, push up everywhere else (50:23)
Untitled (50:24)
Dictionary Learning With Fast Approximate Inference: Sparse Auto-Encoders (50:26)
Sparse Modeling: Sparse Coding + Dictionary Learning (50:27)
Untitled (50:27)
#6: use a regularizer that limits the volume of space that has low energy (51:46)
Learning to Perform Approximate Inference: Predictive Sparse Decomposition / Sparse Auto-Encoders (51:59)
Sparse auto-encoder: Predictive Sparse Decomposition (PSD) (52:25)
Regularized Encoder-Decoder Model (Auto-Encoder) for Unsupervised Feature Learning (53:14)
PSD: Basis Functions on MNIST (53:17)
Predictive Sparse Decomposition (PSD): Training (53:23)
Learned Features on natural patches: V1-like receptive fields (53:28)
Learning to Perform Approximate Inference: LISTA (53:34)
Better Idea: Give the “right” structure to the encoder (53:36)
LISTA: Train We and S matrices to give a good approximation quickly (55:28)
Learning ISTA (LISTA) vs ISTA/FISTA (55:44)
LISTA with partial mutual inhibition matrix (56:10)
Learning Coordinate Descent (LCoD): faster than LISTA (56:16)
Convolutional Sparse Coding (56:24)
Convolutional PSD: Encoder with a soft sh() Function (56:34)
Convolutional Sparse Auto-Encoder on Natural Images (57:28)
Using PSD to Train a Hierarchy of Features - 1 (57:34)
Using PSD to Train a Hierarchy of Features - 2 (58:01)
Using PSD to Train a Hierarchy of Features - 3 (58:05)
Using PSD to Train a Hierarchy of Features - 4 (58:07)
Using PSD to Train a Hierarchy of Features - 5 (58:12)
Unsupervised + Supervised For Pedestrian Detection (58:13)
Untitled (58:16)
Pedestrian Detection, Face Detection (58:23)
ConvNet Architecture with Multi-Stage Features (59:17)
Pedestrian Detection: INRIA Dataset. Miss rate vs false positives (59:22)
Video - 1 (59:29)
Video - 2 (59:51)
Pedestrian Detection: INRIA Dataset. Miss rate vs false positives (59:53)
Unsupervised Learning: Invariant Features (59:54)
Learning Invariant Features with L2 Group Sparsity - 1 (59:59)
Learning Invariant Features with L2 Group Sparsity - 2 (01:00:15)
Groups are local in a 2D Topographic Map (01:00:44)
Image-level training, local filters but no weight sharing - 1 (01:02:12)
Image-level training, local filters but no weight sharing - 2 (01:02:15)
Topographic Maps (01:02:51)
Image-level training, local filters but no weight sharing (01:03:02)
Invariant Features: Lateral Inhibition (01:03:06)
Invariant Features via Lateral Inhibition: Structured Sparsity (01:04:23)
Invariant Features via Lateral Inhibition: Topographic Maps (01:05:08)
Invariant Features through Temporal Constancy (01:06:00)
What-Where Auto-Encoder Architecture (01:06:59)
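
Reference sketches

The chapters on "Deep Nets with ReLUs" (13:13, 17:19) note that the objective function of a ReLU network is piecewise polynomial. A minimal sketch of why, not taken from the talk (the layer sizes and random weights are arbitrary): for fixed weights, each ReLU is either active or inactive, so the network is piecewise linear in its input, and a squared-error loss is therefore piecewise polynomial in the weights.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 8)), np.zeros(16)   # hypothetical layer sizes
W2, b2 = rng.standard_normal((1, 16)), np.zeros(1)

def relu(z):
    return np.maximum(z, 0.0)

def net(x):
    # the active/inactive pattern of the ReLUs selects one linear region
    h = relu(W1 @ x + b1)
    return W2 @ h + b2

print(net(rng.standard_normal(8)))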
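The LISTA chapters (53:34 onward) build on ISTA, the standard iterative shrinkage-thresholding algorithm for sparse coding. Here is a minimal NumPy sketch of plain ISTA, assuming the usual L1-regularized reconstruction objective (the function names and parameter values are illustrative, not from the talk):

import numpy as np

def soft_threshold(v, t):
    # proximal operator of the L1 norm
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam=0.1, n_iter=100):
    # minimizes 0.5*||x - D z||^2 + lam*||z||_1 over the code z
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z + (D.T @ (x - D @ z)) / L, lam / L)
    return z

LISTA ("Learning ISTA") unrolls a small, fixed number of such steps and, as the chapter "LISTA: Train We and S matrices" (55:28) indicates, learns the matrices We and S (which in plain ISTA would be D.T/L and I - D.T@D/L) by backpropagation, so a few learned iterations approximate the sparse code that ISTA reaches after many.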
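The DrLIM chapters (34:27, 34:28, 35:13) cover metric learning with Siamese networks. One common form of the contrastive loss from DrLIM [Hadsell, Chopra, LeCun 2006], sketched here on the embedding distance; the pair-label convention below (1 = similar) is a choice, and write-ups of the loss differ on it:

import numpy as np

def contrastive_loss(d, same, margin=1.0):
    # d: Euclidean distance between the two embeddings G(x1), G(x2)
    # same: 1 for pairs labeled similar, 0 for dissimilar
    pull = same * 0.5 * d**2                                  # pull similar pairs together
    push = (1 - same) * 0.5 * np.maximum(margin - d, 0.0)**2  # push dissimilar pairs apart
    return pull + push

# e.g. a similar pair at distance 0.2 and a dissimilar pair at distance 1.5
print(contrastive_loss(np.array([0.2, 1.5]), np.array([1, 0])))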