Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Published on 2016-05-27

Song Han

Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three-stage pipeline of pruning, trained quantization and Huffman coding.
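The title names three stages: pruning, trained quantization (weight sharing), and Huffman coding. Below is a minimal pure-Python sketch of how the three can be chained on a flat list of weights. It is not the author's implementation. It assumes a fixed magnitude threshold for pruning, 1-D k-means with linear centroid initialization for weight sharing, and Huffman coding of the cluster indices only. It also omits the retraining steps the outline mentions ("Retrain to Recover Accuracy", "Finetune Centroids") and stores surviving positions as a plain list instead of a compact sparse format. All function names are hypothetical.

```python
import heapq
from collections import Counter


def prune(weights, threshold):
    """Stage 1 (sketch): magnitude pruning; weights with |w| < threshold become 0."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def kmeans_1d(values, k, iters=20):
    """Stage 2 (sketch): 1-D k-means over surviving weights, linearly initialized.

    Returns (centroids, assignments); each weight is then represented by a
    small cluster index instead of a full-precision float.
    """
    if k < 2 or not values:
        raise ValueError("need k >= 2 and at least one value")
    lo, hi = min(values), max(values)
    # Linear initialization: k centroids evenly spaced over [lo, hi].
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    assign = []
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each weight.
        assign = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assign


def huffman_code(symbols):
    """Stage 3 (sketch): build a prefix-free Huffman code {symbol: bit string}."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: partial code}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


def compress(weights, threshold, k):
    """Prune, share weights via k centroids, then Huffman-code the indices."""
    pruned = prune(weights, threshold)
    positions = [i for i, w in enumerate(pruned) if w != 0.0]
    survivors = [pruned[i] for i in positions]
    centroids, assign = kmeans_1d(survivors, k)
    code = huffman_code(assign)
    bitstream = "".join(code[a] for a in assign)
    return positions, centroids, code, bitstream


def decompress(n, positions, centroids, code, bitstream):
    """Rebuild a dense weight list; greedy decoding works because the code is prefix-free."""
    decode = {bits: sym for sym, bits in code.items()}
    out = [0.0] * n
    buf, idx = "", 0
    for bit in bitstream:
        buf += bit
        if buf in decode:
            out[positions[idx]] = centroids[decode[buf]]
            idx += 1
            buf = ""
    return out


weights = [0.9, -0.05, 0.02, -0.8, 0.85, 0.01, -0.75, 0.4]
positions, centroids, code, bits = compress(weights, threshold=0.1, k=2)
restored = decompress(len(weights), positions, centroids, code, bits)
print(positions, code, bits)
print(restored)
```

In this toy run, the three small weights are pruned to zero, and the five survivors collapse onto two shared values, each stored as a one-bit Huffman-coded index. Real layers have far more weights per centroid, which is where the savings come from.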

ICLR 2016 - San Juan

Presentation

Deep Compression (00:00)
Intro (00:31)
Deep Learning: Next Wave of AI (00:51)
Deep Learning on Mobile (01:00)
DNN on the Cloud … (01:10)
Model Size! - 1 (01:36)
Model Size! - 2 (02:28)
Problem: DNN Model Size (03:19)
Deep Compression Overview (03:36)
Deep Compression Pipeline - 1 (04:17)
Deep Compression Pipeline - 2 (04:27)
1. Pruning (04:31)
Pruning: Motivation (05:13)
AlexNet & VGGNet (05:41)
Retrain to Recover Accuracy (06:23)
Pruning: Result (07:02)
Pruning RNN and LSTM (07:16)
Pruning NeuralTalk and LSTM (07:42)
Weight Distribution (08:17)
Deep Compression Pipeline (08:40)
2. Weight Sharing (08:46)
Weight Sharing: Overview - 1 (09:24)
Weight Sharing: Overview - 2 (09:39)
Weight Sharing: Overview - 3 (09:44)
Weight Sharing: Overview - 4 (10:12)
Weight Sharing: Overview - 5 (10:17)
Weight Sharing: Overview - 6 (10:23)
Weight Sharing: Overview - 7 (10:28)
Bits Per Weight (12:08)
Pruning + Trained Quantization - 1 (12:13)
Pruning + Trained Quantization - 2 (12:33)
Finetune Centroids (13:05)
Deep Compression Pipeline - 3 (13:21)
3. Huffman Coding (13:29)
Huffman Coding (13:38)
Deep Compression Result on 4 Convnets (13:49)
Speedup/Energy Efficiency on CPU/GPU - 1 (14:32)
Speedup/Energy Efficiency on CPU/GPU - 2 (14:45)
EIE: Efficient Inference Engine (14:59)
Conclusion (15:48)
Acknowledgment (17:01)
Thank you! (17:09)
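Two slides in the outline, "Finetune Centroids" and "Bits Per Weight", relate to details of weight sharing that can be written down compactly. The sketch below follows the formulation in the accompanying ICLR 2016 paper as we understand it. During retraining, the gradient of each shared centroid is the sum of the gradients of all weights assigned to it. Sharing k centroids among n weights of b bits each gives a compression rate of nb / (n log2 k + kb). The helper names, the plain SGD update, and the use of an exact (non-integer) log2 k are illustrative assumptions. Practical encoders round the index width up to whole bits.

```python
import math


def centroid_gradients(weight_grads, assignments, k):
    """Gradient of each shared centroid: the sum of the gradients of all
    weights assigned to that centroid (hypothetical helper)."""
    grads = [0.0] * k
    for g, a in zip(weight_grads, assignments):
        grads[a] += g
    return grads


def finetune_centroids(centroids, weight_grads, assignments, lr):
    """One plain SGD step on the centroids (the optimizer is an assumption)."""
    grads = centroid_gradients(weight_grads, assignments, len(centroids))
    return [c - lr * g for c, g in zip(centroids, grads)]


def weight_sharing_rate(n, b, k):
    """Compression rate n*b / (n*log2(k) + k*b): n weights of b bits each are
    replaced by log2(k)-bit indices plus a table of k b-bit centroids."""
    return (n * b) / (n * math.log2(k) + k * b)


# Example: 16 float32 weights shared among 4 centroids.
print(weight_sharing_rate(16, 32, 4))
print(finetune_centroids([1.0, 2.0], [0.5, 0.5, 0.5], [0, 0, 1], lr=0.1))
```

For 16 weights of 32 bits and 4 centroids, the rate is 512 / (16 × 2 + 4 × 32) = 512 / 160 = 3.2. The index cost grows only logarithmically in k, while the centroid table grows linearly, so a small k pays off when n is large.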