Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Published on May 27, 2016 · 20,027 views
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three-stage pipeline: pruning, trained quantization, and Huffman coding, which work together to reduce the storage requirement of neural networks without affecting their accuracy.
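To make the three stages concrete, here is a minimal, illustrative Python sketch of the pipeline applied to a single weight matrix: magnitude pruning, k-means weight sharing, and Huffman coding of the cluster indices. The function names, threshold, cluster count, and iteration count are assumptions for illustration, not the paper's code or tuned hyperparameters; the paper also retrains after pruning and fine-tunes the centroids, which this sketch omits.

```python
import heapq
from collections import Counter

import numpy as np

# Hyperparameters below (threshold, cluster count, iterations) are
# placeholder choices for illustration, not the paper's tuned values.

def prune(weights, threshold=0.1):
    """Stage 1: zero out weights whose magnitude is below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def kmeans_quantize(weights, mask, n_clusters=16, n_iters=10):
    """Stage 2 (weight sharing): cluster the surviving weights with k-means,
    so each weight is stored as a small cluster index plus a shared centroid."""
    vals = weights[mask]
    # Linear initialization of centroids across the value range.
    centroids = np.linspace(vals.min(), vals.max(), n_clusters)
    for _ in range(n_iters):
        assign = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            members = vals[assign == k]
            if members.size:
                centroids[k] = members.mean()
    return assign, centroids

def huffman_code(symbols):
    """Stage 3: build a Huffman code over the cluster indices, so frequent
    indices get shorter bit strings."""
    heap = [(f, [(s, "")]) for s, f in Counter(symbols).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, c1 = heapq.heappop(heap)
        f2, c2 = heapq.heappop(heap)
        merged = [(s, "0" + c) for s, c in c1] + [(s, "1" + c) for s, c in c2]
        heapq.heappush(heap, (f1 + f2, merged))
    return dict(heap[0][1])

# Example: compress one randomly initialized weight matrix.
W = np.random.randn(256, 256).astype(np.float32)
W_pruned, mask = prune(W)
assign, centroids = kmeans_quantize(W_pruned, mask)
codes = huffman_code(assign.tolist())
avg_bits = sum(len(codes[s]) for s in assign.tolist()) / assign.size
print(f"nonzero: {mask.mean():.1%}, avg bits per weight index: {avg_bits:.2f}")
```

In the full method, each stage feeds the next: pruning shrinks the set of values to cluster, and quantization shrinks the symbol alphabet that Huffman coding compresses, which is why the stages compound.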
Chapter list
00:00 Deep Compression
00:31 Intro
00:51 Deep Learning: Next Wave of AI
01:00 Deep Learning on Mobile
01:10 DNN on the Cloud …
01:36 Model Size! - 1
02:28 Model Size! - 2
03:19 Problem: DNN Model Size
03:36 Deep Compression Overview
04:17 Deep Compression Pipeline - 1
04:27 Deep Compression Pipeline - 2
04:31 1. Pruning
05:13 Pruning: Motivation
05:41 AlexNet & VGGNet
06:23 Retrain to Recover Accuracy
07:02 Pruning: Result
07:16 Pruning RNN and LSTM
07:42 Pruning NeuralTalk and LSTM
08:17 Weight Distribution
08:40 Deep Compression Pipeline
08:46 2. Weight Sharing
09:24 Weight Sharing: Overview - 1
09:39 Weight Sharing: Overview - 2
09:44 Weight Sharing: Overview - 3
10:12 Weight Sharing: Overview - 4
10:17 Weight Sharing: Overview - 5
10:23 Weight Sharing: Overview - 6
10:28 Weight Sharing: Overview - 7
11:49 Weight Distribution
12:08 Bits Per Weight
12:13 Pruning + Trained Quantization - 1
12:33 Pruning + Trained Quantization - 2
13:05 Finetune Centroids
13:21 Deep Compression Pipeline - 3
13:29 3. Huffman Coding
13:38 Huffman Coding
13:49 Deep Compression Result on 4 Convnets
14:32 Speedup/Energy Efficiency on CPU/GPU - 1
14:45 Speedup/Energy Efficiency on CPU/GPU - 2
14:59 EIE: Efficient Inference Engine
15:48 Conclusion
17:01 Acknowledgment
17:09 Thank you!