
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Published on 2016-05-27 · 20,086 views
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that together reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
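The three stages of the pipeline map naturally onto a few lines of NumPy. Below is a minimal, self-contained sketch of each stage applied to a single weight matrix: magnitude pruning, k-means weight sharing with linear centroid initialization, and Huffman coding of the resulting cluster indices. The retraining and centroid fine-tuning passes that recover accuracy (covered in the talk) are omitted, and all function names and parameters here are illustrative rather than taken from the authors' code.

```python
import heapq
from collections import Counter
import numpy as np

def prune_by_magnitude(w, sparsity=0.9):
    """Stage 1 (pruning): zero out the smallest-magnitude weights."""
    thresh = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) > thresh
    return w * mask, mask

def kmeans_weight_sharing(w, mask, bits=4, n_iter=20):
    """Stage 2 (weight sharing): cluster the surviving weights so each
    is stored as a small index into one shared codebook of centroids."""
    k = 2 ** bits
    nz = w[mask]
    centroids = np.linspace(nz.min(), nz.max(), k)  # linear init over the range
    for _ in range(n_iter):
        codes = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
        for c in range(k):
            if np.any(codes == c):
                centroids[c] = nz[codes == c].mean()
    wq = w.copy()
    wq[mask] = centroids[codes]
    return wq, centroids, codes

def huffman_code_lengths(symbols):
    """Stage 3 (Huffman coding): per-symbol code lengths; frequent
    cluster indices get short codes, rare ones get long codes."""
    counts = Counter(symbols)
    # Heap entries: (count, unique tiebreak, {symbol: code length so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Usage on a random matrix (stand-in for one layer's weights):
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
wp, mask = prune_by_magnitude(w, sparsity=0.9)
wq, codebook, codes = kmeans_weight_sharing(wp, mask, bits=4)
lengths = huffman_code_lengths(codes.tolist())
avg_bits = sum(lengths[s] * n for s, n in Counter(codes.tolist()).items()) / len(codes)
print(f"average bits per nonzero weight: {avg_bits:.2f} (vs. 4 fixed-width)")
```

After pruning, only about 10% of the weights survive; weight sharing replaces each survivor's 32-bit value with a 4-bit codebook index, and Huffman coding squeezes those indices further because their distribution is far from uniform.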
Presentation
00:00  Deep Compression
00:31  Intro
00:51  Deep Learning: Next Wave of AI
01:00  Deep Learning on Mobile
01:10  DNN on the Cloud …
01:36  Model Size! - 1
02:28  Model Size! - 2
03:19  Problem: DNN Model Size
03:36  Deep Compression Overview
04:17  Deep Compression Pipeline - 1
04:27  Deep Compression Pipeline - 2
04:31  1. Pruning
05:13  Pruning: Motivation
05:41  AlexNet & VGGNet
06:23  Retrain to Recover Accuracy
07:02  Pruning: Result
07:16  Pruning RNN and LSTM
07:42  Pruning NeuralTalk and LSTM
08:17  Weight Distribution
08:40  Deep Compression Pipeline
08:46  2. Weight Sharing
09:24  Weight Sharing: Overview - 1
09:39  Weight Sharing: Overview - 2
09:44  Weight Sharing: Overview - 3
10:12  Weight Sharing: Overview - 4
10:17  Weight Sharing: Overview - 5
10:23  Weight Sharing: Overview - 6
10:28  Weight Sharing: Overview - 7
12:08  Bits Per Weight
12:13  Pruning + Trained Quantization - 1
12:33  Pruning + Trained Quantization - 2
13:05  Finetune Centroids
13:21  Deep Compression Pipeline - 3
13:29  3. Huffman Coding
13:38  Huffman Coding
13:49  Deep Compression Result on 4 Convnets
14:32  Speedup/Energy Efficiency on CPU/GPU - 1
14:45  Speedup/Energy Efficiency on CPU/GPU - 2
14:59  EIE: Efficient Inference Engine
15:48  Conclusion
17:01  Acknowledgment
17:09  Thank you!