GPU programming for Deep Learning

Published on Aug 23, 2016 · 5884 Views

Chapter list

GPU programming for DL (00:00)
Outline (00:25)
GPU Computing - 1 (01:46)
GPU Computing - 2 (01:49)
CUDA - 1 (02:42)
GPU accelerated libraries (03:23)
Deep Neural Networks and GPUs (04:09)
Accelerating insights (04:19)
Recent improvements (05:42)
NVIDIA cuDNN (06:31)
Accelerating linear algebra: cuBLAS (07:25)
Accelerating sparse operations: cuSPARSE (08:02)
Multi-GPU communication: NCCL (08:23)
NCCL Example (10:15)
Platform (10:42)
Developer workstation (10:58)
World’s First Deep Learning Supercomputer (12:25)
Tesla P100 accelerator (14:19)
GIE (GPU Inference Engine) (17:44)
Jetson TX1 devkit (19:27)
Optimizations (20:45)
Performance (21:13)
Interactive Deep Learning GPU Training System (21:59)
CUDA - 2 (22:19)
GPU architecture (30:21)
Two Main Components (30:37)
Streaming Multiprocessor (SM) (31:47)
GPU memory hierarchy review (32:16)
CUDA programming model (32:43)
Anatomy of a CUDA C/C++ application (32:58)
C with a few keywords (33:45)
CUDA kernels - 1 (34:51)
CUDA kernels - 2 (35:55)
CUDA Kernels: Subdivide into Blocks - 1 (36:34)
CUDA Kernels: Subdivide into Blocks - 2 (36:55)
CUDA Kernels: Subdivide into Blocks - 3 (37:05)
Kernel Execution (37:17)
Thread blocks allow cooperation (37:45)
Thread blocks allow scalability (38:24)
Memory System Hierarchy (39:18)
Memory hierarchy - 1 (39:20)
Memory hierarchy - 2 (39:31)
Memory hierarchy - 3 (39:43)
Memory hierarchy - 4 (40:18)
Memory hierarchy - 5 (40:53)
CUDA memory management (41:34)
Memory spaces (41:37)
GPU memory allocation/release (43:04)
Data copies (43:33)
Basic kernels and execution (44:49)
CUDA programming model revisited (44:51)
Thread hierarchy (45:22)
IDs and dimensions - 1 (45:44)
IDs and dimensions - 2 (48:02)
Launching kernels on GPU (48:08)
GPU kernel execution (48:31)
Blocks must be independent (49:25)
Hands-on labs (49:43)
Prepare and Start AWS Instance (50:51)
Software (51:05)
Want to try? (53:57)
Join Nvidia (54:26)
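The CUDA chapters above (kernels, "C with a few keywords", GPU memory allocation/release, data copies, IDs and dimensions, launching kernels) can be summed up in one minimal vector-add sketch. This is not code from the talk, just a standard illustration of the CUDA runtime API calls those chapters cover:

```cuda
// Minimal CUDA example: a __global__ kernel, cudaMalloc/cudaFree,
// cudaMemcpy in both directions, and blockIdx/threadIdx indexing.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// "C with a few keywords": __global__ marks a function launched from the
// host that runs on the GPU, one instance per thread.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    // Global thread ID from the block/thread hierarchy ("IDs and dimensions").
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // blocks must be independent
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host (CPU) buffers.
    float* ha = (float*)malloc(bytes);
    float* hb = (float*)malloc(bytes);
    float* hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // GPU memory allocation ("GPU memory allocation/release").
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);

    // Data copies: host -> device.
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launching kernels on GPU: subdivide the problem into blocks of threads.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(da, db, dc, n);

    // Device -> host copy; cudaMemcpy waits for the kernel to finish.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);

    // Release GPU and host memory.
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Compile with `nvcc vecadd.cu -o vecadd`; each thread handles one element, which is why the bounds check `i < n` is needed when `n` is not a multiple of the block size.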