GPU programming for Deep Learning
Published on Aug 23, 2016
5890 Views
Chapter list
GPU programming for DL (00:00)
Outline (00:25)
GPU Computing - 1 (01:46)
GPU Computing - 2 (01:49)
CUDA - 1 (02:42)
GPU accelerated libraries (03:23)
Deep Neural Networks and GPUs (04:09)
Accelerating insights (04:19)
Recent improvements (05:42)
NVIDIA cuDNN (06:31)
Accelerating linear algebra: cuBLAS (07:25)
Accelerating sparse operations: cuSPARSE (08:02)
Multi-GPU communication: NCCL (08:23)
NCCL Example (10:15)
Platform (10:42)
Developer workstation (10:58)
World’s First Deep Learning Supercomputer (12:25)
Tesla P100 accelerator (14:19)
GIE (GPU Inference Engine) (17:44)
Jetson TX1 devkit (19:27)
Optimizations (20:45)
Performance (21:13)
Interactive Deep Learning GPU Training System (21:59)
CUDA - 2 (22:19)
GPU architecture (30:21)
Two Main Components (30:37)
Streaming Multiprocessor (SM) (31:47)
GPU memory hierarchy review (32:16)
CUDA Programming model (32:43)
Anatomy of a CUDA C/C++ application (32:58)
C with a few keywords (33:45)
CUDA kernels - 1 (34:51)
CUDA kernels - 2 (35:55)
CUDA Kernels: Subdivide into Blocks - 1 (36:34)
CUDA Kernels: Subdivide into Blocks - 2 (36:55)
CUDA Kernels: Subdivide into Blocks - 3 (37:05)
Kernel Execution (37:17)
Thread blocks allow cooperation (37:45)
Thread blocks allow scalability (38:24)
Memory System Hierarchy (39:18)
Memory hierarchy - 1 (39:20)
Memory hierarchy - 2 (39:31)
Memory hierarchy - 3 (39:43)
Memory hierarchy - 4 (40:18)
Memory hierarchy - 5 (40:53)
CUDA memory management (41:34)
Memory spaces (41:37)
GPU memory allocation/release (43:04)
Data copies (43:33)
Basic kernels and execution (44:49)
CUDA programming model revisited (44:51)
Thread hierarchy (45:22)
IDs and dimensions - 1 (45:44)
IDs and dimensions - 2 (48:02)
Launching kernels on GPU (48:08)
GPU kernel execution (48:31)
Blocks must be independent (49:25)
Hands-on labs (49:43)
Prepare and Start AWS Instance (50:51)
Software (51:05)
Want to try? (53:57)
Join NVIDIA (54:26)