Using OpenACC With CUDA Libraries (Part 4)

Published on Sep 19, 2016 (1688 views)

Chapter list

Using OpenACC With CUDA Libraries (00:00)
3 Ways to Accelerate Applications - 1 (05:27)
3 Ways to Accelerate Applications - 2 (05:34)
3 Ways to Accelerate Applications - 3 (05:39)
3 Ways to Accelerate Applications - 4 (05:43)
3 Ways to Accelerate Applications - 5 (05:44)
CUDA Math Libraries (05:53)
How To Use CUDA Libraries With OpenACC (07:15)
Sharing data with libraries (07:17)
deviceptr Data Clause (07:19)
host_data Construct (07:20)
Example: 1D convolution using CUFFT (09:05)
Source Excerpt - 1 (09:11)
Source Excerpt - 2 (13:15)
OpenACC Convolution Code (13:25)
Linking CUFFT (14:15)
Result (14:31)
Summary (14:50)
Appendix (15:10)
cuFFT: Multi-dimensional FFTs (15:14)
FFTs up to 10x Faster than MKL (15:15)
CUDA 4.1 optimizes 3D transforms (15:21)
cuBLAS: Dense Linear Algebra on GPUs (15:23)
cuBLAS Level 3 Performance (15:24)
ZGEMM Performance vs Intel MKL (15:28)
cuBLAS Batched GEMM API improves performance on batches of small matrices (15:30)
cuSPARSE: Sparse linear algebra routines (15:31)
OpenMP 4.0 (now 4.5) for Accelerators (19:09)
OpenACC vs. OpenMP (19:11)
OpenMP Thread Control Philosophy (21:24)
Intel’s MIC Approach (22:08)
What is MIC? (23:35)
MIC Architecture - 1 (24:06)
MIC Architecture - 2 (24:27)
OpenMP 4.0 Data Migration (26:14)
SAXPY in OpenMP 4.0 on NVIDIA (29:22)
Comparing OpenACC with OpenMP 4.0 on NVIDIA & Phi (30:15)
OpenMP 4.0 Across Architectures (30:48)
Which way to go? (31:06)
So, at this time… (31:49)
Going Hostless (36:27)
Some things we did not mention (37:37)