Advanced OpenACC (Part 3)

Published on Sep 19, 2016 · 1091 views

Chapter list

00:00 Advanced OpenACC
01:48 Outline - 1
02:09 Outline - 2
03:34 Targeting the Architecture (But Not Admitting It)
03:55 OpenACC Task Granularity
04:49 Targeting the Architecture
05:55 NVIDIA GPU Task Granularity
08:26 Warps – on Kepler
10:10 Determining block size – on Kepler
11:09 Determining grid size – on Kepler
11:28 Mapping OpenACC to CUDA Threads and Blocks
12:53 SAXPY Returns For Some Fine Tuning
14:19 Rapid Evolution - 1
15:17 Rapid Evolution - 2
15:46 Parallel Regions vs. Kernels
18:31 Parallel Construct
18:41 Parallel Clauses
18:58 Parallel Regions
20:22 Compare and Contrast - 1
20:47 Compare and Contrast - 2
21:19 Parallel Regions vs. Kernels - 1
22:43 Parallel Regions vs. Kernels - 2
24:40 Parallel Regions vs. Kernels (Which is best?)
26:13 OpenACC 2.0 & 2.5
26:45 Procedure Calls
27:51 Nested Parallelism - 1
28:49 Nested Parallelism - 2
29:30 Device Specific Tuning
30:51 Multiple Devices and Multiple Threads
32:31 Asynchronous Behavior
33:17 Data Management
35:44 Profiling
36:21 Mandelbrot Code
37:29 Lots of Data Transfer Time
37:38 Broken Into Blocks With Asynchronous Transfers
38:00 Optimized In A Few Well-Informed Stages
38:50 OpenACC Things Not Covered