
Performance Analysis and Optimization (Part 1)

Published on Sep 19, 2016
1041 Views

Chapter list

00:00 Performance Engineering of Parallel Applications
01:32 Acknowledgment
02:01 Outline for Performance Sessions
03:51 Fitting algorithms to hardware…and vice versa
05:45 Code Development and Optimization Process
07:56 Performance engineering workflow - 1
07:56 Performance engineering workflow - 2
08:36 A little background...
08:49 Hardware Counters
10:24 Features of PAPI
11:38 Measurement Techniques
13:12 Inclusive and Exclusive Profiles
14:39 Applying Performance Tools to Improve Parallel Performance of the UNRES MD code
15:38 Structure of UNRES
16:30 Performance Engineering: Procedure - 1
18:34 Performance Engineering: Procedure - 2
22:03 Is There a Performance Problem?
22:06 Detecting Performance Problems
23:35 Use a Sampling Tool for Initial Performance Check
24:40 UNRES: Serial Performance
25:24 UNRES: Parallel Performance
25:53 Performance Engineering: Procedure - 3
26:01 Which Functions are Important?
26:41 Contributions of Functions
27:04 UNRES Function Summary
27:41 Performance Engineering: Procedure - 4
27:49 Digging Deeper: Instrument Key Functions
28:38 Choose a tool: there are many!
29:14 TAU: Tuning and Analysis Utilities
29:38 General Instructions for TAU
29:59 Untitled
30:33 Tiny Routines: High Overhead
31:02 Reducing Overhead
31:20 Selective Instrumentation File - 1
31:36 Selective Instrumentation File - 2
32:35 Getting a Call Path with TAU
32:56 Getting Call Path Information
33:10 Isolate regions of code execution
33:21 Key UNRES Functions in TAU (with Startup Time)
33:33 Key UNRES Functions (MD Time Only)
33:57 Performance Engineering: Procedure - 5
34:06 Detecting Serial Performance Issues
35:49 Create a Derived Metric in Paraprof Manager
36:07 Perf of EELEC (peak is 2)
36:49 Performance Engineering: Procedure - 6
36:51 Do compiler optimization first! EELEC – After forcing inlining with compiler
37:50 Further Info on Serial Optimization
38:16 Performance Engineering: Procedure - 7
39:24 TAU Recipe #1: Detecting Serial Bottlenecks
39:32 Serial Bottleneck Detection in UNRES: Function Scaling (2-32 cores) - 1
40:43 Serial Bottleneck Detection in UNRES: Function Scaling (2-32 cores) - 2
40:47 TAU Recipe #2: Detecting Parallel Load Imbalance
40:55 Load Imbalance Detection in UNRES - 1
41:14 Load Imbalance Detection in UNRES - 2
41:16 Load Imbalance Detection in UNRES - 3
41:29 Load Imbalance Detection in UNRES - 4
41:31 Load Imbalance Detection in UNRES - 5
42:13 Major Serial Bottleneck and Load Imbalance in UNRES Eliminated - 1
42:32 Major Serial Bottleneck and Load Imbalance in UNRES Eliminated - 2
42:55 Next Iteration of Performance Engineering with Optimized Code
43:09 Use Call Path Information: MPI Calls
43:20 Use Call Path Information: MPI Calls
43:26 Performance Engineering: Procedure - 8
43:31 Some Take-Home Points
01:04:57 Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir
01:05:03 Performance engineering workflow
01:05:11 Fragmentation of tools landscape
01:05:42 Scalasca ↔ TAU ↔ VAMPIR ↔ Paraver
01:07:38 Score-P project idea
01:08:23 Partners
01:08:23 Score-P overview
01:13:08 Future features and management
01:13:09 Hands-on: NPB-MZ-MPI / BT