Tutorial 1: Sat April 15 (Half-day tutorial - 3 hours)

Tutorial Title: Parallel performance engineering using Score-P and Vampir

Tutorial Abstract

This tutorial will introduce participants to the Score-P measurement system and the Vampir trace visualization tool for performance analysis. We will provide examples and hands-on exercises covering the full performance engineering workflow on applications that include MPI, OpenMP, and GPU parallelism. Participants will learn the following concepts:

  • How to collect an initial profile of their code with Score-P
  • Evaluation of that profile and its associated measurement overhead
  • The concepts of scoring a profile and filtering a measurement
  • How to control the Score-P measurement system via environment variables
  • How to collect useful traces with acceptable overhead
  • How to understand trace visualization in Vampir

Presenter information

Holger Brunst (TU Dresden, Germany), William R. Williams (TU Dresden, Germany)

Tutorial 2: Sat April 15 (Half-day tutorial - 3 hours)

Tutorial Title: Performance Engineering at the Kernel Level: Tools and Techniques for Dependency and Wait Analysis

Tutorial Abstract

This tutorial will provide attendees with hands-on experience in using kernel-level tracing tools, such as ftrace, Perf, eBPF, LTTng, and Trace Compass, to identify and diagnose performance issues in systems and applications. The tutorial will focus on analyzing dependencies between threads and system resources, a crucial step in identifying performance bugs. Such dependencies are particularly hard to identify with user-level performance analysis tools, which have limited visibility into the operating system and the underlying processes. Attendees will learn how to use kernel-level tracing tools to detect and analyze different types of waits and contention, such as scheduler waits, memory management waits, disk-level waits, network congestion, lock contention, and I/O contention. By understanding these dependencies, attendees will be able to identify the root causes of performance issues and fix them. This tutorial is designed for system administrators, performance engineers, and developers who want to improve the performance and scalability of their systems and applications. Attendees will leave with the skills and knowledge to diagnose and fix performance issues on their own systems using kernel-level tracing tools.

Presenter information

Naser Ezzati-Jivan (Brock University, Canada)

Tutorial 3: Sun April 16 (Full-day tutorial - 6 hours)

Tutorial Title: Core-Level Performance Engineering with the Open-Source Architecture Code Analyzer (OSACA) and the Compiler Explorer

Tutorial Abstract

While many developers put a lot of effort into optimizing large-scale parallelism, they often neglect the importance of efficient serial code. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted because no definite hardware performance limit ("bottleneck") is exhausted. This tutorial conveys the knowledge required to develop a thorough understanding of the interactions between software and hardware on the level of a single CPU core and the lowest memory hierarchy level (the L1 cache). We introduce general out-of-order core architectures and their typical performance bottlenecks using modern x86-64 (Intel Ice Lake) and ARM (Fujitsu A64FX) processors as examples. We then go into detail about x86 and AArch64 assembly code, specifically including vectorization (SIMD), pipeline utilization, critical paths, and loop-carried dependencies. We also demonstrate performance analysis and performance engineering using the Open-Source Architecture Code Analyzer (OSACA) in combination with a dedicated instance of the well-known Compiler Explorer. Various hands-on exercises will allow attendees to conduct their own experiments and measurements and identify in-core performance bottlenecks. Furthermore, we show real-life use cases to emphasize how profitable in-core performance engineering can be.

Presenter information

Georg Hager (Friedrich-Alexander-Universität, Germany), Jan Laukemann (Friedrich-Alexander-Universität, Germany)