Tutorial 2

Performance Engineering at the Kernel Level: Tools and Techniques for Dependency and Wait Analysis

April 15, 2023

Author: Naser Ezzati-Jivan, Assistant Professor at Brock University, Ontario, Canada

Dr. Naser Ezzati-Jivan is an assistant professor in the Department of Computer Science at Brock University, with over 15 years of experience as a researcher and educator in software engineering, software analysis and testing, and software performance engineering. He has taught courses in these areas at Brock University, including “Software Analysis and Testing,” “Software Performance Engineering,” and “Software Engineering.” He is an experienced speaker and presenter and recently gave a keynote titled “Observability-Driven Software Development and Operations” at the WIDECOM 2022 conference. Dr. Ezzati-Jivan has collaborated with engineers and researchers at firms such as Google Montreal (where he worked on Chrome browser analysis), Ericsson, and Ciena. He is an experienced researcher and project leader with expertise in efficient software tracing and analysis algorithms and in data abstraction, visualization, and analysis tools.

Abstract

This tutorial will provide attendees with hands-on experience in using kernel-level tracing tools, such as ftrace, perf, eBPF, LTTng, and Trace Compass, to identify and diagnose performance issues in systems and applications. The tutorial will focus on analyzing dependencies between threads and system resources, a crucial step in identifying performance bugs. Such dependencies are particularly hard to find with user-level performance analysis tools, which have limited visibility into the operating system and the underlying processes. Attendees will learn how to use kernel-level tracing tools to detect and analyze different types of waits and contention, such as scheduler waits, memory management waits, disk-level waits, network congestion, lock contention, and I/O contention. By understanding these dependencies, attendees will be able to identify the root causes of performance issues and fix them. This tutorial is designed for system administrators, performance engineers, and developers who want to improve the performance and scalability of their systems and applications. Attendees will leave with the skills and knowledge to diagnose and fix performance issues on their own systems by analyzing the dependencies between threads and system resources.

Outline

  • Introduction (15 minutes): Overview of the tutorial and its objectives
  • Concepts and Fundamentals (30 minutes): Overview of the complex interactions and relationships between threads, system resources, and the underlying kernel mechanisms
  • Kernel-level tracing tools (30 minutes): Overview and demonstration of the different types of kernel-level tracing tools such as ftrace, perf, eBPF, LTTng, and Trace Compass
  • Hands-on exercises (45 minutes): Attendees will work on exercises using the tracing tools to identify different types of waits and blocks, such as scheduler waits, memory management waits, and disk-level waits
  • Diagnosing and fixing performance issues (45 minutes): Discussion of how to diagnose and fix performance issues caused by dependencies and waits at the kernel level
  • Conclusion (15 minutes): Summary of the tutorial and its objectives, and next steps for attendees
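
As a rough illustration of the kind of scheduler-wait analysis mentioned in the hands-on exercises, the following minimal Python sketch separates, per thread, the time spent blocked (waiting for a resource) from the time spent runnable but waiting for a CPU. It is not taken from the tutorial material. The event records, the field names (prev_tid, prev_state, next_tid, tid), and the use of "R" to mark a still-runnable thread are simplifying assumptions loosely modeled on scheduler switch and wakeup tracepoints; a real trace would need a parser and carries more fields.

```python
from collections import defaultdict

# Hypothetical, hand-written event records: (timestamp_ns, name, fields).
# These are simplified stand-ins for scheduler tracepoints, not real trace output.
events = [
    (1000, "sched_switch", {"prev_tid": 1, "prev_state": "S", "next_tid": 2}),
    (4000, "sched_wakeup", {"tid": 1}),
    (4500, "sched_switch", {"prev_tid": 2, "prev_state": "R", "next_tid": 1}),
    (6000, "sched_switch", {"prev_tid": 1, "prev_state": "R", "next_tid": 2}),
    (9000, "sched_switch", {"prev_tid": 2, "prev_state": "S", "next_tid": 1}),
]


def wait_times(events):
    """Return per-thread blocked time and run-queue wait time in nanoseconds."""
    blocked_since = {}   # tid -> time it was switched out while not runnable
    runnable_since = {}  # tid -> time it became runnable but was not on a CPU
    totals = defaultdict(lambda: {"blocked_ns": 0, "runqueue_ns": 0})

    for ts, name, fields in sorted(events, key=lambda e: e[0]):
        if name == "sched_switch":
            prev, nxt = fields["prev_tid"], fields["next_tid"]
            if fields["prev_state"] == "R":
                # Preempted: the thread is still runnable and waits for a CPU.
                runnable_since[prev] = ts
            else:
                # Switched out while sleeping: it waits for some resource.
                blocked_since[prev] = ts
            if nxt in runnable_since:
                totals[nxt]["runqueue_ns"] += ts - runnable_since.pop(nxt)
        elif name == "sched_wakeup":
            tid = fields["tid"]
            if tid in blocked_since:
                totals[tid]["blocked_ns"] += ts - blocked_since.pop(tid)
                runnable_since[tid] = ts

    return {tid: dict(t) for tid, t in totals.items()}


result = wait_times(events)
for tid, t in result.items():
    print(tid, t)
```

A large blocked time points toward a dependency on another thread or a device, while a large run-queue time points toward CPU contention; tools such as Trace Compass present this kind of breakdown graphically.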