Benchmarking

Measuring software performance reliably is remarkably difficult. It is a specialized version of a more general problem: finding a signal in a world full of noise. A benchmark that reports a 5% improvement might just be measuring thermal throttling, noisy neighbors, or the phase of the moon. In this talk, we walk through the full stack of reliable performance measurement, from controlling the benchmarking environment (bare-metal instances, CPU affinity, disabling SMT and dynamic frequency scaling) to designing benchmarks that are both representative and repeatable. We cover the statistical methods needed to interpret results correctly (hypothesis testing, change point detection) and show how to integrate continuous benchmarking into development workflows so that regressions are caught before they reach production. ...
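As a rough illustration of the hypothesis-testing idea, here is a minimal sketch of a two-sided permutation test on the difference in mean run time between a baseline and a candidate build. It is not necessarily the method the talk presents. The function name `permutation_test` and the timings in the usage example are hypothetical and made up for illustration. The sketch also assumes the individual runs are independent, which only holds once the environment noise described above is under control.

```python
import random
import statistics


def permutation_test(baseline, candidate, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean run time.

    Hypothetical helper for illustration. Assumes each timing is an
    independent sample. Returns (observed_diff, p_value), where
    observed_diff = mean(candidate) - mean(baseline).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = statistics.fmean(candidate) - statistics.fmean(baseline)

    pooled = list(baseline) + list(candidate)
    n_base = len(baseline)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the labels are exchangeable, so
        # shuffle the pooled timings and re-split them into two groups.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[n_base:]) - statistics.fmean(pooled[:n_base])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up timings in milliseconds, not real measurements.
    baseline = [10.1, 10.3, 9.9, 10.2, 10.0, 10.1]
    candidate = [9.5, 9.6, 9.4, 9.7, 9.5, 9.6]
    diff, p = permutation_test(baseline, candidate)
    print(f"mean difference: {diff:.3f} ms, p-value: {p:.4f}")
```

A permutation test is used here because it makes no normality assumption about run times, which are often skewed by outliers. A small p-value only says the difference is unlikely to be noise under the independence assumption. It does not say the difference matters in practice.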