jm.dev
j@jm.dev · @jmq_en
linkedin.com/in/jqq · github.com/jmqd
Tokyo, Japan 🇯🇵
Performance

What is performance?

There's a cohort of smart folks these days who don't like the word "performant", presumably because it's unspecific. I get it. I've always charitably interpreted "performant" to mean "reasonably optimized for some mix of minimizing latency, maximizing throughput, minimizing memory, etc., without unjustifiably sacrificing any other performance characteristic." Obviously some of these compete with each other, so it's ~nonspecific.

I think the best articulation of performance is: efficiently minimize space, time, and energy. At a certain level, thinking about performance starts to feel like an entry-level physics course on thermodynamics.

Why care about performance?

Performance is inherently about resourcefulness. Our world has limited space and energy, and we have limited time. We ought to not waste these finite resources. To me, resourcefulness and sustainability are sufficient reasons to answer the why question, but I also just like tools that are fast.

Performance Characteristics

| Performance Characteristic | Classification |
| --- | --- |
| RAM usage | Space |
| Instruction count | Time |
| Execution latency | Time |
| Throughput | Time (a scalar measure of work per unit time) |
| Code size (e.g. binary file size) | Space |
| Number of cores used | Energy |
| CPU load time (non-idle time) | Energy |
| Number of computers used | Energy |
| Clockrate of CPU(s) | Energy |
| Clockrate of memory | Energy |
| Data transfer (across links) | Time × Space × Energy |

The above are all sort of first-principle level performance characteristics. Working backwards from these, we can derive various causes and mechanisms to influence them.

Performance influencers

| Performance Influencer | Relevant Characteristic | Comments / how to measure |
| --- | --- | --- |
| TLB miss rate | Execution latency | e.g. `perf stat -e dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses` |
| L1/2/3 cache hit rate | Execution latency | e.g. `perf stat -e L1-dcache-load-misses` (also icache for instructions) |
| Register pressure | Execution latency | ~ |
| Saturated network IO | Time, Space | `sar -n DEV 1` |
| Saturated disk IO | Time, Space | `iostat -xz 1` |
| Unreliable network; TCP retransmits | Time, Space | `sar -n TCP,ETCP 1` |
| Noisy neighbors | Time, Space | `pidstat 1`, `vmstat 1`, `uptime`, `dmesg` |
| Memory leak | RAM usage | `valgrind`, coredump |
| Context switches | Time | e.g. `perf record -F 64 -e cpu-clock -e cs -a -g` |
| Branch mispredicts | Time | e.g. `perf stat -e branch-misses` |
| False sharing | Time | `perf c2c` |
| Mutex contention | Time | Measure time to acquire locks, lock waiters; see also critical section. |
| Cache locality | Time | L1/2/3 cache hit rate; look for data always accessed together but stored far apart |
| Access pattern uniform smear | Time | L1/2/3 cache hit rate, TLB misses; analyze access patterns |
| Doing something when you could do nothing | Time, Space, Energy | Critical thinking |
| Not batching work | Time, Space, Energy | Critical thinking |
| Allocating too much | Time, Space, Energy | Coredumps, `valgrind`, critical thinking |
| Syscalls | Time, Space | e.g. `perf top -e raw_syscalls:sys_enter -ns comm`, `strace` |
| Page faults | Time | e.g. `perf record -e page-faults -ag` |
| CPU migrations | Time | e.g. `perf record -e migrations -a` |
| Overserialized processing | Time | Critical thinking; searching for things that are embarrassingly parallel |
| Automatic vectorization | Time | Inspect ASM |
| SIMD manual vectorization | Time | e.g. `std::simd` |
| Copies | Time | Zero-copy serialization frameworks are popular for good reason; avoid copying if possible. |
| Memory striding | Time | If memory access is predictable, we can prefetch data and pipeline instructions. |

Performance quips

What about that whole "root of all evil" thing?

You're misquoting Donald Knuth, and that's not what he was saying. Here's the paragraph that comes right before it, showing the quote in its proper context in the paper it's from, Structured Programming with go to Statements.

> The improvement in speed from Example 2 to Example 2a is only about 12%, and many people would pronounce that insignificant. The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies.
>
> There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

The thing is, now 50 years later... we do not live in a world of engineers who are doing much premature optimization. In fact, we've swung to the other side where we're prematurely non-optimizing.

Good performance resources?