Cache Coherency: Packed vs Padded Thread Counters
Packed Counters
struct Counters {
    std::atomic<long long> a{0};
    std::atomic<long long> b{0};
};
// c is the calling thread's own counter (presumably a or b).
c.fetch_add(1, std::memory_order_relaxed);

Padded Counters
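// Aligning the wrapper to the cache-line size also pads its size to one full line.
struct alignas(kCacheLineBytes) PaddedCounter {
    std::atomic<long long> value{0};
};
struct Counters {
    PaddedCounter a;
    PaddedCounter b;
};
// c is the calling thread's own counter (presumably a.value or b.value).
c.fetch_add(1, std::memory_order_relaxed);

Shared test data (shared-setup)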
#include <cstddef>  // std::size_t
// Assumed cache-line size; 64 bytes on most x86-64 CPUs.
static constexpr std::size_t kCacheLineBytes = 64;

Which snippet is faster?
Snippet B (Padded Counters) is faster. In the packed layout, a and b are adjacent 8-byte atomics, so they almost always sit in the same 64-byte cache line. When two threads continuously write to variables in the same line, the cache coherency protocol has to move exclusive ownership of that line back and forth between the cores, a phenomenon called false sharing. The threads never touch each other's counter, but the CPU tracks coherency per cache line, not per variable. Wrapping each counter in a struct declared alignas(kCacheLineBytes) makes the compiler align each counter to a 64-byte boundary and pad it to 64 bytes, so each counter occupies its own cache line. That eliminates the inter-core ping-pong, and both threads can increment without contending for the same line.
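The layout claim can be checked directly. The following is a minimal sketch, not part of the quiz source: the names PackedCounters, PaddedCounters, cache_line_of, and the two helper functions are hypothetical, introduced only to tell the two Counters variants apart, and the 64-byte line size is the same assumption as in the shared setup.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size, mirroring the quiz's shared setup.
static constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical name for the packed snippet's Counters.
struct PackedCounters {
    std::atomic<long long> a{0};
    std::atomic<long long> b{0};
};

struct alignas(kCacheLineBytes) PaddedCounter {
    std::atomic<long long> value{0};
};

// Hypothetical name for the padded snippet's Counters.
struct PaddedCounters {
    PaddedCounter a;
    PaddedCounter b;
};

// alignas raises the alignment, and sizeof is always a multiple of the alignment,
// so each PaddedCounter fills exactly one assumed cache line.
static_assert(alignof(PaddedCounter) == kCacheLineBytes, "aligned to one line");
static_assert(sizeof(PaddedCounter) == kCacheLineBytes, "padded to one line");
static_assert(sizeof(PaddedCounters) == 2 * kCacheLineBytes, "two lines in total");

// Index of the (assumed 64-byte) cache line that an address falls into.
inline std::uintptr_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLineBytes;
}

// True when the two padded counters live in different cache lines.
inline bool padded_counters_on_distinct_lines() {
    PaddedCounters c;
    return cache_line_of(&c.a) != cache_line_of(&c.b);
}

// True when the two packed counters live in the same cache line.
// The object is aligned here so the result is deterministic; an unaligned
// PackedCounters could in principle straddle a line boundary.
inline bool packed_counters_share_line() {
    alignas(kCacheLineBytes) PackedCounters c;
    assert(sizeof(c) <= kCacheLineBytes);
    return cache_line_of(&c.a) == cache_line_of(&c.b);
}
```

Calling padded_counters_on_distinct_lines() and packed_counters_share_line() from a small main should both yield true on a platform where std::atomic<long long> is 8 bytes. The static_asserts fail at compile time if the padding does not behave as described.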
Benchmark results
| Snippet | CPU time / iteration | Speedup |
|---|---|---|
| Packed Counters | 8.34 ns | 1.0× |
| Padded Counters | 1.29 ns | 6.5× |
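To get a rough feel for numbers like these, a two-thread timing harness along the following lines may help. It is a sketch under stated assumptions, not the benchmark that produced the table: it measures wall-clock time with std::chrono rather than CPU time, and the names TimingResult, time_two_writers, run_packed, and run_padded are hypothetical. Results will vary with the CPU, core placement, and compiler flags.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>

// Assumed cache-line size, mirroring the quiz's shared setup.
static constexpr std::size_t kCacheLineBytes = 64;

struct PackedCounters {              // hypothetical name for the packed layout
    std::atomic<long long> a{0};
    std::atomic<long long> b{0};
};

struct alignas(kCacheLineBytes) PaddedCounter {
    std::atomic<long long> value{0};
};

struct PaddedCounters {              // hypothetical name for the padded layout
    PaddedCounter a;
    PaddedCounter b;
};

struct TimingResult {
    double ns_per_increment;         // wall-clock nanoseconds per increment, per thread
    long long a;                     // final value of the first counter
    long long b;                     // final value of the second counter
};

// Runs two threads, each incrementing only its own counter `iters` times.
inline TimingResult time_two_writers(std::atomic<long long>& a,
                                     std::atomic<long long>& b,
                                     long long iters) {
    auto work = [iters](std::atomic<long long>& c) {
        for (long long i = 0; i < iters; ++i) {
            c.fetch_add(1, std::memory_order_relaxed);
        }
    };
    const auto start = std::chrono::steady_clock::now();
    std::thread t1(work, std::ref(a));
    std::thread t2(work, std::ref(b));
    t1.join();
    t2.join();
    const auto stop = std::chrono::steady_clock::now();
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return TimingResult{ns / static_cast<double>(iters), a.load(), b.load()};
}

// Both counters are adjacent, so they normally share a cache line.
inline TimingResult run_packed(long long iters) {
    PackedCounters c;
    return time_two_writers(c.a, c.b, iters);
}

// Each counter sits in its own cache line.
inline TimingResult run_padded(long long iters) {
    PaddedCounters c;
    return time_two_writers(c.a.value, c.b.value, iters);
}
```

Comparing run_packed(n).ns_per_increment with run_padded(n).ns_per_increment for a large n (tens of millions, say) should show the padded layout ahead on most multi-core machines, though the exact ratio need not match the table. Thread start-up cost is included in the measurement, so small n gives noisy results.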