Schedule perturbation — random message delays, reordered RPCs
Invariant checking — linearizability, no data loss, leader uniqueness
The Results
The Results
Looks fine. Ship it?
Were those 50,000 "quiet" runs exploring new territory… or retreading the same ground?
"How thorough was this testing?"
In unit testing and fuzzing, we have ready tools to answer this question. In distributed systems testing, it's complicated.
Code Coverage: What It Gives Us
1deftransfer(src, dst, amt): 2if src.balance >= amt: 3 src.balance -= amt 4 dst.balance += amt 5return True 6elif src.overdraft_ok: 7 src.balance -= amt 8 dst.balance += amt 9return True10return False
Progress metric — 6/10 lines covered
Actionable gap — lines 6–9 untested: need a test with overdraft_ok
Stopping criterion — "90% branch coverage" is at least a legible statement
Coverage ≠ Correctness, But…
100% coverage doesn't mean bug-free.
But it's a necessary-condition check:
"If this branch was never executed, it was never tested."
AFL: Coverage Guidance for Fuzzing
Coverage as equivalence classes: same coverage => redundant behavior new coverage => new behavior worth exploring
Can we develop a notion of coverage for distributed systems?
Complex inter-component interactions => code coverage is rarely good enough.
What Are We Even Covering?
In unit testing or fuzzing, the space is paths through code. In distributed systems, the space is multi-dimensional:
States — global snapshots across all nodes
Faults — severity, scope, and timing
Schedules — message delivery orderings
Workloads — concurrent client operations
We need something that measures progress, identifies gaps, and can provide actionable feedback for further testing.
State-Space Coverage
Coverage = distinct states visited as a fraction of all states. But what is a state? Is the universe of states finite?
State-Space Coverage: What is a State?
An abstraction function extracts only protocol-relevant dimensions. Choosing the right abstraction is domain-specific.
State-Space Coverage: Exploiting Symmetry
Collapsing equivalent behaviors speeds up exploration and gives a more honest picture of coverage.
Leesatapornwongsa et al. — OSDI 2014 Lukman et al. — EuroSys 2019
Fault-State Coverage
Coverage over fault type × protocol phase (^ combinations).
E.g., From Jepsen's nemesis logs.
Happens-Before (Lamport Timeline) Coverage
In model checking, DPOR treats two executions with the same happens-before relation as equivalent. Lamport timelines are analogous to code paths in sequential code.
Flanagan & Godefroid — "Dynamic Partial-Order Reduction for Model Checking Software" (POPL 2005)
Happens-Before Pair Coverage
Abstracts concrete events to types (e.g., "[12:41] Read /foo in 3ms" → "Read")
For every pair of abstract event types, have we observed both possible orderings?
Much smaller space than full interleavings; reduction analogous to branch coverage for code paths
Extends to k-tuple coverage for deeper interactions
Mallory: Grey-Box Fuzzing of Distributed Systems
Key insight: new "happens-before pairs" are the distributed analogue of "new code branches" in AFL.
Meng, Pîrlea, Roychoudhury & Sergey — CCS 2023
PCT: Probabilistic Concurrency Testing
Small model hypothesis: most bugs need only d specific ordering constraints (e.g., crash → commit → rollback for d=3)
Assumes simulation testing where all node operations can be serialized into a single event stream.
Key idea: Aways simulate events from highest-priority node only, and randomly place d−1 priority-change points in the schedule.
Each run finds a depth-d bug with probability ≥ 1/nkd−1
Doesn't track coverage directly — but gives a probabilistic completeness guarantee as a function of number of runs