Test Coverage?
In Distributed Systems??!

Rohan Padhye
Carnegie Mellon University

Knock, knock

Distributed systems

Who's there?

(cue laughter)

Knock, knock

Who's there?

Distributed systems, who?

(cue undefined behavior)

A Real Testing Campaign

Your team is testing a Raft-based distributed store. You throw everything at it:

Randomized client workloads — mixed reads, writes, CAS ops
Fault injection — node crashes, network partitions, clock skew
Schedule perturbation — random message delays, reordered RPCs
Invariant checking — linearizability, no data loss, leader uniqueness

The Results

Looks fine. Ship it?

Were those 50,000 "quiet" runs exploring new territory…
or retreading the same ground?

"How thorough was this testing?"

In unit testing and fuzzing, we have ready tools to answer this question.
In distributed systems testing, it's complicated.

Code Coverage: What It Gives Us

 1  def transfer(src, dst, amt):
 2      if src.balance >= amt:
 3          src.balance -= amt
 4          dst.balance += amt
 5          return True
 6      elif src.overdraft_ok:
 7          src.balance -= amt
 8          dst.balance += amt
 9          return True
10      return False

Progress metric — 6/10 lines covered
Actionable gap — lines 6–9 untested: need a test with overdraft_ok
Stopping criterion — "90% branch coverage" is at least a legible statement

Coverage ≠ Correctness, But…

100% coverage doesn't mean bug-free.

But it's a necessary-condition check:

"If this branch was never executed,
it was never tested."

AFL: Coverage Guidance for Fuzzing

Coverage as equivalence classes:
same coverage => redundant behavior
new coverage => new behavior worth exploring
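
The equivalence-class idea can be sketched as a tiny feedback loop. This is a minimal illustration, not AFL's actual implementation: `target`, `mutate`, and the branch labels are hypothetical stand-ins, and real fuzzers track coverage with compiled-in instrumentation rather than a returned set.

```python
import random

def target(data: bytes) -> set:
    """Toy program under test; returns the set of branch IDs it executed."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("R"):
        covered.add("R")
        if len(data) > 1 and data[1] == ord("A"):
            covered.add("RA")
    return covered

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte, or append one (always appends to empty input)."""
    buf = bytearray(data)
    if buf and rng.random() < 0.5:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)

def fuzz(iterations: int, seed: int = 0):
    rng = random.Random(seed)
    corpus = [b""]                  # inputs worth mutating further
    seen = set(target(b""))         # union of all coverage observed so far
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        cov = target(child)
        if not cov <= seen:         # new coverage => new behavior, keep it
            seen |= cov
            corpus.append(child)
        # same coverage => redundant behavior, discard the input
    return corpus, seen
```

Calling `fuzz(2000)` returns the saved inputs and the union of branches they reach. Each saved input earned its place by adding at least one branch, so the corpus stays small no matter how many runs are executed.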

Can we develop a notion of coverage
for distributed systems?

Complex inter-component interactions => code coverage is rarely good enough.

What Are We Even Covering?

In unit testing or fuzzing, the space is paths through code.
In distributed systems, the space is multi-dimensional:

States — global snapshots across all nodes
Faults — severity, scope, and timing
Schedules — message delivery orderings
Workloads — concurrent client operations

We need something that measures progress, identifies gaps,
and can provide actionable feedback for further testing.

State-Space Coverage

Coverage = distinct states visited as a fraction of all states.
But what is a state? Is the universe of states finite?

State-Space Coverage: What is a State?

An abstraction function extracts only protocol-relevant dimensions.
Choosing the right abstraction is domain-specific.
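
As a hedged sketch of what such an abstraction function might look like for a Raft-like store: the `NodeState` fields, the choice to keep only relative terms, and the cap of 2 on term offsets are all assumptions made for illustration, not a recommended abstraction.

```python
from typing import NamedTuple

class NodeState(NamedTuple):
    role: str                # "leader", "follower", or "candidate"
    term: int
    log_len: int
    last_heartbeat_ms: int   # concrete detail the abstraction will drop

def abstract(cluster):
    """Map a concrete global snapshot to a small abstract state.

    Keeps each node's role, its term relative to the lowest term (capped at 2),
    and whether its log is as long as the longest log. Drops timestamps and
    absolute term / log values.
    """
    min_term = min(n.term for n in cluster)
    max_log = max(n.log_len for n in cluster)
    return tuple(
        (n.role, min(n.term - min_term, 2), n.log_len == max_log)
        for n in cluster
    )

visited = set()

def record(cluster):
    """Return True if this snapshot reached a previously unseen abstract state."""
    state = abstract(cluster)
    is_new = state not in visited
    visited.add(state)
    return is_new
```

Under this particular abstraction each node takes at most 3 × 3 × 2 = 18 abstract values, so the universe is finite and a coverage fraction is well defined, although many of those combinations may be unreachable.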

State-Space Coverage: Exploiting Symmetry

Collapsing equivalent behaviors speeds up exploration and gives a more honest picture of coverage.

Leesatapornwongsa et al. — OSDI 2014
Lukman et al. — EuroSys 2019
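
One simple form of symmetry reduction is to treat node identities as interchangeable. The sketch below assumes that is valid for the system under test (it is not if nodes have distinct configurations), and `canonical` is a hypothetical helper, not taken from the cited papers.

```python
from itertools import permutations

def canonical(per_node_states):
    """Collapse states that differ only by which node plays which part."""
    return tuple(sorted(per_node_states))

# Three-node cluster with one leader: which node leads should not matter.
raw_states = set(permutations(["leader", "follower", "follower"]))
canonical_states = {canonical(s) for s in raw_states}

print(len(raw_states), "raw states ->", len(canonical_states), "canonical state")
```

Counting canonical states rather than raw ones avoids reporting "new" coverage for a run that only relabeled the nodes.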

Fault-State Coverage

Coverage over fault type × protocol phase (and their combinations).
E.g., derived from Jepsen's nemesis logs.
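
A minimal sketch of the resulting coverage matrix. The fault names come from the campaign described earlier; the phase names and the `(fault, phase)` log format are assumptions for illustration and do not reflect Jepsen's actual log schema.

```python
from itertools import product

FAULTS = ["crash", "partition", "clock_skew"]
PHASES = ["election", "replication", "commit"]   # hypothetical protocol phases

def fault_state_coverage(injections):
    """injections: iterable of (fault, phase) pairs observed across all runs.

    Returns (fraction of matrix cells covered, sorted list of uncovered cells).
    """
    universe = set(product(FAULTS, PHASES))
    seen = set(injections) & universe
    return len(seen) / len(universe), sorted(universe - seen)

ratio, missing = fault_state_coverage([
    ("crash", "election"),
    ("crash", "election"),      # repeats add nothing
    ("partition", "commit"),
])
print(f"{ratio:.0%} of cells covered; untested: {missing}")
```

The list of uncovered cells doubles as a to-do list for the fault injector.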

Happens-Before (Lamport Timeline) Coverage

In model checking, DPOR treats two executions with the same happens-before relation as equivalent.
Lamport timelines are analogous to code paths in sequential code.

Flanagan & Godefroid — "Dynamic Partial-Order Reduction for Model Checking Software" (POPL 2005)
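
To make the equivalence concrete, the sketch below fingerprints an execution by its happens-before edges (per-node program order plus send-to-receive edges). The trace format `(node, kind, msg_id)` is an assumption, and the code expects every receive to appear after its matching send. It illustrates the equivalence only and does not implement DPOR.

```python
from collections import defaultdict

def hb_fingerprint(trace):
    """trace: list of (node, kind, msg_id) in observed global order,
    with kind in {"send", "recv", "local"}.

    Returns (events, edges); two traces with equal fingerprints have the
    same happens-before relation, whatever their global interleaving was.
    """
    count = defaultdict(int)    # events seen so far on each node
    last_on_node = {}           # most recent event per node
    send_of = {}                # msg_id -> its send event
    events, edges = set(), set()
    for node, kind, msg in trace:
        ev = (node, count[node], kind, msg)
        count[node] += 1
        events.add(ev)
        if node in last_on_node:
            edges.add((last_on_node[node], ev))   # program order
        last_on_node[node] = ev
        if kind == "send":
            send_of[msg] = ev
        elif kind == "recv":
            edges.add((send_of[msg], ev))         # message delivery
    return frozenset(events), frozenset(edges)

# A->B and C->D are independent: reordering them changes nothing causally.
run1 = [("A", "send", "m1"), ("B", "recv", "m1"), ("C", "send", "m2"), ("D", "recv", "m2")]
run2 = [("C", "send", "m2"), ("A", "send", "m1"), ("D", "recv", "m2"), ("B", "recv", "m1")]

# Two messages racing to the same node: delivery order is causally significant.
run3 = [("A", "send", "m1"), ("B", "send", "m2"), ("C", "recv", "m1"), ("C", "recv", "m2")]
run4 = [("A", "send", "m1"), ("B", "send", "m2"), ("C", "recv", "m2"), ("C", "recv", "m1")]
```

Here `run1` and `run2` share a fingerprint and count as one covered timeline, while `run3` and `run4` differ and count as two.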

Happens-Before Pair Coverage

Abstracts concrete events to types
(e.g., "[12:41] Read /foo in 3ms" → "Read")
For every pair of abstract event types, have we observed both possible orderings?
Much smaller space than full interleavings; the reduction is analogous to measuring branch coverage instead of path coverage for code
Extends to k-tuple coverage for deeper interactions
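
A minimal sketch of pair coverage under simplifying assumptions: it only considers events on the same node, where program order already implies happens-before (cross-node pairs would additionally need message edges or vector clocks), and it assumes log lines shaped like the example above. The function names are hypothetical and this is not Mallory's implementation.

```python
import re
from itertools import permutations

def abstract_event(line):
    """'[12:41] Read /foo in 3ms' -> 'Read' (drops timestamp, path, latency)."""
    m = re.match(r"\[[^\]]*\]\s+(\w+)", line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return m.group(1)

def observed_pairs(node_logs):
    """node_logs: one list of log lines per node, each in program order."""
    pairs = set()
    for log in node_logs:
        types = [abstract_event(line) for line in log]
        for i, a in enumerate(types):
            for b in types[i + 1:]:
                if a != b:
                    pairs.add((a, b))       # a happened before b on this node
    return pairs

def pair_coverage(node_logs, event_types):
    """Fraction of ordered type pairs observed, plus the orderings still missing."""
    universe = set(permutations(event_types, 2))
    seen = observed_pairs(node_logs) & universe
    return len(seen) / len(universe), sorted(universe - seen)

logs = [
    ["[12:41] Read /foo in 3ms", "[12:42] Write /foo in 5ms"],
    ["[12:43] Write /bar in 2ms", "[12:44] Crash"],
]
ratio, missing = pair_coverage(logs, ["Read", "Write", "Crash"])
```

With three event types there are only six ordered pairs to cover, regardless of how long the runs are. In this example two are observed, and `missing` lists the four orderings (such as a write before a read) that no run has exhibited yet.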

Mallory: Grey-Box Fuzzing of Distributed Systems

Key insight: new "happens-before pairs" are the distributed analogue of "new code branches" in AFL.

Meng, Pîrlea, Roychoudhury & Sergey — CCS 2023

PCT: Probabilistic Concurrency Testing

Small model hypothesis: most bugs need only d specific ordering constraints (e.g., crash → commit → rollback for d=3)
Assumes simulation testing where all node operations can be serialized into a single event stream.
Key idea: Always simulate events from the highest-priority node only, and randomly place d−1 priority-change points in the schedule.
Each run finds a depth-d bug with probability ≥ 1/(n·k^(d−1)), where n is the number of nodes and k is the number of events in the run.
Doesn't track coverage directly — but gives a probabilistic completeness guarantee as a function of number of runs

Burckhardt, Kothari, Musuvathi & Nagarakatte — ASPLOS 2010
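
The scheduling rule can be sketched as follows. This is a simplified illustration that assumes every node's next pending event is always enabled (no blocking on undelivered messages), with n nodes and k total events; `pct_schedule`, `pct_bound`, and `runs_needed` are hypothetical names.

```python
import math
import random

def pct_schedule(node_events, d, rng):
    """node_events: dict node -> list of events in per-node order.
    Returns one serialized schedule as a list of (node, event)."""
    nodes = list(node_events)
    n = len(nodes)
    k = sum(len(evs) for evs in node_events.values())
    if k == 0:
        return []
    # Distinct initial priorities d .. d+n-1, assigned in random order.
    prio = dict(zip(nodes, rng.sample(range(d, d + n), n)))
    # d-1 change points: after step s, the running node drops to priority i < d.
    change_at = {rng.randint(1, k): i for i in range(1, d)}
    queues = {node: list(evs) for node, evs in node_events.items()}
    schedule, step = [], 0
    while any(queues.values()):
        enabled = [nd for nd in nodes if queues[nd]]
        node = max(enabled, key=lambda nd: prio[nd])   # highest priority runs
        schedule.append((node, queues[node].pop(0)))
        step += 1
        if step in change_at:
            prio[node] = change_at[step]
    return schedule

def pct_bound(n, k, d):
    """Lower bound on the per-run probability of hitting a depth-d bug."""
    return 1 / (n * k ** (d - 1))

def runs_needed(n, k, d, confidence):
    """Independent runs needed to reach the given confidence under the bound."""
    p = pct_bound(n, k, d)
    if p >= 1:
        return 1
    return math.ceil(math.log(1 - confidence) / math.log1p(-p))
```

For example, with n = 3, k = 100, and d = 3 the bound is 1/30,000 per run, so `runs_needed(3, 100, 3, 0.95)` comes out just under 90,000 runs. That figure is the sense in which the guarantee is a function of the number of runs.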

Predicate / Scenario Coverage

Coverage = fraction of predicates witnessed at least once.
Uncovered predicates → actionable: "these scenarios haven't been tested."
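
A minimal sketch of scenario coverage. The trace format (a list of event dicts with a `kind` field) and the three predicates are assumptions chosen for illustration; in practice the predicates would come from whoever knows the protocol.

```python
def partition_then_heal(trace):
    kinds = [e["kind"] for e in trace]
    return "partition" in kinds and "heal" in kinds[kinds.index("partition"):]

def leader_crash(trace):
    return any(e["kind"] == "crash" and e.get("role") == "leader" for e in trace)

def repeated_election(trace):
    return sum(e["kind"] == "election" for e in trace) >= 2

PREDICATES = {
    "partition_then_heal": partition_then_heal,
    "leader_crash": leader_crash,
    "repeated_election": repeated_election,
}

def scenario_coverage(traces, predicates):
    """Return (fraction of predicates witnessed in any trace, names never witnessed)."""
    witnessed = {
        name for name, pred in predicates.items()
        if any(pred(t) for t in traces)
    }
    return len(witnessed) / len(predicates), sorted(set(predicates) - witnessed)

traces = [
    [{"kind": "election"}, {"kind": "partition"}, {"kind": "heal"}],
    [{"kind": "crash", "role": "follower"}, {"kind": "election"}],
]
ratio, untested = scenario_coverage(traces, PREDICATES)
```

Here only `partition_then_heal` is witnessed, so `untested` names the two scenarios that the next batch of runs should target.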

Fest: Feedback-Guided Adaptive Testing

Key idea: Use PCT to bound the search space,
then use happens-before pairs + scenario coverage as feedback to guide exploration.

Li, Desai & Padhye — NSDI 2026

Lineage-Driven Fault Injection (LDFI)

Alvaro, Rosen, Hellerstein — SIGMOD 2015

Opinionated Summary

Approach	What it measures	Assumes	Granularity	Tractability	Actionability
State-space coverage	Global states	State abstraction	Medium	Medium	Medium
Fault-state coverage	Fault combo × timing	Logging (states)	Coarse	High	High
Happens-before / DPOR	Causally distinct schedules	Logging (events)	Fine	Low	High
Happens-before pairs	Event orderings	Event abstraction	Fine	High	High
PCT	Small-depth bug probability	Controlled Simulation	Coarse	High	Low
Scenario coverage	User-defined predicates	Domain expertise	Medium	High	High
LDFI	Critical causal paths	Lineage Info	Fine	Medium	Very high

Closing

We started with: "50,000 runs, no new bugs — are we done?"

We now have a vocabulary for answering that more rigorously.

(It was probably the same knock-knock joke)

References

Alvaro et al. — "Lineage-Driven Fault Injection" (SIGMOD 2015)

Burckhardt et al. — "A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs" (ASPLOS 2010)

Flanagan & Godefroid — "Dynamic Partial-Order Reduction for Model Checking Software" (POPL 2005)

Kingsbury — Jepsen (jepsen.io)

Lamport — "Specifying Systems: The TLA⁺ Language and Tools for Hardware and Software Engineers" (Addison-Wesley, 2002)

Leesatapornwongsa et al. — "SAMC: Semantic-Aware Model Checking" (OSDI 2014)

Li, Desai & Padhye — "Feedback-guided Adaptive Testing of Distributed Systems Designs" (NSDI 2026)

Lukman et al. — "FlyMC: Highly Scalable Testing of Complex Interleavings in Distributed Systems" (EuroSys 2019)

Meng, Pîrlea et al. — "Greybox Fuzzing of Distributed Systems" (CCS 2023)
