Raft Consensus with a Minority of Nodes

tl;dr — This post describes a (wacky) modification to the Raft consensus protocol such that progress can be made even if fewer than a majority of nodes are actively participating, given some constraints on exactly which minority of nodes are active. The math behind this comes from the same place as the card game Spot It! (Dobble).

Raft Consensus Basics

Raft is a consensus protocol for managing a replicated log across a cluster of nodes. Its key goals are: (1) maintain a consistent replicated log of state transitions, (2) tolerate node failures, and (3) ensure a single leader coordinates all changes while multiple followers replicate. Raft is designed to be understandable — it decomposes consensus into leader election, log replication, and safety — and is widely used in systems like etcd, CockroachDB, and TiKV.

In steady state, the leader receives client requests and appends them to its log. It then sends AppendEntries RPCs to all followers. Once a majority of nodes (including itself) have appended the entry, the leader considers it "committed" and applies it to the state machine. For example, in a 5-node cluster, the leader needs acknowledgments from at least 2 followers (3 total including itself) before committing. This provides fault tolerance for up to two node failures, or for a network partition in which a majority of nodes can still communicate with each other.

If the leader crashes, a new one is elected. Any node can become a candidate, start an election for a new "term," and request votes. A candidate wins if it receives votes from a majority of nodes. This guarantees that at most one leader exists per term. Once elected, the new leader synchronizes followers' logs and resumes normal operation.

The key correctness insight is this: any two majorities of nodes must overlap in at least one node. So between any two consecutive global state changes — whether two commits, two leader elections, or one of each — at least one node participated in both. This single overlapping node carries forward the knowledge of what was previously committed, preventing conflicts and ensuring consistency. In a 5-node cluster, any two sets of 3 nodes must share at least one member. This overlap is what makes Raft safe.
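The overlap claim is small enough to check by brute force. The following is a minimal sketch, not part of any Raft implementation; the helper names `majorities` and `all_pairs_overlap` are made up for illustration.

```python
from itertools import combinations

def majorities(n):
    """Return every minimal majority of nodes 1..n as a frozenset."""
    size = n // 2 + 1
    return [frozenset(c) for c in combinations(range(1, n + 1), size)]

def all_pairs_overlap(quorums):
    """True if every pair of quorums shares at least one node."""
    return all(a & b for a, b in combinations(quorums, 2))

quorums = majorities(5)
print(len(quorums))                 # 10 three-node majorities
print(all_pairs_overlap(quorums))   # True
# Two-node groups are too small: {1, 2} and {3, 4} never meet.
print(all_pairs_overlap([frozenset({1, 2}), frozenset({3, 4})]))  # False
```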

Spot It!

Spot It! (known as Dobble outside North America) is a card game whose rules are relatively straightforward: flip a card from the deck to the center, and race to find the one symbol your card has in common with the center card. Call it out, discard your card, and repeat. It's fast, fun, and requires no reading or arithmetic. It's simple enough for a 5-year-old to learn quickly, yet the game design is surprisingly complex.

Here's the remarkable property that makes the game work: the deck has 55 cards, each with 8 distinct symbols drawn from a pool of 57 unique symbols, and any two cards share exactly one symbol in common. This isn't trivial to engineer. Try designing even 10 cards with this property and you'll find it surprisingly difficult. The game's designers didn't just get lucky — they used a beautiful piece of mathematics: finite projective planes.

Finite Projective Planes

A finite projective plane of order $n$ is a combinatorial structure consisting of points and lines with three key properties: (1) any two distinct points lie on exactly one common line, (2) any two distinct lines intersect in exactly one common point, and (3) every line contains exactly $n + 1$ points and every point lies on exactly $n + 1$ lines. The total number of points equals the total number of lines, and both equal $n^2 + n + 1$.

The smallest example is the Fano plane (order $n = 2$): 7 points, 7 lines, with 3 points on each line and 3 lines through each point. In the standard drawing (an equilateral triangle with its three altitudes and inscribed circle), the seven "lines" are the three sides of the triangle, the three altitudes, and the inscribed circle, each passing through exactly 3 of the 7 points. You can verify that any two of these lines share exactly one point.

Spot It! uses a finite projective plane of order $n = 7$. This gives $7^2 + 7 + 1 = 57$ points (symbols) and 57 lines (possible cards, of which the 55-card deck uses all but two), with $7 + 1 = 8$ points per line (symbols per card). The intersection property guarantees any two cards share exactly one symbol, which is exactly what the game needs.

Finite projective planes are known to exist when the order $n$ is a prime power. Here are some small examples:

Order ($n$)	Points = Lines ($n^2 + n + 1$)	Points per line ($n + 1$)	Notes
2	7	3	Fano plane
3	13	4
4	21	5
5	31	6
7	57	8	Spot It!
8	73	9
9	91	10
11	133	12

(Order 6 is notably absent: it has been proven that no projective plane of order 6 exists. Order 10 was ruled out by an exhaustive computer search in 1989. Whether a finite projective plane exists for any non-prime-power order remains an open question in combinatorics; 12 is the smallest undecided case.)
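For prime orders, the plane can be built directly from homogeneous coordinates modulo $p$. The sketch below is one standard construction, offered as an illustration rather than the method the game's designers used; it covers primes only (orders such as 4, 8, and 9 would need finite-field arithmetic that is not shown), and the function names are invented for this example.

```python
from itertools import combinations

def projective_plane(p):
    """Lines of the projective plane over the integers mod p (p must be prime).

    Each line is a frozenset of point indices in range(p*p + p + 1).
    """
    # Normalised homogeneous coordinates: the first nonzero coordinate is 1.
    points = [(1, a, b) for a in range(p) for b in range(p)]
    points += [(0, 1, a) for a in range(p)]
    points += [(0, 0, 1)]
    index = {pt: i for i, pt in enumerate(points)}
    lines = []
    for coeffs in points:  # by duality, lines use the same coordinate triples
        lines.append(frozenset(
            index[pt] for pt in points
            if sum(c * x for c, x in zip(coeffs, pt)) % p == 0
        ))
    return lines

def is_projective_plane(lines, order):
    """Check the counts and the one-point intersection property."""
    expected = order * order + order + 1
    return (
        len(lines) == expected
        and all(len(line) == order + 1 for line in lines)
        and all(len(a & b) == 1 for a, b in combinations(lines, 2))
    )

for p in (2, 3, 5, 7):
    lines = projective_plane(p)
    print(p, len(lines), len(lines[0]), is_projective_plane(lines, p))
```

For `p = 7` this yields 57 lines of 8 points each, matching the Spot It! row of the table.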

💡 Key insight: In Raft, the reason we need a majority is the overlap property — any two majorities share at least one node. But majorities aren't the only set systems with guaranteed pairwise intersection. Finite projective planes give us another: any two lines intersect in exactly one point. So if we assign nodes to points and designate each "line" as a valid voting bloc, any two blocs are guaranteed to share a common node — the same safety property Raft relies on.

For a 57-node system using the order-7 projective plane, we'd have 57 designated blocs of 8 nodes each. Consensus requires just 8 nodes to agree — far fewer than the 29 needed for a traditional majority. The trade-off, of course, is that not every subset of 8 nodes forms a valid bloc. We'll explore this trade-off later.

Raft with Finite Projective Planes

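Here's the general construction. Given a cluster of $N$ nodes, find the smallest prime power $p$ such that $p^2 + p + 1 \geq N$. Construct the finite projective plane of order $p$, which gives us $p^2 + p + 1$ points (we use $N$ of them as our nodes) and $p^2 + p + 1$ lines. Each line contains $p + 1$ points. We call these lines blocs: the valid quorum sets for our modified protocol. One caveat when $N < p^2 + p + 1$: an unused point must be treated as a node that is permanently down, so any line through it cannot serve as a bloc. Simply deleting the point from its lines would be unsafe, because two lines that met only at that point would no longer share a node.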
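Picking the order is a short search. The following sketch assumes nothing beyond the rule stated above; `is_prime_power` and `plan_cluster` are hypothetical helper names, and the returned dictionary layout is only for illustration.

```python
def is_prime_power(q):
    """True if q = r**e for some prime r and integer e >= 1."""
    if q < 2:
        return False
    for r in range(2, q + 1):
        if q % r == 0:          # r is the smallest prime factor of q
            while q % r == 0:
                q //= r
            return q == 1       # nothing left means q was a power of r
    return False

def plan_cluster(n_nodes):
    """Smallest prime-power order whose plane has at least n_nodes points."""
    p = 2
    while not (is_prime_power(p) and p * p + p + 1 >= n_nodes):
        p += 1
    points = p * p + p + 1
    return {
        "order": p,
        "points": points,
        "unused_points": points - n_nodes,
        "bloc_size": p + 1,
        "majority_size": n_nodes // 2 + 1,
    }

for n in (7, 8, 57, 100):
    print(n, plan_cluster(n))
```

For 57 nodes this reports order 7 with a bloc size of 8 against a majority size of 29.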

Why is this correct? Consider any two global state changes (commits or elections). Each involved some set of participating nodes that contains at least one complete voting bloc. Call these sets $S_1$ and $S_2$, containing blocs $B_1$ and $B_2$ respectively. By the projective plane intersection property, $B_1 \cap B_2 \neq \emptyset$ — there exists at least one node in both blocs. Since $B_1 \subseteq S_1$ and $B_2 \subseteq S_2$, we have $S_1 \cap S_2 \neq \emptyset$. Therefore, at least one node participated in both state changes, preserving Raft's consistency guarantee.

Demo: 7-node cluster with the Fano plane

Let's make this concrete with the smallest non-trivial example: 7 nodes arranged according to the Fano plane (order 2). We have 7 nodes and 7 blocs of 3 nodes each:

Bloc	Nodes
$B_1$	{1, 2, 3}
$B_2$	{1, 4, 5}
$B_3$	{1, 6, 7}
$B_4$	{2, 4, 6}
$B_5$	{2, 5, 7}
$B_6$	{3, 4, 7}
$B_7$	{3, 5, 6}

You can verify: any two blocs share exactly one node. (E.g., $B_1 \cap B_4 = \{2\}$, $B_2 \cap B_7 = \{5\}$, etc.)

In classic Raft with 7 nodes, a majority quorum requires 4 nodes. With our projective-plane modification, a bloc quorum requires only 3 nodes, but they must be a specific triple. Let's walk through several scenarios.
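The bloc table translates directly into a quorum check. This is a minimal sketch: `FANO_BLOCS`, `satisfied_blocs`, and `has_quorum` are illustrative names, not part of any Raft library.

```python
from itertools import combinations

FANO_BLOCS = {
    "B1": frozenset({1, 2, 3}),
    "B2": frozenset({1, 4, 5}),
    "B3": frozenset({1, 6, 7}),
    "B4": frozenset({2, 4, 6}),
    "B5": frozenset({2, 5, 7}),
    "B6": frozenset({3, 4, 7}),
    "B7": frozenset({3, 5, 6}),
}

def satisfied_blocs(active):
    """Names of the blocs fully contained in the set of active nodes."""
    active = set(active)
    return [name for name, bloc in FANO_BLOCS.items() if bloc <= active]

def has_quorum(active):
    """The modified rule: progress needs at least one complete bloc."""
    return bool(satisfied_blocs(active))

# Every pair of blocs should share exactly one node.
overlaps = {(a, b): FANO_BLOCS[a] & FANO_BLOCS[b]
            for a, b in combinations(FANO_BLOCS, 2)}
print(all(len(common) == 1 for common in overlaps.values()))  # True
print(sorted(overlaps[("B1", "B4")]))  # [2]
print(sorted(overlaps[("B2", "B7")]))  # [5]
```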

Scenario 1: Steady state — all nodes active

Node 1 is the leader (term 1). A client sends a write request. Node 1 appends to its log and sends AppendEntries to all followers. All respond.

Node	1	2	3	4	5	6	7
Status	Leader	Follower	Follower	Follower	Follower	Follower	Follower

The responding set {1,2,3,4,5,6,7} contains every bloc. The leader commits as soon as any bloc is complete — for instance, once nodes 2 and 3 respond, bloc $B_1 = \{1,2,3\}$ is satisfied.

Outcome: Commit succeeds.
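A leader following this rule would re-check for a complete bloc after each acknowledgment. The sketch below models only that counting step, under the assumption that acknowledgments arrive one at a time; `acks_until_commit` is a hypothetical helper, and real AppendEntries handling (terms, log indices, retries) is omitted.

```python
BLOCS = [frozenset(b) for b in (
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
    {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
)]

def acks_until_commit(leader, ack_order, blocs=BLOCS):
    """Number of follower acks after which some bloc holds the entry.

    Returns None if the acknowledging nodes never complete a bloc.
    """
    have_entry = {leader}  # the leader already appended the entry itself
    for count, follower in enumerate(ack_order, start=1):
        have_entry.add(follower)
        if any(bloc <= have_entry for bloc in blocs):
            return count
    return None

print(acks_until_commit(1, [2, 3, 4, 5, 6, 7]))  # 2: bloc {1, 2, 3} completes
print(acks_until_commit(1, [4, 2, 5]))           # 3: bloc {1, 4, 5} completes
print(acks_until_commit(1, [2, 4, 7]))           # None: {1, 2, 4, 7} has no bloc
```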

Scenario 2: Only 3 nodes active — and they form a bloc

Nodes 4, 5, 6, 7 have crashed. Nodes 1, 2, 3 remain active. Node 1 is still leader.

Node	1	2	3	4	5	6	7
Status	Leader	Follower	Follower	DOWN	DOWN	DOWN	DOWN

The responding set is {1, 2, 3} = bloc $B_1$. Even though 4 out of 7 nodes are down (a majority has failed!), the protocol can still make progress because the active nodes happen to form a valid bloc.

Outcome: Commit succeeds with only 3 out of 7 nodes. Classic Raft would be stuck here.

Scenario 3: Only 2 nodes active — no bloc possible

Node 1 (the leader) crashes. Now only nodes 2 and 3 are active. Node 2 starts an election for term 2.

Node	1	2	3	4	5	6	7
Status	DOWN	Candidate	Vote #2	DOWN	DOWN	DOWN	DOWN

Node 2 gets a vote from node 3, so its vote set is {2, 3}. The blocs containing node 2 are $B_1 = \{1,2,3\}$, $B_4 = \{2,4,6\}$, $B_5 = \{2,5,7\}$. None of these are subsets of {2, 3}. No bloc is satisfied.

Outcome: Election fails. The system is stuck until node 1 recovers (completing $B_1$) or enough other nodes recover to complete a different bloc.
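The same containment test decides elections, and it can also report which single recovery would unblock the cluster. A hedged sketch with made-up helper names:

```python
BLOCS = [frozenset(b) for b in (
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
    {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
)]
ALL_NODES = set(range(1, 8))

def contains_bloc(nodes):
    """True if the given nodes include every member of some bloc."""
    nodes = set(nodes)
    return any(bloc <= nodes for bloc in BLOCS)

def single_recoveries_that_help(active):
    """Down nodes whose recovery alone would complete a bloc."""
    active = set(active)
    return sorted(n for n in ALL_NODES - active if contains_bloc(active | {n}))

votes = {2, 3}  # candidate 2 votes for itself, node 3 grants its vote
print(contains_bloc(votes))                 # False: the election fails
print(single_recoveries_that_help(votes))   # [1]: only node 1 completes a bloc
```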

Scenario 4: Successful leader election — with a provable overlap

Node 3 crashes, and nodes 4 and 6 recover. Now nodes 2, 4, and 6 are active. Node 2 starts a new election (term 3).

Node	1	2	3	4	5	6	7
Status	DOWN	Candidate	DOWN	Vote #2	DOWN	Vote #2	DOWN

The vote set is {2, 4, 6} = bloc $B_4$. Election succeeds!

Now recall: the last commit (Scenario 2) was made by bloc $B_1 = \{1, 2, 3\}$. The election bloc is $B_4 = \{2, 4, 6\}$. Their intersection is $B_1 \cap B_4 = \{2\}$ — node 2, the new leader, was part of the commit bloc. It carries the committed log entry forward. This is guaranteed by the projective plane property: any bloc used for the election must share at least one node with $B_1$.

Outcome: Election succeeds. Safety is preserved because node 2, the new leader, already holds the entry committed in Scenario 2.
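To see why the overlap is enough, here is a toy model of the state after the Scenario 2 commit. It deliberately simplifies Raft's vote rule to "a voter only backs a candidate whose log is at least as long as its own" (the real rule also compares terms); `log_len` and `can_win` are illustrative names.

```python
BLOCS = [frozenset(b) for b in (
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
    {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
)]
COMMIT_BLOC = frozenset({1, 2, 3})

# Log length per node: only the commit bloc stored the committed entry.
log_len = {n: (1 if n in COMMIT_BLOC else 0) for n in range(1, 8)}

def can_win(candidate, voters):
    """Simplified vote rule: no voter may have a longer log than the candidate."""
    return all(log_len[candidate] >= log_len[v] for v in voters)

# Every (candidate, election bloc) pair that could succeed.
winners = [(c, sorted(b)) for b in BLOCS for c in sorted(b) if can_win(c, b)]
print(winners)
# Every possible winner already holds the committed entry.
print(all(log_len[c] == 1 for c, _ in winners))  # True
```

In bloc {2, 4, 6}, for example, only node 2 can win, because it is the one member that also belongs to the commit bloc.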

Scenario 5: A majority is active, but no bloc is present

Nodes 1, 2, 4, 7 are active (4 out of 7 — a majority). Node 2 is trying to commit.

Node	1	2	3	4	5	6	7
Status	Follower	Leader	DOWN	Follower	DOWN	DOWN	Follower

The active set is {1, 2, 4, 7}. Let's check every bloc:

$B_1 = \{1, 2, 3\}$ — node 3 is down ✗
$B_2 = \{1, 4, 5\}$ — node 5 is down ✗
$B_3 = \{1, 6, 7\}$ — node 6 is down ✗
$B_4 = \{2, 4, 6\}$ — node 6 is down ✗
$B_5 = \{2, 5, 7\}$ — node 5 is down ✗
$B_6 = \{3, 4, 7\}$ — node 3 is down ✗
$B_7 = \{3, 5, 6\}$ — nodes 3, 5, 6 all down ✗

No bloc is fully contained in the active set. Even though a majority of nodes is available, the system cannot make progress.

Outcome: Commit fails. This is the fundamental trade-off — our protocol is not guaranteed to work whenever a majority is available.
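The bloc-by-bloc check above is mechanical, so it can be scripted. A minimal sketch (the name `missing_per_bloc` is invented for this example):

```python
FANO_BLOCS = {
    "B1": frozenset({1, 2, 3}),
    "B2": frozenset({1, 4, 5}),
    "B3": frozenset({1, 6, 7}),
    "B4": frozenset({2, 4, 6}),
    "B5": frozenset({2, 5, 7}),
    "B6": frozenset({3, 4, 7}),
    "B7": frozenset({3, 5, 6}),
}

def missing_per_bloc(active):
    """For each bloc, the members that are not in the active set."""
    active = set(active)
    return {name: sorted(bloc - active) for name, bloc in FANO_BLOCS.items()}

report = missing_per_bloc({1, 2, 4, 7})
for name, missing in report.items():
    print(name, "missing", missing)
print(any(not missing for missing in report.values()))  # False: no bloc is complete
```

Note that the failed set {3, 5, 6} is itself bloc $B_7$. Because every bloc intersects $B_7$, losing all of $B_7$ removes at least one member from every bloc.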

Scenario 6: Full recovery

All nodes come back online. Node 2 is still leader from the election in Scenario 4.

Node	1	2	3	4	5	6	7
Status	Follower	Leader	Follower	Follower	Follower	Follower	Follower

The leader sends AppendEntries to all. As soon as both other members of one of the leader's blocs respond (for node 2, that means nodes 1 and 3, nodes 4 and 6, or nodes 5 and 7), entries commit. An arbitrary pair of followers is not enough: {1, 2, 4}, for example, is not a bloc. Recovered nodes that missed entries get their logs brought up to date via Raft's normal log replication mechanism. The system is fully operational again: every possible bloc is satisfiable.

Outcome: Normal operation resumes. All future commits and elections can proceed with any bloc.
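Which follower acknowledgments let a given leader commit with just two responses can be read off the bloc table. A small sketch, with `completing_pairs` as a hypothetical helper name:

```python
BLOCS = [frozenset(b) for b in (
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
    {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
)]

def completing_pairs(leader):
    """Follower pairs that, together with the leader, form a bloc."""
    return sorted(sorted(bloc - {leader}) for bloc in BLOCS if leader in bloc)

def contains_bloc(nodes):
    """True if the given nodes include every member of some bloc."""
    return any(bloc <= set(nodes) for bloc in BLOCS)

print(completing_pairs(2))        # [[1, 3], [4, 6], [5, 7]]
print(contains_bloc({2, 1, 4}))   # False: acks from 1 and 4 do not suffice
print(contains_bloc({2, 4, 6}))   # True
```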

Trade-off

The fundamental trade-off is clear from Scenario 5: unlike classical Raft, our modified protocol is not guaranteed to make progress whenever a majority of nodes is active. We need the active set to contain at least one complete bloc. So the natural question is: given a random subset of $k$ active nodes out of $N$ total, what's the probability that it contains at least one of our blocs?

Let's work this out. We have a projective plane of order $p$ with $N = p^2 + p + 1$ nodes and $N$ blocs of size $p + 1$ each. If $k$ nodes are active (chosen uniformly at random), the probability that a specific bloc $B_i$ is entirely contained in the active set $A$ is:

$$P(B_i \subseteq A) = \frac{\binom{N - (p+1)}{k - (p+1)}}{\binom{N}{k}}$$

(We choose the remaining $k - (p+1)$ active nodes from the $N - (p+1)$ nodes not in bloc $B_i$, assuming all $p + 1$ nodes of $B_i$ are active.)

By a union bound (which overcounts active sets that contain more than one bloc), the probability that at least one bloc is present is at most:

$$P(\text{at least one bloc present}) \leq N \cdot \frac{\binom{N - (p+1)}{k - (p+1)}}{\binom{N}{k}}$$

For an exact answer, we'd use inclusion-exclusion, but the union bound gives a useful upper estimate. Let's compute some examples:

Example: Order 2 (Fano plane, $N = 7$, bloc size 3)

With the Fano plane, $N = 7$ and each bloc has 3 nodes. We can compute exact probabilities by brute force: there are $\binom{7}{k}$ equally likely subsets of $k$ active nodes, and we count how many contain at least one of the 7 blocs.
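A brute-force count along these lines takes only a few lines of code. This is a sketch using exact fractions; `exact_probability` is an illustrative name.

```python
from fractions import Fraction
from itertools import combinations

BLOCS = [frozenset(b) for b in (
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
    {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
)]

def exact_probability(k):
    """P(a uniformly random k-subset of 7 nodes contains at least one bloc)."""
    subsets = [set(s) for s in combinations(range(1, 8), k)]
    hits = sum(1 for s in subsets if any(bloc <= s for bloc in BLOCS))
    return Fraction(hits, len(subsets))

for k in range(2, 8):
    print(k, exact_probability(k))
```

The values for $k = 3$, $4$, and $5$ come out to $1/5$, $4/5$, and $1$.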

Active nodes ($k$)	$P(\text{at least one bloc present})$	Classic Raft ($k \geq 4$)?
3	$7/35 = 20\%$	No
4	$28/35 = 80\%$	Yes (always works)
5	$21/21 = 100\%$	Yes

So with the Fano plane: you can sometimes make progress with just 3 active nodes (20% chance if the active set is random), with 4 active nodes you succeed 80% of the time, and with 5 or more you're always fine. Classic Raft always works at $k = 4$, but our scheme has a 20% failure rate there. That is the cost of needing only 3 nodes in the best case.

Example: Order 7 (Spot It!, $N = 57$, bloc size 8)

With 57 nodes, 57 blocs of size 8, and a classical majority quorum of 29, the landscape looks quite different. Exact computation via brute force is infeasible ($\binom{57}{20} \approx 1.2 \times 10^{15}$), but the union bound gives a reasonable upper estimate for moderate $k$:

Active nodes ($k$)	$P(\text{at least one bloc present})$ (union bound)	Classic Raft ($k \geq 29$)?
8	$\approx 0.0000035\%$	No
15	$\approx 0.02\%$	No
20	$\approx 0.4\%$	No
29	$\approx 15\%$	Yes (always works)

The numbers are sobering: even at the classical majority threshold of $k = 29$, a random subset has at most about a 15% chance of containing a valid bloc (the union bound is an upper bound, so the true figure can only be lower). Our scheme trades away the guarantee of progress at majority for the possibility of progress with far fewer nodes, but that possibility is slim unless the active set is much larger than a single bloc.
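The union-bound figures can be reproduced with `math.comb`. The sketch below assumes a full plane, $N = p^2 + p + 1$, and caps the bound at 1; the function name `union_bound` is made up for illustration.

```python
from math import comb

def union_bound(order, k):
    """Upper bound on P(a random k-subset contains at least one bloc)."""
    n = order * order + order + 1   # number of nodes, also the number of blocs
    size = order + 1                # nodes per bloc
    if k < size:
        return 0.0                  # too few active nodes to hold any bloc
    one_bloc = comb(n - size, k - size) / comb(n, k)
    return min(1.0, n * one_bloc)

for k in (8, 15, 20, 29):
    print(k, f"{100 * union_bound(7, k):.7f}%")
```

For the Fano plane the bound happens to be exact at $k = 3$ and $k = 4$ (0.2 and 0.8), because no set of 4 or fewer nodes can contain two blocs.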

Final Thoughts: Erdős–Ko–Rado theorem

Stepping back, what we're really trying to do is solve an optimization problem: given $N$ nodes, find a family of subsets (our "blocs") such that (1) any two subsets in the family intersect (the safety requirement), (2) the size of each subset is minimized (so fewer nodes need to be active for a quorum), and (3) the number of subsets in the family is maximized (so a random set of active nodes is more likely to contain one).

Finite projective planes give one beautiful construction, but they're not the only option. The Erdős–Ko–Rado theorem (1961) provides fundamental bounds on exactly this kind of structure. It tells us: given $N$ points, what is the maximum number of subsets of size $r$ such that any two subsets share at least one element? The answer is $\binom{N-1}{r-1}$ (when $N \geq 2r$) — achieved by fixing one element and taking all $r$-subsets containing it.

This gives us a framework for understanding the trade-off space. If we want blocs of size $r$ from $N$ nodes:

Projective planes are special because they achieve a particularly elegant balance: they give $N$ blocs of size $\sqrt{N}$ (roughly), all pairwise intersecting in exactly one point. The EKR theorem tells us we could potentially have more blocs of the same size if we relaxed other structural constraints — but the projective plane's rigid structure makes it easy to construct and reason about.
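To put numbers on the gap between the EKR ceiling and the projective-plane family, here is a small sketch. `ekr_max` and `star_family` are illustrative names; the "star" is the extremal family described above, in which every bloc contains one fixed node.

```python
from itertools import combinations
from math import comb

def ekr_max(n, r):
    """Largest intersecting family of r-subsets of n points (needs n >= 2r)."""
    if n < 2 * r:
        raise ValueError("this form of the bound needs n >= 2r")
    return comb(n - 1, r - 1)

def star_family(n, r, centre=1):
    """All r-subsets of 1..n that contain `centre`."""
    others = [x for x in range(1, n + 1) if x != centre]
    return [frozenset((centre,) + rest) for rest in combinations(others, r - 1)]

print(ekr_max(7, 3))    # 15 possible blocs of size 3, versus 7 in the Fano plane
print(ekr_max(57, 8))   # 231917400, versus 57 in the order-7 plane

star = star_family(7, 3)
print(len(star) == ekr_max(7, 3))                        # True
print(all(a & b for a, b in combinations(star, 2)))      # True: pairwise intersecting
```

The star family maximizes the bloc count, but every bloc depends on the same node: if that node is down, no bloc is usable.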

The deeper question is: can we find intersecting families that beat projective planes on the metric we care about most — the probability that a random $k$-subset of active nodes contains at least one bloc? This is an open design space, and the EKR theorem provides the ceiling on how many blocs we can have for a given size. Exploring constructions that approach this ceiling while remaining practical to implement could be a direction worth pursuing.

There is also a completely different angle: forget symmetry, and optimize for real-world failure patterns instead. In practice, failures are not random; they tend to be correlated within failure domains such as racks, availability zones, or regions. If we're willing to design our bloc family around the specific failure topology we care about, we don't need a projective plane at all. For example, suppose a cluster spans three availability zones. We could simply define three blocs, one rooted in each AZ, such that each bloc contains all of its home zone's nodes plus at least one node from each of the other two AZs. Any two such blocs share at least one node (since each reaches into the other's home zone), satisfying the intersection property. A bloc then stays usable as long as its home zone and its chosen remote nodes are up, however many other remote nodes have failed. What no intersecting family can do is survive every possible two-AZ outage: a bloc that survived one would have to sit inside a single zone, and blocs confined to different zones would be disjoint. This isn't as mathematically elegant, and it requires thinking carefully about your deployment rather than turning a combinatorial crank, but it will likely be more effective in practice than betting on a random active set containing a Fano-plane triple.
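As a rough illustration of the zone-rooted idea, the sketch below builds three blocs for a hypothetical nine-node cluster. The node names, the zone sizes, and the reading of "rooted" as "contains the whole home zone" are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical layout: three zones with three nodes each.
ZONES = {
    "a": {"a1", "a2", "a3"},
    "b": {"b1", "b2", "b3"},
    "c": {"c1", "c2", "c3"},
}

# Each bloc: the whole home zone plus one chosen node from each other zone.
BLOCS = {
    "a": frozenset(ZONES["a"] | {"b1", "c1"}),
    "b": frozenset(ZONES["b"] | {"a1", "c2"}),
    "c": frozenset(ZONES["c"] | {"a2", "b2"}),
}

def usable_blocs(active):
    """Names of the blocs fully contained in the active node set."""
    active = set(active)
    return sorted(name for name, bloc in BLOCS.items() if bloc <= active)

# Safety: every pair of blocs must share a node.
print(all(BLOCS[x] & BLOCS[y] for x, y in combinations(BLOCS, 2)))  # True

# Zone a is healthy; zones b and c have each lost two of three nodes.
active = ZONES["a"] | {"b1", "c1"}
print(usable_blocs(active))  # ['a']
```

Here 5 of 9 nodes are active and bloc `a` is complete, so the cluster can make progress as long as the survivors are the ones that bloc was designed around.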