Connection · Interrupted

Something didn't load

Part of this page failed to reach you. Reload to try again — if it keeps happening, check your connection.

Skip to main content
Distributed Systems10 min read

Convergence Is a Property of Your Merge Function, Not the Network

I once watched an afternoon of offline edits vanish under a last-writer-wins sync, and the fix was not better networking — it was a better merge function. These are my notes on why CRDT replicas converge: a merge that is commutative, associative, and idempotent. I rebuild a minimal add-wins OR-Set in TypeScript, run it, and weigh what the guarantee costs in tombstones and memory.

All Posts
2/4

I shipped a "collaborative" feature once with a sync model I was sure was correct: each client wrote its full document state to a shared store, tagged with a wall-clock timestamp, and on read the latest timestamp won. Last-writer-wins. It survived every demo. Then two people edited offline on a train, reconnected, and one person's entire afternoon of edits vanished. No error. No conflict dialog. The newer timestamp simply overwrote the older state, and the older state happened to contain work the newer one had never seen.

That was not a network bug. The packets all arrived. It was a merge bug. The store had no merge function at all — it had an overwrite, dressed up as one.

This is the trap I want to write up, because the fix is more precise than "use a CRDT library." The useful idea underneath CRDTs is small and testable: convergence is a mathematical property of the function you use to combine two states. Get that function right and replicas agree regardless of the order, duplication, or delay of the updates between them. Get it wrong and no amount of retry logic, vector clocks bolted on later, or careful ordering will save you.

The topic is current again because the tooling crossed a threshold. Automerge 3.0, released in July 2025, cut memory use by more than 10x — pasting Moby Dick into a document dropped from 700 MB in memory to 1.3 MB, and one large-history document that had not finished loading after 17 hours opened in 9 seconds (Automerge 3.0 announcement). That release put local-first CRDT sync back on the Hacker News front page and back into the "should I actually use this?" conversation. Before reaching for a library, it is worth understanding what the merge function is doing, and what it costs.

Why last-writer-wins is eventually wrong

A last-writer-wins (LWW) register is the simplest possible "merge": keep the value with the highest timestamp. It is commutative and associative, so it does converge — every replica ends up with the same value. The problem is which value.

LWW converges by deletion. When two replicas hold concurrent updates, one of them is discarded. For a single scalar like a status flag that is fine; somebody's write had to win. For anything with internal structure — a set of cart items, a document, a list of collaborators — LWW throws away the parts of the loser that the winner never contained. That is exactly how my train-edit afternoon disappeared. The state was structured, the merge treated it as opaque, and the merge was destructive.

So the requirement is sharper than "converge." I want a merge that converges and preserves every concurrent update that does not logically contradict another. That is what a Conflict-free Replicated Data Type gives you, and the guarantee has a name: strong eventual consistency. Replicas that have received the same set of updates are in the same state, with no coordination round and no consensus protocol (Shapiro et al., "Conflict-free Replicated Data Types," 2011).

What convergence actually requires

The whole guarantee rests on three properties of the merge function. Call the function (join):

  • Commutative: a ⊔ b = b ⊔ a. Order of arrival does not matter.
  • Associative: (a ⊔ b) ⊔ c = a ⊔ (b ⊔ c). Grouping does not matter.
  • Idempotent: a ⊔ a = a. Receiving the same update twice does not matter.

Those three together mean the set of possible states forms a join-semilattice and computes the least upper bound. Once that holds, the network can do its worst — reorder messages, deliver duplicates, sit on a packet for an hour — and every replica that has seen the same updates lands on the identical state. The ordering guarantees you would normally fight for at the transport layer become irrelevant, because the merge does not care about order.

The trick I find clarifying: stop reasoning about sequences of operations and start reasoning about sets of facts. Each replica accumulates facts. Merge is set union over those facts. Union is trivially commutative, associative, and idempotent. The entire design problem reduces to choosing what a "fact" is so that union produces the semantics you want.

The diagram below shows the shape I keep in my head: two replicas accumulate different facts while disconnected, then a single merge folds both fact-sets into one state that contains everything either side observed.

A minimal OR-Set you can run

The canonical example where the merge has to be cleverer than union-of-values is a set with both adds and removes. A plain set fails immediately: if replica A adds pen and replica B concurrently removes pen, set union says present and set difference says absent, and the two replicas diverge. There is no order of operations that resolves it, because the operations are concurrent — neither happened "after" the other.

The Observed-Remove Set (OR-Set) resolves this with add-wins semantics. A concurrent add and remove of the same element resolves in favor of the add. A remove only cancels the specific adds it actually observed (Almeida, Shoker, Baquero, "Delta State Replicated Data Types," 2018). The mechanism is a unique tag per add — a "dot" of the form replica:counter. The element is present if it has at least one add-tag that no remove has tombstoned. Merge is union of the add-tags and union of the tombstones. Union, again, so the three properties hold for free.

Here is the whole thing in TypeScript, runnable as written:

typescript
// orset.ts — a state-based, add-wins Observed-Remove Set (OR-Set) CRDT.
// Convergence comes from the merge function, not the network.

type Tag = string; // a unique "dot": `${replica}:${counter}`

class ORSet<T> {
  private adds = new Map<T, Set<Tag>>();
  private removed = new Set<Tag>();
  private counter = 0;

  constructor(private readonly replica: string) {}

  add(e: T): void {
    const tag = `${this.replica}:${this.counter++}`;
    const tags = this.adds.get(e) ?? new Set<Tag>();
    tags.add(tag);
    this.adds.set(e, tags);
  }

  remove(e: T): void {
    // Tombstone only the tags this replica has already observed for e.
    for (const tag of this.adds.get(e) ?? []) this.removed.add(tag);
  }

  has(e: T): boolean {
    for (const tag of this.adds.get(e) ?? []) {
      if (!this.removed.has(tag)) return true; // a live add-tag exists
    }
    return false;
  }

  values(): T[] {
    return [...this.adds.keys()].filter((e) => this.has(e)).sort();
  }

  clone(): ORSet<T> {
    const c = new ORSet<T>(this.replica);
    for (const [e, tags] of this.adds) c.adds.set(e, new Set(tags));
    c.removed = new Set(this.removed);
    return c;
  }

  // Least upper bound: union the add-tags, union the tombstones.
  // commutative + associative + idempotent => order cannot change the result.
  merge(other: ORSet<T>): void {
    for (const [e, tags] of other.adds) {
      const mine = this.adds.get(e) ?? new Set<Tag>();
      for (const t of tags) mine.add(t);
      this.adds.set(e, mine);
    }
    for (const t of other.removed) this.removed.add(t);
  }
}

// Two replicas diverge offline, then exchange state.
const a = new ORSet<string>("A");
const b = new ORSet<string>("B");

a.add("book");
a.add("pen");
b.merge(a.clone());     // B syncs once and learns both items
b.remove("pen");        // B removes pen (it observed A's add-tag)
a.add("pen");           // A concurrently re-adds pen with a NEW tag

// Merge in both directions from independent copies.
const ab = a.clone(); ab.merge(b);
const ba = b.clone(); ba.merge(a);

// Idempotence: merging the same delta twice changes nothing.
const twice = ab.clone(); twice.merge(b); twice.merge(b);

console.log("A then B :", ab.values());
console.log("B then A :", ba.values());
console.log("merged 2x:", twice.values());
console.log("converged:", JSON.stringify(ab.values()) === JSON.stringify(ba.values()));
console.log("add-wins :", ab.has("pen")); // concurrent re-add beats the remove

Run it with one command:

npx tsx orset.ts

The output:

A then B : [ 'book', 'pen' ] B then A : [ 'book', 'pen' ] merged 2x: [ 'book', 'pen' ] converged: true add-wins : true

The lines worth pausing on are the merge and the scenario. merge does nothing but two unions, which is why merging A-then-B and B-then-A produce byte-identical results, and why merging the same state twice is a no-op. The scenario is the part that breaks a naive set: B removes pen after observing A's original add-tag, but A's re-add mints a fresh tag that B's remove never saw. That untombstoned tag keeps pen in the set. Add-wins is not a tie-breaker rule layered on top — it falls directly out of the tag-and-tombstone design.

One detail I got wrong the first time: remove must tombstone only the tags the replica has currently observed, not the element name. If you remove by element name, a remove can cancel a future add it never saw, and you lose add-wins. The whole guarantee lives in that one loop.

What add-wins costs you

CRDTs are not free convergence. They are convergence paid for in metadata, and the bill comes due in three places.

Tombstones never leave. My removed set only grows. A delete does not reclaim space; it adds a record. For text CRDTs the figures are concrete: roughly 16–32 bytes of metadata per character, so a 50,000-word document can carry 1.6–3.2 MB of overhead on top of its content, and a heavily edited 1,000-character document can hold tens of thousands of tombstones internally. Delta-state CRDTs — the design behind the Almeida paper above — reduce this by shipping small deltas and tracking removes in a compact causal context instead of per-element tombstones, but they reduce the overhead, they do not eliminate it.

Memory was the real ceiling, not disk. This is what made Automerge 3.0 notable. The on-disk compressed format was already small; the cost was that loading a document expanded its full history into memory. Re-architecting to keep the compressed columnar representation at runtime is what produced the 700 MB → 1.3 MB drop. If you are evaluating a CRDT library, the question to ask is not "how big is the file" but "how much memory does an open document with a long history consume," because that is where the older designs fell over.

You may not need decentralized consensus at all. This is the contrarian point worth taking seriously. CRDTs solve conflict resolution without a coordinating server. If you already have a server — and most SaaS products do — Operational Transformation gives you deterministic conflict resolution with no tombstones, no per-element IDs, and no garbage-collection problem, at the price of a central authority that orders operations. Reaching for a CRDT in a centralized app because it feels more modern means paying the metadata tax for an autonomy you are not using. The honest trade is: CRDTs buy you offline-first and peer-to-peer; if you need neither, a server-ordered log is simpler and cheaper.

When to reach for this, and when not to

From working through this, my rule of thumb:

  • Reach for a CRDT when replicas must accept writes while disconnected and reconcile later with no central referee — offline-first mobile, local-first desktop apps, multi-region active-active stores, peer-to-peer collaboration.
  • Design the fact, not the operation. Decide what a unit of evidence is (an add-tag, a counter increment, a versioned register) so that set union produces the semantics you want. If union does not, you have picked the wrong fact.
  • Test the algebra, not just the feature. The three properties are directly checkable: merge in both orders and assert equality; merge a duplicate and assert no change. Property-based testing across random operation interleavings catches a broken merge faster than any integration test, because a broken merge is a broken equation, not a broken request.
  • Avoid it when a server already orders your writes and you do not need offline editing. Operational Transformation or a simple versioned write with optimistic concurrency control will cost far less metadata.
  • Budget for tombstone growth up front. If documents are long-lived and edit-heavy, plan compaction or a delta-state design before launch, not after a memory page.

The one sentence I want to keep: a last-writer-wins map is not eventually consistent, it is eventually wrong, and the difference between the two is whether your merge function is commutative, associative, and idempotent. Everything else — tags, tombstones, delta encodings, the choice of library — is implementation detail in service of those three properties.

Read next

Still here? You might enjoy this.

Nothing close enough — try a different angle?

Was this helpful?

Leave a rating or a quick note — it helps me improve.