#distributed-systems

Every post I've written about distributed-systems: 11 articles across Engineering, Distributed Systems, and Backend. Most recent: “When SQL Is Enough for Streaming: What Incremental View Maintenance Guarantees — and What It Refuses.”

Engineering 01

When SQL Is Enough for Streaming: What Incremental View Maintenance Guarantees — and What It Refuses

The 2026 pitch is that a hand-written stream job collapses into one CREATE MATERIALIZED VIEW, so I built one in RisingWave and spent my study time trying to break it. These are my notes on the consistency the view actually delivers as events arrive, why every join in it is a standing memory bill, and the queries that refuse to be incremental — with a Node consumer riding the view's changefeed over the plain Postgres protocol.

Jul 19

Distributed Systems 02

Hybrid Logical Clocks: Making Last-Write-Wins Mean the Later Write

Wall-clock last-write-wins keeps the write from the faster clock, not the later event — and silently drops causally newer data under skew. These are my notes on rebuilding a Hybrid Logical Clock in Go: a 64-bit, monotonic, causal timestamp, why its counter stays bounded, and what it costs in CockroachDB-style uncertainty restarts.

Jul 8

Engineering 03

What APISIX in the Trial Ring Actually Buys You: Notes on Its etcd-Backed Control Plane

Volume 34 of the Thoughtworks Technology Radar moved Apache APISIX into the Trial ring. I spent a week digging through the docs, source code, and a couple of bug reports to convince myself the etcd-backed dynamic-routing claim was real — and to weigh the operational cost it hides. These are my notes on the watch mechanism, the connection-scaling cliff at 263 long polls, and when I would and would not reach for APISIX in 2026.

Jul 1

Backend 04

Catching a Retry Race with One Seed: Deterministic Simulation in Rust using turmoil

I had three flaky retry tests no one could reproduce on a laptop. I rewrote one in Rust on top of turmoil, Tokio's deterministic simulator, and a single 8-byte seed pinned the partition race byte-for-byte. These are my notes on what the seed actually controls, what leaks past it, and when deterministic simulation testing is worth the seam.

Jun 4

Distributed Systems 05

Actor-per-Entity vs Postgres Optimistic Locking: A Seat-Reservation Bake-off

I ran the same hot-key seat reservation workload two ways: Postgres with a version column and retries, and a single actor per seat. The actor design did not scale better — it moved the hard problem from concurrency control to routing and rebalance correctness, and that trade was the easier one to reason about under hot keys.

May 26

Backend 06

Durable Execution Isn't About Agents — It's About Replayable Backend Workflows

I came to durable-execution runtimes through the agent press, but the constraint that surprises everyone is determinism on replay. These are my notes from working a six-step payment reconciliation as a Restate workflow in TypeScript — the line that broke replay, the mental model that fixed it, and the trade-offs that come with the pattern.

May 19

Distributed Systems 07

AckWait Is a Contract: How a 30-Second Default Took Down My JetStream Consumer

I lost an evening to a NATS JetStream pull consumer that doubled its work in production. The cause was three lines of ConsumerConfig I never wrote. These are my notes on what AckWait actually counts, why MaxDeliver = -1 is the silent footgun, and the 70-line Go contract I now ship on every JetStream consumer.

May 12

Engineering 08

What `dbos ontime` Actually Asks: Building a Distributed Cron on etcd Leases in Go

A zero-click query for `dbos ontime` showed up in my Search Console last week. The reader is not asking about DBOS; they are asking how to run a job every minute, exactly once, across a fleet. From my own notes, an etcd lease, the `concurrency.Election` type, and a fencing token cover that case in under 100 lines of Go, without pulling in a workflow engine.

May 7

Engineering 09

DBOS vs Temporal: When Postgres Is Enough for Durable Workflow Execution

DBOS reuses Postgres as the durability layer for workflows, while Temporal runs a dedicated cluster. The right choice depends on team size, workload shape, and where you want your operational budget to go. This is a practical rubric for picking between them.

Apr 26

Distributed Systems 10

Cell-Based Architecture Isn't Free: What Slack, DoorDash, and Roblox Actually Paid For It

Cell-based architecture contains blast radius, but it is not free. A look at what Slack, DoorDash, and Roblox actually paid for cells in production — and a checklist for the cheaper fault-isolation patterns most teams should reach for first.

Apr 23

Engineering 11

The Transactional Outbox Is Not a Queue

The transactional outbox is a ledger, not a queue. Treating it like one is what breaks Postgres under load. This post walks through the specific failure modes — autovacuum stalls, xmin horizon drift, replication slot lag, poison pills — and the operational rules that actually keep it working in production.

Apr 17