Structured Concurrency Fails Differently in 4 Runtimes


Structured concurrency arrived in the Java mainstream conversation in September 2025, when Java 25 shipped JEP 505. It is worth being precise about what landed: JEP 505 is the fifth preview of the feature, not a finished API. The Oracle reference still carries the preview banner, so you compile it with --enable-preview and you should expect the surface to shift again — and it has. JEP 525 shipped as a sixth preview in Java 26 (March 2026) with renames and return-type changes on Joiner (notably allSuccessfulOrThrow returning a List instead of a stream, and anySuccessfulResultOrThrow renamed to anySuccessfulOrThrow), and JEP 533 is queued as a seventh preview for JDK 27, adding a third type parameter to Joiner and wrapping subtask failures in ExecutionException. The headline I kept seeing ("structured concurrency is now stable in Java") is wrong. What is true is more interesting: with the JEP 505 preview, all four runtimes I reach for — Java, Kotlin, Swift, Python — now ship the same pattern, and for the first time I could put their failure behavior side by side.

So I did. I wrote the same toy workload four times: fan out three calls, fail the whole unit if any one fails, never leak a thread. The surface API is close enough that you can almost copy a mental model from one runtime to the next. The failure semantics are not. That gap is the whole point of this post, because it is invisible on the happy path and it is exactly what bites when a request spans more than one runtime.

The same fan-out, four times

Here is the shape in each runtime. I am showing only the spine — fork some children, wait, combine — because that is the part that looks identical.

Java 25, default scope:

```java
try (var scope = StructuredTaskScope.open()) {
    var user  = scope.fork(this::findUser);
    var order = scope.fork(this::fetchOrder);
    scope.join();                       // throws on the first failure
    return new Response(user.get(), order.get());
}
```

Kotlin:

```kotlin
coroutineScope {
    val user  = async { findUser() }
    val order = async { fetchOrder() }
    Response(user.await(), order.await())
}
```

Swift:

```swift
try await withThrowingTaskGroup(of: Part.self) { group in
    group.addTask { try await findUser() }
    group.addTask { try await fetchOrder() }
    var parts: [Part] = []
    for try await part in group { parts.append(part) }
    return combine(parts)
}
```

Python:

```python
async with asyncio.TaskGroup() as tg:
    user  = tg.create_task(find_user())
    order = tg.create_task(fetch_order())
# both awaited at the end of the block
```

Four idioms, one idea: the lexical block owns the children, and the block does not exit until every child has terminated. No detached work escapes. That guarantee is real in all four, and it is the reason the pattern is worth adopting at all. The trouble starts the moment one child throws.

Where cancellation actually diverges

When a child fails, every one of these runtimes cancels the remaining siblings. That sentence hides three different mechanisms, and the mechanism decides whether cancellation actually happens.

Java cancels by thread interrupt. The scope interrupts the virtual threads running the unfinished subtasks, and the interrupt surfaces as InterruptedException inside any blocking call. The catch: an interrupt only lands at an interruptible point. A subtask spinning in a tight CPU loop, or one that catches InterruptedException and swallows it, never notices. The scope will still wait for it, which is the guarantee working as designed — but "cancelled" here means "asked to stop," not "stopped."

Kotlin, Swift, and Python cancel cooperatively. Kotlin throws CancellationException at the next suspension point; Swift sets a flag the child reads through Task.isCancelled or Task.checkCancellation(); Python throws CancelledError at the next await. In all three, a child that never reaches a checkpoint — or that catches the cancellation signal and keeps going — defeats cancellation entirely. I confirmed the failure mode the same way in each: a while true loop with no await is uncancellable everywhere, and a try/except that eats the cancel turns "stop now" into "stop never."

None of the four can preempt a running computation. That is the first thing I would tell anyone treating these APIs as interchangeable: the structured guarantee is about waiting for children, not about forcing them to die.

Java has one extra trap the other three do not. In his critique of JEP 505, Adam Warski points out that when the scope cancels, the interrupt reaches the subtasks but not the body of the scope itself. If your scope body is acting as a coordinator — blocked on a queue, waiting for the children to feed it work — a child failure cancels the children and leaves the body parked on queue.take() forever. The pattern that looks safest (a driver loop coordinating workers) is the one that hangs. There is no clean way to interrupt the body under the current design, so you are back to manual discipline: catch failures inside the children and signal the body explicitly.

What gets swallowed on failure

Cancellation decides what stops. Aggregation decides what you learn about. This is where the four runtimes split hardest, and it is the difference I would actually pick a runtime on.

The diagram below contrasts what reaches your catch block when two children fail at once. Look at how many exceptions survive the trip back to the caller in each lane.

Java's default policy, the one StructuredTaskScope.open() gives you (it is equivalent to opening the scope with the awaitAllSuccessfulOrThrow joiner), throws StructuredTaskScope.FailedException wrapping the first subtask that failed, and cancels the scope. The allSuccessfulOrThrow joiner surfaces the same single exception. The second and third failures are gone. You can recover them by writing a custom Joiner that inspects every Subtask after join(), but out of the box, the count of failures you can see is one.

Kotlin's coroutineScope rethrows the first child's exception and cancels the siblings. Later exceptions from other children are attached as suppressed exceptions on the first, so they are technically reachable through Throwable.getSuppressed() — but only if you go looking, and most logging and error-handling code never does. (If you want children to fail independently instead of taking down the scope, that is what supervisorScope is for; it isolates each failure rather than aggregating anything.)

Swift's withThrowingTaskGroup rethrows the first error it sees while you iterate the group, marks the rest cancelled, awaits them, then completes. The other failures are discarded. Swift adds a trap the others lack: the group only rethrows when you consume its results. If you fire children with addTask and never iterate the group with for try await, a thrown error is silently dropped and the group completes as if nothing failed. I found this is the single easiest way to write a Swift task group that looks correct and reports success while a child blew up. The fix is mechanical — always drain the group — but nothing forces you to.

Python is the outlier, and it is the reason I would reach for asyncio.TaskGroup when I actually need to know everything that broke. When tasks fail, the group cancels the rest and then raises an ExceptionGroup carrying every non-cancellation failure (this is PEP 654, shipped in 3.11), which you unwrap with the except* syntax. It is the only one of the four that is lossless by default.

Here is the smallest program I kept that proves it. A barrier lines three workers up at the same instant; two of them then raise with no further await, so both failures are recorded before cancellation can intervene:

```python
import asyncio

async def worker(name: str, barrier: asyncio.Barrier, fail: bool) -> str:
    await barrier.wait()            # all three line up at the same instant
    if fail:                        # then raise with no further await point
        raise RuntimeError(f"{name} failed")
    return f"{name} ok"

async def fan_out() -> None:
    barrier = asyncio.Barrier(3)
    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(worker("A", barrier, fail=True))
            tg.create_task(worker("B", barrier, fail=True))
            tg.create_task(worker("C", barrier, fail=False))
    except* RuntimeError as eg:                       # note the star
        names = sorted(str(e) for e in eg.exceptions)
        print(f"aggregated {len(eg.exceptions)} failures: {names}")

asyncio.run(fan_out())
```

Run it with python3 fanout.py on Python 3.11 or newer. I ran it five times in a row and it printed aggregated 2 failures: ['A failed', 'B failed'] every time. The barrier makes the timing explicit: both failing workers are past their last await before either one raises, so cancellation has nothing left to interrupt. In a realistic workload the children await for different lengths of time. A would fail first, cancellation would reach B mid-await, and B would come back as a CancelledError rather than a RuntimeError, leaving you with one real failure in the group instead of two. That detail is the whole behavior in miniature: aggregation only captures failures that actually raised before cancellation swept the rest away.

The semantics, on one screen

This is the table I now keep next to the four code samples. The top two rows look the same across runtimes; the bottom four are where the bodies are buried.

| Behavior | Java 25 `StructuredTaskScope` | Kotlin `coroutineScope` | Swift `withThrowingTaskGroup` | Python `asyncio.TaskGroup` |
| --- | --- | --- | --- | --- |
| Block waits for all children | yes | yes | yes | yes |
| First failure cancels siblings | yes | yes | yes | yes |
| Cancel mechanism | thread interrupt | `CancellationException` | cooperative flag | `CancelledError` |
| Failures the caller sees | first only | first (+ suppressed) | first only | all (`ExceptionGroup`) |
| Easiest silent-swallow bug | hung coordinator body | caught `CancellationException` | results never consumed | caught `CancelledError` |
| Maturity | preview (`--enable-preview`) | stable | stable | stable (3.11+) |

Picking a runtime knowing the trade

The reason this matters in a backend is partial failure. When you fan out to three downstream services and two of them are down, the difference between "I logged one timeout" and "I logged a connection refused and a 503" is the difference between chasing the wrong dependency at 2 a.m. and fixing the right one. Three of these four runtimes hand you the first failure and quietly drop the rest. If you are on Java, Kotlin, or Swift and you care about every failure, you have to write the code that collects them — a custom Joiner in Java, reading suppressed exceptions in Kotlin, draining and accumulating in Swift. It is not hard, but it is not the default, and defaults are what ship.

A few things I would act on:

Treat aggregation as a feature you opt into, not one you get. Only Python's TaskGroup surfaces every failure by default. Everywhere else, assume the first exception is all you will see unless you wrote the code to see more.
Audit your cancellation checkpoints. Cooperative cancellation (Kotlin, Swift, Python) is a no-op against a child that never suspends or that swallows the cancel signal. A CPU-bound child is uncancellable in all four — Java's interrupt does not help there either.
In Swift, always drain the group. An unconsumed withThrowingTaskGroup reports success while a child failed. Iterate it with for try await even when you do not need the values.
In Java, do not block the scope body on its own children. If the body coordinates the workers through a queue, a child failure can leave the body parked while the scope is cancelled around it.
Do not ship Java structured concurrency as if it were stable. It remains a preview behind --enable-preview through Java 26 (sixth preview, JEP 525), with a seventh queued for JDK 27 (JEP 533) that changes Joiner's type parameters and exception wrapping. The shape will keep moving until finalization.

When to reach for the pattern: any time you fan out concurrent I/O and want a single point that owns the lifetime and the failure. It is a strict improvement over hand-rolled futures and detached tasks. When to be careful: the moment your children share state, coordinate through a queue, or include a long-lived background task that never completes on its own — those are the cases where the convergent surface API hides four genuinely different runtimes underneath, and the one you reach past is the one that hangs or swallows.

The surface really does look the same in all four. The failure path is where they stop agreeing, and the failure path is the only one that matters under load.
