Written by
Jagdish Salgotra
Distributed systems, cloud-native architecture, and the JVM. mostly shipping, occasionally reading.
Before a fan-out service can be trusted under load, four things need to be true: outcomes counted per scope, deadlines propagated, bulkheads in place, and pinning watched in JFR. What each one looks like in code.
Note: This part focuses on operating Java 21 preview structured concurrency APIs (StructuredTaskScope, JEP 453) in real services. Part 9 is a migration-focused appendix for Java 21 -> Java 25 preview API changes.
The first thing that caught my attention in testing was a metric that would not come down. The client had already timed out. The scope had returned. But something was still running, still writing to logs, still holding a thread. That is the zombie-subtask problem, and it is the reason this article exists.
Structured concurrency is still preview in Java 21. This is not production guidance in the sense of battle-tested at scale. It is what I found running this seriously in testing, with an eye on what would have to be true before I trusted it in prod.
Without scope-level cancellation, a slow subtask keeps running after the response is gone. The client sees a timeout. Your metrics do not.
The client outcome is the same with or without scope-level cancellation. The cost is not. The version without it leaves threads running work nobody is waiting for anymore.
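// Requires --enable-preview on Java 21 (StructuredTaskScope, JEP 453).
public String shortTimeoutExample() throws Exception {
    // One deadline for the whole scope: 350 ms from now.
    Instant deadline = Instant.now().plusMillis(350);
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        var slowTask = scope.fork(() -> simulateSlowService("slow-service", 500));
        var fastTask = scope.fork(() -> simulateSlowService("fast-service", 100));
        // Throws TimeoutException if the deadline passes first; closing the
        // scope then interrupts any subtask that is still running.
        scope.joinUntil(deadline);
        scope.throwIfFailed();
        return String.format("Timeout Results: %s, %s",
                slowTask.get(), fastTask.get());
    }
}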
joinUntil sets the ceiling. Without it, the scope waits as long as the slowest subtask wants to run.
For this to work, subtasks have to handle interruption. Code that swallows InterruptedException will not cancel cleanly regardless of what the scope does:
private static String simulateSlowService(String name, long delayMs) {
    try {
        Thread.sleep(delayMs);
        return name + "-OK";
    } catch (InterruptedException e) {
        // Restore the interrupt flag so callers up the stack can still see it,
        // then fail fast instead of continuing the work.
        Thread.currentThread().interrupt();
        throw new RuntimeException("Interrupted: " + name, e);
    }
}
If your downstream clients do not propagate the interrupt, the deadline budget is set but cancellation never lands. The metric stays elevated. The logs keep writing. Same picture as before.
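To make the difference concrete, here is a minimal sketch that compares a task that honors interruption with one that swallows it. It uses a plain thread rather than a scope so it runs without preview flags. The class and method names (InterruptDemo, cooperativeWork, swallowingWork, runAndInterrupt) are hypothetical and exist only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo, not part of any library.
final class InterruptDemo {

    // Cooperative: stops at the first interrupt and restores the flag.
    static void cooperativeWork(int steps, AtomicInteger completed) {
        for (int i = 0; i < steps; i++) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            completed.incrementAndGet();
        }
    }

    // Swallowing: catches the interrupt and carries on, so cancellation never lands.
    static void swallowingWork(int steps, AtomicInteger completed) {
        for (int i = 0; i < steps; i++) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException ignored) {
                // Interrupt dropped on the floor: this is the zombie pattern.
            }
            completed.incrementAndGet();
        }
    }

    // Starts the work on its own thread, interrupts it after interruptAfterMs,
    // waits for the thread to end, and returns how many steps it completed.
    static int runAndInterrupt(boolean cooperative, int steps, long interruptAfterMs) {
        AtomicInteger completed = new AtomicInteger();
        Thread worker = new Thread(() -> {
            if (cooperative) {
                cooperativeWork(steps, completed);
            } else {
                swallowingWork(steps, completed);
            }
        });
        worker.start();
        try {
            Thread.sleep(interruptAfterMs);
            worker.interrupt();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Demo driver interrupted", e);
        }
        return completed.get();
    }
}
```

With 20 steps and an interrupt after 50 ms, the cooperative variant stops well short of 20 steps, while the swallowing variant finishes all 20 even though nobody is waiting for the result.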
Track outcomes per scope, not just per request. At a minimum, count timeouts and cancellations separately.
The cancellation count is the one most teams skip. It is also the one that tells you whether your deadline propagation is actually working. If cancellations are zero while timeouts are non-zero, something is swallowing the interrupt.
Wire these into Micrometer. Without scope-level counters, a zombie-subtask problem looks like normal latency variance until it does not.
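A minimal sketch of what scope-level counters could look like. It uses plain LongAdder counters so it runs without dependencies; in a real service each one would presumably be a Micrometer counter tagged with the scope name. The class name ScopeOutcomes, the four outcome categories, and the interruptLikelySwallowed check are assumptions for illustration, not something taken from the series code.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-scope outcome counters (stand-ins for metrics-library counters).
final class ScopeOutcomes {

    // Assumed outcome categories; adjust to whatever your scopes can actually produce.
    enum Outcome { SUCCESS, FAILURE, TIMEOUT, CANCELLED }

    private final String scopeName;
    private final LongAdder[] counts = new LongAdder[Outcome.values().length];

    ScopeOutcomes(String scopeName) {
        this.scopeName = scopeName;
        for (int i = 0; i < counts.length; i++) {
            counts[i] = new LongAdder();
        }
    }

    // Safe to call from many subtask threads at once.
    void record(Outcome outcome) {
        counts[outcome.ordinal()].increment();
    }

    long count(Outcome outcome) {
        return counts[outcome.ordinal()].sum();
    }

    // Timeouts with zero cancellations suggest something is swallowing the interrupt.
    boolean interruptLikelySwallowed() {
        return count(Outcome.TIMEOUT) > 0 && count(Outcome.CANCELLED) == 0;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(scopeName);
        for (Outcome o : Outcome.values()) {
            sb.append(' ').append(o).append('=').append(count(o));
        }
        return sb.toString();
    }
}
```

One way to use it: record TIMEOUT where the scope catches the TimeoutException from joinUntil, and record CANCELLED inside each subtask where it catches InterruptedException. If the second number stays at zero while the first climbs, the interrupt is not reaching your subtasks.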
Virtual threads do not remove downstream limits. A constrained dependency has a capacity ceiling regardless of how cheaply you can create threads. Semaphores or bulkheads around constrained downstreams, bounded retries, circuit breakers on unstable endpoints, none of that goes away with structured concurrency.
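As a sketch of the simplest of these, here is a semaphore-based bulkhead. The Bulkhead class and its parameters are hypothetical; a production service would more likely reach for an existing resilience library, and the limits shown are placeholders rather than recommendations.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical bulkhead: caps concurrent calls to one constrained downstream.
final class Bulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    Bulkhead(int maxConcurrent, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrent);
        this.maxWaitMillis = maxWaitMillis;
    }

    // Runs the action if a permit frees up within maxWaitMillis, otherwise rejects.
    <T> T call(Supplier<T> action) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Keep cancellation visible: restore the flag and stop waiting.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a permit", e);
        }
        if (!acquired) {
            throw new RejectedExecutionException("Bulkhead full");
        }
        try {
            return action.get();
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

A subtask would wrap its downstream call, for example `scope.fork(() -> inventoryBulkhead.call(() -> client.fetch(id)))`, where inventoryBulkhead and client are placeholders. A Semaphore is used rather than a synchronized block because java.util.concurrent primitives do not pin virtual threads on Java 21, while blocking inside synchronized does.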
Keep timeout budgets explicit and path-specific. One global timeout for all fan-out paths will either be too tight for the slow paths or too loose for the fast ones.
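One way to keep budgets explicit is a small lookup that turns a path into a deadline. This is a sketch under assumptions: the TimeoutBudgets class, the path names, and the durations are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

// Hypothetical per-path timeout budgets with an explicit fallback.
final class TimeoutBudgets {
    private final Map<String, Duration> budgets;
    private final Duration fallback;

    TimeoutBudgets(Map<String, Duration> budgets, Duration fallback) {
        this.budgets = Map.copyOf(budgets);
        this.fallback = fallback;
    }

    // Budget for a fan-out path, or the fallback if the path is not listed.
    Duration budgetFor(String path) {
        return budgets.getOrDefault(path, fallback);
    }

    // Absolute deadline for a request on this path that started at 'start'.
    Instant deadlineFor(String path, Instant start) {
        return start.plus(budgetFor(path));
    }

    // Time left to hand to a downstream call; never negative.
    static Duration remaining(Instant deadline, Instant now) {
        Duration left = Duration.between(now, deadline);
        return left.isNegative() ? Duration.ZERO : left;
    }
}
```

The deadline can then be passed straight to the scope, for example `scope.joinUntil(budgets.deadlineFor("/search", Instant.now()))`, and `remaining` gives each downstream client the part of the budget that is still unspent instead of a fresh full timeout.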
Structured concurrency is still preview in Java 21 and still preview in Java 25. The API has evolved between versions. Practically:
javac --release 21 --enable-preview ...
java --enable-preview ...
Across Parts 1-8, the consistent theme is not "more threads"; it is better lifecycle control for concurrent work.
In Java 21 preview, structured concurrency can already reduce orchestration complexity when used with clear policies for timeout, cancellation, and degradation.
Part 9 provides the migration appendix for teams moving this code to Java 25 preview APIs.