Written by
Jagdish Salgotra
Distributed systems, cloud-native architecture, and the JVM. Mostly shipping, occasionally reading.
Long CompletableFuture chains look fine in code review and painful in incident calls: a failed call leaves its siblings still running. A structured scope is what cancels them on the way out.
Note: This series uses Java 21 as the baseline. Structured concurrency snippets in this part (StructuredTaskScope, JEP 453) use preview APIs and require --enable-preview.
Structured concurrency is not mainly a performance feature. It is an ownership feature.
When a request forks three pieces of work, those pieces of work should have a visible parent. If one branch fails, the sibling branches should not drift away as orphaned tasks. If the caller only needs the first successful answer, the code should say that locally. If every branch is required, the parent should join the whole scope before it reads results.
That is the useful shift. Structured concurrency makes the lifetime of concurrent work look like the lifetime of ordinary block-scoped code.
This article is learning material. The main branch now builds with OpenJDK 25.0.2 and uses the Java 25 preview structured-concurrency API, with the Java 21 version separately managed in the feature/java-21 branch. The snippets below quote the current main-branch code that generated the measurements. Part 9 covers the Java 21 to Java 25 migration.
The smallest example is StructuredExample.java. It forks three service calls and joins them inside one scope:
try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAllSuccessfulOrThrow())) {
    Subtask<String> fetch1 = scope.fork(() -> fetchFromService1());
    Subtask<String> fetch2 = scope.fork(() -> fetchFromService2());
    Subtask<String> fetch3 = scope.fork(() -> fetchFromService3());
    scope.join();
    String result = fetch1.get() + fetch2.get() + fetch3.get();
    logger.info("Combined result: {}", result);
}
The service delays are fixed:
static String fetchFromService1() throws Exception {
    Thread.sleep(300);
    return "Service1 ";
}
static String fetchFromService2() throws Exception {
    Thread.sleep(200);
    return "Service2 ";
}
static String fetchFromService3() throws Exception {
    Thread.sleep(100);
    return "Service3 ";
}
When I ran it, the services completed in delay order and the combined result printed after the 300ms branch finished:
Service3 completed on thread:
Service2 completed on thread:
Service1 completed on thread:
Combined result: Service1 Service2 Service3
The blank thread name is the same virtual-thread naming detail from Parts 2 and 3. The logging pattern includes Thread.currentThread().getName(), but these virtual threads were not created with explicit names.
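If named threads would make this log output easier to read, a virtual thread can be given a name when it is built. The sketch below uses the `Thread.ofVirtual()` builder, which is final in Java 21. The class name `NamedVirtualThreads` and the `fetch-` prefix are illustrative assumptions and are not taken from the repository.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the class name and the "fetch-" prefix are assumptions,
// not code from the article's repository.
final class NamedVirtualThreads {

    /** Name of a virtual thread that was built without an explicit name. */
    static String defaultName() {
        return Thread.ofVirtual().unstarted(() -> { }).getName();
    }

    /** Names handed out by a builder that numbers its threads starting at 1. */
    static List<String> numberedNames(String prefix, int count) {
        Thread.Builder builder = Thread.ofVirtual().name(prefix, 1);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // The builder's counter advances each time it creates a thread.
            names.add(builder.unstarted(() -> { }).getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("default name: '" + defaultName() + "'");
        System.out.println("numbered names: " + numberedNames("fetch-", 3));
    }
}
```

How such a builder or factory is wired into a scope depends on the preview API version in use, so that step is left out here.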
The important line is not scope.fork(...) by itself. The important shape is fork, join, then read:
var task1 = scope.fork(() -> simulateService("Service-A", 200));
var task2 = scope.fork(() -> simulateService("Service-B", 300));
var task3 = scope.fork(() -> simulateService("Service-C", 100));
scope.join();
String result = String.format("Results: %s, %s, %s",
    task1.get(), task2.get(), task3.get());
That pattern is visible in StructuredConcurrencyComparison.java. The same file also has the CompletableFuture version:
CompletableFuture<String> cf1 = CompletableFuture.supplyAsync(() -> simulateService("Service-A", 200));
CompletableFuture<String> cf2 = CompletableFuture.supplyAsync(() -> simulateService("Service-B", 300));
CompletableFuture<String> cf3 = CompletableFuture.supplyAsync(() -> simulateService("Service-C", 100));
CompletableFuture.allOf(cf1, cf2, cf3).join();
String result = String.format("Results: %s, %s, %s",
    cf1.get(), cf2.get(), cf3.get());
Both versions produce the right happy-path result. The distinction is where the ownership lives. In the structured version, the task lifetime is tied to the scope block. In the CompletableFuture version, the caller has to hold on to the futures and apply the cancellation and cleanup policy separately.
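To make "lifetime tied to the block" concrete without preview APIs, here is a minimal hand-rolled sketch built on a plain `ExecutorService`. It is not how `StructuredTaskScope` is implemented, and `OwnedTasks` and `OwnedTasksDemo` are hypothetical names introduced for illustration. The point is only the shape: nothing forked inside the `try` block is still running once the block has been left.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: owns every task forked through it until close() returns. */
final class OwnedTasks implements AutoCloseable {
    private final ExecutorService executor = Executors.newCachedThreadPool();

    <T> Future<T> fork(Callable<T> task) {
        return executor.submit(task);
    }

    @Override
    public void close() {
        executor.shutdownNow(); // interrupt whatever is still running
        try {
            // Do not let the block end while a child is still alive.
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("a task ignored interruption");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
    }
}

final class OwnedTasksDemo {

    /** Leaves the owning block while a slow task is running; returns how many tasks saw an interrupt. */
    static int runAndCountInterrupted() throws Exception {
        AtomicInteger interrupted = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(1);
        try (OwnedTasks tasks = new OwnedTasks()) {
            tasks.fork(() -> {
                started.countDown();
                try {
                    Thread.sleep(10_000);
                    return "finished";
                } catch (InterruptedException e) {
                    interrupted.incrementAndGet();
                    return "interrupted";
                }
            });
            started.await(); // make sure the task is running, then leave the block early
        }
        return interrupted.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("interrupted tasks: " + runAndCountInterrupted());
    }
}
```

With bare futures there is no equivalent of `close()`: whoever holds the references has to remember to do this on every exit path.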
The measurements below were generated with OpenJDK 25.0.2 and Maven 3.9.12:
mvn clean compile -DskipTests
mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
export CP="$(cat cp.txt):target/classes"
The build succeeded and compiled 35 source files.
Then I ran the standalone examples:
java --enable-preview -cp "$CP" app.js.concurrent.StructuredExample
java --enable-preview -cp "$CP" app.js.concurrent.StructuredExampleWithSuccess
java --enable-preview -cp "$CP" app.js.concurrent.StructuredExampleWithErrors
java --enable-preview -cp "$CP" app.js.concurrent.StructuredConcurrencyComparison
java --enable-preview -cp "$CP" app.js.concurrent.StructuredConcurrencyTester
The results below come from those commands.
The comparison class runs three simulated service calls with 200ms, 300ms, and 100ms delays. Both the structured version and the CompletableFuture version should complete near 300ms because the slowest required branch sleeps for 300ms.
That is what happened:
StructuredTaskScope took: 310ms
CompletableFuture took: 313ms
The longer tester reported the same shape:
Expected time: ~300ms
StructuredTaskScope: 310ms (overhead: 10ms)
CompletableFuture: 306ms (overhead: 6ms)
StructuredTaskScope timing: ACCURATE
CompletableFuture timing: ACCURATE
That result is important because it prevents the wrong article from being written. This run does not support a claim that structured concurrency is faster on the happy path. Both approaches waited for the slowest required branch, and both landed near the expected duration.
The better claim is smaller: structured concurrency makes the ownership and failure policy easier to see in the code.
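The "overhead" figures in the tester output are consistent with a simple subtraction: measured time minus the delay of the slowest required branch. The sketch below spells that arithmetic out. The class and method names are assumptions for illustration, and the tester's own calculation may differ in detail.

```java
import java.util.Arrays;

// Illustrative arithmetic only; names are assumptions, not repository code.
final class FanOutTiming {

    /** A parallel fan-out that needs every branch waits for the slowest one. */
    static long expectedParallelMillis(long... delays) {
        return Arrays.stream(delays).max().orElse(0);
    }

    /** Running the same branches one after another costs the sum instead. */
    static long expectedSequentialMillis(long... delays) {
        return Arrays.stream(delays).sum();
    }

    /** Measured wall time minus the expected parallel time. */
    static long overheadMillis(long measuredMillis, long... delays) {
        return measuredMillis - expectedParallelMillis(delays);
    }

    public static void main(String[] args) {
        long[] delays = {200, 300, 100};
        System.out.println("expected parallel:   " + expectedParallelMillis(delays) + "ms");
        System.out.println("expected sequential: " + expectedSequentialMillis(delays) + "ms");
        System.out.println("StructuredTaskScope overhead: " + overheadMillis(310, delays) + "ms");
        System.out.println("CompletableFuture overhead:   " + overheadMillis(306, delays) + "ms");
    }
}
```

Both measured values sit a few milliseconds above the 300ms floor and far below the 600ms sequential cost, which is all the happy-path run can show.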
StructuredExampleWithSuccess.java shows two policies in one class. The first part waits for all services. The second part returns the first successful result:
try (var scope = StructuredTaskScope.open(
        StructuredTaskScope.Joiner.<String>allUntil(s -> s.state() == Subtask.State.SUCCESS)
)) {
    scope.fork(() -> slowService("Service-A", 1000));
    scope.fork(() -> slowService("Service-B", 500));
    scope.fork(() -> slowService("Service-C", 200));
    Stream<Subtask<String>> results = scope.join();
    String result = results
        .filter(s -> s.state() == Subtask.State.SUCCESS)
        .findFirst()
        .map(Subtask::get)
        .orElseThrow(() -> new Exception("No successful result"));
    logger.info("First successful result: {}", result);
}
The run produced:
=== ShutdownOnFailure Example ===
Service3 completed on thread:
Service2 completed on thread:
Service1 completed on thread:
All services completed: Service1 Service2 Service3
=== ShutdownOnSuccess Example ===
Service-C completed on thread:
First successful result: Service-C result
The comparison class measured the same first-success shape:
StructuredTaskScope first result: Fast-Service-OK (took 207ms)
CompletableFuture first result: Fast-Service-OK (took 201ms)
Again, the headline is not raw speed. The headline is locality. The structured version says "join until one subtask succeeds" at the scope boundary. A reader does not have to reconstruct the policy from anyOf(...) plus separate cancellation calls.
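For contrast, here is one way the first-success policy might be assembled by hand with CompletableFuture. It is a sketch, not the repository's comparison code, and `FirstSuccess` and `firstSuccess` are hypothetical names. Two details have to be handled explicitly: `anyOf(...)` completes on the first completion of any kind, including a failure, and the losing futures must be cancelled by the caller.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, written for illustration only.
final class FirstSuccess {

    /**
     * Completes with the first successful result. Fails only if every input fails.
     * An empty input list never completes; a real helper would reject it.
     */
    static <T> CompletableFuture<T> firstSuccess(List<CompletableFuture<T>> futures) {
        CompletableFuture<T> winner = new CompletableFuture<>();
        AtomicInteger failures = new AtomicInteger();
        for (CompletableFuture<T> f : futures) {
            f.whenComplete((value, error) -> {
                if (error == null) {
                    winner.complete(value); // no-op if someone else already won
                } else if (failures.incrementAndGet() == futures.size()) {
                    winner.completeExceptionally(error);
                }
            });
        }
        // Second half of the policy: once there is an outcome, cancel the rest by hand.
        // CompletableFuture.cancel does not interrupt a running task; it only
        // completes the future with a CancellationException.
        winner.whenComplete((value, error) -> futures.forEach(f -> f.cancel(true)));
        return winner;
    }

    public static void main(String[] args) {
        CompletableFuture<String> a = new CompletableFuture<>();
        CompletableFuture<String> b = new CompletableFuture<>();
        CompletableFuture<String> c = new CompletableFuture<>();
        CompletableFuture<String> winner = firstSuccess(List.of(a, b, c));

        a.completeExceptionally(new RuntimeException("Service-A failed"));
        c.complete("Service-C result");

        System.out.println(winner.join() + ", b cancelled: " + b.isCancelled());
    }
}
```

The policy works, but it lives in a helper plus two callbacks rather than in one line at the scope boundary.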
StructuredExampleWithErrors.java forks three calls. One branch fails:
try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAllSuccessfulOrThrow())) {
    Subtask<String> fetch1 = scope.fork(() -> fetchFromService("Service-1", 300, false));
    Subtask<String> fetch2 = scope.fork(() -> fetchFromService("Service-2", 200, true));
    Subtask<String> fetch3 = scope.fork(() -> fetchFromService("Service-3", 100, false));
    scope.join();
    String result = fetch1.get() + fetch2.get() + fetch3.get();
    logger.info("Combined result: {}", result);
} catch (Exception e) {
    logger.error("One or more services failed: {}", e.getMessage());
    logger.error("All remaining tasks were cancelled automatically");
}
The run showed the 100ms branch completing and then the failure being reported when the failing 200ms branch threw:
Service-3 completed successfully on thread:
One or more services failed: java.lang.RuntimeException: Service-2 failed!
All remaining tasks were cancelled automatically
Notice what is missing: there is no Service-1 completed successfully line. That branch had a 300ms delay, and the failure happened earlier. The scope did not need to keep waiting for it.
The dedicated tester made the timing difference clearer:
Testing StructuredTaskScope error handling...
StructuredTaskScope caught error in: 57ms
Error message: java.lang.RuntimeException: Bad-Task failed!
Testing CompletableFuture error handling...
CompletableFuture caught error in: 204ms
Error: Bad-Task failed!
In that tester, the bad task fails after 50ms. The structured path reported the error in 57ms. The CompletableFuture.allOf(...) path reported it in 204ms because the all-of join waited for the other branches to finish before surfacing the failure.
That is the strongest practical result in this article. The join policy decides when a failure is allowed to end the group.
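The CompletableFuture side can be made to fail fast, but the caller has to wire it. A minimal sketch, assuming a hypothetical helper named `allOfFailFast` that is not part of the repository: it takes the future returned by `allOf(...)` and completes it exceptionally as soon as any input fails. The demo uses manually completed futures so the ordering is explicit rather than timing-dependent.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical helper, written for illustration only.
final class FailFastAllOf {

    /** Like allOf, but completes exceptionally as soon as any input fails. */
    static CompletableFuture<Void> allOfFailFast(CompletableFuture<?>... futures) {
        CompletableFuture<Void> all = CompletableFuture.allOf(futures);
        for (CompletableFuture<?> f : futures) {
            f.whenComplete((value, error) -> {
                if (error != null) {
                    // Surface the first failure without waiting for the siblings.
                    all.completeExceptionally(error);
                }
            });
        }
        return all;
    }

    public static void main(String[] args) {
        CompletableFuture<String> slow = new CompletableFuture<>();
        CompletableFuture<String> bad = new CompletableFuture<>();

        CompletableFuture<Void> plain = CompletableFuture.allOf(slow, bad);
        CompletableFuture<Void> failFast = allOfFailFast(slow, bad);

        bad.completeExceptionally(new RuntimeException("Bad-Task failed!"));

        System.out.println("plain allOf done:   " + plain.isDone());
        System.out.println("fail-fast failed:   " + failFast.isCompletedExceptionally());
    }
}
```

This only changes when the caller learns about the failure. The slow sibling keeps running unless the caller also stops it, which is the part the structured scope handles on exit.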
The same tester includes a cancellation case. Two long tasks start, then a third task fails after 100ms:
try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAllSuccessfulOrThrow())) {
    scope.fork(() -> cancellableTask("Task-1", 1000, structuredCancelled));
    scope.fork(() -> cancellableTask("Task-2", 2000, structuredCancelled));
    scope.fork(() -> failingTask("Failing-Task", 100));
    scope.join();
} catch (Exception e) {
    logger.info("StructuredTaskScope cancelled " + structuredCancelled.get() + " tasks");
}
The measured output:
StructuredTaskScope cancelled 2 tasks
CompletableFuture requires manual cancellation
StructuredTaskScope auto-cancelled: 2 tasks
CompletableFuture auto-cancelled: 0 tasks
This is not an abstract cleanup claim. The checked-in task increments a counter only when it receives InterruptedException:
private static String cancellableTask(String name, long delayMs, AtomicInteger cancelledCounter) {
    try {
        Thread.sleep(delayMs);
        return name + "-COMPLETED";
    } catch (InterruptedException e) {
        cancelledCounter.incrementAndGet();
        Thread.currentThread().interrupt();
        return name + "-CANCELLED";
    }
}
The structured run incremented the counter twice. The CompletableFuture run did not.
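One reason the CompletableFuture counter can stay at zero even when the caller does cancel is that `CompletableFuture.cancel(true)` does not interrupt the thread running the task; its Javadoc states that the `mayInterruptIfRunning` argument has no effect. The sketch below contrasts that with `Future.cancel(true)` on a task submitted to an `ExecutorService`. It reuses the shape of the article's `cancellableTask`, adds a latch so the task is known to be running before it is cancelled, and uses a hypothetical class name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class for illustration; not part of the article's repository.
final class CancelVersusInterrupt {

    // Same idea as the article's cancellableTask: count only real interrupts.
    static String cancellableTask(long delayMs, CountDownLatch started, AtomicInteger interrupted) {
        started.countDown();
        try {
            Thread.sleep(delayMs);
            return "COMPLETED";
        } catch (InterruptedException e) {
            interrupted.incrementAndGet();
            Thread.currentThread().interrupt();
            return "CANCELLED";
        }
    }

    /** Interrupts observed by a task whose CompletableFuture is cancelled mid-run. */
    static int interruptsAfterCompletableFutureCancel() throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        AtomicInteger interrupted = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(1);
        CompletableFuture<String> cf = CompletableFuture.supplyAsync(
                () -> cancellableTask(200, started, interrupted), pool);
        started.await();
        cf.cancel(true);   // the future is cancelled, the worker keeps sleeping
        pool.shutdown();   // orderly shutdown: lets the task finish on its own
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return interrupted.get();
    }

    /** Interrupts observed by a task whose executor Future is cancelled mid-run. */
    static int interruptsAfterFutureCancel() throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        AtomicInteger interrupted = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(1);
        Future<String> future = pool.submit(
                () -> cancellableTask(10_000, started, interrupted));
        started.await();
        future.cancel(true);   // interrupts the thread that is running the task
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return interrupted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("CompletableFuture.cancel interrupts: "
                + interruptsAfterCompletableFutureCancel());
        System.out.println("Future.cancel(true) interrupts:      "
                + interruptsAfterFutureCancel());
    }
}
```

So "manual cancellation" for CompletableFuture means more than calling `cancel`: the task needs some other signal, or a different future type, before the work itself stops.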
The current run does not support a structured-concurrency performance claim.
The comparison class reported:
StructuredTaskScope: 1336ms
CompletableFuture: 1310ms
Difference: -26ms
The longer tester reported:
StructuredTaskScope average: 1.01 ms
CompletableFuture average: 1.01 ms
Difference: 0.00 ms
StructuredTaskScope is 1.0x faster
That is effectively a tie for this tiny 1ms task loop, with noise larger than the lesson. It should not be turned into a performance claim either way.
The memory section is even noisier:
StructuredTaskScope: -489 KB
CompletableFuture: -26 KB
Difference: 462 KB
Negative allocation deltas after explicit GC are not article evidence. They are a signal that this local test is too rough for memory claims. Keep the memory output in the testing guide if it helps inspect behavior locally, but do not use it to claim that one approach allocates less.
In text, the diagram shows a parent scope owning several child tasks. The parent starts the children, waits according to the join policy, and exits the scope with cleanup tied to that block.
That is the mental model to keep. Structured concurrency gives concurrent work a visible parent.
Structured-concurrency tests should prove ownership policy. For all-required fan-out, verify that the result includes every required branch and that total time follows the slowest branch. For first-success fan-out, verify which branch wins and that the result returns near that branch's delay. For failure paths, assert both the time to observe failure and whether slower sibling tasks were interrupted. A test that only checks the final exception message can miss the main behavior.
Cancellation deserves its own check. Use tasks that record interruption, as StructuredConcurrencyTester does, so the test can distinguish "the parent returned" from "the siblings actually stopped." For performance, avoid turning 1ms local task loops into broad claims. Use them as smoke checks only, and measure real request paths separately.
Part 4 showed why structured concurrency is about local ownership, not automatic speed. The strongest evidence was the failure and cancellation behavior: a 50ms failing task surfaced in 57ms under the structured scope, and the scope interrupted two slower sibling tasks. The CompletableFuture examples can be made correct, but the policy is spread across futures, joins, and manual cancellation calls.
Part 5 moves from the core scope model into advanced patterns: timeouts, conditional cancellation, progressive result collection, and resource-aware orchestration.