Written by
Jagdish Salgotra
Distributed systems, cloud-native architecture, and the JVM. mostly shipping, occasionally reading.
A microservice that swaps its executor for virtual threads loses the thread-pool ceiling but gains a different operational surface: orchestrating fan-out cleanly, watching pinning instead of pool depth, and noticing that downstream capacity is now the limit.
Note: This series uses Java 21 as the baseline. Virtual threads are stable in Java 21 (JEP 444). Structured concurrency snippets in this part (StructuredTaskScope, JEP 453) use preview APIs and require --enable-preview.
A microservice request is rarely one piece of work. It often fans out to a database, a cache, a file, a profile service, a notification service, or a slower dependency owned by another team.
Virtual threads help because those branches often spend most of their time waiting. Structured concurrency helps because the branches belong to one request, and the lifetime of that request should own the lifetime of the work it starts.
This article uses the checked-in port 8080 service to look at four request patterns: aggregate, first success, fallback, and multi-service aggregation. The point is not that these toy endpoints are production systems. The point is that the code, timings, and failure behavior are concrete enough to reason about.
This article is learning material. The main branch now builds with OpenJDK 25.0.2 and uses the Java 25 preview structured-concurrency API, with the Java 21 version separately managed in the feature/java-21 branch. The snippets below quote the current main-branch code that generated the measurements. Part 9 covers the Java 21 to Java 25 structured-concurrency migration.
The backing file is VirtualThreadMicroservice.java. It starts an HttpServer on port 8080 and runs request handlers on virtual threads:
HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
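To see the executor swap in isolation, here is a minimal, self-contained sketch of the same two calls wrapped around a single route. The /whoami route, the class name, and the helper methods are hypothetical and not part of VirtualThreadMicroservice.java; the sketch only assumes Java 21 or later and the JDK's built-in HttpServer and HttpClient.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

final class VirtualThreadServerSketch {

    // Starts a server with one hypothetical route; port 0 asks the OS for any free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/whoami", exchange -> {
            // Report whether this handler is running on a virtual thread.
            byte[] body = ("virtual=" + Thread.currentThread().isVirtual())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        // Same executor choice as the article: one new virtual thread per request.
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        server.start();
        return server;
    }

    // Sends one GET to the running server and returns the response body.
    static String get(HttpServer server, String path) throws IOException, InterruptedException {
        URI uri = URI.create("http://localhost:" + server.getAddress().getPort() + path);
        HttpRequest request = HttpRequest.newBuilder(uri).build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start(0);
        try {
            System.out.println(get(server, "/whoami"));
        } finally {
            server.stop(0);
        }
    }
}
```

The sketch binds to port 0 so it can run next to the real service on 8080 without a port clash.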
For Part 3, the relevant routes are:
| Endpoint | Method | Pattern |
|---|---|---|
| /aggregate | aggregateWithStructuredConcurrency() | wait for block and file branches |
| /first-success | firstSuccessWithStructuredConcurrency() | return the first successful branch |
| /aggregate-with-fallback | aggregateWithFallback() | return fallback text when a branch fails |
| /multi-aggregate | multiServiceAggregation() | combine block, file, compute, and cache branches |
| /metrics | generateMetrics() | expose aggregate service counters |
The service also exposes /aggregate-old, but I did not use it for this article. This pass is not comparing CompletableFuture with structured concurrency. It is grounding the microservice patterns that the article teaches.
The aggregate endpoint forks two branches: a 300ms simulated blocking service and a local file read.
private static String aggregateWithStructuredConcurrency() throws Exception {
    long startTime = System.currentTimeMillis();
    try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAllSuccessfulOrThrow())) {
        var blockFuture = scope.fork(() -> fetchBlock());
        var fileFuture = scope.fork(() -> fetchFile());
        scope.join();
        long duration = System.currentTimeMillis() - startTime;
        return String.format("StructuredTaskScope Combined: %s | %s (Total: %dms)",
                blockFuture.get(), fileFuture.get(), duration);
    }
}
The important detail is not the syntax. It is ownership. The scope owns both subtasks. The parent joins before reading results. If a branch fails under this joiner, the request does not silently leave sibling work behind.
The measured single request returned in about the time of the slowest branch:
StructuredTaskScope Combined: Block-Service-OK | File-Service-OK-10000-lines (Total: 305ms) (Duration: 306ms, Thread: , Request: #7)
status=200 total=0.307285s
The file branch is fast in this local run. The block branch sleeps for 300ms. The whole request completed in just over 300ms.
The first-success endpoint starts three candidate branches:
private static String firstSuccessWithStructuredConcurrency() throws Exception {
    long startTime = System.currentTimeMillis();
    try (var scope = StructuredTaskScope.open(
            StructuredTaskScope.Joiner.<String>allUntil(s -> s.state() == Subtask.State.SUCCESS)
    )) {
        scope.fork(() -> slowService("Cache-1", 500));
        scope.fork(() -> slowService("Cache-2", 200));
        scope.fork(() -> slowService("Database", 800));
        Stream<Subtask<String>> results = scope.join();
        long duration = System.currentTimeMillis() - startTime;
        String firstResult = results
                .filter(s -> s.state() == Subtask.State.SUCCESS)
                .findFirst()
                .map(Subtask::get)
                .orElseThrow(() -> new Exception("No successful result"));
        return String.format("First successful result: %s (Duration: %dms)",
                firstResult, duration);
    }
}
The service names make the result easy to audit. Cache-2 sleeps for 200ms, Cache-1 sleeps for 500ms, and Database sleeps for 800ms. The endpoint should return around 200ms if the first-success policy is doing what the code says.
The measured request returned:
First successful result: Cache-2-OK-200ms (Duration: 205ms) (Duration: 206ms, Thread: , Request: #8)
status=200 total=0.207463s
That result is the contract. A first-success endpoint should not wait for every branch when one successful answer is enough.
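The same contract can be checked without preview flags. The sketch below is not the article's joiner-based code: it swaps in the stable ExecutorService.invokeAny call on a virtual-thread executor, which also returns the first successful result and cancels the remaining tasks. The slowService method here is a hypothetical stand-in written to mimic the naming seen in the measured output, and Java 21 or later is assumed.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FirstSuccessSketch {

    // Hypothetical stand-in for the article's slowService(name, delayMs).
    static String slowService(String name, long delayMs) throws InterruptedException {
        Thread.sleep(delayMs);
        return name + "-OK-" + delayMs + "ms";
    }

    // Races three candidates and returns whichever succeeds first.
    static String firstSuccess() throws InterruptedException, ExecutionException {
        List<Callable<String>> candidates = List.of(
                () -> slowService("Cache-1", 500),
                () -> slowService("Cache-2", 200),
                () -> slowService("Database", 800));
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // invokeAny returns the first successful result; tasks that have
            // not completed by then are cancelled.
            return executor.invokeAny(candidates);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        String winner = firstSuccess();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(winner + " after " + elapsedMs + "ms");
    }
}
```

The design trade-off is scope: invokeAny expresses only the first-success policy, while a joiner keeps every subtask's state available to the parent after the join.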
The fallback endpoint is deliberately nondeterministic:
private static String fetchFileWithPossibleError() throws Exception {
    if (Math.random() < 0.3) {
        throw new RuntimeException("File service temporarily unavailable");
    }
    return fetchFile();
}
That means a single request is not enough evidence. You need a sequence that shows both paths.
I ran twelve requests:
fallback-01 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 302ms) status=200 total=0.304022s
fallback-02 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 302ms) status=200 total=0.304377s
fallback-03 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 306ms) status=200 total=0.307403s
fallback-04 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 305ms) status=200 total=0.306844s
fallback-05 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 302ms) status=200 total=0.304278s
fallback-06 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 306ms) status=200 total=0.308128s
fallback-07 Fallback response: One service failed (java.lang.RuntimeException: File service temporarily unavailable), but we handled it gracefully (Duration: 0ms) status=200 total=0.001792s
fallback-08 Fallback response: One service failed (java.lang.RuntimeException: File service temporarily unavailable), but we handled it gracefully (Duration: 0ms) status=200 total=0.001394s
fallback-09 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 303ms) status=200 total=0.305008s
fallback-10 Fallback response: One service failed (java.lang.RuntimeException: File service temporarily unavailable), but we handled it gracefully (Duration: 0ms) status=200 total=0.001094s
fallback-11 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 306ms) status=200 total=0.306550s
fallback-12 Fallback response: One service failed (java.lang.RuntimeException: File service temporarily unavailable), but we handled it gracefully (Duration: 0ms) status=200 total=0.001240s
Eight requests took the normal path and returned in roughly 300ms. Four requests took the fallback path and returned in about 1ms.
That fast fallback is not magic. The possible file failure happens before the 300ms blocking branch completes. With the failure-oriented joiner, the scope can stop waiting for the sibling work and return the fallback response. That is the kind of detail a benchmark table would hide.
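How long should such a sequence be? A rough sketch, assuming each request fails independently with the 0.3 probability from fetchFileWithPossibleError(): the chance of seeing at least one fallback in n requests is 1 - 0.7^n. The class and method names below are hypothetical helpers for that arithmetic, not part of the service.

```java
final class FallbackSampling {

    // Probability that at least one of `requests` independent calls fails.
    static double probAtLeastOneFailure(double failureRate, int requests) {
        return 1.0 - Math.pow(1.0 - failureRate, requests);
    }

    // Smallest request count whose chance of showing a failure reaches `confidence`.
    static int requestsNeeded(double failureRate, double confidence) {
        return (int) Math.ceil(Math.log(1.0 - confidence) / Math.log(1.0 - failureRate));
    }

    public static void main(String[] args) {
        double failureRate = 0.3; // from Math.random() < 0.3 in the article's code
        System.out.printf("P(at least one fallback in 1 request)   = %.3f%n",
                probAtLeastOneFailure(failureRate, 1));
        System.out.printf("P(at least one fallback in 12 requests) = %.3f%n",
                probAtLeastOneFailure(failureRate, 12));
        System.out.printf("Expected fallbacks in 12 requests       = %.1f%n",
                12 * failureRate);
        System.out.println("Requests needed for 95% confidence      = "
                + requestsNeeded(failureRate, 0.95));
    }
}
```

Under that independence assumption, twelve requests are very likely to show both paths, and the four observed fallbacks sit close to the expected 3.6.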
The multi-aggregate endpoint adds two more branches:
private static String multiServiceAggregation() throws Exception {
    long startTime = System.currentTimeMillis();
    try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAllSuccessfulOrThrow())) {
        var blockFuture = scope.fork(() -> fetchBlock());
        var fileFuture = scope.fork(() -> fetchFile());
        var computeFuture = scope.fork(() -> fetchCompute());
        var cacheFuture = scope.fork(() -> slowService("Cache", 150));
        scope.join();
        long duration = System.currentTimeMillis() - startTime;
        return String.format("Multi-service result: Block[%s] | File[%s] | Compute[%s] | Cache[%s] (Total: %dms)",
                blockFuture.get(), fileFuture.get(), computeFuture.get(), cacheFuture.get(), duration);
    }
}
The slowest branch is still the 300ms block. The cache branch sleeps for 150ms, the file branch is local, and the compute branch sums primes up to 10,000.
The measured request returned:
Multi-service result: Block[Block-Service-OK] | File[File-Service-OK-10000-lines] | Compute[Compute-Service-OK-5736396] | Cache[Cache-OK-150ms] (Total: 305ms) (Duration: 306ms, Thread: , Request: #9)
status=200 total=0.307478s
Adding branches did not make this request take the sum of every branch. The duration still tracked the slowest branch in the scope.
The measurements below were generated with OpenJDK 25.0.2 and Maven 3.9.12:
mvn clean compile -DskipTests
mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
export CP="$(cat cp.txt):target/classes"
The build succeeded and compiled 35 source files.
Then I started the service:
java --enable-preview -cp "$CP" app.js.microservices.VirtualThreadMicroservice
The checked-in script test_structured_concurrency.sh returned the key endpoint outputs:
DB call completed (Duration: 306ms, Thread: , Request: #1)
File read completed. Lines: 10000 (Duration: 8ms, Thread: , Request: #2)
StructuredTaskScope Combined: Block-Service-OK | File-Service-OK-10000-lines (Total: 303ms) (Duration: 303ms, Thread: , Request: #3)
First successful result: Cache-2-OK-200ms (Duration: 206ms) (Duration: 208ms, Thread: , Request: #4)
Fallback response: One service failed (java.lang.RuntimeException: File service temporarily unavailable), but we handled it gracefully (Duration: 1ms) (Duration: 1ms, Thread: , Request: #5)
Multi-service result: Block[Block-Service-OK] | File[File-Service-OK-10000-lines] | Compute[Compute-Service-OK-5736396] | Cache[Cache-OK-150ms] (Total: 305ms) (Duration: 305ms, Thread: , Request: #6)
The blank Thread: field is the same naming detail from Part 2: these virtual threads were not created with an explicit name. It does not affect the policy behavior, but it matters if you expect thread names to carry diagnostic context.
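If thread names should carry diagnostic context, one option is a naming thread factory. This is a minimal sketch, assuming Java 21 or later; the "request-" prefix and the class name are hypothetical choices, and the article's service does not do this.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

final class NamedVirtualThreads {

    // Runs one task on a named virtual thread and returns the name the task saw.
    static String nameSeenByTask() throws Exception {
        // Names threads request-0, request-1, ... in creation order.
        ThreadFactory factory = Thread.ofVirtual().name("request-", 0).factory();
        Callable<String> task = () -> Thread.currentThread().getName();
        try (ExecutorService executor = Executors.newThreadPerTaskExecutor(factory)) {
            return executor.submit(task).get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Thread: " + nameSeenByTask());
    }
}
```

An executor built this way could be passed to server.setExecutor(...) in place of Executors.newVirtualThreadPerTaskExecutor(), which creates unnamed virtual threads.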
I also ran focused load checks, one endpoint at a time:
wrk -t4 -c40 -d10s http://localhost:8080/aggregate
wrk -t4 -c40 -d10s http://localhost:8080/first-success
wrk -t4 -c40 -d10s http://localhost:8080/multi-aggregate
The load results were:
| Endpoint | Expected slow branch | Average latency | Requests/sec | Total requests |
|---|---|---|---|---|
| /aggregate | 300ms block | 305.70ms | 128.55 | 1,297 |
| /first-success | 200ms cache hit | 206.51ms | 190.32 | 1,920 |
| /multi-aggregate | 300ms block | 304.89ms | 130.91 | 1,320 |
The same results as raw output:
wrk -t4 -c40 -d10s http://localhost:8080/aggregate
Latency 305.70ms 2.82ms 317.63ms 74.25%
Requests/sec: 128.55
1297 requests in 10.09s
wrk -t4 -c40 -d10s http://localhost:8080/first-success
Latency 206.51ms 2.47ms 223.12ms 81.09%
Requests/sec: 190.32
1920 requests in 10.09s
wrk -t4 -c40 -d10s http://localhost:8080/multi-aggregate
Latency 304.89ms 2.32ms 314.85ms 70.91%
Requests/sec: 130.91
1320 requests in 10.08s
The math is useful. With 40 clients and a 300ms slow branch, the rough ceiling is 40 / 0.3, or about 133 requests per second. /aggregate measured 128.55 requests per second, and /multi-aggregate measured 130.91. With a 200ms first-success branch, the rough ceiling is 40 / 0.2, or about 200 requests per second. /first-success measured 190.32.
That alignment matters more than the absolute numbers. The benchmark is credible because the result follows the code.
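The ceiling arithmetic can be written down as a small check. This sketch assumes a closed-loop load generator where each of the 40 connections keeps one request in flight at a time, which is how the 40 / 0.3 estimate above is framed; the class and method names are hypothetical.

```java
final class ThroughputCeiling {

    // Closed-loop ceiling: connections divided by per-request latency in seconds.
    static double ceilingPerSecond(int connections, double latencySeconds) {
        return connections / latencySeconds;
    }

    // Fraction of the ceiling that a measured rate reached.
    static double fractionOfCeiling(double measuredPerSecond, double ceilingPerSecond) {
        return measuredPerSecond / ceilingPerSecond;
    }

    private static void report(String endpoint, double branchSeconds, double measured) {
        double ceiling = ceilingPerSecond(40, branchSeconds);
        System.out.printf("%-17s ceiling=%.1f/s measured=%.2f/s (%.1f%% of ceiling)%n",
                endpoint, ceiling, measured, 100.0 * fractionOfCeiling(measured, ceiling));
    }

    public static void main(String[] args) {
        // Branch delays and measured rates are the figures quoted in the article.
        report("/aggregate", 0.3, 128.55);
        report("/first-success", 0.2, 190.32);
        report("/multi-aggregate", 0.3, 130.91);
    }
}
```

The gap between each measured rate and its ceiling is consistent with the few milliseconds of overhead on top of the slow branch in the latency column.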
The final metrics snapshot returned:
Virtual Thread Microservice Metrics:
=====================================
Active Requests: 0
Total Requests: 4669
Average Response Time: 261.42ms
CPU Usage: 12.57%
Memory Usage: 242.77MB / 776.00MB
JVM Uptime: 74 seconds
Thread Type: Virtual Threads
As in Part 2, the average response time is one aggregate number across every route touched in this run. It is useful as a broad smoke signal, not as a replacement for per-endpoint latency.
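To make the limitation concrete, here is a hypothetical per-route recorder. It is a sketch, not the service's generateMetrics() implementation: it keeps a count and a total per route, so a per-route average and the blended average can be compared side by side.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class RouteLatencies {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalMillis = new ConcurrentHashMap<>();

    // Safe to call from many request threads. The two maps are updated
    // separately, so a concurrent reader may see a slightly stale pair.
    void record(String route, long millis) {
        counts.computeIfAbsent(route, r -> new LongAdder()).increment();
        totalMillis.computeIfAbsent(route, r -> new LongAdder()).add(millis);
    }

    // Average latency for one route, or NaN if the route was never recorded.
    double averageMillis(String route) {
        LongAdder count = counts.get(route);
        LongAdder total = totalMillis.get(route);
        if (count == null || total == null || count.sum() == 0) {
            return Double.NaN;
        }
        return (double) total.sum() / count.sum();
    }

    // Blended average across every route, like a single service-wide number.
    double overallAverageMillis() {
        long requests = counts.values().stream().mapToLong(LongAdder::sum).sum();
        long millis = totalMillis.values().stream().mapToLong(LongAdder::sum).sum();
        return requests == 0 ? Double.NaN : (double) millis / requests;
    }

    public static void main(String[] args) {
        RouteLatencies metrics = new RouteLatencies();
        metrics.record("/aggregate", 300);
        metrics.record("/aggregate", 300);
        metrics.record("/aggregate", 300);
        metrics.record("/first-success", 200);
        System.out.println("/aggregate     avg: " + metrics.averageMillis("/aggregate"));
        System.out.println("/first-success avg: " + metrics.averageMillis("/first-success"));
        System.out.println("blended        avg: " + metrics.overallAverageMillis());
    }
}
```

The blended figure depends on the traffic mix as much as on any route's latency, which is why it works as a smoke signal only.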
This pass does not prove that structured concurrency is universally faster than CompletableFuture, and it does not prove that virtual threads make microservices ready for real deployment by themselves. The service uses simulated delays, a local generated file, and small in-process branches.
The results do show something narrower and more useful: when the code says "fork these sibling tasks and join the request scope," the measured duration tracks the slowest required branch or the first successful branch, depending on the policy.
That is the teaching point. The value is not a headline speedup. The value is being able to read the policy locally in the code and then verify that the endpoint behaves the same way under a small load test.
Microservice fan-out needs tests that check policy, not just HTTP status. An aggregate endpoint should prove that all required branches contributed to the response and that the duration follows the slowest branch rather than the sum of every branch. A first-success endpoint should show which branch won and whether the response returns near that branch's latency. A fallback endpoint needs a sequence, because one request may only show the happy path. The sequence should make clear whether fallback waits for unrelated sibling work or returns as soon as the failed branch determines the outcome.
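A policy check for the aggregate case might look like the sketch below. It is an assumption-heavy illustration: it uses a plain virtual-thread executor with invokeAll rather than the article's StructuredTaskScope code, the branch method and its 300ms and 150ms delays are hypothetical stand-ins, and the timing bounds are loose enough for a quiet local machine only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FanOutPolicyCheck {

    record Timed(List<String> results, long elapsedMillis) {}

    // Hypothetical branch: sleeps, then reports its name.
    static String branch(String name, long delayMs) throws InterruptedException {
        Thread.sleep(delayMs);
        return name + "-OK";
    }

    // Runs every branch concurrently and waits for all of them.
    static Timed runAll(List<Callable<String>> branches) throws Exception {
        long start = System.nanoTime();
        List<String> results = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<String> future : executor.invokeAll(branches)) {
                results.add(future.get());
            }
        }
        return new Timed(results, (System.nanoTime() - start) / 1_000_000);
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> branches = List.of(
                () -> branch("Block", 300),
                () -> branch("Cache", 150));
        Timed timed = runAll(branches);
        // Policy 1: every required branch contributed to the response.
        check(timed.results().equals(List.of("Block-OK", "Cache-OK")), "missing branch result");
        // Policy 2: duration follows the slowest branch, not the 450ms sum.
        check(timed.elapsedMillis() >= 290, "finished before the slowest branch");
        check(timed.elapsedMillis() < 450, "duration looks sequential");
        System.out.println("policy holds, elapsed=" + timed.elapsedMillis() + "ms");
    }
}
```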
Load tests should keep one endpoint under test at a time. Mixing /aggregate, /first-success, and /multi-aggregate into one run would hide the relationship between branch delay and request latency. Capture /metrics after the load, but use per-endpoint wrk output for latency and throughput.
Part 3 showed how virtual threads and structured concurrency make request fan-out easier to read and test. The strongest evidence was not a big throughput claim. It was the alignment between branch delays and measured request times: 300ms aggregate scopes stayed near 300ms, the 200ms first-success branch returned near 200ms, and fallback returned immediately when the file branch failed before the blocking branch completed.
Part 4 moves deeper into structured concurrency itself: what the API is trying to fix, where preview syntax matters, and how scope lifetime changes the way Java code owns concurrent work.