engnotes.dev
engnotes.dev © 2026 Jagdish Salgotra · written on personal time, not on employer time.
Project Loom · Part 3 of 9

Real-World Microservices

A microservice that swaps its executor for virtual threads loses the thread-pool ceiling but gains a different operational surface: orchestrating fan-out cleanly, watching pinning instead of pool depth, and noticing that downstream capacity is now the limit.

Jagdish Salgotra
2025-07-20 · 28 min read · ~1,300 words

Series navigation: ← Previous: Part 2, Building Web Services with Virtual Threads · Next: Part 4, Structured Concurrency in Practice →
Code repository: project-loom


Note: This series uses Java 21 as the baseline. Virtual threads are stable in Java 21 (JEP 444). StructuredTaskScope is still a preview API: JEP 453 is the Java 21 version, and the snippets in this part quote the revised Java 25 API (StructuredTaskScope.open(...) with a Joiner, JEP 505). Both require --enable-preview.

A microservice request is rarely one piece of work. It often fans out to a database, a cache, a file, a profile service, a notification service, or a slower dependency owned by another team.

Virtual threads help because those branches often spend most of their time waiting. Structured concurrency helps because the branches belong to one request, and the lifetime of that request should own the lifetime of the work it starts.

This article uses the checked-in service on port 8080 to look at four request patterns: aggregate, first success, fallback, and multi-service aggregation. The point is not that these toy endpoints are production systems. The point is that the code, timings, and failure behavior are concrete enough to reason about.

This article is learning material. The main branch now builds with OpenJDK 25.0.2 and uses the Java 25 preview structured-concurrency API; the Java 21 version is maintained separately on the feature/java-21 branch. The snippets below quote the main-branch code that generated the measurements. Part 9 covers the migration of the structured-concurrency code from Java 21 to Java 25.

The service in this article

The backing file is VirtualThreadMicroservice.java. It starts an HttpServer on port 8080 and runs request handlers on virtual threads:

java
HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
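A minimal, self-contained sketch of the same wiring, assuming nothing beyond the JDK's built-in com.sun.net.httpserver package. The class name, the /whoami route, and the ephemeral port are hypothetical choices for illustration, not part of the article's service; the route only reports whether its handler ran on a virtual thread.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

final class VirtualThreadServerSketch {
    // Binds to an ephemeral port (0) so the sketch never collides with 8080.
    static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        // Same wiring as the article: one new virtual thread per exchange.
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        // Hypothetical route that reports whether the handler ran on a virtual thread.
        server.createContext("/whoami", exchange -> {
            byte[] body = ("virtual=" + Thread.currentThread().isVirtual())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Starts the server, calls /whoami once, and stops the server again.
    static String fetchWhoami() {
        try {
            HttpServer server = start();
            try {
                int port = server.getAddress().getPort();
                HttpClient client = HttpClient.newHttpClient();
                HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://127.0.0.1:" + port + "/whoami")).build();
                return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            } finally {
                server.stop(0);
            }
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchWhoami()); // virtual=true
    }
}
```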

For Part 3, the relevant routes are:

| Endpoint | Method | Pattern |
|---|---|---|
| /aggregate | aggregateWithStructuredConcurrency() | wait for the block and file branches |
| /first-success | firstSuccessWithStructuredConcurrency() | return the first successful branch |
| /aggregate-with-fallback | aggregateWithFallback() | return fallback text when a branch fails |
| /multi-aggregate | multiServiceAggregation() | combine block, file, compute, and cache branches |
| /metrics | generateMetrics() | expose aggregate service counters |

The service also exposes /aggregate-old, but I did not use it for this article. This pass does not compare CompletableFuture with structured concurrency; it checks that the microservice patterns taught here behave the way the code says they should.

Aggregate waits for the slowest branch

The aggregate endpoint forks two branches: a 300ms simulated blocking service and a local file read.

java
private static String aggregateWithStructuredConcurrency() throws Exception {
    long startTime = System.currentTimeMillis();

    try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAllSuccessfulOrThrow())) {
        var blockFuture = scope.fork(() -> fetchBlock());
        var fileFuture = scope.fork(() -> fetchFile());

        scope.join();

        long duration = System.currentTimeMillis() - startTime;
        return String.format("StructuredTaskScope Combined: %s | %s (Total: %dms)",
            blockFuture.get(), fileFuture.get(), duration);
    }
}

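The important detail is not the syntax; it is ownership. The scope owns both subtasks, and the parent joins before reading results. Under awaitAllSuccessfulOrThrow(), a failed branch cancels the scope, which interrupts the sibling, and join() throws, so the request does not return while sibling work is still running in the background.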

The measured single request returned in about the time of the slowest branch:

text
StructuredTaskScope Combined: Block-Service-OK | File-Service-OK-10000-lines (Total: 305ms) (Duration: 306ms, Thread: , Request: #7)
status=200 total=0.307285s

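The file branch is fast in this local run (the script run below measured the file read alone at 8ms), and the block branch sleeps for 300ms. The whole request completed in just over 300ms: the time of the slowest branch, not the sum of both.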
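The timing property on its own does not depend on StructuredTaskScope. A minimal sketch with only stable Java 21 APIs, assuming hypothetical 300ms and 100ms sleep branches: two tasks on a virtual-thread-per-task executor finish in roughly the time of the slower one. What it does not give you is the scope's ownership policy: if one future failed, nothing here would cancel the other, and closing the executor would simply wait for it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SlowestBranchSketch {
    // Hypothetical stand-in for a downstream call: sleep, then return a label.
    static String simulated(String name, long millis) throws InterruptedException {
        Thread.sleep(millis);
        return name + "-OK";
    }

    // Runs a 300ms and a 100ms branch concurrently, waits for both, returns elapsed ms.
    static long aggregateMillis() {
        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> block = executor.submit(() -> simulated("Block", 300));
            Future<String> file = executor.submit(() -> simulated("File", 100));
            // Both branches are already running, so waiting on both costs about max(300, 100).
            System.out.println(block.get() + " | " + file.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Expect roughly 300ms (the slowest branch), not 400ms (the sum).
        System.out.println("Total: " + aggregateMillis() + "ms");
    }
}
```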

First success returns one answer

The first-success endpoint starts three candidate branches:

java
private static String firstSuccessWithStructuredConcurrency() throws Exception {
    long startTime = System.currentTimeMillis();

    try (var scope = StructuredTaskScope.open(
            StructuredTaskScope.Joiner.<String>allUntil(s -> s.state() == Subtask.State.SUCCESS)
    )) {
        scope.fork(() -> slowService("Cache-1", 500));
        scope.fork(() -> slowService("Cache-2", 200));
        scope.fork(() -> slowService("Database", 800));

        Stream<Subtask<String>> results = scope.join();

        long duration = System.currentTimeMillis() - startTime;
        String firstResult = results
            .filter(s -> s.state() == Subtask.State.SUCCESS)
            .findFirst()
            .map(Subtask::get)
            .orElseThrow(() -> new Exception("No successful result"));

        return String.format("First successful result: %s (Duration: %dms)",
            firstResult, duration);
    }
}

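The service names make the result easy to audit. Cache-2 sleeps for 200ms, Cache-1 for 500ms, and Database for 800ms. The allUntil(...) joiner cancels the scope as soon as its predicate returns true for a completed subtask, which interrupts the two slower branches, so the endpoint should return in about 200ms if the first-success policy is doing what the code says.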

The measured request returned:

text
First successful result: Cache-2-OK-200ms (Duration: 205ms) (Duration: 206ms, Thread: , Request: #8)
status=200 total=0.207463s

That result is the contract. A first-success endpoint should not wait for every branch when one successful answer is enough.
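The JDK 25 preview API also offers Joiner.anySuccessfulResultOrThrow() for exactly this policy. For comparison, a sketch of the same first-success behavior with stable Java 21 APIs, assuming a hypothetical slowService helper shaped like the article's: ExecutorService.invokeAny returns the result of a task that completed successfully and cancels the tasks that are still running.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FirstSuccessSketch {
    // Hypothetical stand-in mirroring the article's slowService(name, millis).
    static Callable<String> slowService(String name, long millis) {
        return () -> {
            Thread.sleep(millis);
            return name + "-OK-" + millis + "ms";
        };
    }

    static String firstSuccess() {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // invokeAny returns one successful result and cancels unfinished tasks,
            // so closing the executor does not wait for the 500ms and 800ms branches.
            return executor.invokeAny(List.of(
                    slowService("Cache-1", 500),
                    slowService("Cache-2", 200),
                    slowService("Database", 800)));
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        String winner = firstSuccess();
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(winner + " in " + ms + "ms");
    }
}
```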

Fallback is a sequence, not one request

The fallback endpoint is deliberately nondeterministic. Its file branch fails about 30% of the time:

java
private static String fetchFileWithPossibleError() throws Exception {
    if (Math.random() < 0.3) {
        throw new RuntimeException("File service temporarily unavailable");
    }
    return fetchFile();
}

That means a single request is not enough evidence. You need a sequence that shows both paths.

I ran twelve requests:

text
fallback-01 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 302ms) status=200 total=0.304022s
fallback-02 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 302ms) status=200 total=0.304377s
fallback-03 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 306ms) status=200 total=0.307403s
fallback-04 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 305ms) status=200 total=0.306844s
fallback-05 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 302ms) status=200 total=0.304278s
fallback-06 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 306ms) status=200 total=0.308128s
fallback-07 Fallback response: One service failed (java.lang.RuntimeException: File service temporarily unavailable), but we handled it gracefully (Duration: 0ms) status=200 total=0.001792s
fallback-08 Fallback response: One service failed (java.lang.RuntimeException: File service temporarily unavailable), but we handled it gracefully (Duration: 0ms) status=200 total=0.001394s
fallback-09 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 303ms) status=200 total=0.305008s
fallback-10 Fallback response: One service failed (java.lang.RuntimeException: File service temporarily unavailable), but we handled it gracefully (Duration: 0ms) status=200 total=0.001094s
fallback-11 Aggregate with fallback: Block-Service-OK | File-Service-OK-10000-lines (Duration: 306ms) status=200 total=0.306550s
fallback-12 Fallback response: One service failed (java.lang.RuntimeException: File service temporarily unavailable), but we handled it gracefully (Duration: 0ms) status=200 total=0.001240s

Eight requests took the normal path and returned in roughly 300ms. Four requests took the fallback path and returned in about 1ms.
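Four failures in twelve requests is one draw from a 30% failure rate, so the next run may split differently. If a test needs to replay a particular sequence, one option is to inject the random source instead of calling Math.random() inside the branch. A minimal sketch, assuming a hypothetical DoubleSupplier parameter and a seeded java.util.Random; this is not how the checked-in service is written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.DoubleSupplier;

final class InjectedFailureSketch {
    // Hypothetical variant of fetchFileWithPossibleError: the random source is a
    // parameter, so a test can replay the exact same success/failure sequence.
    static String fetchFileWithPossibleError(DoubleSupplier random, double failureRate) {
        if (random.getAsDouble() < failureRate) {
            throw new RuntimeException("File service temporarily unavailable");
        }
        return "File-Service-OK";
    }

    // Runs n calls with a seeded generator and records which path each call took.
    static List<String> sequence(long seed, int n) {
        Random rng = new Random(seed);
        List<String> outcomes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            try {
                outcomes.add(fetchFileWithPossibleError(rng::nextDouble, 0.3));
            } catch (RuntimeException e) {
                outcomes.add("FALLBACK");
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // The same seed always produces the same mix of normal and fallback paths.
        System.out.println(sequence(42L, 12));
    }
}
```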

The fast fallback has a simple cause: fetchFileWithPossibleError() throws before it reads anything, long before the 300ms blocking branch completes. With a joiner that fails on the first error, the scope cancels the sibling branch instead of waiting for it, and the handler returns the fallback response. A benchmark table would hide that detail.
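The same fail-fast-then-fallback policy can be sketched with stable APIs to make the ordering explicit. This is not the article's aggregateWithFallback(), whose body is not shown here; it is a hypothetical stand-in that uses ExecutorCompletionService, which hands back branches in completion order. The first failure is therefore seen before the 300ms branch finishes, and the remaining branch is cancelled instead of awaited.

```java
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FailFastFallbackSketch {
    // Hypothetical stand-in: `fileFails` replaces the article's random failure.
    static String aggregateWithFallback(boolean fileFails) {
        ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
        CompletionService<String> completions = new ExecutorCompletionService<>(executor);
        try {
            Future<String> block = completions.submit(() -> {
                Thread.sleep(300); // slow blocking branch
                return "Block-Service-OK";
            });
            Future<String> file = completions.submit(() -> {
                if (fileFails) { // fails before doing any work, like the article's file branch
                    throw new RuntimeException("File service temporarily unavailable");
                }
                return "File-Service-OK";
            });
            List<Future<String>> branches = List.of(block, file);
            for (int i = 0; i < branches.size(); i++) {
                Future<String> done = completions.take(); // next branch to finish
                if (done.state() == Future.State.FAILED) {
                    // The first failure decides the outcome: cancel siblings, do not wait.
                    branches.forEach(f -> f.cancel(true));
                    return "Fallback response: " + done.exceptionNow().getMessage();
                }
            }
            return block.resultNow() + " | " + file.resultNow();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            executor.shutdownNow(); // interrupts anything still running; does not wait
        }
    }

    public static void main(String[] args) {
        System.out.println(aggregateWithFallback(true));
        System.out.println(aggregateWithFallback(false));
    }
}
```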

Multi-service aggregation keeps the same shape

The multi-aggregate endpoint adds two more branches:

java
private static String multiServiceAggregation() throws Exception {
    long startTime = System.currentTimeMillis();

    try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAllSuccessfulOrThrow())) {
        var blockFuture = scope.fork(() -> fetchBlock());
        var fileFuture = scope.fork(() -> fetchFile());
        var computeFuture = scope.fork(() -> fetchCompute());
        var cacheFuture = scope.fork(() -> slowService("Cache", 150));

        scope.join();

        long duration = System.currentTimeMillis() - startTime;
        return String.format("Multi-service result: Block[%s] | File[%s] | Compute[%s] | Cache[%s] (Total: %dms)",
            blockFuture.get(), fileFuture.get(), computeFuture.get(), cacheFuture.get(), duration);
    }
}

The slowest branch is still the 300ms block. The cache branch sleeps for 150ms, the file branch is local, and the compute branch sums primes up to 10,000.

The measured request returned:

text
Multi-service result: Block[Block-Service-OK] | File[File-Service-OK-10000-lines] | Compute[Compute-Service-OK-5736396] | Cache[Cache-OK-150ms] (Total: 305ms) (Duration: 306ms, Thread: , Request: #9)
status=200 total=0.307478s

Adding branches did not make this request take the sum of every branch. The duration still tracked the slowest branch in the scope.
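The compute branch's value in that response, 5736396, is the sum of the primes up to 10,000. A hypothetical stand-in for fetchCompute() using a sieve of Eratosthenes reproduces it; the real method may compute it differently.

```java
final class ComputeBranchSketch {
    // Hypothetical stand-in for fetchCompute(): sums every prime <= limit
    // with a sieve of Eratosthenes.
    static long sumPrimesUpTo(int limit) {
        boolean[] composite = new boolean[Math.max(limit + 1, 2)];
        long sum = 0;
        for (int i = 2; i <= limit; i++) {
            if (composite[i]) {
                continue;
            }
            sum += i; // i is prime
            // Mark multiples of i, starting at i*i (smaller ones were marked earlier).
            for (long j = (long) i * i; j <= limit; j += i) {
                composite[(int) j] = true;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("Compute-Service-OK-" + sumPrimesUpTo(10_000)); // Compute-Service-OK-5736396
    }
}
```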

What I ran

The measurements below were generated with OpenJDK 25.0.2 and Maven 3.9.12:

bash
mvn clean compile -DskipTests
mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
export CP="$(cat cp.txt):target/classes"

The build succeeded and compiled 35 source files.

Then I started the service:

bash
java --enable-preview -cp "$CP" app.js.microservices.VirtualThreadMicroservice

The checked-in script test_structured_concurrency.sh returned the key endpoint outputs:

text
DB call completed (Duration: 306ms, Thread: , Request: #1)
File read completed. Lines: 10000 (Duration: 8ms, Thread: , Request: #2)
StructuredTaskScope Combined: Block-Service-OK | File-Service-OK-10000-lines (Total: 303ms) (Duration: 303ms, Thread: , Request: #3)
First successful result: Cache-2-OK-200ms (Duration: 206ms) (Duration: 208ms, Thread: , Request: #4)
Fallback response: One service failed (java.lang.RuntimeException: File service temporarily unavailable), but we handled it gracefully (Duration: 1ms) (Duration: 1ms, Thread: , Request: #5)
Multi-service result: Block[Block-Service-OK] | File[File-Service-OK-10000-lines] | Compute[Compute-Service-OK-5736396] | Cache[Cache-OK-150ms] (Total: 305ms) (Duration: 305ms, Thread: , Request: #6)

The blank Thread: field is the same naming detail from Part 2: these virtual threads were not created with an explicit name. It does not affect the policy behavior, but it matters if you expect thread names to carry diagnostic context.
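If thread names should carry diagnostic context, virtual threads can be named through a Thread.Builder. A sketch with stable Java 21 APIs, where the request- prefix is a hypothetical choice: Executors.newThreadPerTaskExecutor takes the builder's factory, so each task still gets its own virtual thread, now with a counter-based name.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

final class NamedVirtualThreadsSketch {
    // One virtual thread per task, named "request-0", "request-1", ...
    // The "request-" prefix is a hypothetical choice.
    static ExecutorService namedVirtualExecutor() {
        ThreadFactory factory = Thread.ofVirtual().name("request-", 0).factory();
        return Executors.newThreadPerTaskExecutor(factory);
    }

    // Runs one task and reports the name and kind of the thread it ran on.
    static String describeWorker() {
        try (ExecutorService executor = namedVirtualExecutor()) {
            return executor.submit(() -> Thread.currentThread().getName()
                    + " virtual=" + Thread.currentThread().isVirtual()).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeWorker()); // request-0 virtual=true
    }
}
```

In the service, passing namedVirtualExecutor() to server.setExecutor(...) in place of Executors.newVirtualThreadPerTaskExecutor() would presumably fill the blank Thread: field, assuming that field prints Thread.currentThread().getName().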

I also ran focused load checks, one endpoint at a time:

bash
wrk -t4 -c40 -d10s http://localhost:8080/aggregate
wrk -t4 -c40 -d10s http://localhost:8080/first-success
wrk -t4 -c40 -d10s http://localhost:8080/multi-aggregate

The load results were:

| Endpoint | Branch that sets latency | Average latency | Requests/sec | Total requests |
|---|---|---|---|---|
| /aggregate | 300ms block | 305.70ms | 128.55 | 1,297 |
| /first-success | 200ms cache hit | 206.51ms | 190.32 | 1,920 |
| /multi-aggregate | 300ms block | 304.89ms | 130.91 | 1,320 |

The same results as raw output:

text
wrk -t4 -c40 -d10s http://localhost:8080/aggregate
Latency   305.70ms    2.82ms 317.63ms   74.25%
Requests/sec:    128.55
1297 requests in 10.09s

wrk -t4 -c40 -d10s http://localhost:8080/first-success
Latency   206.51ms    2.47ms 223.12ms   81.09%
Requests/sec:    190.32
1920 requests in 10.09s

wrk -t4 -c40 -d10s http://localhost:8080/multi-aggregate
Latency   304.89ms    2.32ms 314.85ms   70.91%
Requests/sec:    130.91
1320 requests in 10.08s

The numbers line up with a simple bound. wrk keeps each of its 40 connections (-c40) busy with one request at a time, so by Little's law throughput can be at most connections divided by latency. With a 300ms slow branch, that ceiling is 40 / 0.3, or about 133 requests per second; /aggregate measured 128.55 requests per second, and /multi-aggregate measured 130.91. With a 200ms first-success branch, the ceiling is 40 / 0.2, or 200 requests per second; /first-success measured 190.32.

That alignment matters more than the absolute numbers. The benchmark is credible because the result follows the code.
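The same bound as a small calculation. It assumes, as wrk does by default, that each connection has exactly one request in flight; the class and method names are hypothetical. Plugging in the measured average latency instead of the design latency narrows the gap further: 40 / 0.30570 is about 130.8 requests per second against 128.55 observed, 40 / 0.20651 is about 193.7 against 190.32, and 40 / 0.30489 is about 131.2 against 130.91.

```java
final class ThroughputCeilingSketch {
    // Little's law for a closed-loop client: with `connections` requests always
    // in flight, throughput is at most connections / latency.
    static double ceilingRps(int connections, double latencySeconds) {
        return connections / latencySeconds;
    }

    public static void main(String[] args) {
        int connections = 40; // wrk -c40
        String[] endpoints = {"/aggregate", "/first-success", "/multi-aggregate"};
        double[] designSeconds = {0.300, 0.200, 0.300};         // delay of the deciding branch
        double[] measuredSeconds = {0.30570, 0.20651, 0.30489}; // wrk average latency
        double[] observedRps = {128.55, 190.32, 130.91};        // wrk Requests/sec
        for (int i = 0; i < endpoints.length; i++) {
            System.out.printf("%-17s design %6.1f  measured-latency %6.1f  observed %6.2f%n",
                    endpoints[i],
                    ceilingRps(connections, designSeconds[i]),
                    ceilingRps(connections, measuredSeconds[i]),
                    observedRps[i]);
        }
    }
}
```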

The final metrics snapshot returned:

text
Virtual Thread Microservice Metrics:
=====================================
Active Requests: 0
Total Requests: 4669
Average Response Time: 261.42ms
CPU Usage: 12.57%
Memory Usage: 242.77MB / 776.00MB
JVM Uptime: 74 seconds
Thread Type: Virtual Threads

As in Part 2, the average response time is one aggregate number across every route touched in this run. It is useful as a broad smoke signal, not as a replacement for per-endpoint latency.
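A quick way to see why: blend the three wrk runs by request count. Assuming the counter is a plain request-weighted mean (the metrics code is not shown here), 1,297 requests at 305.70ms, 1,920 at 206.51ms, and 1,320 at 304.89ms average to about 263.5ms, close to the reported 261.42ms, even though no endpoint actually ran near that latency. The class and method names are hypothetical.

```java
final class BlendedAverageSketch {
    // Request-weighted mean of per-endpoint averages: what one "Average Response
    // Time" counter would report if every request counts equally.
    static double weightedMeanMs(long[] counts, double[] avgMs) {
        double totalMs = 0;
        long requests = 0;
        for (int i = 0; i < counts.length; i++) {
            totalMs += counts[i] * avgMs[i];
            requests += counts[i];
        }
        return totalMs / requests;
    }

    public static void main(String[] args) {
        long[] counts = {1297, 1920, 1320};        // wrk totals: aggregate, first-success, multi-aggregate
        double[] avgMs = {305.70, 206.51, 304.89}; // wrk average latency per endpoint
        System.out.printf("blended average: %.2fms%n", weightedMeanMs(counts, avgMs));
    }
}
```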

What this does not prove

This pass does not prove that structured concurrency is universally faster than CompletableFuture, and it does not prove that virtual threads make microservices ready for real deployment by themselves. The service uses simulated delays, a local generated file, and small in-process branches.

The results do show something narrower and more useful: when the code says "fork these sibling tasks and join the request scope," the measured duration tracks the slowest required branch or the first successful branch, depending on the policy.

That is the teaching point. The value is not a headline speedup. The value is being able to read the policy locally in the code and then verify that the endpoint behaves the same way under a small load test.

How to test these patterns

Microservice fan-out needs tests that check policy, not just HTTP status. An aggregate endpoint should prove that all required branches contributed to the response and that the duration follows the slowest branch rather than the sum of every branch. A first-success endpoint should show which branch won and whether the response returns near that branch's latency. A fallback endpoint needs a sequence, because one request may only show the happy path. The sequence should make clear whether fallback waits for unrelated sibling work or returns as soon as the failed branch determines the outcome.
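One way to check policy rather than status is to read the timing each endpoint already embeds in its body, such as (Total: 305ms) or the first (Duration: 205ms), and compare it with the branch delays. A sketch, assuming the response formats shown in this article stay stable; the class, method names, and tolerances are hypothetical.

```java
import java.util.Optional;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PolicyCheckSketch {
    // Matches the in-scope timing the endpoints print, e.g. "(Total: 305ms)" or
    // "(Duration: 205ms)". The server's own suffix "(Duration: 306ms, Thread: ..."
    // does not match because a comma, not ")", follows "ms".
    private static final Pattern SCOPE_TIMING =
            Pattern.compile("\\((?:Total|Duration): (\\d+)ms\\)");
    private static final Pattern WINNER =
            Pattern.compile("First successful result: (\\S+)");

    static OptionalLong scopeMillis(String body) {
        Matcher m = SCOPE_TIMING.matcher(body);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    static Optional<String> winner(String body) {
        Matcher m = WINNER.matcher(body);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True when the scope time is within tolerance of the branch that should
    // decide the request, and below the sum of every branch's delay.
    static boolean tracksBranch(String body, long expectedMs, long sumOfBranchesMs, long toleranceMs) {
        OptionalLong ms = scopeMillis(body);
        return ms.isPresent()
                && Math.abs(ms.getAsLong() - expectedMs) <= toleranceMs
                && ms.getAsLong() < sumOfBranchesMs;
    }

    public static void main(String[] args) {
        String first = "First successful result: Cache-2-OK-200ms (Duration: 205ms) "
                + "(Duration: 206ms, Thread: , Request: #8)";
        System.out.println(winner(first).orElse("none"));       // Cache-2-OK-200ms
        System.out.println(tracksBranch(first, 200, 1500, 20)); // true
    }
}
```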

Load tests should keep one endpoint under test at a time. Mixing /aggregate, /first-success, and /multi-aggregate into one run would hide the relationship between branch delay and request latency. Capture /metrics after the load, but use per-endpoint wrk output for latency and throughput.

What comes next

Part 3 showed how virtual threads and structured concurrency make request fan-out easier to read and test. The strongest evidence was not a big throughput claim. It was the alignment between branch delays and measured request times: 300ms aggregate scopes stayed near 300ms, the 200ms first-success branch returned near 200ms, and fallback returned immediately when the file branch failed before the blocking branch completed.

Part 4 moves deeper into structured concurrency itself: what the API is trying to fix, where preview syntax matters, and how scope lifetime changes the way Java code owns concurrent work.


Resources

  • Load Testing: Run wrk -t8 -c1000 -d30s http://localhost:8080/aggregate
  • Try It Yourself: Clone the repo and run the microservice locally
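  • Official Documentation: JEP 453: Structured Concurrency (Preview), the Java 21 API, and JEP 505: Structured Concurrency (Fifth Preview), the Java 25 API quoted in this part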