Project Loom · Part 2 of 9

Building Web Services with Virtual Threads

The reason web servers moved to reactive code was that platform threads cost too much to dedicate one per blocking request, and virtual threads remove that cost without asking the request handler to look any different.

Jagdish Salgotra
2025-07-13 · 15 min read · ~1,500 words

Code repository: project-loom

Note: This series uses Java 21 as the baseline. Virtual threads are final in Java 21 (JEP 444). The structured concurrency snippet in this part (StructuredTaskScope) uses a preview API and requires --enable-preview: JEP 453 in Java 21, JEP 505 in Java 25. The snippet is shown in its Java 25 form.

Virtual threads become interesting when they are attached to something ordinary: an HTTP server, a request handler, and a few routes that do different kinds of work.

Part 1 used a tiny /api endpoint to isolate one idea. A fixed platform-thread pool queued under a 40-connection blocking load, while a virtual-thread-per-task executor stayed close to the 200ms simulated dependency. That was useful because the code was small enough to reason about.

Part 2 keeps the same discipline but moves one step closer to a service. The checked-in service has separate routes for CPU work, simulated blocking work, file reading, basic metrics, and one small fan-out example. The goal is not to claim that virtual threads make every route faster. The goal is to see what the service actually does under load and to connect each result back to the code.

This article is learning material. The main branch now builds with OpenJDK 25.0.2 and uses the Java 25 preview structured-concurrency API, with the Java 21 version separately managed in the feature/java-21 branch. Virtual threads themselves are final in Java 21. The commands below use --enable-preview because they run inside the current companion repository, where the whole Maven project is configured for Java 25 preview features. Part 9 covers the Java 21 to Java 25 migration.

The service shape

The main file for this article is VirtualThreadMicroservice.java. It starts an HttpServer on port 8080 and sets the executor to a virtual-thread-per-task executor:

java
HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());

The setExecutor call changes the request execution model. Each HTTP exchange runs as ordinary blocking code on its own virtual thread. The route bodies still look like normal Java.
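To see that executor choice from inside a handler, here is a minimal sketch that is separate from the companion repository. The class name VirtualExecutorProbe, the /whoami route, and the helper methods are hypothetical names I made up for illustration. It only uses the JDK's built-in HttpServer and HttpClient, and it reports whether the thread serving the exchange is virtual.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

// Hypothetical probe, not part of the companion repository.
class VirtualExecutorProbe {

    // Starts a server on the given port (0 lets the OS pick a free one).
    // Its only route reports whether the handling thread is virtual.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        server.createContext("/whoami", exchange -> {
            byte[] body = ("virtual=" + Thread.currentThread().isVirtual())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Blocking GET against the local server; returns the response body.
    static String get(int port, String path) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:" + port + path))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start(0);
        try {
            System.out.println(get(server.getAddress().getPort(), "/whoami"));
        } finally {
            server.stop(0);
        }
    }
}
```

Removing the setExecutor call from this sketch is a quick way to compare against the server's default executor on your own machine.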

The /compute route does CPU work by summing primes up to 50,000:

java
server.createContext("/compute", exchange -> handleRequest(exchange, "COMPUTE", () -> {
    long result = 0;
    for (int i = 2; i <= 50_000; i++) {
        if (isPrime(i)) {
            result += i;
        }
    }
    return "CPU Task completed. Result: " + result;
}));

The /block route simulates a database or network call with a 300ms sleep:

java
server.createContext("/block", exchange -> handleRequest(exchange, "BLOCK", () -> {
    try {
        Thread.sleep(300);
        return "DB call completed";
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException("Interrupted", e);
    }
}));

The /file route reads a generated 10,000-line local file:

java
server.createContext("/file", exchange -> handleRequest(exchange, "FILE", () -> {
    try {
        List<String> lines = Files.readAllLines(Paths.get(LARGE_FILE));
        return "File read completed. Lines: " + lines.size();
    } catch (IOException e) {
        throw new RuntimeException("File read error", e);
    }
}));

These routes are deliberately simple. They are not trying to model a full business system. They give us three different kinds of work to measure in the same service:

| Endpoint | Work shape | What the result can teach |
|---|---|---|
| /compute | CPU loop | Virtual threads do not create more CPU cores |
| /block | 300ms simulated wait | Virtual threads are a good fit when the task mostly waits |
| /file | local file read | Fast I/O can become CPU, allocation, or filesystem-cache sensitive |

That distinction matters more than a single throughput number.

What I ran

The measurements below were generated with OpenJDK 25.0.2 and Maven 3.9.12:

bash
mvn clean compile -DskipTests
mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
export CP="$(cat cp.txt):target/classes"

The build succeeded and compiled 35 source files.

Then I started the service:

bash
java --enable-preview -cp "$CP" app.js.microservices.VirtualThreadMicroservice

The service created microservice_test_file.txt, started on port 8080, and exposed these routes:

text
GET /compute           - CPU-intensive task
GET /block             - Simulated DB call (300ms)
GET /file              - Large file read
GET /aggregate         - StructuredTaskScope aggregate
GET /aggregate-old     - CompletableFuture aggregate
GET /first-success     - First successful response
GET /aggregate-with-fallback - With error handling
GET /multi-aggregate   - Multiple service aggregation
GET /metrics           - Performance metrics
GET /health            - Health check

For Part 2, the relevant web-service routes are /compute, /block, /file, and /metrics. I also ran /aggregate once to confirm the fan-out route still behaves as expected, but the load measurements below focus on the three basic route shapes.

Single-request checks

The health check returned:

text
Virtual Thread Microservice is running!
status=200 total=0.026820s

The CPU route returned:

text
CPU Task completed. Result: 121013308 (Duration: 2ms, Thread: , Request: #1)
status=200 total=0.004772s

The blocking route returned:

text
DB call completed (Duration: 303ms, Thread: , Request: #2)
status=200 total=0.305189s

The file route returned:

text
File read completed. Lines: 10000 (Duration: 9ms, Thread: , Request: #3)
status=200 total=0.010590s

The first metrics snapshot returned:

text
Virtual Thread Microservice Metrics:
=====================================
Active Requests: 1
Total Requests: 2
Average Response Time: 5.50ms
CPU Usage: 0.00%
Memory Usage: 20.01MB / 776.00MB
JVM Uptime: 11 seconds
Thread Type: Virtual Threads

That metrics snapshot was taken while the smoke checks were running, so Active Requests: 1 is not a problem. It is a useful reminder that a metrics endpoint is also a request and can race with the work you are measuring.

Load-test results

I used the same small load shape for all three routes:

bash
wrk -t4 -c40 -d10s http://localhost:8080/compute
wrk -t4 -c40 -d10s http://localhost:8080/block
wrk -t4 -c40 -d10s http://localhost:8080/file

The CPU route produced:

text
Running 10s test @ http://localhost:8080/compute
  4 threads and 40 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.15ms    1.16ms  32.86ms   93.63%
    Req/Sec     9.98k     1.47k   21.67k    83.58%
  399007 requests in 10.10s, 69.53MB read
Requests/sec:  39504.87
Transfer/sec:      6.88MB

The blocking route produced:

text
Running 10s test @ http://localhost:8080/block
  4 threads and 40 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   304.55ms    1.64ms 307.80ms   64.47%
    Req/Sec    32.73      1.55    33.00     96.97%
  1320 requests in 10.08s, 212.70KB read
Requests/sec:    130.93
Transfer/sec:     21.10KB

The file route produced:

text
Running 10s test @ http://localhost:8080/file
  4 threads and 40 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.32ms    5.04ms 124.32ms   96.39%
    Req/Sec     2.08k   189.96     2.49k    90.75%
  82995 requests in 10.02s, 14.17MB read
Requests/sec:   8283.47
Transfer/sec:      1.41MB

The final metrics snapshot after the load checks returned:

text
Virtual Thread Microservice Metrics:
=====================================
Active Requests: 0
Total Requests: 483423
Average Response Time: 1.39ms
CPU Usage: 53.06%
Memory Usage: 556.80MB / 1360.00MB
JVM Uptime: 58 seconds
Thread Type: Virtual Threads

Here is the same measurement in table form:

| Endpoint | Code path | Work per request | Average latency | Requests/sec | Total requests |
|---|---|---|---|---|---|
| /compute | prime sum to 50,000 | CPU loop | 1.15ms | 39,504.87 | 399,007 |
| /block | Thread.sleep(300) | simulated wait | 304.55ms | 130.93 | 1,320 |
| /file | read 10,000 lines | local file read | 5.32ms | 8,283.47 | 82,995 |

The /block number is the easiest one to trust. With 40 clients and a 300ms sleep, the rough ceiling is 40 / 0.3, or about 133 requests per second. The measured result was 130.93 requests per second, and the average latency was 304.55ms. That lines up with the code.
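The arithmetic behind that ceiling is small enough to write down. This is a sketch of the closed-loop estimate, assuming each wrk connection keeps at most one request in flight. The class and method names are hypothetical. The inputs are the 40 connections, the 300ms sleep, and the 304.55ms measured average latency from the run above.

```java
// Hypothetical helper for the concurrency math, not part of the companion repository.
class BlockingCeiling {

    // Closed-loop bound: each client has at most one request in flight,
    // so throughput cannot exceed clients / seconds-per-request.
    static double requestsPerSecond(int clients, double secondsPerRequest) {
        return clients / secondsPerRequest;
    }

    public static void main(String[] args) {
        double fromSleep = requestsPerSecond(40, 0.300);      // sleep time only
        double fromLatency = requestsPerSecond(40, 0.30455);  // measured average latency
        double measured = 130.93;                             // wrk Requests/sec for /block

        System.out.printf("ceiling from sleep:      %.2f req/s%n", fromSleep);
        System.out.printf("ceiling from latency:    %.2f req/s%n", fromLatency);
        System.out.printf("measured vs sleep bound: %.1f%%%n", 100 * measured / fromSleep);
    }
}
```

The second estimate uses the measured latency instead of the bare sleep, which folds the few milliseconds of per-request overhead into the bound.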

The /compute number should be read differently. It is high because the work is small on this machine, not because virtual threads somehow make CPU work special. If the prime loop were larger, the limit would move toward available CPU. Virtual threads are not a replacement for CPU sizing.

The /file number is also local to this machine. The service creates a 10,000-line test file and reads it repeatedly. Once the file is warm in the filesystem cache, this route is no longer a clean proxy for disk latency. It is still useful because it shows how to wire ordinary blocking file I/O through the same request handler, but it should not be used as a storage benchmark.

The request handler stays boring

The handler in VirtualThreadMicroservice is intentionally conventional:

java
private static void handleRequest(HttpExchange exchange, String endpoint, RequestHandler handler) {
    long requestId = requestCounter.incrementAndGet();
    long startTime = System.currentTimeMillis();
    activeRequests.incrementAndGet();

    try {
        String result = handler.handle();
        long duration = System.currentTimeMillis() - startTime;
        totalRequests.incrementAndGet();
        totalResponseTime.addAndGet(duration);

        // Default virtual threads are unnamed here; use a named factory if logs need names.
        String response = String.format("%s (Duration: %dms, Thread: %s, Request: #%d)",
            result, duration, Thread.currentThread().getName(), requestId);

        sendResponse(exchange, response);
    } catch (Exception e) {
        long duration = System.currentTimeMillis() - startTime;
        totalResponseTime.addAndGet(duration);
        sendErrorResponse(exchange, "Error: " + e.getMessage());
    } finally {
        activeRequests.decrementAndGet();
    }
}

There is no callback chain. The route body returns a string or throws an exception. The handler records timing and active request counts in one place. That is a real benefit of virtual threads: they let many services keep direct-style request handling while avoiding a platform thread per blocked request.

The code is not perfect, and that is worth saying in a teaching article. Thread.currentThread().getName() is empty in the responses above because these virtual threads were not named. If thread names matter to your logs, name the factory explicitly. The metrics endpoint also reports averages across all route types together, so the final Average Response Time: 1.39ms is dominated by the very fast /compute run and does not describe /block latency.
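One way to get thread names into responses and logs is to build the executor from a named virtual-thread factory. This is a sketch, not the repository's code. The class name, the method names, and the "request-" prefix are assumptions for illustration. Thread.ofVirtual().name(prefix, start) and Executors.newThreadPerTaskExecutor are standard Java 21 APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical helper, not part of the companion repository.
class NamedVirtualThreads {

    // One new virtual thread per task, named prefix + counter (counter starts at 0).
    static ExecutorService namedExecutor(String prefix) {
        ThreadFactory factory = Thread.ofVirtual().name(prefix, 0).factory();
        return Executors.newThreadPerTaskExecutor(factory);
    }

    // Runs a task on the executor and returns the thread name the task observed.
    static String nameSeenByTask(ExecutorService executor) throws Exception {
        return executor.submit(() -> Thread.currentThread().getName()).get();
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorService executor = namedExecutor("request-")) {
            System.out.println(nameSeenByTask(executor));
            System.out.println(nameSeenByTask(executor));
        }
    }
}
```

Passing an executor like this to server.setExecutor(...) should give each exchange a distinct name without changing the route bodies.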

Metrics need dimensions. A single average across compute, blocking, and file routes is fine for a toy service, but it is not enough for real diagnosis.
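As a rough illustration of what a dimension could look like, here is a sketch that keeps count and total duration per route. RouteMetrics and its methods are hypothetical and do not exist in the companion repository. A real service would more likely use a metrics library with histograms rather than plain averages.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-route metrics holder, not part of the companion repository.
class RouteMetrics {

    // Request count and summed duration for one route.
    private record Totals(LongAdder count, LongAdder millis) {
        Totals() {
            this(new LongAdder(), new LongAdder());
        }
    }

    private final Map<String, Totals> byRoute = new ConcurrentHashMap<>();

    // Safe to call from many virtual threads at once.
    void record(String route, long durationMillis) {
        Totals totals = byRoute.computeIfAbsent(route, r -> new Totals());
        totals.count().increment();
        totals.millis().add(durationMillis);
    }

    // Average for one route, or 0.0 if the route has no samples yet.
    double averageMillis(String route) {
        Totals totals = byRoute.get(route);
        if (totals == null || totals.count().sum() == 0) {
            return 0.0;
        }
        return (double) totals.millis().sum() / totals.count().sum();
    }

    public static void main(String[] args) {
        RouteMetrics metrics = new RouteMetrics();
        metrics.record("COMPUTE", 1);
        metrics.record("COMPUTE", 1);
        metrics.record("COMPUTE", 1);
        metrics.record("BLOCK", 303);
        System.out.println("COMPUTE avg ms: " + metrics.averageMillis("COMPUTE"));
        System.out.println("BLOCK avg ms:   " + metrics.averageMillis("BLOCK"));
    }
}
```

The handler already receives the endpoint label ("COMPUTE", "BLOCK", "FILE"), so a call like record(endpoint, duration) would fit where the shared counters are updated today.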

Where structured concurrency appears

This service also exposes /aggregate, which fans out to the 300ms blocking call and the file read:

java
private static String aggregateWithStructuredConcurrency() throws Exception {
    long startTime = System.currentTimeMillis();

    try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAllSuccessfulOrThrow())) {
        var blockFuture = scope.fork(() -> fetchBlock());
        var fileFuture = scope.fork(() -> fetchFile());

        scope.join();

        long duration = System.currentTimeMillis() - startTime;
        return String.format("StructuredTaskScope Combined: %s | %s (Total: %dms)",
            blockFuture.get(), fileFuture.get(), duration);
    }
}

The smoke check returned:

text
StructuredTaskScope Combined: Block-Service-OK | File-Service-OK-10000-lines (Total: 307ms) (Duration: 307ms, Thread: , Request: #483424)
status=200 total=0.308223s

The useful observation is the same one that will return in later articles: the aggregate duration follows the slowest branch. The file read was much faster than the 300ms simulated dependency, so the whole fan-out completed in about 307ms.
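The same shape can be reproduced without preview flags. The sketch below deliberately swaps StructuredTaskScope for a plain virtual-thread executor with two futures, so it is not the repository's /aggregate code and it does not have the scope's failure handling. The class name, method names, and sleep durations are assumptions for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical timing demo, not part of the companion repository.
class SlowestBranch {

    // Stand-in for a dependency call: waits, then returns a value.
    static String sleepThenReturn(long millis, String value) throws InterruptedException {
        Thread.sleep(millis);
        return value;
    }

    // Runs two branches concurrently and returns elapsed wall-clock milliseconds.
    static long fanOutMillis(long firstMillis, long secondMillis) throws Exception {
        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> first = executor.submit(() -> sleepThenReturn(firstMillis, "first"));
            Future<String> second = executor.submit(() -> sleepThenReturn(secondMillis, "second"));
            first.get();   // wait for both branches before measuring
            second.get();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Sequentially these would take about 500ms; concurrently the
        // elapsed time should sit near the slower 300ms branch.
        System.out.println("elapsed ms: " + fanOutMillis(300, 200));
    }
}
```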

That snippet is the Java 25 preview form from the main branch. The Java 21 form lives in the feature/java-21 branch, and Part 9 covers the migration details.

What this does not prove

This run does not prove that virtual threads are always faster than platform threads. I did not collect a platform-thread baseline for this article, so nothing here should be read as a comparison against the port 8081 PlatformThreadMicroservice.

The evidence is narrower: the checked-in virtual-thread service builds, starts, serves the documented routes, and behaves in a way that matches the code.

The most useful result is the blocking route. It shows that a request can wait for 300ms without tying up a dedicated platform worker for the entire wait. With 40 clients, the endpoint stayed near the simulated dependency time instead of drifting into executor queueing.

The next useful result is the contrast between /block and /compute. Virtual threads help when waiting dominates. When CPU dominates, the CPU is still the scarce resource.

How to test this service

A good test for this service keeps the route shapes separate. Measure /compute, /block, and /file independently before mixing them, because each route has a different bottleneck. The blocking route should be checked against the simple concurrency math: concurrent clients divided by wait time. If that estimate and the measured result are close, the benchmark is probably exercising the code path you think it is exercising.

The metrics endpoint is useful as a sanity check, not as the only source of truth. It reports aggregate counters across route types, so use wrk output for per-route latency and throughput. Capture /metrics after the run to see request totals, memory, CPU, uptime, and whether active requests returned to zero.

If you want to compare virtual threads against the platform-thread microservice, make sure port 8081 is free before starting PlatformThreadMicroservice or running scripts/benchmark.sh. A forced comparison with the wrong service on the port is worse than no comparison.
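A quick pre-flight check for the port can be done from Java as well. This is a hypothetical helper, not something in the repository or in scripts/benchmark.sh. It tries to bind the port and reports whether the bind succeeded. The check is inherently racy, because another process could take the port between the check and the service start.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical pre-flight check, not part of the companion repository.
class PortCheck {

    // True if a listening socket could be bound to the port right now.
    static boolean isFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;   // bind succeeded; the socket is closed on exit
        } catch (IOException e) {
            return false;  // typically a BindException: something already listens
        }
    }

    public static void main(String[] args) {
        int port = 8081;   // port used by PlatformThreadMicroservice in the article
        System.out.println("port " + port + (isFree(port) ? " is free" : " is in use"));
    }
}
```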

What comes next

Part 2 moved from a tiny endpoint to a runnable web service. The important result was not the largest number in the table. It was the shape: /block followed the 300ms simulated wait, /compute showed that CPU work has a different limit, and /file showed why local file benchmarks need careful interpretation.

Part 3 moves from route-level behavior into microservice patterns: fan-out, fallback, first-success calls, and service aggregation. That is where virtual threads start to interact with structured concurrency and failure policy.


Resources

  • Load Testing: Use wrk or Apache Bench to test your implementations
  • Official Documentation: JEP 444: Virtual Threads
  • Try It Yourself: Clone the repo and run the examples. The main branch needs JDK 25 with --enable-preview; the feature/java-21 branch targets Java 21.