Written by
Jagdish Salgotra
Distributed systems, cloud-native architecture, and the JVM. mostly shipping, occasionally reading.
The reason web servers moved to reactive code was that platform threads cost too much to dedicate one per blocking request, and virtual threads remove that cost without asking the request handler to look any different.
Note: This series uses Java 21 as the baseline. Virtual threads are stable in Java 21 (JEP 444). Structured concurrency snippets in this part (StructuredTaskScope, JEP 453) use preview APIs and require --enable-preview.
Virtual threads become interesting when they are attached to something ordinary: an HTTP server, a request handler, and a few routes that do different kinds of work.
Part 1 used a tiny /api endpoint to isolate one idea. A fixed platform-thread pool queued under a 40-connection blocking load, while a virtual-thread-per-task executor stayed close to the 200ms simulated dependency. That was useful because the code was small enough to reason about.
Part 2 keeps the same discipline but moves one step closer to a service. The checked-in service has separate routes for CPU work, simulated blocking work, file reading, basic metrics, and one small fan-out example. The goal is not to claim that virtual threads make every route faster. The goal is to see what the service actually does under load and to connect each result back to the code.
This article is learning material. The main branch now builds with OpenJDK 25.0.2 and uses the Java 25 preview structured-concurrency API, with the Java 21 version separately managed in the feature/java-21 branch. Virtual threads themselves are final in Java 21. The commands below use --enable-preview because they run inside the current companion repository, where the whole Maven project is configured for Java 25 preview features. Part 9 covers the Java 21 to Java 25 migration.
The main file for this article is VirtualThreadMicroservice.java. It starts an HttpServer on port 8080 and sets the executor to a virtual-thread-per-task executor:
HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
The setExecutor call changes the request execution model. Each HTTP exchange now runs as ordinary blocking code on its own virtual thread, and the route bodies still look like normal Java.
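To see the executor change from the outside, here is a minimal sketch, separate from the repository code. The class name VirtualExecutorSketch, the /whoami route, and the use of an ephemeral port are assumptions made for illustration; the only part taken from the article is the setExecutor call.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

public class VirtualExecutorSketch {

    // Starts a server on an ephemeral port (0) so the sketch never collides with 8080.
    static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        // Hypothetical route: reports whether the handler runs on a virtual thread.
        server.createContext("/whoami", exchange -> {
            byte[] body = ("virtual=" + Thread.currentThread().isVirtual())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Issues one GET against the running server and returns the response body.
    static String get(HttpServer server, String path) throws IOException, InterruptedException {
        URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + path);
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start();
        try {
            System.out.println(get(server, "/whoami"));
        } finally {
            server.stop(0);
        }
    }
}
```

The handler body is plain blocking code. Nothing in it refers to virtual threads except the line that reports the thread type.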
The /compute route does CPU work by summing primes up to 50,000:
server.createContext("/compute", exchange -> handleRequest(exchange, "COMPUTE", () -> {
    long result = 0;
    for (int i = 2; i <= 50_000; i++) {
        if (isPrime(i)) {
            result += i;
        }
    }
    return "CPU Task completed. Result: " + result;
}));
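The isPrime helper is not shown in the article. The sketch below assumes a simple trial-division version so the loop can be run on its own; the class name PrimeSumSketch and the method sumPrimesUpTo are hypothetical, and the repository's helper may be implemented differently.

```java
public class PrimeSumSketch {

    // Trial division up to sqrt(n); an assumed stand-in for the article's isPrime.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        if (n % 2 == 0) {
            return n == 2;
        }
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Same loop shape as the /compute route, with the upper bound as a parameter.
    static long sumPrimesUpTo(int limit) {
        long result = 0;
        for (int i = 2; i <= limit; i++) {
            if (isPrime(i)) {
                result += i;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sumPrimesUpTo(50_000));
    }
}
```

Raising the limit is an easy way to make the route CPU-heavy enough that core count, not thread type, becomes the visible limit.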
The /block route simulates a database or network call with a 300ms sleep:
server.createContext("/block", exchange -> handleRequest(exchange, "BLOCK", () -> {
    try {
        Thread.sleep(300);
        return "DB call completed";
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException("Interrupted", e);
    }
}));
The /file route reads a generated 10,000-line local file:
server.createContext("/file", exchange -> handleRequest(exchange, "FILE", () -> {
    try {
        List<String> lines = Files.readAllLines(Paths.get(LARGE_FILE));
        return "File read completed. Lines: " + lines.size();
    } catch (IOException e) {
        throw new RuntimeException("File read error", e);
    }
}));
These routes are deliberately simple. They are not trying to model a full business system. They give us three different kinds of work to measure in the same service:
| Endpoint | Work shape | What the result can teach |
|---|---|---|
| /compute | CPU loop | Virtual threads do not create more CPU cores |
| /block | 300ms simulated wait | Virtual threads are a good fit when the task mostly waits |
| /file | local file read | Fast I/O can become CPU, allocation, or filesystem-cache sensitive |
That distinction matters more than a single throughput number.
The measurements below were generated with OpenJDK 25.0.2 and Maven 3.9.12:
mvn clean compile -DskipTests
mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
export CP="$(cat cp.txt):target/classes"
The build succeeded and compiled 35 source files.
Then I started the service:
java --enable-preview -cp "$CP" app.js.microservices.VirtualThreadMicroservice
The service created microservice_test_file.txt, started on port 8080, and exposed these routes:
GET /compute - CPU-intensive task
GET /block - Simulated DB call (300ms)
GET /file - Large file read
GET /aggregate - StructuredTaskScope aggregate
GET /aggregate-old - CompletableFuture aggregate
GET /first-success - First successful response
GET /aggregate-with-fallback - With error handling
GET /multi-aggregate - Multiple service aggregation
GET /metrics - Performance metrics
GET /health - Health check
For Part 2, the relevant web-service routes are /compute, /block, /file, and /metrics. I also ran /aggregate once to confirm the fan-out route still behaves as expected, but the load measurements below focus on the three basic route shapes.
The health check returned:
Virtual Thread Microservice is running!
status=200 total=0.026820s
The CPU route returned:
CPU Task completed. Result: 121013308 (Duration: 2ms, Thread: , Request: #1)
status=200 total=0.004772s
The blocking route returned:
DB call completed (Duration: 303ms, Thread: , Request: #2)
status=200 total=0.305189s
The file route returned:
File read completed. Lines: 10000 (Duration: 9ms, Thread: , Request: #3)
status=200 total=0.010590s
The first metrics snapshot returned:
Virtual Thread Microservice Metrics:
=====================================
Active Requests: 1
Total Requests: 2
Average Response Time: 5.50ms
CPU Usage: 0.00%
Memory Usage: 20.01MB / 776.00MB
JVM Uptime: 11 seconds
Thread Type: Virtual Threads
That metrics snapshot was taken while the smoke checks were running, so Active Requests: 1 is not a problem. It is a useful reminder that a metrics endpoint is also a request and can race with the work you are measuring.
I used the same small load shape for all three routes:
wrk -t4 -c40 -d10s http://localhost:8080/compute
wrk -t4 -c40 -d10s http://localhost:8080/block
wrk -t4 -c40 -d10s http://localhost:8080/file
The CPU route produced:
Running 10s test @ http://localhost:8080/compute
4 threads and 40 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.15ms 1.16ms 32.86ms 93.63%
Req/Sec 9.98k 1.47k 21.67k 83.58%
399007 requests in 10.10s, 69.53MB read
Requests/sec: 39504.87
Transfer/sec: 6.88MB
The blocking route produced:
Running 10s test @ http://localhost:8080/block
4 threads and 40 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 304.55ms 1.64ms 307.80ms 64.47%
Req/Sec 32.73 1.55 33.00 96.97%
1320 requests in 10.08s, 212.70KB read
Requests/sec: 130.93
Transfer/sec: 21.10KB
The file route produced:
Running 10s test @ http://localhost:8080/file
4 threads and 40 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.32ms 5.04ms 124.32ms 96.39%
Req/Sec 2.08k 189.96 2.49k 90.75%
82995 requests in 10.02s, 14.17MB read
Requests/sec: 8283.47
Transfer/sec: 1.41MB
The final metrics snapshot after the load checks returned:
Virtual Thread Microservice Metrics:
=====================================
Active Requests: 0
Total Requests: 483423
Average Response Time: 1.39ms
CPU Usage: 53.06%
Memory Usage: 556.80MB / 1360.00MB
JVM Uptime: 58 seconds
Thread Type: Virtual Threads
Here is the same measurement in table form:
| Endpoint | Code path | Work per request | Average latency | Requests/sec | Total requests |
|---|---|---|---|---|---|
| /compute | prime sum to 50,000 | CPU loop | 1.15ms | 39,504.87 | 399,007 |
| /block | Thread.sleep(300) | simulated wait | 304.55ms | 130.93 | 1,320 |
| /file | read 10,000 lines | local file read | 5.32ms | 8,283.47 | 82,995 |
The /block number is the easiest one to trust. With 40 clients and a 300ms sleep, the rough ceiling is 40 / 0.3, or about 133 requests per second. The measured result was 130.93 requests per second, and the average latency was 304.55ms. That lines up with the code.
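The ceiling estimate is Little's law applied to a closed loop: each connection has at most one request in flight, so throughput is connections divided by time per request. A small sketch of the arithmetic follows, with the hypothetical class name ClosedLoopCeiling; the inputs are the numbers from the wrk run above.

```java
import java.util.Locale;

public class ClosedLoopCeiling {

    // Closed-loop estimate: every connection waits for its response before sending again.
    static double requestsPerSecond(int connections, double secondsPerRequest) {
        return connections / secondsPerRequest;
    }

    public static void main(String[] args) {
        // Ceiling from the code alone: 40 connections, 300ms sleep.
        System.out.println(String.format(Locale.ROOT, "%.2f", requestsPerSecond(40, 0.300)));   // 133.33
        // Ceiling using the measured average latency of 304.55ms.
        System.out.println(String.format(Locale.ROOT, "%.2f", requestsPerSecond(40, 0.30455))); // 131.34
    }
}
```

The measured 130.93 requests per second sits just under the second figure. The request count tells the same story: 1,320 requests is exactly 33 per connection across 40 connections, and 33 requests at 304.55ms each take about 10.05 seconds.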
The /compute number should be read differently. It is high because the work is small on this machine, not because virtual threads somehow make CPU work special. If the prime loop were larger, the limit would move toward available CPU. Virtual threads are not a replacement for CPU sizing.
The /file number is also local to this machine. The service creates a 10,000-line test file and reads it repeatedly. Once the file is warm in the filesystem cache, this route is no longer a clean proxy for disk latency. It is still useful because it shows how to wire ordinary blocking file I/O through the same request handler, but it should not be used as a storage benchmark.
The handler in VirtualThreadMicroservice is intentionally conventional:
private static void handleRequest(HttpExchange exchange, String endpoint, RequestHandler handler) {
    long requestId = requestCounter.incrementAndGet();
    long startTime = System.currentTimeMillis();
    activeRequests.incrementAndGet();
    try {
        String result = handler.handle();
        long duration = System.currentTimeMillis() - startTime;
        totalRequests.incrementAndGet();
        totalResponseTime.addAndGet(duration);
        // Default virtual threads are unnamed here; use a named factory if logs need names.
        String response = String.format("%s (Duration: %dms, Thread: %s, Request: #%d)",
                result, duration, Thread.currentThread().getName(), requestId);
        sendResponse(exchange, response);
    } catch (Exception e) {
        long duration = System.currentTimeMillis() - startTime;
        totalResponseTime.addAndGet(duration);
        sendErrorResponse(exchange, "Error: " + e.getMessage());
    } finally {
        activeRequests.decrementAndGet();
    }
}
There is no callback chain. The route body returns a string or throws an exception. The handler records timing and active request counts in one place. That is a real benefit of virtual threads: they let many services keep direct-style request handling while avoiding a platform thread per blocked request.
The code is not perfect, and that is worth saying in a teaching article. Thread.currentThread().getName() is empty in the responses above because these virtual threads were not named. If thread names matter to your logs, name the factory explicitly. The metrics endpoint also reports averages across all route types together, so the final Average Response Time: 1.39ms is dominated by the very fast /compute run and does not describe /block latency.
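One way to get names into the responses is to build the executor from a named virtual-thread factory. This is a sketch, not the repository's code: the class NamedVirtualThreads and the "request-" prefix are assumptions, while Thread.ofVirtual().name(prefix, start).factory() and Executors.newThreadPerTaskExecutor are standard Java 21 APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class NamedVirtualThreads {

    // Each new virtual thread gets the prefix plus an incrementing counter starting at 0.
    static ExecutorService namedExecutor(String prefix) {
        ThreadFactory factory = Thread.ofVirtual().name(prefix, 0).factory();
        return Executors.newThreadPerTaskExecutor(factory);
    }

    // Runs a task on the executor and returns the thread name the task observed.
    static String nameSeenByTask(ExecutorService executor) throws Exception {
        return executor.submit(() -> Thread.currentThread().getName()).get();
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorService executor = namedExecutor("request-")) {
            System.out.println(nameSeenByTask(executor)); // request-0
            System.out.println(nameSeenByTask(executor)); // request-1
        }
    }
}
```

Passing an executor like this to server.setExecutor would fill the empty Thread field in the response string without changing the handler.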
Metrics need dimensions. A single average across compute, blocking, and file routes is fine for a toy service, but it is not enough for real diagnosis.
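A minimal sketch of what a route dimension could look like, assuming the endpoint label already passed to handleRequest is used as the key. The RouteMetrics class and its method names are hypothetical and are not part of the companion repository.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class RouteMetrics {

    // One pair of counters per route label.
    private record Counters(LongAdder count, LongAdder totalMillis) {
        Counters() {
            this(new LongAdder(), new LongAdder());
        }
    }

    private final Map<String, Counters> byRoute = new ConcurrentHashMap<>();

    // Records one completed request for the given route.
    public void record(String route, long millis) {
        Counters counters = byRoute.computeIfAbsent(route, r -> new Counters());
        counters.count().increment();
        counters.totalMillis().add(millis);
    }

    // Average for a single route, or 0.0 if the route has no samples yet.
    public double averageMillis(String route) {
        Counters counters = byRoute.get(route);
        if (counters == null || counters.count().sum() == 0) {
            return 0.0;
        }
        return (double) counters.totalMillis().sum() / counters.count().sum();
    }

    // Blended average across all routes, which is what a single global counter reports.
    public double overallAverageMillis() {
        long count = 0;
        long total = 0;
        for (Counters counters : byRoute.values()) {
            count += counters.count().sum();
            total += counters.totalMillis().sum();
        }
        return count == 0 ? 0.0 : (double) total / count;
    }

    public static void main(String[] args) {
        RouteMetrics metrics = new RouteMetrics();
        metrics.record("COMPUTE", 1);
        metrics.record("COMPUTE", 1);
        metrics.record("COMPUTE", 1);
        metrics.record("BLOCK", 305);
        System.out.println(metrics.averageMillis("COMPUTE")); // 1.0
        System.out.println(metrics.averageMillis("BLOCK"));   // 305.0
        System.out.println(metrics.overallAverageMillis());   // 77.0
    }
}
```

The blended figure of 77.0 describes neither route, which is the same effect that pulls the service's overall average down to 1.39ms.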
This service also exposes /aggregate, which fans out to the 300ms blocking call and the file read:
private static String aggregateWithStructuredConcurrency() throws Exception {
    long startTime = System.currentTimeMillis();
    try (var scope = StructuredTaskScope.open(StructuredTaskScope.Joiner.awaitAllSuccessfulOrThrow())) {
        var blockFuture = scope.fork(() -> fetchBlock());
        var fileFuture = scope.fork(() -> fetchFile());
        scope.join();
        long duration = System.currentTimeMillis() - startTime;
        return String.format("StructuredTaskScope Combined: %s | %s (Total: %dms)",
                blockFuture.get(), fileFuture.get(), duration);
    }
}
The smoke check returned:
StructuredTaskScope Combined: Block-Service-OK | File-Service-OK-10000-lines (Total: 307ms) (Duration: 307ms, Thread: , Request: #483424)
status=200 total=0.308223s
The useful observation is the same one that will return in later articles: the aggregate duration follows the slowest branch. The file read was much faster than the 300ms simulated dependency, so the whole fan-out completed in about 307ms.
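The slowest-branch behavior can be reproduced without preview flags. The sketch below swaps StructuredTaskScope for a plain virtual-thread executor and Future, so it is not the article's API; the class FanOutTiming, the branch labels, and the sleep durations are assumptions chosen to make the timing visible.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FanOutTiming {

    // Stand-in for a dependency call: waits, then returns a label.
    static String slowBranch(long millis, String label) throws InterruptedException {
        Thread.sleep(millis);
        return label;
    }

    // Runs both branches concurrently and returns the wall-clock time for the pair.
    static long fanOutMillis(long firstMillis, long secondMillis) throws Exception {
        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> first = executor.submit(() -> slowBranch(firstMillis, "block"));
            Future<String> second = executor.submit(() -> slowBranch(secondMillis, "file"));
            first.get();
            second.get();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Expect roughly the longer of the two waits, not their sum.
        System.out.println("fan-out took " + fanOutMillis(300, 200) + "ms");
    }
}
```

Unlike a StructuredTaskScope, this version does not cancel the sibling branch when one fails, which is part of what the structured API adds.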
The StructuredTaskScope.open and Joiner calls in this method are the Java 25 preview API used on the main branch. The Java 21 version of the same code lives in the feature/java-21 branch, and Part 9 covers the migration details.
This run does not prove that virtual threads are always faster than platform threads. I did not collect a platform-thread baseline from the port 8081 PlatformThreadMicroservice, so nothing here supports a platform-thread-versus-virtual-thread performance claim. The evidence is narrower: the checked-in virtual-thread service builds, starts, serves the documented routes, and behaves in a way that matches the code.
The most useful result is the blocking route. It shows that a request can wait for 300ms without tying up a dedicated platform worker for the entire wait. With 40 clients, the endpoint stayed near the simulated dependency time instead of drifting into executor queueing.
The next useful result is the contrast between /block and /compute. Virtual threads help when waiting dominates. When CPU dominates, the CPU is still the scarce resource.
A good test for this service keeps the route shapes separate. Measure /compute, /block, and /file independently before mixing them, because each route has a different bottleneck. The blocking route should be checked against the simple concurrency math: concurrent clients divided by wait time. If that estimate and the measured result are close, the benchmark is probably exercising the code path you think it is exercising.
The metrics endpoint is useful as a sanity check, not as the only source of truth. It reports aggregate counters across route types, so use wrk output for per-route latency and throughput. Capture /metrics after the run to see request totals, memory, CPU, uptime, and whether active requests returned to zero.
If you want to compare virtual threads against the platform-thread microservice, make sure port 8081 is free before starting PlatformThreadMicroservice or running scripts/benchmark.sh. A forced comparison with the wrong service on the port is worse than no comparison.
Part 2 moved from a tiny endpoint to a runnable web service. The important result was not the largest number in the table. It was the shape: /block followed the 300ms simulated wait, /compute showed that CPU work has a different limit, and /file showed why local file benchmarks need careful interpretation.
Part 3 moves from route-level behavior into microservice patterns: fan-out, fallback, first-success calls, and service aggregation. That is where virtual threads start to interact with structured concurrency and failure policy.