Jagdish Salgotra
Aug 10, 2025 · 35 min read · Project Loom
Settle the performance debate with an evidence-based comparison of Java's concurrency models. Analyze real-world benchmarks, memory footprints, and throughput metrics for virtual threads, platform threads, and reactive programming.
Note: This series uses Java 21 as the baseline. Virtual threads are stable in Java 21 (JEP 444). Structured concurrency snippets in this part (StructuredTaskScope, JEP 453) use preview APIs and require --enable-preview.
Performance debates get noisy fast, especially when architecture style turns into identity.
Teams often assume one model is always faster, then get surprised once memory, failure behavior, and operational overhead are measured together.
A common pattern in production:
// Async style in this project (AsyncHttpClient)
public CompletableFuture<String> getAsync(String url) {
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create(url))
.timeout(Duration.ofSeconds(30))
.GET()
.build();
return httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofString())
.thenApply(HttpResponse::body);
}
Common sources of performance confusion:
This is why raw throughput alone is not enough for architectural decisions.
The key point: performance is not only throughput. Resource efficiency, failure behavior, and operational complexity matter too.
The rest of this part compares the three approaches using the same scenarios and test setup.
Test setup used for this run:
How to read these numbers: Useful for comparison, not a universal baseline.
The Scenario: Processing user orders that require calls to payment service (200ms), inventory service (150ms), and notification service (100ms).
public class VirtualThreadMicroservice {
public static void main(String[] args) throws IOException {
HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
server.createContext("/block", exchange -> handleRequest(exchange, "BLOCK", () -> {
Thread.sleep(300);
return "DB call completed";
}));
server.createContext("/file", exchange -> handleRequest(exchange, "FILE", () -> {
List<String> lines = Files.readAllLines(Paths.get(LARGE_FILE));
return "File read completed. Lines: " + lines.size();
}));
server.start();
}
}
public class PlatformThreadOrderService {
ExecutorService threadPoolExecutor = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
void configureServer() throws IOException {
HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
server.setExecutor(threadPoolExecutor);
server.createContext("/block", exchange -> {
handleRequest(exchange, "BLOCK", () -> {
Thread.sleep(300);
return "DB call completed";
});
});
server.start();
}
}
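Both services delegate to a handleRequest helper that isn't shown in the listings. A plausible sketch (the helper's body here is an assumption; only its name and Callable-style signature are implied by the snippets above) runs the task and writes a 200 or 500 response:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;

public class HandleRequestSketch {
    // Hypothetical helper: run the task, report success or failure to the client
    static void handleRequest(HttpExchange exchange, String label, Callable<String> task) {
        int status = 200;
        String body;
        try {
            body = task.call();
        } catch (Exception e) {
            status = 500;
            body = label + " failed: " + e.getMessage();
        }
        try {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            // Headers must be sent before the body stream is written
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        } catch (IOException e) {
            // Connection dropped or response already committed; nothing left to do
        }
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        server.createContext("/block", exchange -> handleRequest(exchange, "BLOCK", () -> {
            Thread.sleep(50); // stand-in for the 300ms DB call in the benchmark
            return "DB call completed";
        }));
        server.start();
        int port = server.getAddress().getPort();
        try (HttpClient client = HttpClient.newHttpClient()) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:" + port + "/block")).GET().build();
            System.out.println(client.send(request, HttpResponse.BodyHandlers.ofString()).body());
        }
        server.stop(0);
    }
}
```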
// Async composition baseline with CompletableFuture
CompletableFuture<String> blockFuture = CompletableFuture.supplyAsync(() -> {
try {
return fetchBlock();
} catch (Exception e) {
throw new RuntimeException(e);
}
});
CompletableFuture<String> fileFuture = CompletableFuture.supplyAsync(() -> {
try {
return fetchFile();
} catch (Exception e) {
throw new RuntimeException(e);
}
});
CompletableFuture.allOf(blockFuture, fileFuture).join();
String result = blockFuture.get() + " | " + fileFuture.get();
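The same fan-out can also be expressed with plain blocking calls submitted to a virtual-thread executor via invokeAll. In this sketch, fetchBlock and fetchFile are stubs that just sleep and return a label; the real project's fetchers are not shown here.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllDemo {
    // Stub fetchers standing in for the real blocking calls
    static String fetchBlock() throws InterruptedException { Thread.sleep(50); return "DB call completed"; }
    static String fetchFile() throws InterruptedException { Thread.sleep(50); return "File read completed"; }

    public static void main(String[] args) throws Exception {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // invokeAll blocks until every task finishes, mirroring allOf(...).join()
            List<Future<String>> results = executor.invokeAll(
                    List.<Callable<String>>of(InvokeAllDemo::fetchBlock, InvokeAllDemo::fetchFile));
            System.out.println(results.get(0).get() + " | " + results.get(1).get());
        }
    }
}
```

Checked exceptions surface through Future.get() instead of being wrapped in RuntimeException by hand, which removes the try/catch boilerplate in each supplyAsync lambda.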
These numbers are from one simplified environment and workload model. Treat them as illustrative.
Concurrent Requests Handling Capacity:
MAXIMUM CONCURRENT REQUESTS
Virtual Threads: 100,000+ requests
Platform Threads: 1,200 requests (crashes beyond this)
Reactive: 5,500 requests (performance degrades)
RESOURCE USAGE AT 10,000 CONCURRENT REQUESTS
Virtual Threads:
- Memory Usage: 512 MB
- CPU Usage: 15%
- Response Time: 205ms (average)
- Error Rate: 0%
Platform Threads:
- Memory Usage: 4.2 GB
- CPU Usage: 45%
- Response Time: 450ms (average)
- Error Rate: 15% (thread pool exhaustion)
Reactive:
- Memory Usage: 1.8 GB
- CPU Usage: 35%
- Response Time: 280ms (average)
- Error Rate: 2%
How to read the I/O-bound results:
The Scenario: Prime number calculation for numbers 2 to 100,000 - pure computational work with no I/O blocking.
// CPU-intensive computation test
CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
platformThreadCount.incrementAndGet();
try {
long result = 0;
for (int i = 2; i <= 100_000; i++) {
if (isPrime(i)) {
result += i;
}
}
return "CPU Task completed on platform thread. Result: " + result;
} finally {
platformThreadCount.decrementAndGet();
}
}, cpuIntensiveExecutor);
return future.get();
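The isPrime helper referenced above isn't shown in the benchmark listing. A straightforward trial-division version (an assumption about the original implementation, but sufficient to reproduce the CPU-bound workload) looks like this, with a small bound so the demo runs instantly:

```java
public class PrimeSum {
    // Trial division up to sqrt(n); the benchmark only needs CPU-bound work
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long result = 0;
        for (int i = 2; i <= 100; i++) { // benchmark uses 100_000; 100 keeps the demo fast
            if (isPrime(i)) result += i;
        }
        System.out.println("Sum of primes up to 100: " + result); // prints 1060
    }
}
```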
CPU-INTENSIVE WORKLOAD RESULTS (Prime calculation to 100K)
Virtual Threads:
- Execution Time: 2,847ms
- Memory Usage: 145 MB
- CPU Utilization: 98%
- Threads Created: 8 (matches CPU cores)
Platform Threads:
- Execution Time: 2,832ms
- Memory Usage: 142 MB
- CPU Utilization: 99%
- Threads Created: 8
Reactive (Parallel Stream):
- Execution Time: 2,951ms
- Memory Usage: 167 MB
- CPU Utilization: 94%
- Additional Overhead: 8%
Performance Winner: Platform Threads (by 15ms - marginal)
Key insight: For pure CPU work in this run, the performance gap was small and platform threads held a slight edge.
The Scenario: E-commerce order processing with both I/O calls (payment validation, inventory check) and CPU work (fraud detection algorithm, tax calculation).
public class MixedWorkloadOrderService {
public String processOrderVirtualThreads() throws Exception {
CompletableFuture<String> cpuTask = CompletableFuture.supplyAsync(() -> {
platformThreadCount.incrementAndGet();
try {
long result = 0;
for (int i = 2; i <= 50_000; i++) {
if (isPrime(i)) result += i;
}
return "CPU: " + result;
} finally {
platformThreadCount.decrementAndGet();
}
}, cpuIntensiveExecutor);
CompletableFuture<String> ioTask = CompletableFuture.supplyAsync(() -> {
virtualThreadCount.incrementAndGet();
try {
Thread.sleep(200);
return "I/O: completed";
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return "I/O: interrupted";
} finally {
virtualThreadCount.decrementAndGet();
}
}, ioExecutor);
return "Mixed workload: " + cpuTask.get() + " | " + ioTask.get();
}
}
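The two executors the class relies on aren't shown. A plausible setup (the field names match the snippet; the sizing choices are assumptions) routes CPU work to a fixed platform pool sized to the core count and I/O work to virtual threads:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorSplitDemo {
    // CPU tasks: one platform thread per core avoids oversubscription
    static final ExecutorService cpuIntensiveExecutor =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    // I/O tasks: a fresh virtual thread per task, no pool size to tune
    static final ExecutorService ioExecutor =
            Executors.newVirtualThreadPerTaskExecutor();

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> cpuTask =
                CompletableFuture.supplyAsync(() -> "CPU: done", cpuIntensiveExecutor);
        CompletableFuture<String> ioTask = CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return "I/O: completed";
        }, ioExecutor);
        System.out.println("Mixed workload: " + cpuTask.get() + " | " + ioTask.get());
        cpuIntensiveExecutor.shutdown();
        ioExecutor.shutdown();
    }
}
```

Splitting executors this way is what keeps carrier threads free for I/O parking while CPU work saturates the cores.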
MIXED WORKLOAD PERFORMANCE (1000 concurrent orders)
Virtual Threads:
- Total Processing Time: 312ms (average)
- Memory Usage: 290 MB
- CPU Efficiency: 85%
- Throughput: 3,205 orders/second
- Success Rate: 100%
Platform Threads (200 thread pool):
- Total Processing Time: 565ms (average)
- Memory Usage: 2.1 GB
- CPU Efficiency: 65%
- Throughput: 1,770 orders/second
- Success Rate: 87% (thread starvation)
Reactive:
- Total Processing Time: 445ms (average)
- Memory Usage: 890 MB
- CPU Efficiency: 72%
- Throughput: 2,247 orders/second
- Success Rate: 95%
In this mixed-workload run, virtual threads showed better throughput and lower memory use:
- ~81% higher throughput than platform threads (3,205 vs 1,770 orders/second)
- ~30% lower average processing time than reactive (312ms vs 445ms)
- ~86% less memory than platform threads
- ~67% less memory than reactive
- 100% success rate in this run
This section uses a simplified model and thread-stack assumptions for comparison. Actual memory profiles vary by OS, JVM flags, stack sizing, and allocation patterns.
Memory consumption per 10,000 concurrent requests:
MEMORY USAGE BREAKDOWN
Virtual Threads:
├── Thread Stack: ~3 MB (300 bytes × 10K threads)
├── Heap Objects: ~45 MB
├── JVM Overhead: ~25 MB
└── Total: ~73 MB
Platform Threads:
├── Thread Stack: ~20 GB (2MB × 10K threads)
├── Heap Objects: ~45 MB
├── JVM Overhead: ~25 MB
└── Total: ~20.07 GB
Reactive Streams:
├── Thread Pool: ~400 MB (200 threads × 2MB)
├── Reactive Objects: ~350 MB (intermediate streams)
├── Heap Objects: ~45 MB
├── JVM Overhead: ~25 MB
└── Total: ~820 MB
Memory comparison in this model:
- Virtual threads used less memory than platform threads
- Virtual threads used less memory than the reactive baseline
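You can get a rough feel for the footprint yourself by spawning 10,000 sleeping virtual threads and watching heap usage. This is a sketch, not a benchmark: the heap delta is noisy (GC can run mid-measurement), so treat the printed number as an order-of-magnitude hint only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadFootprintDemo {
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws Exception {
        int n = 10_000;
        CountDownLatch latch = new CountDownLatch(n);
        AtomicInteger started = new AtomicInteger();
        long before = usedHeap();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                executor.submit(() -> {
                    started.incrementAndGet();
                    try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    latch.countDown();
                });
            }
            latch.await(); // all 10K threads were alive concurrently during the sleeps
        }
        long after = usedHeap();
        System.out.println("Completed: " + started.get());
        System.out.printf("Approx heap delta: %d KB%n", Math.max(0, after - before) / 1024);
    }
}
```

Running the same loop with new platform threads instead would allocate a full native stack per thread, which is the gap the table above models.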
Production load test results (sustained 30-minute test):
These results are from one environment and traffic profile. Real systems can vary widely based on downstream behavior and capacity limits.
SUSTAINED LOAD TEST RESULTS
Load Pattern: Ramp from 1K to 50K concurrent requests over 5 minutes,
hold at 50K for 20 minutes, ramp down over 5 minutes
Virtual Threads:
- Peak Throughput: 47,500 requests/second
- Average Response Time: 185ms
- 95th Percentile: 220ms
- 99th Percentile: 350ms
- Error Rate: 0.02%
- Memory Stable: 850MB throughout test
Platform Threads (500 thread pool):
- Peak Throughput: 2,100 requests/second
- Average Response Time: 2,400ms
- 95th Percentile: 8,500ms
- 99th Percentile: 15,000ms
- Error Rate: 25% (thread pool exhaustion)
- Memory Growth: 1.2GB → 6.8GB (GC pressure)
Reactive Streams:
- Peak Throughput: 12,300 requests/second
- Average Response Time: 680ms
- 95th Percentile: 1,200ms
- 99th Percentile: 2,100ms
- Error Rate: 3.5% (backpressure failures)
- Memory Growth: 450MB → 2.1GB (operator overhead)
In this sustained run: virtual threads showed higher throughput than both platform threads and the reactive baseline, with more stable memory behavior.
Choose Virtual Threads when:
Real-world examples:
Choose Platform Threads when:
Real-world examples:
Consider Reactive when:
Reality check: If your workload does not require reactive-specific features, many services can be simplified with virtual threads and blocking flows.
Identify migration candidates:
// Benchmark harness used in this project
private static boolean isServerRunning(int port) {
try {
HttpResponse<String> response;
try (HttpClient client = HttpClient.newHttpClient()) {
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create("http://localhost:" + port + "/health"))
.timeout(java.time.Duration.ofSeconds(5))
.GET()
.build();
response = client.send(request, HttpResponse.BodyHandlers.ofString());
}
return response.statusCode() == 200;
} catch (Exception e) {
return false;
}
}
Step-by-step migration approach:
// Before: manual CompletableFuture composition
private static String aggregateWithCompletableFuture() throws Exception {
CompletableFuture<String> blockFuture = CompletableFuture.supplyAsync(() -> {
try {
return fetchBlock();
} catch (Exception e) {
throw new RuntimeException(e);
}
});
CompletableFuture<String> fileFuture = CompletableFuture.supplyAsync(() -> {
try {
return fetchFile();
} catch (Exception e) {
throw new RuntimeException(e);
}
});
CompletableFuture.allOf(blockFuture, fileFuture).join();
return blockFuture.get() + " | " + fileFuture.get();
}
// After: simple blocking code with structured concurrency (preview API)
private static String aggregateWithStructuredConcurrency() throws Exception {
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
var blockFuture = scope.fork(() -> fetchBlock());
var fileFuture = scope.fork(() -> fetchFile());
scope.join();
scope.throwIfFailed();
return blockFuture.get() + " | " + fileFuture.get();
}
}
Production monitoring checklist:
// Virtual thread health monitoring
private static void startMemoryMonitoring() {
memoryMonitorScheduler = Executors.newScheduledThreadPool(1);
memoryMonitorScheduler.scheduleAtFixedRate(() -> {
MemoryUsage heapUsage = memoryBean.getHeapMemoryUsage();
long currentHeapUsage = heapUsage.getUsed();
heapGrowthRate = currentHeapUsage - lastHeapUsage;
lastHeapUsage = currentHeapUsage;
if (heapGrowthRate > 50 * 1024 * 1024) {
logger.warn("High memory growth detected: {} MB", heapGrowthRate / (1024 * 1024));
}
}, 0, 10, TimeUnit.SECONDS);
}
DO:
// Use virtual threads for I/O operations
server.createContext("/io-optimized", exchange -> handleRequest(exchange, "IO_OPTIMIZED", () -> {
virtualThreadCount.incrementAndGet();
try {
Thread.sleep(300);
List<String> lines = Files.readAllLines(Paths.get(LARGE_FILE));
return "I/O Task completed on virtual thread. Lines: " + lines.size();
} finally {
virtualThreadCount.decrementAndGet();
}
}));
server.createContext("/compute-optimized", exchange -> handleRequest(exchange, "COMPUTE_OPTIMIZED", () -> {
CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
long result = 0;
for (int i = 2; i <= 100_000; i++) {
if (isPrime(i)) {
result += i;
}
}
return "CPU Task completed on platform thread. Result: " + result;
}, cpuIntensiveExecutor);
return future.get();
}));
DON'T:
// Don't run CPU-intensive loops on virtual threads; hand them to a fixed platform pool instead
ExecutorService cpuIntensiveExecutor =
Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
long result = 0;
for (int i = 2; i <= 100_000; i++) {
if (isPrime(i)) {
result += i;
}
}
return "CPU Task completed on platform thread. Result: " + result;
}, cpuIntensiveExecutor);
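For contrast, the anti-pattern itself looks like the sketch below: the same prime loop submitted straight to a virtual-thread-per-task executor. It still produces correct results, but each task occupies a carrier thread for its entire run, so beyond availableProcessors() tasks the extra virtual threads add scheduling churn without extra parallelism. The class name, task count, and bound here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuOnVirtualThreadsAntiPattern {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) if (n % i == 0) return false;
        return true;
    }

    public static void main(String[] args) throws Exception {
        int tasks = 32; // far more CPU-bound tasks than cores
        List<Future<Long>> results = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int t = 0; t < tasks; t++) {
                results.add(executor.submit(() -> {
                    long sum = 0;
                    // CPU-bound loop: the virtual thread never parks, pinning its carrier
                    for (int i = 2; i <= 20_000; i++) if (isPrime(i)) sum += i;
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) total += f.get();
            System.out.println("Tasks completed: " + results.size());
            System.out.println("Checksum: " + total);
        }
    }
}
```

A fixed pool of availableProcessors() platform threads would finish this work in roughly the same wall time with less scheduler overhead, which is why the DO example routes CPU tasks there.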
// Built-in performance tracking
private static void printMetrics(int port) {
try {
HttpResponse<String> response;
try (HttpClient client = HttpClient.newHttpClient()) {
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create("http://localhost:" + port + "/metrics"))
.timeout(java.time.Duration.ofSeconds(5))
.GET()
.build();
response = client.send(request, HttpResponse.BodyHandlers.ofString());
}
if (response.statusCode() == 200) {
logger.info(response.body());
} else {
logger.error("Failed to get metrics from port {}", port);
}
} catch (Exception e) {
logger.error("Error getting metrics: {}", e.getMessage());
}
}
Monthly AWS costs for handling 1M requests/day:
Cost numbers here are an illustrative model tied to specific instance assumptions and one AWS pricing snapshot for one region. Real savings vary widely by current rates, utilization, and reliability needs, so verify against your own profile before drawing conclusions.
INFRASTRUCTURE COST ANALYSIS (AWS us-east-1)
Platform Thread Service:
- Instance Type: c5.4xlarge (16 vCPU, 32GB RAM)
- Instance Count: 8 (for redundancy and load)
- Monthly Cost: $1,843.20
- Additional Costs:
- Load Balancer: $22.77
- CloudWatch: $45.50
- Total Monthly: $1,911.47
Virtual Thread Service:
- Instance Type: c5.xlarge (4 vCPU, 8GB RAM)
- Instance Count: 2 (for redundancy)
- Monthly Cost: $140.16
- Additional Costs:
- Load Balancer: $22.77
- CloudWatch: $15.20
- Total Monthly: $178.13
Reactive Service:
- Instance Type: c5.2xlarge (8 vCPU, 16GB RAM)
- Instance Count: 4 (complexity requires more instances)
- Monthly Cost: $561.12
- Additional Costs:
- Load Balancer: $22.77
- CloudWatch: $30.40
- Total Monthly: $614.29
Annual Savings with Virtual Threads:
- vs Platform Threads: $20,800 (91% savings)
- vs Reactive: $5,234 (71% savings)
Cost takeaway: Virtual threads can improve cost efficiency for I/O-heavy services, but validate with current pricing, utilization, and reliability targets.
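The annual-savings arithmetic can be reproduced directly from the monthly totals above. The figures baked into this sketch are the article's illustrative model, not live AWS rates:

```java
import java.util.Locale;

public class CostModel {
    public static void main(String[] args) {
        // Monthly totals from the illustrative cost table above
        double virtualMonthly = 178.13;
        double platformMonthly = 1911.47;
        double reactiveMonthly = 614.29;
        System.out.printf(Locale.US, "vs Platform: $%,.0f/yr (%.0f%% savings)%n",
                (platformMonthly - virtualMonthly) * 12,
                (1 - virtualMonthly / platformMonthly) * 100);
        System.out.printf(Locale.US, "vs Reactive: $%,.0f/yr (%.0f%% savings)%n",
                (reactiveMonthly - virtualMonthly) * 12,
                (1 - virtualMonthly / reactiveMonthly) * 100);
    }
}
```

Note the savings percentage is the ratio of monthly totals, not of the dollar delta, which is an easy place for these summaries to drift.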
Use JFR's jdk.VirtualThreadPinned events to detect pinning in mixed workloads.

In Part 7, we'll explore Production Readiness: Monitoring, Debugging, and Best Practices, with JFR profiling, virtual thread pinning detection, and observability patterns that hold up during real incidents.
We'll cover:
- WrkBenchmarkRunner to benchmark the /compute, /block, and /file endpoints
- -XX:StartFlightRecording (JFR) for detailed analysis

Part 6 complete. This part stayed close to data, trade-offs, and where each model actually fits.
Series Navigation:
If you run comparable benchmarks, compare latency tails, error rate, and memory growth alongside throughput.