Note
This series uses Java 21 as the baseline. Virtual threads are stable in Java 21 (JEP 444). Structured concurrency snippets in this part (StructuredTaskScope, JEP 453) use preview APIs and require --enable-preview.
TL;DR
- Virtual threads are often a strong fit for I/O-bound workloads
- CPU-bound workloads usually show smaller differences between virtual and platform threads
- Reactive approaches still fit event-driven/backpressure-heavy designs
- This part compares memory, throughput, and latency from one benchmark setup
- Benchmark numbers are illustrative and should be validated with your own workload
- Migration usually works best as an incremental rollout starting with I/O-heavy paths
Performance debates get noisy fast, especially when architecture style turns into identity.
Teams often assume one model is always faster, then get surprised once memory, failure behavior, and operational overhead are measured together.
Where Assumptions Break
A common pattern in production:
public CompletableFuture<String> getAsync(String url) {
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .timeout(Duration.ofSeconds(30))
            .GET()
            .build();
    return httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofString())
            .thenApply(HttpResponse::body);
}
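For contrast, the same call written as straight-line blocking code, which virtual threads make cheap to run once per request. This is a minimal sketch; the class and field names are illustrative, and it assumes a shared `HttpClient` like the async snippet does.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class BlockingGet {
    // Shared client, as the async snippet assumes; the name is illustrative
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Same request shape as the async version
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Straight-line blocking call; cheap when each request runs on a virtual thread
    public static String get(String url) throws Exception {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The control flow reads top to bottom, and exceptions propagate normally instead of being wrapped in future completions.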
Common sources of performance confusion:
- Memory overhead: intermediate objects and buffering can grow quickly
- CPU overhead: scheduling and operator coordination have a runtime cost
- Complexity tax: debugging and failure handling can become expensive
- Thread-pool interactions: multiple pools can contend for the same CPU and memory
- Cost shifting: non-blocking moves work, it does not remove work
This is why raw throughput alone is not enough for architectural decisions.
The key point: performance is not only throughput. Resource efficiency, failure behavior, and operational complexity matter too.
The rest of this part compares the three approaches using the same scenarios and test setup.
Test Environment and Methodology
Test setup used for this run:
- Hardware: 16GB RAM, 8-core Intel CPU (AWS c5.2xlarge equivalent)
- Java Version: OpenJDK 21 with virtual threads enabled
- Test Scenarios: I/O-bound (database calls), CPU-bound (computation), mixed workloads
- Load Pattern: Gradual ramp-up from 100 to 100,000 concurrent operations
- Duration: 10-minute sustained load tests with 5-minute warmup
How to read these numbers: Useful for comparison, not a universal baseline.
The Scenario: Processing user orders that require calls to payment service (200ms), inventory service (150ms), and notification service (100ms).
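Under this scenario, the per-order fan-out can be sketched with one virtual thread per service call. The sketch below simulates the three service latencies with sleeps; all names and the simulated calls are illustrative, not the benchmark harness itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OrderFanOut {
    // Hypothetical service calls, simulated with the scenario's latencies
    static String payment() throws InterruptedException { Thread.sleep(200); return "payment-ok"; }
    static String inventory() throws InterruptedException { Thread.sleep(150); return "inventory-ok"; }
    static String notification() throws InterruptedException { Thread.sleep(100); return "notify-ok"; }

    public static String processOrder() throws Exception {
        // One cheap virtual thread per call; the executor closes when the block exits
        try (ExecutorService vt = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> p = vt.submit(OrderFanOut::payment);
            Future<String> i = vt.submit(OrderFanOut::inventory);
            Future<String> n = vt.submit(OrderFanOut::notification);
            // Wall time is roughly max(200, 150, 100) ms, not the 450ms sum
            return p.get() + " | " + i.get() + " | " + n.get();
        }
    }
}
```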
Virtual Threads Implementation
public class VirtualThreadMicroservice {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
        // One virtual thread per request, so blocking handlers stay cheap
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        server.createContext("/block", exchange -> handleRequest(exchange, "BLOCK", () -> {
            Thread.sleep(300);
            return "DB call completed";
        }));
        server.createContext("/file", exchange -> handleRequest(exchange, "FILE", () -> {
            List<String> lines = Files.readAllLines(Paths.get(LARGE_FILE));
            return "File read completed. Lines: " + lines.size();
        }));
        server.start();
    }
}
Platform Threads Implementation
public class PlatformThreadOrderService {
    ExecutorService threadPoolExecutor = Executors.newFixedThreadPool(THREAD_POOL_SIZE);

    void configureServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
        server.setExecutor(threadPoolExecutor);
        server.createContext("/block", exchange -> handleRequest(exchange, "BLOCK", () -> {
            Thread.sleep(300);
            return "DB call completed";
        }));
        server.start();
    }
}
Reactive Implementation
CompletableFuture<String> blockFuture = CompletableFuture.supplyAsync(() -> {
    try {
        return fetchBlock();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
});
CompletableFuture<String> fileFuture = CompletableFuture.supplyAsync(() -> {
    try {
        return fetchFile();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
});
CompletableFuture.allOf(blockFuture, fileFuture).join();
String result = blockFuture.get() + " | " + fileFuture.get();
These numbers are from one simplified environment and workload model. Treat them as illustrative.
Concurrent Requests Handling Capacity:
MAXIMUM CONCURRENT REQUESTS
Virtual Threads: 100,000+ requests
Platform Threads: 1,200 requests (crashes beyond this)
Reactive: 5,500 requests (performance degrades)
RESOURCE USAGE AT 10,000 CONCURRENT REQUESTS
Virtual Threads:
- Memory Usage: 512 MB
- CPU Usage: 15%
- Response Time: 205ms (average)
- Error Rate: 0%
Platform Threads:
- Memory Usage: 4.2 GB
- CPU Usage: 45%
- Response Time: 450ms (average)
- Error Rate: 15% (thread pool exhaustion)
Reactive:
- Memory Usage: 1.8 GB
- CPU Usage: 35%
- Response Time: 280ms (average)
- Error Rate: 2%
How to read the I/O-bound results:
- Higher concurrent request capacity in this run
- Lower memory use than platform threads in this run
- Lower average response time than the reactive baseline in this run
- Lower observed error rate in this run
The Scenario: Prime number calculation for numbers 2 to 100,000 - pure computational work with no I/O blocking.
Test Implementation
CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
    platformThreadCount.incrementAndGet();
    try {
        long result = 0;
        for (int i = 2; i <= 100_000; i++) {
            if (isPrime(i)) {
                result += i;
            }
        }
        return "CPU Task completed on platform thread. Result: " + result;
    } finally {
        platformThreadCount.decrementAndGet();
    }
}, cpuIntensiveExecutor);
return future.get();
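The loop above assumes an isPrime helper. A minimal trial-division version, sufficient for the 100K range used here, might look like this (the class name is illustrative):

```java
public class Primes {
    // Simple trial-division primality check assumed by the benchmark snippets
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    // Sum of primes in [2, limit], matching the benchmark loop
    static long sumPrimes(int limit) {
        long result = 0;
        for (int i = 2; i <= limit; i++) {
            if (isPrime(i)) result += i;
        }
        return result;
    }
}
```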
CPU-INTENSIVE WORKLOAD RESULTS (Prime calculation to 100K)
Virtual Threads:
- Execution Time: 2,847ms
- Memory Usage: 145 MB
- CPU Utilization: 98%
- Threads Created: 8 (matches CPU cores)
Platform Threads:
- Execution Time: 2,832ms
- Memory Usage: 142 MB
- CPU Utilization: 99%
- Threads Created: 8
Reactive (Parallel Stream):
- Execution Time: 2,951ms
- Memory Usage: 167 MB
- CPU Utilization: 94%
- Additional Overhead: 8%
Performance Winner: Platform Threads (by 15ms - marginal)
Key insight: For pure CPU work in this run, the performance gap was small and platform threads held a slight edge.
The Scenario: E-commerce order processing with both I/O calls (payment validation, inventory check) and CPU work (fraud detection algorithm, tax calculation).
Mixed Workload Implementation
public class MixedWorkloadOrderService {
    public String processOrderVirtualThreads() throws Exception {
        CompletableFuture<String> cpuTask = CompletableFuture.supplyAsync(() -> {
            platformThreadCount.incrementAndGet();
            try {
                long result = 0;
                for (int i = 2; i <= 50_000; i++) {
                    if (isPrime(i)) result += i;
                }
                return "CPU: " + result;
            } finally {
                platformThreadCount.decrementAndGet();
            }
        }, cpuIntensiveExecutor);
        CompletableFuture<String> ioTask = CompletableFuture.supplyAsync(() -> {
            virtualThreadCount.incrementAndGet();
            try {
                Thread.sleep(200);
                return "I/O: completed";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "I/O: interrupted";
            } finally {
                virtualThreadCount.decrementAndGet();
            }
        }, ioExecutor);
        return "Mixed workload: " + cpuTask.get() + " | " + ioTask.get();
    }
}
Mixed Workload Results
MIXED WORKLOAD PERFORMANCE (1000 concurrent orders)
Virtual Threads:
- Total Processing Time: 312ms (average)
- Memory Usage: 290 MB
- CPU Efficiency: 85%
- Throughput: 3,205 orders/second
- Success Rate: 100%
Platform Threads (200 thread pool):
- Total Processing Time: 565ms (average)
- Memory Usage: 2.1 GB
- CPU Efficiency: 65%
- Throughput: 1,770 orders/second
- Success Rate: 87% (thread starvation)
Reactive:
- Total Processing Time: 445ms (average)
- Memory Usage: 890 MB
- CPU Efficiency: 72%
- Throughput: 2,247 orders/second
- Success Rate: 95%
In this mixed-workload run, virtual threads showed better throughput and lower memory use:
- ~81% higher throughput than platform threads (3,205 vs 1,770 orders/second)
- ~30% lower average processing time than reactive (312ms vs 445ms)
- ~86% less memory than platform threads
- ~67% less memory than reactive
- 100% success rate in this run
Real-World Production Analysis
Memory Usage Deep Dive
This section uses a simplified model and thread-stack assumptions for comparison. Actual memory profiles vary by OS, JVM flags, stack sizing, and allocation patterns.
Memory consumption per 10,000 concurrent requests:
MEMORY USAGE BREAKDOWN
Virtual Threads:
├── Thread Stack: ~3 MB (300 bytes × 10K threads)
├── Heap Objects: ~45 MB
├── JVM Overhead: ~25 MB
└── Total: ~73 MB
Platform Threads:
├── Thread Stack: ~20 GB (2MB × 10K threads)
├── Heap Objects: ~45 MB
├── JVM Overhead: ~25 MB
└── Total: ~20.07 GB
Reactive Streams:
├── Thread Pool: ~400 MB (200 threads × 2MB)
├── Reactive Objects: ~350 MB (intermediate streams)
├── Heap Objects: ~45 MB
├── JVM Overhead: ~25 MB
└── Total: ~820 MB
Memory comparison in this model:
- Virtual threads used less memory than platform threads
- Virtual threads used less memory than the reactive baseline
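The stack-size portion of this model can be probed directly, with heavy caveats. The sketch below parks many virtual threads and reports the heap delta; results vary widely with JVM version, GC timing, and flags, so treat the printed number as a shape check on the model, not a measurement.

```java
import java.util.concurrent.CountDownLatch;

public class ThreadFootprint {
    static long used() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Rough probe: heap delta after parking many virtual threads.
    // GC can run mid-measurement, so the delta is approximate at best.
    public static void main(String[] args) throws Exception {
        int count = 10_000;
        CountDownLatch release = new CountDownLatch(1);
        System.gc();
        long before = used();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = Thread.ofVirtual().start(() -> {
                try {
                    release.await(); // park until released
                } catch (InterruptedException ignored) {
                }
            });
        }
        long delta = used() - before;
        System.out.printf("Approx. %d bytes per parked virtual thread%n", delta / count);
        release.countDown();
        for (Thread t : threads) {
            t.join();
        }
    }
}
```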
Throughput Analysis Under Real Load
Production load test results (sustained 30-minute test):
These results are from one environment and traffic profile. Real systems can vary widely based on downstream behavior and capacity limits.
SUSTAINED LOAD TEST RESULTS
Load Pattern: Ramp from 1K to 50K concurrent requests over 5 minutes,
hold at 50K for 20 minutes, ramp down over 5 minutes
Virtual Threads:
- Peak Throughput: 47,500 requests/second
- Average Response Time: 185ms
- 95th Percentile: 220ms
- 99th Percentile: 350ms
- Error Rate: 0.02%
- Memory Stable: 850MB throughout test
Platform Threads (500 thread pool):
- Peak Throughput: 2,100 requests/second
- Average Response Time: 2,400ms
- 95th Percentile: 8,500ms
- 99th Percentile: 15,000ms
- Error Rate: 25% (thread pool exhaustion)
- Memory Growth: 1.2GB → 6.8GB (GC pressure)
Reactive Streams:
- Peak Throughput: 12,300 requests/second
- Average Response Time: 680ms
- 95th Percentile: 1,200ms
- 99th Percentile: 2,100ms
- Error Rate: 3.5% (backpressure failures)
- Memory Growth: 450MB → 2.1GB (operator overhead)
In this sustained run: virtual threads showed higher throughput than both platform threads and the reactive baseline, with more stable memory behavior.
When to Use What: The Decision Framework
Virtual Threads: Common Fit
Choose Virtual Threads when:
- I/O-bound workloads (database calls, HTTP requests, file operations)
- High concurrency requirements (10K+ concurrent operations)
- Microservices architectures with multiple service calls
- Blocking I/O patterns that you want to keep simple
- Memory-constrained environments (containers, serverless)
Real-world examples:
- REST API gateways aggregating multiple backend services
- Order processing systems with payment, inventory, and shipping calls
- Data processing pipelines with multiple I/O steps
- Chat applications with thousands of concurrent connections
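One caveat for the high-concurrency bullet: virtual threads remove the thread ceiling, but downstream resources keep theirs. A minimal sketch of bounding in-flight calls with a Semaphore follows; the limit of 100 and all names are assumptions, not values from the benchmarks.

```java
import java.util.concurrent.Semaphore;

public class BoundedFanOut {
    // Assumed cap on concurrent downstream calls (DB pool size, API quota, etc.)
    private static final Semaphore PERMITS = new Semaphore(100);

    // Bounds in-flight work without bounding thread creation:
    // thousands of virtual threads can wait on acquire() cheaply.
    public static String callDownstream(String id) throws InterruptedException {
        PERMITS.acquire();
        try {
            Thread.sleep(10); // stand-in for the real blocking I/O call
            return "done-" + id;
        } finally {
            PERMITS.release();
        }
    }
}
```

Unlike shrinking a thread pool, this keeps request admission cheap while protecting the slow resource behind it.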
Platform Threads: Compute Fit
Choose Platform Threads when:
- CPU-intensive computations (mathematical algorithms, data processing)
- Low concurrency, high compute scenarios (< 100 concurrent operations)
- Performance-critical calculations requiring maximum CPU efficiency
- Legacy integration where migration complexity outweighs benefits
- CPU-bound pipelines where parallel streams or bounded executors are already effective
Real-world examples:
- Image/video processing services
- Machine learning model inference
- Cryptographic operations
- Financial calculations requiring maximum precision
Reactive Programming: Specialized Fit
Consider Reactive when:
- Event-driven architectures with complex event streams
- Backpressure handling is critical business requirement
- Existing reactive ecosystem investment (Spring WebFlux, Vert.x)
- Team expertise already exists and migration cost is high
Reality check: If your workload does not require reactive-specific features, many services can be simplified with virtual threads and blocking flows.
Migration Strategy: Incremental Rollout
Phase 1: Assessment and Quick Wins (Week 1-2)
Identify migration candidates:
private static boolean isServerRunning(int port) {
    try {
        HttpResponse<String> response;
        try (HttpClient client = HttpClient.newHttpClient()) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:" + port + "/health"))
                    .timeout(java.time.Duration.ofSeconds(5))
                    .GET()
                    .build();
            response = client.send(request, HttpResponse.BodyHandlers.ofString());
        }
        return response.statusCode() == 200;
    } catch (Exception e) {
        return false;
    }
}
Phase 2: Gradual Migration (Week 3-6)
Step-by-step migration approach:
private static String aggregateWithCompletableFuture() throws Exception {
    CompletableFuture<String> blockFuture = CompletableFuture.supplyAsync(() -> {
        try {
            return fetchBlock();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    });
    CompletableFuture<String> fileFuture = CompletableFuture.supplyAsync(() -> {
        try {
            return fetchFile();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    });
    CompletableFuture.allOf(blockFuture, fileFuture).join();
    return blockFuture.get() + " | " + fileFuture.get();
}

private static String aggregateWithStructuredConcurrency() throws Exception {
    // Preview API on Java 21: requires --enable-preview (JEP 453)
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        var blockFuture = scope.fork(() -> fetchBlock());
        var fileFuture = scope.fork(() -> fetchFile());
        scope.join();
        scope.throwIfFailed();
        return blockFuture.get() + " | " + fileFuture.get();
    }
}
Phase 3: Optimization and Monitoring (Week 7+)
Production monitoring checklist:
private static void startMemoryMonitoring() {
    memoryMonitorScheduler = Executors.newScheduledThreadPool(1);
    memoryMonitorScheduler.scheduleAtFixedRate(() -> {
        MemoryUsage heapUsage = memoryBean.getHeapMemoryUsage();
        long currentHeapUsage = heapUsage.getUsed();
        heapGrowthRate = currentHeapUsage - lastHeapUsage;
        lastHeapUsage = currentHeapUsage;
        if (heapGrowthRate > 50 * 1024 * 1024) {
            logger.warn("High memory growth detected: {}MB", heapGrowthRate / 1024 / 1024);
        }
    }, 0, 10, TimeUnit.SECONDS);
}
Best Practices: Lessons from Production Deployments
Virtual Thread Optimization
DO:
server.createContext("/io-optimized", exchange -> handleRequest(exchange, "IO_OPTIMIZED", () -> {
    virtualThreadCount.incrementAndGet();
    try {
        Thread.sleep(300);
        List<String> lines = Files.readAllLines(Paths.get(LARGE_FILE));
        return "I/O Task completed on virtual thread. Lines: " + lines.size();
    } finally {
        virtualThreadCount.decrementAndGet();
    }
}));
// Offload CPU-heavy work to a bounded platform-thread executor
server.createContext("/compute-optimized", exchange -> handleRequest(exchange, "COMPUTE_OPTIMIZED", () -> {
    CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
        long result = 0;
        for (int i = 2; i <= 100_000; i++) {
            if (isPrime(i)) {
                result += i;
            }
        }
        return "CPU Task completed on platform thread. Result: " + result;
    }, cpuIntensiveExecutor);
    return future.get();
}));
DON'T:
// Anti-pattern: running long CPU loops directly on request-handling virtual threads.
// The loop still occupies a carrier thread for its full duration, so nothing is
// gained over a platform thread and other virtual threads wait longer to run.
server.createContext("/compute-naive", exchange -> handleRequest(exchange, "COMPUTE_NAIVE", () -> {
    long result = 0;
    for (int i = 2; i <= 100_000; i++) {
        if (isPrime(i)) {
            result += i;
        }
    }
    return "CPU Task completed on virtual thread. Result: " + result;
}));
Plain blocking HTTP clients also keep operational tooling simple, as in this metrics probe:
private static void printMetrics(int port) {
    try {
        HttpResponse<String> response;
        try (HttpClient client = HttpClient.newHttpClient()) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:" + port + "/metrics"))
                    .timeout(java.time.Duration.ofSeconds(5))
                    .GET()
                    .build();
            response = client.send(request, HttpResponse.BodyHandlers.ofString());
        }
        if (response.statusCode() == 200) {
            logger.info(response.body());
        } else {
            logger.error("Failed to get metrics from port {}", port);
        }
    } catch (Exception e) {
        logger.error("Error getting metrics: {}", e.getMessage());
    }
}
Cost Analysis: The Business Case
Infrastructure Cost Comparison
Monthly AWS costs for handling 1M requests/day:
Cost numbers here are an illustrative model tied to specific instance assumptions and a single AWS pricing snapshot. Real savings vary widely with current rates, utilization, and reliability needs, so verify against up-to-date cloud pricing and your actual utilization profile.
INFRASTRUCTURE COST ANALYSIS (AWS us-east-1)
Platform Thread Service:
- Instance Type: c5.4xlarge (16 vCPU, 32GB RAM)
- Instance Count: 8 (for redundancy and load)
- Monthly Cost: $1,843.20
- Additional Costs:
- Load Balancer: $22.77
- CloudWatch: $45.50
- Total Monthly: $1,911.47
Virtual Thread Service:
- Instance Type: c5.xlarge (4 vCPU, 8GB RAM)
- Instance Count: 2 (for redundancy)
- Monthly Cost: $140.16
- Additional Costs:
- Load Balancer: $22.77
- CloudWatch: $15.20
- Total Monthly: $178.13
Reactive Service:
- Instance Type: c5.2xlarge (8 vCPU, 16GB RAM)
- Instance Count: 4 (complexity requires more instances)
- Monthly Cost: $561.12
- Additional Costs:
- Load Balancer: $22.77
- CloudWatch: $30.40
- Total Monthly: $614.29
Annual Savings with Virtual Threads (in this model):
- vs Platform Threads: $20,800 (~91% savings)
- vs Reactive: $5,234 (~71% savings)
Cost takeaway: Virtual threads can improve cost efficiency for I/O-heavy services, but validate with current pricing, utilization, and reliability targets.
Validate Gains in Your Environment
- Re-run benchmarks with realistic traffic shape, concurrency ramps, and soak duration
- Compare p50/p95/p99 latency, throughput, error rate, and memory growth together
- Validate downstream bottlenecks (DB pools, API quotas, queue limits) before scaling conclusions
- Measure GC behavior and allocation rates under sustained load
- Track failure-mode behavior separately (timeouts, retries, fallback rates)
- Use JFR jdk.VirtualThreadPinned events to detect pinning in mixed workloads
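For the pinning check, a hedged sketch of the JFR workflow on JDK 21; the recording filename and service.jar are placeholders for your own service:

```shell
# Record a JFR session while the service runs (filename and jar are placeholders)
java -XX:StartFlightRecording=filename=recording.jfr,duration=120s -jar service.jar

# Print pinned-thread events from the recording
jfr print --events jdk.VirtualThreadPinned recording.jfr

# Alternatively, log pinned stack traces directly (JDK 21 diagnostic property)
java -Djdk.tracePinnedThreads=full -jar service.jar
```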
What's Next?
In Part 7, we'll explore Production Readiness: Monitoring, Debugging, and Best Practices, with JFR profiling, virtual thread pinning detection, and observability patterns that help under real incidents.
We'll cover:
- JFR (Java Flight Recorder) patterns for virtual thread profiling
- Detecting and resolving carrier thread pinning in production
- Building comprehensive monitoring dashboards
- Debugging structured concurrency applications
- Performance tuning strategies for different workload patterns
Part 6 complete. This part stayed close to data, trade-offs, and where each model actually fits.
If you run comparable benchmarks, compare latency tails, error rate, and memory growth alongside throughput.