Jagdish Salgotra
Jul 6, 2025 · 15 min read · Project Loom
Discover Java's virtual threads revolution with Project Loom. Learn why Java's threading model is changing forever, performance benefits, migration strategies, and real-world implementation examples. Complete guide to lightweight concurrency in Java 21+.
Note: This series uses Java 21 as the baseline. Part 1 uses virtual threads only (JEP 444), so no preview flags are needed here. Later parts that use structured concurrency (StructuredTaskScope, JEP 453) require --enable-preview. See Part 9 for Java 21 -> Java 25 migration guidance.
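For reference, the compile/run commands might look like this (a sketch; `Main` is a placeholder class name):

```shell
# Part 1 (virtual threads only, JEP 444): no flags needed on Java 21
javac Main.java
java Main

# Later parts using StructuredTaskScope (preview in Java 21):
javac --release 21 --enable-preview Main.java
java --enable-preview Main
```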
Load tests of high-concurrency services often reveal the same ceiling: systems fall over at a few thousand concurrent users, even when business logic is simple.
The old pain point is straightforward: platform threads are expensive. Each one consumes 1-2MB of memory and uses real OS resources. In practice that puts a ceiling on concurrency long before business traffic does.
Consider this project example using a platform-thread pool for a blocking endpoint:
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PlatformThreadPoolServer {
    private static final int PORT = 8080;
    private static final int THREAD_POOL_SIZE = 20;
    private static final long BLOCKING_SIMULATION_TIME = 200;

    public static void main(String[] args) throws IOException {
        ExecutorService threadPoolExecutor = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
        HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
        server.createContext("/api", exchange -> {
            threadPoolExecutor.submit(() -> {
                try {
                    Thread.sleep(BLOCKING_SIMULATION_TIME); // simulate blocking I/O
                    String response = "Platform Thread Ok\n";
                    byte[] body = response.getBytes();
                    exchange.sendResponseHeaders(200, body.length);
                    exchange.getResponseBody().write(body);
                    exchange.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        });
        server.setExecutor(null);
        server.start();
    }
}
Why this approach breaks down at scale:
- The fixed pool caps the server at THREAD_POOL_SIZE (20) in-flight requests; everything beyond that queues.
- Each platform thread consumes 1-2MB of stack and real OS resources.
- During the 200ms blocking call, each thread sits idle while still holding its memory and its OS handle.
Most of us worked around this with larger thread pools, callback chains, or reactive plumbing. It worked, but usually at the cost of readability and operability.
Virtual threads solve this with M:N scheduling, mapping many lightweight virtual threads onto a smaller set of carrier threads (platform threads).
In this project, the virtual-thread variant keeps the same blocking style:
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadPoolServer {
    private static final int PORT = 8080;
    private static final long BLOCKING_SIMULATION_TIME = 200;

    public static void main(String[] args) throws IOException {
        ExecutorService loomExecutor = Executors.newVirtualThreadPerTaskExecutor();
        HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
        server.createContext("/api", exchange -> {
            loomExecutor.submit(() -> {
                try {
                    Thread.sleep(BLOCKING_SIMULATION_TIME); // virtual thread parks here
                    String response = "Virtual Thread Ok\n";
                    byte[] body = response.getBytes();
                    exchange.sendResponseHeaders(200, body.length);
                    exchange.getResponseBody().write(body);
                    exchange.close();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        });
        server.setExecutor(null);
        server.start();
    }
}
What changed in this example:
- Executors.newFixedThreadPool(20) became Executors.newVirtualThreadPerTaskExecutor(); every request gets its own cheap virtual thread.
- The handler keeps the same blocking style; no callbacks or reactive plumbing.
- InterruptedException is handled explicitly by restoring the interrupt flag.
Understanding this model is key:
The M:N scheduling model maps many lightweight virtual threads onto a small pool of OS-managed carrier threads (one per CPU core by default). When a virtual thread blocks on I/O, it parks — releasing its carrier thread immediately so the carrier can run another virtual thread. This is the core mechanic that allows 100,000+ concurrent virtual threads on just a handful of platform threads.
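The mapping onto carriers is visible at runtime: a virtual thread's toString names the ForkJoinPool worker it is currently mounted on. A minimal sketch (the class name CarrierDemo is illustrative):

```java
public class CarrierDemo {
    public static void main(String[] args) throws InterruptedException {
        // A virtual thread's toString includes its current carrier, e.g.
        // VirtualThread[#29]/runnable@ForkJoinPool-1-worker-1
        Thread t = Thread.ofVirtual().start(() ->
                System.out.println(Thread.currentThread()));
        t.join();
        System.out.println("isVirtual = " + t.isVirtual());
    }
}
```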
Thread creation patterns look similar, but runtime characteristics differ significantly:
import java.util.ArrayList;
import java.util.List;

public class LoomThreadTest {
    public static void main(String[] args) throws InterruptedException {
        int n = 1_000_000;

        System.out.println("Running platform threads...");
        long start = System.currentTimeMillis();
        List<Thread> threads = new ArrayList<>();
        // Platform thread loop will likely OOM at this count; shown for contrast only.
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ignored) {}
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        long end = System.currentTimeMillis();
        System.out.println("Platform threads took: " + (end - start) + " ms");

        System.out.println("\nRunning virtual threads...");
        start = System.currentTimeMillis();
        threads.clear();
        for (int i = 0; i < n; i++) {
            Thread t = Thread.ofVirtual().unstarted(() -> {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ignored) {}
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        end = System.currentTimeMillis();
        System.out.println("Virtual threads took: " + (end - start) + " ms");
    }
}
The lifecycle behavior matters more than syntax:
The important bit is parking/unmounting: blocked virtual threads free carrier threads to run other work.
private static final int VIRTUAL_THREAD_COUNT = 200_000;

private static void testVirtualThreads() throws InterruptedException {
    System.out.println("Testing Virtual Threads...");
    long start = System.currentTimeMillis();
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < VIRTUAL_THREAD_COUNT; i++) {
        Thread t = Thread.ofVirtual().unstarted(() -> {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ignored) {}
        });
        t.start();
        threads.add(t);
    }
    for (Thread t : threads) {
        t.join();
    }
    long end = System.currentTimeMillis();
    System.out.println("Virtual threads completed in: " + (end - start) + " ms");
    System.out.println("Virtual threads created: " + threads.size());
}
How to read this test:
- Holding every Thread reference in a List adds heap overhead and skews memory results.
- Prefer executor.submit(...) without holding Thread references.
If you benchmark this pattern, treat it as an experiment. For realistic load testing, avoid keeping every Thread reference and validate end-to-end bottlenecks.
try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
    for (int i = 0; i < 200_000; i++) {
        executor.submit(() -> {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
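The carrier-releasing behavior can be observed directly. On Java 21, Thread.getAllStackTraces() returns platform threads only, so the count stays small even while thousands of virtual threads are parked in sleep(). A minimal sketch (the class name ParkingDemo is illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParkingDemo {
    public static void main(String[] args) throws InterruptedException {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                exec.submit(() -> {
                    try { Thread.sleep(500); } catch (InterruptedException ignored) {}
                });
            }
            Thread.sleep(100); // let tasks start and park

            // Thread.getAllStackTraces() lists platform threads only, so this
            // count stays small (carriers + JVM housekeeping threads) even
            // while 10,000 virtual threads are parked in sleep().
            System.out.println("Platform threads: " + Thread.getAllStackTraces().size());
        } // close() waits for the remaining tasks
    }
}
```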
This is a simplified comparison to build intuition:
| Thread Type | Typical Footprint Pattern | At Very High Counts | Reality Check |
|---|---|---|---|
| Platform Thread | MB-scale stack per thread | Becomes impractical quickly | OS/thread limits and memory pressure hit early |
| Virtual Thread | Small initial footprint, grows with stack | Usually far lower memory pressure | Still bounded by heap and downstream resources |
- VirtualThreadFlood and LoomThreadTest in this project focus on blocking/sleep-style workloads.
- For CPU-bound work, virtual threads usually do not outperform platform threads.
Key point: virtual threads do not automatically speed up compute-heavy code. They improve scalability and utilization for blocking I/O workloads.
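To see why, consider a CPU-bound task: it never blocks, so the virtual thread never parks and simply occupies its carrier until the computation finishes. A minimal sketch (the class name CpuBoundSketch and the task sizes are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundSketch {
    // A CPU-bound task: no blocking calls, so a virtual thread running it
    // never unmounts from its carrier.
    static long busySum(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        int tasks = cores * 4;
        try (ExecutorService vexec = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(vexec.submit(() -> busySum(10_000_000L)));
            }
            for (Future<Long> f : futures) f.get();
        }
        // Throughput here is bounded by the core count, not by the number
        // of virtual threads: there is no I/O wait to overlap.
        System.out.println("Ran " + tasks + " CPU-bound tasks on " + cores + " cores");
    }
}
```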
- Watch for pinning (synchronized hot paths, long native calls), because pinning can erase virtual-thread gains.
Best fit for:
- High-concurrency services dominated by blocking I/O.
- Request-per-thread servers where a fixed pool caps throughput long before the hardware does.
Use caution / alternatives for:
- CPU-bound workloads, where virtual threads do not outperform platform threads.
- Hot paths that block inside synchronized blocks or long native calls (pinning).
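On Java 21, a virtual thread that blocks while holding a monitor stays pinned to its carrier; java.util.concurrent locks do not have this limitation. A minimal sketch of the lock-based alternative (the class name PinningSketch is illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

public class PinningSketch {
    // On Java 21, sleeping inside a synchronized block would pin the
    // virtual thread to its carrier. With ReentrantLock, the virtual
    // thread can unmount while blocked, freeing the carrier.
    private static final ReentrantLock LOCK = new ReentrantLock();

    static void withLock() throws InterruptedException {
        LOCK.lock();
        try {
            Thread.sleep(100); // virtual thread can park here without pinning
        } finally {
            LOCK.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        Thread t = Thread.ofVirtual().start(() -> {
            try { withLock(); } catch (InterruptedException ignored) {}
        });
        t.join();
        System.out.println("done");
    }
}
```

To check for pinning in your own code on Java 21, run with -Djdk.tracePinnedThreads=full and look for stack traces logged when a virtual thread blocks while pinned.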
A practical migration sequence:
1. Replace Executors.newFixedThreadPool() with Executors.newVirtualThreadPerTaskExecutor() behind blocking endpoints.
2. Keep the blocking programming style; remove callback or reactive plumbing incrementally.
3. Load-test end to end and validate downstream bottlenecks.
Operational guidelines:
- Do not pool virtual threads; create one per task.
- Watch for pinning on synchronized hot paths and long native calls.
- Remember capacity is still bounded by heap and downstream resources.
In Part 2, we'll build a high-performance web service using virtual threads and walk through C10K-style behavior with practical load testing.
Part 1 complete. We covered the bottleneck and the execution model; next we build a real service with it.
Next up: Part 2: Building High-Performance Web Services with Virtual Threads →