Project Loom · Part 7 of 9

Production Readiness, Monitoring, and Debugging

Most thread-pool dashboards go quiet once a service moves to virtual threads, because a small carrier-thread count can sit in front of a very large number of virtual threads. Task-level counters and actual pinning evidence have to fill that gap.

Jagdish Salgotra
2025-08-17·14 min read·~1,500 words

Code repository: project-loom
Note: This series uses Java 21 as the baseline. Virtual threads are stable in Java 21 (JEP 444). Structured concurrency snippets in this part (StructuredTaskScope, JEP 453) use preview APIs and require --enable-preview.

Monitoring virtual threads starts with admitting what ordinary thread metrics cannot tell you. A JVM can run a large number of virtual-thread tasks while the platform-thread count still looks small. That is the point of virtual threads, but it also means the old question, "how many threads are running?", is not enough.

The better question is: what work did the application start, how much of it finished, how much failed, where is it waiting, and did the JVM report pinning or lock contention while it was waiting?

This article is learning material. The main branch now builds with OpenJDK 25.0.2 and uses the Java 25 preview structured-concurrency API; the Java 21 version is maintained separately on the feature/java-21 branch. The measurements below were generated from the current checked-in Java 25 code. Virtual threads are final in Java 21; the --enable-preview flag still appears in the commands because other examples in the same build use preview structured-concurrency APIs.

What I ran

The measured pass used OpenJDK 25.0.2 and Maven 3.9.12. The build succeeded and compiled 35 source files:

bash
mvn clean compile -DskipTests
mvn dependency:build-classpath -Dmdep.outputFile=cp.txt

Then I ran the monitoring and debugging entry points directly:

bash
java --enable-preview -cp "$(cat cp.txt):target/classes" app.js.observability.VirtualThreadObservability

java --enable-preview \
  -XX:+FlightRecorder \
  -XX:StartFlightRecording=duration=20s,filename=part7-monitoring.jfr \
  -cp "$(cat cp.txt):target/classes" \
  app.js.monitoring.JvmMonitoringService

java --enable-preview \
  -Djdk.tracePinnedThreads=full \
  -cp "$(cat cp.txt):target/classes" \
  app.js.pinning.VirtualThreadPinningDetector

java --enable-preview \
  -Djdk.tracePinnedThreads=full \
  -cp "$(cat cp.txt):target/classes" \
  app.js.pinning.PinningOptimizationExample

JVM metrics are useful, but they are not virtual-thread metrics

JvmMonitoringService is intentionally small. It starts an HttpServer, installs a virtual-thread-per-task executor, and exposes two endpoints:

java
HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());

server.createContext("/metrics", exchange -> {
    String metrics = generatePrometheusMetrics();
    sendResponse(exchange, metrics);
});

server.createContext("/jvm-info", exchange -> {
    String info = generateJvmInfo();
    sendResponse(exchange, info);
});

The /metrics endpoint reported:

text
# HELP jvm_memory_used_bytes Used memory in bytes
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap"} 26343600
# HELP jvm_memory_max_bytes Maximum memory in bytes
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{area="heap"} 12884901888
# HELP jvm_gc_collection_seconds Time spent in GC
# TYPE jvm_gc_collection_seconds counter
jvm_gc_collection_seconds{gc="G1 Young Generation"} 0.0
# HELP jvm_gc_collection_seconds Time spent in GC
# TYPE jvm_gc_collection_seconds counter
jvm_gc_collection_seconds{gc="G1 Concurrent GC"} 0.0
# HELP jvm_gc_collection_seconds Time spent in GC
# TYPE jvm_gc_collection_seconds counter
jvm_gc_collection_seconds{gc="G1 Old Generation"} 0.0

The /jvm-info endpoint reported:

text
JVM Information:
================
Java Version: 25.0.2
JVM Name: OpenJDK 64-Bit Server VM
JVM Version: 25.0.2+10-LTS
Uptime: 10 seconds
Available Processors: 14
Heap Usage: 25MB / 12288MB

This is useful baseline information. It tells you which JVM is running, how much heap is used, which collectors are present, and whether the process is alive. It does not tell you how many virtual-thread tasks were accepted, completed, failed, or parked.

That distinction matters because the server uses virtual threads for request handling, but the exported metrics are ordinary JVM heap and GC metrics. A virtual-thread service still needs normal JVM metrics. It also needs task-level metrics that are closer to the work the application actually started.
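To make "task-level metrics" concrete, here is a minimal sketch of counters that could sit next to the heap and GC gauges on a /metrics endpoint. The class name TaskCounters and the app_tasks_* metric names are assumptions for illustration; they are not taken from the repository.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical task counters; the metric names are illustrative, not from the repository. */
final class TaskCounters {
    private final AtomicLong started = new AtomicLong();
    private final AtomicLong completed = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    /** Runs the task and records whether it returned or threw an Exception. */
    <T> T track(Callable<T> task) throws Exception {
        started.incrementAndGet();
        try {
            T result = task.call();
            completed.incrementAndGet();
            return result;
        } catch (Exception e) {
            failed.incrementAndGet();
            throw e;
        }
    }

    /** Tasks that have started but not yet returned or thrown. */
    long inFlight() {
        return started.get() - completed.get() - failed.get();
    }

    /** Renders the counters in the Prometheus text exposition format. */
    String render() {
        return "# TYPE app_tasks_started_total counter\n"
            + "app_tasks_started_total " + started.get() + "\n"
            + "# TYPE app_tasks_completed_total counter\n"
            + "app_tasks_completed_total " + completed.get() + "\n"
            + "# TYPE app_tasks_failed_total counter\n"
            + "app_tasks_failed_total " + failed.get() + "\n"
            + "# TYPE app_tasks_in_flight gauge\n"
            + "app_tasks_in_flight " + inFlight() + "\n";
    }

    public static void main(String[] args) throws Exception {
        TaskCounters counters = new TaskCounters();
        counters.track(() -> "ok");
        try {
            counters.track(() -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            // The failure is counted, then rethrown to the caller.
        }
        System.out.print(counters.render());
    }
}
```

The in-flight gauge is the piece that ordinary JVM metrics cannot supply: it is the closest cheap approximation of "how much work is currently waiting somewhere" when every request runs on its own virtual thread.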

One more detail matters here. The method named startJFRProfiling() in this service does not start a JFR recording. It logs the JVM arguments a reader should use:

java
private static void startJFRProfiling() {
    logger.info("JFR Profiling enabled. Use JVM args: -XX:+FlightRecorder -XX:StartFlightRecording=duration=60s,filename=app.jfr");
}

The recording in this article came from the JVM command line, not from that method. The generated JFR file was real, and jfr summary showed a 20-second recording with ordinary runtime events:

text
Version: 2.1
Chunks: 1
Duration: 20 s

jdk.CPULoad                 18
jdk.ThreadStart             10
jdk.ThreadDump               2
jdk.JavaMonitorWait          2
jdk.VirtualThreadStart       0
jdk.VirtualThreadPinned      0

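That summary proves the recording ran. It does not prove anything about pinning in this service. For this short HTTP run, the summary showed zero virtual-thread start events and zero pinned events. A zero for jdk.VirtualThreadStart is expected with default settings, because JFR leaves that event disabled unless the recording enables it.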
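For contrast with a method that only logs JVM arguments, the sketch below starts a recording from code through the jdk.jfr API and turns on the virtual-thread events explicitly. The class name JfrStarter and the recording name are hypothetical; this is one possible shape, not the repository's implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

final class JfrStarter {
    /** Starts a time-boxed recording that JFR writes to destination when the duration elapses. */
    static Recording start(Path destination, Duration duration) {
        // A Recording created this way has no settings, so only the events enabled below are captured.
        Recording recording = new Recording();
        try {
            recording.setName("part7-sketch");                 // hypothetical name
            recording.enable("jdk.VirtualThreadStart");        // disabled by default
            recording.enable("jdk.VirtualThreadEnd");          // disabled by default
            recording.enable("jdk.VirtualThreadPinned").withoutThreshold();
            recording.setDuration(duration);
            recording.setDestination(destination);
            recording.start();
            return recording;
        } catch (IOException e) {
            recording.close();
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("part7-sketch", ".jfr");
        // Closing early keeps this demo short; a service would let the duration elapse.
        try (Recording recording = start(file, Duration.ofSeconds(5))) {
            Thread.startVirtualThread(() -> { }).join();
            System.out.println("state: " + recording.getState());
        }
    }
}
```

Enabling the start and end events on a busy service can produce a large number of events, so a short duration or a targeted test run is the safer place to try this.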

Task-level monitoring tells a different story

VirtualThreadObservability takes the opposite approach. It does not expose an HTTP endpoint. It wraps application tasks and counts what happened inside the process:

java
private final AtomicLong taskCount = new AtomicLong(0);
private final AtomicLong successCount = new AtomicLong(0);
private final AtomicLong errorCount = new AtomicLong(0);
private final AtomicLong totalExecutionTime = new AtomicLong(0);

private final Map<String, TaskMetrics> taskMetrics = new ConcurrentHashMap<>();
private final Map<String, Long> errorCounts = new ConcurrentHashMap<>();
private final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();

The important method is trackTask(...). It increments the task count, records elapsed time, updates success or failure counters, and records the exception class when a task fails:

java
public <T> T trackTask(String taskName, Callable<T> task) throws Exception {
    if (!monitoring) return task.call();

    taskCount.incrementAndGet();
    long startTime = System.nanoTime();
    long startMemory = runtime.totalMemory() - runtime.freeMemory();

    try {
        T result = task.call();
        long executionTime = System.nanoTime() - startTime;
        long memoryUsed = (runtime.totalMemory() - runtime.freeMemory()) - startMemory;

        successCount.incrementAndGet();
        totalExecutionTime.addAndGet(executionTime);
        taskMetrics.compute(taskName, (key, metrics) -> {
            if (metrics == null) {
                metrics = new TaskMetrics();
            }
            metrics.addExecution(executionTime, memoryUsed, true);
            return metrics;
        });

        return result;
    } catch (Exception e) {
        long executionTime = System.nanoTime() - startTime;
        errorCount.incrementAndGet();
        totalExecutionTime.addAndGet(executionTime);
        errorCounts.merge(e.getClass().getSimpleName(), 1L, Long::sum);

        taskMetrics.compute(taskName, (key, metrics) -> {
            if (metrics == null) {
                metrics = new TaskMetrics();
            }
            metrics.addExecution(executionTime, 0, false);
            return metrics;
        });

        throw e;
    }
}

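The shape is simple: count the task, run it, record the outcome. One caveat: memoryUsed is a delta of whole-heap usage sampled around the call, so with concurrent tasks or a GC in between it describes the process, not the individual task.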

The run started with the monitor reporting ordinary JVM thread counts:

text
[10:05:50] Threads: 7, Memory: 6.89MB, Tasks: 0 (Success: 0, Errors: 0)

After the demo tasks ran, the monitor reported:

text
[10:05:51] Threads: 23, Memory: 24.83MB, Tasks: 105 (Success: 99, Errors: 6)
[10:05:52] Threads: 23, Memory: 24.83MB, Tasks: 105 (Success: 99, Errors: 6)

The final summary was:

text
Monitoring Duration: 2s
Total Tasks: 105
Successful Tasks: 99 (94.3%)
Failed Tasks: 6 (5.7%)
Average Execution Time: 141.63ms
Tasks per Second: 39.61
RuntimeException: 6 occurrences

Current Memory Usage:
  Total: 776.00MB, Used: 26.10MB, Free: 749.90MB
  Memory Usage: 3.4%

Thread Information:
  Current Thread Count: 23
  Peak Thread Count: 23
  Total Started Threads: 23

This is the main lesson from the observability demo: the task count and the JVM thread count are different signals. The demo tracked 105 application tasks, while ThreadMXBean, which reports platform threads only, showed 23 current threads and 23 peak threads. If you only graph platform-thread counts, you miss the application-level task volume.
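The gap between the two signals is easy to reproduce in isolation. The sketch below parks a batch of virtual threads on a latch and samples ThreadMXBean.getThreadCount() before and while they are parked. The class name and the task count are assumptions chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class ThreadCountVsTasks {
    /** Returns {threadCountBefore, threadCountWhileParked} for the given number of virtual threads. */
    static int[] measure(int tasks) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        int before = bean.getThreadCount();

        CountDownLatch allStarted = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            threads[i] = Thread.startVirtualThread(() -> {
                allStarted.countDown();
                try {
                    release.await();   // parks the virtual thread and frees its carrier
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            allStarted.await();
            int during = bean.getThreadCount();   // sampled while every task is alive
            release.countDown();
            for (Thread t : threads) {
                t.join();
            }
            return new int[] {before, during};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int tasks = 2_000;
        int[] counts = measure(tasks);
        System.out.printf("tasks=%d, threads before=%d, threads while parked=%d%n",
                tasks, counts[0], counts[1]);
    }
}
```

The thread count typically grows only by the carrier threads the scheduler needed, which is why a dashboard built on that number stays nearly flat while task volume changes by orders of magnitude.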

The error count is intentionally variable. The error-tracking section uses ThreadLocalRandom.current().nextBoolean(), so another run may report a different number of RuntimeException failures. That does not make the monitor useless. It means six failures should not be read as a stable benchmark result. The stable result is that the monitor records task outcomes and exception classes.

There is also a small formatting issue in the current demo output. The per-task breakdown prints many entries on one long line because the stream block uses System.out.printf(...) without a newline. That is why the summary counters are the evidence here.

Pinning needs evidence, not labels

VirtualThreadPinningDetector is written as a pinning demo. Its log labels say synchronized blocks cause pinning and ReentrantLock does not:

java
logger.info(" Test 1: Synchronized blocks (CAUSES PINNING)");
testSynchronizedPinning();

logger.info("\n Test 2: ReentrantLock (NO PINNING)");
testReentrantLockNoPinning();

That was the expectation encoded in the demo. The current Java 25 run is more interesting than that label.

I ran it with:

bash
java --enable-preview \
  -Djdk.tracePinnedThreads=full \
  -cp "$(cat cp.txt):target/classes" \
  app.js.pinning.VirtualThreadPinningDetector

The direct timing section reported:

text
Heavy synchronized load completed in 1214ms
Optimized ReentrantLock completed in 1218ms

The metrics section reported:

text
Synchronized operations: 25 (avg: 1349.30ms)
ReentrantLock operations: 25 (avg: 1348.31ms)
  Synchronized is 1.00x slower (likely due to pinning)

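The phrase "likely due to pinning" comes from the checked-in demo, not from the measurement. The measured result does not support a meaningful performance difference in this run. The two averages were almost identical, and the console output did not include pinned-thread stack traces even though the process was started with -Djdk.tracePinnedThreads=full. Both observations are consistent with JEP 491, delivered in JDK 24, which lets a virtual thread block inside synchronized without pinning its carrier and retires the jdk.tracePinnedThreads property, so the flag is not expected to print anything on Java 25.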

That is not a failed result. It is the lesson. A method name, a log label, or a comment can tell you what the author expected. The run tells you what the current JDK and current code actually did.

One small debugging detail also shows up in this output. The task lines print Thread.currentThread().getName(), but these virtual threads were started without explicit names, so the name field is blank:

text
Synchronized task 1 on thread
ReentrantLock task 0 on thread

If a service needs request-level debugging, name virtual threads through a thread factory or log request IDs, task names, and scope IDs directly. A blank virtual-thread name is technically correct, but it is not helpful during investigation.
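A minimal sketch of the thread-factory option, assuming a hypothetical "request-" prefix. Thread.ofVirtual().name(prefix, start) appends an incrementing counter to the prefix, and the resulting factory can back a thread-per-task executor.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

final class NamedVirtualThreads {
    /** Runs the given number of tasks on named virtual threads and returns the names they saw. */
    static List<String> runAndCollectNames(int tasks) {
        // "request-" is an illustrative prefix; the counter starts at 0.
        ThreadFactory factory = Thread.ofVirtual().name("request-", 0).factory();
        List<String> names = new CopyOnWriteArrayList<>();
        try (ExecutorService executor = Executors.newThreadPerTaskExecutor(factory)) {
            for (int i = 0; i < tasks; i++) {
                executor.execute(() -> names.add(Thread.currentThread().getName()));
            }
        } // close() waits for the submitted tasks to finish
        return names;
    }

    /** The name a virtual thread has when none was set: the empty string. */
    static String defaultName() {
        return Thread.ofVirtual().unstarted(() -> { }).getName();
    }

    public static void main(String[] args) {
        System.out.println("default name: '" + defaultName() + "'");
        System.out.println("named: " + runAndCollectNames(3));
    }
}
```

A counter-based name identifies the thread but not the request, so logging a request ID alongside it is still worth doing when the two need to be correlated.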

Lock choices still need measurement

PinningOptimizationExample compares a few lock and data-structure choices. The first three sections produced output:

text
Synchronized: 1309ms (pinning)
ReentrantLock: 1297ms (no pinning)
ReentrantLock is 1.01x faster

Synchronized cache: 1275ms (pinning)
ConcurrentHashMap: 5ms (no pinning)
ConcurrentHashMap is 255.00x faster

ReentrantLock: 2249ms
StampedLock: 1211ms
StampedLock is 1.86x faster for read-heavy workloads

Only the second and third comparisons have a strong signal in this run. The basic synchronized versus ReentrantLock comparison was effectively a tie. The cache comparison was dramatically different, but not because ConcurrentHashMap has magic virtual-thread properties. The synchronized version serializes the map operation and the sleep behind one shared lock. The ConcurrentHashMap version removes that global synchronized section. The measured difference is a lock-shape and serialization result.
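The lock-shape effect can be shown without the repository code. In the sketch below, one variant holds a single global lock across a simulated slow step, and the other performs the slow step outside any lock and relies on ConcurrentHashMap for the write. The class name, task count, and sleep duration are assumptions, not the demo's values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntConsumer;

final class LockShapeDemo {
    private static final Object LOCK = new Object();

    /** Every task holds one global lock across the slow step, so the slow steps run one at a time. */
    static long serializedMillis(int tasks, long workMillis) {
        Map<Integer, String> cache = new HashMap<>();
        return time(tasks, id -> {
            synchronized (LOCK) {
                pause(workMillis);
                cache.put(id, "v" + id);
            }
        });
    }

    /** The slow step runs outside any global lock; only the map write is coordinated. */
    static long concurrentMillis(int tasks, long workMillis) {
        Map<Integer, String> cache = new ConcurrentHashMap<>();
        return time(tasks, id -> {
            pause(workMillis);
            cache.put(id, "v" + id);
        });
    }

    /** Runs one virtual thread per task and returns the elapsed wall-clock milliseconds. */
    private static long time(int tasks, IntConsumer body) {
        long start = System.nanoTime();
        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            threads[i] = Thread.startVirtualThread(() -> body.accept(id));
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("global lock:       " + serializedMillis(10, 20) + "ms");
        System.out.println("ConcurrentHashMap: " + concurrentMillis(10, 20) + "ms");
    }
}
```

The first variant cannot finish faster than roughly tasks × workMillis no matter which lock type guards the section, which is the serialization cost the cache comparison measured.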

The StampedLock result is also workload-specific. The test creates 800 reader tasks and 200 writer tasks. With that read-heavy mix, the measured run reported 1211ms for StampedLock compared with 2249ms for ReentrantLock.
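One reason StampedLock can do well on read-heavy mixes is its optimistic read mode, which lets a reader proceed without blocking writers and fall back to a real read lock only if a write intervened. Whether the repository's test uses optimistic reads is an assumption here; the sketch follows the pattern from the StampedLock documentation with a hypothetical OptimisticPoint class.

```java
import java.util.concurrent.locks.StampedLock;

final class OptimisticPoint {
    private final StampedLock lock = new StampedLock();
    private double x;
    private double y;

    /** Writers take the exclusive lock. */
    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    /** Readers try an optimistic read first and fall back to a read lock if a write intervened. */
    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();
        double currentX = x;
        double currentY = y;
        if (!lock.validate(stamp)) {
            // A writer acquired the lock since the stamp was issued: reread under a real read lock.
            stamp = lock.readLock();
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(currentX, currentY);
    }

    public static void main(String[] args) {
        OptimisticPoint point = new OptimisticPoint();
        point.move(3, 4);
        System.out.println(point.distanceFromOrigin());
    }
}
```

With a write-heavy mix, validation fails more often and readers pay for both the optimistic attempt and the fallback lock, so the advantage seen in a read-heavy run should not be assumed to carry over.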

The fourth section did not complete:

text
Test 4: Producer-Consumer Pattern
Producer-Consumer pattern optimization

The reason is in the code. The demo starts 50 producers and 150 consumers:

java
for (int i = 0; i < 50; i++) {
    Thread.startVirtualThread(optimizer::produce);
}

for (int i = 0; i < 150; i++) {
    Thread.startVirtualThread(optimizer::consume);
}

Each consumer calls queue.take(). With only 50 produced items and 150 consumers, 100 consumers can wait forever. I stopped that run instead of inventing a completion time.

This is exactly why monitoring examples should include liveness checks. Virtual threads make waiting cheap, but a cheap wait can still be the wrong wait. A service that starts more consumers than it can satisfy needs a termination policy, a timeout, a poison pill, a bounded test shape, or a different assertion.
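As one example of the timeout option, the sketch below swaps the blocking take() for poll(timeout, unit) so surplus consumers give up instead of waiting forever. The class name, the one-item-per-producer shape, and the timeout value are assumptions for illustration, not the repository's producer and consumer methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedConsumers {
    /** Returns {consumed, timedOut}. Each producer adds one item; each consumer polls once. */
    static int[] run(int producers, int consumers, long timeoutMillis) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        AtomicInteger consumed = new AtomicInteger();
        AtomicInteger timedOut = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < producers; i++) {
            final int item = i;
            threads.add(Thread.startVirtualThread(() -> queue.add(item)));
        }
        for (int i = 0; i < consumers; i++) {
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    // poll returns null on timeout, where take() would block indefinitely
                    Integer item = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                    if (item != null) {
                        consumed.incrementAndGet();
                    } else {
                        timedOut.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        try {
            for (Thread t : threads) {
                t.join();   // now guaranteed to return, because no consumer waits forever
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {consumed.get(), timedOut.get()};
    }

    public static void main(String[] args) {
        int[] result = run(50, 150, 500);
        System.out.println("consumed=" + result[0] + ", timed out=" + result[1]);
    }
}
```

Counting the timeouts turns the hang into a reportable number, which is the kind of liveness signal a monitoring example can assert on.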

How to test monitoring code

Monitoring code should be tested against signals that can disagree with each other. In this repository, the JVM endpoint exposes heap, GC, uptime, processor count, and JVM version, while the observability demo exposes application task outcomes. A useful test run records both, then asks whether the thread count, task count, error count, and latency story all point to the same explanation.

Pinning checks need the same discipline. On Java 21, run with -Djdk.tracePinnedThreads=full; on JDK 24 and later, where JEP 491 retired that property, look for jdk.VirtualThreadPinned events in a JFR recording instead. Either way, do not claim pinning unless the console or JFR output actually shows it. If the checked-in labels say a path should pin and the current JDK does not report a pinned thread, write that down. That is better evidence than forcing the article to match an old expectation.

Lock-choice examples need workload context. A ConcurrentHashMap result is not interchangeable with a ReentrantLock result, and a read-heavy StampedLock test does not say much about a write-heavy service. The useful benchmark is the one where the code path, the contention shape, and the measured result are all visible.

Liveness belongs in the same testing conversation. The producer-consumer example did not finish because the number of consumers exceeded the number of produced items. A monitoring guide that ignores that kind of hang is incomplete. The test should either make completion possible or make the expected blocking state explicit.

What this does not prove

This article does not prove that the sample monitor is a complete observability system. It is a small teaching utility that shows why task-level counters matter.

It does not prove that synchronized blocks are always harmless under Java 25. It proves that this checked-in demo, on this JDK, did not emit pinned-thread stack traces and did not show a meaningful synchronized-versus-ReentrantLock timing difference.

It also does not prove that the JFR recording captured every virtual-thread event a real service would need. The service run produced a valid JFR file, but the summary showed zero jdk.VirtualThreadStart and zero jdk.VirtualThreadPinned events. For deeper virtual-thread analysis, configure the recording for the events you need and verify the event counts before drawing conclusions.
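Verifying event counts does not require the jfr command-line tool. The sketch below, with hypothetical class and method names, enables the virtual-thread start and end events, runs a few virtual threads, dumps the recording to a temporary file, and tallies events by type with RecordingFile.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

final class JfrEventCounts {
    /** Tallies the events in a .jfr file by event type name. */
    static Map<String, Long> countByEventName(Path jfrFile) {
        Map<String, Long> counts = new TreeMap<>();
        try {
            for (RecordedEvent event : RecordingFile.readAllEvents(jfrFile)) {
                counts.merge(event.getEventType().getName(), 1L, Long::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    /** Records a burst of virtual threads with start/end events enabled, then counts what was captured. */
    static Map<String, Long> recordVirtualThreads(int threads) {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.VirtualThreadStart");   // disabled by default
            recording.enable("jdk.VirtualThreadEnd");     // disabled by default
            recording.start();

            Thread[] started = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                started[i] = Thread.startVirtualThread(() -> { });
            }
            for (Thread t : started) {
                t.join();
            }

            recording.stop();
            Path file = Files.createTempFile("vt-sketch", ".jfr");
            recording.dump(file);
            return countByEventName(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        recordVirtualThreads(20).forEach((name, count) -> System.out.println(name + " " + count));
    }
}
```

Running countByEventName against a file produced from the command line gives the same kind of check: if an event type is missing from the tally, the recording was not configured to capture it, and its absence says nothing about the service.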

What comes next

Part 8 looks at where the Java concurrency model is heading next: scoped values, context propagation, integration patterns, and the parts of Project Loom that are still evolving.


Resources

  • Complete Code: VirtualThreadObservability.java - Comprehensive monitoring utilities
  • Pinning Detection: VirtualThreadPinningDetector.java - Production pinning detection
  • JFR Monitoring: JvmMonitoringService.java - JFR integration examples
  • Production Examples: Run the monitoring service locally and test with realistic load
  • Official Documentation: JEP 444: Virtual Threads

If you deploy virtual-thread monitoring, compare old and new dashboards during induced failure tests.