Structured Concurrency · Part 8 of 9

Four operational checks we run on every StructuredTaskScope

Before a fan-out service can be trusted under load, four things need to be true: outcomes counted per scope, deadlines propagated, bulkheads in place, and pinning watched in JFR. Here is what each one looks like in code.

Jagdish Salgotra

2026-05-10·10 min read·~1,800 words


Structured concurrency makes concurrent work easier to read, but it does not make the work observable by default.

That was the main lesson from running the companion services for this part. The code can return the right HTTP status and still leave you with weak operational evidence. A timeout counter can move while the request denominator does not. A memory endpoint can report cumulative allocation pressure that looks like retained heap if you read it too quickly. A thread endpoint can tell you whether CPU work is actually staying on a bounded platform-thread pool.

These are local learning runs, not production incidents. Structured concurrency is still in preview, and the useful question is narrow: when we run the examples locally, what can we actually observe, and what would we need to add before carrying the same pattern into a real service? The main branch now builds with OpenJDK 25.0.2 and uses the Java 25 preview structured-concurrency API, with the Java 21 version separately managed in the feature/java-21 branch. The measurements below were generated from the current Java 25 code. The Java 21 preview syntax shown here remains valid for learning purposes, and Part 9 covers the migration details.

The measurements first

For the Article 8 pass, the local toolchain was OpenJDK 25.0.2 and Maven 3.9.12:

bash

mvn clean compile -DskipTests

The detailed reproduction steps for this article are in testing-and-benchmarking.md.

The build succeeded and compiled 35 source files.

Then I started the advanced structured service on port 8082, the JVM monitoring service on port 8083, the memory service on port 8084, and the thread service on port 8086.

The timeout endpoint produced the most important result:

text

GET /timeout/short
HTTP 500
Request timed out after 508ms

The same endpoint failed three more times in sequence:

text

short-timeout-01 Request timed out after 503ms status=500
short-timeout-02 Request timed out after 501ms status=500
short-timeout-03 Request timed out after 504ms status=500

Then a focused load check showed the same shape under concurrency:

bash

wrk -t2 -c20 -d10s http://localhost:8082/timeout/short

text

Latency average: 510.62ms
Requests: 380
Requests/sec: 37.94
Non-2xx or 3xx responses: 380

After that load check, the service metrics said:

text

Total Requests: 2
Timeout Count: 404
Average Response Time: 102541.00ms

That is the kind of number this article is about. The timeout counter is real. The denominator is not measuring the same population. In AdvancedStructuredConcurrencyMicroservice.handleRequest(), successful requests increment totalRequests, but timeout requests increment timeoutCount and still add duration to totalResponseTime. After many timeouts, the average response time becomes mathematically misleading because timeout durations are divided by the success count.
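The arithmetic can be checked from the three reported numbers alone. The sketch below assumes the handler computes the average as totalResponseTime divided by totalRequests, which is what the description above implies; the class and method names are illustrative and do not come from the companion repository.

```java
// Illustrative arithmetic only; names are hypothetical, inputs are the reported metrics.
class AverageDenominatorCheck {

    /** Backs out the accumulated duration from a reported average and its denominator. */
    static double totalDurationMs(double reportedAverageMs, long denominator) {
        return reportedAverageMs * denominator;
    }

    /** Recomputes the average over every request that contributed duration. */
    static double populationAverageMs(double totalDurationMs, long successes, long timeouts) {
        return totalDurationMs / (successes + timeouts);
    }

    public static void main(String[] args) {
        double total = totalDurationMs(102541.00, 2);       // 205082 ms accumulated
        double fixed = populationAverageMs(total, 2, 404);  // about 505.13 ms
        System.out.printf("total=%.0fms, average over 406 requests=%.2fms%n", total, fixed);
    }
}
```

An average of roughly 505ms lines up with the 501ms to 510ms failures measured above, which suggests the recorded durations are sound and only the denominator is off.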

This is not a reason to distrust the example. It is the lesson. Operational counters are part of the concurrency contract.

The successful deadline endpoint gave a cleaner baseline:

text

GET /deadline/strict
Deadline Results: task-1-within-deadline, task-2-within-deadline, task-3-within-deadline
Duration: 406ms

The load check for that endpoint completed without HTTP errors:

text

Latency average: 411.23ms
Requests: 480
Requests/sec: 47.69

The JVM, memory, and thread services gave separate evidence. JvmMonitoringService.java exposes the JVM metrics, MemoryOptimizedMicroservice.java exposes memory counters, and ThreadOptimizedMicroservice.java exposes the thread counters.

| Check | Fresh result |
| --- | --- |
| JVM info | Java 25.0.2, 14 processors, heap 9MB / 12288MB |
| JVM metrics | heap used `9566384` bytes, heap max `12884901888` bytes, GC collection counters at `0.0` seconds |
| Single file I/O request | 50,000 lines read in 21ms, reported memory delta `+6.08MB` |
| File I/O load | 22,506 requests in 10.05s, 2,239.80 requests/sec, average latency 10.45ms |
| Memory stats after file I/O load | `FILE_IO: 24051.51MB total`, active requests 0, total requests 22,524, average response time 6ms |
| Memory stats after explicit GC | heap dropped to 2.79MB while cumulative `FILE_IO` memory usage remained 24051.51MB |
| I/O thread load | active virtual threads 20, active platform threads 0 during `/io-optimized` load |
| CPU thread load | active virtual threads 0, active platform threads 14 during `/compute-optimized` load |

The memory result is worth reading carefully. FILE_IO: 24051.51MB total is not retained heap. After calling /gc, heap dropped to 2.79MB while the endpoint total remained unchanged. That counter is cumulative per-request memory delta. It is useful as allocation-pressure evidence, not as proof of a memory leak.

Check the timeout boundary

In Java 21 preview syntax, the teaching shape for a hard deadline uses joinUntil:

java

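import java.time.Instant;
import java.util.concurrent.StructuredTaskScope;

// Java 21 preview API: compile and run with --enable-preview.
public String shortTimeoutExample() throws Exception {
    // Absolute deadline for the whole scope, fixed before any work is forked.
    Instant deadline = Instant.now().plusMillis(350);

    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        var slowTask = scope.fork(() -> simulateSlowService("slow-service", 500));
        var fastTask = scope.fork(() -> simulateSlowService("fast-service", 100));

        // The wait itself is bounded: joinUntil throws TimeoutException at the deadline,
        // and closing the scope then interrupts the unfinished subtasks.
        scope.joinUntil(deadline);
        scope.throwIfFailed();

        return String.format("Timeout Results: %s, %s",
                slowTask.get(), fastTask.get());
    }
}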

That snippet teaches the intended ownership rule: the parent has a deadline, and work that misses the deadline should not keep running as if the caller were still waiting.

The migrated Java 25 companion code is different. In ConcurrentServiceLayer.shortTimeoutExample(), the code sets a 300ms deadline, forks a 500ms slow task and a 100ms fast task, then calls scope.join() before checking the deadline.

That is why the measured timeout was about 500ms, not 300ms:

text

Request timed out after 508ms

The code detects that the deadline was exceeded, but it detects it after the slow branch has already finished. For this article, that distinction is more useful than a polished happy-path example. A timeout check after join() is not the same thing as a timeout at the wait boundary.

The operational check is simple: compare the configured budget with the measured failure time. If a 300ms budget consistently fails after 500ms, the deadline is being observed too late.
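The difference between the two wait shapes can be reproduced without the preview API. The sketch below is a stand-in under stated assumptions: it uses a plain ExecutorService and Future rather than StructuredTaskScope so it runs without --enable-preview, and DeadlineBoundaryDemo and its methods are hypothetical names, not code from the companion repository.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: a deadline checked after the wait versus one enforced at the wait.
class DeadlineBoundaryDemo {

    /** Waits for the task to finish; the budget could only be compared afterwards. */
    static long checkAfterJoinMs(long taskMs, long budgetMs) {
        return run(taskMs, budgetMs, false);
    }

    /** Applies the budget to the wait itself and cancels the task when it expires. */
    static long checkAtBoundaryMs(long taskMs, long budgetMs) {
        return run(taskMs, budgetMs, true);
    }

    /** Returns the elapsed milliseconds until the caller regains control. */
    private static long run(long taskMs, long budgetMs, boolean boundAtWait) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        try {
            Future<String> slow = pool.submit(() -> {
                Thread.sleep(taskMs);      // stands in for a slow downstream call
                return "slow-service-ok";
            });
            if (boundAtWait) {
                try {
                    slow.get(budgetMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    slow.cancel(true);     // interrupt the child instead of waiting it out
                }
            } else {
                slow.get();                // unbounded wait; budgetMs is not consulted here
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("check after join: returned after " + checkAfterJoinMs(500, 300) + "ms");
        System.out.println("check at boundary: returned after " + checkAtBoundaryMs(500, 300) + "ms");
    }
}
```

With a 500ms task and a 300ms budget, the first shape should return near the task duration and the second near the budget, which is the same comparison the operational check asks for.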

Count outcomes from the same population

The advanced service exposes a compact metrics endpoint:

text

Active Requests: 0
Total Requests: 2
Timeout Count: 404
Average Response Time: 102541.00ms
Thread Type: Virtual Threads + Structured Concurrency

That output is not wrong by accident. It follows directly from the checked-in handler. Successful requests increment totalRequests. Timeout requests increment timeoutCount. Both paths add to totalResponseTime.

Once you see it, the fix is obvious: either include timeouts in the total denominator or keep separate duration totals for success, timeout, and failure. What matters for the article is the review habit. Outcome counters need to describe the same population, otherwise the dashboard can become less trustworthy exactly when the service is degraded.
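A minimal sketch of the second option, keeping a count and a duration total per outcome, might look like the following. OutcomeMetrics and its Outcome enum are hypothetical names introduced for illustration; they are not the counters in the checked-in handler.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical replacement for the handler's counters; not the companion repository's code.
class OutcomeMetrics {

    enum Outcome { SUCCESS, TIMEOUT, FAILURE, CANCELLED }

    // Both maps are fully populated in the constructor and only read afterwards,
    // so concurrent callers touch nothing but the thread-safe LongAdder values.
    private final Map<Outcome, LongAdder> counts = new EnumMap<>(Outcome.class);
    private final Map<Outcome, LongAdder> durationsMs = new EnumMap<>(Outcome.class);

    OutcomeMetrics() {
        for (Outcome o : Outcome.values()) {
            counts.put(o, new LongAdder());
            durationsMs.put(o, new LongAdder());
        }
    }

    /** Every finished request is recorded exactly once, whatever its outcome. */
    void record(Outcome outcome, long durationMs) {
        counts.get(outcome).increment();
        durationsMs.get(outcome).add(durationMs);
    }

    long count(Outcome outcome) {
        return counts.get(outcome).sum();
    }

    /** Denominator for any whole-service rate: the sum of all outcomes. */
    long totalRequests() {
        long total = 0;
        for (LongAdder c : counts.values()) {
            total += c.sum();
        }
        return total;
    }

    /** Average duration for one outcome, divided by that outcome's own count. */
    double averageMs(Outcome outcome) {
        long n = count(outcome);
        return n == 0 ? 0.0 : (double) durationsMs.get(outcome).sum() / n;
    }
}
```

Because the total is derived from the per-outcome counts rather than tracked separately, the numerator and denominator cannot drift apart during a timeout-heavy run.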

The checked-in metrics do not include a cancellation count. That matters too. If the article wants to talk about zombie subtasks, the code needs a way to prove whether slow child work was actually interrupted or merely waited out. The current companion service can show timeout responses. It cannot show scope-level cancellation.

That is the gap to look for in your own code: success, timeout, failure, and cancellation should be separate outcomes, but they should still roll up from the same request population.
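One way to gather cancellation evidence is to let the child record that it saw the interrupt, rather than inferring it from the parent returning. The sketch below assumes a plain ExecutorService in place of a preview scope so it runs on stable APIs; CancellationEvidence and its counter are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical probe: proves whether an interrupt reached the child task.
class CancellationEvidence {

    /** Incremented by the child itself, so it counts interrupts that actually landed. */
    static final LongAdder cancelledChildren = new LongAdder();

    /** Returns true only if the child observed the interrupt within graceMs. */
    static boolean interruptLanded(long taskMs, long budgetMs, long graceMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch childSawInterrupt = new CountDownLatch(1);
        try {
            Future<?> child = pool.submit(() -> {
                try {
                    Thread.sleep(taskMs);            // stands in for slow child work
                } catch (InterruptedException e) {
                    cancelledChildren.increment();   // evidence recorded by the child
                    childSawInterrupt.countDown();
                }
            });
            try {
                child.get(budgetMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                child.cancel(true);                  // parent gives up and interrupts
            }
            return childSawInterrupt.await(graceMs, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A call such as interruptLanded(500, 100, 1000) is expected to return true and move the counter, while a child that finishes inside its budget leaves the counter untouched.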

Measure the downstream boundary

Structured concurrency groups related work. It does not create capacity in the downstream system.

The checked-in bulkhead endpoint is a useful example because it separates critical and non-critical work into different scopes:

text

GET /pattern/bulkhead
Bulkhead Pattern: Critical[critical-auth-ok, critical-payment-ok] Non-Critical[analytics-ok, logging-ok]
Duration: 207ms

The response still waits for both groups before returning. The critical work takes 100ms and 150ms. The non-critical work includes a 200ms analytics branch. The measured response tracks the slow non-critical side.

That is not a bug. It is the current policy. The code demonstrates grouping, not degraded return after optional work misses a budget.

The operational check is to name the downstream boundary in the metric. If analytics is optional, there should be a visible count for "analytics skipped" or "analytics degraded." If analytics is required, the response should honestly wait and the dashboard should say the non-critical side is part of the response contract.
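If analytics were treated as optional, a degraded return with a visible counter could be sketched as follows. This is not what the checked-in bulkhead endpoint does; OptionalWorkBoundary, its analyticsSkipped counter, and the budget parameter are assumptions made for illustration, and the sketch uses a plain ExecutorService rather than preview scopes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical handler: critical work is always awaited, optional work gets a budget.
class OptionalWorkBoundary {

    /** Names the boundary in the metric: how often analytics missed its budget. */
    final LongAdder analyticsSkipped = new LongAdder();

    String handle(long authMs, long analyticsMs, long analyticsBudgetMs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        long start = System.nanoTime();
        try {
            Future<String> auth = pool.submit(() -> call("critical-auth-ok", authMs));
            Future<String> analytics = pool.submit(() -> call("analytics-ok", analyticsMs));

            String critical = auth.get();   // required: wait as long as it takes

            // The analytics budget is measured from the start of the request.
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long remainingMs = Math.max(0L, analyticsBudgetMs - elapsedMs);
            String optional;
            try {
                optional = analytics.get(remainingMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                analytics.cancel(true);     // stop the optional branch
                analyticsSkipped.increment();
                optional = "analytics-skipped";
            }
            return "Critical[" + critical + "] Non-Critical[" + optional + "]";
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    private static String call(String result, long delayMs) throws InterruptedException {
        Thread.sleep(delayMs);              // stands in for a downstream call
        return result;
    }
}
```

The design choice here is that the skip is an outcome in its own right: the response says analytics was skipped and the counter moves, so the degraded path is visible instead of silent.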

Virtual threads make it cheap to wait. They do not make analytics, databases, caches, or HTTP clients infinitely wide.
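One common way to make a downstream capacity limit explicit is a semaphore around the call. The sketch below is an illustration under assumptions, not the companion service's bulkhead: DownstreamBulkhead, its fallback parameter, and its counters are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical bulkhead: caps concurrent calls to one downstream dependency.
class DownstreamBulkhead {

    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peakInFlight = new AtomicInteger();
    private final LongAdder rejected = new LongAdder();

    DownstreamBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call if a permit frees up within maxWaitMs; otherwise counts a rejection. */
    <T> T call(Callable<T> downstream, long maxWaitMs, T fallback) {
        boolean acquired = false;
        try {
            acquired = permits.tryAcquire(maxWaitMs, TimeUnit.MILLISECONDS);
            if (!acquired) {
                rejected.increment();
                return fallback;
            }
            peakInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                return downstream.call();
            } finally {
                inFlight.decrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            throw new IllegalStateException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            if (acquired) {
                permits.release();
            }
        }
    }

    int peakInFlight() { return peakInFlight.get(); }

    long rejected() { return rejected.sum(); }
}
```

The peak and rejection counters are the point: they turn "the downstream is not infinitely wide" into numbers a dashboard can show next to the scope outcomes.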

Separate CPU and I/O evidence

ThreadOptimizedMicroservice.java makes one separation visible:

java

private static final ExecutorService virtualThreadExecutor =
    Executors.newVirtualThreadPerTaskExecutor();

private static final ExecutorService cpuIntensiveExecutor =
    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

private static final ExecutorService ioExecutor =
    Executors.newVirtualThreadPerTaskExecutor();

The I/O endpoint sleeps for 300ms and reads a file on virtual threads. During a 20-connection load check, /thread-stats reported:

text

Active Virtual Threads: 20
Active Platform Threads: 0
Total Requests: 262
Available Processors: 14
Average Response Time: 288ms

The corresponding wrk result was:

text

GET /io-optimized
Latency average: 313.42ms
Requests: 640
Requests/sec: 63.57

The CPU endpoint uses the fixed platform-thread pool. During its load check, /thread-stats reported:

text

Active Virtual Threads: 0
Active Platform Threads: 14
Available Processors: 14

The important number is not the very high request rate from the local prime-summing demo. The important number is that CPU work was bounded at 14 platform threads on a 14-processor machine.

That is the operational habit to keep: virtual threads are a good fit for waiting-heavy work. CPU-heavy work still needs a CPU-sized boundary.

Figure: Separating CPU and I/O evidence. I/O-bound work on virtual threads scales with concurrency (20 virtual, 0 platform) because waiting parks cheaply; CPU-bound work on a fixed pool is bounded to the 14 cores (0 virtual, 14 platform) because cores are finite.
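The CPU-side bound can also be asserted from inside the tasks rather than read off a stats endpoint. The sketch below is a hypothetical probe, not the companion service: CpuBoundaryProbe and sumPrimesBelow are illustrative names, and the prime-summing loop only stands in for CPU-heavy work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical probe: measures how many CPU-bound tasks ever run at the same time.
class CpuBoundaryProbe {

    /** Submits the tasks to a fixed pool and returns the peak number running concurrently. */
    static int peakConcurrentCpuTasks(int tasks, int poolSize) {
        ExecutorService cpuPool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(cpuPool.submit(() -> {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    try {
                        return sumPrimesBelow(20_000);
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<Long> f : results) {
                f.get();                       // wait for every task to finish
            }
            return peak.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            cpuPool.shutdownNow();
        }
    }

    /** Deliberately naive trial division so the task is CPU-bound. */
    static long sumPrimesBelow(int limit) {
        long sum = 0;
        for (int n = 2; n < limit; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) {
                sum += n;
            }
        }
        return sum;
    }
}
```

Calling peakConcurrentCpuTasks with a pool size of Runtime.getRuntime().availableProcessors() should never report a peak above that size, however many tasks are queued, which is the same bound the /thread-stats output showed.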

Read memory counters by what they actually count

Two services expose two different kinds of memory information.

The JVM monitoring service reports current JVM state:

text

jvm_memory_used_bytes{area="heap"} 9566384
jvm_memory_max_bytes{area="heap"} 12884901888
jvm_gc_collection_seconds{gc="G1 Young Generation"} 0.0

The memory service reports request-level memory deltas by endpoint:

text

Endpoint Memory Usage:
  FILE_IO: 24051.51MB total

Request Statistics:
Active Requests: 0
Total Requests: 22524
Average Response Time: 6ms

Those are different measurements. The first is current heap. The second is cumulative allocation pressure. The explicit GC check made the distinction visible:

text

Garbage collection triggered
Heap Usage: 2.79MB / 12288.00MB (0.02%)
FILE_IO: 24051.51MB total

If you read FILE_IO: 24051.51MB total as retained heap, the conclusion is wrong. If you read it as "this endpoint allocated a lot of memory across 22,524 requests," the metric becomes useful.

Article measurements should not turn every large number into a claim. First decide what the number counts.
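The two kinds of counter can be put side by side in a few lines. The sketch below is a simplification with hypothetical names: it records a known buffer size per request instead of the heap delta the memory service reports, and it reads current heap through the standard MemoryMXBean.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration of a point-in-time gauge versus a cumulative counter.
class MemoryCounterKinds {

    /** Cumulative bytes attributed to requests; only ever grows. */
    private final LongAdder cumulativeRequestBytes = new LongAdder();

    /** Point-in-time heap occupancy; it can fall again after a collection. */
    static long currentHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Simulates one request that allocates a short-lived buffer and records its size. */
    void handleRequest(int bufferBytes) {
        byte[] buffer = new byte[bufferBytes];   // unreachable once the method returns
        buffer[0] = 1;
        cumulativeRequestBytes.add(buffer.length);
    }

    long cumulativeRequestBytes() {
        return cumulativeRequestBytes.sum();
    }

    public static void main(String[] args) {
        MemoryCounterKinds service = new MemoryCounterKinds();
        for (int i = 0; i < 200; i++) {
            service.handleRequest(1024 * 1024);
        }
        System.out.println("cumulative bytes: " + service.cumulativeRequestBytes());
        System.out.println("current heap used: " + currentHeapUsedBytes());
    }
}
```

The cumulative figure is fixed by how many requests ran, while the heap figure depends on when the collector last ran, so the two are not expected to agree.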

Keep preview usage contained

The Java 21 snippets in this article use StructuredTaskScope.ShutdownOnFailure and joinUntil. The companion code now uses Java 25 preview APIs such as StructuredTaskScope.open(...) and Joiner.

That migration is exactly why the preview boundary should be small. The helpers in ScopedRequestHandler.java are a better place for preview API usage than spreading scope construction across every request handler. The codebase still has examples in several classes because this is a learning repository, but service code that intends to last should keep the preview API close to the orchestration layer.

The scripts show the same boundary at runtime. run-memory-optimized.sh and run-thread-optimized.sh pass --enable-preview and add JVM tuning flags. monitor-jvm.sh polls the memory, thread, and JVM endpoints. That is enough for local exploration, but it is not a full observability system. It gives you evidence to inspect, not a dashboard contract.

Testing the operational path

Operational tests need to make the failure mode visible. A timeout test should compare configured budget with measured failure time, then inspect whether timeout and total request counters moved together. A cancellation test should prove interruption landed in the child task, not merely that the parent returned. A bulkhead test should state whether optional work is still part of the response contract. Thread tests should separate waiting-heavy work from CPU-heavy work. Memory tests should distinguish current heap from cumulative allocation pressure.

The checked-in Article 8 run found two concrete gaps worth keeping in the article. The timeout endpoint observes the deadline after the slow task finishes, so the configured 300ms budget produces roughly 500ms failures. The metrics endpoint counts timeouts separately from successful requests but mixes their durations into one total, so the average response time becomes misleading during a timeout-heavy run.

Those are not embarrassing details. They are the point of running the code.

What comes next

Across Parts 1-8, the theme has not been "more threads." It has been local ownership of concurrent work.

Ownership is only useful if you can see what happened at the boundary. Did the scope succeed? Did it time out? Did cancellation land? Did fallback run? Did optional work delay the response? Did CPU work stay bounded?

Part 9 closes the series with the migration from Java 21 preview syntax to Java 25 preview APIs. The API changed enough that the migration is not just a find-and-replace exercise. Article 4 showed why the owner-join rule matters, and Article 8 showed why the timeout wait boundary matters.

Resources

Previous · Part 7: Three structured-concurrency patterns we run in a fan-out service
Next · Part 9: Migrating our fan-out service from Java 21 to Java 25

Code repository: project-loom
