Written by
Jagdish Salgotra
Distributed systems, cloud-native architecture, and the JVM. Mostly shipping, occasionally reading.
StructuredTaskScope on its own is a parallel-fetch helper. The resilience policy that makes it usable against a flaky downstream belongs in one handler instead of being duplicated at every call site.
Note: This series uses Java 21 as the baseline. Structured concurrency snippets in this part (StructuredTaskScope, JEP 453) use preview APIs and require --enable-preview.
Advanced concurrency patterns are useful only when the policy is visible in the code and testable from the outside. A timeout that is checked too late, a retry counter that lives in the wrong place, or a bulkhead that still waits for non-critical work can all look reasonable in a small code sample. The behavior only becomes clear when you run the code as a sequence.
This article is learning material. The main branch now builds with OpenJDK 25.0.2 and uses the Java 25 preview structured-concurrency API, with the Java 21 version separately managed in the feature/java-21 branch. The measurements below were generated from the current checked-in Java 25 code. The Java 21 structured-concurrency syntax used when the article series was first written is covered separately from the migration details in Part 9.
The point of this part is not that structured concurrency makes every advanced pattern faster. The point is that a scope gives the parent request an ownership boundary. Once the boundary exists, you can ask sharper questions: when do siblings stop, which result wins, which work is allowed to outlive failure, and what evidence proves the policy actually ran?
One detail matters before looking at the numbers. Not every helper in AdvancedStructuredPatterns.java is implemented with StructuredTaskScope in the current branch. The partial-results helper, for example, uses CompletableFuture.supplyAsync and polls for completion. The HTTP service and shared scope helpers are where the structured-concurrency examples live. That distinction is useful. It keeps the article from pretending that every named pattern automatically becomes structured just because the file name says so.
The measurements below were generated with OpenJDK 25.0.2 and Maven 3.9.12:
```bash
mvn clean compile -DskipTests
```
The build succeeded and compiled 35 source files. After creating the runtime classpath, I ran the standalone pattern demo, started the port 8082 service, ran the endpoint script, ran the circuit-breaker sequence demo, and then ran focused wrk benchmarks against the endpoints used in this article.
The standalone partial-results example starts four tasks with fixed delays:
```java
List<Callable<String>> tasks = List.of(
    () -> { Thread.sleep(100); return "Quick result"; },
    () -> { Thread.sleep(500); return "Medium result"; },
    () -> { Thread.sleep(1000); return "Slow result"; },
    () -> { Thread.sleep(2000); return "Very slow result"; }
);
```
The timeout limit is 600ms. The run completed the 100ms and 500ms tasks, then reported the two slower tasks as timed out:
```
Completed: 2/4 tasks
Results: [Quick result, Medium result]
Timed out: [2, 3]
```
This is the first lesson: partial result handling is not just a return type. It needs a boundary. The parent must know which tasks were accepted, which tasks produced usable output, and which tasks missed the policy window.
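The current helper gets there by calling CompletableFuture.supplyAsync and polling. For contrast, here is a minimal sketch of the same boundary built on ExecutorService.invokeAll with a timeout, assuming Java 21 virtual threads. PartialResults and Outcome are hypothetical names, not classes from the repository. invokeAll returns when every task is done or the timeout expires, and cancels whatever is still running, so the timed-out indices can be read directly from the Future states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not the repository's implementation): one deadline for all tasks,
// and an outcome that records accepted results, missed deadlines, and failures by index.
final class PartialResults {

    record Outcome<T>(List<T> results, List<Integer> timedOut, List<Integer> failed) {}

    static <T> Outcome<T> collect(List<Callable<T>> tasks, long timeoutMillis)
            throws InterruptedException {
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            // Returns when every task is done or the timeout expires; unfinished tasks are cancelled.
            List<Future<T>> futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            List<T> results = new ArrayList<>();
            List<Integer> timedOut = new ArrayList<>();
            List<Integer> failed = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                Future<T> future = futures.get(i);
                if (future.isCancelled()) {
                    timedOut.add(i);              // missed the policy window
                    continue;
                }
                try {
                    results.add(future.get());    // already complete, so this does not block
                } catch (ExecutionException e) {
                    failed.add(i);                // finished inside the window, but with an error
                }
            }
            return new Outcome<>(results, timedOut, failed);
        }
    }
}
```

With the four delays and the 600ms limit from the demo, this shape should report the first two results and mark indices 2 and 3 as timed out. The parent stops waiting at the limit rather than polling.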
The HTTP timeout endpoint shows why the boundary has to be placed carefully. In ConcurrentServiceLayer.shortTimeoutExample(), the code creates a 300ms deadline, forks a 500ms slow service and a 100ms fast service, then calls scope.join() before checking whether the deadline has passed. The direct endpoint check returned a timeout response, but it did so after the slow branch completed:
| Endpoint | Direct result |
|---|---|
| GET /timeout/short | status 500 in 506ms |
| GET /timeout/graceful | status 200 in 54ms |
The load test matched that behavior:
| Endpoint | Load | Average latency | Requests/sec | Total requests | Notes |
|---|---|---|---|---|---|
| /timeout/short | wrk -t2 -c20 -d10s | 504.72ms | 37.66 | 380 | all 380 responses were non-2xx |
| /timeout/graceful | wrk -t4 -c40 -d10s | 55.23ms | 721.32 | 7,274 | first successful result was the 50ms cache path |
The short-timeout endpoint is therefore a timeout classifier, not a 300ms wait limiter. It notices that the request exceeded the deadline after the join returns. That is a useful learning example because the number exposes the policy mistake. If the intended behavior is to stop waiting at 300ms, the wait itself has to be deadline-aware.
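A deadline-aware wait limits each individual wait to the time remaining on one shared deadline, then cancels any branch that is still running. The sketch below uses plain futures and assumes Java 21 virtual threads. DeadlineWait is a hypothetical name. Inside a scope, the Java 21 preview API offers StructuredTaskScope.joinUntil(Instant) for the same job. The Java 25 preview API instead takes the timeout through the scope's configuration when the scope is opened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: the deadline bounds the wait itself, instead of being
// checked after every branch has already finished.
final class DeadlineWait {

    static <T> List<T> allWithin(List<Callable<T>> tasks, long deadlineMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMillis);
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> task : tasks) {
                futures.add(pool.submit(task));
            }
            try {
                List<T> results = new ArrayList<>();
                for (Future<T> future : futures) {
                    // Each get() only waits for whatever is left of the shared deadline;
                    // a non-positive remainder makes an unfinished future time out at once.
                    long remaining = Math.max(0, deadline - System.nanoTime());
                    results.add(future.get(remaining, TimeUnit.NANOSECONDS));
                }
                return results;
            } finally {
                // On timeout or failure, interrupt branches that are still running so the
                // executor's close() does not wait for them. No-op for finished futures.
                futures.forEach(future -> future.cancel(true));
            }
        }
    }
}
```

With the endpoint's numbers (a 300ms deadline, a 500ms slow branch, and a 100ms fast branch), this shape should throw TimeoutException at roughly 300ms instead of after the slow branch completes.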
The graceful endpoint is different. It uses a first-success joiner and returns the cache result. The measured 54ms single request and 55.23ms average under load line up with the fixed 50ms cache delay in the service code.
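Outside StructuredTaskScope, ExecutorService.invokeAny has the same first-success shape. It returns the result of a task that completed successfully, ignores individual failures, cancels the remaining tasks when it returns, and throws ExecutionException only if every task fails. The sketch below assumes Java 21 virtual threads. FirstSuccess is a hypothetical name, and this is not the endpoint's joiner code.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of a first-success read with plain java.util.concurrent.
final class FirstSuccess {

    static <T> T firstSuccessful(List<Callable<T>> sources)
            throws InterruptedException, ExecutionException {
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            // The first successful source wins. Failed sources are skipped, and the
            // slower sources are cancelled (interrupted) before invokeAny returns.
            return pool.invokeAny(sources);
        }
    }
}
```

With a failing source, a 50ms cache, and a slower database, this shape should return the cache value in roughly the cache delay, which matches the 54ms direct result above.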
The conditional-cancellation demo starts four tasks. The second task returns "error" after 200ms, and the condition marks the later tasks as cancelled:
```
Cancellation triggered! Found error in results: [success, error]
Conditional Cancellation Results:
- Completed results: [success, error]
- Cancelled task indices: [2, 3]
- Was cancelled: true
- Reason: Cancellation condition met
- Execution time: 410 ms
```
That output is more interesting than a happy-path demo. The code identified the cancellation condition, but the elapsed time was 410ms, close to the slowest 400ms branch. In the current Java 25 migrated version, the helper records the cancellation decision but does not return immediately when the condition becomes true. A reader should not infer early sibling interruption from the label alone.
This is why cancellation examples need timing evidence. A cancellation flag tells you what the policy decided. The elapsed time tells you whether the parent actually stopped waiting.
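If the policy is meant to stop waiting as soon as the condition holds, the parent has to see results in completion order and cancel the siblings at that point. The sketch below does this with ExecutorCompletionService and assumes Java 21 virtual threads. ConditionalCancel, Result, and the stop predicate are hypothetical, not the repository helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical helper: stop waiting as soon as the completed results satisfy the condition.
final class ConditionalCancel {

    record Result(List<String> completed, List<Integer> cancelled) {}

    private record Done(int index, String value) {}

    static Result runUntil(List<Callable<String>> tasks, Predicate<List<String>> stop)
            throws InterruptedException, ExecutionException {
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            CompletionService<Done> completion = new ExecutorCompletionService<>(pool);
            List<Future<Done>> futures = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                int index = i;
                futures.add(completion.submit(() -> new Done(index, tasks.get(index).call())));
            }
            List<String> completed = new ArrayList<>();
            try {
                for (int received = 0; received < tasks.size(); received++) {
                    // take() yields futures in completion order, not submission order.
                    completed.add(completion.take().get().value());
                    if (stop.test(completed)) {
                        List<Integer> cancelled = new ArrayList<>();
                        for (int i = 0; i < futures.size(); i++) {
                            // cancel(true) interrupts a running sibling; false means it already finished.
                            if (futures.get(i).cancel(true)) {
                                cancelled.add(i);
                            }
                        }
                        return new Result(completed, cancelled);
                    }
                }
                return new Result(completed, List.of());
            } finally {
                // Also covers the failure path: nothing keeps running after the parent returns.
                futures.forEach(future -> future.cancel(true));
            }
        }
    }
}
```

With an "error" result at 200ms, this shape should return at roughly 200ms with the later tasks cancelled, so the elapsed time itself becomes the evidence that the parent stopped waiting.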
Progressive result handling is often described as streaming output. The more precise idea is that terminal progress and successful output are separate concepts. A task can finish with a value, finish with an error, or still be running. A progress API has to represent those states without forcing the caller to wait silently for the whole group.
The standalone progressive demo used fixed task delays of 100ms, 200ms, 150ms, and 250ms. The callback order from the run was:
```
Task 0 completed: Result 1
Task 2 completed: Result 3
Task 1 completed: Result 2
Task 3 completed: Result 4
Completion rate: 100.0% (4/4 tasks)
Total execution time: 275 ms
Results: [Result 1, Result 2, Result 3, Result 4]
Errors: 0
```
The completion order follows the delays, while the final result list remains in task order. Both views are useful. The callback view tells a UI or caller what can be shown now. The ordered result view gives the parent a stable aggregate shape after the scope has finished.
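Both views can come from the same set of futures. A callback attached to each future reports completion order, and joining the futures in submission order rebuilds the stable aggregate. The sketch below uses CompletableFuture and assumes Java 21 virtual threads. Progressive and Listener are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: report each task as it finishes, then return results in task order.
final class Progressive {

    // Called once per task, in completion order; error is non-null when the task failed.
    interface Listener {
        void onDone(int index, String value, Throwable error);
    }

    static List<String> run(List<Callable<String>> tasks, Listener listener) {
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                int index = i;
                Callable<String> task = tasks.get(i);
                CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
                    try {
                        return task.call();
                    } catch (Exception e) {
                        throw new CompletionException(e); // surface checked exceptions as failures
                    }
                }, pool);
                // Keep the future returned by whenComplete, so joining it also waits for the callback.
                futures.add(future.whenComplete((value, error) -> listener.onDone(index, value, error)));
            }
            // Aggregate view in task order; a failed task is recorded as null in this sketch.
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                try {
                    results.add(future.join());
                } catch (CompletionException e) {
                    results.add(null);
                }
            }
            return results;
        }
    }
}
```

The callback is the place to push partial output to a UI. The returned list is the shape the parent hands back once every task has reached a terminal state.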
The port 8082 service has three useful endpoint patterns for this article.
The hedged read endpoint forks a primary branch that sleeps for 300ms and a hedge branch that waits 30ms before running a 150ms service call. A hedge without a delay is just permanent duplicate work. In this example the delay is the policy: the second request only becomes active if the primary has not already won quickly.
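The delay-then-hedge shape can be sketched with invokeAny. The hedge branch sleeps for the hedge delay before it calls the service. If the primary wins first, invokeAny cancels the hedge while it is still sleeping, and the duplicate call never happens. HedgedRead is a hypothetical name, and this is an illustration under those assumptions, not the endpoint's code.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the hedge only reaches the service if the primary
// has not already won within the hedge delay.
final class HedgedRead {

    static <T> T read(Callable<T> primary, Callable<T> hedge, long hedgeDelayMillis)
            throws InterruptedException, ExecutionException {
        Callable<T> delayedHedge = () -> {
            Thread.sleep(hedgeDelayMillis); // interrupted here if the primary wins first
            return hedge.call();
        };
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            // The first successful branch wins; invokeAny cancels the other branch on return.
            return pool.invokeAny(List.of(primary, delayedHedge));
        }
    }
}
```

With a 300ms primary, a 30ms hedge delay, and a 150ms hedge call, the hedge should win at around 180ms. With a primary that answers inside the delay, the hedge service is never called.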
The direct check and load test were consistent:
| Endpoint | Direct result | Average latency | Requests/sec | Total requests |
|---|---|---|---|---|
| /async/race | 185ms | 188.73ms | 210.22 | 2,120 |
| /pattern/scatter-gather | 186ms | 185.13ms | 214.33 | 2,160 |
| /pattern/bulkhead | 204ms | 205.13ms | 194.26 | 1,960 |
The scatter-gather endpoint fans out to five fixed service delays: 100ms, 150ms, 120ms, 180ms, and 90ms. The measured 186ms direct request tracks the slowest 180ms branch, which is exactly what the code shape predicts.
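The gather shape itself is short. The sketch below uses invokeAll with no timeout and assumes Java 21 virtual threads. ScatterGather and delayed are hypothetical names. invokeAll only returns once every branch is done, and the branches run concurrently, so wall-clock time tracks the slowest branch rather than the sum of the delays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: fan out to every branch, wait for all of them, keep task order.
final class ScatterGather {

    static List<String> gather(List<Callable<String>> branches)
            throws InterruptedException, ExecutionException {
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            List<String> results = new ArrayList<>();
            // invokeAll without a timeout returns only when every branch has finished.
            for (Future<String> future : pool.invokeAll(branches)) {
                results.add(future.get());
            }
            return results;
        }
    }

    // Stand-in for a downstream call with a fixed delay.
    static Callable<String> delayed(String name, long millis) {
        return () -> {
            Thread.sleep(millis);
            return name;
        };
    }
}
```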
The bulkhead endpoint separates critical and non-critical work into two scopes. Critical work sleeps for 100ms and 150ms. Non-critical work sleeps for 200ms and 50ms. The measured response was around 204ms because the current implementation still waits for the non-critical scope before returning:
```
Bulkhead Pattern: Critical[critical-auth-ok, critical-payment-ok] Non-Critical[analytics-ok, logging-ok]
```
That is not wrong, but it is a specific policy. The code groups work by importance, yet the response still waits for the non-critical side. If the intended behavior is to degrade when the non-critical side is slow, the endpoint needs a different wait policy and a test that proves the non-critical work does not outlive the parent request.
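Here is one possible degrade policy, sketched under stated assumptions. Critical work is awaited in full. Non-critical work only gets a short grace period after the critical side finishes, and anything still running after that is cancelled and reported as degraded. Bulkhead, Response, graceMillis, and the "degraded" marker are hypothetical. The repository endpoint currently waits for both scopes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical "degrade" bulkhead: critical work is required, non-critical work is best effort.
final class Bulkhead {

    record Response(List<String> critical, List<String> nonCritical) {}

    static Response handle(List<Callable<String>> critical, List<Callable<String>> nonCritical,
                           long graceMillis) throws InterruptedException, ExecutionException {
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> optional = new ArrayList<>();
            for (Callable<String> task : nonCritical) {
                optional.add(pool.submit(task));    // starts alongside the critical work
            }
            try {
                // Critical side: wait for everything; a failure here fails the request.
                List<String> criticalResults = new ArrayList<>();
                for (Future<String> future : pool.invokeAll(critical)) {
                    criticalResults.add(future.get());
                }
                // Non-critical side: only a short grace period once the critical side is done.
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(graceMillis);
                List<String> optionalResults = new ArrayList<>();
                for (Future<String> future : optional) {
                    try {
                        long remaining = Math.max(0, deadline - System.nanoTime());
                        optionalResults.add(future.get(remaining, TimeUnit.NANOSECONDS));
                    } catch (TimeoutException | ExecutionException e) {
                        optionalResults.add("degraded");
                    }
                }
                return new Response(criticalResults, optionalResults);
            } finally {
                // Non-critical work must not outlive the request.
                optional.forEach(future -> future.cancel(true));
            }
        }
    }
}
```

The test that matters for this policy measures both paths. Response time should follow the critical side plus the grace period, and slow non-critical work should show up as degraded instead of stretching the response.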
The resource-aware standalone demo returned:
```
Resource-aware results: [CPU-1 completed, CPU-2 completed, MEM-1 completed, MEM-2 completed, IO-1 completed]
```
The useful thing to notice is what it does not do. The current example groups tasks by resource type and executes each group through a scope. It does not enforce CPU, memory, or IO capacity limits. Resource grouping answers which category the work belongs to. Capacity limiting answers how many tasks in that category may run at once. Those are related ideas, but they are not the same policy.
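Capacity limiting can be layered on top of grouping with one semaphore per category. This is a minimal sketch. CapacityLimit, the category names, and the limits are hypothetical, and the current example does not enforce anything like this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical per-category capacity limiter: grouping says which category a task
// belongs to; the semaphore says how many tasks in that category may run at once.
final class CapacityLimit {

    private final Map<String, Semaphore> permitsByCategory;

    CapacityLimit(Map<String, Integer> limits) {
        Map<String, Semaphore> permits = new HashMap<>();
        limits.forEach((category, max) -> permits.put(category, new Semaphore(max)));
        this.permitsByCategory = Map.copyOf(permits);
    }

    <T> Callable<T> limit(String category, Callable<T> task) {
        Semaphore permits = permitsByCategory.get(category);
        if (permits == null) {
            throw new IllegalArgumentException("unknown category: " + category);
        }
        return () -> {
            permits.acquire();      // waits (parking a virtual thread) while the category is full
            try {
                return task.call();
            } finally {
                permits.release();
            }
        };
    }
}
```

Wrapping every CPU task with limit("CPU", ...) before forking it keeps at most the configured number running at once, whatever the size of the group.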
The adaptive example completed 20 tasks in four batches of five:
| Batch | Duration | Next batch size |
|---|---|---|
| 1 | 292ms | 5 |
| 2 | 291ms | 5 |
| 3 | 301ms | 5 |
| 4 | 274ms | 5 |
The batch size stayed at five because the run never crossed either threshold in the controller. In the current implementation, a batch below 100ms can increase concurrency and a batch above 500ms can reduce it. To exercise those branches, lower the simulated task delays below 100ms or raise them above 500ms and rerun the standalone demo. This run stayed in the middle. That is still useful evidence because it prevents a false claim that the controller adapted during this pass.
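The decision rule fits in a few lines. The sketch below assumes the 100ms and 500ms thresholds described above. The step size of one and the min/max bounds are hypothetical, because the article does not state them. AdaptiveBatch is a hypothetical name.

```java
// Hypothetical batch-size rule using the 100ms / 500ms thresholds described above.
// The step size (one) and the min/max bounds are assumptions, not the repository's values.
final class AdaptiveBatch {

    static final long FAST_MILLIS = 100;
    static final long SLOW_MILLIS = 500;

    static int nextBatchSize(int current, long batchMillis, int min, int max) {
        if (batchMillis < FAST_MILLIS) {
            return Math.min(max, current + 1);   // fast batch: allow a bit more concurrency
        }
        if (batchMillis > SLOW_MILLIS) {
            return Math.max(min, current - 1);   // slow batch: back off
        }
        return current;                          // middle band: leave the size alone
    }
}
```

Feeding it the four measured durations (292ms, 291ms, 301ms, 274ms) keeps the size at five, which matches the table. Only a batch under 100ms or over 500ms would move it.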
A breaker is a state machine, and the sequence matters. A single request can show success or failure, but it cannot prove that the breaker opens, rejects work while open, and later allows a retry.
The HTTP /service/circuit-breaker endpoint uses a random failure path and resets the failure counter on success. In a 25-request sequence during this pass, the endpoint produced both successes and failures, but the breaker did not open. I am not using that endpoint as evidence for the open-breaker path.
The standalone EnhancedCircuitBreakerDemo.java did show the state transition:
```
--- Call 6 ---
FAILURE: Unreliable service failed
--- Call 7 ---
FAILURE: Unreliable service failed
--- Call 8 ---
FAILURE: Unreliable service failed
--- After 5 seconds ---
FAILURE: Circuit breaker is OPEN for DB_SERVICE - failing fast (failures: 3/3, next retry in: 24s)
```
That output proves the part that matters: after the configured threshold is reached, the next protected call is rejected by breaker state instead of calling the unreliable service again.
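The state machine the demo exercises can be sketched as CLOSED, OPEN, and HALF_OPEN with an injectable clock. The clock makes the open, reject, and retry sequence testable without waiting in real time. SimpleBreaker, its threshold, and its cool-down are hypothetical, not the EnhancedCircuitBreakerDemo code.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

// Hypothetical minimal breaker: CLOSED -> OPEN after `threshold` consecutive failures,
// OPEN rejects calls until `openMillis` has passed, then a HALF_OPEN trial decides.
// This sketch does not limit how many trial calls run concurrently while HALF_OPEN.
final class SimpleBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int threshold;
    private final long openMillis;
    private final LongSupplier clockMillis;  // injectable so a test can move time forward
    private State state = State.CLOSED;
    private int failures;
    private long openedAt;

    SimpleBreaker(int threshold, long openMillis, LongSupplier clockMillis) {
        this.threshold = threshold;
        this.openMillis = openMillis;
        this.clockMillis = clockMillis;
    }

    synchronized State state() {
        return state;
    }

    <T> T call(Callable<T> action) throws Exception {
        synchronized (this) {
            if (state == State.OPEN) {
                if (clockMillis.getAsLong() - openedAt < openMillis) {
                    // Fail fast: the protected service is not called at all.
                    throw new IllegalStateException("circuit OPEN, failing fast");
                }
                state = State.HALF_OPEN;  // cool-down over: let a trial call through
            }
        }
        try {
            T result = action.call();
            synchronized (this) {
                failures = 0;
                state = State.CLOSED;
            }
            return result;
        } catch (Exception e) {
            synchronized (this) {
                failures++;
                if (state == State.HALF_OPEN || failures >= threshold) {
                    state = State.OPEN;
                    openedAt = clockMillis.getAsLong();
                }
            }
            throw e;
        }
    }
}
```

A sequence test is the natural fit here. Three failures open the breaker, the fourth call is rejected without touching the service, and advancing the clock past the cool-down lets a successful trial close it again.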
The retry evidence also needs the right source. The HTTP /service/retry endpoint uses a counter stored on ConcurrentServiceLayer, so the sequence is spread across requests:
```
retry-01 status 500 attempt 1 total 0.106526s
retry-02 status 500 attempt 2 total 0.103035s
retry-03 status 200 retryable-service-ok-after-3-attempts total 0.105222s
retry-04 status 500 attempt 1 total 0.106363s
retry-05 status 500 attempt 2 total 0.102853s
retry-06 status 200 retryable-service-ok-after-3-attempts total 0.106081s
```
That is useful for showing stateful endpoint behavior, but it is not a per-request retry loop. The per-request retry helper is ScopedRequestHandler.runWithRetry(...), and the standalone demo showed it clearly:
```
Attempt 1 failed: External service failed
Attempt 2 failed: External service failed
RETRY SUCCESS: External-important-task
```
The helper sleeps between attempts with Thread.sleep(retryDelay.toMillis()). With virtual threads, that parks the virtual thread instead of occupying a carrier thread for the whole delay. If the same helper is used from platform threads, the blocking cost needs a separate review.
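A per-request retry loop keeps the attempt counter in a local variable, so two requests cannot share it the way the /service/retry endpoint's field does. This is a minimal sketch. Retry.withRetry is a hypothetical name, not ScopedRequestHandler.runWithRetry, and the fixed delay mirrors the Thread.sleep(retryDelay.toMillis()) call described above.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical per-request retry helper: the attempt counter is a local variable,
// so each call gets a fresh sequence of attempts.
final class Retry {

    static <T> T withRetry(Callable<T> action, int maxAttempts, Duration delay) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // cancellation is not a retryable failure
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // On a virtual thread this parks the thread instead of holding a carrier.
                    Thread.sleep(delay.toMillis());
                }
            }
        }
        throw last;  // every attempt failed: surface the most recent failure
    }
}
```

Calling it from a virtual thread, for example a task submitted to Executors.newVirtualThreadPerTaskExecutor(), gives the parking behavior described above. From a platform thread, the same sleep holds that thread for the whole delay.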
After the endpoint script, direct checks, retry and breaker sequences, and focused load tests, the port 8082 metrics endpoint reported:
| Metric | Value |
|---|---|
| Active requests | 0 |
| Total requests | 13,829 |
| Timeout count | 412 |
| Average response time | 136.39ms |
| CPU usage | 13.62% |
| Memory usage | 387.01MB / 776.00MB |
| JVM uptime | 177 seconds |
The total request count is useful mainly as a sanity check. It confirms the load tests and sequences hit the same running service instance before the final metrics snapshot. The average response time blends fast cache responses, hedged reads, bulkhead responses, and timeout failures, so it should not be used as a single headline number.
Advanced patterns should be tested as policies, not as isolated status codes. For timeout examples, compare the configured deadline with the measured response time and check whether the wait itself stops at the boundary. For first-success and hedged reads, verify which branch won and whether the measured latency matches the winning branch plus any configured hedge delay. For bulkheads, measure both the critical path and the non-critical path, then decide whether the response is supposed to wait, degrade, or reject. For circuit breakers and retries, run sequences long enough to cross the state boundary; one request is not evidence of a state machine.
The same rule applies to load tests. Average latency is helpful, but it is not enough. Pair it with total requests, non-2xx counts, timeout counts, and the fixed delays in the code. When a result does not match the code shape, trust the mismatch and inspect the policy boundary before turning the number into a lesson.
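One way to apply that rule is to write down the latency the code shape predicts before running anything. The sketch below is a back-of-the-envelope model that ignores scheduling and HTTP overhead. LatencyShape is a hypothetical name, and the inputs are the fixed delays quoted earlier in this article.

```java
import java.util.Arrays;

// Hypothetical back-of-the-envelope model: predicted latency from fixed delays only,
// ignoring scheduling, HTTP, and serialization overhead.
final class LatencyShape {

    // Wait for every branch (scatter-gather, a bulkhead that waits for both scopes,
    // or a timeout that is only checked after join): the slowest branch decides.
    static long waitForAll(long... branchMillis) {
        return Arrays.stream(branchMillis).max().orElse(0);
    }

    // Deadline-aware wait: stop at the deadline or when the slowest branch finishes.
    static long deadlineAware(long deadlineMillis, long... branchMillis) {
        return Math.min(deadlineMillis, waitForAll(branchMillis));
    }

    // Hedged read: the hedge can only win after its delay plus its own call time.
    static long hedged(long primaryMillis, long hedgeDelayMillis, long hedgeCallMillis) {
        return Math.min(primaryMillis, hedgeDelayMillis + hedgeCallMillis);
    }
}
```

Against the measurements, scatter-gather predicts 180ms versus 186ms measured, the hedged read predicts 180ms versus 185ms, and the bulkhead that waits for both scopes predicts 200ms versus 204ms. The short-timeout endpoint, which checks its deadline only after join, predicts 500ms versus 506ms. A deadline-aware wait would predict 300ms, and that gap is exactly the policy mismatch discussed earlier.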
These numbers do not prove that the advanced structured-concurrency version is universally faster than a CompletableFuture version, a reactive version, or a library-backed resilience stack. I did not run a controlled comparison against those alternatives for this article, so there is no comparison table here.
The measurements also do not prove that these examples are finished resilience components. The breaker is intentionally small, the retry examples are teaching code, and the adaptive controller did not cross its thresholds in this run. The value of the examples is that they make ownership, waiting, cancellation, and state transitions visible enough to test.
Part 6 moves from pattern shape to performance analysis. The same discipline applies there: start from the code path, predict what the fixed delays and fan-out imply, run the benchmark, and only publish the numbers that the code can explain.