Written by
Jagdish Salgotra
Distributed systems, cloud-native architecture, and the JVM. mostly shipping, occasionally reading.
Resilience composition in structured concurrency comes down to one rule: keep each policy a separable layer that fires visibly in logs, or rebuild the complexity you were trying to remove.
Note: This article targets Java 21 preview structured concurrency APIs (StructuredTaskScope, JEP 453). Part 9 covers migration to newer preview APIs in Java 25.
I sat in a debugging session where the question was embarrassingly simple: did the dependency recover, or did we serve fallback? We had retries. We had a timeout. We had a fallback to cache. The dashboard said: clean success.
It took two engineers and forty minutes of log tracing to figure out that "clean success" meant the fallback had been serving cached responses for twenty minutes while upstream recovered. Clean success rate. Zero alerts. One completely invisible failure.
That is the composition problem. Once timeout, retry, and fallback all live in the same handler, the code becomes harder to reason about than the failure itself. And worse, the metrics lie to you.
The rest of this article is the pattern that fixes it.
Most teams can adopt a single pattern (for example timeout + fail-fast). Complexity usually appears when patterns are combined:
The goal is to keep policy explicit and layered, not hidden in nested lambdas.
Composition stays readable when each policy layer has one clear job.
A practical order for policy composition:
Each layer should be testable independently.
The structured path makes success, retry, and fallback distinguishable in metrics and logs.
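One way to make that distinction concrete is to return the serving path together with the value, so a dashboard can no longer count a cached response as a clean success. The sketch below is a minimal illustration in plain Java, not the article's handler: `ServedBy`, `Outcome`, and `TaggedPolicy` are hypothetical names, and the structured scope is left out to keep the focus on tagging.

```java
import java.util.concurrent.Callable;

/** Which policy layer produced the value. Hypothetical names, for illustration only. */
enum ServedBy { PRIMARY, RETRY, FALLBACK }

/** A value paired with the path that produced it and the number of primary attempts made. */
record Outcome<T>(T value, ServedBy servedBy, int attempts) {}

final class TaggedPolicy {
    private TaggedPolicy() {}

    /**
     * Tries the primary up to maxAttempts times, then the fallback once.
     * The returned Outcome records which path actually served the value.
     */
    static <T> Outcome<T> call(Callable<T> primary, Callable<T> fallback, int maxAttempts)
            throws Exception {
        int attemptsMade = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            attemptsMade = attempt;
            try {
                T value = primary.call();
                ServedBy path = (attempt == 1) ? ServedBy.PRIMARY : ServedBy.RETRY;
                return new Outcome<>(value, path, attempt);
            } catch (InterruptedException e) {
                // Interruption means cancellation: restore the flag and stop retrying.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                // A real handler would log the failed attempt here; the sketch moves on.
            }
        }
        // Every primary attempt failed, so the value comes from the fallback and says so.
        return new Outcome<>(fallback.call(), ServedBy.FALLBACK, attemptsMade);
    }
}
```

A caller can then emit one metric per `ServedBy` value, for example `metrics.increment("orders." + outcome.servedBy())`, where `metrics` stands for whatever metrics client is already in use.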
public String performRetryableOperation(String operation) throws Exception {
    logger.info("Performing retryable operation: {}", operation);
    // Layer order: scope -> retry -> timeout (adjust per policy needs).
    return scopedHandler.runWithRetry(
        () -> unstableExternalService(operation),
        3,
        Duration.ofMillis(500)
    );
}
public <T> T runWithRetry(Callable<T> task, int maxRetries, Duration retryDelay) throws Exception {
    Exception lastException = null;
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            return runInScope(task);
        } catch (InterruptedException e) {
            // Interruption is cancellation, not a retryable failure: restore the flag and stop.
            Thread.currentThread().interrupt();
            throw e;
        } catch (Exception e) {
            lastException = e;
            logger.warn("Attempt {} failed: {}", attempt, e.getMessage());
            if (attempt < maxRetries) {
                Thread.sleep(retryDelay.toMillis());
            }
        }
    }
    throw new RuntimeException("All " + maxRetries + " attempts failed", lastException);
}
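Because the retry layer only sees a Callable, it can be exercised on its own with a test double that fails a fixed number of times. The sketch below assumes a simplified stand-in for the handler: `RetryOnly.runWithRetry` calls the task directly instead of going through `runInScope`, and `FlakyTask` is a hypothetical helper, not part of the article's code.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical test double: fails the first `failures` calls, then returns `value`. */
final class FlakyTask<T> implements Callable<T> {
    private final int failures;
    private final T value;
    private final AtomicInteger calls = new AtomicInteger();

    FlakyTask(int failures, T value) {
        this.failures = failures;
        this.value = value;
    }

    @Override
    public T call() {
        int n = calls.incrementAndGet();
        if (n <= failures) {
            throw new IllegalStateException("simulated failure " + n);
        }
        return value;
    }

    /** Number of times the task has been invoked so far. */
    int calls() {
        return calls.get();
    }
}

/** Stand-in for the retry layer alone; the structured scope is deliberately omitted. */
final class RetryOnly {
    private RetryOnly() {}

    static <T> T runWithRetry(Callable<T> task, int maxRetries, Duration retryDelay) throws Exception {
        Exception lastException = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                lastException = e;
                if (attempt < maxRetries) {
                    Thread.sleep(retryDelay.toMillis());
                }
            }
        }
        throw new RuntimeException("All " + maxRetries + " attempts failed", lastException);
    }
}
```

With `new FlakyTask<>(2, "ok")` and three attempts the call should succeed on the third try; with three simulated failures it should exhaust the budget and surface the last failure as the cause. Passing `Duration.ZERO` keeps such tests fast.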
public <T> T runWithFallback(Callable<T> primary, Callable<T> fallback) throws Exception {
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        var primaryFuture = scope.fork(primary);
        scope.join();
        try {
            scope.throwIfFailed();
            return primaryFuture.get();
        } catch (Exception e) {
            logger.warn("Primary task failed, using fallback: {}", e.getMessage());
            return fallback.call();
        }
    }
}
Keep fallback bounded and observable. Unbounded fallback chains are hard to operate.
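As a rough illustration of "bounded and observable", the sketch below allows exactly one fallback level and counts every path. It is plain Java without a scope, and `BoundedFallback` and its counter names are assumptions made for the example. If the fallback also fails, the error propagates instead of reaching for yet another fallback, and the primary failure travels along as a suppressed exception.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical single-level fallback with one counter per path. */
final class BoundedFallback {
    private final LongAdder primaryServed = new LongAdder();
    private final LongAdder fallbackServed = new LongAdder();
    private final LongAdder fallbackFailed = new LongAdder();

    <T> T call(Callable<T> primary, Callable<T> fallback) throws Exception {
        try {
            T value = primary.call();
            primaryServed.increment();
            return value;
        } catch (InterruptedException e) {
            // Cancellation is not a reason to serve fallback data.
            Thread.currentThread().interrupt();
            throw e;
        } catch (Exception primaryFailure) {
            try {
                T value = fallback.call();
                fallbackServed.increment();
                return value;
            } catch (Exception fallbackFailure) {
                // Bounded: no second fallback. Keep both failures visible to the caller.
                fallbackFailed.increment();
                if (fallbackFailure != primaryFailure) {
                    fallbackFailure.addSuppressed(primaryFailure);
                }
                throw fallbackFailure;
            }
        }
    }

    long primaryServed() { return primaryServed.sum(); }
    long fallbackServed() { return fallbackServed.sum(); }
    long fallbackFailed() { return fallbackFailed.sum(); }
}
```

Exporting `fallbackServed` as its own metric is what would have turned the twenty-minute silent fallback from the opening story into an alert.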
public <T1, T2> ParallelResult<T1, T2> runInParallel(
        Callable<T1> task1,
        Callable<T2> task2) throws Exception {
    // One scope per logical unit prevents fragmented cancellation.
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        var future1 = scope.fork(task1);
        var future2 = scope.fork(task2);
        scope.join();
        scope.throwIfFailed();
        return new ParallelResult<>(future1.get(), future2.get());
    }
}
Use one scope per logical unit of work; avoid spreading a single request across unrelated scopes.
throwIfFailed() is called after join() in Java 21 patterns.
For composed policies, test combinations, not only single failures:
joinUntil(...) throws TimeoutException; define deterministic behavior afterwards.
javac --release 21 --enable-preview ...
java --enable-preview ...
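Returning to the joinUntil(...) note: one way to make the behavior after a timeout deterministic is to map each failure type to a single next step in one place, so the retry and fallback layers never have to guess. The sketch below uses only standard exception types. The `NextStep` enum and the specific mapping (timeout goes straight to fallback, a failed subtask is retried, interruption propagates) are assumptions chosen for illustration, not the article's policy.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

/** Hypothetical decision the composed policy takes after a failed attempt. */
enum NextStep { RETRY, FALLBACK, PROPAGATE }

final class FailurePolicy {
    private FailurePolicy() {}

    /**
     * Maps a failure to exactly one next step. The mapping is an assumed policy:
     * adjust it to the latency budget and the dependency in question.
     */
    static NextStep decide(Throwable failure) {
        if (failure instanceof InterruptedException) {
            // The caller cancelled the work: do not retry and do not serve fallback.
            return NextStep.PROPAGATE;
        }
        if (failure instanceof TimeoutException) {
            // Assumption: the time budget is spent, so skip further retries.
            return NextStep.FALLBACK;
        }
        if (failure instanceof ExecutionException) {
            // A subtask failed (as surfaced by throwIfFailed()); assumed to be transient.
            return NextStep.RETRY;
        }
        // Anything else is treated as a bug rather than a dependency problem.
        return NextStep.PROPAGATE;
    }
}
```

A table-driven test over this method covers each failure type once, and the combination tests then only need to check that the layers honor the returned step.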