
Modern microservices have various integration points in form of external APIs, databases etc. In order to perform these integrations the service needs to maintain connections with these points & these connections tend to be costly. You need to perform authz/authn, do network handshakes(Eg TCP) & also some form of book-keeping around what connections are alive. So normally all applications tend to have a hard limit on number of such active connections & this is where we bring in our problem statement.
A single integration point in an application can end up hoarding majority of the connections in case of issues due to high latency, bugs etc & starve other integration points. For eg when we have a limited number of database connections, a slow database query can end up holding database connections for longer amount of time & in turn block all the remaining queries which might have taken a fraction of time. This problem can soon end up choking our applications & in absence of good observability tooling, it becomes harder to trace the root cause. Lets first try to see this problem in action & then we will start looking into the solution i.e. the bulkhead pattern.
Visualizing the problem
Lets try to visualize the problem that can arise due to connection starvation through an example. Here we have 3 services i.e. centralservice, servicea & serviceb. Both servicea & serviceb serve a single REST endpoint individually & the only caveat is that servicea returns the response after a delay of 10 connections. This is done to depict a latency issue that you might typically see in your production application. centralservice uses these 2 services as an integration point & expose 2 client facing REST APIs as below:
@GetMapping("/servicea")
@ResponseStatus(HttpStatus.OK)
public String servicea() {
return serviceaRestClient.get()
.retrieve()
.body(String.class);
}
@GetMapping("/serviceb")
@ResponseStatus(HttpStatus.OK)
public String serviceb() {
return servicebRestClient.get()
.retrieve()
.body(String.class);
}
As of now we don’t have any resource restrictions on the 2 integration points. We have simple REST clients for these 2 integration points with no constraints on connection limit as we are using SimpleClientHttpRequestFactory which doesn’t have support for connection pooling & creates a new TCP connection for each request.
@Bean
public RestClient serviceaRestClient(@Value("${servicea.url}") String serviceaUrl) {
return RestClient.builder()
.requestFactory(new SimpleClientHttpRequestFactory())
.baseUrl(serviceaUrl)
.build();
}
@Bean
public RestClient servicebRestClient(@Value("${serviceb.url}") String servicebUrl) {
return RestClient.builder()
.requestFactory(new SimpleClientHttpRequestFactory())
.baseUrl(servicebUrl)
.build();
}
We also limit the number of active connections Tomcat will accept for centralservice to 10 for demo purposes. In production this number will be much higher though the underlying behavior will remain similar.
server:
tomcat:
threads:
max: 10
accept-count: 10
Now that we have done the code walkthrough of our services(All the code can be found on this Github branch), lets go through the stress test which I am doing using a simple Golang program. In this test, we start with sending a few requests to serviceb which return successfully & then start sending requests to servicea endpoint in parallel which come with high latency. The code for stress test is as below
package main
import (
"fmt"
"net/http"
"sync"
"time"
)
const (
RequestCount = 1000
ServiceAUrl = "http://localhost:8000/api/v1/servicea" // The 10s delay
ServiceBUrl = "http://localhost:8000/api/v1/serviceb" // The healthy service
)
func main() {
client := &http.Client{
Timeout: 2 * time.Second,
}
fmt.Println("Starting Service B Heartbeat...")
go func() {
for {
start := time.Now()
_, err := client.Get(ServiceBUrl)
if err != nil {
fmt.Printf("❌ Service B FAILED: %v\n", err)
} else {
fmt.Printf("✅ Service B OK (%v)\n", time.Since(start))
}
time.Sleep(500 * time.Millisecond)
}
}()
time.Sleep(3 * time.Second)
fmt.Println("\n⚠️ INITIATING LOAD ON SERVICE A (The 10s delay). Watch Service B...")
var wg sync.WaitGroup
for range RequestCount {
wg.Add(1)
wg.Go(func() {
_, _ = client.Get(ServiceAUrl)
})
}
wg.Wait()
fmt.Println("Load test complete.")
}
Here is the stress test in action where we see successful response for serviceb in isolation but then we encounter failure for both servicea & serviceb as the underlying connection pool is depleted by servicea.
Introducing Bulkhead Pattern
Bulkhead aka partition comes from the world of boats where partitions are built so that a fault in one part of the boat doesn’t ends up drowning the entire boat. The problem that we went through in last part can also be solved using something similar. If we are able to constrain the resources such as connections for integration points then we can ensure that a fault in one part of the points doesn’t ends up stalling our complete application.
This pattern of restricting resources is common in other parts of software too. Kubernetes uses resource limits in order to manage resources for its containers & pods. Similar solutions are used to solve the noisy neighbor problem in multi-tenanted applications.
In this pattern we ensure that each service is assigned its own pool of resources & further usage is either blocked or rejected if the service ends up consuming all the resources. This way we prevent the problem that we encountered in our stress test. Now lets look at how to implement this pattern in code.
Implementation
I have implemented the bulkhead pattern in 3 different ways. I start by creating a custom implementation using annotations. This is similar to what you will get from resilience4j though you get a chance to look at the actual implementation instead of just an annotation. Next I implement the same functionality using the features introduced in Spring framework 7 that removes the requirement for an external library altogether. In the final implementation I implement the bulkhead pattern during the configuration by adding a connection limit instead of restricting it during service invocation. Lets dive in(As usual the code for each implementation will be shared under each sub-section).
AOP Implementation
I implement Bulkhead pattern here by building a custom AOP based library(bulkheadlite) & then using it in the centralservice. Lets do a code walkthrough of the library & then we will take a look at its usage in the service followed by the stress test. The library introduces 2 client facing components in form of an annotation as CustomBulkhead(Client uses this annotation to define the resource limits) & an exception as BulkheadFullException (Which they receive if the limit is reached). The annotation is processed through AOP where the implementation makes use of a Semaphore to limit the resource usage. Code for the implementation:
@Component
@Aspect
public class BulkheadAspect {
private final Map<String, Semaphore> semaphores = new ConcurrentHashMap<>();
@Around("@annotation(bulkhead)")
public Object apply(ProceedingJoinPoint joinPoint, CustomBulkhead bulkhead) throws Throwable {
Semaphore semaphore = semaphores.computeIfAbsent(
bulkhead.name(),
_ -> new Semaphore(bulkhead.maxConcurrent(), true)
);
if (semaphore.tryAcquire()) {
try {
return joinPoint.proceed();
} finally {
semaphore.release();
}
} else {
throw new BulkheadFullException("Bulkhead: " + bulkhead.name() + " is full. Request rejected");
}
}
}
The bulkheadlite library is used in the centralservice as below. Aside from this there are no other changes in the service implementation. All the code can be found on this Github branch.
@GetMapping("/servicea")
@ResponseStatus(HttpStatus.OK)
@CustomBulkhead(name = "getServiceA")
public String servicea() {
return serviceaRestClient.get()
.retrieve()
.body(String.class);
}
Lets take a look at the demo for this implementation where initial calls to serviceb work in isolation. Then when we send large number of concurrent requests for servicea, only 5 of them go through while the remaining requests are rejected with a 429 “Too Many Requests” error code. We also see that serviceb continues working as expected & we also receive successful response for 5 of the servicea requests after the initial delay.
Spring 7 APIs
Spring framework 7 released a set of resiliency features which simplify our earlier implementation to great extent. Now instead of depending upon an external library, we can instead use the @ConcurrencyLimit annotation provided by the framework. Internally this too is backed by Semaphore based mechanism. There are bunch of other features introduced as part of this release & I encourage you to check them out. One less external dependency means one less vulnerability source to patch & in the AI world a smaller context for your agents.
Now the implementation changes by removing the dependency on external library & replacing the custom annotation with @ConcurrencyLimit annotation. All the code can be found on this Github branch.
@GetMapping("/servicea")
@ResponseStatus(HttpStatus.OK)
@ConcurrencyLimit(limit = 5, policy = ConcurrencyLimit.ThrottlePolicy.REJECT)
public String servicea() {
return serviceaRestClient.get()
.retrieve()
.body(String.class);
}
Tuning the connection limit
Both the previous implementations were about restricting the resource usage at point of invocation though we can also restrict the resource usage by limiting the connection count while creating the REST client. In our example we can do this by using setMaxTotal on ClientHttpRequestFactory implementation to the desired number of connections. Now any invocation of this HTTP REST client will be limited to maxConnections. Note that now instead of using SimpleClientHttpRequestFactory which doesn’t provide support for connection pool, we make use of HttpComponentsClientHttpRequestFactory.
private HttpComponentsClientHttpRequestFactory createFactory(int maxConnections) {
PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
connectionManager.setMaxTotal(maxConnections);
connectionManager.setDefaultMaxPerRoute(maxConnections);
RequestConfig requestConfig = RequestConfig.custom()
// If we can't get a connection from the pool instantly, throw an exception!
.setConnectionRequestTimeout(Timeout.ZERO_MILLISECONDS)
.build();
CloseableHttpClient httpClient = HttpClients.custom()
.setConnectionManager(connectionManager)
.setDefaultRequestConfig(requestConfig)
.build();
return new HttpComponentsClientHttpRequestFactory(httpClient);
}
@Bean
public RestClient serviceaRestClient(
@Value("${servicea.url}") String serviceaUrl,
@Value("${servicea.max-connections}") int maxConn
) {
return RestClient.builder()
.requestFactory(createFactory(maxConn))
.baseUrl(serviceaUrl)
.build();
}
With above implementation, once the connection limit is reached & if we still have more requests demanding the connection then we will throw ConnectionRequestTimeoutException. There are options to either block those requests for certain amount of time.
Now we don’t need to add any form of annotation on the service layer & the resource limit will be respected. All the code can be found on this Github branch.
Isn’t it just a rate limiter?
On first look, bulkhead might seem like a corporate jargon for a rate limiter though both of them serve different purposes. With rate limiter, we ensure that only a set of requests go through in a certain duration(Think 100 tokens in 1 minute). Though we don’t have control over what happens with those requests once the tokens are assigned. When the next set of tokens are refilled, we don’t have visibility about the state of previous requests. If those requests are still in processing state then it means we are going to soon consume more number resources from our application leading to the original problem state that we are tackling here.
Think this way: Rate Limiters usually sit at your front door to protect you from your users, while Bulkheads sit inside your walls to protect you from your own application’s use of your downstream dependencies. Bulkhead is a fixed token based rate limiter where the tokens are refilled only when a token is returned.
Conclusion
Bulkhead is a brilliant resiliency pattern to help you control the resources for integration points used within your application & to ensure that failure in a single downstream service doesn’t ends up bringing down your complete application. Also like most of things in software, bulkhead is not a one time thing which you can set & forget but rather its a scale which you need to bring up or down based upon the usage pattern of your application. So a connection limit of 10 for a service won’t be sufficient if you vertically scale your service nodes 1 year down the line. You will need to come back & evaluate the new config based upon the metrics for your application.
In addition to this pattern you would likely need a good observability stack so that you have clear view of what specific service is in downgraded state & its corresponding impact on your application. Till next time, happy learning!
