Thundering Herd/Cache Stampede

What is the most common solution you have heard for scaling a system with a high number of read requests for a resource that gets computed from the database?
Just put a cache in front of the database. Only read from the database if you don't find the result in the cache, and once you have read from the database, populate the cache so that future requests are served from it. Something like the sketch below:
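A minimal cache-aside sketch in Python, assuming an illustrative cache client and a `db.fetch` helper (these names are stand-ins, not a specific library):

```python
CACHE_TTL_SECONDS = 300  # assumption: slightly stale results are acceptable

def get_resource(resource_id, cache, db):
    # 1. Try the cache first.
    value = cache.get(resource_id)
    if value is not None:
        return value

    # 2. Cache miss: compute the value from the database (the expensive part).
    value = db.fetch(resource_id)

    # 3. Populate the cache so future requests are served from it.
    cache.set(resource_id, value, ttl=CACHE_TTL_SECONDS)
    return value
```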

Sounds simple, doesn't it?
Well, this works completely fine as long as you don't end up with a large number of CONCURRENT read requests (mind the caps on concurrent) for the same resource. Think of an episode of Game of Thrones and viewers tuning into HBO to watch it the moment it becomes available on the OTT platform.

Now, in the system described above, suppose we get 1000 concurrent requests for the same resource and the value is not yet populated in the cache. All of these requests will fail the if check in our code and move on to compute the value from the database. Since each request is unaware that the remaining 999 requests are computing the same result, it goes ahead and reads from the database just like the others. This causes a huge spike in database load. The phenomenon is known as a cache stampede, and it is the result of a problem known as the thundering herd: herds of concurrent requests trampling over the cache we built to improve the performance of our system and going straight to the database.
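To make the stampede concrete, here is a small, illustrative simulation: many threads hit a cold cache at the same time, and every one of them falls through to the "database" (the sleep stands in for a slow query):

```python
import threading, time

cache = {}                       # starts cold: no entry for the key
db_hits = 0
counter_lock = threading.Lock()  # only protects the hit counter

def fetch_from_db(resource_id):
    global db_hits
    with counter_lock:
        db_hits += 1
    time.sleep(0.2)              # simulate a slow query / computation
    return f"episode-data-{resource_id}"

def get_resource(resource_id):
    if resource_id in cache:     # every concurrent thread misses here...
        return cache[resource_id]
    value = fetch_from_db(resource_id)  # ...so every one of them hits the DB
    cache[resource_id] = value
    return value

threads = [threading.Thread(target=get_resource, args=("got-s08e03",))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("DB hits:", db_hits)       # ~100: one DB hit per concurrent request
```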

If a read-heavy system doesn't account for this problem and engineer around it, the database ends up overloaded with identical concurrent requests, affecting the overall health of the system. To summarise where we are: we are being hit with concurrent requests, our cache doesn't contain the key they need, and fetching from the DB for every one of them will end up crashing our system.

What can we do to tackle this situation?

  • Just allow one request at a time to the DB (Isn't that slow? One DB request per resource? How do you know how many DB requests you have sent so far?)
  • Add locking around the function that fetches from the DB (Yes, but that's an additional engineering problem to solve. Does locking make request processing slow? What do you do with the remaining requests that fail to get the lock?)
  • Never allow a cache miss (This sounds interesting)

Let's first discuss the queuing and locking approaches:

  • Request Queue: Add the requests to a queue and allow one request to be processed at a time. Following this approach requires some back-of-the-envelope calculation, as you will need to build separate queues for requests based on the resource they are trying to access. You don't want to time out a request for resource A just because there are 1000 requests ahead of it, all trying to access resource B. Facebook's video infrastructure uses this approach to solve the issue.
  • DB Lock: An effective lock implementation is a monster on its own. You will have to account for the additional write latency of the lock's state machine. You will also have to decide what to do with the requests that are unable to acquire the lock. Does it make sense to return a 404? You will have to ensure that the locking works per resource and not on the complete DB, so you will need some metadata to identify the resource each request is trying to access. A rough per-resource lock sketch follows right after this list.
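One way to keep the lock per resource is a single-flight style guard: the first request that misses acquires a lock for that key and computes the value, while the rest wait and reuse the result instead of hammering the DB. A rough in-process sketch (a distributed cache would need a distributed lock instead; `fetch_from_db` is an illustrative callback):

```python
import threading

class SingleFlightCache:
    """On a cache miss, the first caller fetches from the DB; concurrent
    callers for the same key block and reuse the freshly cached value."""

    def __init__(self, fetch_from_db):
        self._fetch = fetch_from_db
        self._cache = {}
        self._key_locks = {}            # one lock per resource key
        self._guard = threading.Lock()  # protects the lock table

    def _lock_for(self, key):
        with self._guard:
            if key not in self._key_locks:
                self._key_locks[key] = threading.Lock()
            return self._key_locks[key]

    def get(self, key):
        value = self._cache.get(key)
        if value is not None:
            return value
        with self._lock_for(key):         # only one fetch per key at a time
            value = self._cache.get(key)  # re-check: another thread may have
            if value is not None:         # filled it while we waited
                return value
            value = self._fetch(key)
            self._cache[key] = value
            return value
```

The re-check after acquiring the lock is the important part: waiters should read the value that the first request just cached instead of repeating the DB call.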

Now we dig into the interesting part:
Can we build a system where a cache miss never happens? Or at least never happens for resources that are in high demand?

  • Update the cache as part of every write request rather than invalidating the cache entry. So for updates like POST or PUT requests, the cache should be updated with the new value. For a DELETE request, the cache should be updated with a tombstone value indicating that the key is deleted, so the DB shouldn't be scanned for it (see the write-path sketch after this list).
  • Another approach, if your system can work without strong consistency, is to have a job that refreshes the cache at regular intervals. This can run on a batch schedule or be triggered when a cache entry is about to expire.
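A rough sketch of these two ideas, using the same illustrative cache and DB clients as before; the TOMBSTONE marker and the periodic refresh job are assumptions, not a specific framework:

```python
TOMBSTONE = "__deleted__"   # sentinel meaning "this key is gone, skip the DB"

def handle_write(resource_id, new_value, cache, db):
    db.save(resource_id, new_value)
    cache.set(resource_id, new_value)       # update the cache, don't invalidate

def handle_delete(resource_id, cache, db):
    db.delete(resource_id)
    cache.set(resource_id, TOMBSTONE)       # readers see "deleted" without a DB scan

def get_resource(resource_id, cache, db):
    value = cache.get(resource_id)
    if value == TOMBSTONE:
        return None                         # deleted: no DB lookup
    if value is not None:
        return value
    value = db.fetch(resource_id)
    cache.set(resource_id, value)
    return value

def refresh_hot_keys(hot_keys, cache, db):
    # periodic job (cron / scheduler): re-populate entries before they expire
    for key in hot_keys:
        cache.set(key, db.fetch(key))
```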

Wouldn't this result in the cache running out of memory?
Well, if your DB has 1 million records, there is a very low chance that your system will see 1000 concurrent requests for all one million records at the same time. This is where some smart data decisions need to be made. At what point do you apply the no-cache-miss approach to a DB record? That call can be driven by metrics from your analytics team or by understanding the kind of traffic your application actually sees.
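One illustrative way to make that call is a rolling request counter per key, where only keys above a threshold get the never-miss treatment; the threshold and the in-memory counter here are assumptions (in practice this would come from your metrics pipeline):

```python
from collections import Counter

HOT_KEY_THRESHOLD = 500     # requests per window; tune from observed traffic

request_counts = Counter()  # stand-in for a real metrics / analytics store

def record_request(resource_id):
    request_counts[resource_id] += 1

def hot_keys():
    # keys busy enough to deserve pre-warming and the no-cache-miss path
    return [k for k, n in request_counts.items() if n >= HOT_KEY_THRESHOLD]
```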