In a previous blog post, we discussed how quorums play a huge part in multi-node systems. To revisit the concept, we just need to focus on three attributes that define a quorum.
- N – Total number of nodes in our system
- W – Minimum number of nodes required for a successful write operation
- R – Minimum number of nodes required for a successful read operation
To maintain a successful quorum, we have to ensure that R + W > N. Even though this criterion looks achievable at first glance, we start seeing its limitations once our system is deployed on more than one cluster of nodes, or once we start seeing frequent node failures. Strict quorums are not resilient to network partitions, and we can easily end up in a scenario where the nodes from one cluster are unable to communicate with nodes from another cluster. This results in a failure to reach a quorum, and read and write requests from the client start failing.
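The overlap condition can be written as a one-line check (a sketch; the function name is ours, not from any library):

```python
# Hypothetical helper illustrating the quorum condition R + W > N.
def is_strict_quorum(n: int, w: int, r: int) -> bool:
    """Any read set of R nodes and write set of W nodes are guaranteed
    to overlap in at least one node exactly when R + W > N."""
    return r + w > n

# N = 5 with W = 3, R = 3: any 3 readers must share a node with any 3 writers.
print(is_strict_quorum(5, 3, 3))   # True
# N = 5 with W = 2, R = 2: a read may miss the latest write entirely.
print(is_strict_quorum(5, 2, 2))   # False
```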
Consider a scenario where we have assigned N nodes to a particular key. These N nodes are spread across two clusters, and neither cluster holds more than W - 1 nodes for this key. Whenever we receive a request from the client to update the key, we send it to these N nodes, and if we receive a confirmation from W of these N nodes, we consider the update successful. Now, if we encounter a network failure between the two clusters, we won't be able to reach a quorum for any upcoming write request, as we are unable to communicate across all N nodes and gather at least W votes.
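The failure above can be sketched in a few lines of Python (the node names and helper are hypothetical): with N = 5 and W = 3, a partition that leaves only two replicas reachable dooms every write.

```python
# Illustrative sketch: a strict-quorum write fails when a partition
# leaves fewer than W reachable replicas.
def strict_quorum_write(replicas, reachable, w, key, value, store):
    acks = 0
    for node in replicas:
        if node in reachable:            # only reachable nodes can ack
            store[(node, key)] = value
            acks += 1
    return acks >= w                     # success requires W acks

store = {}
replicas = ["n1", "n2", "n3", "n4", "n5"]   # N = 5, W = 3
# Partition: our client can only reach the cluster holding n1 and n2.
ok = strict_quorum_write(replicas, {"n1", "n2"}, 3, "k", "v1", store)
print(ok)   # False – only 2 acks, so no quorum is reached
```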
To handle the above issue, we have a modified version of quorum called a sloppy quorum. Under a sloppy quorum, when we are unable to reach all N nodes due to a partition failure, we temporarily store the updates on backup nodes. These backup nodes were not originally responsible for storing updates for the key in question; they accept the updates only in case of partition failure. Each update is stored along with metadata describing the original node on which the key was supposed to be stored.
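A minimal sketch of a sloppy-quorum write, assuming a simple mapping from each unreachable original node to a backup node in our own cluster (all names are illustrative):

```python
# Illustrative sketch of a sloppy-quorum write. Unreachable replicas
# are substituted with backup nodes, which record the value plus a
# "hint" naming the node the update was originally intended for.
def sloppy_quorum_write(replicas, reachable, backups, w, key, value,
                        store, hints):
    acks = 0
    for node in replicas:
        if node in reachable:
            store[(node, key)] = value
            acks += 1
        else:
            backup = backups[node]       # backup in our own cluster
            hints.setdefault(backup, []).append((node, key, value))
            acks += 1                    # the backup's ack counts
    return acks >= w

store, hints = {}, {}
replicas = ["n1", "n2", "n3", "n4", "n5"]       # N = 5, W = 3
backups = {"n3": "b1", "n4": "b2", "n5": "b3"}  # hypothetical backups
ok = sloppy_quorum_write(replicas, {"n1", "n2"}, backups, 3,
                         "k", "v1", store, hints)
print(ok)   # True – backup nodes supply the missing acks
```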
So let us now revisit the above example. We had N nodes spread across two clusters, P1 and P2. Suppose we are unable to reach the P2 cluster; in order to save an update for a key, we will go through the following flow:
- Store the update on the original nodes for the key in the P1 cluster.
- Store the update on backup nodes in the P1 cluster. Each backup node maps to an original node in the P2 cluster, and the update carries metadata about which P2 node it originally belongs to.
With the above flow, we are able to reach a quorum and process the write request for the client. Each backup node will periodically ping its original node in the P2 cluster to check whether the partition failure is resolved. Once a backup node is able to communicate again, it shares the update with the original node in the P2 cluster and removes the update from its own storage. This process of sharing the update after failure resolution is called hinted handoff.
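Hinted handoff itself can be sketched as a small loop that a backup node runs once it can reach the original nodes again (names are illustrative):

```python
# Illustrative sketch of hinted handoff: the backup node forwards each
# hinted update to its now-reachable original node, then drops its
# local copy; hints for still-unreachable nodes are kept for later.
def hinted_handoff(backup, hints, reachable, store):
    remaining = []
    for original, key, value in hints.get(backup, []):
        if original in reachable:        # partition healed for this node
            store[(original, key)] = value
        else:
            remaining.append((original, key, value))
    hints[backup] = remaining

hints = {"b1": [("n3", "k", "v1")]}
store = {}
hinted_handoff("b1", hints, {"n3"}, store)   # n3 is reachable again
print(store)    # {('n3', 'k'): 'v1'}, and the hint has been cleared
```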
In simple words…
A real-life example of sloppy quorum and hinted handoff is a coworker taking a message on your behalf while you are out on a break and sharing it with you once you are back. If you didn't have such an understanding coworker, you might have missed the message completely. You would have to figure out on your own what messages you missed, and you would be scared to leave your desk for a break in the future.
So why not always use this combination of sloppy quorum and hinted handoff instead of the boring old quorum? While this variation is great for a write-heavy system, it might not be the best solution if your system requires consistent reads. During a network failure, your system will mark a write operation as successful by using a sloppy quorum, but note that not all of the nodes for this record are in sync with the most recent update. So even though clusters in your system are unable to communicate with each other, a client might send a read request to an outdated cluster and receive a stale value. Systems that can tolerate eventually consistent results are the best candidates for this methodology. One example of such a system is Dynamo, which leverages sloppy quorum and hinted handoff to overcome temporary node and partition failures.
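The stale-read risk shows up in a tiny sketch (node names hypothetical): a replica on the far side of the partition never received the sloppy write, so it keeps serving the old value until handoff completes.

```python
# Illustrative sketch: n3 sits on the far side of the partition and
# never received the sloppy-quorum write of "v1", so it still serves
# the pre-partition value "v0" until hinted handoff catches up.
def read_one(store, node, key):
    return store.get((node, key))

store = {("n3", "k"): "v0"}             # n3 missed the update to "v1"
print(read_one(store, "n3", "k"))       # "v0" – a stale read
```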
As an application developer, you will have to make the call about using a sloppy quorum over a strict quorum based on the requirements of your system. If you have strict requirements on showing the most consistent state to the end user, then stick to strict quorums and ask the client to retry in case of node or network failures. But if you are okay with eventually consistent results, then you can start exploring the combination of sloppy quorum and hinted handoff to build a highly available system.