Distributed Consensus: Why do we need everyone to agree?

This is first part of a series of posts about distributed consensus. I plan to cover distributed consensus in detail along with a deep-dive into Raft (A well-known consensus algorithm). I also plan to finish this series with a code-walkthrough of the Raft implementation in Java along with the resources that I referenced while studying the topic. Stay tuned.

Consider you went to your office after a long break post covid. You enter into a cafeteria and you seem to encounter a colleague. You both plan on grabbing a bite before you enter into a never-ending conversation about covid. In order to do that you need to decide what to eat. You suggest a sandwich and your colleague suggests a salad.

One catch is that you can only order one dish as the cafeteria has a very simple system for placing the order: A dish name and count of people who are gonna eat it. As your colleague is gonna end up paying, he has more control over the decision. So you guys end up deciding to go for a salad. Sounds simple enough?

Now as your colleague was going to order, you see your complete project team entering the cafeteria. You look back and you can’t seem to find your colleague who was going to pay for the order (Well no one wants to pick such a big check). Now you already were set to go for a salad but now there are a couple of questions you need to answer before you start eating:

  • What to order? Remember that you can only order one dish and the majority needs to agree on that?
  • Who is going to place the order? The colleague who was gonna pay initially has vanished so someone needs to become the leader of the group and take the responsibility of paying for the order.

This is somewhat similar to what happens in a distributed systems. Let’s consider a scenario where client(Order counter) interacts with one server node for all its operations. This server node acts as a leader node (Person who pays and interacts with ordering system) and replicates the operation onto other follower server nodes.

Now with time any of these nodes can go down in a distributed environment. So there are two main questions that are needed to be answered:

  • Given a request what should be the response? It should be what majority nodes in server cluster agree to and leader node should be responsible for ensuring that majority agrees i.e Consensus.
  • What happens if the leader server goes down? Just like your colleague, a leader server can go down and someone has to step up to take the responsibility of becoming a leader server. How does this actually happen?

There is a family of protocols that solve the above problems. Paxos, Raft to name a few. Raft is much easier to understand and implement. There are also a lot of use cases of distributed consensus. One of well-known one being how the consensus protocol ensures that every new block that is being added to a Blockchain is the correct version of the truth that is agreed upon by all the nodes in the Blockchain.