This is the second part of a series of posts about distributed consensus. I plan to cover distributed consensus in detail, along with a deep dive into Raft (a well-known consensus algorithm). I also plan to finish the series with a code walkthrough of a Raft implementation in Java, along with the resources I referenced while studying the topic. Stay tuned.
Let’s revisit the problems we are facing in our cafeteria before we can start our lunch:
- How to decide what to order?
- Who is going to take responsibility for asking everyone and placing the order?
In this post we are going to tackle the first question, i.e. how to decide what to order for lunch. We have our teammates with us. Our decision is based on a simple criterion: the dish with the most votes wins. So let us break down how the voting will work. For now, we can assume that you are responsible for orchestrating the whole process of asking everyone what they want to eat and then finalizing the result. Keep in mind that choosing a leader is a challenge of its own, which we will tackle in a future post. What are the most basic steps you will perform for this operation?
- Ask everyone what they want to eat.
- Keep track of votes for each dish.
- Work out which dish won with the most votes (assuming there is no tie).
Is that going to be all? Let’s say I voted for a double cheeseburger (I don’t care about the diet), but the votes were in favor of a healthy arugula salad. Although I respect the voting system, I would like to be informed beforehand that we are having a boring salad, so that I can start preparing my taste buds for the disappointment they are going to face.
So an additional task for you is to broadcast to everyone what we have decided to eat by consensus, so that we are all well informed about the result of the vote.
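The cafeteria steps so far (ask, tally, pick the winner, broadcast) can be sketched in a few lines of Java. This is only an illustration of the analogy: the class `LunchVote`, its methods `decide` and `announce`, and the teammate names are all made up for this example, and, like the list above, it assumes there is no tie.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for the cafeteria example; all names are illustrative.
final class LunchVote {

    // Counts one vote per entry and returns the dish with the most votes.
    // Like the post, this assumes there is no tie.
    static String decide(List<String> votes) {
        if (votes.isEmpty()) {
            throw new IllegalArgumentException("nobody voted");
        }
        Map<String, Integer> tally = new LinkedHashMap<>();
        for (String dish : votes) {
            tally.merge(dish, 1, Integer::sum); // keep track of votes per dish
        }
        String winner = null;
        int best = 0;
        for (Map.Entry<String, Integer> entry : tally.entrySet()) {
            if (entry.getValue() > best) {
                best = entry.getValue();
                winner = entry.getKey();
            }
        }
        return winner;
    }

    // Builds one announcement per teammate so everyone learns the result.
    static List<String> announce(List<String> teammates, String winner) {
        List<String> messages = new ArrayList<>();
        for (String name : teammates) {
            messages.add(name + ", we are ordering: " + winner);
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> votes = Arrays.asList(
                "arugula salad", "double cheeseburger", "arugula salad");
        String winner = decide(votes);
        for (String message : announce(Arrays.asList("Ann", "Bob", "Cy"), winner)) {
            System.out.println(message);
        }
    }
}
```

Keeping `announce` separate from `decide` mirrors the point of the story: deciding the result and telling everyone about it are two distinct steps.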
Something similar happens when we are dealing with distributed systems. When a client interacts with the leader server by sending a request (say, SET A to 1), the leader server has to perform actions somewhat similar to what you were doing when deciding what to order:
- Get a consensus from peer servers about the operation
- Once it has a majority of votes agreeing on the operation, it performs the operation on the key-value store by setting A to 1
- Inform all the peer nodes that the operation has been committed so that the follower nodes can also update their key-value stores
- Return a response to the client (this could also be done before informing the peer nodes about the consensus result)
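To make the four steps above concrete, here is a minimal in-memory sketch in Java. It is not the code from this series and not the API of any real Raft library: the `Peer` interface, `InMemoryPeer`, and `Leader` are hypothetical names, peers are plain objects instead of remote servers, and the sketch assumes the leader counts its own vote towards the majority.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical view of a peer server; not the API of any real Raft library.
interface Peer {
    boolean accept(String key, int value); // step 1: does the peer agree?
    void commit(String key, int value);    // step 3: apply the committed operation
}

// In-memory stand-in for a follower. An unreachable peer never agrees.
final class InMemoryPeer implements Peer {
    final Map<String, Integer> store = new HashMap<>();
    private final boolean reachable;

    InMemoryPeer(boolean reachable) {
        this.reachable = reachable;
    }

    public boolean accept(String key, int value) {
        return reachable;
    }

    public void commit(String key, int value) {
        if (reachable) {
            store.put(key, value);
        }
    }
}

final class Leader {
    final Map<String, Integer> store = new HashMap<>();
    private final List<Peer> peers;

    Leader(List<Peer> peers) {
        this.peers = peers;
    }

    // Returns true when the operation was committed, false otherwise.
    boolean set(String key, int value) {
        int votes = 1; // assumption: the leader counts its own vote
        for (Peer peer : peers) {
            if (peer.accept(key, value)) {
                votes++;
            }
        }
        int clusterSize = peers.size() + 1;
        if (votes <= clusterSize / 2) {
            return false; // no majority, so nothing is applied
        }
        store.put(key, value);       // step 2: apply on the leader
        for (Peer peer : peers) {
            peer.commit(key, value); // step 3: inform the followers
        }
        return true;                 // step 4: respond to the client
    }

    public static void main(String[] args) {
        List<Peer> peers = new ArrayList<>();
        peers.add(new InMemoryPeer(true));
        peers.add(new InMemoryPeer(false)); // one follower is down
        Leader leader = new Leader(peers);
        System.out.println(leader.set("A", 1)); // 2 of 3 servers agree
    }
}
```

With one of the two followers down, the leader still reaches 2 out of 3 votes, so the operation goes through. With both followers down, `set` returns false and the leader's store is left untouched.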
The steps described above are how log replication for a client operation happens in the Raft protocol. The leader first sends append-entries messages for the operation it received from the client to all of its follower servers. Follower servers are responsible for processing these entries and returning an acknowledgement. Once the leader has received acknowledgements from a majority (2 out of 3 servers in the diagram above), it applies the client operation to its own state machine. At the same time, it sends a message to all followers saying that the client operation has succeeded, so that the followers can update their state machines accordingly.
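The follower's side of this exchange can be sketched as follows. This is a deliberately simplified illustration of the happy path only: `LogEntry`, `Follower`, `Replication`, and their methods are hypothetical names, there are no terms, consistency checks, or persistence, and the sketch assumes the leader's own copy of the entry counts towards the majority.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One replicated client operation, e.g. SET A to 1. Illustrative only.
final class LogEntry {
    final String key;
    final int value;

    LogEntry(String key, int value) {
        this.key = key;
        this.value = value;
    }
}

// Simplified follower: no terms, no consistency checks, no persistence.
final class Follower {
    private final List<LogEntry> log = new ArrayList<>();
    private final Map<String, Integer> stateMachine = new HashMap<>();
    private int applied = 0; // number of log entries already applied

    // Stores the entries sent by the leader and acknowledges them.
    boolean appendEntries(List<LogEntry> entries) {
        log.addAll(entries);
        return true;
    }

    // Applies every stored entry up to the leader's commit point.
    void onCommit(int committedCount) {
        int limit = Math.min(committedCount, log.size());
        while (applied < limit) {
            LogEntry entry = log.get(applied);
            stateMachine.put(entry.key, entry.value);
            applied++;
        }
    }

    // Returns the applied value for the key, or null if none was applied.
    Integer read(String key) {
        return stateMachine.get(key);
    }
}

final class Replication {
    // True when the acknowledgements exceed half of the cluster.
    static boolean hasMajority(int acks, int clusterSize) {
        return acks > clusterSize / 2;
    }

    public static void main(String[] args) {
        Follower follower = new Follower();
        List<LogEntry> entries = new ArrayList<>();
        entries.add(new LogEntry("A", 1));

        int acks = 1; // assumption: the leader's own copy counts
        if (follower.appendEntries(entries)) {
            acks++;
        }
        // A second follower has not answered yet; 2 of 3 is still a majority.
        if (hasMajority(acks, 3)) {
            follower.onCommit(1);
            System.out.println("follower sees A = " + follower.read("A"));
        }
    }
}
```

The key point is the two-phase shape: a follower stores an entry as soon as it arrives, but only applies it to its state machine after the leader says the entry is committed.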
This is what the happy path of log replication in the Raft protocol looks like. Replicating the log this way is helpful when the leader server goes down and one of the follower servers has to take over the role of leader. However, there are some scenarios we have not yet accounted for:
- What happens when a follower server comes up after the leader has executed a bunch of operations? In other words, how does a follower server catch up on past operations?
- How does the leader ensure that a follower server does not contain an incorrect set of logs? The leader is also responsible for fixing follower nodes that have become inconsistent due to failures, or that missed previous messages from the leader due to network issues.
I will cover these scenarios in my next post. I will also cover them at the code level in future posts, which will describe how we handle such scenarios during the implementation phase.