Replication: Can we have more than one leader? – Distributed Computing Musings

Answer to above question is a resounding YES! Up until now we have covered leader-follower based replication where all the writes go through the leader and we can scale the read traffic by distributing it among follower nodes. Although in above system we can very easily point out a single point of failure which is a single leader node. If our leader goes down there is no alternative to serve our write requests other than appointing a new leader. And as we understand from handling node failure for leader node, replacing a failed leader node comes with its own set of edge cases including:

Correctly deciding about failure of leader node
Ensuring that the failed leader doesn’t comes back up and starts acting as a leader after leader election
Have a robust leader election process

So all the above mentioned problems can be solved if we can just have multiple leaders. An extension to leader-follower replication is multi-leader replication. In this replication mechanism replication still happens in the same way where a leader replicates writes to all the followers. But now you have an option to fallback on other leader nodes if one of the leaders fails. In a multi-leader replication, each of the leaders individually also acts as followers to other leaders so that all leaders are also in sync with each other.

This mechanism brings about some good use cases that can be solved if we have a system with multiple leaders. Some of them are as follows:

System spread across multiple data centers

In today’s global ecosystem, storing the leader node in one data center can expose the application to downtime when that data center goes down. Also it increases latency of write requests if the request has to travel across the globe to the leader node situated in a single data center. In order to overcome the above two challenges, we can leverage multi-leader replication by distributing multiple leaders in various data centers across the globe. So in case of a data center failure, write requests can be routed to another data center. Also having multiple leaders can help in routing write requests to the closest leader in order to limit the network latency in comparison to a single leader system.

Offline operations

Applications that are expected to function correctly in offline mode can benefit greatly from multi-leader replication. For example the calendar app we use to schedule meetings and view our day/week/month should function even without internet access. They should also get synced with other calendars when they gain internet access.

So in offline mode we should be able to create meetings(WRITE) and view(READ) our schedule. And when we go online our calendar should reflect meetings where we were invited by other users of the application. In this case the calendar application itself acts as a leader by storing the schedule data on its local storage. When we go online, this local leader node syncs with other calendars(who also have their respective calendar nodes in their local storage) to provide you a consistent view of your calendar.

Collaborative Editing

Shared document editor applications such as Google Docs allow multiple users to update a document simultaneously. In this case, the user’s application or browser acts as a local replica. It stores the changed state and then replicates these changes to other owners of documents. This avoids locking a resource being edited by one user which otherwise will end up in making that resource inaccessible to other users. Consider unable to edit a line in Google Doc because another user is already writing on the same line.

One problem that is very visible from above examples is that there is a very high chance of more than one user updating the same resource and ending up in a state of conflict. Consider you adding a no-meeting block on your calendar in offline-mode vs a coworker adding you to a meeting during the same block. When you both go online there is a conflict in the calendar application on how to represent these two meetings. Think about multiple users typing at the same time on a shared doc editor. What happens if each of them types a different alphabet? What should be the global state of the doc? What state should the users see? What happens if one user enters a keyword and at the same time another user presses backspace at the same coordinate?

When concurrent requests land on two leader nodes for the same resource then there should be some mechanism to pick up one of the requests and discard others. This poses a challenging problem of conflict resolution in a multi-leader system which can introduce new kinds of bugs in our system and present a confusing state to our users.

So even though we solve the problem of single point of failure and latency that we were facing in the single leader system, we come face to face with another challenging problem of handling conflicts due to writes on multiple leader nodes. In upcoming posts we will see what approaches can we use to handle this write conflict and present a consistent state to all the users in a multi-leader replication system.