Failure is the norm in distributed systems. It’s a question of when, not if. These failures can happen for various reasons. Node failure…
Replication: Challenges in onboarding a new follower
In a leader-follower system for replication, a typical flow looks as follows: the leader accepts all update operations from clients; the leader replicates any…
An introduction to replication
As long as we are dealing with a system where our database lives on a server which never fails and the data that database stores…
Distributed Consensus: How to deal with disagreement?
This is the third part of a series of posts about distributed consensus. I plan to cover distributed consensus in detail along with a deep-dive into Raft (A…
Distributed Consensus: How to decide what everyone agrees on?
This is the second part of a series of posts about distributed consensus. I plan to cover distributed consensus in detail along with a deep-dive into Raft (A…
Distributed Consensus: Why do we need everyone to agree?
This is the first part of a series of posts about distributed consensus. I plan to cover distributed consensus in detail along with a deep-dive into…
Thundering Herd/Cache Stampede
What is the most common solution you have heard for scaling a system with a high number of read requests for a resource that gets computed…