Transactions: Why we need all or nothing? – Distributed Computing Musings

In real life applications, things can go wrong for scenarios which you have never considered while designing the application. These failures can happen due to

Server failure
Network failure
Concurrent requests trying to update the same resource

Any request that requires doing more than one thing to fulfill the operation, can push the system in an inconsistent state in case of above failures. We can tolerate the complete failure of the request and allow the client to retry the request. But what we cannot tolerate is the request failing while creating some side-effects and leaving the system in a different state as compared to when the request was received by the system. So the requirement for our system is to perform an operation completely or don’t do anything and just fail.

This is the bare minimum definition of a transaction. Transaction in its most simplest form has just two states

Success
Failure (With no side-effects)

If we are able to achieve it then we can be confident that if there was a failure then we can ask the client to retry without being concerned about the state of the system. Consider an e-commerce website where order flow looks as below (Please note that this is a simplified flow. Actual systems are much more complex)

Read from ItemStock table to check if item is in stock
Insert into Payment table to record an entry for item cost against customer’s payment method
Insert into Order table an entry for item against customer_id
Update the stock of item in ItemStock table
Insert into Delivery table a record for item against custom’s delivery address

All the above updates could be done as part of a single system or by various micro-services making updates in their respective databases. For simplicity sake let us consider that we are doing these updates as a single monolithic application to a single database node. Failure in either of the step requires reverting the changes in state that have been made as part of previous steps. Not doing this cleanup will leave our system in an inconsistent state after which we cannot consider our data store as a source of truth. This will lead to a bad customer experience as they too will be facing the side-effects of an inconsistent system.

In the above system which does not perform cleanup after a transaction has failed can lead to side-effects such as below:

ItemStock table now has one less in count for item_A even though item_A is not actually sold and delivered to any customer.
Customer has been charged because an entry is inserted in the Payment table. This will result in customer seeing a charge on their end for an item which they actually didn’t received.
If the initial count of item_A was 1 in ItemStock table then the customer will now get ITEM_OUT_OF_STOCK error while trying to place an order for item_A. Remember that we have just decremented the count for item_A but it was never delivered to the customer.
We will see discrepancy during reconciliation due to the fact that we won’t have a record for item_A in Delivery table but we will have a record in Order table.
Customer can reach out to support and the best we can do is to issue them a refund. So now we don’t have a net positive balance for item_A in Payment table but we do have one less count of item_A and an entry for the same in Order table.

It is a bad experience for both the customer and the e-commerce company. Customer is not able to place the order and e-commerce has to deal with an item just sitting in the inventory which cannot be sold due to a broken system. The expected outcome from our system is that we start with a clean slate once we face a failure rather than deal with an inconsistent system.

Most of the mainstream database applications provide support for transactions. Almost all the programming languages have wrappers and libraries that make it easy for developers to create transactions. Cloud databases such as DynamoDB also provide support for transactions. For now we have considered just one transaction but in a real-life application our system will have to deal with multiple transactions in parallel. There are various problems a database system needs to solve in order to build a robust storage system

Should one transaction see the updates done by another transaction if it has not been committed yet?
Should each transaction have their own copy of database system view for consistency purposes?
How do we deal with two transactions trying to update the same resource? Eg two clients trying to place item_A in the above system.

There are lot of interesting concepts that handle such scenarios and databases provide you with ability and freedom to choose from a wide-array of solutions to tackle these problems. As a developer we might be dealing with a higher level of abstraction in our development workflow but knowing what goes under the hood can help us to tackle the non-obvious issues and bugs. In this blog series we will cover the problems that can happen during a transaction in storage system and how does systems solve them.

Also note that transactions become a bigger challenge when the data updates are not part of a single database but across multiple micro-services handling their respective database nodes. Consider the above e-commerce website where

ItemStock service is responsible for reading & updating the item count
Payment service is responsible for charging customer and creating payment record
Order service is responsible for creating order records
Delivery service is responsible for scheduling a delivery and adding delivery records in database

Now we need to handle multiple service calls which comes with their own challenges where each service can fail individually or service can perform the operation on their end successfully but we are unable to receive a successful response due to network issues. Clean up in case of failure also becomes challenging when we are dealing with multiple nodes. This is also known as Distributed Transaction. I will cover this after covering the challenges that come with creating a transaction in a single node database system.