In real life applications, things can go wrong for scenarios which you have never considered while designing the application. These failures can happen due to
- Server failure
- Network failure
- Concurrent requests trying to update the same resource
Any request that requires doing more than one thing to fulfill the operation, can push the system in an inconsistent state in case of above failures. We can tolerate the complete failure of the request and allow the client to retry the request. But what we cannot tolerate is the request failing while creating some side-effects and leaving the system in a different state as compared to when the request was received by the system. So the requirement for our system is to perform an operation completely or don’t do anything and just fail.
This is the bare minimum definition of a transaction. Transaction in its most simplest form has just two states
- Success
- Failure (With no side-effects)
If we are able to achieve it then we can be confident that if there was a failure then we can ask the client to retry without being concerned about the state of the system. Consider an e-commerce website where order flow looks as below (Please note that this is a simplified flow. Actual systems are much more complex)
- Read from
ItemStock
table to check if item is in stock - Insert into
Payment
table to record an entry for item cost against customer’s payment method - Insert into
Order
table an entry for item against customer_id - Update the stock of item in
ItemStock
table - Insert into
Delivery
table a record for item against custom’s delivery address
All the above updates could be done as part of a single system or by various micro-services making updates in their respective databases. For simplicity sake let us consider that we are doing these updates as a single monolithic application to a single database node. Failure in either of the step requires reverting the changes in state that have been made as part of previous steps. Not doing this cleanup will leave our system in an inconsistent state after which we cannot consider our data store as a source of truth. This will lead to a bad customer experience as they too will be facing the side-effects of an inconsistent system.
In the above system which does not perform cleanup after a transaction has failed can lead to side-effects such as below:
ItemStock
table now has one less in count foritem_A
even though item_A is not actually sold and delivered to any customer.- Customer has been charged because an entry is inserted in the
Payment
table. This will result in customer seeing a charge on their end for an item which they actually didn’t received. - If the initial count of
item_A
was 1 inItemStock
table then the customer will now getITEM_OUT_OF_STOCK
error while trying to place an order for item_A. Remember that we have just decremented the count foritem_A
but it was never delivered to the customer. - We will see discrepancy during reconciliation due to the fact that we won’t have a record for
item_A
inDelivery
table but we will have a record inOrder
table. - Customer can reach out to support and the best we can do is to issue them a refund. So now we don’t have a net positive balance for
item_A
inPayment
table but we do have one less count ofitem_A
and an entry for the same inOrder
table.
It is a bad experience for both the customer and the e-commerce company. Customer is not able to place the order and e-commerce has to deal with an item just sitting in the inventory which cannot be sold due to a broken system. The expected outcome from our system is that we start with a clean slate once we face a failure rather than deal with an inconsistent system.
Most of the mainstream database applications provide support for transactions. Almost all the programming languages have wrappers and libraries that make it easy for developers to create transactions. Cloud databases such as DynamoDB also provide support for transactions. For now we have considered just one transaction but in a real-life application our system will have to deal with multiple transactions in parallel. There are various problems a database system needs to solve in order to build a robust storage system
- Should one transaction see the updates done by another transaction if it has not been committed yet?
- Should each transaction have their own copy of database system view for consistency purposes?
- How do we deal with two transactions trying to update the same resource? Eg two clients trying to place
item_A
in the above system.
There are lot of interesting concepts that handle such scenarios and databases provide you with ability and freedom to choose from a wide-array of solutions to tackle these problems. As a developer we might be dealing with a higher level of abstraction in our development workflow but knowing what goes under the hood can help us to tackle the non-obvious issues and bugs. In this blog series we will cover the problems that can happen during a transaction in storage system and how does systems solve them.
Also note that transactions become a bigger challenge when the data updates are not part of a single database but across multiple micro-services handling their respective database nodes. Consider the above e-commerce website where
ItemStock
service is responsible for reading & updating the item countPayment
service is responsible for charging customer and creating payment recordOrder
service is responsible for creating order recordsDelivery
service is responsible for scheduling a delivery and adding delivery records in database
Now we need to handle multiple service calls which comes with their own challenges where each service can fail individually or service can perform the operation on their end successfully but we are unable to receive a successful response due to network issues. Clean up in case of failure also becomes challenging when we are dealing with multiple nodes. This is also known as Distributed Transaction. I will cover this after covering the challenges that come with creating a transaction in a single node database system.