Working with various isolation levels, we have seen a compromise being done between getting things correct and getting things fast. It feels like being stuck between a rock and a hard place. If we choose a weaker isolation level then we run the risk of lost updates, write skews. If we go towards a serializable isolation then we tend to slow down the performance of our system even though the system works correctly.
Serializable snapshot isolation(SSI for short) brings out the best of both worlds. It provides a mechanism using which we can avoid issues such as lost updates, write skews but not at the cost of becoming a bottleneck for our application. It is a fairly recent concept but has been adopted by databases such as FoundationDB.
Previously when we saw serializable isolation, it resembled to great extent with pessimistic concurrency control. The isolation level assumes that bad things are going to happen as a result of concurrency and the best way to prevent them is to disallow concurrency altogether. Though it does work in practice, it comes at a cost of performance. If we look closely then serializable isolation doesn’t actually handle concurrency. It just disallows it due to which we don’t face the issues. But this solution doesn’t scale well as we are bound to receive concurrent requests and we don’t want clients to get frustrated by being blocked as our database is busy processing a request at that time.
On the other hand, optimistic concurrency control allows the concurrent transactions to move forward instead of blocking them. At time of commit, it verifies for any violations and if there is any violation then it aborts the transaction which needs to be tried again. Optimistic concurrency control works well if there is a low amount of contention between transactions for the resource they are trying to access. But if this contention is high then it can lead to large numbers of transactions being aborted and then being retried altogether in turn overloading the database system.
SSI is a flavor of optimistic concurrency control. It combines optimistic concurrency control with snapshot isolation. So all reads within a transaction are done from a consistent snapshot and writes go through optimistic concurrency control where conflicting transactions are aborted. Major challenge for SSI is to detect when a transaction enters the conflict zone. It does this by:
Detecting stale MVCC reads
If two transactions are trying to update the same resource under their respective snapshots, SSI keeps track of all the instances where it ignores another transaction’s write while processing the read/write operation for a transaction. For example:
inventory_table(There is no longer an
T2reads to check if
itemAis available in
inventory_table(Result will be true as
T2is reading from its consistent snapshot)
inventory_table(The write is successful due to consistent snapshot but the database detects that it is working on a stale result)
T2tries to commit and is aborted
Why does the database wait for a transaction to commit before aborting it? Why not abort it as soon as it sees T2 acting on stale data?
- It can be the case that T1 might have itself aborted the transaction
- It can very well be the case that T2 might decide to abort the transaction
This ensures that the database avoids unnecessary aborts as each abort will result in a retry.
Detecting writes that affect prior reads
SSI also detects writes that modify the data after it has been read. In the on-call example that we saw during introduction to serializability, a worker can go off the on-call if there are more than one workers already on-call. SSI ensures that the transaction that works on modified data should be aborted as it might be acting on previously read data that has now gone stale.
This is done by keeping a record of indexes which concurrent transactions have read. Modification done by one transaction on these indexes enable a flag that notifies other transactions when they try to commit an update on these records. This notification acts in aborting the transaction and retrying it again. As any update to index creates a possibility that prior read is now outdated and hence transaction should be retried to get consistent view of data. Consider the following scenario:
- Transaction T1 checks for list of workers on-call for a shift_id (Count = 2)
- Transaction T2 checks for list of workers on-call for a shift_id (Count = 2)
- Database records the index for shift_id for reference
- T1 removes itself from on-call making the total on-call count to 1
- Database detects this modification
- T2 removes itself from on-call making the total on-call count to 1 (On snapshot level. Actually it makes the count = 0 as T1 has also removed itself from on-call)
- T1 commits
- T2 commits and is aborted by database.
As SSI keeps record of changes across the transactions, this bookkeeping adds an overhead to overall performance of database. We cannot afford to reduce this bookkeeping as it might lead our database to an inconsistent state due to concurrency issues. Compared to two-phase locking, SSI does not blocks read transactions for write and vice-versa. As long as transaction performs read operations, other transaction can update the records. This solves a big performance bottleneck that we saw in 2PL.
Task of detecting conflicts can be scaled by distributing the load across multiple machines to make conflict detection faster. One issue which can arise is due to aborting the transactions. A long running read-write transaction can become an issue if its gets aborted and is required to be retried. For this case SSI requires read-write transactions to be short. Long running read transactions are ok as they run on their respective snapshot.
This finishes an introduction to transactions and various techniques that can be used for handling concurrency on transactions. I feel it’s just scratching the surface and there is lot that can be discussed into building a reliable storage system that can handle transactions correctly and efficiently. I will continue diving more into storage technologies and share my findings as part of this blog post.