Monday, 24 January 2011

Understanding Transactional Anomolies & Isolation Levels

This isn't going to be a particularly technical post, but I have seen some confusion lately about transactional anomalies, and the isolation levels that protect against them, so I first wanted to cover what the three main anomalies are, and then talk about how isolation levels protect against them.

The worst anomaly is a Dirty Read. This is where a transaction reads data that is subsequently rolled back by another transaction, and hence, never exists in the database, as in the example below. This should never be allowed if business decisions are going to be made, based on this data.


DIRTY READ


Transaction 1
Transaction 2


BEGIN TRAN

INSERT INTO myTbl(Col1,Col2)
VALUES (1,2)


BEGIN TRAN

SELECT Col1, Col2
FROM myTbl
WHERE Col1 = 1 AND Col2 = 2

COMMIT TRAN
ROLLBACK



The next anomaly is a non-repeatable read. This is where, within the context of one transaction, the same query is issue twice, but the second time it runs the results are different, due to an UPDATE statement in a different transaction, as in the example below. This can cause an issue if you are performing complex calculations such as compound interest.

NON REPEATABLE READ

Transaction 1
Transaction 2


BEGIN TRAN

SELECT Col1, Col2
FROM myTbl
WHERE Col1 = 1 AND Col2 = 2


BEGIN TRAN

UPDATE myTbl
SET Col1 = 2
WHERE Col1 = 1 AND Col2 = 2

COMMIT TRAN
SELECT Col1, Col2
FROM myTbl
WHERE Col1 = 1 AND Col2 = 2

COMMIT TRAN



A phantom read is similar to a non-repeatable read, but relates to the other transaction inserting or deleting data, as opposed to updating it, such as in the example below. This will cause problems if you are performing very complex calculations, such as actuarial calculations.

PHANTOM READ


Transaction 1
Transaction 2


BEGIN TRAN

SELECT Col1, Col2
FROM myTbl
WHERE Col1 = 1 AND Col2 = 2


BEGIN TRAN

INSERT INTO myTbl(Col1,Col2)
VALUES (1,2)

COMMIT TRAN
SELECT Col1, Col2
FROM myTbl
WHERE Col1 = 1 AND Col2 = 2

COMMIT TRAN



So how can we protect against these anomalies? Well, the main tool we have at our disposal is transaction isolation levels. Traditionally, SQL Server has supported the following isolation levels:
Read Uncommitteded
Read Committed (DEFAULT)
Repeatable Read
Serializable
But since SQL Server 2005, the following isolation levels have also been provided:
Read Committed Snapshot (Becomes DEFAULT if turned on)
Snapshot
Read Uncommitted does not protect against any of the aforementioned anomalies, and works by not taking out any locks for read operations. It is known as an "optimistic" concurrency level.
Read Committed will protect against dirty reads, but will allow the other anomalies to occur. This level is the default, and works by taking out a lock for read operations, but then immediately releasing it. This is also an "optimistic" concurrency level.
Repeatable Read is a "pessimistic" concurrency level, and is so called because it will stop all anomalies except for Phantom Reads, but works by taking out locks for read operations and holding them for the duration of the transaction.
Snapshot is also a "pessimistic" concurrency level, and works by locking either an index range (if possible) or the entire table.
It goes without saying that if you choose either of the two pessimistic isolation levels, then you will encounter higher levels of contention, meaning more waits and worse...deadlocks!
So to get around this issue, Microsoft introduced Snapshot and Read Committed Snapshot. These isolation levels offer the same level of protection as Serializable and Read Committed respectively, but without taking out any locks!
So that's fantastic isn't it? Lets just use Snapshot for everything right? Wrong! Like everything, there is always a trade-off, and in this case, it relates to how these two isolation levels work. They both work by taking advantage of TempDB called the Version Store.
If you use Snapshot isolation, then at the beginning of every transaction, SQL takes a "snapshot" of all of the required data and puts it in the version store, and satisfies the query(s) from there. This has tow issues. Firstly, you are generating a lot more I/O, so you must ensure that TempDB is properly sized and scaled, and that the I/O sub-system is man enough to handle the extra workload. The second issue is that this could potentially lead to "lost updates" which are not supported by SQL Server. If they were, this isolation level would be aptly called Chaos. This means that if you update the data in the version store, and in the base table at the same time, your snapshot transaction will throw an error and be rolled back.
Read Committed Snapshot also uses the version store, but this time it takes the snapshot at the beginning of every statement within the transaction, rather than at the beginning of the transaction itself. Worse than that, if you turn Read Committed Snapshot on, you are turning off Read Committed, which means that Read Committed Snapshot becomes the default isolation level, and is used for any transactions where an isolation level is not explicitly stated...Watch your server die!
Quite often, if you need a pessimistic isolation level, you just need it for part of a transaction, or for one or two specific tables. If this is the case, rather than using a transaction isolation level, you can use query hints, which give you much more granular control. More information about query hints and how to use them is available in books online.
I hope this cleared things up for anybody who was not familiar with this topic!

Find my book, Pro SQL Server Administration on Amazon -

America


United Kingdom

No comments:

Post a Comment