Breaking News

Sunday, February 16, 2014

Use of NOLOCK in SQL Server Database - SQL Server 2005, 2008, 2012, 2014

Intro:

In case of OLTP systems, where the person making the recommendation wants to ‘improve’ the performance of the data base by reducing and removing the locks (that are held for SELECT queries), NOLOCK is generally used.

For e.g.:
USE [dbo].[ssamd]
GO
SELECT * FROM [MANF] WITH (NOLOCK);

It’s a known fact that NOLOCK allows dirty reads.  Data that has not been committed can and will, be returned.  In straight view, an insert, update or delete that is in process, will considered for the result set, regardless of the state of the transaction.  This may mean:  returning rows from an INSERT statement that may potentially rollback, or returning rows that are being deleted.

There are times when a query is returning rows that you know for certain are not being modified. Let us consider a real life scenario:  Suppose, you are updating rows for one potential client and need to return rows for a second client.  In this case, will the use of NOLOCK be safe?  The data isn’t being modified for the second client, so you might assume that the returning that data won’t have an opportunity for dirty reads.  Unfortunately, this particular assumption is not correct.

Side effects of NOLOCK (a real-time scenario):

1st Session:

Suppose, there is a table for storing information for a client that represents ‘Points’ which the client has in his system.  In the code below, let’s add in 10 rows for the client with ‘personid’ equal to’ –1’.  Then the script will loop through an INSERT for random clients and adding in more points.  Since the table uses a ‘unique-identifier’ for the clustering key, the inserts will cause frequent page splits.



2nd Session:

In this session, the below query is executed that returns a summary of the data for the client where ‘personid’ equals ‘–1’. In this query, NOLOCK is used (since we are working under the premise that it is safe to use the hint on rows that we know are not affected by the ongoing inserts.).



On the first execution of this query, the loop exited almost immediately.  (See the image below) The first time it ran in less than a second.  Instead of returning a total of $100 in points, the result was $90.



On second execution, the same script returns different results.  Instead of the $90 returned on exit last time, the total is now $110 in points.  Looking at the results, this time it took 7 seconds to get an incorrect total !!



While these two executions did return rather quickly, there were a few subsequent executions that ran for 6, 10, and 24 minutes before incorrect results were encountered.  Even though the length of time increased, the risk for incorrect results was equal.  There is no guarantee that correct or incorrect will results won’t be returned at any time.


Analysis and Cure of the problem:
The cause for the incorrect results (in the example) is the occurrence of page splits from the data modifications.  While the secondary client’s data wasn’t being changed, the pages that contained this data were being changed.  When the pages were being split, all of the data on the pages is affected, even those that are not related to the actual rows being changed.  And depending on the state of the page split and how the data is currently stored, this may result in either more or less rows data being returned.

Ø  The typical protection for this type of operation is the use of ‘READ COMMITTED’; which prevents dirty reads from occurring. Of course, this uses ‘locking’ (which it should) to guarantee that the dirty reads do not occur.  But this provides that guarantee that the data returned is the data that is committed, regardless of whether a page split is moving around your data.

Ø  One of the main concerns that people have with ‘locking’, is the blocking that is associated with two users trying to access the same locked resource.  As an alternative to using NOLOCK, try using ‘SNAPSHOT isolation levelinstead.  Through this, readers won’t block writers; which will reduce the amount of lock blocking on your data platform.


Final words:

So, try not to use NOLOCK if you have a choice. If you have to use NOLOCK, please keep in mind, a page split can change the location of the data being queried, which in turn changes the data in-flight.






Read more ...