Thursday, 24 February 2011

SQL Server Schedulers and CPU Performance

For most applications, Windows handles (as you would expect) the scheduling of work onto each CPU. This uses a preemptive scheduling approach, which means that Windows will give each process a priority based on a combination of resource usage, interactivity, activity and user preference.

Once a process has reached its turn, Windows assigns it to a core and it begins, but the preemptive approach, means that Windows is able to forcibly relinquish this process and let another process take over.

This works well for most applications, but not so well for SQL Server. This is because it treats SQL Servers worker threads with the same priority algorithms as any other processes within Windows, and this can lead to lots of context switching.

To get around this issue, SQLOS has it's own CPU Scheduler, called the SOS Scheduler. This allows SQL Server to schedule it's own processor workload, using a cooperative scheduling algorithm. This means that instead of forcing a thread to yield, it allows to threads to yield voluntarily.

But hold on a moment, surely, this could lead to a situation where a runaway thread just takes control of the entire system and does not let anything else get a look in, right? Thankfully, no! Instead, SQL has a process called the Scheduler Monitor, which is a thread that will periodically check to make sure that active schedulers have yielded since the last time it checked. Unlike other SQL threads, this thread works in preemptive mode, so it can not cause the above issue itself.

So how many schedulers do you have? Is there a one-to-one mapping with CPU cores? Well, not exactly. There will be one scheduler for each CPU core in the server (even if SQL has been configured not to use them, by setting processor affinity), there will be one scheduler for the DAC (so that even if something has gone badly wrong with the schedulers, you can still connect) and then one scheduler for each Resource Monitor.

Resource Monitor is a process that helps control memory pressure in the cache, and there will be one for every NUMA Node within the system, regardless if you have hard or soft NUMA. (Non-Uniform Memory Access is a way improving performance by splitting memory between processors or groups of processors, to avoid cache misses - but I digress further and further, so I will talk about this another time!)

You can see your schedulers by querying the sys.dm_os_schedulers DMV. This will give you lots of information, but there are some columns that are especially worthy of note.

First up, is a column called Status. Here you may see various options made up of ONLINE/OFFLINE and VISIBLE/HIDDEN. If a scheduler is Online, it means the SQL Instance has access to it. Offline, means that you have set CPU Affinity, and SQL is not allowed to use it. The reason it is set to Offline, rather than removed, is because it allows you to change your CPU affinity, without the need to restart the Instance. (Very cool!)

If a scheduler is Visible, it means that it is mapping directly to a CPU Core and will accept user requests. If it is set to hidden, it means it is associated with a Resource Monitor. The scheduler associated with the DAC is easy to identify, as it ends with (DAC) ! ;-)

The next interesting column is preemptive_switches_count. Now, you will remember me telling you earlier that SQL threads are cooperative, rather than preemptive, so why do we have this counter? Well, if SQL needs to run anything outside of the SQL Server environment (such as distributed queries, or certain CLR objects, etc), then it must switch to preemptive mode, so that Windows can schedule it to run.

runnable_tasks_count brings some context to our understanding of SQL Waits. A request in SQL can be in one of three states. RUNNING, RUNNABLE or SUSPENDED. Running is fairly obvious, Suspended means that it is waiting for a resource to become available, (this is where you see wait types associated with disc such as IO_LATCH_STATS, network waits, such as NETWORKIO and transaction log bottlenecks, such as WRITELOG see this post, etc), and Runnable, means that it is ready to run, but waiting for a scheduler to become available.

work_que_count shows you the number of tasks that are assigned to the scheduler and waiting to run. pending_io_count shows the number of IOs that the scheduler has that are waiting to be completed. These can be helpful for diagnosing resource pressure.

Whilst we are on the subject of resource pressure, it is quite common in multi-processor servers to see a large number of CXPACKET Waits. These waits can often be a symptom of badly tuned queries, and can be resolved by introducing proper indexing strategies, etc, but sometimes they can occur as a result of parallelism being a bottleneck. But parallelism is good right? Well not for OLTP systems. It would be nice if there was a way to tell SQL the type of workload to expect, but there isn't. If your queries are tuned, and you are still getting lots of CXPACKET Waits, then basically a query is running in parallel, and one core has already finished it's work, and is waiting to receive a packet from another core, so that it can synchronise, etc. The way to get around this, is to reduce the MAXDOP setting for the query (or the whole instance in some cases).

If however, you reduce your MAXDOP setting, then you can suddenly see your CXPACKET Waits being replaced by SOS_SCHEDULER_YIELD Waits. This is because now you have reduced the parallelism, then you are coming under CPU pressure on the cores that are still being used. Basically it is telling you that a thread has voluntarily yielded, so that another thread can take over, and is now waiting for it's next turn on the Scheduler.

So if you are coming under CPU pressure, what can you do about it? Well, that is another long blog for another time, but here are a few suggestions.

1) Check your benchmarks to see if you can establish a usage increase pattern (You are taking regular benchmarks, right?) You may find that it is a natural curve, or you may be able to correlate a sharp spike to a particular release, and then go troubleshooting...

2) Run SELECT * FROM sys.dm_exec_query_stats a CROSS APPLY sys.dm_exec_query_plan(a.Plan_Handle) ORDER BY total_worker_time DESC You can then click through to the query plans of the most expensive queries to see if they can be optimized.

3) Run a Perfmon trace simultaneously with a Profiler trace. You can then correlate the two traces in Profiler to identify the queries causing CPU pressure and investigate.

I hope you enjoy reading this post as much as I enjoyed writing it. After a week spent playing with a very large FastTrack at a Microsoft Lab, I felt the overwhelming urge to write something geeky! :)

Monday, 21 February 2011

Is Dynamic SQL ALWAYS Bad? - Follow Up

Last week, after a chat with Kimberley Tripp, I wrote this post. Kimberley has turned this conversation into two SQLMag articles, which probably explain the situation slightly more elequantly that I did...

See here and here. :)

Tuesday, 15 February 2011

SSIS In Denali. Sexy! But Substance?

In Denali, SSIS has been given a face lift, and it looks pretty sexy, that's a given. But for my first post on the subject, I wanted to look instead at some of the new functionality to see how it performed, and one of the biggest new features in terms of functionality, is the Integration Services Database, so...what does it give us?

Well, I started by creating a simple SSIS package in the VS 2008 Shell. (We have been promised VS 2010 shell for RTM!) I used the new Source Assistant and Destination Assistant to create a data flow that just pumps rows from one table to another.

Then, in SSMS, I created an Integration Services Catalog, by right clicking on the Integration Services folder. Two points of note here were that it made me enable CLR, and it enforced a complex password...

 ...Then I created an organisational folder, under the catalog...

...I then tried to import the package from the context menu of the folder. It did not work, but to be fair, I am using CTP1! So I went back to BIDS and used the new Deployment Wizard (I'll blog about this soon, because it is quite good, and very, very different!) and deployed the project to the SSIS database. Denali uses the concept of deploying projects, much more rigorously than previous version, as many of the new features (such as parameters) are based at the project level.

After deployment, The project and package appeared under my folder. Right clicking the package allowed me to run a validation process, shown below. I could also have done this at the project level...

...Everything seemed Ok, so I ran the package. This gave me a jazzed up "Progress Window", which furnished me with status information in real-time...

...After package execution, a right click on the Catalog, allowed me to click the operations button, which gave me a history of all the actions that I had performed so far...

...I then found a whole raft of extra information in the SSISDB Database, that was automatically created when I created the catalog. Some of the most interesting tables included Executions table, the Catalog_Property table and the Data_Type_Mapping table, all shown below...

...Finally, a quick look at the Properties of the Project, revealed the Previous Versions Window. This allows us see, and easily roll back to a previous version of the package, in the event that you had not tested your code properly! (Not that would every happen of course!)...

...So in conclusion, there will be many more Denali SSIS posts to come, but I'm sure you can tell from this post alone, that DBAs are going to love it!!!

When Is Data Written To Disc?

When you run a query in SQL, the first thing that happens, is SQL checks to see if the data pages that it needs to modify are currently in the Buffer Cache or not. It does this by reading an in-memory hash table, that records the dbid-fileno-pageno of each page that is in the cache at that time. You can see for yourself what pages are currently in your cache, by sys.dm_os_buffer_descriptors. If the pages are not in the cache, then it reads them in from disc.

Once all the required pages are in cache, SQL will begin an Implicit Transaction (if an Explicit Transaction is not already open for the statement). It will then perform the required operations on the page(s). If this was a DML operation, it will also mark them as being dirty. (This just means that they have changed in memory, without being written to disc). It also writes the transaction records to the log cache.

When the transaction (either implicit or explicit) commits, all transaction records, up to and including the commit, will be written to the log. There is a caveat here however; It is not just the log records from this specific transaction that are written. In order to make the transaction durable, SQL will also have to flush other log records that are intermingled with this transaction. It is also worthy of note, that SQL never writes individual log records, it always writes in blocks of between 512 bytes and 60 kb. This is called the "Log Block" and it's size is managed dynamically, internally.

This process is called write-ahead logging, or WAL, and ensures database consistency, but can cause a bottleneck, because SQL caps the amount of transactions that can be "in-flight". (This means that the Log Manager has issued a write, but an acknowledgement that the write has completed). The cap is different depending on your version of SQL, but in 2008, it is 3840KB, regardless of whether you are using a 32-bit or 64-bit version.

If this is causing a bottleneck, then you will notice lots of WRITELOG waits, when you query sys.dm_os_wait_stats. If this is the case, then you should try to optimize your transaction log. Details can be found in this post.

Once the log records have been committed to disc, then SQL is free to flush the "Dirty Pages" to disc. It does not do this straight away however. These dirty data pages are written to disc when the LazyWritter kicks in, or when a Checkpoint occurs.

A checkpoint occurs in a number of circumstances. Firstly, and most obviously, if you manually issue a checkpoint, when a backup starts, when the server is shut down (gracefully), or when an ALTER DATABASE statement is run. Also, if you are in Simple Recovery Model, a checkpoint will occur when your log is 70% full, or if you are in Full Recovery Model, a Checkpoint will run periodically, depending on your setting for recoveryinterval, which allows you to specify a target recovery time.

The LazyWriter process is a SQLOS thread, and it's role in life is to keep the Buffer Pool clean, with enough space to add new pages. It does this by periodically scanning the buffer cache and keeping track of how often a page is referenced. If it decides that it is no longer useful to keep a page in the cache, (using an algorithm that tracks how often a page is used and even what type of page it is) it checks to see if the page is "Dirty", if it is it writes it to the data file. It then marks the page as being "free" in a linked list. When a worker thread needs a page, it will choose one from this list.

Find my book, Pro SQL Server Administration on Amazon -


United Kingdom

Saturday, 12 February 2011

Is Dynamic String Execution ALWAYS bad?

I have always worked on the principle that DSE is always evil, and should only be used as a very last resort. However, I have been having a really interesting conversation with Kimberley Tripp (find her Blog here) this weekend, and I may have changed my mind...

So first off, lets set the scene... Imagine that you have a multi use stored procedure, that accepts multiple parameters, and different queries pass in widely different values, or even similar values with highly variable selectivity.

The first time you run the stored procedure, or the first time it needs to recompile, (due to stats updates, being kicked out of the cache, etc) SQL will use "parameter sniffing". This is where it builds the plan based on the first set of values you pass in. If, the next time you run the procedure, you give a value with a very different level of selectivity (i.e. The first time, you pass in 'Banana Custard' and the second time, you pass in 'B%') then SQL will use the original execution plan, which will obviously not be the optimal one.

To get around this issue, you can use local variables in the stored procedure. The problem with this, is that the local variables are unknown at compilation, which means that when SQL will use statistics that are based on an overall average of values, rather than from the histogram. This means that once again, you can end up with a sub-optimal execution plan.

So, to get around this, we could use OPTION(RECOMPILE), which means that we force SQL to use a "one-use" plan, and you will get a new plan every time it runs. This issue here, is that OPTION(RECOMPILE) has several issues including (depending on service pack level) returns incorrect query results.
So to get around this, (and yes, we are finally there...!) use dynamic string execution with EXEC(@Query). Although it is possible for this plan to be reused, it is pretty unlikely, and it is essentially treated like an ad-hoc query.

Using this technique comes with it's own challenges, not least the risk of SQL Injection, but you can mitigate these risks with simple techniques, such as ensuring you use the "principle of least privilege", using EXECUTE AS clause in your procedures, using the QUOTENAME() function of parameters you pass into the procedure, etc.

So in conclusion, yes there are times, when dynamic string execution is the best method. You learn something new every day!

A big thank you to Kimberley for helping me get my head around this one!

Sunday, 6 February 2011

Deleting Duplicate Rows

Myself and two of my colleges were lucky enough to be invited to meet Denise Drapper (Head of Data Integration at Microsoft) last week, and amongst other things, she gave us a tour of the upcoming DQS (Data Quality Services) that is set to ship with Denali. Basically, it supports the kind of functionality that previously we would have to do manually, by using Fuzzy Lookup component, etc in SSIS, but this is using different algorithms and is all wrapped up into a nice easy to use GUI, that makes suggestions about possible data quality issues, with a similarity score, and then asks you to confirm or deny the duplicate. It also learns from the work you have done previously, which is pretty cool.

Unfortunately, it doesn't look like I'm going to get my hands on this tech, to play around with it until CTP3 in the Autumn, however, so for now, we are stuck with the tools we have. And that brings me nicely to the point of this post. Basically, if you scan the Net for ways of deleting rows which are duplicated, but have different keys, there are several solutions, most of them derived from the Microsoft solution, which is not pretty, and involves moving data into # tables, etc.

In SQL 05 and 08, however, providing that there is a way of uniquly identifying each row on a series of columns, other than the unique key (even if this is consists of every column in the table) there is a method that can delete duplicates in one statement, without any # tables. (Although it will still create structures behind the scenes in TempDB).

Lets assume that we have a table created with the following structure...

CREATE TABLE DeleteDuplicates
    Col1 INT,
    Col2 VARCHAR(20),
    Col3 VARCHAR(20)

...Now lets populate it with a few rows...

INSERT INTO DeleteDuplicates(Col1, Col2, Col3)
VALUES (1, 'Peter', 'Carter-Greenan'),
       (2, 'Peter', 'Carter-Greenan'),
       (3, 'Charles', 'Feddersen')

...Ok, so if we assume that in this table, business rules allow us to identify duplicates using Col2 and Col3, we can see that we have one duplicate. So now, we can run the following query to remove the duplicate row...

FROM DeleteDuplicates


        ROW_NUMBER() OVER(PARTITION BY Col2, Col3 ORDER BY Col2, Col3) as RowNumber
        FROM DeleteDuplicates
    ) Dupes
    WHERE Dupes.RowNumber > 1

...So there you have it. Dupes removed, no temp tables, one statement. What more is there to say?

Find my book, Pro SQL Server Administration on Amazon -


United Kingdom

Salvage The Tail End Of The Log

In the event of disaster, the first thing that any experienced DBA will do, is try to back-up the tail-end of the log, in an attempt to salvage DML operations that have been made since the last log backup, but what if the server is inaccessible?
Well, the fix is "dirty" but effective. You basically need to "fudge" the log onto another server, in order to perform the backup.

First, on a second instance or server, create a shell database with the same name as the original. In this case...


...Now, set this database to be offline...


...Next, delete the data files and the log file that exists for the database on the second server. Now, Copy the log file from the original server, to the folder that the new database's file were in (before you deleted them).

If you now try to bring the database back on-line, you will receieve an error (obviously, considering you have no data files and an incorrect log file!), but you will be able to perform a tail-log backup, with the following syntax...

BACKUP LOG FudgeLog TO DISK = 'D:\SQLServerDownAndDirty\FudgeLog\FudgeLog_Log.bck' WITH INIT, NO_TRUNCATE

...You can now restore this backup at the end of your log chain, on your original server. You will loose no data, keep your job, and sleep happily at night! ;-)

Find my book, Pro SQL Server Administration on Amazon -


United Kingdom

Optimizing You Log File

Optimizing your transaction log could easily fill a 50 page white paper, or a week's consultancy, but here are a few points to get you started:

How many physical log files?
This one is easy...One! Exactly one, always one, the answer is one! There is an urban myth that having multiple log files will improve performance, but this is just a myth. Log files are written sequentially, so if you have multiple files, it fills one, then fills the next one, etc. Pointless!

How many virtual log files?
This one is a bit more complicated. Your physical log file is managed internally with a series of virtual log files. See this post for more details. If you have too few, then the active portion of your log may be too large, leading to longer backup times. If you have too many VLFs however, this leads to "log fragmentation" and cause performance issues.

The general rule of thumb is to have VLFs sized at around 500MB. To achieve this, grow your log in 8000MB chunks. If you already have log fragmentation, the way to resolve it, is to shrink your log file (thus deleting the existing VLFs) and then grow it again, using the desired chunk size.

I/O Subsystem
This breaks down into two parts. Firstly, make sure that you put the log file on a separate physical disc to your data files. Otherwise you will get I/O contention.

The second part is a little more complex, and that is what RAID level should you use? Well, this depends on what features of SQL you are using. Before we begin, it may be beneficial to bring some context, by looking at the standard RAID levels, and how many IOs they generate.

RAID 1 - IOs = R + (W*2) / 2
RAID 5 - IOs = R + (W*4) / No Of Discs
RAID 10 - IOs = R + (W*2) / No Of Discs

If you are just using SQL Server at a basic level, then RAID 1 is a good choice. This is because almost all activity on the log will be write activity. The only read activity will be backups and restores. In this scenario, RAID 1 is better than RAID 5 because for every one write, there will be two physical writes to the volume with RAID 1, as opposed to 4 writes with RAID 5. It is also better than RAID 10 in this scenario, because RAID 1 will perform the operations as sequential IO, where as RAID 10 will perform them as random IO.

However, this recommendation changes if you are using some of the more advanced feature of SQL, especially the HA features. If you are using Database Mirroring, Replication or Log Shipping, then you are liable to have a much higher read to write ratio, and hence need better read performance, whilst maintaining write performance. In this scenario, RAID 5 is still a no-go, because you will double the number of physical writes for every write operation, from 2 to 4, as discussed above, so your best choice would be RAID 10. RAID 10 will maintain the 2 writes per write operation ratio, but also give you improved (and more scalable) read performance. This is because the the overall IOs will be divided by the number of discs in both RAID 1 & RAID 10, but in RAID 1, you always have exactly 2 discs, where as with RAID 10, you will have a minimum of 4, and as many as you require.

Log Manager Bottlenecks
One reason that your log can become a bottleneck, is because in SQL Server 2008, the Log Manager will only allow you to have 3840K of data that has been sent to be written, but where no acknowledgement of the write has been received back. Before SQL 2008, this limit was even lower.

Monitoring Log Performance
Within SQL, keep an on the sys.dm_os_wait_stats dmv. If you see lots of WRITELOG waits, then the log is probably coming under pressure.

In Perfmon, you should be monitoring the obvious disc counters for you log drive, such as Avg Disk Queue Length, Disk Writes/Sec, etc, but you should also keep track of the Log specific SQL counters. In particular, pay attention to Log Flushes/Sec and Log Flush Wait Time.

Together, these counters should give you an all-round idea of how your log is performing. From a performance aspect, the Log tends to be the forgotten relative, but be careful, as forgotten relatives have a nasty habit of turning up unexpectedly, and making a nuisance of themselves!

Find my book, Pro SQL Server Administration on Amazon -


United Kingdom