Friday 21 January 2011

To GUID Or Not To GUID?

Sometime, I come across people who are in the habit of using GUIDs as Primary keys (and building a Clustered index on that primary key).

Well, the idea of a Primary Key is that it uniquely identifies records in a table, and the idea of a GUID is that it's unique, so that's fine, right? Wrong!

There are a couple reasons for this, the first of which is fragmentation. If you create you GUID values by using a DEFAULT Constraint that calls the NEWID() function, then it creates the 16-Byte numbers in a random order, which means that in order to maintain the logical order of the rows, (remember that a clustered index does NOT maintain physical order of rows see this post) then it is going to have to shift the logical order of rows in order to insert the new values. This can lead to page splits, and thus fragmentation.

But it gets worse than this if you have non-clustered indexes on the table, because each of these indexes will need to have their pointers to the new CI pages updated, leading to extra overhead, and fragmentation of those indexes as well. You know I like to prove things, so lets take a look at an example...

The first thing I did, was create a database, with one table using the following script...

CREATE TABLE FragWithNewID
 (
 ID uniqueidentifier NOT NULL,
 Col1 nchar(10) NOT NULL,
 Col2 int NOT NULL
 )  ON [PRIMARY]
GO
ALTER TABLE FragWithNewID ADD CONSTRAINT
 DF_Table_1_ID DEFAULT NEWID() FOR ID
GO
ALTER TABLE FragWithNewID ADD CONSTRAINT
 PK_Table_1 PRIMARY KEY CLUSTERED
 (
 ID
 )

CREATE NONCLUSTERED INDEX [IX_Col1] ON [dbo].[FragWithNewID]
(
 [Col1] ASC
)

CREATE NONCLUSTERED INDEX [IX_Col2] ON [dbo].[FragWithNewID]
(
 [Col2] ASC
)


 ...Next, a simple script to insert some values into the table...

INSERT INTO FragWithNewID (Col1, Col2)
VALUES('aaaaaa', 1)
GO 10000


...So now, lets check out the fragmentation. To do this, I will use the DMV sys.dm_db_index_physical_stats (Yes, I know it's actally a DMF, but like everybody else, I tend to use the term DMV collectivly)...

SELECT OBJECT_NAME(object_id), index_type_desc, avg_fragmentation_in_percent, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID('GUIDFragmentation'),OBJECT_ID('FragWithNewID'),NULL,NULL,'DETAILED')
WHERE index_level = 0


...The results are staggering! Almost perfect fragmentation of all 3 indexes...

 (No column name)   index_type_desc                   avg_fragmentation_in_percent    avg_page_space_used_in_percent
  FragWithNewID     CLUSTERED INDEX           98.80952381                              72.04514455
  FragWithNewID     NONCLUSTERED INDEX   96.96969697                              78.59694836
  FragWithNewID     NONCLUSTERED INDEX   98                                             64.22041018



...So what about if you use NEWSEQUENTIALID(), rather than NEWID()? Well, it is the better of the two evils, as it will not cause anything like as much fragmentation, because the GUIDs will be generated in order, but you are still using a 16-Byte value, rather than a 4-Byte INT column, with the IDENTITY property set! This means that your rows will take up more space on the page, meaning you will have move pages, and thus less IO performance. Also, remember that the clustered key is not just stored in your CI, it is stored in your non-clustered index as well, so all indexes on the table will be less efficient.

I'm not saying UNIQUEIDENTIFIERs don't have their place, they certainly do, but that place is not in a clustered index! I saw one instance where a third party application insisted that the PK was a GUID, the recommendation I made was to have the PK constraint on the UNIQUIEIDENTIFIER, and then create the clustered index on a different column with a UNIQUE constraint. This column was an INT with the IDENTITY property set!

There is always a way around things! :)

No comments:

Post a Comment