Friday, 31 December 2010

Why Should I Rebuild My Clustered Index? Part I

I am currently working in an environment, where end users are able to create their own tables, and then subsequently add and remove columns from those tables, through a GUI. (I know, I know - I didn't design it!!!)

The point is, that a lot of these tables are pretty large, and because of a lack of Clustered Index rebuilds, they are a lot larger than they need to be. What do I mean? Well, you may be surprised to know, that when you drop a column, it does not actually delete the column from the pages that store the table. All it does is delete the metedata describing the column. (And yes, of course I'll prove it!)

Take the following table as an example...

 ...Now we have a table, populate with a few rows, we need to look inside a page of this table, and see how the data is stored. We will do this using DBCC PAGE, where we will pass in the database, the file, the page ID and finally the print option that we want. We will choose option 3, which gives us the most detail. (Note that if you run this script, your page ID will almost certainly be different to mine)
 CREATE TABLE AlterTableDemo
(
Entry_ID  INT   IDENTITY,
Large_Column NCHAR(500)
);


CREATE CLUSTERED INDEX CI__AlterTableDemo ON AlterTableDemo(Entry_ID);

INSERT INTO AlterTableDemo (Large_Column)
VALUES ('This is a large column');

GO 5


Before we do that however, we need to find out the page ID that our table is stored on. To do this, we will use the undocumented fn_physlocformatter() function. (Please see this post for more info)...


SELECT sys.fn_PhysLocFormatter(%%physloc%%), *
FROM AlterTableDemo;


DBCC TRACEON(3604)

DBCC PAGE(Adventureworks2008,1,29014,3);
...A portion of the DBCC PAGE output is shown below...



...You can see from this output, that as expected, slot 0 (the first logical row) contains our 2 columns. So now, lets delete our text column...


ALTER TABLE AlterTableDemo
 DROP COLUMN Large_Column;
...So now, if we look inside the page again, surly there will only be one column? Or will there? Lets have a look...



...From this output, you can clearly see, that although the column has been marked as dropped, the physical length is still 100 bytes, meaning that the column has not been physically removed. This is done as a performance optimization, and in fact, this column will not be physically removed, until we rebuild the Clustered Index on the table.

Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

No comments:

Post a Comment