Friday 31 December 2010

SQL Server 2011 - THROW

Just a quick one, as 3 posts on New Year's Eve is probably a little excessive, but I have just found a cool new (and long-awaited) feature in Denali: the ability to THROW an error.

You can use this in your custom error handling to throw a custom error message, like RAISERROR, but also (and probably more usefully) you can use it in a CATCH block, to essentially re-throw the error that caused the CATCH block to kick in. For example...

BEGIN TRY
     SELECT 1/0
END TRY
BEGIN CATCH
     THROW
END CATCH

...To produce the following output...

(0 row(s) affected)
Msg 8134, Level 16, State 1, Line 2
Divide by zero error encountered.

...There you go, very simple, but every little helps!
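THROW can also raise a user-defined error of your own, in a similar way to RAISERROR. A minimal sketch (the error number, message and state below are my own choices; user-defined error numbers must be 50000 or above)...


THROW 50001, 'The order failed validation.', 1;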


I am reliably informed that the final release will also include a FINALLY block, but that does not seem to work in CTP1, so I will post on that after I have had the chance to play with it.

HAPPY NEW YEAR!!!

Logical Order NOT Physical Order

There is a common misconception that your data will be physically stored in the order determined by your clustered index, but this is not true. A clustered index will logically order your data, but not physically order it. Take the table below as an example...


CREATE TABLE LogicalOrderDemo
(
Entry_ID  INT,
Large_Column NCHAR(500)
);


CREATE CLUSTERED INDEX CI__LogicalOrderDemo ON LogicalOrderDemo(Entry_ID);

INSERT INTO LogicalOrderDemo
VALUES (2,'This is a large column'),
(3,'This is a large column'),
(4,'This is a large column'),
(5,'This is a large column');
...If we now use sys.fn_PhysLocFormatter (see this post for more details) to find the page number, and then look inside the page using DBCC PAGE to view the row offset table...


SELECT sys.fn_PhysLocFormatter(%%physloc%%), *
FROM LogicalOrderDemo;


DBCC TRACEON(3604)

DBCC PAGE(Adventureworks2008,1,29012,1);
...If you scroll to the bottom of the output, you will see a row offset table, similar to the one below. The row offset table tells SQL the physical offset of each slot from the beginning of the page. A slot is a logical container for a row. The offsets are in reverse order, so Slot 0 holds the first row (with an Entry_ID of 2)...



...So now, let's go ahead and insert another row. This time, we will insert a row with an Entry_ID value of 1. As we have a Clustered Index on this column, we know that SQL will insert it into Slot 0, in order to enforce a logical order to the rows, but will it physically re-order the data on the page? Let's see...
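The statements involved look something like this (29012 is the page ID found above for my table; yours will almost certainly differ)...


INSERT INTO LogicalOrderDemo
VALUES (1,'This is a large column');

DBCC TRACEON(3604)

DBCC PAGE(Adventureworks2008,1,29012,1);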



...No! You can see from the offset table that it has inserted the row at the end of the page. You can tell, because Slot 0 contains our new data, but its offset value is the largest in the table, meaning that it is furthest from the beginning of the page.

In reality, it has inserted it in the first free row space on the page, but as we have not deleted any data, in our case, that is at the end.

Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

Why Should I Rebuild My Clustered Index? Part I

I am currently working in an environment, where end users are able to create their own tables, and then subsequently add and remove columns from those tables, through a GUI. (I know, I know - I didn't design it!!!)

The point is that a lot of these tables are pretty large, and because of a lack of Clustered Index rebuilds, they are a lot larger than they need to be. What do I mean? Well, you may be surprised to know that when you drop a column, it does not actually delete the column from the pages that store the table. All it does is delete the metadata describing the column. (And yes, of course I'll prove it!)

Take the following table as an example...

CREATE TABLE AlterTableDemo
(
Entry_ID  INT   IDENTITY,
Large_Column NCHAR(500)
);


CREATE CLUSTERED INDEX CI__AlterTableDemo ON AlterTableDemo(Entry_ID);

INSERT INTO AlterTableDemo (Large_Column)
VALUES ('This is a large column');

GO 5

...Now we have a table, populated with a few rows, we need to look inside a page of this table, and see how the data is stored. We will do this using DBCC PAGE, where we will pass in the database, the file, the page ID, and finally the print option that we want. We will choose option 3, which gives us the most detail. (Note that if you run this script, your page ID will almost certainly be different to mine.)


Before we do that however, we need to find out the page ID that our table is stored on. To do this, we will use the undocumented fn_physlocformatter() function. (Please see this post for more info)...


SELECT sys.fn_PhysLocFormatter(%%physloc%%), *
FROM AlterTableDemo;


DBCC TRACEON(3604)

DBCC PAGE(Adventureworks2008,1,29014,3);
...A portion of the DBCC PAGE output is shown below...



...You can see from this output that, as expected, slot 0 (the first logical row) contains our 2 columns. So now, let's delete our text column...


ALTER TABLE AlterTableDemo
 DROP COLUMN Large_Column;
...So now, if we look inside the page again, surely there will only be one column? Or will there? Let's have a look...



...From this output, you can clearly see that although the column has been marked as dropped, the physical length is still 100 bytes, meaning that the column has not been physically removed. This is done as a performance optimization, and in fact, this column will not be physically removed until we rebuild the Clustered Index on the table.
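To reclaim the space, you simply rebuild the clustered index; something along these lines...


ALTER INDEX CI__AlterTableDemo ON AlterTableDemo REBUILD;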

Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

Saturday 25 December 2010

Contention On The Heap

I was reading a very interesting blog post from Paul Randal, called Lock Logging and Fast Recovery. The theme was log internals, but it set my train of thought to how much concurrency you can achieve on a heap, with no non-clustered indexes, and the results may (or may not) surprise you.

I created a table called ConcurrencyTest, and populated it, with the following script...

USE Adventureworks2008
GO

CREATE TABLE ConcurrencyTest
 (
 PKCol int NOT NULL IDENTITY (1, 1),
 TextCol varchar(50) NOT NULL
 )  ON [PRIMARY]


GO
ALTER TABLE ConcurrencyTest ADD  CONSTRAINT DF_ConcurrencyTest_TextCol  DEFAULT ('Test Data') FOR [TextCol]
GO


CREATE UNIQUE CLUSTERED INDEX Clust ON ConcurrencyTest(PKCOL)
GO


INSERT INTO ConcurrencyTest DEFAULT VALUES;
GO 10000


...I then picked 2 rows at random, and used %%physloc%% to see what pages the rows were stored on. (See my post What Files and Pages Is My Table Stored On? for more info).

The results below, show that the 2 rows are on separate pages...



...I then opened 2 new query windows, and SQL assigned them the session IDs 53 and 57 respectively.

Next, in each of the query windows, I began an explicit transaction (in order to make SQL hold the locks) and ran an update statement against the 2 rows I chose a moment ago...


(Session 53)
BEGIN TRANSACTION
UPDATE ConcurrencyTest
SET TextCol = 'UpdatedValue'
WHERE PKCol = 1


(Session 57)
BEGIN TRANSACTION
UPDATE ConcurrencyTest
SET TextCol = 'UpdatedValue'
WHERE PKCol = 5000
...I set both queries running, and a quick glance at sys.dm_exec_requests...


SELECT session_id, blocking_session_id, wait_resource
FROM sys.dm_exec_requests
WHERE blocking_session_id > 0
...showed that there were no blocks (as you would expect). The same would hold true if I dropped the Clustered index and created a non-clustered index on the TextCol column (although I will let you prove that for yourself).

Looking at sys.dm_tran_locks, shows that exclusive locks have been taken on the keys, and Intent exclusive locks have been taken on the pages and also the table...


SELECT * FROM sys.dm_tran_locks
WHERE resource_type <> 'DATABASE'
GO


...Again, this is exactly as you would expect. The interesting behavior occurs when all indexes are removed from the table. I dropped the clustered index, rolled back the 2 transactions, and then ran the queries again. This time, when I looked at sys.dm_exec_requests, I saw the following results...



...As you can see, this time there is a resource wait on the RID of the first row, even though the where clauses would prevent the same row being updated by both queries.

A look at sys.dm_tran_locks, shows slightly different locks and holds the reason for this contention...



...Instead of locking an index key, SQL has had to take out locks on the RIDs, which are an internal unique identifier found on heaps.
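If you want to see these locks for yourself, a query along the lines of the one below (my own example) filters sys.dm_tran_locks down to just the RID locks...


SELECT resource_type, resource_description, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_type = 'RID'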

Basically, if there is no clustered index on a table, and the column being updated is not included in the key of a non-clustered index, then SQL has no key to lock and instead locks the RIDs. However, if there is no index, then the data can be stored in any order, which means that a row can move between slots on a page, or, if the page is full, could even move to a different page. Therefore, SQL cannot allow concurrency across the 2 updates, because, although unlikely, there is no guarantee that the 2 updates will not conflict with each other, and SQL Server does not support lost updates.

The moral of the story? Always use indexes....!

Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

Thursday 23 December 2010

SQL Server 2011 - EXECUTE Proc WITH RESULT SETS

Another new feature of Denali is the ability to enforce that the results of a stored procedure meet your client app's data contract, without having to modify the stored procedure. This is pretty useful, as it will allow you to have one stored procedure servicing multiple apps, without the need for custom formatting at the front-end.
It will allow you to perform implicit conversions, and even change the collation of a column, with the EXECUTE statement. Cool, eh?

I created the following Proc in the AdventureWorks2008 Database...


CREATE PROCEDURE ResultSetsDemo
AS
BEGIN
     SELECT  FirstName
     ,       MiddleName
     ,       LastName
     ,       ModifiedDate
     FROM Person.Person
END

...Then executed the procedure with the following statement...


EXEC ResultSetsDemo
WITH RESULT SETS
(
    (
     Firstname VARCHAR(50) COLLATE Albanian_BIN2,
     MiddleName CHAR(5),
     LastName VARCHAR(50),
     ModifiedDate DATE
     )
)

...This produced the following results...


...As you can see, it has CAST the ModifiedDate column to a DATE, and the FirstName column has also had its collation changed.

I then altered the procedure as below...


ALTER PROCEDURE ResultSetsDemo
AS
BEGIN
     SELECT  FirstName
     ,       MiddleName
     ,       LastName
     ,       ModifiedDate
     FROM Person.Person


     SELECT FirstName
     ,      MiddleName
     ,      LastName
     FROM Person.Person
END

...And executed it again with the following code...


EXEC ResultSetsDemo
WITH RESULT SETS
(
     (
     Firstname  VARCHAR(50) COLLATE Albanian_BIN2,
     MiddleName  CHAR(5),
     LastName  VARCHAR(50),
     ModifiedDate DATE
     ),
     (
     FirstName VARCHAR(50),
     MiddleName CHAR(50) NOT NULL,
     LastName VARCHAR(50)
     )
)

...And it failed with the following, (slightly misleading) error...

Msg 8114, Level 16, State 2, Procedure ResultSetsDemo, Line 12
Error converting data type nvarchar to nvarchar.

...In actual fact, the failure is because there are NULL values in MiddleName, and I have specified NOT NULL. (I thought it would be kind of nice if it implicitly filtered out the NULLs, but I can see why it doesn't!)

Running the EXECUTE statement again, with the NOT NULL constraint removed, produces the expected results...


EXEC ResultSetsDemo
WITH RESULT SETS
(
     (
     Firstname  VARCHAR(50) COLLATE Albanian_BIN2,
     MiddleName  CHAR(5),
     LastName  VARCHAR(50),
     ModifiedDate DATE
     ),
     (
     FirstName VARCHAR(50),
     MiddleName CHAR(50),
     LastName VARCHAR(50)
     )
)

Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

Tuesday 21 December 2010

Using Extended Events in SQL Server

Extended Events are the result of full WMI integration with the SQL Server environment, and allow you to monitor virtually any event in the SQL environment, even the Configuration Manager. WMI integration has been in SQL Server since 2005; you may even have used it without knowing. For example, have you ever created a DDL trigger? If so, you have harnessed the WMI SQL provider.
In Denali, Microsoft have really ramped up their publicity, even giving them a GUI in Object Explorer.
The example I want to talk about, however, will work fine in SQL 2008 as well as Denali, and that is how to use them to monitor page splits on your server. This can be helpful in tracking down a rogue process, in the event that you are experiencing unexpectedly high levels of index fragmentation.

To do this, the first thing that you need to do is create an Event Session on the server. This is similar to setting up an Event Notification, and uses the same underlying technology.
In this case, we will want to add an event that uses the sqlserver.page_split event. Within this event, we will need to add an action, and in here we will specify the data that we want to capture. For our purposes, we will want to capture the session ID, the client app, the database ID and the SQL statement that caused the page split to occur.
Next, we will need to add a target. Here, we are giving a target specifier, made up of the event package name and the target type, package0.ring_buffer, and we will use an option that adds 5 seconds of latency, by holding the event in a buffer for 5 seconds before writing it. We need to use the ring_buffer target, because we want the raw data, rather than rolled-up aggregates.

The syntax to create this Event Session is as follows…


CREATE EVENT SESSION pagesplits ON SERVER
ADD EVENT sqlserver.page_split
(
ACTION
     (
     sqlserver.session_id,
     sqlserver.client_app_name,
     sqlserver.database_id ,
     sqlserver.sql_text
     )
)
ADD TARGET package0.ring_buffer
WITH (max_dispatch_latency = 5 seconds)

…The next step is to turn the session on. This is done with a simple ALTER EVENT SESSION statement…


ALTER EVENT SESSION pagesplits ON SERVER STATE = start

…Now, as I am running this on a test server, I created a table with the following definition…


CREATE TABLE [dbo].[test](
 [col1] [int] NULL
)

GO
CREATE NONCLUSTERED INDEX [test] ON [dbo].[test]
(
 [col1] ASC
)

…And this will allow me to generate some page splits, by using the following “lazy man’s loop” (Don’t forget to cancel the query when you are finished!!!)…


WHILE 1=1
BEGIN
 INSERT INTO test(col1)
 VALUES (1)
END

…I then ran the following query, which interrogates the event session, through the dm_xe_session_targets DMO…


SELECT CAST(xst.target_data as xml) FROM sys.dm_xe_session_targets xst
join sys.dm_xe_sessions xe
ON (xe.address = xst.event_session_address)
WHERE xe.name = 'pagesplits'

…This query produces an XML document, showing the page splits that have occurred. If you look for spikes in here, it can help you track down a problem process.
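If you would rather see the results in tabular form, the ring buffer XML can be shredded with XQuery. The sketch below is my own, and assumes the session name and actions defined above, plus the standard RingBufferTarget/event shape of the target data...


SELECT
     evt.value('(action[@name="session_id"]/value)[1]', 'int') AS session_id,
     evt.value('(action[@name="database_id"]/value)[1]', 'int') AS database_id,
     evt.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS sql_text
FROM
(
     SELECT CAST(xst.target_data AS xml) AS target_data
     FROM sys.dm_xe_session_targets xst
     JOIN sys.dm_xe_sessions xe ON xe.address = xst.event_session_address
     WHERE xe.name = 'pagesplits'
) AS src
CROSS APPLY src.target_data.nodes('/RingBufferTarget/event') AS tab(evt)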

Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

Sunday 19 December 2010

SQL Server 2011 - SEQUENCE Part II


Following my first post on the new Sequence object in Denali, and possibly prompted by the disturbingly "cursor-like" syntax, I decided to see how the feature performs, like for like, against the Identity column property.

To do this, I altered my original sequence by using the following syntax...


ALTER SEQUENCE Sales.SalesOrderNumbers 
--AS INT   
MINVALUE 1  
NO MAXVALUE   
RESTART WITH 1
INCREMENT BY 1

...I then created a control table, called Controltbl, that contains one column, called col1, which holds the numbers 1 through 1,000,000. I then created 2 tables with the following code...


CREATE TABLE Sales.SalesOrdersSequence
(
EntID INT,
OrderNumber INT DEFAULT  (NEXT VALUE FOR Sales.SalesOrderNumbers)
)

CREATE TABLE Sales.SalesOrdersIdentityCol
(
EntID INT,
OrderNumber INT IDENTITY(1,1)
)
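For completeness, one way to build such a control table (a sketch of my own, using an arbitrary row source) is...


CREATE TABLE Controltbl (col1 INT NOT NULL)

INSERT INTO Controltbl (col1)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a
CROSS JOIN sys.all_columns b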

...Next, to ensure a fair test, I tore down the procedure and buffer caches with the following statements...


DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

...and in the query options, I turned on the option to record time statistics. I then ran the following INSERT statement, and included the actual execution plan...


INSERT INTO Sales.SalesOrdersIdentityCol (EntID)
SELECT col1 FROM Controltbl

...I returned the following statistics and query plan...

 SQL Server Execution Times:
   CPU time = 6536 ms,  elapsed time = 17100 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.




...I then tore down the caches once again, before running the following INSERT statement against the table using the sequence object...

INSERT INTO Sales.SalesOrdersSequence (EntID)
SELECT col1 FROM Controltbl

...Although the execution plan was (unsurprisingly) identical, the following stats were returned...

 SQL Server Execution Times:
   CPU time = 4758 ms,  elapsed time = 10098 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.


...Frankly, I was very pleasantly surprised by the results. The second INSERT took only 59% of the time, and caused only 73% of the CPU overhead. This means that replacing your IDENTITY columns with references to a sequence object could have a positive effect on bulk inserts.

Find my book, Pro SQL Server Administration on Amazon -


SQL Server 2011 - Offset

One very simple, but very useful new feature of Denali is the OFFSET clause, which can be used in conjunction with ORDER BY to skip a specified number of rows, and optionally limit how many rows are returned from a table.

For example, if I run a SELECT * against the Sales.SalesOrderDetail table in the AdventureWorks2008 database, I return 121317 rows, starting with SalesOrderDetailID 1.

If however, I run the following query...


SELECT
     *
FROM
     Sales.SalesOrderDetail
ORDER BY
     SalesOrderDetailID
          OFFSET 10 ROWS
...I skip the first 10 rows, based on SalesOrderDetailID, and return the following results...


...As you can see, I have only returned 121307 rows, and have missed out the first 10 rows, based on SalesOrderDetailID.

I can also limit how many rows will be returned after the offset. For example, if I run the following query...


SELECT
    *
FROM
    Sales.SalesOrderDetail
ORDER BY
    SalesOrderDetailID
         OFFSET 100 ROWS
         FETCH NEXT 10 ROWS ONLY
...I return only 10 rows, with SalesOrderDetailIDs from 101 through to 110, as you can see below...



...One big bonus is that the OFFSET clause can be fully parametrised, so, for example, I could run the following query to return identical results...


DECLARE @Offset INT
DECLARE @Limit INT
SET @Offset = 100
SET @Limit = 10
SELECT
    *
FROM
    Sales.SalesOrderDetail
ORDER BY
    SalesOrderDetailID
         OFFSET @Offset ROWS
         FETCH NEXT @Limit ROWS ONLY
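This makes simple paging straightforward. For example, a sketch of my own, returning the third page of 25 rows...


DECLARE @PageNumber INT = 3
DECLARE @PageSize INT = 25

SELECT
    *
FROM
    Sales.SalesOrderDetail
ORDER BY
    SalesOrderDetailID
         OFFSET (@PageNumber - 1) * @PageSize ROWS
         FETCH NEXT @PageSize ROWS ONLY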

Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

SQL Server 2011 - SEQUENCE Part I

I have started playing with CTP1 of SQL Server 2011 (codename Denali). As you would expect, there is not a massive amount of new features included in this "first look", but over the Christmas period, I intend to post about some of the features that are available. The first of these is a long-awaited feature called a Sequence.

This is essentially a huge extension to an identity, which only allowed you to specify a seed and an increment.

The first thing you notice about a sequence is that it is not a column property; it is a separate, schema-bound object. This means that you can run DDL statements against them, such as CREATE SEQUENCE, ALTER SEQUENCE, etc, and also means that you can run queries against them, to find out the next value, etc. The biggest benefit, however, is that you can maintain a sequence across multiple tables.

The sequence object supports features including START WITH (the equivalent of the seed on an identity column), MINVALUE, MAXVALUE and INCREMENT BY (the equivalent of the increment on an identity column).
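Because it is a fully-fledged object, you can also query its metadata. For example, a quick sketch of my own, using the sys.sequences catalog view...


SELECT name, start_value, increment, current_value
FROM sys.sequences
WHERE name = 'SalesOrderNumbers'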

I created a sequence object in the AdventureWorks2008 database, that mirrored a bog standard IDENTITY(1,1) column, to use as a starting point. To do this, I used the following statement...


CREATE SEQUENCE Sales.SalesOrderNumbers 
AS INT   
MINVALUE 1  
NO MAXVALUE   
START WITH 1
...I then used the following query, to pull back the first 3 numbers in the sequence...


SELECT NextOrderID=NEXT VALUE FOR Sales.SalesOrderNumbers
UNION ALL    
SELECT NEXT VALUE FOR Sales.SalesOrderNumbers
UNION ALL    
SELECT NEXT VALUE FOR Sales.SalesOrderNumbers
...And saw the following results...


...When I then ran the query again, it gave me the next 3 numbers...


...The next thing I tried, was resetting the sequence. To do this, I ran the following statement...


ALTER SEQUENCE Sales.SalesOrderNumbers 
--AS INT   
MINVALUE 1  
NO MAXVALUE   
RESTART WITH 1
...Running the SELECT statement again, following this restart, produced the following output...


Next, I tried out the increment, by altering the sequence object with the following command...


ALTER SEQUENCE Sales.SalesOrderNumbers 
--AS INT   
MINVALUE 1  
NO MAXVALUE   
RESTART WITH 100
INCREMENT BY 10
...After this alteration, the SELECT statement produces the following results...

...I then changed the sequence object to...


ALTER SEQUENCE Sales.SalesOrderNumbers 
--AS INT   
MINVALUE 1  
NO MAXVALUE   
RESTART WITH 100
INCREMENT BY -10
...To produce the following results...


...Finally, I created a table with the following statement...


CREATE TABLE Sales.SalesOrders
(
EntID INT,
OrderNumber INT DEFAULT (NEXT VALUE FOR Sales.SalesOrderNumbers)
)
...and with the sequence altered as follows...


ALTER SEQUENCE Sales.SalesOrderNumbers 
--AS INT   
MINVALUE 1  
NO MAXVALUE   
RESTART WITH 1
INCREMENT BY 10
...I ran the following INSERT statement...


INSERT INTO Sales.SalesOrders
 (
 EntID
 )
VALUES
 (1),
 (2),
 (3)
...A SELECT statement from Sales.SalesOrders table, then produced the following results...

...In summary, the sequence object seems easy to use, and adds some really useful functionality to SQL Server.

Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

What Files and Pages Is My Table Stored On?


Because SQL Server is such an IO-bound application, designing how your data is stored across files and pages is one of the most important things you can do, from a performance perspective.

There are many simple things you can do to improve IO performance in SQL Server, and I intend to post about some of these techniques over the next few weeks. But first, considering SQL's proportional fill algorithm, how can you even definitively discover which files your rows are stored on, let alone which pages?

I must admit that, until recently, I had no efficient way of doing this, but at the moment, I am lucky enough to be working alongside Charles Feddersen, who is a Senior SQL Server consultant at Avanade (and a bit of a SQL legend), and he showed me the following tip -

If you run the following query against the AdventureWorks2008R2 database, you will return all rows from the SalesOrderDetail table, but alongside that, you will return an extra column, returned by the function, that shows you the File:Page:Slot that each row is stored in. You can then use DBCC PAGE against the pages, to see how the data is being stored.



SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS 'File:Page:Slot', *
FROM AdventureWorks2008R2.Sales.SalesOrderDetail
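Once you have a File:Page:Slot value, you can then inspect that page. For example (the page ID below is just a placeholder - substitute one returned by the query above)...


DBCC TRACEON(3604)

DBCC PAGE('AdventureWorks2008R2', 1, 12345, 3)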


Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

Managing policies with PowerShell

Whilst PowerShell is undoubtedly an incredibly powerful tool, in SQL Server 2008 Microsoft did not seem to get a chance to finish off the help files. Although it is included to allow you to automate the management of policies, there seems to be very little written on the web about how to do this, so let me try and unravel a few mysteries for you…
Firstly, let's assume that we have three policies stored in the file system. They are in a folder named c:\policies and are called MyPolicy1.xml, MyPolicy2.xml and MyPolicy3.xml.
Now assume that you want to evaluate these policies against your server, but you also want to enforce the policies at the same time. From within PowerShell, the first thing that we need to do is navigate to our folder in the file system…
sl "c:\policies\"
Now we are in the correct location, we need to evaluate them, but to enforce the policies, we will also set an ad-hoc evaluation mode…
 
invoke-policyevaluation -policy "MyPolicy1.xml", "MyPolicy2.xml", "MyPolicy3.xml" -adhocpolicyevaluationmode "configure"
That is useful, but more than likely, our policies will not be stored in the file system; they will be stored in our SQL Server instance, and if you have many policy objects, you may also be taking advantage of policy categories. So let's navigate back to our Policy Store in SQL…
 
sl sqlserver:\sqlpolicy\ny-srv-01\default\policies
Now for the slightly more complex statement. Here, we are going to use the pipeline (|) to loop through the policies stored in a category called MyCategory1. We are also going to use > to send the results out to an XML file, so that we have an audit, rather than having to view the results manually…
 
get-childitem | where-object {$_.policycategory -eq "MyCategory1" } | invoke-policyevaluation -targetserver "ny-srv-01" -outputxml > "c:\policies\policyeval.xml"
This is a simple, but classic, example of combining PowerShell with SMO and policies. If you want, you can go much further than this; on CodePlex, you will find a community project called the EPM Framework, which uses PowerShell in conjunction with Reporting Services to produce a fantastic compliance tool. http://epmframework.codeplex.com/

Find my book, Pro SQL Server Administration on Amazon -


Shredding XML with OPENXML & Nodes Part III

In Parts I & II, I looked at the different ways of shredding XML into relational data. The only question that remains now is: when should you use each one?
Well, firstly, there is a scenario in which the nodes method is the only way forward, and that is if you want to shred data from multiple rows within a table. For example, imagine you have created a table using the following code...


CREATE TABLE Products(
ProductID   int IDENTITY(1,1),
ProductDescription      nvarchar(100),
ProductReviews    XML)
ON [PRIMARY]

…If this table was holding the following data…

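For illustration, a couple of hypothetical rows of the right shape (a Review root element containing Comments elements) could be inserted like this...


INSERT INTO Products (ProductDescription, ProductReviews)
VALUES
('Mountain Bike',
 N'<Review>
     <Comments>Great bike, very light and responsive.</Comments>
   </Review>'),
('Road Bike',
 N'<Review>
     <Comments>Solid frame, but a little heavy.</Comments>
   </Review>')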

…How would you go about extracting a relational list of comments? Well, you could not use the OPENXML function, because you would need to call it once for each row. The trick is to use the nodes method with the T-SQL CROSS APPLY operator.
If you ran the following code…


SELECT ProductDescription,
       TempCol.value('text()[1]', 'NVARCHAR(500)') AS Review
FROM Products
CROSS APPLY ProductReviews.nodes('/Review/Comments') AS TempTable(TempCol)

…You would see this output…


…Here, we are using the nodes and value methods in the same way as we were in Part II. The only differences are, firstly, that we are now using CROSS APPLY, which calls the nodes method against every row in the result set, and passes the results into our temporary table structure.
Secondly, because we are reading data from an element, as opposed to an attribute, we are using the text() function, instead of @attribute.
But what about performance? Are there any performance considerations to take into account, if we are shredding a single variable, and hence have the choice of either Nodes or OPENXML?
Well, if we run the OPENXML code from Part I again, but this time prefixing it with the following statement…


SET STATISTICS TIME ON
GO

We will see the following statistics in the Messages Tab of the Results Pane…


…Giving us a total execution time of 540ms, with a total CPU time of 20ms.
If we ran the equivalent code using the Nodes method however, (see Part II) we would get the following statistics…   


…Here, you will notice a total execution time of just 113ms, but at the expense of 90ms CPU time. (Over 4 times quicker, but at a cost of more than 4 times the pressure on the processor sub-system.)
If you scale these statistics out to a larger amount of data, and consider realistic pressure on the processor from other processes, the result is that if you are only returning a small number of columns from an XML document, nodes will usually perform better. However, the more columns you return, the more value method calls you will be making, and the more pressure you will put on the processor. So, as the number of columns returned increases, you will see OPENXML start to catch up, and eventually overtake the nodes method on execution time. The more existing pressure there is on the processor, the faster this will happen.


Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

Shredding XML with OPENXML & NODES - PART II

...In Part I, I discussed how to shred XML data by using the OPENXML function. So how would we perform the same operation using the nodes XQuery method? Well, firstly, we would once again need to create an XML variable, and assign our document to it...


DECLARE @XML_Document XML

SET @XML_Document = N'<SalesOrder>
  <soh SalesOrderID="43667" OrderDate="2001-07-01T00:00:00" CustomerID="646" TotalDue="8095.7863">
    <sod ProductID="710" OrderQty="3" UnitPrice="5.7000" />
    <sod ProductID="773" OrderQty="1" UnitPrice="2039.9940" />
    <sod ProductID="778" OrderQty="1" UnitPrice="2024.9940" />
    <sod ProductID="775" OrderQty="1" UnitPrice="2024.9940" />
  </soh>
  <soh SalesOrderID="44287" OrderDate="2001-10-01T00:00:00" CustomerID="646" TotalDue="21601.0565">
    <sod ProductID="773" OrderQty="2" UnitPrice="2039.9940" />
    <sod ProductID="777" OrderQty="2" UnitPrice="2024.9940" />
    <sod ProductID="776" OrderQty="1" UnitPrice="2024.9940" />
    <sod ProductID="709" OrderQty="3" UnitPrice="5.7000" />
    <sod ProductID="771" OrderQty="3" UnitPrice="2039.9940" />
  </soh>
  <soh SalesOrderID="45045" OrderDate="2002-01-01T00:00:00" CustomerID="646" TotalDue="18909.0911">
    <sod ProductID="709" OrderQty="2" UnitPrice="5.7000" />
    <sod ProductID="778" OrderQty="1" UnitPrice="2024.9940" />
    <sod ProductID="771" OrderQty="1" UnitPrice="2039.9940" />
    <sod ProductID="774" OrderQty="1" UnitPrice="2039.9940" />
    <sod ProductID="773" OrderQty="3" UnitPrice="2039.9940" />
    <sod ProductID="776" OrderQty="1" UnitPrice="2024.9940" />
  </soh>
  <soh SalesOrderID="45785" OrderDate="2002-04-01T00:00:00" CustomerID="646" TotalDue="54042.8077">
    <sod ProductID="715" OrderQty="2" UnitPrice="28.8404" />
    <sod ProductID="774" OrderQty="4" UnitPrice="2039.9940" />
    <sod ProductID="771" OrderQty="3" UnitPrice="2039.9940" />
    <sod ProductID="773" OrderQty="2" UnitPrice="2039.9940" />
    <sod ProductID="772" OrderQty="2" UnitPrice="2039.9940" />
    <sod ProductID="709" OrderQty="2" UnitPrice="5.7000" />
    <sod ProductID="710" OrderQty="2" UnitPrice="5.7000" />
    <sod ProductID="712" OrderQty="4" UnitPrice="5.1865" />
    <sod ProductID="775" OrderQty="2" UnitPrice="2024.9940" />
    <sod ProductID="776" OrderQty="3" UnitPrice="2024.9940" />
    <sod ProductID="778" OrderQty="2" UnitPrice="2024.9940" />
    <sod ProductID="777" OrderQty="2" UnitPrice="2024.9940" />
  </soh>
  <soh SalesOrderID="61206" OrderDate="2004-01-01T00:00:00" CustomerID="646" TotalDue="50.2703">
    <sod ProductID="869" OrderQty="1" UnitPrice="41.9940" />
  </soh>
</SalesOrder>'

...We will now combine the value method with the nodes method, to retrieve a relational result set...

SELECT temp_column.value('../@SalesOrderID', 'int') AS 'SalesOrderID',
temp_column.value('../@OrderDate', 'datetime') AS 'OrderDate',
temp_column.value('../@CustomerID', 'int') AS 'CustomerID',
temp_column.value('../@TotalDue', 'decimal') AS 'TotalDue',
temp_column.value('@ProductID', 'int') AS 'ProductID',
temp_column.value('@OrderQty', 'int') AS 'OrderQty',
temp_column.value('@UnitPrice', 'decimal') AS 'UnitPrice'
FROM @xml_document.nodes('/SalesOrder/soh/sod') temp_table(temp_column)

...This statement will produce an identical result set to the OPENXML statement we discussed earlier...



...So how is this statement working? Well, at first glance, the syntax can be quite confusing. The trick to understanding it, is to remember that SQL Server does not actually process a statement in the order that you write it. The actual order of processing here is as follows...
1. Create a temporary table called temp_table, containing one column called temp_column.
2. Populate temp_column with the results of the nodes statement, which is a parsed version of our XML document.
3. Repeatedly run the value method against the temporary column, once for each node you wish to extract, mapping each node to a relational column and data type.
It would also be possible to use this process to do what I like to call 'semi-shredding' the data. By that, I mean splitting the document into a row set, but with each row containing an XML document. In this example, there would be five rows, each containing one sales order.
To do this, the syntax would be similar, but we replace the value method with the query method, which returns an XML fragment, as opposed to a scalar value...


SELECT temp_column.query('.')
FROM @xml_document.nodes('/SalesOrder/soh') temp_table(temp_column)

...The row set returned will look like this... 


...The first XML fragment in the row set would look like this...

<soh SalesOrderID="43667" OrderDate="2001-07-01T00:00:00" CustomerID="646" TotalDue="8095.7863">
  <sod ProductID="710" OrderQty="3" UnitPrice="5.7000" />
  <sod ProductID="773" OrderQty="1" UnitPrice="2039.9940" />
  <sod ProductID="778" OrderQty="1" UnitPrice="2024.9940" />
  <sod ProductID="775" OrderQty="1" UnitPrice="2024.9940" />
</soh>

...So now that we know how to shred XML using both the OPENXML function and the nodes method, the question remains: when should we use each option? A good question, and one that I will answer in Part III...