Sunday, 19 December 2010

Shredding XML with OPENXML & NODES - PART I

In SQL Server 2005 & 2008, there are two ways of shredding XML into relational data. The first is the ‘traditional method’ of using OPENXML, but there is also the more ‘contemporary’ method of using the Nodes XQuery method, but which should you use and when?
Well, before we begin to tackle that question, lets look at how to implement each technique. But before we can shred XML, we will need an XML document, so the first thing I will do, is generate a very simple XML document with the following statement...

SELECT soh.SalesOrderID, soh.OrderDate, soh.CustomerID, soh.TotalDue, sod.ProductID, sod.OrderQty, sod.UnitPrice
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesOrderDetail sod
ON soh.SalesOrderID = sod.SalesOrderID
WHERE soh.CustomerID = 646
FOR XML AUTO, ROOT('SalesOrder')
...So now, to shred this data using the OPENXML function. The first thing that we will need to do is to declare two variables. The first will hold this XML document, and the second will be an Integer, that will be used to point to a parsed version of the document that will be stored in memory...

SET @XML_Document = N'<SalesOrder>
  <soh SalesOrderID="43667" OrderDate="2001-07-01T00:00:00" CustomerID="646" TotalDue="8095.7863">
    <sod ProductID="710" OrderQty="3" UnitPrice="5.7000" />
    <sod ProductID="773" OrderQty="1" UnitPrice="2039.9940" />
    <sod ProductID="778" OrderQty="1" UnitPrice="2024.9940" />
    <sod ProductID="775" OrderQty="1" UnitPrice="2024.9940" />
  <soh SalesOrderID="44287" OrderDate="2001-10-01T00:00:00" CustomerID="646" TotalDue="21601.0565">
    <sod ProductID="773" OrderQty="2" UnitPrice="2039.9940" />
    <sod ProductID="777" OrderQty="2" UnitPrice="2024.9940" />
    <sod ProductID="776" OrderQty="1" UnitPrice="2024.9940" />
    <sod ProductID="709" OrderQty="3" UnitPrice="5.7000" />
    <sod ProductID="771" OrderQty="3" UnitPrice="2039.9940" />
  <soh SalesOrderID="45045" OrderDate="2002-01-01T00:00:00" CustomerID="646" TotalDue="18909.0911">
    <sod ProductID="709" OrderQty="2" UnitPrice="5.7000" />
    <sod ProductID="778" OrderQty="1" UnitPrice="2024.9940" />
    <sod ProductID="771" OrderQty="1" UnitPrice="2039.9940" />
    <sod ProductID="774" OrderQty="1" UnitPrice="2039.9940" />
    <sod ProductID="773" OrderQty="3" UnitPrice="2039.9940" />
    <sod ProductID="776" OrderQty="1" UnitPrice="2024.9940" />
  <soh SalesOrderID="45785" OrderDate="2002-04-01T00:00:00" CustomerID="646" TotalDue="54042.8077">
    <sod ProductID="715" OrderQty="2" UnitPrice="28.8404" />
    <sod ProductID="774" OrderQty="4" UnitPrice="2039.9940" />
    <sod ProductID="771" OrderQty="3" UnitPrice="2039.9940" />
    <sod ProductID="773" OrderQty="2" UnitPrice="2039.9940" />
    <sod ProductID="772" OrderQty="2" UnitPrice="2039.9940" />
    <sod ProductID="709" OrderQty="2" UnitPrice="5.7000" />
    <sod ProductID="710" OrderQty="2" UnitPrice="5.7000" />
    <sod ProductID="712" OrderQty="4" UnitPrice="5.1865" />
    <sod ProductID="775" OrderQty="2" UnitPrice="2024.9940" />
    <sod ProductID="776" OrderQty="3" UnitPrice="2024.9940" />
    <sod ProductID="778" OrderQty="2" UnitPrice="2024.9940" />
    <sod ProductID="777" OrderQty="2" UnitPrice="2024.9940" />
  <soh SalesOrderID="61206" OrderDate="2004-01-01T00:00:00" CustomerID="646" TotalDue="50.2703">
    <sod ProductID="869" OrderQty="1" UnitPrice="41.9940" />
...The next step, is to create the parsed version of the document in memory. To do this, we pass the two variables that we have just created into the sp_xml_preparedocument procedure...
EXEC sp_xml_preparedocument @handle OUTPUT, @xml_document
You will notice that in this statement, I have specified the OUTPUT keyword after our @handle parameter. This is because the pointer to the parse tree is being assigned to this variable, and from now on, we will no longer reference our original XML document, we will only ever reference the pointer.
So now, we will use the OPENXML function itself, to produce our relational result set...

OPENXML ( @handle, '/SalesOrder/soh/sod', 1)
WITH ( SalesOrderID     int         '../@SalesOrderID',
       OrderDate        datetime    '../@OrderDate',
       CustomerID       int         '../@CustomerID',
       TotalDue         decimal     '../@TotalDue',
       ProductID        int         '@ProductID',
       OrderQty         int         '@OrderQty',
       UnitPrice        decimal     '@UnitPrice')
...Here, we are specifying that we want to select all of the relational columns that we are mapping to the nodes within the XML document.
The three variables that we are passing into the OPENXML statement are the handle, the row pattern and the mapping flag. The row pattern specifies the lowest node level of the XML document that we will be extracting data from, and the mapping flag specifies how nodes will be mapped to relational columns. This parameter must always be specified, but in this example has no effect, because we are explicitly mapping all nodes. This parameter comes into its own, if we are using auto mapping, which I will discuss in a later post.
In the WITH clause, we are specifying the nodes that we wish to return, and mapping them to relational columns and data types. The @ symbol, is specifying that the node is an attribute. If the node was an element, we would omit this symbol. Where you see “../”, we are moving up one level of the node hierarchy. If we wanted to move up two levels, we would state “../../”, three levels would be “../../../”, etc.
For example, UnitPrice is an attribute at the lowest level that we specified in the row pattern, so there is no need to move up the hierarchy to find the node. Hence, there is no “../”. However, the SalesOrderID attribute is at the next level of the node hierarchy, so we have specified “../”.
The last thing that we need to do, is to remove the handle to the parsed document. As OPENXML is an expensive operation, this is VERY important. Think of it like de-allocating a cursor! We will perform this clean-up with the following statement...
EXEC sp_xml_removedocument @handle
...The memory space can now be reallocated and reused.
This code should give the following result set...

In Part II, I will discuss how to produce the same results using the nodes XQuery method...

No comments:

Post a Comment