Sunday 19 December 2010

Shredding XML with OPENXML & NODES - PART II

...In Part I, I discussed how to shred XML data by using the OPENXML method. So how would we perform this same operation by using the Nodes XQuery method? Well, firstly, we would once again need to create an XML variable, and assign our document to it...


DECLARE @XML_Document XML

SET @XML_Document = N'<SalesOrder>
  <soh SalesOrderID="43667" OrderDate="2001-07-01T00:00:00" CustomerID="646" TotalDue="8095.7863">
    <sod ProductID="710" OrderQty="3" UnitPrice="5.7000" />
    <sod ProductID="773" OrderQty="1" UnitPrice="2039.9940" />
    <sod ProductID="778" OrderQty="1" UnitPrice="2024.9940" />
    <sod ProductID="775" OrderQty="1" UnitPrice="2024.9940" />
  </soh>
  <soh SalesOrderID="44287" OrderDate="2001-10-01T00:00:00" CustomerID="646" TotalDue="21601.0565">
    <sod ProductID="773" OrderQty="2" UnitPrice="2039.9940" />
    <sod ProductID="777" OrderQty="2" UnitPrice="2024.9940" />
    <sod ProductID="776" OrderQty="1" UnitPrice="2024.9940" />
    <sod ProductID="709" OrderQty="3" UnitPrice="5.7000" />
    <sod ProductID="771" OrderQty="3" UnitPrice="2039.9940" />
  </soh>
  <soh SalesOrderID="45045" OrderDate="2002-01-01T00:00:00" CustomerID="646" TotalDue="18909.0911">
    <sod ProductID="709" OrderQty="2" UnitPrice="5.7000" />
    <sod ProductID="778" OrderQty="1" UnitPrice="2024.9940" />
    <sod ProductID="771" OrderQty="1" UnitPrice="2039.9940" />
    <sod ProductID="774" OrderQty="1" UnitPrice="2039.9940" />
    <sod ProductID="773" OrderQty="3" UnitPrice="2039.9940" />
    <sod ProductID="776" OrderQty="1" UnitPrice="2024.9940" />
  </soh>
  <soh SalesOrderID="45785" OrderDate="2002-04-01T00:00:00" CustomerID="646" TotalDue="54042.8077">
    <sod ProductID="715" OrderQty="2" UnitPrice="28.8404" />
    <sod ProductID="774" OrderQty="4" UnitPrice="2039.9940" />
    <sod ProductID="771" OrderQty="3" UnitPrice="2039.9940" />
    <sod ProductID="773" OrderQty="2" UnitPrice="2039.9940" />
    <sod ProductID="772" OrderQty="2" UnitPrice="2039.9940" />
    <sod ProductID="709" OrderQty="2" UnitPrice="5.7000" />
    <sod ProductID="710" OrderQty="2" UnitPrice="5.7000" />
    <sod ProductID="712" OrderQty="4" UnitPrice="5.1865" />
    <sod ProductID="775" OrderQty="2" UnitPrice="2024.9940" />
    <sod ProductID="776" OrderQty="3" UnitPrice="2024.9940" />
    <sod ProductID="778" OrderQty="2" UnitPrice="2024.9940" />
    <sod ProductID="777" OrderQty="2" UnitPrice="2024.9940" />
  </soh>
  <soh SalesOrderID="61206" OrderDate="2004-01-01T00:00:00" CustomerID="646" TotalDue="50.2703">
    <sod ProductID="869" OrderQty="1" UnitPrice="41.9940" />
  </soh>
</SalesOrder>'

...We will now combine the Value method, with the Nodes method, to retrieve a relation result set...

SELECT temp_column.value('../@SalesOrderID', 'int') AS 'SalesOrderID',
temp_column.value('../@OrderDate', 'datetime') AS 'OrderDate',
temp_column.value('../@CustomerID', 'int') AS 'CustomerID',
temp_column.value('../@TotalDue', 'decimal') AS 'TotalDue',
temp_column.value('@ProductID', 'int') AS 'ProductID',
temp_column.value('@OrderQty', 'int') AS 'OrderQty',
temp_column.value('@UnitPrice', 'decimal') AS 'UnitPrice'
FROM @xml_document.nodes('/SalesOrder/soh/sod') temp_table(temp_column)

...This statement will produce an identical result set to the OPENXML statement we discussed earlier...



...So how is this statement working? Well, at first glance, the syntax can be quite confusing. The trick to understanding it, is to remember that SQL Server does not actually process a statement in the order that you write it. The actual order of processing here is as follows...
Create a temporary table called temp_table, containing one column called temp_column.
Populate temp_column with the results of the nodes statement, which is a parsed version of our XML document.
Repeatedly run the value method against the temporary column, once for each node you wish to extract, mapping each node to a relational column and data type.
It would also be possible to use this process to what I like to call ‘semi-shred’ the data. By that, I mean to split the document into a row set, but with each row containing an XML document. In this example, there would be five rows, each containing one sales order.
To do this, the syntax would be similar, but we will replace the value method with the query method, which returns an XML fragment, as opposed to a scalar value...


SELECT temp_column.query('.')
FROM @xml_document.nodes('/SalesOrder/soh') temp_table(temp_column)

...The row set returned will look like this... 


...The first XML fragment in the row set would look like this...

<soh SalesOrderID="43667" OrderDate="2001-07-01T00:00:00" CustomerID="646" TotalDue="8095.7863">
  <sod ProductID="710" OrderQty="3" UnitPrice="5.7000" />
  <sod ProductID="773" OrderQty="1" UnitPrice="2039.9940" />
  <sod ProductID="778" OrderQty="1" UnitPrice="2024.9940" />
  <sod ProductID="775" OrderQty="1" UnitPrice="2024.9940" />
</soh>

...So now we know how to shred XML using both the OPENXML function and the nodes method, the question remains, when should we use each option? A good question, and one that I will answer in Part III...

No comments:

Post a Comment