Tuesday, July 3, 2012

Hierarchical Data with FOR XML

I was playing with selection of hierarchical XML using FOR XML clause of Microsoft SQL Server. I was using AdventureWorks2008 database and wanted to select something like

<Root>
  <Group Name="Europe">
    <Country Code="DE">
      <Area>Germany</Area>
    </Country>
    <Country Code="FR">
      <Area>France</Area>
    </Country>
    <Country Code="GB">
      <Area>United Kingdom</Area>
    </Country>
  </Group>
  <Group Name="North America">
    <Country Code="CA">
      <Area>Canada</Area>
    </Country>
    <Country Code="US">
      <Area>Northwest</Area>
      <Area>Northeast</Area>
      <Area>Central</Area>
      <Area>Southwest</Area>
      <Area>Southeast</Area>
    </Country>
  </Group>
  <Group Name="Pacific">
    <Country Code="AU">
      <Area>Australia</Area>
    </Country>
  </Group>
</Root>
from Sales.SalesTerritory table.

FOR XML PATH

First, I tried to do that using FOR XML PATH. It was nothing particularly difficult. The only thing that is worth mentioning is that subqueries must be used to allow for grouping. (With grouping, I mean having multiple children inside a single parent node.)

SELECT [Group] [@Name],
        (SELECT [CountryRegionCode] [@Code],
                (SELECT t3.[Name] [text()]
                FROM Sales.SalesTerritory t3
                WHERE t3.[Group] = t2.[Group] 
                AND t3.[CountryRegionCode] = t2.[CountryRegionCode]
                ORDER BY [Group], [CountryRegionCode], [Name]
                FOR XML PATH('Area'), TYPE)
        FROM Sales.SalesTerritory t2
        WHERE t1.[Group] = t2.[Group]
        GROUP BY t2.[Group], t2.[CountryRegionCode]
        ORDER BY t2.[Group], t2.[CountryRegionCode]
        FOR XML PATH('Country'), TYPE)
FROM Sales.SalesTerritory t1
GROUP BY [Group]
ORDER BY [Group]
FOR XML PATH ('Group'), ROOT('Root')

FOR XML AUTO

It is not possible to mix attributes and elements when using FOR XML AUTO. However, if we do not mind having <Area Name="Germany" /> instead of <Area>Germany</Area>, it is still an option.

An interesting thing is that grouping is done automatically when selecting from joined tables. (Let me remind that FOR XML PATH requires subqueries for that to work.)

SELECT [Group].[Group] [Name]
 , [Country].[CountryRegionCode] [Code]
 , [Area].[Name]
FROM Sales.SalesTerritory [Group]
JOIN Sales.SalesTerritory [Country] 
 ON [Group].[Group] = [Country].[Group]
JOIN Sales.SalesTerritory [Area] 
 ON [Country].[Group] = [Area].[Group] 
 AND [Country].[CountryRegionCode] = [Area].[CountryRegionCode]
GROUP BY [Group].[Group], [Country].[CountryRegionCode], [Area].[Name]
ORDER BY [Group].[Group], [Code], [Area].[Name]
FOR XML AUTO, ROOT('Root')

FOR XML EXPLICIT

This is rather specific mode. The proper ordering is essential. (Note the Tag in ORDER BY clause.) That said, it works all right.

SELECT 1 Tag
 , NULL Parent
 , NULL [Root!1!!element]
 , NULL [Group!2!Name]
 , NULL [Country!3!Code]
 , NULL [Area!4!!element]
UNION ALL
SELECT 2, 1, NULL, [Group], NULL, NULL
FROM Sales.SalesTerritory
GROUP BY [Group]
UNION ALL
SELECT 3, 2, NULL, [Group], [CountryRegionCode], NULL
FROM Sales.SalesTerritory
GROUP BY [Group],[CountryRegionCode]
UNION ALL
SELECT 4, 3, NULL, [Group], [CountryRegionCode], [Name]
FROM Sales.SalesTerritory
ORDER BY [Group!2!Name], [Country!3!Code], Tag
FOR XML EXPLICIT

Conclusion

Both FOR XML PATH and FOR XML EXPLICIT can provide the required XML. The output of FOR XML AUTO would require an additional transformation, as it provides all data as attributes.

While FOR XML PATH is in general the most recommended mode, I did not like the fact I needed to use correlated subqueries. This very case is perhaps too tiny (10 rows, that is), yet the IO statistics are remarkable.

modescans reads
EXPLICIT36
AUTO3104
PATH10(!)140

That is, if you need to select XML and performance matters, it is always a good idea to try multiple FOR XML modes.

No comments: