SQLPASS AD501-M XQuery MRys

Best Practices and
Performance Tuning of
XML Queries in SQL Server
AD-501-M

Michael Rys
Principal Program Manager
Microsoft Corp

mrys@microsoft.com
@SQLServerMike

October 11-14, Seattle, WA

Session Objectives
• Understand when and how
to use XML in SQL Server
• Understand and correct common
performance problems with XML and
XQuery

Session Agenda

XML Scenarios and when to store XML

XML Design Optimizations

General Optimizations

XML Datatype method Optimizations

XQuery Optimizations

XML Index Optimizations


XML Scenarios

Data Exchange between loosely-coupled systems
• XML is ubiquitous, extensible, platform independent transport format
• Message Envelope in XML
Simple Object Access Protocol (SOAP), RSS, REST
• Message Payload/Business Data in XML
• Vertical Industry Exchange schemas
Document Management
• XHTML, DocBook, Home-grown, domain-specific markup (e.g.
contracts), OpenOffice, Microsoft Office XML (both default and
user-extended)
Ad-hoc modeling of semistructured data
• Storing and querying heterogeneous complex objects
• Semistructured data with sparse, highly-varying
structure at the instance level
• XML provides self-describing format and extensible schemas

→Transport, Store, and Query XML data

Decision Tree: Processing XML In SQL Server
1. Does the data fit the relational model?
• Yes: Shred the XML into relations
• No: continue with 2
2. Is the data semi-structured?
• Yes, structured: Shred the structured XML into relations, store
semistructured aspects as XML and/or sparse columns
• Yes, known sparse: Shred known sparse data into sparse columns
• Yes, open schema: continue with the XML questions in 3
• No: continue with 3
3. Is the data a document?
• Query into the XML?
No: Store as varbinary(max)
Yes: Store as XML
• Is the XML constrained by schemas?
Yes: Constrain XML if validation cost is ok
• Promote frequently queried properties relationally
• Use primary and secondary XML indexes as needed
• Search within the XML?
Yes: Define a full-text index

SQL Server XML Data Type Architecture

[Diagram. XML side: XML parser and schema validation against an XML
Schema Collection (XML Schemata) produce the XML data type (binary XML),
which is accessed through XQuery and XML-DML. Relational side: rowsets
via OpenXML/nodes(), FOR XML with TYPE directive back to XML, and the
PRIMARY XML INDEX (node table) with secondary PATH, PROP, and VALUE
indexes.]


General Impacts
Concurrency Control
• Locks on both XML data type and relevant
rows in primary and secondary XML Indices
• Lock escalation on indices
• Snapshot Isolation reduces locks and lock contention
Transaction Logs
• Bulk insert into XML Indices may fill transaction log
• Delay the creation of the XML indexes and use the SIMPLE recovery
model
• Preallocate database file instead of dynamically growing
• Place log on different disk
In-Row/Out-of-Row of XML large object
• Moving XML into side table or out-of-row if
mixed with relational data reduces scan time
Due to clustering, insertion into XML Index may not be linear
• Choose integer/bigint identity column as key
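
As a rough sketch of two of the knobs named above, the statements below enable snapshot isolation and push large values (including XML) out of row. The database name MyDb and table name dbo.T are placeholders, not names from the session; whether either setting helps depends on the workload.

```sql
-- Assumption: MyDb and dbo.T are hypothetical names used only for illustration.

-- Allow transactions to run under snapshot isolation (fewer locks, less contention).
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Store large value types (varchar(max), varbinary(max), xml, ...) out of row,
-- so scans over the relational columns do not have to read the XML.
EXEC sp_tableoption N'dbo.T', 'large value types out of row', 1;
```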

Choose The Right XML Model
• Element-centric versus attribute-centric
<Customer><name>Joe</name></Customer>
<Customer name="Joe" />
+: Attributes often perform better in queries
–: Parsing attributes requires a uniqueness check

• Generic element names with type attribute vs Specific
element names
<Entity type="Customer">
<Prop type="Name">Joe</Prop>
</Entity>
<Customer><name>Joe</name></Customer>
+: Specific names shorter path expressions
+: Specific names no filter on type attribute
/Entity[@type="Customer"]/Prop[@type="Name"] vs /Customer/name

• Wrapper elements
<Orders><Order id="1"/></Orders>
+: No wrapper elements smaller XML, shorter path expressions

Use an XML Schema Collection?

Using no XML Schema (untyped XML)
• Can still use XQuery and XML Index!!!
• Atomic values are always weakly typed strings:
compare as strings to avoid runtime
conversions and loss of index usage
• No schema validation overhead
• No schema evolution revalidation costs

XML Schema provides structural information
• Atomically typed elements now use only one row instead of two
in the node table/XML index (closer to attributes)
• Static typing can detect cardinality and feasibility of expression

XML Schema provides semantic information
• Elements/attributes have correct atomic
type for comparison and order semantics
• No runtime casts required and better use of index for value lookup
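
The difference between typed and untyped XML can be sketched as follows. The schema collection name PersonSchema and the person/age document are made up for this illustration; they do not come from the session.

```sql
-- Assumption: PersonSchema and the <person> document are hypothetical examples.
CREATE XML SCHEMA COLLECTION PersonSchema AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="age" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';
GO

DECLARE @typed   xml(PersonSchema) = N'<person><age>42</age></person>';
DECLARE @untyped xml               = N'<person><age>42</age></person>';

SELECT
    -- Typed: age is known to be xs:int, so it compares as a number without a runtime cast.
    @typed.exist('/person[age = 42]')             AS TypedMatch,
    -- Untyped: values are strings, so compare against a string literal.
    @untyped.exist('/person[age/text() = "42"]')  AS UntypedMatch;
```

The GO separator is needed because the variable declaration refers to the schema collection, which must exist before that batch is compiled.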


XQuery Methods

query() creates new, untyped XML data type
instance

exist() returns 1 if the XQuery expression returns
at least one item, 0 otherwise

value() extracts an XQuery value into the SQL
value and type space
• Expression has to statically be a singleton
• String value of atomized XQuery item is cast to
SQL type
• SQL type has to be SQL scalar type
(no XML or CLR UDT)
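
A minimal sketch of the three methods side by side, run against an XML variable. The document content is invented for illustration.

```sql
-- Assumption: the sample document below is hypothetical.
DECLARE @x xml = N'<doc><customer id="1" name="Joe"/><customer id="2" name="Ann"/></doc>';

SELECT
    -- query(): returns the selected nodes as a new, untyped XML instance
    @x.query('/doc/customer[@id = 2]')                    AS CustomerXml,
    -- exist(): 1 if the expression returns at least one item, 0 otherwise
    @x.exist('/doc/customer[@id = 2]')                    AS HasCustomer2,
    -- value(): the trailing [1] makes the expression a static singleton
    @x.value('(/doc/customer/@name)[1]', 'nvarchar(50)')  AS FirstName;
```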

XQuery: nodes()

Returns a row per selected node as a special
XML data type instance
• Preserves the original structure and types
• Can only be used with the XQuery methods (but not
modify()), count(*), and IS (NOT) NULL

Appears as a Table-valued Function (TVF) in the
query plan if no index is present
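
For illustration, nodes() combined with value() turns each selected node into one row. The sample document is an assumption made for this sketch.

```sql
-- Assumption: the sample document below is hypothetical.
DECLARE @x xml = N'<doc><customer id="1" name="Joe"/><customer id="2" name="Ann"/></doc>';

-- One row per <customer> element; c is the context node for the relative paths.
SELECT c.value('@id',   'int')          AS CustID,
       c.value('@name', 'nvarchar(50)') AS CName
FROM @x.nodes('/doc/customer') AS N(c);
```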


sql:column()/sql:variable()

Map SQL value and type into XQuery values and types in context of XQuery or
XML-DML
• sql:variable(): accesses a SQL variable/parameter
declare @value int
set @value=42
select * from T
where
T.x.exist('/a/b[@id=sql:variable("@value")]')=1
• sql:column(): accesses another column value

tables: T(key int, x xml), S(key int, val int)

select * from T join S on T.key=S.key
where T.x.exist('/a/b[@id=sql:column("S.val")]')=1

• Restrictions in SQL Server:
No XML, CLR UDT, datetime, or deprecated text/ntext/image

Improving Slow XQueries, Bad
FOR XML
demo


Optimal Use Of Methods
How to Cast from XML to SQL

BAD:
CAST( CAST(xmldoc.query('/a/b/text()') as
nvarchar(500)) as int)
GOOD:
xmldoc.value('(/a/b/text())[1]', 'int')
BAD:
node.query('.').value('@attr',
'nvarchar(50)')
GOOD:
node.value('@attr', 'nvarchar(50)')


Grouping value() method
Group value() methods on same XML instance next to
each other if the path expressions in the value()
methods are
• Simple path expressions that only use child and attribute axis
and do not contain wildcards, predicates, node tests, ordinals
• The path expressions infer statically a singleton

The singleton can be statically inferred from
• the DOCUMENT and XML Schema Collection
• Relative paths on the context node provided by the nodes()
method

Requires XML index to be present

Optimal Use of Methods
Using the right method to join and compare

Use exist() method, sql:column()/sql:variable() and an
XQuery comparison for checking for a value or joining
if secondary XML indices present
BAD:*
select doc
from doc_tab join authors
on doc.value('(/doc/mainauthor/lname/text())[1]',
'nvarchar(50)') = lastname
GOOD:
select doc
from doc_tab join authors
on 1 = doc.exist('/doc/mainauthor/lname/text()[. =
sql:column("lastname")]')
* If applied on XML variable/no index present, value()
method is most of the time more efficient

Optimal Use of Methods
Avoiding bad costing with nodes()
nodes() without XML index is a Table-valued function (details later)
Bad cardinality estimates can lead to bad plans
• BAD:
select c.value('@id', 'int') as CustID
, c.value('@name', 'nvarchar(50)') as CName
from Customer, @x.nodes('/doc/customer') as N(c)
where Customer.ID = c.value('@id', 'int')
• BETTER (if only one wrapper doc element):
select c.value('@id', 'int') as CustID
, c.value('@name', 'nvarchar(50)') as CName
from Customer, @x.nodes('/doc[1]') as D(d)
cross apply d.nodes('customer') as N(c)
where Customer.ID = c.value('@id', 'int')
Use temp table (insert into #temp select … from nodes()) or Table-valued
parameter instead of XML to get better estimates
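
The temp table variant could look roughly like this. It assumes the Customer table with an ID column from the example above already exists; the temp table name #cust and the sample document are made up.

```sql
-- Assumptions: Customer(ID, ...) exists; #cust and the sample XML are hypothetical.
DECLARE @x xml = N'<doc><customer id="1" name="Joe"/><customer id="2" name="Ann"/></doc>';

CREATE TABLE #cust (CustID int PRIMARY KEY, CName nvarchar(50));

-- Shred once; the temp table carries real row counts and statistics.
INSERT INTO #cust (CustID, CName)
SELECT c.value('@id',   'int'),
       c.value('@name', 'nvarchar(50)')
FROM @x.nodes('/doc/customer') AS N(c);

-- The join is now costed on the temp table instead of the nodes() TVF estimate.
SELECT t.CustID, t.CName
FROM Customer
JOIN #cust AS t ON Customer.ID = t.CustID;

DROP TABLE #cust;
```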

Avoiding multiple method evaluations
Use subqueries
• BAD:
SELECT CASE isnumeric (doc.value(
'(/doc/customer/order/price)[1]', 'nvarchar(32)'))
WHEN 1 THEN doc.value(
'(/doc/customer/order/price)[1]', 'decimal(5,2)')
ELSE 0 END
FROM T
• GOOD:
SELECT CASE isnumeric (Price)
WHEN 1 THEN CAST(Price as decimal(5,2))
ELSE 0 END
FROM (SELECT doc.value(
'(/doc/customer/order/price)[1]',
'nvarchar(32)') as Price FROM T) X

Use subqueries also with NULLIF()

Combined SQL And XQuery/DML Processing
SELECT x.query('…'), y FROM T WHERE …

[Diagram. Static phase: the SQL parser and the XQuery parser each feed
static typing (using metadata and the XML Schema Collection) and
algebrization, followed by static optimization of the combined logical
and physical operation tree. Dynamic phase: runtime optimization and
execution of the physical op tree against XML and relational indices.]

New XQuery Algebra Operators
XML Reader TVF
Table-Valued Function XML Reader UDF with XPath Filter
Used if no Primary XML Index is present
Creates node table rowset in query flow
Multiple XPath filters can be pushed in to reduce node table
to subtree
Base cardinality estimate is always 10,000 rows!
Some adjustment based on pushed path filters

XMLReader node table format example (simplified)

ID TAG ID Node Type-ID VALUE HID
1.3.1 4 (TITLE) Element 2 (xs:string) Bad Bugs #title#section#book


New XQuery Algebra Operators
UDX

• Serializer UDX
serializes the query result as XML
• XQuery String UDX
evaluates the XQuery string() function
• XQuery Data UDX
evaluates the XQuery data() function
• Check UDX
validates XML being inserted

• UDX name visible in SSMS properties window

Optimal Use Of XQuery
Atomization of nodes
Value comparisons, XQuery casts and value() method
casts require atomization of item
• attribute:
/person[@age = 42]
/person[data(@age) = 42]
• Atomic typed element:
/person[age = 42] /person[data(age) = 42]
• Untyped, mixed content typed element (adds UDX):
/person[age = 42] /person[data(age) = 42]
/person[string(age) = 42]
• If only one text node for untyped element (better):
/person[age/text() = 42]
/person[data(age/text()) = 42]
• value() method on untyped elements:
value('/person/age', 'int')
value('/person/age/text()', 'int')

string() aggregates all text nodes, prohibits index use

Casting Values
Value comparisons require casts and type promotion
• Untyped attribute:
/person[@age = 42] /person[xs:decimal(@age) = 42]
• Untyped text node():
/person[age/text() = 42]
/person[xs:decimal(age/text()) = 42]
• Typed element (typed as xs:int):
/person[salary = 3e4] /person[xs:double(salary) = 3e4]

Casting is expensive and prohibits index lookup

Tips to avoid casting
• Use appropriate types for comparison (string for untyped)
• Use schema to declare type
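
A small sketch of the first tip on untyped XML: comparing against a string literal avoids the numeric cast. The person document is invented for this example.

```sql
-- Assumption: the sample document below is hypothetical.
DECLARE @x xml = N'<person age="42"/>';

SELECT
    -- String literal: untyped @age is compared as a string, no numeric cast needed.
    @x.exist('/person[@age = "42"]') AS StringCompare,
    -- Numeric literal: untyped @age has to be cast to a numeric type at runtime.
    @x.exist('/person[@age = 42]')   AS NumericCompare;
```

Both expressions match this document, but they are not interchangeable in general: a string comparison treats "042" and "42" as different values, so it only fits data with a consistent lexical form.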

Maximize XPath expressions

Single paths are more efficient than twig paths
Avoid predicates in the middle of path expressions
book[@ISBN = "1-8610-0157-6"]/author[first-name = "Davis"]
/book[@ISBN = "1-8610-0157-6"] "∩"
/book/author[first-name = "Davis"]

Move ordinals to the end of path expressions
• Make sure you get the same semantics!
• /a[1]/b[1] ≠ (/a/b)[1] ≠ /a/b[1]
• (/book/@isbn)[1] is better than /book[1]/@isbn
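
The three ordinal placements can be compared on a small fragment built so that each one selects something different. The fragment is made up for this sketch.

```sql
-- Assumption: the sample fragment below is hypothetical.
DECLARE @x xml = N'<a><c/></a><a><b>1</b><b>2</b></a><a><b>3</b></a>';

SELECT
    -- First <a>, then its first <b>: the first <a> has no <b>, so the result is empty.
    @x.query('/a[1]/b[1]') AS FirstBOfFirstA,
    -- First <b> in document order across all <a>: <b>1</b>
    @x.query('(/a/b)[1]')  AS FirstBOverall,
    -- First <b> of every <a> that has one: <b>1</b><b>3</b>
    @x.query('/a/b[1]')    AS FirstBOfEachA;
```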

Maximize XPath expressions in exist()
Use context item in predicate to lengthen path in exist()
• Existential quantification makes returned node irrelevant

• BAD:
SELECT * FROM docs WHERE 1 = xCol.exist
('/book/subject[text() = "security"]')
• GOOD:
('/book/subject/text()[. = "security"]')
• BAD:
('/book[@price > 9.99 and @price < 49.99]')
• GOOD:
('/book/@price[. > 9.99 and . < 49.99]')

This does not work with or-predicate

Inefficient operations: Parent axis

Most frequent offender: parent axis with nodes()

• BAD:
select o.value('../@id', 'int') as CustID
, o.value('@id', 'int') as OrdID
from T
cross apply x.nodes('/doc/customer/orders') as N(o)

• GOOD:
select c.value('@id', 'int') as CustID
, o.value('@id', 'int') as OrdID
from T cross apply x.nodes('/doc/customer') as N1(c)
cross apply c.nodes('orders') as N2(o)

Inefficient operations
Avoid descendant axes and // in the middle of path
expressions if the data structure is known.
• // still can use the HID lookup, but is less efficient

XQuery construction performs worse than FOR XML
• BAD:
SELECT notes.query('
<Customer cid="{sql:column(''cid'')}">{
<name>{sql:column("name")}</name>, /
}</Customer>')
FROM Customers WHERE cid=1
• GOOD:
SELECT cid as "@cid", name, notes as "*"
FROM Customers WHERE cid=1
FOR XML PATH('Customer'), TYPE

Optimal Use Of FOR XML
Use TYPE directive when assigning result to XML
• BAD:
declare @x xml;
set @x =
(select * from Customers for xml raw);
• GOOD:
declare @x xml;
set @x =
(select * from Customers for xml raw,
type);

Use FOR XML PATH for complex grouping and additional
hierarchy levels over FOR XML EXPLICIT

Use FOR XML EXPLICIT for complex nesting if FOR XML PATH
performance is not appropriate
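
As an illustration of adding hierarchy levels with FOR XML PATH, column aliases containing a slash create nested elements. The query assumes the Customers(cid, name) table used in the earlier example; the contact wrapper and the Customers root element are invented for this sketch.

```sql
-- Assumption: Customers(cid, name) exists as in the earlier slide;
-- the "contact" level and the root name are hypothetical.
SELECT cid  AS "@cid",           -- becomes an attribute of <Customer>
       name AS "contact/name"    -- becomes <contact><name>...</name></contact>
FROM Customers
FOR XML PATH('Customer'), ROOT('Customers'), TYPE;
```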


XML Indices
Create XML index on XML column
CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)
Create secondary indexes on tags, values, paths
Creation:
• Single-threaded only for primary XML index
• Multi-threaded for secondary XML indexes
Uses:
• Primary Index will always be used if defined (not a cost
based decision)
• Results can be served directly from index
• SQL’s cost based optimizer will consider secondary indexes
Maintenance:
• Primary and Secondary Indices will be efficiently maintained
during updates
• Only subtree that changes will be updated
• No online index rebuild 
• Clustered key may lead to non-linear maintenance cost 
Schema revalidation still checks whole instance

Example Index Contents

insert into Person values (42,
'<book ISBN="1-55860-438-3">
<section>
<title>Bad Bugs</title>
Nobody loves bad bugs.
</section>
<section>
<title>Tree Frogs</title>
All right-thinking people
<bold>love</bold> tree frogs.
</section>
</book>')


Primary XML Index
CREATE PRIMARY XML INDEX PersonIdx ON Person (Pdesc)
PK | XID   | TAG ID      | Node      | Type-ID       | VALUE                      | HID
42 | 1     | 1 (book)    | Element   | 1 (bookT)     | null                       | #book
42 | 1.1   | 2 (ISBN)    | Attribute | 2 (xs:string) | 1-55860-438-3              | #@ISBN#book
42 | 1.3   | 3 (section) | Element   | 3 (sectionT)  | null                       | #section#book
42 | 1.3.1 | 4 (TITLE)   | Element   | 2 (xs:string) | Bad Bugs                   | #title#section#book
42 | 1.3.3 | --          | Text      | --            | Nobody loves bad bugs.     | #text()#section#book
42 | 1.5   | 3 (section) | Element   | 3 (sectionT)  | null                       | #section#book
42 | 1.5.1 | 4 (title)   | Element   | 2 (xs:string) | Tree frogs                 | #title#section#book
42 | 1.5.3 | --          | Text      | --            | All right-thinking people  | #text()#section#book
42 | 1.5.5 | 7 (bold)    | Element   | 4 (boldT)     | love                       | #bold#section#book
42 | 1.5.7 | --          | Text      | --            | tree frogs                 | #text()#section#book

Assumes typed data; Columns and Values are simplified, see VLDB 2004 paper for details


Secondary XML Indices

[Diagram. Table T(id, x) holds one binary XML instance per row (ids 1, 2,
3). The primary XML index (1 per XML column) is clustered on the primary
key of table T plus XID, with columns PK, XID, NID, TID, VALUE, LVALUE,
HID, xsinil, … and several node rows per XML instance. Non-clustered
secondary indices (n per primary index): Value Index, Property Index,
Path Index.]


XQueries And XML
Indices
demo


Takeaway: XML Indices

PRIMARY XML Index – Use when lots of XQuery
FOR VALUE – Useful for queries where values are
more selective than paths such as
//*[.="Seattle"]
FOR PATH – Useful for Path expressions: avoids
joins by mapping paths to hierarchical index
(HID) numbers. Example: /person/address/zip
FOR PROPERTY – Useful when optimizer chooses
other index (for example, on relational column,
or FT Index) in addition so row is already known
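
For reference, a sketch of creating the primary index and one secondary index of each kind. The table definition and the index names are assumptions made for this example; only the columns needed for the sketch are shown.

```sql
-- Assumption: table and index names are hypothetical.
-- A primary XML index requires a clustered primary key on the base table.
CREATE TABLE docs (id int IDENTITY PRIMARY KEY, xDoc xml);

CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc);

-- Secondary indexes are built on top of the primary XML index.
CREATE XML INDEX idx_1_path  ON docs (xDoc) USING XML INDEX idx_1 FOR PATH;
CREATE XML INDEX idx_1_value ON docs (xDoc) USING XML INDEX idx_1 FOR VALUE;
CREATE XML INDEX idx_1_prop  ON docs (xDoc) USING XML INDEX idx_1 FOR PROPERTY;
```

In practice one would typically create only the secondary indexes that the query workload actually benefits from, since each adds maintenance cost.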


Shredding Approaches
Approach | Complex Shapes | Bulkload | Server vs Midtier | Business logic | Programming | Scale/Performance
SQLXML Bulkload with annotated schema | Yes with limits | Yes | midtier | staging tables on server, XSLT on midtier | annotated XSD and small API | very good/very good
ADO.Net DataSet | No | No | midtier | midtier, SSIS | DataSet API or SSIS | good/good
CLR Table-valued function | Yes | No | Server or midtier | Server or midtier | C#, VB custom code | limited/good
OpenXML | Yes | No | Server | T-SQL | declarative T-SQL, XPath against variable | limited/good
nodes() | Yes | No | Server | T-SQL | declarative SQL, XQuery against var or table | good/careful

To Promote or Not Promote…
Promotion pre-calculates paths
Requires relational query
• XQuery does not know about promotion

Promotion during loading of the data
• Using any of the shredding mechanisms
• 1-to-1 or 1-to-many relationships

Promotion using computed columns
• 1-to-1 only
• Persist computed column: Fast lookup and retrieval
• Relational index on persisted computed column: Fast lookup
• Have to be precise

Promotion using Triggers
• 1-to-1 or 1-to-many relationships
• Trigger overhead

Relational View over XML data
• Filters on relational view are not pushed down due to different type/value system

Promotion using computed columns
Use a schema-bound UDF that encapsulates XQuery

Persist computed column
• Fast lookup and retrieval

Relational index on persisted computed column
• Fast lookup

Query will have to use the schema-bound UDF to match

CAVEAT: No parallel plans with a persisted computed
column based on a UDF
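
A minimal sketch of the promotion pattern, assuming a table dbo.docs(id, xCol) that stores the book documents from the earlier examples. The function, column, and index names are made up for this illustration.

```sql
-- Assumptions: dbo.docs(id, xCol xml) exists; all other names are hypothetical.

-- Schema-bound UDF that encapsulates the XQuery.
CREATE FUNCTION dbo.udf_book_isbn (@x xml)
RETURNS varchar(20)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @x.value('(/book/@ISBN)[1]', 'varchar(20)');
END;
GO

-- Persisted computed column promotes the value (1-to-1 only).
ALTER TABLE dbo.docs ADD ISBN AS dbo.udf_book_isbn(xCol) PERSISTED;
GO

-- Relational index on the promoted value for fast lookup.
CREATE INDEX ix_docs_isbn ON dbo.docs (ISBN);
GO

-- The query uses the same schema-bound UDF expression so it can be matched.
SELECT id
FROM dbo.docs
WHERE dbo.udf_book_isbn(xCol) = '1-55860-438-3';
```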


Use of Full-Text Index for Optimization

Can provide improvement for XQuery contains() queries

Query for documents where section title contains “optimization”

Use Fulltext index to prefilter candidates (includes false positives)

SELECT * FROM docs
WHERE contains(xCol, 'optimization')
AND 1 = xCol.exist('
/book/section/title/text()[contains(.,"optimization")]
')


Futures: Selective XML Index
CREATE SELECTIVE XML INDEX pxi_index ON Tbl(xmlcol)
FOR (
-- the first four match XQuery predicates
-- in all XML data type methods

-- simple flavor - default mapping (xs:untypedAtomic),
-- no optimization hints
node42 = '/a/b',
pathatc = '/a/b/c/@atc',

-- advanced flavor - use of optimization hints
path02 = '/a/b/c' as XQUERY 'xs:string' MAXLENGTH(25),
node13 = '/a/b/d' as XQUERY 'xs:double' SINGLETON,

-- the next two match value() method
-- require regular SQL Server type semantics
-- they can be mixed with the XQUERY ones
-- specifying a type is mandatory for the SQL type semantics

pathfloat = '/a/b/c' as SQL FLOAT,
pathabd = '/a/b/d' as SQL VARCHAR(200)
)

Session Takeaways

• Understand when and how
to use XML in SQL Server
• Understand and correct common
performance problems with XML and
XQuery
• Shred “relational” XML to relations
• Use XML datatype for semistructured
and markup scenarios
• Write your XQueries so that XML
Indices can be used
• Use persisted computed columns to
promote XQuery results (with caveat)

Related Content
Optimization whitepapers
http://msdn2.microsoft.com/en-us/library/ms345118.aspx
http://msdn2.microsoft.com/en-us/library/ms345121.aspx
General XML and Databases whitepapers
http://msdn2.microsoft.com/en-us/xml/bb190603.aspx
Online WebCasts
http://www.microsoft.com/events/series/msdnsqlserver2005.mspx#SQLXML
Newsgroups & Forum:
microsoft.public.sqlserver.xml
http://communities.microsoft.com/newsgroups/default.asp?ICP=sqlserver2005&sLCID=us
http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=89

My E-mail: mrys@microsoft.com
My Weblog: http://sqlblog.com/blogs/michael_rys/



Thank you
for attending this session and the
2011 PASS Summit in Seattle


