SQLPASS presentation on performance tuning and best practices for XML and XQuery in Microsoft SQL Server 2005, SQL Server 2008, SQL Server 2008 R2 and SQL Server 2012.


Published in: Technology



  • 1. Best Practices and Performance Tuning of XML Queries in SQL Server
      AD-501-M
      Michael Rys, Principal Program Manager, Microsoft
      October 11-14, Seattle, WA
  • 2. Session Objectives
      • Understand when and how to use XML in SQL Server
      • Understand and correct common performance problems with XML and XQuery
  • 3. Session Agenda
      • XML Scenarios and when to store XML
      • XML Design Optimizations
      • General Optimizations
      • XML Datatype method Optimizations
      • XQuery Optimizations
      • XML Index Optimizations
      AD-501-M | XQuery Performance
  • 5. XML Scenarios
      Data exchange between loosely coupled systems
      • XML is a ubiquitous, extensible, platform-independent transport format
      • Message envelope in XML: Simple Object Access Protocol (SOAP), RSS, REST
      • Message payload/business data in XML
      • Vertical industry exchange schemas
      Document management
      • XHTML, DocBook, home-grown domain-specific markup (e.g. contracts), OpenOffice, Microsoft Office XML (both default and user-extended)
      Ad-hoc modeling of semistructured data
      • Storing and querying heterogeneous complex objects
      • Semistructured data with sparse, highly varying structure at the instance level
      • XML provides a self-describing format and extensible schemas
      → Transport, store, and query XML data
  • 6. Decision Tree: Processing XML in SQL Server
      [Flowchart; the arrows did not survive extraction, so only the nodes are listed.]
      Decision points:
      • Does the data fit the relational model? (Yes: shred the XML into relations)
      • Is the data semi-structured? (branch labels: structured, known sparse, open schema)
      • Is the data a document?
      • Search within the XML?
      • Query into the XML?
      • Is the XML constrained by schemas?
      Outcomes:
      • Shred the XML into relations
      • Shred the structured XML into relations, store semistructured aspects as XML and/or sparse columns
      • Shred known sparse data into sparse columns
      • Store as XML
      • Store as varbinary(max)
      • Define a full-text index
      • Constrain XML if validation cost is OK
      • Promote frequently queried properties relationally
      • Use primary and secondary XML indexes as needed
  • 7. SQL Server XML Data Type Architecture
      [Diagram. Components shown: XML parser; validation against an XML Schema Collection; XML data type (binary XML); OpenXML/nodes() producing rowsets; FOR XML with TYPE directive; XQuery and XML-DML; PRIMARY XML INDEX (node table); secondary PATH, PROP, and VALUE indexes.]
  • 8. General Impacts
      Concurrency control
      • Locks on both XML data type and relevant rows in primary and secondary XML indices
      • Lock escalation on indices
      • Snapshot isolation reduces locks and lock contention
      Transaction logs
      • Bulk insert into XML indices may fill the transaction log
      • Delay the creation of the XML indexes and use the SIMPLE recovery model
      • Preallocate the database file instead of growing it dynamically
      • Place the log on a different disk
      In-row/out-of-row storage of XML large objects
      • Moving XML into a side table or out-of-row if mixed with relational data reduces scan time
      Due to clustering, insertion into an XML index may not be linear
      • Choose an integer/bigint identity column as key
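A few of the settings named on this slide could be applied roughly as follows. This is a sketch only: the database name `XmlDemo` and the table `dbo.DocStore` are hypothetical, and switching the recovery model affects the backup strategy, so it should be weighed against recovery requirements.

```sql
-- Hypothetical database used only for this illustration.
CREATE DATABASE XmlDemo;
GO
-- Allow snapshot isolation to reduce lock contention on XML data and XML indexes.
ALTER DATABASE XmlDemo SET ALLOW_SNAPSHOT_ISOLATION ON;
-- SIMPLE recovery keeps a bulk load into XML indexes from growing the log unboundedly.
ALTER DATABASE XmlDemo SET RECOVERY SIMPLE;
GO
USE XmlDemo;
GO
-- Integer identity key, as the slide recommends for the clustering key.
CREATE TABLE dbo.DocStore (
    id   int IDENTITY(1,1) PRIMARY KEY,
    xDoc xml NOT NULL
);
GO
-- Store large values (including xml) out of row so relational scans touch fewer pages.
EXEC sp_tableoption 'dbo.DocStore', 'large value types out of row', 1;
```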
  • 9. Choose the Right XML Model
      • Element-centric versus attribute-centric
          <Customer><name>Joe</name></Customer>
          <Customer name="Joe" />
          +: Attributes often perform better when querying
          -: Parsing attributes requires a uniqueness check
      • Generic element names with type attribute vs. specific element names
          <Entity type="Customer"> <Prop type="Name">Joe</Prop> </Entity>
          <Customer><name>Joe</name></Customer>
          +: Specific names give shorter path expressions
          +: Specific names need no filter on type attribute
          /Entity[@type="Customer"]/Prop[@type="Name"] vs /Customer/name
      • Wrapper elements
          <Orders><Order id="1"/></Orders>
          +: No wrapper elements means smaller XML and shorter path expressions
  • 10. Use an XML Schema Collection?
      Using no XML Schema (untyped XML)
      • Can still use XQuery and XML Index!
      • Atomic values are always weakly typed strings: compare as strings to avoid runtime conversions and loss of index usage
      • No schema validation overhead
      • No schema evolution revalidation costs
      XML Schema provides structural information
      • Atomic typed elements now use only one row instead of two in the node table/XML index (closer to attributes)
      • Static typing can detect cardinality and feasibility of expressions
      XML Schema provides semantic information
      • Elements/attributes have the correct atomic type for comparison and order semantics
      • No runtime casts required and better use of index for value lookup
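To make the typed/untyped distinction concrete, here is a small sketch. The schema collection name `dbo.PersonSchema` and the `person`/`age` document shape are assumptions chosen for illustration; they are not taken from the deck's demo.

```sql
-- Hypothetical schema collection declaring <age> as xs:int.
CREATE XML SCHEMA COLLECTION dbo.PersonSchema AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="age" type="xs:int" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>';
GO
-- Typed instance: validated on assignment, <age> carries the xs:int type.
DECLARE @typed xml(DOCUMENT dbo.PersonSchema) = N'<person><age>42</age></person>';
-- Untyped instance: no validation, atomic values are weakly typed strings.
DECLARE @untyped xml = N'<person><age>42</age></person>';

SELECT
    -- Typed: numeric comparison needs no runtime cast.
    @typed.exist('/person[age = 42]') AS typed_match,
    -- Untyped: compare the text node as a string, as the slide suggests.
    @untyped.exist('/person[age/text() = "42"]') AS untyped_match;
```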
  • 11. XQuery Methods
      • query() creates a new, untyped XML data type instance
      • exist() returns 1 if the XQuery expression returns at least one item, 0 otherwise
      • value() extracts an XQuery value into the SQL value and type space
          • Expression has to statically be a singleton
          • String value of atomized XQuery item is cast to SQL type
          • SQL type has to be a SQL scalar type (no XML or CLR UDT)
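As an illustration of the three methods side by side, the following sketch runs them against one XML variable. The sample document is made up for this example.

```sql
-- Illustrative sample document (not from the deck).
DECLARE @x xml = N'<doc><customer id="1" name="Joe"/><customer id="2" name="Ann"/></doc>';

SELECT
    -- query(): returns a new untyped XML fragment.
    @x.query('/doc/customer[@id = "2"]') AS fragment,
    -- exist(): returns 1 when the expression yields at least one item.
    @x.exist('/doc/customer[@name = "Joe"]') AS has_joe,
    -- value(): the (...)[1] ordinal makes the expression a static singleton.
    @x.value('(/doc/customer/@name)[1]', 'nvarchar(50)') AS first_name;
```

Without the `[1]` ordinal, the `value()` call would be rejected at compile time, because the path could statically return more than one node.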
  • 12. XQuery: nodes()
      Returns a row per selected node as a special XML data type instance
      • Preserves the original structure and types
      • Can only be used with the XQuery methods (but not modify()), count(*), and IS (NOT) NULL
      Appears as Table-valued Function (TVF) in the query plan if no index is present
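A minimal sketch of shredding with nodes(), using a made-up sample document. Each selected `customer` element becomes one row, and value() is applied relative to that context node.

```sql
-- Illustrative sample document (not from the deck).
DECLARE @x xml = N'<doc><customer id="1" name="Joe"/><customer id="2" name="Ann"/></doc>';

-- One row per <customer>; c is the context node exposed by nodes().
SELECT c.value('@id', 'int')            AS CustID,
       c.value('@name', 'nvarchar(50)') AS CName
FROM @x.nodes('/doc/customer') AS N(c);
```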
  • 13. sql:column()/sql:variable()
      Map SQL value and type into XQuery values and types in the context of XQuery or XML-DML
      • sql:variable(): accesses a SQL variable/parameter
          declare @value int
          set @value=42
          select * from T where T.x.exist('/a/b[@id=sql:variable("@value")]')=1
      • sql:column(): accesses another column value
          tables: T([key] int, x xml), S([key] int, val int)
          select * from T join S on T.[key]=S.[key]
          where T.x.exist('/a/b[@id=sql:column("S.val")]')=1
      • Restrictions in SQL Server: no XML, CLR UDT, datetime, or deprecated text/ntext/image
  • 14. Improving Slow XQueries, Bad FOR XML (demo)
  • 15. Optimal Use of Methods: How to Cast from XML to SQL
      BAD:  CAST( CAST(xmldoc.query('/a/b/text()') as nvarchar(500)) as int)
      GOOD: xmldoc.value('(/a/b/text())[1]', 'int')
      BAD:  node.query('.').value('@attr', 'nvarchar(50)')
      GOOD: node.value('@attr', 'nvarchar(50)')
  • 16. Optimal Use of Methods: Grouping value() method
      Group value() methods on the same XML instance next to each other if the path expressions in the value() methods are
      • Simple path expressions that only use child and attribute axis and do not contain wildcards, predicates, node tests, ordinals
      • Path expressions that statically infer a singleton
      The singleton can be statically inferred from
      • The DOCUMENT and XML Schema Collection
      • Relative paths on the context node provided by the nodes() method
      Requires XML index to be present
  • 17. Optimal Use of Methods: Using the right method to join and compare
      Use the exist() method, sql:column()/sql:variable() and an XQuery comparison for checking for a value or joining if secondary XML indices are present
      BAD:*
          select doc from doc_tab join authors
          on doc.value('(/doc/mainauthor/lname/text())[1]', 'nvarchar(50)') = lastname
      GOOD:
          select doc from doc_tab join authors
          on 1 = doc.exist('/doc/mainauthor/lname/text()[. = sql:column("lastname")]')
      * If applied on an XML variable, or if no index is present, the value() method is most of the time more efficient
  • 18. Optimal Use of Methods: Avoiding bad costing with nodes()
      nodes() without XML index is a Table-valued function (details later)
      Bad cardinality estimates can lead to bad plans
      • BAD:
          select c.value('@id', 'int') as CustID, c.value('@name', 'nvarchar(50)') as CName
          from Customer, @x.nodes('/doc/customer') as N(c)
          where Customer.ID = c.value('@id', 'int')
      • BETTER (if only one wrapper doc element):
          select c.value('@id', 'int') as CustID, c.value('@name', 'nvarchar(50)') as CName
          from Customer, @x.nodes('/doc[1]') as D(d) cross apply d.nodes('customer') as N(c)
          where Customer.ID = c.value('@id', 'int')
      Use temp table (insert into #temp select … from nodes()) or Table-valued parameter instead of XML to get better estimates
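The temp-table suggestion might look like the sketch below. The sample document, the `#cust` temp table, and the `@Customer` table variable (standing in for the slide's Customer table so the batch runs on its own) are all assumptions for illustration.

```sql
-- Illustrative sample document and a stand-in for the Customer table.
DECLARE @x xml = N'<doc><customer id="1" name="Joe"/><customer id="2" name="Ann"/></doc>';
DECLARE @Customer TABLE (ID int PRIMARY KEY);
INSERT INTO @Customer (ID) VALUES (1), (2);

-- Shred once into a temp table; it carries real row counts and statistics,
-- unlike the fixed estimate used for the XML Reader TVF.
CREATE TABLE #cust (CustID int PRIMARY KEY, CName nvarchar(50));

INSERT INTO #cust (CustID, CName)
SELECT c.value('@id', 'int'), c.value('@name', 'nvarchar(50)')
FROM @x.nodes('/doc[1]') AS D(d)
CROSS APPLY d.nodes('customer') AS N(c);

-- The join now runs against relational data only.
SELECT cu.ID AS CustID, t.CName
FROM @Customer AS cu
JOIN #cust AS t ON t.CustID = cu.ID;

DROP TABLE #cust;
```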
  • 19. Optimal Use of Methods: Avoiding multiple method evaluations
      Use subqueries
      • BAD:
          SELECT CASE isnumeric(doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)'))
                 WHEN 1 THEN doc.value('(/doc/customer/order/price)[1]', 'decimal(5,2)')
                 ELSE 0 END
          FROM T
      • GOOD:
          SELECT CASE isnumeric(Price) WHEN 1 THEN CAST(Price as decimal(5,2)) ELSE 0 END
          FROM (SELECT doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)') as Price FROM T) X
      Use subqueries also with NULLIF()
  • 20. Combined SQL and XQuery/DML Processing
      SELECT x.query(…), y FROM T WHERE …
      Static phase (uses metadata and the XML Schema Collection)
      • SQL Parser and XQuery Parser
      • Static typing (for both)
      • Algebrization (for both)
      • Static optimization of the combined logical and physical operation tree
      Dynamic phase (uses XML and relational indices)
      • Runtime optimization and execution of the physical op tree
  • 21. New XQuery Algebra Operators: XML Reader TVF
      • Table-valued function XML Reader UDF with XPath filter
      • Used if no Primary XML Index is present
      • Creates node table rowset in query flow
      • Multiple XPath filters can be pushed in to reduce node table to subtree
      • Base cardinality estimate is always 10,000 rows! Some adjustment based on pushed path filters
      XMLReader node table format example (simplified):
          ID     TAG ID     Node     Type-ID        VALUE     HID
          1.3.1  4 (TITLE)  Element  2 (xs:string)  Bad Bugs  #title#section#book
  • 22. New XQuery Algebra Operators: UDX
      • Serializer UDX serializes the query result as XML
      • XQuery String UDX evaluates the XQuery string() function
      • XQuery Data UDX evaluates the XQuery data() function
      • Check UDX validates XML being inserted
      • UDX name visible in SSMS properties window
  • 23. Optimal Use of XQuery: Atomization of nodes
      Value comparisons, XQuery casts and value() method casts require atomization of the item
      • Attribute:
          /person[@age = 42]
          /person[data(@age) = 42]
      • Atomic typed element:
          /person[age = 42]
          /person[data(age) = 42]
      • Untyped, mixed content typed element (adds UDX):
          /person[age = 42]
          /person[data(age) = 42]
          /person[string(age) = 42]
      • If only one text node for untyped element (better):
          /person[age/text() = 42]
          /person[data(age/text()) = 42]
      • value() method on untyped elements:
          value('/person/age', 'int')
          value('/person/age/text()', 'int')
      string() aggregates all text nodes, prohibits index use
  • 24. Optimal Use of XQuery: Casting Values
      Value comparisons require casts and type promotion
      • Untyped attribute:
          /person[@age = 42]
          /person[xs:decimal(@age) = 42]
      • Untyped text node():
          /person[age/text() = 42]
          /person[xs:decimal(age/text()) = 42]
      • Typed element (typed as xs:int):
          /person[salary = 3e4]
          /person[xs:double(salary) = 3e4]
      Casting is expensive and prohibits index lookup
      Tips to avoid casting
      • Use appropriate types for comparison (string for untyped)
      • Use schema to declare type
  • 25. Optimal Use of XQuery: Maximize XPath expressions
      Single paths are more efficient than twig paths
      Avoid predicates in the middle of path expressions
          book[@ISBN = "1-8610-0157-6"]/author[first-name = "Davis"]
          /book[@ISBN = "1-8610-0157-6"] "∩" /book/author[first-name = "Davis"]
      Move ordinals to the end of path expressions
      • Make sure you get the same semantics!
      • /a[1]/b[1] ≠ (/a/b)[1] ≠ /a/b[1]
      • (/book/@isbn)[1] is better than /book[1]/@isbn
  • 26. Optimal Use of XQuery: Maximize XPath expressions in exist()
      Use context item in predicate to lengthen path in exist()
      • Existential quantification makes returned node irrelevant
      • BAD:  SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject[text() = "security"]')
      • GOOD: SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject/text()[. = "security"]')
      • BAD:  SELECT * FROM docs WHERE 1 = xCol.exist('/book[@price > 9.99 and @price < 49.99]')
      • GOOD: SELECT * FROM docs WHERE 1 = xCol.exist('/book/@price[. > 9.99 and . < 49.99]')
      This does not work with or-predicate
  • 27. Optimal Use of XQuery: Inefficient operations: Parent axis
      Most frequent offender: parent axis with nodes()
      • BAD:
          select o.value('../@id', 'int') as CustID, o.value('@id', 'int') as OrdID
          from T cross apply x.nodes('/doc/customer/orders') as N(o)
      • GOOD:
          select c.value('@id', 'int') as CustID, o.value('@id', 'int') as OrdID
          from T cross apply x.nodes('/doc/customer') as N1(c)
          cross apply c.nodes('orders') as N2(o)
  • 28. Optimal Use of XQuery: Inefficient operations
      Avoid descendant axes and // in the middle of path expressions if the data structure is known
      • // still can use the HID lookup, but is less efficient
      XQuery construction performs worse than FOR XML
      • BAD:
          SELECT notes.query('<Customer cid="{sql:column("cid")}">{ <name>{sql:column("name")}</name>, / }</Customer>')
          FROM Customers WHERE cid=1
      • GOOD:
          SELECT cid as "@cid", name, notes as "*"
          FROM Customers WHERE cid=1
          FOR XML PATH('Customer'), TYPE
  • 29. Optimal Use of FOR XML
      Use TYPE directive when assigning result to XML
      • BAD:  declare @x xml; set @x = (select * from Customers for xml raw);
      • GOOD: declare @x xml; set @x = (select * from Customers for xml raw, type);
      Use FOR XML PATH for complex grouping and additional hierarchy levels over FOR XML EXPLICIT
      Use FOR XML EXPLICIT for complex nesting if FOR XML PATH performance is not appropriate
  • 30. XML Indices
      Create XML index on XML column
          CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)
      Create secondary indexes on tags, values, paths
      Creation:
      • Single-threaded only for primary XML index
      • Multi-threaded for secondary XML indexes
      Uses:
      • Primary Index will always be used if defined (not a cost-based decision)
      • Results can be served directly from index
      • SQL's cost-based optimizer will consider secondary indexes
      Maintenance:
      • Primary and Secondary Indices will be efficiently maintained during updates
      • Only subtree that changes will be updated
      • No online index rebuild
      • Clustered key may lead to non-linear maintenance cost
      Schema revalidation still checks whole instance
  • 31. Example Index Contents
      insert into Person values (42,
      '<book ISBN="1-55860-438-3">
         <section>
           <title>Bad Bugs</title>
           Nobody loves bad bugs.
         </section>
         <section>
           <title>Tree Frogs</title>
           All right-thinking people <bold>love</bold> tree frogs.
         </section>
       </book>')
  • 32. Primary XML Index
      CREATE PRIMARY XML INDEX PersonIdx ON Person (Pdesc)
          PK  XID    TAG ID       Node       Type-ID        VALUE                      HID
          42  1      1 (book)     Element    1 (bookT)      null                       #book
          42  1.1    2 (ISBN)     Attribute  2 (xs:string)  1-55860-438-3              #@ISBN#book
          42  1.3    3 (section)  Element    3 (sectionT)   null                       #section#book
          42  1.3.1  4 (TITLE)    Element    2 (xs:string)  Bad Bugs                   #title#section#book
          42  1.3.3  --           Text       --             Nobody loves bad bugs.     #text()#section#book
          42  1.5    3 (section)  Element    3 (sectionT)   null                       #section#book
          42  1.5.1  4 (title)    Element    2 (xs:string)  Tree frogs                 #title#section#book
          42  1.5.3  --           Text       --             All right-thinking people  #text()#section#book
          42  1.5.5  7 (bold)     Element    4 (boldT)      love                       #bold#section#book
          42  1.5.7  --           Text       --             tree frogs                 #text()#section#book
      Assumes typed data; columns and values are simplified, see VLDB 2004 paper for details
  • 33. Secondary XML Indices
      [Diagram. An XML column x in table T(id, x) stores binary XML per row. The Primary XML Index (1 per XML column) is clustered on the primary key of table T plus XID, with columns PK, XID, NID, TID, VALUE, LVALUE, HID, xsinil, … Non-clustered Secondary Indices (n per primary index): Value Index, Property Index, Path Index.]
  • 34. XQueries and XML Indices (demo)
  • 35. Takeaway: XML Indices
      • PRIMARY XML Index: use when lots of XQuery
      • FOR VALUE: useful for queries where values are more selective than paths, such as //*[.="Seattle"]
      • FOR PATH: useful for path expressions; avoids joins by mapping paths to hierarchical index (HID) numbers. Example: /person/address/zip
      • FOR PROPERTY: useful when optimizer chooses other index (for example, on relational column, or FT Index) in addition so row is already known
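For reference, the primary index and the three secondary index flavors could be created along these lines. The table definition and the secondary index names are assumptions; only `idx_1`, `docs`, and `xDoc` appear on the earlier slide.

```sql
-- Hypothetical table; a clustered primary key is required before a primary XML index can be built.
CREATE TABLE dbo.docs (
    id   int IDENTITY(1,1) CONSTRAINT PK_docs PRIMARY KEY CLUSTERED,
    xDoc xml NOT NULL
);
GO
-- Primary XML index: materializes the node table for the column.
CREATE PRIMARY XML INDEX idx_1 ON dbo.docs (xDoc);
GO
-- Secondary indexes are built on top of the primary XML index.
CREATE XML INDEX idx_1_path     ON dbo.docs (xDoc) USING XML INDEX idx_1 FOR PATH;
CREATE XML INDEX idx_1_value    ON dbo.docs (xDoc) USING XML INDEX idx_1 FOR VALUE;
CREATE XML INDEX idx_1_property ON dbo.docs (xDoc) USING XML INDEX idx_1 FOR PROPERTY;
```

In practice one would likely create only the secondary indexes that match the workload, since each adds maintenance cost on updates.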
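  • 36. Shredding Approaches
      Compared on: complex shapes; bulkload; server vs. midtier; business logic; programming; scale/performance
      • SQLXML Bulkload with annotated schema: complex shapes yes, with limits; bulkload yes; midtier; business logic in staging tables on server, XSLT on midtier; programming via annotated XSD and small API; scale/performance very good/very good
      • ADO.Net DataSet: complex shapes no; bulkload no; midtier; business logic on midtier, SSIS; programming via DataSet API or SSIS; scale/performance good/good
      • CLR Table-valued function: complex shapes yes; bulkload no; server or midtier; business logic on server or midtier; programming in C#, VB custom code; scale/performance limited/good
      • OpenXML: complex shapes yes; bulkload no; server; business logic in T-SQL; programming via declarative T-SQL, XPath against variable; scale/performance limited/good
      • nodes(): complex shapes yes; bulkload no; server; business logic in T-SQL; programming via declarative SQL, XQuery against variable or table; scale/performance good/careful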
  • 37. To Promote or Not Promote…
      Promotion pre-calculates paths
      Requires relational query
      • XQuery does not know about promotion
      Promotion during loading of the data
      • Using any of the shredding mechanisms
      • 1-to-1 or 1-to-many relationships
      Promotion using computed columns
      • 1-to-1 only
      • Persist computed column: fast lookup and retrieval
      • Relational index on persisted computed column: fast lookup
      • Have to be precise
      Promotion using triggers
      • 1-to-1 or 1-to-many relationships
      • Trigger overhead
      Relational view over XML data
      • Filters on relational view are not pushed down due to different type/value system
  • 38. Promotion using computed columns
      Use a schema-bound UDF that encapsulates XQuery
      Persist computed column
      • Fast lookup and retrieval
      Relational index on persisted computed column
      • Fast lookup
      Query will have to use the schema-bound UDF to match
      CAVEAT: No parallel plans with a persisted computed column based on a UDF
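The promotion pattern described on this slide can be sketched as follows. The table `dbo.books`, the function `dbo.udf_book_isbn`, the column and index names, and the promoted `/book/@ISBN` path are all hypothetical choices for illustration.

```sql
-- Hypothetical table holding one XML document per row.
CREATE TABLE dbo.books (
    id   int IDENTITY(1,1) PRIMARY KEY,
    xCol xml NOT NULL
);
GO
-- Schema-bound scalar UDF that encapsulates the XQuery.
CREATE FUNCTION dbo.udf_book_isbn (@doc xml)
RETURNS nvarchar(20)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @doc.value('(/book/@ISBN)[1]', 'nvarchar(20)');
END;
GO
-- Persisted computed column: the value is calculated once at write time.
ALTER TABLE dbo.books ADD isbn AS dbo.udf_book_isbn(xCol) PERSISTED;
GO
-- Relational index on the promoted value for fast lookup.
CREATE INDEX ix_books_isbn ON dbo.books (isbn);
GO
-- The query uses the same UDF expression so it can match the computed column.
SELECT id
FROM dbo.books
WHERE dbo.udf_book_isbn(xCol) = N'1-55860-438-3';
```

As the slide's caveat notes, plans touching a persisted computed column based on a UDF will not run in parallel, so this trade-off is worth checking on large tables.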
  • 39. Use of Full-Text Index for Optimization
      Can provide improvement for XQuery contains() queries
      Query for documents where section title contains "optimization"
      Use Fulltext index to prefilter candidates (includes false positives)
          SELECT * FROM docs
          WHERE contains(xCol, 'optimization')
          AND 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')
  • 40. Futures: Selective XML Index
      CREATE SELECTIVE XML INDEX pxi_index ON Tbl(xmlcol)
      FOR (
          -- the first four match XQuery predicates
          -- in all XML data type methods
          -- simple flavor - default mapping (xs:untypedAtomic),
          -- no optimization hints
          node42 = '/a/b',
          pathatc = '/a/b/c/@atc',
          -- advanced flavor - use of optimization hints
          path02 = '/a/b/c' as XQUERY 'xs:string' MAXLENGTH(25),
          node13 = '/a/b/d' as XQUERY 'xs:double' SINGLETON,
          -- the next two match value() method
          -- require regular SQL Server type semantics
          -- they can be mixed with the XQUERY ones
          -- specifying a type is mandatory for the SQL type semantics
          pathfloat = '/a/b/c' as SQL FLOAT,
          pathabd = '/a/b/d' as SQL VARCHAR(200)
      )
  • 41. Session Takeaways
      • Understand when and how to use XML in SQL Server
      • Understand and correct common performance problems with XML and XQuery
      • Shred "relational" XML to relations
      • Use XML datatype for semistructured and markup scenarios
      • Write your XQueries so that XML Indices can be used
      • Use persisted computed columns to promote XQuery results (with caveat)
  • 43. Related Content
      • Optimization whitepapers
      • XML and Databases whitepapers
      • WebCasts & Forum: microsoft.public.sqlserver.xml
      • E-mail: mrys@microsoft.com
      • My Weblog:
  • 45. Thank you for attending this session and the 2011 PASS Summit in Seattle