SQLPASS presentation on performance tuning and best practices for XML and XQuery in Microsoft SQL Server 2005, SQL Server 2008, SQL Server 2008 R2 and SQL Server 2012.
This is the RFC for AvocadoDB's query language. AvocadoDB is an open-source NoSQL database (see www.avocadodb.org) offering a mixture of data models: key-value pairs, documents, and graphs.
The REST API for AvocadoDB is already available and stable, and people are writing APIs using it. Awesome. But as AvocadoDB offers more complex data structures like graphs and lists, REST is not enough. We implemented a first version of a query language some time ago which is very similar to SQL and UNQL.
Then we realized that this approach was not completely satisfying, as some queries cannot be expressed very well with it, especially over multi-valued attributes/lists. UNQL addresses this partly, but does not go far enough. Another issue is graphs: AvocadoDB supports querying graphs, but neither SQL nor UNQL offers any "natural" graph traversal facilities.
As we did not find any existing query language that addresses the problems we found, we had to define a new one, which is presented in these slides.
Have some feedback on this? Come to www.avocadodb.org and tell us what you think about it. :-)
This presentation deals with the fundamentals of SQL, Installation and Database concepts. Presented by our team in Alphalogic Inc: https://www.alphalogicinc.com/
In this lecture we look at the patterns in chapter 18 of the textbook (Patterns of Enterprise Application Architecture). The lecture is in two parts. First we go through the patterns and explain each one.
Then in the second part we look at a problem we have to solve and try to get the patterns to show themselves at the time they are needed.
With the introduction of SQL Server 2012, data developers have new ways to interact with their databases. This session will review the powerful new analytic window functions, new ways to generate numeric sequences, and new ways to page the results of our queries. Other features that will be discussed are improvements in error handling and new parsing and concatenating features.
In this presentation, Vineet will explain a case study of one of his customers using Spark to migrate terabytes of data from GPFS into Hive tables. The ETL pipeline was built purely using Spark. The pipeline extracted target (Hive) table properties such as identification of Hive Date/Timestamp columns, whether the target table is partitioned or non-partitioned, target storage formats (Parquet or Avro), and source-to-target column mappings. These target tables contain a few to hundreds of columns, and the pipeline converts non-standard date formats into Hive's standard timestamp format.
Killer Scenarios with Data Lake in Azure with U-SQL | Michael Rys
Presentation from Microsoft Data Science Summit 2016
Presents 4 examples of custom U-SQL data processing: Overlapping Range Aggregation, JSON Processing, Image Processing and R with U-SQL
Developing Dynamic Reports for TMS Using Crystal Reports | Chad Petrovay
Like many other institutions, The Morgan Library and Museum utilizes TMS to generate reports using templates prepared in Crystal Reports. But the reports and forms we desire most – loan agreements, condition reports, and exhibition checklists – need to be highly dynamic. Instead of typesetting each block of text or checkbox, the Morgan leverages Crystal Reports’ powerful capabilities to make reports flexible and easy to maintain. This presentation will dissect some of our most complicated reports to look at the underlying structures and formulae, and allow attendees to step up their Crystal Report skills.
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ... | Michael Rys
SQLBits 2020 presentation on how you can build solutions based on the modern data warehouse pattern with Azure Synapse Spark and SQL including demos of Azure Synapse.
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign... | Michael Rys
Presentation by James Baker and myself on running cost-effective big data workloads with Azure Synapse and Azure Data Lake Storage (ADLS) at Microsoft Ignite 2020. Covers the modern data warehouse architecture supported by Azure Synapse, the integration benefits with ADLS, and some features that reduce cost, such as Query Acceleration, integration of Spark and SQL processing with shared metadata, and .NET for Apache Spark support.
Running cost effective big data workloads with Azure Synapse and Azure Data L... | Michael Rys
The presentation discusses how to migrate expensive open source big data workloads to Azure and leverage the latest compute and storage innovations within Azure Synapse with Azure Data Lake Storage to develop powerful and cost-effective analytics solutions. It shows how you can bring your .NET expertise to bear with .NET for Apache Spark, and how the shared metadata experience in Synapse makes it easy to create a table in Spark and query it from T-SQL.
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE... | Michael Rys
This presentation shows how you can build solutions that follow the modern data warehouse architecture and introduces the .NET for Apache Spark support (https://dot.net/spark, https://github.com/dotnet/spark)
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor... | Michael Rys
More and more customers looking to modernize their analytics are exploring the data lake approach in Azure. Typically, they are most challenged by a bewildering array of poorly integrated technologies and a variety of data formats and data types, not all of which are conveniently handled by existing ETL technologies. In this session, we’ll explore the basic shape of a modern ETL pipeline through the lens of Azure Data Lake. We will explore how this pipeline can scale from one to thousands of nodes at a moment’s notice to respond to business needs, how its extensibility model allows pipelines to simultaneously integrate procedural code written in .NET languages or even Python and R, how that same extensibility model allows pipelines to deal with a variety of formats such as CSV, XML, JSON, images, or any enterprise-specific document format, and finally how the next generation of ETL scenarios is enabled through the integration of intelligence in the data layer in the form of built-in cognitive capabilities.
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer... | Michael Rys
When processing TBs and PBs of data, running your big data queries at scale and having them perform at peak is essential. In this session, we show you some state-of-the-art tools for analyzing U-SQL job performance, and we discuss in-depth best practices on designing your data layout, both for files and tables, and on writing performant, scalable queries using U-SQL. You will learn how to analyze performance and scale bottlenecks and pick up several tips on how to make your big data processing scripts both faster and more scalable.
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co... | Michael Rys
Big data processing increasingly needs to address not just querying big data but needs to apply domain specific algorithms to large amounts of data at scale. This ranges from developing and applying machine learning models to custom, domain specific processing of images, texts, etc. Often the domain experts and programmers have a favorite language that they use to implement their algorithms such as Python, R, C#, etc. Microsoft Azure Data Lake Analytics service is making it easy for customers to bring their domain expertise and their favorite languages to address their big data processing needs. In this session, I will showcase how you can bring your Python, R, and .NET code and apply it at scale using U-SQL.
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini... | Michael Rys
From theory to implementation - follow the steps of implementing an end-to-end analytics solution illustrated with some best practices and examples in Azure Data Lake.
During this full training day we will share the architecture patterns, tooling, learnings and tips and tricks for building such services on Azure Data Lake. We take you through some anti-patterns and best practices on data loading and organization, give you hands-on time and the ability to develop some of your own U-SQL scripts to process your data and discuss the pros and cons of files versus tables.
These are the slides presented at the SQLBits 2018 Training Day on Feb 21, 2018.
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc... | Michael Rys
When analyzing big data, you often have to process data at scale that is not rectangular in nature and you would like to scale out your existing programs and cognitive algorithms to analyze your data. To address this need and make it easy for the programmer to add her domain specific code, U-SQL includes a rich extensibility model that allows you to process any kind of data, ranging from CSV files over JSON and XML to image files and add your own custom operators. In this presentation, we will provide some examples on how to use U-SQL to process interesting data formats with custom extractors and functions, including JSON, images, use U-SQL’s cognitive library and finally show how U-SQL allows you to invoke custom code written in Python and R.
Slides for the SQL Saturday 635 presentation, Vancouver BC, Aug 2017.
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635) | Michael Rys
Data Lakes have become a new tool in building modern data warehouse architectures. In this presentation we will introduce Microsoft's Azure Data Lake offering and its new big data processing language, U-SQL, which makes big data processing easy by combining the declarativity of SQL with the extensibility of C#. We will explain why we introduced U-SQL, show with an example how to analyze some tweet data with U-SQL and its extensibility capabilities, and take you on an introductory tour of U-SQL geared towards existing SQL users.
slides for SQL Saturday 635, Vancouver BC, Aug 2017
UiPath Test Automation using UiPath Test Suite series, part 6 | DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! | SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs | Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Climate Impact of Software Testing at Nordic Testing Days | Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help counter climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
GraphRAG is All You need? LLM & Knowledge Graph | Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 | Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features that provide convenience and capability sacrifice security. This best practices guide outlines steps users can take to better protect personal devices and information.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... | James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Generative AI Deep Dive: Advancing from Proof of Concept to Production | Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT stylesheets and schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating, explaining, or refactoring code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Essentials of Automations: The Art of Triggers and Actions in FME | Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Removing Uninteresting Bytes in Software Fuzzing | Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf | Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
SQLPASS AD501-M XQuery MRys
1. Best Practices and
Performance Tuning of
XML Queries in SQL Server
AD-501-M
Michael Rys
Principal Program Manager
Microsoft Corp
mrys@microsoft.com
@SQLServerMike
October 11-14, Seattle, WA
2. Session Objectives
• Understand when and how
to use XML in SQL Server
• Understand and correct common
performance problems with XML and
XQuery
3. Session Agenda
XML Scenarios and when to store XML
XML Design Optimizations
General Optimizations
XML Datatype method Optimizations
XQuery Optimizations
XML Index Optimizations
AD-501-M| XQuery Performance 3
5. XML Scenarios
Data Exchange between loosely-coupled systems
• XML is ubiquitous, extensible, platform independent transport format
• Message Envelope in XML
Simple Object Access Protocol (SOAP), RSS, REST
• Message Payload/Business Data in XML
• Vertical Industry Exchange schemas
Document Management
• XHTML, DocBook, home-grown, domain-specific markup (e.g. contracts), OpenOffice, Microsoft Office XML (both default and user-extended)
Ad-hoc modeling of semistructured data
• Storing and querying heterogeneous complex objects
• Semistructured data with sparse, highly-varying
structure at the instance level
• XML provides self-describing format and extensible schemas
→Transport, Store, and Query XML data
AD-501-M| XQuery Performance 5
6. Decision Tree: Processing XML In SQL Server
Does the data fit the relational model?
• Yes → Shred the XML into relations
• No → Is the data semi-structured?
• Yes, structured parts → Shred the structured aspects into relations, store the semistructured aspects as XML and/or sparse columns
• Yes, known sparse → Shred the known sparse data into sparse columns
• Yes, open schema → Store as XML
• No → Is the data a document?
• Yes, no search within the XML → Store as varbinary(max)
• Yes, search within the XML → Define a full-text index
Query into the XML?
• Promote frequently queried properties relationally
• Use primary and secondary XML indexes as needed
Is the XML constrained by schemas?
• Yes → Constrain the XML if the validation cost is ok
AD-501-M| XQuery Performance 6
7. SQL Server XML Data Type Architecture
• The XML parser, with optional validation against an XML Schema Collection, feeds the XML data type (stored as binary XML)
• The PRIMARY XML INDEX materializes the node table; PATH, PROPERTY, and VALUE secondary XML indexes are built over it
• OpenXML/nodes() shred XML into rowsets; FOR XML with the TYPE directive composes XML from relational rowsets
• XQuery and XML-DML evaluate against the XML data type and its indices
8. General Impacts
Concurrency Control
• Locks on both XML data type and relevant
rows in primary and secondary XML Indices
• Lock escalation on indices
• Snapshot Isolation reduces locks and lock contention
Transaction Logs
• Bulk insert into XML indices may fill the transaction log
• Delay the creation of the XML indexes and use the SIMPLE recovery
model
• Preallocate database file instead of dynamically growing
• Place log on different disk
In-Row/Out-of-Row of XML large object
• Moving XML into side table or out-of-row if
mixed with relational data reduces scan time
Due to clustering, insertion into XML Index may not be linear
• Choose an integer/bigint identity column as the key
9. Choose The Right XML Model
• Element-centric versus attribute-centric
<Customer><name>Joe</name></Customer>
<Customer name="Joe" />
+: Attributes often perform better for querying
–: Parsing attributes requires a uniqueness check
• Generic element names with type attribute vs Specific
element names
<Entity type="Customer">
<Prop type="Name">Joe</Prop>
</Entity>
<Customer><name>Joe</name></Customer>
+: Specific names: shorter path expressions
+: Specific names: no filter on a type attribute
/Entity[@type="Customer"]/Prop[@type="Name"] vs /Customer/name
• Wrapper elements
<Orders><Order id="1"/></Orders>
+: No wrapper elements: smaller XML, shorter path expressions
10. Use an XML Schema Collection?
Using no XML Schema (untyped XML)
• Can still use XQuery and XML Index!!!
• Atomic values are always weakly typed strings;
compare as strings to avoid runtime
conversions and loss of index usage
• No schema validation overhead
• No schema evolution revalidation costs
XML Schema provides structural information
• Atomic typed elements are now using only one instead of two
rows in node table/XML index (closer to attributes)
• Static typing can detect cardinality and feasibility of expression
XML Schema provides semantic information
• Elements/attributes have correct atomic
type for comparison and order semantics
• No runtime casts required and better use of index for value lookup
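A minimal sketch of the string-comparison tip for untyped XML (the table and element names here are hypothetical, not from the deck):

```sql
-- Hypothetical table with an untyped XML column
CREATE TABLE Orders (id int PRIMARY KEY, doc xml);

-- Untyped XML: atomic values are strings, so compare against a string
-- literal to avoid per-node runtime casts and keep the index usable
SELECT id FROM Orders
WHERE doc.exist('/order/status/text()[. = "shipped"]') = 1;

-- Comparing to a number instead would force a cast of every candidate
-- node and defeat a value-index seek:
-- WHERE doc.exist('/order/status/text()[. = 42]') = 1
```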
11. XQuery Methods
query() creates new, untyped XML data type
instance
exist() returns 1 if the XQuery expression returns
at least one item, 0 otherwise
value() extracts an XQuery value into the SQL
value and type space
• Expression has to statically be a singleton
• String value of atomized XQuery item is cast to
SQL type
• SQL type has to be a SQL scalar type (no XML or CLR UDT)
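The three methods side by side, as a sketch against a hypothetical table docs(id int, xCol xml):

```sql
-- exist(): cheapest way to test for the presence of a node
SELECT id FROM docs
WHERE xCol.exist('/book/title') = 1;

-- value(): extract a statically-singleton value into the SQL type space
SELECT xCol.value('(/book/title/text())[1]', 'nvarchar(100)') AS Title
FROM docs;

-- query(): return a new, untyped XML data type instance
SELECT xCol.query('/book/section') AS Sections
FROM docs;
```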
12. XQuery: nodes()
Returns a row per selected node as a special
XML data type instance
• Preserves the original structure and types
• Can only be used with the XQuery methods (but not
modify()), count(*), and IS (NOT) NULL
Appears as a Table-valued Function (TVF) in the
query plan if no index is present
13. sql:column()/sql:variable()
Map SQL value and type into XQuery values and types in context of XQuery or
XML-DML
• sql:variable(): accesses a SQL variable/parameter
declare @value int
set @value=42
select * from T
where
T.x.exist('/a/b[@id=sql:variable("@value")]')=1
• sql:column(): accesses another column value
tables: T(key int, x xml), S(key int, val int)
select * from T join S on T.key=S.key
where T.x.exist('/a/b[@id=sql:column("S.val")]')=1
• Restrictions in SQL Server:
No XML, CLR UDT, datetime, or deprecated text/ntext/image
15. Optimal Use Of Methods
How to Cast from XML to SQL
BAD:
CAST( CAST(xmldoc.query('/a/b/text()') as
nvarchar(500)) as int)
GOOD:
xmldoc.value('(/a/b/text())[1]', 'int')
BAD:
node.query('.').value('@attr',
'nvarchar(50)')
GOOD:
node.value('@attr', 'nvarchar(50)')
16. Optimal Use Of Methods
Grouping value() method
Group value() methods on same XML instance next to
each other if the path expressions in the value()
methods are
• Simple path expressions that only use child and attribute axis
and do not contain wildcards, predicates, node tests, ordinals
• The path expressions infer statically a singleton
The singleton can be statically inferred from
• the DOCUMENT and XML Schema Collection
• Relative paths on the context node provided by the nodes()
method
Requires XML index to be present
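A sketch of the grouping tip (table T(x xml) and the paths are hypothetical): both value() calls sit next to each other, use only child/attribute axes, and are statically singletons relative to the nodes() context node:

```sql
-- Adjacent value() calls over the same nodes() context can be
-- evaluated together when an XML index is present
SELECT c.value('@id',   'int')          AS CustID,
       c.value('@name', 'nvarchar(50)') AS CustName
FROM T
CROSS APPLY T.x.nodes('/doc/customer') AS N(c);
```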
17. Optimal Use of Methods
Using the right method to join and compare
Use exist() method, sql:column()/sql:variable() and an
XQuery comparison for checking for a value or joining
if secondary XML indices present
BAD:*
select doc
from doc_tab join authors
on doc.value('(/doc/mainauthor/lname/text())[1]',
'nvarchar(50)') = lastname
GOOD:
select doc
from doc_tab join authors
on 1 = doc.exist('/doc/mainauthor/lname/text()[. =
sql:column("lastname")]')
* If applied to an XML variable, or when no index is present,
the value() method is usually more efficient
18. Optimal Use of Methods
Avoiding bad costing with nodes()
nodes() without XML index is a Table-valued function (details later)
Bad cardinality estimates can lead to bad plans
• BAD:
select c.value('@id', 'int') as CustID
, c.value('@name', 'nvarchar(50)') as CName
from Customer, @x.nodes('/doc/customer') as N(c)
where Customer.ID = c.value('@id', 'int')
• BETTER (if only one wrapper doc element):
select c.value('@id', 'int') as CustID
, c.value('@name', 'nvarchar(50)') as CName
from Customer, @x.nodes('/doc[1]') as D(d)
cross apply d.nodes('customer') as N(c)
where Customer.ID = c.value('@id', 'int')
Use temp table (insert into #temp select … from nodes()) or Table-
valued parameter instead of XML to get better estimates
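The temp-table tip can be sketched as follows (the variable @x, element names, and the Customer table are carried over from the example above); materializing the shredded rows first gives the optimizer real cardinalities for the join instead of the TVF's fixed estimate:

```sql
-- Shred once into a temp table with real statistics
SELECT c.value('@id',   'int')          AS CustID,
       c.value('@name', 'nvarchar(50)') AS CName
INTO #cust
FROM @x.nodes('/doc[1]/customer') AS N(c);

-- Join against the materialized rows
SELECT Customer.*, #cust.CName
FROM Customer
JOIN #cust ON Customer.ID = #cust.CustID;
```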
19. Optimal Use Of Methods
Avoiding multiple method evaluations
Use subqueries
• BAD:
SELECT CASE isnumeric (doc.value(
'(/doc/customer/order/price)[1]', 'nvarchar(32)'))
WHEN 1 THEN doc.value(
'(/doc/customer/order/price)[1]', 'decimal(5,2)')
ELSE 0 END
FROM T
• GOOD:
SELECT CASE isnumeric (Price)
WHEN 1 THEN CAST(Price as decimal(5,2))
ELSE 0 END
FROM (SELECT doc.value(
'(/doc/customer/order/price)[1]',
'nvarchar(32)') as Price FROM T) X
Use subqueries also with NULLIF()
20. Combined SQL And XQuery/DML Processing
SELECT x.query('…'), y FROM T WHERE …
Static Phase:
• SQL Parser and XQuery Parser
• Static Typing (using metadata: the XML Schema Collection)
• Algebrization
• Static optimization of the combined logical and physical operator tree
Dynamic Phase:
• Runtime optimization and execution of the physical operator tree, using XML and relational indices
21. New XQuery Algebra Operators
XML Reader Table-Valued Function (TVF), and XML Reader TVF with XPath Filter
Used if no Primary XML Index is present
Creates node table rowset in query flow
Multiple XPath filters can be pushed in to reduce node table
to subtree
Base cardinality estimate is always 10,000 rows!
Some adjustment based on pushed path filters
XMLReader node table format example (simplified)
ID TAG ID Node Type-ID VALUE HID
1.3.1 4 (TITLE) Element 2 (xs:string) Bad Bugs #title#section#book
22. New XQuery Algebra Operators
UDX
• Serializer UDX
serializes the query result as XML
• XQuery String UDX
evaluates the XQuery string() function
• XQuery Data UDX
evaluates the XQuery data() function
• Check UDX
validates XML being inserted
• UDX name visible in SSMS properties window
23. Optimal Use Of XQuery
Atomization of nodes
Value comparisons, XQuery casts and value() method
casts require atomization of item
• attribute:
/person[@age = 42]
/person[data(@age) = 42]
• Atomic typed element:
/person[age = 42] /person[data(age) = 42]
• Untyped, mixed content typed element (adds UDX):
/person[age = 42] /person[data(age) = 42]
/person[string(age) = 42]
• If only one text node for untyped element (better):
/person[age/text() = 42]
/person[data(age/text()) = 42]
• value() method on untyped elements:
value('/person/age', 'int')
value('/person/age/text()', 'int')
string() aggregates all text nodes and prohibits index use
24. Optimal Use Of XQuery
Casting Values
Value comparisons require casts and type promotion
• Untyped attribute:
/person[@age = 42] /person[xs:decimal(@age) = 42]
• Untyped text node():
/person[age/text() = 42]
/person[xs:decimal(age/text()) = 42]
• Typed element (typed as xs:int):
/person[salary = 3e4] /person[xs:double(salary) =
3e4]
Casting is expensive and prohibits index lookup
Tips to avoid casting
• Use appropriate types for comparison (string for untyped)
• Use schema to declare type
25. Optimal Use Of XQuery
Maximize XPath expressions
Single paths are more efficient than twig paths
Avoid predicates in the middle of path expressions
book[@ISBN = "1-8610-0157-6"]/author[first-name = "Davis"]
/book[@ISBN = "1-8610-0157-6"] "∩"
/book/author[first-name = "Davis"]
Move ordinals to the end of path expressions
• Make sure you get the same semantics!
• /a[1]/b[1] ≠ (/a/b)[1] ≠ /a/b[1]
• (/book/@isbn)[1] is better than /book[1]/@isbn
26. Optimal Use Of XQuery
Maximize XPath expressions in exist()
Use context item in predicate to lengthen path in exist()
• Existential quantification makes returned node irrelevant
• BAD:
SELECT * FROM docs WHERE 1 = xCol.exist
('/book/subject[text() = "security"]')
• GOOD:
SELECT * FROM docs WHERE 1 = xCol.exist
('/book/subject/text()[. = "security"]')
• BAD:
SELECT * FROM docs WHERE 1 = xCol.exist
('/book[@price > 9.99 and @price < 49.99]')
• GOOD:
SELECT * FROM docs WHERE 1 = xCol.exist
('/book/@price[. > 9.99 and . < 49.99]')
This does not work with an or-predicate
27. Optimal Use Of XQuery
Inefficient operations: Parent axis
Most frequent offender: parent axis with nodes()
• BAD:
select o.value('../@id', 'int') as CustID
, o.value('@id', 'int') as OrdID
from T
cross apply x.nodes('/doc/customer/orders') as N(o)
• GOOD:
select c.value('@id', 'int') as CustID
, o.value('@id', 'int') as OrdID
from T cross apply x.nodes('/doc/customer') as N1(c)
cross apply c.nodes('orders') as N2(o)
28. Optimal Use Of XQuery
Inefficient operations
Avoid descendant axes and // in the middle of path
expressions if the data structure is known.
• // still can use the HID lookup, but is less efficient
XQuery construction performs worse than FOR XML
• BAD:
SELECT notes.query('
<Customer cid="{sql:column(''cid'')}">
<name>{sql:column("name")}</name>
{ / }
</Customer>')
FROM Customers WHERE cid=1
• GOOD:
SELECT cid as "@cid", name, notes as "*"
FROM Customers WHERE cid=1
FOR XML PATH('Customer'), TYPE
29. Optimal Use Of FOR XML
Use TYPE directive when assigning result to XML
• BAD:
declare @x xml;
set @x =
(select * from Customers for xml raw);
• GOOD:
declare @x xml;
set @x =
(select * from Customers for xml raw,
type);
Use FOR XML PATH for complex grouping and additional
hierarchy levels over FOR XML EXPLICIT
Use FOR XML EXPLICIT for complex nesting if FOR XML PATH
performance is not appropriate
30. XML Indices
Create XML index on XML column
CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)
Create secondary indexes on tags, values, paths
Creation:
• Single-threaded only for primary XML index
• Multi-threaded for secondary XML indexes
Uses:
• Primary Index will always be used if defined (not a cost
based decision)
• Results can be served directly from index
• SQL’s cost based optimizer will consider secondary indexes
Maintenance:
• Primary and Secondary Indices will be efficiently maintained
during updates
• Only subtree that changes will be updated
• No online index rebuild
• Clustered key may lead to non-linear maintenance cost
Schema revalidation still checks whole instance
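The CREATE statement from the slide, extended with the documented syntax for the three secondary XML index types (same table docs(xDoc) as above):

```sql
CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc);

-- Secondary XML indexes are built over the primary one
CREATE XML INDEX idx_path  ON docs (xDoc)
    USING XML INDEX idx_1 FOR PATH;
CREATE XML INDEX idx_value ON docs (xDoc)
    USING XML INDEX idx_1 FOR VALUE;
CREATE XML INDEX idx_prop  ON docs (xDoc)
    USING XML INDEX idx_1 FOR PROPERTY;
```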
31. Example Index Contents
insert into Person values (42,
'<book ISBN="1-55860-438-3">
<section>
<title>Bad Bugs</title>
Nobody loves bad bugs.
</section>
<section>
<title>Tree Frogs</title>
All right-thinking people
<bold>love</bold> tree frogs.
</section>
</book>')
32. Primary XML Index
CREATE PRIMARY XML INDEX PersonIdx ON Person (Pdesc)
PK XID TAG ID Node Type-ID VALUE HID
42 1 1 (book) Element 1 (bookT) null #book
42 1.1 2 (ISBN) Attribute 2 (xs:string) 1-55860-438-3 #@ISBN#book
42 1.3 3 (section) Element 3 (sectionT) null #section#book
42 1.3.1 4 (TITLE) Element 2 (xs:string) Bad Bugs #title#section#book
42 1.3.3 -- Text -- Nobody loves bad bugs. #text()#section#book
42 1.5 3 (section) Element 3 (sectionT) null #section#book
42 1.5.1 4 (title) Element 2 (xs:string) Tree frogs #title#section#book
42 1.5.3 -- Text -- All right-thinking people #text()#section#book
42 1.5.5 7 (bold) Element 4 (boldT) love #bold#section#book
42 1.5.7 -- Text -- tree frogs #text()#section#book
Assumes typed data; Columns and Values are simplified, see VLDB 2004 paper for details
33. Secondary XML Indices
• The XML column x in table T(id, x) is stored as binary XML
• Primary XML Index (1 per XML column), clustered on the primary key of table T plus XID; columns: PK, XID, NID, TID, VALUE, LVALUE, HID, xsinil, …
• Non-clustered secondary indices (n per primary index): Value Index, Property Index, Path Index
35. Takeaway: XML Indices
PRIMARY XML Index – Use when lots of XQuery
FOR VALUE – Useful for queries where values are
more selective than paths such as
//*[.="Seattle"]
FOR PATH – Useful for Path expressions: avoids
joins by mapping paths to hierarchical index
(HID) numbers. Example: /person/address/zip
FOR PROPERTY – Useful when optimizer chooses
other index (for example, on relational column,
or FT Index) in addition so row is already known
36. Shredding Approaches
Approach | Complex Shapes | Bulkload | Server vs Midtier | Business logic | Programming | Scale/Performance
SQLXML Bulkload with annotated schema | Yes, with limits | Yes | midtier | staging tables on server, XSLT on midtier | annotated XSD and small API | very good/very good
ADO.Net DataSet | No | No | midtier | midtier, SSIS | DataSet API, SSIS | good/good
CLR Table-valued function | Yes | No | Server or midtier | Server or midtier | C#, VB custom code | limited/good
OpenXML | Yes | No | Server | T-SQL | declarative T-SQL, XPath against variable | limited/good
nodes() | Yes | No | Server | T-SQL | declarative SQL, XQuery against var or table | good/careful
37. To Promote or Not Promote…
Promotion pre-calculates paths
Requires relational query
• XQuery does not know about promotion
Promotion during loading of the data
• Using any of the shredding mechanisms
• 1-to-1 or 1-to-many relationships
Promotion using computed columns
• 1-to-1 only
• Persist computed column: Fast lookup and retrieval
• Relational index on persisted computed column: Fast lookup
• Have to be precise
Promotion using Triggers
• 1-to-1 or 1-to-many relationships
• Trigger overhead
Relational View over XML data
• Filters on relational view are not pushed down due to different type/value system
38. Promotion using computed columns
Use a schema-bound UDF that encapsulates XQuery
Persist computed column
• Fast lookup and retrieval
Relational index on persisted computed column
• Fast lookup
Query will have to use the schema-bound UDF to match
CAVEAT: No parallel plans with a persisted computed
column based on a UDF
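A sketch of this pattern under assumed names (table Customers(doc xml), path /Customer/name; none of these identifiers are from the deck). The UDF must be schema-bound for the computed column to be persisted, and queries must call the same UDF to match:

```sql
CREATE FUNCTION dbo.CustNameUDF (@x xml)
RETURNS nvarchar(50)
WITH SCHEMABINDING
AS
BEGIN
    -- Encapsulates the XQuery so a computed column can reference it
    RETURN @x.value('(/Customer/name/text())[1]', 'nvarchar(50)');
END;
GO
ALTER TABLE dbo.Customers
    ADD CustName AS dbo.CustNameUDF(doc) PERSISTED;
CREATE INDEX ix_CustName ON dbo.Customers (CustName);
GO
-- Invoke the schema-bound UDF so the persisted column is matched
SELECT * FROM dbo.Customers WHERE dbo.CustNameUDF(doc) = N'Joe';
```

Recall the caveat from the slide: plans involving a persisted computed column based on a UDF will not be parallelized.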
39. Use of Full-Text Index for Optimization
Can provide improvement for XQuery contains() queries
Query for documents where section title contains “optimization”
Use Fulltext index to prefilter candidates (includes false positives)
SELECT * FROM docs
WHERE contains(xCol, 'optimization')
AND 1 = xCol.exist('
/book/section/title/text()[contains(.,"optimization")]
')
40. Futures: Selective XML Index
CREATE SELECTIVE XML INDEX pxi_index ON Tbl(xmlcol)
FOR (
-- the first four match XQuery predicates
-- in all XML data type methods
-- simple flavor - default mapping (xs:untypedAtomic),
-- no optimization hints
node42 = '/a/b',
pathatc = '/a/b/c/@atc',
-- advanced flavor - use of optimization hints
path02 = '/a/b/c' as XQUERY 'xs:string' MAXLENGTH(25),
node13 = '/a/b/d' as XQUERY 'xs:double' SINGLETON,
-- the next two match the value() method and
-- require regular SQL Server type semantics;
-- they can be mixed with the XQUERY ones;
-- specifying a type is mandatory for the SQL type semantics
pathfloat = '/a/b/c' as SQL FLOAT,
pathabd = '/a/b/d' as SQL VARCHAR(200)
)
41. Session Takeaways
• Understand when and how
to use XML in SQL Server
• Understand and correct common
performance problems with XML and
XQuery
• Shred “relational” XML to relations
• Use XML datatype for semistructured
and markup scenarios
• Write your XQueries so that XML
Indices can be used
• Use persisted computed columns to
promote XQuery results (with caveat)
44. Complete the Evaluation Form to Win!
Win a Dell Mini Netbook – every day – just for
submitting your completed form. Each session
evaluation form represents a chance to win.
Pick up your evaluation form:
• In each presentation room
• Online on the PASS Summit website
(Sponsored by Dell)
Drop off your completed form:
• Near the exit of each presentation room
• At the Registration desk
• Online on the PASS Summit website
45. Thank you
for attending this session and the
2011 PASS Summit in Seattle
October 11-14, Seattle, WA
46.
• Microsoft SQL Server Clinic (Room 611): work through your technical issues with SQL Server CSS and get architectural guidance from SQLCAT
• Microsoft Expert Pods / Product Pavilion (Expo Hall): talk with Microsoft SQL Server & BI experts to learn about the next version of SQL Server and check out the new Database Consolidation Appliance
• Hands-on Labs (Room 618-620): get experienced through self-paced & instructor-led labs on our cloud based lab platform - bring your laptop or use HP provided hardware
• Meet Microsoft SQL Server Engineering team members & SQL MVPs (6th Floor Lobby)