Query Processing Innovations for Data-Intensive, Modern Applications
3. Agenda
• Learn how your application can benefit from new query processing capabilities in the Azure SQL Database and SQL Server platform
• Graph data processing to model complex relationships between objects
• Advanced self-tuning query processing to solve or avoid performance-related problems
5. What is a Graph Database?
[Figure: example graph with the nodes Bob, Mary, USB Flash Drive, White Chocolate, Furry Socks and City, connected by FriendOf, Bought and "from" edges]
6. Our approach – Embrace and Extend
• Backed by research (see References below)
• Mature product: 40+ years of academic and industry research, and a highly evolved ecosystem, including tooling and community support
• Build on-premises, cloud, and hybrid solutions
• Best of both relational and graph databases on a single platform
• Trusted: used and trusted by millions of customers for enterprise and mission-critical workloads
References
[1] J. Fan, A. G. S. Raj, and J. M. Patel, "The case against specialized graph analytics engines," in CIDR, Asilomar, CA, 2015.
[2] A. Jindal, S. Madden, M. Castellanos, and M. Hsu, "Graph analytics using Vertica relational database," in IEEE BigData, Santa Clara, CA, 2015.
9. DDL Extensions: CREATE NODE
-- A node table: SQL Server adds the implicit $node_id pseudo-column
CREATE TABLE Customers (
    [CustomerID] INTEGER NOT NULL,
    [CustomerName] NVARCHAR(100) NOT NULL,
    [WebsiteURL] NVARCHAR(256) NOT NULL
) AS NODE;
GO
SELECT TOP 5 * FROM Customers;
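The later MATCH queries also traverse a StockItems node table whose definition is not shown. A minimal sketch, assuming only the StockItemID and StockItemName columns are needed and that the data comes from WideWorldImporters.Warehouse.StockItems (the column list is an assumption):

```sql
-- Hypothetical column list: the later slides only rely on StockItemID and
-- StockItemName, mirroring WideWorldImporters.Warehouse.StockItems.
CREATE TABLE StockItems (
    [StockItemID]   INTEGER       NOT NULL,
    [StockItemName] NVARCHAR(100) NOT NULL
) AS NODE;
GO

-- $node_id is generated by SQL Server on insert, so it is not listed
-- in the INSERT column list.
INSERT INTO StockItems (StockItemID, StockItemName)
SELECT StockItemID, StockItemName
FROM WideWorldImporters.Warehouse.StockItems;
GO

-- $node_id can be selected like a column; its value is a JSON string
-- identifying the node.
SELECT TOP 5 $node_id, StockItemID, StockItemName
FROM StockItems;
```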
10. DDL Extensions: CREATE EDGE
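-- An edge table with one user-defined attribute; SQL Server adds $edge_id, $from_id and $to_id
CREATE TABLE Bought (
    [PurchasedCount] BIGINT
) AS EDGE;
GO
-- An edge table does not need any user-defined columns
CREATE TABLE FriendOf AS EDGE;
GO
SELECT TOP 5 * FROM Bought;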
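By default an edge row may connect any two nodes. A sketch, assuming SQL Server 2019 or later (or Azure SQL Database), where edge constraints are available, that restricts Bought to Customers-to-StockItems and FriendOf to Customers-to-Customers, and indexes the pseudo-columns that MATCH traverses. The constraint and index names are hypothetical.

```sql
-- Edge constraints limit which node tables an edge may connect.
ALTER TABLE Bought
    ADD CONSTRAINT EC_Bought_Customers_StockItems
    CONNECTION (Customers TO StockItems);
GO
ALTER TABLE FriendOf
    ADD CONSTRAINT EC_FriendOf_Customers
    CONNECTION (Customers TO Customers);
GO

-- An index on ($from_id, $to_id) supports traversals that start from
-- the "from" side of the edge.
CREATE INDEX IX_Bought_From_To ON Bought ($from_id, $to_id);
CREATE INDEX IX_FriendOf_From_To ON FriendOf ($from_id, $to_id);
GO
```

Adding the constraints before loading data means every later INSERT is validated against them.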
11. Inserting data into graph tables
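-- Load the Customers node table; $node_id is generated automatically
INSERT INTO Customers(CustomerName, …)
SELECT CustomerName, …
FROM WideWorldImporters.Sales.Customers
GO
-- Each edge row stores the $node_id of its source ($from_id) and target ($to_id).
-- @purchasecount, @customer_id and @stockitem_id are variables declared elsewhere.
INSERT INTO Bought($from_id, $to_id, PurchasedCount)
SELECT Customers.$node_id, StockItems.$node_id, @purchasecount
FROM Customers, StockItems
WHERE Customers.CustomerID = @customer_id
AND StockItems.StockItemID = @stockitem_id
GO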
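The Bought insert above adds one edge per call. A set-based sketch that loads all Bought edges in a single statement, usually more efficient than one INSERT per edge. It assumes the Customers and StockItems node tables hold the same CustomerID and StockItemID values as WideWorldImporters, and it treats PurchasedCount as the number of order lines per customer and item (an interpretation, not something the slide states):

```sql
-- One edge per (customer, stock item) pair, aggregated from the order lines.
WITH Purchases AS (
    SELECT
        O.CustomerID,
        OL.StockItemID,
        PurchasedCount = COUNT_BIG(*)
    FROM WideWorldImporters.Sales.Orders AS O
    JOIN WideWorldImporters.Sales.OrderLines AS OL
        ON OL.OrderID = O.OrderID
    GROUP BY O.CustomerID, OL.StockItemID
)
INSERT INTO Bought ($from_id, $to_id, PurchasedCount)
SELECT C.$node_id, S.$node_id, P.PurchasedCount
FROM Purchases AS P
JOIN Customers AS C ON C.CustomerID = P.CustomerID
JOIN StockItems AS S ON S.StockItemID = P.StockItemID;
GO
```

Because each customer/item pair yields exactly one edge, a COUNT(*) over Bought edges later counts customers rather than order lines.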
12. Query Language Extensions: MATCH
-- Customers-(Bought)->StockItems: customers linked to stock items through a Bought edge
SELECT
    CustomerName,
    StockItemName
FROM
    StockItems,
    Customers,
    Bought
WHERE MATCH(Customers-(Bought)->StockItems)
AND StockItemName = 'White chocolate snow balls 250g';
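MATCH patterns can be chained to follow several edges in one predicate. A sketch, assuming the FriendOf edges are populated, that lists the items bought by the friends of customer 88 (the customer ID is reused from the later slides for illustration):

```sql
-- Customers appears twice, so each occurrence needs its own alias.
SELECT DISTINCT
    Friend.CustomerName,
    Item.StockItemName
FROM
    Customers AS Me,
    FriendOf,
    Customers AS Friend,
    Bought,
    StockItems AS Item
WHERE MATCH(Me-(FriendOf)->Friend-(Bought)->Item)
  AND Me.CustomerID = 88;
```

DISTINCT removes duplicates when the same friend/item pair is reached more than once.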
14. Product Recommendations (“Before”)
WITH Current_Usr AS
(
SELECT
CustomerID = 88,
StockItemID = 226, -- 'White chocolate snow balls 250g'
PurchasedCount = 1
) ,
-- Identify the other customers who have also purchased the item the current user is looking for
Other_Usr AS
(
SELECT
C.CustomerID,
P.StockItemID,
Purchased_by_others = COUNT(*)
FROM
Sales.OrderLines AS OD
JOIN
Sales.Orders AS OH ON OH.OrderID=OD.OrderID
JOIN
Sales.Customers AS C ON OH.CustomerID=C.CustomerID
JOIN
Current_Usr AS P ON P.StockItemID=OD.StockItemID
WHERE
C.CustomerID<>P.CustomerID
GROUP BY
C.CustomerID, P.StockItemID
) ,
-- Find the other items which those other customers have also purchased
Other_Items AS
(
SELECT
C.CustomerID,
P.StockItemID,
Other_purchased = COUNT(*)
FROM
Sales.OrderLines AS OD
JOIN
Sales.Orders AS OH ON OH.OrderID=OD.OrderID
JOIN
Other_Usr AS C ON OH.CustomerID=C.CustomerID
JOIN
Warehouse.StockItems AS P ON P.StockItemID=OD.StockItemID
WHERE
P.StockItemName<>'White chocolate snow balls 250g'
GROUP BY
C.CustomerID, P.StockItemID
)
-- Outer query
-- Recommend to the current user the top items from those other items,
-- ordered by the number of other customers who purchased them
SELECT
TOP 10 P.StockItemName,
COUNT(Other_purchased)
FROM
Other_Items
JOIN
Warehouse.StockItems AS P ON P.StockItemID=Other_Items.StockItemID
GROUP BY
P.StockItemName
ORDER BY
COUNT(Other_purchased) DESC;
GO
15. Product Recommendations with SQL Graph (“After”)
SELECT
TOP 10 RecommendedItem.StockItemName,
COUNT(*)
FROM
StockItems AS Item,
Customers AS C,
Bought AS BoughtOther,
Bought AS BoughtThis,
StockItems AS RecommendedItem
WHERE
MATCH(RecommendedItem<-(BoughtOther)-C-(BoughtThis)->Item)
AND Item.StockItemName = 'White chocolate snow balls 250g'
AND (Item.StockItemName <> RecommendedItem.StockItemName)
AND C.CustomerID <> 88
GROUP BY
RecommendedItem.StockItemName
ORDER BY COUNT(*) DESC;
GO
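The query above hard-codes the item name and customer ID. A sketch that parameterizes it as a stored procedure; the procedure and parameter names are hypothetical. Because the plan is compiled for the first parameter values seen and then reused from the plan cache, this is also the kind of query where parameter sniffing can surface.

```sql
CREATE OR ALTER PROCEDURE dbo.usp_GetRecommendations
    @StockItemName NVARCHAR(100),
    @CustomerID    INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT TOP (10)
        RecommendedItem.StockItemName,
        COUNT(*) AS PurchasedByOthers
    FROM
        StockItems AS Item,
        Customers AS C,
        Bought AS BoughtOther,
        Bought AS BoughtThis,
        StockItems AS RecommendedItem
    WHERE MATCH(RecommendedItem<-(BoughtOther)-C-(BoughtThis)->Item)
      AND Item.StockItemName = @StockItemName
      AND Item.StockItemName <> RecommendedItem.StockItemName
      AND C.CustomerID <> @CustomerID
    GROUP BY RecommendedItem.StockItemName
    ORDER BY COUNT(*) DESC;
END;
GO

EXEC dbo.usp_GetRecommendations
    @StockItemName = N'White chocolate snow balls 250g',
    @CustomerID = 88;
```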
17. Design Choices
• Relational vs. Graph
Graph and relational designs can answer similar questions. But if traversing relationships defines the primary application requirements, a graph design can solve the problem more intuitively and with less code.
Typical graph scenarios:
• Recommendation systems
• Fraud detection
• Content management
• Bill of materials, product hierarchy
• CRM
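Traversals of arbitrary depth, as in hierarchies or social chains, are where graph syntax saves the most code. A sketch, assuming SQL Server 2019 or later (or Azure SQL Database), where SHORTEST_PATH is available, that finds the shortest FriendOf chain from customer 88 to every customer reachable from them:

```sql
SELECT
    Me.CustomerName,
    STRING_AGG(Friend.CustomerName, '->') WITHIN GROUP (GRAPH PATH) AS FriendChain,
    COUNT(Friend.CustomerID) WITHIN GROUP (GRAPH PATH) AS Hops
FROM
    Customers AS Me,
    FriendOf FOR PATH AS fo,
    Customers FOR PATH AS Friend
-- "+" repeats the FriendOf hop one or more times.
WHERE MATCH(SHORTEST_PATH(Me(-(fo)->Friend)+))
  AND Me.CustomerID = 88;
```

The equivalent relational version would typically need a recursive common table expression with explicit cycle handling.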
20. Query Optimization (QO)
• Query: the query text is parsed into a logical tree
• Transformations: the QO comes up with equivalent variations of the tree
• Cardinality Estimation (CE): the CE estimates the number of rows flowing through each operator in each tree
• Costing: based on the CE, each equivalent tree gets a cost; the cheapest one is the winner
• Plan Cache: the plan is cached for reuse by queries with the same text
Query Execution (QE)
• Query: the same query is executed again
• Plan Cache: the previously cached plan is retrieved
• Memory, DOP (degree of parallelism): based on the CE in the final plan, a memory grant is acquired
• Query Execution: the query starts executing according to the plan
What could go wrong?
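The plan cache step can be observed directly. A sketch that lists cached plans and how often each has been reused; it requires the VIEW SERVER STATE permission:

```sql
SELECT TOP (20)
    cp.usecounts,                 -- times this cached plan was reused
    cp.objtype,                   -- e.g. Adhoc, Prepared, Proc
    cp.size_in_bytes,
    st.text AS query_text,
    qp.query_plan                 -- showplan XML with the estimated rows
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
ORDER BY cp.usecounts DESC;
```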
21. Common Causes
• Missing statistics
• Stale statistics
• Inadequate statistics sample rate
• Bad parameter sniffing scenarios
• Out-of-model query constructs, e.g. multi-statement TVFs, table variables, XQuery
• Assumptions not aligned with the data being queried, e.g. independence vs. correlation
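The first three causes can be checked per table. A sketch, assuming it runs in the WideWorldImporters database, that shows for each statistics object on Sales.Orders when it was last updated, how many rows were sampled, and how many modifications have accumulated since, then rebuilds the statistics with a full scan:

```sql
SELECT
    s.name AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'Sales.Orders');
GO

-- Read every row instead of a sample when rebuilding the statistics.
UPDATE STATISTICS Sales.Orders WITH FULLSCAN;
GO
```

A large modification_counter points to stale statistics; rows_sampled well below rows points to a low sample rate.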
22. Consequences
• Slow query response time due to inefficient plans
• Excessive resource utilization (CPU, memory, IO)
• Spills to disk
• Reduced throughput and concurrency
• T-SQL refactoring to work around off-model statements
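23. Problem: Multi-statement table-valued functions (MSTVFs) are treated as a black box by query processing (QP), and a fixed cardinality guess is used during optimization (100 rows with the cardinality estimator introduced in SQL Server 2014, 1 row with the legacy estimator)
• Interleaved Execution will materialize the MSTVF and use its actual row count
• Downstream operations will benefit from the corrected MSTVF cardinality estimate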
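A sketch of a query that interleaved execution can help, assuming SQL Server 2017 or later and the WideWorldImporters sample database. The function name is hypothetical. At compatibility level 140 or higher, the optimizer can pause, execute the MSTVF, and optimize the join with the actual row count instead of the fixed guess.

```sql
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140;
GO

-- Hypothetical MSTVF: all orders of one customer.
CREATE OR ALTER FUNCTION dbo.ufn_CustomerOrders (@CustomerID INT)
RETURNS @Result TABLE
(
    OrderID   INT  NOT NULL,
    OrderDate DATE NOT NULL
)
AS
BEGIN
    INSERT INTO @Result (OrderID, OrderDate)
    SELECT OrderID, OrderDate
    FROM Sales.Orders
    WHERE CustomerID = @CustomerID;

    RETURN;
END;
GO

-- The join and aggregate downstream of the MSTVF are the operations
-- that benefit from the corrected estimate.
SELECT OL.StockItemID, SUM(OL.Quantity) AS TotalQuantity
FROM dbo.ufn_CustomerOrders(88) AS O
JOIN Sales.OrderLines AS OL ON OL.OrderID = O.OrderID
GROUP BY OL.StockItemID;
```

Comparing the estimated rows of the function's operator in the actual execution plan at compatibility levels 130 and 140 shows the difference.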
24. Problem: Queries may spill to disk or take too much memory because of poor cardinality estimates
• Memory Grant Feedback (MGF) will adjust memory grants based on execution feedback
• MGF can remove spills and improve concurrency for repeating queries
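MGF compares the memory a query was granted with what it actually used. A sketch that surfaces the same gap for cached queries, assuming SQL Server 2016 or later, where sys.dm_exec_query_stats exposes the grant columns:

```sql
-- Queries whose last memory grant most exceeded the memory they used.
SELECT TOP (10)
    qs.execution_count,
    qs.last_grant_kb,
    qs.last_used_grant_kb,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.last_grant_kb - qs.last_used_grant_kb DESC;
```

With MGF active, repeated executions of the same query should see last_grant_kb move closer to last_used_grant_kb.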
25. Problem: If cardinality estimates are skewed, we may choose an inappropriate join algorithm
• Adaptive Join defers the choice between hash join and nested loops until after the first join input has been scanned
• Adaptive Join uses nested loops for small inputs and hash join for large inputs, comparing the actual row count of the first input with a threshold computed when the plan is compiled
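A sketch for comparing plans with and without adaptive joins, assuming SQL Server 2017 or later at compatibility level 140 and the WideWorldImporters database. In SQL Server 2017, adaptive joins require batch mode; this sketch assumes the columnstore index on Sales.OrderLines provides it. Whether the optimizer actually picks an Adaptive Join operator depends on the data, so inspect the actual execution plans.

```sql
SELECT SI.StockItemName, SUM(OL.Quantity) AS TotalQuantity
FROM Sales.OrderLines AS OL
JOIN Warehouse.StockItems AS SI
    ON SI.StockItemID = OL.StockItemID
WHERE OL.Quantity = 360
GROUP BY SI.StockItemName;
GO

-- The same query with batch-mode adaptive joins disabled.
SELECT SI.StockItemName, SUM(OL.Quantity) AS TotalQuantity
FROM Sales.OrderLines AS OL
JOIN Warehouse.StockItems AS SI
    ON SI.StockItemID = OL.StockItemID
WHERE OL.Quantity = 360
GROUP BY SI.StockItemName
OPTION (USE HINT('DISABLE_BATCH_MODE_ADAPTIVE_JOINS'));
GO
```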
28. Better Performance with Automatic Plan Correction
Automatically fix problems without manual tuning:
• Continuous performance monitoring and analysis of query plans
• Detect problematic plans
• Automatically fix performance problems caused by SQL plan choice regressions
[Figure: query times as the plan changes from Plan 1 to Plan 2 to Plan 3; automatic tuning reverts to the previously effective plan, Plan 2]
29. Automatic Tuning
• Detect: plan choice regressions are reported in sys.dm_db_tuning_recommendations
• Turn on automatic tuning: the system corrects the regression itself
• Revert: back to the “last known good” plan
Can be helpful in parameter sniffing situations
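A sketch of turning on automatic plan correction and inspecting what it detects, assuming SQL Server 2017 or later (or Azure SQL Database). Automatic plan correction depends on Query Store, so it is enabled first.

```sql
ALTER DATABASE CURRENT SET QUERY_STORE = ON;
ALTER DATABASE CURRENT SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO

-- Revert regressed queries to the last known good plan automatically.
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);
GO

-- Desired vs. actual state of the option.
SELECT name, desired_state_desc, actual_state_desc, reason_desc
FROM sys.database_automatic_tuning_options;

-- Detected regressions and the script that forces the better plan.
SELECT
    reason,
    score,
    JSON_VALUE(state, '$.currentValue') AS current_state,
    JSON_VALUE(details, '$.implementationDetails.script') AS fix_script
FROM sys.dm_db_tuning_recommendations;
```

Leaving FORCE_LAST_GOOD_PLAN off still populates the recommendations, so the fix scripts can be reviewed and applied manually.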
32. Resources
• SQL Graph Overview
• SQL Graph Architecture
• About SQL Graph
• Bulk Insert Best Practices
• Recommendation System on Million Song Dataset
• Product Recommendations in WideWorldImporters using SQL Graph
• Recursive Queries and Shortest Path
• Shortest Path on Yelp Dataset