Measures in SQL
Julian Hyde (Google)
John Fremlin (Google)
2024-06-11 17:30 Europa
ABSTRACT
SQL has attained widespread adoption, but Business Intelligence tools still use their
own higher level languages based upon a multidimensional paradigm. Composable
calculations are what is missing from SQL, and we propose a new kind of column,
called a measure, that attaches a calculation to a table. Like regular tables, tables
with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional
languages but retains SQL semantics. Measure invocations can be expanded in place
to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive
expressions (a way to evaluate multidimensional expressions that is consistent with
existing SQL semantics), a concept called evaluation context, and several operations
for setting and modifying the evaluation context.
SIGMOD, June 9–15, 2024, Santiago, Chile
Julian Hyde
Google Inc.
San Francisco, CA, USA
julianhyde@google.com
John Fremlin
Google Inc.
New York, NY, USA
fremlin@google.com
1. Problem
Tables are broken!
Tables are unable to provide reusable calculations.
Problem: Calculate profit margin of orders
Orders
prodName custName orderDate  revenue cost
======== ======== ========== ======= ====
Happy    Alice    2023/11/28       6    4
Acme     Bob      2023/11/27       5    2
Happy    Alice    2024/11/28       7    4
Whizz    Celia    2023/11/25       3    1
Happy    Bob      2022/11/27       4    1
SELECT (SUM(revenue) - SUM(cost))
     / SUM(revenue) AS profitMargin
FROM Orders
WHERE prodName = 'Happy';
profitMargin
============
0.47
Attempted solution: Create a view
CREATE VIEW SummarizedOrders AS
SELECT prodName, orderDate,
  (SUM(revenue) - SUM(cost))
    / SUM(revenue) AS profitMargin
FROM Orders
GROUP BY prodName, orderDate;
SELECT AVG(profitMargin) AS profitMargin
FROM SummarizedOrders
WHERE prodName = 'Happy';
profitMargin
============
0.50
The view gives 0.50 rather than 0.47: an average of per-day ratios is not the same as the ratio of the sums.
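The mismatch between the direct query and the view can be checked on the five sample Orders rows. The following is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the `* 1.0` factor (which forces floating-point division in SQLite) are assumptions of this sketch, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (prodName TEXT, custName TEXT, orderDate TEXT,
                     revenue INTEGER, cost INTEGER);
INSERT INTO Orders VALUES
  ('Happy', 'Alice', '2023/11/28', 6, 4),
  ('Acme',  'Bob',   '2023/11/27', 5, 2),
  ('Happy', 'Alice', '2024/11/28', 7, 4),
  ('Whizz', 'Celia', '2023/11/25', 3, 1),
  ('Happy', 'Bob',   '2022/11/27', 4, 1);
CREATE VIEW SummarizedOrders AS
SELECT prodName, orderDate,
       (SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue) AS profitMargin
FROM Orders
GROUP BY prodName, orderDate;
""")

# Ratio of sums, computed directly on the base table.
direct = conn.execute("""
    SELECT (SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue)
    FROM Orders WHERE prodName = 'Happy'""").fetchone()[0]

# Average of the per-(product, day) ratios stored in the view.
via_view = conn.execute("""
    SELECT AVG(profitMargin)
    FROM SummarizedOrders WHERE prodName = 'Happy'""").fetchone()[0]

print(f"direct:   {direct:.2f}")    # 0.47
print(f"via view: {via_view:.2f}")  # 0.50
```

The view has already collapsed revenue and cost into one ratio per group, so no later query can recover the sums it would need.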
2. Theory
Extend the relational model with measures
1. Allow tables to have measures
DESCRIBE EnhancedOrders;
column       type
============ ==============
prodName     STRING
custName     STRING
orderDate    DATE
revenue      INTEGER
cost         INTEGER
profitMargin DOUBLE MEASURE
2. Operators for evaluating measures
SELECT prodName, profitMargin
FROM EnhancedOrders
GROUP BY prodName;
prodName profitMargin
======== ============
Acme     0.60
Happy    0.47
Whizz    0.67
3. Syntax to define measures in a query
SELECT *,
  (SUM(revenue) - SUM(cost)) / SUM(revenue)
    AS MEASURE profitMargin
FROM Orders;
Definitions
A context-sensitive expression (CSE) is an expression whose value is determined by an evaluation context.
An evaluation context is a predicate whose terms are one or more columns from the same table.
● This set of columns is the dimensionality of the CSE.
A measure is a special kind of column that becomes a CSE when used in a query.
● A measure’s dimensionality is the set of non-measure columns in its table.
● The data type of a measure that returns a value of type t is t MEASURE, e.g. INTEGER MEASURE.
Example:
SELECT prodName,
  profitMargin
FROM EnhancedOrders
GROUP BY prodName;
prodName profitMargin
======== ============
Acme     0.60
Happy    0.47
Whizz    0.67
● profitMargin is a measure (and a CSE).
● Its dimensionality is {prodName, custName, orderDate, revenue, cost}.
● The evaluation context for the profitMargin cell of the Acme row is prodName = 'Acme', so the cell has the same value as:
SELECT (SUM(revenue) - SUM(cost))
     / SUM(revenue) AS profitMargin
FROM Orders
WHERE prodName = 'Acme';
profitMargin
============
0.60
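One way to read these definitions is that a measure is a formula waiting for a WHERE clause. The sketch below, in Python with the built-in sqlite3 module, models an evaluation context as a mapping from dimension columns to values and turns it into a conjunction of equality predicates. The helper `evaluate`, the constant `PROFIT_MARGIN` and the `* 1.0` factor (floating-point division in SQLite) are hypothetical names and choices made for illustration. Real evaluation contexts may be arbitrary predicates, not only equalities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (prodName TEXT, custName TEXT, orderDate TEXT,
                     revenue INTEGER, cost INTEGER);
INSERT INTO Orders VALUES
  ('Happy', 'Alice', '2023/11/28', 6, 4),
  ('Acme',  'Bob',   '2023/11/27', 5, 2),
  ('Happy', 'Alice', '2024/11/28', 7, 4),
  ('Whizz', 'Celia', '2023/11/25', 3, 1),
  ('Happy', 'Bob',   '2022/11/27', 4, 1);
""")

# The measure's formula, written as an ordinary aggregate expression.
PROFIT_MARGIN = "(SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue)"

def evaluate(measure_sql, context):
    """Evaluate an aggregate expression under an evaluation context.

    `context` maps dimension columns to values. Column names are
    interpolated into the SQL text, so they must come from trusted code;
    values are passed as bound parameters.
    """
    where = " AND ".join(f"{col} = ?" for col in context) or "1 = 1"
    sql = f"SELECT {measure_sql} FROM Orders WHERE {where}"
    return conn.execute(sql, list(context.values())).fetchone()[0]

acme = evaluate(PROFIT_MARGIN, {"prodName": "Acme"})
whizz_bob = evaluate(PROFIT_MARGIN, {"prodName": "Whizz", "custName": "Bob"})
print(acme, whizz_bob)
```

A context that matches no rows, such as Whizz orders placed by Bob, yields NULL (Python `None`) because SUM over an empty set is NULL.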
AT operator
The context transformation operator AT modifies the evaluation context.
Syntax:
expression AT (contextModifier…)
contextModifier ::=
    WHERE predicate
  | ALL
  | ALL dimension
  | SET dimension = [CURRENT] expression
  | VISIBLE
SELECT prodName,
  profitMargin,
  profitMargin
    AT (SET prodName = 'Happy')
    AS happyMargin,
  profitMargin
    AT (SET custName = 'Bob')
    AS bobMargin
FROM EnhancedOrders
GROUP BY prodName;
prodName profitMargin happyMargin bobMargin
======== ============ =========== =========
Acme     0.60         0.47        0.60
Happy    0.47         0.47        0.75
Whizz    0.67         0.47        NULL
Evaluation context for the profitMargin cell of the Acme row is prodName = 'Acme':
SELECT (SUM(revenue) - SUM(cost))
     / SUM(revenue) AS m
FROM Orders
WHERE prodName = 'Acme';
m
====
0.60
Evaluation context for the happyMargin cells is prodName = 'Happy':
SELECT (SUM(revenue) - SUM(cost))
     / SUM(revenue) AS m
FROM Orders
WHERE prodName = 'Happy';
m
====
0.47
Evaluation context for the bobMargin cell of the Whizz row is prodName = 'Whizz' AND custName = 'Bob':
SELECT (SUM(revenue) - SUM(cost))
     / SUM(revenue) AS m
FROM Orders
WHERE prodName = 'Whizz'
  AND custName = 'Bob';
m
====
NULL
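The two SET modifiers can be approximated in standard SQL with scalar subqueries, one per modified context. This is a sketch of one possible expansion, run through Python's sqlite3 module on the five sample Orders rows; it is not necessarily the rewrite a planner would choose. The aliases `o`, `rows` and the `* 1.0` factor are assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (prodName TEXT, custName TEXT, orderDate TEXT,
                     revenue INTEGER, cost INTEGER);
INSERT INTO Orders VALUES
  ('Happy', 'Alice', '2023/11/28', 6, 4),
  ('Acme',  'Bob',   '2023/11/27', 5, 2),
  ('Happy', 'Alice', '2024/11/28', 7, 4),
  ('Whizz', 'Celia', '2023/11/25', 3, 1),
  ('Happy', 'Bob',   '2022/11/27', 4, 1);
""")

sql = """
SELECT o.prodName,
       (SUM(o.revenue) - SUM(o.cost)) * 1.0 / SUM(o.revenue) AS profitMargin,
       -- profitMargin AT (SET prodName = 'Happy'):
       -- the prodName term of the context is replaced.
       (SELECT (SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue)
          FROM Orders
         WHERE prodName = 'Happy') AS happyMargin,
       -- profitMargin AT (SET custName = 'Bob'):
       -- a custName term is added to the row's prodName term.
       (SELECT (SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue)
          FROM Orders
         WHERE prodName = o.prodName AND custName = 'Bob') AS bobMargin
FROM Orders AS o
GROUP BY o.prodName
ORDER BY o.prodName
"""

rows = conn.execute(sql).fetchall()
for prod, margin, happy, bob in rows:
    bob_text = "NULL" if bob is None else f"{bob:.2f}"
    print(f"{prod:<6} {margin:.2f} {happy:.2f} {bob_text}")
```

Each output column differs only in its WHERE clause, which is the part that AT controls.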
3. Consequences
Grain-locking
What is the average age of the customers who ordered each product?
When we use an aggregate function in a join query, it will ‘double count’ if the join duplicates rows.
This is generally not what we want for measures (except if we want a weighted average) but is difficult to avoid in SQL.
Measures are locked to the grain of the table that defined them.
WITH EnhancedCustomers AS (
SELECT *,
AVG(custAge) AS MEASURE avgAge
FROM Customers)
SELECT o.prodName,
AVG(c.custAge) AS weightedAvgAge,
c.avgAge AS avgAge
FROM Orders AS o
JOIN EnhancedCustomers AS c USING (custName)
GROUP BY o.prodName;
prodName weightedAvgAge avgAge
======== ============== ======
Acme 41 41
Happy 29 32
Whizz 17 17
Orders
prodName custName orderDate  revenue cost
======== ======== ========== ======= ====
Happy    Alice    2023/11/28       6    4
Acme     Bob      2023/11/27       5    2
Happy    Alice    2024/11/28       7    4
Whizz    Celia    2023/11/25       3    1
Happy    Bob      2022/11/27       4    1
Customers
custName custAge
======== =======
Alice         23
Bob           41
Celia         17
For product Happy: Alice (age 23) has two orders; Bob (age 41) has one order.
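Both columns of the grain-locking result can be reproduced in standard SQL on the sample rows. The sketch below uses Python's sqlite3 module. The subquery that deduplicates customers per product is one possible way to emulate a grain-locked measure and is an assumption of this sketch, as are the aliases `o2`, `c2` and `rows`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (prodName TEXT, custName TEXT, orderDate TEXT,
                     revenue INTEGER, cost INTEGER);
INSERT INTO Orders VALUES
  ('Happy', 'Alice', '2023/11/28', 6, 4),
  ('Acme',  'Bob',   '2023/11/27', 5, 2),
  ('Happy', 'Alice', '2024/11/28', 7, 4),
  ('Whizz', 'Celia', '2023/11/25', 3, 1),
  ('Happy', 'Bob',   '2022/11/27', 4, 1);
CREATE TABLE Customers (custName TEXT, custAge INTEGER);
INSERT INTO Customers VALUES ('Alice', 23), ('Bob', 41), ('Celia', 17);
""")

sql = """
SELECT o.prodName,
       -- Plain aggregate over the join: Alice counts once per order.
       AVG(c.custAge) AS weightedAvgAge,
       -- Emulated grain-locked measure: each customer counts once,
       -- because the average is taken over Customers rows.
       (SELECT AVG(c2.custAge)
          FROM Customers AS c2
         WHERE c2.custName IN (SELECT o2.custName
                                 FROM Orders AS o2
                                WHERE o2.prodName = o.prodName)) AS avgAge
FROM Orders AS o
JOIN Customers AS c USING (custName)
GROUP BY o.prodName
ORDER BY o.prodName
"""

rows = conn.execute(sql).fetchall()
for prod, weighted, locked in rows:
    print(f"{prod:<6} {weighted:.0f} {locked:.0f}")
```

For Happy the two columns differ (29 against 32) because the join produces two rows for Alice.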
Measures prevent self-joins
In 2020, what were the revenue and year-on-year revenue growth of each product?
Without measures (self-join):
SELECT o20.prodName,
o20.sumRevenue,
o20.sumRevenue - o19.sumRevenue
AS revenueGrowth
FROM (
SELECT prodName,
SUM(revenue) AS sumRevenue
FROM Orders
JOIN Products USING (prodName)
WHERE YEAR(orderDate) = 2020
GROUP BY prodName) AS o20
JOIN (
SELECT prodName,
SUM(revenue) AS sumRevenue
FROM Orders
JOIN Products USING (prodName)
WHERE YEAR(orderDate) = 2019
GROUP BY prodName) AS o19
ON o20.prodName = o19.prodName;
With measures (no self-join):
SELECT prodName,
  sumRevenue,
  sumRevenue
    - sumRevenue AT (SET YEAR(orderDate)
        = CURRENT YEAR(orderDate) - 1)
    AS revenueGrowth
FROM (
SELECT *,
SUM(revenue) AS MEASURE sumRevenue
FROM Orders
JOIN Products USING (prodName))
WHERE YEAR(orderDate) = 2020
GROUP BY prodName;
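The prior-year context can be written in standard SQL as a correlated scalar subquery, which keeps the outer FROM clause free of a self-join. The sketch below runs such an expansion through Python's sqlite3 module. It makes several assumptions: the sample Orders rows cover 2022 to 2024, so it asks about 2023 rather than 2020; the join to Products is omitted because no Products rows are given; the year is extracted with `substr` because SQLite has no YEAR function; and the names `p`, `:year` and `rows` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (prodName TEXT, custName TEXT, orderDate TEXT,
                     revenue INTEGER, cost INTEGER);
INSERT INTO Orders VALUES
  ('Happy', 'Alice', '2023/11/28', 6, 4),
  ('Acme',  'Bob',   '2023/11/27', 5, 2),
  ('Happy', 'Alice', '2024/11/28', 7, 4),
  ('Whizz', 'Celia', '2023/11/25', 3, 1),
  ('Happy', 'Bob',   '2022/11/27', 4, 1);
""")

sql = """
SELECT o.prodName,
       SUM(o.revenue) AS sumRevenue,
       -- sumRevenue AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1):
       -- same product, previous year.
       SUM(o.revenue) - (
           SELECT SUM(p.revenue)
             FROM Orders AS p
            WHERE p.prodName = o.prodName
              AND CAST(substr(p.orderDate, 1, 4) AS INTEGER) = :year - 1
       ) AS revenueGrowth
FROM Orders AS o
WHERE CAST(substr(o.orderDate, 1, 4) AS INTEGER) = :year
GROUP BY o.prodName
ORDER BY o.prodName
"""

rows = conn.execute(sql, {"year": 2023}).fetchall()
for prod, revenue, growth in rows:
    print(prod, revenue, growth)
```

Products with no orders in the previous year get a NULL growth (Python `None`), where the inner join of the hand-written self-join version would drop the row entirely.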
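Bottom-up vs top-down query
[Figure: the same query drawn two ways over Orders, Products and Customers. Left, relational algebra (bottom-up): a plan built from ⨝, σ and Σ operators. Right, multidimensional (top-down): a plan built from ⨝, σ, Σ and π operators, evaluated in the contexts (customer: all, orderYear: 2019, prodName: all) and (customer: all, orderYear: 2020, prodName: all) over the dimensions custName, prodName and orderDate.]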
Represent a Business Intelligence model as a SQL view
CREATE VIEW OrdersCube AS
SELECT *
FROM (
  -- join keys are projected under illustrative names
  SELECT o.custName AS `order.cust_name`,
    o.prodName AS `order.prod_name`,
    o.orderDate AS `order.date`,
    o.revenue AS `order.revenue`,
    SUM(o.revenue) AS MEASURE `order.sum_revenue`
  FROM Orders AS o) AS o
LEFT JOIN (
  SELECT c.custName AS `customer.name`,
    c.state AS `customer.state`,
    c.custAge AS `customer.age`,
    AVG(c.custAge) AS MEASURE `customer.avg_age`
  FROM Customers AS c) AS c
  ON o.`order.cust_name` = c.`customer.name`
LEFT JOIN (
  SELECT p.prodName AS `product.name`,
    p.color AS `product.color`,
    AVG(p.weight) AS MEASURE `product.avg_weight`
  FROM Products AS p) AS p
  ON o.`order.prod_name` = p.`product.name`;
SELECT `customer.state`, `product.avg_weight`
FROM OrdersCube
GROUP BY `customer.state`;
● SQL planner handles view expansion
● Grain locking makes it safe to use a
star schema
● Users can define new models simply
by writing queries
Composition & closure
Just as tables are closed under queries, so tables-with-measures are closed under queries-with-measures.
Measures can reference measures.
Complex analytical calculations without touching the FROM clause.
Evaluation contexts can be nested.
The example is built up one measure at a time (sumCost and sumRevenue, then profitMargin, revenueGrowthYoY, top5Products and top3CustomersOfTop5Products). The final query is:
SELECT *,
  SUM(cost) AS MEASURE sumCost,
  SUM(revenue) AS MEASURE sumRevenue,
  (sumRevenue - sumCost) / sumRevenue
    AS MEASURE profitMargin,
  sumRevenue
    - sumRevenue AT (SET YEAR(orderDate)
        = CURRENT YEAR(orderDate) - 1)
    AS MEASURE revenueGrowthYoY,
  ARRAY_AGG(productId
      ORDER BY sumRevenue DESC LIMIT 5)
    AT (ALL productId)
    AS MEASURE top5Products,
  ARRAY_AGG(customerId
      ORDER BY sumRevenue DESC LIMIT 3)
    AT (ALL customerId
      SET productId MEMBER OF top5Products
        AT (SET YEAR(orderDate)
          = CURRENT YEAR(orderDate) - 1))
    AS MEASURE top3CustomersOfTop5Products
FROM Orders;
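Measures that reference other measures can be expanded by substituting definitions until only plain aggregates remain. The sketch below is a deliberately naive textual expander in Python, covering only the first three measures of the example. The dictionary `MEASURES`, the function `expand`, regex-based substitution and the `* 1.0` factor (floating-point division in SQLite) are all assumptions of this sketch; a real implementation would work on a parsed query, not on text.

```python
import re
import sqlite3

# Measure definitions; profitMargin refers to the other two by name.
MEASURES = {
    "sumCost": "SUM(cost)",
    "sumRevenue": "SUM(revenue)",
    "profitMargin": "(sumRevenue - sumCost) * 1.0 / sumRevenue",
}
PATTERN = re.compile(r"\b(" + "|".join(MEASURES) + r")\b")

def expand(expr):
    """Replace measure names by their definitions until none remain.

    Assumes the definitions contain no cycles.
    """
    previous = None
    while previous != expr:
        previous = expr
        expr = PATTERN.sub(lambda m: "(" + MEASURES[m.group(1)] + ")", expr)
    return expr

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (prodName TEXT, custName TEXT, orderDate TEXT,
                     revenue INTEGER, cost INTEGER);
INSERT INTO Orders VALUES
  ('Happy', 'Alice', '2023/11/28', 6, 4),
  ('Acme',  'Bob',   '2023/11/27', 5, 2),
  ('Happy', 'Alice', '2024/11/28', 7, 4),
  ('Whizz', 'Celia', '2023/11/25', 3, 1),
  ('Happy', 'Bob',   '2022/11/27', 4, 1);
""")

expanded = expand("profitMargin")
print(expanded)

rows = conn.execute(
    f"SELECT prodName, {expanded} AS profitMargin "
    "FROM Orders GROUP BY prodName ORDER BY prodName").fetchall()
for prod, margin in rows:
    print(f"{prod:<6} {margin:.2f}")
```

The measures that use AT need more than substitution, since each AT changes the WHERE clause under which its operand is evaluated.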
Implementing measures & CSEs as SQL rewrites
Cases are listed from simple to complex.
Complexity: Simple measure can be inlined
Query:
SELECT prodName, avgRevenue
FROM OrdersCube
GROUP BY prodName
Expanded query:
SELECT prodName, AVG(revenue)
FROM orders
GROUP BY prodName
Complexity: Join requires grain-locking
Query:
SELECT prodName, avgAge
FROM OrdersCube
GROUP BY prodName
Expanded query:
SELECT o.prodName, AVG(c.custAge PER c.custName)
FROM orders AS o JOIN customers AS c USING (custName)
GROUP BY o.prodName
→ (something with GROUPING SETS)
Complexity: Period-over-period
Query:
SELECT prodName, avgAge -
  avgAge AT (SET year =
    CURRENT year - 1)
FROM OrdersCube
GROUP BY prodName
Expanded query:
(something with window aggregates)
Complexity: Scalar subquery can accomplish anything
Query:
SELECT prodName, prodColor,
  avgAge AT (ALL custState
    SET year = CURRENT year - 1)
FROM OrdersCube
GROUP BY prodName, prodColor
Expanded query:
SELECT prodName, prodColor,
  (SELECT … FROM orders
   WHERE <evaluation context>)
FROM orders
GROUP BY prodName, prodColor
Summary
Measures provide reusable calculations
● Can represent BI models (aka ‘cubes’, ‘semantic layer’) as SQL views
Top-down evaluation makes queries concise
● Fewer self joins → fewer user errors, less planner effort, more efficient execution
Measures don’t break SQL
● Queries without measures give the same results as regular SQL
● Queries with measures give the same row count as regular SQL
● Measures can be implemented by expanding to SQL
Measures provide reusable calculations in SQL
https://doi.org/10.1145/3626246.3653374
@julianhyde
@JohnFremlin
@ApacheCalcite
https://calcite.apache.org

Wired_2.0_Create_AmsterdamJUG_09072024.pptx
 
NYGGS 360: A Complete ERP for Construction Innovation
NYGGS 360: A Complete ERP for Construction InnovationNYGGS 360: A Complete ERP for Construction Innovation
NYGGS 360: A Complete ERP for Construction Innovation
 
ERP Software Solutions Provider in Coimbatore
ERP Software Solutions Provider in CoimbatoreERP Software Solutions Provider in Coimbatore
ERP Software Solutions Provider in Coimbatore
 
Artificial intelligence in customer services or chatbots
Artificial intelligence  in customer services or chatbotsArtificial intelligence  in customer services or chatbots
Artificial intelligence in customer services or chatbots
 
Celebrity Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Servic...
Celebrity Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Servic...Celebrity Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Servic...
Celebrity Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Servic...
 
TEQnation 2024: Sustainable Software: May the Green Code Be with You
TEQnation 2024: Sustainable Software: May the Green Code Be with YouTEQnation 2024: Sustainable Software: May the Green Code Be with You
TEQnation 2024: Sustainable Software: May the Green Code Be with You
 
Il Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazioneIl Data Streaming per un’AI real-time di nuova generazione
Il Data Streaming per un’AI real-time di nuova generazione
 
Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech.Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech.
 
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
 

Measures in SQL (SIGMOD 2024, Santiago, Chile)

  • 1. Measures in SQL Julian Hyde (Google) John Fremlin (Google) 2024-06-11 17:30 Europa
  • 2. Measures in SQL ABSTRACT SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries. SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL. To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context. SIGMOD, June 9–15, 2024, Santiago, Chile Julian Hyde Google Inc. San Francisco, CA, USA julianhyde@google.com John Fremlin Google Inc. New York, NY, USA fremlin@google.com
  • 4. Tables are broken! Tables are unable to provide reusable calculations.
  • 5. Problem: Calculate profit margin of orders
       Orders table:
       prodName custName orderDate  revenue cost
       ======== ======== ========== ======= ====
       Happy    Alice    2023/11/28       6    4
       Acme     Bob      2023/11/27       5    2
       Happy    Alice    2024/11/28       7    4
       Whizz    Celia    2023/11/25       3    1
       Happy    Bob      2022/11/27       4    1
       Query:
       SELECT prodName,
         (SUM(revenue) - SUM(cost)) / SUM(revenue) AS profitMargin
       FROM Orders
       WHERE prodName = 'Happy';
       profitMargin
       ============
       0.47
  • 6. Attempted solution: Create a view
       CREATE VIEW SummarizedOrders AS
       SELECT prodName, orderDate,
         (SUM(revenue) - SUM(cost)) / SUM(revenue) AS profitMargin
       FROM Orders
       GROUP BY prodName, orderDate;
       Query against the view:
       SELECT AVG(profitMargin) AS profitMargin
       FROM SummarizedOrders
       WHERE prodName = 'Happy';
       profitMargin
       ============
       0.50
       Query against the Orders table (same five rows as on slide 5):
       SELECT prodName,
         (SUM(revenue) - SUM(cost)) / SUM(revenue) AS profitMargin
       FROM Orders
       WHERE prodName = 'Happy';
       profitMargin
       ============
       0.47
       The two answers differ because the view averages the per-date margins (0.33, 0.43 and 0.75) instead of dividing the summed profit by the summed revenue.
  • 8. Extend the relational model with measures
       1. Allow tables to have measures
       DESCRIBE EnhancedOrders;
       column       type
       ============ ==============
       prodName     STRING
       custName     STRING
       orderDate    DATE
       revenue      INTEGER
       cost         INTEGER
       profitMargin DOUBLE MEASURE
       2. Operators for evaluating measures
       SELECT prodName, profitMargin
       FROM EnhancedOrders
       GROUP BY prodName;
       prodName profitMargin
       ======== ============
       Acme             0.60
       Happy            0.47
       Whizz            0.67
       3. Syntax to define measures in a query
       SELECT *,
         (SUM(revenue) - SUM(cost)) / SUM(revenue) AS MEASURE profitMargin
       FROM Orders
       GROUP BY prodName;
  • 9. Definitions
       A context-sensitive expression (CSE) is an expression whose value is determined by an evaluation context.
       An evaluation context is a predicate whose terms are one or more columns from the same table.
       ● This set of columns is the dimensionality of the CSE.
       A measure is a special kind of column that becomes a CSE when used in a query.
       ● A measure’s dimensionality is the set of non-measure columns in its table.
       ● The data type of a measure that returns a value of type t is t MEASURE, e.g. INTEGER MEASURE.
       Example:
       SELECT prodName, profitMargin
       FROM EnhancedOrders
       GROUP BY prodName;
       prodName profitMargin
       ======== ============
       Acme             0.60
       Happy            0.47
       Whizz            0.67
       profitMargin is a measure (and a CSE).
       Its dimensionality is {prodName, custName, orderDate, revenue, cost}.
       The evaluation context for the Acme cell is prodName = 'Acme', so the cell is equivalent to:
       SELECT (SUM(revenue) - SUM(cost)) / SUM(revenue) AS profitMargin
       FROM Orders
       WHERE prodName = 'Acme';
       profitMargin
       ============
       0.60
  • 10. AT operator
       The context transformation operator AT modifies the evaluation context.
       Syntax:
       expression AT (contextModifier…)
       contextModifier ::=
           WHERE predicate
         | ALL
         | ALL dimension
         | SET dimension = [CURRENT] expression
         | VISIBLE
       Example:
       SELECT prodName,
         profitMargin,
         profitMargin AT (SET prodName = 'Happy') AS happyMargin,
         profitMargin AT (SET custName = 'Bob') AS bobMargin
       FROM EnhancedOrders
       GROUP BY prodName;
       prodName profitMargin happyMargin bobMargin
       ======== ============ =========== =========
       Acme             0.60        0.47      0.60
       Happy            0.47        0.47      0.75
       Whizz            0.67        0.47      NULL
       The evaluation context for the Acme profitMargin cell is prodName = 'Acme':
       SELECT (SUM(revenue) - SUM(cost)) / SUM(revenue) AS m
       FROM Orders
       WHERE prodName = 'Acme';
       m
       ====
       0.60
       The evaluation context for every happyMargin cell is prodName = 'Happy':
       SELECT (SUM(revenue) - SUM(cost)) / SUM(revenue) AS m
       FROM Orders
       WHERE prodName = 'Happy';
       m
       ====
       0.47
       The evaluation context for the Whizz bobMargin cell is prodName = 'Whizz' AND custName = 'Bob':
       SELECT (SUM(revenue) - SUM(cost)) / SUM(revenue) AS m
       FROM Orders
       WHERE prodName = 'Whizz' AND custName = 'Bob';
       m
       ====
       NULL
  • 12. Grain-locking
       What is the average age of the customers who ordered each product?
       When we use an aggregate function in a join query, it will ‘double count’ if the join duplicates rows. This is generally not what we want for measures (except when we want a weighted average), but it is difficult to avoid in SQL.
       Measures are locked to the grain of the table that defined them.
       WITH EnhancedCustomers AS (
         SELECT *,
           AVG(custAge) AS MEASURE avgAge
         FROM Customers)
       SELECT o.prodName,
         AVG(c.custAge) AS weightedAvgAge,
         c.avgAge AS avgAge
       FROM Orders AS o
       JOIN EnhancedCustomers AS c USING (custName)
       GROUP BY o.prodName;
       prodName weightedAvgAge avgAge
       ======== ============== ======
       Acme                 41     41
       Happy                29     32
       Whizz                17     17
       Orders table:
       prodName custName orderDate  revenue cost
       ======== ======== ========== ======= ====
       Happy    Alice    2023/11/28       6    4
       Acme     Bob      2023/11/27       5    2
       Happy    Alice    2024/11/28       7    4
       Whizz    Celia    2023/11/25       3    1
       Happy    Bob      2022/11/27       4    1
       Customers table:
       custName custAge
       ======== =======
       Alice         23
       Bob           41
       Celia         17
       For Happy, Alice (age 23) has two orders and Bob (age 41) has one, so the weighted average is (23 + 23 + 41) / 3 = 29, while the grain-locked average counts each customer once: (23 + 41) / 2 = 32.
  • 13. Measures prevent self-joins
       In 2020, what was the revenue and year-on-year revenue growth of each product?
       Without measures (self-join):
       SELECT o20.prodName,
         o20.sumRevenue,
         o20.sumRevenue - o19.sumRevenue AS revenueGrowth
       FROM (
         SELECT prodName, SUM(revenue) AS sumRevenue
         FROM Orders
         JOIN Products USING (prodName)
         WHERE YEAR(orderDate) = 2020
         GROUP BY prodName) AS o20
       JOIN (
         SELECT prodName, SUM(revenue) AS sumRevenue
         FROM Orders
         JOIN Products USING (prodName)
         WHERE YEAR(orderDate) = 2019
         GROUP BY prodName) AS o19
       ON o20.prodName = o19.prodName;
       With measures:
       SELECT prodName,
         sumRevenue,
         sumRevenue - sumRevenue AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1)
       FROM (
         SELECT *, SUM(revenue) AS MEASURE sumRevenue
         FROM Orders
         JOIN Products USING (prodName))
       WHERE YEAR(orderDate) = 2020
       GROUP BY prodName;
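  • 14. Bottom-up vs Top-down query [Figure: two query plans. Relational algebra (bottom-up): a tree of joins (⨝), aggregations (Σ), filters (σ) and a projection (π) over Orders, Products and Customers, in which the Orders/Products/Customers subtree appears twice. Multidimensional (top-down): two evaluation contexts, (customer: all, orderYear: 2019, prodName: all) and (customer: all, orderYear: 2020, prodName: all), over the dimensions custName, prodName and orderDate.]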
  • 15. Represent a Business Intelligence model as a SQL view
       Tables: Orders, Products, Customers
       CREATE VIEW OrdersCube AS
       SELECT *
       FROM (
         SELECT o.orderDate AS `order.date`,
           o.revenue AS `order.revenue`,
           SUM(o.revenue) AS MEASURE `order.sum_revenue`
         FROM Orders AS o) AS o
       LEFT JOIN (
         SELECT c.custName AS `customer.name`,
           c.state AS `customer.state`,
           c.custAge AS `customer.age`,
           AVG(c.custAge) AS MEASURE `customer.avg_age`
         FROM Customers AS c) AS c
       ON o.custName = c.custName
       LEFT JOIN (
         SELECT p.prodName AS `product.name`,
           p.color AS `product.color`,
           AVG(p.weight) AS MEASURE `product.avg_weight`
         FROM Products AS p) AS p
       ON o.prodName = p.prodName;
       Query against the model:
       SELECT `customer.state`, `product.avg_weight`
       FROM OrdersCube
       GROUP BY `customer.state`;
       ● SQL planner handles view expansion
       ● Grain locking makes it safe to use a star schema
       ● Users can define new models simply by writing queries
  • 16. Composition & closure
       Just as tables are closed under queries, so tables-with-measures are closed under queries-with-measures.
       ● Measures can reference measures
       ● Complex analytical calculations without touching the FROM clause
       ● Evaluation contexts can be nested
       The slide builds the following query one measure at a time (sumCost and sumRevenue, then profitMargin, revenueGrowthYoY, top5Products and top3CustomersOfTop5Products):
       SELECT *,
         SUM(cost) AS MEASURE sumCost,
         SUM(revenue) AS MEASURE sumRevenue,
         (sumRevenue - sumCost) / sumRevenue AS MEASURE profitMargin,
         sumRevenue - sumRevenue AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1) AS MEASURE revenueGrowthYoY,
         ARRAY_AGG(productId ORDER BY sumRevenue DESC LIMIT 5) AT (ALL productId) AS MEASURE top5Products,
         ARRAY_AGG(customerId ORDER BY sumRevenue DESC LIMIT 3) AT (ALL customerId SET productId MEMBER OF top5Products AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1)) AS MEASURE top3CustomersOfTop5Products
       FROM Orders;
  • 17. Implementing measures & CSEs as SQL rewrites (cases ordered from simple to complex)
       Simple measure can be inlined
         Query: SELECT prodName, avgRevenue FROM OrdersCube GROUP BY prodName
         Expanded query: SELECT prodName, AVG(revenue) FROM orders GROUP BY prodName
       Join requires grain-locking
         Query: SELECT prodName, avgAge FROM OrdersCube GROUP BY prodName
         Expanded query: SELECT o.prodName, AVG(c.custAge PER c.custName) FROM orders JOIN customers GROUP BY prodName → (something with GROUPING SETS)
       Period-over-period
         Query: SELECT prodName, avgAge - avgAge AT (SET year = CURRENT year - 1) FROM OrdersCube GROUP BY prodName
         Expanded query: (something with window aggregates)
       Scalar subquery can accomplish anything
         Query: SELECT prodName, prodColor, avgAge AT (ALL custState SET year = CURRENT year - 1) FROM OrdersCube GROUP BY prodName, prodColor
         Expanded query: SELECT prodName, prodColor, (SELECT … FROM orders WHERE <evaluation context>) FROM orders GROUP BY prodName, prodColor
  • 18. Summary
       Measures provide reusable calculations
       ● Can represent BI models (aka ‘cubes’, ‘semantic layer’) as SQL views
       Top-down evaluation makes queries concise
       ● Fewer self-joins → fewer user errors, less planner effort, more efficient execution
       Measures don’t break SQL
       ● Queries without measures give the same results as regular SQL
       ● Queries with measures give the same row count as regular SQL
       ● Measures can be implemented by expanding to SQL
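
The scalar-subquery rewrite from slide 17, in which each measure reference becomes a subquery whose WHERE clause is the evaluation context, can be tried on the five sample Orders rows from slide 5. What follows is a minimal sketch in Python using the standard sqlite3 module, not the authors' implementation. SQLite has no MEASURE or AT syntax, so the expansion is written by hand, and the names MARGIN, QUERY and rounded, as well as the ISO date format, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (prodName TEXT, custName TEXT, orderDate TEXT,"
    " revenue INTEGER, cost INTEGER)"
)
# Sample rows from the slides; dates are stored in ISO format (an assumption).
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        ("Happy", "Alice", "2023-11-28", 6, 4),
        ("Acme", "Bob", "2023-11-27", 5, 2),
        ("Happy", "Alice", "2024-11-28", 7, 4),
        ("Whizz", "Celia", "2023-11-25", 3, 1),
        ("Happy", "Bob", "2022-11-27", 4, 1),
    ],
)

# The measure formula; "* 1.0" avoids SQLite's integer division.
MARGIN = "(SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue)"

# Each column is the formula evaluated under a different evaluation context:
#   profitMargin -> prodName = <current group>
#   happyMargin  -> prodName = 'Happy'              (AT (SET prodName = 'Happy'))
#   bobMargin    -> prodName = <current group> AND custName = 'Bob'
QUERY = f"""
SELECT o.prodName,
       (SELECT {MARGIN} FROM Orders
         WHERE prodName = o.prodName) AS profitMargin,
       (SELECT {MARGIN} FROM Orders
         WHERE prodName = 'Happy') AS happyMargin,
       (SELECT {MARGIN} FROM Orders
         WHERE prodName = o.prodName AND custName = 'Bob') AS bobMargin
FROM Orders AS o
GROUP BY o.prodName
ORDER BY o.prodName
"""


def rounded(value):
    """Round to two decimals, passing SQL NULL (None) through unchanged."""
    return None if value is None else round(value, 2)


rows = [(p, rounded(m), rounded(h), rounded(b)) for p, m, h, b in conn.execute(QUERY)]
for row in rows:
    print(row)
```

On the sample rows, Happy's margin is (17 - 9) / 17 ≈ 0.47, and the context prodName = 'Whizz' AND custName = 'Bob' matches no rows, so that subquery yields NULL (None in Python). The query returns one row per product, the same row count as the GROUP BY query without the extra columns.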

Editor's Notes

  1. Image source: http://www.hydromatic.net/pix/pix2021/raw/P2021719_j6d_IMG_2063.JPG
  2. https://tngchristians.ca/images/articles/gridcube.jpg
  3. Don’t touch the FROM clause – to form analytic calculations, it is sufficient to write complex expressions in the SELECT clause
  4. Top-down evaluation makes queries concise. No self-joins are necessary (so you don’t have to repeat yourself, self-join, and deal with duplicate data). Calculations are reusable, which means that we can define the calculations in sub-queries, store the sub-queries as views, and share the views as the ‘model’.
  5. Image source: http://www.hydromatic.net/pix/pix2021/raw/P2021719_j6d_IMG_2063.JPG
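
The period-over-period pattern mentioned in the notes above (sumRevenue AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1)) can be approximated in plain SQL by computing the yearly sums once and looking up the previous year's value with a correlated scalar subquery. The sketch below uses Python's standard sqlite3 module and the sample Orders rows from the slides. It is only an illustration under stated assumptions: the CTE name Yearly, the column name orderYear and the ISO date format are hypothetical, and this is not the expansion a measures-aware planner would necessarily produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (prodName TEXT, custName TEXT, orderDate TEXT,"
    " revenue INTEGER, cost INTEGER)"
)
# Sample rows from the slides; dates are stored in ISO format (an assumption)
# so that SQLite's strftime can extract the year.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        ("Happy", "Alice", "2023-11-28", 6, 4),
        ("Acme", "Bob", "2023-11-27", 5, 2),
        ("Happy", "Alice", "2024-11-28", 7, 4),
        ("Whizz", "Celia", "2023-11-25", 3, 1),
        ("Happy", "Bob", "2022-11-27", 4, 1),
    ],
)

# Yearly holds SUM(revenue) per product and year, defined once.
# The scalar subquery re-reads it under the shifted context
# "same product, year = current year - 1".
QUERY = """
WITH Yearly AS (
    SELECT prodName,
           CAST(strftime('%Y', orderDate) AS INTEGER) AS orderYear,
           SUM(revenue) AS sumRevenue
    FROM Orders
    GROUP BY prodName, orderYear
)
SELECT y.prodName,
       y.orderYear,
       y.sumRevenue,
       y.sumRevenue - (SELECT p.sumRevenue
                       FROM Yearly AS p
                       WHERE p.prodName = y.prodName
                         AND p.orderYear = y.orderYear - 1) AS revenueGrowth
FROM Yearly AS y
ORDER BY y.prodName, y.orderYear
"""

rows = list(conn.execute(QUERY))
for row in rows:
    print(row)
```

Where a product has no orders in the previous year, the subquery returns NULL and so does revenueGrowth. In this sample only Happy has consecutive years (4 in 2022, 6 in 2023, 7 in 2024), giving growth of 2 and 1.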