SlideShare a Scribd company logo
1 of 55
Download to read offline
Building a semantic/metrics
layer using Apache Calcite
Julian Hyde (Calcite, Google)
Community over Code • Halifax, Nova Scotia • 2023-10-09
Abstract
A semantic layer, also known as a metrics layer, lies between business users and the
database, and lets those users compose queries in the concepts that they understand.
It also governs access to the data, manages data transformations, and can tune the
database by defining materializations.
Like many new ideas, the semantic layer is a distillation and evolution of many old ideas, such as query
languages, multidimensional OLAP, and query federation.
In this talk, we describe the features we are adding to Calcite to define business views, query measures, and
optimize performance.
Julian Hyde is the original developer of Apache Calcite, an open source framework for building data
management systems, and Morel, a functional query language. Previously he created Mondrian, an analytics
engine, and SQLstream, an engine for continuous queries. He is a staff engineer at Google, where he works on
Looker and BigQuery.
Database
What products
are doing better
this year?
Semantic
layer
SELECT …
FROM …
GROUP BY …
Data system = Model + Query + Engine
Query
language
Data
model
Engine
Agenda
1. Relational model vs
dimensional model
2. Adding measures to SQL
3. Machine-learning patterns
4. Semantic layer
1. Relational model vs
dimensional model
2. Adding measures to SQL
3. Machine-learning patterns
4. Semantic layer
1. Relational model vs
dimensional model
2. Adding measures to SQL
3. Machine-learning patterns
4. Semantic layer
SQL vs BI
BI tools implement their own languages on top of SQL. Why not SQL?
Possible reasons:
● Semantic Model
● Control presentation / visualization
● Governance
● Pre-join tables
● Define reusable calculations
● Ask complex questions in a concise way
Processing BI in SQL
Why we should do it
● Move processing, not data
● Cloud SQL scale
● Remove data lag
● SQL is open
Why it’s hard
● Different paradigm
● More complex data model
● Can’t break SQL
Apache Calcite
Apache Calcite
Avatica
JDBC server
JDBC client
Pluggable
rewrite rules
Pluggable
stats / cost
Pluggable
catalog
ODBC client
Adapter
Physical
operators
Storage
SQL parser &
validator
Query
planner
Relational
algebra
Core – Operator expressions
(relational algebra) and planner
(based on Cascades)
External – Data storage,
algorithms and catalog
Optional – SQL parser, JDBC &
ODBC drivers
Extensible – Planner rewrite rules,
statistics, cost model, algebra,
UDFs
RelBuilder
Pasta machine vs Pizza delivery
Relational algebra (bottom-up) Multidimensional (top-down)
Products
Suppliers
⨝
⨝
Σ
⨝
σ
Sales
Products
Suppliers
⨝
⨝
Σ
σ
Sales
π
(Supplier: ‘ACE’,
Date: ‘1994-01’,
Product: all)
(Supplier: ‘ACE’,
Date: ‘1995-01’,
Product: all)
Supplier
Product
Date
Bottom-up vs Top-down query
1. Relational model vs
dimensional model
2. Adding measures to SQL
3. Machine-learning patterns
4. Semantic layer
1. Relational model vs
dimensional model
2. Adding measures to SQL
3. Machine-learning patterns
4. Semantic layer
Some multidimensional queries
● Give the total sales for each product in each quarter of 1995. (Note that quarter is a function of date).
● For supplier “Ace” and for each product, give the fractional increase in the sales in January 1995 relative to
the sales in January 1994.
● For each product give its market share in its category today minus its market share in its category in
October 1994.
● Select top 5 suppliers for each product category for last year, based on total sales.
● For each product category, select total sales this month of the product that had highest sales in that
category last month.
● Select suppliers that currently sell the highest selling product of last month.
● Select suppliers for which the total sale of every product increased in each of last 5 years.
● Select suppliers for which the total sale of every product category increased in each of last 5 years.
From [Agrawal1997]. Assumes a database with dimensions {supplier, date, product} and measure {sales}.)
Some multidimensional queries
● Give the total sales for each product in each quarter of 1995. (Note that quarter is a function of date).
● For supplier “Ace” and for each product, give the fractional increase in the sales in January 1995 relative to
the sales in January 1994.
● For each product give its market share in its category today minus its market share in its category in
October 1994.
● Select top 5 suppliers for each product category for last year, based on total sales.
● For each product category, select total sales this month of the product that had highest sales in that
category last month.
● Select suppliers that currently sell the highest selling product of last month.
● Select suppliers for which the total sale of every product increased in each of last 5 years
● Select suppliers for which the total sale of every product category increased in each of last 5 years.
From [Agrawal1997]. Assumes a database with dimensions {supplier, date, product} and measure {sales}.)
Query:
● For supplier “Ace” and for each product, give the fractional increase in the sales in January 1995 relative to
the sales in January 1994.
SQL MDX
SELECT p.prodId,
s95.sales,
(s95.sales - s94.sales) / s95.sales
FROM (
SELECT p.prodId, SUM(s.sales) AS sales
FROM Sales AS s
JOIN Suppliers AS u USING (suppId)
JOIN Products AS p USING (prodId)
WHERE u.name = ‘ACE’
AND FLOOR(s.date TO MONTH) = ‘1995-01-01’
GROUP BY p.prodId) AS s95
LEFT JOIN (
SELECT p.prodId, SUM(s.sales) AS sales
FROM Sales AS s
JOIN Suppliers AS u USING (suppId)
JOIN Products AS p USING (prodId)
WHERE u.name = ‘ACE’
AND FLOOR(s.date TO MONTH) = ‘1994-01-01’
GROUP BY p.prodId) AS s94
USING (prodId)
WITH MEMBER [Measures].[Sales Last Year] =
([Measures].[Sales],
ParallelPeriod([Date], 1, [Date].[Year]))
MEMBER [Measures].[Sales Growth] =
([Measures].[Sales]
- [Measures].[Sales Last Year])
/ [Measures].[Sales Last Year]
SELECT [Measures].[Sales Growth] ON COLUMNS,
[Product].Members ON ROWS
FROM [Sales]
WHERE [Supplier].[ACE]
Query:
● For supplier “Ace” and for each product, give the fractional increase in the sales in January 1995 relative to
the sales in January 1994.
SQL SQL with measures
SELECT p.prodId,
s95.sales,
(s95.sales - s94.sales) / s95.sales
FROM (
SELECT p.prodId, SUM(s.sales) AS sales
FROM Sales AS s
JOIN Suppliers AS u USING (suppId)
JOIN Products AS p USING (prodId)
WHERE u.name = ‘ACE’
AND FLOOR(s.date TO MONTH) = ‘1995-01-01’
GROUP BY p.prodId) AS s95
LEFT JOIN (
SELECT p.prodId, SUM(s.sales) AS sales
FROM Sales AS s
JOIN Suppliers AS u USING (suppId)
JOIN Products AS p USING (prodId)
WHERE u.name = ‘ACE’
AND FLOOR(s.date TO MONTH) = ‘1994-01-01’
GROUP BY p.prodId) AS s94
USING (prodId)
SELECT p.prodId,
SUM(s.sales) AS MEASURE sumSales,
sumSales AT (SET FLOOR(s.date TO MONTH)
= ‘1994-01-01’)
AS MEASURE sumSalesLastYear
FROM Sales AS s
JOIN Suppliers AS u USING (suppId)
JOIN Products AS p USING (prodId))
WHERE u.name = ‘ACE’
AND FLOOR(s.date TO MONTH) = ‘1995-01-01’
GROUP BY p.prodId
Self-joins, correlated subqueries, window aggregates, measures
Window aggregate functions were introduced to save on
self-joins.
Some DBs rewrite scalar subqueries and self-joins to
window aggregates [Zuzarte2003].
Window aggregates are more concise, easier to optimize,
and often more efficient.
However, window aggregates can only see data that is from
the same table, and is allowed by the WHERE clause.
Measures overcome that limitation.
SELECT *
FROM Employees AS e
WHERE sal > (
SELECT AVG(sal)
FROM Employees
WHERE deptno = e.deptno)
SELECT *
FROM Employees AS e
WHERE sal > AVG(sal)
OVER (PARTITION BY deptno)
A measure is… ?
… a column with an aggregate function. SUM(sales)
A measure is… ?
… a column with an aggregate function. SUM(sales)
… a column that, when used as an
expression, knows how to aggregate itself.
(SUM(sales) - SUM(cost))
/ SUM(sales)
A measure is… ?
… a column with an aggregate function. SUM(sales)
… a column that, when used as an
expression, knows how to aggregate itself.
(SUM(sales) - SUM(cost))
/ SUM(sales)
… a column that, when used as expression,
can evaluate itself in any context.
(SELECT SUM(forecastSales)
FROM SalesForecast AS s
WHERE predicate(s))
ExchService$ClosingRate(
‘USD’, ‘EUR’, sales.date)
A measure is…
… a column with an aggregate function. SUM(sales)
… a column that, when used as an
expression, knows how to aggregate itself.
(SUM(sales) - SUM(cost))
/ SUM(sales)
… a column that, when used as expression,
can evaluate itself in any context.
Its value depends on, and only on, the
predicate placed on its dimensions.
(SELECT SUM(forecastSales)
FROM SalesForecast AS s
WHERE predicate(s))
ExchService$ClosingRate(
‘USD’, ‘EUR’, sales.date)
SELECT MOD(deptno, 2) = 0 AS evenDeptno, avgSal2
FROM
WHERE deptno < 30
SELECT deptno, AVG(avgSal) AS avgSal2
FROM
GROUP BY deptno
Table model
Tables are SQL’s fundamental
model.
The model is closed – queries
consume and produce tables.
Tables are opaque – you can’t
deduce the type, structure or
private data of a table.
SELECT deptno, job,
AVG(sal) AS avgSal
FROM Employees
GROUP BY deptno, job
Employees2
Employees3
SELECT MOD(deptno, 2) = 0 AS evenDeptno, avgSal2
FROM
WHERE deptno < 30
SELECT deptno, AVG(avgSal) AS avgSal2
FROM
GROUP BY deptno
Table model
Tables are SQL’s fundamental
model.
The model is closed – queries
consume and produce tables.
Tables are opaque – you can’t
deduce the type, structure or
private data of a table.
SELECT deptno, job,
AVG(sal) AS avgSal
FROM Employees
GROUP BY deptno, job
SELECT e.deptno, e.job, d.dname, e.avgSal / e.deptAvgSal
FROM
AS e
JOIN Departments AS d USING (deptno)
WHERE d.dname <> ‘MARKETING’
GROUP BY deptno, job
We propose to allow any table and
query to have measure columns.
The model is closed – queries
consume and produce
tables-with-measures.
Tables-with-measures are
semi-opaque – you can’t deduce the
type, structure or private data, but
you can evaluate the measure in any
context that can be expressed as a
predicate on the measure’s
dimensions.
SELECT *,
avgSal AS MEASURE avgSal,
avgSal AT (CLEAR deptno) AS MEASURE deptAvgSal
FROM
Table model with measures
SELECT *,
AVG(sal) AS MEASURE avgSal
FROM Employees
AnalyticEmployees
AnalyticEmployees2
SELECT e.deptno, e.job, d.dname, e.avgSal / e.deptAvgSal
FROM
AS e
JOIN Departments AS d USING (deptno)
WHERE d.dname <> ‘MARKETING’
GROUP BY deptno, job
We propose to allow any table and
query to have measure columns.
The model is closed – queries
consume and produce
tables-with-measures.
Tables-with-measures are
semi-opaque – you can’t deduce the
type, structure or private data, but
you can evaluate the measure in any
context that can be expressed as a
predicate on the measure’s
dimensions.
SELECT *,
avgSal AS MEASURE avgSal,
avgSal AT (ALL deptno) AS MEASURE deptAvgSal
FROM
Table model with measures
SELECT *,
AVG(sal) AS MEASURE avgSal
FROM Employees
Syntax
expression AS MEASURE – defines a measure in the SELECT clause
AGGREGATE(measure) – evaluates a measure in a GROUP BY query
expression AT (contextModifier…) – evaluates expression in a modified context
contextModifier ::=
ALL
| ALL dimension [, dimension…]
| ALL EXCEPT dimension [, dimension…]
| SET dimension = [CURRENT] expression
| VISIBLE
aggFunction(aggFunction(expression) PER dimension) – multi-level aggregation
Plan of attack
1. Add measures to the table model, and allow queries to use them
◆ Measures are defined only via the Table API
2. Define measures using SQL expressions (AS MEASURE)
◆ You can still define them using the Table API
3. Context-sensitive expressions (AT)
Semantics
0. We have a measure M, value type V,
in a table T.
CREATE VIEW AnalyticEmployees AS
SELECT *, AVG(sal) AS MEASURE avgSal
FROM Employees
1. System defines a row type R with the
non-measure columns.
CREATE TYPE R AS
ROW (deptno: INTEGER, job: VARCHAR)
2. System defines an auxiliary function
for M. (Function is typically a scalar
subquery that references the measure’s
underlying table.)
CREATE FUNCTION computeAvgSal(
rowPredicate: FUNCTION<R, BOOLEAN>) =
(SELECT AVG(e.sal)
FROM Employees AS e
WHERE APPLY(rowPredicate, e))
Semantics (continued)
3. We have a query that uses M. SELECT deptno,
avgSal
/ avgSal AT (ALL deptno)
FROM AnalyticEmployees AS e
GROUP BY deptno
4. Substitute measure references with
calls to the auxiliary function with the
appropriate predicate
SELECT deptno,
computeAvgSal(r ->🠚(r.deptno = e.deptno))
/ computeAvgSal(r 🠚 TRUE))
FROM AnalyticEmployees AS e
GROUP BY deptno
5. Planner inlines computeAvgSal and
scalar subqueries
SELECT deptno, AVG(sal) / MIN(avgSal)
FROM (
SELECT deptno, sal,
AVG(sal) OVER () AS avgSal
FROM Employees)
GROUP BY deptno
Calculating at the right grain
Example Formula Grain
Computing the revenue from
units and unit price
units * pricePerUnit AS revenue Row
Sum of revenue (additive) SUM(revenue)
AS MEASURE sumRevenue
Top
Profit margin (non-additive) (SUM(revenue) - SUM(cost))
/ SUM(revenue)
AS MEASURE profitMargin
Top
Inventory (semi-additive) SUM(LAST_VALUE(unitsInStock)
PER inventoryDate)
AS MEASURE sumInventory
Intermediate
Daily average (weighted
average)
AVG(sumRevenue PER orderDate)
AS MEASURE dailyAvgRevenue
Intermediate
Subtotals & visible
SELECT deptno, job,
SUM(sal), sumSal
FROM (
SELECT *,
SUM(sal) AS MEASURE sumSal
FROM Employees)
WHERE job <> ‘ANALYST’
GROUP BY ROLLUP(deptno, job)
ORDER BY 1,2
deptno job SUM(sal) sumSal
10 CLERK 1,300 1,300
10 MANAGER 2,450 2,450
10 PRESIDENT 5,000 5,000
10 8,750 8,750
20 CLERK 1,900 1,900
20 MANAGER 2,975 2,975
20 4,875 10,875
30 CLERK 950 950
30 MANAGER 2,850 2,850
30 SALES 5,600 5,600
30 9,400 9,400
20,750 29,025
Measures by default sum ALL rows;
Aggregate functions sum only VISIBLE rows
Visible
Expression Example Which rows?
Aggregate function SUM(sal) Visible only
Measure sumSal All
AGGREGATE applied to measure AGGREGATE(sumSal) Visible only
Measure with VISIBLE sumSal AT (VISIBLE) Visible only
Measure with ALL sumSal AT (ALL) All
1. Relational model vs
dimensional model
2. Adding measures to SQL
3. Machine-learning patterns
4. Semantic layer
1. Relational model vs
dimensional model
2. Adding measures to SQL
3. Machine-learning patterns
4. Semantic layer
Forecasting
A forecast is simply a measure whose value at
some point in the future is determined, in some
manner, by a calculation on past data.
SELECT year(order_date), product, revenue,
forecast_revenue
FROM Orders
WHERE year(order_date) BETWEEN 2018 AND 2022
GROUP BY 1, 2
Forecasting: implementation
Problems
1. Predictive model under the forecast
(such as ARIMA or linear regression) is
probably too expensive to re-compute for
every query
2. We want to evaluate forecast for regions
for which there is not (yet) any data
Solutions
1. Amortize the cost of running the model
using (some kind of) materialized view
2. Add a SQL EXTEND operation to
implicitly generate data
SELECT year(order_date), product, revenue,
forecast_revenue
FROM Orders EXTEND (order_date)
WHERE year(order_date) BETWEEN 2021 AND 2025
GROUP BY 1, 2
Clustering
A clustering algorithm assigns data points to
regions of N-dimensional space called clusters such
that points that are in the same cluster are, by some
measure, close to each other and distant from points
in other clusters.
SELECT id, firstName, lastName, firstPurchaseDate,
latitude, longitude, revenue, region
FROM Customers;
CREATE VIEW Customers AS
SELECT *,
KMEANS(3, ROW(latitude, longitude)) AS MEASURE
region,
(SELECT SUM(revenue)
FROM Orders AS o
WHERE o.customerId = c.id) AS MEASURE revenue
FROM BaseCustomers AS c;
region is a measure (based on the
centroid of a cluster)
Clustering: fixing the baseline
The measure is a little too dynamic. Fix the baseline, so that cluster centroids don’t
change from one query to the next:
SELECT id, firstName, lastName, firstPurchaseDate,
latitude, longitude,
region AT (ALL
SET YEAR(firstPurchaseDate) = 2020)
FROM Customers;
Clustering: amortizing the cost
To amortize the cost of the algorithm, create a materialized view:
CREATE MATERIALIZED VIEW CustomersMV AS
SELECT *,
region AT (ALL
SET YEAR(firstPurchaseDate) = 2020) AS region2020
FROM Customers;
Classification
Classification predicts the value of a variable given the values of other variables and a
model trained on similar data.
For example, does a particular household own a dog?
Whether they have a dog may depend on household income, education level, location
of the household, purchasing history of the household.
SELECT last_name, zipcode, probability_that_household_has_dog,
expected_dog_count
FROM Customers
WHERE state = ‘AZ’
Classification: training & running
Pseudo-function CLASSIFY:
SELECT last_name, zipcode,
CLASSIFY(firstPurchaseDate = ‘2023-05-01’,
has_dog,
ROW (zipcode, state, income_level, education_level))
AS probability_that_household_has_dog,
expected_dog_count
FROM Customers
GROUP BY state
FUNCTION classify(isTraining, actualValue, features)
We assume that has_dog has
the correct value for customers
who purchased on 2023-05-01
A SQL view can both train the algorithm (given the correct
result) and execute it (generating the result from features):
1. Relational model vs
dimensional model
2. Adding measures to SQL
3. Machine-learning patterns
4. Semantic layer
1. Relational model vs
dimensional model
2. Adding measures to SQL
3. Machine-learning patterns
4. Semantic layer
Database
What products
are doing better
this year?
Semantic
layer
SELECT …
FROM …
GROUP BY …
Natural language query
Example query:
“Show me the top 5 products in each state where revenue declined since last year”
“Revenue” is a measure.
“Declined since last year” asks whether
revenue - revenue AT (SET year = CURRENT year - 1)
is negative.
“Products in each state” establishes the filter context.
Semantic model for natural-language query
Extended semantic model
“Show me regions where customers ordered low-inventory products last year”
Data model is a graph that
connects business views:
● Business views – tables,
possibly based on joins, with
measures, and display hints
● Domains – shared attributes
● Entities – shared dimensions
● Metrics – shared measures
● Ontology/synonyms
Do we need a new query language?
orders
product
customer
warehouse
shipments
inventory
geography
Summary
Measures in SQL allow…
● concise queries without self-joins
● top-down evaluation
● reusable calculations
● natural-language query
…and don’t break SQL
A semantic model is table with measures, accessed via analytic SQL..
A extended semantic model links such tables into a knowledge graph.
Resources
Papers
● “Modeling multidimensional databases”
(Agrawal, Gupta, and Sarawagi, 1997)
● “WinMagic: Subquery Elimination Using
Window Aggregation” (Zuzarte, Pirahash,
Ma, Cheng, Liu, and Wong, 2003)
● “Analyza: Exploring Data with
Conversation” (Dhamdhere, McCurley,
Nahmias, Sundararajan, Yan, 2017)
Issues
● [CALCITE-4488] WITHIN DISTINCT
clause for aggregate functions
(experimental)
● [CALCITE-4496] Measure columns
("SELECT ... AS MEASURE")
● [CALCITE-5105] Add MEASURE type and
AGGREGATE aggregate function
● [CALCITE-5155] Custom time frames
● [CALCITE-xxxx] PER operator
● [CALCITE-5692] Add AT operator, for
context-sensitive expressions
● [CALCITE-5951] PRECEDES function, for
period-to-date calculations
Thank you!
Any questions?
@julianhyde
@ApacheCalcite
https://calcite.apache.org

More Related Content

What's hot

Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkFlink Forward
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits allJulian Hyde
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Julian Hyde
 
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyftmarkgrover
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)DataWorks Summit
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsYaroslav Tkachenko
 
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowAchieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowNeo4j
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingDatabricks
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLScyllaDB
 
Productionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflowProductionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflowDatabricks
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streamsconfluent
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuFlink Forward
 

What's hot (20)

Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits all
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your Analytics
 
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowAchieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model Serving
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 
Rds data lake @ Robinhood
Rds data lake @ Robinhood Rds data lake @ Robinhood
Rds data lake @ Robinhood
 
Productionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflowProductionizing Deep Reinforcement Learning with Spark and MLflow
Productionizing Deep Reinforcement Learning with Spark and MLflow
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streams
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Presto overview
Presto overviewPresto overview
Presto overview
 
Gcp dataflow
Gcp dataflowGcp dataflow
Gcp dataflow
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 

Similar to Building a semantic/metrics layer using Calcite

Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQLJulian Hyde
 
Multidimensional Data Analysis with Ruby (sample)
Multidimensional Data Analysis with Ruby (sample)Multidimensional Data Analysis with Ruby (sample)
Multidimensional Data Analysis with Ruby (sample)Raimonds Simanovskis
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Julian Hyde
 
Project report aditi paul1
Project report aditi paul1Project report aditi paul1
Project report aditi paul1guest9529cb
 
Company segmentation - an approach with R
Company segmentation - an approach with RCompany segmentation - an approach with R
Company segmentation - an approach with RCasper Crause
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Spark Summit
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPDhiren Gala
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolioeileensauer
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolioeileensauer
 
Lec -1 & 2MarketingAnalytics_Ch1_Introduction.ppt
Lec -1 & 2MarketingAnalytics_Ch1_Introduction.pptLec -1 & 2MarketingAnalytics_Ch1_Introduction.ppt
Lec -1 & 2MarketingAnalytics_Ch1_Introduction.pptSomitAwasthi
 
IT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptxIT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptxReneeClintGortifacio
 
Rick Watkins Power Point presentation
Rick Watkins Power Point presentationRick Watkins Power Point presentation
Rick Watkins Power Point presentationrickwatkins
 
DMA Analytic Challenge 2015 final
DMA Analytic Challenge 2015 finalDMA Analytic Challenge 2015 final
DMA Analytic Challenge 2015 finalKaran Sarao
 

Similar to Building a semantic/metrics layer using Calcite (20)

Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Multidimensional Data Analysis with Ruby (sample)
Multidimensional Data Analysis with Ruby (sample)Multidimensional Data Analysis with Ruby (sample)
Multidimensional Data Analysis with Ruby (sample)
 
02 Essbase
02 Essbase02 Essbase
02 Essbase
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!
 
Project report aditi paul1
Project report aditi paul1Project report aditi paul1
Project report aditi paul1
 
Essbase intro
Essbase introEssbase intro
Essbase intro
 
DWO -Pertemuan 1
DWO -Pertemuan 1DWO -Pertemuan 1
DWO -Pertemuan 1
 
SAP Flexible Planning
SAP Flexible PlanningSAP Flexible Planning
SAP Flexible Planning
 
Company segmentation - an approach with R
Company segmentation - an approach with RCompany segmentation - an approach with R
Company segmentation - an approach with R
 
Set Analyse OK.pdf
Set Analyse OK.pdfSet Analyse OK.pdf
Set Analyse OK.pdf
 
Getting power bi
Getting power biGetting power bi
Getting power bi
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAP
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolio
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolio
 
Lec -1 & 2MarketingAnalytics_Ch1_Introduction.ppt
Lec -1 & 2MarketingAnalytics_Ch1_Introduction.pptLec -1 & 2MarketingAnalytics_Ch1_Introduction.ppt
Lec -1 & 2MarketingAnalytics_Ch1_Introduction.ppt
 
IT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptxIT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptx
 
Rick Watkins Power Point presentation
Rick Watkins Power Point presentationRick Watkins Power Point presentation
Rick Watkins Power Point presentation
 
Data ware housing- Introduction to olap .
Data ware housing- Introduction to  olap .Data ware housing- Introduction to  olap .
Data ware housing- Introduction to olap .
 
DMA Analytic Challenge 2015 final
DMA Analytic Challenge 2015 finalDMA Analytic Challenge 2015 final
DMA Analytic Challenge 2015 final
 

More from Julian Hyde

Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming languageJulian Hyde
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Julian Hyde
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityJulian Hyde
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're IncubatingJulian Hyde
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteJulian Hyde
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesJulian Hyde
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databasesJulian Hyde
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and FastJulian Hyde
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache CalciteJulian Hyde
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteJulian Hyde
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache CalciteJulian Hyde
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Julian Hyde
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteJulian Hyde
 

More from Julian Hyde (20)

Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming language
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its Community
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're Incubating
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 

Recently uploaded

Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 

Recently uploaded (20)

Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 

Building a semantic/metrics layer using Calcite

  • 1. Building a semantic/metrics layer using Apache Calcite Julian Hyde (Calcite, Google) Community over Code • Halifax, Nova Scotia • 2023-10-09
  • 2. Abstract A semantic layer, also known as a metrics layer, lies between business users and the database, and lets those users compose queries in the concepts that they understand. It also governs access to the data, manages data transformations, and can tune the database by defining materializations. Like many new ideas, the semantic layer is a distillation and evolution of many old ideas, such as query languages, multidimensional OLAP, and query federation. In this talk, we describe the features we are adding to Calcite to define business views, query measures, and optimize performance. Julian Hyde is the original developer of Apache Calcite, an open source framework for building data management systems, and Morel, a functional query language. Previously he created Mondrian, an analytics engine, and SQLstream, an engine for continuous queries. He is a staff engineer at Google, where he works on Looker and BigQuery.
  • 3.
  • 4. Database What products are doing better this year? Semantic layer SELECT … FROM … GROUP BY …
  • 5. Data system = Model + Query + Engine Query language Data model Engine
  • 6. Agenda 1. Relational model vs dimensional model 2. Adding measures to SQL 3. Machine-learning patterns 4. Semantic layer
  • 7. 1. Relational model vs dimensional model 2. Adding measures to SQL 3. Machine-learning patterns 4. Semantic layer
  • 8. 1. Relational model vs dimensional model 2. Adding measures to SQL 3. Machine-learning patterns 4. Semantic layer
  • 9. SQL vs BI BI tools implement their own languages on top of SQL. Why not SQL? Possible reasons: ● Semantic Model ● Control presentation / visualization ● Governance ● Pre-join tables ● Define reusable calculations ● Ask complex questions in a concise way
  • 10. Processing BI in SQL Why we should do it ● Move processing, not data ● Cloud SQL scale ● Remove data lag ● SQL is open Why it’s hard ● Different paradigm ● More complex data model ● Can’t break SQL
  • 11. Apache Calcite Apache Calcite Avatica JDBC server JDBC client Pluggable rewrite rules Pluggable stats / cost Pluggable catalog ODBC client Adapter Physical operators Storage SQL parser & validator Query planner Relational algebra Core – Operator expressions (relational algebra) and planner (based on Cascades) External – Data storage, algorithms and catalog Optional – SQL parser, JDBC & ODBC drivers Extensible – Planner rewrite rules, statistics, cost model, algebra, UDFs RelBuilder
  • 12.
  • 13. Pasta machine vs Pizza delivery
  • 14. Relational algebra (bottom-up) Multidimensional (top-down) Products Suppliers ⨝ ⨝ Σ ⨝ σ Sales Products Suppliers ⨝ ⨝ Σ σ Sales π (Supplier: ‘ACE’, Date: ‘1994-01’, Product: all) (Supplier: ‘ACE’, Date: ‘1995-01’, Product: all) Supplier Product Date Bottom-up vs Top-down query
  • 15. 1. Relational model vs dimensional model 2. Adding measures to SQL 3. Machine-learning patterns 4. Semantic layer
  • 16. 1. Relational model vs dimensional model 2. Adding measures to SQL 3. Machine-learning patterns 4. Semantic layer
  • 17. Some multidimensional queries ● Give the total sales for each product in each quarter of 1995. (Note that quarter is a function of date). ● For supplier “Ace” and for each product, give the fractional increase in the sales in January 1995 relative to the sales in January 1994. ● For each product give its market share in its category today minus its market share in its category in October 1994. ● Select top 5 suppliers for each product category for last year, based on total sales. ● For each product category, select total sales this month of the product that had highest sales in that category last month. ● Select suppliers that currently sell the highest selling product of last month. ● Select suppliers for which the total sale of every product increased in each of last 5 years. ● Select suppliers for which the total sale of every product category increased in each of last 5 years. From [Agrawal1997]. Assumes a database with dimensions {supplier, date, product} and measure {sales}.)
  • 18. Some multidimensional queries ● Give the total sales for each product in each quarter of 1995. (Note that quarter is a function of date). ● For supplier “Ace” and for each product, give the fractional increase in the sales in January 1995 relative to the sales in January 1994. ● For each product give its market share in its category today minus its market share in its category in October 1994. ● Select top 5 suppliers for each product category for last year, based on total sales. ● For each product category, select total sales this month of the product that had highest sales in that category last month. ● Select suppliers that currently sell the highest selling product of last month. ● Select suppliers for which the total sale of every product increased in each of last 5 years ● Select suppliers for which the total sale of every product category increased in each of last 5 years. From [Agrawal1997]. Assumes a database with dimensions {supplier, date, product} and measure {sales}.)
  • 19. Query: ● For supplier “Ace” and for each product, give the fractional increase in the sales in January 1995 relative to the sales in January 1994. SQL MDX SELECT p.prodId, s95.sales, (s95.sales - s94.sales) / s95.sales FROM ( SELECT p.prodId, SUM(s.sales) AS sales FROM Sales AS s JOIN Suppliers AS u USING (suppId) JOIN Products AS p USING (prodId) WHERE u.name = ‘ACE’ AND FLOOR(s.date TO MONTH) = ‘1995-01-01’ GROUP BY p.prodId) AS s95 LEFT JOIN ( SELECT p.prodId, SUM(s.sales) AS sales FROM Sales AS s JOIN Suppliers AS u USING (suppId) JOIN Products AS p USING (prodId) WHERE u.name = ‘ACE’ AND FLOOR(s.date TO MONTH) = ‘1994-01-01’ GROUP BY p.prodId) AS s94 USING (prodId) WITH MEMBER [Measures].[Sales Last Year] = ([Measures].[Sales], ParallelPeriod([Date], 1, [Date].[Year])) MEMBER [Measures].[Sales Growth] = ([Measures].[Sales] - [Measures].[Sales Last Year]) / [Measures].[Sales Last Year] SELECT [Measures].[Sales Growth] ON COLUMNS, [Product].Members ON ROWS FROM [Sales] WHERE [Supplier].[ACE]
  • 20. Query: ● For supplier “Ace” and for each product, give the fractional increase in the sales in January 1995 relative to the sales in January 1994. SQL SQL with measures SELECT p.prodId, s95.sales, (s95.sales - s94.sales) / s95.sales FROM ( SELECT p.prodId, SUM(s.sales) AS sales FROM Sales AS s JOIN Suppliers AS u USING (suppId) JOIN Products AS p USING (prodId) WHERE u.name = ‘ACE’ AND FLOOR(s.date TO MONTH) = ‘1995-01-01’ GROUP BY p.prodId) AS s95 LEFT JOIN ( SELECT p.prodId, SUM(s.sales) AS sales FROM Sales AS s JOIN Suppliers AS u USING (suppId) JOIN Products AS p USING (prodId) WHERE u.name = ‘ACE’ AND FLOOR(s.date TO MONTH) = ‘1994-01-01’ GROUP BY p.prodId) AS s94 USING (prodId) SELECT p.prodId, SUM(s.sales) AS MEASURE sumSales, sumSales AT (SET FLOOR(s.date TO MONTH) = ‘1994-01-01’) AS MEASURE sumSalesLastYear FROM Sales AS s JOIN Suppliers AS u USING (suppId) JOIN Products AS p USING (prodId)) WHERE u.name = ‘ACE’ AND FLOOR(s.date TO MONTH) = ‘1995-01-01’ GROUP BY p.prodId
  • 21. Self-joins, correlated subqueries, window aggregates, measures Window aggregate functions were introduced to save on self-joins. Some DBs rewrite scalar subqueries and self-joins to window aggregates [Zuzarte2003]. Window aggregates are more concise, easier to optimize, and often more efficient. However, window aggregates can only see data that is from the same table, and is allowed by the WHERE clause. Measures overcome that limitation. SELECT * FROM Employees AS e WHERE sal > ( SELECT AVG(sal) FROM Employees WHERE deptno = e.deptno) SELECT * FROM Employees AS e WHERE sal > AVG(sal) OVER (PARTITION BY deptno)
  • 22. A measure is… ? … a column with an aggregate function. SUM(sales)
  • 23. A measure is… ? … a column with an aggregate function. SUM(sales) … a column that, when used as an expression, knows how to aggregate itself. (SUM(sales) - SUM(cost)) / SUM(sales)
  • 24. A measure is… ? … a column with an aggregate function. SUM(sales) … a column that, when used as an expression, knows how to aggregate itself. (SUM(sales) - SUM(cost)) / SUM(sales) … a column that, when used as expression, can evaluate itself in any context. (SELECT SUM(forecastSales) FROM SalesForecast AS s WHERE predicate(s)) ExchService$ClosingRate( ‘USD’, ‘EUR’, sales.date)
  • 25. A measure is… … a column with an aggregate function. SUM(sales) … a column that, when used as an expression, knows how to aggregate itself. (SUM(sales) - SUM(cost)) / SUM(sales) … a column that, when used as expression, can evaluate itself in any context. Its value depends on, and only on, the predicate placed on its dimensions. (SELECT SUM(forecastSales) FROM SalesForecast AS s WHERE predicate(s)) ExchService$ClosingRate( ‘USD’, ‘EUR’, sales.date)
  • 26. SELECT MOD(deptno, 2) = 0 AS evenDeptno, avgSal2 FROM WHERE deptno < 30 SELECT deptno, AVG(avgSal) AS avgSal2 FROM GROUP BY deptno Table model Tables are SQL’s fundamental model. The model is closed – queries consume and produce tables. Tables are opaque – you can’t deduce the type, structure or private data of a table. SELECT deptno, job, AVG(sal) AS avgSal FROM Employees GROUP BY deptno, job Employees2 Employees3
  • 27. SELECT MOD(deptno, 2) = 0 AS evenDeptno, avgSal2 FROM WHERE deptno < 30 SELECT deptno, AVG(avgSal) AS avgSal2 FROM GROUP BY deptno Table model Tables are SQL’s fundamental model. The model is closed – queries consume and produce tables. Tables are opaque – you can’t deduce the type, structure or private data of a table. SELECT deptno, job, AVG(sal) AS avgSal FROM Employees GROUP BY deptno, job
  • 28. SELECT e.deptno, e.job, d.dname, e.avgSal / e.deptAvgSal FROM AS e JOIN Departments AS d USING (deptno) WHERE d.dname <> ‘MARKETING’ GROUP BY deptno, job We propose to allow any table and query to have measure columns. The model is closed – queries consume and produce tables-with-measures. Tables-with-measures are semi-opaque – you can’t deduce the type, structure or private data, but you can evaluate the measure in any context that can be expressed as a predicate on the measure’s dimensions. SELECT *, avgSal AS MEASURE avgSal, avgSal AT (CLEAR deptno) AS MEASURE deptAvgSal FROM Table model with measures SELECT *, AVG(sal) AS MEASURE avgSal FROM Employees AnalyticEmployees AnalyticEmployees2
  • 29. SELECT e.deptno, e.job, d.dname, e.avgSal / e.deptAvgSal FROM AS e JOIN Departments AS d USING (deptno) WHERE d.dname <> ‘MARKETING’ GROUP BY deptno, job We propose to allow any table and query to have measure columns. The model is closed – queries consume and produce tables-with-measures. Tables-with-measures are semi-opaque – you can’t deduce the type, structure or private data, but you can evaluate the measure in any context that can be expressed as a predicate on the measure’s dimensions. SELECT *, avgSal AS MEASURE avgSal, avgSal AT (ALL deptno) AS MEASURE deptAvgSal FROM Table model with measures SELECT *, AVG(sal) AS MEASURE avgSal FROM Employees
  • 30. Syntax expression AS MEASURE – defines a measure in the SELECT clause AGGREGATE(measure) – evaluates a measure in a GROUP BY query expression AT (contextModifier…) – evaluates expression in a modified context contextModifier ::= ALL | ALL dimension [, dimension…] | ALL EXCEPT dimension [, dimension…] | SET dimension = [CURRENT] expression | VISIBLE aggFunction(aggFunction(expression) PER dimension) – multi-level aggregation
  • 31. Plan of attack 1. Add measures to the table model, and allow queries to use them ◆ Measures are defined only via the Table API 2. Define measures using SQL expressions (AS MEASURE) ◆ You can still define them using the Table API 3. Context-sensitive expressions (AT)
  • 32. Semantics 0. We have a measure M, value type V, in a table T. CREATE VIEW AnalyticEmployees AS SELECT *, AVG(sal) AS MEASURE avgSal FROM Employees 1. System defines a row type R with the non-measure columns. CREATE TYPE R AS ROW (deptno: INTEGER, job: VARCHAR) 2. System defines an auxiliary function for M. (Function is typically a scalar subquery that references the measure’s underlying table.) CREATE FUNCTION computeAvgSal( rowPredicate: FUNCTION<R, BOOLEAN>) = (SELECT AVG(e.sal) FROM Employees AS e WHERE APPLY(rowPredicate, e))
  • 33. Semantics (continued) 3. We have a query that uses M. SELECT deptno, avgSal / avgSal AT (ALL deptno) FROM AnalyticEmployees AS e GROUP BY deptno 4. Substitute measure references with calls to the auxiliary function with the appropriate predicate SELECT deptno, computeAvgSal(r ->🠚(r.deptno = e.deptno)) / computeAvgSal(r 🠚 TRUE)) FROM AnalyticEmployees AS e GROUP BY deptno 5. Planner inlines computeAvgSal and scalar subqueries SELECT deptno, AVG(sal) / MIN(avgSal) FROM ( SELECT deptno, sal, AVG(sal) OVER () AS avgSal FROM Employees) GROUP BY deptno
  • 34. Calculating at the right grain Example Formula Grain Computing the revenue from units and unit price units * pricePerUnit AS revenue Row Sum of revenue (additive) SUM(revenue) AS MEASURE sumRevenue Top Profit margin (non-additive) (SUM(revenue) - SUM(cost)) / SUM(revenue) AS MEASURE profitMargin Top Inventory (semi-additive) SUM(LAST_VALUE(unitsInStock) PER inventoryDate) AS MEASURE sumInventory Intermediate Daily average (weighted average) AVG(sumRevenue PER orderDate) AS MEASURE dailyAvgRevenue Intermediate
  • 35. Subtotals & visible SELECT deptno, job, SUM(sal), sumSal FROM ( SELECT *, SUM(sal) AS MEASURE sumSal FROM Employees) WHERE job <> ‘ANALYST’ GROUP BY ROLLUP(deptno, job) ORDER BY 1,2 deptno job SUM(sal) sumSal 10 CLERK 1,300 1,300 10 MANAGER 2,450 2,450 10 PRESIDENT 5,000 5,000 10 8,750 8,750 20 CLERK 1,900 1,900 20 MANAGER 2,975 2,975 20 4,875 10,875 30 CLERK 950 950 30 MANAGER 2,850 2,850 30 SALES 5,600 5,600 30 9,400 9,400 20,750 29,025 Measures by default sum ALL rows; Aggregate functions sum only VISIBLE rows
  • 36. Visible Expression Example Which rows? Aggregate function SUM(sal) Visible only Measure sumSal All AGGREGATE applied to measure AGGREGATE(sumSal) Visible only Measure with VISIBLE sumSal AT (VISIBLE) Visible only Measure with ALL sumSal AT (ALL) All
  • 37. 1. Relational model vs dimensional model 2. Adding measures to SQL 3. Machine-learning patterns 4. Semantic layer
  • 38. 1. Relational model vs dimensional model 2. Adding measures to SQL 3. Machine-learning patterns 4. Semantic layer
  • 39. Forecasting A forecast is simply a measure whose value at some point in the future is determined, in some manner, by a calculation on past data. SELECT year(order_date), product, revenue, forecast_revenue FROM Orders WHERE year(order_date) BETWEEN 2018 AND 2022 GROUP BY 1, 2
  • 40. Forecasting: implementation Problems 1. Predictive model under the forecast (such as ARIMA or linear regression) is probably too expensive to re-compute for every query 2. We want to evaluate forecast for regions for which there is not (yet) any data Solutions 1. Amortize the cost of running the model using (some kind of) materialized view 2. Add a SQL EXTEND operation to implicitly generate data SELECT year(order_date), product, revenue, forecast_revenue FROM Orders EXTEND (order_date) WHERE year(order_date) BETWEEN 2021 AND 2025 GROUP BY 1, 2
  • 41. Clustering A clustering algorithm assigns data points to regions of N-dimensional space called clusters such that points that are in the same cluster are, by some measure, close to each other and distant from points in other clusters. SELECT id, firstName, lastName, firstPurchaseDate, latitude, longitude, revenue, region FROM Customers; CREATE VIEW Customers AS SELECT *, KMEANS(3, ROW(latitude, longitude)) AS MEASURE region, (SELECT SUM(revenue) FROM Orders AS o WHERE o.customerId = c.id) AS MEASURE revenue FROM BaseCustomers AS c; region is a measure (based on the centroid of a cluster)
  • 42. Clustering: fixing the baseline The measure is a little too dynamic. Fix the baseline, so that cluster centroids don’t change from one query to the next: SELECT id, firstName, lastName, firstPurchaseDate, latitude, longitude, region AT (ALL SET YEAR(firstPurchaseDate) = 2020) FROM Customers;
  • 43. Clustering: amortizing the cost To amortize the cost of the algorithm, create a materialized view: CREATE MATERIALIZED VIEW CustomersMV AS SELECT *, region AT (ALL SET YEAR(firstPurchaseDate) = 2020) AS region2020 FROM Customers;
  • 44. Classification Classification predicts the value of a variable given the values of other variables and a model trained on similar data. For example, does a particular household own a dog? Whether they have a dog may depend on household income, education level, location of the household, purchasing history of the household. SELECT last_name, zipcode, probability_that_household_has_dog, expected_dog_count FROM Customers WHERE state = ‘AZ’
  • 45. Classification: training & running Pseudo-function CLASSIFY: SELECT last_name, zipcode, CLASSIFY(firstPurchaseDate = ‘2023-05-01’, has_dog, ROW (zipcode, state, income_level, education_level)) AS probability_that_household_has_dog, expected_dog_count FROM Customers GROUP BY state FUNCTION classify(isTraining, actualValue, features) We assume that has_dog has the correct value for customers who purchased on 2023-05-01 A SQL view can both train the algorithm (given the correct result) and execute it (generating the result from features):
  • 46. 1. Relational model vs dimensional model 2. Adding measures to SQL 3. Machine-learning patterns 4. Semantic layer
  • 47. 1. Relational model vs dimensional model 2. Adding measures to SQL 3. Machine-learning patterns 4. Semantic layer
  • 48. Database What products are doing better this year? Semantic layer SELECT … FROM … GROUP BY …
  • 49. Natural language query Example query: “Show me the top 5 products in each state where revenue declined since last year” “Revenue” is a measure. “Declined since last year” asks whether revenue - revenue AT (SET year = CURRENT year - 1) is negative. “Products in each state” establishes the filter context.
  • 50. Semantic model for natural-language query
  • 51. Extended semantic model “Show me regions where customers ordered low-inventory products last year” Data model is a graph that connects business views: ● Business views – tables, possibly based on joins, with measures, and display hints ● Domains – shared attributes ● Entities – shared dimensions ● Metrics – shared measures ● Ontology/synonyms Do we need a new query language? orders product customer warehouse shipments inventory geography
  • 52.
  • 53. Summary Measures in SQL allow… ● concise queries without self-joins ● top-down evaluation ● reusable calculations ● natural-language query …and don’t break SQL A semantic model is table with measures, accessed via analytic SQL.. A extended semantic model links such tables into a knowledge graph.
  • 54. Resources Papers ● “Modeling multidimensional databases” (Agrawal, Gupta, and Sarawagi, 1997) ● “WinMagic: Subquery Elimination Using Window Aggregation” (Zuzarte, Pirahash, Ma, Cheng, Liu, and Wong, 2003) ● “Analyza: Exploring Data with Conversation” (Dhamdhere, McCurley, Nahmias, Sundararajan, Yan, 2017) Issues ● [CALCITE-4488] WITHIN DISTINCT clause for aggregate functions (experimental) ● [CALCITE-4496] Measure columns ("SELECT ... AS MEASURE") ● [CALCITE-5105] Add MEASURE type and AGGREGATE aggregate function ● [CALCITE-5155] Custom time frames ● [CALCITE-xxxx] PER operator ● [CALCITE-5692] Add AT operator, for context-sensitive expressions ● [CALCITE-5951] PRECEDES function, for period-to-date calculations