Measures in SQL (SIGMOD 2024, Santiago, Chile) - Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
This document discusses processing business intelligence (BI) queries in SQL rather than BI-specific languages. It proposes extensions to Apache Calcite's SQL dialect to support measures, context-sensitive expressions, and analytic views that contain metrics and calculations. The talk will describe the SQL syntax for measures, how to define and use them for cross-dimensional calculations, and approaches for optimizing such queries. It provides examples of multidimensional queries that can be expressed in both MDX and the proposed SQL extensions.
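As a rough illustration of the in-place expansion the abstract describes, here is a sketch with hypothetical table and column names. The measure syntax shown in the comment is paraphrased from the proposal and is not standard SQL, so only the expanded form is executed (SQLite via Python):

```python
import sqlite3

# Hypothetical measure definition (sketched from the abstract, not executable here):
#   CREATE VIEW orders_v AS
#     SELECT *, AVG(price) AS MEASURE avg_price FROM orders;
# A query such as
#   SELECT state, avg_price FROM orders_v GROUP BY state;
# would expand in place to the plain SQL below.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (state TEXT, price REAL);
    INSERT INTO orders VALUES ('CA', 10.0), ('CA', 20.0), ('NY', 30.0);
""")
rows = conn.execute(
    "SELECT state, AVG(price) AS avg_price FROM orders GROUP BY state ORDER BY state"
).fetchall()
print(rows)  # [('CA', 15.0), ('NY', 30.0)]
```

The point of the proposal is that the measure carries its calculation with the table, so each query's grouping supplies the evaluation context for the expansion.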
This presentation covers the fundamentals of SQL tuning, such as SQL processing, the optimizer and execution plans, accessing tables, and performance-improvement considerations including partitioning techniques. Presented by Alphalogic Inc: https://www.alphalogicinc.com/
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in... - Julian Hyde
What if Looker saw the queries you just executed and could predict your next query? Could it make those queries faster, by smarter caching, or aggregate navigation? Could it read your past SQL queries and help you write your LookML model? Those are some of the reasons to add relational algebra into Looker’s query engine, and why Looker hired Julian Hyde, author of Apache Calcite, to lead the effort. In this talk about the internals of Looker’s query engine, Julian Hyde will describe how the engine works, how Looker queries are described in Calcite’s relational algebra, and some features that it makes possible.
A talk by Julian Hyde at JOIN 2019 in San Francisco.
This document contains examples from a portfolio of business intelligence projects including data modeling, SQL programming, SSIS, SSAS, SSRS, PPS, Excel Services, and SharePoint. It includes examples of relational and dimensional data models, SQL queries, SSIS packages for data integration and processing, an SSAS cube with calculations, KPIs and reports, Excel dashboards published to SharePoint using Excel Services, and reports and dashboards deployed to SharePoint.
Ground Breakers Romania: Explain the explain_plan - Maria Colgan
This session was delivered as part of the EMEA Ground Breakers tour in Romania, Oct. 2019. The execution plan for a SQL statement can often seem complicated and hard to understand. Determining whether the execution plan you are looking at is the best plan you could get, or attempting to improve a poorly performing execution plan, can be a daunting task even for the most experienced DBA or developer. This session examines the different aspects of an execution plan, from selectivity to parallel execution, and explains what information you should be gleaning from the plan and how it affects the execution. It offers insight into what caused the Optimizer to make the decisions it did, as well as a set of corrective measures that can be used to improve each aspect of the plan.
A comprehensive and detailed approach using machine learning: an international e-commerce company wants to use some of the most advanced machine-learning techniques to analyse its customers with respect to its services and some important customer-success metrics. The company also has future expansion plans in India.
The document discusses various aggregate functions used in SQL such as SUM, COUNT, AVG, MIN, and MAX. It provides examples of how to use these functions to retrieve aggregated data from tables, including the use of GROUP BY and HAVING clauses. Aggregate functions summarize data over multiple rows, returning a single value. COUNT(*) counts all rows while other functions like COUNT ignore NULL values.
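The NULL-handling difference between COUNT(*) and COUNT(column) is easy to see in a few lines. A minimal sketch with a hypothetical emp table, run through SQLite from Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, bonus REAL);
    INSERT INTO emp VALUES ('Ann', 100.0), ('Bob', NULL), ('Cal', 50.0);
""")
# COUNT(*) counts every row; COUNT(bonus) and SUM(bonus) skip the NULL.
count_all, count_bonus, total = conn.execute(
    "SELECT COUNT(*), COUNT(bonus), SUM(bonus) FROM emp"
).fetchone()
print(count_all, count_bonus, total)  # 3 2 150.0
```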
This document discusses summary queries in SQL. It explains that summary queries are used to retrieve aggregate or summary information rather than details of individual records. It describes SQL column functions such as SUM, AVG, MIN, MAX, COUNT that can be used to summarize data. It also discusses GROUP BY and HAVING clauses that allow grouping and filtering of aggregated data. Subqueries and the CASE statement for conditional logic in SQL queries are also briefly covered.
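A minimal sketch of a summary query, with hypothetical sales data, showing GROUP BY producing one row per group and HAVING filtering on the aggregate (SQLite via Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', 100), ('East', 200), ('West', 50), ('West', 40), ('North', 500);
""")
# GROUP BY summarizes per region; HAVING filters the aggregated groups
# (here, West's total of 90 is dropped).
rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 100
    ORDER BY region
""").fetchall()
print(rows)  # [('East', 300.0), ('North', 500.0)]
```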
How to Realize an Additional 270% ROI on Snowflake - AtScale
Companies of all sizes have embraced the power, scale and ease of use of Snowflake’s cloud data platform, along with the promise of cost-savings. But if you aren’t careful, cloud compute usage can sneak up on you and leave you with runaway costs no matter what BI tool you are using.
The presentation from experts from Rakuten Rewards and AtScale shows practical techniques on how you can reduce unnecessary compute and boost BI performance to realize an additional 270% ROI on Snowflake. For the on-demand webinar, go to: https://www.atscale.com/resource/wbr-cloud-compute-cost-snowflake-tableau/
This document summarizes various SQL concepts:
1) It discusses non-correlated subqueries, UNION queries to combine result sets, and table-valued functions with CROSS APPLY to invoke the function for each row.
2) It demonstrates EXCEPT and INTERSECT to compare results between two tables, CUBE and ROLLUP to add hierarchical data summaries when grouping, and hints to override the query optimizer.
3) JOIN hints are discussed to force a specific join type, and table hints like NOLOCK are presented to modify query behavior.
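CROSS APPLY, CUBE/ROLLUP, and table hints are SQL Server specifics, but the set operators travel well. A small sketch of EXCEPT and INTERSECT with hypothetical tables, using SQLite from Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")
# EXCEPT keeps rows of the first result set that are absent from the second;
# INTERSECT keeps rows present in both.
only_in_a = conn.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x").fetchall()
in_both = conn.execute(
    "SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x").fetchall()
print(only_in_a)  # [(1,)]
print(in_both)    # [(2,), (3,)]
```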
James Colby Maddox Business Intelligence and Computer Science Portfolio - colbydaman
This portfolio covers the business intelligence course work I have completed at Set Focus, and some of the course work I have completed at Kennesaw State University
Part 3 of the SQL Tuning workshop examines the different aspects of an execution plan, from cardinality estimates to parallel execution, and explains what information you should be gleaning from the plan and how it affects the execution. It offers insight into what caused the Optimizer to make the decisions it did, as well as a set of corrective measures that can be used to improve each aspect of the plan.
If SQL is the universal language of data, why do we author our most important data applications (metrics, analytics, business intelligence) in languages other than SQL? Multidimensional databases and languages such as MDX, DAX and Tableau LOD solve these problems but introduce others: they require specialized knowledge, complicate the data pipeline and don’t integrate well. Is it possible to define and query business intelligence models in SQL?
Apache Calcite has extended SQL to support measures, evaluation context, and multidimensional expressions. With these concepts you can define data models that contain measures, use them in queries, and define new measures in queries.
A talk given by Julian Hyde to Apache Calcite's virtual meetup, 2023-03-15.
This document provides an overview of online analytical processing (OLAP). It defines OLAP as a process for analyzing multidimensional data to help decision makers. OLAP uses data warehouses to store historical data in a structured format. It allows for analytical queries and operations like aggregation, roll-up, drill-down and slicing and dicing of data. SQL extensions and OLAP functions further aid analysis. OLAP systems can be MOLAP, ROLAP or HOLAP based on their architecture and data storage methods. Commercial OLAP systems include IBM, Oracle and Microsoft products.
Many developers/DBAs are still applying the same tuning techniques in versions 10 and 11 of the database as they did in version 8i, when they first adopted the Cost Based Optimiser. This presentation is designed to bring your SQL tuning skills up to date. It will discuss differences in the way the optimiser behaves in version 11g, cover new hints, and look at new tuning techniques.
This document discusses functions and the GROUP BY clause in Oracle. It covers single row functions like character, number, date functions and aggregate functions that operate on multiple rows. Examples are provided to demonstrate character functions like INITCAP, UPPER, SUBSTR and number functions like ROUND, TRUNC, MOD. Date functions like MONTH, YEAR and date arithmetic are also covered. Conversion, general functions like NVL, NVL2, NULLIF are explained. The syntax for aggregate functions like COUNT, MIN, MAX, AVG, SUM with the GROUP BY clause is demonstrated to group and filter results.
This document discusses functions and the GROUP BY clause in Oracle. It covers single row functions like character, number, date functions and aggregate functions that operate on multiple rows. Examples are provided to demonstrate character functions like INITCAP, UPPER, LOWER, number functions like ROUND, TRUNC, MOD, date functions like SYSDATE, and aggregate functions like COUNT, MIN, MAX, AVG, SUM. The GROUP BY clause is used to divide the table into groups based on a column and requires including that column in the SELECT statement along with any aggregate functions.
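The single-row versus multiple-row distinction can be sketched briefly. SQLite (used here from Python) lacks Oracle's INITCAP and NVL, so this sketch sticks to the common subset: UPPER, SUBSTR, LENGTH, and aggregates with GROUP BY. Table and data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ename TEXT, deptno INTEGER, sal REAL);
    INSERT INTO emp VALUES
        ('smith', 10, 800.0), ('jones', 20, 2975.0), ('ford', 20, 3000.0);
""")
# Single-row functions produce one result per input row.
row = conn.execute(
    "SELECT UPPER(ename), SUBSTR(ename, 1, 3), LENGTH(ename) "
    "FROM emp WHERE ename = 'smith'"
).fetchone()
print(row)  # ('SMITH', 'smi', 5)
# Aggregate functions with GROUP BY produce one result per group of rows.
groups = conn.execute(
    "SELECT deptno, COUNT(*), AVG(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()
print(groups)  # [(10, 1, 800.0), (20, 2, 2987.5)]
```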
The document provides examples of the author's business intelligence skills, including data modeling, SQL programming, SQL Server Integration Services, SQL Server Analysis Services, MDX programming, SQL Server Reporting Services, Excel Power Pivot, Performance Point Services, and SharePoint Services. Examples include data warehouse modeling, stored procedures, SSIS packages, SSAS cubes and MDX queries, SSRS reports, Power Pivot reports, PerformancePoint dashboards, and SharePoint dashboards.
SQL provides powerful but reasonably simple tools for data analysis and handling. Mike McClellan, the Senior Product Manager for Paddle8, took beginners through the basics of SQL. He talked about the SQL queries needed to collect data from a database, even if it lives in different places and analyze it to find the answers you’re looking for.
He taught the understanding of essential SQL skills that allow developers to write queries against single and multiple tables, manipulate data in tables, and create database objects.
This portfolio contains examples of Carmen Faber's Microsoft Business Intelligence work using SSAS (SQL Server Analysis Services) and MDX (Multi-Dimensional Expressions). It includes cube structure, dimensions, hierarchies, calculations, KPIs (key performance indicators), and sample MDX queries analyzing data by measures, dimensions, and time periods.
Quick iteration and reusability of metric calculations for powerful data exploration.
At Looker, we want to make it easier for data analysts to service the needs of the data-hungry users in their organizations. We believe too much of their time is spent responding to ad hoc data requests and not enough time is spent building, experimenting, and embellishing a robust model of the business. Worse yet, business users are starving for data, but are forced to make important decisions without access to data that could guide them in the right direction. Looker addresses both of these problems with a YAML-based modeling language called LookML.
This paper walks through a number of data modeling examples, demonstrating how to use LookML to generate, alter, and update reports—without the need to rewrite any SQL. With LookML, you build your business logic, defining your important metrics once and then reusing them throughout a model—allowing quick, rapid iteration of data exploration, while also ensuring the accuracy of the SQL that’s generated. Small updates are quick and can be made immediately available to business users to manipulate, iterate, and transform in any way they see fit.
The document provides an overview of U-SQL, highlighting some differences from traditional SQL like C# keywords overlapping with SQL keywords, the ability to write C# expressions for data transformations, and supporting windowing functions, joins, and analytics capabilities. It also briefly covers topics like sorting, constant rowsets, inserts, and additional resources for learning more about U-SQL.
Nena Marín presents solutions for analyzing large datasets from internet advertising. She discusses building a recommender system using co-clustering that was trained on over 100 million ratings in under 17 minutes. For attribution reporting, pre-aggregated metrics are deployed to a GUI within 20 minutes for weekly reports. Lessons learned include addressing data quality, performance baselines, schema flexibility, and integration challenges.
Simplifying SQL with CTEs and windowing functions - Clayton Groom
Too busy to learn the new capabilities of SQL Server? This session will cover several of the new features of the T-SQL language, specifically Common Table Expressions (CTEs) and windowing functions. This will be a code-heavy session with examples that you can readily leverage in your solutions.
The focus will be on techniques to shape and manipulate your data for easier consumption by your application, and to leverage your SQL Server to avoid writing code in your application.
A basic to intermediate understanding of T-SQL is required.
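As a taste of the two features, here is a minimal sketch with hypothetical orders data (SQLite 3.25+ via Python): a CTE names an intermediate result, and ROW_NUMBER() picks each customer's largest order.

```python
import sqlite3  # window functions require SQLite >= 3.25

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('ann', 50), ('ann', 75), ('bob', 20), ('bob', 90), ('bob', 10);
""")
# The CTE "ranked" numbers each customer's orders from largest to smallest;
# the outer query keeps only the top-ranked row per customer.
rows = conn.execute("""
    WITH ranked AS (
        SELECT customer, amount,
               ROW_NUMBER() OVER (PARTITION BY customer
                                  ORDER BY amount DESC) AS rn
        FROM orders
    )
    SELECT customer, amount FROM ranked WHERE rn = 1 ORDER BY customer
""").fetchall()
print(rows)  # [('ann', 75.0), ('bob', 90.0)]
```

Without the CTE and window function, this "top row per group" shape typically needs a self-join or correlated subquery, which is exactly the boilerplate the session aims to eliminate.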
Use Oracle 9i Summary Advisor To Better Manage Your Data Warehouse - info_sunrise24
- The document discusses Oracle's Summary Advisor tool which recommends materialized views to improve data warehouse query performance and manage storage and processing costs.
- It provides an overview of Summary Advisor's functionalities and walks through the steps to invoke it using the DBMS_OLAP package - including loading workloads, applying filters, generating recommendations, and reports.
- The recommendations are evaluated based on performance gain, storage cost, and benefit-to-cost ratio to determine which materialized views should be created, retained, or dropped.
Advanced SQL Server Stored Procedure Presentation - Amin Uddin
This document discusses stored procedures in SQL, including:
1. What stored procedures are and how they improve performance over dynamic SQL.
2. How to create stored procedures with parameters and return values.
3. How to call/execute stored procedures and examples of inserting, updating, and deleting data using stored procedures.
4. The process of stored procedure compilation and recompilation, including when recompilation may be needed due to changes in data or indexes.
Oracle provides several analytical functions that allow for powerful data analysis using SQL. These include group functions that aggregate data over groups or windows, as well as window functions like ROW_NUMBER, RANK, and LAG that analyze data relative to the current row. ROLLUP and CUBE extensions to the GROUP BY clause enable calculation of subtotals across multiple dimensions of data with a single query.
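ROLLUP and CUBE are Oracle GROUP BY extensions that SQLite lacks, but the window-function half can be sketched with hypothetical data (SQLite 3.25+ via Python), using LAG to compare each row with the previous one:

```python
import sqlite3  # LAG requires SQLite >= 3.25

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (yr INTEGER, amount REAL);
    INSERT INTO sales VALUES (2021, 100), (2022, 150), (2023, 120);
""")
# LAG reads the prior row's value within the window ordering, here giving
# the year-over-year change; the first row has no prior row, so delta is NULL.
rows = conn.execute("""
    SELECT yr, amount,
           amount - LAG(amount) OVER (ORDER BY yr) AS delta
    FROM sales ORDER BY yr
""").fetchall()
print(rows)  # [(2021, 100.0, None), (2022, 150.0, 50.0), (2023, 120.0, -30.0)]
```

Unlike a group function, LAG returns a value for every input row, which is what makes this kind of row-relative analysis possible in a single pass.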
Building a semantic/metrics layer using Calcite - Julian Hyde
A semantic layer, also known as a metrics layer, lies between business users and the database, and lets those users compose queries in the concepts that they understand. It also governs access to the data, manages data transformations, and can tune the database by defining materializations.
Like many new ideas, the semantic layer is a distillation and evolution of many old ideas, such as query languages, multidimensional OLAP, and query federation.
In this talk, we describe the features we are adding to Calcite to define business views, query measures, and optimize performance.
A talk given at Community over Code, the annual conference of the Apache Software Foundation, in Halifax, NS, on 9th October, 2023.
Morel, a data-parallel programming language - Julian Hyde
This document discusses Morel, a data-parallel programming language that is an extension of Standard ML with relational operators. Morel aims to provide the expressiveness of a functional programming language, the power and conciseness of SQL, and efficient execution on different hardware. It is implemented on top of Apache Calcite's relational algebra framework. The talk describes Morel's evolution and how it is pushing Calcite's capabilities with graph and recursive queries. Standard ML concepts like functions, recursion, and higher-order functions are extended in Morel with relational operators like "from" to enable data-parallel programming over immutable datasets. Functions can also be treated as values in Morel.
Similar to Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Is there a perfect data-parallel programming language? (Experiments with More...Julian Hyde
The perfect data parallel language has not yet been invented. SQL queries can achieve great performance and scale, but there are many general purpose algorithms that it cannot express. In Morel, we build on the functional and relational roots of MapReduce in an elegant and strongly-typed general-purpose programming language. But Morel is, in a real sense, a query language; programs are executed on relational frameworks such as Google BigQuery and Spark.
In this talk, we describe the principles that drove Morel’s design, the problems that we had to solve in order to implement a hybrid functional/relational language, and how Morel can be applied to implement data-intensive systems.
We also introduce Apache Calcite, the popular open source framework for query planning, and describe how Morel's compiler uses Calcite's relational algebra and rewrite rules to generate efficient plans.
Is it easier to add functional programming features to a query language, or to add query capabilities to a functional language? In Morel, we have done the latter.
Functional and query languages have much in common, and yet much to learn from each other. Functional languages have a rich type system that includes polymorphism and functions-as-values and Turing-complete expressiveness; query languages have optimization techniques that can make programs several orders of magnitude faster, and runtimes that can use thousands of nodes to execute queries over terabytes of data.
Morel is an implementation of Standard ML on the JVM, with language extensions to allow relational expressions. Its compiler can translate programs to relational algebra and, via Apache Calcite’s query optimizer, run those programs on relational backends.
In this talk, we describe the principles that drove Morel’s design, the problems that we had to solve in order to implement a hybrid functional/relational language, and how Morel can be applied to implement data-intensive systems.
(A talk given by Julian Hyde at Strange Loop 2021, St. Louis, MO, on October 1st, 2021.)
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
The document provides instructions for setting up the environment and coding tutorial for the BOSS'21 Copenhagen tutorial on Apache Calcite.
It includes the following steps:
1. Clone the GitHub repository containing sample code and dependencies.
2. Compile the project.
3. It outlines the draft schedule for the tutorial, which will cover topics like Calcite introduction, demonstration of SQL queries on CSV files, setting up the coding environment, using Lucene for indexing, and coding exercises to build parts of the logical and physical query plans in Calcite.
4. The tutorial will be led by Stamatis Zampetakis from Cloudera and Julian Hyde from Google, who are both committers to
The evolution of Apache Calcite and its CommunityJulian Hyde
Apache Calcite is an open source framework for building databases, and includes a SQL parser, relational algebra, and a highly extensible query optimizer.
It has achieved wide adoption, used in many commercial products, open source projects, and as a test bed for computer science research.
But there is a bootstrap problem: If software is written by a community of contributors, and each contributor acts in their own self-interest, how do you get the first working version of the product? The answer is in the story of how the technology evolved, and how the community evolved with it, and in this talk we tell that story.
The Hop project entered Apache Software Foundation as an Incubator project in 2020, and Julian Hyde, one of their mentors, gave this presentation to educate the initial committers on the Apache Way and what to expect during the Incubation process.
The talk was given by Julian Hyde on October 1st, 2020, with the original title "Apache Incubation - What's it all about?"
Efficient spatial queries on vanilla databasesJulian Hyde
A talk given by Julian Hyde at the Apache Calcite online meetup, 2021/01/20.
Spatial and GIS applications have traditionally required specialized databases, or at least specialized data structures like r-trees. Unfortunately this means that hybrid applications such as spatial analytics are not well served, and many people are unaware of the power of spatial queries because their favorite database does not support them.
In this talk, we describe how Apache Calcite enables efficient spatial queries using generic data structures such as HBase’s key-sorted tables, using techniques like Hilbert space-filling curves and materialized views. Calcite implements much of the OpenGIS function set and recognizes query patterns that can be rewritten to use particular spatial indexes. Calcite is bringing spatial query to the masses!
A talk given by Julian Hyde at DataCouncil SF on April 18, 2019
How do you organize your data so that your users get the right answers at the right time? That question is a pretty good definition of data engineering — but it is also describes the purpose of every DBMS (database management system). And it’s not a coincidence that these are so similar.
This talk looks at the patterns that reoccur throughout data management — such as caching, partitioning, sorting, and derived data sets. As the speaker is the author of Apache Calcite, we first look at these patterns through the lens of Relational Algebra and DBMS architecture. But then we apply these patterns to the modern data pipeline, ETL and analytics. As a case study, we look at how Looker’s “derived tables” blur the line between ETL and caching, and leverage the power of cloud databases.
Don't optimize my queries, organize my data!Julian Hyde
Your queries won't run fast if your data is not organized right. Apache Calcite optimizes queries, but can we make it optimize data? We had to solve several challenges. Users are too busy to tell us the structure of their database, and the query load changes daily, so Calcite has to learn and adapt. We talk about new algorithms we developed for gathering statistics on massive database, and how we infer and evolve the data model based on the queries.
A talk given by Julian Hyde at ApacheCon NA 2018 in Montreal on September 26th, 2018.
Spatial and GIS applications have traditionally required specialized databases, or at least specialized data structures like r-trees. Unfortunately this means that hybrid applications such as spatial analytics are not well served, and many people are unaware of the power of spatial queries because their favorite database does not support them.
In this talk, we describe how Apache Calcite enables efficient spatial queries using generic data structures such as HBase’s key-sorted tables, using techniques like Hilbert space-filling curves and materialized views. Calcite implements much of the OpenGIS function set and recognizes query patterns that can be rewritten to use particular spatial indexes. Calcite is bringing spatial query to the masses!
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
The revolution has happened. We are living the age of the deconstructed database. The modern enterprises are powered by data, and that data lives in many formats and locations, in-flight and at rest, but somewhat surprisingly, the lingua franca for remains SQL.
In this talk, Julian describes Apache Calcite, a toolkit for relational algebra that powers many systems including Apache Beam, Flink and Hive. He discusses some areas of development in Calcite: streaming SQL, materialized views, enabling spatial query on vanilla databases, and what a mash-up of all three might look like.
He also describes how SQL is being extended to handle streaming, and the challenges that will need to be solved if it is to become standard.
A talk given by Julian Hyde at Lyft, San Francisco, on 2018/06/27.
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Julian Hyde
Apache Calcite is an open source framework for building data management systems that allows for optimized query processing over heterogeneous data sources. It uses a flexible relational algebra and extensible adapter-based architecture that allows it to incorporate diverse data sources. Calcite's rule-based optimizer transforms logical query plans into efficient physical execution plans tailored for different data sources. It has been adopted by many projects and companies and is also used in research.
A talk given by Julian Hyde at DataEngConf SF on April 17th 2018.
Did you know that databases often “cheat”? Even with a scalable query engine and smart optimizer, many real-world queries would be too slow if the engine read all the data, so the engine re-writes your query to use a pre-materialized result. B-tree indexes made the first relational databases possible, and there are now many flavors of materialization, from explicit materialized views to OLAP-style caching and spatial indexes. Materialization is more relevant than ever in today’s heterogenous, distributed systems.
If you are evaluating data engines, we describe what materialization features to look for in your next engine. If you are implementing an engine, we describe the features provided by Apache Calcite to design, maintain and use materializations.
Don’t optimize my queries, optimize my data!Julian Hyde
The document discusses strategies for optimizing data through materialized views and how data systems can learn to optimize themselves. It proposes an algorithm that uses sketches and information theory to profile data cardinalities and recommend materialized views. The algorithm aims to defeat the combinatorial search space by only considering combinations with "surprising" cardinalities. This profiling provides the cost and benefit information needed to optimize data structures. The document also discusses using query logs and statistics to infer relationships between tables and design summary tables through lattices.
Query optimizers and people have one thing in common: the better they understand their data, the better they can do their jobs. Optimizing queries is hard if you don't have good estimates for the sizes of the intermediate join and aggregate results. Data profiling is a technique that scans data, looking for patterns within the data such as keys, functional dependencies, and correlated columns. These richer statistics can be used in Apache Calcite's query optimizer, and the projects that use it, such as Apache Hive, Phoenix and Drill. We describe how we built a data profiler as a table function in Apache Calcite, review the recent research and algorithms that made it possible, and show how you can use the profiler to improve the quality of your data.
A talk given by Julian Hyde at DataWorks Summit, San Jose, on June 14th 2017.
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteJulian Hyde
What if Apache Pig had a SQL front-end and query optimizer? What if Apache Calcite was able to use Pig and MapReduce to run queries? In this project, we aimed to answer both questions by adding a Pig adapter for Calcite. In this talk, we describe Calcite's adapter framework, how we used it to write a Pig adapter, and how you can use this SQL interface to Pig for interactive and long-running queries.
A talk given by Eli Levine and Julian Hyde at Apache: Big Data, Miami, on May 17th, 2017.
Query optimizers and people have one thing in common: the better they understand their data, the better they can do their jobs. Optimizing queries is hard if you don't have good estimates for the sizes of the intermediate join and aggregate results. Data profiling is a technique that scans data, looking for patterns within the data such as keys, functional dependencies, and correlated columns. These richer statistics can be used in Apache Calcite's query optimizer, and the projects that use it, such as Apache Hive, Phoenix and Drill. We describe how we built a data profiler as a table function in Apache Calcite, review the recent research and algorithms that made it possible, and show how you can use the profiler to improve the quality of your data.
A talk given by Julian Hyde at Apache: Big Data, Miami, on May 16th 2017.
Streaming is necessary to handle data rates and latency, but SQL is unquestionably the lingua franca of data. Is it possible to combine SQL with streaming, and if so, what does the resulting language look like? Apache Calcite is extending SQL to include streaming, and Apache Apex is using Calcite to support streaming SQL. In this talk, Julian Hyde describes streaming SQL in detail and shows how you can use streaming SQL in your application. He also describes how Calcite’s planner optimizes queries for throughput and latency.
Julian Hyde gave this talk at Apex Big Data World, Mountain View, on April 4, 2017.
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
1. Measures in SQL
Julian Hyde (Google)
2024-05-22
SF Distributed Systems Meetup
San Francisco, California
Measures in SQL
ABSTRACT
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.

SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.

To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
SIGMOD, June 9–15, 2024, Santiago, Chile
Julian Hyde
Google Inc.
San Francisco, CA, USA
julianhyde@google.com
John Fremlin
Google Inc.
New York, NY, USA
fremlin@google.com
7. 1. Allow tables to have measures
DESCRIBE EnhancedOrders;
column type
============ =====
prodName STRING
custName STRING
orderDate DATE
revenue INTEGER
cost INTEGER
profitMargin DOUBLE MEASURE
2. Operators for evaluating measures
SELECT prodName, profitMargin
FROM EnhancedOrders
GROUP BY prodName;
prodName profitMargin
======== ============
Acme 0.60
Happy 0.47
Whizz 0.67
3. Syntax to define measures in a query
SELECT *,
(SUM(revenue) - SUM(cost)) / SUM(revenue)
AS MEASURE profitMargin
FROM Orders
GROUP BY prodName;
Extend the relational model with measures
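Since measures are not yet standard SQL, the expansion can be sketched in ordinary SQL: within each group, profitMargin is replaced in place by its defining aggregate expression. A minimal Python/sqlite3 sketch (sample Orders rows taken from the grain-locking slide later in this deck):

```python
import sqlite3

# Measure-expansion sketch: per group, the profitMargin measure is replaced
# by its defining aggregate expression (SUM(revenue)-SUM(cost))/SUM(revenue).
# Sample rows come from the Orders table on the grain-locking slide.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (prodName TEXT, custName TEXT, revenue INT, cost INT)")
con.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    ('Happy', 'Alice', 6, 4), ('Acme', 'Bob', 5, 2), ('Happy', 'Alice', 7, 4),
    ('Whizz', 'Celia', 3, 1), ('Happy', 'Bob', 4, 1)])

# SELECT prodName, profitMargin FROM EnhancedOrders GROUP BY prodName
# expands to:
rows = con.execute("""
    SELECT prodName,
           ROUND((SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue), 2)
    FROM Orders
    GROUP BY prodName
    ORDER BY prodName""").fetchall()
print(rows)  # [('Acme', 0.6), ('Happy', 0.47), ('Whizz', 0.67)]
```

The margins match the result table on this slide (0.60, 0.47, 0.67).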
8. SELECT prodName,
profitMargin
FROM EnhancedOrders
GROUP BY prodName;
Definitions
A context-sensitive expression (CSE) is an expression
whose value is determined by an evaluation context.
An evaluation context is a predicate whose terms are
one or more columns from the same table.
● This set of columns is the dimensionality of the
CSE.
A measure is a special kind of column that becomes a
CSE when used in a query.
● A measure’s dimensionality is the set of
non-measure columns in its table.
● The data type of a measure that returns a value of
type t is t MEASURE, e.g. INTEGER MEASURE.
prodName profitMargin
======== ============
Acme 0.60
Happy 0.50
Whizz 0.67
SELECT (SUM(revenue) - SUM(cost))
/ SUM(revenue) AS profitMargin
FROM Orders
WHERE prodName = 'Acme';
profitMargin
============
0.60
profitMargin is a measure (and a CSE).
Its dimensionality is
{prodName, custName, orderDate, revenue, cost}.
Evaluation context for this cell is
prodName = 'Acme'.
9. SELECT (SUM(revenue) - SUM(cost))
/ SUM(revenue) AS m
FROM Orders
WHERE prodName = 'Acme';
m
====
0.60
SELECT (SUM(revenue) - SUM(cost))
/ SUM(revenue) AS m
FROM Orders
WHERE prodName = 'Happy';
m
====
0.50
SELECT (SUM(revenue) - SUM(cost))
/ SUM(revenue) AS m
FROM Orders
WHERE prodName = 'Whizz'
AND custName = 'Bob';
m
====
NULL
SELECT prodName,
profitMargin,
profitMargin
AT (SET prodName = 'Happy')
AS happyMargin,
profitMargin
AT (SET custName = 'Bob')
AS bobMargin
FROM EnhancedOrders
GROUP BY prodName;
AT operator
The context transformation operator AT modifies the
evaluation context.
Syntax:
expression AT (contextModifier…)
contextModifier ::=
WHERE predicate
| ALL
| ALL dimension
| SET dimension = [CURRENT] expression
| VISIBLE
prodName profitMargin happyMargin bobMargin
======== ============ =========== =========
Acme 0.60 0.50 0.60
Happy 0.50 0.50 0.75
Whizz 0.67 0.50 NULL
Evaluation context for this cell is
prodName = 'Whizz' AND custName = 'Bob'.
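One way to read AT (SET custName = 'Bob') is as a correlated scalar subquery that keeps the current prodName but overrides custName. A Python/sqlite3 sketch of that reading, reproducing the bobMargin column (sample rows again from the grain-locking slide, orderDate omitted; note that slide's data gives Happy a margin of 0.47 rather than the 0.50 shown here):

```python
import sqlite3

# Sketch: profitMargin AT (SET custName = 'Bob') read as a correlated
# scalar subquery. Sample rows from the grain-locking slide's Orders table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (prodName TEXT, custName TEXT, revenue INT, cost INT)")
con.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    ('Happy', 'Alice', 6, 4), ('Acme', 'Bob', 5, 2), ('Happy', 'Alice', 7, 4),
    ('Whizz', 'Celia', 3, 1), ('Happy', 'Bob', 4, 1)])

rows = con.execute("""
    SELECT o.prodName,
           (SELECT ROUND((SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue), 2)
            FROM Orders i
            WHERE i.prodName = o.prodName  -- prodName keeps its CURRENT value
              AND i.custName = 'Bob')      -- custName is overridden by SET
           AS bobMargin
    FROM Orders o
    GROUP BY o.prodName
    ORDER BY o.prodName""").fetchall()
print(rows)  # [('Acme', 0.6), ('Happy', 0.75), ('Whizz', None)]
```

Whizz has no Bob order, so its evaluation context matches no rows and the measure evaluates to NULL, as in the result table above.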
11. Grain-locking
What is the average age of the customers
who ordered each product?
When we use an aggregate function in a
join query, it will ‘double count’ if the
join duplicates rows.
This is generally not what we want for
measures – except when we want a
weighted average – but it is difficult to
avoid in SQL.
Measures are locked to the grain of the
table that defined them.
WITH EnhancedCustomers AS (
SELECT *,
AVG(custAge) AS MEASURE avgAge
FROM Customers)
SELECT o.prodName,
AVG(c.custAge) AS weightedAvgAge,
c.avgAge AS avgAge
FROM Orders AS o
JOIN EnhancedCustomers AS c USING (custName)
GROUP BY o.prodName;
prodName weightedAvgAge avgAge
======== ============== ======
Acme 41 41
Happy 29 32
Whizz 17 17
prodName custName orderDate  revenue cost
======== ======== ========== ======= ====
Happy    Alice    2023/11/28 6       4
Acme     Bob      2023/11/27 5       2
Happy    Alice    2024/11/28 7       4
Whizz    Celia    2023/11/25 3       1
Happy    Bob      2022/11/27 4       1
custName custAge
======== =======
Alice    23
Bob      41
Celia    17
For product Happy, Alice (age 23)
has two orders and Bob (age 41) has one,
so the join-weighted average is 29
but the per-customer average is 32.
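Standard SQL has no grain-locking, but the rewrite can be sketched by collapsing the join to one row per (prodName, custName) before averaging. A Python/sqlite3 sketch over the slide's sample tables, reproducing both columns of the result:

```python
import sqlite3

# Grain-locking sketch: the naive join average double-counts Alice, who has
# two Happy orders; deduplicating to the Customers grain counts her once.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders (prodName TEXT, custName TEXT, revenue INT, cost INT);
    INSERT INTO Orders VALUES
        ('Happy', 'Alice', 6, 4), ('Acme', 'Bob', 5, 2),
        ('Happy', 'Alice', 7, 4), ('Whizz', 'Celia', 3, 1),
        ('Happy', 'Bob', 4, 1);
    CREATE TABLE Customers (custName TEXT, custAge INT);
    INSERT INTO Customers VALUES ('Alice', 23), ('Bob', 41), ('Celia', 17);
""")

# AVG(c.custAge): plain aggregate, weighted by order count.
weighted = dict(con.execute("""
    SELECT o.prodName, AVG(c.custAge)
    FROM Orders o JOIN Customers c USING (custName)
    GROUP BY o.prodName"""))

# c.avgAge: measure locked to the Customers grain -- deduplicate first.
locked = dict(con.execute("""
    SELECT prodName, AVG(custAge)
    FROM (SELECT DISTINCT o.prodName, c.custName, c.custAge
          FROM Orders o JOIN Customers c USING (custName))
    GROUP BY prodName"""))

print(weighted['Happy'], locked['Happy'])  # 29.0 32.0
```

Both dictionaries match the slide's result table: {Acme: 41, Happy: 29 vs 32, Whizz: 17}.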
12. SELECT *,
SUM(cost) AS MEASURE sumCost,
SUM(revenue) AS MEASURE sumRevenue
FROM Orders
Composition & closure
Just as tables are closed under queries, so
tables-with-measures are closed under
queries-with-measures
Measures can reference measures
Complex analytical calculations without
touching the FROM clause
Evaluation contexts can be nested
SELECT *,
SUM(cost) AS MEASURE sumCost,
SUM(revenue) AS MEASURE sumRevenue,
(sumRevenue - sumCost) / sumRevenue
AS MEASURE profitMargin
FROM Orders
SELECT *,
SUM(cost) AS MEASURE sumCost,
SUM(revenue) AS MEASURE sumRevenue,
(sumRevenue - sumCost) / sumRevenue
AS MEASURE profitMargin,
sumRevenue
- sumRevenue AT (SET YEAR(orderDate)
= CURRENT YEAR(orderDate) - 1)
AS MEASURE revenueGrowthYoY
FROM Orders
SELECT *,
SUM(cost) AS MEASURE sumCost,
SUM(revenue) AS MEASURE sumRevenue,
(sumRevenue - sumCost) / sumRevenue
AS MEASURE profitMargin,
sumRevenue
- sumRevenue AT (SET YEAR(orderDate)
= CURRENT YEAR(orderDate) - 1)
AS MEASURE revenueGrowthYoY,
ARRAY_AGG(productId
ORDER BY sumRevenue DESC LIMIT 5)
AT (ALL productId)
AS MEASURE top5Products
FROM Orders;
SELECT *,
SUM(cost) AS MEASURE sumCost,
SUM(revenue) AS MEASURE sumRevenue,
(sumRevenue - sumCost) / sumRevenue
AS MEASURE profitMargin,
sumRevenue
- sumRevenue AT (SET YEAR(orderDate)
= CURRENT YEAR(orderDate) - 1)
AS MEASURE revenueGrowthYoY,
ARRAY_AGG(productId
ORDER BY sumRevenue DESC LIMIT 5)
AT (ALL productId)
AS MEASURE top5Products,
ARRAY_AGG(customerId
ORDER BY sumRevenue DESC LIMIT 3)
AT (ALL customerId
SET productId MEMBER OF top5Products
AT (SET YEAR(orderDate)
= CURRENT YEAR(orderDate) - 1))
AS MEASURE top3CustomersOfTop5Products
FROM Orders;
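The revenueGrowthYoY measure shifts the year dimension of the evaluation context back by one. Under the same scalar-subquery reading, a Python/sqlite3 sketch over the slide's sample Orders rows (dates written in ISO form so strftime applies), grouping by year:

```python
import sqlite3

# Sketch of revenueGrowthYoY: sumRevenue minus sumRevenue evaluated in a
# context whose year is shifted back by one, i.e. the effect of
# AT (SET YEAR(orderDate) = CURRENT YEAR(orderDate) - 1).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (prodName TEXT, orderDate TEXT, revenue INT)")
con.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    ('Happy', '2023-11-28', 6), ('Acme', '2023-11-27', 5),
    ('Happy', '2024-11-28', 7), ('Whizz', '2023-11-25', 3),
    ('Happy', '2022-11-27', 4)])

rows = con.execute("""
    SELECT CAST(strftime('%Y', orderDate) AS INT) AS yr,
           SUM(revenue)
           - (SELECT SUM(revenue) FROM Orders i     -- same measure, but in a
              WHERE CAST(strftime('%Y', i.orderDate) AS INT) =
                    CAST(strftime('%Y', o.orderDate) AS INT) - 1)  -- shifted year
           AS revenueGrowthYoY
    FROM Orders o
    GROUP BY yr
    ORDER BY yr""").fetchall()
print(rows)  # [(2022, None), (2023, 10), (2024, -7)]
```

2022 has no prior year in the data, so its shifted context matches no rows and the growth is NULL.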
14. Represent a Business Intelligence model as a SQL view
Orders Products
Customers
CREATE VIEW OrdersCube AS
SELECT *
FROM (
SELECT o.orderDate AS `order.date`,
o.revenue AS `order.revenue`,
SUM(o.revenue) AS MEASURE `order.sum_revenue`
FROM Orders) AS o
LEFT JOIN (
SELECT c.custName AS `customer.name`,
c.state AS `customer.state`,
c.custAge AS `customer.age`,
AVG(c.custAge) AS MEASURE `customer.avg_age`
FROM Customers) AS c
ON o.custName = c.custName
LEFT JOIN (
SELECT p.prodName AS `product.name`,
p.color AS `product.color`,
AVG(p.weight) AS MEASURE `product.avg_weight`
FROM Products) AS p
ON o.prodName = p.prodName;
SELECT `customer.state`, `product.avg_weight`
FROM OrdersCube
GROUP BY `customer.state`;
● SQL planner handles view expansion
● Grain locking makes it safe to use a
star schema
● Users can define new models simply
by writing queries
15. Implementing measures & CSEs as SQL rewrites
(rewrite strategies, from simple to complex)

Complexity: Simple measure can be inlined
Query:
SELECT prodName, avgRevenue
FROM OrdersCube
GROUP BY prodName
Expanded query:
SELECT prodName, AVG(revenue)
FROM orders
GROUP BY prodName

Complexity: Join requires grain-locking
Query:
SELECT prodName, avgAge
FROM OrdersCube
GROUP BY prodName
Expanded query:
SELECT o.prodName, AVG(c.custAge PER c.custName)
FROM orders JOIN customers
GROUP BY prodName
→ (something with GROUPING SETS)

Complexity: Period-over-period
Query:
SELECT prodName, avgAge -
avgAge AT (SET year = CURRENT year - 1)
FROM OrdersCube
GROUP BY prodName
Expanded query:
(something with window aggregates)

Complexity: Scalar subquery can accomplish anything
Query:
SELECT prodName, prodColor,
avgAge AT (ALL custState
SET year = CURRENT year - 1)
FROM OrdersCube
GROUP BY prodName, prodColor
Expanded query:
SELECT prodName, prodColor,
(SELECT … FROM orders
WHERE <evaluation context>)
FROM orders
GROUP BY prodName, prodColor