Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite

Julian Hyde

A talk given by Julian Hyde and Mosha Pasumansky at the Northwest Database Society Annual Meeting, 2021/01/20.

Open Source SQL - beyond parsers: ZetaSQL & Apache Calcite
Northwest Database Society Annual Meeting, 2021/01/20
Mosha Pasumansky & Julian Hyde (Google)

Apache Calcite goals
Make it easier to write a simple DBMS
Advance the state of the art for complex DBMS
Bring database approaches to new areas (e.g. streaming, geospatial, federation, data science)
Composition + evolution (framework + open source)
Apache license & governance

Calcite evolution - origins as an SMP DB
Diagram (LucidDB; labelled Java and C++): JDBC client, JDBC server, SQL parser & validator, query planner, relational algebra, rewrite rules, catalog, physical operators, storage & data structures.

Calcite evolution - pluggable components
Diagram (Optiq): JDBC client, JDBC server, SQL parser & validator, query planner, relational algebra, rewrite rules, physical operators.

Calcite evolution - pluggable components
Diagram (Optiq): JDBC client, JDBC server, SQL parser & validator, query planner, relational algebra, pluggable rewrite rules, pluggable stats / cost, pluggable catalog, adapter, physical operators, storage.

Calcite evolution - separate JDBC stack
Diagram (Apache Calcite): Avatica, JDBC client, JDBC server, ODBC client, SQL parser & validator, query planner, relational algebra, pluggable rewrite rules, pluggable stats / cost, pluggable catalog, adapter, physical operators, storage.

Calcite evolution - federation via adapters
Diagram (Apache Calcite): SQL, SQL parser & validator, query planner, relational algebra, pluggable rewrite rules, pluggable stats / cost, pluggable catalog, adapter, physical operators, storage.

Calcite evolution - federation via adapters
Diagram (Apache Calcite): SQL, SQL parser & validator, query planner, relational algebra, pluggable rewrite rules, pluggable stats / cost, pluggable catalog.
Adapters shown: JDBC adapter, Enumerable adapter, MongoDB adapter, File adapter (CSV, JSON, Http), Apache Kafka adapter, Apache Spark adapter.

Calcite evolution - federation via adapters
Diagram (Apache Calcite): SQL, SQL parser & validator, query planner, relational algebra, pluggable rewrite rules, pluggable stats / cost, pluggable catalog.
Adapter shown: Enumerable adapter.

Calcite evolution - federation via adapters
Diagram (Apache Calcite): SQL, SQL parser & validator, query planner, relational algebra, pluggable rewrite rules, pluggable stats / cost, pluggable catalog.
Adapter shown: JDBC adapter.
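
The adapter idea on these slides can be illustrated with a small sketch. It is a toy, not Calcite's adapter API: the `Adapter` interface, the two adapter classes, and the sample data are all hypothetical. It only shows why a common row-producing contract lets one engine join data that lives in different kinds of store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy federation sketch. All names here are hypothetical, not Calcite's API. */
class ToyFederation {
    /** Adapter contract: expose any data source as rows of column name to value. */
    interface Adapter {
        List<Map<String, String>> scan(String table);
    }

    /** Adapter over CSV text held in memory; the first line is the header. */
    static final class CsvAdapter implements Adapter {
        private final Map<String, String> files;
        CsvAdapter(Map<String, String> files) { this.files = files; }

        @Override public List<Map<String, String>> scan(String table) {
            String[] lines = files.get(table).split("\n");
            String[] header = lines[0].split(",");
            List<Map<String, String>> rows = new ArrayList<>();
            for (int i = 1; i < lines.length; i++) {
                String[] cells = lines[i].split(",");
                Map<String, String> row = new LinkedHashMap<>();
                for (int j = 0; j < header.length; j++) {
                    row.put(header[j], cells[j]);
                }
                rows.add(row);
            }
            return rows;
        }
    }

    /** Adapter over a key-value store; every table has columns "key" and "value". */
    static final class KeyValueAdapter implements Adapter {
        private final Map<String, Map<String, String>> stores;
        KeyValueAdapter(Map<String, Map<String, String>> stores) { this.stores = stores; }

        @Override public List<Map<String, String>> scan(String table) {
            List<Map<String, String>> rows = new ArrayList<>();
            for (Map.Entry<String, String> e : stores.get(table).entrySet()) {
                Map<String, String> row = new LinkedHashMap<>();
                row.put("key", e.getKey());
                row.put("value", e.getValue());
                rows.add(row);
            }
            return rows;
        }
    }

    /** Hash join that only sees rows, so it works across any pair of adapters. */
    static List<Map<String, String>> hashJoin(List<Map<String, String>> left, String leftKey,
                                              List<Map<String, String>> right, String rightKey) {
        Map<String, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> r : right) {
            index.computeIfAbsent(r.get(rightKey), k -> new ArrayList<>()).add(r);
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            List<Map<String, String>> matches =
                index.getOrDefault(l.get(leftKey), Collections.<Map<String, String>>emptyList());
            for (Map<String, String> r : matches) {
                Map<String, String> joined = new LinkedHashMap<>(l);
                joined.putAll(r);
                out.add(joined);
            }
        }
        return out;
    }

    /** Joins employees (CSV) to departments (key-value) and formats the result. */
    static List<String> demo() {
        Map<String, String> files = new HashMap<>();
        files.put("emp", "name,deptno\nAlice,10\nBob,20\nCarol,10");
        Map<String, String> dept = new HashMap<>();
        dept.put("10", "Sales");
        dept.put("20", "Engineering");
        Map<String, Map<String, String>> stores = new HashMap<>();
        stores.put("dept", dept);

        Adapter csv = new CsvAdapter(files);
        Adapter kv = new KeyValueAdapter(stores);
        List<String> out = new ArrayList<>();
        for (Map<String, String> row : hashJoin(csv.scan("emp"), "deptno", kv.scan("dept"), "key")) {
            out.add(row.get("name") + ":" + row.get("value"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the join runs entirely in the engine. A real adapter would typically also let the planner push filters or joins down into the source, which is where pluggable rules and cost estimates come in.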

Calcite evolution - SQL dialects
Diagram (Apache Calcite): SQL (shown twice), SQL parser & validator, query planner, relational algebra, JDBC adapter, pluggable rewrite rules, pluggable parser / lexical / conformance / operators, pluggable SQL dialect.
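
As a rough illustration of what a pluggable SQL dialect buys you, the sketch below renders one logical query as text for three dialect styles. It is a simplified toy with hypothetical names (`ToyDialects`, `Dialect`, `unparse`), not Calcite's dialect API, and it assumes identifiers contain no quote characters.

```java
/** Toy SQL generation for several dialects. Names are hypothetical, not Calcite's API. */
class ToyDialects {
    /** A dialect decides how identifiers are quoted and how a row limit is written. */
    enum Dialect {
        MYSQL_STYLE("`", "`") {
            @Override String select(String column, String table, int limit) {
                return "SELECT " + column + " FROM " + table + " LIMIT " + limit;
            }
        },
        ANSI_STYLE("\"", "\"") {
            @Override String select(String column, String table, int limit) {
                return "SELECT " + column + " FROM " + table
                    + " FETCH FIRST " + limit + " ROWS ONLY";
            }
        },
        MSSQL_STYLE("[", "]") {
            @Override String select(String column, String table, int limit) {
                return "SELECT TOP (" + limit + ") " + column + " FROM " + table;
            }
        };

        private final String open;
        private final String close;

        Dialect(String open, String close) {
            this.open = open;
            this.close = close;
        }

        /** Wraps an identifier in this dialect's quote characters. */
        String quote(String identifier) {
            return open + identifier + close;
        }

        /** Renders a single-column, row-limited query from already-quoted names. */
        abstract String select(String column, String table, int limit);
    }

    /** Renders "first N values of column from table" in the given dialect. */
    static String unparse(Dialect dialect, String column, String table, int limit) {
        return dialect.select(dialect.quote(column), dialect.quote(table), limit);
    }

    public static void main(String[] args) {
        for (Dialect d : Dialect.values()) {
            System.out.println(d + ": " + unparse(d, "name", "emp", 5));
        }
    }
}
```

The query itself is held once, independent of syntax. Only the final rendering step varies by target system.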

Calcite evolution - other front-end languages
Diagram (Apache Calcite): SQL, SQL parser & validator, query planner, relational algebra, adapter, physical operators, storage.

Calcite evolution - other front-end languages
Diagram: SQL, SQL parser & validator, Pig, Datalog, Morel, RelBuilder, query planner, relational algebra, adapter, physical operators, storage.

Calcite architecture
Diagram (Apache Calcite): Avatica, JDBC client, JDBC server, ODBC client, SQL parser & validator, RelBuilder, query planner, relational algebra, pluggable rewrite rules, pluggable stats / cost, pluggable catalog, adapter, physical operators, storage.
Core – Operator expressions (relational algebra) and planner (based on Cascades)
External – Data storage, algorithms and catalog
Optional – SQL parser, JDBC & ODBC drivers
Extensible – Planner rewrite rules, statistics, cost model, algebra, UDFs
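
To make "operator expressions plus rewrite rules" concrete, here is a minimal sketch of a relational algebra tree and one rewrite rule that merges adjacent filters. Every class and method name is hypothetical and the conditions are plain strings. Calcite's real operators, rule interfaces, and cost-based planner are much richer than this.

```java
/** Toy relational algebra with one rewrite rule. Not Calcite's API, just the idea. */
class ToyAlgebra {
    /** A relational expression: a node in an operator tree. */
    interface Rel {
        String explain();
    }

    /** Leaf operator: reads every row of a table. */
    static final class Scan implements Rel {
        final String table;
        Scan(String table) { this.table = table; }
        @Override public String explain() { return "Scan(" + table + ")"; }
    }

    /** Keeps the rows of its input that satisfy a condition. */
    static final class Filter implements Rel {
        final String condition;
        final Rel input;
        Filter(String condition, Rel input) {
            this.condition = condition;
            this.input = input;
        }
        @Override public String explain() {
            return "Filter(" + condition + ", " + input.explain() + ")";
        }
    }

    /**
     * Rewrite rule: Filter(a, Filter(b, x)) becomes Filter((b) AND (a), x).
     * The rule is applied bottom-up, so a whole chain of filters collapses to one.
     */
    static Rel mergeFilters(Rel rel) {
        if (!(rel instanceof Filter)) {
            return rel;
        }
        Filter top = (Filter) rel;
        Rel input = mergeFilters(top.input);
        if (input instanceof Filter) {
            Filter bottom = (Filter) input;
            String merged = "(" + bottom.condition + ") AND (" + top.condition + ")";
            return new Filter(merged, bottom.input);
        }
        return new Filter(top.condition, input);
    }

    public static void main(String[] args) {
        Rel plan = new Filter("sal > 1000", new Filter("deptno = 10", new Scan("EMP")));
        System.out.println("before: " + plan.explain());
        System.out.println("after:  " + mergeFilters(plan).explain());
    }
}
```

Because the rule takes a tree and returns an equivalent tree, it can be added or removed without touching the operators. That is the sense in which rewrite rules are pluggable here.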

Lessons learned
Decompose the database into components
SQL is standard but also allows innovation
Relational algebra intermediate language
Calcite has many uses, including:
● Embedded within DBMS (e.g. Apache Hive, OmniSciDB)
● Lightweight DBMS
● Platform for research
● Sandbox for relational algebra
● Toolkit for translating between SQL dialects

ZetaSQL
Diagram: SQL, Parser, AST, Catalog, Resolver, Resolved AST.
Engines shown: BigQuery, Spanner, F1, DataFlow.
Also shown: test harness, corpus of compliance tests, reference implementation.
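
The step from AST to resolved AST can be illustrated with a small sketch: names that are just strings after parsing get bound to catalog entries and types, and unknown names are rejected. This is a toy under stated assumptions. `ToyResolver`, its catalog layout, and the error messages are hypothetical and are not ZetaSQL's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy name resolution against a catalog. Hypothetical names, not ZetaSQL's API. */
class ToyResolver {
    /** Sample catalog: table name -> (column name -> type name). */
    static Map<String, Map<String, String>> sampleCatalog() {
        Map<String, String> emp = new LinkedHashMap<>();
        emp.put("name", "STRING");
        emp.put("sal", "INT64");
        Map<String, Map<String, String>> catalog = new LinkedHashMap<>();
        catalog.put("emp", emp);
        return catalog;
    }

    /**
     * Resolves the columns of "SELECT columns FROM table" against the catalog.
     * Each unresolved name becomes a fully qualified, typed reference.
     */
    static List<String> resolve(Map<String, Map<String, String>> catalog,
                                String table, List<String> columns) {
        Map<String, String> schema = catalog.get(table);
        if (schema == null) {
            throw new IllegalArgumentException("Table not found: " + table);
        }
        List<String> resolved = new ArrayList<>();
        for (String column : columns) {
            String type = schema.get(column);
            if (type == null) {
                throw new IllegalArgumentException("Unrecognized name: " + column);
            }
            resolved.add(table + "." + column + ":" + type);
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>();
        columns.add("name");
        columns.add("sal");
        System.out.println(resolve(sampleCatalog(), "emp", columns));
    }
}
```

An engine that consumes the resolved form does not need its own name-lookup or type-checking logic, which is presumably part of the appeal of sharing one analyzer across several engines.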

Thank you!
Questions?
#ZetaSQL
https://github.com/google/zetasql
@ApacheCalcite
https://calcite.apache.org

Similar to Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite

Virtuoso Universal Server Overview

rumito

A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite

Julian Hyde

Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...

Simplilearn

This presentation about Spark SQL will help you understand what is Spark SQL, Spark SQL features, architecture, data frame API, data source API, catalyst optimizer, running SQL queries and a demo on Spark SQL. Spark SQL is an Apache Spark's module for working with structured and semi-structured data. It is originated to overcome the limitations of Apache Hive. Now, let us get started and understand Spark SQL in detail. Below topics are explained in this Spark SQL presentation: 1. What is Spark SQL? 2. Spark SQL features 3. Spark SQL architecture 4. Spark SQL - Dataframe API 5. Spark SQL - Data source API 6. Spark SQL - Catalyst optimizer 7. Running SQL queries 8. Spark SQL demo This Apache Spark and Scala certification training is designed to advance your expertise working with the Big Data Hadoop Ecosystem. You will master essential skills of the Apache Spark open source framework and the Scala programming language, including Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Shell Scripting Spark. This Scala Certification course will give you vital skillsets and a competitive advantage for an exciting career as a Hadoop Developer. What is this Big Data Hadoop training course about? The Big Data Hadoop and Spark developer course have been designed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab. What are the course objectives? Simplilearn’s Apache Spark and Scala certification training are designed to: 1. Advance your expertise in the Big Data Hadoop Ecosystem 2. Help you master essential Apache and Spark skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming and Shell Scripting Spark 3. Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos What skills will you learn? 
By completing this Apache Spark and Scala course you will be able to: 1. Understand the limitations of MapReduce and the role of Spark in overcoming these limitations 2. Understand the fundamentals of the Scala programming language and its features 3. Explain and master the process of installing Spark as a standalone cluster 4. Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark 5. Master Structured Query Language (SQL) using SparkSQL 6. Gain a thorough understanding of Spark streaming features 7. Master and describe the features of Spark ML programming and GraphX programming Learn more at https://www.simplilearn.com/big-data-and-analytics/apache-spark-scala-certification-training

SAP S/4 HANA ONLINE TRAINING

Glory IT Technologies

Roles y Responsabilidades en SQL Azure

Eduardo Castro

En esta presentación examinamos los roles y responsabilidades en la administración de SQL Azure. Saludos, Eduardo Castro Martinez – Microsoft SQL Server MVP http://mswindowscr.org http://comunidadwindows.org Costa Rica Technorati Tags: SQL Server LiveJournal Tags: SQL Server del.icio.us Tags: SQL Server http://ecastrom.blogspot.com http://ecastrom.wordpress.com http://ecastrom.spaces.live.com http://universosql.blogspot.com http://todosobresql.blogspot.com http://todosobresqlserver.wordpress.com http://mswindowscr.org/blogs/sql/default.aspx http://citicr.org/blogs/noticias/default.aspx

Building Read Models using event streams

Denis Ivanov

Visualization with Solr Math Expressions and Fusion SQL

Lucidworks

Percona Lucid Dbguestd3896369

Spark SQL In Depth www.syedacademy.com

Syed Hadoop

NoSQL Database: Classification, Characteristics and Comparison

Mayuree Srikulwong

Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite

1. Open Source SQL - beyond parsers: ZetaSQL & Apache Calcite Northwest Database Society Annual Meeting 2021/01/20 Mosha Pasumansky & Julian Hyde (Google)

2. Apache Calcite goals
Make it easier to write a simple DBMS
Advance the state of the art for complex DBMS
Bring database approaches to new areas (e.g. streaming, geospatial, federation, data science)
Composition + evolution (framework + open source)
Apache license & governance

3. LucidDB C++ Calcite evolution - origins as an SMP DB JDBC server JDBC client Physical operators Rewrite rules Catalog Storage & data structures SQL parser & validator Query planner Relational algebra Java

4. Optiq Calcite evolution - pluggable components JDBC server JDBC client Physical operators Rewrite rules SQL parser & validator Query planner Relational algebra

5. Optiq Calcite evolution - pluggable components JDBC server JDBC client SQL parser & validator Query planner Adapter Pluggable rewrite rules Pluggable stats / cost Pluggable catalog Physical operators Storage Relational algebra

6. Apache Calcite Calcite evolution - separate JDBC stack Avatica JDBC server JDBC client Pluggable rewrite rules Pluggable stats / cost Pluggable catalog ODBC client Adapter Physical operators Storage SQL parser & validator Query planner Relational algebra

7. Apache Calcite Calcite evolution - federation via adapters Pluggable rewrite rules Pluggable stats / cost Pluggable catalog Adapter Physical operators Storage SQL parser & validator Query planner Relational algebra SQL

8. Calcite evolution - federation via adapters Apache Calcite JDBC adapter Pluggable rewrite rules Pluggable stats / cost Enumerable adapter MongoDB adapter File adapter (CSV, JSON, Http) Apache Kafka adapter Apache Spark adapter Pluggable catalog SQL SQL parser & validator Query planner Relational algebra

9. Calcite evolution - federation via adapters Apache Calcite Pluggable rewrite rules Pluggable stats / cost Enumerable adapter Pluggable catalog SQL SQL parser & validator Query planner Relational algebra

10. Calcite evolution - federation via adapters Apache Calcite JDBC adapter Pluggable rewrite rules Pluggable stats / cost Pluggable catalog SQL SQL parser & validator Query planner Relational algebra

11. Apache Calcite Calcite evolution - SQL dialects Pluggable rewrite rules Pluggable parser, lexical, conformance, operators Pluggable SQL dialect SQL SQL SQL parser & validator Query planner Relational algebra JDBC adapter

12. Apache Calcite Calcite evolution - other front-end languages SQL Adapter Physical operators Storage SQL parser & validator Query planner Relational algebra

13. Calcite evolution - other front-end languages Pig RelBuilder Adapter Physical operators Morel Storage Query planner Relational algebra Datalog SQL parser & validator SQL

14. Apache Calcite Calcite architecture Avatica JDBC server JDBC client Pluggable rewrite rules Pluggable stats / cost Pluggable catalog ODBC client Adapter Physical operators Storage SQL parser & validator Query planner Relational algebra RelBuilder
Core – Operator expressions (relational algebra) and planner (based on Cascades)
External – Data storage, algorithms and catalog
Optional – SQL parser, JDBC & ODBC drivers
Extensible – Planner rewrite rules, statistics, cost model, algebra, UDFs
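The split between a relational algebra core and pluggable rewrite rules can be illustrated with a small sketch. This is not Calcite's API: the `Scan`, `Filter` and `Project` classes, the `merge_filters` rule and the `rewrite` driver are hypothetical names invented for this example, and the driver applies rules until none fires rather than doing the cost-based, Cascades-style search the slide mentions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence, Tuple


# Hypothetical relational operators; each node is an immutable expression.
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    input: object
    predicates: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    input: object
    columns: Tuple[str, ...]


# A rule returns a rewritten node, or None when it does not match.
Rule = Callable[[object], Optional[object]]


def merge_filters(node: object) -> Optional[object]:
    """Filter(Filter(x, p), q) becomes Filter(x, p + q)."""
    if isinstance(node, Filter) and isinstance(node.input, Filter):
        inner = node.input
        return Filter(inner.input, inner.predicates + node.predicates)
    return None


def rewrite(node: object, rules: Sequence[Rule]) -> object:
    """Rewrite children first, then apply rules to this node until none fires."""
    if isinstance(node, (Filter, Project)):
        child = rewrite(node.input, rules)
        if child != node.input:
            node = replace(node, input=child)
    for rule in rules:
        result = rule(node)
        if result is not None:
            return rewrite(result, rules)
    return node


plan = Project(
    Filter(Filter(Scan("emp"), ("deptno = 10",)), ("sal > 1000",)),
    ("ename",),
)
optimized = rewrite(plan, [merge_filters])
```

Because the rule list is an argument to `rewrite`, a caller can add or remove rules without touching the operator classes, which is the kind of extensibility the "Extensible" line describes.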

15.

16.

17. Lessons learned
Decompose the database into components
SQL is standard but also allows innovation
Relational algebra intermediate language
Calcite has many uses, including:
● Embedded within DBMS (e.g. Apache Hive, OmniSciDB)
● Lightweight DBMS
● Platform for research
● Sandbox for relational algebra
● Toolkit for translating between SQL dialects

18. ZetaSQL SQL Parser Catalog AST Resolver Resolved AST BigQuery Spanner F1 DataFlow Test Harness Corpus of compliance tests Reference implementation
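The flow named on this slide (SQL into a parser, an AST into a resolver that consults a catalog, a resolved AST out) can be sketched in miniature. This is not ZetaSQL's API: `SelectAst`, `ResolvedColumn`, `parse` and `resolve` are hypothetical names, the catalog is assumed to be a plain dictionary, and the parser handles only `SELECT col, ... FROM table`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SelectAst:
    """Unresolved syntax tree: names only, no types."""
    columns: Tuple[str, ...]
    table: str


@dataclass(frozen=True)
class ResolvedColumn:
    """A column reference bound to a catalog table and type."""
    table: str
    name: str
    type: str


# Hypothetical catalog shape: table name -> {column name -> type name}.
Catalog = Dict[str, Dict[str, str]]


def parse(sql: str) -> SelectAst:
    """Parse 'SELECT c1, c2 FROM t' into an AST; purely syntactic."""
    tokens = sql.replace(",", " ").split()
    upper = [t.upper() for t in tokens]
    if not tokens or upper[0] != "SELECT" or "FROM" not in upper:
        raise ValueError("expected SELECT ... FROM ...")
    from_idx = upper.index("FROM")
    if from_idx < 2 or from_idx != len(tokens) - 2:
        raise ValueError("expected at least one column and exactly one table")
    return SelectAst(tuple(tokens[1:from_idx]), tokens[from_idx + 1])


def resolve(ast: SelectAst, catalog: Catalog) -> List[ResolvedColumn]:
    """Bind each name in the AST to the catalog, attaching types."""
    if ast.table not in catalog:
        raise LookupError(f"table not found: {ast.table}")
    schema = catalog[ast.table]
    resolved = []
    for name in ast.columns:
        if name not in schema:
            raise LookupError(f"column not found: {ast.table}.{name}")
        resolved.append(ResolvedColumn(ast.table, name, schema[name]))
    return resolved


catalog = {"emp": {"empno": "INT64", "ename": "STRING"}}
resolved = resolve(parse("SELECT ename, empno FROM emp"), catalog)
```

Keeping `parse` free of catalog lookups means the same AST can be resolved against different catalogs, which is one reason to separate the two stages.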

19. Thank you! Questions? #ZetaSQL https://github.com/google/zetasql @ApacheCalcite https://calcite.apache.org
