INTRODUCTION TO APACHE CALCITE
JORDAN HALTERMAN
WHAT IS APACHE CALCITE?
What is Apache Calcite?
• A framework for building SQL databases
• Developed over more than ten years
• Written in Java
• Previously known as Optiq, which grew out of the Farrago project
• Entered the Apache Incubator in 2014 and became a top-level Apache project in 2015
• Led by Julian Hyde at Hortonworks
Projects using Calcite
• Apache Hive
• Apache Drill
• Apache Flink
• Apache Phoenix
• Apache Samza
• Apache Storm
• Apache everything…
What is Apache Calcite?
• SQL parser
• SQL validation
• Query optimizer
• SQL generator
• Data federator
Stages of query execution
01 Parse - Queries are parsed using a JavaCC-generated parser
02 Validate - Queries are validated against known database metadata
03 Optimize - Logical plans are optimized and converted into physical expressions
04 Execute - Physical plans are converted into application-specific executions
COMPONENTS
Components of Calcite
• Catalog - Defines metadata and namespaces
that can be accessed in SQL queries
• SQL parser - Parses valid SQL queries into an
abstract syntax tree (AST)
• SQL validator - Validates abstract syntax trees
against metadata provided by the catalog
• Query optimizer - Converts AST into logical
plans, optimizes logical plans, and converts
logical expressions into physical plans
• SQL generator - Converts physical plans to
SQL
CATALOG
Calcite Catalog
• Defines namespaces that can be accessed in Calcite
queries
• Schema
  • A collection of schemas and tables
  • Can be arbitrarily nested
• Table
  • Represents a single data set
  • Fields defined by a RelDataType
• RelDataType
  • Represents fields in a data set
  • Supports all SQL data types, including structs and arrays
Schema
• A collection of schemas and tables
• Schemas can be arbitrarily nested
Table
• Represents a single data set
• Fields are defined by a RelDataType
RelDataType
• Represents the data type of an object
• Supports all SQL data types, including
structs and arrays
• Similar to Spark’s DataType
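The schema, table and row-type relationship can be made concrete with a toy model in plain Java. This is a sketch, not Calcite's API. `ToySchema`, `ToyTable` and `ToyField` are hypothetical stand-ins for `Schema`, `Table` and `RelDataType`. The lookup only shows how nested schemas can resolve a qualified name such as `sales.emps`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for one field of a RelDataType: a name plus a SQL type name.
record ToyField(String name, String sqlType) {}

// Hypothetical stand-in for a Table: a single data set described by its row type.
record ToyTable(String name, List<ToyField> rowType) {}

// Hypothetical stand-in for a Schema: holds tables and arbitrarily nested sub-schemas.
class ToySchema {
    private final Map<String, ToySchema> subSchemas = new LinkedHashMap<>();
    private final Map<String, ToyTable> tables = new LinkedHashMap<>();

    ToySchema addSubSchema(String name) {
        return subSchemas.computeIfAbsent(name, n -> new ToySchema());
    }

    void addTable(ToyTable table) {
        tables.put(table.name(), table);
    }

    // Resolves a qualified name such as [sales, emps]: every segment except the
    // last names a sub-schema, and the last segment names a table.
    Optional<ToyTable> resolve(List<String> path) {
        if (path.isEmpty()) {
            return Optional.empty();
        }
        ToySchema current = this;
        for (String segment : path.subList(0, path.size() - 1)) {
            current = current.subSchemas.get(segment);
            if (current == null) {
                return Optional.empty();
            }
        }
        return Optional.ofNullable(current.tables.get(path.get(path.size() - 1)));
    }
}

class CatalogDemo {
    // Builds a root schema containing schema "sales" with table "emps".
    static ToySchema buildRoot() {
        ToySchema root = new ToySchema();
        ToySchema sales = root.addSubSchema("sales");
        sales.addTable(new ToyTable("emps", List.of(
                new ToyField("empid", "INTEGER"),
                new ToyField("name", "VARCHAR"))));
        return root;
    }

    public static void main(String[] args) {
        ToySchema root = buildRoot();
        System.out.println(root.resolve(List.of("sales", "emps")).orElseThrow().rowType());
        System.out.println(root.resolve(List.of("hr", "emps")).isPresent()); // false
    }
}
```

The real catalog resolves names through the same kind of nesting. Calcite also has to handle case sensitivity and default schemas, which this toy ignores.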
Statistic
• Provides table statistics used in optimization
Usage of the Calcite catalog
[Figure: example code with a schema, a table, data types, and a field highlighted]
SQL PARSER
Calcite SQL parser
• LL(k) parser written in JavaCC
• Input queries are parsed into an abstract
syntax tree (AST)
• Nodes of the AST are represented in Calcite by
SqlNode
• SqlNode can also be converted back to a
SQL string via the unparse method
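JavaCC turns a grammar into a top-down parser that chooses each production by looking ahead at the next few tokens. As a rough illustration only, the hand-written recursive-descent parser below uses one token of lookahead. It handles a hypothetical subset, `SELECT col, ... FROM table [WHERE col = integer]`.

`TinySqlParser` and `SelectNode` are illustrative names, not Calcite classes. Calcite's real parser is generated from a far larger grammar and produces `SqlNode` trees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative AST for: SELECT col [, col]* FROM table [WHERE col = integer]
record SelectNode(List<String> columns, String table, String whereColumn, Integer whereValue) {}

class TinySqlParser {
    private static final Pattern TOKEN = Pattern.compile("\\s*([A-Za-z_][A-Za-z0-9_]*|[0-9]+|,|=)");
    private static final Set<String> KEYWORDS = Set.of("SELECT", "FROM", "WHERE");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    TinySqlParser(String sql) {
        // Lexer: split the input into identifiers, integers, commas, and '='.
        Matcher m = TOKEN.matcher(sql);
        int offset = 0;
        while (offset < sql.length()) {
            m.region(offset, sql.length());
            if (!m.lookingAt()) {
                if (sql.substring(offset).isBlank()) {
                    break; // only trailing whitespace is left
                }
                throw new IllegalArgumentException("Unexpected character at offset " + offset);
            }
            tokens.add(m.group(1));
            offset = m.end();
        }
    }

    // query := SELECT identifier (',' identifier)* FROM identifier [WHERE identifier '=' integer]
    SelectNode parseQuery() {
        expectKeyword("SELECT");
        List<String> columns = new ArrayList<>();
        columns.add(identifier());
        while (",".equals(peek())) { // one token of lookahead decides whether the list continues
            next();
            columns.add(identifier());
        }
        expectKeyword("FROM");
        String table = identifier();
        String whereColumn = null;
        Integer whereValue = null;
        if (peek() != null && peek().equalsIgnoreCase("WHERE")) {
            next();
            whereColumn = identifier();
            if (!"=".equals(next())) {
                throw new IllegalArgumentException("Expected '='");
            }
            whereValue = Integer.parseInt(next());
        }
        if (peek() != null) {
            throw new IllegalArgumentException("Unexpected token: " + peek());
        }
        return new SelectNode(List.copyOf(columns), table, whereColumn, whereValue);
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expectKeyword(String keyword) {
        String token = next();
        if (!token.equalsIgnoreCase(keyword)) {
            throw new IllegalArgumentException("Expected " + keyword + " but found " + token);
        }
    }

    // Accepts a name-like token that is not one of the reserved keywords.
    private String identifier() {
        String token = next();
        char first = token.charAt(0);
        boolean startsLikeName = Character.isLetter(first) || first == '_';
        if (!startsLikeName || KEYWORDS.contains(token.toUpperCase(Locale.ROOT))) {
            throw new IllegalArgumentException("Expected an identifier but found " + token);
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(new TinySqlParser("SELECT empid, name FROM emps WHERE deptno = 10").parseQuery());
    }
}
```

Each grammar rule becomes a method. Wherever the rule offers a choice (another column? a WHERE clause?), the method peeks at the next token. That is roughly the decision a generated LL(k) parser makes with k tokens.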
JavaCC
• Java Compiler Compiler
• Created in 1996 at Sun Microsystems
• Generates Java code from a domain-
specific language
• ANTLR is the modern alternative used in
projects like Hive and Drill
• JavaCC has sparse documentation
[Figure: excerpt of a JavaCC grammar rule with tokens, alternatives (or), a function call, and embedded Java code highlighted]
SqlNode
• SqlNode represents an element in an
abstract syntax tree
[Figure: AST of an example query with nodes labeled select, identifiers, operator, identifier, data type, and identifier]
SqlNode
• SqlNode’s unparse method converts a
SQL element back into a string
SqlNode
• SqlDialect indicates the capitalization
and quoting rules of specific databases
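A dialect's role during unparsing can be illustrated with a toy. A hypothetical `ToyDialect` record knows only an identifier quote pair and whether keywords are upper-cased. A hypothetical `ToySelect` node unparses itself through it.

The quote characters mirror common conventions: double quotes in standard SQL, backticks in MySQL, square brackets in SQL Server. Calcite's real `SqlDialect` covers much more. This sketch also does not escape quote characters embedded in identifiers.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical dialect: how identifiers are quoted and how keywords are capitalized.
record ToyDialect(char openQuote, char closeQuote, boolean upperCaseKeywords) {
    // Quote characters modeled on common conventions; keyword casing is a free choice here.
    static final ToyDialect ANSI = new ToyDialect('"', '"', true);
    static final ToyDialect MYSQL = new ToyDialect('`', '`', true);
    static final ToyDialect SQL_SERVER = new ToyDialect('[', ']', true);

    String quoteIdentifier(String identifier) {
        return openQuote + identifier + closeQuote;
    }

    String keyword(String word) {
        return upperCaseKeywords ? word.toUpperCase(Locale.ROOT) : word.toLowerCase(Locale.ROOT);
    }
}

// Hypothetical AST node for "SELECT col, ... FROM table" that unparses itself for a dialect.
record ToySelect(List<String> columns, String table) {
    String unparse(ToyDialect dialect) {
        String cols = columns.stream()
                .map(dialect::quoteIdentifier)
                .collect(Collectors.joining(", "));
        return dialect.keyword("select") + " " + cols + " "
                + dialect.keyword("from") + " " + dialect.quoteIdentifier(table);
    }
}

class DialectDemo {
    public static void main(String[] args) {
        ToySelect query = new ToySelect(List.of("empid", "name"), "emps");
        for (ToyDialect dialect : List.of(ToyDialect.ANSI, ToyDialect.MYSQL, ToyDialect.SQL_SERVER)) {
            System.out.println(query.unparse(dialect));
        }
    }
}
```

The tree is unparsed the same way each time; only the dialect object changes. That is why one parsed query can be regenerated as SQL for several target databases.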
QUERY OPTIMIZER
Query Plans
• Query plans represent the steps necessary
to execute a query
[Figure: example query plan built from table scan, inner join, filter, and project nodes]
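A query plan can be modeled as a tree of operators whose leaves are table scans. The sketch below uses hypothetical `PlanNode` records, not Calcite's `RelNode` classes. It builds a plan for a query along the lines of `SELECT emps.name FROM emps JOIN depts ON emps.deptno = depts.deptno WHERE depts.name = 'Sales'`, then prints it top-down with each operator's inputs indented beneath it.

```java
import java.util.List;

// Hypothetical logical plan node: a printable description plus its inputs.
interface PlanNode {
    String describe();
    List<PlanNode> inputs();
}

record ScanNode(String table) implements PlanNode {
    public String describe() { return "TableScan(table=" + table + ")"; }
    public List<PlanNode> inputs() { return List.of(); }
}

record JoinNode(PlanNode left, PlanNode right, String condition) implements PlanNode {
    public String describe() { return "Join(type=inner, condition=" + condition + ")"; }
    public List<PlanNode> inputs() { return List.of(left, right); }
}

record FilterNode(PlanNode input, String condition) implements PlanNode {
    public String describe() { return "Filter(condition=" + condition + ")"; }
    public List<PlanNode> inputs() { return List.of(input); }
}

record ProjectNode(PlanNode input, List<String> fields) implements PlanNode {
    public String describe() { return "Project(fields=" + fields + ")"; }
    public List<PlanNode> inputs() { return List.of(input); }
}

class PlanPrinter {
    // Renders the tree top-down, indenting each input two spaces under its parent.
    static String explain(PlanNode node) {
        StringBuilder sb = new StringBuilder();
        append(sb, node, 0);
        return sb.toString();
    }

    private static void append(StringBuilder sb, PlanNode node, int depth) {
        sb.append("  ".repeat(depth)).append(node.describe()).append('\n');
        for (PlanNode input : node.inputs()) {
            append(sb, input, depth + 1);
        }
    }
}

class PlanDemo {
    static PlanNode examplePlan() {
        PlanNode join = new JoinNode(new ScanNode("emps"), new ScanNode("depts"), "emps.deptno = depts.deptno");
        PlanNode filter = new FilterNode(join, "depts.name = 'Sales'");
        return new ProjectNode(filter, List.of("emps.name"));
    }

    public static void main(String[] args) {
        System.out.print(PlanPrinter.explain(examplePlan()));
    }
}
```

The project is printed first as the root, and the two scans are the most deeply indented lines. Data therefore flows from the bottom of the printout to the top.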
Query Optimization
• Optimize logical plan
• The goal is typically to reduce, as early in the plan as possible, the amount of data that must be processed
• Convert logical plan into a physical plan
• Physical plan is engine-specific and represents the physical execution stages
Query Optimization
• Prune unused fields
• Merge projections
• Convert subqueries to joins
• Reorder joins
• Push down projections
• Push down filters
[Figure: example plan before and after pushing down the project and the filter]
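Pushing a filter below an inner join can be sketched as a single rewrite on a toy plan. Everything here is hypothetical and simplified; it is not one of Calcite's built-in rules. Each predicate records the one table it references. The rewrite moves the filter onto whichever join input produces that table, so fewer rows reach the join. It is shown only for inner joins, where moving the filter preserves the result.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical logical plan nodes (not Calcite's RelNode hierarchy).
interface Rel {}
record TableScanRel(String table) implements Rel {}
record JoinRel(Rel left, Rel right, String condition) implements Rel {} // inner join
// Simplification: a predicate records the single table whose columns it references.
record TablePredicate(String table, String text) {}
record FilterRel(Rel input, TablePredicate predicate) implements Rel {}

class FilterPushDown {
    // Returns the set of tables whose rows flow out of a subtree.
    static Set<String> tables(Rel rel) {
        if (rel instanceof TableScanRel scan) {
            return Set.of(scan.table());
        }
        if (rel instanceof JoinRel join) {
            Set<String> result = new HashSet<>(tables(join.left()));
            result.addAll(tables(join.right()));
            return result;
        }
        if (rel instanceof FilterRel filter) {
            return tables(filter.input());
        }
        throw new IllegalArgumentException("Unknown node: " + rel);
    }

    // Filter(Join(L, R)) becomes Join(Filter(L), R) when the predicate only references L,
    // or Join(L, Filter(R)) when it only references R; any other plan is returned unchanged.
    static Rel apply(Rel rel) {
        if (rel instanceof FilterRel filter && filter.input() instanceof JoinRel join) {
            TablePredicate p = filter.predicate();
            if (tables(join.left()).contains(p.table())) {
                return new JoinRel(new FilterRel(join.left(), p), join.right(), join.condition());
            }
            if (tables(join.right()).contains(p.table())) {
                return new JoinRel(join.left(), new FilterRel(join.right(), p), join.condition());
            }
        }
        return rel;
    }

    public static void main(String[] args) {
        Rel join = new JoinRel(new TableScanRel("emps"), new TableScanRel("depts"), "emps.deptno = depts.deptno");
        Rel before = new FilterRel(join, new TablePredicate("depts", "depts.name = 'Sales'"));
        System.out.println("before: " + before);
        System.out.println("after:  " + apply(before));
    }
}
```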
Key Concepts
• Relational algebra - RelNode
• Row expressions - RexNode
• Traits - RelTrait
• Conventions - Convention
• Rules - RelOptRule
• Planners - RelOptPlanner
• Programs - Program
Relational Algebra
• RelNode represents a relational expression
• Largely equivalent to Spark’s DataFrame
methods
• Logical algebra
• Physical algebra
Relational Algebra
• Logical operators: TableScan, Project, Filter, Aggregate, Join, Union, Intersect, Sort
• Physical counterparts in the Spark convention: SparkTableScan, SparkProject, SparkFilter, SparkAggregate, SparkJoin, SparkUnion, SparkIntersect, SparkSort
Row Expressions
• RexNode represents a row-level expression
• Largely equivalent to Spark’s Column
functions
• Projection fields
• Filter condition
• Join condition
• Sort fields
Row Expressions
• Input column ref - RexInputRef
• Literal - RexLiteral
• Struct field access - RexFieldAccess
• Function call - RexCall
• Window expression - RexOver
[Figure: example row expression with an input ref and a function call highlighted]
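Row expressions are small trees evaluated once per row. Below is a toy sketch with hypothetical `RowExpr` records whose names echo `RexInputRef`, `RexLiteral` and `RexCall`; they are not Calcite's classes. An input reference reads a field by its position in the input row, as `RexInputRef` does. A call applies an operator to its evaluated operands.

```java
import java.util.List;

// Hypothetical row expression: evaluated against one input row at a time.
interface RowExpr {
    Object eval(Object[] row);
}

// Like an input reference: reads the field at a zero-based position of the input row.
record InputRefExpr(int index) implements RowExpr {
    public Object eval(Object[] row) {
        return row[index];
    }
}

// A constant value.
record LiteralExpr(Object value) implements RowExpr {
    public Object eval(Object[] row) {
        return value;
    }
}

// A call to a binary operator; only +, > and AND on Integer/Boolean values are supported.
record CallExpr(String op, List<RowExpr> operands) implements RowExpr {
    public Object eval(Object[] row) {
        if (operands.size() != 2) {
            throw new IllegalArgumentException(op + " expects two operands");
        }
        Object a = operands.get(0).eval(row);
        Object b = operands.get(1).eval(row);
        switch (op) {
            case "+":
                return (Integer) a + (Integer) b;
            case ">":
                return (Integer) a > (Integer) b;
            case "AND":
                return (Boolean) a && (Boolean) b;
            default:
                throw new IllegalArgumentException("Unsupported operator: " + op);
        }
    }
}

class RowExprDemo {
    public static void main(String[] args) {
        Object[] row = {7, 150}; // e.g. (deptno, salary)
        RowExpr projection = new CallExpr("+", List.of(new InputRefExpr(0), new LiteralExpr(10)));
        RowExpr condition = new CallExpr(">", List.of(new InputRefExpr(1), new LiteralExpr(100)));
        System.out.println(projection.eval(row)); // 17
        System.out.println(condition.eval(row));  // true
    }
}
```

The same tree shape serves as a projection field, a filter condition or a join condition. Only the place where the operator evaluates it differs.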
Traits
• Defined by the RelTrait interface
• Represent a trait of a relational expression
that does not alter execution
• Traits are used to validate plan output
• Three primary trait types:
• Convention
• RelCollation
• RelDistribution
Conventions
• Convention is a type of RelTrait
• A Convention is associated with a
RelNode interface
• SparkConvention, JdbcConvention,
EnumerableConvention, etc.
• Conventions are used to represent a single
data source
• Inputs to a relational expression must be in
the same convention
[Figure: plan split between a Spark convention and a JDBC convention, connected by a converter]
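Because the inputs of an expression must share its convention, a plan that mixes engines needs a converter at each boundary. The sketch below uses a hypothetical `PhysNode` record with plain string convention names instead of Calcite's `Convention` traits. It walks a plan and wraps every input whose convention differs from its parent's in a converter node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical physical node tagged with the convention (engine) that executes it.
record PhysNode(String op, String convention, List<PhysNode> inputs) {
    static PhysNode leaf(String op, String convention) {
        return new PhysNode(op, convention, List.of());
    }
}

class ConverterInserter {
    // Returns a copy of the tree in which every input whose convention differs from its
    // parent's is wrapped in a converter node that produces the parent's convention.
    static PhysNode insertConverters(PhysNode node) {
        List<PhysNode> fixed = new ArrayList<>();
        for (PhysNode input : node.inputs()) {
            PhysNode child = insertConverters(input);
            if (!child.convention().equals(node.convention())) {
                child = new PhysNode("Converter(" + child.convention() + " -> " + node.convention() + ")",
                        node.convention(), List.of(child));
            }
            fixed.add(child);
        }
        return new PhysNode(node.op(), node.convention(), List.copyOf(fixed));
    }

    public static void main(String[] args) {
        PhysNode plan = new PhysNode("SparkJoin", "SPARK", List.of(
                PhysNode.leaf("SparkTableScan", "SPARK"),
                PhysNode.leaf("JdbcTableScan", "JDBC")));
        System.out.println(insertConverters(plan));
    }
}
```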
Rules
• Rules are used to modify query plans
• Defined by the RelOptRule abstract class
• Two types of rules: converters and transformers
• Converter rules extend ConverterRule and convert from one convention to another
• Rules are matched to elements of a query plan using pattern matching
• onMatch is called for matched rules
• Converter rules are applied via their convert method
Converter Rule
[Figure: ConverterRule example annotated with the expression type, input convention, converted convention, and converter function]
Pattern Matching
[Figure: a rule's operand pattern checked against several plan fragments; two do not match and one matches]
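A rule's pattern is itself a small tree of operands, for example "a Filter whose input is a Join", matched against each node of the plan. Below is a minimal sketch with hypothetical `Operand` and `OpNode` records; it is not Calcite's operand API.

An operand matches a node of the same kind whose inputs match the child operands in order. An operand with no children accepts any inputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plan node: an operator kind and its inputs.
record OpNode(String kind, List<OpNode> inputs) {
    static OpNode of(String kind, OpNode... inputs) {
        return new OpNode(kind, List.of(inputs));
    }
}

// Hypothetical operand: matches a node of the given kind whose inputs match the child
// operands positionally; an empty child list accepts any inputs.
record Operand(String kind, List<Operand> children) {
    static Operand of(String kind, Operand... children) {
        return new Operand(kind, List.of(children));
    }

    boolean matches(OpNode node) {
        if (!node.kind().equals(kind)) {
            return false;
        }
        if (children.isEmpty()) {
            return true;
        }
        if (children.size() != node.inputs().size()) {
            return false;
        }
        for (int i = 0; i < children.size(); i++) {
            if (!children.get(i).matches(node.inputs().get(i))) {
                return false;
            }
        }
        return true;
    }
}

class PatternDemo {
    // Collects every node of the tree (pre-order) that the operand matches.
    static List<OpNode> findMatches(Operand operand, OpNode root) {
        List<OpNode> result = new ArrayList<>();
        collect(operand, root, result);
        return result;
    }

    private static void collect(Operand operand, OpNode node, List<OpNode> out) {
        if (operand.matches(node)) {
            out.add(node);
        }
        for (OpNode input : node.inputs()) {
            collect(operand, input, out);
        }
    }

    public static void main(String[] args) {
        Operand filterOnJoin = Operand.of("Filter", Operand.of("Join"));
        OpNode plan = OpNode.of("Project",
                OpNode.of("Filter", OpNode.of("Join", OpNode.of("Scan"), OpNode.of("Scan"))));
        System.out.println(findMatches(filterOnJoin, plan).size()); // 1
    }
}
```

In a planner, each match found this way is handed to the rule's onMatch logic, which decides whether and how to rewrite that fragment.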
Planners
• Planners implement the RelOptPlanner
interface
• Two types of planners:
• HepPlanner
• VolcanoPlanner
Heuristic Optimization
• HepPlanner is a heuristic optimizer similar
to Spark’s optimizer
• Applies all matching rules until none can be
applied
• Heuristic optimization is faster than cost-
based optimization
• Risk of infinite recursion if rules make
opposing changes to the plan
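The heuristic strategy boils down to applying rules until nothing changes. In the toy below, plans are plain strings and rules are hypothetical `UnaryOperator<String>` rewrites. It models the idea, not `HepPlanner`'s API. The pass limit is what turns the endless loop caused by two opposing rules into an error.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model of heuristic optimization: plans are strings, rules are string rewrites.
class ToyHeuristicPlanner {
    private final List<UnaryOperator<String>> rules;
    private final int maxPasses;

    ToyHeuristicPlanner(List<UnaryOperator<String>> rules, int maxPasses) {
        this.rules = List.copyOf(rules);
        this.maxPasses = maxPasses;
    }

    // Applies every rule in order, pass after pass, until a whole pass changes nothing.
    // The pass limit stops rules that keep undoing each other from looping forever.
    String optimize(String plan) {
        for (int pass = 0; pass < maxPasses; pass++) {
            boolean changed = false;
            for (UnaryOperator<String> rule : rules) {
                String rewritten = rule.apply(plan);
                if (!rewritten.equals(plan)) {
                    changed = true;
                    plan = rewritten;
                }
            }
            if (!changed) {
                return plan;
            }
        }
        throw new IllegalStateException("No fixpoint after " + maxPasses + " passes");
    }

    public static void main(String[] args) {
        // Hypothetical rule: move a Filter beneath the Project directly under it.
        UnaryOperator<String> pushFilter = p -> p.replace("Filter(Project(", "Project(Filter(");
        ToyHeuristicPlanner planner = new ToyHeuristicPlanner(List.of(pushFilter), 10);
        System.out.println(planner.optimize("Filter(Project(Filter(Project(Scan))))"));
    }
}
```

Suppose a second rule rewrites `Project(Filter(` back to `Filter(Project(`. Then every pass changes the plan, and `optimize` throws once the limit is reached.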
Cost-based Optimization
• VolcanoPlanner is a cost-based
optimizer
• Applies matching rules iteratively, selecting
the plan with the cheapest cost on each
iteration
• Costs are provided by relational expressions
• Not all possible plans can be computed
• Stops optimization when the cost has not
significantly improved over a set number of
iterations
Cost-based Optimization
• Cost is provided by each RelNode
• Cost is represented by RelOptCost
• Cost typically includes row count, I/O, and
CPU cost
• Cost estimates are relative
• Statistics are used to improve accuracy of
cost estimations
• Calcite provides utilities for computing
various resource-related statistics for use in
cost estimations
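Choosing the cheapest plan from statistics can be illustrated with three tables and their left-deep join orders. The row counts, the 1% join selectivity and the cost formula are all made up for this sketch. Each scan costs its row count, and each join adds the rows it reads from both inputs plus the rows it produces. Calcite's `RelOptCost` also distinguishes CPU and I/O.

Enumerating every order only works here because there are three tables. The Volcano planner instead explores equivalent plans incrementally, since not every plan can be computed.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical plan summary: a description, its estimated row count, and cumulative cost.
record Costed(String description, long rows, long cost) {}

class ToyCostModel {
    // Hypothetical table statistics: row counts per table.
    static final Map<String, Long> ROW_COUNTS = Map.of("A", 1000L, "B", 100L, "C", 10L);

    static Costed scan(String table) {
        long rows = ROW_COUNTS.get(table);
        return new Costed(table, rows, rows); // reading a table costs one unit per row
    }

    // Assumes every join keeps 1% of the cross product (a made-up selectivity).
    static Costed join(Costed left, Costed right) {
        long rows = left.rows() * right.rows() / 100;
        long cost = left.cost() + right.cost() + left.rows() + right.rows() + rows;
        return new Costed("(" + left.description() + " JOIN " + right.description() + ")", rows, cost);
    }

    // Enumerates the three left-deep join orders of A, B and C and keeps the cheapest.
    static Costed cheapest() {
        List<Costed> candidates = List.of(
                join(join(scan("A"), scan("B")), scan("C")),
                join(join(scan("A"), scan("C")), scan("B")),
                join(join(scan("B"), scan("C")), scan("A")));
        return candidates.stream().min(Comparator.comparingLong(Costed::cost)).orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(cheapest());
    }
}
```

With these numbers, joining the two small tables first wins (cost 2340, versus 2520 and 4320) because it keeps the intermediate result at 10 rows.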
PUTTING IT ALL TOGETHER
><INTRODUCTION TO APACHE CALCITE 100
Putting it all together
><INTRODUCTION TO APACHE CALCITE 101
Putting it all together
><INTRODUCTION TO APACHE CALCITE 102
Putting it all together
><INTRODUCTION TO APACHE CALCITE 103
Putting it all together
><INTRODUCTION TO APACHE CALCITE 104
Putting it all together
><INTRODUCTION TO APACHE CALCITE 105
Putting it all together
><INTRODUCTION TO APACHE CALCITE 106
Putting it all together
><INTRODUCTION TO APACHE CALCITE 107
Putting it all together
><INTRODUCTION TO APACHE CALCITE 108
Putting it all together
><INTRODUCTION TO APACHE CALCITE 109
Putting it all together

More Related Content

What's hot

What's hot (20)

Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle Service
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can help
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®
 

Similar to Introduction to Apache Calcite

ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
Evans Ye
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC Project
Saltlux Inc.
 

Similar to Introduction to Apache Calcite (20)

Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
Accelerating SQL queries in NoSQL Databases using Apache Drill and Secondary ...
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Apache HAWQ Architecture
Apache HAWQ ArchitectureApache HAWQ Architecture
Apache HAWQ Architecture
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
 
PL/SQL Tips and Techniques Webinar Presentation
PL/SQL Tips and Techniques Webinar PresentationPL/SQL Tips and Techniques Webinar Presentation
PL/SQL Tips and Techniques Webinar Presentation
 
KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & Spark
 
Dan Hotka's Top 10 Oracle 12c New Features
Dan Hotka's Top 10 Oracle 12c New FeaturesDan Hotka's Top 10 Oracle 12c New Features
Dan Hotka's Top 10 Oracle 12c New Features
 
What's new in Apache Spark 2.4
What's new in Apache Spark 2.4What's new in Apache Spark 2.4
What's new in Apache Spark 2.4
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC Project
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql
 
Data Science at Scale with Apache Spark and Zeppelin Notebook
Data Science at Scale with Apache Spark and Zeppelin NotebookData Science at Scale with Apache Spark and Zeppelin Notebook
Data Science at Scale with Apache Spark and Zeppelin Notebook
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
 
Flink's SQL Engine: Let's Open the Engine Room!
Flink's SQL Engine: Let's Open the Engine Room!Flink's SQL Engine: Let's Open the Engine Room!
Flink's SQL Engine: Let's Open the Engine Room!
 
Pydata london meetup - RiakTS, PySpark and Python by Stephen Etheridge
Pydata london meetup - RiakTS, PySpark and Python by Stephen EtheridgePydata london meetup - RiakTS, PySpark and Python by Stephen Etheridge
Pydata london meetup - RiakTS, PySpark and Python by Stephen Etheridge
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
 

Recently uploaded

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 

Recently uploaded (20)

%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 

Introduction to Apache Calcite

  • 1. >< INTRODUCTIONTO INTRODUCTION TO APACHE CALCITE APACHE CALCITE JORDAN HALTERMAN 1
  • 3. ><INTRODUCTION TO APACHE CALCITE 3 What is Apache Calcite? • A framework for building SQL databases • Developed over more than ten years • Written in Java • Previously known as Optiq • Previously known as Farrago • Became an Apache project in 2013 • Led by Julian Hyde at Hortonworks
  • 4. ><INTRODUCTION TO APACHE CALCITE 4 Projects using Calcite • Apache Hive • Apache Drill • Apache Flink • Apache Phoenix • Apache Samza • Apache Storm • Apache everything…
  • 5. ><INTRODUCTION TO APACHE CALCITE 5 What is Apache Calcite? • SQL parser • SQL validation • Query optimizer • SQL generator • Data federator
  • 6. ><INTRODUCTION TO APACHE CALCITE Parse Queries are parsed using a JavaCC generated parser Validate Queries are validated against known database metadata Optimize Logical plans are optimized and converted into physical expressions Execute P h y s i c a l p l a n s a r e converted into application- specific executions 01 02 03 04 Stages of query execution 6
  • 8. ><INTRODUCTION TO APACHE CALCITE 8 Components of Calcite • Catalog - Defines metadata and namespaces that can be accessed in SQL queries • SQL parser - Parses valid SQL queries into an abstract syntax tree (AST) • SQL validator - Validates abstract syntax trees against metadata provided by the catalog • Query optimizer - Converts AST into logical plans, optimizes logical plans, and converts logical expressions into physical plans • SQL generator - Converts physical plans to SQL
  • 10. ><INTRODUCTION TO APACHE CALCITE 10 Calcite Catalog • Defines namespaces that can be accessed in Calcite queries • Schema • A collection of schemas and tables • Can be arbitrarily nested • Table • Represents a single data set • Fields defined by a RelDataType • RelDataType • Represents fields in a data set • Supports all SQL data types, including structs and
  • 11. ><INTRODUCTION TO APACHE CALCITE 11 Schema • A collection of schemas and tables • Schemas can be arbitrarily nested
  • 12. ><INTRODUCTION TO APACHE CALCITE 12 Schema • A collection of schemas and tables • Schemas can be arbitrarily nested
  • 13. ><INTRODUCTION TO APACHE CALCITE 13 Table • Represents a single data set • Fields are defined by a RelDataType
  • 14. ><INTRODUCTION TO APACHE CALCITE 14 Table • Represents a single data set • Fields are defined by a RelDataType
  • 15. ><INTRODUCTION TO APACHE CALCITE 15 RelDataType • Represents the data type of an object • Supports all SQL data types, including structs and arrays • Similar to Spark’s DataType
  • 16. ><INTRODUCTION TO APACHE CALCITE 16 RelDataType
  • 17. ><INTRODUCTION TO APACHE CALCITE 17 RelDataType data type enum
  • 18. ><INTRODUCTION TO APACHE CALCITE 18 Statistic • Provide table statistics used in optimization
  • 19. ><INTRODUCTION TO APACHE CALCITE 19 Statistic • Provide table statistics used in optimization
  • 20. ><INTRODUCTION TO APACHE CALCITE 20 Usage of the Calcite catalog
  • 21. ><INTRODUCTION TO APACHE CALCITE 21 Usage of the Calcite catalog schema
  • 22. ><INTRODUCTION TO APACHE CALCITE 22 Usage of the Calcite catalog schema table
  • 23. ><INTRODUCTION TO APACHE CALCITE 23 Usage of the Calcite catalog schema table data type
  • 24. ><INTRODUCTION TO APACHE CALCITE 24 Usage of the Calcite catalog schema table data typedata type field
  • 26. ><INTRODUCTION TO APACHE CALCITE 26 Calcite SQL parser • LL(k) parser written in JavaCC • Input queries are parsed into an abstract syntax tree (AST) • Tokens are represented in Calcite by SqlNode • SqlNode can also be converted back to a SQL string via the unparse method
  • 27. ><INTRODUCTION TO APACHE CALCITE 27 JavaCC • Java Compiler Compiler • Created in 1996 at Sun Microsystems • Generates Java code from a domain- specific language • ANTLR is the modern alternative used in projects like Hive and Drill • JavaCC has sparse documentation
  • 28-33. JavaCC (figure; callouts: tokens, "or", function call, Java code)
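JavaCC turns each grammar production into a Java method. Where a production offers alternatives (the "or" in the grammar), the generated code picks one by peeking at upcoming tokens, and JavaCC's `LOOKAHEAD(2)` directive asks for two tokens of lookahead. The hand-written sketch below imitates that structure for a hypothetical grammar in which a name followed by `(` starts a function call. It is neither Calcite's grammar nor actual JavaCC output.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of what a JavaCC-style LL(k) parser does; the grammar is hypothetical:
//   Expression ::= <IDENTIFIER> "(" [ Expression ( "," Expression )* ] ")"   (function call)
//                | <IDENTIFIER> | <NUMBER>
class LlParserDemo {
    private final List<String> tokens;
    private int pos = 0;

    private LlParserDemo(String input) {
        this.tokens = tokenize(input);
    }

    /** Parses a whole input string and rejects trailing tokens. */
    static String parse(String input) {
        LlParserDemo parser = new LlParserDemo(input);
        String tree = parser.expression();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalStateException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return tree;
    }

    /** One method per production; the alternative is chosen with two tokens of lookahead. */
    private String expression() {
        if (isName(peek(0)) && "(".equals(peek(1))) { // like LOOKAHEAD(2): a name followed by "("
            return functionCall();
        }
        return atom();
    }

    private String functionCall() {
        String name = tokens.get(pos++);
        expect("(");
        List<String> args = new ArrayList<>();
        if (!")".equals(peek(0))) {
            args.add(expression());
            while (",".equals(peek(0))) {
                pos++; // consume ","
                args.add(expression());
            }
        }
        expect(")");
        return "CALL " + name + args;
    }

    private String atom() {
        String token = peek(0);
        if (isName(token)) {
            pos++;
            return "ID " + token;
        }
        if (token != null && Character.isDigit(token.charAt(0))) {
            pos++;
            return "NUM " + token;
        }
        throw new IllegalStateException("Expected identifier or number but found " + token);
    }

    private String peek(int k) {
        return pos + k < tokens.size() ? tokens.get(pos + k) : null;
    }

    private void expect(String expected) {
        if (!expected.equals(peek(0))) {
            throw new IllegalStateException("Expected " + expected + " but found " + peek(0));
        }
        pos++;
    }

    private static boolean isName(String token) {
        return token != null && (Character.isLetter(token.charAt(0)) || token.charAt(0) == '_');
    }

    /** Splits input into names, numbers and the punctuation "(", ")" and ",". */
    private static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(' || c == ')' || c == ',') {
                out.add(String.valueOf(c));
                i++;
            } else if (Character.isLetterOrDigit(c) || c == '_') {
                int start = i;
                while (i < input.length() && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                out.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("upper(trim(name), 3)")); // CALL upper[CALL trim[ID name], NUM 3]
    }
}
```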
  • 34-40. SqlNode
    • SqlNode represents an element in an abstract syntax tree
    • (Figure: an example AST; callouts: select, identifiers, operator, identifier, data type, identifier)
  • 41-43. SqlNode
    • SqlNode’s unparse method converts a SQL element back into a string
    • (figures)
  • 44-45. SqlNode
    • SqlDialect indicates the capitalization and quoting rules of specific databases
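The SqlNode slides show a query as a tree of nodes that can be turned back into SQL text, with a dialect deciding how identifiers are quoted and cased. The following is a minimal sketch with hypothetical classes (`Node`, `Identifier`, `Call`, `Select`, `Dialect`), not Calcite's SqlNode/SqlDialect API. The same tree unparses differently under two made-up dialects: an ANSI-like one (double quotes, upper case) and a MySQL-like one (backticks, case kept).

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical mini-AST echoing the roles of SqlNode and SqlDialect; not Calcite's classes.
class UnparseDemo {

    /** Simplified dialect: how identifiers are quoted and cased. */
    record Dialect(String name, char quoteOpen, char quoteClose, boolean upperCaseIdentifiers) {
        String quote(String identifier) {
            String cased = upperCaseIdentifiers ? identifier.toUpperCase(Locale.ROOT) : identifier;
            return quoteOpen + cased + quoteClose;
        }
    }

    /** Every AST element can render itself as SQL text for a given dialect. */
    interface Node {
        String unparse(Dialect dialect);
    }

    record Identifier(String name) implements Node {
        public String unparse(Dialect d) { return d.quote(name); }
    }

    record Literal(long value) implements Node {
        public String unparse(Dialect d) { return Long.toString(value); }
    }

    /** A binary operator call such as a > b; operands are themselves nodes. */
    record Call(String operator, Node left, Node right) implements Node {
        public String unparse(Dialect d) {
            return left.unparse(d) + " " + operator + " " + right.unparse(d);
        }
    }

    record Select(List<Node> selectList, Identifier from, Node where) implements Node {
        public String unparse(Dialect d) {
            String columns = selectList.stream().map(n -> n.unparse(d)).collect(Collectors.joining(", "));
            String sql = "SELECT " + columns + " FROM " + from.unparse(d);
            return where == null ? sql : sql + " WHERE " + where.unparse(d);
        }
    }

    static final Dialect ANSI_LIKE = new Dialect("ansi-like", '"', '"', true);
    static final Dialect MYSQL_LIKE = new Dialect("mysql-like", '`', '`', false);

    public static void main(String[] args) {
        Node query = new Select(
                List.of(new Identifier("id"), new Identifier("name")),
                new Identifier("users"),
                new Call(">", new Identifier("age"), new Literal(21)));
        System.out.println(query.unparse(ANSI_LIKE));  // SELECT "ID", "NAME" FROM "USERS" WHERE "AGE" > 21
        System.out.println(query.unparse(MYSQL_LIKE)); // SELECT `id`, `name` FROM `users` WHERE `age` > 21
    }
}
```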
  • 47-53. Query Plans
    • Query plans represent the steps necessary to execute a query
    • (Figure: a plan built up step by step; callouts: table scan, table scan, inner join, filter, project)
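The Query Plans slides build a plan from the bottom up: two table scans feed an inner join, a filter sits on the join, and a projection sits on top. The hypothetical `PlanNode` record below encodes that tree. It prints the root first with each input indented beneath it, similar in spirit to EXPLAIN output. The query, tables, and columns are invented for illustration.

```java
import java.util.List;

// Hypothetical plan-tree model; operator names mirror the slides, not Calcite's RelNode classes.
class PlanTreeDemo {

    /** One step of a query plan, with the steps it consumes as inputs. */
    record PlanNode(String operator, String detail, List<PlanNode> inputs) {
        static PlanNode scan(String table) {
            return new PlanNode("TableScan", "table=" + table, List.of());
        }

        /** Renders the plan root first, indenting each input one level deeper. */
        String explain() {
            StringBuilder sb = new StringBuilder();
            explain(sb, 0);
            return sb.toString();
        }

        private void explain(StringBuilder sb, int depth) {
            sb.append("  ".repeat(depth)).append(operator).append('(').append(detail).append(")\n");
            for (PlanNode input : inputs) {
                input.explain(sb, depth + 1);
            }
        }
    }

    public static void main(String[] args) {
        // SELECT o.id FROM orders o JOIN customers c ON o.cust_id = c.id WHERE c.country = 'NL'
        PlanNode plan = new PlanNode("Project", "o.id", List.of(
                new PlanNode("Filter", "c.country = 'NL'", List.of(
                        new PlanNode("Join", "inner, o.cust_id = c.id", List.of(
                                PlanNode.scan("orders"),
                                PlanNode.scan("customers")))))));
        System.out.print(plan.explain());
    }
}
```

Data flows from the scans at the bottom up to the projection at the root, which matches the order in which the slides add each step.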
  • 54. Query Optimization
    • Optimize the logical plan
      • The goal is typically to reduce the amount of data that must be processed early in the plan
    • Convert the logical plan into a physical plan
      • The physical plan is engine-specific and represents the physical execution stages
  • 55. Query Optimization
    • Prune unused fields
    • Merge projections
    • Convert subqueries to joins
    • Reorder joins
    • Push down projections
    • Push down filters
  • 56-61. Query Optimization (figures; callouts: push down project, push down filter)
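Pushing a filter down means evaluating a predicate before a join instead of after it, so the join processes fewer rows. This is a minimal sketch built on hypothetical `Scan`/`Filter`/`Join` records, with each predicate tagged by the set of tables it references. If every referenced table comes from one side of the join, the filter moves onto that side. The rewrite is only safe for inner joins, because on the null-producing side of an outer join it can change the result.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of "push down filter" through an inner join; not Calcite's rule classes.
class FilterPushDownDemo {

    interface Rel {}

    record Scan(String table) implements Rel {}

    /** A predicate plus the set of tables whose columns it references. */
    record Filter(String predicate, Set<String> tables, Rel input) implements Rel {}

    /** Assumed to be an inner join. */
    record Join(String condition, Rel left, Rel right) implements Rel {}

    /** Tables whose rows flow out of a subtree. */
    static Set<String> tablesOf(Rel rel) {
        Set<String> result = new HashSet<>();
        if (rel instanceof Scan s) {
            result.add(s.table());
        } else if (rel instanceof Filter f) {
            result.addAll(tablesOf(f.input()));
        } else if (rel instanceof Join j) {
            result.addAll(tablesOf(j.left()));
            result.addAll(tablesOf(j.right()));
        }
        return result;
    }

    /** Rewrites Filter(Join(L, R)) so the filter sits on the side it references, if any. */
    static Rel pushDown(Rel rel) {
        if (rel instanceof Filter f && f.input() instanceof Join j) {
            if (tablesOf(j.left()).containsAll(f.tables())) {
                return new Join(j.condition(), new Filter(f.predicate(), f.tables(), j.left()), j.right());
            }
            if (tablesOf(j.right()).containsAll(f.tables())) {
                return new Join(j.condition(), j.left(), new Filter(f.predicate(), f.tables(), j.right()));
            }
        }
        return rel; // pattern absent or predicate spans both sides: leave the plan unchanged
    }

    public static void main(String[] args) {
        Rel plan = new Filter("c.country = 'NL'", Set.of("customers"),
                new Join("o.cust_id = c.id", new Scan("orders"), new Scan("customers")));
        // The Join becomes the root and the Filter now sits directly on Scan[table=customers]
        System.out.println(pushDown(plan));
    }
}
```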
  • 62-63. Key Concepts
    • Relational algebra: RelNode
    • Row expressions: RexNode
    • Traits: RelTrait
    • Conventions: Convention
    • Rules: RelOptRule
    • Planners: RelOptPlanner
    • Programs: Program
  • 64. Relational Algebra
    • RelNode represents a relational expression
    • Largely equivalent to Spark’s DataFrame methods
    • Logical algebra
    • Physical algebra
  • 65-66. Relational Algebra
    • Logical operators: TableScan, Project, Filter, Aggregate, Join, Union, Intersect, Sort
    • Physical operators (Spark): SparkTableScan, SparkProject, SparkFilter, SparkAggregate, SparkJoin, SparkUnion, SparkIntersect, SparkSort
  • 67. Row Expressions
    • RexNode represents a row-level expression
    • Largely equivalent to Spark’s Column functions
    • Used in:
      • Projection fields
      • Filter condition
      • Join condition
      • Sort fields
  • 68-69. Row Expressions
    • Input column ref: RexInputRef
    • Literal: RexLiteral
    • Struct field access: RexFieldAccess
    • Function call: RexCall
    • Window expression: RexOver
  • 70-72. Row Expressions (figures; callouts: input ref, function call)
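Row expressions are small trees evaluated once per row. An input reference picks a field by position, a literal is a constant, and a call applies an operator to its operands. The sketch below uses hypothetical `InputRef`, `Literal`, and `Call` records that loosely mirror RexInputRef, RexLiteral, and RexCall. Input references print as `$index`, the style Calcite uses in plan output. Only three operators are supported, and this is not Calcite's Rex API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical mini row-expression tree, loosely following RexInputRef / RexLiteral / RexCall.
class RowExpressionDemo {

    interface Expr {
        Object eval(Object[] row);
    }

    /** Reference to the field at a position in the input row, printed as $index. */
    record InputRef(int index) implements Expr {
        public Object eval(Object[] row) { return row[index]; }
        public String toString() { return "$" + index; }
    }

    record Literal(Object value) implements Expr {
        public Object eval(Object[] row) { return value; }
        public String toString() { return String.valueOf(value); }
    }

    /** A call to a named operator on operand expressions. */
    record Call(String op, List<Expr> operands) implements Expr {
        public Object eval(Object[] row) {
            List<Object> args = operands.stream().map(e -> e.eval(row)).collect(Collectors.toList());
            return switch (op) {
                case ">" -> (Integer) args.get(0) > (Integer) args.get(1);
                case "+" -> (Integer) args.get(0) + (Integer) args.get(1);
                case "AND" -> (Boolean) args.get(0) && (Boolean) args.get(1);
                default -> throw new IllegalArgumentException("Unknown operator " + op);
            };
        }

        public String toString() {
            return op + operands.stream().map(Object::toString).collect(Collectors.joining(", ", "(", ")"));
        }
    }

    public static void main(String[] args) {
        // Filter condition for "age > 21", where age is field 1 of the input row
        Expr condition = new Call(">", List.of(new InputRef(1), new Literal(21)));
        System.out.println(condition);                                   // >($1, 21)
        System.out.println(condition.eval(new Object[] {"alice", 30})); // true
        System.out.println(condition.eval(new Object[] {"bob", 17}));   // false
    }
}
```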
  • 73. Traits
    • Defined by the RelTrait interface
    • Represent a trait of a relational expression that does not alter execution
    • Traits are used to validate plan output
    • Three primary trait types:
      • Convention
      • RelCollation
      • RelDistribution
  • 74. Conventions
    • Convention is a type of RelTrait
    • A Convention is associated with a RelNode interface
      • SparkConvention, JdbcConvention, EnumerableConvention, etc.
    • Conventions are used to represent a single data source
    • Inputs to a relational expression must be in the same convention
  • 75-78. Conventions (figures; callouts: Spark convention, JDBC convention, converter)
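The inputs of a relational expression must share its convention, so a plan that mixes data sources needs converter nodes at the boundaries. This minimal sketch uses a hypothetical `Node` record and a two-value `Convention` enum. `enforce` walks the tree and wraps any input whose convention differs from its parent's in a converter, as in the Spark/JDBC figure. Calcite itself handles this through its planner and converter rules, so the standalone pass is only an illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: conventions as tags on plan nodes, with converters inserted at boundaries.
class ConventionDemo {

    enum Convention {
        JDBC("Jdbc"), SPARK("Spark");

        final String label;

        Convention(String label) { this.label = label; }
    }

    record Node(String op, Convention convention, List<Node> inputs) {}

    /** Wraps every input whose convention differs from its parent's in a converter node. */
    static Node enforce(Node node) {
        if (node.op().endsWith("Converter")) {
            return node; // converters bridge two conventions by design; their input is already enforced
        }
        List<Node> fixedInputs = new ArrayList<>();
        for (Node input : node.inputs()) {
            Node child = enforce(input);
            if (child.convention() != node.convention()) {
                String name = child.convention().label + "To" + node.convention().label + "Converter";
                child = new Node(name, node.convention(), List.of(child));
            }
            fixedInputs.add(child);
        }
        return new Node(node.op(), node.convention(), fixedInputs);
    }

    static String explain(Node node, String indent) {
        StringBuilder sb = new StringBuilder(indent + node.op() + " [" + node.convention() + "]\n");
        for (Node input : node.inputs()) {
            sb.append(explain(input, indent + "  "));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node plan = new Node("Join", Convention.SPARK, List.of(
                new Node("TableScan(events)", Convention.SPARK, List.of()),
                new Node("TableScan(users)", Convention.JDBC, List.of())));
        System.out.print(explain(enforce(plan), ""));
    }
}
```

Running it places the JDBC scan under a `JdbcToSparkConverter`, while the Spark scan is left untouched.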
  • 79. Rules
    • Rules are used to modify query plans
    • Defined by the RelOptRule class
    • Two types of rules: converters and transformers
    • Converter rules extend ConverterRule and convert from one convention to another
    • Rules are matched to elements of a query plan using pattern matching
    • onMatch is called for matched rules
    • Converter rules are applied via convert
  • 80-84. Converter Rule (figure; callouts: expression type, input convention, converted convention, converter function)
  • 85-89. Pattern Matching (figures; callouts: "no match :-(", "match!")
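A rule declares the shape of plan fragment it applies to, and the planner only fires it where the plan matches that shape. The sketch below models this pattern matching with hypothetical `Operand` and `Rel` records, not Calcite's operand API. An operand names a node type (`*` for any) and may also give patterns for the node's inputs. `findMatches` lists every node where the whole pattern fits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of rule operand matching; not Calcite's operand API.
class PatternMatchDemo {

    record Rel(String type, List<Rel> inputs) {
        static Rel of(String type, Rel... inputs) { return new Rel(type, List.of(inputs)); }
    }

    /** A required node type ("*" matches any) plus patterns for the inputs (empty = unconstrained). */
    record Operand(String type, List<Operand> children) {
        static Operand of(String type, Operand... children) { return new Operand(type, List.of(children)); }

        boolean matches(Rel rel) {
            if (!type.equals("*") && !type.equals(rel.type())) {
                return false;
            }
            if (children.isEmpty()) {
                return true; // no constraints on the inputs
            }
            if (children.size() != rel.inputs().size()) {
                return false;
            }
            for (int i = 0; i < children.size(); i++) {
                if (!children.get(i).matches(rel.inputs().get(i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Returns every node of the plan (pre-order) at which the pattern matches. */
    static List<Rel> findMatches(Operand pattern, Rel root) {
        List<Rel> found = new ArrayList<>();
        collect(pattern, root, found);
        return found;
    }

    private static void collect(Operand pattern, Rel rel, List<Rel> found) {
        if (pattern.matches(rel)) {
            found.add(rel);
        }
        for (Rel input : rel.inputs()) {
            collect(pattern, input, found);
        }
    }

    public static void main(String[] args) {
        Rel plan = Rel.of("Project",
                Rel.of("Filter",
                        Rel.of("Join", Rel.of("TableScan"), Rel.of("TableScan"))));
        Operand filterOnJoin = Operand.of("Filter", Operand.of("Join"));
        Operand projectOnJoin = Operand.of("Project", Operand.of("Join"));
        System.out.println(findMatches(filterOnJoin, plan).size());  // 1: the Filter node
        System.out.println(findMatches(projectOnJoin, plan).size()); // 0: the Project sits on a Filter
    }
}
```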
  • 90. Planners
    • Planners implement the RelOptPlanner interface
    • Two types of planners:
      • HepPlanner
      • VolcanoPlanner
  • 91. Heuristic Optimization
    • HepPlanner is a heuristic optimizer similar to Spark’s optimizer
    • Applies all matching rules until none can be applied
    • Heuristic optimization is faster than cost-based optimization
    • Risk of an infinite loop if rules make opposing changes to the plan
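A heuristic planner keeps applying whichever rule matches until no rule matches. That is why two rules that undo each other never finish. In this minimal sketch, a plan is a hypothetical list of operator names (root first) and each rule rewrites one adjacent pair. `optimize` stops at a fixpoint. As an assumption of the sketch, it gives up after a fixed number of rule firings instead of looping forever. The rewrites ignore the conditions under which they would be semantically valid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical heuristic planner: apply rules until nothing changes, with a cap on rule firings.
class HeuristicPlannerDemo {

    /** Rewrites the first adjacent (top, below) pair of operators into the replacement operators. */
    record Rule(String name, String top, String below, List<String> replacement) {
        Optional<List<String>> apply(List<String> plan) {
            for (int i = 0; i + 1 < plan.size(); i++) {
                if (plan.get(i).equals(top) && plan.get(i + 1).equals(below)) {
                    List<String> result = new ArrayList<>(plan.subList(0, i));
                    result.addAll(replacement);
                    result.addAll(plan.subList(i + 2, plan.size()));
                    return Optional.of(result);
                }
            }
            return Optional.empty();
        }
    }

    /** Fires the first matching rule repeatedly until none match, or fails after maxFirings firings. */
    static List<String> optimize(List<String> plan, List<Rule> rules, int maxFirings) {
        List<String> current = plan;
        for (int fired = 0; ; fired++) {
            Optional<List<String>> next = firstMatch(current, rules);
            if (next.isEmpty()) {
                return current; // fixpoint: no rule matches any more
            }
            if (fired == maxFirings) {
                throw new IllegalStateException("No fixpoint after " + maxFirings + " rule firings");
            }
            current = next.get();
        }
    }

    private static Optional<List<String>> firstMatch(List<String> plan, List<Rule> rules) {
        for (Rule rule : rules) {
            Optional<List<String>> rewritten = rule.apply(plan);
            if (rewritten.isPresent()) {
                return rewritten;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Rule pushFilter = new Rule("FilterPastProject", "Filter", "Project", List.of("Project", "Filter"));
        Rule mergeProjects = new Rule("MergeProjects", "Project", "Project", List.of("Project"));
        System.out.println(optimize(List.of("Project", "Filter", "Project", "Scan"),
                List.of(pushFilter, mergeProjects), 100)); // [Project, Filter, Scan]

        // A rule that undoes pushFilter makes the two rules fight forever
        Rule pullFilter = new Rule("FilterAboveProject", "Project", "Filter", List.of("Filter", "Project"));
        try {
            optimize(List.of("Filter", "Project", "Scan"), List.of(pushFilter, pullFilter), 100);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```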
  • 92. Cost-based Optimization
    • VolcanoPlanner is a cost-based optimizer
    • Applies matching rules iteratively, selecting the plan with the cheapest cost on each iteration
    • Costs are provided by relational expressions
    • Not all possible plans can be computed
    • Stops optimizing when the cost does not improve significantly over a set number of iterations
  • 93. Cost-based Optimization
    • Cost is provided by each RelNode
    • Cost is represented by RelOptCost
    • Cost typically includes row count, I/O, and CPU cost
    • Cost estimates are relative
    • Statistics are used to improve the accuracy of cost estimates
    • Calcite provides utilities for computing various resource-related statistics for use in cost estimates
  • 94-98. Cost-based Optimization (figures)
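A cost-based planner compares equivalent plans and keeps the cheapest one. The sketch below shows only that final choice. For one logical join, it lists three physical alternatives, each with a hypothetical `Cost` of rows, CPU, and I/O derived from input row counts. The formulas, the assumed output size, and the weights in `total()` are invented for illustration and are not Calcite's cost model. The point is that the winner changes with the statistics.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of cost-based selection among equivalent plans; all cost formulas are made up.
class CostBasedDemo {

    /** Relative cost in the three dimensions the slides mention. */
    record Cost(double rows, double cpu, double io) {
        /** Collapses the cost vector into one number; the weights are an arbitrary assumption. */
        double total() { return rows + cpu + 4 * io; }
    }

    record Alternative(String name, Cost cost) {}

    static double log2(double x) { return Math.log(Math.max(x, 1)) / Math.log(2); }

    /** Three physical implementations of the same logical join, costed from input row counts. */
    static List<Alternative> joinAlternatives(double leftRows, double rightRows) {
        double outRows = Math.max(leftRows, rightRows); // assumed output size, e.g. a foreign-key join
        return List.of(
                new Alternative("NestedLoopJoin",
                        new Cost(outRows, leftRows * rightRows, 0)),
                new Alternative("HashJoin",
                        new Cost(outRows, leftRows + rightRows, Math.min(leftRows, rightRows) / 100)),
                new Alternative("SortMergeJoin",
                        new Cost(outRows, leftRows * log2(leftRows) + rightRows * log2(rightRows),
                                (leftRows + rightRows) / 100)));
    }

    /** Keeps the alternative with the lowest total cost. */
    static Alternative cheapest(List<Alternative> alternatives) {
        return alternatives.stream()
                .min(Comparator.comparingDouble(a -> a.cost().total()))
                .orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(cheapest(joinAlternatives(1_000_000, 10_000)).name()); // HashJoin
        System.out.println(cheapest(joinAlternatives(1, 1_000)).name());          // NestedLoopJoin
    }
}
```

With a large and a small input the hash join wins. With a single-row outer input the nested-loop join comes out cheapest.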
  • 100-109. Putting it all together (figures)