Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
><
INTRODUCTIONTO
INTRODUCTION TO APACHE CALCITE
APACHE CALCITE
JORDAN HALTERMAN
1
WHAT IS APACHE
CALCITE?
next 2
><INTRODUCTION TO APACHE CALCITE 3
What is Apache Calcite?
• A framework for building SQL databases
• Developed over more ...
><INTRODUCTION TO APACHE CALCITE 4
Projects using Calcite
• Apache Hive
• Apache Drill
• Apache Flink
• Apache Phoenix
• A...
><INTRODUCTION TO APACHE CALCITE 5
What is Apache Calcite?
• SQL parser
• SQL validation
• Query optimizer
• SQL generator...
><INTRODUCTION TO APACHE CALCITE
Parse
Queries are parsed using
a JavaCC generated
parser
Validate
Queries are validated
a...
COMPONENTS next 7
><INTRODUCTION TO APACHE CALCITE 8
Components of Calcite
• Catalog - Defines metadata and namespaces
that can be accessed i...
CATALOG next 9
><INTRODUCTION TO APACHE CALCITE 10
Calcite Catalog
• Defines namespaces that can be accessed in Calcite
queries
• Schema
•...
><INTRODUCTION TO APACHE CALCITE 11
Schema
• A collection of schemas and tables
• Schemas can be arbitrarily nested
><INTRODUCTION TO APACHE CALCITE 12
Schema
• A collection of schemas and tables
• Schemas can be arbitrarily nested
><INTRODUCTION TO APACHE CALCITE 13
Table
• Represents a single data set
• Fields are defined by a RelDataType
><INTRODUCTION TO APACHE CALCITE 14
Table
• Represents a single data set
• Fields are defined by a RelDataType
><INTRODUCTION TO APACHE CALCITE 15
RelDataType
• Represents the data type of an object
• Supports all SQL data types, inc...
><INTRODUCTION TO APACHE CALCITE 16
RelDataType
><INTRODUCTION TO APACHE CALCITE 17
RelDataType
data type enum
><INTRODUCTION TO APACHE CALCITE 18
Statistic
• Provide table statistics used in optimization
><INTRODUCTION TO APACHE CALCITE 19
Statistic
• Provide table statistics used in optimization
><INTRODUCTION TO APACHE CALCITE 20
Usage of the Calcite catalog
><INTRODUCTION TO APACHE CALCITE 21
Usage of the Calcite catalog
schema
><INTRODUCTION TO APACHE CALCITE 22
Usage of the Calcite catalog
schema table
><INTRODUCTION TO APACHE CALCITE 23
Usage of the Calcite catalog
schema table
data type
><INTRODUCTION TO APACHE CALCITE 24
Usage of the Calcite catalog
schema table
data typedata type field
SQL PARSER next 25
><INTRODUCTION TO APACHE CALCITE 26
Calcite SQL parser
• LL(k) parser written in JavaCC
• Input queries are parsed into an...
><INTRODUCTION TO APACHE CALCITE 27
JavaCC
• Java Compiler Compiler
• Created in 1996 at Sun Microsystems
• Generates Java...
><INTRODUCTION TO APACHE CALCITE 28
JavaCC
><INTRODUCTION TO APACHE CALCITE 29
JavaCC
><INTRODUCTION TO APACHE CALCITE 30
JavaCC
tokens
><INTRODUCTION TO APACHE CALCITE 31
JavaCC
tokens
or
><INTRODUCTION TO APACHE CALCITE 32
JavaCC
tokens
or
function call
><INTRODUCTION TO APACHE CALCITE 33
JavaCC
tokens
Java code
or
function call
><INTRODUCTION TO APACHE CALCITE 34
SqlNode
• SqlNode represents an element in an
abstract syntax tree
><INTRODUCTION TO APACHE CALCITE 35
SqlNode
• SqlNode represents an element in an
abstract syntax tree
select
><INTRODUCTION TO APACHE CALCITE 36
SqlNode
• SqlNode represents an element in an
abstract syntax tree
identifiersselect
><INTRODUCTION TO APACHE CALCITE 37
SqlNode
• SqlNode represents an element in an
abstract syntax tree
identifiersselect o...
><INTRODUCTION TO APACHE CALCITE 38
SqlNode
• SqlNode represents an element in an
abstract syntax tree
identifiersselect o...
><INTRODUCTION TO APACHE CALCITE 39
SqlNode
• SqlNode represents an element in an
abstract syntax tree
identifiersselect o...
><INTRODUCTION TO APACHE CALCITE 40
SqlNode
• SqlNode represents an element in an
abstract syntax tree
identifiersselect o...
><INTRODUCTION TO APACHE CALCITE 41
SqlNode
• SqlNode’s unparse method converts a
SQL element back into a string
><INTRODUCTION TO APACHE CALCITE 42
SqlNode
• SqlNode’s unparse method converts a
SQL element back into a string
><INTRODUCTION TO APACHE CALCITE 43
SqlNode
><INTRODUCTION TO APACHE CALCITE 44
SqlNode
• SqlDialect indicates the capitalization
and quoting rules of specific databas...
><INTRODUCTION TO APACHE CALCITE 45
SqlNode
• SqlDialect indicates the capitalization
and quoting rules of specific databas...
QUERY OPTIMIZER next 46
><INTRODUCTION TO APACHE CALCITE 47
Query Plans
• Query plans represent the steps necessary
to execute a query
><INTRODUCTION TO APACHE CALCITE 48
Query Plans
• Query plans represent the steps necessary
to execute a query
><INTRODUCTION TO APACHE CALCITE 49
Query Plans
• Query plans represent the steps necessary
to execute a query
table scan
...
><INTRODUCTION TO APACHE CALCITE 50
Query Plans
• Query plans represent the steps necessary
to execute a query
inner join ...
><INTRODUCTION TO APACHE CALCITE 51
Query Plans
• Query plans represent the steps necessary
to execute a query
filter
inne...
><INTRODUCTION TO APACHE CALCITE 52
Query Plans
• Query plans represent the steps necessary
to execute a query
filter
inne...
><INTRODUCTION TO APACHE CALCITE 53
Query Plans
• Query plans represent the steps necessary
to execute a query
filter
inne...
><INTRODUCTION TO APACHE CALCITE 54
Query Optimization
• Optimize logical plan
• Goal is typically to try to reduce the am...
><INTRODUCTION TO APACHE CALCITE 55
Query Optimization
• Prune unused fields
• Merge projections
• Convert subqueries to jo...
><INTRODUCTION TO APACHE CALCITE 56
Query Optimization
><INTRODUCTION TO APACHE CALCITE 57
Query Optimization
><INTRODUCTION TO APACHE CALCITE 58
Query Optimization
><INTRODUCTION TO APACHE CALCITE 59
Query Optimization
push down
project
><INTRODUCTION TO APACHE CALCITE 60
Query Optimization
push down
project
push down
filter
><INTRODUCTION TO APACHE CALCITE 61
Query Optimization
><INTRODUCTION TO APACHE CALCITE 62
Key Concepts
Relational algebra
Row expressions
Traits
Conventions
Rules
Planners
Prog...
><INTRODUCTION TO APACHE CALCITE 63
Key Concepts
Relational algebra
Row expressions
Traits
Conventions
Rules
Planners
Prog...
><INTRODUCTION TO APACHE CALCITE 64
Relational Algebra
• RelNode represents a relational expression
• Largely equivalent t...
><INTRODUCTION TO APACHE CALCITE 65
Relational Algebra
TableScan
Project
Filter
Aggregate
Join
Union
Intersect
Sort
><INTRODUCTION TO APACHE CALCITE 66
Relational Algebra
TableScan
Project
Filter
Aggregate
Join
Union
Intersect
Sort
SparkT...
><INTRODUCTION TO APACHE CALCITE 67
Row Expressions
• RexNode represents a row-level expression
• Largely equivalent to Sp...
><INTRODUCTION TO APACHE CALCITE 68
Row Expressions
Input column ref
Literal
Struct field access
Function call
Window expre...
><INTRODUCTION TO APACHE CALCITE 69
Row Expressions
Input column ref
Literal
Struct field access
Function call
Window expre...
><INTRODUCTION TO APACHE CALCITE 70
Row Expressions
><INTRODUCTION TO APACHE CALCITE 71
Row Expressions
input ref
><INTRODUCTION TO APACHE CALCITE 72
Row Expressions
input ref
function call
><INTRODUCTION TO APACHE CALCITE 73
Traits
• Defined by the RelTrait interface
• Represent a trait of a relational expressi...
><INTRODUCTION TO APACHE CALCITE 74
Conventions
• Convention is a type of RelTrait
• A Convention is associated with a
Rel...
><INTRODUCTION TO APACHE CALCITE 75
Conventions
><INTRODUCTION TO APACHE CALCITE 76
Conventions
Spark convention
><INTRODUCTION TO APACHE CALCITE 77
Conventions
Spark convention
JDBC
convention
><INTRODUCTION TO APACHE CALCITE 78
Conventions
Spark convention
JDBC
convention
converter
><INTRODUCTION TO APACHE CALCITE 79
Rules
• Rules are used to modify query plans
• Defined by the RelOptRule interface
• Tw...
><INTRODUCTION TO APACHE CALCITE 80
Converter Rule
><INTRODUCTION TO APACHE CALCITE 81
Converter Rule
expression type
><INTRODUCTION TO APACHE CALCITE 82
Converter Rule
expression type
input convention
><INTRODUCTION TO APACHE CALCITE 83
Converter Rule
expression type
input convention
converted convention
><INTRODUCTION TO APACHE CALCITE 84
Converter Rule
expression type
input convention
converted convention
converter function
><INTRODUCTION TO APACHE CALCITE 85
Pattern Matching
><INTRODUCTION TO APACHE CALCITE 86
Pattern Matching
><INTRODUCTION TO APACHE CALCITE 87
Pattern Matching
no match
:-(
><INTRODUCTION TO APACHE CALCITE 88
Pattern Matching
no match
:-(
><INTRODUCTION TO APACHE CALCITE 89
Pattern Matching
match!
no match
:-(
><INTRODUCTION TO APACHE CALCITE 90
Planners
• Planners implement the RelOptPlanner
interface
• Two types of planners:
• H...
><INTRODUCTION TO APACHE CALCITE 91
Heuristic Optimization
• HepPlanner is a heuristic optimizer similar
to Spark’s optimi...
><INTRODUCTION TO APACHE CALCITE 92
Cost-based Optimization
• VolcanoPlanner is a cost-based
optimizer
• Applies matching ...
><INTRODUCTION TO APACHE CALCITE 93
Cost-based Optimization
• Cost is provided by each RelNode
• Cost is represented by Re...
><INTRODUCTION TO APACHE CALCITE 94
Cost-based Optimization
><INTRODUCTION TO APACHE CALCITE 95
Cost-based Optimization
><INTRODUCTION TO APACHE CALCITE 96
Cost-based Optimization
><INTRODUCTION TO APACHE CALCITE 97
Cost-based Optimization
><INTRODUCTION TO APACHE CALCITE 98
Cost-based Optimization
PUTTING IT ALL
TOGETHER
next 99
><INTRODUCTION TO APACHE CALCITE 100
Putting it all together
><INTRODUCTION TO APACHE CALCITE 101
Putting it all together
><INTRODUCTION TO APACHE CALCITE 102
Putting it all together
><INTRODUCTION TO APACHE CALCITE 103
Putting it all together
><INTRODUCTION TO APACHE CALCITE 104
Putting it all together
><INTRODUCTION TO APACHE CALCITE 105
Putting it all together
><INTRODUCTION TO APACHE CALCITE 106
Putting it all together
><INTRODUCTION TO APACHE CALCITE 107
Putting it all together
><INTRODUCTION TO APACHE CALCITE 108
Putting it all together
><INTRODUCTION TO APACHE CALCITE 109
Putting it all together
Upcoming SlideShare
Loading in …5
×

Introduction to Apache Calcite

7,062 views

Published on

This talk provides an in-depth overview of the key concepts of Apache Calcite. It explores the Calcite catalog, parsing, validation, and optimization with various planners.

Published in: Software

Introduction to Apache Calcite

  1. 1. >< INTRODUCTIONTO INTRODUCTION TO APACHE CALCITE APACHE CALCITE JORDAN HALTERMAN 1
  2. 2. WHAT IS APACHE CALCITE? next 2
  3. 3. ><INTRODUCTION TO APACHE CALCITE 3 What is Apache Calcite? • A framework for building SQL databases • Developed over more than ten years • Written in Java • Previously known as Optiq • Previously known as Farrago • Became an Apache project in 2013 • Led by Julian Hyde at Hortonworks
  4. 4. ><INTRODUCTION TO APACHE CALCITE 4 Projects using Calcite • Apache Hive • Apache Drill • Apache Flink • Apache Phoenix • Apache Samza • Apache Storm • Apache everything…
  5. 5. ><INTRODUCTION TO APACHE CALCITE 5 What is Apache Calcite? • SQL parser • SQL validation • Query optimizer • SQL generator • Data federator
  6. 6. ><INTRODUCTION TO APACHE CALCITE Parse Queries are parsed using a JavaCC generated parser Validate Queries are validated against known database metadata Optimize Logical plans are optimized and converted into physical expressions Execute P h y s i c a l p l a n s a r e converted into application- specific executions 01 02 03 04 Stages of query execution 6
  7. 7. COMPONENTS next 7
  8. 8. ><INTRODUCTION TO APACHE CALCITE 8 Components of Calcite • Catalog - Defines metadata and namespaces that can be accessed in SQL queries • SQL parser - Parses valid SQL queries into an abstract syntax tree (AST) • SQL validator - Validates abstract syntax trees against metadata provided by the catalog • Query optimizer - Converts AST into logical plans, optimizes logical plans, and converts logical expressions into physical plans • SQL generator - Converts physical plans to SQL
  9. 9. CATALOG next 9
  10. 10. ><INTRODUCTION TO APACHE CALCITE 10 Calcite Catalog • Defines namespaces that can be accessed in Calcite queries • Schema • A collection of schemas and tables • Can be arbitrarily nested • Table • Represents a single data set • Fields defined by a RelDataType • RelDataType • Represents fields in a data set • Supports all SQL data types, including structs and
  11. 11. ><INTRODUCTION TO APACHE CALCITE 11 Schema • A collection of schemas and tables • Schemas can be arbitrarily nested
  12. 12. ><INTRODUCTION TO APACHE CALCITE 12 Schema • A collection of schemas and tables • Schemas can be arbitrarily nested
  13. 13. ><INTRODUCTION TO APACHE CALCITE 13 Table • Represents a single data set • Fields are defined by a RelDataType
  14. 14. ><INTRODUCTION TO APACHE CALCITE 14 Table • Represents a single data set • Fields are defined by a RelDataType
  15. 15. ><INTRODUCTION TO APACHE CALCITE 15 RelDataType • Represents the data type of an object • Supports all SQL data types, including structs and arrays • Similar to Spark’s DataType
  16. 16. ><INTRODUCTION TO APACHE CALCITE 16 RelDataType
  17. 17. ><INTRODUCTION TO APACHE CALCITE 17 RelDataType data type enum
  18. 18. ><INTRODUCTION TO APACHE CALCITE 18 Statistic • Provide table statistics used in optimization
  19. 19. ><INTRODUCTION TO APACHE CALCITE 19 Statistic • Provide table statistics used in optimization
  20. 20. ><INTRODUCTION TO APACHE CALCITE 20 Usage of the Calcite catalog
  21. 21. ><INTRODUCTION TO APACHE CALCITE 21 Usage of the Calcite catalog schema
  22. 22. ><INTRODUCTION TO APACHE CALCITE 22 Usage of the Calcite catalog schema table
  23. 23. ><INTRODUCTION TO APACHE CALCITE 23 Usage of the Calcite catalog schema table data type
  24. 24. ><INTRODUCTION TO APACHE CALCITE 24 Usage of the Calcite catalog schema table data typedata type field
  25. 25. SQL PARSER next 25
  26. 26. ><INTRODUCTION TO APACHE CALCITE 26 Calcite SQL parser • LL(k) parser written in JavaCC • Input queries are parsed into an abstract syntax tree (AST) • Tokens are represented in Calcite by SqlNode • SqlNode can also be converted back to a SQL string via the unparse method
  27. 27. ><INTRODUCTION TO APACHE CALCITE 27 JavaCC • Java Compiler Compiler • Created in 1996 at Sun Microsystems • Generates Java code from a domain- specific language • ANTLR is the modern alternative used in projects like Hive and Drill • JavaCC has sparse documentation
  28. 28. ><INTRODUCTION TO APACHE CALCITE 28 JavaCC
  29. 29. ><INTRODUCTION TO APACHE CALCITE 29 JavaCC
  30. 30. ><INTRODUCTION TO APACHE CALCITE 30 JavaCC tokens
  31. 31. ><INTRODUCTION TO APACHE CALCITE 31 JavaCC tokens or
  32. 32. ><INTRODUCTION TO APACHE CALCITE 32 JavaCC tokens or function call
  33. 33. ><INTRODUCTION TO APACHE CALCITE 33 JavaCC tokens Java code or function call
  34. 34. ><INTRODUCTION TO APACHE CALCITE 34 SqlNode • SqlNode represents an element in an abstract syntax tree
  35. 35. ><INTRODUCTION TO APACHE CALCITE 35 SqlNode • SqlNode represents an element in an abstract syntax tree select
  36. 36. ><INTRODUCTION TO APACHE CALCITE 36 SqlNode • SqlNode represents an element in an abstract syntax tree identifiersselect
  37. 37. ><INTRODUCTION TO APACHE CALCITE 37 SqlNode • SqlNode represents an element in an abstract syntax tree identifiersselect operator
  38. 38. ><INTRODUCTION TO APACHE CALCITE 38 SqlNode • SqlNode represents an element in an abstract syntax tree identifiersselect operator identifier
  39. 39. ><INTRODUCTION TO APACHE CALCITE 39 SqlNode • SqlNode represents an element in an abstract syntax tree identifiersselect operator identifier data type
  40. 40. ><INTRODUCTION TO APACHE CALCITE 40 SqlNode • SqlNode represents an element in an abstract syntax tree identifiersselect operator identifier data type identifier
  41. 41. ><INTRODUCTION TO APACHE CALCITE 41 SqlNode • SqlNode’s unparse method converts a SQL element back into a string
  42. 42. ><INTRODUCTION TO APACHE CALCITE 42 SqlNode • SqlNode’s unparse method converts a SQL element back into a string
  43. 43. ><INTRODUCTION TO APACHE CALCITE 43 SqlNode
  44. 44. ><INTRODUCTION TO APACHE CALCITE 44 SqlNode • SqlDialect indicates the capitalization and quoting rules of specific databases
  45. 45. ><INTRODUCTION TO APACHE CALCITE 45 SqlNode • SqlDialect indicates the capitalization and quoting rules of specific databases
  46. 46. QUERY OPTIMIZER next 46
  47. 47. ><INTRODUCTION TO APACHE CALCITE 47 Query Plans • Query plans represent the steps necessary to execute a query
  48. 48. ><INTRODUCTION TO APACHE CALCITE 48 Query Plans • Query plans represent the steps necessary to execute a query
  49. 49. ><INTRODUCTION TO APACHE CALCITE 49 Query Plans • Query plans represent the steps necessary to execute a query table scan table scan
  50. 50. ><INTRODUCTION TO APACHE CALCITE 50 Query Plans • Query plans represent the steps necessary to execute a query inner join table scan table scan
  51. 51. ><INTRODUCTION TO APACHE CALCITE 51 Query Plans • Query plans represent the steps necessary to execute a query filter inner join table scan table scan
  52. 52. ><INTRODUCTION TO APACHE CALCITE 52 Query Plans • Query plans represent the steps necessary to execute a query filter inner join project table scan table scan
  53. 53. ><INTRODUCTION TO APACHE CALCITE 53 Query Plans • Query plans represent the steps necessary to execute a query filter inner join project table scan table scan
  54. 54. ><INTRODUCTION TO APACHE CALCITE 54 Query Optimization • Optimize logical plan • Goal is typically to try to reduce the amount of data that must be processed early in the plan • Convert logical plan into a physical plan • Physical plan is engine specific and represents the physical execution stages
  55. 55. ><INTRODUCTION TO APACHE CALCITE 55 Query Optimization • Prune unused fields • Merge projections • Convert subqueries to joins • Reorder joins • Push down projections • Push down filters
  56. 56. ><INTRODUCTION TO APACHE CALCITE 56 Query Optimization
  57. 57. ><INTRODUCTION TO APACHE CALCITE 57 Query Optimization
  58. 58. ><INTRODUCTION TO APACHE CALCITE 58 Query Optimization
  59. 59. ><INTRODUCTION TO APACHE CALCITE 59 Query Optimization push down project
  60. 60. ><INTRODUCTION TO APACHE CALCITE 60 Query Optimization push down project push down filter
  61. 61. ><INTRODUCTION TO APACHE CALCITE 61 Query Optimization
  62. 62. ><INTRODUCTION TO APACHE CALCITE 62 Key Concepts Relational algebra Row expressions Traits Conventions Rules Planners Programs
  63. 63. ><INTRODUCTION TO APACHE CALCITE 63 Key Concepts Relational algebra Row expressions Traits Conventions Rules Planners Programs RelNode RexNode RelTrait Convention RelOptRule RelOptPlanner Program
  64. 64. ><INTRODUCTION TO APACHE CALCITE 64 Relational Algebra • RelNode represents a relational expression • Largely equivalent to Spark’s DataFrame methods • Logical algebra • Physical algebra
  65. 65. ><INTRODUCTION TO APACHE CALCITE 65 Relational Algebra TableScan Project Filter Aggregate Join Union Intersect Sort
  66. 66. ><INTRODUCTION TO APACHE CALCITE 66 Relational Algebra TableScan Project Filter Aggregate Join Union Intersect Sort SparkTableScan SparkProject SparkFilter SparkAggregate SparkJoin SparkUnion SparkIntersect SparkSort
  67. 67. ><INTRODUCTION TO APACHE CALCITE 67 Row Expressions • RexNode represents a row-level expression • Largely equivalent to Spark’s Column functions • Projection fields • Filter condition • Join condition • Sort fields
  68. 68. ><INTRODUCTION TO APACHE CALCITE 68 Row Expressions Input column ref Literal Struct field access Function call Window expression
  69. 69. ><INTRODUCTION TO APACHE CALCITE 69 Row Expressions Input column ref Literal Struct field access Function call Window expression RexInputRef RexLiteral RexFieldAccess RexCall RexOver
  70. 70. ><INTRODUCTION TO APACHE CALCITE 70 Row Expressions
  71. 71. ><INTRODUCTION TO APACHE CALCITE 71 Row Expressions input ref
  72. 72. ><INTRODUCTION TO APACHE CALCITE 72 Row Expressions input ref function call
  73. 73. ><INTRODUCTION TO APACHE CALCITE 73 Traits • Defined by the RelTrait interface • Represent a trait of a relational expression that does not alter execution • Traits are used to validate plan output • Three primary trait types: • Convention • RelCollation • RelDistribution
  74. 74. ><INTRODUCTION TO APACHE CALCITE 74 Conventions • Convention is a type of RelTrait • A Convention is associated with a RelNode interface • SparkConvention, JdbcConvention, EnumerableConvention, etc • Conventions are used to represent a single data source • Inputs to a relational expression must be in the same convention
  75. 75. ><INTRODUCTION TO APACHE CALCITE 75 Conventions
  76. 76. ><INTRODUCTION TO APACHE CALCITE 76 Conventions Spark convention
  77. 77. ><INTRODUCTION TO APACHE CALCITE 77 Conventions Spark convention JDBC convention
  78. 78. ><INTRODUCTION TO APACHE CALCITE 78 Conventions Spark convention JDBC convention converter
  79. 79. ><INTRODUCTION TO APACHE CALCITE 79 Rules • Rules are used to modify query plans • Defined by the RelOptRule interface • Two types of rules: converters and transformers • Converter rules implement Converter and convert from one convention to another • Rules are matched to elements of a query plan using pattern matching • onMatch is called for matched rules • Converter rules applied via convert
  80. 80. ><INTRODUCTION TO APACHE CALCITE 80 Converter Rule
  81. 81. ><INTRODUCTION TO APACHE CALCITE 81 Converter Rule expression type
  82. 82. ><INTRODUCTION TO APACHE CALCITE 82 Converter Rule expression type input convention
  83. 83. ><INTRODUCTION TO APACHE CALCITE 83 Converter Rule expression type input convention converted convention
  84. 84. ><INTRODUCTION TO APACHE CALCITE 84 Converter Rule expression type input convention converted convention converter function
  85. 85. ><INTRODUCTION TO APACHE CALCITE 85 Pattern Matching
  86. 86. ><INTRODUCTION TO APACHE CALCITE 86 Pattern Matching
  87. 87. ><INTRODUCTION TO APACHE CALCITE 87 Pattern Matching no match :-(
  88. 88. ><INTRODUCTION TO APACHE CALCITE 88 Pattern Matching no match :-(
  89. 89. ><INTRODUCTION TO APACHE CALCITE 89 Pattern Matching match! no match :-(
  90. 90. ><INTRODUCTION TO APACHE CALCITE 90 Planners • Planners implement the RelOptPlanner interface • Two types of planners: • HepPlanner • VolcanoPlanner
  91. 91. ><INTRODUCTION TO APACHE CALCITE 91 Heuristic Optimization • HepPlanner is a heuristic optimizer similar to Spark’s optimizer • Applies all matching rules until none can be applied • Heuristic optimization is faster than cost- based optimization • Risk of infinite recursion if rules make opposing changes to the plan
  92. 92. ><INTRODUCTION TO APACHE CALCITE 92 Cost-based Optimization • VolcanoPlanner is a cost-based optimizer • Applies matching rules iteratively, selecting the plan with the cheapest cost on each iteration • Costs are provided by relational expressions • Not all possible plans can be computed • Stops optimization when the cost does not significantly improve through a determinable number of iterations
  93. 93. ><INTRODUCTION TO APACHE CALCITE 93 Cost-based Optimization • Cost is provided by each RelNode • Cost is represented by RelOptCost • Cost typically includes row count, I/O, and CPU cost • Cost estimates are relative • Statistics are used to improve accuracy of cost estimations • Calcite provides utilities for computing various resource-related statistics for use in cost estimations
  94. 94. ><INTRODUCTION TO APACHE CALCITE 94 Cost-based Optimization
  95. 95. ><INTRODUCTION TO APACHE CALCITE 95 Cost-based Optimization
  96. 96. ><INTRODUCTION TO APACHE CALCITE 96 Cost-based Optimization
  97. 97. ><INTRODUCTION TO APACHE CALCITE 97 Cost-based Optimization
  98. 98. ><INTRODUCTION TO APACHE CALCITE 98 Cost-based Optimization
  99. 99. PUTTING IT ALL TOGETHER next 99
  100. 100. ><INTRODUCTION TO APACHE CALCITE 100 Putting it all together
  101. 101. ><INTRODUCTION TO APACHE CALCITE 101 Putting it all together
  102. 102. ><INTRODUCTION TO APACHE CALCITE 102 Putting it all together
  103. 103. ><INTRODUCTION TO APACHE CALCITE 103 Putting it all together
  104. 104. ><INTRODUCTION TO APACHE CALCITE 104 Putting it all together
  105. 105. ><INTRODUCTION TO APACHE CALCITE 105 Putting it all together
  106. 106. ><INTRODUCTION TO APACHE CALCITE 106 Putting it all together
  107. 107. ><INTRODUCTION TO APACHE CALCITE 107 Putting it all together
  108. 108. ><INTRODUCTION TO APACHE CALCITE 108 Putting it all together
  109. 109. ><INTRODUCTION TO APACHE CALCITE 109 Putting it all together

×