Introduction to Apache Calcite


  1. INTRODUCTION TO APACHE CALCITE • JORDAN HALTERMAN
  2. WHAT IS APACHE CALCITE?
  3. What is Apache Calcite? • A framework for building SQL databases • Developed over more than ten years • Written in Java • Previously known as Optiq • Previously known as Farrago • Entered the Apache Incubator in 2014 • Led by Julian Hyde at Hortonworks
  4. Projects using Calcite • Apache Hive • Apache Drill • Apache Flink • Apache Phoenix • Apache Samza • Apache Storm • Apache everything…
  5. What is Apache Calcite? • SQL parser • SQL validation • Query optimizer • SQL generator • Data federator
  6. Stages of query execution • 01 Parse: queries are parsed using a JavaCC-generated parser • 02 Validate: queries are validated against known database metadata • 03 Optimize: logical plans are optimized and converted into physical expressions • 04 Execute: physical plans are converted into application-specific executions
  7. COMPONENTS
  8. Components of Calcite • Catalog - Defines metadata and namespaces that can be accessed in SQL queries • SQL parser - Parses valid SQL queries into an abstract syntax tree (AST) • SQL validator - Validates abstract syntax trees against metadata provided by the catalog • Query optimizer - Converts the AST into logical plans, optimizes logical plans, and converts logical expressions into physical plans • SQL generator - Converts physical plans to SQL
  9. CATALOG
  10. Calcite Catalog • Defines namespaces that can be accessed in Calcite queries • Schema: a collection of schemas and tables; can be arbitrarily nested • Table: represents a single data set; fields are defined by a RelDataType • RelDataType: represents fields in a data set; supports all SQL data types, including structs and arrays
  11-12. Schema • A collection of schemas and tables • Schemas can be arbitrarily nested
  13-14. Table • Represents a single data set • Fields are defined by a RelDataType
  15. RelDataType • Represents the data type of an object • Supports all SQL data types, including structs and arrays • Similar to Spark’s DataType
  16-17. RelDataType • Callout: data type enum
  18-19. Statistic • Provides table statistics used in optimization
  20-24. Usage of the Calcite catalog • Callouts: schema, table, data type, data type, field
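
The catalog slides describe nested schemas, tables, and row types. As a minimal sketch of that structure (a toy model in Python, not Calcite's Java API; the class names `Schema` and `Table`, the `resolve` helper, and the sample table are all hypothetical), a qualified name can be resolved by walking nested namespaces:

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A single data set; row_type maps field names to SQL type names."""
    name: str
    row_type: dict


@dataclass
class Schema:
    """A namespace holding tables and arbitrarily nested sub-schemas."""
    name: str
    tables: dict = field(default_factory=dict)
    sub_schemas: dict = field(default_factory=dict)

    def add_table(self, table):
        self.tables[table.name] = table

    def add_schema(self, schema):
        self.sub_schemas[schema.name] = schema

    def resolve(self, path):
        """Resolve a qualified name such as ["sales", "orders"] to a Table."""
        *schema_names, table_name = path
        current = self
        for part in schema_names:
            current = current.sub_schemas[part]  # KeyError if the schema is unknown
        return current.tables[table_name]


# Hypothetical catalog: root -> sales -> orders(id, amount)
root = Schema("root")
sales = Schema("sales")
sales.add_table(Table("orders", {"id": "INTEGER", "amount": "DECIMAL"}))
root.add_schema(sales)

orders = root.resolve(["sales", "orders"])
print(orders.row_type)
```

A validator would use a lookup of this kind to check that every table and column named in a query exists and has a compatible type.
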
  25. SQL PARSER
  26. Calcite SQL parser • LL(k) parser generated by JavaCC • Input queries are parsed into an abstract syntax tree (AST) • AST nodes are represented in Calcite by SqlNode • A SqlNode can also be converted back to a SQL string via the unparse method
  27. JavaCC • Java Compiler Compiler • Created in 1996 at Sun Microsystems • Generates Java code from a domain-specific language • ANTLR is the modern alternative used in projects like Hive and Drill • JavaCC has sparse documentation
  28-33. JavaCC • Callouts: tokens, Java code, or, function call
  34-40. SqlNode • SqlNode represents an element in an abstract syntax tree • Callouts: select, identifiers, operator, identifier, data type, identifier
  41-42. SqlNode • SqlNode’s unparse method converts a SQL element back into a string
  43. SqlNode
  44-45. SqlNode • SqlDialect indicates the capitalization and quoting rules of specific databases
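
The idea of an AST whose nodes can be unparsed under different dialect rules can be sketched as follows. This is a toy model in Python, not Calcite's `SqlNode` or `SqlDialect` classes; every class name and both sample dialects are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialect:
    """Quoting and capitalization rules for one target database."""
    quote: str
    upper_case_keywords: bool = True

    def quote_identifier(self, name):
        return f"{self.quote}{name}{self.quote}"

    def keyword(self, word):
        return word.upper() if self.upper_case_keywords else word.lower()


@dataclass(frozen=True)
class Identifier:
    name: str

    def unparse(self, dialect):
        return dialect.quote_identifier(self.name)


@dataclass(frozen=True)
class Literal:
    value: int

    def unparse(self, dialect):
        return str(self.value)


@dataclass(frozen=True)
class Call:
    """A binary operator applied to two child nodes."""
    operator: str
    left: object
    right: object

    def unparse(self, dialect):
        return f"{self.left.unparse(dialect)} {self.operator} {self.right.unparse(dialect)}"


@dataclass(frozen=True)
class Select:
    columns: tuple
    table: Identifier
    where: object = None

    def unparse(self, dialect):
        cols = ", ".join(c.unparse(dialect) for c in self.columns)
        sql = (f"{dialect.keyword('select')} {cols} "
               f"{dialect.keyword('from')} {self.table.unparse(dialect)}")
        if self.where is not None:
            sql += f" {dialect.keyword('where')} {self.where.unparse(dialect)}"
        return sql


# Hypothetical dialects: double-quoted upper-case vs. backtick-quoted lower-case.
DOUBLE_QUOTED = Dialect('"')
BACKTICKED = Dialect("`", upper_case_keywords=False)

tree = Select((Identifier("name"),), Identifier("users"),
              Call(">", Identifier("age"), Literal(21)))
ansi_sql = tree.unparse(DOUBLE_QUOTED)
backtick_sql = tree.unparse(BACKTICKED)
print(ansi_sql)
print(backtick_sql)
```

The same tree yields different SQL text per dialect, which is what lets one parsed query be sent to several databases.
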
  46. QUERY OPTIMIZER
  47-53. Query Plans • Query plans represent the steps necessary to execute a query • Callouts: filter, inner join, project, table scan, table scan
  54. Query Optimization • Optimize the logical plan • The goal is typically to reduce the amount of data that must be processed as early in the plan as possible • Convert the logical plan into a physical plan • The physical plan is engine-specific and represents the physical execution stages
  55. Query Optimization • Prune unused fields • Merge projections • Convert subqueries to joins • Reorder joins • Push down projections • Push down filters
  56-61. Query Optimization • Callouts: push down project, push down filter
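
Pushing a filter down can be illustrated on a toy plan tree. The sketch below is Python rather than Calcite code, and it assumes a filter that reads exactly one column; the node classes and `push_filter_below_join` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str
    fields: tuple


@dataclass(frozen=True)
class Join:
    left: object
    right: object

    @property
    def fields(self):
        return self.left.fields + self.right.fields


@dataclass(frozen=True)
class Filter:
    child: object
    column: str     # the single column the predicate reads (simplifying assumption)
    predicate: str  # kept as opaque text in this sketch

    @property
    def fields(self):
        return self.child.fields


def push_filter_below_join(node):
    """Rewrite Filter(Join(l, r)) so the filter runs on the one side that owns its column."""
    if isinstance(node, Filter) and isinstance(node.child, Join):
        join = node.child
        in_left = node.column in join.left.fields
        in_right = node.column in join.right.fields
        if in_left and not in_right:
            return Join(Filter(join.left, node.column, node.predicate), join.right)
        if in_right and not in_left:
            return Join(join.left, Filter(join.right, node.column, node.predicate))
    return node  # rule does not apply; leave the plan unchanged


users = Scan("users", ("id", "age"))
orders = Scan("orders", ("user_id", "total"))
plan = Filter(Join(users, orders), "age", "age > 21")
optimized = push_filter_below_join(plan)
print(optimized)
```

After the rewrite the join only sees users that pass the predicate, so less data flows through the more expensive operator.
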
  62-63. Key Concepts • Relational algebra: RelNode • Row expressions: RexNode • Traits: RelTrait • Conventions: Convention • Rules: RelOptRule • Planners: RelOptPlanner • Programs: Program
  64. Relational Algebra • RelNode represents a relational expression • Largely equivalent to Spark’s DataFrame methods • Logical algebra • Physical algebra
  65-66. Relational Algebra • TableScan → SparkTableScan • Project → SparkProject • Filter → SparkFilter • Aggregate → SparkAggregate • Join → SparkJoin • Union → SparkUnion • Intersect → SparkIntersect • Sort → SparkSort
  67. Row Expressions • RexNode represents a row-level expression • Largely equivalent to Spark’s Column functions • Projection fields • Filter condition • Join condition • Sort fields
  68-69. Row Expressions • Input column ref: RexInputRef • Literal: RexLiteral • Struct field access: RexFieldAccess • Function call: RexCall • Window expression: RexOver
  70-72. Row Expressions • Callouts: input ref, function call
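
A row expression is a small tree evaluated once per row. The following Python sketch mirrors three of the kinds listed above (input reference, literal, function call); it is an illustration under assumed names, not the `RexNode` class hierarchy itself:

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class InputRef:
    """Reads the column at a fixed position of the input row."""
    index: int

    def eval(self, row):
        return row[self.index]


@dataclass(frozen=True)
class Literal:
    value: object

    def eval(self, row):
        return self.value


@dataclass(frozen=True)
class Call:
    """Applies a named operator to the evaluated operands."""
    op: str
    operands: tuple

    def eval(self, row):
        args = [operand.eval(row) for operand in self.operands]
        return OPERATORS[self.op](*args)


# Hypothetical operator table; a real system has a much larger one.
OPERATORS = {
    ">": operator.gt,
    "+": operator.add,
    "AND": lambda a, b: a and b,
}

# Filter condition "$1 > 21" over rows of (name, age).
condition = Call(">", (InputRef(1), Literal(21)))
rows = [("ann", 34), ("bob", 19)]
kept = [row for row in rows if condition.eval(row)]
print(kept)
```

Referring to columns by position rather than by name is what makes such an expression reusable as a projection field, a filter condition, or a join condition.
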
  73. Traits • Defined by the RelTrait interface • Represent a trait of a relational expression that does not alter execution • Traits are used to validate plan output • Three primary trait types: Convention, RelCollation, RelDistribution
  74. Conventions • Convention is a type of RelTrait • A Convention is associated with a RelNode interface • SparkConvention, JdbcConvention, EnumerableConvention, etc. • Conventions are used to represent a single data source • Inputs to a relational expression must be in the same convention
  75-78. Conventions • Callouts: Spark convention, JDBC convention, converter
  79. Rules • Rules are used to modify query plans • Defined by the RelOptRule abstract class • Two types of rules: converters and transformers • Converter rules extend ConverterRule and convert from one convention to another • Rules are matched to elements of a query plan using pattern matching • onMatch is called for matched rules • Converter rules are applied via convert
  80-84. Converter Rule • Callouts: expression type, input convention, converted convention, converter function
  85-89. Pattern Matching • Callouts: no match :-(, no match :-(, match!
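
Matching a rule against part of a plan can be sketched as a nested operand pattern plus a callback. This is a toy Python model, not `RelOptRule`; `Operand`, `merge_filters`, and `apply_rule` are hypothetical names chosen for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str
    inputs: tuple = ()


@dataclass(frozen=True)
class Filter:
    condition: str
    inputs: tuple


@dataclass(frozen=True)
class Project:
    columns: tuple
    inputs: tuple


@dataclass(frozen=True)
class Operand:
    """Pattern: a node type plus optional patterns for its inputs."""
    node_type: type
    children: tuple = ()  # empty means "any inputs"

    def matches(self, node):
        if not isinstance(node, self.node_type):
            return False
        if not self.children:
            return True
        return len(self.children) == len(node.inputs) and all(
            child.matches(inp) for child, inp in zip(self.children, node.inputs))


def merge_filters(node):
    """Callback for Filter(Filter(x)): combine both conditions into one Filter."""
    inner = node.inputs[0]
    return Filter(f"({inner.condition}) AND ({node.condition})", inner.inputs)


# A rule here is just (pattern, callback).
MERGE_FILTER_RULE = (Operand(Filter, (Operand(Filter),)), merge_filters)


def apply_rule(rule, node):
    operand, on_match = rule
    return on_match(node) if operand.matches(node) else node


users = Scan("users")
no_match = Project(("name",), (Filter("age > 21", (users,)),))
match = Filter("age > 21", (Filter("active", (users,)),))
print(apply_rule(MERGE_FILTER_RULE, no_match))
print(apply_rule(MERGE_FILTER_RULE, match))
```

Only the second plan has a Filter directly over a Filter, so only it is rewritten; the first is returned unchanged.
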
  90. Planners • Planners implement the RelOptPlanner interface • Two types of planners: HepPlanner and VolcanoPlanner
  91. Heuristic Optimization • HepPlanner is a heuristic optimizer similar to Spark’s optimizer • Applies all matching rules until none can be applied • Heuristic optimization is faster than cost-based optimization • Risk of infinite recursion if rules make opposing changes to the plan
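
"Apply rules until none can be applied" is a fixed-point loop, and the non-termination risk shows up when a rule keeps undoing earlier work. The Python sketch below is an assumption-laden toy, not HepPlanner: plans are plain tuples, rules are functions that only look at the root node, and `max_iterations` is a hypothetical guard:

```python
def apply_until_fixpoint(plan, rules, max_iterations=100):
    """Apply every rule repeatedly until a full pass leaves the plan unchanged."""
    for _ in range(max_iterations):
        changed = False
        for rule in rules:
            new_plan = rule(plan)
            if new_plan != plan:
                plan, changed = new_plan, True
        if not changed:
            return plan
    raise RuntimeError("rules did not converge; some rule may undo another")


def merge_projects(plan):
    # ("project", cols, ("project", _, x)) -> ("project", cols, x)
    if plan[0] == "project" and plan[2][0] == "project":
        return ("project", plan[1], plan[2][2])
    return plan


def swap_join_inputs(plan):
    # Always applicable to a join, so it reverses its own previous result.
    if plan[0] == "join":
        return ("join", plan[2], plan[1])
    return plan


nested = ("project", ("name",),
          ("project", ("name", "age"),
           ("project", ("name", "age", "id"), ("scan", "users"))))
merged = apply_until_fixpoint(nested, [merge_projects])
print(merged)

try:
    apply_until_fixpoint(("join", ("scan", "a"), ("scan", "b")),
                         [swap_join_inputs], max_iterations=10)
    converged = True
except RuntimeError:
    converged = False
print(converged)
```

The projection-merging rule shrinks the plan on every application and therefore terminates, while the swapping rule never reaches a fixed point and trips the guard.
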
  92. Cost-based Optimization • VolcanoPlanner is a cost-based optimizer • Applies matching rules iteratively, selecting the plan with the cheapest cost on each iteration • Costs are provided by relational expressions • Not all possible plans can be computed • Stops optimizing when the cost has not improved significantly over a given number of iterations
  93. Cost-based Optimization • Cost is provided by each RelNode • Cost is represented by RelOptCost • Cost typically includes row count, I/O, and CPU cost • Cost estimates are relative • Statistics are used to improve the accuracy of cost estimates • Calcite provides utilities for computing various resource-related statistics for use in cost estimates
  94-98. Cost-based Optimization
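
Choosing between equivalent plans by estimated cost can be sketched as below. This is a toy Python cost model, not VolcanoPlanner or `RelOptCost`: the row counts, the filter selectivity, the join-size guess, and the weights in `total` are all invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    rows: float
    cpu: float
    io: float

    def __add__(self, other):
        return Cost(self.rows + other.rows, self.cpu + other.cpu, self.io + other.io)

    def total(self):
        # Relative weights are arbitrary; only comparisons between plans matter.
        return self.rows + self.cpu + 4 * self.io


ROW_COUNTS = {"users": 1_000, "orders": 100_000}  # hypothetical table statistics
FILTER_SELECTIVITY = 0.1                          # assumed fraction of rows kept


def estimate(plan):
    """Return (estimated output rows, cumulative cost) for a tuple-encoded plan."""
    kind = plan[0]
    if kind == "scan":
        n = ROW_COUNTS[plan[1]]
        return n, Cost(rows=n, cpu=n, io=n)
    if kind == "filter":
        n, cost = estimate(plan[1])
        return n * FILTER_SELECTIVITY, cost + Cost(rows=n, cpu=n, io=0)
    if kind == "join":
        left_rows, left_cost = estimate(plan[1])
        right_rows, right_cost = estimate(plan[2])
        out_rows = max(left_rows, right_rows)  # crude guess for a key join
        work = left_rows + right_rows
        return out_rows, left_cost + right_cost + Cost(rows=work, cpu=work, io=0)
    raise ValueError(f"unknown plan node: {kind}")


def cheapest(plans):
    return min(plans, key=lambda plan: estimate(plan)[1].total())


filter_late = ("filter", ("join", ("scan", "users"), ("scan", "orders")))
filter_early = ("join", ("filter", ("scan", "users")), ("scan", "orders"))
best = cheapest([filter_late, filter_early])
print(best)
```

Both plans return the same rows, but under these assumed statistics filtering before the join processes fewer rows, so it is the one selected.
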
  99. PUTTING IT ALL TOGETHER
  100-109. Putting it all together
