The evolution of Apache Calcite
UCSC CROSS
2021/05/25
Julian Hyde @julianhyde (Google)
Apache Calcite
[Architecture diagram. Labels: JDBC client; ODBC client; Avatica; JDBC server; SQL parser & validator; RelBuilder; Query planner; Relational algebra; Pluggable rewrite rules; Pluggable stats / cost; Pluggable catalog; Adapter; Physical operators; Storage]
Core: Operator expressions (relational algebra) and planner (based on Cascades)
External: Data storage, algorithms and catalog
Optional: SQL parser, JDBC & ODBC drivers
Extensible: Planner rewrite rules, statistics, cost model, algebra, UDFs
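A minimal sketch of the "core" idea above: a query is a tree of operator expressions, and a rewrite rule is a function that turns one tree shape into an equivalent one. The class names (Scan, Filter, Project) and the helpers merge_filters and rewrite are hypothetical and are not Calcite's API. The driver applies rules greedily, bottom-up; it does no cost-based search of the kind a Cascades-style planner performs.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical operator classes for illustration; not Calcite's RelNode types.
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    input: object
    conditions: Tuple[str, ...]  # a conjunction of predicates, kept as text

@dataclass(frozen=True)
class Project:
    input: object
    columns: Tuple[str, ...]

def merge_filters(node):
    """Rewrite rule: Filter(Filter(x, a), b) becomes Filter(x, a AND b)."""
    if isinstance(node, Filter) and isinstance(node.input, Filter):
        inner = node.input
        return Filter(inner.input, inner.conditions + node.conditions)
    return None  # the rule does not match this node

def rewrite(node, rules):
    """Rewrite the child first, then apply rules here until none matches."""
    if isinstance(node, Filter):
        node = Filter(rewrite(node.input, rules), node.conditions)
    elif isinstance(node, Project):
        node = Project(rewrite(node.input, rules), node.columns)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            result = rule(node)
            if result is not None:
                node = result
                changed = True
    return node

plan = Project(
    Filter(Filter(Scan("emp"), ("deptno = 10",)), ("sal > 1000",)),
    ("ename",),
)
optimized = rewrite(plan, [merge_filters])
print(optimized)
```

Because rules are plain functions passed in as a list, adding a rule does not require touching the operators or the driver, which is the sense of "pluggable" this sketch tries to convey.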
https://pixabay.com/photos/bread-flour-sourdough-yeast-fresh-5131398/
This talk
An open source project is only as strong as its community
In this talk, we tell the story of Calcite’s technical evolution; the community evolved in parallel
The challenge: How to design for community growth?
Please interrupt and ask questions! (It’s much easier to write slides about tech than community. But the “why didn’t you…?” questions are the most interesting.)
Apache Calcite goals
Make it easier to write a simple DBMS
Advance the state of the art for complex DBMS
Bring database approaches to new areas (e.g. streaming, geospatial, federation, data science)
Composition + evolution (framework + open source)
Apache license & governance
Calcite evolution - origins as an SMP DB
[Diagram: LucidDB. Labels: C++; Java; JDBC client; JDBC server; SQL parser & validator; Query planner; Relational algebra; Rewrite rules; Catalog; Physical operators; Storage & data structures]
Calcite evolution - pluggable components
[Diagram: Optiq. Labels: JDBC client; JDBC server; SQL parser & validator; Query planner; Relational algebra; Rewrite rules; Physical operators]
Calcite evolution - pluggable components
[Diagram: Optiq. Labels: JDBC client; JDBC server; SQL parser & validator; Query planner; Relational algebra; Pluggable rewrite rules; Pluggable stats / cost; Pluggable catalog; Adapter; Physical operators; Storage]
Calcite evolution - separate JDBC stack
[Diagram: Apache Calcite. Labels: JDBC client; ODBC client; Avatica; JDBC server; SQL parser & validator; Query planner; Relational algebra; Pluggable rewrite rules; Pluggable stats / cost; Pluggable catalog; Adapter; Physical operators; Storage]
Calcite evolution - federation via adapters
[Diagram: Apache Calcite. Labels: SQL; SQL parser & validator; Query planner; Relational algebra; Pluggable rewrite rules; Pluggable stats / cost; Pluggable catalog; Adapter; Physical operators; Storage]
Calcite evolution - federation via adapters
[Diagram: Apache Calcite. Labels: SQL; SQL parser & validator; Query planner; Relational algebra; Pluggable rewrite rules; Pluggable stats / cost; Pluggable catalog; Enumerable adapter; JDBC adapter; MongoDB adapter; File adapter (CSV, JSON, Http); Apache Kafka adapter; Apache Spark adapter]
Calcite evolution - federation via adapters
[Diagram: Apache Calcite. Labels: SQL; SQL parser & validator; Query planner; Relational algebra; Pluggable rewrite rules; Pluggable stats / cost; Pluggable catalog; Enumerable adapter]
Calcite evolution - federation via adapters
[Diagram: Apache Calcite. Labels: SQL; SQL parser & validator; Query planner; Relational algebra; Pluggable rewrite rules; Pluggable stats / cost; Pluggable catalog; JDBC adapter]
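A minimal sketch of federation via adapters, under stated assumptions: each adapter exposes the same two methods (table_names and scan), a catalog records which adapter owns which table, and a join runs above the adapters without knowing where the rows came from. All names here (InMemoryAdapter, CsvTextAdapter, Catalog, hash_join) are hypothetical and are not Calcite's adapter interfaces. Values are kept as strings because CSV carries no types.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, str]

class InMemoryAdapter:
    """Serves tables held as Python lists (hypothetical adapter)."""
    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def table_names(self) -> List[str]:
        return list(self._tables)

    def scan(self, table: str) -> Iterator[Row]:
        return iter(self._tables[table])

class CsvTextAdapter:
    """Serves tables parsed from CSV text (hypothetical adapter)."""
    def __init__(self, files: Dict[str, str]) -> None:
        self._files = files

    def table_names(self) -> List[str]:
        return list(self._files)

    def scan(self, table: str) -> Iterator[Row]:
        return iter(csv.DictReader(io.StringIO(self._files[table])))

class Catalog:
    """Maps each table name to the adapter that owns it."""
    def __init__(self, adapters) -> None:
        self._owner = {}
        for adapter in adapters:
            for name in adapter.table_names():
                self._owner[name] = adapter

    def scan(self, table: str) -> Iterator[Row]:
        return self._owner[table].scan(table)

def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Equi-join on one column: index the right side, then probe with the left."""
    index: Dict[str, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**row, **match}

catalog = Catalog([
    InMemoryAdapter({"emp": [{"ename": "Alice", "deptno": "10"},
                             {"ename": "Bob", "deptno": "20"}]}),
    CsvTextAdapter({"dept": "deptno,dname\n10,Sales\n20,Research\n"}),
])
joined = list(hash_join(catalog.scan("emp"), catalog.scan("dept"), "deptno"))
print(joined)
```

The join function only sees iterators of rows, so swapping the CSV source for another back end would mean writing one more adapter class and nothing else.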
Calcite evolution - SQL dialects
[Diagram: Apache Calcite. Labels: SQL (twice); SQL parser & validator; Query planner; Relational algebra; Pluggable rewrite rules; Pluggable parser, lexical, conformance, operators; Pluggable SQL dialect; JDBC adapter]
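A minimal sketch of what a pluggable SQL dialect can mean when generating SQL for a back end: the same query description is unparsed differently depending on a dialect object. The Query and Dialect classes are hypothetical and are not Calcite's SqlDialect. The two dialect settings assume only widely known conventions: MySQL-style backtick quoting with LIMIT, and standard-style double-quote quoting with FETCH FIRST n ROWS ONLY.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Query:
    """A tiny query description (hypothetical): SELECT columns FROM table [limit]."""
    table: str
    columns: Tuple[str, ...]
    limit: Optional[int] = None

@dataclass(frozen=True)
class Dialect:
    """Hypothetical dialect settings; real dialects differ in many more ways."""
    quote: str        # identifier quote character
    limit_style: str  # "LIMIT" or "FETCH"

    def quote_identifier(self, name: str) -> str:
        # Escape an embedded quote character by doubling it.
        q = self.quote
        return q + name.replace(q, q + q) + q

    def unparse(self, query: Query) -> str:
        cols = ", ".join(self.quote_identifier(c) for c in query.columns)
        sql = f"SELECT {cols} FROM {self.quote_identifier(query.table)}"
        if query.limit is not None:
            if self.limit_style == "LIMIT":
                sql += f" LIMIT {query.limit}"
            else:
                sql += f" FETCH FIRST {query.limit} ROWS ONLY"
        return sql

MYSQL_LIKE = Dialect(quote="`", limit_style="LIMIT")
STANDARD_LIKE = Dialect(quote='"', limit_style="FETCH")

query = Query("emp", ("ename", "sal"), limit=5)
print(MYSQL_LIKE.unparse(query))
print(STANDARD_LIKE.unparse(query))
```

Keeping the dialect separate from the query description is what lets one internal representation target several back ends.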
Calcite evolution - other front-end languages
[Diagram: Apache Calcite. Labels: SQL; SQL parser & validator; Query planner; Relational algebra; Adapter; Physical operators; Storage]
Calcite evolution - other front-end languages
[Diagram. Labels: SQL; Pig; Morel; Datalog; SQL parser & validator; RelBuilder; Query planner; Relational algebra; Adapter; Physical operators; Storage]
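A minimal sketch of why a builder API helps other front ends: any language that can be translated into builder calls ends up as the same relational algebra, so the planner below it does not need to know which language the query came from. ToyRelBuilder is a hypothetical, stack-based stand-in and is not Calcite's RelBuilder; the pipeline steps ("load", "where", "keep") are an invented mini-language used only for illustration.

```python
from typing import List, Sequence, Tuple

class ToyRelBuilder:
    """Stack-based builder (hypothetical): each call pushes or replaces the top expression."""
    def __init__(self) -> None:
        self._stack: List[tuple] = []

    def scan(self, table: str) -> "ToyRelBuilder":
        self._stack.append(("scan", table))
        return self

    def filter(self, condition: str) -> "ToyRelBuilder":
        child = self._stack.pop()
        self._stack.append(("filter", condition, child))
        return self

    def project(self, *columns: str) -> "ToyRelBuilder":
        child = self._stack.pop()
        self._stack.append(("project", columns, child))
        return self

    def build(self) -> tuple:
        return self._stack.pop()

def from_pipeline(steps: Sequence[Tuple[str, object]]) -> tuple:
    """Front end for an invented pipeline language; translates steps to builder calls."""
    builder = ToyRelBuilder()
    for verb, arg in steps:
        if verb == "load":
            builder.scan(arg)
        elif verb == "where":
            builder.filter(arg)
        elif verb == "keep":
            builder.project(*arg)
        else:
            raise ValueError(f"unknown step: {verb}")
    return builder.build()

# Front end 1: direct builder calls.
direct = ToyRelBuilder().scan("emp").filter("sal > 1000").project("ename").build()

# Front end 2: the invented pipeline language.
piped = from_pipeline([("load", "emp"), ("where", "sal > 1000"), ("keep", ["ename"])])

print(direct == piped)
```

Both front ends produce an identical expression tree, which is the property that lets several languages share one planner.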
Lessons learned
Decompose the database into components
SQL is standard but also allows innovation
Relational algebra intermediate language
Calcite has many uses, including:
● Embedded within DBMS (e.g. Apache Hive, OmniSciDB)
● Lightweight DBMS
● Platform for research
● Sandbox for relational algebra
● Toolkit for translating between SQL dialects
Designing for community - ten mantras
1. Use a permissive license
2. Nurture the community
3. Implement a standard, if possible
4. Design for users
5. Design for customization: configuration, extensibility, copy-paste
6. Do the simplest thing that could possibly work
7. A ‘broken’ feature is an opportunity to contribute
8. Free as in puppy
9. You can’t push on a rope
10. Sweep the floors, prime the pump
Designing for community
1. Use a permissive license. You’ll get more adoption. People (and their employers) will contribute back if it’s in their interests. Take the time to understand people’s motivations!
2. Nurture the community. Make regular releases. Review PRs promptly. Promote people on merit.
3. Implement a standard, if possible. Calcite has benefited hugely from standard SQL. A standard lowers barriers to exit, and you make fewer specification mistakes and write less documentation.
4. Design for users. Make your contributors describe bugs & features in end-user terms; focus on the release notes.
5. Design for customization: configuration, extensibility, copy-paste. There’s no perfect answer. Some of these choices will encourage contributions back. Avoid premature abstraction.
6. Do the simplest thing that could possibly work. You can always revisit if the feature becomes popular, but complex features are harder to maintain.
7. A ‘broken’ feature is an opportunity to contribute.
8. Beware of “free as in puppy”. Don’t accept contributions that can’t be maintained. Contributions must have tests and must have users.
9. You can’t push on a rope. Be aware when you have power, and when you don’t. E.g. you can’t promise features.
10. Sweep the floors, prime the pump. Accept that there is some crap that just needs to be done, and no self-interested party will pay for it. This applies especially to building experimental features to a level where they show promise.
Thank you!
Questions?
@ApacheCalcite
https://calcite.apache.org

The evolution of Apache Calcite and its Community
