Introduction to Apache Calcite
Josh Elser
MTS
2016-04-20
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
About me
Apache Calcite is a project at the Apache Software Foundation.
This name is a trademark of the Foundation.
Apache Calcite Committer and PMC
(Slowly) Re-learning SQL
Distributed systems nerd
Users
Apache Kylin
Apache Samza
Quark
SQL-Gremlin/Apache TinkerPop
See the respective project pages at the ASF
Brief History
• Originally known as “Optiq” (https://github.com/julianhyde/optiq): 2012-05-07
• Entered the Apache Software Foundation’s Incubator: 2014-05-25
• Renamed to Apache Calcite (incubating): 2014-09-30
• Graduated to top-level project (TLP): 2015-10-21
• 2 major releases since graduation: 2016-03-XX
• Currently composed of 16 committers and 14 PMC members
“The foundation for your next high-performance database.”
Agenda
SQL Parser
SQL Parser
SELECT d.name, COUNT(*) as c
FROM Emps as e
JOIN Depts as d ON e.deptno = d.deptno
WHERE e.age < 30
GROUP BY d.deptno
HAVING COUNT(*) > 5
ORDER BY c DESC
Sort
  Project
    Filter
      Aggregate
        Filter
          Join
            Scan
            Scan
https://calcite.apache.org/docs/reference.html
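To make the clause-to-operator mapping on this slide concrete, here is a minimal sketch in plain Java. It is a toy model only: `ToyClauseOrder` and its parameters are invented for illustration and are not part of Calcite's parser or its SQL-to-relational conversion. It lists the operators bottom-up, in the order the data would flow through them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not Calcite API): which relational operators a
// SELECT statement needs, listed bottom-up in the order data flows.
final class ToyClauseOrder {
    static List<String> operators(int tables, boolean where, boolean groupBy,
                                  boolean having, boolean orderBy) {
        List<String> ops = new ArrayList<>();
        for (int i = 0; i < tables; i++) ops.add("Scan");   // FROM: one scan per table
        for (int i = 1; i < tables; i++) ops.add("Join");   // JOIN ... ON: combine the scans
        if (where) ops.add("Filter");                       // WHERE: filter before grouping
        if (groupBy) ops.add("Aggregate");                  // GROUP BY plus aggregate calls
        if (having) ops.add("Filter");                      // HAVING: filter after grouping
        ops.add("Project");                                 // SELECT list
        if (orderBy) ops.add("Sort");                       // ORDER BY
        return ops;
    }
}
```

For the query on this slide (two tables, with WHERE, GROUP BY, HAVING and ORDER BY), `ToyClauseOrder.operators(2, true, true, true, true)` yields the same operators as the plan shown above.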
Agenda
SQL Parser
Cost-Based Optimizer
Cost-Based Optimizer
• Extensible
• Java API
– Parser output
– Inline Java code
AKA Relational Algebra
// config: a FrameworkConfig whose default schema contains an EMP table
RelBuilder builder = RelBuilder.create(config);
RelNode node = builder
    .scan("EMP")
    .project(builder.field("DEPTNO"),
             builder.field("ENAME"))
    .build();
SELECT deptno, ename FROM emp;
LogicalProject(DEPTNO, ENAME)
  LogicalTableScan(EMP)
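The builder on this slide needs Calcite on the classpath and a configured schema, so as a dependency-free illustration here is a minimal sketch of the same idea in plain Java. Every `Toy*` class is invented for this example and is not Calcite API. It assumes a stack-based builder: `scan` pushes a leaf, `project` pops its input and pushes a new node on top, and `build` returns the finished tree.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Toy stand-ins for relational operators; NOT Calcite classes.
abstract class ToyRel {
    abstract String describe();
    abstract List<ToyRel> inputs();

    // Render the tree top-down, indenting each input under its parent.
    String explain() {
        StringBuilder sb = new StringBuilder();
        explain(sb, 0);
        return sb.toString();
    }

    private void explain(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(describe()).append('\n');
        for (ToyRel in : inputs()) in.explain(sb, depth + 1);
    }
}

final class ToyScan extends ToyRel {
    final String table;
    ToyScan(String table) { this.table = table; }
    String describe() { return "ToyScan(" + table + ")"; }
    List<ToyRel> inputs() { return Collections.emptyList(); }
}

final class ToyProject extends ToyRel {
    final ToyRel input;
    final List<String> fields;
    ToyProject(ToyRel input, List<String> fields) { this.input = input; this.fields = fields; }
    String describe() { return "ToyProject(" + String.join(", ", fields) + ")"; }
    List<ToyRel> inputs() { return Collections.singletonList(input); }
}

// Fluent builder that keeps partially built plans on a stack.
final class ToyRelBuilder {
    private final Deque<ToyRel> stack = new ArrayDeque<>();

    ToyRelBuilder scan(String table) {
        stack.push(new ToyScan(table));
        return this;
    }

    ToyRelBuilder project(String... fields) {
        // Pop the current top of the stack and wrap it in a projection.
        stack.push(new ToyProject(stack.pop(), Arrays.asList(fields)));
        return this;
    }

    ToyRel build() { return stack.pop(); }
}
```

Calling `new ToyRelBuilder().scan("EMP").project("DEPTNO", "ENAME").build().explain()` produces a two-line tree with the projection on top and the scan indented beneath it, mirroring the plan printed on the slide.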
Cost-Based Optimizer
SELECT d.name,
       COUNT(*) AS c
FROM depts AS d
JOIN emp AS e
  ON d.deptno = e.deptno
GROUP BY d.name;
Aggregate
  Join
    Scan Emp[deptno]
    Scan Depts[deptno, name]

Project[name, c]
  Aggregate
    Join
      Scan Emp[*]
      Scan Depts[*]
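The two plans above differ in how many columns each scan reads. As a rough sketch of how a cost-based choice between them could work, the plain-Java example below scores each candidate and keeps the cheaper one. The cost formula (one unit per cell read) and the table sizes are assumptions made up for this illustration, and the `Toy*` names are hypothetical. Calcite's planner and cost model are considerably richer than this.

```java
import java.util.Arrays;
import java.util.List;

// A candidate plan with a precomputed cost estimate (toy model, not Calcite API).
final class ToyPlan {
    final String name;
    final long cost;
    ToyPlan(String name, long cost) { this.name = name; this.cost = cost; }
}

final class ToyPlanChooser {
    // Assumed cost model: one unit per cell (row x column) a scan must read.
    static long scanCost(long rowCount, int columnsRead) {
        return rowCount * columnsRead;
    }

    // Pick the candidate with the lowest estimated cost.
    static ToyPlan cheapest(List<ToyPlan> candidates) {
        ToyPlan best = candidates.get(0);
        for (ToyPlan p : candidates) {
            if (p.cost < best.cost) best = p;
        }
        return best;
    }

    // Hypothetical sizes: Emp has 1000 rows x 8 columns, Depts has 10 rows x 3 columns.
    static List<ToyPlan> candidates() {
        long scanEverything = scanCost(1000, 8) + scanCost(10, 3);
        long scanNeededOnly = scanCost(1000, 1) + scanCost(10, 2);
        return Arrays.asList(
            new ToyPlan("Scan Emp[*], Scan Depts[*]", scanEverything),
            new ToyPlan("Scan Emp[deptno], Scan Depts[deptno, name]", scanNeededOnly));
    }
}
```

Under these assumed sizes, `ToyPlanChooser.cheapest(ToyPlanChooser.candidates())` selects the plan that reads only `deptno` from Emp and `deptno, name` from Depts.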
Agenda
SQL Parser
Cost-Based Optimizer
Pluggable Data Sources
Pluggable Data Sources
• User-implemented
– Yes, you.
• Custom optimizations
– Predicate pushdown
– Projections
• Sources of Sources
– Federation
Everything but the data
Project
  Aggregate
    Join
      Scan Emp[*]
      Scan Depts[*]
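Predicate pushdown from the list above can be sketched in plain Java as follows. This is a toy model loosely inspired by the idea behind Calcite's `ScannableTable` and `FilterableTable` interfaces, but the `Toy*` interfaces and their signatures are invented here and are not Calcite's. A source that can filter for itself receives the predicate. Otherwise the engine fetches every row and filters afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class ToyRow {
    final String name;
    final int age;
    ToyRow(String name, int age) { this.name = name; this.age = age; }
}

// Hypothetical adapter contracts; names and signatures are assumptions.
interface ToyTable {
    List<ToyRow> scan();                          // return every row
}

interface ToyFilterableTable extends ToyTable {
    List<ToyRow> scan(Predicate<ToyRow> filter);  // the source applies the filter itself
}

final class ToyEngine {
    // Rows handed over by the source during the most recent query.
    static int rowsFetched;

    static List<ToyRow> keep(List<ToyRow> rows, Predicate<ToyRow> filter) {
        List<ToyRow> out = new ArrayList<>();
        for (ToyRow r : rows) if (filter.test(r)) out.add(r);
        return out;
    }

    static List<ToyRow> query(ToyTable table, Predicate<ToyRow> filter) {
        if (table instanceof ToyFilterableTable) {
            // Pushdown: only matching rows ever leave the source.
            List<ToyRow> rows = ((ToyFilterableTable) table).scan(filter);
            rowsFetched = rows.size();
            return rows;
        }
        // Fallback: fetch everything, then filter inside the engine.
        List<ToyRow> rows = table.scan();
        rowsFetched = rows.size();
        return keep(rows, filter);
    }
}

// A made-up three-row source that supports pushdown.
final class ToyEmps implements ToyFilterableTable {
    static final List<ToyRow> ROWS = Arrays.asList(
        new ToyRow("Ann", 25), new ToyRow("Bob", 41), new ToyRow("Cy", 29));

    public List<ToyRow> scan() { return ROWS; }

    public List<ToyRow> scan(Predicate<ToyRow> filter) {
        return ToyEngine.keep(ROWS, filter);
    }
}
```

Both paths return the same answer for `age < 30`. The difference is how many rows cross the boundary between source and engine, which is the point of letting a data source implement its own optimizations.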
Agenda
SQL Parser
Cost-Based Optimizer
Pluggable Data Sources
Avatica
Avatica
• Calcite sub-project
• Wire protocol
– Protocol Buffers
– JSON
• Metrics
• Authentication
• Clients
– JDBC client
– Python and Go (in-progress)
JDBC over HTTP – SQL for Everyone
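From the client's side, "JDBC over HTTP" looks like ordinary JDBC with a different URL. The sketch below uses only `java.sql` and assumes the Avatica client jar is on the classpath and an Avatica server is reachable. The host, port, and serialization value passed to `remoteUrl` are placeholders for illustration, and `AvaticaClientSketch` is a hypothetical helper, not something shipped with Avatica.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class AvaticaClientSketch {
    // Build a JDBC URL for Avatica's remote (HTTP) driver.
    // Host, port, and serialization are supplied by the caller.
    static String remoteUrl(String host, int port, String serialization) {
        return "jdbc:avatica:remote:url=http://" + host + ":" + port
            + ";serialization=" + serialization;
    }

    // Run a query through standard JDBC and count the rows returned.
    // Requires a running server and the client driver on the classpath.
    static int countRows(String jdbcUrl, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int n = 0;
            while (rs.next()) n++;
            return n;
        }
    }
}
```

A caller might use `AvaticaClientSketch.countRows(AvaticaClientSketch.remoteUrl("localhost", 8765, "PROTOBUF"), "SELECT 1")`, where the address and query are again only examples.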
Thank You!
Email: elserj@apache.org
Twitter: @josh_elser
Mailing lists:
dev@calcite.apache.org
Project info:
https://calcite.apache.org/
