Enable SQL/JDBC Access to Apache Geode/GemFire Using Apache Calcite
SpringOne Platform 2017
Christian Tzolov, Pivotal
https://springoneplatform.io/sessions/enable-sql-jdbc-access-to-apache-geode-gemfire-using-apache-calcite
"When working with BigData & IoT systems we often feel the need for an established, Common Query Language.
To fill this gap some NoSql vendors are building SQL access to their systems. Building SQL engine from scratch is a daunting job and frameworks like Apache Calcite can help you with the heavy lifting. It allows you to integrate SQL parser, Cost-Based Optimizer, and JDBC with your NoSql system. Calcite has been used to empower many BigData platforms such as Hive, Spark, Flink, Drill, HBase/Phoenix to name some.
In this session I will walk you through the process of building a SQL access layer for Apache Geode (GemFire). I will share my experience, pitfalls and technical consideration like balancing between the SQL/RDBMS semantics and the design choices and limitations of In-Memory-Data-Grid systems like Geode.
Hopefully this will enable you to add SQL capabilities to your preferred NoSQL data system."
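To make the Calcite approach concrete, here is a minimal, self-contained sketch of the pattern the talk describes: implement a table Calcite can scan, register it in a schema, and immediately get full SQL and JDBC on top. The in-memory BooksTable is a hypothetical stand-in for a Geode-backed table (Calcite itself now ships a Geode adapter you would use in practice); the only assumed dependency is calcite-core.

```java
import org.apache.calcite.DataContext;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CalciteAdapterSketch {

  /** Minimal "adapter": Calcite scans this table and handles all SQL on top. */
  static class BooksTable extends AbstractTable implements ScannableTable {
    @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
      return typeFactory.builder()
          .add("ID", SqlTypeName.INTEGER)
          .add("TITLE", SqlTypeName.VARCHAR)
          .build();
    }
    @Override public Enumerable<Object[]> scan(DataContext root) {
      // A real Geode adapter would enumerate a Region here.
      return Linq4j.asEnumerable(new Object[][] {
          {1, "Spring in Action"}, {2, "Geode in Depth"}});
    }
  }

  public static void main(String[] args) throws Exception {
    Connection conn = DriverManager.getConnection("jdbc:calcite:");
    SchemaPlus root = conn.unwrap(CalciteConnection.class).getRootSchema();
    root.add("BOOKS", new BooksTable());  // register the table under the root schema
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT TITLE FROM BOOKS WHERE ID = 2")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
```

Calcite parses, validates, plans, and executes the WHERE clause here; a production adapter would add rules to push such filters down into Geode's OQL instead of scanning everything.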
Spring boot microservice metrics monitoring (Oracle Korea)
This document summarizes a presentation on monitoring microservices with Spring Boot. It discusses evolving architectures from monolithic to microservices and challenges in microservices. It then covers different monitoring techniques like metrics, tracing and logging. It provides an overview of tools like Prometheus, Grafana, Spring Boot Admin, Eureka and Consul for monitoring microservices. Finally, it outlines hands-on labs to set up monitoring of a sample application with different tool combinations.
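For a taste of the metrics side, below is a minimal sketch of publishing a custom Prometheus-scrapable metric from a Spring Boot 2 service. It assumes spring-boot-starter-actuator and micrometer-registry-prometheus on the classpath and the prometheus actuator endpoint exposed; the controller, metric name, and endpoint path are illustrative.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CheckoutController {

  private final Counter checkouts;

  // MeterRegistry is auto-configured by Spring Boot Actuator; with the
  // Prometheus registry on the classpath, metrics are exposed at
  // /actuator/prometheus for Prometheus to scrape and Grafana to chart.
  public CheckoutController(MeterRegistry registry) {
    this.checkouts = Counter.builder("shop.checkouts")
        .description("Number of completed checkouts")
        .register(registry);
  }

  @GetMapping("/checkout")
  public String checkout() {
    checkouts.increment();  // counted on every request
    return "OK";
  }
}
```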
This document discusses various ways to interface with Oracle Cloud Infrastructure (OCI) Object Storage using different tools and SDKs. It covers using the OCI CLI to manage buckets and upload/download files, using Java and Python SDKs to programmatically interact with Object Storage, integrating Object Storage with Hadoop via the HDFS Connector, and using Object Storage as a data source for services like Oracle Autonomous Data Warehouse. The goal is to provide a common interface for data via virtualization regardless of where the data is physically stored.
The document discusses Oracle NoSQL Database and its features. It provides an overview of NoSQL databases and data models in Oracle NoSQL including key-value, table, and JSON. It also describes Oracle NoSQL's architecture, which uses automatic data sharding and replication across storage nodes for high availability and scalability. Configuration and usage is simplified with libraries and command line tools.
This document discusses Redis, MongoDB, and Amazon DynamoDB. It begins with an overview of NoSQL databases and the differences between SQL and NoSQL databases. It then covers Redis data types like strings, hashes, lists, sets, sorted sets, and streams. Example use cases for Redis are also provided, such as leaderboards, geospatial queries, and message queues. The document also discusses MongoDB design patterns like embedding data, embracing duplication, and relationships. Finally, it provides a high-level overview of DynamoDB concepts like tables, items, attributes, and primary keys.
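To illustrate the leaderboard use case mentioned above, here is a minimal sketch using the Jedis client and a Redis sorted set; the key name, players, and local Redis instance are illustrative assumptions.

```java
import redis.clients.jedis.Jedis;
import java.util.Set;

public class Leaderboard {
  public static void main(String[] args) {
    try (Jedis jedis = new Jedis("localhost", 6379)) {
      // Sorted set: member = player, score = points; ZADD keeps it ordered.
      jedis.zadd("leaderboard", 4200, "alice");
      jedis.zadd("leaderboard", 3100, "bob");
      jedis.zadd("leaderboard", 5600, "carol");

      // Top 3 players, highest score first; Redis does the ranking server-side.
      Set<String> top = jedis.zrevrange("leaderboard", 0, 2);
      top.forEach(System.out::println);  // carol, alice, bob
    }
  }
}
```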
Connecting your .Net Applications to NoSQL Databases - MongoDB & Cassandra (Lohith Goudagere Nagaraj)
The document discusses various ways to connect .NET applications to NoSQL databases like MongoDB and Cassandra. It covers client SDK APIs, REST/SOAP APIs, and SQL-based connectivity options. For SQL connectivity, the document explains that Progress DataDirect drivers normalize the NoSQL data model to expose it through SQL. Examples demonstrate connecting to MongoDB and Cassandra using the MongoDB and Cassandra .NET drivers, their REST APIs, and Progress DataDirect's ODBC drivers with SQL. The document concludes that SQL connectivity requires data normalization but offers familiar skills and easy BI integration.
Apache Ignite is an integrated and distributed In-Memory Data Fabric for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies. It is designed to easily power both existing and new applications in a distributed, massively parallel architecture on affordable, industry-standard hardware. Apache Ignite addresses today's Fast Data and Big Data needs by providing a comprehensive in-memory data fabric, which includes a data grid with SQL and transactional capabilities, in-memory streaming, an in-memory file system, and more.
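To make Ignite's "data grid with SQL" claim concrete, here is a minimal sketch that starts an embedded node, stores key-value pairs, and queries them with SQL. It assumes the ignite-core and ignite-indexing modules on the classpath; the cache name and data are illustrative.

```java
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.configuration.CacheConfiguration;

public class IgniteSqlSketch {
  public static void main(String[] args) {
    try (Ignite ignite = Ignition.start()) {
      CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("City");
      cfg.setIndexedTypes(Long.class, String.class);  // exposes the cache as a SQL table
      IgniteCache<Long, String> cities = ignite.getOrCreateCache(cfg);
      cities.put(1L, "Sofia");
      cities.put(2L, "Amsterdam");

      // SQL over the in-memory data grid; _key/_val map to the cache entries.
      List<List<?>> rows = cities
          .query(new SqlFieldsQuery("SELECT _key, _val FROM String WHERE _val LIKE 'S%'"))
          .getAll();
      rows.forEach(System.out::println);
    }
  }
}
```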
Reactive Java EE - Let Me Count the Ways! (Reza Rahman)
As our industry matures, there are pockets of increased demand for high-throughput, low-latency systems heavily utilizing event-driven programming and asynchronous processing. This trend is gradually converging on the somewhat well-established but so far not well-understood term "Reactive".
This session explores how vanilla Java SE and Java EE align with this movement via features and APIs like JMS, MDB, EJB @Asynchronous, JAX-RS/Servlet/WebSocket async, CDI events, Java EE concurrency utilities, and so on. We will also see how these robust facilities can be made digestible, even in the most complex cases, for mere mortal developers through Java SE 8 Lambdas and Completable Futures.
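For a flavor of how these pieces fit together, here is a minimal sketch of a session bean combining EJB @Asynchronous with the Java EE concurrency utilities and CompletableFuture; the QuoteService name and lookup logic are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

import javax.annotation.Resource;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;
import javax.enterprise.concurrent.ManagedExecutorService;

@Stateless
public class QuoteService {

    @Resource
    private ManagedExecutorService executor;  // Java EE concurrency utilities

    // The EJB container returns to the caller immediately and runs
    // the body on one of its managed threads.
    @Asynchronous
    public Future<Double> fetchQuote(String symbol) {
        return new AsyncResult<>(expensiveLookup(symbol));
    }

    // Same idea with Java SE 8 lambdas + CompletableFuture, which lets
    // callers compose follow-up work without blocking.
    public CompletableFuture<Double> fetchQuoteComposable(String symbol) {
        return CompletableFuture.supplyAsync(() -> expensiveLookup(symbol), executor);
    }

    private double expensiveLookup(String symbol) {
        return 42.0d;  // stand-in for a slow remote call
    }
}
```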
JDBC Next: A New Asynchronous API for Connecting to a Database (Yolande Poirier)
This new API is completely nonblocking. It is not intended to be an extension to, or a replacement for, JDBC but, rather, an entirely separate API that provides completely nonblocking access to the same databases as JDBC.
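The proposed API itself was still in flux at the time (and never shipped), so the sketch below does not show it. Instead, it illustrates the gap the proposal aimed to close by wrapping classic blocking JDBC in a CompletableFuture: the caller's thread is freed, but a pool thread still blocks underneath, which is exactly what a truly nonblocking driver API avoids. The table, column, and URL are illustrative.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncQuery {
  private static final ExecutorService JDBC_POOL = Executors.newFixedThreadPool(4);

  // Async-over-blocking: the JDBC call still blocks, but on a dedicated
  // pool thread rather than the caller's thread.
  static CompletableFuture<String> findName(String url, long id) {
    return CompletableFuture.supplyAsync(() -> {
      try (Connection c = DriverManager.getConnection(url);
           PreparedStatement ps = c.prepareStatement(
               "SELECT name FROM users WHERE id = ?")) {
        ps.setLong(1, id);
        try (ResultSet rs = ps.executeQuery()) {
          return rs.next() ? rs.getString(1) : null;
        }
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }, JDBC_POOL);
  }
}
```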
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka (Guido Schmutz)
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, ranging from simple files and databases to high-volume event streams from sensors (IoT devices). It's important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In recent years, new tools have emerged that are especially capable of handling this process of integrating data from outside, often called data ingestion. From an outside perspective, they are very similar to the traditional Enterprise Service Bus infrastructures that larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, can handle high volumes of data/messages, provide very detailed monitoring at the message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache NiFi, StreamSets, and the Kafka ecosystem and show how they handle data ingestion in a Big Data solution architecture.
Big Data, Data Lake, Fast Data - Dataserialiation-Formats (Guido Schmutz)
The concept of "Data Lake" is in everyone's mind today. The idea of storing all the data that accumulates in a company in a central location and making it available sounds very interesting at first. But Data Lake can quickly turn from a clear, beautiful mountain lake into a huge pond, especially if it is inexpertly entrusted with all the source data formats that are common in today's enterprises, such as XML, JSON, CSV or unstructured text data. Who, after some time, still has an overview of which data, which format and how they have developed over different versions? Anyone who wants to help themselves from the Data Lake must ask themselves the same questions over and over again: what information is provided, what data types do they have and how has the content changed over time?
Data serialization frameworks such as Apache Avro and Google Protocol Buffer (Protobuf), which enable platform-independent data modeling and data storage, can help. This talk will discuss the possibilities of Avro and Protobuf and show how they can be used in the context of a data lake and what advantages can be achieved. The support on Avro and Protobuf by Big Data and Fast Data platforms is also a topic.
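As a minimal illustration of the Avro side, the sketch below defines a record schema at runtime and serializes one record to Avro's compact binary form. It assumes the org.apache.avro:avro library; the Order schema and values are illustrative.

```java
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroSketch {
  public static void main(String[] args) throws Exception {
    // The schema travels with (or alongside) the data, so readers can always
    // interpret old records -- the core promise for a data lake.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"long\"},"
      + "{\"name\":\"item\",\"type\":\"string\"}]}");

    GenericRecord order = new GenericData.Record(schema);
    order.put("id", 1001L);
    order.put("item", "widget");

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(schema).write(order, encoder);
    encoder.flush();
    System.out.println("Serialized " + out.size() + " bytes");
  }
}
```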
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka (Guido Schmutz)
Apache Kafka is a popular distributed streaming data platform and, increasingly, the architectural backbone for integrating streaming data with a Data Lake, Microservices, and Stream Processing. A lot of the data needed in stream processing is stored in traditional systems backed by relational databases. This session will present different approaches for integrating relational databases with Kafka, such as Kafka Connect, Oracle GoldenGate, ORDS APIs, and bridging Kafka with Oracle AQ.
Stream Processing in the Cloud With Data Microservices (marius_bogoevici)
The future of scalable data processing is event-driven microservices! They provide a powerful paradigm that solves issues typically associated with distributed applications such as availability, data consistency, or communication complexity, and allows the creation of sophisticated and extensible data processing pipelines.
Building on the ease of development and deployment provided by Spring Boot and the cloud native capabilities of Spring Cloud, the Spring Cloud Stream project provides a simple and powerful framework for creating event-driven microservices. They make it easy to develop data-processing Spring Boot applications that build upon the capabilities of Spring Integration. At a higher level of abstraction, Spring Cloud Data Flow is an integrated orchestration layer that provides a highly productive experience for deploying and managing sophisticated data pipelines consisting of standalone microservices. Streams are defined using a DSL abstraction and can be managed via shell and a web UI. Furthermore, a pluggable runtime SPI allows Spring Cloud Data Flow to coordinate these applications across a variety of distributed runtime platforms such as Apache YARN, Cloud Foundry, Kubernetes, or Apache Mesos.
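As a minimal sketch of this programming model, using the annotation-based API current at the time (since superseded by the functional model), here is an event-driven processor that consumes from one destination and produces to another; the binder (RabbitMQ, Kafka, ...) is supplied purely by configuration, and the uppercase transform is illustrative.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

// Binds the "input" and "output" channels to the configured message broker;
// the transformation logic stays completely broker-agnostic.
@SpringBootApplication
@EnableBinding(Processor.class)
public class UppercaseProcessor {

  @StreamListener(Processor.INPUT)
  @SendTo(Processor.OUTPUT)
  public String transform(String payload) {
    return payload.toUpperCase();
  }

  public static void main(String[] args) {
    SpringApplication.run(UppercaseProcessor.class, args);
  }
}
```

In Spring Cloud Data Flow's DSL, such an app then composes into a pipeline like `http | uppercase | log`.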
The big data platforms of many organisations are underpinned by a technology that is soon to celebrate its 45th birthday: SQL. This industry stalwart is applied at a multitude of critical points in business data flows; the results these processes generate may significantly influence business and financial decision making. However, the SQL ecosystem has been largely overlooked by more recent innovations in software engineering best practices, such as fine-grained automated testing and code quality metrics. This exposes organisations to poor application maintainability, high bug rates, and ultimately corporate risk.
We present the work we’ve been doing at Hotels.com to address these issues by bringing some advanced software engineering practices and open source tools to the realm of Apache Hive SQL. We first define the relevance of such approaches and demonstrate how automated testing can be applied to Hive SQL using HiveRunner, a JUnit based testing framework. We next consider how best to structure Hive queries to yield meaningful test scenarios that are maintainable and performant. Finally, we demonstrate how test coverage reports can highlight areas of risk in SQL codebases and weaknesses in the testing process. We do this using Mutant Swarm, an open source mutation testing tool for SQL languages developed by Hotels.com that can deliver insights similar to those produced by Java focused tools such as Jacoco and PIT.
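A minimal sketch of what a HiveRunner-based test looks like, assuming the com.klarna:hiverunner dependency and JUnit 4; the database, table, and query are illustrative.

```java
import com.klarna.hiverunner.HiveShell;
import com.klarna.hiverunner.StandaloneHiveRunner;
import com.klarna.hiverunner.annotations.HiveSQL;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;

import java.util.List;

@RunWith(StandaloneHiveRunner.class)
public class OrdersQueryTest {

  // HiveRunner injects a shell backed by an in-process HiveServer.
  @HiveSQL(files = {})
  private HiveShell shell;

  @Test
  public void countsDistinctCustomers() {
    shell.execute("CREATE DATABASE test_db");
    shell.execute("CREATE TABLE test_db.orders (customer_id INT, total DOUBLE)");
    shell.execute("INSERT INTO test_db.orders VALUES (1, 10.0), (1, 5.0), (2, 7.5)");

    List<String> result = shell.executeQuery(
        "SELECT COUNT(DISTINCT customer_id) FROM test_db.orders");
    Assert.assertEquals("2", result.get(0));
  }
}
```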
This document discusses how to deliver a multi-tenant and PCI compliant Exalogic platform. It outlines the challenges of a shared platform including compliance with regulations like PCI-DSS. It then explains how Exalogic addresses these challenges through automation, isolation across storage, network and virtualization layers, and other security controls. Key aspects covered include per-tenant provisioning, encryption, firewalling, patching, and auditing to ensure isolation and compliance for different tenants including those processing credit card data.
Java EE 8 Presentation given at NYC Java SIG on May 4, 2017. This presentation provides the latest information on the forthcoming release of Java EE 8 in June.
How web works and browser works? (behind the scenes) (Vibhor Grover)
This presentation explains how the web and browsers work behind the scenes. It can help you understand what happens when you enter a URL in your browser, how the browser displays the page, and how we can improve the performance of our applications.
This document provides an overview of Sybase BAM (Business Activity Monitoring). It discusses the technology background of BAM, CEP, and RTBI. It then describes Sybase BAM's analytic model, architecture, main features, and a demo. The analytic model uses fields, rules, actions and timers to process events. The architecture includes components like the BAM engine and tools. Main features include support for complex event processing, real-time BI, alerts, visualization, metadata-driven design, and high volume processing.
Enterprise Java Web Application Frameworks Sample Stack Implementation (Mert Çalışkan)
This document provides an overview of enterprise Java web application frameworks and sample stack implementations. It discusses choosing between various UI, controller, model, and integration frameworks like JSF, Spring, Hibernate, and Apache CXF. It then demonstrates a sample stack using these technologies along with Maven, Eclipse, and other tools. The aim is to provide a scalable and high-performance MVC architecture using proven open source solutions.
Enterprise Application Architectures by Dr. Indika Kumara (Thejan Wijesinghe)
Enterprise Applications/Computing
Architecture Styles for Enterprise Applications
  Method-oriented
  Message-oriented
  Resource-oriented
    REST (representational state transfer)
  Event-oriented
  SOA (service-oriented architecture)
    Basic and extended SOA
Implementing SOA
  RESTful
  WS-* (web services stack)
  ESB (enterprise service bus)
Business processes and service compositions
Migrating ETL Workflow to Apache Spark at Scale in Pinterest (Databricks)
The document summarizes Pinterest's migration of ETL workflows from Cascading and Scalding to Spark. Key points:
- Pinterest runs Spark on AWS but manages its own clusters to avoid vendor lock-in. They have multiple Spark clusters with hundreds to thousands of nodes.
- The migration plan is to move remaining workloads from Hive, Cascading/Scalding, and Hadoop streaming to SparkSQL, PySpark, and native Spark over time. An automatic migration service helps with the process.
- Technical challenges included secondary sorting, accumulators behaving differently between frameworks, and output committer issues. Performance profiling and tuning was also important.
- Results of migrating so far include
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka (Guido Schmutz)
Apache Kafka is a popular distributed streaming data platform. A Kafka cluster stores streams of records (messages) in categories called topics. It is the architectural backbone for integrating streaming data with a Data Lake, Microservices, and Stream Processing. Data sources flowing into Kafka are often native data streams such as social media streams, telemetry data, and financial transactions. But these data streams contain only part of the information. A lot of the data needed in stream processing is stored in traditional systems backed by relational databases. To implement new, modern, real-time solutions, an up-to-date view of that information is needed. So how do we make sure that information can flow between the RDBMS and Kafka, so that changes are available in Kafka as soon as possible, in near real time? This session will present different approaches for integrating relational databases with Kafka, such as Kafka Connect, Oracle GoldenGate, and bridging Kafka with Oracle Advanced Queuing (AQ).
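As one concrete flavor of the Kafka Connect approach, here is a sketch of a JDBC source connector configuration (Confluent's community JDBC connector, standalone .properties form) that polls an Oracle table and publishes new or updated rows to a Kafka topic. All connection details, table, and column names are illustrative assumptions.

```properties
name=oracle-orders-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:oracle:thin:@db-host:1521/ORCLPDB1
connection.user=kafka_connect
connection.password=secret
table.whitelist=ORDERS
# Poll for rows that are new (incrementing id) or changed (newer timestamp).
mode=timestamp+incrementing
incrementing.column.name=ORDER_ID
timestamp.column.name=UPDATED_AT
# Changed rows land on the topic "oracle-ORDERS".
topic.prefix=oracle-
poll.interval.ms=5000
```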
Stream and Batch Processing in the Cloud with Data Microservices (marius_bogoevici)
The future of scalable data processing is microservices! Building on the ease of development and deployment provided by Spring Boot and the cloud native capabilities of Spring Cloud, the Spring Cloud Stream and Spring Cloud Task projects provide a simple and powerful framework for creating microservices for stream and batch processing. They make it easy to develop data-processing Spring Boot applications that build upon the capabilities of Spring Integration and Spring Batch, respectively. At a higher level of abstraction, Spring Cloud Data Flow is an integrated orchestration layer that provides a highly productive experience for deploying and managing sophisticated data pipelines consisting of standalone microservices. Streams and tasks are defined using a DSL abstraction and can be managed via shell and a web UI. Furthermore, a pluggable runtime SPI allows Spring Cloud Data Flow to coordinate these applications across a variety of distributed runtime platforms such as Apache YARN, Cloud Foundry, or Apache Mesos. This session will provide an overview of these projects, including how they evolved out of Spring XD. Both streaming and batch-oriented applications will be deployed in live demos on different platforms ranging from local cluster to a remote Cloud to show the simplicity of the developer experience.
Druid is a high-performance, column-oriented distributed data store that is widely used at Oath for big data analysis. Druid's native query language is JSON-based, making it difficult for new users unfamiliar with the schema to start querying Druid quickly. The JSON schema is designed to work with Druid's data ingestion methods, so it can provide high-performance features such as data aggregations in JSON, but many users are unable to take advantage of such features because they are not familiar with the specifics of how to optimize Druid queries. However, most new Druid users at Yahoo are already very familiar with SQL, and the queries they want to write for Druid can be converted to concise SQL.
We found that our data analysts wanted an easy way to issue ad-hoc Druid queries and view the results in a BI tool in a way that's presentable to nontechnical stakeholders. In order to achieve this, we had to bridge the gap between Druid, SQL, and our BI tools such as Apache Superset. In this talk, we will explore different ways to query a Druid datasource in SQL and discuss which methods were most appropriate for our use cases. We will also discuss our open source contributions so others can utilize our work. GURUGANESH KOTTA, Software Dev Eng, Oath and JUNXIAN WU, Software Engineer, Oath Inc.
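One of the simplest ways to bridge that gap is Druid SQL's JDBC endpoint, which speaks the Avatica remote protocol, so any JDBC-capable tool can connect. Below is a minimal sketch assuming the Avatica JDBC client on the classpath; the broker host and the tutorial wikipedia datasource are illustrative.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DruidSqlSketch {
  public static void main(String[] args) throws Exception {
    // Druid's SQL endpoint speaks the Avatica remote JDBC protocol.
    String url = "jdbc:avatica:remote:url=http://broker-host:8082/druid/v2/sql/avatica/";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT channel, COUNT(*) AS edits "
           + "FROM wikipedia GROUP BY channel ORDER BY edits DESC LIMIT 5")) {
      while (rs.next()) {
        System.out.println(rs.getString("channel") + ": " + rs.getLong("edits"));
      }
    }
  }
}
```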
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes (DataWorks Summit)
Apache Druid supports auto-scaling of Middle Manager nodes to handle changes in data ingestion load. On Kubernetes, this can be implemented using Horizontal Pod Autoscaling based on custom metrics exposed from the Druid Overlord process, such as the number of pending/running tasks and expected number of workers. The autoscaler scales the number of Middle Manager pods between minimum and maximum thresholds to maintain a target average load percentage.
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data (DataWorks Summit)
When interacting with analytics dashboards, two key requirements for a smooth user experience are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/HBase work well for storing large volumes of data, but they are not optimized for ingesting streaming data and making it available for queries in real time. Long query latencies also make these systems sub-optimal choices for powering interactive dashboards and BI use cases.
In this talk we will present Druid as a complementary solution to existing Hadoop-based technologies. Druid is an open-source analytics data store, designed from scratch for OLAP and business intelligence queries over massive data streams. It provides low-latency, real-time data ingestion and fast, sub-second, ad hoc data exploration queries.
Many large companies are switching to Druid for analytics, and we will cover how Druid is able to handle massive data streams and why it is a good fit for BI use cases.
Agenda:
1) Introduction and Ideal Use cases for Druid
2) Data Architecture
3) Streaming Ingestion with Kafka
4) Demo using Druid, Kafka and Superset.
5) Recent improvements in Druid: moving from a lambda architecture to exactly-once ingestion
6) Future Work
NoSQL no more: SQL on Druid with Apache Calcite (gianmerlino)
Druid is an analytics-focused, distributed, scale-out data store. Existing Druid clusters have scaled to petabytes of data and trillions of events, ingesting millions of events every second. Up until version 0.10, Druid could only be queried in a JSON-based language that many users found unfamiliar.
Enter Apache Calcite. It includes an industry-standard SQL parser, validator, and JDBC driver, as well as a cost-based relational optimizer. Calcite bills itself as “the foundation for your next high-performance database” and is used by Hive, Drill, and a variety of other projects. Druid uses Calcite to power Druid SQL, a standards-based query API that vaults Druid out of the NoSQL world and into the SQL world.
Gian Merlino offers an overview of Druid SQL and explains how Druid and Calcite are integrated and why you should stop worrying and learn to love relational algebra in your own projects.
This document discusses migrating databases from Oracle to PostgreSQL using AWS Database Migration Service (DMS) and AWS Schema Conversion Tool (SCT). It provides an overview of DMS and SCT, the migration process which involves assessing the database with SCT and then using DMS to replicate the data, and resources available to customers for both services. It also provides background on PostgreSQL, describing it as an open-source, object-relational database management system.
Modularization With Project Jigsaw in JDK 9 (Simon Ritter)
The document discusses Project Jigsaw and modularization in JDK 9. It introduces modularization and modules, explaining that modules group code and declare dependencies. It outlines changes in JDK 9 like encapsulating internal APIs and changing the binary structure. The goals of modularization are to make Java more scalable, flexible, secure and maintainable for large applications. Modules, compilation, execution and linking with modular JAR files are also summarized.
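For a flavor of what modularization looks like in practice, here is a minimal module descriptor; the module and package names are illustrative.

```java
// module-info.java -- the module declares what it needs and what it exposes;
// everything not exported is strongly encapsulated at compile and run time.
module com.example.reports {
    requires java.sql;                 // dependency on the platform's java.sql module
    exports com.example.reports.api;   // only this package is readable by consumers
}
```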
Cloud-Native Streaming and Event-Driven Microservices (VMware Tanzu)
Marius Bogoevici, Spring Cloud Stream Lead
Join us for an introduction to Spring Cloud Stream, a framework for creating event-driven microservices that builds on the ease of development and execution of Spring Boot, the cloud-native capabilities of Spring Cloud, and the message-driven programming model of Spring Integration. See how Spring Cloud Stream's abstractions and opinionated primitives allow you to easily build applications that can interchangeably use RabbitMQ, Kafka, or Google Pub/Sub without changing the application logic. Finally, we will show how these applications can be orchestrated and deployed on different modern runtimes such as Cloud Foundry, Kubernetes, or Mesos using Spring Cloud Data Flow.
This session looks at how core features introduced in Spring Framework 5, such as Reactive support, presented at the SpringOne Platform event held in San Francisco in December 2017, are woven into the Spring Data, Spring Security, and Spring WebFlux projects. It also explores how these features make your systems more responsive and more efficient.
As our applications grow and need to deal with new big data challenges and global distribution, we look to new data stores designed for these challenges. Cassandra is one great tool for both of these problems, and we will go over the different ways of working with Cassandra from Grails. I will cover the various Cassandra plugins for Grails, including the Cassandra ORM, Astyanax, and Cassandra GORM. I will also talk about using the Cassandra Java driver directly. By the end of the talk you should have an overview of the data modeling that works well in Cassandra paired with Grails, when to use (or not use) each plugin, and some of the basic connection configuration details needed.
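For reference, using the DataStax Java driver directly (the 3.x-era API) looks roughly like this minimal sketch; the keyspace, table, and local contact point are illustrative.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraSketch {
  public static void main(String[] args) {
    try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
         Session session = cluster.connect()) {
      session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
          + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
      session.execute("CREATE TABLE IF NOT EXISTS demo.users "
          + "(id uuid PRIMARY KEY, name text)");
      session.execute("INSERT INTO demo.users (id, name) VALUES (uuid(), 'Ada')");

      ResultSet rs = session.execute("SELECT id, name FROM demo.users");
      for (Row row : rs) {
        System.out.println(row.getUUID("id") + " " + row.getString("name"));
      }
    }
  }
}
```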
Under the Hood of Reactive Data Access (2/2) (VMware Tanzu)
SpringOne Platform 2017
Christoph Strobl, Pivotal; Mark Paluch, Pivotal
"A huge theme in Spring Framework 5.0 and its ecosystem projects is the native reactive support that empowers you to build end-to-end reactive applications. Reactive data access especially requires a reactive infrastructure. But how is this one different from the ones used before? How does it deal with I/O?
In this session, we will demystify what happens inside the driver and give you a better understanding of their capabilities. You will learn about the inner mechanics of reactive data access by walking through reactive drivers that are used in Spring Data."
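At the application level, those reactive drivers surface through APIs like Spring Data's reactive repositories. A minimal sketch with MongoDB (the Person entity and query method are illustrative), where results arrive as a backpressure-aware Flux rather than a blocking List:

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.repository.ReactiveMongoRepository;
import reactor.core.publisher.Flux;

// Entity persisted in MongoDB.
class Person {
  @Id String id;
  String name;
}

// Spring Data derives the query; the reactive MongoDB driver streams
// results back without holding a thread per request.
interface PersonRepository extends ReactiveMongoRepository<Person, String> {
  // repository.findByName("Ada") returns immediately; elements are
  // emitted as the driver receives them from the server.
  Flux<Person> findByName(String name);
}
```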
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring... (confluent)
This document discusses building microservices for data streaming and processing using Spring Cloud and Kafka. It provides an overview of Spring Cloud Stream and how it can be used to build event-driven microservices that connect to Kafka. It also discusses how Spring Cloud Data Flow can be used to orchestrate and deploy streaming applications and topologies. The document includes code samples of building a basic Kafka Streams processor application using Spring Cloud Stream and deploying it as part of a streaming data flow. It concludes by proposing a demonstration of these techniques.
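A minimal sketch of such a Kafka Streams processor with Spring Cloud Stream's annotation-based API of that era, assuming the Kafka Streams binder on the classpath; the binding names and uppercase transform are illustrative.

```java
import org.apache.kafka.streams.kstream.KStream;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.messaging.handler.annotation.SendTo;

// Custom binding interface: each method is a named KStream binding.
interface WordStreams {
  @Input("input")   KStream<String, String> input();
  @Output("output") KStream<String, String> output();
}

@EnableBinding(WordStreams.class)
public class UppercaseStreamProcessor {

  // With the Kafka Streams binder, Spring Cloud Stream materializes the
  // bindings as sources and sinks of a Kafka Streams topology.
  @StreamListener("input")
  @SendTo("output")
  public KStream<String, String> process(KStream<String, String> words) {
    return words.mapValues(String::toUpperCase);
  }
}
```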
Building Highly Scalable Spring Applications using In-Memory Data Grids (John Blum)
Slides from Luke Shannon's and my presentation at SpringOne2GX 2015 in Washington, D.C., on Tuesday, September 15th, from 10:30 AM to 12:00 PM EDT.
Session details @ https://2015.event.springone2gx.com/schedule/sessions/building_highly_scalable_spring_applications_with_in_memory_distributed_data_grids.html.
Federated Queries with HAWQ - SQL on Hadoop and Beyond (Christian Tzolov)
In the Big Data space, Pivotal offers two powerful data processing tools, namely HAWQ and GemFire. HAWQ is a scalable OLAP SQL-on-Hadoop system, while GemFire is an OLTP-like, in-memory data grid and event processing system. This presentation will show different approaches for integration and data exchange between HAWQ and GemFire. Practical experience in applying Spring Boot and Spring XD to some of the use cases will be shared while walking you through the implementation of the different integration strategies. Among other things, we will show an integration path that leverages Spring XD to ingest GemFire data and store it in HDFS, as well as the benefits of using Spring Boot to implement a RESTful proxy for the HAWQ Web Table integration scenario.
Simple Data Movement Patterns: Legacy Application to Cloud-Native Environment... (VMware Tanzu)
SpringOne Platform 2019
Session Title: Simple Data Movement Patterns: Legacy Application to Cloud-Native Environment and Apache Geode
Speaker: James Bedenbaugh, Advisory Data Solutions Architect, Pivotal; Zachary Hansen, Data Transformation Solutions Architect, Pivotal
Youtube: https://youtu.be/7ds0YZNlhmE
This session looks at how core features introduced in Spring Framework 5, such as Reactive support, are woven into the Spring Data, Spring Security, and Spring WebFlux projects. It also explores how these features make your systems more responsive and more efficient.
Lattice: A Cloud-Native Platform for Your Spring Applications (Matt Stine)
As presented at SpringOne2GX 2015 in Washington, DC.
Lattice is a cloud-native application platform that enables you to run your applications in containers like Docker, on your local machine via Vagrant. Lattice includes features like:
Cluster scheduling
HTTP load balancing
Log aggregation
Health management
Lattice does this by packaging a subset of the components found in the Cloud Foundry elastic runtime. The result is an open, single-tenant environment suitable for rapid application development, similar to Kubernetes and Mesos. Applications developed using Lattice should migrate unchanged to full Cloud Foundry deployments.
Lattice can be used by Spring developers to spin up powerful micro-cloud environments on their desktops, and can be useful for developing and testing cloud-native application architectures. Lattice already has deep integration with Spring Cloud and Spring XD, and you’ll have the opportunity to see deep dives into both at this year’s SpringOne 2GX. This session will introduce the basics:
Installing Lattice
Lattice’s Architecture
How Lattice Differs from Cloud Foundry
How to Package and Run Your Spring Apps on Lattice
Quickly Build Spring Boot Applications to Consume Public Cloud Services (VMware Tanzu)
SpringOne Platform 2017
Prasad Bopardikar, Pivotal; Colin Stevenson, Pivotal
"We all know Cloud Foundry is a great platform for cloud-native applications. However, what happens when you’re building an app that leverages services from public cloud providers such as Microsoft, Google and Amazon? Service brokers make it easy to spin up service instances and bind to apps. What about the actual code itself?
Developers leverage the popular Spring Boot framework to quickly build Java apps to deploy to Cloud Foundry. The Spring Boot Starters and Auto-Configuration eliminate the need to write boilerplate code to consume some services, but not all.
We've decided to give you a head start. This session is about extending the Spring framework. We'll use examples from our recent work with Microsoft Azure, Google Cloud Platform and Amazon AWS services. As more services become available, developers will want to consume these on Cloud Foundry. Extend Spring to make it easier for developers to consume those backing services!
Connecting All Abstractions with Istio (VMware Tanzu)
SpringOne Platform 2017
Ramiro Salas, Pivotal
The concept of a service mesh represents a paradigm shift on application connectivity for distributed systems, with wide implications for analytics, policy and extensibility. In this talk, we will explain what a service mesh is, the power it brings to microservices, and its impact on Cloud Foundry and K8s, both separately and together. We will also discuss the implications for the traditional network infrastructure, and the shifting of responsibilities from L3/4 to L7, and our current thinking of using Istio to integrate all abstractions.
Caching for Microservices - Introduction to Pivotal Cloud Cache (VMware Tanzu)
SpringOne Platform 2017
Pulkit Chandra, Pivotal
"One of the most important factors in a microservices architecture is that application logic is separate from the data store. This design choice makes it easier for the application to scale. Providing a caching solution inside Pivotal Cloud Foundry makes it easy for these microservices to store data which can be retrieved 100x times faster than with a regular database. Pivotal Cloud Cache not only provides such a cache but takes a “use case”-based approach which gets an application from 0 to production fast.
This session will provide insights into how to use Pivotal Cloud Cache and its performance under load. We will demo a Spring Boot app which uses Spring Data Geode to talk to a Pivotal Cloud Cache cluster."
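A minimal sketch of the Spring Data Geode client side of such a demo; the annotations are real Spring Data Geode APIs, while the entity, region name, and cluster details are illustrative and would come from the bound Pivotal Cloud Cache service instance.

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.gemfire.config.annotation.ClientCacheApplication;
import org.springframework.data.gemfire.config.annotation.EnableEntityDefinedRegions;
import org.springframework.data.gemfire.mapping.annotation.Region;
import org.springframework.data.gemfire.repository.config.EnableGemfireRepositories;
import org.springframework.data.repository.CrudRepository;

// Entity stored in the "Customers" region of the cache cluster.
@Region("Customers")
class Customer {
  @Id Long id;
  String name;
}

// Standard Spring Data repository over the region.
interface CustomerRepository extends CrudRepository<Customer, Long> {}

// Connects as a cache client and creates a client Region per annotated entity.
@ClientCacheApplication
@EnableEntityDefinedRegions(basePackageClasses = Customer.class)
@EnableGemfireRepositories(basePackageClasses = CustomerRepository.class)
public class CloudCacheClientApp {
}
```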
SpringOne Platform 2017
Ryan Baxter, Pivotal
You have heard and seen great things about Spring Cloud and you decide it is time to dive in and try it out yourself. You fire up your browser, head to Google, and land on the Spring Cloud homepage. Then it hits you: where do you begin? What does each of these projects do? Do you need to use all of them, or can you be selective? The number of projects under the Spring Cloud umbrella has grown immensely over the past couple of years, and if you are a newcomer to the Spring Cloud ecosystem it can be quite daunting to sift through the projects to find what you need. By the end of this talk you will leave with a solid understanding of the Spring Cloud projects, how to use them to build cloud native apps, and the confidence to get started!
Tools to Slay the Fire Breathing Monoliths in Your Enterprise (VMware Tanzu)
SpringOne Platform 2017
Rohit Kelapure, Pivotal; Joe Szodfridt, Pivotal; Shaun Anderson, Pivotal
Are fire-breathing monoliths lurking throughout your Enterprise? Many of these ancient behemoths can be millions of lines long and can wreak havoc when trying to evolve and transform your business. Unfortunately, your business depends on services they provide, so they can’t just be eliminated without a battle plan. The Pivotal App Transformation practice has continuously refined approaches and techniques to slay your monoliths. In this session, we will discuss how to carve up your legacy dragons into manageable pieces using techniques and patterns such as Event Storming, Strangling, Starving, Slice Analysis and Domain Driven Decomposition. Monolith slaying is not easy, but with the right tools and weapons at your disposal, your journey to the Cloud can be as easy as a stroll through the forest.
Developing Real-Time Data Pipelines with Apache KafkaJoe Stein
Developing Real-Time Data Pipelines with Apache Kafka http://kafka.apache.org/ is an introduction for developers about why and how to use Apache Kafka. Apache Kafka is a publish-subscribe messaging system rethought as a distributed commit log. Kafka is designed to allow a single cluster to serve as the central data backbone. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than any single machine can handle and to allow clusters of coordinated consumers. Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages. For the Spring user, Spring Integration Kafka and Spring XD provide integration with Apache Kafka.
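To make the publish-subscribe model described above concrete, here is a minimal sketch of a Java producer; the broker address, topic, key, and payload are placeholders, not anything from the talk.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PipelineProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("acks", "all"); // wait for replication, guarding against data loss

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The key picks the partition, so records sharing a key stay ordered.
                producer.send(new ProducerRecord<>("events", "sensor-42", "{\"temp\":21.5}"));
            }
        }
    }

Because records are spread over partitions by key, adding brokers scales throughput without changing producer code, which is what lets one cluster act as the central data backbone.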
Running Java Applications on Cloud FoundryVMware Tanzu
SpringOne Platform 2017
Ben Hale, Pivotal
From a developer's perspective, running a Java application on Cloud Foundry appears to consist of pushing a compiled artifact and getting a running process. From the platform's perspective though, there's a whole lot more going on. In this talk, the lead developer of the Java Buildpack will walk you through what goes on during application staging and what the buildpack can do for you. It will cover everything from dependency resolution to memory calculation and will even discuss how to integrate with marketplace services with no application configuration.
Similar to Enable SQL/JDBC Access to Apache Geode/GemFire Using Apache Calcite (20)
What AI Means For Your Product Strategy And What To Do About ItVMware Tanzu
The document summarizes Matthew Quinn's presentation on "What AI Means For Your Product Strategy And What To Do About It" at Denver Startup Week 2023. The presentation discusses how generative AI could impact product strategies by potentially solving problems companies have ignored or allowing competitors to create new solutions. Quinn advises product teams to evaluate their strategies and roadmaps, ensure they understand user needs, and consider how AI may change the problems being addressed. He provides examples of how AI could influence product development for apps in home organization and solar sales. Quinn concludes by urging attendees not to ignore AI's potential impacts and to have hard conversations about emerging threats and opportunities.
Make the Right Thing the Obvious Thing at Cardinal Health 2023VMware Tanzu
This document discusses the evolution of internal developer platforms and defines what they are. It provides a timeline of how technologies like infrastructure as a service, public clouds, containers and Kubernetes have shaped developer platforms. The key aspects of an internal developer platform are described as providing application-centric abstractions, service level agreements, automated processes from code to production, consolidated monitoring and feedback. The document advocates that internal platforms should make the right choices obvious and easy for developers. It also introduces Backstage as an open source solution for building internal developer portals.
Enhancing DevEx and Simplifying Operations at ScaleVMware Tanzu
Cardinal Health introduced Tanzu Application Service in 2016 and set up foundations for cloud native applications in AWS and later migrated to GCP in 2018. TAS has provided Cardinal Health with benefits like faster development of applications, zero downtime for critical applications, hosting over 5,000 application instances, quicker patching for security vulnerabilities, and savings through reduced lead times and staffing needs.
Dan Vega discussed upcoming changes and improvements in Spring including Spring Boot 3, which will have support for JDK 17, Jakarta EE 9/10, ahead-of-time compilation, improved observability with Micrometer, and Project Loom's virtual threads. Spring Boot 3.1 additions were also highlighted such as Docker Compose integration and Spring Authorization Server 1.0. Spring Boot 3.2 will focus on embracing virtual threads from Project Loom to improve scalability of web applications.
Platforms, Platform Engineering, & Platform as a ProductVMware Tanzu
This document discusses building platforms as products and reducing developer toil. It notes that platform engineering now encompasses PaaS and developer tools. A quote from Mercedes-Benz emphasizes building platforms for developers, not for the company itself. The document contrasts reactive, ticket-driven approaches with automated, self-service platforms and products. It discusses moving from considering platforms as a cost center to experts that drive business results. Finally, it provides questions to identify sources of developer toil, such as issues with workstation setup, running software locally, integration testing, committing changes, and release processes.
This document provides an overview of building cloud-ready applications in .NET. It defines what makes an application cloud-ready, discusses common issues with legacy applications, and recommends design patterns and practices to address these issues, including loose coupling, high cohesion, messaging, service discovery, API gateways, and resiliency policies. It includes code examples and links to additional resources.
Dan Vega discussed new features and capabilities in Spring Boot 3 and beyond, including support for JDK 17, Jakarta EE 9, ahead-of-time compilation, observability with Micrometer, Docker Compose integration, and initial support for Project Loom's virtual threads in Spring Boot 3.2 to improve scalability. He provided an overview of each new feature and explained how they can help Spring applications.
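As a rough illustration of what Project Loom changes (not code from the talk), the sketch below runs thousands of blocking tasks on cheap virtual threads with JDK 21; in Spring Boot 3.2 the equivalent switch for the embedded server is the spring.threads.virtual.enabled=true property.

    import java.util.concurrent.Executors;

    public class VirtualThreadsDemo {
        public static void main(String[] args) {
            // One virtual thread per task: a blocking call in a task no
            // longer ties up a scarce platform thread.
            try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 10_000; i++) {
                    final int id = i;
                    executor.submit(() -> {
                        Thread.sleep(100); // stands in for a blocking remote call
                        return id;
                    });
                }
            } // close() waits for all submitted tasks to complete
        }
    }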
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfVMware Tanzu
Spring Cloud Gateway is a gateway that provides routing, security, monitoring, and resiliency capabilities for microservices. It acts as an API gateway and sits in front of microservices, routing requests to the appropriate microservice. The gateway uses predicates and filters to route requests and modify requests and responses. It is lightweight and built on reactive principles to enable it to scale to thousands of routes.
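For readers new to the predicate-and-filter model, a typical route definition in the Java DSL looks like the sketch below; the route id, path, and order-service backend are hypothetical.

    import org.springframework.cloud.gateway.route.RouteLocator;
    import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class GatewayRoutes {

        @Bean
        public RouteLocator customRoutes(RouteLocatorBuilder builder) {
            return builder.routes()
                // Predicate: match the request path. Filters: rewrite the
                // request/response. uri(): the downstream microservice.
                .route("orders", r -> r.path("/api/orders/**")
                    .filters(f -> f.stripPrefix(1)                 // drop /api
                                   .addResponseHeader("X-Gateway", "scg"))
                    .uri("lb://order-service"))                    // via service discovery
                .build();
        }
    }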
This document appears to be from a VMware Tanzu Developer Connect presentation. It discusses Tanzu Application Platform (TAP), which provides a developer experience on Kubernetes across multiple clouds. TAP aims to unlock developer productivity, build rapid paths to production, and coordinate the work of development, security and operations teams. It offers features like pre-configured templates, integrated developer tools, centralized visibility and workload status, role-based access control, automated pipelines and built-in security. The presentation provides examples of how these capabilities improve experiences for developers, operations teams and security teams.
The document provides information about a Tanzu Developer Connect Workshop on Tanzu Application Platform. The agenda includes welcome and introductions on Tanzu Application Platform, followed by interactive hands-on workshops on the developer experience and operator experience. It will conclude with a quiz, prizes and giveaways. The document discusses challenges with developing on Kubernetes and how Tanzu Application Platform aims to improve the developer experience with features like pre-configured templates, developer tools integration, rapid iteration and centralized management.
The Tanzu Developer Connect is a hands-on workshop that dives deep into TAP. Attendees receive a hands-on experience. This is a great program to leverage accounts with current TAP opportunities.
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023VMware Tanzu
This document discusses simplifying and scaling enterprise Spring applications in the cloud. It provides an overview of Azure Spring Apps, which is a fully managed platform for running Spring applications on Azure. Azure Spring Apps handles infrastructure management and application lifecycle management, allowing developers to focus on code. It is jointly built, operated, and supported by Microsoft and VMware. The document demonstrates how to create an Azure Spring Apps service, create an application, and deploy code to the application using three simple commands. It also discusses features of Azure Spring Apps Enterprise, which includes additional capabilities from VMware Tanzu components.
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootVMware Tanzu
The document discusses 15 factors for building cloud native applications with Kubernetes based on the 12 factor app methodology. It covers factors such as treating code as immutable, externalizing configuration, building stateless and disposable processes, implementing authentication and authorization securely, and monitoring applications like space probes. The presentation aims to provide an overview of the 15 factors and demonstrate how to build cloud native applications using Kubernetes based on these principles.
SpringOne Tour: The Influential Software EngineerVMware Tanzu
The document discusses the importance of culture in software projects and how to influence culture. It notes that software projects involve people and personalities, not just technology. It emphasizes that culture informs everything a company does and is very difficult to change. It provides advice on being aware of your company's culture, finding ways to inculcate good cultural values like writing high-quality code, and approaches for influencing decision makers to prioritize culture.
SpringOne Tour: Domain-Driven Design: Theory vs PracticeVMware Tanzu
This document discusses domain-driven design, clean architecture, bounded contexts, and various modeling concepts. It provides examples of an e-scooter reservation system to illustrate domain modeling techniques. Key topics covered include identifying aggregates, bounded contexts, ensuring single sources of truth, avoiding anemic domain models, and focusing on observable domain behaviors rather than implementation details.
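The talk's e-scooter model isn't reproduced in this summary, but a sketch of a non-anemic aggregate in that spirit might look like this; every name here is hypothetical.

    import java.time.Instant;
    import java.util.UUID;

    // Reservation is the aggregate root: state changes only through methods
    // that guard its invariants, and the scooter is referenced by id only,
    // so the two aggregates stay independently consistent.
    public class Reservation {
        enum Status { ACTIVE, CANCELLED, COMPLETED }

        private final UUID id = UUID.randomUUID();
        private final UUID scooterId;
        private final Instant reservedAt;
        private Status status = Status.ACTIVE;

        public Reservation(UUID scooterId, Instant reservedAt) {
            this.scooterId = scooterId;
            this.reservedAt = reservedAt;
        }

        // Observable domain behavior instead of an anemic setter.
        public void cancel() {
            if (status != Status.ACTIVE) {
                throw new IllegalStateException("only an active reservation can be cancelled");
            }
            status = Status.CANCELLED;
        }

        public Status status() { return status; }
    }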
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments in SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system (a minimal sketch follows this list).
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
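To make step 8 above concrete (the tutorial's own stack may differ), exposing a custom metric with the Prometheus Java simpleclient can be as small as the sketch below; the metric name and port are placeholders.

    import io.prometheus.client.Counter;
    import io.prometheus.client.exporter.HTTPServer;

    public class AnomalyMetrics {
        // Monotonic counter scraped by Prometheus; the name is a placeholder.
        static final Counter anomalies = Counter.build()
                .name("detected_anomalies_total")
                .help("Total number of anomalies flagged by the model.")
                .register();

        public static void main(String[] args) throws Exception {
            new HTTPServer(8000);  // serve /metrics for Prometheus to scrape
            anomalies.inc();       // call whenever the model flags an anomaly
        }
    }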
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary spending, for example using a person document instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course, we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder bring this new world closer to you. It will give you the tools and know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low going forward.
These topics are covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
40. Learn More. Stay Connected.
Vote for [CALCITE-2059] and don’t miss:
Simplifying Apache Geode with Spring Data
Exploring Data-Driven, Cognitive Capabilities in Pivotal Cloud Foundry
Orchestrating Data Microservices with Spring Cloud Data Flow
#springone @s1p