SlideShare a Scribd company logo
1 of 34
Download to read offline
SQL for NoSQL and how
Apache Calcite can help
FOSDEM 2017
Christian Tzolov
2
Engineer at Pivotal
BigData, Hadoop, Spring Cloud Dataflow
Apache Committer, PMC member
Apache {Crunch, Geode, HAWQ, ...}
Disclaimer This talk expresses my personal opinions. It is not read or approved by Pivotal and does not necessarily reflect the
views and opinions of Pivotal nor does it constitute any official communication of Pivotal. Pivotal does not support any of the
code shared here.
blog.tzolov.net
twitter.com/christzolov
nl.linkedin.com/in/tzolov
3
“It will be interesting to see what happens if an established
NoSQL database decides to implement a reasonably
standard SQL;
The only predictable outcome for such an eventuality is plenty
of argument.”
2012, Martin Fowler, P.J.Sadalage, NoSQL Distilled
Data Big Bang
4
Why?
NoSQL Driving Forces
5
•  Infrastructure Automation and Elasticity (Cloud
Computing)
•  Rise of Internet Web, Mobile, IoT – Data Volume,
Velocity, Variety challenges
•  Row-based Relational Model. Object-Relational
Impedance Mismatch
ACID & 2PC clash with Distributed architectures. CAP, PAXOS instead..
More convenient data models: Datastores, Key/Value, Graph, Columnar,
Full-text Search, Schema-on-Load…
Eliminate operational complexity and cost. Shift from Integration to application
databases …
Data Big Bang Implications
6
•  Over 150 commercial NoSQL and
BigData systems.
•  Organizations will have to mix data
storage technologies!
•  How to integrate such multitude of
data systems?
“Standard” Data Process/Query Language?
7
•  Functional - Unified Programming
Model
•  Apache {Beam, Spark, Flink,
Apex, Crunch}, Cascading
•  Converging around Apache
Beam
•  Declarative - SQL
•  Adopted by many NoSQL
Vendors
•  Most Hadoop tasks: Hive and
SQL-on-Hadoop
•  Spark SQL - most used
production component for 2016
•  Google F1
pcollection.apply(Read.from(”in.txt"))
.apply(FlatMapElements.via((String word) ->
asList(word.split("[^a-zA-Z']+")))
.apply(Filter.by((String word)->!word.isEmpty()))
.apply(Count.<String>perElement())
SELECT b."totalPrice", c."firstName”
FROM "BookOrder" as b
INNER JOIN "Customer" as c
ON b."customerNumber" = c."customerNumber”
WHERE b."totalPrice" > 0;
Batch & Streaming, OLTP
OLAP, EDW, Exploration
SQL for NoSQL?
8
•  Extended Relational Algebra - already present in most NoSql data system
•  Relational Expression Optimization – Desirable but hard to implement
Organization Data - Integrated View
9
Single Federated DB (M:1:N)
HAWQ FDBS
NoSQL 1
PXF 1
Native
API 1
Apache HAWQ
Optimizer, Columnar
(HDFS)
Organization Data Tools
SQL/JDBC
NoSQL 1
PXF 2
Native
API 2
NoSQL n
PXF n
Native
API n
…
Organization Data Tools
NoSQL 1
Calcite
SQLAdapter 1
SQL/JDBC
NoSQL 2
Calcite
SQLAdapter 2
SQL/JDBC
NoSQL n
Calcite
SQLAdapter n
SQL/JDBC
…
Direct (M:N)
https://issues.apache.org/jira/browse/HAWQ-1235
Single Federated Database
10
Federated External Tables with Apache HAWQ - MPP, Shared-Noting, SQL-
on-Hadoop
CREATE EXTERNAL TABLE MyNoSQL
(
customer_id TEXT,
first_name TEXT,
last_name TEXT,
gender TEXT
)
LOCATION ('pxf://MyNoSQL-URL>?
FRAGMENTER=MyFragmenter&
ACCESSOR=MyAccessor&
RESOLVER=MyResolver&')
FORMAT
'custom'(formatter='pxfwritable_import');
Apache Calcite?
Java framework that allows SQL interface and advanced query optimization,
for virtually any data system
•  Query Parser, Validator and Optimizer(s)
•  JDBC drivers - local and remote
•  Agnostic to data storage and processing
Calcite Application
12
•  Apache Apex
•  Apache Drill
•  Apache Flink
•  Apache Hive
•  Apache Kylin
•  Apache Phoenix
•  Apache Samza
•  Apache Storm
•  Cascading
•  Qubole Quark
•  SQL-Gremlin
…
•  Apache Geode
SQL Adapter Design Choices
13
SQL completeness vs. NoSql design integrity
(simple) Predicate Pushdown: Scan, Filter,
Projection
(complex) Custom Relational Rules and
Operations: Sort, Join, GroupBy ...
Catalog – namespaces accessed in queries
Schema - collection of schemas and tables
Table - single data set, collection of rows
RelDataType – SQL fields types in a Table
•  Move Computation to Data
•  Data Type Conversion
Geode to Calcite Data Types Mapping
14
Geode Cache
Region 1
Region K
ValKey
v1k1
v2k2
…
Calcite Schema
Table 1
Table K
Col1 Col2 ColN
V(M,1)RowM V(M,2) V(M,N)
V(2,1)Row2 V(2,2) V(2,N)
V(1,1)Row1 V(1,2) V(1,N)
…
Regions are mapped into Tables
Geode Cache is mapped into Calcite Schema
Geode Key/Value is mapped
into Table Row
Create Column Types
(RelDataType) from Geode
Value class
(JavaTypeFactoryImpl)
Geode Adapter - Overview
Geode API and OQL
SQL/JDBC/ODBC
Convert SQL relational
expressions into OQL queries
Geode Adapter
(Geode Client)
Geode ServerGeode ServerGeode Server
Data Data Data
Push down the relational
expressions supported by Geode
OQL and falls back to the Calcite
Enumerable Adapter for the rest
Enumerable
Adapter
Apache Calcite
Spring Data
Geode
Spring Data API for
interacting with Geode
Parse SQL, converts into
relational expression and
optimizes
Simple SQL Adapter
16
<<SchemaFactory>>
MySchemaFactory
+create(operands):Schema
<<create>>
<<ScannableTable>>
MyTable
+getRowType(RelDataTypeFactor)
+scan(ctx):Ennumerator<Object[]>
<<Schema>>
MySchema
+getTableMap():Map<String, Table>)
<<on scan() create>>
<<Enummerator>>
MyEnummerator
+moveNext()
+convert(Object):E
My NoSQL
<<create>>
<<Get all Data>>
defaultSchema: 'MyNoSQL',
schemas: [{
name: ’MyNoSQLAdapter,
factory: MySchemaFactory’,
operand: { myNoSqlUrl: …, }
}]
!connect jdbc:calcite:model=path-to-model.json
Returns an Enumeration
over the entire target
data store
Uses reflection to builds
RelDataType from your
value’s class type
Converts MyNoSQL
value response into
Calcite row data
Defined in the Linq4j
sub-project
ScannableTable,
FilterableTable,
ProjectableFilterableTable
Initialize
Query
SELECT b."totalPrice”
FROM "BookOrder" as b
WHERE b."totalPrice" > 0;
Non-Relational Tables (Simple)
17
Scanned without intermediate relational expression.
•  ScannableTable - can be scanned
•  FilterableTable - can be scanned, applying supplied filter expressions
•  ProjectableFilterableTable - can be scanned, applying supplied filter expressions
and projecting a given list of columns
Enumerable<Object[]> scan(DataContext root, List<RexNode> filters, int[] projects);
Enumerable<Object[]> scan(DataContext root, List<RexNode> filters);
Enumerable<Object[]> scan(DataContext root);
Calcite Ecosystem
18
Several “semi-independent” projects.
JDBC and Avatica
Linq4j
Expression
Tree
Enumerable
Adapter
Relational
•  Relational Expressions
•  Row Expression
•  Optimization Rules
•  Planner …
SQL Parser & AST
Port of LINQ (Language-Integrated Query)
to Java.
Local and Remote
JDBC driver
Converts SQL
queries Into AST
(SqlNode …)
3rd party Adapters
Method for translating
executable code into data
(LINQ/MSN port)
Default (In-memory) Data
Store Adapter
implementation.
Leverages Linq4j
Relational Algebra,
expression,
optimizations …
Interpreter
Complies Java code
generated from linq4j
“Expressions”. Part of the
physical plan implementer
Calcite SQL Query Execution Flow
19
Enumerable
Interpreter
Prepare
SQL,
Relational,
Planner
Geode
Adapter
Binder
JDBC
Geode
Cluster
1
2
3
4
5
6 7
7
7
2. Parse SQL, convert to rel.
expressions. Validate and Optimize
them
3. Start building a physical plan from
the relation expressions
4. Implement the Geode relations and
encode them as Expression tree
5. Pass the Expression tree to the
Interpreter to generate Java code
6. Generate and Compile a Binder
instance that on ‘bind()’ call runs
Geodes’ query method
1. On new SQL query JDBC delegates
to Prepare to prepare the query
execution
7. JDBC uses the newly compiled
Binder to perform the query on the
Geode Cluster
Calcite Framework
Geode Adapter
2
Calcite Relational Expressions
20
RelNode
Relational
expression
TableScan
Project
Filter
Aggregate
Join
Intersect
Sort
RexlNode
Row-level
expressions
Project, Sort fields
Filter, Join conditions
Input Column
Ref
Literal
Struct field
access
Function call
Window
expressions
*
RelTrait
*
Physical attribute
of a relation
Calcite Relational Expressions
21
RelNode
+ register(RelOptPlander)
+ List<RelNode> getInputs();
RelOptPlanner
+findBestExp():RelNode
RexNode
RelTrait Convention
NONE
*
*
EnumberableConvention
RelOptRule
+ onMatch(call)
<<register>>
<<create>>
MyDBConvention
ConverterRule
+ RelNode convert(RelNode)
Converts from one calling
convention to another
Convertor
Indicate that it converts a
physical attribute only!
<<rules>>
*
<<inputs>>
*
<<root>>
Query optimizer: Transforms a
relational expression according to
a given set of rules and a cost model.
RelOptCluster
Rule transforms an expression into another. It has a list of
Operands, which determine whether the rule can be applied to
a particular section of the tree.RelOptRuleOperand
*<<fire criteria>>
Calling convention used
to represent a single
data source.
Inputs to a relational
expression must be in
the same convention
Calcite Adapter Implementation Patterns
22
MyAdapterRel
+ implement(implContext)
MyAdapterConvention
Convention.Impl(“MyAdapter”)
Common interface for all MyAdapter
Relation Expressions. Provides
implementation callback method called
as part of physical plan implementation
ImplContext
+ implParm1
+ implParm2 …
RelNode
MyAdapterTable
+ toRel(optTable)
+ asQueryable(provider,…)
MyAdapterQueryable
+ myQuery(params) :
Enumetator
TranslatableTable
<<instance of>>
AbstractQueryableTable
AbstractTableQueryable <<create>>
Can convert
queries in
Expression
myQuery() implements the call to your DB
It is called by the auto generated code. It
must return an Enumberable instance
MyAdapterScan
+ register(planer) {
Registers all MyAdapter Rules
}
<<create>>
MyAdapterToEnumerableConvertorRule
operands: (RelNode.class,
MyAdapterConvention, EnumerableConvention) ConverterRue
TableScan
MyAdapterToEnumerableConvertor
+ implement(EnumerableRelImplementor) {
ctx = new MyAdapterRel.ImplContext()
getImputs().implement(ctx)
return BlockBuild.append( MY_QUERY_REF,
Expressions.constant(ctx.implParms1),
Expressions.constant(ctx.implParms2) …
EnumerableRel
ConvertorImpl
<<create on match >>
MyAdapterProject
MyAdapterFilter
MyAdapterXXX
RelOptRule
MyAdapterProjectRu
MyAdapterFilterRule
MyAdapterXXXRule
<<create on match >>
Recursively call the implement on each
MyAdapter Relation Expression
Encode the myQuery(params) call as
Expressions
MY_QUERY_REF = Types.lookupMethod(
MyAdapterQueryable.class,
”myQuery”,
String.class
String.class);
1
3
4
5
2
6
7
8
9
Calcite Framework
MyAdapter components
Relational Algebra
23
Scan Scan
Join
Filter
Project
Customer [c] BookOrder [b]
(on customerNumber)
(b.totalPrice > 0)
(c.firstName, b.totalPrice)
SELECT b."totalPrice", c."firstName”
FROM "BookOrder" as b
INNER JOIN "Customer" as c ON b."customerNumber" = c."customerNumber”
WHERE b."totalPrice" > 0;
Scan Scan
Join
Project
Customer [c] BookOrder [b]
(on customerNumber)
(totalPrice > 0)
(c.firstName, b.totalPrice)
Project(firstName,
customerNumber)
Filter
(totalPrice,
customerNumber)Project
optimize
Calcite with Geode - Without Implementation
24
SELECT b."totalPrice", c."firstName”
FROM "BookOrder" as b
INNER JOIN "Customer" as c ON b."customerNumber" = c."customerNumber”
WHERE b."totalPrice" > 0;
Calcite with Geode – Scannable Table (Simple)
25
SELECT b."totalPrice", c."firstName”
FROM "BookOrder" as b
INNER JOIN "Customer" as c ON b."customerNumber" = c."customerNumber”
WHERE b."totalPrice" > 0;
Calcite with Geode – Relational (Complex)
26
SELECT b."totalPrice", c."firstName”
FROM "BookOrder" as b
INNER JOIN "Customer" as c ON b."customerNumber" = c."customerNumber”
WHERE b."totalPrice" > 0;
Calcite JDBC Connection
27
References
28
•  Big Data is Four Different Problems, 2016, M.Stonebraker:
https://www.youtube.com/watch?v=S79-buNhdhI
•  Turning Database Inside-Out, 2015 (M. Kleppmann)
https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza
•  NoSQL Distilled, 2012 (Pramod J. Sadalage and M.Fowler)
https://martinfowler.com/books/nosql.html
•  Architecture of a Database System, 2007 (J.M. Hellerstein, M. Stonebraker, J.
Hamilton)http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
•  ORCA: A Modular Query Optimizer Architecture for Big Data:
http://15721.courses.cs.cmu.edu/spring2016/papers/p337-soliman.pdf
•  Apache Geode Project (2016) : http://geode.apache.org
•  Geode Object Query Language (OQL) : http://bit.ly/2eKywgp
•  Apache Calcite Project (2016) : https://calcite.apache.org
•  Apache Geode Adapter for Apache Calcite: https://github.com/tzolov/calcite
•  Relational Algebra Operations: https://www.coursera.org/learn/data-manipulation/lecture/
4JKs1/relational-algebra-operators-union-difference-selection
Thanks!
Apache Geode?
“… in-memory, distributed database
with strong consistency built to
support low latency transactional
applications at extreme scale”
Why Apache Geode?
31
5,700 train stations
4.5 million tickets per day
20 million daily users
1.4 billion page views per day
40,000 visits per second
7,000 stations
72,000 miles of track
23 million passengers daily
120,000 concurrent users
10,000 transactions per minute
https://pivotal.io/big-data/case-study/distributed-in-memory-data-management-solution
https://pivotal.io/big-data/case-study/scaling-online-sales-for-the-largest-railway-in-the-world-china-railway-corporation
China Railway
Geode Features
32
•  In-Memory Data Storage
–  Over 100TB Memory
–  JVM Heap + Off Heap
•  Any Data Format
–  Key-Value/Object Store
•  ACID and JTA Compliant
Transactions
•  HA and Linear Scalability
•  Strong Consistency
•  Streaming and Event Processing
–  Listeners
–  Distributed Functions
–  Continuous OQL Queries
•  Multi-site / Inter-cluster
•  Full Text Search (Lucene indexes)
•  Embedded and Standalone
•  Top Level Apache Project
Apache Geode Concepts
Cache Server (member)
Cache
Region 1
Region N
ValKe
y
v1k1
v2k2
…
Cache - In-memory collection
of Regions
Region - consistent, distributed
Map (key-value),
Partitioned or Replicated
CacheServer – process
connected to the distributed
system with created Cache
ClientLocator (member)
Client –read and modify the
content of the distributed
system
Locator – tracks system
members and provides
membership information
…
Listeners
Functions
Functions – distributed,
concurrent data
processing
Listener – event handler.
Registers for one or
more events and notified
when they occur
Geode Topology
Cache ServerCache ServerCache Server
Cache Data Cache Data Cache Data
Peer-to-Peer
Cache ServerCache ServerCache Server
Cache Data Cache Data Cache Data
Client
Local Cache
pool
Client-Server
Cache Server
Cache Server
Gateway Sender
…
Cache Server
Gateway Receiver
Cache ServerCache Server
Cache Data Cache Data Cache Data Cache Data
Gateway Receiver
Cache Server
… Gateway Sender
Cache Server
Cache Server
Cache Data Cache Data Cache Data Cache Data
WAN Multi-site Boundary Multi-Site

More Related Content

What's hot

Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Stamatis Zampetakis
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQLJulian Hyde
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Julian Hyde
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllMichael Mior
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityJulian Hyde
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteJulian Hyde
 
Oracle db performance tuning
Oracle db performance tuningOracle db performance tuning
Oracle db performance tuningSimon Huang
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connectKnoldus Inc.
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentationCyanny LIANG
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteJulian Hyde
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfEric Xiao
 
Understanding oracle rac internals part 1 - slides
Understanding oracle rac internals   part 1 - slidesUnderstanding oracle rac internals   part 1 - slides
Understanding oracle rac internals part 1 - slidesMohamed Farouk
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveAvkash Chauhan
 

What's hot (20)

Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its Community
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Oracle db performance tuning
Oracle db performance tuningOracle db performance tuning
Oracle db performance tuning
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
 
Understanding oracle rac internals part 1 - slides
Understanding oracle rac internals   part 1 - slidesUnderstanding oracle rac internals   part 1 - slides
Understanding oracle rac internals part 1 - slides
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 

Viewers also liked

Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / OptiqJulian Hyde
 
Choosing an open source log management system for small business
Choosing an open source log management system for small businessChoosing an open source log management system for small business
Choosing an open source log management system for small businessFixNix Inc.,
 
phoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetupphoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetupMaryann Xue
 
Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Hortonworks
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleMapR Technologies
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Josh Elser
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Summit
 
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016Carl Steinbach
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...The Hive
 

Viewers also liked (11)

Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
 
Choosing an open source log management system for small business
Choosing an open source log management system for small businessChoosing an open source log management system for small business
Choosing an open source log management system for small business
 
phoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetupphoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetup
 
Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is Possible
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike Freedman
 
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
 

Similar to SQL for NoSQL and how Apache Calcite can help

NoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSONNoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSONMario Beck
 
It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?Srihari Srinivasan
 
MySQL Document Store for Modern Applications
MySQL Document Store for Modern ApplicationsMySQL Document Store for Modern Applications
MySQL Document Store for Modern ApplicationsOlivier DASINI
 
Declarative Database Development with SQL Server Data Tools
Declarative Database Development with SQL Server Data ToolsDeclarative Database Development with SQL Server Data Tools
Declarative Database Development with SQL Server Data ToolsGert Drapers
 
The Making of the Oracle R2DBC Driver and How to Take Your Code from Synchron...
The Making of the Oracle R2DBC Driver and How to Take Your Code from Synchron...The Making of the Oracle R2DBC Driver and How to Take Your Code from Synchron...
The Making of the Oracle R2DBC Driver and How to Take Your Code from Synchron...VMware Tanzu
 
R2DBC JEEConf 2019 by Igor Lozynskyi
R2DBC JEEConf 2019 by Igor LozynskyiR2DBC JEEConf 2019 by Igor Lozynskyi
R2DBC JEEConf 2019 by Igor LozynskyiIgor Lozynskyi
 
Java database programming with jdbc
Java database programming with jdbcJava database programming with jdbc
Java database programming with jdbcsriram raj
 
SQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data ClusterSQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data ClusterMaximiliano Accotto
 
Reactive Relational Database Connectivity
Reactive Relational Database ConnectivityReactive Relational Database Connectivity
Reactive Relational Database ConnectivityVMware Tanzu
 
Ms Sql Business Inteligence With My Sql
Ms Sql Business Inteligence With My SqlMs Sql Business Inteligence With My Sql
Ms Sql Business Inteligence With My Sqlrsnarayanan
 
High Performance Jdbc
High Performance JdbcHigh Performance Jdbc
High Performance JdbcSam Pattsin
 
Data virtualization, Data Federation & IaaS with Jboss Teiid
Data virtualization, Data Federation & IaaS with Jboss TeiidData virtualization, Data Federation & IaaS with Jboss Teiid
Data virtualization, Data Federation & IaaS with Jboss TeiidAnil Allewar
 
Spring db-access mod03
Spring db-access mod03Spring db-access mod03
Spring db-access mod03Guo Albert
 
Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Opera...
Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Opera...Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Opera...
Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Opera...Data Con LA
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkMatt Ingenthron
 
TIQ Solutions - QlikView Data Integration in a Java World
TIQ Solutions - QlikView Data Integration in a Java WorldTIQ Solutions - QlikView Data Integration in a Java World
TIQ Solutions - QlikView Data Integration in a Java WorldVizlib Ltd.
 

Similar to SQL for NoSQL and how Apache Calcite can help (20)

NoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSONNoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSON
 
Mobile
MobileMobile
Mobile
 
It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?
 
MySQL Document Store for Modern Applications
MySQL Document Store for Modern ApplicationsMySQL Document Store for Modern Applications
MySQL Document Store for Modern Applications
 
Declarative Database Development with SQL Server Data Tools
Declarative Database Development with SQL Server Data ToolsDeclarative Database Development with SQL Server Data Tools
Declarative Database Development with SQL Server Data Tools
 
The Making of the Oracle R2DBC Driver and How to Take Your Code from Synchron...
The Making of the Oracle R2DBC Driver and How to Take Your Code from Synchron...The Making of the Oracle R2DBC Driver and How to Take Your Code from Synchron...
The Making of the Oracle R2DBC Driver and How to Take Your Code from Synchron...
 
R2DBC JEEConf 2019 by Igor Lozynskyi
R2DBC JEEConf 2019 by Igor LozynskyiR2DBC JEEConf 2019 by Igor Lozynskyi
R2DBC JEEConf 2019 by Igor Lozynskyi
 
Java database programming with jdbc
Java database programming with jdbcJava database programming with jdbc
Java database programming with jdbc
 
SQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data ClusterSQL Server 2019 Big Data Cluster
SQL Server 2019 Big Data Cluster
 
Reactive Relational Database Connectivity
Reactive Relational Database ConnectivityReactive Relational Database Connectivity
Reactive Relational Database Connectivity
 
Ms Sql Business Inteligence With My Sql
Ms Sql Business Inteligence With My SqlMs Sql Business Inteligence With My Sql
Ms Sql Business Inteligence With My Sql
 
High Performance Jdbc
High Performance JdbcHigh Performance Jdbc
High Performance Jdbc
 
Data virtualization, Data Federation & IaaS with Jboss Teiid
Data virtualization, Data Federation & IaaS with Jboss TeiidData virtualization, Data Federation & IaaS with Jboss Teiid
Data virtualization, Data Federation & IaaS with Jboss Teiid
 
Practical OData
Practical ODataPractical OData
Practical OData
 
Spring db-access mod03
Spring db-access mod03Spring db-access mod03
Spring db-access mod03
 
Couch db
Couch dbCouch db
Couch db
 
Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Opera...
Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Opera...Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Opera...
Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Opera...
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with Spark
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
 
TIQ Solutions - QlikView Data Integration in a Java World
TIQ Solutions - QlikView Data Integration in a Java WorldTIQ Solutions - QlikView Data Integration in a Java World
TIQ Solutions - QlikView Data Integration in a Java World
 

Recently uploaded

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制vexqp
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 

Recently uploaded (20)

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 

SQL for NoSQL and how Apache Calcite can help

  • 1. SQL for NoSQL and how Apache Calcite can help FOSDEM 2017
  • 2. Christian Tzolov 2 Engineer at Pivotal BigData, Hadoop, Spring Cloud Dataflow Apache Committer, PMC member Apache {Crunch, Geode, HAWQ, ...} Disclaimer This talk expresses my personal opinions. It is not read or approved by Pivotal and does not necessarily reflect the views and opinions of Pivotal nor does it constitute any official communication of Pivotal. Pivotal does not support any of the code shared here. blog.tzolov.net twitter.com/christzolov nl.linkedin.com/in/tzolov
  • 3. 3 “It will be interesting to see what happens if an established NoSQL database decides to implement a reasonably standard SQL; The only predictable outcome for such an eventuality is plenty of argument.” 2012, Martin Fowler, P.J.Sadalage, NoSQL Distilled
  • 5. NoSQL Driving Forces 5 •  Infrastructure Automation and Elasticity (Cloud Computing) •  Rise of Internet Web, Mobile, IoT – Data Volume, Velocity, Variety challenges •  Row-based Relational Model. Object-Relational Impedance Mismatch ACID & 2PC clash with Distributed architectures. CAP, PAXOS instead.. More convenient data models: Datastores, Key/Value, Graph, Columnar, Full-text Search, Schema-on-Load… Eliminate operational complexity and cost. Shift from Integration to application databases …
  • 6. Data Big Bang Implications 6 •  Over 150 commercial NoSQL and BigData systems. •  Organizations will have to mix data storage technologies! •  How to integrate such multitude of data systems?
  • 7. “Standard” Data Process/Query Language? 7 •  Functional - Unified Programming Model •  Apache {Beam, Spark, Flink, Apex, Crunch}, Cascading •  Converging around Apache Beam •  Declarative - SQL •  Adopted by many NoSQL Vendors •  Most Hadoop tasks: Hive and SQL-on-Hadoop •  Spark SQL - most used production component for 2016 •  Google F1 pcollection.apply(Read.from(”in.txt")) .apply(FlatMapElements.via((String word) -> asList(word.split("[^a-zA-Z']+"))) .apply(Filter.by((String word)->!word.isEmpty())) .apply(Count.<String>perElement()) SELECT b."totalPrice", c."firstName” FROM "BookOrder" as b INNER JOIN "Customer" as c ON b."customerNumber" = c."customerNumber” WHERE b."totalPrice" > 0; Batch & Streaming, OLTP OLAP, EDW, Exploration
  • 8. SQL for NoSQL? 8 •  Extended Relational Algebra - already present in most NoSql data system •  Relational Expression Optimization – Desirable but hard to implement
  • 9. Organization Data - Integrated View 9 Single Federated DB (M:1:N) HAWQ FDBS NoSQL 1 PXF 1 Native API 1 Apache HAWQ Optimizer, Columnar (HDFS) Organization Data Tools SQL/JDBC NoSQL 1 PXF 2 Native API 2 NoSQL n PXF n Native API n … Organization Data Tools NoSQL 1 Calcite SQLAdapter 1 SQL/JDBC NoSQL 2 Calcite SQLAdapter 2 SQL/JDBC NoSQL n Calcite SQLAdapter n SQL/JDBC … Direct (M:N) https://issues.apache.org/jira/browse/HAWQ-1235
  • 10. Single Federated Database 10 Federated External Tables with Apache HAWQ - MPP, Shared-Noting, SQL- on-Hadoop CREATE EXTERNAL TABLE MyNoSQL ( customer_id TEXT, first_name TEXT, last_name TEXT, gender TEXT ) LOCATION ('pxf://MyNoSQL-URL>? FRAGMENTER=MyFragmenter& ACCESSOR=MyAccessor& RESOLVER=MyResolver&') FORMAT 'custom'(formatter='pxfwritable_import');
  • 11. Apache Calcite? Java framework that allows SQL interface and advanced query optimization, for virtually any data system •  Query Parser, Validator and Optimizer(s) •  JDBC drivers - local and remote •  Agnostic to data storage and processing
  • 12. Calcite Application 12 •  Apache Apex •  Apache Drill •  Apache Flink •  Apache Hive •  Apache Kylin •  Apache Phoenix •  Apache Samza •  Apache Storm •  Cascading •  Qubole Quark •  SQL-Gremlin … •  Apache Geode
  • 13. SQL Adapter Design Choices 13 SQL completeness vs. NoSql design integrity (simple) Predicate Pushdown: Scan, Filter, Projection (complex) Custom Relational Rules and Operations: Sort, Join, GroupBy ... Catalog – namespaces accessed in queries Schema - collection of schemas and tables Table - single data set, collection of rows RelDataType – SQL fields types in a Table •  Move Computation to Data •  Data Type Conversion
  • 14. Geode to Calcite Data Types Mapping 14 Geode Cache Region 1 Region K ValKey v1k1 v2k2 … Calcite Schema Table 1 Table K Col1 Col2 ColN V(M,1)RowM V(M,2) V(M,N) V(2,1)Row2 V(2,2) V(2,N) V(1,1)Row1 V(1,2) V(1,N) … Regions are mapped into Tables Geode Cache is mapped into Calcite Schema Geode Key/Value is mapped into Table Row Create Column Types (RelDataType) from Geode Value class (JavaTypeFactoryImpl)
  • 15. Geode Adapter - Overview Geode API and OQL SQL/JDBC/ODBC Convert SQL relational expressions into OQL queries Geode Adapter (Geode Client) Geode ServerGeode ServerGeode Server Data Data Data Push down the relational expressions supported by Geode OQL and falls back to the Calcite Enumerable Adapter for the rest Enumerable Adapter Apache Calcite Spring Data Geode Spring Data API for interacting with Geode Parse SQL, converts into relational expression and optimizes
  • 16. Simple SQL Adapter 16 <<SchemaFactory>> MySchemaFactory +create(operands):Schema <<create>> <<ScannableTable>> MyTable +getRowType(RelDataTypeFactor) +scan(ctx):Ennumerator<Object[]> <<Schema>> MySchema +getTableMap():Map<String, Table>) <<on scan() create>> <<Enummerator>> MyEnummerator +moveNext() +convert(Object):E My NoSQL <<create>> <<Get all Data>> defaultSchema: 'MyNoSQL', schemas: [{ name: ’MyNoSQLAdapter, factory: MySchemaFactory’, operand: { myNoSqlUrl: …, } }] !connect jdbc:calcite:model=path-to-model.json Returns an Enumeration over the entire target data store Uses reflection to builds RelDataType from your value’s class type Converts MyNoSQL value response into Calcite row data Defined in the Linq4j sub-project ScannableTable, FilterableTable, ProjectableFilterableTable Initialize Query SELECT b."totalPrice” FROM "BookOrder" as b WHERE b."totalPrice" > 0;
  • 17. Non-Relational Tables (Simple) 17 Scanned without intermediate relational expression. •  ScannableTable - can be scanned •  FilterableTable - can be scanned, applying supplied filter expressions •  ProjectableFilterableTable - can be scanned, applying supplied filter expressions and projecting a given list of columns Enumerable<Object[]> scan(DataContext root, List<RexNode> filters, int[] projects); Enumerable<Object[]> scan(DataContext root, List<RexNode> filters); Enumerable<Object[]> scan(DataContext root);
  • 18. Calcite Ecosystem 18 Several “semi-independent” projects. JDBC and Avatica Linq4j Expression Tree Enumerable Adapter Relational •  Relational Expressions •  Row Expression •  Optimization Rules •  Planner … SQL Parser & AST Port of LINQ (Language-Integrated Query) to Java. Local and Remote JDBC driver Converts SQL queries Into AST (SqlNode …) 3rd party Adapters Method for translating executable code into data (LINQ/MSN port) Default (In-memory) Data Store Adapter implementation. Leverages Linq4j Relational Algebra, expression, optimizations … Interpreter Complies Java code generated from linq4j “Expressions”. Part of the physical plan implementer
  • 19. Calcite SQL Query Execution Flow 19 Enumerable Interpreter Prepare SQL, Relational, Planner Geode Adapter Binder JDBC Geode Cluster 1 2 3 4 5 6 7 7 7 2. Parse SQL, convert to rel. expressions. Validate and Optimize them 3. Start building a physical plan from the relation expressions 4. Implement the Geode relations and encode them as Expression tree 5. Pass the Expression tree to the Interpreter to generate Java code 6. Generate and Compile a Binder instance that on ‘bind()’ call runs Geodes’ query method 1. On new SQL query JDBC delegates to Prepare to prepare the query execution 7. JDBC uses the newly compiled Binder to perform the query on the Geode Cluster Calcite Framework Geode Adapter 2
  • 20. Calcite Relational Expressions 20 RelNode Relational expression TableScan Project Filter Aggregate Join Intersect Sort RexlNode Row-level expressions Project, Sort fields Filter, Join conditions Input Column Ref Literal Struct field access Function call Window expressions * RelTrait * Physical attribute of a relation
  • 21. Calcite Relational Expressions 21 RelNode + register(RelOptPlander) + List<RelNode> getInputs(); RelOptPlanner +findBestExp():RelNode RexNode RelTrait Convention NONE * * EnumberableConvention RelOptRule + onMatch(call) <<register>> <<create>> MyDBConvention ConverterRule + RelNode convert(RelNode) Converts from one calling convention to another Convertor Indicate that it converts a physical attribute only! <<rules>> * <<inputs>> * <<root>> Query optimizer: Transforms a relational expression according to a given set of rules and a cost model. RelOptCluster Rule transforms an expression into another. It has a list of Operands, which determine whether the rule can be applied to a particular section of the tree.RelOptRuleOperand *<<fire criteria>> Calling convention used to represent a single data source. Inputs to a relational expression must be in the same convention
  • 22. Calcite Adapter Implementation Patterns 22 MyAdapterRel + implement(implContext) MyAdapterConvention Convention.Impl(“MyAdapter”) Common interface for all MyAdapter Relation Expressions. Provides implementation callback method called as part of physical plan implementation ImplContext + implParm1 + implParm2 … RelNode MyAdapterTable + toRel(optTable) + asQueryable(provider,…) MyAdapterQueryable + myQuery(params) : Enumetator TranslatableTable <<instance of>> AbstractQueryableTable AbstractTableQueryable <<create>> Can convert queries in Expression myQuery() implements the call to your DB It is called by the auto generated code. It must return an Enumberable instance MyAdapterScan + register(planer) { Registers all MyAdapter Rules } <<create>> MyAdapterToEnumerableConvertorRule operands: (RelNode.class, MyAdapterConvention, EnumerableConvention) ConverterRue TableScan MyAdapterToEnumerableConvertor + implement(EnumerableRelImplementor) { ctx = new MyAdapterRel.ImplContext() getImputs().implement(ctx) return BlockBuild.append( MY_QUERY_REF, Expressions.constant(ctx.implParms1), Expressions.constant(ctx.implParms2) … EnumerableRel ConvertorImpl <<create on match >> MyAdapterProject MyAdapterFilter MyAdapterXXX RelOptRule MyAdapterProjectRu MyAdapterFilterRule MyAdapterXXXRule <<create on match >> Recursively call the implement on each MyAdapter Relation Expression Encode the myQuery(params) call as Expressions MY_QUERY_REF = Types.lookupMethod( MyAdapterQueryable.class, ”myQuery”, String.class String.class); 1 3 4 5 2 6 7 8 9 Calcite Framework MyAdapter components
  • 23. Relational Algebra 23 Scan Scan Join Filter Project Customer [c] BookOrder [b] (on customerNumber) (b.totalPrice > 0) (c.firstName, b.totalPrice) SELECT b."totalPrice", c."firstName” FROM "BookOrder" as b INNER JOIN "Customer" as c ON b."customerNumber" = c."customerNumber” WHERE b."totalPrice" > 0; Scan Scan Join Project Customer [c] BookOrder [b] (on customerNumber) (totalPrice > 0) (c.firstName, b.totalPrice) Project(firstName, customerNumber) Filter (totalPrice, customerNumber)Project optimize
  • 24. Calcite with Geode - Without Implementation 24 SELECT b."totalPrice", c."firstName” FROM "BookOrder" as b INNER JOIN "Customer" as c ON b."customerNumber" = c."customerNumber” WHERE b."totalPrice" > 0;
  • 25. Calcite with Geode – Scannable Table (Simple) 25 SELECT b."totalPrice", c."firstName” FROM "BookOrder" as b INNER JOIN "Customer" as c ON b."customerNumber" = c."customerNumber” WHERE b."totalPrice" > 0;
  • 26. Calcite with Geode – Relational (Complex) 26 SELECT b."totalPrice", c."firstName” FROM "BookOrder" as b INNER JOIN "Customer" as c ON b."customerNumber" = c."customerNumber” WHERE b."totalPrice" > 0;
  • 28. References 28 •  Big Data is Four Different Problems, 2016, M.Stonebraker: https://www.youtube.com/watch?v=S79-buNhdhI •  Turning Database Inside-Out, 2015 (M. Kleppmann) https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza •  NoSQL Distilled, 2012 (Pramod J. Sadalage and M.Fowler) https://martinfowler.com/books/nosql.html •  Architecture of a Database System, 2007 (J.M. Hellerstein, M. Stonebraker, J. Hamilton)http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf •  ORCA: A Modular Query Optimizer Architecture for Big Data: http://15721.courses.cs.cmu.edu/spring2016/papers/p337-soliman.pdf •  Apache Geode Project (2016) : http://geode.apache.org •  Geode Object Query Language (OQL) : http://bit.ly/2eKywgp •  Apache Calcite Project (2016) : https://calcite.apache.org •  Apache Geode Adapter for Apache Calcite: https://github.com/tzolov/calcite •  Relational Algebra Operations: https://www.coursera.org/learn/data-manipulation/lecture/ 4JKs1/relational-algebra-operators-union-difference-selection
  • 30. Apache Geode? “… in-memory, distributed database with strong consistency built to support low latency transactional applications at extreme scale”
  • 31. Why Apache Geode? 31 5,700 train stations 4.5 million tickets per day 20 million daily users 1.4 billion page views per day 40,000 visits per second 7,000 stations 72,000 miles of track 23 million passengers daily 120,000 concurrent users 10,000 transactions per minute https://pivotal.io/big-data/case-study/distributed-in-memory-data-management-solution https://pivotal.io/big-data/case-study/scaling-online-sales-for-the-largest-railway-in-the-world-china-railway-corporation China Railway
  • 32. Geode Features 32 •  In-Memory Data Storage –  Over 100TB Memory –  JVM Heap + Off Heap •  Any Data Format –  Key-Value/Object Store •  ACID and JTA Compliant Transactions •  HA and Linear Scalability •  Strong Consistency •  Streaming and Event Processing –  Listeners –  Distributed Functions –  Continuous OQL Queries •  Multi-site / Inter-cluster •  Full Text Search (Lucene indexes) •  Embedded and Standalone •  Top Level Apache Project
  • 33. Apache Geode Concepts Cache Server (member) Cache Region 1 Region N ValKe y v1k1 v2k2 … Cache - In-memory collection of Regions Region - consistent, distributed Map (key-value), Partitioned or Replicated CacheServer – process connected to the distributed system with created Cache ClientLocator (member) Client –read and modify the content of the distributed system Locator – tracks system members and provides membership information … Listeners Functions Functions – distributed, concurrent data processing Listener – event handler. Registers for one or more events and notified when they occur
  • 34. Geode Topology Cache ServerCache ServerCache Server Cache Data Cache Data Cache Data Peer-to-Peer Cache ServerCache ServerCache Server Cache Data Cache Data Cache Data Client Local Cache pool Client-Server Cache Server Cache Server Gateway Sender … Cache Server Gateway Receiver Cache ServerCache Server Cache Data Cache Data Cache Data Cache Data Gateway Receiver Cache Server … Gateway Sender Cache Server Cache Server Cache Data Cache Data Cache Data Cache Data WAN Multi-site Boundary Multi-Site