© Hortonworks Inc. 2015
Why you care about relational algebra (even though you didn’t know it)
Julian Hyde
@julianhyde
Enterprise Data World
Washington, DC
April 2nd, 2015
About me
Apache Calcite
Why you should care about relational algebra
Why should you care?
• It is old
• It is as useful as ever
• Exposed in new products such as Hadoop
• New challenges
Agenda
• Is Hadoop a revolution for the database world?
• What is relational algebra?
• Examples of algebra in action
• Introducing Apache Calcite
• Adding data independence to Hadoop via materialized views
Old world, new world
RDBMS
• Security
• Metadata
• SQL
• Query planning
• Data independence
Hadoop
• Scale
• Late schema
• Choice of front-end
• Choice of engines
• Workload: batch, interactive, streaming, ML, graph, …
Many front ends, many engines
[Diagram: front ends and engines on Hadoop. Labels: SQL, Planning, Execution engine, User code, MapReduce, Tez, User code in Yarn, Spark, MongoDB, External SQL, Storm, Cascading, HBase, Graph]
Analogy: LLVM
Lessons from the compiler community:
• Writing a front end is hard
• Writing a back end is hard
• Writing an optimizer is really hard
• Most of the logic in the optimizer is independent of
front end and back end
• E.g. register assignment
• The optimizer is a collection of separate algorithms
• Common language between algorithms
Relational algebra
SELECT d.name, COUNT(*) AS c
FROM Emps AS e
JOIN Depts AS d ON e.deptno = d.deptno
WHERE e.age < 30
GROUP BY d.deptno
HAVING COUNT(*) > 5
ORDER BY c DESC
Sort [c DESC]
  Project [name, c]
    Filter [c > 5]
      Aggregate [deptno, COUNT(*) AS c]
        Filter [e.age < 30]
          Join [e.deptno = d.deptno]
            Scan [Emps]
            Scan [Depts]
(Column names are simplified. They would usually be ordinals, e.g. $0 is the first column of the left input.)
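A minimal sketch of how an operator tree like this one could be evaluated bottom-up, assuming rows are Python dicts and each operator is a plain function. The table contents, the `dname` column name, and the helper names are illustrative assumptions, not Calcite's API; the HAVING threshold is a parameter so the tiny sample data produces output.

```python
from collections import Counter

# Hypothetical sample data; the slide does not give table contents.
EMPS = [
    {"name": "Ann", "deptno": 10, "age": 25},
    {"name": "Bob", "deptno": 10, "age": 28},
    {"name": "Cy", "deptno": 20, "age": 45},
    {"name": "Di", "deptno": 20, "age": 22},
    {"name": "Ed", "deptno": 10, "age": 29},
]
DEPTS = [{"deptno": 10, "dname": "Sales"}, {"deptno": 20, "dname": "Ops"}]


def scan(table):
    return list(table)


def join(left, right, key):
    # Inner equi-join on a single shared column.
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def filter_(rows, pred):
    return [r for r in rows if pred(r)]


def aggregate_count(rows, keys):
    # GROUP BY keys, COUNT(*) AS c.
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [dict(zip(keys, k), c=n) for k, n in counts.items()]


def project(rows, cols):
    return [{c: r[c] for c in cols} for r in rows]


def sort_desc(rows, col):
    return sorted(rows, key=lambda r: r[col], reverse=True)


def run_plan(min_count):
    """Evaluate Scan, Join, Filter, Aggregate, Filter, Project, Sort in order."""
    rows = join(scan(EMPS), scan(DEPTS), "deptno")
    rows = filter_(rows, lambda r: r["age"] < 30)
    # dname is grouped alongside deptno so it survives the aggregate.
    rows = aggregate_count(rows, ["deptno", "dname"])
    rows = filter_(rows, lambda r: r["c"] > min_count)
    rows = project(rows, ["dname", "c"])
    return sort_desc(rows, "c")
```

Each function consumes the output of the operator below it, which is the property that lets a planner rearrange operators without touching the query text.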
Relational algebra - Union and sub-query
SELECT * FROM (
  SELECT zipcode, state
  FROM Emps
  UNION ALL
  SELECT zipcode, state
  FROM Customers)
WHERE state IN ('CA', 'TX')
Filter [state IN ('CA', 'TX')]
  Union [all]
    Project [zipcode, state]
      Scan [Emps]
    Project [zipcode, state]
      Scan [Customers]
Relational algebra - Insert and Values
INSERT INTO Facts
VALUES ('Meaning of life', 42),
  ('Clever as clever', 6)
Insert [Facts]
  Values [['Meaning of life', 42], ['Clever as clever', 6]]
Relational algebra - Strict versus Pragmatic
“Strict” relational algebra
Introduced by E.F. Codd in “A Relational Model of Data for Large Shared Data Banks” [1970]
Goal is mathematical elegance (ability to prove theorems)
Greek symbols: σ, π, ρ, ∪, …
Relations cannot contain duplicates
Relations are not sorted
Column values are scalars
Only logical operators
Pragmatic relational algebra
Goal is to optimize queries, allow real-world data models, extensibility
Elegance still important
Verbs: Project, Filter, Union, Join
Relations may contain duplicates
Relations may be sorted
• But Sort is the only logical operator that guarantees order
Null values have three-valued semantics, as in SQL
Physical operators (e.g. HashJoin, MergeJoin)
Physical properties (sort, distribution)
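A quick check of the three-valued null semantics mentioned above, sketched with Python's built-in sqlite3 module. SQLite is an assumption chosen for convenience and is not part of the talk; the helper name `sql_value` is hypothetical. The unknown truth value comes back as NULL, which Python surfaces as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def sql_value(expr):
    """Evaluate a scalar SQL expression and return its single value."""
    return conn.execute("SELECT " + expr).fetchone()[0]


unknown = sql_value("NULL = NULL")   # comparing with NULL yields unknown
and_false = sql_value("NULL AND 0")  # unknown AND false is false
or_true = sql_value("NULL OR 1")     # unknown OR true is true
```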
Algebraic transformations
(R filter c1) filter c2 → R filter (c1 and c2)
(R1 union R2) join R3 on c → (R1 join R3 on c) union (R2 join R3 on c)
• Compare distributive law of arithmetic: (x + y) * z → (x * z) + (y * z)
(R1 join R2 on c) filter c2 → (R1 filter c2) join R2 on c (provided c2 only depends on columns in R1, and join is inner)
(R1 join R2 on c) → (R2 join R1 on c) project [R1.*, R2.*]
(R1 join R2 on c) join R3 on c2 → R1 join (R2 join R3 on c2) on c (provided c, c2 have the necessary columns)
Many, many others…
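Two of these rules can be sketched as functions over a toy plan representation. This assumes plans are nested tuples and each filter condition is a set of (table, text) conjuncts so the code can tell which input a conjunct refers to; all names here are hypothetical, the join is assumed to be inner, and this is not how Calcite's RelOptRule classes are written.

```python
# Toy plan nodes: ("scan", name), ("filter", conds, child), ("join", cond, l, r).
def scan(name):
    return ("scan", name)


def filt(conds, child):
    return ("filter", frozenset(conds), child)


def join(cond, left, right):
    return ("join", cond, left, right)


def tables(plan):
    """Set of table names reachable under a plan node."""
    if plan[0] == "scan":
        return {plan[1]}
    if plan[0] == "filter":
        return tables(plan[2])
    return tables(plan[2]) | tables(plan[3])


def merge_filters(plan):
    """(R filter c1) filter c2 -> R filter (c1 and c2)."""
    if plan[0] == "filter" and plan[2][0] == "filter":
        inner = plan[2]
        return filt(plan[1] | inner[1], inner[2])
    return plan


def push_filter_into_join(plan):
    """(R1 join R2 on c) filter c2 -> (R1 filter c2) join R2 on c.

    Fires only when every conjunct of c2 refers to the left input alone.
    """
    if plan[0] == "filter" and plan[2][0] == "join":
        _, cond, left, right = plan[2]
        if all(table in tables(left) for table, _ in plan[1]):
            return join(cond, filt(plan[1], left), right)
    return plan
```

Each rule returns the plan unchanged when its precondition does not hold, so rules can be applied repeatedly in any order.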
Query using a view
SELECT deptno, min(salary)
FROM Managers
WHERE age >= 50
GROUP BY deptno
CREATE VIEW Managers AS
SELECT *
FROM Emps
WHERE EXISTS (
  SELECT *
  FROM Emps AS underling
  WHERE underling.manager = Emps.id)
Query plan:
Aggregate [deptno, min(salary)]
  Filter [age >= 50]
    Scan [Managers]
View plan:
Project [$0, $1, $2, $3]
  Join [$0, $5]
    Scan [Emps]
    Aggregate [manager]
      Scan [Emps]
After view expansion
SELECT deptno, min(salary)
FROM Managers
WHERE age >= 50
GROUP BY deptno
CREATE VIEW Managers AS
SELECT *
FROM Emps
WHERE EXISTS (
  SELECT *
  FROM Emps AS underling
  WHERE underling.manager = Emps.id)
Aggregate [deptno, min(salary)]
  Filter [age >= 50]
    Project [$0, $1, $2, $3]
      Join [$0, $5]
        Scan [Emps]
        Aggregate [manager]
          Scan [Emps]
After pushing down filter
SELECT deptno, min(salary)
FROM Managers
WHERE age >= 50
GROUP BY deptno
CREATE VIEW Managers AS
SELECT *
FROM Emps
WHERE EXISTS (
  SELECT *
  FROM Emps AS underling
  WHERE underling.manager = Emps.id)
Aggregate [deptno, min(salary)]
  Project [$0, $1, $2, $3]
    Join [$0, $5]
      Filter [age >= 50]
        Scan [Emps]
      Aggregate [manager]
        Scan [Emps]
Materialized view
CREATE MATERIALIZED VIEW EmpSummary AS
SELECT deptno,
  gender,
  COUNT(*) AS c,
  SUM(sal) AS s
FROM Emps
GROUP BY deptno, gender
Query:
SELECT COUNT(*)
FROM Emps
WHERE deptno = 10
AND gender = 'M'
Scan [EmpSummary] =
  Aggregate [deptno, gender, COUNT(*), SUM(sal)]
    Scan [Emps]
Query plan:
Aggregate [COUNT(*)]
  Filter [deptno = 10 AND gender = 'M']
    Scan [Emps]
Materialized view, step 2: Rewrite query to match
CREATE MATERIALIZED VIEW EmpSummary AS
SELECT deptno,
  gender,
  COUNT(*) AS c,
  SUM(sal) AS s
FROM Emps
GROUP BY deptno, gender
Query:
SELECT COUNT(*)
FROM Emps
WHERE deptno = 10
AND gender = 'M'
Scan [EmpSummary] =
  Aggregate [deptno, gender, COUNT(*), SUM(sal)]
    Scan [Emps]
Query plan:
Project [c]
  Filter [deptno = 10 AND gender = 'M']
    Aggregate [deptno, gender, COUNT(*) AS c, SUM(sal) AS s]
      Scan [Emps]
Materialized view, step 3: Substitute table
CREATE MATERIALIZED VIEW EmpSummary AS
SELECT deptno,
  gender,
  COUNT(*) AS c,
  SUM(sal) AS s
FROM Emps
GROUP BY deptno, gender
Query:
SELECT COUNT(*)
FROM Emps
WHERE deptno = 10
AND gender = 'M'
Scan [EmpSummary] =
  Aggregate [deptno, gender, COUNT(*), SUM(sal)]
    Scan [Emps]
Query plan:
Project [c]
  Filter [deptno = 10 AND gender = 'M']
    Scan [EmpSummary]
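The substitution can be checked by hand on a small data set. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption: SQLite has no CREATE MATERIALIZED VIEW, so a plain table built from the same query plays the role of EmpSummary, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emps (name TEXT, deptno INTEGER, gender TEXT, sal INTEGER);
INSERT INTO Emps VALUES
  ('Ann', 10, 'F', 100), ('Bob', 10, 'M', 90),
  ('Cy', 10, 'M', 80), ('Di', 20, 'M', 70);
-- A plain table stands in for the materialized view.
CREATE TABLE EmpSummary AS
  SELECT deptno, gender, COUNT(*) AS c, SUM(sal) AS s
  FROM Emps GROUP BY deptno, gender;
""")

# The query as the user wrote it, against the base table.
original = conn.execute(
    "SELECT COUNT(*) FROM Emps WHERE deptno = 10 AND gender = 'M'"
).fetchall()

# The plan after substitution: Project [c] over Filter over Scan [EmpSummary].
rewritten = conn.execute(
    "SELECT c FROM EmpSummary WHERE deptno = 10 AND gender = 'M'"
).fetchall()
```

Both statements return the same single row here, while the second reads one summary row instead of scanning every employee.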
Streaming
SELECT STREAM DISTINCT productName,
  floor(rowtime TO HOUR) AS h
FROM Orders
Delta
Converts a table to a stream
Each time a row is inserted into the table, a record appears in the stream
Chi
Converts a stream into a table
Often we can safely narrow the table down to a small time window
Plan:
Delta
  Aggregate [productName, h]
    Project [productName, floor(rowtime TO HOUR) AS h]
      Chi
        Scan [Orders]
Streaming - efficient implementation
SELECT STREAM DISTINCT productName,
  floor(rowtime TO HOUR) AS h
FROM Orders
Can create efficient implementation:
• Input is sorted by timestamp
• Only need to aggregate an hour at a time
• Output timestamp tracks input timestamp
• Therefore it is safe to cancel out the Chi and Delta operators
StreamingAggregate [productName, h]
  Project [productName, floor(rowtime TO HOUR) AS h]
    Scan [Orders]
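A sketch of what such a streaming aggregate might look like, assuming the input is an iterable of (rowtime, productName) pairs sorted by rowtime, with rowtime in epoch seconds. The function name and the row shape are hypothetical; the point is that only one hour of state is held at a time.

```python
def streaming_distinct(orders):
    """Yield distinct (productName, hour) pairs from rows sorted by rowtime."""
    current_hour, seen = None, set()
    for rowtime, product in orders:
        hour = rowtime - rowtime % 3600  # floor(rowtime TO HOUR)
        if hour != current_hour:
            # Input is sorted, so the previous hour is complete:
            # its state can be discarded.
            current_hour, seen = hour, set()
        if product not in seen:
            seen.add(product)
            yield product, hour
```

Because rows are emitted as soon as they are first seen, the output timestamps track the input timestamps, as the slide notes.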
Algebraic transformations - streaming
delta(filter(c, R)) → filter(c, delta(R))
delta(project(e1, …, en, R)) → project(e1, …, en, delta(R))
delta(union(R1, R2)) → union(delta(R1), delta(R2))
delta(join(R1, R2, c)) → union(join(R1, delta(R2), c), join(delta(R1), R2, c))
Delta behaves like “differentiate” in differential calculus, Chi like “integrate”.
(f + g)' = f' + g'
(f . g)' = f.g' + f'.g
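These identities can be checked on small multisets. The sketch below models a relation as a `collections.Counter` and delta as the rows added between two snapshots; this insert-only model and all helper names are assumptions made for illustration. For the join rule it further assumes R1 is read in its old state in the first term and R2 in its new state in the second, which is what makes the two terms add up exactly.

```python
from collections import Counter


def delta(old, new):
    """Rows inserted between two snapshots of a relation (as a multiset)."""
    return Counter(new) - Counter(old)


def filter_(pred, rel):
    return Counter({row: n for row, n in rel.items() if pred(row)})


def union(r1, r2):
    return r1 + r2  # UNION ALL on multisets


def join(r1, r2, cond):
    out = Counter()
    for a, n in r1.items():
        for b, m in r2.items():
            if cond(a, b):
                out[(a, b)] += n * m
    return out


def at_least_two(x):
    return x >= 2


def same(a, b):
    return a == b


# Two snapshots of each input; the new snapshot only adds rows.
r1_old, r1_new = Counter([1, 2]), Counter([1, 2, 3])
r2_old, r2_new = Counter([2, 3]), Counter([2, 3, 3])
d1, d2 = delta(r1_old, r1_new), delta(r2_old, r2_new)

filter_ok = delta(filter_(at_least_two, r1_old),
                  filter_(at_least_two, r1_new)) == filter_(at_least_two, d1)
union_ok = delta(union(r1_old, r2_old),
                 union(r1_new, r2_new)) == union(d1, d2)
join_ok = delta(join(r1_old, r2_old, same),
                join(r1_new, r2_new, same)) == union(
    join(r1_old, d2, same), join(d1, r2_new, same))
```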
Apache Calcite
Apache Calcite
Apache Incubator project since May 2014
• Originally named Optiq
Query planning framework
• Relational algebra, rewrite rules, cost model
• Extensible
Packaging
• Library (JDBC server optional)
• Open source
• Community-authored rules, adapters
Adoption
• Embedded: Lingual (SQL interface to Cascading), Apache Drill, Apache Hive, Kylin OLAP, Apache Phoenix, Apache Samza
• Adapters: Splunk, Spark, MongoDB, JDBC, CSV, JSON, Web tables, In-memory data
Conventional DB architecture
Calcite architecture
Calcite – APIs and SPIs
Cost, statistics
RelOptCost
RelOptCostFactory
RelMetadataProvider
• RelMdColumnUniqueness
• RelMdDistinctRowCount
• RelMdSelectivity
SQL parser
SqlNode
SqlParser
SqlValidator
Transformation rules
RelOptRule
• MergeFilterRule
• PushAggregateThroughUnionRule
• 100+ more
Global transformations
• Unification (materialized view)
• Column trimming
• De-correlation
• Join ordering
Relational algebra
RelNode (operator)
• Scan
• Filter
• Project
• Union
• Aggregate
• …
RelDataType (type)
RexNode (expression)
RelTrait (physical property)
• RelConvention (calling-convention)
• RelCollation (sort-order)
• RelDistribution (partitions)
JDBC driver
Metadata
Schema
Table
Function
• TableFunction
• TableMacro
Lattice
Data independence
A core principle of data management
Data independence is a contract:
• Applications do not make assumptions about the location or organization of data
• The DBMS chooses the most efficient access path
Requires:
• Declarative query language
• Query planner
Allows:
• The DBMS (or administrator) can re-organize the data without breaking the application
• Redundant copies of the data (indexes, materialized views, replicas)
• Novel algorithms
• Novel data formats and organizations (e.g. b-tree, r-tree, column store)
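The contract can be illustrated with any engine that has a declarative language and a planner. The sketch below uses Python's built-in sqlite3 module, which is an assumption made for convenience; the table, the index name `emps_deptno`, and the sample rows are invented. The application's query text never changes, yet the planner is free to pick a different access path once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emps (name TEXT, deptno INTEGER);
INSERT INTO Emps VALUES ('Ann', 10), ('Bob', 20), ('Cy', 10);
""")

# The application only ever issues this text.
QUERY = "SELECT name FROM Emps WHERE deptno = 10 ORDER BY name"


def run():
    """Return the query result and the planner's description of its plan."""
    rows = conn.execute(QUERY).fetchall()
    # The last column of EXPLAIN QUERY PLAN output is a human-readable detail.
    plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
    return rows, plan


rows_before, plan_before = run()
# The administrator re-organizes the data by adding a redundant copy.
conn.execute("CREATE INDEX emps_deptno ON Emps (deptno)")
rows_after, plan_after = run()
```

Comparing `plan_before` with `plan_after` shows whether the planner chose the new index; the result rows are the same either way.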
Hadoop
[Diagram: one node with disk (blocks B1 to B4), memory, and CPU, alongside the name node (HDFS), application master (YARN), and Zookeeper]
Hadoop scales
Commodity hardware
Storage, memory and CPU all scale as you add nodes
N replicas of each block (typically 3) give redundancy & scheduling flexibility
[Diagram: five nodes, each with disk, memory, and CPU; blocks B1 to B6 are replicated across the nodes' disks]
Hadoop query execution
Data flow among operators running on nodes
Nodes are assigned to work on blocks that have a replica locally
Memory is used for file blocks and for scratch space (e.g. hash tables)
[Diagram: the five-node cluster, each node with disk, memory, and CPU, annotated with the blocks being processed]
Data independence and Hadoop
Hadoop is very flexible when data is loaded
That flexibility has made it hard for the system to optimize access
Materialized views are an opportunity to “crack” the data, and create copies in other formats
Calcite: Lattices and tiles
Materialized view
A table whose contents are guaranteed to be the same as executing a given query.
Lattice
Recommends, builds, and recognizes summary materialized views (tiles) based on a star schema.
A query defines the tables and many:1 relationships in the star schema.
Tile
A summary materialized view that belongs to a lattice.
A tile may or may not be materialized.
Materialization methods:
• Declare in lattice
• Generate via recommender algorithm
• Created in response to query
CREATE MATERIALIZED VIEW t AS
SELECT * FROM Emps
WHERE deptno = 10;
CREATE LATTICE star AS
SELECT *
FROM Sales AS s
JOIN Products AS p ON …
JOIN ProductClasses AS pc ON …
JOIN Customers AS c ON …
JOIN Time AS t ON …;
CREATE MATERIALIZED VIEW zg IN star
SELECT gender, zipcode,
  COUNT(*), SUM(unit_sales)
FROM star
GROUP BY gender, zipcode;
(FAKE SYNTAX)
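One way to picture how a lattice might recognize a usable tile: a tile can answer a query if its dimensions cover the query's GROUP BY columns, and among the candidates the smallest is preferred. The sketch below is a simplified illustration under that assumption, with invented tile names and row counts; it is not Calcite's actual matching or recommender algorithm.

```python
def pick_tile(tiles, query_dims):
    """Return the name of the smallest tile whose dimensions cover the query."""
    candidates = [
        (rows, name)
        for name, (dims, rows) in tiles.items()
        if set(query_dims) <= set(dims)
    ]
    return min(candidates)[1] if candidates else None


# Hypothetical tiles: name -> (dimension columns, estimated row count).
TILES = {
    "zg": (("gender", "zipcode"), 50_000),
    "g": (("gender",), 2),
    "zgy": (("gender", "zipcode", "year"), 400_000),
}
```

A query grouped by gender alone would be served from the two-row tile, while one grouped by a column no tile carries falls back to the base tables.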
Tiled, in-memory materializations
Query: SELECT x, SUM(y) FROM t GROUP BY x
[Diagram: in-memory materialized queries layered over tables on disk]
Where we’re going… algebraic cache: http://hortonworks.com/blog/dmmq/
Summary
1. Relational algebra allows us to reason about queries, and is the foundation of query planning
2. Hadoop is deconstructing the DBMS, and enabling new languages, engines and data formats
3. Data independence is more important than ever
4. Apache Calcite - an implementation of relational algebra
Thank you!
@julianhyde
http://calcite.incubator.apache.org
Apache

Calcite
Apache
Calcite

More Related Content

What's hot

Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
Michael Mior
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
Julian Hyde
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
Julian Hyde
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
Julian Hyde
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
Julian Hyde
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
DataWorks Summit/Hadoop Summit
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Julian Hyde
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Julian Hyde
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
Evans Ye
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
Julian Hyde
 
SQL on Big Data using Optiq
SQL on Big Data using OptiqSQL on Big Data using Optiq
SQL on Big Data using Optiq
Julian Hyde
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Christian Tzolov
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
DataWorks Summit/Hadoop Summit
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
Julian Hyde
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
Julian Hyde
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
Julian Hyde
 
Query optimization techniques in Apache Hive
Query optimization techniques in Apache Hive Query optimization techniques in Apache Hive
Query optimization techniques in Apache Hive
Zara Tariq
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
Julian Hyde
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
Julian Hyde
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
Josh Elser
 

What's hot (20)

Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
SQL on Big Data using Optiq
SQL on Big Data using OptiqSQL on Big Data using Optiq
SQL on Big Data using Optiq
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Query optimization techniques in Apache Hive
Query optimization techniques in Apache Hive Query optimization techniques in Apache Hive
Query optimization techniques in Apache Hive
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
 

Viewers also liked

Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits all
Julian Hyde
 
Discardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With HadoopDiscardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With Hadoop
Julian Hyde
 
The twins that everyone loved too much
The twins that everyone loved too muchThe twins that everyone loved too much
The twins that everyone loved too much
Julian Hyde
 
What's new in Mondrian 4?
What's new in Mondrian 4?What's new in Mondrian 4?
What's new in Mondrian 4?
Julian Hyde
 
Optiq: A dynamic data management framework
Optiq: A dynamic data management frameworkOptiq: A dynamic data management framework
Optiq: A dynamic data management framework
Julian Hyde
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
 Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Julian Hyde
 
Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016
Chris Fregly
 
Improve Mondrian MDX usability with user defined functions
Improve Mondrian MDX usability with user defined functionsImprove Mondrian MDX usability with user defined functions
Improve Mondrian MDX usability with user defined functions
Raimonds Simanovskis
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
Julian Hyde
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
Jordan Halterman
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
DataWorks Summit/Hadoop Summit
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
HBaseCon
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Chris Fregly
 

Viewers also liked (13)

Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits all
 
Discardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With HadoopDiscardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With Hadoop
 
The twins that everyone loved too much
The twins that everyone loved too muchThe twins that everyone loved too much
The twins that everyone loved too much
 
What's new in Mondrian 4?
What's new in Mondrian 4?What's new in Mondrian 4?
What's new in Mondrian 4?
 
Optiq: A dynamic data management framework
Optiq: A dynamic data management frameworkOptiq: A dynamic data management framework
Optiq: A dynamic data management framework
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
 Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
 
Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016
 
Improve Mondrian MDX usability with user defined functions
Improve Mondrian MDX usability with user defined functionsImprove Mondrian MDX usability with user defined functions
Improve Mondrian MDX usability with user defined functions
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
 

Similar to Why you care about
 relational algebra (even though you didn’t know it)

Polyalgebra
PolyalgebraPolyalgebra
Cost-based Query Optimization in Hive
Cost-based Query Optimization in HiveCost-based Query Optimization in Hive
Cost-based Query Optimization in Hive
DataWorks Summit
 
phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016
Maryann Xue
 
Cost-Based query optimization
Cost-Based query optimizationCost-Based query optimization
Cost-Based query optimization
DataWorks Summit/Hadoop Summit
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query Language
Neo4j
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Julian Hyde
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application code
DataWorks Summit
 
Keynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczKeynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter Boncz
LDBC council
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
Ioan Toma
 
Orm and hibernate
Orm and hibernateOrm and hibernate
Orm and hibernate
s4al_com
 
Mondrian - Geo Mondrian
Mondrian - Geo MondrianMondrian - Geo Mondrian
Mondrian - Geo Mondrian
Simone Campora
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
Hortonworks
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolio
Chris Seebacher
 
Bi Ppt Portfolio Elmer Donavan
Bi Ppt Portfolio  Elmer DonavanBi Ppt Portfolio  Elmer Donavan
Bi Ppt Portfolio Elmer Donavan
EJDonavan
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j
 
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
Amazon Web Services
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
Michael Zhang
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus Technologies
Impetus Technologies
 
Powerpivot web wordpress present
Powerpivot web wordpress presentPowerpivot web wordpress present
Powerpivot web wordpress present
MariAnne Woehrle
 
Nitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence PortfolioNitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence Portfolio
npatel2362
 

Similar to Why you care about
 relational algebra (even though you didn’t know it) (20)

Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
Cost-based Query Optimization in Hive
Cost-based Query Optimization in HiveCost-based Query Optimization in Hive
Cost-based Query Optimization in Hive
 
phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016
 
Cost-Based query optimization
Cost-Based query optimizationCost-Based query optimization
Cost-Based query optimization
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query Language
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application code
 
Keynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczKeynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter Boncz
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
 
Orm and hibernate
Orm and hibernateOrm and hibernate
Orm and hibernate
 
Mondrian - Geo Mondrian
Mondrian - Geo MondrianMondrian - Geo Mondrian
Mondrian - Geo Mondrian
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolio
 
Bi Ppt Portfolio Elmer Donavan
Bi Ppt Portfolio  Elmer DonavanBi Ppt Portfolio  Elmer Donavan
Bi Ppt Portfolio Elmer Donavan
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael Moore
 
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus Technologies
 
Powerpivot web wordpress present
Powerpivot web wordpress presentPowerpivot web wordpress present
Powerpivot web wordpress present
 
Nitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence PortfolioNitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence Portfolio
 

Why you care about relational algebra (even though you didn’t know it)

  • 1. © Hortonworks Inc. 2015 Why you care about relational algebra (even though you didn’t know it)
    Julian Hyde, @julianhyde
    Enterprise Data World, Washington, DC, April 2nd, 2015
  • 2. About me: Apache Calcite
  • 3. Why you should care about relational algebra
    Why should you care?
    • It is old
    • It is as useful as ever
    • Exposed in new products such as Hadoop
    • New challenges
    Agenda
    • Is Hadoop a revolution for the database world?
    • What is relational algebra?
    • Examples of algebra in action
    • Introducing Apache Calcite
    • Adding data independence to Hadoop via materialized views
  • 4. Old world, new world
    RDBMS
    • Security
    • Metadata
    • SQL
    • Query planning
    • Data independence
    Hadoop
    • Scale
    • Late schema
    • Choice of front-end
    • Choice of engines
    • Workload: batch, interactive, streaming, ML, graph, …
  • 5. Many front ends, many engines
    [Diagram labels: SQL, External SQL, Planning, Execution engine, User code, MapReduce, Tez, User code in YARN, Spark, MongoDB, Hadoop, Storm, Cascading, HBase, Graph]
  • 6. Analogy: LLVM
    Lessons from the compiler community:
    • Writing a front end is hard
    • Writing a back end is hard
    • Writing an optimizer is really hard
    • Most of the logic in the optimizer is independent of front end and back end
    • E.g. register assignment
    • The optimizer is a collection of separate algorithms
    • Common language between algorithms
  • 7. Relational algebra
    SELECT d.name, COUNT(*) AS c
    FROM Emps AS e
    JOIN Depts AS d ON e.deptno = d.deptno
    WHERE e.age < 30
    GROUP BY d.deptno
    HAVING COUNT(*) > 5
    ORDER BY c DESC
    Plan: Join [e.deptno = d.deptno] of Scan [Emps] and Scan [Depts] → Filter [e.age < 30] → Aggregate [deptno, COUNT(*) AS c] → Filter [c > 5] → Project [name, c] → Sort [c DESC]
    (Column names are simplified. They would usually be ordinals, e.g. $0 is the first column of the left input.)
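The operator tree on slide 7 can be read as nested function calls, one per operator. The sketch below is only an illustration of that reading, not how Calcite executes plans: the sample rows, the helper names (`scan`, `join`, `filter_`, `aggregate`, `project`, `sort`) and the `e.`/`d.` column prefixes are all assumptions, the aggregate carries `d.name` along with `d.deptno` so that the final projection has a name to return, and the HAVING threshold is lowered from 5 to 1 to suit the tiny data set.

```python
from collections import Counter

# Hypothetical sample data; table and column names follow the slide.
emps = [
    {"name": "Ann", "deptno": 10, "age": 25},
    {"name": "Bob", "deptno": 10, "age": 28},
    {"name": "Cat", "deptno": 20, "age": 22},
    {"name": "Dan", "deptno": 10, "age": 45},
]
depts = [{"deptno": 10, "name": "Sales"}, {"deptno": 20, "name": "Ops"}]


def scan(table):
    return list(table)


def join(left, right, key):
    # Inner equi-join; columns are prefixed "e." and "d." to keep them distinct.
    return [
        {**{"e." + k: v for k, v in l.items()},
         **{"d." + k: v for k, v in r.items()}}
        for l in left for r in right if l[key] == r[key]
    ]


def filter_(rows, pred):
    return [r for r in rows if pred(r)]


def aggregate(rows, keys):
    # GROUP BY keys with COUNT(*) AS c.
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [dict(zip(keys, group), c=n) for group, n in counts.items()]


def project(rows, cols):
    # cols maps output column name -> input column name.
    return [{out: r[src] for out, src in cols.items()} for r in rows]


def sort(rows, key, desc=False):
    return sorted(rows, key=lambda r: r[key], reverse=desc)


# Same shape as the slide: Scan, Join, Filter, Aggregate, Filter, Project, Sort.
joined = join(scan(emps), scan(depts), "deptno")
young = filter_(joined, lambda r: r["e.age"] < 30)
grouped = aggregate(young, ["d.deptno", "d.name"])
having = filter_(grouped, lambda r: r["c"] > 1)  # slide uses c > 5
result = sort(project(having, {"name": "d.name", "c": "c"}), "c", desc=True)
print(result)
```

Each intermediate variable corresponds to one node of the tree, which is what lets a planner reorder or replace nodes independently.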
  • 8. Relational algebra - Union and sub-query
    SELECT * FROM (
    SELECT zipcode, state
    FROM Emps
    UNION ALL
    SELECT zipcode, state
    FROM Customers)
    WHERE state IN ('CA', 'TX')
    Plan: Union [all] of (Scan [Emps] → Project [zipcode, state]) and (Scan [Customers] → Project [zipcode, state]) → Filter [state IN ('CA', 'TX')]
  • 9. Relational algebra - Insert and Values
    INSERT INTO Facts
    VALUES ('Meaning of life', 42),
    ('Clever as clever', 6)
    Plan: Values [['Meaning of life', 42], ['Clever as clever', 6]] → Insert [Facts]
  • 10. Relational algebra - Strict versus Pragmatic
    “Strict” relational algebra
    • Introduced by E.F. Codd in “A Relational Model of Data for Large Shared Data Banks” [1970]
    • Goal is mathematical elegance (ability to prove theorems)
    • Greek symbols: σ, π, ρ, ∪
    • Relations cannot contain duplicates
    • Relations are not sorted
    • Column values are scalars
    • Only logical operators
    Pragmatic relational algebra
    • Goal is to optimize queries, allow real-world data models, extensibility
    • Elegance still important
    • Verbs: Project, Filter, Union, Join
    • Relations may contain duplicates
    • Relations may be sorted (but Sort is the only logical operator that guarantees order)
    • Null values have 3-value semantics, as in SQL
    • Physical operators (e.g. HashJoin, MergeJoin)
    • Physical properties (sort, distribution)
  • 11. Algebraic transformations
    (R filter c1) filter c2 → R filter (c1 and c2)
    (R1 union R2) join R3 on c → (R1 join R3 on c) union (R2 join R3 on c)
    • Compare distributive law of arithmetic: (x + y) * z → (x * z) + (y * z)
    (R1 join R2 on c) filter c2 → (R1 filter c2) join R2 on c (provided c2 only depends on columns in R1, and join is inner)
    (R1 join R2 on c) → (R2 join R1 on c) project [R1.*, R2.*]
    (R1 join R2 on c) join R3 on c2 → R1 join (R2 join R3 on c2) on c (provided c, c2 have the necessary columns)
    Many, many others…
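A quick way to build intuition for these rewrite rules is to run both sides on small relations and compare the results as bags. The sketch below does that for three of the rules (merging filters, distributing join over union, and pushing a filter below an inner join). The relations `r1`, `r2`, `r3`, the conditions and the helper names are made up for illustration; passing on one data set is of course not a proof.

```python
from collections import Counter


def bag(rows):
    """Compare relations as bags: duplicates matter, row order does not."""
    return Counter(tuple(sorted(r.items())) for r in rows)


def filter_(rows, pred):
    return [r for r in rows if pred(r)]


def union_all(a, b):
    return a + b


def join(left, right, pred):
    # Inner join; the sample relations have disjoint column names.
    return [{**l, **r} for l in left for r in right if pred(l, r)]


# Hypothetical sample relations.
r1 = [{"a": 1, "x": 5}, {"a": 2, "x": 7}]
r2 = [{"a": 2, "x": 9}, {"a": 2, "x": 7}]
r3 = [{"b": 2, "y": "p"}, {"b": 1, "y": "q"}, {"b": 2, "y": "r"}]

c1 = lambda r: r["x"] > 5          # filter condition on R
c2 = lambda r: r["a"] == 2         # depends only on columns of R1
c = lambda l, r: l["a"] == r["b"]  # join condition

# (R filter c1) filter c2  ==  R filter (c1 and c2)
merge_ok = bag(filter_(filter_(r1, c1), c2)) == bag(
    filter_(r1, lambda r: c1(r) and c2(r)))

# (R1 union R2) join R3 on c  ==  (R1 join R3 on c) union (R2 join R3 on c)
distribute_ok = bag(join(union_all(r1, r2), r3, c)) == bag(
    union_all(join(r1, r3, c), join(r2, r3, c)))

# (R1 join R3 on c) filter c2  ==  (R1 filter c2) join R3 on c
pushdown_ok = bag(filter_(join(r1, r3, c), c2)) == bag(
    join(filter_(r1, c2), r3, c))

print(merge_ok, distribute_ok, pushdown_ok)
```

Comparing as bags rather than sets matters here because pragmatic relational algebra allows duplicates, as noted on the previous slide.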
  • 12. Query using a view
    SELECT deptno, min(salary)
    FROM Managers
    WHERE age >= 50
    GROUP BY deptno
    View definition:
    CREATE VIEW Managers AS
    SELECT *
    FROM Emps
    WHERE EXISTS (
    SELECT *
    FROM Emps AS underling
    WHERE underling.manager = Emps.id)
    Query plan: Scan [Managers] → Filter [age >= 50] → Aggregate [deptno, min(salary)]
    View plan: Join [$0, $5] of Scan [Emps] and (Scan [Emps] → Aggregate [manager]) → Project [$0, $1, $2, $3]
  • 13. After view expansion
    (Same query and view definition as slide 12.)
    Plan: Join [$0, $5] of Scan [Emps] and (Scan [Emps] → Aggregate [manager]) → Project [$0, $1, $2, $3] → Filter [age >= 50] → Aggregate [deptno, min(salary)]
  • 14. After pushing down filter
    (Same query and view definition as slide 12.)
    Plan: Join [$0, $5] of (Scan [Emps] → Filter [age >= 50]) and Scan [Emps] → Project [$0, $1, $2, $3] → Aggregate [deptno, min(salary)]
  • 15. Materialized view
    CREATE MATERIALIZED VIEW EmpSummary AS
    SELECT deptno,
    gender,
    COUNT(*) AS c,
    SUM(sal) AS s
    FROM Emps
    GROUP BY deptno, gender
    Query:
    SELECT COUNT(*)
    FROM Emps
    WHERE deptno = 10
    AND gender = 'M'
    Equivalence: Scan [EmpSummary] = Scan [Emps] → Aggregate [deptno, gender, COUNT(*), SUM(sal)]
    Query plan: Scan [Emps] → Filter [deptno = 10 AND gender = 'M'] → Aggregate [COUNT(*)]
  • 16. Materialized view, step 2: Rewrite query to match
    (Same materialized view, query and equivalence as slide 15.)
    Query plan: Scan [Emps] → Aggregate [deptno, gender, COUNT(*) AS c, SUM(sal) AS s] → Filter [deptno = 10 AND gender = 'M'] → Project [c]
  • 17. Materialized view, step 3: Substitute table
    (Same materialized view, query and equivalence as slide 15.)
    Query plan: Scan [EmpSummary] → Filter [deptno = 10 AND gender = 'M'] → Project [c]
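The three materialized-view steps can be checked on a small example: build the summary once, then answer the COUNT(*) query from the summary instead of from the base table. This is a minimal sketch with made-up rows and a hypothetical helper `build_emp_summary`; it illustrates why the substitution is valid, not how a planner discovers it.

```python
from collections import defaultdict

# Hypothetical contents of Emps.
emps = [
    {"deptno": 10, "gender": "M", "sal": 100},
    {"deptno": 10, "gender": "M", "sal": 150},
    {"deptno": 10, "gender": "F", "sal": 120},
    {"deptno": 20, "gender": "M", "sal": 90},
]


def build_emp_summary(rows):
    """Aggregate [deptno, gender, COUNT(*) AS c, SUM(sal) AS s]."""
    groups = defaultdict(lambda: {"c": 0, "s": 0})
    for r in rows:
        g = groups[(r["deptno"], r["gender"])]
        g["c"] += 1
        g["s"] += r["sal"]
    return [{"deptno": d, "gender": gd, **agg}
            for (d, gd), agg in groups.items()]


emp_summary = build_emp_summary(emps)

# Original plan: Scan [Emps] -> Filter -> Aggregate [COUNT(*)]
direct = sum(1 for r in emps if r["deptno"] == 10 and r["gender"] == "M")

# Rewritten plan: Scan [EmpSummary] -> Filter -> Project [c]
via_summary = [r["c"] for r in emp_summary
               if r["deptno"] == 10 and r["gender"] == "M"]

print(direct, via_summary)
```

One difference the simplified plans gloss over: when no row matches the filter, `COUNT(*)` without `GROUP BY` still returns a single row containing 0, whereas filtering the summary returns no rows, so a real rewrite has to account for the empty case.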
  • 18. Streaming
    SELECT STREAM DISTINCT productName,
    floor(rowtime TO HOUR) AS h
    FROM Orders
    Delta
    • Converts a table to a stream
    • Each time a row is inserted into the table, a record appears in the stream
    Chi
    • Converts a stream into a table
    • Often we can safely narrow the table down to a small time window
    Plan: Scan [Orders] → Chi → Project [productName, floor(rowtime TO HOUR) AS h] → Aggregate [productName, h] → Delta
  • 19. Streaming - efficient implementation
    (Same query as slide 18.)
    Can create efficient implementation:
    • Input is sorted by timestamp
    • Only need to aggregate an hour at a time
    • Output timestamp tracks input timestamp
    • Therefore it is safe to cancel out the Chi and Delta operators
    Plan: Scan [Orders] → Project [productName, floor(rowtime TO HOUR) AS h] → StreamingAggregate [productName, h]
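The bullets on slide 19 suggest what a streaming aggregate might look like: because the input arrives in timestamp order, only the state for the current hour has to be kept. The generator below is a sketch of that idea under the stated assumption that `orders` is sorted by `rowtime`; the function name and sample rows are hypothetical, and this is not Calcite's `StreamingAggregate` implementation.

```python
from datetime import datetime


def streaming_distinct_by_hour(orders):
    """Yield distinct (productName, hour) pairs from a rowtime-sorted stream.

    Only the set of products seen in the current hour is held in memory;
    it is discarded as soon as a row from a later hour arrives.
    """
    current_hour = None
    seen = set()
    for rowtime, product in orders:
        # Equivalent of floor(rowtime TO HOUR).
        hour = rowtime.replace(minute=0, second=0, microsecond=0)
        if hour != current_hour:
            current_hour, seen = hour, set()
        if product not in seen:
            seen.add(product)
            yield product, hour


# Hypothetical Orders stream, already sorted by rowtime.
orders = [
    (datetime(2015, 4, 2, 9, 5), "paint"),
    (datetime(2015, 4, 2, 9, 20), "paint"),
    (datetime(2015, 4, 2, 9, 40), "brush"),
    (datetime(2015, 4, 2, 10, 1), "paint"),
]

out = list(streaming_distinct_by_hour(orders))
for product, hour in out:
    print(product, hour.isoformat())
```

Because the output hour never runs ahead of the input timestamp, rows can be emitted as they are first seen, which is the sense in which the Chi and Delta operators cancel out.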
  • 20. Algebraic transformations - streaming
    delta(filter(c, R)) → filter(c, delta(R))
    delta(project(e1, …, en, R)) → project(e1, …, en, delta(R))
    delta(union(R1, R2)) → union(delta(R1), delta(R2))
    delta(join(R1, R2, c)) → union(join(R1, delta(R2), c), join(delta(R1), R2, c))
    Delta behaves like “differentiate” in differential calculus, Chi like “integrate”.
    (f + g)’ = f’ + g’
    (f . g)’ = f.g’ + f’.g
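The join rule mirrors the product rule, and it can be checked on insert-only data. The slide does not say which version of each relation (before or after the new rows arrive) appears in each term, so the sketch below makes an assumption: the first term uses R1 after its insertions and the second term uses R2 before its insertions. Under that reading the new join rows are exactly the two terms combined. All names and rows are illustrative.

```python
from collections import Counter


def join(left, right):
    """Inner equi-join of two bags of (key, value) rows on key."""
    out = Counter()
    for (k1, a), n1 in left.items():
        for (k2, b), n2 in right.items():
            if k1 == k2:
                out[(k1, a, b)] += n1 * n2
    return out


# Relations before the new rows arrive, and the inserted rows (deltas).
r1_old = Counter([(1, "a"), (2, "b")])
r2_old = Counter([(1, "x")])
d1 = Counter([(1, "c")])
d2 = Counter([(2, "y"), (1, "z")])

r1_new = r1_old + d1
r2_new = r2_old + d2

# delta(join(R1, R2)) = join(R1, delta(R2)) + join(delta(R1), R2)
# Assumption: R1 is the post-insert state, R2 the pre-insert state, so rows
# that pair d1 with d2 are counted exactly once.
delta_join = join(r1_new, d2) + join(d1, r2_old)

incremental = join(r1_old, r2_old) + delta_join
full = join(r1_new, r2_new)
print(incremental == full)
```

Using the old state on both sides would miss rows that pair two newly inserted rows, and using the new state on both sides would count them twice, which is why the choice of version matters.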
  • 21. Apache Calcite
  • 22. Apache Calcite
    Apache incubator project since May, 2014
    • Originally named Optiq
    Query planning framework
    • Relational algebra, rewrite rules, cost model
    • Extensible
    Packaging
    • Library (JDBC server optional)
    • Open source
    • Community-authored rules, adapters
    Adoption
    • Embedded: Lingual (SQL interface to Cascading), Apache Drill, Apache Hive, Kylin OLAP, Apache Phoenix, Apache Samza
    • Adapters: Splunk, Spark, MongoDB, JDBC, CSV, JSON, Web tables, In-memory data
  • 23. Conventional DB architecture
  • 24. Calcite architecture
  • 25. Calcite - APIs and SPIs
    Cost, statistics
    • RelOptCost
    • RelOptCostFactory
    • RelMetadataProvider: RelMdColumnUniqueness, RelMdDistinctRowCount, RelMdSelectivity
    SQL parser
    • SqlNode
    • SqlParser
    • SqlValidator
    Transformation rules
    • RelOptRule: MergeFilterRule, PushAggregateThroughUnionRule, 100+ more
    Global transformations
    • Unification (materialized view)
    • Column trimming
    • De-correlation
    • Join ordering
    Relational algebra
    • RelNode (operator): Scan, Filter, Project, Union, Aggregate, …
    • RelDataType (type)
    • RexNode (expression)
    • RelTrait (physical property): RelConvention (calling-convention), RelCollation (sort-order), RelDistribution (partitions)
    JDBC driver
    Metadata
    • Schema
    • Table
    • Function: TableFunction, TableMacro
    • Lattice
  • 26. Data independence
    A core principle of data management
    Data independence is a contract:
    • Applications do not make assumptions about the location or organization of data
    • The DBMS chooses the most efficient access path
    Requires:
    • Declarative query language
    • Query planner
    Allows:
    • The DBMS (or administrator) can re-organize the data without breaking the application
    • Redundant copies of the data (indexes, materialized views, replicas)
    • Novel algorithms
    • Novel data formats and organizations (e.g. b-tree, r-tree, column store)
  • 27. Hadoop
    [Diagram: a node with disk (blocks B1 to B4), memory and CPU; Name node (HDFS), Application master (YARN), Zookeeper]
  • 28. Hadoop scales
    Commodity hardware
    Storage, memory and CPU all scale as you add nodes
    N replicas of each block (typically 3) give redundancy & scheduling flexibility
    [Diagram: several nodes, each with disk, memory and CPU, holding replicas of blocks B1 to B6]
  • 29. Hadoop query execution
    Data flow among operators running on nodes
    Nodes are assigned to work on blocks that have a replica locally
    Memory is used for file blocks and for scratch space (e.g. hash tables)
    [Diagram: several nodes, each with disk, memory and CPU, holding replicas of blocks B1 to B6, with blocks loaded into memory]
  • 30. Data independence and Hadoop
    Hadoop is very flexible when data is loaded
    That flexibility has made it hard for the system to optimize access
    Materialized views are an opportunity to “crack” the data, and create copies in other formats
  • 31. Calcite: Lattices and tiles
    Materialized view: A table whose contents are guaranteed to be the same as executing a given query.
    Lattice: Recommends, builds, and recognizes summary materialized views (tiles) based on a star schema. A query defines the tables and many:1 relationships in the star schema.
    Tile: A summary materialized view that belongs to a lattice. A tile may or may not be materialized.
    Materialization methods:
    • Declare in lattice
    • Generate via recommender algorithm
    • Created in response to query
    (FAKE SYNTAX)
    CREATE MATERIALIZED VIEW t AS SELECT * FROM Emps WHERE deptno = 10;
    CREATE LATTICE star AS SELECT * FROM Sales AS s JOIN Products AS p ON … JOIN ProductClasses AS pc ON … JOIN Customers AS c ON … JOIN Time AS t ON …;
    CREATE MATERIALIZED VIEW zg IN star
    SELECT gender, zipcode, COUNT(*), SUM(unit_sales)
    FROM star
    GROUP BY gender, zipcode;
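One way to picture how tiles in a lattice get used: a tile grouped by a set of columns can answer any query that groups by a subset of those columns, as long as the aggregates can be rolled up (COUNT and SUM can). The sketch below picks the smallest such tile. It is an illustration of that idea only, not Calcite's recognition or recommender algorithm; the tile definitions, row counts and the `pick_tile` helper are all hypothetical.

```python
# Hypothetical tiles in a lattice: grouping columns -> estimated row count.
tiles = {
    ("gender", "zipcode"): 2000,
    ("gender",): 2,
    ("gender", "state", "zipcode"): 5000,
}


def pick_tile(query_columns, tiles):
    """Return the smallest tile whose grouping columns cover the query's.

    Returns None when no tile covers the query, in which case the query
    would have to be answered from the base tables.
    """
    needed = set(query_columns)
    candidates = [(rows, cols) for cols, rows in tiles.items()
                  if needed <= set(cols)]
    return min(candidates)[1] if candidates else None


print(pick_tile(["zipcode"], tiles))
print(pick_tile(["gender"], tiles))
print(pick_tile(["city"], tiles))
```

Choosing by row count is a stand-in for a cost model; a planner would compare the cost of rolling up each candidate tile against the cost of scanning the star schema directly.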
  • 32. Query: SELECT x, SUM(y) FROM t GROUP BY x
    [Diagram labels: In-memory materialized queries; Tables on disk; Tiled, in-memory materializations]
    Where we’re going… algebraic cache: http://hortonworks.com/blog/dmmq/
  • 33. Summary
    1. Relational algebra allows us to reason about queries, and is the foundation of query planning
    2. Hadoop is deconstructing the DBMS, and enabling new languages, engines and data formats
    3. Data independence is more important than ever
    4. Apache Calcite - an implementation of relational algebra
  • 34. Thank you!
    @julianhyde
    http://calcite.incubator.apache.org
    Apache Calcite