Compiling openCypher graph queries
with Spark Catalyst
Gábor Szárnyas
pre-holiday Spark des fêtes @ Montréal
BACKGROUND
• PhD student @ Budapest University of Technology and Economics, Hungary
• Visiting researcher @ McGill University
RESEARCH TOPIC
Problem statement
• Large graph (100M+ nodes)
• Complex global graph queries
• Evaluate them in < 1 s
Approach
• We “cheat”
• Build a huge cache
• Maintain results: incremental views
RESEARCH OBJECTIVES
Create a scalable graph query engine with incremental views
1. graph queries
2. incremental views
3. making it scale
Graph queries
PROPERTY GRAPH DATABASES
NoSQL family
Data model:
vertices, edges
and properties
#1 query approach: graph pattern matching
Note. Spark GraphX is an engine for graph analytics.
CYPHER AND OPENCYPHER
Cypher: query language of the Neo4j graph database.
“Cypher is a declarative, SQL-inspired language for describing
patterns in graphs visually using an ascii-art syntax.”
MATCH
(p:Person)-[:PRESENTER_OF]->(:Presentation)-[:AT]->(m:Meetup)
WHERE m.date = 'Monday, December 18, 2017'
RETURN p
“The openCypher project aims to deliver a full and open
specification of the industry’s most widely adopted graph
database query language: Cypher.” (late 2015)
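To make graph pattern matching concrete, the MATCH query above can be evaluated naively over a tiny in-memory property graph. This is only a sketch: the vertex ids, names, titles and the second meetup are made-up sample data, and the helper names `expand` and `presenters_at` are hypothetical. It is not how Neo4j or any openCypher engine evaluates the query.

```python
# Toy property graph; all ids, names, titles and dates are made-up sample data.
vertices = {
    1: {"label": "Person", "name": "Alice"},
    2: {"label": "Presentation", "title": "Graph queries"},
    3: {"label": "Meetup", "date": "Monday, December 18, 2017"},
    4: {"label": "Person", "name": "Bob"},
    5: {"label": "Presentation", "title": "Streaming"},
    6: {"label": "Meetup", "date": "Tuesday, January 9, 2018"},
}
edges = [
    (1, "PRESENTER_OF", 2), (2, "AT", 3),
    (4, "PRESENTER_OF", 5), (5, "AT", 6),
]


def expand(source, edge_type, target_label):
    """Yield targets of outgoing `edge_type` edges of `source` with the given label."""
    for src, typ, trg in edges:
        if src == source and typ == edge_type and vertices[trg]["label"] == target_label:
            yield trg


def presenters_at(date):
    """Naive nested-loop evaluation of the MATCH ... WHERE ... RETURN p pattern."""
    names = []
    for person, props in vertices.items():
        if props["label"] != "Person":
            continue
        for talk in expand(person, "PRESENTER_OF", "Presentation"):
            for meetup in expand(talk, "AT", "Meetup"):
                if vertices[meetup]["date"] == date:
                    names.append(props["name"])
    return names


print(presenters_at("Monday, December 18, 2017"))
```

Each arrow of the ASCII-art pattern becomes one nested loop here; a real engine would replace these loops with indexed lookups and an optimized plan.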
OPENCYPHER SYSTEMS
• Increasing adoption
• Relational databases:
o SAP HANA
o AGENS Graph
• Research prototypes:
o Graphflow (University of Waterloo)
o ingraph (incremental graph engine)
(Source: Keynote talk @ GraphConnect NYC 2017)
LINKED DATA BENCHMARK COUNCIL
LDBC is a non-profit organization dedicated to establishing
benchmarks, benchmark practices and benchmark results for
graph data management software.
LDBC’s Social Network Benchmark is an industrial and academic
initiative, formed by principal actors in the field of graph-like
data management.
OVERVIEW OF GRAPH PROCESSING
• OLTP: local queries
• OLAP: global queries
• analytics: global computations
OVERVIEW OF GRAPH PROCESSING: OLTP, LOCAL QUERIES
Example: “Friends’ recent likes”
MATCH (u:User {id: $userId})-[:FRIEND]-
(f:User)-[l:LIKES]->(p:Post)
RETURN f, p
ORDER BY l.timestamp DESC
LIMIT 10
Orri Erling et al.,
The LDBC Social Network Benchmark: Interactive Workload,
SIGMOD 2015
14 queries and 8 updates
OVERVIEW OF GRAPH PROCESSING: OLAP, GLOBAL QUERIES
Example: “One-sided friendships”
MATCH (u1:User)-[:FRIEND]-(u2:User)-[l:LIKES]->(p:Post),
(u1)-[:AUTHOR_OF]->(p)
WITH u1, u2, count(l) AS likes
WHERE likes > 10
AND NOT (u1)-[:LIKES]->(:Post)<-[:AUTHOR_OF]-(u2)
RETURN u1, u2
Arnau Prat, Gábor Szárnyas, Alex Averbuch et al.,
The LDBC Social Network Benchmark: BI Workload,
Technical report available, peer-reviewed paper in 2018
25 queries with infrequent executions
OVERVIEW OF GRAPH PROCESSING: ANALYTICS, GLOBAL COMPUTATIONS
• PageRank
• Shortest paths
• Clustering coefficient
Example: “Find the most central individuals.”
Spark: GraphX | Flink: Gelly | Neo4j: Graph Algorithms library
Alexandru Iosup et al.,
LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on
Parallel and Distributed Platforms,
VLDB 2016
One-time execution
Incremental view maintenance
CYBER-PHYSICAL SYSTEMS: LIVE RAILWAY MODEL
Trailing the switch
Proximity detection
Figure: the railway model as a graph. Segments a to g and the switch (labelled div) are connected by NEXT, STRAIGHT and TOP edges; trains 1 and 2 are attached to their current segments by ON edges.
PROXIMITY DETECTION
MATCH
(t1:Train)-[:ON]->(seg1:Segment)
-[:NEXT*1..2]->(seg2:Segment)
<-[:ON]-(t2:Train)
RETURN t1, t2, seg1, seg2
Figure: train t1 is ON seg1, train t2 is ON seg2, and seg1 reaches seg2 over 1..2 NEXT edges (≤ 1 segments in between).
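The variable-length pattern `[:NEXT*1..2]` in the proximity query can be sketched as a bounded breadth-first expansion. The track layout and train positions below are hypothetical (a straight line of five segments), and the function names are illustrative only. Cypher additionally forbids reusing an edge within one match, which makes no difference on this acyclic track.

```python
# Hypothetical straight track a -> b -> c -> d -> e, stored as NEXT adjacency lists.
next_edges = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": []}
# Hypothetical train positions: train -> segment it is ON.
on = {"t1": "a", "t2": "c", "t3": "e"}


def reachable(start, min_hops, max_hops):
    """Segments reachable from `start` over min_hops..max_hops NEXT edges."""
    frontier, found = {start}, set()
    for hops in range(1, max_hops + 1):
        frontier = {nxt for seg in frontier for nxt in next_edges[seg]}
        if hops >= min_hops:
            found |= frontier
    return found


def proximity():
    """All (t1, t2, seg1, seg2) where seg2 is 1..2 NEXT edges after seg1."""
    return sorted(
        (t1, t2, seg1, seg2)
        for t1, seg1 in on.items()
        for t2, seg2 in on.items()
        if seg2 in reachable(seg1, 1, 2)
    )


print(proximity())
```

On this sample data, t1 and t2 are two segments apart, and so are t2 and t3, so both pairs are reported; t1 and t3 are four segments apart and are not.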
TRAILING THE SWITCH
MATCH (t:Train)-[:ON]->(seg:Segment)
<-[:STRAIGHT]-(sw:Switch)
WHERE sw.position = 'diverging'
RETURN t.number, sw
Figure: train t is ON segment seg, which is connected to the switch div by a STRAIGHT edge.
Evaluate continuously
INCREMENTAL QUERIES
• Register a set of standing queries
• Continuously evaluate queries on changes
• Approach: build a cache and maintain its content
• First publication: 1974, the Rete algorithm
Figure: the client registers queries and updates the graph; ingraph returns query results and change notifications.
Figure (animation): Rete network for the “Trailing the switch” query, shown next to the railway model. The input nodes ON and STRAIGHT feed a join (⋈), followed by the selection σ sw.position = 'diverging' and the projection π t.number, sw. ON initially holds (a, 1) and (e, 2), and STRAIGHT holds (e, div), so the join holds (e, div, 2) and the result is (div, 2). In the last frames train 2 moves from segment e to segment d, and the ON node holds (a, 1) and (d, 2).
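The incremental evaluation in the Rete network for “Trailing the switch” can be sketched as follows. This is a simplified illustration, not the ingraph implementation: the class and method names are hypothetical, changes arrive as signed tuples (+1 insert, -1 delete), and every delete is assumed to match an earlier insert. Property values travel with the edge tuples to keep the sketch short.

```python
from collections import Counter, defaultdict


class JoinNode:
    """Join of ON and STRAIGHT on the shared segment, with one memory per input."""

    def __init__(self):
        self.on = defaultdict(set)        # seg -> {(train, number)}
        self.straight = defaultdict(set)  # seg -> {(switch, position)}

    def update_on(self, sign, train, seg, number):
        self._apply(self.on[seg], sign, (train, number))
        # Only the changed tuple is joined against the other memory.
        return [(sign, number, sw, pos) for sw, pos in self.straight[seg]]

    def update_straight(self, sign, sw, seg, position):
        self._apply(self.straight[seg], sign, (sw, position))
        return [(sign, number, sw, position) for _, number in self.on[seg]]

    @staticmethod
    def _apply(memory, sign, row):
        if sign > 0:
            memory.add(row)
        else:
            memory.discard(row)


class TrailingTheSwitchView:
    """Join, then selection and projection; keeps a count per result row."""

    def __init__(self):
        self.join = JoinNode()
        self.counts = Counter()

    def on(self, sign, train, seg, number):
        self._propagate(self.join.update_on(sign, train, seg, number))

    def straight(self, sign, sw, seg, position):
        self._propagate(self.join.update_straight(sign, sw, seg, position))

    def _propagate(self, deltas):
        for sign, number, sw, position in deltas:
            if position == "diverging":            # selection
                self.counts[(number, sw)] += sign   # projection to (t.number, sw)

    def results(self):
        return {row for row, count in self.counts.items() if count > 0}


view = TrailingTheSwitchView()
view.straight(+1, "div", "e", "diverging")
view.on(+1, "t1", "a", 1)
view.on(+1, "t2", "e", 2)
print(view.results())      # train 2 is on the segment guarded by the switch
view.on(-1, "t2", "e", 2)  # train 2 leaves segment e ...
view.on(+1, "t2", "d", 2)  # ... and enters segment d
print(view.results())      # the match has disappeared
```

The counter stands in for bag semantics: a projected row stays in the result as long as at least one joined tuple still derives it.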
GRAPH RELATIONAL ALGEBRA
• Basic relational algebra
o projection, selection, join, left outer join, antijoin, union
• Common extensions
o aggregation (𝛾), duplicate-elimination (𝛿), sort (𝜏), top (𝜆)
• Graph-specific extensions
o get-vertices (○)
o expand-out (↑), expand-in (↓), expand-both (↕)
J. Marton, G. Szárnyas, D. Varró:
Formalising openCypher Graph Queries in Relational Algebra,
ADBIS, Springer, 2017
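The graph-specific operators can be illustrated with relations represented as lists of dictionaries. This is a rough sketch under simplifying assumptions: the sample graph is made up, the function names are hypothetical, and the expand operator here does not bind an edge variable, unlike the formal definition in the cited paper.

```python
# Made-up sample graph: vertex -> set of labels, and (source, type, target) edges.
vertices = {
    "t1": {"Train"}, "t2": {"Train"},
    "a": {"Segment"}, "e": {"Segment"},
    "div": {"Switch"},
}
edges = [("t1", "ON", "a"), ("t2", "ON", "e"), ("div", "STRAIGHT", "e")]


def get_vertices(var, label):
    """Nullary operator: one row per vertex carrying `label`."""
    return [{var: v} for v, labels in vertices.items() if label in labels]


def expand_out(relation, src_var, edge_type, trg_var, trg_label):
    """Extend each row along outgoing edges of `edge_type` from row[src_var]."""
    return [
        {**row, trg_var: trg}
        for row in relation
        for src, typ, trg in edges
        if src == row[src_var] and typ == edge_type and trg_label in vertices[trg]
    ]


def natural_join(left, right):
    """Combine rows that agree on all shared attributes."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[k] == r[k] for k in l.keys() & r.keys())
    ]


trains = expand_out(get_vertices("t", "Train"), "t", "ON", "seg", "Segment")
switches = expand_out(get_vertices("sw", "Switch"), "sw", "STRAIGHT", "seg", "Segment")
print(natural_join(trains, switches))
```

The two expand-out chains correspond to the two edges of the “Trailing the switch” pattern, and the natural join on `seg` connects them.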
THEORETICAL ISSUES WITH INCREMENTAL CYPHER
• Graph RA is not incrementally maintainable
o Expand operators
o Property access needs nested data structures (0NF)
o Ordering
o Weak schema
• Most incremental approaches work on flat relational algebra:
o Transform graph relational algebra to a flat one
o Optimize query
G. Szárnyas:
Incremental View Maintenance for Property Graph Queries,
arXiv preprint, 2017
PROPOSED WORKFLOW
• Parse
• Compile
• Evaluate query
Figure: openCypher query → AST → “Magic” → deployed query
QUERY “TRAILING THE SWITCH”
PROPERTY ACCESS
Assuming that x is a column of a
graph relation, we use the notation
“x.a” in selection conditions to
express the access to the
corresponding value of property a in
the property graph.
J. Hölsch, M. Grossniklaus:
An algebra and equivalences to transform graph patterns in Neo4j,
GraphQ @ EDBT 2016
SCHEMA INFERENCING
Figure: operator tree of the query, with each operator annotated by its 1. external schema, 2. extra attributes and 3. internal schema:
π t.number, sw: 1. t.number, sw; 2. t.number; 3. t.number, sw
σ sw.position = 'diverging': 1. t, seg, sw; 2. t.number, sw.position; 3. t, seg, t.number, sw, sw.position
⋈: 1. t, seg, sw; 2. t.number, sw.position; 3. t, seg, t.number, sw, sw.position
(t:Train)-[:ON]->(seg:Segment): 1. t, seg; 2. t.number; 3. t, seg, t.number
(sw:Switch)-[:STRAIGHT]->(seg:Segment): 1. sw, seg; 2. sw.position; 3. sw, seg, sw.position
This is the current implementation
MATCH (t:Train)-[:ON]->(seg:Segment)
<-[:STRAIGHT]-(sw:Switch)
WHERE sw.position = 'diverging'
RETURN t.number, sw
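One way to read the three schemas (external schema, extra attributes, internal schema) is as a top-down pass that pushes the property accesses needed by upper operators down to the operators that bind the corresponding variables. The sketch below is an assumption about how such a pass could look, not the ingraph code; the `Op` class, its field names and the `infer` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    external: list                                 # schema visible in the graph algebra
    needs: list = field(default_factory=list)      # property accesses used by this operator
    children: list = field(default_factory=list)
    extra: list = field(default_factory=list)      # filled in by infer()
    internal: list = field(default_factory=list)   # filled in by infer()


def infer(op, required=()):
    """Propagate required property accesses (such as 't.number') down the tree."""
    wanted = list(dict.fromkeys([*required, *op.needs]))       # ordered, de-duplicated
    bound = {attr.split(".")[0] for attr in op.external}       # variables available here
    op.extra = [attr for attr in wanted if attr.split(".")[0] in bound]
    op.internal = op.external + [a for a in op.extra if a not in op.external]
    for child in op.children:
        infer(child, wanted)
    return op


on = Op("ON", ["t", "seg"])
straight = Op("STRAIGHT", ["sw", "seg"])
join = Op("join", ["t", "seg", "sw"], children=[on, straight])
select = Op("select", ["t", "seg", "sw"], needs=["sw.position"], children=[join])
project = Op("project", ["t.number", "sw"], needs=["t.number"], children=[select])

infer(project)
for op in (project, select, join, on, straight):
    print(op.name, op.external, op.extra, op.internal)
```

With this pass, each leaf only carries the properties of the variables it binds: the ON leaf gains `t.number` and the STRAIGHT leaf gains `sw.position`.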
Figure: compilation pipeline: openCypher query → AST → Graph RA → Nested RA → Flat RA → deployed query
SPARK SQL
• “Spark SQL lets you query structured data inside Spark
programs, using either SQL or a familiar DataFrame API.”
http://www.gatorsmile.io/sparksqloverview/
SPARK CATALYST
• Tree Manipulation Framework
o “Catalyst is an execution-agnostic framework to represent and
manipulate a dataflow graph, i.e. trees of relational operators and
expressions.”
• Optimizer (both cost-based and rule-based)
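The idea of rule-based tree manipulation can be sketched in a few lines. The example below is a Python analogue written for illustration only: it does not use the Catalyst API, and the node classes and the `transform_up` helper are hypothetical. It rewrites an expression tree bottom-up with a constant-folding rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def transform_up(node, rule):
    """Rewrite the children first, then offer the rebuilt node to `rule`."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    replacement = rule(node)
    return node if replacement is None else replacement


def fold_constants(node):
    """Rule: Add(Literal, Literal) becomes one Literal; None means 'no match'."""
    if (isinstance(node, Add)
            and isinstance(node.left, Literal)
            and isinstance(node.right, Literal)):
        return Literal(node.left.value + node.right.value)
    return None


tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
folded = transform_up(tree, fold_constants)
print(folded)
```

Because trees are immutable, a rule never edits a node in place; it returns a replacement, and the traversal rebuilds the path above it.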
SPARK CATALYST: OBSERVATIONS
• Strong community
• Well-written in general, but noisy here and there (Hive)
• Nice API docs… but not much else
CATALYST EXAMPLES: TREE TRANSFORMATION
CATALYST EXAMPLES: ATTRIBUTE RESOLVER
CATALYST FEATURES: CODE GENERATION
• Generates bytecode for performance
H. Karau, R. Warren:
High Performance Spark: Best Practices for Scaling and
Optimizing Apache Spark
O'Reilly Media, Inc., May 25, 2017
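The benefit of code generation can be illustrated with a small analogue: instead of walking an expression tree for every row, the tree is translated to source code once and compiled. The sketch below uses Python's built-in `compile` and `eval` as a stand-in for JVM bytecode generation; the tuple-based expression format and the function names are hypothetical.

```python
def to_source(expr):
    """Translate a tiny expression tree (nested tuples) into Python source."""
    kind = expr[0]
    if kind == "lit":
        return repr(expr[1])
    if kind == "attr":
        return f"row[{expr[1]!r}]"
    if kind == "eq":
        return f"({to_source(expr[1])} == {to_source(expr[2])})"
    if kind == "and":
        return f"({to_source(expr[1])} and {to_source(expr[2])})"
    raise ValueError(f"unknown expression kind: {kind}")


def compile_predicate(expr):
    """Generate the source once and compile it into a plain Python function."""
    source = f"lambda row: {to_source(expr)}"
    return eval(compile(source, "<generated>", "eval"))


# Selection condition of the "Trailing the switch" query.
condition = ("eq", ("attr", "sw.position"), ("lit", "diverging"))
is_diverging = compile_predicate(condition)

print(to_source(condition))
print(is_diverging({"sw.position": "diverging"}))
```

The compiled predicate can then be applied to every row without interpreting the tree again.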
Scalable graph queries
MAKING IT SCALE
Figure: the Rete network of the “Trailing the switch” query (ON ⋈ STRAIGHT, σ sw.position = 'diverging', π t.number, sw), with the tuples (a, 1) and (d, 2) in ON and (e, div) in STRAIGHT.
Actors
Async messages
G. Szárnyas et al.,
IncQuery-D: A distributed incremental model query framework in the cloud.
ACM/IEEE MODELS, 2014
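The combination of actors and asynchronous messages can be sketched with coroutines that own a mailbox each. This is a minimal illustration, assuming Python's `asyncio` in place of a real actor framework and a single process in place of a cluster; the node functions and the message format (a sign plus a row) are hypothetical.

```python
import asyncio

STOP = object()  # sentinel message that shuts an actor down


async def selection(inbox, outbox):
    """Actor for the selection node: forwards rows of diverging switches."""
    while (msg := await inbox.get()) is not STOP:
        sign, row = msg
        if row["sw.position"] == "diverging":
            await outbox.put((sign, row))
    await outbox.put(STOP)


async def projection(inbox, outbox):
    """Actor for the projection node: keeps only (t.number, sw)."""
    while (msg := await inbox.get()) is not STOP:
        sign, row = msg
        await outbox.put((sign, (row["t.number"], row["sw"])))
    await outbox.put(STOP)


async def run(messages):
    """Wire the actors with queues, send the messages and collect the output."""
    to_select, to_project, to_client = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    actors = [
        asyncio.create_task(selection(to_select, to_project)),
        asyncio.create_task(projection(to_project, to_client)),
    ]
    for message in messages:
        await to_select.put(message)
    await to_select.put(STOP)
    results = []
    while (msg := await to_client.get()) is not STOP:
        results.append(msg)
    await asyncio.gather(*actors)
    return results


joined = [
    (+1, {"t.number": 2, "sw": "div", "sw.position": "diverging"}),
    (+1, {"t.number": 1, "sw": "sw2", "sw.position": "straight"}),
]
print(asyncio.run(run(joined)))
```

Since actors only communicate through their mailboxes, each node of the network could in principle be placed on a different machine.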
Figure: compilation pipeline: openCypher query → AST → Graph RA → Nested RA → Flat RA → deployed query
ARCHITECTURE
Related work and summary
CAPS: CYPHER FOR APACHE SPARK
• An openCypher project
• “CAPS is built on top of the Spark DataFrames API and uses
features such as the Catalyst optimizer.”
• Approach
o Compiles queries to a custom dataflow graph
o Transforms the dataflow graph to queries on the DataFrames API
(backed by Catalyst)
LESSONS LEARNT
• Simply extending the SQL model is insufficient
• Implemented new components from scratch
o Logical plans
o Attribute resolver
• Still reused a lot of components
o Data model
o Expressions
o Transformations
o Built-in methods: toString, output, etc.
FUTURE DIRECTIONS
• Cost-based optimizer
• Experiment with the LDBC Social Network Benchmark
• Transform queries to SQL
• Integrate the engine into Spark
G. Szárnyas, A. Prat, A. Averbuch et al.:
The LDBC Social Network Benchmark: BI Workload.
Technical report, peer-reviewed paper in 2018
RELATED RESOURCES
• ingraph: github.com/ftsrg/ingraph
• Cypher for Apache Spark: github.com/opencypher/cypher-for-apache-spark
• Slizaa openCypher: github.com/slizaa/slizaa-opencypher-xtext
• Mastering Apache Spark: jaceklaskowski.gitbooks.io/mastering-apache-spark
• Scala Days presentation: people.apache.org/… | youtu.be/6bCpISym_0w
• Deep dive blogpost: databricks.com/blog/2015/04/13/deep-dive-…
Thanks to the ingraph team for their contributions.


  • 1. Compiling openCypher graph queries with Spark Catalyst Gábor Szárnyas pre-holiday Spark des fêtes @ Montréal
  • 2. BACKGROUND PhD student @ Budapest Univ. of Tech. and Econ., Hungary; visiting researcher @ McGill University
  • 3. RESEARCH TOPIC Problem statement: large graph (100M+ nodes); complex global graph queries; evaluate them in < 1 sec. Approach: we “cheat”; build a huge cache; maintain results as incremental views
  • 4. RESEARCH OBJECTIVES Create a scalable graph query engine with incremental views 1. graph queries 2. incremental views 3. making it scale
  • 6. PROPERTY GRAPH DATABASES NoSQL family Data model: vertices, edges and properties #1 query approach: graph pattern matching Note. Spark GraphX is an engine for graph analytics.
  • 7. CYPHER AND OPENCYPHER Cypher: query language of the Neo4j graph database. “Cypher is a declarative, SQL-inspired language for describing patterns in graphs visually using an ascii-art syntax.” MATCH (p:Person)-[:PRESENTER_OF]->(:Presentation)-[:AT]->(m:Meetup) WHERE m.date = 'Monday, December 18, 2017' RETURN p “The openCypher project aims to deliver a full and open specification of the industry’s most widely adopted graph database query language: Cypher.” (late 2015)
  • 8. OPENCYPHER SYSTEMS Increasing adoption. Relational databases: SAP HANA, AGENS Graph. Research prototypes: Graphflow (University of Waterloo), ingraph (incremental graph engine). (Source: Keynote talk @ GraphConnect NYC 2017)
  • 9. LINKED DATA BENCHMARK COUNCIL LDBC is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software. LDBC’s Social Network Benchmark is an industrial and academic initiative, formed by principal actors in the field of graph-like data management.
  • 10. OVERVIEW OF GRAPH PROCESSING Three categories: OLTP (local queries), OLAP (global queries), analytics (global computations)
  • 11. OVERVIEW OF GRAPH PROCESSING: OLTP (local queries). Example: “Friends’ recent likes” MATCH (u:User {id: $userId})-[:FRIEND]-(f:User)-[l:LIKES]->(p:Post) RETURN f, p ORDER BY l.timestamp DESC LIMIT 10
  • 12. OVERVIEW OF GRAPH PROCESSING: OLTP (local queries). 14 queries and 8 updates. Orri Erling et al., The LDBC Social Network Benchmark: Interactive Workload, SIGMOD 2015
  • 13. OVERVIEW OF GRAPH PROCESSING: OLAP (global queries). Example: “One-sided friendships” MATCH (u1:User)-[:FRIEND]-(u2:User)-[l:LIKES]->(p:Post), (u1)-[:AUTHOR_OF]->(p) WITH u1, u2, count(l) AS likes WHERE likes > 10 AND NOT (u1)-[:LIKES]->(:Post)<-[:AUTHOR_OF]-(u2) RETURN u1, u2
  • 14. OVERVIEW OF GRAPH PROCESSING: OLAP (global queries). 25 queries with infrequent executions. Arnau Prat, Gábor Szárnyas, Alex Averbuch et al., The LDBC Social Network Benchmark: BI Workload, Technical report available, peer-reviewed paper in 2018
  • 15. OVERVIEW OF GRAPH PROCESSING: analytics (global computations): PageRank, shortest paths, clustering coefficient. Example: “Find the most central individuals.” Spark: GraphX | Flink: Gelly | Neo4j: Graph Algorithms library
  • 16. OVERVIEW OF GRAPH PROCESSING: analytics (global computations). One-time execution. Alexandru Iosup et al., LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms, VLDB 2016
  • 17. OVERVIEW OF GRAPH PROCESSING Three categories: OLTP (local queries), OLAP (global queries), analytics (global computations)
  • 19. CYBER-PHYSICAL SYSTEMS: LIVE RAILWAY MODEL Trailing the switch Proximity detection
  • 20. CYBER-PHYSICAL SYSTEMS: LIVE RAILWAY MODEL [Figure: railway graph with segments a to g and switch div, connected by NEXT, STRAIGHT and TOP edges; trains 1 and 2 are attached to segments by ON edges]
  • 22. TRAILING THE SWITCH [Figure: pattern with train t ON segment seg and switch div with a STRAIGHT edge to seg] MATCH (t:Train)-[:ON]->(seg:Segment)<-[:STRAIGHT]-(sw:Switch) WHERE sw.position = 'diverging' RETURN t.number, sw Evaluate continuously
  • 23. INCREMENTAL QUERIES Register a set of standing queries. Continuously evaluate queries on changes. Approach: build a cache and maintain its content. First publication: 1974, the Rete algorithm. [Figure: a client registers queries with ingraph and updates the graph; ingraph sends back query results and change notifications]
  • 24. Trailing the switch [Figure: the query plan π t.number, sw over σ sw.position = 'diverging' over ⋈, with the ON and STRAIGHT edges as the inputs of the join, drawn next to the railway graph]
  • 25. Trailing the switch [Figure: the ON edges (a, 1) and (e, 2) enter the ON input of the join]
  • 26. Trailing the switch [Figure: the STRAIGHT edge (div, e) enters the STRAIGHT input of the join]
  • 27. Trailing the switch [Figure: the join combines (e, 2) and (div, e) into the tuple (e, div, 2)]
  • 28. Trailing the switch [Figure: the joined tuple (e, div, 2) propagates towards the selection]
  • 29. Trailing the switch [Figure: the tuple (div, 2) reaches the top of the plan as a query result]
  • 30. Trailing the switch [Figure: same state as on the previous slide]
  • 31. Trailing the switch [Figure: in the railway graph, train 2 moves from segment e to segment d]
  • 32. Trailing the switch [Figure: the ON input now holds (d, 2) instead of (e, 2)]
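The cache-and-maintain idea behind the "trailing the switch" slides can be sketched in a few lines of Python. This is a minimal illustration, not ingraph's implementation and not a full Rete network: the class name, method names and the choice to index both edge sets by segment are assumptions made for the example. Each update only touches the cached tuples it can join with, so the query is never re-evaluated from scratch.

```python
from collections import Counter, defaultdict


class TrailingTheSwitchView:
    """Hypothetical incremental view for the 'trailing the switch' query."""

    def __init__(self):
        self.on = defaultdict(set)           # segment -> trains ON it
        self.straight = defaultdict(set)     # segment -> switches with a STRAIGHT edge to it
        self.segments_of = defaultdict(set)  # switch -> segments (reverse index)
        self.position = {}                   # switch -> value of its position property
        self.matches = Counter()             # (train, switch) -> number of matching paths

    def _change(self, train, switch, delta):
        # The selection sw.position = 'diverging' is applied to every delta.
        if self.position.get(switch) != "diverging":
            return
        key = (train, switch)
        self.matches[key] += delta
        if self.matches[key] == 0:
            del self.matches[key]

    def add_on(self, train, segment):
        if train in self.on[segment]:
            return
        self.on[segment].add(train)
        for switch in self.straight[segment]:
            self._change(train, switch, +1)

    def remove_on(self, train, segment):
        if train not in self.on[segment]:
            return
        self.on[segment].remove(train)
        for switch in self.straight[segment]:
            self._change(train, switch, -1)

    def add_straight(self, switch, segment):
        if switch in self.straight[segment]:
            return
        self.straight[segment].add(switch)
        self.segments_of[switch].add(segment)
        for train in self.on[segment]:
            self._change(train, switch, +1)

    def set_position(self, switch, position):
        if self.position.get(switch) == position:
            return
        affected = [(t, switch) for s in self.segments_of[switch] for t in self.on[s]]
        for train, sw in affected:           # retract matches under the old value
            self._change(train, sw, -1)
        self.position[switch] = position
        for train, sw in affected:           # assert matches under the new value
            self._change(train, sw, +1)

    def results(self):
        return set(self.matches)


# Replay of the slides' scenario (the data values follow the figures).
view = TrailingTheSwitchView()
view.add_straight("div", "e")
view.set_position("div", "diverging")
view.add_on(1, "a")
view.add_on(2, "e")
before_move = view.results()
view.remove_on(2, "e")
view.add_on(2, "d")
after_move = view.results()
```

In this sketch the result is a counted multiset, so a train that reaches the same switch through two segments is only retracted once both paths disappear.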
  • 33. GRAPH RELATIONAL ALGEBRA Basic relational algebra: projection, selection, join, left outer join, antijoin, union. Common extensions: aggregation (𝛾), duplicate-elimination (𝛿), sort (𝜏), top (𝜆). Graph-specific extensions: get-vertices (), expand-out (↑), expand-in (↓), expand-both (↕). J. Marton, G. Szárnyas, D. Varró: Formalising openCypher Graph Queries in Relational Algebra, ADBIS, Springer, 2017
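To make the graph-specific operators concrete, here is a small Python sketch that evaluates the "trailing the switch" pattern with get-vertices, expand-out and expand-in over an in-memory graph, followed by an ordinary selection and projection. The graph data, the function names and the representation of a relation as a list of dictionaries are assumptions for illustration; the sketch ignores details of openCypher semantics such as edge uniqueness.

```python
# A made-up property graph mirroring the railway example.
VERTICES = {
    "t1": {"labels": {"Train"}, "props": {"number": 1}},
    "t2": {"labels": {"Train"}, "props": {"number": 2}},
    "a": {"labels": {"Segment"}, "props": {}},
    "e": {"labels": {"Segment"}, "props": {}},
    "div": {"labels": {"Switch"}, "props": {"position": "diverging"}},
}
EDGES = [("t1", "ON", "a"), ("t2", "ON", "e"), ("div", "STRAIGHT", "e")]


def get_vertices(var, label):
    """get-vertices: one row per vertex carrying the given label."""
    return [{var: v} for v, data in VERTICES.items() if label in data["labels"]]


def expand_out(relation, src_var, edge_type, dst_var, dst_label):
    """expand-out: follow outgoing edges from the vertex bound to src_var."""
    return [
        {**row, dst_var: dst}
        for row in relation
        for src, etype, dst in EDGES
        if src == row[src_var]
        and etype == edge_type
        and dst_label in VERTICES[dst]["labels"]
    ]


def expand_in(relation, dst_var, edge_type, src_var, src_label):
    """expand-in: follow incoming edges into the vertex bound to dst_var."""
    return [
        {**row, src_var: src}
        for row in relation
        for src, etype, dst in EDGES
        if dst == row[dst_var]
        and etype == edge_type
        and src_label in VERTICES[src]["labels"]
    ]


def prop(row, var, key):
    """Property access x.a on the vertex bound to var."""
    return VERTICES[row[var]]["props"].get(key)


def trailing_the_switch():
    rows = get_vertices("t", "Train")
    rows = expand_out(rows, "t", "ON", "seg", "Segment")
    rows = expand_in(rows, "seg", "STRAIGHT", "sw", "Switch")
    rows = [r for r in rows if prop(r, "sw", "position") == "diverging"]  # selection
    return [(prop(r, "t", "number"), r["sw"]) for r in rows]              # projection


result = trailing_the_switch()
```

With the data above, `result` holds the single pair of train number 2 and switch `div`, matching the outcome shown in the slides.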
  • 36. THEORETICAL ISSUES WITH INCREMENTAL CYPHER Graph RA is not incrementally maintainable: expand operators; property access needs nested data structures (0NF); ordering; weak schema. Most incremental approaches work on flat relational algebra: transform graph relational algebra to a flat one; optimize query. G. Szárnyas: Incremental View Maintenance for Property Graph Queries, arXiv preprint, 2017
  • 37. PROPOSED WORKFLOW Parse; compile; evaluate query. [Figure: workflow diagram with the labels openCypher query, AST, “Magic” and deployed query]
  • 39. PROPERTY ACCESS Assuming that x is a column of a graph relation, we use the notation “x.a” in selection conditions to express the access to the corresponding value of property a in the property graph. J. Hölsch, M. Grossniklaus: An algebra and equivalences to transform graph patterns in Neo4j, GraphQ @ EDBT 2016
  • 40. SCHEMA INFERENCING 1. external schema 2. extra attributes 3. internal schema. This is the current implementation. [Figure: the query plan π t.number, sw over σ sw.position = 'diverging' over ⋈ of (t:Train)−[:ON]−>(seg:Segment) and (sw:Switch)−[:STRAIGHT]−>(seg:Segment), annotated per operator. External schema: t, seg and sw, seg at the leaves; t, seg, sw after the join and the selection; t.number, sw at the root. Extra attributes: t.number and sw.position. Internal schema: t, seg, t.number and sw, seg, sw.position at the leaves; t, seg, t.number, sw, sw.position after the join and the selection; t.number, sw at the root]
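One possible reading of the three-step scheme (external schema, extra attributes, internal schema) is sketched below in Python. The `Op` class, the `infer` function and the exact propagation rules are assumptions made for this example rather than ingraph's code: property accesses required by an operator or its ancestors are pushed down to the leaf that binds the variable, and internal schemas are then assembled bottom-up.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    kind: str                                     # "leaf", "join", "select" or "project"
    external: list                                # external schema given by the query
    uses: list = field(default_factory=list)      # property accesses this operator evaluates
    children: list = field(default_factory=list)
    extra: list = field(default_factory=list)     # filled in by infer()
    internal: list = field(default_factory=list)  # filled in by infer()


def base_variable(attribute):
    """'sw.position' -> 'sw'; plain variables are returned unchanged."""
    return attribute.split(".")[0]


def infer(op, required=()):
    # Step 2: extra attributes = what ancestors need plus what this operator uses.
    needed = list(dict.fromkeys([*required, *op.uses]))
    op.extra = needed
    if op.kind == "leaf":
        # Step 3 at a leaf: external schema extended with the extra attributes.
        op.internal = op.external + [a for a in needed if a not in op.external]
        return
    for child in op.children:
        child_vars = {base_variable(a) for a in child.external}
        infer(child, [a for a in needed if base_variable(a) in child_vars])
    if op.kind == "project":
        op.internal = list(op.external)
    else:
        merged = [a for child in op.children for a in child.internal]
        op.internal = list(dict.fromkeys(merged))  # ordered, duplicate-free union


# Plan of the 'trailing the switch' query with its external schemas (step 1).
train_on = Op("leaf", ["t", "seg"])
switch_straight = Op("leaf", ["sw", "seg"])
join = Op("join", ["t", "seg", "sw"], children=[train_on, switch_straight])
select = Op("select", ["t", "seg", "sw"], uses=["sw.position"], children=[join])
project = Op("project", ["t.number", "sw"], uses=["t.number"], children=[select])

infer(project)
```

After `infer(project)`, the leaves carry `t.number` and `sw.position` respectively, and the join's internal schema is `t, seg, t.number, sw, sw.position`, which agrees with the annotations recoverable from the slide.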
  • 42. MATCH (t:Train)-[:ON]->(seg:Segment)<-[:STRAIGHT]-(sw:Switch) WHERE sw.position = 'diverging' RETURN t.number, sw [Figure: pipeline stages openCypher query, AST, Graph RA]
  • 43. [Figure: pipeline stages Graph RA, Nested RA, Flat RA, deployed query]
  • 44. SPARK SQL “Spark SQL lets you query structured data inside Spark programs, using either SQL or a familiar DataFrame API.” http://www.gatorsmile.io/sparksqloverview/
  • 45. SPARK CATALYST Tree manipulation framework: “Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph, i.e. trees of relational operators and expressions.” Optimizer (both cost-based and rule-based)
  • 46. SPARK CATALYST: OBSERVATIONS Strong community. Well-written in general, but noisy here and there (Hive). Nice API docs… but not much else
  • 47. CATALYST EXAMPLES: TREE TRANSFORMATION
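The slide's own example did not survive in the transcript, so the following is a Python analogue of rule-based tree transformation rather than Catalyst code. Catalyst trees are Scala case classes rewritten with methods such as `transform`, `transformDown` and `transformUp`; here the node classes, `transform_up` and `fold_constants` are hypothetical stand-ins that show the same idea with constant folding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def transform_up(node, rule):
    """Rewrite the children first, then offer the rebuilt node to the rule."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    return rule(node)


def fold_constants(node):
    """Rule: replace Add(Literal, Literal) with a single Literal."""
    if (
        isinstance(node, Add)
        and isinstance(node.left, Literal)
        and isinstance(node.right, Literal)
    ):
        return Literal(node.left.value + node.right.value)
    return node  # nodes the rule does not match are left unchanged


# x + (1 + 2) is rewritten to x + 3
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
optimized = transform_up(tree, fold_constants)
```

Because the trees are immutable, each rule returns a new node instead of mutating its input, which keeps rules easy to compose and apply repeatedly.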
  • 49. CATALYST FEATURES: CODE GENERATION Generates bytecode for performance. H. Karau, R. Warren: High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark, O'Reilly Media, Inc., May 25, 2017
  • 51. MAKING IT SCALE Actors, async messages. [Figure: the trailing-the-switch query plan (π, σ, ⋈ with the ON and STRAIGHT inputs) annotated with “Actors” and “Async messages”] G. Szárnyas et al., IncQuery-D: A distributed incremental model query framework in the cloud. ACM/IEEE MODELS, 2014
  • 52. ARCHITECTURE [Figure: the two pipeline diagrams combined: openCypher query, AST, Graph RA, followed by Graph RA, Nested RA, Flat RA, deployed query]
  • 53. Related work and summary
  • 54. CAPS: CYPHER FOR APACHE SPARK An openCypher project. “CAPS is built on top of the Spark DataFrames API and uses features such as the Catalyst optimizer.” Approach: compiles queries to operations on a custom dataflow graph; transforms the dataflow graph to queries on the DataFrames API (backed by Catalyst)
  • 55. LESSONS LEARNT Simply extending the SQL model is insufficient. Implemented new components from scratch: logical plans, attribute resolver. Still reused a lot of components: data model, expressions, transformations, built-in methods (toString, output, etc.)
  • 56. FUTURE DIRECTIONS Cost-based optimizer. Experiment with the LDBC Social Network Benchmark. Transform queries to SQL. Integrate the engine into Spark. G. Szárnyas, A. Prat, A. Averbuch et al.: The LDBC Social Network Benchmark: BI Workload. Technical report, peer-reviewed paper in 2018
  • 57. RELATED RESOURCES Ingraph: github.com/ftsrg/ingraph; Cypher for Apache Spark: github.com/opencypher/cypher-for-apache-spark; Slizaa openCypher: github.com/slizaa/slizaa-opencypher-xtext; Mastering Apache Spark: jaceklaskowski.gitbooks.io/mastering-apache-spark; Scala Days presentation: people.apache.org/… | youtu.be/6bCpISym_0w; Deep dive blogpost: databricks.com/blog/2015/04/13/deep-dive-… Thanks to the ingraph team for their contributions.