SlideShare a Scribd company logo
1 of 20
Download to read offline
Extended Property Graphs and Cypher on Gradoop
1st openCypher Implementers Meeting
8 February 2017
Walldorf, Germany
Martin Junghanns
University of Leipzig – Database Research Group
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 2
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
2
Gradoop
„An open-source graph dataflow system for
declarative analytics of heterogeneous graph data.“
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 3
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
3
Gradoop
Distributed Graph Storage (Apache HDFS)
Apache Flink Operator Implementation
Distributed Operator Execution (Apache Flink)
Extended Property Graph Model (EPGM)
Graph Dataflow Operators
I/O
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 4
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
4
Extended Property Graphs
• Vertices and directed Edges
• Logical Graphs
• Identifiers
• Type Labels
• Properties
1 2
3
4
5
1 2 3
4
5
2
1
Hobbit
name : Samwise
Orc
name : Azog
Clan
name : Tribes of Moria
founded : 1981
Orc
name : Bolg
Hobbit
name : Frodo
yob : 2968
leaderOf
since : 2790
memberOf
since : 2013
hates
since : 2301
hates
knows
since : 2990
|Area|title:Mordor
|Area|title:Shire
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 5
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
5
Extended Property Graphs
Graph Operators/Transformations
Unary Binary
GraphCollectionLogicalGraph
Equality
Union
Intersection
Difference
Limit
Selection
Pattern Matching
Distinct
Apply
Reduce
Call
Aggregation
Pattern Matching
Transformation
Grouping
Call
Subgraph
Equality
Combination
Overlap
Exclusion
Fusion
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 6
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
6
Extended Property Graphs
3
1 3
4
5
2 3
4
1 2
UDF
LogicalGraph graph3 = readFromHDFS();
LogicalGraph graph4 = graph3.subgraph(
(vertex => vertex.getLabel().equals(‘Green’)),
(edge => edge.getLabel().equals(‘orange’)));
Subgraph
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 7
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
7
Extended Property Graphs
3
1 3
4
5
2
Pattern
4 5
1 3
4
2
Graph Collection
GraphCollection collection = graph3.match(‘(:Green)-[:orange]->(:Orange)’);
Pattern Matching (Single Graph Input)
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 8
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
8
Cypher on Gradoop
„Which two clan leaders hate each other and one
of them knows Frodo over one to ten hops?“
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 9
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
9
Cypher on Gradoop
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 10
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
10
Cypher on Gradoop
PlanTableEntry | type: GRAPH | all-vars: [...] | proc-vars: [...] | attr-vars: [] | est-card: 23 | prediates: () | Plan :
|-FilterEmbeddingsNode{filterPredicate=((c1 != c2) AND (o1 != o2))}
|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c1, filterPredicate=((c1.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e0', targetVar='c1', filterPredicate=((_e0.label = leaderOf)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o1, filterPredicate=((o1.label = Orc)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e1', targetVar='o2', filterPredicate=((_e1.label = hates)), projectionKeys=[]}
|.|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[h], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=h, filterPredicate=((h.label = Hobbit) AND (h.name = Frodo Baggins)), projectionKeys=[]}
|.|.|.|.|-ExpandEmbeddingsNode={startVar='o2', pathVar='_e3', endVar='h', lb=1, ub=10, direction=OUT, vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o2, filterPredicate=((o2.label = Orc)), projectionKeys=[]}
|.|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e3', targetVar='h', filterPredicate=((_e3.label = knows)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c2, filterPredicate=((c2.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e2', targetVar='c2', filterPredicate=((_e2.label = leaderOf)), projectionKeys=[]}
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 11
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
11
Cypher on Gradoop
0 1 2 3 4 5 6 7 8 9
1 37 5 3 7 8 45 99 12 3
0 1 2
Frodo Baggins 1.22 Saruman
0
45: [4,1,33]
EmbeddingMetaData – Stores information about the embedding content
Mapping : Variable -> ID Column {h: 0, e1: 1, o2: 5, ...}
Mapping : Variable.Property -> Property Column {h.name: 0, h.height: 1, c1.name: 2, ...}
Embedding - Data structure used for intermediate results
Identifiers
Properties
Paths
Embedding
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 12
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
12
Cypher on Gradoop
Filter
Hobbit(name=Frodo Baggins)
name: Frodo Baggins
height: 1.22m
gender: male
city: Bag End
Project
[ ]
h.id h.name h.height …
31 Frodo 1.22 …
h.id
32
id Properties
1 {…}
2 {…}
3 {…}
… …
DataSet<Vertex> DataSet<Embedding>
FlatMap(Vertex -> Embedding)
𝜋ℎ.𝐼𝑑(𝑉′
)𝜎 𝐿𝑎𝑏𝑒𝑙=𝐻𝑜𝑏𝑏𝑖𝑡
∧𝑛𝑎𝑚𝑒=𝐹𝑟𝑜𝑑𝑜
(𝑉)
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 13
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
13
Cypher on Gradoop
c.id _e1.id o1.id
51 11 2
52 12 3
… … …
DataSet<Embedding> DataSet<Embedding>
FlatJoin(lhs, rhs -> combine(lhs, rhs))DataSet<Embedding>
o1.id _e2.id o2.id
2 13 5
3 14 3
… … …
c.id _e1.id o1.id _e2.id o2.id
51 11 2 13 5
52 12 3 14 3
… … …
Combine
Check for vertex/edge isomorphism,
Remove duplicate entries
JoinEmbeddings
Left: (c1:Clan)<-[:hasLeader]-(o1:Orc)
Right: (o1:Orc)-[:hates]->(o2.Orc)
𝐿 ⋈ 𝑜1.𝑖𝑑 𝑅
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 14
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
14
Cypher on Gradoop
ExpandEmbeddings
o2.id
5
DataSet<Embedding>
DataSet<Embedding>
DataSet<Embedding>
_e3.sid _e3.id _e3.tid
5 26 31
31 27 32
32 28 33
o2.id _e3.id h.id
3 [26] 31
3 [26,31,27] 32
3 [26,31,27,32,28] 33
FlatJoin(lhs, rhs ->
combine(lhs, rhs))
BulkIteration
𝐿 ⋈ 𝑜2.𝑖𝑑=_𝑒3.𝑠𝑖𝑑 𝐸
Left: (o2:Orc)
Edge: (o2)-[:knows*1..10]->(h)
𝐸′ ⋈ 𝑒.𝑡𝑖𝑑=_𝑒3.𝑠𝑖𝑑 𝐸
Combine
Check for vertex/edge isomorphism
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 15
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
15
Cypher on Gradoop
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 16
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
16
Cypher on Gradoop
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 17
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
17
Cypher on Gradoop
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 18
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
18
Conclusion
• Implement Cypher Technology Compatibility KIT (TCK) integration tests
• Benchmarking
• Implement and evaluate LDBC benchmarking queries
• Optimizations
• DP-Planner
• Improve cost model (more statistics, Flink optimizer hints)
• Reuse of intermediate results
• Consider graph partitioning
• Support more Cypher features
• e.g. Aggregation and Functions
• Introduce new Cypher features
• e.g. regular path queries
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 19
Gradoop Extended Property Graphs ConclusionCypher on Gradoop
19
Conclusion
• Gradoop on Apache Flink
• Extended Property Graph abstraction on Apache Flink
• Schema flexible: Type Labels and Properties
• Logical Graphs / Graphs Collection
• Graph Transformations for Graphs and Graph collections
• Cypher on Gradoop
• Covering many Cypher features (variable length paths, predicates)
• Query execution engine incl. Greedy cost-based optimizer
• Physical operators mapped to Flink transformations
www.gradoop.com
[1] Junghanns, M.; Petermann, A.; K.; Rahm, E., „Distributed Grouping of Property Graphs with Gradoop“,
Proc. BTW Conf. , 2017.
[2] Petermann, A.; Junghanns, M.; Kemper, S.; Gomez, K.; Teichmann, N.; Rahm, E., „Graph Mining for Complex Data Analytics “,
Proc. ICDM Conf. (Demo), 2016.
[3] Junghanns, M.; Petermann, A.; Teichmann, N.; Gomez, K.; Rahm, E., „Analyzing Extended Property Graphs with Apache Flink“,
Int. Workshop on Network Data Analytics (NDA), SIGMOD, 2016.
[4] Petermann, A.; Junghanns, M., „Scalable Business Intelligence with Graph Collections“,
it – Special Issue on Big Data Analytics, 2016.
[5] Petermann, A.; Junghanns, M.; Müller, M.; Rahm, E., „Graph-based Data Integration and Business Intelligence with BIIIG“,
Proc. VLDB Conf. (Demo), 2014.

More Related Content

Similar to Extended Property Graphs and Cypher on Gradoop

RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
Connected Data World
 
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
Spark Summit
 
Geo script opengeo spring 2013
Geo script opengeo spring 2013Geo script opengeo spring 2013
Geo script opengeo spring 2013
Ilya Rosenfeld
 
Gl tf siggraph-2013
Gl tf siggraph-2013Gl tf siggraph-2013
Gl tf siggraph-2013
Khaled MAMOU
 

Similar to Extended Property Graphs and Cypher on Gradoop (20)

RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jong...
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
 
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDFGPU-Accelerating UDFs in PySpark with Numba and PyGDF
GPU-Accelerating UDFs in PySpark with Numba and PyGDF
 
Graphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platformsGraphalytics: A big data benchmark for graph-processing platforms
Graphalytics: A big data benchmark for graph-processing platforms
 
Geo script opengeo spring 2013
Geo script opengeo spring 2013Geo script opengeo spring 2013
Geo script opengeo spring 2013
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
 
Gl tf siggraph-2013
Gl tf siggraph-2013Gl tf siggraph-2013
Gl tf siggraph-2013
 
R programming for data science
R programming for data scienceR programming for data science
R programming for data science
 
Dgraph: Graph database for production environment
Dgraph:  Graph database for production environmentDgraph:  Graph database for production environment
Dgraph: Graph database for production environment
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
GoMR: A MapReduce Framework for Go
GoMR: A MapReduce Framework for GoGoMR: A MapReduce Framework for Go
GoMR: A MapReduce Framework for Go
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
 
Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...
Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...
Processing Landsat 8 Multi-Spectral Images with GRASS Tools & the potential o...
 
pmux
pmuxpmux
pmux
 
Sparksummit2016 share
Sparksummit2016 shareSparksummit2016 share
Sparksummit2016 share
 
mago3D FOSS4G NA 2018
mago3D FOSS4G NA 2018mago3D FOSS4G NA 2018
mago3D FOSS4G NA 2018
 
Graph ql api gateway
Graph ql api gatewayGraph ql api gateway
Graph ql api gateway
 

More from openCypher

More from openCypher (20)

Learning Timed Automata with Cypher
Learning Timed Automata with CypherLearning Timed Automata with Cypher
Learning Timed Automata with Cypher
 
Incremental View Maintenance for openCypher Queries
Incremental View Maintenance for openCypher QueriesIncremental View Maintenance for openCypher Queries
Incremental View Maintenance for openCypher Queries
 
Formal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updatesFormal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updates
 
Cypher.PL: an executable specification of Cypher semantics
Cypher.PL: an executable specification of Cypher semanticsCypher.PL: an executable specification of Cypher semantics
Cypher.PL: an executable specification of Cypher semantics
 
Multiple Graphs: Updatable Views
Multiple Graphs: Updatable ViewsMultiple Graphs: Updatable Views
Multiple Graphs: Updatable Views
 
Micro-Servicing Linked Data
Micro-Servicing Linked DataMicro-Servicing Linked Data
Micro-Servicing Linked Data
 
Graph abstraction
Graph abstractionGraph abstraction
Graph abstraction
 
From Cypher 9 to GQL: Conceptual overview of multiple named graphs and compos...
From Cypher 9 to GQL: Conceptual overview of multiple named graphs and compos...From Cypher 9 to GQL: Conceptual overview of multiple named graphs and compos...
From Cypher 9 to GQL: Conceptual overview of multiple named graphs and compos...
 
Cypher for Gremlin
Cypher for GremlinCypher for Gremlin
Cypher for Gremlin
 
Comparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and CypherComparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and Cypher
 
Multiple graphs in openCypher
Multiple graphs in openCypherMultiple graphs in openCypher
Multiple graphs in openCypher
 
Eighth openCypher Implementers Group Meeting: Status Update
Eighth openCypher Implementers Group Meeting: Status UpdateEighth openCypher Implementers Group Meeting: Status Update
Eighth openCypher Implementers Group Meeting: Status Update
 
Cypher for Gremlin
Cypher for GremlinCypher for Gremlin
Cypher for Gremlin
 
Supporting dates and times in Cypher
Supporting dates and times in CypherSupporting dates and times in Cypher
Supporting dates and times in Cypher
 
Seventh openCypher Implementers Group Meeting: Status Update
Seventh openCypher Implementers Group Meeting: Status UpdateSeventh openCypher Implementers Group Meeting: Status Update
Seventh openCypher Implementers Group Meeting: Status Update
 
Academic research on graph processing: connecting recent findings to industri...
Academic research on graph processing: connecting recent findings to industri...Academic research on graph processing: connecting recent findings to industri...
Academic research on graph processing: connecting recent findings to industri...
 
Property Graphs with Time
Property Graphs with TimeProperty Graphs with Time
Property Graphs with Time
 
Cypher.PL: Executable Specification of Cypher written in Prolog
Cypher.PL: Executable Specification of Cypher written in PrologCypher.PL: Executable Specification of Cypher written in Prolog
Cypher.PL: Executable Specification of Cypher written in Prolog
 
Use case: processing multiple graphs
Use case: processing multiple graphsUse case: processing multiple graphs
Use case: processing multiple graphs
 
openCypher Technology Compatibility Kit (TCK)
openCypher Technology Compatibility Kit (TCK)openCypher Technology Compatibility Kit (TCK)
openCypher Technology Compatibility Kit (TCK)
 

Recently uploaded

“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
Muhammad Subhan
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
FIDO Alliance
 
CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)
Wonjun Hwang
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 

Recently uploaded (20)

State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 

Extended Property Graphs and Cypher on Gradoop

  • 1. Extended Property Graphs and Cypher on Gradoop 1st openCypher Implementers Meeting 8 February 2017 Walldorf, Germany Martin Junghanns University of Leipzig – Database Research Group
  • 2. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 2 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 2 Gradoop „An open-source graph dataflow system for declarative analytics of heterogeneous graph data.“
  • 3. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 3 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 3 Gradoop Distributed Graph Storage (Apache HDFS) Apache Flink Operator Implementation Distributed Operator Execution (Apache Flink) Extended Property Graph Model (EPGM) Graph Dataflow Operators I/O
  • 4. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 4 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 4 Extended Property Graphs • Vertices and directed Edges • Logical Graphs • Identifiers • Type Labels • Properties 1 2 3 4 5 1 2 3 4 5 2 1 Hobbit name : Samwise Orc name : Azog Clan name : Tribes of Moria founded : 1981 Orc name : Bolg Hobbit name : Frodo yob : 2968 leaderOf since : 2790 memberOf since : 2013 hates since : 2301 hates knows since : 2990 |Area|title:Mordor |Area|title:Shire
  • 5. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 5 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 5 Extended Property Graphs Graph Operators/Transformations Unary Binary GraphCollectionLogicalGraph Equality Union Intersection Difference Limit Selection Pattern Matching Distinct Apply Reduce Call Aggregation Pattern Matching Transformation Grouping Call Subgraph Equality Combination Overlap Exclusion Fusion
  • 6. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 6 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 6 Extended Property Graphs 3 1 3 4 5 2 3 4 1 2 UDF LogicalGraph graph3 = readFromHDFS(); LogicalGraph graph4 = graph3.subgraph( (vertex => vertex.getLabel().equals(‘Green’)), (edge => edge.getLabel().equals(‘orange’))); Subgraph
  • 7. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 7 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 7 Extended Property Graphs 3 1 3 4 5 2 Pattern 4 5 1 3 4 2 Graph Collection GraphCollection collection = graph3.match(‘(:Green)-[:orange]->(:Orange)’); Pattern Matching (Single Graph Input)
  • 8. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 8 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 8 Cypher on Gradoop „Which two clan leaders hate each other and one of them knows Frodo over one to ten hops?“
  • 9. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 9 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 9 Cypher on Gradoop
  • 10. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 10 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 10 Cypher on Gradoop PlanTableEntry | type: GRAPH | all-vars: [...] | proc-vars: [...] | attr-vars: [] | est-card: 23 | prediates: () | Plan : |-FilterEmbeddingsNode{filterPredicate=((c1 != c2) AND (o1 != o2))} |.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I} |.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I} |.|.|.|-JoinEmbeddingsNode{joinVariables=[c1], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c1, filterPredicate=((c1.label = Clan)), projectionKeys=[]} |.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e0', targetVar='c1', filterPredicate=((_e0.label = leaderOf)), projectionKeys=[]} |.|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o1, filterPredicate=((o1.label = Orc)), projectionKeys=[]} |.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e1', targetVar='o2', filterPredicate=((_e1.label = hates)), projectionKeys=[]} |.|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I} |.|.|.|-JoinEmbeddingsNode{joinVariables=[h], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=h, filterPredicate=((h.label = Hobbit) AND (h.name = Frodo Baggins)), projectionKeys=[]} |.|.|.|.|-ExpandEmbeddingsNode={startVar='o2', pathVar='_e3', endVar='h', lb=1, ub=10, direction=OUT, vertexMorphism=H, edgeMorphism=I} |.|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o2, filterPredicate=((o2.label = Orc)), projectionKeys=[]} |.|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e3', targetVar='h', filterPredicate=((_e3.label = knows)), projectionKeys=[]} |.|.|.|-JoinEmbeddingsNode{joinVariables=[c2], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c2, filterPredicate=((c2.label = Clan)), projectionKeys=[]} |.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e2', targetVar='c2', filterPredicate=((_e2.label = leaderOf)), projectionKeys=[]}
  • 11. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 11 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 11 Cypher on Gradoop 0 1 2 3 4 5 6 7 8 9 1 37 5 3 7 8 45 99 12 3 0 1 2 Frodo Baggins 1.22 Saruman 0 45: [4,1,33] EmbeddingMetaData – Stores information about the embedding content Mapping : Variable -> ID Column {h: 0, e1: 1, o2: 5, ...} Mapping : Variable.Property -> Property Column {h.name: 0, h.height: 1, c1.name: 2, ...} Embedding - Data structure used for intermediate results Identifiers Properties Paths Embedding
  • 12. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 12 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 12 Cypher on Gradoop Filter Hobbit(name=Frodo Baggins) name: Frodo Baggins height: 1.22m gender: male city: Bag End Project [ ] h.id h.name h.height … 31 Frodo 1.22 … h.id 32 id Properties 1 {…} 2 {…} 3 {…} … … DataSet<Vertex> DataSet<Embedding> FlatMap(Vertex -> Embedding) 𝜋ℎ.𝐼𝑑(𝑉′ )𝜎 𝐿𝑎𝑏𝑒𝑙=𝐻𝑜𝑏𝑏𝑖𝑡 ∧𝑛𝑎𝑚𝑒=𝐹𝑟𝑜𝑑𝑜 (𝑉)
  • 13. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 13 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 13 Cypher on Gradoop c.id _e1.id o1.id 51 11 2 52 12 3 … … … DataSet<Embedding> DataSet<Embedding> FlatJoin(lhs, rhs -> combine(lhs, rhs))DataSet<Embedding> o1.id _e2.id o2.id 2 13 5 3 14 3 … … … c.id _e1.id o1.id _e2.id o2.id 51 11 2 13 5 52 12 3 14 3 … … … Combine Check for vertex/edge isomorphism, Remove duplicate entries JoinEmbeddings Left: (c1:Clan)<-[:hasLeader]-(o1:Orc) Right: (o1:Orc)-[:hates]->(o2.Orc) 𝐿 ⋈ 𝑜1.𝑖𝑑 𝑅
  • 14. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 14 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 14 Cypher on Gradoop ExpandEmbeddings o2.id 5 DataSet<Embedding> DataSet<Embedding> DataSet<Embedding> _e3.sid _e3.id _e3.tid 5 26 31 31 27 32 32 28 33 o2.id _e3.id h.id 3 [26] 31 3 [26,31,27] 32 3 [26,31,27,32,28] 33 FlatJoin(lhs, rhs -> combine(lhs, rhs)) BulkIteration 𝐿 ⋈ 𝑜2.𝑖𝑑=_𝑒3.𝑠𝑖𝑑 𝐸 Left: (o2:Orc) Edge: (o2)-[:knows*1..10]->(h) 𝐸′ ⋈ 𝑒.𝑡𝑖𝑑=_𝑒3.𝑠𝑖𝑑 𝐸 Combine Check for vertex/edge isomorphism
  • 15. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 15 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 15 Cypher on Gradoop
  • 16. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 16 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 16 Cypher on Gradoop
  • 17. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 17 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 17 Cypher on Gradoop
  • 18. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 18 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 18 Conclusion • Implement Cypher Technology Compatibility KIT (TCK) integration tests • Benchmarking • Implement and evaluate LDBC benchmarking queries • Optimizations • DP-Planner • Improve cost model (more statistics, Flink optimizer hints) • Reuse of intermediate results • Consider graph partitioning • Support more Cypher features • e.g. Aggregation and Functions • Introduce new Cypher features • e.g. regular path queries
  • 19. Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 19 Gradoop Extended Property Graphs ConclusionCypher on Gradoop 19 Conclusion • Gradoop on Apache Flink • Extended Property Graph abstraction on Apache Flink • Schema flexible: Type Labels and Properties • Logical Graphs / Graphs Collection • Graph Transformations for Graphs and Graph collections • Cypher on Gradoop • Covering many Cypher features (variable length paths, predicates) • Query execution engine incl. Greedy cost-based optimizer • Physical operators mapped to Flink transformations
  • 20. www.gradoop.com [1] Junghanns, M.; Petermann, A.; K.; Rahm, E., „Distributed Grouping of Property Graphs with Gradoop“, Proc. BTW Conf. , 2017. [2] Petermann, A.; Junghanns, M.; Kemper, S.; Gomez, K.; Teichmann, N.; Rahm, E., „Graph Mining for Complex Data Analytics “, Proc. ICDM Conf. (Demo), 2016. [3] Junghanns, M.; Petermann, A.; Teichmann, N.; Gomez, K.; Rahm, E., „Analyzing Extended Property Graphs with Apache Flink“, Int. Workshop on Network Data Analytics (NDA), SIGMOD, 2016. [4] Petermann, A.; Junghanns, M., „Scalable Business Intelligence with Graph Collections“, it – Special Issue on Big Data Analytics, 2016. [5] Petermann, A.; Junghanns, M.; Müller, M.; Rahm, E., „Graph-based Data Integration and Business Intelligence with BIIIG“, Proc. VLDB Conf. (Demo), 2014.