SlideShare a Scribd company logo
Copyright © ArangoDB GmbH, 2017 -
Confidential
+ +
Handling Billions Of Edges in
a Graph Database
1
‣ Michael Hackstein
‣ ArangoDB Core Team
‣ Graph visualisation
‣ Graph features
‣ SmartGraphs
‣ Host of cologne.js
‣ Master’s Degree (spec. Databases and
Information Systems)
{
name: "alice",
age: 32
}
{
name: "dancing"
}
{
name: "bob",
age: 35,
size: 1,73m
}
{
name: "reading"
}
{
name: "fishing"
}
hobby
hobby
hobby
hobby
‣ Schema-free Objects (Vertices)
‣ Relations between them (Edges)
‣ Edges have a direction
‣ Edges can be queried in both directions
‣ Easily query a range of edges (2 to 5)
‣ Undefined number of edges (1 to *)
‣ Shortest Path between two vertices
BobBob
Charly DaveCharly Dave
‣ Give me all friends of Alice
Alice
Eve FrankEve Frank
FrankFrankEveEve
Alice
Bob
Charly Dave
‣ Give me all friends-of-friends of Alice
Bob
Charly Dave
Eve BobEve
AliceCharly Dave
Frank
‣ What is the linking path between Alice and Eve
You are
here
‣ Which Train Stations can I reach if I am allowed to drive a distance of at most 6
stations on my ticket
FriendFriend
‣ Give me all users that share two hobbies with Alice
Alice
Hobby1 Hobby2
ProductAlice
Product
Friend
‣ Give me all products that at least one of my friends has bought together with
the products I already own, ordered by how many friends have bought it and
the products rating, but only 20 of them.
has_bought
has_bought
has_boughtis_friend
Product
‣ Give me all users which have an age attribute between 21 and 35.
‣ Give me the age distribution of all users
‣ Group all users by their name
Traversal: Iterate down two edges with some filters
We first pick a start vertex (S)
Traversal: Iterate down two edges with some filters
We first pick a start vertex (S)
We collect all edges on S
Traversal: Iterate down two edges with some filters
We first pick a start vertex (S)
We collect all edges on S
We apply filters on edges
Traversal: Iterate down two edges with some filters
We first pick a start vertex (S)
We collect all edges on S
We apply filters on edges
We iterate down one of the edges to (A)
Traversal: Iterate down two edges with some filters
We first pick a start vertex (S)
We collect all edges on S
We apply filters on edges
We iterate down one of the edges to (A)
We apply filters on edges
Traversal: Iterate down two edges with some filters
We first pick a start vertex (S)
We collect all edges on S
We apply filters on edges
We iterate down one of the edges to (A)
We apply filters on edges
The next vertex (E) is in desired depth.
Return the path S -> A -> E
Traversal: Iterate down two edges with some filters
We first pick a start vertex (S)
We collect all edges on S
We apply filters on edges
We iterate down one of the edges to (A)
We apply filters on edges
The next vertex (E) is in desired depth.
Return the path S -> A -> E
Go back to the next unfinished vertex (B)
Traversal: Iterate down two edges with some filters
We first pick a start vertex (S)
We collect all edges on S
We apply filters on edges
We iterate down one of the edges to (A)
We apply filters on edges
The next vertex (E) is in desired depth.
Return the path S -> A -> E
Go back to the next unfinished vertex (B)
We iterate down on (B)
Traversal: Iterate down two edges with some filters
We first pick a start vertex (S)
We collect all edges on S
We apply filters on edges
We iterate down one of the edges to (A)
We apply filters on edges
The next vertex (E) is in desired depth.
Return the path S -> A -> E
Go back to the next unfinished vertex (B)
We iterate down on (B)
We apply filters on edges
Traversal: Iterate down two edges with some filters
We first pick a start vertex (S)
We collect all edges on S
We apply filters on edges
We iterate down one of the edges to (A)
We apply filters on edges
The next vertex (E) is in desired depth.
Return the path S -> A -> E
Go back to the next unfinished vertex (B)
We iterate down on (B)
We apply filters on edges
The next vertex (F) is in desired depth.
Return the path S -> B -> F
Traversal: Complexity
Once: Operation Comment O
Find the start vertex Depends on indexes: Hash: 1
For every
depth:
Find all connected edges Edge-Index or Index-Free: 1
Filter non-matching edges Linear in edges: n
Find connected vertices Depends on indexes: Hash: n · 1
Filter non-matching vertices Linear in vertices: n
Total for
one pass:
3n
Traversal: Complexity
Linear sounds evil?
Traversal: Complexity
Linear sounds evil?
NOT linear in all edges O(E)
Only linear in relevant edges n < E
Traversal: Complexity
Linear sounds evil?
NOT linear in all edges O(E)
Only linear in relevant edges n < E
Traversals solely scale with their result size
Traversal: Complexity
Linear sounds evil?
NOT linear in all edges O(E)
Only linear in relevant edges n < E
Traversals solely scale with their result size
They are not effected at all by total amount of data
Traversal: Complexity
Linear sounds evil?
NOT linear in all edges O(E)
Only linear in relevant edges n < E
Traversals solely scale with their result size
They are not effected at all by total amount of data
BUT: Every depth increases the exponent: O((3n)d)
Traversal: Complexity
Linear sounds evil?
NOT linear in all edges O(E)
Only linear in relevant edges n < E
Traversals solely scale with their result size
They are not effected at all by total amount of data
BUT: Every depth increases the exponent: O((3n)d)
“7 degrees of separation”: n6 < E < n7
‣ MULTI-MODEL database
‣ Stores Key Value, Documents, and Graphs
‣ All in one core
‣ Query language AQL
‣ Document Queries
‣ Graph Queries
‣ Joins
‣ All can be combined in the same statement
‣ ACID support including Multi Collection Transactions
+ +
FOR user IN users
RETURN user
FOR user IN users
FILTER user.name == "alice"
RETURN user
Alice
FOR user IN users
FILTER user.name == "alice"
FOR product IN OUTBOUND user has_bought
RETURN product
Alice
has_bought
TV
FOR user IN users
FILTER user.name == "alice"
FOR recommendation, action, path IN 3 ANY user has_bought
FILTER path.vertices[2].age <= user.age + 5
AND path.vertices[2].age >= user.age - 5
FILTER recommendation.price < 25
LIMIT 10
RETURN recommendation
Alice
has_bought
TV
has_bought
playstation.price < 25
PlaystationBob
alice.age - 5 <= bob.age &&
bob.age <= alice.age + 5
has_bought
‣ Many graphs have "celebrities"
‣ Vertices with many inbound and/or outbound edges
‣ Traversing over them is expensive (linear in number of Edges)
‣ Often you only need a subset of edges
Bob Alice
‣ Remember Complexity? O(3 * nd
)
‣ Filtering of non-matching edges is linear for every depth
‣ Index all edges based on their vertices and arbitrary other attributes
‣ Find initial set of edges in identical time
‣ Less / No post-filtering required
‣ This decreases the n significantly
Alice
‣ We have the rise of big data
‣ Store everything you can
‣ Dataset easily grows beyond one machine
‣ This includes graph data!
Scaling horizontally
Distribute graph on several machines (sharding)
Scaling horizontally
Distribute graph on several machines (sharding)
How to query it now?
No global view of the graph possible any more
What about edges between servers?
Scaling horizontally
Distribute graph on several machines (sharding)
How to query it now?
No global view of the graph possible any more
What about edges between servers?
In a sharded environment the network most of the time is the bottleneck
Reduce network hops
Scaling horizontally
Distribute graph on several machines (sharding)
How to query it now?
No global view of the graph possible any more
What about edges between servers?
In a sharded environment the network most of the time is the bottleneck
Reduce network hops
Vertex-Centric Indexes again help with super-nodes
But: Only on a local machine
Random distribution
Advantages:
every server takes an equal portion
of the data
easy to realize
no knowledge about data required
always works
Disadvantages:
Neighbors on different machines
Probably edges on other machines
than their vertices
A lot of network overhead is
required for querying
Random distribution
Advantages:
every server takes an equal portion
of the data
easy to realize
no knowledge about data required
always works
Disadvantages:
Neighbors on different machines
Probably edges on other machines
than their vertices
A lot of network overhead is
required for querying
Domain-based Distribution
Many Graphs have a natural distribution
By country/region for People
By tags for Blogs
By category for Products
Most edges in the same group
Rare edges between groups
Domain-based Distribution
Many Graphs have a natural distribution
By country/region for People
By tags for Blogs
By category for Products
Most edges in the same group
Rare edges between groups
Domain-based Distribution
Many Graphs have a natural distribution
By country/region for People
By tags for Blogs
By category for Products
Most edges in the same group
Rare edges between groups
Smart graphs: this is how it works
Smart graphs: this is how it works
Smart graphs: this is how it works
‣ ArangoDB uses a hash-based edge index (O(1) - lookup)
‣ The vertex is independent of its edges
‣ It can be stored on a different machine
‣ Used by most other graph databases
‣ Every vertex maintains two lists of it's edges (IN and OUT)
‣ Do not use an index to find edges
‣ How to shard this?
????
‣ Further questions?
‣ Follow us on twitter: @arangodb
‣ Join our slack: slack.arangodb.com
‣ Follow me on twitter/github: @mchacki
Thank You

More Related Content

Similar to Scaling to billions of Edges in a Graph Database by Max Neunhoeffer at Big Data Spain 2017

Overlay Stitch Meshing
Overlay Stitch MeshingOverlay Stitch Meshing
Overlay Stitch MeshingDon Sheehy
 
Talk on Graph Theory - I
Talk on Graph Theory - ITalk on Graph Theory - I
Talk on Graph Theory - I
Anirudh Raja
 
Graphs in data structures
Graphs in data structuresGraphs in data structures
Graphs in data structures
Savit Chandra
 
Graphs
GraphsGraphs
Graph Data Structure
Graph Data StructureGraph Data Structure
Graph Data Structure
Afaq Mansoor Khan
 
Unit VI - Graphs.ppt
Unit VI - Graphs.pptUnit VI - Graphs.ppt
Unit VI - Graphs.ppt
HODElex
 
Graphs
GraphsGraphs
Daa chpater 12
Daa chpater 12Daa chpater 12
Daa chpater 12
B.Kirron Reddi
 
Graph Basic In Data structure
Graph Basic In Data structureGraph Basic In Data structure
Graph Basic In Data structure
Ikhlas Rahman
 
UNIT 4_DSA_KAVITHA_RMP.ppt
UNIT 4_DSA_KAVITHA_RMP.pptUNIT 4_DSA_KAVITHA_RMP.ppt
UNIT 4_DSA_KAVITHA_RMP.ppt
KavithaMuralidharan2
 
Class01_Computer_Contest_Level_3_Notes_Sep_07 - Copy.pdf
Class01_Computer_Contest_Level_3_Notes_Sep_07 - Copy.pdfClass01_Computer_Contest_Level_3_Notes_Sep_07 - Copy.pdf
Class01_Computer_Contest_Level_3_Notes_Sep_07 - Copy.pdf
ChristianKapsales1
 
Graphical reprsentation dsa (DATA STRUCTURE ALGORITHM)
Graphical reprsentation dsa (DATA STRUCTURE ALGORITHM)Graphical reprsentation dsa (DATA STRUCTURE ALGORITHM)
Graphical reprsentation dsa (DATA STRUCTURE ALGORITHM)
abdulrafaychaudhry
 
Self healing data
Self healing dataSelf healing data
Self healing data
Uwe Friedrichsen
 
Graph Data Structure
Graph Data StructureGraph Data Structure
Graph Data Structure
Keno benti
 
graph representation.pdf
graph representation.pdfgraph representation.pdf
graph representation.pdf
amitbhachne
 
22-graphs1-dfs-bfs.ppt
22-graphs1-dfs-bfs.ppt22-graphs1-dfs-bfs.ppt
22-graphs1-dfs-bfs.ppt
KarunaBiswas3
 
Data Structures and Agorithm: DS 21 Graph Theory.pptx
Data Structures and Agorithm: DS 21 Graph Theory.pptxData Structures and Agorithm: DS 21 Graph Theory.pptx
Data Structures and Agorithm: DS 21 Graph Theory.pptx
RashidFaridChishti
 
09-graphs.ppt
09-graphs.ppt09-graphs.ppt
09-graphs.ppt
Dr. Ajay Kumar Singh
 
Graphs
GraphsGraphs
Graphs
LavanyaJ28
 

Similar to Scaling to billions of Edges in a Graph Database by Max Neunhoeffer at Big Data Spain 2017 (20)

Overlay Stitch Meshing
Overlay Stitch MeshingOverlay Stitch Meshing
Overlay Stitch Meshing
 
Talk on Graph Theory - I
Talk on Graph Theory - ITalk on Graph Theory - I
Talk on Graph Theory - I
 
Graphs in data structures
Graphs in data structuresGraphs in data structures
Graphs in data structures
 
Graphs
GraphsGraphs
Graphs
 
Graph Data Structure
Graph Data StructureGraph Data Structure
Graph Data Structure
 
Unit VI - Graphs.ppt
Unit VI - Graphs.pptUnit VI - Graphs.ppt
Unit VI - Graphs.ppt
 
Graphs
GraphsGraphs
Graphs
 
Daa chpater 12
Daa chpater 12Daa chpater 12
Daa chpater 12
 
Graph Basic In Data structure
Graph Basic In Data structureGraph Basic In Data structure
Graph Basic In Data structure
 
UNIT 4_DSA_KAVITHA_RMP.ppt
UNIT 4_DSA_KAVITHA_RMP.pptUNIT 4_DSA_KAVITHA_RMP.ppt
UNIT 4_DSA_KAVITHA_RMP.ppt
 
05 graph
05 graph05 graph
05 graph
 
Class01_Computer_Contest_Level_3_Notes_Sep_07 - Copy.pdf
Class01_Computer_Contest_Level_3_Notes_Sep_07 - Copy.pdfClass01_Computer_Contest_Level_3_Notes_Sep_07 - Copy.pdf
Class01_Computer_Contest_Level_3_Notes_Sep_07 - Copy.pdf
 
Graphical reprsentation dsa (DATA STRUCTURE ALGORITHM)
Graphical reprsentation dsa (DATA STRUCTURE ALGORITHM)Graphical reprsentation dsa (DATA STRUCTURE ALGORITHM)
Graphical reprsentation dsa (DATA STRUCTURE ALGORITHM)
 
Self healing data
Self healing dataSelf healing data
Self healing data
 
Graph Data Structure
Graph Data StructureGraph Data Structure
Graph Data Structure
 
graph representation.pdf
graph representation.pdfgraph representation.pdf
graph representation.pdf
 
22-graphs1-dfs-bfs.ppt
22-graphs1-dfs-bfs.ppt22-graphs1-dfs-bfs.ppt
22-graphs1-dfs-bfs.ppt
 
Data Structures and Agorithm: DS 21 Graph Theory.pptx
Data Structures and Agorithm: DS 21 Graph Theory.pptxData Structures and Agorithm: DS 21 Graph Theory.pptx
Data Structures and Agorithm: DS 21 Graph Theory.pptx
 
09-graphs.ppt
09-graphs.ppt09-graphs.ppt
09-graphs.ppt
 
Graphs
GraphsGraphs
Graphs
 

More from Big Data Spain

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Big Data Spain
 

More from Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
 

Recently uploaded

Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 

Recently uploaded (20)

Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 

Scaling to billions of Edges in a Graph Database by Max Neunhoeffer at Big Data Spain 2017

  • 1.
  • 2. Copyright © ArangoDB GmbH, 2017 - Confidential + + Handling Billions Of Edges in a Graph Database 1
  • 3. ‣ Michael Hackstein ‣ ArangoDB Core Team ‣ Graph visualisation ‣ Graph features ‣ SmartGraphs ‣ Host of cologne.js ‣ Master’s Degree (spec. Databases and Information Systems)
  • 4. { name: "alice", age: 32 } { name: "dancing" } { name: "bob", age: 35, size: 1,73m } { name: "reading" } { name: "fishing" } hobby hobby hobby hobby ‣ Schema-free Objects (Vertices) ‣ Relations between them (Edges) ‣ Edges have a direction ‣ Edges can be queried in both directions ‣ Easily query a range of edges (2 to 5) ‣ Undefined number of edges (1 to *) ‣ Shortest Path between two vertices
  • 5. BobBob Charly DaveCharly Dave ‣ Give me all friends of Alice Alice Eve FrankEve Frank
  • 6. FrankFrankEveEve Alice Bob Charly Dave ‣ Give me all friends-of-friends of Alice Bob Charly Dave
  • 7. Eve BobEve AliceCharly Dave Frank ‣ What is the linking path between Alice and Eve
  • 8. You are here ‣ Which Train Stations can I reach if I am allowed to drive a distance of at most 6 stations on my ticket
  • 9. FriendFriend ‣ Give me all users that share two hobbies with Alice Alice Hobby1 Hobby2
  • 10. ProductAlice Product Friend ‣ Give me all products that at least one of my friends has bought together with the products I already own, ordered by how many friends have bought it and the products rating, but only 20 of them. has_bought has_bought has_boughtis_friend Product
  • 11. ‣ Give me all users which have an age attribute between 21 and 35. ‣ Give me the age distribution of all users ‣ Group all users by their name
  • 12. Traversal: Iterate down two edges with some filters We first pick a start vertex (S)
  • 13. Traversal: Iterate down two edges with some filters We first pick a start vertex (S) We collect all edges on S
  • 14. Traversal: Iterate down two edges with some filters We first pick a start vertex (S) We collect all edges on S We apply filters on edges
  • 15. Traversal: Iterate down two edges with some filters We first pick a start vertex (S) We collect all edges on S We apply filters on edges We iterate down one of the edges to (A)
  • 16. Traversal: Iterate down two edges with some filters We first pick a start vertex (S) We collect all edges on S We apply filters on edges We iterate down one of the edges to (A) We apply filters on edges
  • 17. Traversal: Iterate down two edges with some filters We first pick a start vertex (S) We collect all edges on S We apply filters on edges We iterate down one of the edges to (A) We apply filters on edges The next vertex (E) is in desired depth. Return the path S -> A -> E
  • 18. Traversal: Iterate down two edges with some filters We first pick a start vertex (S) We collect all edges on S We apply filters on edges We iterate down one of the edges to (A) We apply filters on edges The next vertex (E) is in desired depth. Return the path S -> A -> E Go back to the next unfinished vertex (B)
  • 19. Traversal: Iterate down two edges with some filters We first pick a start vertex (S) We collect all edges on S We apply filters on edges We iterate down one of the edges to (A) We apply filters on edges The next vertex (E) is in desired depth. Return the path S -> A -> E Go back to the next unfinished vertex (B) We iterate down on (B)
  • 20. Traversal: Iterate down two edges with some filters We first pick a start vertex (S) We collect all edges on S We apply filters on edges We iterate down one of the edges to (A) We apply filters on edges The next vertex (E) is in desired depth. Return the path S -> A -> E Go back to the next unfinished vertex (B) We iterate down on (B) We apply filters on edges
  • 21. Traversal: Iterate down two edges with some filters We first pick a start vertex (S) We collect all edges on S We apply filters on edges We iterate down one of the edges to (A) We apply filters on edges The next vertex (E) is in desired depth. Return the path S -> A -> E Go back to the next unfinished vertex (B) We iterate down on (B) We apply filters on edges The next vertex (F) is in desired depth. Return the path S -> B -> F
  • 22. Traversal: Complexity Once: Operation Comment O Find the start vertex Depends on indexes: Hash: 1 For every depth: Find all connected edges Edge-Index or Index-Free: 1 Filter non-matching edges Linear in edges: n Find connected vertices Depends on indexes: Hash: n · 1 Filter non-matching vertices Linear in vertices: n Total for one pass: 3n
  • 24. Traversal: Complexity Linear sounds evil? NOT linear in all edges O(E) Only linear in relevant edges n < E
  • 25. Traversal: Complexity Linear sounds evil? NOT linear in all edges O(E) Only linear in relevant edges n < E Traversals solely scale with their result size
  • 26. Traversal: Complexity Linear sounds evil? NOT linear in all edges O(E) Only linear in relevant edges n < E Traversals solely scale with their result size They are not effected at all by total amount of data
  • 27. Traversal: Complexity Linear sounds evil? NOT linear in all edges O(E) Only linear in relevant edges n < E Traversals solely scale with their result size They are not effected at all by total amount of data BUT: Every depth increases the exponent: O((3n)d)
  • 28. Traversal: Complexity Linear sounds evil? NOT linear in all edges O(E) Only linear in relevant edges n < E Traversals solely scale with their result size They are not effected at all by total amount of data BUT: Every depth increases the exponent: O((3n)d) “7 degrees of separation”: n6 < E < n7
  • 29. ‣ MULTI-MODEL database ‣ Stores Key Value, Documents, and Graphs ‣ All in one core ‣ Query language AQL ‣ Document Queries ‣ Graph Queries ‣ Joins ‣ All can be combined in the same statement ‣ ACID support including Multi Collection Transactions + +
  • 30. FOR user IN users RETURN user
  • 31. FOR user IN users FILTER user.name == "alice" RETURN user Alice
  • 32. FOR user IN users FILTER user.name == "alice" FOR product IN OUTBOUND user has_bought RETURN product Alice has_bought TV
  • 33. FOR user IN users FILTER user.name == "alice" FOR recommendation, action, path IN 3 ANY user has_bought FILTER path.vertices[2].age <= user.age + 5 AND path.vertices[2].age >= user.age - 5 FILTER recommendation.price < 25 LIMIT 10 RETURN recommendation Alice has_bought TV has_bought playstation.price < 25 PlaystationBob alice.age - 5 <= bob.age && bob.age <= alice.age + 5 has_bought
  • 34. ‣ Many graphs have "celebrities" ‣ Vertices with many inbound and/or outbound edges ‣ Traversing over them is expensive (linear in number of Edges) ‣ Often you only need a subset of edges Bob Alice
  • 35. ‣ Remember Complexity? O(3 * nd ) ‣ Filtering of non-matching edges is linear for every depth ‣ Index all edges based on their vertices and arbitrary other attributes ‣ Find initial set of edges in identical time ‣ Less / No post-filtering required ‣ This decreases the n significantly Alice
  • 36. ‣ We have the rise of big data ‣ Store everything you can ‣ Dataset easily grows beyond one machine ‣ This includes graph data!
  • 37. Scaling horizontally Distribute graph on several machines (sharding)
  • 38. Scaling horizontally Distribute graph on several machines (sharding) How to query it now? No global view of the graph possible any more What about edges between servers?
  • 39. Scaling horizontally Distribute graph on several machines (sharding) How to query it now? No global view of the graph possible any more What about edges between servers? In a sharded environment the network most of the time is the bottleneck Reduce network hops
  • 40. Scaling horizontally Distribute graph on several machines (sharding) How to query it now? No global view of the graph possible any more What about edges between servers? In a sharded environment the network most of the time is the bottleneck Reduce network hops Vertex-Centric Indexes again help with super-nodes But: Only on a local machine
  • 41. Random distribution Advantages: every server takes an equal portion of the data easy to realize no knowledge about data required always works Disadvantages: Neighbors on different machines Probably edges on other machines than their vertices A lot of network overhead is required for querying
  • 42. Random distribution Advantages: every server takes an equal portion of the data easy to realize no knowledge about data required always works Disadvantages: Neighbors on different machines Probably edges on other machines than their vertices A lot of network overhead is required for querying
  • 43. Domain-based Distribution Many Graphs have a natural distribution By country/region for People By tags for Blogs By category for Products Most edges in the same group Rare edges between groups
  • 44. Domain-based Distribution Many Graphs have a natural distribution By country/region for People By tags for Blogs By category for Products Most edges in the same group Rare edges between groups
  • 45. Domain-based Distribution Many Graphs have a natural distribution By country/region for People By tags for Blogs By category for Products Most edges in the same group Rare edges between groups
  • 46. Smart graphs: this is how it works
  • 47. Smart graphs: this is how it works
  • 48. Smart graphs: this is how it works
  • 49. ‣ ArangoDB uses a hash-based edge index (O(1) - lookup) ‣ The vertex is independent of its edges ‣ It can be stored on a different machine ‣ Used by most other graph databases ‣ Every vertex maintains two lists of it's edges (IN and OUT) ‣ Do not use an index to find edges ‣ How to shard this? ????
  • 50. ‣ Further questions? ‣ Follow us on twitter: @arangodb ‣ Join our slack: slack.arangodb.com ‣ Follow me on twitter/github: @mchacki Thank You