
- 1. Copyright © ArangoDB GmbH, 2016 Handling Billions Of Edges in a Graph Database Michael Hackstein @mchacki
- 2-3. ‣ Schema-free Objects (Vertices) ‣ Relations between them (Edges) ‣ Edges have a direction ‣ Edges can be queried in both directions ‣ Easily query a range of edges (2 to 5) ‣ Undefined number of edges (1 to *) ‣ Shortest Path between two vertices [Figure: vertices { name: "alice", age: 32 }, { name: "bob", age: 35, size: 1,73m }, { name: "dancing" }, { name: "reading" }, { name: "fishing" }, connected by four "hobby" edges]
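The graph model described on this slide can be sketched in a few lines of Python. This is only an illustration, not ArangoDB's storage format: the `Graph` class, its method names, and the assignment of hobbies to people are assumptions (the slide shows four `hobby` edges but the transcript does not say which person they belong to).

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory property graph: schema-free vertices, directed edges."""

    def __init__(self):
        self.vertices = {}                   # key -> free-form attribute dict
        self.out_edges = defaultdict(list)   # source key -> [(label, target key)]
        self.in_edges = defaultdict(list)    # target key -> [(label, source key)]

    def add_vertex(self, key, **attributes):
        self.vertices[key] = attributes

    def add_edge(self, source, target, label):
        # Every edge is recorded twice so it can be followed in both directions.
        self.out_edges[source].append((label, target))
        self.in_edges[target].append((label, source))

    def neighbors(self, key, direction="outbound"):
        if direction == "outbound":
            edges = self.out_edges[key]
        elif direction == "inbound":
            edges = self.in_edges[key]
        else:  # "any": ignore the edge direction
            edges = self.out_edges[key] + self.in_edges[key]
        return [other for _label, other in edges]


g = Graph()
g.add_vertex("alice", name="alice", age=32)
g.add_vertex("bob", name="bob", age=35)
for hobby in ("dancing", "reading", "fishing"):
    g.add_vertex(hobby, name=hobby)

# Assumed assignment of the four "hobby" edges.
g.add_edge("alice", "dancing", "hobby")
g.add_edge("alice", "reading", "hobby")
g.add_edge("bob", "reading", "hobby")
g.add_edge("bob", "fishing", "hobby")

print(g.neighbors("alice"))               # hobbies of alice (outbound)
print(g.neighbors("reading", "inbound"))  # who has the hobby "reading"
```

Keeping an inbound list next to the outbound list is what makes "edges can be queried in both directions" cheap in this sketch.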
- 4-5. ‣ Give me all friends of Alice [Figure: friend graph with vertices Alice, Bob, Charly, Dave, Eve, Frank; Bob, Charly and Dave are highlighted]
- 6-7. ‣ Give me all friends-of-friends of Alice [Figure: same friend graph; Eve and Frank are highlighted]
- 8-9. ‣ What is the linking path between Alice and Eve? [Figure: same friend graph; the path Alice, Bob, Eve is highlighted]
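A linking path like this is a shortest-path query, and on an unweighted graph a breadth-first search is enough to answer it. The sketch below is an assumption-laden illustration: the `friends` adjacency map is a hypothetical friend graph that merely agrees with what the slides highlight, and `shortest_path` is not an ArangoDB function.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}          # vertex -> vertex we reached it from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in previous:  # already reached on a path at least as short
                continue
            previous[neighbor] = current
            if neighbor == goal:
                # Walk the predecessor chain back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical friend graph (friendship stored in both directions).
friends = {
    "Alice": ["Bob", "Charly", "Dave"],
    "Bob": ["Alice", "Eve"],
    "Charly": ["Alice", "Frank"],
    "Dave": ["Alice"],
    "Eve": ["Bob"],
    "Frank": ["Charly"],
}

print(shortest_path(friends, "Alice", "Eve"))
```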
- 10-11. ‣ Which train stations can I reach if my ticket allows me to travel at most 6 stations? [Figure: rail network map with a "You are here" marker]
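The ticket question is a traversal with an upper depth bound. A minimal sketch, assuming a hypothetical single rail line of ten stations named `S0` to `S9` (the real network on the slide is not in the transcript):

```python
from collections import deque


def reachable_within(adjacency, start, max_hops):
    """Return {station: hops} for every station at most max_hops away from start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue                  # ticket is used up, do not expand further
        for neighbor in adjacency.get(current, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return hops


# Hypothetical line: S0 - S1 - ... - S9, each station linked to its neighbors.
line = {
    f"S{i}": [f"S{j}" for j in (i - 1, i + 1) if 0 <= j <= 9]
    for i in range(10)
}

result = reachable_within(line, "S0", 6)
print(sorted(result))
```

The work done depends on how many stations lie within six hops, not on the size of the whole network.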
- 12-13. ‣ Give me all users that share two hobbies with Alice [Figure: Alice and a Friend vertex, both connected to Hobby1 and Hobby2]
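This is a pattern-matching query: go from Alice outbound to her hobbies, then inbound from each hobby to other users, and count how often each user is reached. A rough sketch under assumed data (the users `bob` and `carol` and all hobby assignments are made up for illustration):

```python
from collections import Counter


def users_sharing_hobbies(hobbies_by_user, user, minimum=2):
    """Users (other than `user`) that share at least `minimum` hobbies with `user`."""
    # Build the inbound direction: hobby -> users that have it.
    users_by_hobby = {}
    for other, hobbies in hobbies_by_user.items():
        for hobby in hobbies:
            users_by_hobby.setdefault(hobby, set()).add(other)

    # Two hops: user -> hobby -> other user, counting how often each is reached.
    shared = Counter()
    for hobby in hobbies_by_user[user]:
        for other in users_by_hobby[hobby]:
            if other != user:
                shared[other] += 1
    return sorted(other for other, count in shared.items() if count >= minimum)


hobbies_by_user = {
    "alice": {"dancing", "reading"},
    "bob": {"reading", "fishing"},
    "carol": {"dancing", "reading", "fishing"},
}

print(users_sharing_hobbies(hobbies_by_user, "alice"))
```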
- 14-15. ‣ Give me all products that at least one of my friends has bought together with the products I already own, ordered by how many friends have bought them and by the product rating, but only 20 of them. [Figure: Alice, Friend and Product vertices connected by is_friend and has_bought edges]
- 17-19. ‣ Give me all users which have an age attribute between 21 and 35. ‣ Give me the age distribution of all users ‣ Group all users by their name
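Unlike the earlier questions, these three need no edges at all; they are filters and aggregations over a collection of documents. A small sketch with made-up user documents (all names and ages below are assumptions):

```python
from collections import Counter, defaultdict

# Hypothetical schema-free user documents; one of them has no age attribute.
users = [
    {"name": "alice", "age": 32},
    {"name": "bob", "age": 35},
    {"name": "charly", "age": 19},
    {"name": "alice", "age": 41},
    {"name": "dave"},
]

# Users with an age attribute between 21 and 35 (inclusive).
in_range = [u for u in users if "age" in u and 21 <= u["age"] <= 35]

# Age distribution: how many users have each age.
age_distribution = Counter(u["age"] for u in users if "age" in u)

# Group all users by their name.
by_name = defaultdict(list)
for u in users:
    by_name[u["name"]].append(u)

print(in_range)
print(age_distribution)
print({name: len(group) for name, group in by_name.items()})
```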
- 21-30.
  ‣ We first pick a start vertex (S)
  ‣ We collect all edges on S
  ‣ We apply filters on edges
  ‣ We iterate down one of the new vertices (A)
  ‣ We apply filters on edges
  ‣ The next vertex (E) is in desired depth. Return the path S -> A -> E
  ‣ Go back to the next unfinished vertex (B)
  ‣ We iterate down on (B)
  ‣ We apply filters on edges
  ‣ The next vertex (F) is in desired depth. Return the path S -> B -> F
  [Figure: graph with vertices S, A, B, C, D, E, F next to the filtered result containing S, A, B, E, F]
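The steps above amount to a depth-first traversal with an edge filter and a fixed target depth. The sketch below is one possible reading, not ArangoDB's implementation: the edge layout (S to A, B, C; A to D, E; B to F) and the `type` attribute used for filtering are assumptions chosen so that the result matches the two paths named on the slides.

```python
def traverse(edges_of, start, depth, edge_filter=lambda edge: True):
    """Depth-first traversal yielding every path with exactly `depth` edges."""

    def visit(path):
        if len(path) - 1 == depth:
            yield list(path)          # desired depth reached: return the path
            return
        for edge in edges_of.get(path[-1], ()):   # collect all edges on the vertex
            if not edge_filter(edge):             # apply filters on edges
                continue
            path.append(edge["to"])               # iterate down the new vertex
            yield from visit(path)
            path.pop()                            # go back to the next unfinished vertex

    yield from visit([start])


# Assumed topology and edge attributes.
edges_of = {
    "S": [{"to": "A", "type": "x"}, {"to": "B", "type": "x"}, {"to": "C", "type": "y"}],
    "A": [{"to": "D", "type": "y"}, {"to": "E", "type": "x"}],
    "B": [{"to": "F", "type": "x"}],
}

paths = list(traverse(edges_of, "S", 2, lambda edge: edge["type"] == "x"))
print(paths)
```

Because the filter runs before a vertex is expanded, C and D are never visited; only edges that survive the filter contribute to the cost.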
- 31.
  ‣ Once:
    ‣ Find the start vertex. Depends on indexes; Hash: O(1)
  ‣ For every depth:
    ‣ Find all connected edges. Edge-Index or Index-Free: O(1)
    ‣ Filter non-matching edges. Linear in edges: O(n)
    ‣ Find connected vertices. Depends on indexes; Hash: O(n * 1)
    ‣ Filter non-matching vertices. Linear in vertices: O(n)
  ‣ Only one pass: O(3n)
- 32. ‣ Linear sounds evil? ‣ NOT linear in all edges, O(E) ‣ Only linear in relevant edges, n < E ‣ Traversals solely scale with their result size ‣ They are not affected at all by the total amount of data ‣ BUT: Every depth increases the exponent: O((3n)^d) ‣ "7 degrees of separation": (3n)^6 < E < (3n)^7
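To get a feeling for how quickly (3n)^d grows, the bound can be evaluated directly. The numbers below (n = 10 relevant edges per vertex, one billion edges in total) are illustrative assumptions, not measurements from the talk.

```python
def traversal_cost(n, depth):
    """Upper bound (3n)^depth from the slide, with n relevant edges per vertex."""
    return (3 * n) ** depth


def depth_covering(total_edges, n):
    """Smallest depth d for which (3n)^d reaches the total number of edges E."""
    depth = 0
    while traversal_cost(n, depth) < total_edges:
        depth += 1
    return depth


n = 10             # assumed number of relevant edges per vertex
E = 1_000_000_000  # assumed total number of edges

for d in range(1, 8):
    print(d, traversal_cost(n, d))
print("depth at which the bound exceeds E:", depth_covering(E, n))
```

With these assumed values the bound stays below E at depth 6 and exceeds it at depth 7, which is the situation the "7 degrees of separation" bullet describes.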
- 33.
  ‣ MULTI-MODEL database
    ‣ Stores Key Value, Documents, and Graphs
    ‣ All in one core
  ‣ Query language AQL
    ‣ Document Queries
    ‣ Graph Queries
    ‣ Joins
    ‣ All can be combined in the same statement
  ‣ ACID support including Multi Collection Transactions
- 34. FOR user IN users RETURN user
- 35. FOR user IN users FILTER user.name == "alice" RETURN user [Figure: vertex Alice]
- 36-37. FOR user IN users FILTER user.name == "alice" FOR product IN OUTBOUND user has_bought RETURN product [Figure: Alice connected to TV by a has_bought edge]
- 38-39. FOR user IN users FILTER user.name == "alice" FOR recommendation, action, path IN 3 ANY user has_bought FILTER path.vertices[2].age <= user.age + 5 AND path.vertices[2].age >= user.age - 5 FILTER recommendation.price < 25 LIMIT 10 RETURN recommendation [Figure: Alice, TV, Bob and Playstation linked by has_bought edges, annotated with alice.age - 5 <= bob.age && bob.age <= alice.age + 5 and playstation.price < 25]
- 40. ‣ Many graphs have "Supernodes" ‣ Vertices with many inbound and/or outbound edges ‣ Traversing over them is expensive ‣ Often you only need a subset of edges [Figure: vertices Bob and Alice]
- 41. ‣ Remember Complexity? O((3n)^d) ‣ Filtering of non-matching edges is linear for every depth ‣ Index all edges based on their vertices and arbitrary other attributes ‣ Find initial set of edges in identical time ‣ Less / No post-filtering required ‣ This decreases the n significantly [Figure: vertex Alice]
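The idea on this slide (called "Vertex-Centric Indexes" later in the deck) can be sketched as a hash index keyed on the pair of source vertex and edge attribute. The class name, the `from`/`to`/`type` field names and the supernode data are assumptions made for illustration, not ArangoDB's API.

```python
from collections import defaultdict


class VertexCentricIndex:
    """Hash index on (source vertex, attribute value) -> list of edges."""

    def __init__(self, attribute):
        self.attribute = attribute
        self.buckets = defaultdict(list)

    def insert(self, edge):
        self.buckets[(edge["from"], edge[self.attribute])].append(edge)

    def lookup(self, vertex, value):
        # One hash lookup returns only the matching edges, so no
        # post-filtering over all edges of the vertex is needed.
        return self.buckets.get((vertex, value), [])


# Hypothetical supernode: alice has 1000 outbound edges, only 10 of type "friend".
edges = [
    {"from": "alice", "to": f"v{i}", "type": "friend" if i % 100 == 0 else "follower"}
    for i in range(1000)
]

index = VertexCentricIndex("type")
for edge in edges:
    index.insert(edge)

without_index = [e for e in edges if e["from"] == "alice" and e["type"] == "friend"]
with_index = index.lookup("alice", "friend")
print(len(edges), "edges scanned without the index;", len(with_index), "edges returned by the index")
```

Both approaches return the same edges; the difference is that the scan touches all 1000 edges of the supernode while the index touches only the matching ones, which is what shrinks n.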
- 42. ‣ We have the rise of big data ‣ Store everything you can ‣ Dataset easily grows beyond one machine ‣ This includes graph data!
- 43. ‣ Distribute graph on several machines (sharding) ‣ How to query it now? ‣ No global view of the graph possible any more ‣ What about edges between servers? ‣ In a sharded environment the network is the bottleneck most of the time ‣ Reduce network hops ‣ Vertex-Centric Indexes again help with super-nodes ‣ But: Only on a local machine
- 44. First let's do the cluster thingy
- 45. ‣ ArangoDB is the first fully certified database including the persistence primitives for DC/OS ‣ ArangoDB's cluster resource management is based on Apache Mesos ‣ Later this year we will also support Kubernetes
- 46.
  ‣ ArangoDB can run clusters without Mesos or DC/OS
  ‣ Setup requires manual effort (can be scripted)
  ‣ This works:
    ‣ Automatic Failover (Follower takes over if leader dies)
    ‣ Rebalancing of shards
    ‣ Everything inside of ArangoDB
  ‣ This is based on Mesos:
    ‣ Complete self healing
    ‣ Automatic restart of ArangoDBs (on new machines)
  ➡ We suggest you have someone on call
- 47. Demo Time DC/OS
- 48. Now distribute the graph
- 49. ‣ Only parts of the graph on every machine ‣ Neighboring vertices may be on different machines ‣ Even edges could be on other machines than their vertices ‣ Queries need to be executed in a distributed way ‣ Result needs to be merged locally
- 50-51. Random Distribution
  ‣ Advantages:
    ‣ every server takes an equal portion of data
    ‣ easy to realize
    ‣ no knowledge about data required
    ‣ always works
  ‣ Disadvantages:
    ‣ Neighbors on different machines
    ‣ Probably edges on other machines than their vertices
    ‣ A lot of network overhead is required for querying
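Random distribution is typically realized by hashing the document key. The sketch below shows why it needs no knowledge about the data, and how to count the edges whose endpoints end up on different shards. CRC32 is an arbitrary hash chosen for the example (it is not claimed to be what ArangoDB uses), and the chain of `user` vertices is made up.

```python
import zlib


def shard_of(key, shard_count):
    """Deterministic hash sharding: the key alone decides the shard."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def cross_shard_edges(edges, shard_count):
    """Number of edges whose two endpoints live on different shards."""
    return sum(
        1
        for source, target in edges
        if shard_of(source, shard_count) != shard_of(target, shard_count)
    )


# Hypothetical graph: a chain user0 -> user1 -> ... -> user1000.
edges = [(f"user{i}", f"user{i + 1}") for i in range(1000)]

crossing = cross_shard_edges(edges, 3)
print(f"{crossing} of {len(edges)} edges cross a shard boundary with 3 shards")
```

Every edge counted here means a network hop during a traversal, which is the overhead listed under the disadvantages.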
- 52-56. Index-Free Adjacency
  ‣ Used by most other graph databases
  ‣ Every vertex maintains two lists of its edges (IN and OUT)
  ‣ Do not use an index to find edges
  ‣ How to shard this?
  ‣ ArangoDB uses a hash-based EdgeIndex (O(1) lookup)
  ‣ The vertex is independent of its edges
  ‣ It can be stored on a different machine
- 57-59. Domain Based Distribution
  ‣ Many Graphs have a natural distribution
    ‣ By country/region for People
    ‣ By tags for Blogs
    ‣ By category for Products
  ‣ Most edges in same group
  ‣ Rare edges between groups
  ArangoDB Enterprise Edition uses Domain Knowledge for short-cuts
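Domain-based distribution can be illustrated by placing vertices according to a domain attribute instead of a hash of their key. This is a simplified sketch and not how SmartGraphs are implemented: the function names, the people, their countries and the friendships are all assumptions.

```python
def shard_by_domain(domain_of, shard_count):
    """Place every vertex on the shard assigned to its domain (e.g. its country)."""
    domains = sorted(set(domain_of.values()))
    shard_of_domain = {domain: i % shard_count for i, domain in enumerate(domains)}
    return {vertex: shard_of_domain[domain] for vertex, domain in domain_of.items()}


def cross_shard_edges(edges, placement):
    """Edges whose two endpoints are placed on different shards."""
    return [(s, t) for s, t in edges if placement[s] != placement[t]]


# Hypothetical people with a country attribute; most friendships stay in-country.
country = {"alice": "DE", "bob": "DE", "charly": "DE", "dave": "US", "eve": "US"}
friendships = [
    ("alice", "bob"),
    ("bob", "charly"),
    ("alice", "charly"),
    ("dave", "eve"),
    ("charly", "dave"),   # the one rare edge between groups
]

placement = shard_by_domain(country, 2)
remote = cross_shard_edges(friendships, placement)
print(placement)
print("edges that need a network hop:", remote)
```

Only the rare edge between groups crosses a shard boundary here, so a traversal that stays inside one group can run on a single server.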
- 60-62. SmartGraphs - How it works [Figure: cluster with two Coordinators (each with Foxx) and DB Server 1, DB Server 2, ..., DB Server n]
- 63. Sneak Preview SmartGraphs
- 65. Document write transactions per virtual CPU/second *)

  | Database | Write transactions per virtual CPU/second |
  |---|---|
  | ArangoDB | 1,730 |
  | Aerospike | 2,500 |
  | Cassandra | 965 |
  | Couchbase | 1,375 |
  | FoundationDB | 750 |

  *) Source: https://mesosphere.com/blog/2015/11/30/arangodb-benchmark-dcos/
- 66. Thank You ‣ Further questions? ‣ Follow us on twitter: @arangodb ‣ Join our slack: slack.arangodb.com ‣ Follow me on twitter/github: @mchacki ‣ Write me a mail: michael@arangodb.com
