Advertisement

Graph in Apache Cassandra. The World’s Most Scalable Graph Database

Connected Data World
Oct. 14, 2019
Advertisement

More Related Content

Advertisement

More from Connected Data World(20)

Advertisement

Graph in Apache Cassandra. The World’s Most Scalable Graph Database

  1. How to build a scalable graph database Bryn Cooke The smart way
  2. In this talk 1. What does it take to build a graph database? 2. Why shouldn’t you do this at home. 3. What do you use this for?
  3. Graph family tree
  4. Graph database recipe 1. Model 2. Language 3. Storage
  5. Model bob since: 2001 steph bob: knows :steph age: 30age: 34 knows known Property Graph RDF
  6. Language g.V().has('name', 'marko').out('knows').values('name')
  7. Storage
  8. The adjacency list Vertex Adjacent to A B, D, E B C B D C E D, F F A B C E D F
  9. //TODO • Storage • Indexing • Commit log • Drivers • Caching • Schema • Metrics • Backup/Restore • Logging • Security • Testing • Support • Failover • QoS • Paging • Partitioning • Sorting • Compaction • Repair • Community • Bux fixing • Optimisation
  10. Storage - Cassandra • Fast • Distributed • Scalable • Reliable • 11 years of development • 54 committers (listed on apache) • 274 contributors (listed on github)
  11. The adjacency list (in Cassandra)
  12. Here's what you could do C* C* C* C*C* My Graph Database Client Client Client Client
  13. Client Here's what you could do C* C* C* C*C* My Graph Database
  14. Here's what you should do C* C* C* C*C* DS Graph Client Client Client Client
  15. Deep integration with DataStax Enterprise DataStax Enterprise • DataStax Enterprise scalability > Cassandra scalability. • Analytics integration. • Search integration. • Thread optimisation. • Continuous paging. • Prefetching. • First class schema integration.
  16. Today’s Graph Database Market Graph Problems > Graph Databases
  17. Typical customer 360 queries Offline fast Human fast Machine fast Analytics CQL Search Responsetime Simple Complex No go zone DSE • Find me Jenny. • Find me all people with similar names to 'Jenny'. • Tell there are duplicate Jennys. • Find how Jenny and John are connected. • Find how influential Jenny is in my application.
  18. Find me Jenny Offline fast Human fast Machine fast Analytics CQL Search Responsetime Simple Complex No go zone DSE How Complex? • Simple How Fast? • Machine What? • CQL Why? • Single partition lookup • Single iteration
  19. Find me all people with similar names to 'Jenny' Offline fast Human fast Machine fast Analytics CQL Search Responsetime Simple Complex No go zone DSE How Complex? • Medium How Fast? • Human Fast What? • Search • Graph Why? • Single index lookup • Single iteration
  20. Tell there are duplicate Jennys Offline fast Human fast Machine fast Analytics CQL Search Responsetime Simple Complex No go zone DSE How Complex? • Medium How Fast? • Offline What? • Analytics • Graph Why? • Aggregation • Multiple Iteration
  21. Find how Jenny and John are connected Offline fast Human fast Machine fast Analytics CQL Search Responsetime Simple Complex No go zone DSE How Complex? • Complex How Fast? • Machine What? • Graph Why? • Multiple partition lookup • Multiple iteration
  22. Find how influential Jenny is in my application Offline fast Human fast Machine fast Analytics CQL Search Responsetime Simple Complex No go zone DSE How Complex? • Complex How Fast? • Offline What? • Spark Analytics • Graph via PageRank Why? • Full scan • Unknown iterations
  23. Typical customer 360 queries Offline fast Human fast Machine fast Analytics CQL Search Responsetime Simple Complex No go zone DSE • Find me Jenny. • Find me all people with similar names to 'Jenny'. • Tell there are duplicate Jennys. • Find how Jenny and John are connected. • Find how influential Jenny is in my application.
  24. Summary 1. What it takes to create a graph database a. Model b. Language c. Storage 2. How you can leverage an existing storage engine, and why Cassandra is a great choice. 3. Solving graph problems requires more than just the basics. Search and Analytics are essential tools, especially graph database.
  25. Don't try this at home Do not try replicate 100 person years of dev effort creating your own storage engine. Creating a graph database that scales is tough enough.
  26. Try it now https://downloads.datastax.com/#labs Labs
  27. Thank You
Advertisement