
A case-for-graph-db


These days we hear a lot about NoSQL, and graph databases in particular. But before jumping straight into development with any graph database, we should ask the following questions: 'What makes it a case for a graph DB? And can you prove it?' This is basically about de-risking and making a case for management buy-in. Further, it's important to convince ourselves.

This talk highlights some cases where a graph DB is useful, followed by insights from a comparison we did between Neo4j and MySQL/MSSQL. By the end of this session, you'll understand the advantages of using a graph DB and which questions to ask before selecting one.

This was presented at TechJam on 11th Sept 2014

Published in: Technology


  1. 1. A case for Graph Database? • dhaval.dalal@software-artisan.com • @softwareartisan • 11th Sept 2014
  2. 2. Context Direct and Cross-Functional reporting represents a network even for a simple organisation. What about modelling a group?
  3. 3. Apiary Functionality • Structural Operations • Expand/Collapse levels • View lineage • Summary Data at all levels • CRUD on all data (nodes/relationships) • Link/De-link Sub-graphs or nodes • Evolving attributes of nodes and relationships based … • Adding new nodes and relationships • Mine Organisational Data • Affinities Graph - Who talks to whom the most • Discover Skills Communities • Detecting overlap using SLPA (Speaker-listener Label Propagation)
  4. 4. Convince ourselves first! • Anyone should ask - “What makes it a case for Graph DB? And can you prove it?” • It's basically a de-risking act. • Two major aspects that we looked at • Flexibility in schema evolution • Performance
  5. 5. What to compare against? • RDBMSs are a natural choice to compare against. • MongoDB, though a NoSQL document store, is • good for storing DDD style aggregates. • not for inter-connected data. • We picked Neo4j • But remember, this is not a battle, we are just trying to find out when you should use what!
  6. 6. Flexibility in Evolution
  7. 7. Flexibility in Evolution
  8. 8. Flexibility in Evolution • Entity Diversity • Different kinds of nodes
  9. 9. Flexibility in Evolution • Entity Diversity • Different kinds of nodes • Connection Diversity • links could have different weights, directions.
  10. 10. Flexibility in Evolution • Entity Diversity • Different kinds of nodes • Connection Diversity • links could have different weights, directions. • Evolution of Entities and Links, themselves over time. • Varietal data needs • Is every node/link structured regularly or irregularly, connected or disconnected nodes etc…
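The diversity described on slides 8 to 10 can be sketched as a tiny in-memory property graph. This is only an illustration in Python and not Neo4j's API: the `Node`, `Relationship` and `PropertyGraph` names, the `MEMBER_OF` relationship and the sample people are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    kind: str                                    # entity diversity: "Person", "Group", ...
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    kind: str                                    # connection diversity: the type of link
    start: Node
    end: Node                                    # direction is start -> end
    props: Dict[str, Any] = field(default_factory=dict)  # e.g. a weight


@dataclass
class PropertyGraph:
    nodes: List[Node] = field(default_factory=list)
    rels: List[Relationship] = field(default_factory=list)

    def add_node(self, kind: str, **props: Any) -> Node:
        node = Node(kind, dict(props))
        self.nodes.append(node)
        return node

    def relate(self, start: Node, kind: str, end: Node, **props: Any) -> Relationship:
        rel = Relationship(kind, start, end, dict(props))
        self.rels.append(rel)
        return rel


# Hypothetical sample data, for illustration only.
graph = PropertyGraph()
boss = graph.add_node("Person", name="Asha", level=1)
dev = graph.add_node("Person", name="Ravi", level=2)
guild = graph.add_node("Group", name="Data Guild")     # a different kind of node

graph.relate(boss, "DIRECTLY_MANAGES", dev)             # a link with no properties
graph.relate(dev, "MEMBER_OF", guild, weight=0.7)       # a weighted link

# Evolution over time: a new attribute appears on one node only,
# without a schema change for any other node.
dev.props["skills"] = ["Cypher", "SQL"]
```

In a relational schema, the last line would typically mean a new column or a new table; here it only touches the one node that needs it.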
  11. 11. Minimal set of functionality • Analysis Model (Phase 1) • 1) Neo4j Domain Model • Node: Person; Properties: name, type, level • Relationship: DIRECTLY_MANAGES; Properties: N/A • Note: For the purpose of establishing the case, we have modeled minimal relationships and not all the relationships that would have been in the final application. Below is a list of relationships that are yet pending to be modeled, but are not relevant for the purposes of taking performance measurements. • 2) SQL Domain Model • Queries: Above screen-flow and modeling for organizations and groups use cases requires us to run …
  12. 12. Measured performance of 3 queries
      • Subordinate names from current level until a visibility level
      • Aggregate Data from current level until a visibility level
      • Overall Aggregate Data for Dashboard (distribution of people at various levels)
      1) Gather Subordinates names till a visibility level from a current level
      • We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people.
      • Optimized generic Cypher query is: start n = node:Person(name = "fName lName") match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->m return nodes(p)
      • Where visibility level is a number that indicates the number of levels to show.
      • For SQL we have to recursively add joins for each level; a generic SQL can be written as: SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1, L2Reportees.directly_manages AS Reportee2, ... FROM person_reportee manager LEFT JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid ... ... WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
      2) Gather Subordinates Aggregate data from current level
      • We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people.
      • Further, the optimized generic Cypher query is: start n = node:Person(name = "fName lName") match n-[:DIRECTLY_MANAGES*0..(totalLevels - n.level)]->m-[:DIRECTLY_MANAGES*1..(totalLevels - n.level)]->o where n.level + visibilityLevel >= m.level return m.name as Subordinate, count(o) as Total
      • For SQL aggregate query, we not only have to recursively add joins but also perform inner unions for each level till the last level to obtain the aggregate data for that level. Once we obtain the data for a particular level (per person), we perform outer unions to get the final result for all the levels. This results in a very big SQL query.
      • Here is a sample query that returns aggregate data for 3 levels below the current level. Say, …
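To make "recursively add joins for each level" concrete, here is a minimal sketch that generates the self-join SQL from slide 12 for a given visibility level and runs it on an in-memory SQLite database. The function name `subordinates_sql`, the two-table layout (inferred from the column names in the slide's query) and the sample rows are assumptions for illustration; the talk measured MySQL and MS-SQL, not SQLite.

```python
import sqlite3


def subordinates_sql(visibility_level: int) -> str:
    """Build the self-join query for a number of levels to show (>= 1)."""
    columns = ["manager.pid AS Bigboss", "manager.directly_manages AS Subordinate"]
    joins = []
    previous = "manager"
    # Every level beyond the first costs one more LEFT JOIN on the same table.
    for depth in range(1, visibility_level):
        alias = f"L{depth}Reportees"
        columns.append(f"{alias}.directly_manages AS Reportee{depth}")
        joins.append(
            f"LEFT JOIN person_reportee {alias} "
            f"ON {previous}.directly_manages = {alias}.pid"
        )
        previous = alias
    return (
        "SELECT " + ", ".join(columns)
        + " FROM person_reportee manager "
        + " ".join(joins)
        + " WHERE manager.pid = (SELECT id FROM person WHERE name = ?)"
    )


# Assumed schema and hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE person_reportee (pid INTEGER, directly_manages INTEGER)")
conn.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "Top Boss"), (2, "Mid A"), (3, "Mid B"), (4, "Leaf A1"), (5, "Leaf A2")],
)
conn.executemany(
    "INSERT INTO person_reportee (pid, directly_manages) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 4), (2, 5)],
)

rows = conn.execute(subordinates_sql(2), ("Top Boss",)).fetchall()
for row in rows:
    print(row)
```

The query text grows with the visibility level, while the Cypher version only changes the upper bound in `*1..visibilityLevel`.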
  13. 13. Apples-to-Apples comparison • MySQL, MS-SQL and Neo4j • We did not use the Traversal API (though faster), because Cypher is to Neo4j what SQL is to RDBMSs • For the longer term, Neo4j intends to further Cypher query planning and optimisation. • Indexing • Enabled on join columns for MySQL and MS-SQL DBs • For Neo4j, Person names were indexed
  14. 14. Environment Consistency • Same machine for all databases • MySQL v5.6.12, MS-SQL Server 2008 R2, Neo4j v1.9 (Advanced) • DB and tools on the same machine • Avoid network transport times being factored in. • Out-of-box settings for all DBs apart from giving 3 GB to the java process that ran Neo4j in Embedded mode.
  15. 15. Functional Equivalence • Consistent data distribution. • 8 levels in the org, with people at each level managing the next, for all DBs. • Functionally equivalent queries. • Measurements taken for the worst-case queries the application would execute • Say the top boss logs in and wants to see all the levels (max. visibility level); that query will take the most time.
  16. 16. Measurement Tools • MS-SQL • Query Profiler • MySQL • We noted the duration (excluding fetch time) from MySQL Workbench • Neo4j • Executed parametric queries programmatically in Embedded Mode. • Did not use the Neo4j shell for measurements, as it's intended to be an Ops tool (and not a transactional tool).
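Timing a query programmatically, as described for Neo4j above, might look roughly like the sketch below. The slides do not say how cold and warm runs were defined, so this assumes "cold" is the first execution and "warm" is the average of the executions that follow; `measure_ms` and `fake_query` are hypothetical names, and the stand-in query only imitates a cache.

```python
import time
from statistics import mean
from typing import Callable, Dict


def measure_ms(run_query: Callable[[], object], warm_runs: int = 5) -> Dict[str, float]:
    """Time one first execution, then average several repeats (milliseconds)."""

    def timed() -> float:
        start = time.perf_counter()
        run_query()
        return (time.perf_counter() - start) * 1000.0

    cold = timed()                                    # assumed "cold": the first run
    warm = mean(timed() for _ in range(warm_runs))    # assumed "warm": the repeats
    return {"cold_ms": cold, "warm_ms": warm}


# Stand-in for a real database call: the first call fills a cache,
# later calls reuse it.
_cache: Dict[str, int] = {}


def fake_query() -> int:
    if "total" not in _cache:
        _cache["total"] = sum(range(200_000))
    return _cache["total"]


timings = measure_ms(fake_query)
print(timings)
```

For a real database, a truly cold measurement usually also needs a restart or a cache flush between runs, which this sketch does not attempt.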
  17. 17. Query 1) Gather Subordinates names till a visibility level from a current level
      • We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people.
      • Optimized generic Cypher query is: start n = node:Person(name = "fName lName") match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->m return nodes(p)
      • Where visibility level is a number that indicates the number of levels to show.
      • For SQL we have to recursively add joins for each level; a generic SQL can be written as: SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1, L2Reportees.directly_manages AS Reportee2, ... FROM person_reportee manager LEFT JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid ... ... WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
      • Queries by Visibility Level and Database, Visibility Level 2:
        Neo4j: start n = node:Person(name = "fName … match n-[:DIRECTLY_MANAGES*1..2]->m return nodes(p)
        MySQL/MSSQL: SELECT manager.pid AS Bigboss, manager.… Subordinate, L1Reportees.directly_manages …
  18. 18. Gather Subordinate Names
      • [Plots, warm cache: Gather Subordinate Names for 1000, 10K, 100K and 1M people; x-axis: Levels (3 to 8); y-axis: Query Execution Time (ms); series: MySQL, MSSQL, Neo4j]
      • [Plot, warm cache: LEFT Vs INNER Vs Neo4j; x-axis: Org Size-Levels (1M-3, 1M-5, 1M-7, 2M-8); y-axis: Query Execution Time (ms); series: MySQL - LEFT Join, MySQL - INNER, Neo4j]
      • Volume 10K People Org, query execution time (ms), shown as Cold/Warm. Names = Gather Subordinate Names, Aggregate = Gather Subordinate Aggregate, Overall = Overall Aggregate:
        Lvls | MySQL: Names, Aggregate, Overall | MS-SQL: Names, Aggregate, Overall | Neo4j: Names, Aggregate, Overall
        3 | 109/0, 140/31, 109/15 | 172/107, 294/181, 103/1 | 730/176, 6129/898, 2292/613
        4 | 141/16, 343/234, 109/0 | 229/116, 1267/156, 102/1 | 776/178, 5865/952, 2073/455
        5 | 94/15, 640/561, 94/15 | 270/164, 403/188, 327/1 | 727/184, 5718/1194, 2119/328
        6 | 94/0, 998/905, 78/0 | 401/123, 650/254, 38/3 | 742/221, 5692/1458, 3182/453
        7 | 93/0, 1482/1326, 78/0 | 303/177, 870/374, 68/1 | 705/185, 5710/1188, 1990/456
        8 | 93/0, 2745/2683, 78/0 | 387/163, 1146/573, 71/1 | 713/171, 6002/1271, 1976/36
      • Text fragments on the slide: “… application? Say, if the situation demanded that we change … would find ourselves in a bad position performance … that Neo4j’s performance is almost constant time … factor-in INDIRECTLY_MANAGES relationship, translate to additional join for SQL, thus bloating … increasing complexity. This will further degrade the … take more time.”
  19. 19. “… queries using inner join on 1M (all levels), 2M and 3M (for level 8), we … increase in query execution time for MySQL. Below are the results of … MySQL - Left Vs Inner Join from the Gather Subordinate Names query for 1M, 2M Level 8, and … us or any application? Say, if the situation demanded that we change …”
      • [Plot, warm cache: LEFT Vs INNER Vs Neo4j; x-axis: Org Size-Levels (1M-3, 1M-5, 1M-7, 2M-8); y-axis: Query Execution Time (ms); series: MySQL - LEFT Join, MySQL - INNER, Neo4j]
      • Table columns that survive on the slide (ms), rows in slide order:
        MySQL … JOIN warm | Neo4j cold | Neo4j warm
        16 | 718 | 176
        0 | 740 | 182
        874 | 721 | 184
        312 | 709 | 173
        3432 | 700 | 177
        15896 | 835 | 153
        36301 | 822 | 149
        61776 | 744 | 148
  20. 20. Performance
  21. 21. Performance
  22. 22. Performance • 1. Cartesian Product: all possible combinations
  23. 23. Performance • 1. Cartesian Product: all possible combinations
  24. 24. Performance • 1. Cartesian Product: all possible combinations • 2. Filter
  25. 25. Performance • 1. Cartesian Product: all possible combinations • 2. Filter
  26. 26. Performance • Data Size • 1. Cartesian Product: all possible combinations • 2. Filter • [Plot axis: DS]
  27. 27. Performance • Data Size • 1. Cartesian Product: all possible combinations • 2. Filter • [Plot axes: DS, Time]
  28. 28. Performance • Joins • Data Size • 1. Cartesian Product: all possible combinations • 2. Filter • [Plot axes: J, DS, Time]
  29. 29. Performance • Joins • Data Size • 1. Cartesian Product: all possible combinations • 2. Filter • [Diagram labels: Project, People, Skills] • [Plot axes: J, DS, Time]
  30. 30. Performance • Joins • Data Size • 1. Cartesian Product: all possible combinations • 2. Filter • [Diagram labels: Project, People, Skills] • [Plot axes: J, DS, Time] • Query is now localised to a section of graph, solves large data-size problem
  31. 31. Performance • Joins • Data Size • 1. Cartesian Product: all possible combinations • 2. Filter • [Diagram labels: Project, People, Skills] • [Plot axes: J, DS, Time] • Traversal • Query is now localised to a section of graph, solves large data-size problem
  32. 32. Performance • Joins • Data Size • 1. Cartesian Product: all possible combinations • 2. Filter • [Diagram labels: Project, People, Skills] • [Plot axes: J, DS, Time] • Traversal • Query is now localised to a section of graph, solves large data-size problem • [Plot axis: Time]
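The contrast drawn on slides 22 to 32 (a join as "Cartesian product, then filter" versus a traversal localised to one section of the graph) can be illustrated by counting how many items each approach examines. This is a deliberately naive model with hypothetical data and function names; real relational engines use indexes and smarter join algorithms rather than a full product, so only the direction of the comparison is meant to carry over.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical sample data.
skills = ["Cypher", "SQL", "Java"]
has_skill: List[Tuple[str, str]] = [
    ("Asha", "Cypher"), ("Asha", "SQL"), ("Ravi", "SQL"), ("Mei", "Java"),
]


def skills_by_join(person: str) -> Tuple[List[str], int]:
    """Naive join: form every (link, skill) combination, then filter."""
    examined = 0
    found: List[str] = []
    for (owner, linked_skill), skill in product(has_skill, skills):
        examined += 1                       # cost grows with total data size
        if owner == person and linked_skill == skill:
            found.append(skill)
    return found, examined


# Graph style: each node keeps direct references to its neighbours.
adjacency: Dict[str, List[str]] = {}
for owner, skill in has_skill:
    adjacency.setdefault(owner, []).append(skill)


def skills_by_traversal(person: str) -> Tuple[List[str], int]:
    """Traversal: follow only the links that leave the starting node."""
    neighbours = adjacency.get(person, [])
    return list(neighbours), len(neighbours)   # cost grows with local degree


print(skills_by_join("Asha"))
print(skills_by_traversal("Asha"))
```

Adding more people and skills to the sample data increases the first count but leaves the second unchanged for "Asha", which is the point the slides make about data size.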
  33. 33. Relationships • Not first-class citizens in RDBMSs and NoSQL aggregate stores • document stores or key-value stores. • Cost of executing connected queries is high. • “Who reports immediately to Smarty Pants?” - not a problem • But “Who all report to Smarty Pants?” introduces recursive joins, and going down more than 5-6 levels, the space and time complexity is very high. • “What skills does this person have?” is relatively cheaper than “Which people have these skills?” • What about “Which people who have these skills also have those skills?”
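When relationships are stored as direct links, "Who all report to Smarty Pants?" becomes a plain walk over those links. A minimal sketch, assuming an in-memory adjacency map with hypothetical names; the optional `max_depth` plays the role of the visibility level, and `max_depth=1` answers the "reports immediately" question.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical DIRECTLY_MANAGES links: manager -> direct reports.
directly_manages: Dict[str, List[str]] = {
    "Smarty Pants": ["Ann", "Bob"],
    "Ann": ["Cid"],
    "Cid": ["Dee"],
}


def all_reports(manager: str, max_depth: Optional[int] = None) -> List[str]:
    """Breadth-first walk over DIRECTLY_MANAGES links, optionally depth-limited."""
    found: List[str] = []
    seen = {manager}
    queue = deque([(manager, 0)])
    while queue:
        person, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue                        # do not expand below the limit
        for report in directly_manages.get(person, []):
            if report not in seen:          # guard against cycles in bad data
                seen.add(report)
                found.append(report)
                queue.append((report, depth + 1))
    return found


print(all_reports("Smarty Pants", max_depth=1))
print(all_reports("Smarty Pants"))
```

Going one level deeper costs one more hop per person reached, with no change to the code, whereas the join-based SQL needs another self-join per level.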
  34. 34. Gather Subordinates Aggregate Data
      • [Plots, warm cache: Gather Subordinate Aggregate for 1000, 10K, 100K and 1M people; x-axis: Levels (3 to 8); y-axis: Query Execution Time (ms); series: MySQL, MSSQL, Neo4j]
      • Query execution time (ms), shown as Cold/Warm. Names = Gather Subordinate Names, Aggregate = Gather Subordinate Aggregate, Overall = Overall Aggregate.
      • Volume 1000 People Org (the MySQL Names column is cut off on this slide):
        Lvls | MySQL: Names, Aggregate, Overall | MS-SQL: Names, Aggregate, Overall | Neo4j: Names, Aggregate, Overall
        3 | …, 109/0, 93/0 | 226/0, 49/0, 6/0 | 679/164, 3329/760, 1538/269
        4 | …, 94/16, 78/16 | 114/82, 133/61, 35/0 | 670/178, 2878/851, 892/183
        5 | …, 62/15, 47/0 | 212/43, 138/13, 59/0 | 813/166, 2848/763, 1533/221
        6 | …, 156/62, 63/0 | 118/78, 278/72, 26/0 | 699/171, 3607/1124, 1387/319
        7 | …, 94/47, 62/0 | 151/90, 420/71, 13/0 | 728/177, 3123/929, 1457/357
        8 | …, 141/31, 78/0 | 231/46, 621/84, 18/0 | 721/164, 2879/1040, 906/207
      • Volume 100K People Org:
        Lvls | MySQL: Names, Aggregate, Overall | MS-SQL: Names, Aggregate, Overall | Neo4j: Names, Aggregate, Overall
        3 | 124/0, 1107/998, 140/31 | 617/537, 1201/608, 168/12 | 736/166, 13340/4404, 5157/566
        4 | 109/0, 2231/2106, 140/31 | 1224/650, 1263/667, 157/19 | 812/167, 14686/4697, 5396/594
        5 | 124/16, 5616/5460, 125/31 | 1377/757, 1830/913, 194/12 | 713/167, 15351/5516, 5118/497
        6 | 109/0, 9734/9625, 140/31 | 1006/811, 3002/1274, 159/12 | 756/175, 17056/6145, 4736/573
        7 | 93/0, 16271/16100, 156/31 | 1125/784, 2947/1566, 158/14 | 700/177, 17821/6915, 4592/592
        8 | 109/0, 25334/25287, 125/32 | 1285/951, 4256/2171, 510/14 | 691/183, 19881/8304, 4731/557
      • Volume 1M People Org:
        Lvls | MySQL: Names, Aggregate, Overall | MS-SQL: Names, Aggregate, Overall | Neo4j: Names, Aggregate, Overall
        3 | 109/16, 10998/10171, 452/281 | 6337/4973, 22703/20037, 798/116 | 718/176, 109432/44265, 37190/2109
        4 | 172/0, 20467/19734, 515/280 | 5677/5280, 14888/7813, 1245/114 | 740/182, 115011/106260, 36033/2194
        5 | 172/16, 60606/60653, 531/281 | 7111/5959, 12047/8293, 1211/140 | 721/184, 169123/77528, 34068/2248
        6 | 141/0, 92789/92867, 483/265 | 7278/6266, 15226/7076, 1251/140 | 709/173, 255813/124810, 33015/2107
        7 | 218/0, 160151/158793, 499/280 | 11890/6724, 15226/7076, 1330/111 | 700/177, 171607/125152, 32813/1960
        8 | 265/0, 281301/280630, 577/265 | 11300/8777, 20560/14564, 976/121 | 835/153, 136432/73121, 23673/1912
  35. 35. Gather Overall Aggregate Query • Cypher: start n = node(*) return n.level as Level, count(n) as Total order by Level • SQL: SELECT level, count(id) FROM person GROUP BY level
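The overall aggregate is a plain group-and-count, which is easy to try out. A minimal sketch using Python's built-in `sqlite3` with hypothetical rows; the `person` columns follow the name and level properties listed on slide 11, and `ORDER BY level` is added here only to make the output order stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, level INTEGER)")
conn.executemany(
    "INSERT INTO person (id, name, level) VALUES (?, ?, ?)",
    [(1, "Top Boss", 1), (2, "Mid A", 2), (3, "Mid B", 2), (4, "Leaf A1", 3)],
)

# Same shape as the SQL on the slide: one scan of the table, no joins.
distribution = conn.execute(
    "SELECT level, count(id) FROM person GROUP BY level ORDER BY level"
).fetchall()
print(distribution)
```

No relationships are followed here, so this query is a full scan in either store rather than a traversal.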
  36. 36. Gather Overall Aggregate Data
      • [Plots, warm cache: Overall Aggregate for 1000, 10K, 100K and 1M people; x-axis: Levels (3 to 8); y-axis: Query Execution Time (ms); series: MySQL, MSSQL, Neo4j]
  37. 37. But, what if... • I need to aggregate data, placing an OLAPish demand on the data? • Use non-graph stores: RDBMS or NoSQL alongside. • Graph Compute Engines are optimised for scanning and processing large amounts of information in batch. • Giraph, Pegasus etc… • Polyglot persistence is the norm. • Has anyone tried Datomic?
  38. 38. Asking Questions • Is your data connected? • Or does your domain naturally gravitate towards, or have, a good number of joins (dense connections), leading to an explosion with large data? • For example: Making Recommendations • Or are you finding yourself writing complex SQL queries? • For example: recursive joins or lots of joins
  39. 39. Qs?
  40. 40. References • Graph Databases • Jim Webber, Ian Robinson and Emil Eifrem • Apiary: A case for Neo4j? • Anuj Mathur and Dhaval Dalal • Code and scaffolding programs that we used are available at https://github.com/EqualExperts/Apiary-Neo4j-RDBMS-Comparison
