A case for 
Graph Database? 
dhaval.dalal@software-artisan.com 
@softwareartisan 
11th Sept 2014
Context 
Direct and Cross-Functional reporting represents a network even for 
a simple organisation. 
What about modelling a group?
Apiary Functionality
Structural Operations
• Expand/Collapse levels
• View lineage
• Summary Data at all levels
• CRUD on all data (nodes/relationships)
• Link/De-link Sub-graphs or nodes
• Evolving attributes of nodes and relationships based
• Adding new nodes and relationships
Mine Organisational Data
• Affinities Graph - Who talks to whom the most
• Discover Skills Communities
• Detecting overlap using SLPA (Speaker-listener Label Propagation)
Convince ourselves first!
• Anyone should ask - “What makes it a case for Graph DB? And can you prove it?”
• It's basically a de-risking act.
• Two major aspects that we looked at 
• Flexibility in schema evolution 
• Performance
What to compare against? 
• RDBMSs are a natural choice to compare against.
• MongoDB, though a NoSQL document store 
• good for storing DDD style aggregates. 
• not for inter-connected data. 
• We picked Neo4j 
• But remember, this is not a battle; we are just trying to find out when you should use what!
Flexibility in Evolution 
• Entity Diversity 
• Different kinds of nodes 
• Connection Diversity 
• links could have different weights, directions. 
• Evolution of Entities and Links themselves over time.
• Varietal data needs
• Is every node/link structured regularly or irregularly? Are nodes connected or disconnected? etc…
Minimal set of functionality
Analysis Model (Phase 1)
1) Neo4J Domain Model
Node      Properties
Person    name, type, level
Relationships       Properties
DIRECTLY_MANAGES    N/A
Note: For the purpose of establishing the case, we have modeled minimal relationships and not all the relationships that would have been in the final application. Below is a list of relationships that are yet pending to be modeled, but are not relevant for the purposes of taking performance measurements.
2) SQL Domain Model
Measured performance of 3 queries
• Subordinate names from current level until a visibility level
• Aggregate Data from current level until a visibility level
• Overall Aggregate Data for Dashboard
• distribution of people at various levels
Queries
Above screen-flow and modeling for organizations and groups use cases requires us to run
1) Gather Subordinates names till a visibility level from a current level
We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:
start n = node:Person(name = "fName lName")
match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->m
return nodes(p)
Where visibility level is a number that indicates the number of levels to show.
For SQL we have to recursively add joins for each level, a generic SQL can be written as:
SELECT manager.pid AS Bigboss,
manager.directly_manages AS Subordinate,
L1Reportees.directly_manages AS Reportee1,
L2Reportees.directly_manages AS Reportee2,
...
FROM person_reportee manager
LEFT JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
LEFT JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
...
...
WHERE manager.pid = (SELECT id
FROM person
WHERE name = "fName lName")
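The "add one join per level" rule for the SQL version of the Gather Subordinate Names query can be sketched in Python. This is only an illustration, not the scaffolding code from the study: the function names, the sample rows and the column types are assumptions, and SQLite stands in for MySQL/MS-SQL. Only the table and column names (person, person_reportee, pid, directly_manages) come from the query shown in the deck.

```python
import sqlite3


def build_subordinates_sql(visibility_level: int) -> str:
    """Build the self-join query for `visibility_level` levels below a manager.

    Level 1 needs only the `manager` row; every further level adds one
    LEFT JOIN of person_reportee against the previous alias.
    """
    if visibility_level < 1:
        raise ValueError("visibility_level must be >= 1")
    columns = ["manager.pid AS Bigboss", "manager.directly_manages AS Subordinate"]
    joins = []
    previous = "manager"
    for depth in range(1, visibility_level):
        alias = f"L{depth}Reportees"
        columns.append(f"{alias}.directly_manages AS Reportee{depth}")
        joins.append(
            f"LEFT JOIN person_reportee {alias} "
            f"ON {previous}.directly_manages = {alias}.pid"
        )
        previous = alias
    return (
        "SELECT " + ", ".join(columns)
        + " FROM person_reportee manager "
        + " ".join(joins)
        + " WHERE manager.pid = (SELECT id FROM person WHERE name = ?)"
    )


def demo_rows():
    """Run the generated query on a tiny, made-up three-person hierarchy."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE person_reportee (pid INTEGER, directly_manages INTEGER);
        INSERT INTO person VALUES (1, 'Boss'), (2, 'Lead'), (3, 'Dev');
        INSERT INTO person_reportee VALUES (1, 2), (2, 3);
        """
    )
    rows = conn.execute(build_subordinates_sql(2), ("Boss",)).fetchall()
    conn.close()
    return rows
```

Note how the query text itself grows with the visibility level, whereas the Cypher version only changes the upper bound of the variable-length pattern.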
2) Gather Subordinates Aggregate data from current level 
We have varied total hierarchy levels in the 
organization from 3 to 8 for different volumes 
of people. 
Further, the optimized generic Cypher query is:
start n = node:Person(name = "fName lName")
match n-[:DIRECTLY_MANAGES*0..(totalLevels - n.level)]->m-[:DIRECTLY_MANAGES*1..(totalLevels - n.level)]->o
where n.level + visibilityLevel >= m.level
return m.name as Subordinate, count(o) as Total
For SQL aggregate query, we not only have to recursively add joins but also perform inner 
unions for each level till the last level to obtain the aggregate data for that level. Once we 
obtain the data for a particular level (per person), we perform outer unions to get the final 
result for all the levels. This results in a very big SQL query. 
Here is a sample query that returns aggregate data for 3 levels below the current level. Say,
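What the aggregate query computes can be sketched in plain Python over an in-memory hierarchy. This is a hypothetical illustration, not the study's code: `manages` is an assumed adjacency mapping from a person to their direct reports, and the hierarchy is assumed to be a tree (one manager per person, no cycles).

```python
from collections import deque


def subordinate_totals(manages, root, visibility_level):
    """Return {person: number of direct and indirect subordinates} for `root`
    and everyone at most `visibility_level` levels below it."""

    def count_descendants(person):
        total, stack = 0, list(manages.get(person, []))
        while stack:
            current = stack.pop()
            total += 1
            stack.extend(manages.get(current, []))
        return total

    totals = {}
    queue = deque([(root, 0)])
    while queue:
        person, depth = queue.popleft()
        count = count_descendants(person)
        if count:
            # The Cypher pattern needs at least one `o` below `m`,
            # so people without subordinates produce no row.
            totals[person] = count
        if depth < visibility_level:
            queue.extend((child, depth + 1) for child in manages.get(person, []))
    return totals


# Made-up sample hierarchy for illustration only.
sample = {
    "Boss": ["Lead1", "Lead2"],
    "Lead1": ["Dev1", "Dev2"],
    "Lead2": ["Dev3"],
}
print(subordinate_totals(sample, "Boss", 1))
```

Each row of the result corresponds to one `Subordinate, Total` pair returned by the Cypher query.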
Apples-to-Apples comparison
• MySQL, MS-SQL and Neo4j
• We did not use the Traversal API (though faster), because Cypher is to Neo4j what SQL is to an RDBMS.
• In the longer term, Neo4j intends to improve Cypher query planning and optimisation.
• Indexing 
• Enabled on join columns for MySQL and MS-SQL DBs 
• For Neo4j, Person names were indexed
Environment Consistency 
• Same machine for all databases 
• MySQL v5.6.12, MS-SQL Server 2008 R2, Neo4j v1.9 
(Advanced) 
• DB and tools on the same machine 
• Avoid network transport times being factored in. 
• Out-of-box settings for all DBs apart from giving 3 
GB to the java process that ran Neo4j in Embedded 
mode.
Functional Equivalence 
• Consistent data distribution. 
• 8 Levels in org with people at each level managing the next 
for all DBs. 
• Functionally equivalent queries. 
• Measurements are for the worst-case query scenario executed by the application.
• Say the top boss logs in and wants to see all the levels (max. visibility level): that query will take the most time.
Measurement Tools 
• MS-SQL 
• Query Profiler 
• MySQL 
• we noted the duration (excluding fetch time) from MySQL 
workbench 
• Neo4j 
• Executed parametric queries programmatically in Embedded Mode.
• Did not use the Neo4j shell for measurements, as it's intended to be an Ops tool (and not a transactional tool).
Gather Subordinate Names Query
Visibility Level: 2
Neo4j
start n = node:Person(name = "fName lName")
match p = n-[:DIRECTLY_MANAGES*1..2]->m
return nodes(p)
MySQL/MSSQL
SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages ...
Gather Subordinate Names
[Figure: Warm Cache Plots. Query Execution Time (ms) vs Levels (3 to 8) for MySQL, MS-SQL and Neo4j; panels: 1000 People, 10K People, 100K People and 1M People - Gather Subordinate Names]
In the tables below each cell is Cold/Warm query execution time in ms. Names = Gather Subordinate Names, Aggregate = Gather Subordinate Aggregate, Overall = Overall Aggregate. "…" marks values clipped in the source.
Volume 1000 People Org
        MySQL                          MS-SQL                         Neo4j
Lvls    Names    Aggregate  Overall    Names    Aggregate  Overall    Names    Aggregate  Overall
3       …/…      109/0      93/0       226/0    49/0       6/0        679/164  3329/760   1538/269
4       …/…      94/16      78/16      114/82   133/61     35/0       670/178  2878/851   892/183
5       …/…      62/15      47/0       212/43   138/13     59/0       813/166  2848/763   1533/221
6       125/0    156/62     63/0       118/78   278/72     26/0       699/171  3607/1124  1387/319
7       109/16   94/47      62/0       151/90   420/71     13/0       728/177  3123/929   1457/357
8       110/0    141/31     78/0       231/46   621/84     18/0       721/164  2879/1040  906/207
Volume 10K People Org
        MySQL                          MS-SQL                         Neo4j
Lvls    Names    Aggregate  Overall    Names    Aggregate  Overall    Names    Aggregate  Overall
3       109/0    140/31     109/15     172/107  294/181    103/1      730/176  6129/898   2292/613
4       141/16   343/234    109/0      229/116  1267/156   102/1      776/178  5865/952   2073/455
5       94/15    640/561    94/15      270/164  403/188    327/1      727/184  5718/1194  2119/328
6       94/0     998/905    78/0       401/123  650/254    38/3       742/221  5692/1458  3182/453
7       93/0     1482/1326  78/0       303/177  870/374    68/1       705/185  5710/1188  1990/456
8       93/0     2745/2683  78/0       387/163  1146/573   71/1       713/171  6002/1271  1976/36
MySQL - Left Vs Inner Join
… queries using inner join on 1M (all levels), 2M and 3M (for level 8), we …
… increase in query execution time for MySQL. Below are the results of …
… from the Gather Subordinate Names query for 1M, 2M Level 8, and …
[Figure: Warm Cache Plots. LEFT Vs INNER Vs Neo4j; Query Execution Time (ms) vs Org Size-Levels (1M-3 to 2M-8); series: MySQL - LEFT Join, MySQL - INNER, Neo4j]
Org Size-Levels   … JOIN (warm)   Neo4j (cold)   Neo4j (warm)
1M-3              16              718            176
1M-4              0               740            182
1M-5              874             721            184
1M-6              312             709            173
1M-7              3432            700            177
1M-8              15896           835            153
2M-8              36301           822            149
3M-8              61776           744            148
… us or any application? Say, if the situation demanded that we change …
… would find ourselves in a bad position performance …
… that Neo4j’s performance is almost constant time …
… factor-in INDIRECTLY_MANAGES relationship, …
… translate to additional join for SQL, thus bloating …
… increasing complexity. This will further degrade the …
… take more time.
Performance
Joins
1. Cartesian Product
All possible combinations
2. Filter
Traversal
Query is now localised to a section of graph, solves large data-size problem
[Figure: diagram with labels Data Size (DS), Joins (J) and Time; example graph of Project, People and Skills]
Relationships 
• Not first-class citizens in RDBMSs and NoSQL aggregate stores
• document stores or key-value stores.
• Cost of executing connected queries is high.
• “Who reports immediately to Smarty Pants?” - not a problem
• But “Who all report to Smarty Pants?” introduces recursive joins, and going down more than 5-6 levels makes the space and time complexity very high.
• “What skills does this person have?” is relatively cheaper than “Which people have these skills?”
• What about “Which people who have these skills also have those skills?”
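The cost difference behind “Who all report to Smarty Pants?” can be sketched with a deliberately naive Python model. It is an assumption-laden toy, not a description of how any real engine executes: the "join" side re-scans the whole table at every level with no index, the traversal side follows only the relationships attached to visited nodes, and the hierarchy is assumed to have no cycles. All names and sample data are made up.

```python
def reports_via_joins(rows, boss):
    """Relational style: each level scans every (manager, reportee) row, then filters."""
    found, frontier, rows_examined = [], {boss}, 0
    while frontier:
        next_frontier = set()
        for manager, reportee in rows:  # full scan per level
            rows_examined += 1
            if manager in frontier:
                next_frontier.add(reportee)
        found.extend(sorted(next_frontier))
        frontier = next_frontier
    return found, rows_examined


def reports_via_traversal(adjacency, boss):
    """Graph style: work is proportional to the relationships actually followed."""
    found, stack, edges_followed = [], [boss], 0
    while stack:
        person = stack.pop()
        for reportee in adjacency.get(person, []):
            edges_followed += 1
            found.append(reportee)
            stack.append(reportee)
    return found, edges_followed


# Two unrelated reporting chains; only one belongs to Smarty Pants.
rows = [("Smarty Pants", "A"), ("A", "B"), ("Other", "X"), ("X", "Y")]
adjacency = {"Smarty Pants": ["A"], "A": ["B"], "Other": ["X"], "X": ["Y"]}
print(reports_via_joins(rows, "Smarty Pants"))
print(reports_via_traversal(adjacency, "Smarty Pants"))
```

In this toy model the scan count grows with the total table size times the depth, while the traversal count depends only on the part of the graph reachable from the starting node.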
Gather Subordinates Aggregate Data
[Figure: Warm Cache Plots. Query Execution Time (ms) vs Levels (3 to 8) for MySQL, MS-SQL and Neo4j; panels: 1000 People, 10K People, 100K People and 1M People - Gather Subordinate Aggregate]
In the tables below each cell is Cold/Warm query execution time in ms. Names = Gather Subordinate Names, Aggregate = Gather Subordinate Aggregate, Overall = Overall Aggregate.
Volume 100K People Org
        MySQL                            MS-SQL                           Neo4j
Lvls    Names    Aggregate    Overall    Names     Aggregate   Overall    Names    Aggregate    Overall
3       124/0    1107/998     140/31     617/537   1201/608    168/12     736/166  13340/4404   5157/566
4       109/0    2231/2106    140/31     1224/650  1263/667    157/19     812/167  14686/4697   5396/594
5       124/16   5616/5460    125/31     1377/757  1830/913    194/12     713/167  15351/5516   5118/497
6       109/0    9734/9625    140/31     1006/811  3002/1274   159/12     756/175  17056/6145   4736/573
7       93/0     16271/16100  156/31     1125/784  2947/1566   158/14     700/177  17821/6915   4592/592
8       109/0    25334/25287  125/32     1285/951  4256/2171   510/14     691/183  19881/8304   4731/557
Volume 1M People Org
        MySQL                              MS-SQL                              Neo4j
Lvls    Names    Aggregate      Overall    Names       Aggregate    Overall    Names    Aggregate      Overall
3       109/16   10998/10171    452/281    6337/4973   22703/20037  798/116    718/176  109432/44265   37190/2109
4       172/0    20467/19734    515/280    5677/5280   14888/7813   1245/114   740/182  115011/106260  36033/2194
5       172/16   60606/60653    531/281    7111/5959   12047/8293   1211/140   721/184  169123/77528   34068/2248
6       141/0    92789/92867    483/265    7278/6266   15226/7076   1251/140   709/173  255813/124810  33015/2107
7       218/0    160151/158793  499/280    11890/6724  15226/7076   1330/111   700/177  171607/125152  32813/1960
8       265/0    281301/280630  577/265    11300/8777  20560/14564  976/121    835/153  136432/73121   23673/1912
Gather Overall Aggregate Query
Neo4j
start n = node(*)
return n.level as Level, count(n) as Total
order by Level
MySQL/MSSQL
SELECT level, count(id)
FROM person
GROUP BY level
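The SQL form of the overall aggregate can be tried on a few rows with Python's built-in SQLite. This is a minimal sketch under assumptions: the column types and sample people are invented, SQLite stands in for MySQL/MS-SQL, and an ORDER BY is added here (it is not in the SQL above) so the output order matches the Cypher query's "order by Level".

```python
import sqlite3


def level_distribution(people):
    """Return [(level, head_count), ...] for an iterable of (name, level) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, level INTEGER)"
    )
    conn.executemany("INSERT INTO person (name, level) VALUES (?, ?)", people)
    rows = conn.execute(
        # ORDER BY is an addition for deterministic output.
        "SELECT level, count(id) FROM person GROUP BY level ORDER BY level"
    ).fetchall()
    conn.close()
    return rows


# Made-up sample: one person at level 1, two at level 2.
print(level_distribution([("Boss", 1), ("Lead1", 2), ("Lead2", 2)]))
```

Because this query groups over the whole person table rather than following relationships, it is a scan in either database, which is the kind of OLAP-style demand where a graph traversal offers no locality.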
Gather Overall Aggregate Data
[Figure: Warm Cache Plots. Query Execution Time (ms) vs Levels (3 to 8) for MySQL, MS-SQL and Neo4j; panels: 1000 People, 10K People, 100K People and 1M People - Overall Aggregate]
But, what if... 
• I need to aggregate data, placing an OLAPish demand on 
the data? 
• Use non-graph stores: RDBMS or NoSQL alongside. 
• Graph Compute Engines are optimised for scanning 
and processing large amounts of information in batch. 
• Giraph, Pegasus etc… 
• Polyglot persistence is the norm.
• Has anyone tried Datomic?
Asking Questions 
• Is your data connected? 
• Or does your domain naturally gravitate towards, or have, a good number of joins (Dense Connections), leading to an explosion with large data?
• For example: Making Recommendations 
• Or are you finding yourself writing complex SQL 
Queries? 
• For example: recursive joins or lots of joins
Qs?
References 
• Graph Databases 
• Jim Webber, Ian Robinson and Emil Eifrem 
• Apiary: A case for Neo4j? 
• Anuj Mathur and Dhaval Dalal 
• Code and scaffolding programs that we used are available on - https://github.com/EqualExperts/Apiary-Neo4j-RDBMS-Comparison

More Related Content

What's hot

Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Lucidworks
 

What's hot (15)

High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Dictionary Based Annotation at Scale with Spark by Sujit Pal
Dictionary Based Annotation at Scale with Spark by Sujit PalDictionary Based Annotation at Scale with Spark by Sujit Pal
Dictionary Based Annotation at Scale with Spark by Sujit Pal
 
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
Twitter Search Architecture
Twitter Search Architecture Twitter Search Architecture
Twitter Search Architecture
 
8. key value databases laboratory
8. key value databases laboratory 8. key value databases laboratory
8. key value databases laboratory
 
Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...
Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...
Introduction to Galaxy (UEB-UAT Bioinformatics Course - Session 2.2 - VHIR, B...
 
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
 
Predicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesPredicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensembles
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6
 
Big data 101 for beginners riga dev days
Big data 101 for beginners riga dev daysBig data 101 for beginners riga dev days
Big data 101 for beginners riga dev days
 
Slick – the modern way to access your Data
Slick – the modern way to access your DataSlick – the modern way to access your Data
Slick – the modern way to access your Data
 
P9 speed of-light faceted search via oracle in-memory option by alexander tok...
P9 speed of-light faceted search via oracle in-memory option by alexander tok...P9 speed of-light faceted search via oracle in-memory option by alexander tok...
P9 speed of-light faceted search via oracle in-memory option by alexander tok...
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
 

Viewers also liked

La bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphesLa bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphes
Cédric Fauvet
 

Viewers also liked (20)

20150624 Belgian GraphDB meetup at Ordina
20150624 Belgian GraphDB meetup at Ordina20150624 Belgian GraphDB meetup at Ordina
20150624 Belgian GraphDB meetup at Ordina
 
Graph Databases for SQL Server Professionals - SQLSaturday #350 Winnipeg
Graph Databases for SQL Server Professionals - SQLSaturday #350 WinnipegGraph Databases for SQL Server Professionals - SQLSaturday #350 Winnipeg
Graph Databases for SQL Server Professionals - SQLSaturday #350 Winnipeg
 
NoSQL Now! Introduction to Graph Databases
NoSQL Now! Introduction to Graph DatabasesNoSQL Now! Introduction to Graph Databases
NoSQL Now! Introduction to Graph Databases
 
La bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphesLa bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphes
 
Introduction à Neo4j - La base de données de graphes - 2016
Introduction à Neo4j - La base de données de graphes - 2016Introduction à Neo4j - La base de données de graphes - 2016
Introduction à Neo4j - La base de données de graphes - 2016
 
Intro to Neo4j - Nicole White
Intro to Neo4j - Nicole WhiteIntro to Neo4j - Nicole White
A case-for-graph-db

  • 1. A case for Graph Database? dhaval.dalal@software-artisan.com @softwareartisan 11th Sept 2014
  • 2. Context Direct and Cross-Functional reporting represents a network even for a simple organisation. What about modelling a group?
  • 3. Apiary Functionality. Structural Operations: • Expand/Collapse levels • View lineage • Summary Data at all levels • CRUD on all data (nodes/relationships) • Link/De-link Sub-graphs or nodes • Evolving attributes of nodes and relationships based • Adding new nodes and relationships. Mine Organisational Data: • Affinities Graph - Who talks to whom the most • Discover Skills Communities • Detecting overlap using SLPA (Speaker-listener Label Propagation)
  • 4. Convince ourselves first! • Anyone should ask - “What makes it a case for Graph DB? And can you prove it?” • It's basically a de-risking act. • Two major aspects that we looked at • Flexibility in schema evolution • Performance
  • 5. What to compare against? • RDBMSs are a natural choice to compare against. • MongoDB, though a NoSQL document store, is • good for storing DDD style aggregates. • not for inter-connected data. • We picked Neo4j • But remember, this is not a battle, we are just trying to find out when you should use what!
  • 8. Flexibility in Evolution • Entity Diversity • Different kinds of nodes
  • 9. Flexibility in Evolution • Entity Diversity • Different kinds of nodes • Connection Diversity • links could have different weights, directions.
  • 10. Flexibility in Evolution • Entity Diversity • Different kinds of nodes • Connection Diversity • links could have different weights, directions. • Evolution of Entities and Links themselves over time. • Varietal data needs • Is every node/link structured regularly or irregularly, are nodes connected or disconnected, etc…
  • 11. Minimal set of functionality. Analysis Model (Phase 1).
    1) Neo4J Domain Model. Node: Person. Properties: name, type, level. Relationships: DIRECTLY_MANAGES. Properties: N/A.
    Note: For the purpose of establishing the case, we have modeled minimal relationships and not all the relationships that would have been in the final application. Below is a list of relationships that are yet pending to be modeled, but are not relevant for the purposes of taking performance measurements.
    2) SQL Domain Model.
    Queries: Above screen-flow and modeling for organizations and groups use cases requires us to run ...
  • 12. Measured performance of 3 queries • Subordinate names from current level until a visibility level • Aggregate Data from current level until a visibility level • Overall Aggregate Data for Dashboard (distribution of people at various levels)
    1) Gather Subordinates names till a visibility level from a current level. We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:
      start n = node:Person(name = "fName lName")
      match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->m
      return nodes(p)
    Where visibility level is a number that indicates the number of levels to show. For SQL we have to recursively add joins for each level; a generic SQL can be written as:
      SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1, L2Reportees.directly_manages AS Reportee2, ...
      FROM person_reportee manager
      LEFT JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
      LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
      ... ...
      WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
    2) Gather Subordinates Aggregate data from current level. We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Further, the optimized generic Cypher query is:
      start n = node:Person(name = "fName lName")
      match n-[:DIRECTLY_MANAGES*0..(totalLevels - n.level)]->m-[:DIRECTLY_MANAGES*1..(totalLevels - n.level)]->o
      where n.level + visibilityLevel >= m.level
      return m.name as Subordinate, count(o) as Total
    For the SQL aggregate query, we not only have to recursively add joins but also perform inner unions for each level till the last level to obtain the aggregate data for that level. Once we obtain the data for a particular level (per person), we perform outer unions to get the final result for all the levels. This results in a very big SQL query.
    Here is a sample query that returns aggregate data for 3 levels below the current level. Say, ...
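The "one more join per level" shape of the SQL query above can be sketched in Python. This is only an illustration, not the deck's code: the helper name `subordinate_names_sql` is hypothetical, the table and column names (`person`, `person_reportee`, `pid`, `directly_manages`) are taken from the query on the slide, and the name is passed as a bound parameter rather than a literal.

```python
import sqlite3  # only needed to try the generated SQL against an in-memory DB


def subordinate_names_sql(visibility_level: int) -> str:
    """Build the self-join query that reaches `visibility_level` levels below a manager.

    Level 1 needs no join (manager.directly_manages is already the first level);
    every further level adds one LEFT JOIN, as described on the slide.
    """
    if visibility_level < 1:
        raise ValueError("visibility_level must be at least 1")
    columns = ["manager.pid AS Bigboss", "manager.directly_manages AS Subordinate"]
    joins = []
    previous = "manager"
    for depth in range(1, visibility_level):
        alias = f"L{depth}Reportees"
        columns.append(f"{alias}.directly_manages AS Reportee{depth}")
        joins.append(
            f"LEFT JOIN person_reportee {alias} "
            f"ON {previous}.directly_manages = {alias}.pid"
        )
        previous = alias
    return (
        "SELECT " + ", ".join(columns)
        + " FROM person_reportee manager "
        + " ".join(joins)
        + " WHERE manager.pid = (SELECT id FROM person WHERE name = ?)"
    )
```

Calling `subordinate_names_sql(8)` yields seven joins, which is the query growth the comparison is about; the Cypher version only changes the upper bound of `*1..visibilityLevel`.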
  • 13. Apple-to-Apple comparison • MySQL, MS-SQL and Neo4j • We did not use the Traversal API (though faster), because Cypher is to Neo4j what SQL is to RDBMSs. • In the longer term, Neo4j intends to further Cypher query planning and optimisation. • Indexing • Enabled on join columns for MySQL and MS-SQL DBs • For Neo4j, Person names were indexed
  • 14. Environment Consistency • Same machine for all databases • MySQL v5.6.12, MS-SQL Server 2008 R2, Neo4j v1.9 (Advanced) • DB and tools on the same machine • Avoid network transport times being factored in. • Out-of-box settings for all DBs apart from giving 3 GB to the java process that ran Neo4j in Embedded mode.
  • 15. Functional Equivalence • Consistent data distribution. • 8 levels in the org, with people at each level managing the next, for all DBs. • Functionally equivalent queries. • Measurements for the worst-case query scenario executed by the application • Say the top boss logs in and wants to see all the levels (max. visibility level): that query will take the most time.
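A consistent data distribution of this kind could be generated along the following lines, so that every database is loaded from the same rows. This is a sketch under assumptions: the deck does not say how its data was produced, and the function name `build_org` and the fixed `fan_out` per manager are hypothetical.

```python
from typing import List, Tuple


def build_org(levels: int, fan_out: int) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (people, manages) for a strict hierarchy.

    people  : (person_id, level) rows, level 1 being the top boss
    manages : (manager_id, report_id) rows, one per DIRECTLY_MANAGES link
    Every person at a level manages `fan_out` people at the next level
    (an assumed, uniform distribution).
    """
    people = [(1, 1)]
    manages = []
    current = [1]
    next_id = 2
    for level in range(2, levels + 1):
        following = []
        for manager_id in current:
            for _ in range(fan_out):
                people.append((next_id, level))
                manages.append((manager_id, next_id))
                following.append(next_id)
                next_id += 1
        current = following
    return people, manages
```

The same two lists can then be written to the relational tables and to the graph, which keeps the comparison functionally equivalent.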
  • 16. Measurement Tools • MS-SQL • Query Profiler • MySQL • we noted the duration (excluding fetch time) from MySQL Workbench • Neo4j • Executed parametric queries programmatically in Embedded Mode. • Did not use the Neo4j shell for measurements as it's intended to be an Ops tool (and not a transactional tool).
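The result tables in this deck report "Cold" and "Warm" timings. A minimal harness for taking such a pair programmatically might look like this. It assumes "cold" means the first execution and "warm" the mean of repeat executions; the deck does not spell out its exact procedure, and `measure_cold_and_warm` and `run_query` are hypothetical names.

```python
import time
from typing import Callable, List, Tuple


def measure_cold_and_warm(run_query: Callable[[], object], warm_runs: int = 3) -> Tuple[float, float]:
    """Return (cold_ms, warm_ms) for a zero-argument callable that runs one query."""
    if warm_runs < 1:
        raise ValueError("warm_runs must be at least 1")
    start = time.perf_counter()
    run_query()  # first execution: caches are assumed to be empty
    cold_ms = (time.perf_counter() - start) * 1000.0

    warm_samples: List[float] = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        run_query()  # repeat executions: caches are assumed to be populated
        warm_samples.append((time.perf_counter() - start) * 1000.0)
    return cold_ms, sum(warm_samples) / len(warm_samples)
```

`run_query` would wrap whatever executes the parametric query against the embedded database or the SQL connection.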
  • 17. Gather Subordinate Names Query. 1) Gather Subordinates names till a visibility level from a current level. We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:
      start n = node:Person(name = "fName lName")
      match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->m
      return nodes(p)
    Where visibility level is a number that indicates the number of levels to show. For SQL we have to recursively add joins for each level; a generic SQL can be written as:
      SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1, L2Reportees.directly_manages AS Reportee2, ...
      FROM person_reportee manager
      LEFT JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid
      LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
      ... ...
      WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
    Visibility Level | Database | Queries
    2 | Neo4j | start n = node:Person(name = "fName ... match n-[:DIRECTLY_MANAGES*1..2]->m return nodes(p)
    2 | MySQL/MSSQL | SELECT manager.pid AS Bigboss, manager. ... Subordinate, L1Reportees.directly_manages ...
  • 18. Gather Subordinate Names: results. Query execution time in ms, given as cold/warm, by total hierarchy levels (Lvls). Names = Gather Subordinate Names, Aggregate = Gather Subordinate Aggregate, Overall = Overall Aggregate. "?" marks a value lost in extraction.
    1000 People Org
    Lvls | MySQL Names | MySQL Aggregate | MySQL Overall | MS-SQL Names | MS-SQL Aggregate | MS-SQL Overall | Neo4j Names | Neo4j Aggregate | Neo4j Overall
    3 | ?/? | 109/0 | 93/0 | 226/0 | 49/0 | 6/0 | 679/164 | 3329/760 | 1538/269
    4 | ?/? | 94/16 | 78/16 | 114/82 | 133/61 | 35/0 | 670/178 | 2878/851 | 892/183
    5 | 109/0 | 62/15 | 47/0 | 212/43 | 138/13 | 59/0 | 813/166 | 2848/763 | 1533/221
    6 | 125/0 | 156/62 | 63/0 | 118/78 | 278/72 | 26/0 | 699/171 | 3607/1124 | 1387/319
    7 | 109/16 | 94/47 | 62/0 | 151/90 | 420/71 | 13/0 | 728/177 | 3123/929 | 1457/357
    8 | 110/0 | 141/31 | 78/0 | 231/46 | 621/84 | 18/0 | 721/164 | 2879/1040 | 906/207
    Volume 10K People Org
    Lvls | MySQL Names | MySQL Aggregate | MySQL Overall | MS-SQL Names | MS-SQL Aggregate | MS-SQL Overall | Neo4j Names | Neo4j Aggregate | Neo4j Overall
    3 | 109/0 | 140/31 | 109/15 | 172/107 | 294/181 | 103/1 | 730/176 | 6129/898 | 2292/613
    4 | 141/16 | 343/234 | 109/0 | 229/116 | 1267/156 | 102/1 | 776/178 | 5865/952 | 2073/455
    5 | 94/15 | 640/561 | 94/15 | 270/164 | 403/188 | 327/1 | 727/184 | 5718/1194 | 2119/328
    6 | 94/0 | 998/905 | 78/0 | 401/123 | 650/254 | 38/3 | 742/221 | 5692/1458 | 3182/453
    7 | 93/0 | 1482/1326 | 78/0 | 303/177 | 870/374 | 68/1 | 705/185 | 5710/1188 | 1990/456
    8 | 93/0 | 2745/2683 | 78/0 | 387/163 | 1146/573 | 71/1 | 713/171 | 6002/1271 | 1976/36 (last value as extracted, possibly cut off)
    [Warm Cache Plots (MySQL, MSSQL, Neo4j; x-axis: Levels 3 to 8; y-axis: Query Execution Time (ms)): Gather Subordinate Names, Gather Subordinate Aggregate and Overall Aggregate for 1000, 10K, 100K and 1M People; plus "LEFT Vs INNER Vs Neo4j" (MySQL - LEFT Join, MySQL - INNER, Neo4j; x-axis: Org Size-Levels 1M-3, 1M-5, 1M-7, 2M-8).]
    Commentary visible on the slide (cropped): "... Subordinate Names query for 1M, 2M Level 8, and ... execution time for MySQL. Below are the results of ... application? Say, if the situation demanded that we change ... would find ourselves in a bad position performance ... that Neo4j’s performance is almost constant time ... factor-in INDIRECTLY_MANAGES relationship, ... translate to additional join for SQL, thus bloating ... increasing complexity. This will further degrade the ... take more time."
  • 19. MySQL - Left Vs Inner Join. "... queries using inner join on 1M (all levels), 2M and 3M (for level 8), we ... increase in query execution time for MySQL. Below are the results of ... MySQL - Left Vs Inner Join from the Gather Subordinate Names query for 1M, 2M Level 8, and ..."
    Results in ms (the first column header is cropped to "... JOIN warm"; row labels did not survive extraction; the eight rows appear to run from 1M levels 3 to 8, then 2M level 8 and 3M level 8):
    MySQL ... JOIN warm | Neo4j cold | Neo4j warm
    16 | 718 | 176
    0 | 740 | 182
    874 | 721 | 184
    312 | 709 | 173
    3432 | 700 | 177
    15896 | 835 | 153
    36301 | 822 | 149
    61776 | 744 | 148
    [Warm Cache Plot "LEFT Vs INNER Vs Neo4j" (MySQL - LEFT Join, MySQL - INNER, Neo4j; x-axis: Org Size-Levels 1M-3, 1M-5, 1M-7, 2M-8; y-axis: Query Execution Time (ms)).]
    "... us or any application? Say, if the situation demanded that we change ..."
  • 22-32. Performance (one slide built up in steps) • Joins (J): 1. Cartesian Product (all possible combinations) 2. Filter • Data Size (DS), plotted against Time • Graph of Project, People, Skills • Traversal: query is now localised to a section of graph, solves large data-size problem
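The contrast drawn on the Performance slides (a join as "Cartesian product, then filter" versus a traversal localised to one part of the graph) can be made concrete with a toy Python model. The People and Skills names come from the slide; the data, the link table `person_skill` and both function names are hypothetical. Real relational engines use indexes and join algorithms rather than materialising the full product, so this only mirrors the conceptual model.

```python
from itertools import product

# Hypothetical relational tables
people = [(1, "Ann"), (2, "Bob"), (3, "Cy")]          # (id, name)
person_skill = [(1, 10), (1, 11), (2, 10), (3, 12)]   # (person_id, skill_id)
skills = [(10, "Java"), (11, "SQL"), (12, "Go")]      # (id, name)


def skills_via_join(name):
    """Conceptual join: form every combination of rows, then filter."""
    examined = 0
    found = []
    for (pid, pname), (link_pid, link_sid), (sid, sname) in product(people, person_skill, skills):
        examined += 1
        if pname == name and pid == link_pid and link_sid == sid:
            found.append(sname)
    return sorted(found), examined


# The same data as a graph: each person node holds its own outgoing links.
person_name = dict(people)
skill_name = dict(skills)
graph = {}
for pid, sid in person_skill:
    graph.setdefault(person_name[pid], []).append(skill_name[sid])


def skills_via_traversal(name):
    """Traversal: start at one node and follow only its own relationships."""
    neighbours = graph.get(name, [])
    return sorted(neighbours), len(neighbours)
```

In this model the combinations examined by the join grow with the product of the table sizes, while the traversal touches only the links of the starting node, whatever the total data size.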
  • 33. Relationships • Not first-class citizens in RDBMSs and NoSQL aggregate stores (document stores or key-value stores). • Cost of executing connected queries is high. • “Who reports immediately to Smarty Pants?” - not a problem • But “Who all report to Smarty Pants?” introduces recursive joins, and going down more than 5-6 levels the space and time complexity is very high. • “What skills does this person have?” is relatively cheaper than “Which people have these skills?” • What about “Which people who have these skills also have those skills?”
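When relationships are held directly on the nodes, "Who all report to Smarty Pants?" is a plain breadth-first walk whose depth limit plays the role of `visibilityLevel` in the Cypher pattern `*1..visibilityLevel`. The sketch below is an illustration over an in-memory adjacency map, not Neo4j's implementation; `all_reports` and the sample names are hypothetical.

```python
from collections import deque


def all_reports(directly_manages, manager, max_depth=None):
    """List everyone below `manager`, nearest levels first.

    directly_manages : dict mapping a person to the people they manage directly
    max_depth        : number of levels to descend (None means no limit)
    """
    reports = []
    visited = {manager}
    queue = deque([(manager, 0)])
    while queue:
        person, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand below the visibility limit
        for report in directly_manages.get(person, []):
            if report not in visited:
                visited.add(report)
                reports.append(report)
                queue.append((report, depth + 1))
    return reports
```

With `max_depth=1` this answers "Who reports immediately to Smarty Pants?"; raising the limit changes a parameter, whereas the SQL equivalent gains a join per level.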
  • 34. Gather Subordinates Aggregate Data: results. Query execution time in ms, given as cold/warm, by total hierarchy levels (Lvls). Names = Gather Subordinate Names, Aggregate = Gather Subordinate Aggregate, Overall = Overall Aggregate.
    100K People Org
    Lvls | MySQL Names | MySQL Aggregate | MySQL Overall | MS-SQL Names | MS-SQL Aggregate | MS-SQL Overall | Neo4j Names | Neo4j Aggregate | Neo4j Overall
    3 | 124/0 | 1107/998 | 140/31 | 617/537 | 1201/608 | 168/12 | 736/166 | 13340/4404 | 5157/566
    4 | 109/0 | 2231/2106 | 140/31 | 1224/650 | 1263/667 | 157/19 | 812/167 | 14686/4697 | 5396/594
    5 | 124/16 | 5616/5460 | 125/31 | 1377/757 | 1830/913 | 194/12 | 713/167 | 15351/5516 | 5118/497
    6 | 109/0 | 9734/9625 | 140/31 | 1006/811 | 3002/1274 | 159/12 | 756/175 | 17056/6145 | 4736/573
    7 | 93/0 | 16271/16100 | 156/31 | 1125/784 | 2947/1566 | 158/14 | 700/177 | 17821/6915 | 4592/592
    8 | 109/0 | 25334/25287 | 125/32 | 1285/951 | 4256/2171 | 510/14 | 691/183 | 19881/8304 | 4731/557
    Volume 1M People Org
    Lvls | MySQL Names | MySQL Aggregate | MySQL Overall | MS-SQL Names | MS-SQL Aggregate | MS-SQL Overall | Neo4j Names | Neo4j Aggregate | Neo4j Overall
    3 | 109/16 | 10998/10171 | 452/281 | 6337/4973 | 22703/20037 | 798/116 | 718/176 | 109432/44265 | 37190/2109
    4 | 172/0 | 20467/19734 | 515/280 | 5677/5280 | 14888/7813 | 1245/114 | 740/182 | 115011/106260 | 36033/2194
    5 | 172/16 | 60606/60653 | 531/281 | 7111/5959 | 12047/8293 | 1211/140 | 721/184 | 169123/77528 | 34068/2248
    6 | 141/0 | 92789/92867 | 483/265 | 7278/6266 | 15226/7076 | 1251/140 | 709/173 | 255813/124810 | 33015/2107
    7 | 218/0 | 160151/158793 | 499/280 | 11890/6724 | 15226/7076 | 1330/111 | 700/177 | 171607/125152 | 32813/1960
    8 | 265/0 | 281301/280630 | 577/265 | 11300/8777 | 20560/14564 | 976/121 | 835/153 | 136432/73121 | 23673/1912
    [Warm Cache Plots (MySQL, MSSQL, Neo4j; x-axis: Levels 3 to 8; y-axis: Query Execution Time (ms)): Gather Subordinate Aggregate for 1000, 10K, 100K and 1M People, shown alongside the Gather Subordinate Names, Overall Aggregate and "LEFT Vs INNER Vs Neo4j" panels and the results tables for the 1000 and 10K People orgs.]
  • 35. Gather Overall Aggregate Query • Cypher: start n = node(*) return n.level as Level, count(n) as Total order by Level • SQL: SELECT level, count(id) FROM person GROUP BY level
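As a rough, runnable illustration of the SQL form of this query, the sketch below runs it against an in-memory SQLite database from Python. SQLite stands in for MySQL/MSSQL purely for convenience. The `name` column, the sample rows and the helper names `build_sample_db` and `overall_aggregate` are assumptions made for the example, not the benchmark's schema or data.

```python
import sqlite3


def build_sample_db():
    """Create a tiny in-memory `person` table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, level INTEGER)"
    )
    conn.executemany(
        "INSERT INTO person (id, name, level) VALUES (?, ?, ?)",
        [
            (1, "Ann", 1),
            (2, "Bob", 2),
            (3, "Cid", 2),
            (4, "Dee", 3),
            (5, "Eve", 3),
            (6, "Fay", 3),
        ],
    )
    return conn


def overall_aggregate(conn):
    """Head-count per level: the slide's GROUP BY query, with an ORDER BY
    added so the result order is deterministic."""
    return conn.execute(
        "SELECT level, count(id) FROM person GROUP BY level ORDER BY level"
    ).fetchall()


if __name__ == "__main__":
    for level, total in overall_aggregate(build_sample_db()):
        print(level, total)
```

The query scans every row once regardless of how people are connected, which is why it behaves as an aggregate workload rather than a traversal.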
  • 36. [Slide shows pages of result tables and warm cache plots: Query Execution Time (ms) against Levels (3 to 8) for MySQL, MSSQL and Neo4j. Panels: 1000 People - Overall Aggregate; 10K People - Gather Subordinate Names, Gather Subordinate Aggregate and Overall Aggregate; 100K People - Overall Aggregate; 1M People - Gather Subordinate Names, Gather Subordinate Aggregate and Overall Aggregate. A further plot, LEFT Vs INNER Vs Neo4j, compares MySQL - LEFT Join, MySQL - INNER and Neo4j over Org Size-Levels 1M-3, 1M-5, 1M-7 and 2M-8. Legible text fragments note that Neo4j's performance is almost constant time, and that factoring in the INDIRECTLY_MANAGES relationship translates to an additional join for SQL, increasing complexity and further degrading performance.]
  • 37. But, what if... • I need to aggregate data, placing an OLAP-ish demand on the data? • Use non-graph stores (RDBMS or NoSQL) alongside. • Graph Compute Engines are optimised for scanning and processing large amounts of information in batch. • Giraph, Pegasus etc… • Polyglot persistence is the norm. • Has anyone tried Datomic?
  • 38. Asking Questions • Is your data connected? • Or does your domain naturally gravitate towards a good number of joins (dense connections), leading to an explosion with large data? • For example: making recommendations • Or are you finding yourself writing complex SQL queries? • For example: recursive joins or lots of joins
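To make the "recursive joins" example concrete, here is a minimal sketch of gathering a manager's subordinates down a reporting hierarchy with a recursive common table expression, using SQLite from Python as a stand-in for an RDBMS. The `manager_id` column, the sample names and the helpers `build_sample_org` and `subordinate_names` are hypothetical and chosen only for illustration; they are not the benchmark's schema or queries.

```python
import sqlite3

# Recursive CTE: start at the named manager (depth 0), then repeatedly
# join `person` back onto the rows found so far, one level per step.
SUBORDINATES_SQL = """
WITH RECURSIVE reports(id, name, depth) AS (
    SELECT id, name, 0 FROM person WHERE name = :manager
    UNION ALL
    SELECT p.id, p.name, r.depth + 1
    FROM person AS p
    JOIN reports AS r ON p.manager_id = r.id
    WHERE r.depth < :max_depth
)
SELECT name FROM reports WHERE depth > 0 ORDER BY depth, name
"""


def build_sample_org():
    """Create a tiny in-memory hierarchy: Ann -> (Bob, Cid), Bob -> Dee -> Eve."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person ("
        "id INTEGER PRIMARY KEY, name TEXT, "
        "manager_id INTEGER REFERENCES person(id))"
    )
    conn.executemany(
        "INSERT INTO person (id, name, manager_id) VALUES (?, ?, ?)",
        [(1, "Ann", None), (2, "Bob", 1), (3, "Cid", 1), (4, "Dee", 2), (5, "Eve", 4)],
    )
    return conn


def subordinate_names(conn, manager, max_depth):
    """Names of everyone reporting to `manager`, up to `max_depth` levels down."""
    rows = conn.execute(
        SUBORDINATES_SQL, {"manager": manager, "max_depth": max_depth}
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(subordinate_names(build_sample_org(), "Ann", 2))
```

Each additional level visited costs another pass of the self-join, which is the kind of query shape the questions above are probing for.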
  • 39. Qs?
  • 40. References • Graph Databases • Jim Webber, Ian Robinson and Emil Eifrem • Apiary: A case for Neo4j? • Anuj Mathur and Dhaval Dalal • Code and scaffolding programs that we used are available at https://github.com/EqualExperts/Apiary-Neo4j-RDBMS-Comparison