Avoiding Deadlocks in Neo4j on Z-Platform
Mahesh Chaudhari, Cesar Arevalo & Brian Roy

Avoiding Deadlocks: Lessons Learned with Zephyr Health Using Neo4j and MongoDB - Mahesh Chaudhari @ GraphConnect SF 2013

Z-Platform is a new platform that ingests data of any kind, stores it as JSON documents in MongoDB, and keeps a sparse representation of the same data in the Neo4j graph database. Mahesh discusses how he tackled deadlocks and significantly improved the performance of the system. The test environment ranged from small graphs (up to 10,000 relationships) to very large graphs (up to 39 million relationships). The average performance of the system is 3,741 relationships per second.

Published in: Technology, Health & Medicine

  1. Avoiding Deadlocks in Neo4j on Z-Platform
     Mahesh Chaudhari, Cesar Arevalo & Brian Roy
  2. Outline
     • Introduction to the Z-Platform
     • Problems caused by Deadlocks
     • Locks and Deadlocks in Neo4j
     • Avoidance using Bipartite graphs
     • Performance
     • Conclusion
  3. Z-Platform
     [Diagram: source datasets feed the Z-Platform. Full entity profiles are stored as JSON documents in the MongoDB database; a sparse representation of the profiles is stored as nodes and edges in the Neo4j graph database.]
  4. Deadlocks in Z-Platform
     • Creating relationships is one of the most time-consuming processes
     • Log analysis reveals deadlocks among batch transactions, and the retry mechanism takes time
     • Dependent on how nodes and relationships are grouped together
     • Batch size is dependent on the size of the JSON block sent to the server
     • Time required to build relationships and resolve deadlocks is on the order of seconds
  5. Locks in Neo4j
     • Create a Node n1 → Write Lock on Node n1
     • Update a Node n1 → Write Lock on Node n1, but read available on Node n1
     • Create a Relationship r1 between nodes n1 and n2 → Write Locks on relationship r1, n1 and n2
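The lock rules on this slide can be turned into a quick overlap check between two transactions. The sketch below is illustrative only: the operation tuples and the `write_locks` and `shared_locks` helpers are hypothetical names, not Neo4j API, and the model covers only the write locks the slide lists.

```python
def write_locks(operation):
    """Return the write locks one operation takes, following the slide's rules."""
    kind = operation[0]
    if kind in ("create_node", "update_node"):
        _, node = operation
        return {("node", node)}
    if kind == "create_relationship":
        _, rel, src, dest = operation
        # Creating a relationship locks the relationship and both end nodes.
        return {("rel", rel), ("node", src), ("node", dest)}
    raise ValueError(f"unknown operation: {kind}")


def shared_locks(tx_a, tx_b):
    """Locks needed by both transactions; empty means no contention in this model."""
    locks_a = set().union(*(write_locks(op) for op in tx_a))
    locks_b = set().union(*(write_locks(op) for op in tx_b))
    return locks_a & locks_b


# Hypothetical transactions, each creating one relationship.
t1 = [("create_relationship", "r1", "A", "B")]
t2 = [("create_relationship", "r2", "C", "D")]
t3 = [("create_relationship", "r3", "A", "D")]
```

Here `shared_locks(t1, t2)` is empty, while `shared_locks(t1, t3)` contains node A, which is the kind of overlap that can lead to waiting and, with more locks involved, to deadlock.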
  6. Deadlocks across processes
     • Processes: P1 and P2
     • Nodes: A, B, C, D
     • Relationships: R1, R2, R3, R4
     [Diagram: four scenarios of P1 and P2 creating relationships among nodes A, B, C and D, labeled No Deadlocks, No Deadlocks, Possibility of Deadlock, and Deadlock]
  7. Deadlocks across Transactions
     • Transactions are also like separate processes, but in a single thread or multiple threads
     • Deadlocks occur across transactions
       – Two concurrent transactions need write locks on the same node n1
       – In two concurrent transactions, T1 has a write lock on node n1 and is waiting for a write lock on node n2, whereas T2 has a write lock on n2 and is waiting for a write lock on node n1
       – Transactions of varying sizes
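The T1/T2 case on this slide is a cycle in a wait-for graph. The following is a minimal sketch of that idea, not a description of how Neo4j detects deadlocks internally; `build_waits_for` and `find_deadlock` are hypothetical helpers, and each transaction is assumed to wait on at most one lock.

```python
def build_waits_for(holds, wants):
    """Map each waiting transaction to the transaction that holds the lock it wants."""
    owner = {node: tx for tx, nodes in holds.items() for node in nodes}
    return {tx: owner[node] for tx, node in wants.items()
            if node in owner and owner[node] != tx}


def find_deadlock(waits_for):
    """Return the transactions forming a wait cycle, or None if there is none."""
    for start in waits_for:
        path, seen = [], set()
        current = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while current in waits_for and current not in seen:
            seen.add(current)
            path.append(current)
            current = waits_for[current]
        if current in seen:
            return path[path.index(current):]
    return None


# The situation from the slide: T1 holds n1 and wants n2, T2 holds n2 and wants n1.
holds = {"T1": {"n1"}, "T2": {"n2"}}
wants = {"T1": "n2", "T2": "n1"}
cycle = find_deadlock(build_waits_for(holds, wants))
```

With these inputs `cycle` lists both T1 and T2. If T2 did not want n1, the chain of waits would end and the function would return `None`.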
  8. Concurrent Transactions Deadlocks
     [Diagram: two panels. T1 on nodes A and B with T2 on nodes C and D: No Deadlocks. T1 on nodes A and B with T2 on nodes A and D: Possibility of Deadlock.]
  9. Sequential Asynchronous Transactions Deadlocks
     [Diagram: transactions T1 and T2 creating n edges among nodes A, B, C, D, E and F; panels labeled No Deadlocks, No Deadlock, and Deadlock]
  10. Deadlocks Detection and Avoidance
     • Deadlocks Detection
       – Only possible at run-time
       – Recovery from deadlock is either to abort or retry
     • Deadlocks Avoidance
       – Reorder the operations to lower or eliminate the likelihood of deadlocks
     • Graph Clustering Algorithms: most of them require knowledge of the entire graph
     • Clustering Relationships → Bipartite Graphs
  11. Bipartite Graphs
     • A graph G with vertices V and edges E is bipartite if V can be partitioned into two independent sets V1 and V2, so that every edge joins a vertex in V1 to a vertex in V2.
     [Diagram: nodes A, B, C, D and E partitioned into sets V1 and V2]
  12. Creating Bipartite Graphs
     • Use two colors to color each node such that no two adjacent nodes have the same color.
     [Diagram: nodes A, B, C, D and E colored 1 or 2 and partitioned into sets V1 and V2]
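Two-coloring is commonly done with a breadth-first traversal. The sketch below shows that standard technique; it is not taken from the slides, and the edge list is a hypothetical example because the edges in the slide's diagram are not recoverable from the transcript.

```python
from collections import deque


def two_color(edges):
    """Color nodes 1 or 2 so adjacent nodes differ; return None if impossible."""
    adjacency = {}
    for src, dest in edges:
        adjacency.setdefault(src, set()).add(dest)
        adjacency.setdefault(dest, set()).add(src)

    color = {}
    for root in adjacency:
        if root in color:
            continue
        color[root] = 1
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in color:
                    color[neighbor] = 3 - color[node]  # flip between 1 and 2
                    queue.append(neighbor)
                elif color[neighbor] == color[node]:
                    return None  # an edge joins two same-colored nodes
    return color


# Hypothetical edges over the node names used on the slides.
bipartite = two_color([("A", "D"), ("A", "C"), ("E", "D"), ("B", "C")])
triangle = two_color([("A", "B"), ("B", "C"), ("C", "A")])
```

For the first edge list, the nodes colored 1 form V1 and the nodes colored 2 form V2. The triangle has an odd cycle, so no two-coloring exists and the function returns `None`.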
  13. Non-Bipartite Graphs
     [Diagram: nodes A, B, C, D and E colored 1 or 2 and placed in sets V1 and V2, for a graph that is not bipartite]
  14. Algorithm to generate Graph
     [Diagram: nodes A, B, C, D and E in sets V1 and V2]
     • Create all the nodes
     • Create batches of relationships among the same colored nodes
     • Create batches of relationships across the two colors
  15. Algorithm in Z-Platform
     • Batch of relationships R = {r1, r2, r3, ..., rn}:
       – each r is a triplet {src, dest, props} where src and dest are nodes and props is a set of key-value pairs
     • Color the nodes based on each relationship with two colors
     • Mark the conflicting edges where both the src and dest nodes are of the same color
     • Batch these relationships together in a single batch
     • Start grouping the remaining edges such that no two batches have any node in common
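One possible reading of these steps is sketched below. The slide does not say how nodes are colored or how the remaining edges are grouped, so two choices here are assumptions: nodes are colored greedily in the order relationships arrive, and "no two batches have any node in common" is implemented by grouping edges into connected components with a union-find structure. The function name `plan_batches` is hypothetical, and the configurable batch size is ignored.

```python
def plan_batches(relationships):
    """Split (src, dest, props) triplets into one conflicting batch and
    a list of node-disjoint batches."""
    color = {}
    conflicting, remaining = [], []
    for rel in relationships:
        src, dest, _props = rel
        # Assumption: greedy coloring, each new node gets the opposite color.
        if src not in color and dest not in color:
            color[src], color[dest] = 1, 2
        elif src not in color:
            color[src] = 3 - color[dest]
        elif dest not in color:
            color[dest] = 3 - color[src]
        if color[src] == color[dest]:
            conflicting.append(rel)  # both ends share a color
        else:
            remaining.append(rel)

    # Assumption: union-find groups edges so that batches share no node.
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for src, dest, _props in remaining:
        parent[find(src)] = find(dest)

    groups = {}
    for rel in remaining:
        groups.setdefault(find(rel[0]), []).append(rel)
    return conflicting, list(groups.values())


# Hypothetical input batch.
rels = [("A", "B", {}), ("C", "D", {}), ("A", "D", {}),
        ("B", "D", {}), ("E", "F", {})]
conflicting, batches = plan_batches(rels)
```

In this example the edge between B and D joins two nodes of the same color and goes into the conflicting batch; the other edges fall into two batches that share no node, so they could be written in separate transactions without contending for the same node locks.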
  16. Performance – Test Setup
     • JDK 1.7
     • Neo4j Java Binding REST API
     • Neo4j Enterprise Server 1.9
     • Batch size (configurable): 2000
     • Test program that generates random nodes (max 1000) and relationships (max 10,000)
     • Huge file that contains 10,226 nodes and 39,564,960 relationships (5 GB)
  17. Performance – Creating Nodes
     [Chart: Time in seconds for Nodes; y-axis: time in secs (0 to 1.8); x-axis: 1 to 6]
     • 10,226 nodes: 5.07 seconds
     • Average time for 2000 nodes: 0.99 seconds (about 1 second)
     • Each node has 11 properties
  18. Performance – Creating Relationships
     [Chart: Time in Seconds for relationships; y-axis: time in secs (0 to 1.4); x-axis: 1 to 87]
     • 174,000 relationships created in 47.16 seconds
     • Average time for 2000 relationships: 0.54 seconds
     • Number of relationships per second: 3,689
  19. Performance – Creating 39 Million Relationships
     • 39,564,960 relationships in 10,573.56 seconds (2 hrs 56 mins 13 seconds)
     • Average time for 2000 relationships: 0.53 seconds
     • Number of relationships per second: 3,741
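The throughput figures on slides 18 and 19 follow directly from the totals. The short check below recomputes them; the `throughput` helper is a hypothetical name, and the batch size of 2000 is the one given in the test setup.

```python
def throughput(relationships, seconds, batch_size=2000):
    """Return (relationships per second, seconds per batch)."""
    per_second = relationships / seconds
    return per_second, batch_size / per_second


# Totals from slide 18 (small test) and slide 19 (large file).
small = throughput(174_000, 47.16)
large = throughput(39_564_960, 10_573.56)

# The large run's duration split into hours, minutes and whole seconds.
hours, rest = divmod(10_573.56, 3600)
minutes, secs = divmod(rest, 60)
duration = (int(hours), int(minutes), int(secs))
```

Both runs come out near 3,700 relationships per second and a little over half a second per batch of 2000, so the rate stayed roughly constant between the small test and the 39 million relationship file.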
  20. Graph Visualized in Neo4j 2.0
     [Image: the generated graph as rendered in Neo4j 2.0]
  21. Future Work
     • Test performance over the network using Amazon EC2 servers to mimic a real-world setup
     • Move from a single-threaded application to a multi-threaded one to see whether performance improves
       – More complex algorithm to batch relationships together
       – Analyze whether the complexity is worth the performance improvement
     • Vary multiple factors:
       – Batch size: 1000 to 4000
       – Properties (relationship descriptors): 2 to 20
     • Dispatcher pattern to facilitate single-point distribution of nodes and relationships to threads/transactions
  22. Conclusion
     • Deadlocks in general are time-consuming and difficult to detect and prevent
     • Graph coloring is used to partition the graph into conflicting and non-conflicting edges
     • Successful prototype tests show significant improvement in building relationships, from small numbers to very large numbers
  23. Dr. Mahesh Chaudhari
     Sr. Software Engineer
     +1 602 524 0610
     mahesh@zephyrhealthinc.com
     jobs@zephyrhealthinc.com
  24. Contact Information
     • Sven Junkergård, Director of Technology, +1 415 503 7412, sven@zephyrhealthinc.com
     • Brian Roy, Director of Platform Engineering & Architect, +1 415 663 6919, brian@zephyrhealthinc.com
     • Zephyr Health Inc., 589 Howard St. 3rd Flr., San Francisco, California 94105, +1.415.529.7649, zephyrhealthinc.com
