Gossip-based Partitioning
and Replication
Middle-ware for
Online Social Networks
Muhammad Anis Uddin Nasir
(EMDC/ICT/LCN)
Transcript of "Gossip based partitioning and replication for Online Social Networks"

  1. Gossip-based Partitioning and Replication Middle-ware for Online Social Networks
     Muhammad Anis Uddin Nasir (EMDC/ICT/LCN)
     Supervisor: Šarūnas Girdzijauskas
     Examiner: Johan Montelius
  2. Online Social Networks
     • Vertices
     • Edges
     • Metadata
     [Figure: example social graph with vertices labeled Ioanna, Antonio, Vaidas, Aras, Vasia, Anis, Mudit, Manos, Leandro, Johan]
     (Slide footer: 8/27/2013, Muhammad Anis Uddin Nasir, Gossip-based Partitioning and Replication Middle-ware)
  3. Existing Solutions
     • Relational Databases
       - MySQL Cluster
     • Key-Value stores
       - Cassandra, Amazon Dynamo
     • Document Databases
       - MongoDB, CouchDB
     • Graph Databases
       - Neo4j, Titan
  4. Why are Existing Solutions not enough?
     [Figure: example graph with vertices 1 to 10]
  5. Why are Existing Solutions not enough?
     • Random Partitioning
     • Social Request
       - E.g., gather new feeds from all the friends
     • Enforcing Data Locality
     • Random partitioning can lead to full replication!
     [Figure: vertices 1 to 10 partitioned across servers, with replicas 1' to 10']
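The claim that random partitioning can lead to full replication can be made concrete with a small back-of-the-envelope model. The sketch below is illustrative only and is not taken from the slides: it assumes every neighbor's master is placed on one of `num_servers` servers uniformly and independently, and that a replica of a vertex is needed on every remote server hosting at least one of its neighbors. The function name `expected_replicas` is hypothetical.

```python
def expected_replicas(degree, num_servers):
    """Expected number of remote servers that need a replica of one vertex.

    Assumption (not from the slides): each of the `degree` neighbors is
    placed on a server uniformly at random, independently of the others.
    """
    if num_servers < 2:
        return 0.0
    # Probability that one given remote server hosts none of the neighbors.
    p_empty = (1.0 - 1.0 / num_servers) ** degree
    # Linearity of expectation over the (num_servers - 1) remote servers.
    return (num_servers - 1) * (1.0 - p_empty)
```

Under this model, with 16 servers a vertex of degree 50 already needs a replica on roughly 14 of the 15 remote servers, which is close to full replication.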
  6. Social Graphs are not Random
  7. Graph Partitioning
  8. JA-BE-JA: edge-cut
     [Figure: vertices 1 to 7 split across Server A and Server B, with replicas 6', 3', 1', 4', 7']
     • Edge cut = 3 links, 3+2 = 5 replicas to maintain
  9. SPAR: Minimizing Replicas
     [Figure: vertices 1 to 7 split across Server A and Server B, with replicas 6', 3', 2', 5']
     • Edge cut = 4 links, 2+2 = 4 replicas to maintain
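The contrast between slides 8 and 9 (a smaller edge cut does not imply fewer replicas) can be reproduced on a toy graph. The sketch below is a minimal illustration: the edge list and both placements are hypothetical, chosen only so that the counts match the numbers quoted on the slides. The actual graph in the figures is not recoverable from the transcript. A replica of a vertex is assumed to be needed on every remote server that hosts one of its neighbors.

```python
def edge_cut(edges, server_of):
    """Number of edges whose endpoints have masters on different servers."""
    return sum(1 for u, v in edges if server_of[u] != server_of[v])


def replica_count(edges, server_of):
    """Number of replicas needed so every vertex finds its neighbors locally."""
    replicas = set()  # pairs (vertex, server hosting the replica)
    for u, v in edges:
        if server_of[u] != server_of[v]:
            replicas.add((u, server_of[v]))  # u' placed next to v
            replicas.add((v, server_of[u]))  # v' placed next to u
    return len(replicas)


# Hypothetical 7-vertex graph (an assumption, not the graph from the slides).
EDGES = [(6, 1), (3, 4), (3, 7), (6, 2), (6, 5), (3, 2), (3, 5),
         (2, 5), (1, 4), (4, 7)]

# Placement that minimizes the edge cut (the JA-BE-JA objective).
MIN_CUT = {2: "A", 3: "A", 5: "A", 6: "A", 1: "B", 4: "B", 7: "B"}

# Placement that minimizes replicas (the SPAR objective).
MIN_REPLICAS = {1: "A", 3: "A", 4: "A", 6: "A", 7: "A", 2: "B", 5: "B"}
```

On this toy graph, `MIN_CUT` gives an edge cut of 3 with 5 replicas, while `MIN_REPLICAS` gives an edge cut of 4 with 4 replicas, because several cut edges share the same endpoints.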
  10. Initialization
      [Figure: vertices 1 to 10 placed on servers, with replicas 1' to 10']
      • Node Addition
        - Assign the node to the server with the fewest masters
      • Edge Addition
        - Check whether both nodes are local to the same server
        - Otherwise, create replicas to maintain locality
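The two initialization rules above can be sketched as a small in-memory structure. This is a minimal sketch under stated assumptions, not the author's implementation: the class name `Placement`, its fields, and the tie-breaking rule (lowest server index wins) are all hypothetical.

```python
class Placement:
    """Tracks master and replica copies per server (illustrative only)."""

    def __init__(self, num_servers):
        self.masters = [set() for _ in range(num_servers)]
        self.replicas = [set() for _ in range(num_servers)]
        self.server_of = {}  # vertex -> index of the server holding its master

    def add_node(self, node):
        """Place the master on the server with the fewest masters."""
        # Assumption: ties are broken by the lowest server index.
        target = min(range(len(self.masters)),
                     key=lambda s: len(self.masters[s]))
        self.masters[target].add(node)
        self.server_of[node] = target
        return target

    def add_edge(self, u, v):
        """Create replicas if u and v are not co-located; return how many."""
        su, sv = self.server_of[u], self.server_of[v]
        if su == sv:
            return 0  # both masters are local, nothing to do
        created = 0
        if v not in self.replicas[su]:
            self.replicas[su].add(v)  # v' next to u's master
            created += 1
        if u not in self.replicas[sv]:
            self.replicas[sv].add(u)  # u' next to v's master
            created += 1
        return created
```

With two servers, adding nodes 1 to 4 alternates between the servers, and a later edge between nodes on different servers creates at most two replicas, fewer if one of them already exists.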
  11. Gossip Phase
      • Cost Function
        - Count number of replicas
        - For current and new server
      • Peer Selection
        - Local, Random, Hybrid
      [Figure: example graph of 10 vertices with replicas 1' to 10']
  12. Gossip Phase
      • Cost Function
        - Count number of replicas
        - For existing and new server
      • Peer Selection
        - Local, Random, Hybrid
      • Simulated Annealing
      [Figure: example graph of 10 vertices with replicas 4', 8', 9', 3', 5', 10', 6', 1']
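One gossip step with the replica-counting cost function might look like the sketch below. It is an assumption-laden illustration rather than the algorithm from the thesis: the helper names `replicas_on` and `try_swap` are hypothetical, the move is modeled as a swap of two masters (which keeps the number of masters per server unchanged), and the peer is taken as given, so the Local, Random, and Hybrid selection policies are not modeled.

```python
def replicas_on(edges, server_of, servers):
    """Count replicas hosted on the given servers."""
    hosted = set()  # pairs (vertex, server hosting its replica)
    for u, v in edges:
        su, sv = server_of[u], server_of[v]
        if su != sv:
            if su in servers:
                hosted.add((v, su))  # v' must sit next to u's master
            if sv in servers:
                hosted.add((u, sv))  # u' must sit next to v's master
    return len(hosted)


def try_swap(edges, server_of, u, v):
    """Swap the servers of u and v if that lowers the replica count.

    The cost is evaluated only on the two servers involved, following the
    slide's "current and new server" wording. Returns True if the swap is kept.
    """
    a, b = server_of[u], server_of[v]
    if a == b:
        return False
    before = replicas_on(edges, server_of, {a, b})
    server_of[u], server_of[v] = b, a  # tentative swap
    after = replicas_on(edges, server_of, {a, b})
    if after < before:
        return True
    server_of[u], server_of[v] = a, b  # revert: no improvement
    return False
```

Because a swap is only kept when it strictly reduces the count, this greedy version can get stuck in a local optimum, which is the motivation for adding simulated annealing.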
  13. Simulated Annealing
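The transcript keeps only the title of this slide, so the exact annealing rule used in the thesis is unknown. As a hedged illustration of the general technique, the sketch below uses the textbook Metropolis acceptance rule with a linear cooling schedule. The function names, the `step` parameter, and the choice of rule are all assumptions.

```python
import math
import random


def accept_move(delta, temperature, rng):
    """Decide whether to accept a move that changes the cost by `delta`.

    Textbook Metropolis rule (an assumption, not confirmed by the slides):
    improving or neutral moves are always accepted, worsening moves are
    accepted with probability exp(-delta / temperature).
    """
    if delta <= 0:
        return True
    if temperature <= 0:
        return False  # fully cooled: behave greedily
    return rng.random() < math.exp(-delta / temperature)


def cool(temperature, step, floor=0.0):
    """Linear cooling: lower the temperature by `step`, never below `floor`."""
    return max(floor, temperature - step)
```

Here `delta` would be the change in replica count caused by a candidate swap. Early on, a high temperature lets some replica-increasing swaps through so the search can leave a local optimum. As the temperature falls, the rule converges to the greedy behavior.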
  14. Algorithms
      [Table: compares Random, SPAR, JA-BE-JA, and Gossip-based on the criteria below; the per-cell marks did not survive extraction]
      • Data locality
      • Decentralized
      • Load Balancing
      • Fault tolerance
      • Avoiding Local Optima
      • Availability
  15. Datasets
      Dataset          Vertices   Edges
      Synth-C          2,000      20,000
      Synth-HC         2,000      20,000
      Synth-PL         2,000      20,000
      SNAP-Facebook    4,039      88,234
      WSON-Facebook    60,290     1,545,686
      SNAP-Twitter     81,306     1,768,149
  16. Evaluation - with datasets
      [Figure: replication overhead of Random, SPAR, JA-BE-JA, and Gossip-based; number of servers = 16, replication factor = 2]
      • >3x gain compared to Random Partitioning
      • ≈2x gain compared to SPAR
  17. Evaluation - with replication factor
      [Figure: replication overhead for f=0 and f=2; number of servers = 16]
      • Random graphs generate the maximum replication overhead
      • Real graphs generate the minimum replication overhead
      • Data locality is achieved by the fault-tolerance replicas
  18. Evaluation - with servers
      [Figure: replication overhead versus number of servers (8, 16, 32, 64) on WSON-Facebook for Random, SPAR, JA-BE-JA, and Gossip-based, with a second panel for Gossip-based alone; replication factor = 2]
      • Gossip-based generates the minimum replication overhead
      • Replication overhead increases non-linearly
      • >4x gain compared to Random Partitioning
  19. Evaluation - dynamicity
      [Figure: replication overhead versus number of cycles, one panel each for SNAP-Twitter and SNAP-Facebook; number of servers = 16, replication factor = 2]
      • Spikes show bulk edge addition
      • Transition state, i.e., reducing the number of replicas after new edge additions
      • Algorithm stabilization
  20. Conclusion
      • Random partitioning does not provide an efficient solution for Online Social Networks
      • Minimizing replicas can help to achieve better partitioning
      • A gossip-based heuristic was proposed to solve the minimization problem while achieving the global optimum
      • The algorithm is able to handle different datasets and adjusts to the dynamic nature of OSNs
  21. Gossip-based Partitioning and Replication Middle-ware for Online Social Networks
      Muhammad Anis Uddin Nasir (EMDC/ICT/LCN)
      Supervisor: Šarūnas Girdzijauskas
      Examiner: Johan Montelius
  22. Future Work
      • Execution of the algorithm with large datasets using parallel graph processing frameworks like GraphLab and Apache Giraph
      • Load balancing using both masters and replicas, and providing different consistency levels
      • Smart replication to provide data locality for highly interactive nodes
      • Implement different consistency strategies based on access patterns