Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph

Using a modified version of the set cover algorithm to reduce real-time query latencies in LinkedIn's distributed graph infrastructure.

Published in: Technology

Transcript

  • 1. SEARCH, NETWORK AND ANALYTICS: Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph. Rui Wang, Staff Software Engineer. ©2013 LinkedIn Corporation. All Rights Reserved.
  • 2. Outline: LinkedIn’s Distributed Graph Infrastructure; The Scaling Problem; Set Cover by Example; Evaluation; Related Work; Conclusions and Future Work
  • 3. Outline: LinkedIn’s Distributed Graph Infrastructure; The Scaling Problem; Set Cover by Example; Evaluation; Related Work; Conclusions and Future Work
  • 4. LinkedIn Products Backed By Social Graph
  • 5. LinkedIn’s Distributed Graph APIs
      Graph APIs
      – Get Connections: “Who does member X know?”
      – Get Shared Connections: “Who do I know in common with member Y?”
      – Get Graph Distances: “What are the graph distances for each member returned from a search query?” “Is member Z within my second degree network so that I can view her profile?”
      Over 50% of queries are graph distance queries
  • 6. Distributed Graph Architecture Diagram
  • 7. LinkedIn’s Distributed Graph Infrastructure
      Graph Database (Graph DB)
      – Partitioned and replicated graph database
      – Distributed adjacency list
      Distributed Network Cache (NCS)
      – LRU cache stores second degree network for active members
      – Graph traversals are converted to set intersections
      Graph APIs
      – Get Connections
      – Get Shared Connections
      – Get Graph Distances
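A minimal sketch of how a graph distance lookup can be reduced to set membership and set intersection once a member's second degree network is cached. The function names, the return convention (`-1` for "farther than three hops") and the toy graph are assumptions made for illustration; the slides do not show the actual NCS code.

```python
from typing import Dict, Set

Adjacency = Dict[int, Set[int]]


def second_degree(member: int, adjacency: Adjacency) -> Set[int]:
    """Friends-of-friends of `member`, excluding the member and direct connections."""
    first = adjacency.get(member, set())
    reachable: Set[int] = set()
    for friend in first:
        reachable |= adjacency.get(friend, set())
    return reachable - first - {member}


def graph_distance(source: int, target: int, adjacency: Adjacency,
                   cached_second: Set[int]) -> int:
    """Distance from source to target using only lookups and one intersection.

    `cached_second` stands in for the cached second degree network of `source`.
    Returns 0, 1, 2 or 3, or -1 when the target is more than three hops away.
    """
    if source == target:
        return 0
    if target in adjacency.get(source, set()):
        return 1
    if target in cached_second:
        return 2
    # Third degree: some connection of the target is in the source's second degree.
    if adjacency.get(target, set()) & cached_second:
        return 3
    return -1


# Toy chain graph 1 - 2 - 3 - 4 - 5 (hypothetical data).
ADJACENCY: Adjacency = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
CACHE_FOR_1 = second_degree(1, ADJACENCY)
```

With the second degree set already cached, no multi-hop traversal is needed at query time: each distance check is a hash lookup or a single intersection.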
  • 8. Outline: LinkedIn’s Distributed Graph Infrastructure; The Scaling Problem; Set Cover by Example; Evaluation; Related Work; Conclusions and Future Work
  • 9. Graph Distance Queries and Second Degree Creation
      [Flow diagram; labels: get graph distances; cache lookup; [exists=false] retrieve 1st degree connections; [exists=true] set intersections; partition IDs; partial merges; K-way merges; return]
  • 10. The Scaling Problem Illustrated
      [Diagram: Graph DB and NCS]
      – Increased query distribution
      – Merging and de-duping shift to a single NCS node
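The merging and de-duping step named above can be sketched as a K-way merge over sorted partial results. This assumes each Graph DB node returns a sorted list of member IDs, which the slides do not state; `k_way_merge_dedup` is a hypothetical name.

```python
import heapq
from typing import Iterable, Iterator, List


def k_way_merge_dedup(sorted_lists: Iterable[List[int]]) -> Iterator[int]:
    """Merge K sorted ID lists into one sorted stream without duplicates."""
    previous = None
    # heapq.merge keeps one cursor per input list in a heap of size K.
    for member_id in heapq.merge(*sorted_lists):
        if member_id != previous:
            yield member_id
            previous = member_id


# Three hypothetical partial results, one per Graph DB node.
PARTIALS = [[1, 4, 7], [2, 4, 9], [1, 9]]
MERGED = list(k_way_merge_dedup(PARTIALS))
```

A heap-based merge of N total IDs from K lists costs O(N log K), and every list is a separate response the merging node must receive, so contacting fewer Graph DB nodes reduces both terms.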
  • 11. Outline: LinkedIn’s Distributed Graph Infrastructure; The Scaling Problem; Set Cover by Example; Evaluation; Related Work; Conclusions and Future Work
  • 12. Set Cover Problem
      Definition
      – Given a set of elements U = {1, 2, …, m} (called the universe) and a family S of n sets whose union equals the universe, the set cover problem is to identify the smallest subset of S whose union contains all elements in the universe.
      Greedy Algorithm
      – Rule to select sets: at each stage, choose the set that contains the largest number of uncovered elements.
      – Approximation ratio upper bound: O(log s), where s is the size of the largest set in S
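The greedy rule stated above can be sketched as follows. This is the textbook greedy algorithm, not the modified variant used in the talk; the set names and the example family are made up for illustration.

```python
from typing import Dict, List, Set


def greedy_set_cover(universe: Set[int], family: Dict[str, Set[int]]) -> List[str]:
    """Pick sets greedily until every element of `universe` is covered."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Greedy rule: take the set covering the most still-uncovered elements.
        best = max(family, key=lambda name: len(family[name] & uncovered))
        if not family[best] & uncovered:
            raise ValueError("the family does not cover the universe")
        chosen.append(best)
        uncovered -= family[best]
    return chosen


# Hypothetical instance: A covers three elements, then D covers the remaining two.
FAMILY = {"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 4}, "D": {4, 5}}
COVER = greedy_set_cover({1, 2, 3, 4, 5}, FAMILY)
```

Each round scans the whole family, which is the cost the modified algorithm in the following slides avoids by only looking at the nodes that hold one chosen element.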
  • 13. Modified Set Cover Algorithm Example Cont.
      Build a map of partition ID -> Graph DB nodes for the input graph
  • 14. Modified Set Cover Algorithm Example Cont.
      Randomly pick an element from U = {1, 2, 5, 6}
      – e = 5
      Retrieve from map
      – nodes = {R21, R13}
      Intersect
      – R21 = {1, 5} with U
      – R13 = {5, 6} with U
      Select R21
      – U = {2, 6}, C = {R21}
  • 15. Modified Set Cover Algorithm Example Cont.
      Randomly pick an element from U = {2, 6}
      – e = 2
      Retrieve from map
      – nodes = {R11, R22}
      Intersect
      – R11 = {1, 2} with U
      – R22 = {2, 4} with U
      Select R22
      – U = {6}, C = {R21, R22}
  • 16. Modified Set Cover Algorithm Example Cont.
      Pick the final element from U = {6}
      – e = 6
      Retrieve from map
      – nodes = {R23, R13}
      Intersect
      – R23 = {3, 6} with U
      – R13 = {5, 6} with U
      Select R23
      – U = {}, C = {R21, R22, R23}
  • 17. Modified Set Cover Algorithm Example Solution
      Solution compared to optimal result for U, where U = {1, 2, 5, 6}
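The walk-through on slides 13 to 17 can be sketched as below. The node contents are reconstructed only from the sets named on slides 14 to 16, so the map may be incomplete compared with the original diagram. The function name, the random tie-breaking and the returned node-to-partitions assignment are assumptions; the production implementation is the Scala load balancer linked in the conclusions.

```python
import random
from typing import Dict, Set

# Partitions hosted by each Graph DB node, as named on slides 14-16.
NODE_PARTITIONS: Dict[str, Set[int]] = {
    "R11": {1, 2}, "R21": {1, 5}, "R13": {5, 6}, "R22": {2, 4}, "R23": {3, 6},
}


def modified_set_cover(partitions: Set[int], node_partitions: Dict[str, Set[int]],
                       rng: random.Random) -> Dict[str, Set[int]]:
    """Return a node -> partitions-to-serve assignment covering `partitions`."""
    # Build the map of partition ID -> Graph DB nodes (slide 13).
    partition_to_nodes: Dict[int, Set[str]] = {}
    for node, hosted in node_partitions.items():
        for partition in hosted:
            partition_to_nodes.setdefault(partition, set()).add(node)

    uncovered = set(partitions)
    assignment: Dict[str, Set[int]] = {}
    while uncovered:
        element = rng.choice(sorted(uncovered))            # randomly pick e from U
        candidates = sorted(partition_to_nodes.get(element, set()))
        if not candidates:
            raise ValueError(f"no node hosts partition {element}")
        # Intersect each candidate with U and keep the largest overlap.
        best_size = max(len(node_partitions[n] & uncovered) for n in candidates)
        best = rng.choice([n for n in candidates
                           if len(node_partitions[n] & uncovered) == best_size])
        assignment[best] = node_partitions[best] & uncovered
        uncovered -= assignment[best]
    return assignment


RESULT = modified_set_cover({1, 2, 5, 6}, NODE_PARTITIONS, random.Random(0))
```

Unlike the classic greedy rule, each round compares only the nodes that host the one picked element, so the work per round is bounded by the replication factor rather than by the total number of nodes. The price is that the cover is not guaranteed to be minimal, as the three-node result C = {R21, R22, R23} on slide 16 shows; ties are broken arbitrarily here, so a run may return a different cover than the slides.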
  • 18. Outline: LinkedIn’s Distributed Graph Infrastructure; The Scaling Problem; Set Cover by Example; Evaluation; Related Work; Conclusions and Future Work
  • 19. Evaluation
      Production Results
      – Second degree cache creation latency drops 38% at the 99th percentile
      – Graph distance calculation latency drops 25% at the 99th percentile
      – Outbound traffic drops 40%; inbound traffic drops 10%
      [Charts: normalized latency at the 95th and 99th quantiles, control vs. set cover, for network cache building and for get distances]
  • 20. Outline: LinkedIn’s Distributed Graph Infrastructure; The Scaling Problem; Set Cover by Example; Evaluation; Related Work; Conclusions and Future Work
  • 21. Related Work
      Scaling through Replication
      – Collocating neighbors [Pujol2010]
      – Replication based on read/write frequencies [Mondal2012]
      – Replication based on locality [Carrasco2011]
      Multi-Core Implementations
      – Parallel graph exploration [Hong2011]
      Offline Graph Systems
      – Google’s Pregel [Malewicz2010]
      – Distributed GraphLab [Low2012]
  • 22. Outline: LinkedIn’s Distributed Graph Infrastructure; The Scaling Problem; Set Cover by Example; Evaluation; Related Work; Conclusions and Future Work
  • 23. Conclusions and Future Work
      Future Work
      – Incremental cache updates
      – Replication on GraphDB nodes through LRU caching
      – New graph traversal algorithms
      Conclusions
      – Key challenges tackled: work distribution balancing; communication bandwidth
      – Set cover optimized latency for distributed queries by: identifying a much smaller set of GraphDB nodes to serve queries; shifting workload to GraphDB to utilize its parallelism; alleviating the K-way merging costs on NCS by reducing K
      – Available at https://github.com/linkedin-sna/norbert/blob/master/network/src/main/scala/com/linkedin/norbert/network/partitioned/loadbalancer/DefaultPartitionedLoadBalancerFactory.scala