SEARCH, NETWORK AND ANALYTICSUsing Set Cover to Optimize a Large-Scale Low Latency Distributed GraphLinkedIn Presentation ...
SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Examp...
SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Examp...
SEARCH, NETWORK AND ANALYTICSLinkedIn Products Backed By Social Graph©2013 LinkedIn Corporation. All Rights Reserved. 4
SEARCH, NETWORK AND ANALYTICSLinkedIn’s Distributed Graph APIs Graph APIs– Get Connections “Who does member X know?”– Ge...
SEARCH, NETWORK AND ANALYTICSDistributed Graph Architecture Diagram©2013 LinkedIn Corporation. All Rights Reserved. 6
SEARCH, NETWORK AND ANALYTICSLinkedIn’s Distributed Graph Infrastructure Graph Database (Graph DB)– Partitioned and repli...
SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Examp...
SEARCH, NETWORK AND ANALYTICSGraph Distance Queries and Second Degree Creation©2013 LinkedIn Corporation. All Rights Reser...
SEARCH, NETWORK AND ANALYTICSThe Scaling Problem Illustrated©2013 LinkedIn Corporation. All Rights Reserved. 10Graph DBNCS...
SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover and Exam...
SEARCH, NETWORK AND ANALYTICSSet Cover Problem Definition– Given a set of elements U = {1, 2, …, m } (called the universe...
SEARCH, NETWORK AND ANALYTICSModified Set Cover Algorithm Example Cont.©2013 LinkedIn Corporation. All Rights Reserved. 13...
SEARCH, NETWORK AND ANALYTICSModified Set Cover Algorithm Example Cont.©2013 LinkedIn Corporation. All Rights Reserved. 14...
SEARCH, NETWORK AND ANALYTICSModified Set Cover Algorithm Example Cont.©2013 LinkedIn Corporation. All Rights Reserved. 15...
SEARCH, NETWORK AND ANALYTICSModified Set Cover Algorithm Example Cont.©2013 LinkedIn Corporation. All Rights Reserved. 16...
SEARCH, NETWORK AND ANALYTICSModified Set Cover Algorithm Example Solution Solution Compared to Optimal Result for U , wh...
SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Examp...
SEARCH, NETWORK AND ANALYTICSEvaluation Production Results– Second degree cache creation drops 38% on 99th percentile– Gr...
SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Examp...
SEARCH, NETWORK AND ANALYTICSRelated Work Scaling through Replications– Collocating neighbors [Pujol2010]– Replication ba...
SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Examp...
SEARCH, NETWORK AND ANALYTICSConclusions and Future Work Future Work– Incremental cache updates– Replication on GraphDB n...
SEARCH, NETWORK AND ANALYTICS©2013 LinkedIn Corporation. All Rights Reserved. 24
Upcoming SlideShare
Loading in …5
×

Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph

380 views

Published on

Using a modified version of set cover algorithm to reduce real-time query latencies in LinkedIn's distributed graph infrastructure.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
380
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
4
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph

  1. 1. SEARCH, NETWORK AND ANALYTICSUsing Set Cover to Optimize a Large-Scale Low Latency Distributed GraphLinkedIn Presentation Template©2013 LinkedIn Corporation. All Rights Reserved. 1Rui WangStaff Software Engineer
  2. 2. SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work©2013 LinkedIn Corporation. All Rights Reserved. 2
  3. 3. SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work©2013 LinkedIn Corporation. All Rights Reserved. 3
  4. 4. SEARCH, NETWORK AND ANALYTICSLinkedIn Products Backed By Social Graph©2013 LinkedIn Corporation. All Rights Reserved. 4
  5. 5. SEARCH, NETWORK AND ANALYTICSLinkedIn’s Distributed Graph APIs Graph APIs– Get Connections “Who does member X know?”– Get Shared Connections “Who do I know in common with member Y”?– Get Graph Distances “What’s the graph distances for each member returned from a searchquery?” “Is member Z within my second degree network so that I can view herprofile?” Over 50% queries is to get graph distances©2013 LinkedIn Corporation. All Rights Reserved. 5
  6. 6. SEARCH, NETWORK AND ANALYTICSDistributed Graph Architecture Diagram©2013 LinkedIn Corporation. All Rights Reserved. 6
  7. 7. SEARCH, NETWORK AND ANALYTICSLinkedIn’s Distributed Graph Infrastructure Graph Database (Graph DB)– Partitioned and replicated graph database– Distributed adjacency list Distributed Network Cache (NCS)– LRU cache stores second degree network for active members– Graph traversals are converted to set intersections Graph APIs– Get Connections– Get Shared Connections– Get Graph Distances©2013 LinkedIn Corporation. All Rights Reserved. 7
  8. 8. SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work©2013 LinkedIn Corporation. All Rights Reserved. 8
  9. 9. SEARCH, NETWORK AND ANALYTICSGraph Distance Queries and Second Degree Creation©2013 LinkedIn Corporation. All Rights Reserved. 9get graph distancescache lookup[exists=false] retrieve1st degree connections[exists=true] setintersectionsPartition IDspartial mergesK-Waymergesreturnreturn
  10. 10. SEARCH, NETWORK AND ANALYTICSThe Scaling Problem Illustrated©2013 LinkedIn Corporation. All Rights Reserved. 10Graph DBNCS• Increased Query Distribution• Merging and De-duping shift to single NCS node
  11. 11. SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover and Example Evaluation Related Work Conclusions and Future Work©2013 LinkedIn Corporation. All Rights Reserved. 11
  12. 12. SEARCH, NETWORK AND ANALYTICSSet Cover Problem Definition– Given a set of elements U = {1, 2, …, m } (called the universe) and afamily S of n sets whose union equals the universe, the set coverproblem is to identify the smallest subset of S the union of whichcontains all elements in the universe. Greedy Algorithm– Rule to select sets At each stage, choose the set that contains the largest number ofuncovered elements.– Upper bound O(log s), where s is the size of the largest set from S©2013 LinkedIn Corporation. All Rights Reserved. 12
  13. 13. SEARCH, NETWORK AND ANALYTICSModified Set Cover Algorithm Example Cont.©2013 LinkedIn Corporation. All Rights Reserved. 13 Build a map of partition ID -> Graph DB nodes for the input graph
  14. 14. SEARCH, NETWORK AND ANALYTICSModified Set Cover Algorithm Example Cont.©2013 LinkedIn Corporation. All Rights Reserved. 14 Randomly pick an elementfrom U = { 1,2,5,6 }– e = 5 Retrieve from map– nodes = { R21, R13 } Intersect– R21 = {1,5 } with U,– R13 = { 5,6 } with U Select R21– U = {2,6}, C = { R21}
  15. 15. SEARCH, NETWORK AND ANALYTICSModified Set Cover Algorithm Example Cont.©2013 LinkedIn Corporation. All Rights Reserved. 15 Randomly pick an elementfrom U = { 2,6 }– e = 2 Retrieve from map– nodes = { R11, R22 } Intersect– R11 = {1,2 } with U,– R22 = { 2,4 } with U Select R22– U = {6}, C = { R21, R22 }
  16. 16. SEARCH, NETWORK AND ANALYTICSModified Set Cover Algorithm Example Cont.©2013 LinkedIn Corporation. All Rights Reserved. 16 Pick the final element fromU = { 6 }– e = 6 Retrieve from map– nodes = { R23, R13 } Intersect– R23 = { 3,6 } with U,– R13 = { 5,6 } with U Select R23– U = {}, C = {R21, R22, R23 }
  17. 17. SEARCH, NETWORK AND ANALYTICSModified Set Cover Algorithm Example Solution Solution Compared to Optimal Result for U , where U = {1,2,5,6}©2013 LinkedIn Corporation. All Rights Reserved. 17
  18. 18. SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work©2013 LinkedIn Corporation. All Rights Reserved. 18
  19. 19. SEARCH, NETWORK AND ANALYTICSEvaluation Production Results– Second degree cache creation drops 38% on 99th percentile– Graph distance calculation drops 25% on 99th percentile– Outbound traffic drops 40%; Inbound traffic drops 10%©2013 LinkedIn Corporation. All Rights Reserved. 1995th quantile 99th quantile0123control setcover control setcoverlatency(normalized)network cache building95th quantile 99th quantile0123control setcover control setcoverlatency(normalized)getdistances
  20. 20. SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work©2013 LinkedIn Corporation. All Rights Reserved. 20
  21. 21. SEARCH, NETWORK AND ANALYTICSRelated Work Scaling through Replications– Collocating neighbors [Pujol2010]– Replication based on read/write frequencies [Mondal2012]– Replication based on locality [Carrasco2011] Multi-Core Implementations– Parallel graph exploration [Hong2011] Offline Graph Systems– Google’s Pregel [Malewicz2010]– Distributed GraphLab [Low2012]©2013 LinkedIn Corporation. All Rights Reserved. 21
  22. 22. SEARCH, NETWORK AND ANALYTICSOutline LinkedIn’s Distributed Graph Infrastructure The Scaling Problem Set Cover by Example Evaluation Related Work Conclusions and Future Work©2013 LinkedIn Corporation. All Rights Reserved. 22
  23. 23. SEARCH, NETWORK AND ANALYTICSConclusions and Future Work Future Work– Incremental cache updates– Replication on GraphDB nodes through LRU caching– New graph traversal algorithms Conclusions– Key challenges tackled Work distribution balancing Communication Bandwidth– Set cover optimized latency for distributed query by Identifying a much smaller set of GraphDB nodes serving queries Shifting workload to GraphDB to utilize parallel powers Alleviating the K-way merging costs on NCS by reducing K– Available athttps://github.com/linkedin-sna/norbert/blob/master/network/src/main/scala/com/linkedin/norbert/network/partitioned/loadbalancer/DefaultPartitionedLoadBalancerFactory.scala©2013 LinkedIn Corporation. All Rights Reserved. 23
  24. 24. SEARCH, NETWORK AND ANALYTICS©2013 LinkedIn Corporation. All Rights Reserved. 24

×