LinkedIn Graph Presentation

Chris Conrad (Senior Engineering Manager) and Igor Perisic (Senior Director Engineering) from LinkedIn gave this talk to UC Santa Barbara in 2012.

Published in: Technology

  1. The Evolution of the Professional Graph at LinkedIn
     Chris Conrad, Senior Engineering Manager; Igor Perisic, Sr. Director of Engineering (SNA, Social Graph)
  2. LinkedIn
     •  The site officially launched on May 5, 2003. At the end of the first month in operation, LinkedIn had a total of 4,500 members in the network.
     •  As of January 9, 2013, LinkedIn operates the world’s largest professional network on the Internet with more than 200 million members in over 200 countries and territories.
     •  As of September 30, 2012, LinkedIn counts executives from all 2012 Fortune 500 companies as members; its corporate talent solutions are used by 85 of the Fortune 100 companies.
     •  As of the school year ending May 2012, there are over 20 million students and recent college graduates on LinkedIn. They are LinkedIn’s fastest-growing demographic.
  3. In the beginning…
  4. The Cloud
     •  Cloud is the original name of our graph engine
     •  Responsible for read-scaling graph queries (and it used to do search, too)
     •  Stored 4 primary sets of data: Member Data, Network Cache, Connections, Group Membership
  5. What was wrong?
     •  Large memory footprint
        –  Network cache used simple but inefficient data structures
        –  The size and density of the graph were increasing
     •  Garbage collector woes
        –  Large JVM heap caused long GC pauses
        –  Long GC pauses reduced availability, resulting in site outages
  6. C++ Graph
     •  First project: migrate the network cache to a new data structure to reduce memory usage
     •  Second project: implement a C++ JNI library to move the graph data off heap
     •  Result: drastic reduction in JVM heap utilization
     [Diagram: Cloud, with Java Heap and libGraphJNI.so; data sets shown: Member Data, Network Cache, Connections, Group Membership]
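The first project above, moving the network cache to a more compact data structure, can be illustrated with a minimal Python sketch. The slides do not describe the actual layout, so the compressed-sparse-row (CSR) arrangement and the names `build_csr` and `neighbors` are assumptions for illustration only; the real implementation was Java and C++. The idea is that two flat integer arrays replace one container object per member.

```python
from array import array

def build_csr(num_members, edges):
    """Pack an undirected edge list into two flat integer arrays (CSR layout).

    offsets[m] .. offsets[m + 1] delimits member m's neighbors inside targets.
    """
    degree = [0] * num_members
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    offsets = array("l", [0] * (num_members + 1))
    for m in range(num_members):
        offsets[m + 1] = offsets[m] + degree[m]
    targets = array("l", [0] * offsets[num_members])
    cursor = list(offsets[:num_members])  # next free slot for each member
    for a, b in edges:
        targets[cursor[a]] = b
        cursor[a] += 1
        targets[cursor[b]] = a
        cursor[b] += 1
    return offsets, targets

def neighbors(offsets, targets, member):
    """Return the neighbor ids of one member as a list."""
    return list(targets[offsets[member]:offsets[member + 1]])

# Hypothetical four-member graph.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
offsets, targets = build_csr(4, edges)
print(neighbors(offsets, targets, 2))
```

A layout like this stores each connection as a plain integer, which is also the kind of data that is straightforward to hold outside a garbage-collected heap.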
  7. Several million users later
  8. New Problems
     •  Growth
        –  The size and density of the graph were increasing
        –  We were running out of memory
        –  We were running out of CPU cycles
        –  Proliferation of services increased the overhead of maintaining the client-side software load balancer
        –  As of September 30, 2012, LinkedIn has 3,177 full-time employees located around the world. LinkedIn started off 2012 with about 2,100 full-time employees worldwide, up from around 1,000 at the beginning of 2011 and about 500 at the beginning of 2010.
     •  C++ code had a much higher maintenance cost
        –  Core dumps are much less friendly than a NullPointerException
        –  LinkedIn didn’t have the expertise or infrastructure to support C++ development
  9. Split cloud
     •  cloud-session: Move the load balancing logic into a service we control
     •  rgraph: Extract the C++ graph into its own service
     [Diagram: cloud-session, Cloud, and rgraph services; labels shown: Java Heap, libGraphJNI.so, Member Data, Network Cache, Connections, Group Membership]
  10. New problems, same as the old
      •  rgraph instances still had a large memory footprint
         –  The density of the graph was increasing
         –  We were running out of memory
         –  We were running out of CPU cycles
      •  cloud-session’s software load balancer implementation was essentially a single point of failure
  11. Distribute the Graph
      •  Introduce Norbert, a new cluster management system
      •  Partition the graph data
      •  Partition the network cache service
      [Diagram labels: cloud-session, dgraph, Cloud, Java Heap, Connections, Group Membership, Member Data, Network Cache Service]
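Partitioning the graph data can be sketched as follows. The slides do not say how members were assigned to partitions or how Norbert routed requests, so the modulo scheme and the `PartitionedGraph` class below are hypothetical; each in-process dictionary stands in for one graph node in the cluster. A second-degree lookup shows the scatter-gather pattern that partitioning forces on multi-hop queries.

```python
from collections import defaultdict

class PartitionedGraph:
    """Toy connection graph split across partitions by member id (assumed scheme)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        # One adjacency map per partition; in a real cluster each would be a server.
        self.partitions = [defaultdict(set) for _ in range(num_partitions)]

    def partition_for(self, member_id):
        return member_id % self.num_partitions

    def add_connection(self, a, b):
        # Each endpoint's adjacency list lives on that endpoint's partition.
        self.partitions[self.partition_for(a)][a].add(b)
        self.partitions[self.partition_for(b)][b].add(a)

    def connections(self, member_id):
        partition = self.partitions[self.partition_for(member_id)]
        return set(partition.get(member_id, ()))

    def second_degree(self, member_id):
        first = self.connections(member_id)
        # Scatter: group first-degree members by the partition that owns them.
        by_partition = defaultdict(list)
        for m in first:
            by_partition[self.partition_for(m)].append(m)
        # Gather: one batched lookup per partition, then merge the results.
        result = set()
        for p, members in by_partition.items():
            for m in members:
                result |= self.partitions[p].get(m, set())
        return result - first - {member_id}

g = PartitionedGraph(num_partitions=3)
for a, b in [(1, 2), (2, 3), (3, 4)]:
    g.add_connection(a, b)
print(g.second_degree(2))
```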
  12. Mission Accomplished
  13. So now what?
  14. My Connections
  15. Common Connections
  16. My Network
  17. How am I connected?
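The features named on the preceding slides correspond to basic graph operations. The sketch below is an illustration on a hypothetical five-member graph, not LinkedIn's implementation: common connections as a set intersection, a member's network as a hop-limited breadth-first search (the hop limit is an assumption), and "How am I connected?" as a shortest path.

```python
from collections import deque

# Hypothetical connection graph: member -> set of first-degree connections.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

def common_connections(graph, a, b):
    """Members connected to both a and b."""
    return graph[a] & graph[b]

def network(graph, start, max_hops):
    """Map every member within max_hops of start to its distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        m = queue.popleft()
        if dist[m] == max_hops:
            continue  # do not expand beyond the hop limit
        for n in graph[m]:
            if n not in dist:
                dist[n] = dist[m] + 1
                queue.append(n)
    del dist[start]
    return dist

def connection_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        m = queue.popleft()
        if m == goal:
            path = []
            while m is not None:
                path.append(m)
                m = parent[m]
            return path[::-1]
        for n in sorted(graph[m]):  # sorted only to make the result deterministic
            if n not in parent:
                parent[n] = m
                queue.append(n)
    return None

print(common_connections(graph, "ann", "dan"))
print(network(graph, "ann", 2))
print(connection_path(graph, "ann", "eve"))
```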
  18. What is the professional graph?
      •  LinkedIn connections
      •  Current and past co-workers
      •  University colleagues and alumni
      •  Group members
      •  And what about geography, industry and skill overlap?
  19. New requirements
      •  Members aren’t the only type of node in the professional graph
      •  LinkedIn connections aren’t the only type of edge in the professional graph
      •  We already supported groups and group membership
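A graph with several node and edge types can be sketched by treating types as data. The `TypedGraph` class and the `EDGE_TYPES` table below are hypothetical; the edge type names are borrowed from elsewhere in the deck, and the source and target node types assigned to them are assumptions.

```python
from collections import defaultdict

# Hypothetical schema: edge type -> (source node type, target node type).
EDGE_TYPES = {
    "MemberToMember": ("Member", "Member"),
    "GroupMembership": ("Member", "Group"),
    "CompanyFollowers": ("Company", "Member"),
}

class TypedGraph:
    """Graph whose nodes are (node_type, id) tuples and whose edges carry a type."""

    def __init__(self, edge_types):
        self.edge_types = edge_types
        # (edge type, source node) -> set of target nodes
        self.edges = defaultdict(set)

    def add_edge(self, edge_type, source, target):
        expected = self.edge_types[edge_type]
        if (source[0], target[0]) != expected:
            raise TypeError(f"{edge_type} expects {expected}")
        self.edges[(edge_type, source)].add(target)

    def targets(self, edge_type, source):
        return set(self.edges.get((edge_type, source), ()))

g = TypedGraph(EDGE_TYPES)
g.add_edge("MemberToMember", ("Member", 1), ("Member", 2))
g.add_edge("GroupMembership", ("Member", 1), ("Group", 7))
print(g.targets("GroupMembership", ("Member", 1)))
```

Adding a new node or edge type here means adding a row to the table, with no new classes.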
  20. Making changes was hard
      •  Code was rigid
         –  Data was stored using class hierarchies; introducing new data types was prohibitively slow
         –  Queries were built by combining object instances
      •  BDBJE (Berkeley DB Java Edition)
      •  Everything was back in the heap
         –  Garbage collection time was starting to go up
         –  GC pauses no longer caused outages, but flapping introduced high developer and operational overhead
  21. Graph as a Service
      •  Custom persistence engine
         –  Log structured
         –  Memory-mapped files keep data out of the Java heap
         –  Data described using a DDL-like schema
      •  Custom SQL-like query language
         –  Query language understands the DDL
         –  Text-based language reduces code changes
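The combination of a log-structured store and memory-mapped reads can be sketched in a few lines. This is an illustration only: the fixed-size record format and the `EdgeLog` class are assumptions, and the slides give no detail about the real engine's file format. Writes only ever append to the end of the file, and reads go through a read-only memory map so the operating system's page cache holds the data.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: (source id, target id) as two 64-bit integers.
RECORD = struct.Struct("<qq")

class EdgeLog:
    """Append-only edge log that is read back through a memory map."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, source, target):
        # Log structured: existing bytes are never rewritten.
        with open(self.path, "ab") as f:
            f.write(RECORD.pack(source, target))

    def scan(self):
        """Yield every record in write order."""
        if os.path.getsize(self.path) == 0:
            return  # an empty file cannot be memory mapped
        with open(self.path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
                for offset in range(0, len(view), RECORD.size):
                    yield RECORD.unpack_from(view, offset)

path = os.path.join(tempfile.mkdtemp(), "edges.log")
log = EdgeLog(path)
log.append(1, 2)
log.append(1, 3)
records = list(log.scan())
print(records)
```

In a JVM service the analogous mechanism would be a `MappedByteBuffer` obtained from `FileChannel.map`, whose contents live outside the garbage-collected heap.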
  22. Graph Queries
      •  Company(:id)[CompanyFollowers]
      •  Member(:id)[MemberToMember{CreatedAt > :t}]
      •  Member(:id)[topN(MemberToMember, Score, 10)]
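One plausible reading of the three queries above is: start at a node, follow edges of one type, optionally filter on an edge attribute, and optionally keep the N highest-scoring edges. The sketch below implements that reading in Python. The semantics are an assumption inferred from the query text, and the `EDGES` store and the helpers `traverse`, `top_n` and `targets` are hypothetical.

```python
import heapq

# Hypothetical edge store: (node type, node id, edge type) -> list of edge records.
EDGES = {
    ("Company", 1, "CompanyFollowers"): [
        {"target": 10},
        {"target": 11},
    ],
    ("Member", 10, "MemberToMember"): [
        {"target": 11, "CreatedAt": 100, "Score": 0.9},
        {"target": 12, "CreatedAt": 250, "Score": 0.4},
        {"target": 13, "CreatedAt": 300, "Score": 0.7},
    ],
}

def traverse(node_type, node_id, edge_type, predicate=None):
    """Follow one edge type from a node, keeping edges that pass the predicate."""
    edges = EDGES.get((node_type, node_id, edge_type), [])
    return [e for e in edges if predicate is None or predicate(e)]

def top_n(edges, attribute, n):
    """Keep the n edges with the largest value of the given attribute."""
    return heapq.nlargest(n, edges, key=lambda e: e[attribute])

def targets(edges):
    return [e["target"] for e in edges]

# Company(:id)[CompanyFollowers] with :id = 1
followers = targets(traverse("Company", 1, "CompanyFollowers"))

# Member(:id)[MemberToMember{CreatedAt > :t}] with :id = 10, :t = 200
recent = targets(traverse("Member", 10, "MemberToMember",
                          lambda e: e["CreatedAt"] > 200))

# Member(:id)[topN(MemberToMember, Score, 2)] with :id = 10 (N shortened to 2)
best = targets(top_n(traverse("Member", 10, "MemberToMember"), "Score", 2))

print(followers, recent, best)
```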
  23. What do we have in common?
  24. How am I connected?
  25. What’s next?
      •  Online schema migration
      •  Automated repartitioning and data migration
      •  Automated provisioning
      •  Hierarchical data partitioning
      •  Monitoring and statistics
      •  Query optimization
      •  Query fragment caching
      •  Result set caching
      •  Query parallelization
      •  Very large data set handling
      •  …
  26. And we’re still growing
      •  200M+
      •  2/sec
      •  63% non-U.S.
      •  25th most visited website worldwide (Comscore 6-12)
      •  >2.6M company pages
      •  85% of Fortune 100 companies use LinkedIn to hire
      [Chart: LinkedIn Members (Millions), 2004 to 2011; data labels shown: 2, 4, 8, 17, 32, 55, 90]
  27. We’re Hiring
      •  http://studentcareers.linkedin.com
      •  Or email me at cconrad@linkedin.com
  28. Q&A
