LinkedIn Data Infrastructure (QCon London 2012)

This is a talk I gave in March 2012 at QCon London on Data Infrastructure at LinkedIn


1. Data Infrastructure @ LinkedIn
   Sid Anand
   QCon London 2012
   @r39132
2. About Me
   Current Life… LinkedIn
   – Web / Software Engineering
   – Search, Network, and Analytics (SNA)
   – Distributed Data Systems (DDS)
   – Me
   In a Previous Life…
   – Netflix, Cloud Database Architect
   – eBay, Web Development, Research Lab, & Search Engine
   And Many Years Prior…
   – Studying Distributed Systems at Cornell University
3. Our mission
   Connect the world’s professionals to make them more productive and successful
4. The world’s largest professional network
   Over 60% of members are now international
   • 150M+ members (as of February 9, 2012)
   • 75% of Fortune 100 Companies use LinkedIn to hire (as of September 30, 2011)
   • >2M Company Pages (as of December 31, 2011)
   • 16 Languages
   • ~4.2B professional searches in 2011 (as of December 31, 2011)
   [Chart: LinkedIn Members (Millions), 2004–2010: 2, 4, 8, 17, 32, 55, 90]
5. Other Company Facts
   • Headquartered in Mountain View, Calif., with offices around the world (as of February 9, 2012)
   • As of December 31, 2011, LinkedIn has 2,116 full-time employees located around the world
   • Currently around 650 people work in Engineering
     – 400 in Web/Software Engineering (plan to add another 200 in 2012)
     – 250 in Operations
6. Agenda
   • Company Overview
   • Architecture
     – Data Infrastructure Overview
     – Technology Spotlight: Oracle, Voldemort, DataBus, Kafka
   • Q & A
7. LinkedIn : Architecture Overview
   • Our site runs primarily on Java, with some use of Scala for specific infrastructure
   • What runs on Scala?
     – Network Graph Service
     – Kafka
   • Most of our services run on Apache + Jetty
8. LinkedIn : Architecture
   A web page requests information A and B.
   • Presentation Tier – a thin layer focused on building the UI. It assembles the page by making parallel requests to BST services (see the sketch below)
   • Business Service Tier (BST) – encapsulates business logic. Can call other BST clusters and its own DST cluster
   • Data Service Tier (DST) – encapsulates DAL logic; each service is concerned with one Oracle schema
   • Data Infrastructure – concerned with the persistent storage of and easy access to data (Oracle master/slave, Memcached, Voldemort)
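As an illustration of the presentation tier's parallel fan-out described on this slide, here is a minimal sketch that issues two BST requests concurrently and assembles the page once both return. The `BstClient` interface and the names used are hypothetical, not LinkedIn's actual frontend framework.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PageAssembler {
    /** Hypothetical business-service client: fetches a page fragment by name. */
    interface BstClient {
        String fetch(String what);
    }

    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private final BstClient bst;

    public PageAssembler(BstClient bst) {
        this.bst = bst;
    }

    /** Request information A and B in parallel, then assemble the page from both results. */
    public String renderPage() throws Exception {
        Future<String> a = pool.submit(() -> bst.fetch("A"));   // request to one BST service
        Future<String> b = pool.submit(() -> bst.fetch("B"));   // second request issued without waiting
        return "<page>" + a.get() + b.get() + "</page>";        // block only once both are needed
    }
}
```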
10. LinkedIn : Data Infrastructure Technologies
   • Database Technologies
     – Oracle
     – Voldemort
     – Espresso
   • Data Replication Technologies
     – Kafka
     – DataBus
   • Search Technologies
     – Zoie – real-time search and indexing with Lucene
     – Bobo – faceted search library for Lucene
     – SenseiDB – fast, real-time, faceted, KV and full-text search engine
   …and more
11. LinkedIn : Data Infrastructure Technologies
   This talk will focus on a few of the key technologies below!
   • Database Technologies
     – Oracle
     – Voldemort
     – Espresso – a new K-V store under development
   • Data Replication Technologies
     – Kafka
     – DataBus
   • Search Technologies
     – Zoie – real-time search and indexing with Lucene
     – Bobo – faceted search library for Lucene
     – SenseiDB – fast, real-time, faceted, KV and full-text search engine
   …and more
12. LinkedIn Data Infrastructure Technologies
   Oracle: Source of Truth for User-Provided Data
13. Oracle : Overview
   • All user-provided data is stored in Oracle – our current source of truth
   • About 50 schemas running on tens of physical instances
   • With our user base and traffic growing at an accelerating pace, how do we scale Oracle for user-provided data?
   Scaling Reads
   • Oracle slaves (c.f. DSC)
   • Memcached (see the cache-aside sketch below)
   • Voldemort – for key-value lookups
   Scaling Writes
   • Move to more expensive hardware or replace Oracle with something else
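To make the Memcached read-scaling idea concrete, here is a minimal cache-aside sketch in Java. It uses the spymemcached client and a hypothetical `ProfileDao` backed by an Oracle slave; the key scheme, TTL, and method names are illustrative, not LinkedIn's actual code.

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class ProfileReadPath {
    /** Hypothetical DAO that reads a member's profile (as JSON) from an Oracle slave. */
    public interface ProfileDao {
        String findProfileJson(long memberId);
    }

    private static final int TTL_SECONDS = 300;       // illustrative cache TTL
    private final MemcachedClient cache;
    private final ProfileDao oracleSlaveDao;

    public ProfileReadPath(ProfileDao oracleSlaveDao) throws Exception {
        this.cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        this.oracleSlaveDao = oracleSlaveDao;
    }

    /** Cache-aside read: try memcached first, fall back to the Oracle slave, then repopulate the cache. */
    public String readProfile(long memberId) {
        String key = "profile:" + memberId;
        Object cached = cache.get(key);
        if (cached != null) {
            return (String) cached;                    // cache hit: no database round trip
        }
        String fromDb = oracleSlaveDao.findProfileJson(memberId);  // cache miss: read the replica
        if (fromDb != null) {
            cache.set(key, TTL_SECONDS, fromDb);       // best-effort, asynchronous cache fill
        }
        return fromDb;
    }
}
```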
14. Oracle : Overview – Data Service Context (DSC)
   Scaling Oracle Reads using DSC
   • DSC uses a token (e.g. a cookie) to ensure that a reader always sees his or her own writes immediately
     – If I update my own status, it is okay if you don’t see the change for a few minutes, but I have to see it immediately
15. Oracle : Overview – How Does DSC Work?
   • When a user writes data to the master, the DSC token (for that data domain) is updated with a timestamp
   • When the user reads data, we first attempt to read from a replica (a.k.a. slave) database
   • If the data in the slave is older than the data in the DSC token, we read from the master instead (see the sketch below)
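A minimal sketch of the DSC routing rule described above, assuming a per-domain token that carries the timestamp of the reader's last write and a slave that exposes how far it has replicated. The class and method names (`DscToken`, `replicatedUpToMillis`, etc.) are hypothetical, not LinkedIn's actual implementation.

```java
/** Hypothetical DSC token carried in the user's cookie: timestamp of the user's last write in a data domain. */
class DscToken {
    private final long lastWriteMillis;
    DscToken(long lastWriteMillis) { this.lastWriteMillis = lastWriteMillis; }
    long lastWriteMillis() { return lastWriteMillis; }
}

/** Hypothetical handle over either the Oracle master or a slave. */
interface SqlSource {
    String readProfileJson(long memberId);
    /** For a slave: the commit timestamp up to which it has applied the master's changes. */
    long replicatedUpToMillis();
}

class DscReadRouter {
    private final SqlSource master;
    private final SqlSource slave;

    DscReadRouter(SqlSource master, SqlSource slave) {
        this.master = master;
        this.slave = slave;
    }

    /** Write path: after a successful write to the master, stamp the domain's token with "now". */
    DscToken recordWrite() {
        return new DscToken(System.currentTimeMillis());
    }

    /** Read path: prefer the slave, but fall back to the master if the slave has not yet
     *  caught up to this reader's own last write, so the reader always sees his or her writes. */
    String read(long memberId, DscToken token) {
        boolean slaveFreshEnough =
                token == null || slave.replicatedUpToMillis() >= token.lastWriteMillis();
        SqlSource source = slaveFreshEnough ? slave : master;
        return source.readProfileJson(memberId);
    }
}
```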
16. LinkedIn Data Infrastructure Technologies
   Voldemort: Highly-Available Distributed Data Store
17. Voldemort : Overview
   • A distributed, persistent key-value store influenced by the AWS Dynamo paper
   • Key features of Dynamo
     – Highly scalable, available, and performant
     – Achieves this via tunable consistency: for higher consistency, the user accepts lower availability, scalability, and performance, and vice-versa (see the sketch below)
     – Provides several self-healing mechanisms when data does become inconsistent
       • Read Repair – repairs the value for a key when the key is looked up / read
       • Hinted Handoff – buffers the value for a key that wasn’t successfully written, then writes it later
       • Anti-Entropy Repair – scans the entire data set on a node and fixes it
     – Provides a means to detect node failure and a means to recover from node failure
       • Failure Detection
       • Bootstrapping New Nodes
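Tunable consistency in Dynamo-style stores is usually expressed as per-store replication parameters: N replicas, R reads required, and W writes required. When R + W > N, read and write quorums overlap, so a read is guaranteed to see the latest acknowledged write; lowering R or W trades that guarantee for availability and latency. The check below is a generic illustration of that rule, not Voldemort's actual configuration code.

```java
/** Illustrative quorum settings for one store: N replicas, R required reads, W required writes. */
class QuorumConfig {
    final int n, r, w;

    QuorumConfig(int n, int r, int w) {
        if (r > n || w > n) {
            throw new IllegalArgumentException("R and W cannot exceed the replication factor N");
        }
        this.n = n;
        this.r = r;
        this.w = w;
    }

    /** Overlapping quorums: every read quorum intersects every write quorum. */
    boolean isStronglyConsistent() {
        return r + w > n;
    }

    /** Writes still succeed with (N - W) replicas down; reads with (N - R) replicas down. */
    int writeFaultTolerance() { return n - w; }
    int readFaultTolerance()  { return n - r; }

    public static void main(String[] args) {
        QuorumConfig balanced  = new QuorumConfig(3, 2, 2);  // overlapping quorums
        QuorumConfig fastReads = new QuorumConfig(3, 1, 1);  // favors availability and latency
        System.out.println("N=3,R=2,W=2 strongly consistent? " + balanced.isStronglyConsistent());  // true
        System.out.println("N=3,R=1,W=1 strongly consistent? " + fastReads.isStronglyConsistent()); // false
    }
}
```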
18. Voldemort : Overview – Layered, Pluggable Architecture
   API
   • VectorClock<V> get(K key)
   • put(K key, VectorClock<V> value)
   • applyUpdate(UpdateAction action, int retries)
   [Diagram: client and server stacks composed of layers – Client API, Conflict Resolution, Serialization, Repair Mechanism, Failure Detector, Routing, Storage Engine, Admin]
   Voldemort-specific features
   • Implements a layered, pluggable architecture
   • Each layer implements a common interface (c.f. API). This allows us to replace or remove implementations at any layer
   • Pluggable data storage layer – BDB JE, custom RO storage, etc.
   • Pluggable routing supports single- or multi-datacenter routing
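A short client-side usage sketch of the versioned get/put API listed above, written against the open-source Voldemort Java client (`SocketStoreClientFactory`, `StoreClient`, `Versioned`). The store name and bootstrap URL are made up, and the exact signatures may differ slightly from the internal client the slide summarizes.

```java
import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;

public class VoldemortClientExample {
    public static void main(String[] args) {
        // Bootstrap against any node in the cluster; "member-data" is a hypothetical store name.
        StoreClientFactory factory = new SocketStoreClientFactory(
                new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));
        StoreClient<String, String> store = factory.getStoreClient("member-data");

        // Writes carry a version (vector clock) so concurrent updates can be detected and resolved.
        store.put("member:42", "Sid Anand");

        // Reads return the value together with its version.
        Versioned<String> versioned = store.get("member:42");
        System.out.println(versioned.getValue() + " @ " + versioned.getVersion());

        // Read-modify-write: reuse the returned version so the server can order the update.
        versioned.setObject(versioned.getValue() + " (updated)");
        store.put("member:42", versioned);
    }
}
```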
19. Voldemort : Overview – Layered, Pluggable Architecture
   Voldemort-specific features
   • Supports a fat client or a fat server
     – Repair mechanism + failure detector + routing can run on the server or the client
   • LinkedIn currently runs a fat client, but we would like to move this to a fat-server model
20. Where Does LinkedIn Use Voldemort?
21. Voldemort : Usage Patterns @ LinkedIn
   2 usage patterns
   • Read-Write store
     – Uses BDB JE for the storage engine
     – 50% of Voldemort stores (a.k.a. tables) are RW
   • Read-Only store
     – Uses a custom read-only format
     – 50% of Voldemort stores (a.k.a. tables) are RO
   Let’s look at the RO store
22. Voldemort : RO Store Usage at LinkedIn
   • People You May Know
   • Viewers of this profile also viewed
   • Related Searches
   • Events you may be interested in
   • LinkedIn Skills
   • Jobs you may be interested in
23. Voldemort : Usage Patterns @ LinkedIn – RO Store Usage Pattern
   1. Use Hadoop to build a model
   2. Voldemort loads the output of Hadoop
   3. Voldemort serves fast key-value look-ups on the site
      – e.g. For key = “Sid Anand”, get all the people that “Sid Anand” may know
      – e.g. For key = “Sid Anand”, get all the jobs that “Sid Anand” may be interested in
24. How Do the Voldemort RO Stores Perform?
25. Voldemort : RO Store Performance – Throughput vs. Latency
   [Chart: median and 99th-percentile latency (ms) vs. throughput (qps, 100–700) for MySQL and Voldemort; 100 GB data, 24 GB RAM]
26. LinkedIn Data Infrastructure Solutions
   Databus: Timeline-Consistent Change Data Capture
27. Where Does LinkedIn Use DataBus?
28. DataBus : Use-Cases @ LinkedIn
   [Diagram: Oracle emits data change events that fan out to the Standardization, Search Index, and Graph Index services and to Read Replicas]
   A user updates his profile with skills and position history. He also accepts a connection.
   • The write is made to an Oracle master and DataBus replicates:
     – the profile change to the Standardization service
       (e.g. the many forms of “IBM” are canonicalized for search-friendliness and recommendation-friendliness)
     – the profile change to the Search Index service
       (recruiters can find you immediately by new keywords)
     – the connection change to the Graph Index service
       (the user can now start receiving feed updates from his new connections immediately)
29. DataBus : Architecture
   [Diagram: DB changes are captured into the Relay’s in-memory event window; the Bootstrap service consumes the Relay’s on-line changes and persists them]
   DataBus consists of 2 services
   • Relay services
     – Sharded
     – Maintain an in-memory buffer per shard
     – Each shard polls Oracle and then deserializes transactions into Avro
   • Bootstrap service
     – Picks up online changes as they appear in the Relay
     – Supports 2 types of operations from clients:
       • If a client falls behind and needs records older than what the relay has, Bootstrap can send consolidated deltas
       • If a new client comes online, Bootstrap can send a consistent snapshot
30. DataBus : Architecture
   [Diagram: the DB’s on-line changes flow into the Relay (event window); DataBus client libraries feed consumers either on-line changes from the Relay, a consolidated delta since time T from the Bootstrap, or a consistent snapshot at time U from the Bootstrap]
   Guarantees
   • Transactional semantics
   • In-commit-order delivery
   • At-least-once delivery
   • Durability (by data source)
   • High availability and reliability
   • Low latency
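The DataBus client library delivers change events to application consumers via callbacks. The sketch below is a simplified, hypothetical version of such a consumer (the interface and method names are illustrative, not the exact DataBus API), showing how something like the search indexer might react to profile changes in commit order while honoring at-least-once delivery.

```java
/** Hypothetical change event: source table, primary key, and an Avro-encoded payload. */
class ChangeEvent {
    final String table;
    final long key;
    final byte[] avroPayload;
    ChangeEvent(String table, long key, byte[] avroPayload) {
        this.table = table;
        this.key = key;
        this.avroPayload = avroPayload;
    }
}

/** Hypothetical callback interface, loosely modeled on a change-capture consumer. */
interface ChangeConsumer {
    void onStartTransaction(long scn);       // events arrive grouped by source transaction
    void onDataEvent(ChangeEvent event);     // delivered in commit order, at least once
    void onEndTransaction(long scn);         // safe point to checkpoint the consumed SCN
}

/** Example consumer: keeps a search index in sync with profile changes. */
class SearchIndexUpdater implements ChangeConsumer {
    private long lastCheckpointedScn = -1;

    @Override public void onStartTransaction(long scn) { /* begin a batch for this transaction */ }

    @Override public void onDataEvent(ChangeEvent event) {
        if ("MEMBER_PROFILE".equals(event.table)) {
            // Because delivery is at-least-once, re-indexing must be idempotent.
            reindexMember(event.key, event.avroPayload);
        }
    }

    @Override public void onEndTransaction(long scn) {
        lastCheckpointedScn = scn;           // checkpoint so a restart resumes from here
    }

    private void reindexMember(long memberId, byte[] payload) { /* update the search index */ }
}
```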
31. DataBus : Architecture – Bootstrap
   • Generate consistent snapshots and consolidated deltas during continuous updates with long-running queries
   [Diagram: the Relay’s event window feeds a Log Writer in the Bootstrap server; a Log Applier replays log storage into snapshot storage; the Bootstrap server serves clients either recent events from the log or a snapshot]
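A sketch of the catch-up decision implied by slides 29–31, under the assumption that positions in the change stream are identified by a system change number (SCN): if the consumer's last applied SCN is still inside the relay's in-memory event window, it streams from the relay; otherwise it asks the bootstrap service for a consolidated delta, and a brand-new consumer starts from a consistent snapshot. The class and method names are illustrative only.

```java
/** Illustrative routing of a consumer to the Relay or the Bootstrap service. */
class CatchUpPlanner {

    enum Source { RELAY, BOOTSTRAP_DELTA, BOOTSTRAP_SNAPSHOT }

    /**
     * @param consumerScn  last SCN the consumer has applied, or -1 for a brand-new consumer
     * @param relayMinScn  oldest SCN still held in the relay's in-memory event window
     * @param relayMaxScn  newest SCN in the relay
     */
    static Source chooseSource(long consumerScn, long relayMinScn, long relayMaxScn) {
        if (consumerScn < 0) {
            return Source.BOOTSTRAP_SNAPSHOT;  // new consumer: start from a consistent snapshot
        }
        if (consumerScn >= relayMinScn && consumerScn <= relayMaxScn) {
            return Source.RELAY;               // still inside the relay's window: stream on-line changes
        }
        return Source.BOOTSTRAP_DELTA;         // fell behind: fetch a consolidated delta, then rejoin the relay
    }

    public static void main(String[] args) {
        System.out.println(chooseSource(-1, 1_000, 5_000));    // BOOTSTRAP_SNAPSHOT
        System.out.println(chooseSource(4_200, 1_000, 5_000)); // RELAY
        System.out.println(chooseSource(500, 1_000, 5_000));   // BOOTSTRAP_DELTA
    }
}
```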
32. LinkedIn Data Infrastructure Solutions
   Kafka: High-Volume Low-Latency Messaging System
33. Kafka : Usage at LinkedIn
   Whereas DataBus is used for database change capture and replication, Kafka is used for application-level data streams (see the producer sketch below).
   Examples:
   • End-user action tracking (a.k.a. web tracking) of
     – Emails opened
     – Pages seen
     – Links followed
     – Searches executed
   • Operational metrics
     – Network & system metrics, such as
       • TCP metrics (connection resets, message resends, etc.)
       • System metrics (iops, CPU, load average, etc.)
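To illustrate an application-level stream, here is a small producer that publishes a page-view tracking event. It uses the modern Apache Kafka Java client (`org.apache.kafka.clients.producer`), not the 0.7-era API LinkedIn ran in 2012, and the topic name and event format are made up for the example.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PageViewTracker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical tracking event: member 42 viewed the profile page of member 7.
            String event = "{\"memberId\":42,\"page\":\"/profile/7\",\"ts\":" + System.currentTimeMillis() + "}";

            // Key by member id so all of a member's events land in the same partition (and stay ordered).
            producer.send(new ProducerRecord<>("page-view-events", "42", event));
        }
    }
}
```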
34. Kafka : Overview
   [Diagram: the web tier pushes events to the broker tier (sequential writes, ~100 MB/sec in); consumers pull from topics via a client library and per-topic iterators (sendfile, ~200 MB/sec out); Zookeeper manages topic/partition offsets and consumer ownership]
   Features: Pub/Sub; batch send/receive; system decoupling
   Guarantees: at-least-once delivery; very high throughput; low latency; durability; horizontally scalable
   Scale: billions of events; TBs per day; inter-colo: few seconds; typical retention: weeks
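The consumer side of the diagram is a pull model: consumers iterate over a topic from a tracked offset, and partition ownership is coordinated per consumer group. The sketch below uses the modern Kafka Java consumer API rather than the 0.7 client-library iterator shown on the slide, so treat it as an illustration of the pull/offset idea; the topic and group id are made up.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PageViewConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "page-view-aggregator");    // partition ownership is per consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "true");          // periodically record the consumed offset

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("page-view-events"));
            while (true) {
                // Pull: the consumer asks the broker for the next batch after its current offset.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```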
35. Kafka : Overview – Key Design Choices
   • When reading from a file and sending to a network socket, we typically incur 4 buffer copies and 2 OS system calls
     – Kafka leverages the sendfile API to eliminate 2 of the buffer copies and 1 of the system calls (see the sketch below)
   • No double-buffering of messages – we rely on the OS page cache and do not store a copy of the message in the JVM
     – Less pressure on memory and GC
     – If the Kafka process is restarted on a machine, recently accessed messages are still in the page cache, so we get the benefit of a warm start
   • Kafka doesn’t keep track of which messages have yet to be consumed, i.e. no bookkeeping overhead
     – Instead, messages have a time-based SLA for expiration: after 7 days, messages are deleted
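In the JVM, the sendfile path is exposed through `java.nio.channels.FileChannel.transferTo`, which lets the kernel move bytes from the page cache straight to the socket without copying them through a user-space buffer. The snippet below is a generic illustration of that call, not Kafka's own code; the file name and host are placeholders.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel log = FileChannel.open(Paths.get("segment.log"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("consumer-host", 9999))) {

            long position = 0;
            long remaining = log.size();
            while (remaining > 0) {
                // transferTo maps to sendfile(2) on Linux: data goes page cache -> NIC,
                // avoiding the read()/write() copies through a user-space buffer.
                long sent = log.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```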
36. How Does Kafka Perform?
37. Kafka : Performance – Throughput vs. Latency
   [Chart: consumer latency in ms (0–250) vs. producer throughput in MB/sec (0–100); 100 topics, 1 producer, 1 broker]
38. Kafka : Performance – Linear Incremental Scalability
   (10 topics, broker flush interval 100K)
   Throughput in MB/s: 1 broker – 101; 2 brokers – 190; 3 brokers – 293; 4 brokers – 381
39. Kafka : Performance – Resilience as Messages Pile Up
   [Chart: throughput in msg/s (0–200,000) vs. unconsumed data in GB (10–1,039); 1 topic, broker flush interval 10K]
40. Acknowledgments
   Presentation & Content
   • Chavdar Botev (DataBus) @cbotev
   • Roshan Sumbaly (Voldemort) @rsumbaly
   • Neha Narkhede (Kafka) @nehanarkhede
   Development Team
   Aditya Auradkar, Chavdar Botev, Shirshanka Das, Dave DeMaagd, Alex Feinberg, Phanindra Ganti, Lei Gao, Bhaskar Ghosh, Kishore Gopalakrishna, Brendan Harris, Joel Koshy, Kevin Krawez, Jay Kreps, Shi Lu, Sunil Nagaraj, Neha Narkhede, Sasha Pachev, Igor Perisic, Lin Qiao, Tom Quiggle, Jun Rao, Bob Schulman, Abraham Sebastian, Oliver Seeliger, Adam Silberstein, Boris Skolnick, Chinmay Soman, Roshan Sumbaly, Kapil Surlaker, Sajid Topiwala, Balaji Varadarajan, Jemiah Westerman, Zach White, David Zhang, and Jason Zhang
41. Questions?
   @r39132