Data Infrastructure at LinkedIn (XLDB 2011)

  1. Data Infrastructure at LinkedIn. Shirshanka Das, XLDB 2011.
  2. Me: UCLA Ph.D. 2005 (distributed protocols in content delivery networks); PayPal (web frameworks and session stores); Yahoo! (serving infrastructure, graph indexing, real-time bidding in display ad exchanges); now at LinkedIn on the Distributed Data Systems team: distributed data transport and storage technology (Kafka, Databus, Espresso, ...).
  3. Outline: LinkedIn Products; Data Ecosystem; LinkedIn Data Infrastructure Solutions; Next Play.
  4. LinkedIn by the numbers: 120,000,000+ users in August 2011; 2 new user registrations per second; 4 billion people searches expected in 2011; 2+ million companies with LinkedIn Company Pages; 81+ million unique visitors monthly (per comScore, Q2 2011); 150K domains feature the LinkedIn Share Button; 7.1 billion page views in Q2 2011; 1M LinkedIn Groups.
  5. Member Profiles
  6. Signal: faceted stream search
  7. People You May Know
  8. Outline: LinkedIn Products; Data Ecosystem; LinkedIn Data Infrastructure Solutions; Next Play.
  9. Three paradigms: simplifying the data continuum.
      • Online (activity that should be reflected immediately): Member Profiles, Company Profiles, Connections, Communications
      • Nearline (activity that should be reflected soon): Signal, Profile Standardization, News, Recommendations, Search, Communications
      • Offline (activity that can be reflected later): People You May Know, Connection Strength, News, Recommendations, Next best idea
  10. Data Infrastructure Toolbox (Online). Capabilities: key-value access; rich structures (e.g. indexes); change capture capability; search platform; graph engine.
  11. Data Infrastructure Toolbox (Nearline). Capabilities: change capture streams; messaging for site events and monitoring; nearline processing.
  12. Data Infrastructure Toolbox (Offline). Capabilities: machine learning, ranking, relevance; analytics on social gestures.
  13. Laying out the tools
  14. Outline: LinkedIn Products; Data Ecosystem; LinkedIn Data Infrastructure Solutions; Next Play.
  15. Focus on four systems in Online and Nearline. Data transport: Kafka, Databus. Online data stores: Voldemort, Espresso.
  16. LinkedIn Data Infrastructure Solutions. Kafka: high-volume, low-latency messaging system.
  17. Kafka: Architecture (a consumer sketch follows below). The web tier pushes events to the broker tier (sequential writes, sendfile); consumers pull through per-topic iterators in the client library, at indicative rates of 100 MB/sec in and 200 MB/sec out. Brokers host partitioned topics (Topic 1 ... Topic N); each consumer tracks its own (topic, partition) → offset position, with offset and ownership management in Zookeeper. Scale: billions of events and TBs per day, inter-colo lag of a few seconds, typical retention of weeks. Guarantees: at-least-once delivery, very high throughput, low latency, durability.
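
A minimal sketch of the pull-based, at-least-once pattern this slide describes, assuming a hypothetical broker and checkpoint store (this is not the actual Kafka client API): the consumer owns its offset and only advances the checkpoint after processing, so a crash replays events instead of losing them.

```python
# A minimal at-least-once pull consumer. `broker` and OffsetStore are
# hypothetical stand-ins, not the real Kafka client API.

class OffsetStore:
    """Durable (topic, partition) -> offset checkpoint (Kafka keeps this in Zookeeper)."""

    def __init__(self):
        self._offsets = {}

    def load(self, topic, partition):
        return self._offsets.get((topic, partition), 0)

    def save(self, topic, partition, offset):
        self._offsets[(topic, partition)] = offset


def consume_forever(broker, topic, partition, store, handle):
    offset = store.load(topic, partition)
    while True:
        # Pull a batch starting at our offset; the broker is a dumb
        # sequential log and does not track consumer positions itself.
        batch = broker.fetch(topic, partition, offset)
        for event in batch:
            handle(event)                      # may be re-run after a crash
        offset += len(batch)
        store.save(topic, partition, offset)   # checkpoint AFTER processing
        # A crash between handle() and save() replays the batch:
        # that is exactly the at-least-once guarantee on the slide.
```

The order of handle() and save() carries the whole guarantee: checkpointing before processing would flip the semantics to at-most-once.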
  18. LinkedIn Data Infrastructure Solutions. Databus: timeline-consistent change data capture.
  19. Databus at LinkedIn (a consumer sketch follows below). A relay captures changes from the online DB into an event window, and clients consume the on-line change stream through the Databus client library. A consumer that falls too far behind is served by the bootstrap service instead, which hands it a consistent snapshot as of some point U and then returns it to the live stream. Features: transport independent of the data source (Oracle, MySQL, ...); portable change-event serialization and versioning; consumption can start from an arbitrary point. Guarantees: transactional semantics; timeline consistency with the data source; durability (by data source); at-least-once delivery; availability; low latency.
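
A sketch of the relay-or-bootstrap decision the slide's diagram implies: the consumer asks the relay for changes since its last sequence number, and if that point has already aged out of the relay's event window, it catches up from the bootstrap snapshot first. All names here are illustrative, not the real Databus client API.

```python
# Illustrative relay-or-bootstrap catch-up logic; the relay, bootstrap,
# and consumer interfaces are hypothetical, not the real Databus client.

class StaleSequenceError(Exception):
    """The requested point has aged out of the relay's event window."""


def consume(relay, bootstrap, consumer, since_scn):
    scn = since_scn
    while True:
        try:
            events = relay.changes_since(scn)       # live change stream
        except StaleSequenceError:
            # Too far behind: replay a consistent snapshot taken at some
            # point U >= scn, then rejoin the live stream from U.
            snapshot = bootstrap.snapshot_since(scn)
            for event in snapshot.events:
                consumer.apply(event)
            scn = snapshot.scn
            continue
        for event in events:
            consumer.apply(event)                   # timeline order preserved
            scn = event.scn
```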
  20. LinkedIn Data Infrastructure Solutions. Voldemort: highly-available distributed data store.
  21. Voldemort: Architecture (a quorum sketch follows below). Highlights: open source; pluggable components; tunable consistency / availability; key/value model with server-side "views". In production: data products; network updates, sharing, page-view tracking, rate-limiting, and more. Future: SSDs, multi-tenancy.
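
Voldemort's tunable consistency/availability is the Dynamo-style quorum knob: with N replicas per key, writes wait for W acks and reads consult R replicas, and choosing R + W > N forces read and write quorums to overlap. A toy sketch with hypothetical replica objects (the real client resolves conflicting versions with vector clocks):

```python
# Toy Dynamo-style quorum (the scheme behind Voldemort's tunable
# consistency). Replica objects here are hypothetical.

N, R, W = 3, 2, 2        # replicas, read quorum, write quorum
assert R + W > N         # quorums overlap => reads see the latest write


def quorum_put(replicas, key, value, version):
    acks = sum(1 for r in replicas if r.put(key, value, version))
    if acks < W:
        raise IOError("write quorum not reached")


def quorum_get(replicas, key):
    responses = [r.get(key) for r in replicas[:R]]   # toy: first R replicas
    # Overlap with the write quorum guarantees at least one response is
    # current; Voldemort picks/repairs versions using vector clocks.
    return max(responses, key=lambda rsp: rsp.version)
```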
  22. LinkedIn Data Infrastructure Solutions. Espresso: indexed, timeline-consistent distributed data store.
  23. Espresso: key design points. Hierarchical data model (InMail, Forums, Groups, Companies). Native change data capture stream: timeline consistency, read-after-write. Rich functionality within a hierarchy: local secondary indexes, transactions, full-text search. Modular and pluggable, built from off-the-shelf parts: MySQL, Lucene, Avro.
  24. Application View
  25. Partitioning
  26. Partition Layout: Master, Slave (a placement sketch follows below). Example: 3 storage-engine nodes, 2-way replication, 12 partitions (P.1 through P.12). Each partition has a master on one node and a slave on a different node, so every node holds a mix of master and slave partitions. The cluster manager maintains the per-node assignment, e.g. Node 1: M: P.1 (Active), S: P.5 (Active), ...
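
A small sketch of the layout in the slide: partitions are spread over the nodes so that each has exactly one master and one slave, always on different nodes. The round-robin assignment is a plausible stand-in, not Espresso's actual placement algorithm.

```python
# Hypothetical round-robin placement reproducing the slide's shape:
# 12 partitions over 3 nodes, 2-way replication, master and slave on
# different nodes. Not Espresso's actual placement algorithm.

NUM_PARTITIONS, NUM_NODES = 12, 3


def layout(num_partitions=NUM_PARTITIONS, num_nodes=NUM_NODES):
    assignment = {}                  # "P.n" -> {"master": ..., "slave": ...}
    for p in range(1, num_partitions + 1):
        master = p % num_nodes
        slave = (master + 1) % num_nodes        # always a distinct node
        assignment[f"P.{p}"] = {"master": master + 1, "slave": slave + 1}
    return assignment


if __name__ == "__main__":
    for part, nodes in layout().items():
        print(f"{part}: master on node {nodes['master']}, "
              f"slave on node {nodes['slave']}")
```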
  27. Espresso API: REST over HTTP (a client sketch follows below).
      – Get messages for bob: GET /MailboxDB/MessageMeta/bob
      – Get MsgId 3 for bob: GET /MailboxDB/MessageMeta/bob/3
      – Get the first page of bob's messages that are unread and in the inbox: GET /MailboxDB/MessageMeta/bob/?query="+isUnread:true +isInbox:true"&start=0&count=15
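
Because the API is plain HTTP, any client works. A sketch using Python's standard library against the slide's example URLs; the host and port are made up:

```python
# Plain-HTTP reads against the slide's example resources; the paths and
# query come from the slide, the host and port are hypothetical.
import json
import urllib.parse
import urllib.request

BASE = "http://espresso.example.com:8080"       # hypothetical endpoint


def get(path, **params):
    url = BASE + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as rsp:
        return json.load(rsp)


all_meta = get("/MailboxDB/MessageMeta/bob")     # all messages for bob
message_3 = get("/MailboxDB/MessageMeta/bob/3")  # one message by sub-key
first_unread_page = get("/MailboxDB/MessageMeta/bob/",   # index-backed query
                        query="+isUnread:true +isInbox:true",
                        start=0, count=15)
```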
  28. Espresso API: transactions (a sketch follows below). Adding a message to bob's mailbox transactionally updates the mailbox aggregates and inserts the metadata and details rows, expressed as a single multipart POST:

      POST /MailboxDB/*/bob HTTP/1.1
      Content-Type: multipart/binary; boundary=1299799120
      Accept: application/json

      --1299799120
      Content-Type: application/json
      Content-Location: /MailboxDB/MessageStats/bob
      Content-Length: 50

      {"total":"+1", "unread":"+1"}

      --1299799120
      Content-Type: application/json
      Content-Location: /MailboxDB/MessageMeta/bob
      Content-Length: 332

      {"from":"…","subject":"…",…}

      --1299799120
      Content-Type: application/json
      Content-Location: /MailboxDB/MessageDetails/bob
      Content-Length: 542

      {"body":"…"}

      --1299799120--
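
A sketch that builds and sends an equivalent multipart request from Python. The paths and boundary follow the slide; the host and the concrete payloads are made up:

```python
# Assembles the slide's transactional multipart POST: three sub-requests
# (aggregate increment, metadata insert, details insert) that Espresso
# commits atomically within bob's partition. Host and payloads are made up.
import urllib.request

BOUNDARY = "1299799120"


def part(location, payload):
    return (f"--{BOUNDARY}\r\n"
            "Content-Type: application/json\r\n"
            f"Content-Location: {location}\r\n"
            f"Content-Length: {len(payload)}\r\n"
            "\r\n"
            f"{payload}\r\n")


body = (part("/MailboxDB/MessageStats/bob", '{"total":"+1","unread":"+1"}')
        + part("/MailboxDB/MessageMeta/bob", '{"from":"alice","subject":"hi"}')
        + part("/MailboxDB/MessageDetails/bob", '{"body":"hello bob"}')
        + f"--{BOUNDARY}--\r\n")

request = urllib.request.Request(
    "http://espresso.example.com:8080/MailboxDB/*/bob",  # hypothetical host
    data=body.encode(),
    headers={"Content-Type": f"multipart/binary; boundary={BOUNDARY}",
             "Accept": "application/json"},
    method="POST")
# urllib.request.urlopen(request)   # all three writes commit, or none do
```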
  29. Espresso: System Components
  30. Espresso @ LinkedIn. First applications: Company Profiles, InMail. Next: Unified Social Content Platform, Member Profiles, many more.
  31. Espresso: next steps. Launched first application Oct 2011. Open source 2012. Multi-datacenter support, log-structured storage, time-partitioned data.
  32. Outline: LinkedIn Products; Data Ecosystem; LinkedIn Data Infrastructure Solutions; Next Play.
  33. The specialization paradox in distributed systems. Good: build specialized systems so you can do each thing really well. Bad: each system rebuilds distributed routing, failover, cluster management, monitoring, and tooling.
  34. Generic cluster manager: Helix (a state-model sketch follows below). Generic distributed state model; centralized config management; automatic load balancing; fault tolerance; health monitoring; cluster expansion and rebalancing. Open source 2012. Used by Espresso, Databus, and Search.
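
The "generic distributed state model" is a declarative state machine per partition replica, with the cluster manager planning and firing the transitions. A toy rendition under that assumption; the states mirror Helix's MasterSlave model, but the code is an illustrative sketch, not the Helix API:

```python
# Toy declarative state model in the spirit of Helix: each partition
# replica sits in one state, and the controller moves it toward a target
# through legal transitions only. States mirror Helix's MasterSlave
# model; the code is illustrative, not the Helix API.

LEGAL = {                     # state -> states reachable in one transition
    "OFFLINE": {"SLAVE"},
    "SLAVE": {"MASTER", "OFFLINE"},
    "MASTER": {"SLAVE"},
}


def path_to(current, target):
    """Plan a legal transition chain, e.g. OFFLINE -> SLAVE -> MASTER."""
    path = [current]
    while path[-1] != target:
        step = target if target in LEGAL[path[-1]] else "SLAVE"
        if step not in LEGAL[path[-1]]:
            raise ValueError(f"no legal path from {current} to {target}")
        path.append(step)
    return path


print(path_to("OFFLINE", "MASTER"))   # ['OFFLINE', 'SLAVE', 'MASTER']
```

Declaring states and constraints, rather than hand-coding failover per system, is what lets one manager serve Espresso, Databus, and Search alike.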
  35. Stay tuned for: Innovation (nearline processing; the Espresso ecosystem: storage/indexing, analytics engine, search) and Convergence (building blocks for distributed data management systems).
  36. Thanks!
  37. Appendix
  38. Espresso: Routing (a sketch follows below). The router is a high-performance HTTP proxy that examines the URL and extracts the partition key. The routing strategy is per-database: hash-based, route-to-any (for schema access), or range (future). A routing function maps the partition key to a partition, and the cluster manager maintains the mapping of partitions to hosts: a single master plus multiple slaves per partition.
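
A sketch of the hash-based path: extract the partition key from the URL, hash it to a partition, and look the partition's master up in the cluster-manager mapping. The hash choice and the shape of the routing table are assumptions for illustration:

```python
# Illustrative hash-based routing: URL -> partition key -> partition
# -> master host. The table shape and hash are assumptions; in Espresso
# the cluster manager maintains the real partition-to-host mapping.
import hashlib

NUM_PARTITIONS = 12

# partition -> {"master": host, "slaves": [hosts]}  (from cluster manager)
ROUTING_TABLE = {p: {"master": f"node{p % 3 + 1}", "slaves": []}
                 for p in range(NUM_PARTITIONS)}


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Stable hash so every router maps a key to the same partition.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions


def route(url):
    # e.g. /MailboxDB/MessageMeta/bob/3 -> partition key "bob"
    _, db, table, key, *rest = url.split("/")
    return ROUTING_TABLE[partition_for(key)]["master"]


print(route("/MailboxDB/MessageMeta/bob/3"))
```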
  39. Espresso: Storage Node (a row-layout sketch follows below). The data store (MySQL) keeps each document as an Avro-serialized blob, indexed by (partition key {, sub-key}); the row also carries limited metadata: etag, last-modified time, and Avro schema version. The document schema specifies per-field index constraints, and a Lucene index is maintained per partition key / resource.
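
The row layout the slide describes, rendered as a plain record; the field names are descriptive guesses, not Espresso's actual MySQL column names:

```python
# The storage-node row shape described on the slide, as a plain record.
# Field names are descriptive guesses, not Espresso's real MySQL schema.
from dataclasses import dataclass
from typing import Optional


@dataclass
class EspressoRow:
    partition_key: str          # e.g. "bob"
    sub_key: Optional[int]      # e.g. message id 3; None for singletons
    etag: str                   # limited metadata lives outside the blob
    last_modified: float
    schema_version: int         # which Avro writer schema made the blob
    document: bytes             # the Avro-serialized document itself


# Fields flagged by the document schema's per-field index constraints are
# extracted from the document and fed to a per-partition-key Lucene index.
row = EspressoRow("bob", 3, 'W/"1"', 1318000000.0, 2, b"...avro bytes...")
```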
  40. Espresso: Replication (an apply-loop sketch follows below). Mastered partitions replicate via MySQL replication; the MySQL "slave" is a MySQL instance with a custom storage engine that just publishes changes to Databus. Commit sequence numbers are per-database. Because replication is Databus, existing downstream consumers are supported unchanged, and each storage node consumes from Databus to update its secondary indexes and slave partitions.
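
A sketch of what the storage node does with that stream: apply change events in commit-sequence order, updating the slave partition's rows and its secondary indexes together. All interfaces here are hypothetical:

```python
# Illustrative slave-side apply loop: consume the Databus change stream
# in per-database commit-sequence order, updating the slave partition's
# rows and its secondary indexes together. Interfaces are hypothetical.

def apply_changes(databus_stream, store, index, last_scn):
    for event in databus_stream:
        # Commit sequence numbers are per-database and monotonic, so an
        # out-of-order event means a broken stream; refuse to apply it.
        assert event.scn > last_scn, "commit sequence went backwards"
        store.upsert(event.partition_key, event.sub_key, event.document)
        index.update(event.partition_key, event.indexed_fields)
        last_scn = event.scn
    return last_scn
```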
