Building a Modern Website for Scale
Sid Anand
QCon NY 2013
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/website-outages
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
About Me
Current Life…
•  LinkedIn
•  Search, Network, and Analytics (SNA)
•  Search Infrastructure
•  Me
In a Previous Life…
•  LinkedIn, Data Infrastructure, Architect
•  Netflix, Cloud Database Architect
•  eBay, Web Development, Research Lab, & Search Engine
And Many Years Prior…
•  Studying Distributed Systems at Cornell University
Our mission
Connect the world’s professionals to make
them more productive and successful
Over 200M members and counting
The world’s largest professional network
Growing at more than 2 members/sec
LinkedIn Members (Millions): 2 (2004), 4 (2005), 8 (2006), 17 (2007), 32 (2008), 55 (2009), 90 (2010), 145 (2011), 200+ (2012)
Source : http://press.linkedin.com/about
The world’s largest professional network
Over 64% of members are now international
•  >88% of Fortune 100 Companies use LinkedIn Talent Solutions to hire
•  >2.9M Company Pages
•  >5.7B professional searches in 2012
•  19 languages
•  >30M students and NCGs (fastest growing demographic)
Source : http://press.linkedin.com/about
Other Company Facts
•  Headquartered in Mountain View, Calif., with offices around the world!
•  As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world
Source : http://press.linkedin.com/about
Agenda
ü  Company Overview
§  Serving Architecture
§  How Does LinkedIn Scale
–  Web Services
–  Databases
–  Messaging
–  Other
§  Q & A
Serving Architecture
Overview
•  Our site runs primarily on Java, with some use of Scala for specific
infrastructure
•  The presentation tier is an exception – runs on everything!
•  What runs on Scala?
•  Network Graph Engine
•  Kafka
•  Some front ends (Play)
•  Most of our services run on Jetty
LinkedIn : Serving Architecture
[Diagram: the Frontier presentation tier hosts front-ends built on Play, Spring MVC, NodeJS, JRuby, Grails, and Django, unified by USSR (Chrome V8 JS Engine)]
Our presentation tier is composed of ATS with 2 plugins:
•  Fizzy
•  A content aggregator that unifies content across a diverse set of front-ends
•  Open-source JS templating framework
•  USSR (a.k.a. Unified Server-Side Rendering)
•  Packages Google Chrome’s V8 JS engine as an ATS plugin
[Diagram: a page request flows from the Presentation Tier through the Business Service Tier and Data Service Tier down to the Data Infrastructure (Oracle master/slave, Memcached, Hadoop)]
→  A web page requests information A and B
→  Presentation Tier : a thin layer focused on building the UI. It assembles the page by making parallel requests to BST services
→  Business Service Tier : encapsulates business logic. Can call other BST clusters and its own DST cluster.
→  Data Service Tier : encapsulates DAL logic and is concerned with one Oracle schema.
→  Data Infrastructure : concerned with the persistent storage of and easy access to data
Serving Architecture : Other?	

[Diagram: data change events flow from Oracle or Espresso to the Search Index, Graph Index, Standardization service, and Read Replicas]
As I will discuss later, data that is committed to databases also needs to be made
available to a host of other online serving systems :
•  Search
•  Standardization Services
•  These provide canonical names for your titles, companies, schools, skills,
fields of study, etc.
•  Graph engine
•  Recommender Systems
This data change feed needs to be scalable, reliable, and fast. [ Databus ]
Serving Architecture : Hadoop	

How do we use Hadoop to serve?
•  Hadoop is central to our analytic infrastructure
•  We ship data streams into Hadoop from our
primary Databases via Databus & from
applications via Kafka
•  Hadoop jobs take daily or hourly dumps of this
data and compute data files that Voldemort can
load!
•  Voldemort loads these files and serves them on
the site
Voldemort : RO Store Usage at LinkedIn
•  People You May Know
•  LinkedIn Skills
•  Related Searches
•  Viewers of this profile also viewed
•  Events you may be interested in
•  Jobs you may be interested in
How Does LinkedIn Scale?
Scaling Web Services
LinkedIn : Web Services
LinkedIn : Scaling Web Services 	

Problem
•  How do 150+ web services communicate with each other to fulfill user requests in
the most efficient and fault-tolerant manner?
•  How do they handle slow downstream dependencies?
•  For illustration's sake, consider the following scenario:
•  Service B has 2 hosts
•  Service C has 2 hosts
•  A machine in Service B sends a web request to a machine in Service C
What sorts of failure modes are we concerned about?
•  A machine in service C
•  has a long GC pause
•  calls a service that has a long GC pause
•  calls a service that calls a service that has a long GC pause
•  … see where I am going?
•  A machine in service C or in its downstream dependencies may be slow for any
reason, not just GC (e.g. bottlenecks on CPU, IO, and memory, lock-contention)
Goal : Given all of this, how can we ensure high uptime?
Hint : Pick the right architecture and implement best-practices on top of it!
In the early days, LinkedIn made a big bet on Spring and Spring RPC.
Issues
1.  Spring RPC is difficult to debug
•  You cannot call the service using simple command-line tools like Curl
•  Since the RPC call is implemented as a binary payload over HTTP, http access
logs are not very useful
[Diagram: Service B and Service C nodes behind a hardware load balancer (LB)]
2.  A Spring RPC-based architecture leads to high MTTR
•  Spring RPC is not flexible or pluggable -- we cannot use
•  custom client-side load balancing strategies
•  custom fault-tolerance features
•  Instead, all we can do is to put all of our service nodes
behind a hardware load-balancer & pray!
•  If a Service C node experiences a slowness issue, a
NOC engineer needs to be alerted and must then manually
remove it from the LB (MTTR > 30 minutes)
Solution
A better solution is one that we see often in both cloud-based architectures and
NoSQL systems : Dynamic Discovery + Client-side load-balancing
Step 1 : Service C nodes announce their availability to serve traffic to a ZK registry
Step 2 : Service B nodes get updates from ZK
Step 3 : Service B nodes route traffic to service C nodes
[Diagram: Service B and Service C nodes coordinating through a ZooKeeper ensemble at each step]
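To make the discovery flow concrete, here is a minimal sketch of both sides using the raw Apache ZooKeeper client. The /services/serviceC path, the host:port payload, and the class name are illustrative assumptions, not LinkedIn's actual registry layout.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import java.util.List;

public class DynamicDiscovery {
    private static final String BASE = "/services/serviceC";   // hypothetical registry path

    // Step 1: a Service C node registers an ephemeral node; if the node dies or
    // its ZK session expires, the registration disappears automatically.
    public static void announce(ZooKeeper zk, String hostPort) throws Exception {
        zk.create(BASE + "/node-", hostPort.getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    }

    // Steps 2 and 3: a Service B node watches the registry and client-side
    // load-balances across whatever Service C hosts are currently alive.
    public static List<String> liveServiceCNodes(ZooKeeper zk) throws Exception {
        return zk.getChildren(BASE, true);   // 'true' re-registers the watch for future updates
    }
}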
With this new paradigm for discovering services and routing requests to
them, we can incorporate additional fault-tolerance features
Best Practices
•  Fault-tolerance Support
1.  No client should wait indefinitely for a response from a service
•  Issues
•  Waiting causes a traffic jam : all upstream clients end up also getting
blocked
•  Each service has a fixed number of Jetty or Tomcat threads. Once
those are all tied up waiting, no new requests can be handled
•  Solution
•  After a configurable timeout, return
•  Store a different SLA in ZK for each REST end-point
•  In other words, all calls are not the same and should not have
the same read time out
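A minimal sketch of the per-endpoint timeout idea, using java.net.http and a plain Map standing in for the SLAs that would live in ZooKeeper; the endpoint names, SLA values, and class name are assumptions for illustration.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Map;

public class PerEndpointTimeouts {
    // Stand-in for per-endpoint SLAs that would be read from ZooKeeper.
    private static final Map<String, Long> SLA_MS = Map.of(
            "/profile", 50L,    // cheap read: tight timeout
            "/search", 300L);   // fan-out call: looser timeout

    private final HttpClient client = HttpClient.newHttpClient();

    public HttpResponse<String> call(String host, String endpoint) throws Exception {
        long timeoutMs = SLA_MS.getOrDefault(endpoint, 100L);   // default SLA
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://" + host + endpoint))
                .timeout(Duration.ofMillis(timeoutMs))          // never wait indefinitely
                .GET()
                .build();
        // Throws HttpTimeoutException once the endpoint-specific SLA is exceeded,
        // freeing this thread instead of tying up a Jetty/Tomcat worker.
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }
}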
Best Practices
•  Fault-tolerance Support
2.  Isolate calls to back-ends from one another
•  Issues
•  You depend on responses from independent services A and B. If A
slows down, will you still be able to serve B?
•  Details
•  This is a common use-case for federated services and for shard-aggregators :
•  E.g. Search at LinkedIn is federated and will call people-search, job-search, group-search, etc... in parallel
•  E.g. People-search is itself sharded, so an additional shard-aggregation step needs to happen across 100s of shards
•  Solution
•  Use async requests or independent ExecutorServices for sync
requests (one per shard or vertical)
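A sketch of the isolation idea with one bounded ExecutorService per downstream vertical, so a slow people-search cannot starve job-search. The pool sizes, 200 ms budget, and placeholder call methods are assumptions, not LinkedIn's federation code.

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class IsolatedBackendCalls {
    // One bounded pool per vertical: exhaustion of one pool cannot block the other.
    private final ExecutorService peoplePool = Executors.newFixedThreadPool(8);
    private final ExecutorService jobsPool = Executors.newFixedThreadPool(8);

    public String federatedSearch(String query) throws InterruptedException {
        Future<String> people = peoplePool.submit(() -> callPeopleSearch(query));
        Future<String> jobs = jobsPool.submit(() -> callJobSearch(query));
        // Each vertical degrades independently if it misses its 200 ms budget.
        return getOrFallback(people, 200) + "\n" + getOrFallback(jobs, 200);
    }

    private String getOrFallback(Future<String> f, long timeoutMs) throws InterruptedException {
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            f.cancel(true);
            return "";   // empty vertical; the rest of the page still renders
        }
    }

    private String callPeopleSearch(String q) { return "people results for " + q; }   // placeholder
    private String callJobSearch(String q) { return "job results for " + q; }         // placeholder
}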
Best Practices
•  Fault-tolerance Support
3.  Cancel Unnecessary Work
•  Issues
•  Work issued down the call-graphs is unnecessary if the clients at the
top of the call graph have already timed out
•  Imagine that as a call reaches half-way down your call-tree, the
caller at the root times out.
•  You will still issue work down the remaining half-depth of your
tree unless you cancel it!
•  Solution
•  A possible approach
•  Root of the call-tree adds (<tree-UUID>, inProgress status) to
Memcached
•  All services pass the tree-UUID down the call-tree (e.g. as a
custom HTTP request header)
•  Servlet filters at each hop check whether inProgress == false. If
so, immediately respond with an empty response
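One possible shape for the servlet-filter check described above, with a ConcurrentHashMap standing in for Memcached and X-Call-Tree-Id as a hypothetical header name. It assumes the Servlet 4.0 API (where init/destroy have defaults) and is a sketch of the approach, not LinkedIn's implementation.

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CallTreeCancellationFilter implements Filter {
    // Stand-in for the shared Memcached entry keyed by tree-UUID.
    static final ConcurrentHashMap<String, Boolean> IN_PROGRESS = new ConcurrentHashMap<>();

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String treeId = ((HttpServletRequest) req).getHeader("X-Call-Tree-Id");
        if (treeId != null && !IN_PROGRESS.getOrDefault(treeId, true)) {
            // The root of the call-tree already timed out: short-circuit with an empty response.
            ((HttpServletResponse) res).setStatus(HttpServletResponse.SC_NO_CONTENT);
            return;
        }
        chain.doFilter(req, res);
    }
}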
Best Practices
•  Fault-tolerance Support
4.  Avoid Sending Requests to Hosts that are GCing
•  Issues
•  If a client sends a web request to a host in Service C and if that host
is experiencing a GC pause, the client will wait 50-200ms, depending
on the read time out for the request
•  During that GC pause other requests will also be sent to that node
before they all eventually time out
•  Solution
•  Send a “GC scout” request before every “real” web request
Why is this a good idea?
•  Scout requests are cheap and provide negligible overhead for requests
Step 1 : A Service B node sends a cheap 1 msec TCP request to a dedicated “scout” Netty port
Step 2 : If the scout request comes back within 1 msec, send the real request to the Tomcat or Jetty port
Step 3 : Else repeat with a different host in Service C
[Diagram: each Service C host runs a Netty scout port alongside its Tomcat/Jetty port; Service B probes the scout port before sending the real request]
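A rough sketch of the scout probe from the Service B side, reduced to a plain TCP ping; the scout port, one-byte payload, and time budget are illustrative assumptions.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class GcScout {
    // Returns true if the host's dedicated scout port answers within budgetMs,
    // i.e. the target JVM does not appear to be stuck in a GC pause right now.
    public static boolean hostLooksHealthy(String host, int scoutPort, int budgetMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, scoutPort), budgetMs);
            socket.setSoTimeout(budgetMs);
            socket.getOutputStream().write(1);            // tiny ping
            return socket.getInputStream().read() != -1;  // any byte back counts as healthy
        } catch (IOException timedOutOrRefused) {
            return false;   // skip this host and try another Service C replica from ZK
        }
    }
}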
Best Practice
•  Fault-tolerance Support
5.  Services should protect themselves from traffic bursts
•  Issues
•  Service nodes should protect themselves from being overwhelmed
by requests
•  This will also protect their downstream servers from being
overwhelmed
•  Simply setting the Tomcat or Jetty thread pool size is not always an
option. Oftentimes, these pools are not configurable per application.
•  Solution
•  Use a sliding window counter. If the counter exceeds a configured
threshold, return immediately with a 503 (‘service unavailable’)
•  Set threshold below Tomcat or Jetty thread pool size
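A simplified sketch of the shedding filter; it uses a rolling one-second window rather than a true sliding window, the 200 requests/second threshold is an assumed value, and it presumes the Servlet 4.0 API.

import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

public class LoadSheddingFilter implements Filter {
    private static final int MAX_REQUESTS_PER_SECOND = 200;   // keep below the Jetty/Tomcat pool size
    private final AtomicLong windowStart = new AtomicLong(System.currentTimeMillis());
    private final AtomicInteger count = new AtomicInteger();

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        long now = System.currentTimeMillis();
        if (now - windowStart.get() >= 1000) {   // roll the 1-second window
            windowStart.set(now);
            count.set(0);
        }
        if (count.incrementAndGet() > MAX_REQUESTS_PER_SECOND) {
            // Shed load before the container's worker threads are exhausted.
            ((HttpServletResponse) res).sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            return;
        }
        chain.doFilter(req, res);
    }
}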
Espresso : Scaling Databases
LinkedIn : Databases
Espresso : Overview 	

Problem
•  What do we do when we run out of QPS capacity on an Oracle database server?
•  You can only buy yourself out of this problem so far (i.e. buy a bigger box)
•  Read-replicas and memcached will help scale reads, but not writes!
Solution à Espresso
You need a horizontally-scalable database!
Espresso is LinkedIn’s newest NoSQL store. It offers the following features:
•  Horizontal Scalability
•  Works on commodity hardware
•  Document-centric
•  Avro documents supporting rich, nested data models
•  Schema evolution is drama-free
•  Extensions for Lucene indexing
•  Supports Transactions (within a partition, e.g. memberId)
•  Supports conditional reads & writes using standard HTTP headers (e.g. if-modified-since)
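To illustrate what a conditional read over standard HTTP headers looks like, here is a hedged sketch using java.net.http; the router hostname and the database/table/key URI layout are hypothetical, not Espresso's documented REST routes.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConditionalRead {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical document URI: /<database>/<table>/<key>, resolved by the routing tier.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://espresso-router.example.com/MemberDB/Profiles/12345"))
                .header("If-Modified-Since", "Tue, 11 Jun 2013 00:00:00 GMT")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 304) {
            System.out.println("Cached copy is still fresh; no document body was transferred.");
        } else {
            System.out.println("Fresh document: " + response.body());
        }
    }
}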
Why not use an existing open-source NoSQL store? We needed:
•  A change capture stream (e.g. Databus)
•  Backup-restore
•  A mature storage engine (InnoDB)
Espresso: Architecture
•  Components
•  Request Routing Tier
•  Consults Cluster Manager to
discover node to route to
•  Forwards request to
appropriate storage node
•  Storage Tier
•  Data Store (MySQL)
•  Local Secondary Index
(Lucene)
•  Cluster Manager
•  Responsible for data set
partitioning
•  Manages storage nodes
•  Relay Tier
•  Replicates data to consumers
Databus : Scaling Databases
LinkedIn : Database Streams
DataBus : Overview	

Problem
Our databases (Oracle & Espresso) are used for R/W web-site traffic. However,
various services (Search, Graph DB, Standardization, etc…) need the ability to
•  Read the data as it is changed in these OLTP stores
•  Occasionally, scan the contents in order to rebuild their entire state
Solution è Databus
Databus provides a consistent, in-time-order stream of database changes that
•  Scales horizontally
•  Protects the source database from high-read-load
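To show the consumption pattern, ordered change events applied by a callback, here is a sketch built on hypothetical ChangeEvent and ChangeConsumer types; it is not the actual Databus client API, only an illustration of how a downstream indexer applies commits in order.

import java.util.List;

public class ProfileChangeIndexer {

    // Hypothetical event shape, for illustration only.
    interface ChangeEvent {
        String table();        // e.g. "MemberProfile"
        long scn();            // system change number, monotonically increasing
        byte[] avroPayload();  // the changed row serialized as Avro
    }

    // Hypothetical callback interface, for illustration only.
    interface ChangeConsumer {
        void onEvents(List<ChangeEvent> eventsInCommitOrder);
    }

    // Events arrive in commit order, so applying them sequentially keeps the
    // search index consistent with the source database.
    static final ChangeConsumer SEARCH_INDEXER = events -> {
        for (ChangeEvent e : events) {
            if ("MemberProfile".equals(e.table())) {
                // decode the Avro payload and update the search index here
                System.out.println("re-indexing profile change at SCN " + e.scn());
            }
        }
    };
}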
Where Does LinkedIn use DataBus?
DataBus : Usage @ LinkedIn
[Diagram: data change events flow from Oracle or Espresso to the Search Index, Graph Index, Standardization service, and Read Replicas]
A user updates the company, title, & school on his profile. He also accepts a connection.
•  The write is made to an Oracle or Espresso Master and DataBus replicates:
•  the profile change is applied to the Standardization service
→  E.g. the many forms of IBM were canonicalized for search-friendliness and recommendation-friendliness
•  the profile change is applied to the Search Index service
→  Recruiters can find you immediately by new keywords
•  the connection change is applied to the Graph Index service
→  The user can now start receiving feed updates from his new connections immediately
DataBus : Architecture
[Diagram: the Relay maintains an in-memory event window of changes captured from the source DB; the Bootstrap service stores the on-line changes it picks up from the Relay]
DataBus consists of 2 services
•  Relay Service
•  Sharded
•  Maintain an in-memory buffer per
shard
•  Each shard polls Oracle and then
deserializes transactions into Avro
•  Bootstrap Service
•  Picks up online changes as they
appear in the Relay
•  Supports 2 types of operations from clients
→  If a client falls behind and needs records older than what the relay has, Bootstrap can send consolidated deltas!
→  If a new client comes on line or if an existing client fell too far behind, Bootstrap can send a consistent snapshot
[Diagram: Consumers 1..n connect through the Databus ClientLib. The Relay serves on-line changes, while the Bootstrap service serves either a consolidated delta since time T or a consistent snapshot at time U to clients that fall behind]
Guarantees
§  Transactions
§  In-commit-order Delivery → commits are replicated in order
§  Durability → you can replay the change stream at any time in the future
§  Reliability → 0% data loss
§  Low latency → If your consumers can keep up with the relay, sub-second response time
DataBus : Architecture	

Cool Features
§  Server-side (i.e. relay-side & bootstrap-side) filters
§  Problem
§  Say that your consuming service is sharded 100 ways
§  e.g. Member Search Indexes sharded by member_id % 100
§  index_0, index_1, …, index_99
§  However, you have a single member Databus stream
§  How do you avoid having every shard read data it is not interested in?
§  Solution
§  Easy, Databus already understands the notion of server-side filters
§  It will only send updates to your consumer instance for the shard it is
interested in
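The filter itself is registered with the relay through the Databus client; the sketch below only shows the partition predicate that a given shard (say index_7) would ask the server to apply, assuming the member_id % 100 scheme from the example.

public class ShardFilter {
    private static final int NUM_SHARDS = 100;
    private final int myShard;   // e.g. 7 for index_7

    public ShardFilter(int myShard) { this.myShard = myShard; }

    // Applied server-side, this keeps the relay from shipping events that
    // belong to the other 99 shards to this consumer instance.
    public boolean wants(long memberId) {
        return memberId % NUM_SHARDS == myShard;
    }
}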
Kafka: Scaling Messaging
LinkedIn : Messaging
Kafka : Overview	

Problem
We have Databus to stream changes that were committed to a database. How do we
capture and stream high-volume data if we relax the requirement that the data needs
long-term durability?
•  In other words, the data can have limited retention
Challenges
•  Needs to handle a large volume of events
•  Needs to be highly-available, scalable, and low-latency
•  Needs to provide limited durability guarantees (e.g. data retained for a week)
Solution è Kafka
Kafka is a messaging system that supports topics. Consumers can subscribe to topics
and read all data within the retention window. Consumers are then notified of new
messages as they appear!
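A minimal consumer sketch, using the modern Apache Kafka Java client for illustration (the 0.8-era high-level consumer API looked different); the broker address, group id, and topic name are assumptions.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PageViewConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        props.put("group.id", "page-view-aggregator");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("page-views"));     // hypothetical tracking topic
            while (true) {
                // poll() returns whatever messages are available within the retention window
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("member=%s page=%s%n", record.key(), record.value());
                }
            }
        }
    }
}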
Kafka is used at LinkedIn for a variety of business-critical needs:
Examples:
•  End-user Activity Tracking (a.k.a. Web Tracking)
•  Emails opened
•  Logins
•  Pages Seen
•  Executed Searches
•  Social Gestures : Likes, Sharing, Comments
•  Data Center Operational Metrics
•  Network & System metrics such as
•  TCP metrics (connection resets, message resends, etc…)
•  System metrics (iops, CPU, load average, etc…)
Kafka : Usage @ LinkedIn	

Kafka : Architecture
[Diagram: the WebTier pushes events into the Broker Tier (Topic 1 .. Topic N, sequential writes, sendfile) at ~100 MB/sec; consumers pull events at ~200 MB/sec through the Kafka ClientLib (Iterator 1 .. n, topic → message id), with ZooKeeper handling message-id management and topic/partition ownership]
Features
§  Pub/Sub
§  Batch Send/Receive
§  E2E Compression
§  System Decoupling
Guarantees
§  At least once delivery
§  Very high throughput
§  Low latency (0.8)
§  Durability (for a time period)
§  Horizontally Scalable
Scale at LinkedIn
•  Average unique messages @ peak:
•  writes/sec = 460k
•  reads/sec = 2.3m
•  # topics: 693
•  28 billion unique messages written per day
Improvements in 0.8
•  Low Latency Features
•  Kafka has always been designed for high-throughput, but E2E latency could
have been as high as 30 seconds
•  Feature 1 : Long-polling
•  For high throughput requests, a consumer’s request for data will always be
fulfilled
•  For low throughput requests, a consumer’s request will likely return 0 bytes,
causing the consumer to back-off and wait. What happens if data arrives on
the broker in the meantime?
•  As of 0.8, a consumer can “park” a request on the broker for up to m
milliseconds
•  If data arrives during this period, it is instantly returned to the
consumer
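In today's Java client the same long-poll behavior is exposed as two consumer settings; a small sketch, with the 500 ms wait chosen arbitrarily.

import java.util.Properties;

public class LongPollConfig {
    // The broker parks the fetch until data arrives or the wait expires,
    // instead of forcing the consumer to busy-poll and back off.
    public static Properties longPollingProps() {
        Properties props = new Properties();
        props.put("fetch.min.bytes", "1");       // reply as soon as any data is available...
        props.put("fetch.max.wait.ms", "500");   // ...or after 500 ms, whichever comes first
        return props;
    }
}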
Improvements in 0.8
•  Low Latency Features
•  In the past, data was not visible to a consumer until it was flushed to disk on the
broker.
•  Feature 2 : New Commit Protocol
•  In 0.8, replicas and a new commit protocol have been introduced. As long as
data has been replicated to the memory of all replicas, even if it has not
been flushed to disk on any one of them, it is considered “committed” and
becomes visible to consumers
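On the producer side, today's client exposes this replication-based notion of “committed” through the acks setting; a sketch with an assumed broker address and topic name, shown with the modern API rather than the 0.8-era producer config.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AckedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for the full in-sync replica set to acknowledge before the send is
        // considered committed, mirroring the commit protocol described above.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("page-views", "member-123", "viewed /jobs"));
        }
    }
}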
Acknowledgments
§  Jay Kreps (Kafka)
§  Neha Narkhede (Kafka)
§  Kishore Gopalakrishna (Helix)
§  Bob Shulman (Espresso)
§  Cuong Tran (Performance & Scalability)
§  Diego “Mono” Buthay (Search Infrastructure)
Questions?
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/website-outages

More Related Content

More from C4Media

Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/awaitC4Media
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?C4Media
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 

More from C4Media (20)

Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Building Modern Web Sites: A Story of Scalability and Availability

  • 1. Recruiting SolutionsRecruiting SolutionsRecruiting Solutions * *** Sid  Anand   QCon  NY  2013   Building  a  Modern  Website  for  Scale  
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /website-outages
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. About Me 2 * Current Life… Ø  LinkedIn Ø  Search, Network, and Analytics (SNA) Ø  Search Infrastructure Ø  Me In a Previous Life… Ø  LinkedIn, Data Infrastructure, Architect Ø  Netflix, Cloud Database Architect Ø  eBay, Web Development, Research Lab, & Search Engine And Many Years Prior… Ø  Studying Distributed Systems at Cornell University @r39132 2
  • 5. Our mission Connect the world’s professionals to make them more productive and successful 3@r39132 3
  • 6. Over 200M members and counting 2 4 8 17 32 55 90 145 2004 2005 2006 2007 2008 2009 2010 2011 2012 LinkedIn Members (Millions) 200+ The world’s largest professional network Growing at more than 2 members/sec Source : http://press.linkedin.com/about
  • 7. 5 * >88%Fortune 100 Companies use LinkedIn Talent Soln to hire Company Pages >2.9M Professional searches in 2012 >5.7B Languages 19 @r39132 5 >30MFastest growing demographic: Students and NCGs The world’s largest professional network Over 64% of members are now international Source : http://press.linkedin.com/about
  • 8. Other Company Facts 6 * •  Headquartered in Mountain View, Calif., with offices around the world! •  As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world @r39132 6 Source : http://press.linkedin.com/about
  • 9. Agenda ü  Company Overview §  Serving Architecture §  How Does LinkedIn Scale –  Web Services –  Databases –  Messaging –  Other §  Q & A 7@r39132
  • 11. Overview •  Our site runs primarily on Java, with some use of Scala for specific infrastructure •  The presentation tier is an exception – runs on everything! •  What runs on Scala? •  Network Graph Engine •  Kafka •  Some front ends (Play) •  Most of our services run on Jetty LinkedIn : Serving Architecture @r39132 9
  • 12. LinkedIn : Serving Architecture @r39132 10 Frontier Presentation Tier Play Spring MVC NodeJS JRuby Grails Django USSR (Chrome V8 JS Engine) Our presentation tier is composed of ATS with 2 plugins: •  Fizzy •  A content aggregator that unifies content across a diverse set of front-ends •  Open-source JS templating framework •  USSR (a.k.a. Unified Server-Side Rendering) •  Packages Google Chrome’s V8 JS engine as an ATS plugin
  • 13. A A B B Master C C D D E E F F Presentation Tier Business Service Tier Data Service Tier Data Infrastructure Slave Master Master Memcached à  A web page requests information A and B à  A thin layer focused on building the UI. It assembles the page by making parallel requests to BST services à  Encapsulates business logic. Can call other BST clusters and its own DST cluster. à  Encapsulates DAL logic and concerned with one Oracle Schema. à  Concerned with the persistent storage of and easy access to data LinkedIn : Serving Architecture Hadoop @r39132 11 Other
  • 14. 12 Serving Architecture : Other? Oracle or Espresso Data Change Events Search Index Graph Index Read Replicas Updates Standard ization As I will discuss later, data that is committed to Databases needs to also be made available to a host of other online serving systems : •  Search •  Standardization Services •  These provider canonical names for your titles, companies, schools, skills, fields of study, etc.. •  Graph engine •  Recommender Systems This data change feed needs to be scalable, reliable, and fast. [ Databus ] @r39132
  • 15. 13 Serving Architecture : Hadoop @r39132 How do we Hadoop to Serve? •  Hadoop is central to our analytic infrastructure •  We ship data streams into Hadoop from our primary Databases via Databus & from applications via Kafka •  Hadoop jobs take daily or hourly dumps of this data and compute data files that Voldemort can load! •  Voldemort loads these files and serves them on the site
  • 16. Voldemort : RO Store Usage at LinkedIn People You May Know LinkedIn Skills Related Searches Viewers of this profile also viewed Events you may be interested in Jobs you may be interested in @r39132 14
  • 17. How Does LinkedIn Scale? 15@r39132 15
  • 18. Scaling Web Services LinkedIn : Web Services 16@r39132
  • 19. LinkedIn : Scaling Web Services @r39132 17 Problem •  How do 150+ web services communicate with each other to fulfill user requests in the most efficient and fault-tolerant manner? •  How do they handle slow downstream dependencies? •  For illustration sake, consider the following scenario: •  Service B has 2 hosts •  Service C has 2 hosts •  A machine in Service B sends a web request to a machine in Service C A A B B C C
  • 20. LinkedIn : Scaling Web Services @r39132 18 What sorts of failure modes are we concerned about? •  A machine in service C •  has a long GC pause •  calls a service that has a long GC pause •  calls a service that calls a service that has a long GC pause •  … see where I am going? •  A machine in service C or in its downstream dependencies may be slow for any reason, not just GC (e.g. bottlenecks on CPU, IO, and memory, lock-contention) Goal : Given all of this, how can we ensure high uptime? Hint : Pick the right architecture and implement best-practices on top of it!
  • 21. LinkedIn : Scaling Web Services @r39132 19 In the early days, LinkedIn made a big bet on Spring and Spring RPC. Issues 1.  Spring RPC is difficult to debug •  You cannot call the service using simple command-line tools like Curl •  Since the RPC call is implemented as a binary payload over HTTP, http access logs are not very useful B B C C LB 2.  A Spring RPC-based architecture leads to high MTTR •  Spring RPC is not flexible and pluggable -- we cannot use •  custom client-side load balancing strategies •  custom fault-tolerance features •  Instead, all we can do is to put all of our service nodes behind a hardware load-balancer & pray! •  If a Service C node experiences a slowness issue, a NOC engineer needs to be alerted and then manually remove it from the LB (MTTR > 30 minutes)
  • 22. LinkedIn : Scaling Web Services @r39132 20 Solution A better solution is one that we see often in both cloud-based architectures and NoSQL systems : Dynamic Discovery + Client-side load-balancing Step 1 : Service C nodes announce their availability to serve traffic to a ZK registry Step 2 : Service B nodes get updates from ZK B B C C ZK ZK ZK B B C C ZK ZK ZK Step 3 : Service B nodes route traffic to service C nodes B B C C ZK ZK ZK
  • 23. LinkedIn : Scaling Web Services @r39132 21 With this new paradigm for discovering services and routing requests to them, we can incorporate additional fault-tolerant services
  • 24. LinkedIn : Scaling Web Services @r39132 22 Best Practices •  Fault-tolerance Support 1.  No client should wait indefinitely for a response from a service •  Issues •  Waiting causes a traffic jam : all upstream clients end up also getting blocked •  Each service has a fixed number of Jetty or Tomcat threads. Once those are all tied up waiting, no new requests can be handled •  Solution •  After a configurable timeout, return •  Store different SLAs in ZK for each REST end-points •  In other words, all calls are not the same and should not have the same read time out
  • 25. LinkedIn : Scaling Web Services @r39132 23 Best Practices •  Fault-tolerance Support 2.  Isolate calls to back-ends from one another •  Issues •  You depend on a responses from independent services A and B. If A slows down, will you still be able to serve B? •  Details •  This is a common use-case for federated services and for shard- aggregators : •  E.g. Search at LinkedIn is federated and will call people- search, job-search, group-search, etc... In parallel •  E.g. People-search is itself sharded, so an additional shard- aggregation step needs to happen across 100s of shards •  Solution •  Use Async requests or independent ExecutorServices for sync requests (one per each shard or vertical)
  • 26. LinkedIn : Scaling Web Services @r39132 24 Best Practices •  Fault-tolerance Support 3.  Cancel Unnecessary Work •  Issues •  Work issued down the call-graphs is unnecessary if the clients at the top of the call graph have already timed out •  Imagine that as a call reaches half-way down your call-tree, the caller at the root times out. •  You will still issue work down the remaining half-depth of your tree unless you cancel it! •  Solution •  A possible approach •  Root of the call-tree adds (<tree-UUID>, inProgress status) to Memcached •  All services pass the tree-UUID down the call-tree (e.g. as a HTTP custom request header) •  Servlet filters at each hop check whether inProgress == false. If so, immediately respond with an empty response
  • 27. LinkedIn : Scaling Web Services @r39132 25 Best Practices •  Fault-tolerance Support 4.  Avoid Sending Requests to Hosts that are GCing •  Issues •  If a client sends a web request to a host in Service C and if that host is experiencing a GC pause, the client will wait 50-200ms, depending on the read time out for the request •  During that GC pause other requests will also be sent to that node before they all eventually time out •  Solution •  Send a “GC scout” request before every “real” web request
  • 28. LinkedIn : Scaling Web Services @r39132 26 Why is this a good idea? •  Scout requests are cheap and provide negligible overhead for requests Step 1 : A Service B node sends a cheap 1 msec TCP request to a dedicated “scout” Netty port Step 2 : If the scout request comes back within 1 msec, send the real request to the Tomcat or Jetty port Step 3 : Else repeat with a different host in Service C B B Netty Tomcat ZK ZK ZK C B B Netty Tomcat ZK ZK ZK C B B ZK ZK ZK Netty Tomcat C Netty Tomcat C
  • 29. LinkedIn : Scaling Web Services @r39132 27 Best Practice •  Fault-tolerance Support 5.  Services should protect themselves from traffic bursts •  Issues •  Service nodes should protect themselves from being over-whelmed by requests •  This will also protect their downstream servers from being overwhelmed •  Simply setting the tomcat or jetty thread pool size is not always an option. Often times, these are not configurable per application. •  Solution •  Use a sliding window counter. If the counter exceeds a configured threshold, return immediately with a 503 (‘service unavailable’) •  Set threshold below Tomcat or Jetty thread pool size
  • 30. Espresso : Scaling Databases LinkedIn : Databases 28@r39132
  • 31. Espresso : Overview @r39132 29 Problem •  What do we do when we run out of QPS capacity on an Oracle database server? •  You can only buy yourself out of this problem so far (i.e. buy a bigger box) •  Read-replicas and memcached will help scale reads, but not writes! Solution à Espresso You need a horizontally-scalable database! Espresso is LinkedIn’s newest NoSQL store. It offers the following features: •  Horizontal Scalability •  Works on commodity hardware •  Document-centric •  Avro documents supporting rich-nested data models •  Schema-evolution is drama free •  Extensions for Lucene indexing •  Supports Transactions (within a partition, e.g. memberId) •  Supports conditional reads & writes using standard HTTP headers (e.g. if-modified-since)
  • 32. Espresso : Overview @r39132 30 Why not use Open-source? •  Change capture stream (e.g. Databus) •  Backup-restore •  Mature storage-engine (innodb)
  • 33. 31 •  Components •  Request Routing Tier •  Consults Cluster Manager to discover node to route to •  Forwards request to appropriate storage node •  Storage Tier •  Data Store (MySQL) •  Local Secondary Index (Lucene) •  Cluster Manager •  Responsible for data set partitioning •  Manages storage nodes •  Relay Tier •  Replicates data to consumers Espresso: Architecture @r39132
  • 34. Databus : Scaling Databases LinkedIn : Database Streams 32@r39132
  • 35. 33 DataBus : Overview Problem Our databases (Oracle & Espresso) are used for R/W web-site traffic. However, various services (Search, Graph DB, Standardization, etc…) need the ability to •  Read the data as it is changed in these OLTP stores •  Occasionally, scan the contents in order rebuild their entire state Solution è Databus Databus provides a consistent, in-time-order stream of database changes that •  Scales horizontally •  Protects the source database from high-read-load @r39132
  • 36. Where Does LinkedIn use DataBus? 34@r39132 34
  • 37. 35 DataBus : Usage @ LinkedIn Oracle or Espresso Data Change Events Search Index Graph Index Read Replicas Updates Standard ization A user updates the company, title, & school on his profile. He also accepts a connection •  The write is made to an Oracle or Espresso Master and DataBus replicates: •  the profile change is applied to the Standardization service Ø  E.g. the many forms of IBM were canonicalized for search-friendliness and recommendation-friendliness •  the profile change is applied to the Search Index service Ø  Recruiters can find you immediately by new keywords •  the connection change is applied to the Graph Index service Ø  The user can now start receiving feed updates from his new connections immediately @r39132
  • 38. Relay Event Win 36 DB Bootstrap Capture Changes On-line Changes DB DataBus consists of 2 services •  Relay Service •  Sharded •  Maintain an in-memory buffer per shard •  Each shard polls Oracle and then deserializes transactions into Avro •  Bootstrap Service •  Picks up online changes as they appear in the Relay •  Supports 2 types of operations from clients Ø  If a client falls behind and needs records older than what the relay has, Bootstrap can send consolidated deltas! Ø  If a new client comes on line or if an existing client fell too far behind, Bootstrap can send a consistent snapshot DataBus : Architecture @r39132
  • 39. Relay Event Win 37 DB Bootstrap Capture Changes On-line Changes On-line Changes DB Consolidated Delta Since T Consistent Snapshot at U Consumer 1 Consumer n Databus ClientLib Client Consumer 1 Consumer n Databus ClientLib Client Guarantees §  Transactions §  In-commit-order Delivery à commits are replicated in order §  Durability à you can replay the change stream at any time in the future §  Reliability à 0% data loss §  Low latency à If your consumers can keep up with the relay à sub-second response time DataBus : Architecture @r39132
  • 40. 38 DataBus : Architecture Cool Features §  Server-side (i.e. relay-side & bootstrap-side) filters §  Problem §  Say that your consuming service is sharded 100 ways §  e.g. Member Search Indexes sharded by member_id % 100 §  index_0, index_1, …, index_99 §  However, you have a single member Databus stream §  How do you avoid having every shard read data it is not interested in? §  Solution §  Easy, Databus already understands the notion of server-side filters §  It will only send updates to your consumer instance for the shard it is interested in @r39132
  • 41. Kafka: Scaling Messaging LinkedIn : Messaging 39@r39132
  • 42. 40 Kafka : Overview Problem We have Databus to stream changes that were committed to a database. How do we capture and stream high-volume data if we relax the requirement that the data needs long-term durability? •  In other words, the data can have limited retention Challenges •  Needs to handle a large volume of events •  Needs to be highly-available, scalable, and low-latency •  Needs to provide limited durability guarantees (e.g. data retained for a week) Solution è Kafka Kafka is a messaging system that supports topics. Consumers can subscribe to topics and read all data within the retention window. Consumers are then notified of new messages as they appear! @r39132
  • 43. 41 Kafka is used at LinkedIn for a variety of business-critical needs: Examples: •  End-user Activity Tracking (a.k.a. Web Tracking) •  Emails opened •  Logins •  Pages Seen •  Executed Searches •  Social Gestures : Likes, Sharing, Comments •  Data Center Operational Metrics •  Network & System metrics such as •  TCP metrics (connection resets, message resends, etc…) •  System metrics (iops, CPU, load average, etc…) Kafka : Usage @ LinkedIn @r39132
  • 44. 42 WebTier Topic 1 Broker Tier Push Events Topic 2 Topic N Zookeeper Message Id Management Topic, Partition Ownership Sequential write sendfile Kafka ClientLib Consumers Pull Events Iterator 1 Iterator n Topic à Message Id 100 MB/sec 200 MB/sec §  Pub/Sub §  Batch Send/Receive §  E2E Compression §  System Decoupling Features Guarantees §  At least once delivery §  Very high throughput §  Low latency (0.8) §  Durability (for a time period) §  Horizontally Scalable Kafka : Architecture @r39132 •  Average Unique Message @Peak •  writes/sec = 460k •  reads/sec: 2.3m •  # topics: 693 28 billion unique messages written per day Scale at LinkedIn
  • 45. 43 Improvements in 0.8 •  Low Latency Features •  Kafka has always been designed for high-throughput, but E2E latency could have been as high as 30 seconds •  Feature 1 : Long-polling •  For high throughput requests, a consumer’s request for data will always be fulfilled •  For low throughput requests, a consumer’s request will likely return 0 bytes, causing the consumer to back-off and wait. What happens if data arrives on the broker in the meantime? •  As of 0.8, a consumer can “park” a request on the broker for as much as “m milliseconds have passed” •  If data arrives during this period, it is instantly returned to the consumer Kafka : Overview @r39132
  • 46. 44 Improvements in 0.8 •  Low Latency Features •  In the past, data was not visible to a consumer until it was flushed to disk on the broker. •  Feature 2 : New Commit Protocol •  In 0.8, replicas and a new commit protocol has been introduced. As long as data has been replicated to the memory of all replicas, even if it has not been flushed to disk on any one of them, it is considered “committed” and becomes visible to consumers Kafka : Overview @r39132
  • 47. §  Jay Kreps (Kafka) §  Neha Narkhede (Kafka) §  Kishore Gopalakrishna (Helix) §  Bob Shulman (Espresso) §  Cuong Tran (Perfomance & Scalability) §  Diego “Mono” Buthay (Search Infrastructure) 45 Acknowledgments @r39132
  • 49. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/website- outages