SlideShare a Scribd company logo
Ensuring Consistency in a Replicated World 
Josh 
Snyder 
2014-­‐09-­‐30
2 
what is Yelp?
• we operate in a bunch of markets 
• aim to be globally distributed 
• our users should never see stale content 
• our developers should be able to design an application resilient to 
replication delay 
3 
goals
4 
a sample architecture
• a small set of moving parts 
• enables us to do more with fewer shards 
• masks geographic traffic split from users and developers 
• enhanced tolerance to replication delay 
• ability to 
– perform online replication hierarchy changes 
– batch-load data 
5 
our toolset
6 
cookies
• give the client a short-lived “dirty session” cookie 
• encode the time of the latest interaction between you and them 
• expire or ignore the cookie after replicas have caught up 
7 
cookies
• load balancer: 
• POST? 
• GET? -> cookie? 
• routes the request into the appropriate datacenter 
• adds headers to requests 
8 
request routing
• users get read-after-write consistency 
• routing a user’s request between datacenters increases latency 
! 
• getting it wrong: increased load on the master database 
9 
tradeoffs
• we need to be assured that a user’s request falls back to a datacenter that 
has all of their data 
10 
tradeoffs
• we need a clear picture of it 
• never underestimate replication delay, always overestimate 
11 
replication delay
• made of lies (for this purpose) 
• underestimates most of the time 
• overestimates some of the time 
12 
Seconds_Behind_Master 
http://bugs.mysql.com/bug.php?id=66921
13 
heartbeats
• insert known data on the master 
• wait until you see it on the slave 
• time waited is replication delay 
14 
heartbeats
15 
clocks are evil
16 
clocks are evil (2)
17 
pt-heartbeat
18 
yelp_heartbeat
19 
the secret sauce
• A sensu check: 
20 
what does that get us? (pt 1)
21 
why that way?
22 
time is hard 
http://bugs.mysql.com/bug.php?id=48326
• aggregates heartbeat information 
• provides it to the webapp 
• determines when to expire the dirty session cookie 
23 
repl_delay_reporter
• Wait for replication: 
• “I inserted some data; when will it be available on all replicas?” 
• Throttle to replication: 
• “I want to bulk insert data. Will doing so cause too much replication delay?” 
24 
operations
• insert some data 
• ask the master database “what’s the heartbeat right now?” 
• ask the repl_delay_reporter “what’s the lowest heartbeat right now?” 
• wait a bit 
• loop until the lowest heartbeat exceeds the original master heartbeat 
25 
wait for replication
• determines when to expire the dirty session cookie 
• relies on only 1 clock, and only for monotonicity 
• used heavily by batches 
– provides read-after-write consistency 
26 
wait for replication
• prevents batches from causing excessive replication delay 
• operates before the beginning of each transaction 
– batches ask “is replication delay low enough for me to write right 
now?” 
• batches are required to keep their transactions reasonably-sized 
27 
throttle to replication
• load on masters 
• laggards 
• over-throttling 
28 
gotchas
• batch data can reside on the same shards that serve OLTP requests 
• support databases with heterogenous SLAs 
• automatic load-shedding when there is a replication issue 
29 
what this gets us
• shunting of nearly ALL reading and reporting off of the master 
• better mileage out of the Percona toolkit 
• on-line replication hierarchy changes 
30 
what this gets us
Ensuring Consistency in a Replicated World

More Related Content

What's hot

Kafka reliability velocity 17
Kafka reliability   velocity 17Kafka reliability   velocity 17
Kafka reliability velocity 17
Gwen (Chen) Shapira
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
confluent
 
Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster.org
 
Apache con2016final
Apache con2016final Apache con2016final
Apache con2016final
Salesforce
 
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
confluent
 
Benchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and ResultsBenchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and Results
NGINX, Inc.
 
How Much Kafka?
How Much Kafka?How Much Kafka?
How Much Kafka?
confluent
 
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data PlatformStream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
confluent
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Gwen (Chen) Shapira
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
Matteo Merli
 
SaltConf14 - Brendan Burns, Google - Management at Google Scale
SaltConf14 - Brendan Burns, Google - Management at Google ScaleSaltConf14 - Brendan Burns, Google - Management at Google Scale
SaltConf14 - Brendan Burns, Google - Management at Google Scale
SaltStack
 
Effectively-once semantics in Apache Pulsar
Effectively-once semantics in Apache PulsarEffectively-once semantics in Apache Pulsar
Effectively-once semantics in Apache Pulsar
Matteo Merli
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
Gwen (Chen) Shapira
 
Hands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache PulsarHands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache Pulsar
Sijie Guo
 
Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!
MongoDB
 
Apache BookKeeper Distributed Store- a Salesforce use case
Apache BookKeeper Distributed Store- a Salesforce use caseApache BookKeeper Distributed Store- a Salesforce use case
Apache BookKeeper Distributed Store- a Salesforce use case
Salesforce Engineering
 
High performance messaging with Apache Pulsar
High performance messaging with Apache PulsarHigh performance messaging with Apache Pulsar
High performance messaging with Apache Pulsar
Matteo Merli
 
How to Fail at Kafka
How to Fail at KafkaHow to Fail at Kafka
How to Fail at Kafka
confluent
 
Altitude SF 2017: Optimizing your hit rate
Altitude SF 2017: Optimizing your hit rateAltitude SF 2017: Optimizing your hit rate
Altitude SF 2017: Optimizing your hit rate
Fastly
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
confluent
 

What's hot (20)

Kafka reliability velocity 17
Kafka reliability   velocity 17Kafka reliability   velocity 17
Kafka reliability velocity 17
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
 
Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...Gluster Metrics: why they are crucial for running stable deployments of all s...
Gluster Metrics: why they are crucial for running stable deployments of all s...
 
Apache con2016final
Apache con2016final Apache con2016final
Apache con2016final
 
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
Kafka Summit SF 2017 - One Data Center is Not Enough: Scaling Apache Kafka Ac...
 
Benchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and ResultsBenchmarking NGINX for Accuracy and Results
Benchmarking NGINX for Accuracy and Results
 
How Much Kafka?
How Much Kafka?How Much Kafka?
How Much Kafka?
 
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data PlatformStream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
Stream Me Up, Scotty: Transitioning to the Cloud Using a Streaming Data Platform
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
 
SaltConf14 - Brendan Burns, Google - Management at Google Scale
SaltConf14 - Brendan Burns, Google - Management at Google ScaleSaltConf14 - Brendan Burns, Google - Management at Google Scale
SaltConf14 - Brendan Burns, Google - Management at Google Scale
 
Effectively-once semantics in Apache Pulsar
Effectively-once semantics in Apache PulsarEffectively-once semantics in Apache Pulsar
Effectively-once semantics in Apache Pulsar
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
 
Hands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache PulsarHands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache Pulsar
 
Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!
 
Apache BookKeeper Distributed Store- a Salesforce use case
Apache BookKeeper Distributed Store- a Salesforce use caseApache BookKeeper Distributed Store- a Salesforce use case
Apache BookKeeper Distributed Store- a Salesforce use case
 
High performance messaging with Apache Pulsar
High performance messaging with Apache PulsarHigh performance messaging with Apache Pulsar
High performance messaging with Apache Pulsar
 
How to Fail at Kafka
How to Fail at KafkaHow to Fail at Kafka
How to Fail at Kafka
 
Altitude SF 2017: Optimizing your hit rate
Altitude SF 2017: Optimizing your hit rateAltitude SF 2017: Optimizing your hit rate
Altitude SF 2017: Optimizing your hit rate
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
 

Viewers also liked

Microservices Summit - The Human Side of Services
Microservices Summit - The Human Side of ServicesMicroservices Summit - The Human Side of Services
Microservices Summit - The Human Side of Services
Yelp Engineering
 
Yelp Tech Talks: Mobile Testing 1, 2, 3
Yelp Tech Talks: Mobile Testing 1, 2, 3Yelp Tech Talks: Mobile Testing 1, 2, 3
Yelp Tech Talks: Mobile Testing 1, 2, 3
Yelp Engineering
 
Building a World Class Security Team
Building a World Class Security TeamBuilding a World Class Security Team
Building a World Class Security Team
Yelp Engineering
 
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E..."Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
Yelp Engineering
 
Giving Design Critique
Giving Design CritiqueGiving Design Critique
Giving Design Critique
Yelp Engineering
 
Humans by the hundred
Humans by the hundredHumans by the hundred
Humans by the hundred
Yelp Engineering
 

Viewers also liked (6)

Microservices Summit - The Human Side of Services
Microservices Summit - The Human Side of ServicesMicroservices Summit - The Human Side of Services
Microservices Summit - The Human Side of Services
 
Yelp Tech Talks: Mobile Testing 1, 2, 3
Yelp Tech Talks: Mobile Testing 1, 2, 3Yelp Tech Talks: Mobile Testing 1, 2, 3
Yelp Tech Talks: Mobile Testing 1, 2, 3
 
Building a World Class Security Team
Building a World Class Security TeamBuilding a World Class Security Team
Building a World Class Security Team
 
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E..."Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
 
Giving Design Critique
Giving Design CritiqueGiving Design Critique
Giving Design Critique
 
Humans by the hundred
Humans by the hundredHumans by the hundred
Humans by the hundred
 

Similar to Ensuring Consistency in a Replicated World

MySQL Infrastructure Testing Automation at GitHub
MySQL Infrastructure Testing Automation at GitHubMySQL Infrastructure Testing Automation at GitHub
MySQL Infrastructure Testing Automation at GitHub
Ike Walker
 
Membase East Coast Meetups
Membase East Coast MeetupsMembase East Coast Meetups
Membase East Coast Meetups
Membase
 
(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management
BIOVIA
 
MySQL Performance Tuning at COSCUP 2014
MySQL Performance Tuning at COSCUP 2014MySQL Performance Tuning at COSCUP 2014
MySQL Performance Tuning at COSCUP 2014
Ryusuke Kajiyama
 
Membase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San FranciscoMembase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San Francisco
Membase
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
BIOVIA
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Kristofferson A
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
Gwen (Chen) Shapira
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
Joe Alex
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into Overdrive
Todd Palino
 
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsJava one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Speedment, Inc.
 
Monitoring Apache Kafka
Monitoring Apache KafkaMonitoring Apache Kafka
Monitoring Apache Kafka
confluent
 
Architecting for the cloud elasticity security
Architecting for the cloud elasticity securityArchitecting for the cloud elasticity security
Architecting for the cloud elasticity security
Len Bass
 
Work with hundred of hot terabytes in JVMs
Work with hundred of hot terabytes in JVMsWork with hundred of hot terabytes in JVMs
Work with hundred of hot terabytes in JVMs
Malin Weiss
 
Big Data for QAs
Big Data for QAsBig Data for QAs
Big Data for QAs
Ahmed Misbah
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
Nick Dimiduk
 
Got Problems? Let's Do a Health Check
Got Problems? Let's Do a Health CheckGot Problems? Let's Do a Health Check
Got Problems? Let's Do a Health Check
Luis Guirigay
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
Hakka Labs
 
Case Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of DataCase Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of Data
Schubert Zhang
 
Why would I store my data in more than one database?
Why would I store my data in more than one database?Why would I store my data in more than one database?
Why would I store my data in more than one database?
Kurtosys Systems
 

Similar to Ensuring Consistency in a Replicated World (20)

MySQL Infrastructure Testing Automation at GitHub
MySQL Infrastructure Testing Automation at GitHubMySQL Infrastructure Testing Automation at GitHub
MySQL Infrastructure Testing Automation at GitHub
 
Membase East Coast Meetups
Membase East Coast MeetupsMembase East Coast Meetups
Membase East Coast Meetups
 
(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management
 
MySQL Performance Tuning at COSCUP 2014
MySQL Performance Tuning at COSCUP 2014MySQL Performance Tuning at COSCUP 2014
MySQL Performance Tuning at COSCUP 2014
 
Membase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San FranciscoMembase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San Francisco
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into Overdrive
 
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsJava one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
 
Monitoring Apache Kafka
Monitoring Apache KafkaMonitoring Apache Kafka
Monitoring Apache Kafka
 
Architecting for the cloud elasticity security
Architecting for the cloud elasticity securityArchitecting for the cloud elasticity security
Architecting for the cloud elasticity security
 
Work with hundred of hot terabytes in JVMs
Work with hundred of hot terabytes in JVMsWork with hundred of hot terabytes in JVMs
Work with hundred of hot terabytes in JVMs
 
Big Data for QAs
Big Data for QAsBig Data for QAs
Big Data for QAs
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
Got Problems? Let's Do a Health Check
Got Problems? Let's Do a Health CheckGot Problems? Let's Do a Health Check
Got Problems? Let's Do a Health Check
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
Case Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of DataCase Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of Data
 
Why would I store my data in more than one database?
Why would I store my data in more than one database?Why would I store my data in more than one database?
Why would I store my data in more than one database?
 

More from Yelp Engineering

Human Ops
Human OpsHuman Ops
Human Ops
Yelp Engineering
 
Teeing Up Python - Code Golf
Teeing Up Python - Code GolfTeeing Up Python - Code Golf
Teeing Up Python - Code Golf
Yelp Engineering
 
Fluxx Streaming
Fluxx StreamingFluxx Streaming
Fluxx Streaming
Yelp Engineering
 
Humans by the hundred (DevOps Days Ohio)
Humans by the hundred (DevOps Days Ohio)Humans by the hundred (DevOps Days Ohio)
Humans by the hundred (DevOps Days Ohio)
Yelp Engineering
 
A Beginners Guide To Launching Yelp In Hong Kong
A Beginners Guide To Launching Yelp In Hong KongA Beginners Guide To Launching Yelp In Hong Kong
A Beginners Guide To Launching Yelp In Hong Kong
Yelp Engineering
 
MySQL At Yelp
MySQL At YelpMySQL At Yelp
MySQL At Yelp
Yelp Engineering
 
Own Your Career
Own Your CareerOwn Your Career
Own Your Career
Yelp Engineering
 
Scaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsScaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique Visitors
Yelp Engineering
 
Optimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOEOptimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOE
Yelp Engineering
 
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen..."Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
Yelp Engineering
 

More from Yelp Engineering (10)

Human Ops
Human OpsHuman Ops
Human Ops
 
Teeing Up Python - Code Golf
Teeing Up Python - Code GolfTeeing Up Python - Code Golf
Teeing Up Python - Code Golf
 
Fluxx Streaming
Fluxx StreamingFluxx Streaming
Fluxx Streaming
 
Humans by the hundred (DevOps Days Ohio)
Humans by the hundred (DevOps Days Ohio)Humans by the hundred (DevOps Days Ohio)
Humans by the hundred (DevOps Days Ohio)
 
A Beginners Guide To Launching Yelp In Hong Kong
A Beginners Guide To Launching Yelp In Hong KongA Beginners Guide To Launching Yelp In Hong Kong
A Beginners Guide To Launching Yelp In Hong Kong
 
MySQL At Yelp
MySQL At YelpMySQL At Yelp
MySQL At Yelp
 
Own Your Career
Own Your CareerOwn Your Career
Own Your Career
 
Scaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsScaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique Visitors
 
Optimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOEOptimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOE
 
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen..."Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
 

Recently uploaded

GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 

Recently uploaded (20)

GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 

Ensuring Consistency in a Replicated World

  • 1. Ensuring Consistency in a Replicated World Josh Snyder 2014-­‐09-­‐30
  • 2. 2 what is Yelp?
  • 3. • we operate in a bunch of markets • aim to be globally distributed • our users should never see stale content • our developers should be able to design an application resilient to replication delay 3 goals
  • 4. 4 a sample architecture
  • 5. • a small set of moving parts • enables us to do more with fewer shards • masks geographic traffic split from users and developers • enhanced tolerance to replication delay • ability to – perform online replication hierarchy changes – batch-load data 5 our toolset
  • 7. • give the client a short-lived “dirty session” cookie • encode the time of the latest interaction between you and them • expire or ignore the cookie after replicas have caught up 7 cookies
  • 8. • load balancer: • POST? • GET? -> cookie? • routes the request into the appropriate datacenter • adds headers to requests 8 request routing
  • 9. • users get read-after-write consistency • routing a user’s request between datacenters increases latency ! • getting it wrong: increased load on the master database 9 tradeoffs
  • 10. • we need to be assured that a user’s request falls back to a datacenter that has all of their data 10 tradeoffs
  • 11. • we need a clear picture of it • never underestimate replication delay, always overestimate 11 replication delay
  • 12. • made of lies (for this purpose) • underestimates most of the time • overestimates some of the time 12 Seconds_Behind_Master http://bugs.mysql.com/bug.php?id=66921
  • 14. • insert known data on the master • wait until you see it on the slave • time waited is replication delay 14 heartbeats
  • 16. 16 clocks are evil (2)
  • 19. 19 the secret sauce
  • 20. • A sensu check: 20 what does that get us? (pt 1)
  • 21. 21 why that way?
  • 22. 22 time is hard http://bugs.mysql.com/bug.php?id=48326
  • 23. • aggregates heartbeat information • provides it to the webapp • determines when to expire the dirty session cookie 23 repl_delay_reporter
  • 24. • Wait for replication: • “I inserted some data; when will it be available on all replicas?” • Throttle to replication: • “I want to bulk insert data. Will doing so cause too much replication delay?” 24 operations
  • 25. • insert some data • ask the master database “what’s the heartbeat right now?” • ask the repl_delay_reporter “what’s the lowest heartbeat right now?” • wait a bit • loop until the lowest heartbeat exceeds the original master heartbeat 25 wait for replication
  • 26. • determines when to expire the dirty session cookie • relies on only 1 clock, and only for monotonicity • used heavily by batches – provides read-after-write consistency 26 wait for replication
  • 27. • prevents batches from causing excessive replication delay • operates before the beginning of each transaction – batches ask “is replication delay low enough for me to write right now?” • batches are required to keep their transactions reasonably-sized 27 throttle to replication
  • 28. • load on masters • laggards • over-throttling 28 gotchas
  • 29. • batch data can reside on the same shards that serve OLTP requests • support databases with heterogenous SLAs • automatic load-shedding when there is a replication issue 29 what this gets us
  • 30. • shunting of nearly ALL reading and reporting off of the master • better mileage out of the Percona toolkit • on-line replication hierarchy changes 30 what this gets us