SlideShare a Scribd company logo
1 of 32
Download to read offline
Building Reliable Services From Unreliable Components
Operational Buddhism
Ernie Souhrada
Database Engineer / Bit Wrangler, Pinterest
Percona Live Data Performance Conference – 20 April 2016
 1
•  Introductions
•  Who Am I, Why Am I Here?
•  Pinterest Infrastructure
•  Operational Materialism
•  The Rise of Utility Computing
•  Operational Buddhism
•  Four Noble Truths
•  The Pinterest Way
•  A PaaS Not Taken
•  Q&A, Credits
Agenda
2
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
My god, it’s full of cats!
Who am I?
•  Database Engineer at Pinterest (January 2015)
–  One of two people solely responsible for hundreds of TB of MySQL data
–  Also loosely affiliated with HBase and Core SRE teams
•  Previously: Percona, Sun, assorted random small companies
•  Jack of many trades, master of some

Why am I here?
•  Interested in almost EVERYTHING (not just tech)
•  Success in a DBA/SRE role requires cross-pollination.
Who Am I, Why Am I Here?
3
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Turning technical skill into cat food since 1996
•  Pinterest is 100% hosted in the AWS cloud.
•  Petabytes of data spread across MySQL, HBase, S3, Redis, etc.
•  Tens of thousands of servers running at any given time
•  Hundreds of unique services interacting with each other
•  We make heavy use of some AWS offerings:
–  EC2
–  S3
–  Route53
–  Redshift
•  Others, not so much (or at all):
–  RDS
–  CloudFront
–  ElastiCache
–  ElasticBeanstalk
Pinterest Infrastructure
4
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
The view from ceiling cat’s perspective
A Trip Back In Time: Operational Materialism
5
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Or, how did we ever manage without “the cloud” ?
Consider the world before the rise of
AWS, Google Cloud, Microsoft Azure. 

Need computing power or an Internet
presence? Not many options:
•  Use a managed-services provider.
•  Build it yourself.

In the end, these are basically the same.
Operational Materialism
6
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
It’s still servers all the way down.
Someone still has to deal with the vendors, buy the
hardware, rack it, stack it, configure it, and make sure it
all stays up and functional within agreed-upon SLAs.
The Four Sorrows
Operational Materialism
7
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
1.  Individual servers matter.
2.  Failure is expensive, so it must be prevented.
3.  Capacity planning can make or break you.

4.  Sometimes your destiny is still outside your control.
#1: Individual servers matter.
Operational Materialism
8
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
A server dies… now what?
•  Hot/warm stand-by
•  Spare parts / DIY
•  Roll the trucks!
•  Did you remember to buy the extended
warranty / gold service plan?
#2: Failure is expensive.
Operational Materialism
9
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Server XYZ is down ! the site is down. 
•  What is this costing you in terms of lost business? 

(The Lamborghini Factor)

•  Cost of employee time to recover?

•  What about the cost to your reputation?

Not a situation we want to be in.
#2: Therefore, it must be avoided or prevented.
Operational Materialism
10
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Upgrade ALL THE THINGS!
•  Server grade hardware
•  Spare parts, spare servers
•  Cluster and scale up and out
•  Multiple network paths
•  Redundant generators
•  Backup data centers
•  And so on….
•  Don’t forget the extra humans!
Want another nine? Add another zero.
Operational Materialism
11
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Mo’ Problems ! Mo’ Money ! Different Problems
-  Failure prevention infrastructure: complexity increases
-  Server grade hardware: cost increases
-  More humans: every type of pain imaginable increases








Efficiency cat is not pleased with this situation.
#3: Capacity planning can make or break you.
Operational Materialism
12
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Got capacity?
•  Not enough à performance sucks
•  Too much à wasted resources



The problem of TIME .…
#4: Sometimes your destiny is still outside your control.
Operational Materialism
13
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Even the best capacity planning models can break

down due to unforeseen circumstances.
-  Natural disasters
-  Supply chain disruptions
-  Legal conflicts
-  The “Slashdot Effect”
•  Not all doom and gloom – stuff was built, services were operated.
•  Organizations managed their infrastructure this way for years.
•  Many still do.


•  But sometimes the landscape changes in a fundamental way.
Operational Materialism
14
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
All good things must come to an end?
The Rise of Utility Computing
15
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Forecast calls for clouds.
Free your mind, and your servers will follow.
The Rise of Utility Computing
16
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Some promises[1] of utility computing:
•  Unlimited, on-demand capacity
•  No massive up-front capital expenditures
•  Democratization of computing technology
•  Focus on building products, not running servers
•  Experimentation is easy; failure inexpensive
[1] Some of these promises are more easily kept than others.
TANSTAAFL


The Rise of Utility Computing
17
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
But there are tradeoffs….
-  Reduced architectural flexibility
-  Black box infrastructure
-  Unpredictable performance
-  Strong potential for vendor lock-in
-  New challenges to cost containment


Operational Materialism doesn’t work here. We need a new mindset.
No servers. No attachment. No suffering. No problem.
Operational Buddhism
18
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016
Four Noble Truths
Operational Buddhism
19
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
1.  Cloud servers can, and do, fail at any time, for any reason.

2.  Trying to prevent this server failure is an endless source of suffering
for SREs and DBAs alike.

3.  Accepting the impermanence of our servers, we should design
systems that are failure-resilient, not failure-resistant.

4.  We can break the cycle of suffering and create a better experience for
end users, internal customers, and colleagues.
#1: Failure happens.
Operational Buddhism
20
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Cloud-based servers can fail at any time, for any reason.
•  Underlying physical hardware problems
•  Oversubscription / “noisy neighbors”
•  Hypervisor bugs
•  Cascading failures from elsewhere in the cloud
#2: Attachment to servers leads to suffering.
Operational Buddhism
21
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Trying to prevent individual server failure is an unending source of suffering for
SREs and DBAs alike.
No control over physical infrastructure.
No visibility into physical infrastructure.
No guarantees are possible.


VMs fail at a much higher rate than server-grade
bare metal hardware.
Operational Buddhism
22
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Accept the impermanence of individual servers, and in doing so, design systems
that are failure-resilient, not failure-resistant.
#3: Be failure-resilient, not failure-resistant.
THIS IS NOT A SERVER – MEOW!
THESE ARE SERVERS – MOO!
Operational Buddhism
23
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
We can escape the cycle of suffering and create a better experience for our
internal customers, end users, and colleagues.
#4: The cycle can be broken.
The best infrastructure is invisible to those that
rely on it or build on top of it.


STUFF JUST WORKS.
The Pinterest Way
Operational Buddhism
24
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Servers can die at any time for any reason.
•  Automated replacement
–  AWS Auto-Scaling Groups 
–  Teletraan[1]: Deployment and auto-scaling platform
–  Morpheus[2]: Automated remediation framework
–  Destroy all humans!
•  Configuration management tools
–  We are a Puppet shop.
–  You should use whatever works for you.
[1] https://github.com/pinterest/teletraan 
[2] Not open-sourced… yet. Later this year, perhaps.
The Pinterest Way
Operational Buddhism
25
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
AWS Auto-Scaling Groups
•  Based on server count, not resource usage
•  Useful even if there’s only one backend machine
•  Front with an ELB for best results
•  Example: Nagios Server
The Pinterest Way
Operational Buddhism
26
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Auto-Scaling With Teletraan
•  Scale based on resource utilization
•  More fine-grained control than Amazon ASGs
•  Example: Application server fleet

Fixing The Matrix With Morpheus
•  Multi-stage remediation workflow engine
•  Example: MySQL slave replacement
•  Dumb, slow, and cautious
The Pinterest Way
Operational Buddhism
27
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Trying to prevent server failure leads only to suffering.
•  Don’t do it.
•  Don’t even try.
•  Shoot them in the head and move on.
•  It’s not always necessary (or even possible) to know why 

a server went AWOL.
The Pinterest Way
Operational Buddhism
28
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Design systems that are failure-resilient;

avoid operational anti-patterns.
•  Retry logic with back-off can be useful.
•  Hardcoded hostnames are the devil.
•  KingPin[1, 2]:
•  ZooKeeper-based service discovery and
runtime configuration management tools
•  Convergence across ~20K hosts in under 10sec

[1] https://engineering.pinterest.com/blog/open-sourcing-kingpin-building-blocks-scaling-pinterest 
[2] https://github.com/pinterest/kingpin
The Pinterest Way
Operational Buddhism
29
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Service Discovery with KingPin
•  ZooKeeper by itself can be a liability at scale.
•  ZUM-times you win, ZUM-times you lose.
The Pinterest Way
Operational Buddhism
30
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
Break the cycle of suffering; create a better
experience for all involved.
•  Solid infrastructure is just the beginning.
•  Automate yourself out of shit work;

free up cycles for more interesting challenges.
•  Developer buy-in is critical all the way up the stack.



Servers may come and go, but uptime is forever.
A PaaS Not Taken
Operational Buddhism
31
Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 
•  Infrastructure-as-a-Service (IaaS) vs. Platform-as-a-Service (PaaS) ?
•  No canonical right or wrong answer.
–  Those who would tell you otherwise are trying to sell you something.
•  At Pinterest, we lean heavily towards IaaS.
–  Just give us servers; we’ll handle the rest.
–  We want the flexibility & control; we have the resources to manage it.
•  PaaS shifts more of the burden to the provider.
–  Additional abstraction comes with additional costs & restrictions.
–  Being tied too heavily to one vendor is always dangerous.
•  Example: Amazon RDS vs. MySQL on EC2
32
Questions? Answers! Credits!
email: esouhrada@pinterest.com | twitter: @denshikarasu | pinterest engineering blog: https://engineering.pinterest.com
Cat memes are from all over the
Internet, primarily:
-  icanhazcheezburger.com
-  lolcatz.com
-  http.cat
We are hiring!
https://careers.pinterest.com 

But most importantly…
As much as I’d like to say I thought of it
first, the term “Operational Buddhism”
was coined by one of my Pinterest
colleagues, Danilo Stefanovic.

More Related Content

What's hot

50 Shades of Fail KScope16
50 Shades of Fail KScope1650 Shades of Fail KScope16
50 Shades of Fail KScope16Christian Berg
 
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...Spark Summit
 
Lessons from Sharding Solr
Lessons from Sharding SolrLessons from Sharding Solr
Lessons from Sharding SolrGregg Donovan
 
ARIN 36 IETF IPv6 Activities Report
ARIN 36 IETF IPv6 Activities ReportARIN 36 IETF IPv6 Activities Report
ARIN 36 IETF IPv6 Activities ReportARIN
 
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQLNoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQLAndrew Morgan
 
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011Michael McIntosh
 
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)John Adams
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterJohn Adams
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersJulien Anguenot
 
Deploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopDeploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopAllen Wittenauer
 
Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingAll Things Open
 
Diagnosing MySQL performance problems
Diagnosing  MySQL performance problemsDiagnosing  MySQL performance problems
Diagnosing MySQL performance problemsJustin Swanhart
 
Play concurrency
Play concurrencyPlay concurrency
Play concurrencyJustin Long
 
Cassandra summit 2013 how not to use cassandra
Cassandra summit 2013  how not to use cassandraCassandra summit 2013  how not to use cassandra
Cassandra summit 2013 how not to use cassandraAxel Liljencrantz
 
The Highs and Lows of Stateful Containers
The Highs and Lows of Stateful ContainersThe Highs and Lows of Stateful Containers
The Highs and Lows of Stateful ContainersC4Media
 
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hairRENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hairJohn Constable
 

What's hot (20)

50 Shades of Fail KScope16
50 Shades of Fail KScope1650 Shades of Fail KScope16
50 Shades of Fail KScope16
 
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
 
Lessons from Sharding Solr
Lessons from Sharding SolrLessons from Sharding Solr
Lessons from Sharding Solr
 
ARIN 36 IETF IPv6 Activities Report
ARIN 36 IETF IPv6 Activities ReportARIN 36 IETF IPv6 Activities Report
ARIN 36 IETF IPv6 Activities Report
 
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQLNoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
 
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
Migration from FAST ESP to Lucene Solr - Apache Lucene Eurocon Barcelona 2011
 
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developers
 
Deploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopDeploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache Hadoop
 
DrupalCon 2011 Highlight
DrupalCon 2011 HighlightDrupalCon 2011 Highlight
DrupalCon 2011 Highlight
 
Coscup
CoscupCoscup
Coscup
 
Database TCO
Database TCODatabase TCO
Database TCO
 
YARN
YARNYARN
YARN
 
Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster Computing
 
Diagnosing MySQL performance problems
Diagnosing  MySQL performance problemsDiagnosing  MySQL performance problems
Diagnosing MySQL performance problems
 
Play concurrency
Play concurrencyPlay concurrency
Play concurrency
 
Cassandra summit 2013 how not to use cassandra
Cassandra summit 2013  how not to use cassandraCassandra summit 2013  how not to use cassandra
Cassandra summit 2013 how not to use cassandra
 
The Highs and Lows of Stateful Containers
The Highs and Lows of Stateful ContainersThe Highs and Lows of Stateful Containers
The Highs and Lows of Stateful Containers
 
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hairRENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
 

Viewers also liked

Deliver Docker Containers Continuously On AWS - DevOpsCon Munich 2016
Deliver Docker Containers Continuously On AWS - DevOpsCon Munich 2016Deliver Docker Containers Continuously On AWS - DevOpsCon Munich 2016
Deliver Docker Containers Continuously On AWS - DevOpsCon Munich 2016Philipp Garbe
 
DPC 2016 - 53 Minutes or Less - Architecting For Failure
DPC 2016 - 53 Minutes or Less - Architecting For FailureDPC 2016 - 53 Minutes or Less - Architecting For Failure
DPC 2016 - 53 Minutes or Less - Architecting For Failurebenwaine
 
Level Up Your Automated Tests
Level Up Your Automated TestsLevel Up Your Automated Tests
Level Up Your Automated TestsTrisha Gee
 
Distributed Systems Theory for Mere Mortals
Distributed Systems Theory for Mere MortalsDistributed Systems Theory for Mere Mortals
Distributed Systems Theory for Mere MortalsEnsar Basri Kahveci
 
Introduction to Docker by Adrian Mouat
Introduction to Docker by Adrian MouatIntroduction to Docker by Adrian Mouat
Introduction to Docker by Adrian MouatContainer Solutions
 
Docker 1.12 networking deep dive
Docker 1.12 networking deep diveDocker 1.12 networking deep dive
Docker 1.12 networking deep diveMadhu Venugopal
 
Post 1200 final final
Post 1200 final finalPost 1200 final final
Post 1200 final finalAHTR
 
Without Resilience, Nothing Else Matters
Without Resilience, Nothing Else MattersWithout Resilience, Nothing Else Matters
Without Resilience, Nothing Else MattersJonas Bonér
 
[devops REX 2016] Retour d’expérience de la transformation DevOps de Microsoft
[devops REX 2016] Retour d’expérience de la transformation DevOps de Microsoft[devops REX 2016] Retour d’expérience de la transformation DevOps de Microsoft
[devops REX 2016] Retour d’expérience de la transformation DevOps de Microsoftdevops REX
 
From Microliths To Microsystems
From Microliths To MicrosystemsFrom Microliths To Microsystems
From Microliths To MicrosystemsJonas Bonér
 

Viewers also liked (10)

Deliver Docker Containers Continuously On AWS - DevOpsCon Munich 2016
Deliver Docker Containers Continuously On AWS - DevOpsCon Munich 2016Deliver Docker Containers Continuously On AWS - DevOpsCon Munich 2016
Deliver Docker Containers Continuously On AWS - DevOpsCon Munich 2016
 
DPC 2016 - 53 Minutes or Less - Architecting For Failure
DPC 2016 - 53 Minutes or Less - Architecting For FailureDPC 2016 - 53 Minutes or Less - Architecting For Failure
DPC 2016 - 53 Minutes or Less - Architecting For Failure
 
Level Up Your Automated Tests
Level Up Your Automated TestsLevel Up Your Automated Tests
Level Up Your Automated Tests
 
Distributed Systems Theory for Mere Mortals
Distributed Systems Theory for Mere MortalsDistributed Systems Theory for Mere Mortals
Distributed Systems Theory for Mere Mortals
 
Introduction to Docker by Adrian Mouat
Introduction to Docker by Adrian MouatIntroduction to Docker by Adrian Mouat
Introduction to Docker by Adrian Mouat
 
Docker 1.12 networking deep dive
Docker 1.12 networking deep diveDocker 1.12 networking deep dive
Docker 1.12 networking deep dive
 
Post 1200 final final
Post 1200 final finalPost 1200 final final
Post 1200 final final
 
Without Resilience, Nothing Else Matters
Without Resilience, Nothing Else MattersWithout Resilience, Nothing Else Matters
Without Resilience, Nothing Else Matters
 
[devops REX 2016] Retour d’expérience de la transformation DevOps de Microsoft
[devops REX 2016] Retour d’expérience de la transformation DevOps de Microsoft[devops REX 2016] Retour d’expérience de la transformation DevOps de Microsoft
[devops REX 2016] Retour d’expérience de la transformation DevOps de Microsoft
 
From Microliths To Microsystems
From Microliths To MicrosystemsFrom Microliths To Microsystems
From Microliths To Microsystems
 

Similar to Operational Buddhism: Building Reliable Services From Unreliable Components - Percona Live 2016

The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationInside Analysis
 
Webinar share point performance feb2016 slideshare
Webinar share point performance feb2016 slideshareWebinar share point performance feb2016 slideshare
Webinar share point performance feb2016 slideshareDynatrace
 
SharePoint Summit 2013 - Vancouver - MS Access 2013 - The new (old) thing
SharePoint Summit 2013 - Vancouver - MS Access 2013 - The new (old) thingSharePoint Summit 2013 - Vancouver - MS Access 2013 - The new (old) thing
SharePoint Summit 2013 - Vancouver - MS Access 2013 - The new (old) thingRuven Gotz
 
SharePoint Performance: Physical to Virtual to Microsoft Azure Cloud and Offi...
SharePoint Performance: Physical to Virtual to Microsoft Azure Cloud and Offi...SharePoint Performance: Physical to Virtual to Microsoft Azure Cloud and Offi...
SharePoint Performance: Physical to Virtual to Microsoft Azure Cloud and Offi...Joel Oleson
 
You've Got No UI?! (Agile Data Teams)
You've Got No UI?! (Agile Data Teams)You've Got No UI?! (Agile Data Teams)
You've Got No UI?! (Agile Data Teams)Mark Barber
 
SPS Kansas City - MS-Access and SharePoint - The new old thing - November 2013
SPS Kansas City - MS-Access and SharePoint - The new old thing - November 2013SPS Kansas City - MS-Access and SharePoint - The new old thing - November 2013
SPS Kansas City - MS-Access and SharePoint - The new old thing - November 2013Ruven Gotz
 
Making sense of microservices, service mesh, and serverless
Making sense of microservices, service mesh, and serverlessMaking sense of microservices, service mesh, and serverless
Making sense of microservices, service mesh, and serverlessChristian Posta
 
7 Fatal Mistakes Made When Migrating From SP 2007 to SP 2010
7 Fatal Mistakes Made When Migrating  From SP 2007 to SP 20107 Fatal Mistakes Made When Migrating  From SP 2007 to SP 2010
7 Fatal Mistakes Made When Migrating From SP 2007 to SP 2010Netwoven Inc.
 
The Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to BeThe Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to Beconfluent
 
ASTQB washington-sept-2015
ASTQB washington-sept-2015ASTQB washington-sept-2015
ASTQB washington-sept-2015Dan Boutin
 
Outsourcing Your SharePoint Hosting - the clouds fine print magnified
Outsourcing Your SharePoint Hosting - the clouds fine print magnifiedOutsourcing Your SharePoint Hosting - the clouds fine print magnified
Outsourcing Your SharePoint Hosting - the clouds fine print magnifiedSherWeb
 
Sps chicago suburbs outsourcing your share point hosting - the clouds fine ...
Sps chicago suburbs   outsourcing your share point hosting - the clouds fine ...Sps chicago suburbs   outsourcing your share point hosting - the clouds fine ...
Sps chicago suburbs outsourcing your share point hosting - the clouds fine ...SherWeb
 
Sps chicago suburbs outsourcing your share point hosting - the clouds fine ...
Sps chicago suburbs   outsourcing your share point hosting - the clouds fine ...Sps chicago suburbs   outsourcing your share point hosting - the clouds fine ...
Sps chicago suburbs outsourcing your share point hosting - the clouds fine ...SherWeb
 
Building a modern data platform with scala, akka, apache beam
Building a modern data platform with scala, akka, apache beamBuilding a modern data platform with scala, akka, apache beam
Building a modern data platform with scala, akka, apache beamRaymond Tay
 
Data Stack Considerations: Build vs. Buy at Tout
Data Stack Considerations: Build vs. Buy at ToutData Stack Considerations: Build vs. Buy at Tout
Data Stack Considerations: Build vs. Buy at ToutLooker
 
Why Bad Data May Be Your Best Opportunity
Why Bad Data May Be Your Best OpportunityWhy Bad Data May Be Your Best Opportunity
Why Bad Data May Be Your Best OpportunityZach Gardner
 
Considering bare metal as a viable cloud option
Considering bare metal as a viable cloud optionConsidering bare metal as a viable cloud option
Considering bare metal as a viable cloud optionInternap
 
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...eavanesian
 

Similar to Operational Buddhism: Building Reliable Services From Unreliable Components - Percona Live 2016 (20)

The New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data ExplorationThe New Frontier: Optimizing Big Data Exploration
The New Frontier: Optimizing Big Data Exploration
 
Webinar share point performance feb2016 slideshare
Webinar share point performance feb2016 slideshareWebinar share point performance feb2016 slideshare
Webinar share point performance feb2016 slideshare
 
SharePoint Summit 2013 - Vancouver - MS Access 2013 - The new (old) thing
SharePoint Summit 2013 - Vancouver - MS Access 2013 - The new (old) thingSharePoint Summit 2013 - Vancouver - MS Access 2013 - The new (old) thing
SharePoint Summit 2013 - Vancouver - MS Access 2013 - The new (old) thing
 
SharePoint Performance: Physical to Virtual to Microsoft Azure Cloud and Offi...
SharePoint Performance: Physical to Virtual to Microsoft Azure Cloud and Offi...SharePoint Performance: Physical to Virtual to Microsoft Azure Cloud and Offi...
SharePoint Performance: Physical to Virtual to Microsoft Azure Cloud and Offi...
 
You've Got No UI?! (Agile Data Teams)
You've Got No UI?! (Agile Data Teams)You've Got No UI?! (Agile Data Teams)
You've Got No UI?! (Agile Data Teams)
 
SPS Kansas City - MS-Access and SharePoint - The new old thing - November 2013
SPS Kansas City - MS-Access and SharePoint - The new old thing - November 2013SPS Kansas City - MS-Access and SharePoint - The new old thing - November 2013
SPS Kansas City - MS-Access and SharePoint - The new old thing - November 2013
 
Making sense of microservices, service mesh, and serverless
Making sense of microservices, service mesh, and serverlessMaking sense of microservices, service mesh, and serverless
Making sense of microservices, service mesh, and serverless
 
7 Fatal Mistakes Made When Migrating From SP 2007 to SP 2010
7 Fatal Mistakes Made When Migrating  From SP 2007 to SP 20107 Fatal Mistakes Made When Migrating  From SP 2007 to SP 2010
7 Fatal Mistakes Made When Migrating From SP 2007 to SP 2010
 
The Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to BeThe Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to Be
 
ASTQB washington-sept-2015
ASTQB washington-sept-2015ASTQB washington-sept-2015
ASTQB washington-sept-2015
 
Outsourcing Your SharePoint Hosting - the clouds fine print magnified
Outsourcing Your SharePoint Hosting - the clouds fine print magnifiedOutsourcing Your SharePoint Hosting - the clouds fine print magnified
Outsourcing Your SharePoint Hosting - the clouds fine print magnified
 
Sps chicago suburbs outsourcing your share point hosting - the clouds fine ...
Sps chicago suburbs   outsourcing your share point hosting - the clouds fine ...Sps chicago suburbs   outsourcing your share point hosting - the clouds fine ...
Sps chicago suburbs outsourcing your share point hosting - the clouds fine ...
 
Sps chicago suburbs outsourcing your share point hosting - the clouds fine ...
Sps chicago suburbs   outsourcing your share point hosting - the clouds fine ...Sps chicago suburbs   outsourcing your share point hosting - the clouds fine ...
Sps chicago suburbs outsourcing your share point hosting - the clouds fine ...
 
API ARU-ARU
API ARU-ARUAPI ARU-ARU
API ARU-ARU
 
Anzo intelligence workbench
Anzo intelligence workbenchAnzo intelligence workbench
Anzo intelligence workbench
 
Building a modern data platform with scala, akka, apache beam
Building a modern data platform with scala, akka, apache beamBuilding a modern data platform with scala, akka, apache beam
Building a modern data platform with scala, akka, apache beam
 
Data Stack Considerations: Build vs. Buy at Tout
Data Stack Considerations: Build vs. Buy at ToutData Stack Considerations: Build vs. Buy at Tout
Data Stack Considerations: Build vs. Buy at Tout
 
Why Bad Data May Be Your Best Opportunity
Why Bad Data May Be Your Best OpportunityWhy Bad Data May Be Your Best Opportunity
Why Bad Data May Be Your Best Opportunity
 
Considering bare metal as a viable cloud option
Considering bare metal as a viable cloud optionConsidering bare metal as a viable cloud option
Considering bare metal as a viable cloud option
 
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
SharePoint Saturday Richmond - So you want to implement SharePoint 2010, what...
 

Recently uploaded

Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Excelmac1
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMartaLoveguard
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 

Recently uploaded (20)

Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 

Operational Buddhism: Building Reliable Services From Unreliable Components - Percona Live 2016

  • 1. Building Reliable Services From Unreliable Components Operational Buddhism Ernie Souhrada Database Engineer / Bit Wrangler, Pinterest Percona Live Data Performance Conference – 20 April 2016 1
  • 2. •  Introductions •  Who Am I, Why Am I Here? •  Pinterest Infrastructure •  Operational Materialism •  The Rise of Utility Computing •  Operational Buddhism •  Four Noble Truths •  The Pinterest Way •  A PaaS Not Taken •  Q&A, Credits Agenda 2 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 My god, it’s full of cats!
  • 3. Who am I? •  Database Engineer at Pinterest (January 2015) –  One of two people solely responsible for hundreds of TB of MySQL data –  Also loosely affiliated with HBase and Core SRE teams •  Previously: Percona, Sun, assorted random small companies •  Jack of many trades, master of some Why am I here? •  Interested in almost EVERYTHING (not just tech) •  Success in a DBA/SRE role requires cross-pollination. Who Am I, Why Am I Here? 3 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Turning technical skill into cat food since 1996
  • 4. •  Pinterest is 100% hosted in the AWS cloud. •  Petabytes of data spread across MySQL, HBase, S3, Redis, etc. •  Tens of thousands of servers running at any given time •  Hundreds of unique services interacting with each other •  We make heavy use of some AWS offerings: –  EC2 –  S3 –  Route53 –  Redshift •  Others, not so much (or at all): –  RDS –  CloudFront –  ElastiCache –  ElasticBeanstalk Pinterest Infrastructure 4 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 The view from ceiling cat’s perspective
  • 5. A Trip Back In Time: Operational Materialism 5 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Or, how did we ever manage without “the cloud” ? Consider the world before the rise of AWS, Google Cloud, Microsoft Azure. Need computing power or an Internet presence? Not many options: •  Use a managed-services provider. •  Build it yourself. In the end, these are basically the same.
  • 6. Operational Materialism 6 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 It’s still servers all the way down. Someone still has to deal with the vendors, buy the hardware, rack it, stack it, configure it, and make sure it all stays up and functional within agreed-upon SLAs.
  • 7. The Four Sorrows Operational Materialism 7 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 1.  Individual servers matter. 2.  Failure is expensive, so it must be prevented. 3.  Capacity planning can make or break you. 4.  Sometimes your destiny is still outside your control.
  • 8. #1: Individual servers matter. Operational Materialism 8 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 A server dies… now what? •  Hot/warm stand-by •  Spare parts / DIY •  Roll the trucks! •  Did you remember to buy the extended warranty / gold service plan?
  • 9. #2: Failure is expensive. Operational Materialism 9 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Server XYZ is down ! the site is down. •  What is this costing you in terms of lost business? 
 (The Lamborghini Factor) •  Cost of employee time to recover? •  What about the cost to your reputation? Not a situation we want to be in.
  • 10. #2: Therefore, it must be avoided or prevented. Operational Materialism 10 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Upgrade ALL THE THINGS! •  Server grade hardware •  Spare parts, spare servers •  Cluster and scale up and out •  Multiple network paths •  Redundant generators •  Backup data centers •  And so on…. •  Don’t forget the extra humans!
  • 11. Want another nine? Add another zero. Operational Materialism 11 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Mo’ Problems ! Mo’ Money ! Different Problems -  Failure prevention infrastructure: complexity increases -  Server grade hardware: cost increases -  More humans: every type of pain imaginable increases Efficiency cat is not pleased with this situation.
  • 12. #3: Capacity planning can make or break you. Operational Materialism 12 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Got capacity? •  Not enough à performance sucks •  Too much à wasted resources The problem of TIME .…
  • 13. #4: Sometimes your destiny is still outside your control. Operational Materialism 13 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Even the best capacity planning models can break
 down due to unforeseen circumstances. -  Natural disasters -  Supply chain disruptions -  Legal conflicts -  The “Slashdot Effect”
  • 14. •  Not all doom and gloom – stuff was built, services were operated. •  Organizations managed their infrastructure this way for years. •  Many still do. •  But sometimes the landscape changes in a fundamental way. Operational Materialism 14 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 All good things must come to an end?
  • 15. The Rise of Utility Computing 15 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Forecast calls for clouds.
  • 16. Free your mind, and your servers will follow. The Rise of Utility Computing 16 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Some promises[1] of utility computing: •  Unlimited, on-demand capacity •  No massive up-front capital expenditures •  Democratization of computing technology •  Focus on building products, not running servers •  Experimentation is easy; failure inexpensive [1] Some of these promises are more easily kept than others.
  • 17. TANSTAAFL The Rise of Utility Computing 17 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 But there are tradeoffs…. -  Reduced architectural flexibility -  Black box infrastructure -  Unpredictable performance -  Strong potential for vendor lock-in -  New challenges to cost containment Operational Materialism doesn’t work here. We need a new mindset.
  • 18. No servers. No attachment. No suffering. No problem. Operational Buddhism 18 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016
  • 19. Four Noble Truths Operational Buddhism 19 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 1.  Cloud servers can, and do, fail at any time, for any reason. 2.  Trying to prevent this server failure is an endless source of suffering for SREs and DBAs alike. 3.  Accepting the impermanence of our servers, we should design systems that are failure-resilient, not failure-resistant. 4.  We can break the cycle of suffering and create a better experience for end users, internal customers, and colleagues.
  • 20. #1: Failure happens. Operational Buddhism 20 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Cloud-based servers can fail at any time, for any reason. •  Underlying physical hardware problems •  Oversubscription / “noisy neighbors” •  Hypervisor bugs •  Cascading failures from elsewhere in the cloud
  • 21. #2: Attachment to servers leads to suffering. Operational Buddhism 21 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Trying to prevent individual server failure is an unending source of suffering for SREs and DBAs alike. No control over physical infrastructure. No visibility into physical infrastructure. No guarantees are possible. VMs fail at a much higher rate than server-grade bare metal hardware.
  • 22. Operational Buddhism 22 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Accept the impermanence of individual servers, and in doing so, design systems that are failure-resilient, not failure-resistant. #3: Be failure-resilient, not failure-resistant. THIS IS NOT A SERVER – MEOW! THESE ARE SERVERS – MOO!
  • 23. Operational Buddhism 23 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 We can escape the cycle of suffering and create a better experience for our internal customers, end users, and colleagues. #4: The cycle can be broken. The best infrastructure is invisible to those that rely on it or build on top of it. STUFF JUST WORKS.
  • 24. The Pinterest Way Operational Buddhism 24 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Servers can die at any time for any reason. •  Automated replacement –  AWS Auto-Scaling Groups –  Teletraan[1]: Deployment and auto-scaling platform –  Morpheus[2]: Automated remediation framework –  Destroy all humans! •  Configuration management tools –  We are a Puppet shop. –  You should use whatever works for you. [1] https://github.com/pinterest/teletraan [2] Not open-sourced… yet. Later this year, perhaps.
  • 25. The Pinterest Way Operational Buddhism 25 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 AWS Auto-Scaling Groups •  Based on server count, not resource usage •  Useful even if there’s only one backend machine •  Front with an ELB for best results •  Example: Nagios Server
  • 26. The Pinterest Way Operational Buddhism 26 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Auto-Scaling With Teletraan •  Scale based on resource utilization •  More fine-grained control than Amazon ASGs •  Example: Application server fleet Fixing The Matrix With Morpheus •  Multi-stage remediation workflow engine •  Example: MySQL slave replacement •  Dumb, slow, and cautious
  • 27. The Pinterest Way Operational Buddhism 27 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Trying to prevent server failure leads only to suffering. •  Don’t do it. •  Don’t even try. •  Shoot them in the head and move on. •  It’s not always necessary (or even possible) to know why 
 a server went AWOL.
  • 28. The Pinterest Way Operational Buddhism 28 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Design systems that are failure-resilient;
 avoid operational anti-patterns. •  Retry logic with back-off can be useful. •  Hardcoded hostnames are the devil. •  KingPin[1, 2]: •  ZooKeeper-based service discovery and runtime configuration management tools •  Convergence across ~20K hosts in under 10sec [1] https://engineering.pinterest.com/blog/open-sourcing-kingpin-building-blocks-scaling-pinterest [2] https://github.com/pinterest/kingpin
  • 29. The Pinterest Way Operational Buddhism 29 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Service Discovery with KingPin •  ZooKeeper by itself can be a liability at scale. •  ZUM-times you win, ZUM-times you lose.
  • 30. The Pinterest Way Operational Buddhism 30 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 Break the cycle of suffering; create a better experience for all involved. •  Solid infrastructure is just the beginning. •  Automate yourself out of shit work;
 free up cycles for more interesting challenges. •  Developer buy-in is critical all the way up the stack. Servers may come and go, but uptime is forever.
  • 31. A PaaS Not Taken Operational Buddhism 31 Operational Buddhism: Building Reliable Services From Unreliable Components – Ernie Souhrada, Database Engineer @ Pinterest – Percona Live 2016 •  Infrastructure-as-a-Service (IaaS) vs. Platform-as-a-Service (PaaS) ? •  No canonical right or wrong answer. –  Those who would tell you otherwise are trying to sell you something. •  At Pinterest, we lean heavily towards IaaS. –  Just give us servers; we’ll handle the rest. –  We want the flexibility & control; we have the resources to manage it. •  PaaS shifts more of the burden to the provider. –  Additional abstraction comes with additional costs & restrictions. –  Being tied too heavily to one vendor is always dangerous. •  Example: Amazon RDS vs. MySQL on EC2
  • 32. 32 Questions? Answers! Credits! email: esouhrada@pinterest.com | twitter: @denshikarasu | pinterest engineering blog: https://engineering.pinterest.com Cat memes are from all over the Internet, primarily: -  icanhazcheezburger.com -  lolcatz.com -  http.cat We are hiring! https://careers.pinterest.com But most importantly… As much as I’d like to say I thought of it first, the term “Operational Buddhism” was coined by one of my Pinterest colleagues, Danilo Stefanovic.