SlideShare a Scribd company logo
1 of 12
Download to read offline
Why did we think large scale
distributed systems would be
easy?
Gordon Rowell
PuppetConf San Francisco 2013
gordonr@google.com
Background
Site Reliability Engineering runs many services
The same rules always apply:
●  Make the service scale
●  Make the deployment consistent
●  Understand all layers of the system
●  Monitor everything
●  Plan for failure
●  Break things, under controlled conditions
Scaling is fun
We don't deploy "a server"
•  Servers break, power fails
•  Clients/DNS need to be reconfigured
We don't deploy "a cluster"
•  Networks break, servers break, power fails
•  Clients/DNS need to be reconfigured
We deploy redundant clusters
•  Attempt to send clients to nearest serving cluster
•  Anycast allows for unified client configuration
But client DoS is not
Poorly written code...
●  on small numbers of clients...
●  is annoying
Poorly written code...
●  on a huge number of clients...
●  can cause serious infrastructure pain
Write good code and stage your releases
●  Work with the service owners
●  Stage rollouts, allow soak time
●  Have a rollback plan for clients and test it
●  Have DoS limits for services, test them
Load balancing is fun
Do you have enough capacity?
•  How many backends do you need?
•  What happens if half of your backends lose power?
•  What about when half are already out for repairs?
How do you send clients to the right cluster?
•  Client configuration
•  DNS round-robin (simple global load balancing)
•  DNS views (give best answer for client IP)
•  Anycast (portable IP, routed to "nearest" cluster)
•  Consider: DNS views plus Anycast
But global outages are not
Monitor everything
●  Health check failures bring down your service
●  ...by design
Test everything
●  You should expect (and test) data center outages
●  A global outage can ruin your day
●  Cascading failures are unpleasant
Learn from outages
●  Write postmortems
●  Focus on the facts!
●  What went wrong and what can be better?
●  A postmortem is not about blame
Thundering herds are not
For Puppet
•  "Lots" of Mac desktops and laptops
•  "Lots" of Ubuntu desktops, laptops and servers
•  "Some" others
What if they all want to do a puppet run?
•  What about every hour?
•  What about every five minutes?
Randomize your cron jobs! (and test it)
How can you shed load on the server?
Anycast is fun
Anycast is "coarse-grain" load balancing
•  Routes traffic to the “nearest”, “serving” cluster
Networks break
•  Physical issues
•  Routing issues
•  Configuration issues
•  Load balancer bugs
Anycast monitoring is hard
Anycast directed to one site is not fun
Anycast directed to one site is not fun
All clients could be sent to the same cluster
•  Be ready for that
•  Can a single cluster handle worldwide traffic?
•  What do you do if it can't?
Have a mitigation strategy to shed load
●  Include load calculations early in health checks
●  Consider DNS views to redirect some traffic
●  Drop traffic if you have to
Diversity is good...for people
Be ruthless against platform diversity
If you can’t automate it, don’t do it
●  “Could we bring up another 50 today, please?”
●  “That backend was just a little different and...oops”
Anycast helps you be consistent
●  Traffic could go anywhere
Every OS upgrade is a time to refactor and clean
Questions?
Gordon Rowell
gordonr@google.com

More Related Content

More from Puppet

Enforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationEnforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationPuppet
 
Keynote: Puppet camp compliance
Keynote: Puppet camp complianceKeynote: Puppet camp compliance
Keynote: Puppet camp compliancePuppet
 
Automating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNowAutomating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNowPuppet
 
Puppet: The best way to harden Windows
Puppet: The best way to harden WindowsPuppet: The best way to harden Windows
Puppet: The best way to harden WindowsPuppet
 
Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020Puppet
 
Accelerating azure adoption with puppet
Accelerating azure adoption with puppetAccelerating azure adoption with puppet
Accelerating azure adoption with puppetPuppet
 
Puppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael PinsonPuppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael PinsonPuppet
 
ServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkPuppet
 
Take control of your dev ops dumping ground
Take control of your  dev ops dumping groundTake control of your  dev ops dumping ground
Take control of your dev ops dumping groundPuppet
 
100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy Software100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy SoftwarePuppet
 
Puppet User Group
Puppet User GroupPuppet User Group
Puppet User GroupPuppet
 
Continuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsContinuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsPuppet
 
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick MaludyThe Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick MaludyPuppet
 
ServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkPuppet
 
Puppet in k8s, Miroslav Hadzhiev
Puppet in k8s, Miroslav HadzhievPuppet in k8s, Miroslav Hadzhiev
Puppet in k8s, Miroslav HadzhievPuppet
 
Bolt on Windows - James Pogran
Bolt on Windows - James PogranBolt on Windows - James Pogran
Bolt on Windows - James PogranPuppet
 
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...The Business Value of Modernizing your Windows Infrastructure and Bringing Li...
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...Puppet
 
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020Puppet
 
Navigating the new normal with self healing infrastructure automation
Navigating the new normal with self healing infrastructure automationNavigating the new normal with self healing infrastructure automation
Navigating the new normal with self healing infrastructure automationPuppet
 
Take control of your DevOps Dumping Ground; Melissa Sussmann
Take control of your DevOps Dumping Ground; Melissa SussmannTake control of your DevOps Dumping Ground; Melissa Sussmann
Take control of your DevOps Dumping Ground; Melissa SussmannPuppet
 

More from Puppet (20)

Enforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationEnforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automation
 
Keynote: Puppet camp compliance
Keynote: Puppet camp complianceKeynote: Puppet camp compliance
Keynote: Puppet camp compliance
 
Automating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNowAutomating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNow
 
Puppet: The best way to harden Windows
Puppet: The best way to harden WindowsPuppet: The best way to harden Windows
Puppet: The best way to harden Windows
 
Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020
 
Accelerating azure adoption with puppet
Accelerating azure adoption with puppetAccelerating azure adoption with puppet
Accelerating azure adoption with puppet
 
Puppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael PinsonPuppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael Pinson
 
ServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin Reeuwijk
 
Take control of your dev ops dumping ground
Take control of your  dev ops dumping groundTake control of your  dev ops dumping ground
Take control of your dev ops dumping ground
 
100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy Software100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy Software
 
Puppet User Group
Puppet User GroupPuppet User Group
Puppet User Group
 
Continuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsContinuous Compliance and DevSecOps
Continuous Compliance and DevSecOps
 
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick MaludyThe Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
 
ServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin Reeuwijk
 
Puppet in k8s, Miroslav Hadzhiev
Puppet in k8s, Miroslav HadzhievPuppet in k8s, Miroslav Hadzhiev
Puppet in k8s, Miroslav Hadzhiev
 
Bolt on Windows - James Pogran
Bolt on Windows - James PogranBolt on Windows - James Pogran
Bolt on Windows - James Pogran
 
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...The Business Value of Modernizing your Windows Infrastructure and Bringing Li...
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...
 
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
 
Navigating the new normal with self healing infrastructure automation
Navigating the new normal with self healing infrastructure automationNavigating the new normal with self healing infrastructure automation
Navigating the new normal with self healing infrastructure automation
 
Take control of your DevOps Dumping Ground; Melissa Sussmann
Take control of your DevOps Dumping Ground; Melissa SussmannTake control of your DevOps Dumping Ground; Melissa Sussmann
Take control of your DevOps Dumping Ground; Melissa Sussmann
 

Recently uploaded

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Why Did We Think Large Scale Distributed Systems Would be Easy? - PuppetConf 2013

  • 1. Why did we think large scale distributed systems would be easy? Gordon Rowell PuppetConf San Francisco 2013 gordonr@google.com
  • 2. Background Site Reliability Engineering runs many services The same rules always apply: ●  Make the service scale ●  Make the deployment consistent ●  Understand all layers of the system ●  Monitor everything ●  Plan for failure ●  Break things, under controlled conditions
  • 3. Scaling is fun We don't deploy "a server" •  Servers break, power fails •  Clients/DNS need to be reconfigured We don't deploy "a cluster" •  Networks break, servers break, power fails •  Clients/DNS need to be reconfigured We deploy redundant clusters •  Attempt to send clients to nearest serving cluster •  Anycast allows for unified client configuration
  • 4. But client DoS is not Poorly written code... ●  on small numbers of clients... ●  is annoying Poorly written code... ●  on a huge number of clients... ●  can cause serious infrastructure pain Write good code and stage your releases ●  Work with the service owners ●  Stage rollouts, allow soak time ●  Have a rollback plan for clients and test it ●  Have DoS limits for services, test them
  • 5. Load balancing is fun Do you have enough capacity? •  How many backends do you need? •  What happens if half of your backends lose power? •  What about when half are already out for repairs? How do you send clients to the right cluster? •  Client configuration •  DNS round-robin (simple global load balancing) •  DNS views (give best answer for client IP) •  Anycast (portable IP, routed to "nearest" cluster) •  Consider: DNS views plus Anycast
  • 6. But global outages are not Monitor everything ●  Health check failures bring down your service ●  ...by design Test everything ●  You should expect (and test) data center outages ●  A global outage can ruin your day ●  Cascading failures are unpleasant Learn from outages ●  Write postmortems ●  Focus on the facts! ●  What went wrong and what can be better? ●  A postmortem is not about blame
  • 7. Thundering herds are not For Puppet •  "Lots" of Mac desktops and laptops •  "Lots" of Ubuntu desktops, laptops and servers •  "Some" others What if they all want to do a puppet run? •  What about every hour? •  What about every five minutes? Randomize your cron jobs! (and test it) How can you shed load on the server?
  • 8. Anycast is fun Anycast is "coarse-grain" load balancing •  Routes traffic to the “nearest”, “serving” cluster Networks break •  Physical issues •  Routing issues •  Configuration issues •  Load balancer bugs Anycast monitoring is hard
  • 9. Anycast directed to one site is not fun
  • 10. Anycast directed to one site is not fun All clients could be sent to the same cluster •  Be ready for that •  Can a single cluster handle worldwide traffic? •  What do you do if it can't? Have a mitigation strategy to shed load ●  Include load calculations early in health checks ●  Consider DNS views to redirect some traffic ●  Drop traffic if you have to
  • 11. Diversity is good...for people Be ruthless against platform diversity If you can’t automate it, don’t do it ●  “Could we bring up another 50 today, please?” ●  “That backend was just a little different and...oops” Anycast helps you be consistent ●  Traffic could go anywhere Every OS upgrade is a time to refactor and clean