May 16-17 2018
Mike Fowler, Senior Site Reliability Engineer
Leveraging Automation for a Disposable Infrastructure
Senior Site Reliability Engineer in the Public Cloud
Practice
Background in Software & Systems Engineering,
System & Database Administration
Contributed to PostgreSQL, Terraform & YAWL
PostgreSQL evangelist
About Me
So I like to think I know Data...
The story, all names, characters, and incidents
portrayed in this production are fictitious. No
identification with actual persons (living or
deceased), places, buildings, and products is
intended or should be inferred.
Franchise coffee shops
Our hero, a lowly Head of Systems Engineering is
faced with the epic quest of moving to the cloud
Our Hero’s Epic Quest
Use cloud as spare/batch capacity
Duplicate existing estate in the cloud
Brave New World
- Greenfield development
- “Version 2.0”
Approaching Cloud Migration
Direct mapping of existing infrastructure to the cloud
- Load balancers become Elastic Load Balancers
- SANs become Buckets or Elastic File Systems
Minimal operational change required
- Everything is the same just in a new location
Perceived as a “quick win” to cloud adoption
- Little AWS/GCP/Azure specific knowledge required
The Appeal of a Lift & Shift
We’re changing only where our hardware is
- Operationally no different than the past
- Instance size based on current hardware size
- No change to deployment process
Under-utilisation of resources
- Still paying for excess capacity
Stunted scalability
- We can throw more virtual hardware at it
- Add additional node behind load balancers
The Penalty of a Lift & Shift
Our hero has a new CTO
Recognises that we’re just moving our problems
“We’re under-investing in the future”
Brave New World
No “legacy” baggage
Free rein for experimentation
Perceived as a “low risk” path to cloud adoption
- If it doesn’t work, switch it off
- “No risk” to existing production environment
The Appeal of a Brave New World
Organisationally isolated
- Limited impact to existing practices
- Leads to a “Us vs. Them” mentality
Focus is usually on application functionality with infrastructure seen as a necessity
Project has a high risk of failure
- Carefree scoping leads to an unfocused project
- Significant time can be lost to integrating with the old world
The Penalty of a Brave New World
Are we just building a traditional but virtual data centre?
- Lift & Shift is operationally the same
- Brave New World isn’t part of the Real World
How are we leveraging the power of a dynamic infrastructure?
Our infrastructure is scalable, but is the application?
Are we really “doing cloud”?
This is not a new problem
How do we move on from our
comfortable past?
Breaking the Mould
Conway’s law states you’re doomed to design your
organisational structure
● Conway’s Law:
“Organisations which design
systems … are constrained to
produce designs which are copies
of the communication structures of
these organisations”
- Melvin Conway, 1967
Breaking the Mould
Scaling of software isn’t just the same elements
bigger, it’s an increase in different elements that
interact in a nonlinear fashion. Complexity of the
whole increases much more than linearly.
● No Silver Bullet:
“A scaling-up of a software entity is
not merely a repetition of the same
elements in larger size; it is
necessarily an increase in the
number of different elements. In
most cases, the elements interact
with each other in some nonlinear
fashion, and the complexity of the
whole increases much more than
linearly.”
- Fred Brooks Jr., 1986
Breaking the Mould
Applying existing patterns at best misses out on possible improvements with new
technology and at worst it adds more complexity.
● Infrastructure as Code
“In many cases, applying existing
patterns will, at best, miss out on
opportunities to leverage newer
technology to simplify and
improve the architecture. At
worst, replicating existing
patterns with the newer platforms
will involve adding even more
complexity.”
- Kief Morris, 2016
Breaking the Mould
Systems should work correctly even in the face of adversity
● Designing Data-Intensive Applications:
“The system should continue to work
correctly (performing the correct
function at the desired level of
performance) even in the face of
adversity (hardware or software faults,
and even human error).”
- Martin Kleppmann, 2017
Breaking the Mould
Our hero needs a different approach
A Different Approach
The more you care about individual
things the more they will hold your
attention
In a truly scalable environment you
should only care about the combination of
many individual things
Attitude
The attitude you have to your
environment will determine the
limits of your scalability
You treat your servers like pets
- You give them names (igloo, husky, snowshoe)
- You give them homes (racks on site or co-located)
- If they fail, you do everything you can to save them
Every server is an investment
- Often the best hardware that can be afforded
- Amortised over years
- Excess capacity to allow for growth
Provisioning new servers takes weeks
Attitude: Living in the Iron Age
You treat your servers like cattle
- They have identifiers
- You care only where they are geographically
- If they fail, you put them down and get a new one
Your architecture is your investment
- Configuration is chosen for your current load
- Pay for what you use
- Capacity can be added when required
Provisioning new servers takes seconds
Attitude: Living in the Cloud Age
Are we simply herding our pets?
- In a Lift & Shift this is almost certainly so
- Scaling groups is a start but it is not the end
How are we managing our virtual servers?
- Complex cloud-init scripts?
- Traditional configuration management?
Attitude: Is Pets v Cattle enough?
Everything is a package and can be discarded
You treat your servers like single use products
- They’re pre-packaged for a particular purpose
- If they fail, you toss it away and grab another
You automate everything
Never make a manual change
Attitude: The Disposable Infrastructure
(slide 1 of 2)
Repeatability brings reliability and predictability
Defining a build pipeline:
- Ensures the same process is followed for every change
- Provides an audit trail for every change
- Gives visibility of your value stream
Be Continuous
Continuous integration and
delivery is a must
(slide 2 of 2)
Your developers probably already practice CI
- It is the standard for code development
- The output of CI can be the start of CD
Continuous delivery doesn’t have to mean continuous deployment
- Build pipelines can have approval stages
- Every change should be deployable
Be Continuous
Continuous integration and
delivery is a must
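A build pipeline with an approval stage can be sketched in a few lines. This is a minimal illustration only: the `Stage` and `Pipeline` classes, the stage names and the `approver` callback are assumptions made for the example, not part of any particular CI/CD product.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]          # returns True on success
    needs_approval: bool = False     # manual gate before this stage runs


@dataclass
class Pipeline:
    stages: List[Stage]
    audit_log: List[str] = field(default_factory=list)

    def execute(self, approver: Callable[[str], bool]) -> bool:
        """Run stages in order; stop at the first failure or refused approval."""
        for stage in self.stages:
            if stage.needs_approval and not approver(stage.name):
                self.audit_log.append(f"{stage.name}: awaiting approval")
                return False
            ok = stage.run()
            self.audit_log.append(f"{stage.name}: {'ok' if ok else 'failed'}")
            if not ok:
                return False
        return True


def build_pipeline() -> Pipeline:
    # Hypothetical stages; real ones would call your build and deploy tooling.
    return Pipeline([
        Stage("build", lambda: True),
        Stage("test", lambda: True),
        Stage("deploy-production", lambda: True, needs_approval=True),
    ])
```

Every change follows the same stages and leaves an entry in `audit_log`, and the change is deployable as soon as it reaches the gate even if nobody approves it yet. That is the difference between continuous delivery and continuous deployment.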
Many applications expect a static infrastructure
- Hard-coded assumptions that an IP address won’t change once an application is started
Many applications are cluster unaware
- Sticky sessions on load balancers can help
- Some protocols don’t load balance well
Refactoring to the Cloud
Your applications need to be
(re)built to fit a dynamic
infrastructure
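One way to remove the "IP address won't change" assumption is to resolve the service name every time a connection is opened instead of caching an address at start-up. The sketch below uses only the standard library; the function names and the injectable `resolver` parameter are assumptions made for illustration.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str, int], List[Tuple[str, int]]]


def resolve(host: str, port: int) -> List[Tuple[str, int]]:
    """Look the name up now; never reuse an address from an earlier call."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(info[4][0], info[4][1]) for info in infos]


def connect_fresh(host: str, port: int, resolver: Resolver = resolve,
                  timeout: float = 3.0) -> socket.socket:
    """Try every currently advertised address until one accepts."""
    last_error: Exception = OSError(f"no addresses for {host}")
    for address in resolver(host, port):
        try:
            return socket.create_connection(address, timeout=timeout)
        except OSError as exc:
            last_error = exc
    raise last_error
```

Because the lookup happens on every call, an instance that has been replaced behind the same name is picked up on the next connection rather than after a restart.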
Refactor to contemporary architectural approaches
- Service Oriented Architectures & Microservices
- Transition from stateful services to stateless
Package everything using distribution packagers
- The output of your build pipeline is an RPM/DEB
- Your $CM_TOOL already supports this
Choose a deployment strategy
- Machine images vs. containers
Adopting Contemporary Approaches
Fear not vendor lock-in: there are savings to be reaped by leveraging commodity services
Use SQS instead of automating the installation and configuration of a message
broker and accepting the operational burden of maintaining it
Careful abstraction of the API will allow porting to a different platform if absolutely
necessary
Fear not Vendor Lock-In
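The "careful abstraction of the API" can be as small as an interface with two methods. The following is a sketch under assumptions: `OrderQueue`, `InMemoryQueue` and `place_order` are hypothetical names, and the in-memory class is a stand-in for a vendor-backed implementation, not an SQS client.

```python
from collections import deque
from typing import Deque, Optional, Protocol


class OrderQueue(Protocol):
    """The only queue operations the application is allowed to depend on."""

    def send(self, body: str) -> None: ...

    def receive(self) -> Optional[str]: ...


class InMemoryQueue:
    """Stand-in used for tests and local development."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, body: str) -> None:
        self._messages.append(body)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


def place_order(queue: OrderQueue, order_id: str) -> None:
    # Application code sees only the OrderQueue interface, never a vendor SDK.
    queue.send(f"order:{order_id}")
```

An SQS-backed class would implement the same two methods, so porting to another platform means writing one new adapter rather than touching every caller.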
(slide 1/2)
Design the infrastructure in parallel to the cloud aware application changes
Mandate every instance is part of a scaling group to enforce cluster awareness
Use the same principles for infrastructure development as you use for applications
Infrastructure is Code
Dynamic infrastructure must
be treated as a first class
citizen in any cloud project
(slide 2/2)
Script/encode everything unless there is no API/tooling support
Deploy the same infrastructure in development, test and production environments
- Sizing can be parameterised
Your deployment pipeline becomes the assembly of application packages and
infrastructure configuration
High cohesion and loose coupling applies to infrastructure as much as it does to
applications
Infrastructure is Code
Dynamic infrastructure must
be treated as a first class
citizen in any cloud project
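Deploying the same infrastructure everywhere with parameterised sizing might look like the sketch below. The `WebTier` fields, the environment names and the instance sizes are illustrative assumptions. In practice the definition would live in your infrastructure code rather than in Python.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class WebTier:
    instance_type: str
    min_instances: int
    max_instances: int
    behind_load_balancer: bool = True
    in_scaling_group: bool = True


# One definition of the shape; illustrative sizes per environment.
BASE = WebTier(instance_type="t2.micro", min_instances=1, max_instances=2)

ENVIRONMENTS: Dict[str, WebTier] = {
    "development": BASE,
    "test": replace(BASE, min_instances=2, max_instances=4),
    "production": replace(BASE, instance_type="m5.large",
                          min_instances=3, max_instances=12),
}


def same_shape(a: WebTier, b: WebTier) -> bool:
    """True when two environments differ only in sizing parameters."""
    return (a.behind_load_balancer, a.in_scaling_group) == (
        b.behind_load_balancer, b.in_scaling_group)
```

Only the sizing fields vary between environments, so a structural difference between development and production shows up as a failed `same_shape` check instead of a surprise at release time.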
If it can go wrong, it will go wrong so
think in terms of when and not if
Treating our infrastructure and its hosted
applications as disposable in conjunction
with CD eliminates a number of failure
scenarios
Planning to fail
Planning to fail will lead to
success
(slide 1/3)
Regularly test your disposability
- Terminate instances at random to ensure resiliency
- Block all network access to an instance
- Chaos Monkey & the Simian Army
- Trigger failovers for less disposable services
Constantly churning disposable instances helps prevent configuration drift
Planning to fail
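Terminating instances at random can be sketched as follows. This is not Chaos Monkey itself: the function name, the instance identifiers and the injected `terminate` callback are assumptions, and a real implementation would call your cloud provider's API inside that callback.

```python
import random
from typing import Callable, Optional, Sequence


def terminate_random_instance(
    instance_ids: Sequence[str],
    terminate: Callable[[str], None],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick one instance at random and hand it to `terminate`."""
    if not instance_ids:
        return None
    rng = rng or random.Random()
    victim = rng.choice(list(instance_ids))
    terminate(victim)   # hypothetical hook: replace with a real API call
    return victim
```

Injecting `terminate` keeps the selection logic testable without touching real infrastructure, and passing a seeded `random.Random` makes a test run reproducible.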
(slide 2/3)
Availability and durability cost
Identify points of failure and assess:
- How often will this failure occur?
- How do I mitigate this failure?
- How do I test this failure to ensure mitigation?
- Is the cost of mitigation worth the customer impact during failure?
Planning to fail
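The last question, whether mitigation is worth its cost, is simple arithmetic once the first three have rough answers. The sketch below assumes a deliberately naive model (frequency times duration times hourly cost); the function names and all figures are illustrative.

```python
def expected_annual_loss(failures_per_year: float, hours_per_failure: float,
                         cost_per_hour: float) -> float:
    """Expected yearly cost of a failure mode left unmitigated."""
    return failures_per_year * hours_per_failure * cost_per_hour


def mitigation_is_worth_it(annual_loss: float, annual_mitigation_cost: float,
                           loss_reduction: float = 1.0) -> bool:
    """loss_reduction is the fraction of the loss the mitigation removes."""
    return annual_loss * loss_reduction > annual_mitigation_cost
```

For example, a failure expected once every two years that costs four hours at 10,000 per hour gives an expected annual loss of 20,000. Spending 120,000 a year to avoid it would not pay for itself under this model, whereas spending 5,000 would.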
(slide 3/3)
Be honest in assessing the worth of your business
- Do you really need to double your costs to run in multiple regions?
- Trello, Slack and many other high-profile companies, including Amazon, were affected by the S3 outage
Planning to fail
Test the durability of your data
- User error is your biggest risk
- - “I forgot the WHERE clause”
- - “I thought I was in the test environment”
Regularly exercise data loss & recovery scenarios in development and test
environments
Make back-ups and regularly test they restore
- Consider storing backups in both S3 & Google
- Store backups in multiple regions
If you don’t want a full ELK stack at least ship log files to CloudWatch or Stackdriver
Data is not Disposable
Data is not disposable and is
probably more important
than your availability
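A restore test does not need to be elaborate to be useful. The sketch below uses the standard library's SQLite module as a stand-in for a real database, which is an assumption made so the example is self-contained; the `orders` table and the function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def make_source() -> sqlite3.Connection:
    """Create a small database standing in for production data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany("INSERT INTO orders (item) VALUES (?)",
                     [("latte",), ("mocha",)])
    conn.commit()
    return conn


def backup(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole database into a separate connection."""
    target = sqlite3.connect(":memory:")
    source.backup(target)   # sqlite3.Connection.backup, Python 3.7+
    return target


def fingerprint(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Rows in a stable order, used to compare source and restored copy."""
    return conn.execute("SELECT id, item FROM orders ORDER BY id").fetchall()


def restore_matches(source: sqlite3.Connection,
                    restored: sqlite3.Connection) -> bool:
    return fingerprint(source) == fingerprint(restored)
```

Running `DELETE FROM orders` on the source afterwards, the "I forgot the WHERE clause" scenario, leaves the copy intact. That is the property a regular recovery exercise should confirm.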
Multiple backup strategies, all failed
Multiple failures, same engineers, too much pressure, too tired, mistakes made
https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/
A Lesson to Learn From
Jenkins solves all our problems!
AWS solves all our problems!
Docker solves all our problems!
Kubernetes solves all our problems!
Tooling is Not The Answer
Tooling is not the answer
but it is part of an
automated solution
Let us assume we have a front end web application which places orders in a queue for
subsequent asynchronous fulfilment by a separate application backed by a database.
We’ve already refactored our applications for the cloud.
We will have a CI pipeline for the applications, the output being AMIs
A separate CD pipeline executes infrastructure code and rolls out the new AMIs
Goal is to promote infrastructure and AMIs between environments
Remember Our Hero?
Can create many different machine images
Consider creating a base image to control OS updates
Use normal configuration management tools
- Support for Ansible, Chef & Puppet
- Can just write shell script if you must
Use placeholders for configuration to be filled by launch scripts
https://packer.io
Packer
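The placeholder idea can be illustrated with the standard library's `string.Template`. This is only a sketch of what a launch script might do: the configuration keys and values are assumptions, and nothing here is Packer's own API.

```python
from string import Template

# Baked into the image at build time; values are unknown until launch.
BAKED_CONFIG = Template("queue_url=$QUEUE_URL\ndb_host=$DB_HOST\n")


def render_config(template: Template, launch_values: dict) -> str:
    """Fill every placeholder; raises KeyError if a value is missing."""
    return template.substitute(launch_values)
```

Using `substitute` rather than `safe_substitute` means an instance with an incomplete configuration fails loudly at launch instead of starting with a literal `$DB_HOST` in its settings.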
Source our code from a repo, build and test
Package our application as a DEB or RPM
Place our artifact into an S3 repository
Run Packer to generate a new AMI
Application Pipeline
Declarative language for the construction of infrastructure
Supports all major vendors
State can be stored in buckets to facilitate sharing
Separate out infrastructure layers
- Minimises blast radius of changes
- Keep persistent apart from disposable
https://terraform.io
Terraform
Triggered by new AMIs or Terraform code changes
Apply Terraform to update the infrastructure
Run integration tests to verify application build
Wait for approval before promotion to next environment
Infrastructure Pipeline
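The promotion rule at the end of this pipeline can be written down explicitly. The sketch below assumes three environments in a fixed order and a single approval flag; both are illustrative choices rather than anything the tooling prescribes.

```python
from typing import Optional, Sequence

# Assumed promotion order; adjust to your own environments.
ORDER: Sequence[str] = ("development", "test", "production")


def promote(current: str, tests_passed: bool, approved: bool) -> Optional[str]:
    """Return the environment the same AMI goes to next, or None to hold it."""
    if not (tests_passed and approved):
        return None
    position = ORDER.index(current)
    return ORDER[position + 1] if position + 1 < len(ORDER) else None
```

The same AMI and the same infrastructure code move forward unchanged, so what was tested is what gets deployed.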
Any instance can be terminated
Resilient to zone failure
Cross-region read replica allows DR for region failure
- Just need to run Terraform in the region to add the instances when required and
update Route 53
Deployed Infrastructure
● Have attitude
● Be continuous
● Refactor to the Cloud
● Infrastructure is code
● Plan to fail
● Data is King
● Tooling is not The Answer
Summary
Questions?
Mike Fowler
gh-mlfowler
mlfowler
mike dot fowler at claranet dot uk
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Leveraging Automation for a Disposable Infrastructure

  • 1. Leveraging Automation for a Disposable Infrastructure
       Mike Fowler, Senior Site Reliability Engineer
       May 16-17 2018
  • 2. About Me
       Senior Site Reliability Engineer in the Public Cloud Practice
       Background in Software & Systems Engineering, System & Database Administration
       Contributed to PostgreSQL, Terraform & YAWL
       PostgreSQL evangelist
  • 3. So I like to think I know Data...
  • 4. Our Hero’s Epic Quest
       The story, all names, characters, and incidents portrayed in this production are fictitious. No identification with actual persons (living or deceased), places, buildings, and products is intended or should be inferred.
       Franchise coffee shops
       Our hero, a lowly Head of Systems Engineering, is faced with the epic quest of moving to the cloud
  • 5. Approaching Cloud Migration
       Use cloud as spare/batch capacity
       Duplicate existing estate in the cloud
       Brave New World
       - Greenfield development
       - “Version 2.0”
  • 6. The Appeal of a Lift & Shift
       Direct mapping of existing infrastructure to the cloud
       - Load balancers become Elastic Load Balancers
       - SANs become Buckets or Elastic File Systems
       Minimal operational change required
       - Everything is the same just in a new location
       Perceived as a “quick win” to cloud adoption
       - Little AWS/GCP/Azure specific knowledge required
  • 7. The Penalty of a Lift & Shift
       We’re changing only where our hardware is
       - Operationally no different than the past
       - Instance size based on current hardware size
       - No change to deployment process
       Under-utilisation of resources
       - Still paying for excess capacity
       Stunted scalability
       - We can throw more virtual hardware at it
       - Add additional nodes behind load balancers
  • 8. Brave New World
       Our hero has a new CTO
       Recognises that we’re just moving our problems
       “We’re under-investing in the future”
  • 9. The Appeal of a Brave New World
       No “legacy” baggage
       Free rein for experimentation
       Perceived as a “low risk” path to cloud adoption
       - If it doesn’t work, switch it off
       - “No risk” to existing production environment
  • 10. The Penalty of a Brave New World
        Organisationally isolated
        - Limited impact to existing practices
        - Leads to an “Us vs. Them” mentality
        Focus is usually on application functionality, with infrastructure seen as a necessity
        Project has a high risk of failure
        - Carefree scoping leads to an unfocused project
        - Significant time can be lost to integrating with the old world
  • 11. Are we really “doing cloud”?
        Are we just building a traditional but virtual data centre?
        - Lift & Shift is operationally the same
        - Brave New World isn’t part of the Real World
        How are we leveraging the power of a dynamic infrastructure?
        Our infrastructure is scalable, but is the application?
  • 12. Breaking the Mould
        This is not a new problem
        How do we move on from our comfortable past?
  • 13. Breaking the Mould
        Conway’s law states you’re doomed to design your organisational structure
        Conway’s Law: “Organisations which design systems … are constrained to produce designs which are copies of the communication structures of these organisations” - Melvin Conway, 1967
  • 14. Breaking the Mould
        Scaling of software isn’t just the same elements bigger, it’s an increase in different elements that interact in a nonlinear fashion. Complexity of the whole increases much more than linearly.
        No Silver Bullet: “A scaling-up of a software entity is not merely a repetition of the same elements in larger size; it is necessarily an increase in the number of different elements. In most cases, the elements interact with each other in some nonlinear fashion, and the complexity of the whole increases much more than linearly.” - Fred Brooks Jr., 1986
  • 15. Breaking the Mould
        Applying existing patterns at best misses out on possible improvements with new technology and at worst it adds more complexity.
        Infrastructure as Code: “In many cases, applying existing patterns will, at best, miss out on opportunities to leverage newer technology to simplify and improve the architecture. At worst, replicating existing patterns with the newer platforms will involve adding even more complexity.” - Kief Morris, 2016
  • 16. Breaking the Mould
        Systems should work correctly even in the face of adversity
        Designing Data-Intensive Applications: “The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error).” - Martin Kleppmann, 2017
  • 17. A Different Approach
        Our hero needs a different approach
  • 18. Attitude
        The attitude you have to your environment will determine the limits of your scalability
        The more you care about individual things the more they will hold your attention
        In a truly scalable environment you should only care about the combination of many individual things
  • 19. Attitude: Living in the Iron Age
        You treat your servers like pets
        - You give them names (igloo, husky, snowshoe)
        - You give them homes (racks on site or co-located)
        - If they fail, you do everything you can to save them
        Every server is an investment
        - Often the best hardware that can be afforded
        - Amortised over years
        - Excess capacity to allow for growth
        Provisioning new servers takes weeks
  • 20. Attitude: Living in the Cloud Age
        You treat your servers like cattle
        - They have identifiers
        - You care only where they are geographically
        - If they fail, you put them down and get a new one
        Your architecture is your investment
        - Configuration is chosen for your current load
        - Pay for what you use
        - Capacity can be added when required
        Provisioning new servers takes seconds
  • 21. Attitude: Is Pets v Cattle enough?
        Are we simply herding our pets?
        - In a Lift & Shift this is almost certainly so
        - Scaling groups are a start but not the end
        How are we managing our virtual servers?
        - Complex cloud-init scripts?
        - Traditional configuration management?
  • 22. Attitude: The Disposable Infrastructure
        Everything is a package and can be discarded
        You treat your servers like single-use products
        - They’re pre-packaged for a particular purpose
        - If they fail, you toss them away and grab another
        You automate everything
        Never make a manual change
  • 23. Be Continuous (slide 1 of 2)
        Continuous integration and delivery is a must
        Repeatability brings reliability and predictability
        Defining a build pipeline:
        - Ensures the same process is followed for every change
        - Provides an audit trail for every change
        - Gives visibility of your value stream
  • 24. Be Continuous (slide 2 of 2)
        Continuous integration and delivery is a must
        Your developers probably already practise CI
        - It is the standard for code development
        - The output of CI can be the start of CD
        Continuous delivery doesn’t have to mean continuous deployment
        - Build pipelines can have approval stages
        - Every change should be deployable
  • 25. Refactoring to the Cloud
        Your applications need to be (re)built to fit a dynamic infrastructure
        Many applications expect a static infrastructure
        - Hard-coded assumptions that an IP address won’t change once an application is started
        Many applications are cluster unaware
        - Sticky sessions on load balancers can help
        - Some protocols don’t load balance well
  • 26. Adopting Contemporary Approaches
        Refactor to contemporary architectural approaches
        - Service Oriented Architectures & Microservices
        - Transition from stateful services to stateless
        Package everything using distribution packagers
        - The output of your build pipeline is an RPM/DEB
        - Your $CM_TOOL already supports this
        Choose a deployment strategy
        - Machine images vs. containers
  • 27. Fear not Vendor Lock-In
        Fear not vendor lock-in; savings are to be reaped by leveraging commodity services
        Use SQS instead of automating the installation and configuration of a message broker and accepting the operational burden of maintaining it
        Careful abstraction of the API will allow porting to a different platform if absolutely necessary
  • 28. Infrastructure is Code (slide 1/2)
        Dynamic infrastructure must be treated as a first class citizen in any cloud project
        Design the infrastructure in parallel to the cloud aware application changes
        Mandate every instance is part of a scaling group to enforce cluster awareness
        Use the same principles for infrastructure development as you use for applications
  • 29. Infrastructure is Code (slide 2/2)
        Dynamic infrastructure must be treated as a first class citizen in any cloud project
        Script/encode everything unless there is no API/tooling support
        Deploy the same infrastructure in development, test and production environments
        - Sizing can be parameterised
        Your deployment pipeline becomes the assembly of application packages and infrastructure configuration
        High cohesion and loose coupling applies to infrastructure as much as it does to applications
  • 30. Planning to fail
        Planning to fail will lead to success
        If it can go wrong, it will go wrong, so think in terms of when and not if
        Treating our infrastructure and its hosted applications as disposable, in conjunction with CD, eliminates a number of failure scenarios
  • 31. Planning to fail (slide 1/3)
        Regularly test your disposability
        - Terminate instances at random to ensure resiliency
        - Block all network access to an instance
        - Chaos Monkey & the Simian Army
        - Trigger failovers for less disposable services
        Constantly churning disposable instances helps prevent configuration drift
  • 32. Planning to fail (slide 2/3)
        Availability and durability cost
        Identify points of failure and assess:
        - How often will this failure occur?
        - How do I mitigate this failure?
        - How do I test this failure to ensure mitigation?
        - Is the cost of mitigation worth the customer impact during failure?
  • 33. Planning to fail (slide 3/3)
        Be honest in assessing the worth of your business
        - Do you really need to double your costs to run in multiple regions?
        - Trello, Slack & many other high profile companies, including Amazon, were affected by the S3 outage
  • 34. Data is not Disposable
        Data is not disposable and is probably more important than your availability
        Test the durability of your data
        - User error is your biggest risk
          - “I forgot the WHERE clause”
          - “I thought I was in the test environment”
        Regularly exercise data loss & recovery scenarios in development and test environments
        Make backups and regularly test that they restore
        - Consider storing backups in both S3 & Google
        - Store backups in multiple regions
        If you don’t want a full ELK stack, at least ship log files to CloudWatch or Stackdriver
  • 35. A Lesson to Learn From
        Multiple backup strategies, all failed
        Multiple failures, same engineers, too much pressure, too tired, mistakes made
        https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/
  • 36. Tooling is Not The Answer
        Tooling is not the answer but it is part of an automated solution
        Jenkins solves all our problems!
        AWS solves all our problems!
        Docker solves all our problems!
        Kubernetes solves all our problems!
  • 37. Remember Our Hero?
        Let us assume we have a front end web application which places orders in a queue for subsequent asynchronous fulfilment by a separate application backed by a database. We’ve already refactored our applications for the cloud.
        We will have a CI pipeline for the applications, the output being AMIs
        A separate CD pipeline executes infrastructure code and rolls out the new AMIs
        Goal is to promote infrastructure and AMIs between environments
  • 38. Packer (https://packer.io)
        Can create many different machine images
        Consider creating a base image to control OS updates
        Use normal configuration management tools
        - Support for Ansible, Chef & Puppet
        - Can just write shell script if you must
        Use placeholders for configuration to be filled by launch scripts
  • 39. Application Pipeline
        Source our code from a repo, build and test
        Package our application as a DEB or RPM
        Place our artifact into an S3 repository
        Run Packer to generate a new AMI
  • 40. Terraform (https://terraform.io)
        Declarative language for the construction of infrastructure
        Supports all major vendors
        State can be stored in buckets to facilitate sharing
        Separate out infrastructure layers
        - Minimises blast radius of changes
        - Keep persistent apart from disposable
  • 41. Infrastructure Pipeline
        Triggered by new AMIs or Terraform code changes
        Apply Terraform to update the infrastructure
        Run integration tests to verify application build
        Wait for approval before promotion to next environment
  • 42. Deployed Infrastructure
        Any instance can be terminated
        Resilient to zone failure
        Cross-region read replica allows DR for region failure
        - Just need to run Terraform in the region to add the instances when required and update Route 53
  • 43. Summary
        - Have attitude
        - Be continuous
        - Refactor to the Cloud
        - Infrastructure is code
        - Plan to fail
        - Data is King
        - Tooling is not The Answer
  • 44. Questions?
        Mike Fowler
        gh-mlfowler mlfowler
        mike dot fowler at claranet dot uk
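
Slide 27's point that careful abstraction of the API allows porting to a different platform can be illustrated with a small Python sketch. This is not from the talk: the names `OrderQueue`, `InMemoryOrderQueue` and `place_order` are hypothetical, and the in-memory class merely stands in for a backend. A real deployment would presumably add a class wrapping SQS (or another provider's queue) that exposes the same two methods.

```python
from collections import deque
from typing import Deque, Optional, Protocol


class OrderQueue(Protocol):
    """Hypothetical interface the application codes against."""

    def send(self, body: str) -> None: ...

    def receive(self) -> Optional[str]: ...


class InMemoryOrderQueue:
    """Stand-in backend for local tests (an assumption, not a real broker)."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, body: str) -> None:
        self._messages.append(body)

    def receive(self) -> Optional[str]:
        # Return the oldest message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


def place_order(queue: OrderQueue, order_id: str) -> None:
    # Application code depends only on the interface, not on a vendor SDK.
    queue.send(f"order:{order_id}")


queue = InMemoryOrderQueue()
place_order(queue, "1001")
place_order(queue, "1002")
first = queue.receive()
```

Because `place_order` only sees the `OrderQueue` interface, swapping the backend would mean writing one new class rather than touching the application code, which is the trade-off the slide argues for.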