Staying sane with IaC
Dewey Sasser
Principal Consultant
Aligned Software
Contents

Where I'm coming from

IaC

DevOps

Sanity

How To...
About Dewey

Distributed Application Developer for 25 years
− Doing build/release/software process for
about that long
− Accidentally doing DevOps out of self-defense

Wandered into operations about 7 years ago
− Built some private cloud for dev
− Built some private cloud for prod
− Moved to public cloud architecture
Deployment Context

Largest Deployment:
− ~64 application servers
− ~96 MongoDB nodes
− Many postgres, S3, ...
− ~14,000 TPS

Smallest Deployment:
− 1 application server
− 1 DB
− 1-2 TPS
Where did all this come from?

I'm a developer

In doing IaC, I noticed some things really didn't
work well
− Some “developer” assumptions about Ops
didn't work
− Ops was missing some lessons developers
have learned

IaC patterns are a work in progress
− There are definitely some loose ends -- this is
*NOT* the product of 20 years of industry
consensus
Why IaC

Manage vastly increased complexity

Reduce risk

Understand changes before they hit production
− create a low risk location to try changes
− helps developers, too!

DR anyone?
Why do you
care so much
about Sanity?

UPTIME!

Sleep!

Because you're in this for the long run
Dev vs. Ops

Dev
− “I can make this faster!”
− “I can make this better!”
− “This will only take a minute...”
Enthusiasm!
Dev vs. Ops

Ops
− “Don't break anything!”
− “Don't lose data!”
− Alan Shepard's Prayer
Pessimism
DevOps

We know this will work

We can repeat this

We develop to make ops easier

We operate to make coding easier
Confidence!
The Development Cycle

Conceive

Plan

Build
− Develop
− Test

Verify
− Acceptance Test

Deploy

Manage
IaC code is different than program
code
IaC is like program code...

Controlled changes

Build outside of user view

You can have “good code” and “bad code”

There are patterns and anti-patterns
IaC is unlike
program code...

Programs describe HOW. IaC describes WHAT

Behavior is something that happens within the
infrastructure

Infrastructure is more difficult to test than
programs
− Nearly every test is an “integration test”

It's hard to isolate the pieces
OK, let's get to that Sanity Part...
What's important in IaC

IaC is not about speeding up your departure,
it's about speeding up your arrival

You need to map your technical concepts to
your thought processes
Consider your
end goal

You need a running system

But you're going to be changing and deploying
new running systems

And you need to keep it running
Maintenance Window
Continuous Up-time
Some Guiding Philosophies
Be Declarative!

It's very difficult to reason about the state of an
infrastructure that is managed by procedures
− Procedural thinking makes “is” a
2nd-class concept

Incidentally, developers seem to love
procedures and hate state declarations
Procedures

Are ultimately necessary

Should be as far down in
the process as possible
− CloudFormation and
Terraform do this

Should be idempotent
− ensureWebServer(), not
createWebServer()
− Run again and again
and again
Bad!
db = createDatabase()
web = createWebServer(db)
addDNS("www...", web)

Because if you run it again, you get a 2nd DB,
web server, ...
Better
db = ensureDBExists()
web = ensureWebExists(db)
ensureDNSName("www...", web)
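The difference between the "Bad" and "Better" versions can be sketched in Python. This is only an illustration: the Inventory class and its create and ensure methods are hypothetical stand-ins for a real cloud API, chosen to show why the ensure-style call is safe to run again and again.

```python
class Inventory:
    """Hypothetical in-memory stand-in for a cloud provider's resource list."""

    def __init__(self):
        self.resources = []  # list of (kind, name) tuples

    def create(self, kind, name):
        # Non-idempotent: every call adds another resource.
        self.resources.append((kind, name))
        return (kind, name)

    def ensure(self, kind, name):
        # Idempotent: only create the resource if it is missing.
        if (kind, name) not in self.resources:
            self.resources.append((kind, name))
        return (kind, name)


def count_after_two_runs(use_ensure):
    """Run the same provisioning steps twice and count what exists."""
    inv = Inventory()
    step = inv.ensure if use_ensure else inv.create
    for _ in range(2):
        step("db", "main")
        step("web", "www")
    return len(inv.resources)
```

Running the create-style steps twice leaves duplicate resources behind, while the ensure-style steps leave the same two resources no matter how often they run.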
Best
# pseudo-code
resource "DB" {…}
resource "Web" {…; database=db}
resource "DNS" { name="www..."; IP=web.ip }

Why? Because the tool can understand what
you want and figure out how to get it
− CloudFormation, Terraform, Puppet
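A rough sketch of what "the tool can figure out how to get it" means: given the desired state and the actual state as plain data, a tool can compute the actions itself. The plan function and the resource settings below are assumptions made for illustration (the name www.example.com is a placeholder), and real tools such as Terraform also handle dependency ordering and much more.

```python
def plan(desired, actual):
    """Compare desired and actual state and return the actions needed.

    Both arguments map a resource name to its settings (a dict).
    """
    actions = []
    for name, settings in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != settings:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical desired and actual states.
desired = {
    "db": {"size": "large"},
    "web": {"count": 2},
    "dns": {"name": "www.example.com"},
}
actual = {
    "db": {"size": "small"},
    "old-cache": {"size": "small"},
}
```

Once the actual state matches the declaration, the plan is empty, which is the declarative form of idempotence.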
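Idempotence
Not idempotent:
startServer() {
  # Every call tries to start another nginx
  nginx -g "pid /var/run/nginx.pid;"
}
Idempotent:
startServer() {
  # processIsRunning is a helper assumed to exist; it is not defined on the slide
  pid=$(cat /var/run/nginx.pid 2>/dev/null)
  if ! processIsRunning "$pid" ; then
    nginx -g "pid /var/run/nginx.pid;"
  fi
}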
Make it Modular

Program code has objects and functions

IaC should be modular, too
− Re-use
− Easy modification
− Consistent definitions

Make your modules semantically meaningful,
not functionally meaningful
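One way to read "semantically meaningful" is that a module should describe a thing you talk about (a web service), not a mechanism (three VMs plus a balancer). The sketch below assumes resources are plain dictionaries, and the web_service function and its naming scheme are hypothetical.

```python
def web_service(name, env, instances):
    """Hypothetical module: everything one web service needs, as plain data."""
    servers = [
        {"type": "server", "name": f"{env}-{name}-{i}", "role": "web"}
        for i in range(instances)
    ]
    balancer = {
        "type": "load_balancer",
        "name": f"{env}-{name}-lb",
        "targets": [s["name"] for s in servers],
    }
    return servers + [balancer]


# The same definition is re-used for every environment; only the inputs change.
dev = web_service("shop", env="dev", instances=1)
prod = web_service("shop", env="prod", instances=3)
```

Because both environments come from one definition, a change to the module reaches dev and prod in the same shape.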
Test and Verify

You wouldn't deliver an untested program

Don't deliver an untested infrastructure
But how?
Testing

You have a monitoring system, right?

Guess what...
− Testing “is” is just the first part of monitoring
− Testing “does” is the 2nd part

This gives you...Test Driven Development!
− You know when you're complete
− You know if something breaks

This means your monitoring system is not an
afterthought, it's a forethought!
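The "is" and "does" checks could look something like the following sketch, which uses only the Python standard library. The function names are assumptions, and the small demo server exists only so the checks have something to probe. A real monitoring system would run checks like these on a schedule.

```python
import http.client
import http.server
import socket
import threading


def is_listening(host, port, timeout=1.0):
    """'Is' check: something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def does_respond(host, port, path="/", timeout=1.0):
    """'Does' check: the service answers an HTTP GET with status 200."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


class _OkHandler(http.server.BaseHTTPRequestHandler):
    """Throwaway handler that answers every GET with 200."""

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_demo_server():
    """Start a local HTTP server on a free port and return it."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Writing these checks before the infrastructure exists gives the test-first loop described above: they fail until the roll-out is complete, and they fail again if something breaks.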
Organizing your Code
“Full Stack” Ideal
“Full Stack” Reality
“Full Stack” thinking is
hazardous

Full Stack is difficult to change post deployment

It's difficult to deal with cross-cutting concerns
across multiple stacks

Different parts of the stack are often maintained
by different people
− In large companies, sometimes by different
groups
− In any group over a single developer,
someone will always know more about part of
the system
Consider instead: Planes

Instead of a “stack”, think of “planes”:
− Physical
− Network
− Persistence
− Service
− Application
− Control
Planes

Similar to the OSI network model

Allows clear separation of responsibilities and
concerns

Planes support other planes

Your individual design may move a piece of functionality
to a different plane. The key is to handle it
consistently
Physical Plane

Provided by AWS (or other cloud provider)

Or a rack of systems
Network Plane

Enables and controls communication

Location for computation and storage

Maybe handles communication security
Persistence Plane

This is a special plane – the only one you can't
burn down

Holds your critical data

If the company will fold if this data is lost, it's in
the Persistence plane.

Backups and non-recreatable data
Services Plane

Commonly available services
− DNS, VPN
− Service Location (Consul? etcd? DNS?)
− Secrets Management
− User Authentication/Management (maybe)
...but what about Databases?

DBs might be part of the “Service” plane

You might consider DBs part of the application
instead

You probably don't want to see them as
“Persistence” unless you're betting your
company that they'll never go down
Application Plane

This is the part that touches the users!
− Application Servers
− Micro-services
− Web farm
− DBs (maybe)

Modifying Application plane often modifies state
in the other planes
− e.g. DNS, Consul, ...
Control Plane

Global Procedures
− Backups
− Batch Jobs
− State changes
− Infrastructure Roll-out
− Recovery

Often your cloud system provides part of this
implicitly

“Scale” is often set here – ability to scale is not
Some practical details
Must

Have a separate, isolated, no-risk infrastructure
development area

Have a way to verify functionality

Use a source control tool
Must Not

Develop in Production!
− Yeah, yeah, everyone
knows this
− but “Can you just make
this one little change...”
Note: If your process is good,
you can make “one little
change” in the process, not
work around the process.
Should

Have an automated (not necessarily automatic)
IaC roll-out process

Frequently “set up from scratch”
Should Not

Develop “near” production
− You don't want accidents
to impact production –
build safeties!

Have manual steps
− A little bit of friction has
impact FAR out of
proportion to its size

Did you forget it?

Did you miss a step?
Write the way you
think

We think about infrastructure as objects
− “There is a network”
− “Here is a server”
− “The load balancer has these instances”

We don't think about infrastructure as
procedures

So...our tools should work the same way
Dealing with
Component
Dependencies

“Just make sure the DB is up before the app.”

“Oh, and auth needs to be up before the web
farm”

“Right, logging needs to be up before
everything..."

This way lies madness (and fragility!)
− Failures cascade
− Recovery doesn't
Cascading Recovery

Health Checks

Retry and Wait
− If it's not up, keep trying

This gives you...
− Scalability!
− Recovery time!
− Self Healing!

Why? Because very few people are at their
best at 3am
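"Retry and Wait" can be sketched as a small loop around a health check. The names wait_until_healthy and make_flaky_check are hypothetical, and the attempt count and delay are arbitrary defaults. The sleep function is a parameter only so the example can run without real waiting.

```python
import time


def wait_until_healthy(check, attempts=5, delay=0.01, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out.

    Returns True as soon as the dependency reports healthy, False otherwise.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


def make_flaky_check(failures):
    """Hypothetical health check that fails a fixed number of times first."""
    state = {"calls": 0}

    def check():
        state["calls"] += 1
        return state["calls"] > failures

    return check
```

A component that waits for its dependencies this way can start in any order and picks up again once a failed dependency recovers, which is the cascading recovery described above.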
What if we don't have this already?

Wrap the component and make it

Don't be afraid of building better pieces out of
what developers give you

We want to make recovery cascade

Incidentally, this makes deployment a lot
easier...and faster
Service Discovery

Keep Service Discovery as local as possible
− Use cluster-local discovery instead of e.g.
global DNS
− Consul, etcd, Private DNS zones

Keep service names as unchanged as possible
− Set the context of the service, don't configure
the pieces

Bind external services as late as possible
− Map the “www...” name in at the end
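A minimal sketch of "set the context of the service, don't configure the pieces": the application always asks for the same service name, and only the context decides which address comes back. The registries, addresses, and resolve function are made up for illustration. In practice the lookup would go to something like Consul, etcd, or a private DNS zone.

```python
# Hypothetical cluster-local registries: one per environment.
REGISTRIES = {
    "dev": {"db": "10.0.1.5:5432", "auth": "10.0.1.6:8443"},
    "prod": {"db": "10.8.1.5:5432", "auth": "10.8.1.6:8443"},
}


def resolve(service, context):
    """Look up a stable service name inside the given context."""
    try:
        return REGISTRIES[context][service]
    except KeyError:
        raise LookupError(
            f"no service {service!r} in context {context!r}"
        ) from None
```

The name "db" never changes between environments, so nothing inside the application needs to be reconfigured when it moves from dev to prod.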
Writing the Code

Use inspection instead of variable passing
between planes

Use variable passing instead of hard-coding

Comment your code!
− The code shows “how”, you need to comment
on “why”

Use good commit messages!
− Make it easy to find when and where changes
are made
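"Inspection instead of variable passing" might look like this sketch: rather than handing resource IDs from one plane's code to another's, the upper plane looks up what it needs by tags. The inventory contents, tag names, and find_one function are assumptions. A real version would query the cloud provider's API.

```python
# Hypothetical inventory, standing in for a cloud API that can list resources.
INVENTORY = [
    {"id": "net-1", "plane": "network", "tags": {"env": "dev", "role": "private"}},
    {"id": "net-2", "plane": "network", "tags": {"env": "prod", "role": "private"}},
    {"id": "db-7", "plane": "service", "tags": {"env": "dev", "role": "db"}},
]


def find_one(plane, **tags):
    """Inspect the inventory and return the single resource matching the tags."""
    matches = [
        resource
        for resource in INVENTORY
        if resource["plane"] == plane
        and all(resource["tags"].get(key) == value for key, value in tags.items())
    ]
    if len(matches) != 1:
        raise LookupError(f"expected 1 match, found {len(matches)}")
    return matches[0]
```

Failing loudly when the lookup is ambiguous or empty keeps a mislabeled resource from being picked up silently.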
Rolling Out

Roll out changes to lower risk environments
− Dev Environment
− QA Environment
− Staging
− Prod

Wow, that's a lot of environments to manage

But you have IaC, so it's easy!
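The roll-out order above can be sketched as a loop that promotes a change one environment at a time and stops at the first failed verification. The roll_out function and its deploy and verify callbacks are hypothetical placeholders for whatever your tooling provides.

```python
ENVIRONMENTS = ["dev", "qa", "staging", "prod"]


def roll_out(change, deploy, verify):
    """Apply a change to each environment in order; stop at the first failure.

    deploy(env, change) applies the change; verify(env) returns True if the
    environment is healthy afterwards. Returns the environments that passed.
    """
    passed = []
    for env in ENVIRONMENTS:
        deploy(env, change)
        if not verify(env):
            break
        passed.append(env)
    return passed
```

A change that fails verification in staging never reaches prod.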
More Roll-out

Have some Canaries if possible
− Put a little traffic on the new system
− Be ready to take traffic off
− Often not possible
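Where canaries are possible, "put a little traffic on the new system" can be approximated by hashing a request or user id into a bucket. This is a sketch under assumptions: choose_backend and the backend names are hypothetical, and real load balancers usually provide weighted routing themselves.

```python
import hashlib


def choose_backend(request_id, canary_percent):
    """Route a stable share of requests to the canary.

    Hashing the id keeps each id on the same backend, and setting
    canary_percent to 0 takes all traffic off the new system.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Being "ready to take traffic off" then amounts to changing one number.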
Roll back vs Roll Forward

Either works, but make a decision and stick
with it

If you're rolling forward, make sure you can roll
forward to a previous (working) state

Roll forward is easier (and faster) for
development
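One way to picture "roll forward to a previous (working) state": the release history only ever grows, and going back means deploying an old configuration as a new release. The Releases class below is a hypothetical illustration, not a description of any particular tool.

```python
class Releases:
    """Hypothetical release log where every deployment moves forward."""

    def __init__(self):
        self.history = []  # list of (release_number, config) tuples

    def deploy(self, config):
        number = len(self.history) + 1
        self.history.append((number, config))
        return number

    def current(self):
        return self.history[-1]

    def roll_forward_to(self, number):
        """Re-deploy the config of an earlier release as a new release."""
        for past_number, config in self.history:
            if past_number == number:
                return self.deploy(config)
        raise LookupError(f"unknown release {number}")
```

The recovery path is then the normal deployment path, so it gets exercised on every release.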
The Control Plane

What you use to control the behavior of the
other planes.

Execute and control backups

Add/Remove Users, etc

Run any batch jobs you need (data purge?)
A note about scheduled jobs...

Cron is Evil
− Well, not actually
evil, but...
− Hard to monitor
− Hard to view results
− Hard to modify
− Requires sysadmin knowledge to change

Much better to have a single location with a UI
Run-books

Sufficiently detailed documentation is
executable
− Anything you do regularly should be scripted
− Failure recovery should be as automated as
possible

because downtime is bad

and thinking under pressure is harder

So, what's left is troubleshooting and problems
you don't yet know or understand

...which are difficult to Run-book
Tying it up

Describe your IaC as declaratively as possible

Develop your infrastructure in a separate
location

Organize your IaC into planes of responsibility

Build your IaC out of modules

Deploy your changes across environments

Automate all of your normal operations
Results

Less stress

Less risk

More predictability
Image Credits
All images discovered by Google Images set for "Labeled for reuse"
https://commons.wikimedia.org/wiki/File:Cable_closet_bh.jpg
https://commons.wikimedia.org/wiki/File:CERN_Server_03.jpg
https://commons.wikimedia.org/wiki/File:Rugged_1U_Computer.png
https://oer.gitlab.io/oer-on-oer-infrastructure/Git-introduction.html#/sec-title-slide
https://commons.wikimedia.org/wiki/File:Hair_pulling_stress.jpg
https://commons.wikimedia.org/wiki/File:Software_Developer_at_work_03.jpg
https://commons.wikimedia.org/wiki/File:National_Security_Operations_Center_photograph,_c._1985_-_National_Cryptologic_Museum_-_DSC07661.JPG
https://commons.wikimedia.org/wiki/File:Devops-toolchain.svg
https://commons.wikimedia.org/wiki/File:Devops.svg
https://www.flickr.com/photos/thebusybrain/2492945625
https://pixnio.com/people/female-women/woman-programmer-internet-business-blogging-business-coding-computer-programming
https://en.wikipedia.org/wiki/File:DMZ_network_diagram_1_firewall.svg
https://commons.wikimedia.org/wiki/File:Gnome-emblem-important.svg
https://www.flickr.com/photos/davedugdale/5026217210
https://commons.wikimedia.org/wiki/File:Exclamation_mark_red.png
https://www.flickr.com/photos/oskay/2156889157
https://commons.wikimedia.org/wiki/File:Concrete_Compression_Testing.jpg
https://www.xymon.com/ (screen shot)
https://pixabay.com/en/photos/stabilit%C3%A4t/
https://www.flickr.com/photos/fdecomite/2335204025
https://www.flickr.com/photos/internetarchivebookimages/14777225344
https://www.flickr.com/photos/102642344@N02/14960581044
https://pixabay.com/en/socket-concrete-slab-underground-2828305/
https://commons.wikimedia.org/wiki/File:Inside_Suite.jpg
https://commons.wikimedia.org/wiki/File:Ego_network.png
https://commons.wikimedia.org/wiki/File:MUTCD_R3-7R.svg
https://commons.wikimedia.org/wiki/File:Philippines_road_sign_R3-14P.svg
https://www.flickr.com/photos/dullhunk/7214525854/
https://commons.wikimedia.org/wiki/File:CALTRANS_SR39A_(CA).svg
https://pixabay.com/en/thinker-words-thoughts-mind-white-3025789/
https://commons.wikimedia.org/wiki/File:Acyclic_dependencies,_circular_dependency_example.svg
http://phdthesis-bioinformatics-maxplanckinstitute-molecularplantphys.matthias-scholz.de/
https://de.wikipedia.org/wiki/Datei:Usb_otg.jpg
https://commons.wikimedia.org/wiki/File:Copyright_Card_Catalog_Drawer.jpg
https://pxhere.com/en/photo/891776
https://commons.wikimedia.org/wiki/File:Discovery_rollout_ceremony.jpg
https://pixabay.com/en/photos/pause/?image_type=vector
https://pixabay.com/en/control-panels-controls-equipment-1840480/
https://commons.wikimedia.org/wiki/File:Jenkins_Home.png
https://en.m.wikipedia.org/wiki/File:Bottle_Sling_ABOK_1142_Tying_Complete.jpg
https://skitterphoto.com/photos/2188/girl-taking-a-picture
https://pixabay.com/en/jenga-balance-sensitivity-stability-1941500/
https://pixabay.com/en/emoji-smilie-whatsapp-emotion-2762568/

Infrastructure as Code to Maintain your Sanity

  • 1. Staying sane with IaC Dewey Sasser Principal Consultant Aligned Software
  • 2. Contents  Where I'm coming from  IaC  DevOps  Sanity  How To...
  • 3. About Dewey  Distributed Application Developer for 25 years − Doing build/release/software process for about that long − Accidentally doing DevOps out of self-defense  Wandered into operations about 7 years ago − Built some private cloud for dev − Built some private cloud for prod − Moved to public cloud architecture
  • 4. Deployment Context  Largest Deployment: − ~64 application servers − ~96 MongoDB nodes − Many postgres, S3, ... − ~14,000 TPS  Smallest Deployment: − 1 application server − 1 DB − 1-2 TPS
  • 5. Where did all this come from?  I'm a developer  In doing IaC, I noticed some things really didn't work well − Some “developer” assumptions about Ops didn't work − Ops was missing some lessons developers have learned  IaC patterns are a work in progress − There are definitely some loose ends -- this is *NOT* the product of 20 years of industry consensus
  • 6. Why IaC  Manage vastly increased complexity  Reduce risk  Understand changes before they hit production − create a low risk location to try changes − helps developers, too!  DR anyone?
  • 7. Why do you care so much about Sanity?  UPTIME!  Sleep!  Because you're in this for the long run
  • 8. Dev vs. Ops  Dev − “I can make this faster!” − “I can make this better!” − “This will only take a minute...” Enthusiasm!
  • 9. Dev vs. Ops  Ops − “Don't break anything!” − “Don't lose data!” − Alan Shepard's Prayer Pessimism
  • 10. DevOps  We know this will work  We can repeat this  We develop to make ops easier  We operate to make coding easier Confidence!
  • 11. The Development Cycle  Conceive  Plan  Build − Develop − Test  Verify − Acceptance Test  Deploy  Manage
  • 12. IaC code is different than program code
  • 13. IaC is like program code...  Controlled changes  Build outside of user view  You can have “good code” and “bad code”  There are patterns and anti-patterns
  • 14. IaC is unlike program code...  Programs describe HOW. IaC describe WHAT  Behavior is something that happens within the infrastructure  Infrastructure is more difficult to test than programs − Nearly every test is an “integration test”  It's hard to isolate the pieces
  • 15. OK, let's get to that Sanity Part...
  • 16. What's important in IaC  IaC is not about speeding up your departure, it's about speeding up your arrival  You need to map your technical concepts to your thought processes
  • 17. Consider your end goal  You need a running system  But you're going to be changing and deploying new running systems  And you need to keep it running Maintenance Window Continuous Up-time
  • 19. Be Declarative!  It's very difficult to reason about the state of an infrastructure that is managed by procedures − Procedural thinking makes “is” a 2nd class concept  Incidentally, developers seem to love procedures and hate state declarations
  • 20. Procedures  Are ultimately necessary  Should be as far down in the process as possible − Cloud Formation and Terraform do this  Should be idempotent − EnsureWebServer(), not createWebServer() − Run again and again and again
  • 21. Bad! db = createDatabase() web = createWebServer(db) addDNS(“www...”, web)  Because if you run it again, you get a 2nd DB, web server, ...
  • 22. Better db = ensureDBExists() web = ensureWebExists(db) ensureDNSName(“www...”, web)
  • 23. Best # pesudo-code resource “DB” {…} resource “Web” {…; database=db} resource “DNS” { name=”www...”; IP=web.ip }  Why? Because the tool can understand what you want and figure out how to get it − CloudFormation, Terraform, Puppet
  • 24. Idempotence Not idempotent: Idempotent: startServer() { nginx -g "/var/run/nginx.pid;" } startServer() { pid=$(cat /var/run/nginx.pid) if ! processIsRunning $pid ; then nginx -g "/var/run/nginx.pid;" fi }
  • 25. Make it Modular  Program code has object and functions  IaC should be modular, too − Re-use − Easy modification − Consistent definitions  Make your modules semantically meaningful, not functionally meaningful
  • 26. Test and Verify  You wouldn't deliver an untested program  Don't deliver an untested infrastructure But how?
  • 27. Testing  You have a monitoring system, right?  Guess what... − testing “Is” is just the first part of monitoring − Testing “does” is the 2nd part  This gives you...Test Driven Development! − You know when you're complete − You know if something breaks  This means your monitoring system is not an afterthought, it's a forethought!
  • 31. “Full Stack” thinking is hazardous  Full Stack is difficult to change post deployment  It's difficult to deal with cross-cutting concern across multiple stacks  Different parts of the stack are often maintained by different people − In large companies, sometimes by different groups − In any group over a single developer, someone will always know more about part of the system
  • 32. Consider instead: Planes  Instead of a “stack”, think of “planes”: − Physical − Network − Persistence − Service − Application − Control
  • 33. Planes  Similar to the OSI network model  Allows clear separation of responsibilities and concerns  Planes support other planes  Your individual design may move a functionality to a different plane. They key is to handle it consistently
  • 34. Physical Plane  Provided by AWS (or other cloud provider)  Or a rack of systems
  • 35. Network Plane  Enables and controls communication  Location for computation and storage  Maybe handles communication security
  • 36. Persistence Plane  This is a special plane – the only one you can't burn down  Holds your critical data  If the company will fold if this data is lost, it's in the Persistence plane.  Backups and non-recreatable data
  • 37. Services Plane  Commonly available services − DNS, VPN − Service Location (Consul? etcd? DNS?) − Secrets Management − User Authentication/Management (maybe)
  • 38. ...but what about Databases?  DBs might be part of the “Service” plane  You might consider DBs part of the application instead  You probably don't want to see them as “Persistence” unless you're betting your company that they'll never go down
  • 39. Application Plane  This is the part that touches the users! − Application Servers − Micro-services − Web farm − DBs (maybe)  Modifying Application plane often modifies state in the other planes − e.g. DNS, consul, ...
  • 40. Control Plane  Global Procedures − Backups − Batch Jobs − State changes − Infrastructure Roll-out − Recovery  Often your cloud system provides part of this implicitly  “Scale” is often set here – ability to scale is not
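The planes can be written down as data, which makes rules such as "Persistence is the only plane you can't burn down" enforceable rather than tribal knowledge. This is a sketch under assumptions: `Plane`, `PROTECTED_PLANES`, and `plan_teardown` are hypothetical names, and the component names in the usage are invented.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Plane(Enum):
    PHYSICAL = 1
    NETWORK = 2
    PERSISTENCE = 3
    SERVICE = 4
    APPLICATION = 5
    CONTROL = 6


# The one plane that must survive a rebuild.
PROTECTED_PLANES = {Plane.PERSISTENCE}


def plan_teardown(components: Dict[str, Plane],
                  planes_to_burn: Iterable[Plane]) -> List[str]:
    """List the components to destroy, refusing to touch protected planes."""
    targets = set(planes_to_burn)
    blocked = targets & PROTECTED_PLANES
    if blocked:
        names = ", ".join(sorted(p.name for p in blocked))
        raise ValueError(f"refusing to burn down protected plane(s): {names}")
    return sorted(name for name, plane in components.items()
                  if plane in targets)
```

For example, with `{"web-farm": Plane.APPLICATION, "backups": Plane.PERSISTENCE}`, burning down the Application plane returns `["web-farm"]`, while including `Plane.PERSISTENCE` raises an error instead of deleting backups.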
  • 42. Must  Have a separate, isolated, no-risk infrastructure development area  Have a way to verify functionality  Use a source control tool
  • 43. Must Not  Develop in Production! − Yeah, yeah, everyone knows this − but “Can you just make this one little change...” Note: If your process is good, you can make “one little change” in the process, not work around the process.
  • 44. Should  Have an automated (not necessarily automatic) IaC roll-out process  Frequently “set up from scratch”
  • 45. Should Not  Develop “near” production − You don't want accidents to impact production – build safeties!  Have manual steps − A little bit of friction has an impact FAR out of proportion to its size  Did you forget it?  Did you miss a step?
  • 46. Write the way you think  We think about infrastructure as objects − “There is a network” − “Here is a server” − “The load balancer has these instances”  We don't think about infrastructure as procedures  So...our tools should work the same way
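Describing infrastructure as objects means you state what should exist and let a tool work out the procedure. A minimal sketch of that idea, assuming resources are plain dictionaries keyed by name; `plan` and the action labels are hypothetical and not taken from any particular IaC tool.

```python
from typing import Dict, List, Tuple

Resource = Dict[str, object]


def plan(desired: Dict[str, Resource],
         actual: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Compare declared objects with what exists and derive the steps."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions
```

The author writes "there is a network, here is a server"; the ordered list of create, update, and delete steps is computed, not hand-written. When desired and actual already match, the plan is empty.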
  • 47. Dealing with Component Dependencies  “Just make sure the DB is up before the app.”  “Oh, and auth needs to be up before the web farm”  “Right, logging needs to be up before everything...”  This way lies madness (and fragility!) − Failures cascade − Recovery doesn't
  • 48. Cascading Recovery  Health Checks  Retry and Wait − If it's not up, keep trying  This gives you... − Scalability! − Recovery time! − Self Healing!  Why? Because very few people are at their best at 3am
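"Retry and wait" can be as small as a loop around a health check. The sketch below assumes the health check is any callable returning True or False; `wait_until_healthy` is a hypothetical helper, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool],
                       interval: float = 1.0,
                       timeout: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll a health check until it passes or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)  # if it's not up, wait and keep trying
```

A component that waits for its dependencies this way does not care about start-up order, so recovery of a dependency automatically lets everything above it recover too.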
  • 49. What if we don't have this already?  Wrap the component and make it  Don't be afraid of building better pieces out of what developers give you  We want to make recovery cascade  Incidentally, this makes deployment a lot easier...and faster
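When a component has no retry logic of its own, a thin wrapper can supply it. This is one possible shape, assuming each dependency exposes some health check; `start_when_ready` and its parameters are hypothetical, and the wrapped `start` callable stands in for whatever the developers handed you.

```python
import time
from typing import Callable, Dict


def start_when_ready(dependency_checks: Dict[str, Callable[[], bool]],
                     start: Callable[[], None],
                     interval: float = 1.0,
                     max_attempts: int = 30,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Start a component only once all of its dependencies report healthy.

    Returns True if the component was started, False if we gave up.
    """
    for _ in range(max_attempts):
        waiting_on = [name for name, healthy in dependency_checks.items()
                      if not healthy()]
        if not waiting_on:
            start()
            return True
        sleep(interval)  # dependencies not ready yet: wait and retry
    return False
```

With every component wrapped like this, everything can be launched at once and each piece settles as soon as what it needs is available.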
  • 50. Service Discovery  Keep Service Discovery as local as possible − Use cluster-local discovery instead of e.g. global DNS − Consul, Etcd, Private DNS zones  Keep service names as unchanged as possible − Set the context of the service, don't configure the pieces  Bind external services as late as possible − Map the “www...” name in at the end
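"Set the context of the service, don't configure the pieces" can be illustrated with a lookup where the service name never changes and only the context does. The registry contents and host names below are invented for illustration; in practice the lookup would go to Consul, etcd, or a private DNS zone rather than a dictionary.

```python
from typing import Dict

# Hypothetical cluster-local registries, one per context.
REGISTRIES: Dict[str, Dict[str, str]] = {
    "dev": {"db": "db.dev.internal:5432", "auth": "auth.dev.internal:8443"},
    "prod": {"db": "db.prod.internal:5432", "auth": "auth.prod.internal:8443"},
}


def resolve(service: str, context: str,
            registries: Dict[str, Dict[str, str]] = REGISTRIES) -> str:
    """Resolve a stable service name within the given context."""
    try:
        return registries[context][service]
    except KeyError:
        raise LookupError(
            f"service {service!r} not found in context {context!r}"
        ) from None
```

Application configuration only ever says `"db"`; moving from dev to prod changes the context, not the name.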
  • 51. Writing the Code  Use inspection instead of variable passing between planes  Use variable passing instead of hard-coding  Comment your code! − The code shows “how”, you need to comment on “why”  Use good commit messages! − Make it easy to find when and where changes are made
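"Inspection instead of variable passing" means a plane looks up what it needs by meaning, rather than being handed identifiers by the plane that created them. A sketch, assuming resources carry descriptive tags; `find_one`, the tag names, and the inventory entries are all hypothetical.

```python
from typing import Dict, List

Resource = Dict[str, str]


def find_one(inventory: List[Resource], **tags: str) -> Resource:
    """Find exactly one resource whose tags match, or fail loudly."""
    matches = [r for r in inventory
               if all(r.get(key) == value for key, value in tags.items())]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one resource matching {tags}, "
            f"found {len(matches)}"
        )
    return matches[0]
```

The Application plane can then ask for `find_one(inventory, plane="network", role="app-subnet")` without the Network plane's code exporting an id to it, so either plane can be rebuilt independently.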
  • 52. Rolling Out  Roll out changes to lower risk environments − Dev Environment − QA Environment − Staging − Prod  Wow, that's a lot of environments to manage  But you have IaC, so it's easy!
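Promotion through environments can be expressed as one loop: deploy, verify, and only then move on to the next, riskier environment. This is a sketch; `roll_out` is a hypothetical helper, and `deploy` and `verify` stand in for your own roll-out and monitoring hooks.

```python
from typing import Callable, List, Sequence

# Ordered from lowest risk to highest risk.
ENVIRONMENTS = ("dev", "qa", "staging", "prod")


def roll_out(change: str,
             deploy: Callable[[str, str], None],
             verify: Callable[[str], bool],
             environments: Sequence[str] = ENVIRONMENTS) -> List[str]:
    """Apply a change environment by environment, stopping at the first
    environment that fails verification.

    Returns the environments where the change was deployed and verified.
    """
    completed: List[str] = []
    for env in environments:
        deploy(env, change)
        if not verify(env):
            break  # never promote a change that failed a lower environment
        completed.append(env)
    return completed
```

Because the same code describes every environment, a failure in staging stops the change before prod is touched.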
  • 53. More Roll-out  Have some Canaries if possible − Put a little traffic on the new system − Be ready to take traffic off − Often not possible
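"Put a little traffic on the new system" is usually done in a load balancer, but the routing decision itself is simple. The sketch below assumes requests carry some stable key (a user or session id); `choose_backend` and the backend labels are hypothetical.

```python
import hashlib


def choose_backend(request_key: str, canary_percent: int) -> str:
    """Deterministically send roughly canary_percent% of keys to the canary."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Setting `canary_percent` to 0 is the "be ready to take traffic off" switch: every request goes back to the stable system without a redeploy.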
  • 54. Roll back vs Roll Forward  Either works, but make a decision and stick with it  If you're rolling forward, make sure you can roll forward to a previous (working) state  Roll forward is easier (and faster) for development
  • 55. The Control Plane  What you use to control the behavior of the other planes.  Execute and control backups  Add/Remove Users, etc  Run any batch jobs you need (data purge?)
  • 56. A note about scheduled jobs...  Cron is Evil − Well, not actually evil, but... − Hard to monitor − Hard to view results − Hard to modify − Requires sysadmin knowledge to change  Much better to have a single location with a UI
  • 57. Run-books  Sufficiently detailed documentation is executable − Anything you do regularly should be scripted − Failure recovery should be as automated as possible  because downtime is bad  and thinking under pressure is harder  So, what's left is troubleshooting and problems you don't yet know or understand  ...which are difficult to Run-book
  • 58. Tying it up  Describe your IaC as declaratively as possible  Develop your infrastructure in a separate location  Organize your IaC into planes of responsibility  Build your IaC out of modules  Deploy your changes across environments  Automate all of your normal operations

Editor's Notes

  1. “Is” is the first question we ask; only after we've established “is” do we proceed to “does”.
  2. Semantically meaningful: what you talk about when describing the architecture, e.g. “the app server” or “the cache server”, not “a Linux server that runs NGINX and Jetty to serve a Java web application”.
  3. JIRA as an example