Managing Expensive or
Destructive Operations
in Jenkins CI
http://bit.ly/dangerous-jenkins
September 26, 2018
Richard Bullington-McGuire
Principal Architect, Modus Create
richard@moduscreate.com
@obscurerichard
Deploy with Jenkins?
Sure!
But NOT ACCIDENTALLY
My Continuous Integration & CD Experience
Early Experiments - CruiseControl (Java): 2004-2008
CruiseControl.NET (C#): 2008
Hudson + Windows / Linux + Java: 2009-2011
Jenkins (Linux, Windows, Android, iOS, Java, .NET, Python,
NodeJS, Angular, React, Objective-C, etc.): 2011-2018
Commercial clients in a wide variety of industries
What is Expensive?
Downtime for critical systems
Deploying Cloud back end services (at scale)
Large scale load testing
Starting large scale analysis jobs - computational
fluid dynamics, genomics, etc.
What is Destructive?
● Cloud provisioning: CloudFormation or Terraform
● Very risky: large changes w/ Infrastructure as Code
● Targeting the wrong environment by mistake
“Oops, I just deleted the prod database server!”
Defense Strategies
1. Access Controls
2. Separate Control Systems for Production
3. Code Review
4. Human Check on Deployments
5. Opt-In Switches and CAPTCHAs
6. Small Increments with Metrics & Monitoring
7. Tiered Environments
8. Restricted Branches
Case Study: Jenkins & Terraform at Work
● Education company cloud migration (4mo -> prod)
● Apps w/ > 30,000 requests per minute (RPM) at peak, measured with New Relic
● Production with 80+ sizeable AMIs baseline
● Auto Scaling to 200+ AMIs under heavy load
● Multiple environments: dev, qa, staging, prod
● Terabyte-scale MySQL Aurora cluster, 50+ TB in S3
● All managed with Jenkins, Terraform, Ansible, Packer,
CodeDeploy, multiple tech stacks
What could go wrong?
● Some changes to resources destroy old & create new
○ E.g., instances not part of auto scaling groups
are easy to destroy by accident
● Changes to network environments may require deleting
and recreating whole server stacks
● Terraform has an imperfect grasp of some dependencies -
some plans fail to execute
● Accidentally deploying to the wrong environment
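One way to catch the destroy-and-recreate cases above before they run is to inspect the plan in machine-readable form. This is a minimal sketch, not the approach used in the case study. It assumes the plan was exported with `terraform show -json` and parsed into a dict; the function name and the sample resource addresses are hypothetical.

```python
import json


def destructive_changes(plan: dict) -> list:
    """Return addresses of resources the plan would delete or replace.

    Expects the structure printed by `terraform show -json <planfile>`,
    where each entry in "resource_changes" has an "address" and a
    "change" object whose "actions" list holds e.g. ["update"],
    ["delete"], or ["delete", "create"] for a replacement.
    """
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # Any "delete" action means an existing resource goes away,
        # whether or not a replacement is created afterwards.
        if "delete" in actions:
            flagged.append(rc["address"])
    return flagged


# Hypothetical sample plan, trimmed to the fields used above.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "aws_instance.bastion",
     "change": {"actions": ["delete", "create"]}},
    {"address": "aws_security_group.web",
     "change": {"actions": ["update"]}},
    {"address": "aws_db_instance.main",
     "change": {"actions": ["delete"]}}
  ]
}
""")

if __name__ == "__main__":
    for address in destructive_changes(SAMPLE_PLAN):
        print(f"WARNING: plan destroys or replaces {address}")
```

A deploy job could print this list next to the plan and require an explicit human decision whenever it is non-empty.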
That’s not exactly Continuous Deployment!
Nope.
But not every organization is ready for that,
nor will it work to push every change to every environment
without careful review.
Access Controls
Use access controls baked into Jenkins & target systems
● Integrate with Enterprise Directory - AD or Google SSO
● Limit access to deploy jobs
● Use Jenkins secrets or other secret store to hide keys
Example: Microsoft Cloud SSO + Jenkins SAML provider
Separate Environments
Consider separating dev & prod deploy mechanisms
● Have separate CI systems for dev & prod systems
● Limit access to prod systems more strictly
● May help with Sarbanes-Oxley compliance or other
enterprise security controls. Could also be just a crutch.
E.g., one Jenkins server per AWS account
Code Review
Bake code review into deployment pipeline
1. Use GitHub pull requests or equivalent mechanisms
2. Require sign-off from tech lead or multiple people
3. Have linting tools to automate some code review chores
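The sign-off rule in item 2 can be written down as a small predicate. This is a sketch under assumptions: the function name, the default of two approvals, and the rule that authors cannot approve their own change are all illustrative choices, not something the talk specifies. In practice a hosting platform's pull request settings would usually enforce this.

```python
def has_required_signoff(author, approvers, tech_leads, min_approvals=2):
    """Return True if the change has enough sign-off to be deployed.

    A change passes when a tech lead other than the author approved it,
    or when at least `min_approvals` distinct non-author people did.
    """
    # Authors cannot approve their own change; duplicates count once.
    reviewers = set(approvers) - {author}
    if reviewers & set(tech_leads):
        return True
    return len(reviewers) >= min_approvals


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(has_required_signoff("ana", ["bo", "cy"], tech_leads={"lee"}))
```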
Human Check on Deployments
Require human review of critical steps
1. Use Jenkins inputs to pause and provide an abort option
2. Set rules of engagement on how deploys are done
Example: on a mixed team of consultants and company
employees, only do a production deploy with customer staff
watching and signing off
Opt-In Switches and CAPTCHAs
Add speed bumps (CAPTCHAs) to deployments
1. Avoid expensive or destructive operations on every commit
2. Make deployers run a job manually and select expensive or
destructive operations specifically
3. Make deployers solve a simple math problem to slow them
down and think about what they are doing
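Items 1 and 2 amount to "off by default, on only by explicit choice". A minimal sketch of that idea follows; the operation names and the `selected_operations` helper are hypothetical, and in Jenkins the same effect would typically come from boolean job parameters that default to unchecked.

```python
# Hypothetical operation names. Each defaults to off, so nothing
# expensive or destructive runs unless a deployer opts in.
DEFAULT_SWITCHES = {
    "terraform_apply": False,
    "terraform_destroy": False,
    "load_test": False,
}


def selected_operations(job_params: dict) -> list:
    """Return the operations a deployer explicitly switched on."""
    switches = dict(DEFAULT_SWITCHES)
    for name, value in job_params.items():
        if name not in switches:
            raise ValueError(f"unknown operation switch: {name}")
        # Only a literal True opts in; strings like "yes" or the
        # number 1 are treated as off to avoid accidental opt-ins.
        switches[name] = value is True
    return [name for name, enabled in switches.items() if enabled]


if __name__ == "__main__":
    print(selected_operations({}))                   # a plain commit build
    print(selected_operations({"load_test": True}))  # a manual, opted-in run
```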
Confirm?
Type CONFIRM to continue
What do you think happens
when people try asking for
confirmation like this?
BAD IDEA proven awful
through HARD EXPERIENCE
● People always just type CONFIRM,
● Or worse, the browser autofills it!
Confirm? CONFIRM
Type CONFIRM to continue
Solve it! 42
What do you get when you multiply 6 by 7?
A simple math problem works
much better in practice
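The math-problem speed bump is easy to sketch. The version below is an illustration, not the demo's implementation: the function names and the 2-9 operand range are assumptions. The question changes on every run, so neither muscle memory nor browser autofill can supply the answer.

```python
import random


def make_challenge(rng=None):
    """Return (question, expected_answer) for a random multiplication."""
    if rng is None:
        rng = random.Random()
    a, b = rng.randint(2, 9), rng.randint(2, 9)
    return f"What do you get when you multiply {a} by {b}?", a * b


def check_answer(expected: int, typed: str) -> bool:
    """Accept only a typed integer equal to the expected product."""
    try:
        return int(typed.strip()) == expected
    except ValueError:
        # Anything that is not a number, such as "CONFIRM", is rejected.
        return False


if __name__ == "__main__":
    question, expected = make_challenge()
    if not check_answer(expected, input(question + " ")):
        raise SystemExit("Wrong answer, aborting deploy.")
    print("Proceeding.")
```

In a pipeline, the question would be shown in the approval prompt and the job would abort on a wrong answer.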
Small Increments with Metrics & Monitoring
Deploy small increments at a time and monitor everything:
1. Make small changes and test them independently
2. Monitor target systems with New Relic or similar systems
3. Write automated tests for the small increments
4. Write load tests that can test your system at scale
Restricted Branches & Deploy Constraints
Require some changes to go through a certain branch:
1. Restrict prod deployments to the master branch
2. Ensure changes have gone through code review & testing
3. Make deploy jobs fail fast if they violate constraints
4. Build switches into code that have to be backed out in
commits in order to do some types of dangerous changes
5. Restrict deploys to certain times or by metrics
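Items 1, 3, and 5 can be combined into a single fail-fast check that runs before any expensive step. This is a sketch with assumed policy values: the weekday 09:00-16:00 window and the helper names are hypothetical, and only the master-branch rule comes from the slide.

```python
from datetime import datetime, time

# Assumed policy values, for illustration only.
PROD_BRANCH = "master"
DEPLOY_WINDOW = (time(9, 0), time(16, 0))  # local time, weekdays only


def deploy_violations(environment, branch, now):
    """List every constraint a requested deploy would violate."""
    problems = []
    if environment != "prod":
        return problems  # the constraints here apply to production only
    if branch != PROD_BRANCH:
        problems.append(f"prod deploys must come from {PROD_BRANCH}, not {branch}")
    if now.weekday() >= 5:  # Monday is 0, so 5 and 6 are the weekend
        problems.append("prod deploys are not allowed on weekends")
    start, end = DEPLOY_WINDOW
    if not (start <= now.time() < end):
        problems.append(f"prod deploys are only allowed {start:%H:%M}-{end:%H:%M}")
    return problems


def check_or_abort(environment, branch, now=None):
    """Fail fast, before any expensive step runs, if a constraint is violated."""
    problems = deploy_violations(environment, branch, now or datetime.now())
    if problems:
        raise SystemExit("Refusing to deploy:\n  " + "\n  ".join(problems))


if __name__ == "__main__":
    check_or_abort("dev", "feature/example")
    print("dev deploy allowed")
```

Reporting all violations at once, rather than stopping at the first, saves the deployer a second failed attempt.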
Implementation
http://bit.ly/dangerous-jenkins
[Architecture diagram. Labels: Local Development; CI Environment; Jenkins; Packer Provision; Terraform Provision; Elastic Load Balancer; EC2 Auto Scaling Group - Web App]
Demo
http://bit.ly/dangerous-jenkins
Step By Step
● Builds on CIS Baseline demo from NYC DevOps meetup
○ Ansible, Packer, OpenSCAP scan
● Adds building & destroying:
○ AWS VPC
○ Auto Scaling Group with ELB
○ Simple web application (from DevOps Wall St demo)
● Has Auto Scaling Group rotation
Defenses implemented
This demo implements 4 of the 7 defenses:
1. Access Controls
2. Code Review
3. Human Check on Deployments
4. Opt-In Switches and CAPTCHAs
Lessons Learned
● Keeping people in the deploy loop is good sometimes
● Mashing up Terraform, Ansible, Packer & Jenkins works
● Simple math problem CAPTCHAs are way better than
simple confirmation prompts
● Using Docker for both Packer and Terraform works well
● Working around Docker’s tendency to create root owned
files can be done with Docker busybox and a chown -R
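The last workaround can be sketched as a helper that builds the cleanup command. This is an illustration under assumptions: the `/work` mount point and the function name are hypothetical, and the demo's actual invocation may differ. The idea is that a throwaway busybox container runs as root, so it can hand root-owned files in the mounted workspace back to the CI user.

```python
import os
import shlex


def chown_fix_command(workspace: str, uid: int, gid: int) -> list:
    """Build a docker command that resets ownership of a workspace."""
    return [
        "docker", "run", "--rm",      # remove the container when done
        "-v", f"{workspace}:/work",   # mount the workspace (assumed path /work)
        "busybox",
        "chown", "-R", f"{uid}:{gid}", "/work",
    ]


if __name__ == "__main__":
    # Print the command for the current user and directory (Unix only);
    # pass the list to subprocess.run to actually execute it.
    cmd = chown_fix_command(os.getcwd(), os.getuid(), os.getgid())
    print(shlex.join(cmd))
```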
Extending Defenses Further
● Find a way to integrate a better (real) CAPTCHA
● Use TOTP passwords (Google Authenticator style)
during job approval process
● Hook into metrics to avoid changes at most volatile
times - add extra danger indicators
● Adapt amount of approvals needed per environment
● Chef InSpec / compliance tests / Test Kitchen
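The TOTP idea above needs only the standard library to prototype. The sketch below follows the RFC 6238 algorithm with HMAC-SHA1 and 30-second steps, the defaults commonly used by authenticator apps; the function names are hypothetical, and secret storage and clock-drift tolerance are left out.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    now = time.time() if at is None else at
    counter = int(now // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte
    # select which 4 bytes of the MAC become the code.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, typed: str, at=None) -> bool:
    """Constant-time comparison of the typed code with the current one."""
    expected = totp(secret, at).encode()
    return hmac.compare_digest(expected, typed.strip().encode())


if __name__ == "__main__":
    demo_secret = b"not-a-real-secret"  # placeholder; never hard-code secrets
    print("Current code:", totp(demo_secret))
```

An approval step could call `verify` on the code an approver types, so a deploy needs something the approver has as well as their Jenkins login.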
Thank You!
http://bit.ly/dangerous-jenkins
richard@moduscreate.com
@obscurerichard
