Managing Expensive or Destructive Operations in Jenkins CI

http://bit.ly/dangerous-jenkins
September 26, 2018
Richard Bullington-McGuire
Principal Architect, Modus Create
richard@moduscreate.com
@obscurerichard

My Continuous Integration & CD Experience
Early Experiments - CruiseControl (Java) (2004-2008)
CruiseControl.NET (C#) 2008
Hudson + Windows / Linux + Java: 2009-2011
Jenkins (Linux, Windows, Android, iOS, Java, .NET, Python, Node.js, Angular, React, Objective-C, etc.) 2011-2018
Commercial clients in a wide variety of industries

What is Expensive?
Downtime for critical systems
Deploying Cloud back end services (at scale)
Large scale load testing
Starting large scale analysis jobs - computational fluid dynamics, genomics, etc.

What is Destructive?
● Cloud provisioning: CloudFormation or Terraform
● Very risky: large changes w/ Infrastructure as Code
● Targeting the wrong environment by mistake
“Oops, I just deleted the prod database server!”

Defense Strategies
1. Access Controls
2. Separate Control Systems for Production
3. Code Review
4. Human Check on Deployments
5. Opt-In Switches and CAPTCHAs
6. Small Increments with Metrics & Monitoring
7. Tiered Environments
8. Restricted Branches

Case Study: Jenkins & Terraform at Work
● Education company cloud migration (4 months to production)
● Apps with more than 30,000 requests per minute (RPM) at peak, measured with New Relic
● Production with 80+ sizeable AMIs baseline
● Auto Scaling to 200+ AMIs under heavy load
● Multiple environments: dev, qa, staging, prod
● Terabyte-scale MySQL Aurora cluster, 50+ TB in S3
● All managed with Jenkins, Terraform, Ansible, Packer,
CodeDeploy, multiple tech stacks

What could go wrong?
● Some changes to resources destroy the old resource and create a new one
○ E.g., instances that are not part of auto scaling groups are easy to destroy by accident
● Changes to network environments may require deleting and recreating whole server stacks
● Terraform has an imperfect grasp of some dependencies, so some plans fail to execute
● Accidentally deploying to the wrong environment
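One way to catch destroy-and-recreate changes before they run is to inspect the plan in machine-readable form. The sketch below is an illustration, not part of the original demo. It assumes the plan has been exported with `terraform show -json` (available in Terraform 0.12 and later) and that each entry under `resource_changes` carries a `change.actions` list. The function name `destructive_changes` is hypothetical.

```python
import json
from typing import List


def destructive_changes(plan_json: str) -> List[str]:
    """Return the addresses of resources the plan would delete or replace.

    Assumes the JSON layout produced by `terraform show -json <planfile>`:
    a top-level "resource_changes" list whose items have an "address" and
    a "change" object with an "actions" list.
    """
    plan = json.loads(plan_json)
    flagged = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        # A replacement shows up as a delete paired with a create,
        # so checking for "delete" covers both destroy and replace.
        if "delete" in actions:
            flagged.append(change["address"])
    return flagged
```

A deploy job could print the flagged addresses and require an explicit human approval whenever the list is not empty.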

That’s not exactly Continuous Deployment!
Nope.
But not every organization is ready for that, nor will it work to push every change to every environment without careful review.

Access Controls
Use access controls baked into Jenkins & target systems
● Integrate with Enterprise Directory - AD or Google SSO
● Limit access to deploy jobs
● Use Jenkins secrets or other secret store to hide keys
Example: Microsoft Cloud SSO + Jenkins SAML provider

Separate Environments
Consider separating dev & prod deploy mechanisms
● Have separate CI systems for dev & prod systems
● Limit access to prod systems more strictly
● May help with Sarbanes-Oxley compliance or other enterprise security controls. Could also be just a crutch.
E.g., one Jenkins server per AWS account

Code Review
Bake code review into deployment pipeline
1. Use GitHub pull requests or equivalent mechanisms
2. Require sign-off from tech lead or multiple people
3. Have linting tools to automate some code review chores

Human Check on Deployments
Require human review of critical steps
1. Use Jenkins inputs to pause and provide an abort option
2. Set rules of engagement on how deploys are done
Example: on a mixed team of consultants and company
employees, only do a production deploy with customer staff
watching and signing off

Opt-In Switches and CAPTCHAs
Add speed bumps (CAPTCHAs) to deployments
1. Avoid expensive or destructive operations on every commit
2. Make deployers run a job manually and select expensive or
destructive operations specifically
3. Make deployers solve a simple math problem to slow them
down and think about what they are doing

Confirm?
Type CONFIRM to continue
What do you think happens when people try asking for confirmation like this?

BAD IDEA, proven awful through HARD EXPERIENCE
● People always just type CONFIRM
● Or worse, the browser autofills it!
Confirm? CONFIRM
Type CONFIRM to continue

Solve it! 42
What do you get when you multiply 6 by 7?
A simple math problem works much better in practice
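The math-problem speed bump can be sketched in a few lines. This is a minimal illustration, not the code from the demo: the function names `make_challenge` and `check_answer` are hypothetical, and in a real Jenkins job the question would be shown in an input prompt and the typed answer passed to the check.

```python
import random
from typing import Tuple


def make_challenge(rng=random) -> Tuple[str, int]:
    """Build a small multiplication question and its expected answer."""
    a = rng.randint(2, 9)
    b = rng.randint(2, 9)
    question = f"What do you get when you multiply {a} by {b}?"
    return question, a * b


def check_answer(typed: str, expected: int) -> bool:
    """Accept only the correct number; anything else (e.g. CONFIRM) fails."""
    try:
        return int(typed.strip()) == expected
    except ValueError:
        return False
```

Because the operands change on every run, the answer cannot be typed from habit or filled in by the browser.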

Small Increments with Metrics & Monitoring
Deploy small increments at a time and monitor everything:
1. Make small changes and test them independently
2. Monitor target systems with New Relic or similar systems
3. Write automated tests for the small increments
4. Write load tests that can test your system at scale

Restricted Branches & Deploy Constraints
Require some changes to go through a certain branch:
1. Restrict prod deployments to the master branch
2. Ensure changes have gone through code review & testing
3. Make deploy jobs fail fast if they violate constraints
4. Build safety switches into the code that must be removed in a commit before certain types of dangerous changes can run
5. Restrict deploys to certain times or by metrics
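The constraints above can be collected into one check that runs at the start of a deploy job. This is a sketch under stated assumptions: the environment name "prod", the weekday 09:00-16:59 deploy window, the 1% error-rate threshold, and the function names are all hypothetical values chosen for illustration.

```python
from datetime import datetime
from typing import List


def constraint_violations(environment: str, branch: str, now: datetime,
                          error_rate: float,
                          max_error_rate: float = 0.01) -> List[str]:
    """List every deploy constraint that the requested deploy would violate."""
    violations = []
    if environment == "prod":
        # Restrict prod deployments to the master branch.
        if branch != "master":
            violations.append(f"prod deploys must come from master, not {branch}")
        # Restrict deploys to certain times (hypothetical window).
        if now.weekday() >= 5 or not (9 <= now.hour < 17):
            violations.append("prod deploys are limited to weekdays, 09:00-16:59")
    # Restrict deploys by metrics (hypothetical error-rate threshold).
    if error_rate > max_error_rate:
        violations.append(f"error rate {error_rate:.2%} exceeds {max_error_rate:.2%}")
    return violations


def fail_fast(violations: List[str]) -> None:
    """Abort the job immediately if any constraint is violated."""
    if violations:
        raise SystemExit("Deploy blocked:\n" + "\n".join(violations))
```

Calling `fail_fast(constraint_violations(...))` as the first step stops the job before any expensive or destructive work begins.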

Implementation

CI Environment
[Diagram labels: Jenkins; Packer Provision; Terraform Provision; Elastic Load Balancer; EC2 Auto Scaling Group - Web App]

Demo

Step By Step
● Builds on CIS Baseline demo from NYC DevOps meetup
○ Ansible, Packer, OpenSCAP scan
● Adds building & destroying:
○ AWS VPC
○ Auto Scaling Group with ELB
○ Simple web application (from DevOps Wall St demo)
● Has Auto Scaling Group rotation

Defenses implemented
This demo has 4 of the 8 defenses implemented:
1. Access Controls
2. Code Review
3. Human Check on Deployments
4. Opt-In Switches and CAPTCHAs

Lessons Learned
● Keeping people in the deploy loop is good sometimes
● Mashing up Terraform, Ansible, Packer & Jenkins works
● Simple math problem CAPTCHAs are way better than
simple confirmation prompts
● Using Docker for both Packer and Terraform works well
● Working around Docker’s tendency to create root owned
files can be done with Docker busybox and a chown -R

Extending Defenses Further
● Find a way to integrate a better (real) CAPTCHA
● Use TOTP codes (Google Authenticator style) during the job approval process
● Hook into metrics to avoid changes at most volatile
times - add extra danger indicators
● Adapt amount of approvals needed per environment
● Chef InSpec / compliance tests / Test Kitchen
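The TOTP idea in the list above could be prototyped with the Python standard library alone. The sketch below follows the RFC 6238 algorithm (HMAC-SHA1, 30-second time step) and is an illustration only: the function names are hypothetical, and how the shared secret is stored and how the code reaches the approval step are left open.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, at_time: float, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 time-based one-time code using HMAC-SHA1."""
    counter = int(at_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick
    # the offset of a 4-byte window, whose top bit is then masked off.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def approval_is_valid(secret: bytes, typed_code: str, at_time: float) -> bool:
    """Check the approver's code against the expected one in constant time."""
    return hmac.compare_digest(totp(secret, at_time), typed_code.strip())
```

A production version would usually also accept the codes for the adjacent time steps to tolerate clock drift.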

Thank You!
richard@moduscreate.com
@obscurerichard
