This document summarizes Gene Gotimer's presentation on bringing continuous delivery to a Department of Defense (DoD) project. It discusses the challenges of integrating commercial off-the-shelf products for the DoD, including custom integrations and infrequent release cycles. It then outlines Gotimer's approach to incrementally implementing DevOps practices like continuous integration, automated testing, deployments, and security testing over 41⁄2 years to help enable more reliable and frequent releases despite cultural resistance from some teams. Gotimer provides advice that continuous delivery should emerge from removing bottlenecks rather than being an explicit goal, and emphasizes adopting changes incrementally and focusing on establishing a repeatable, reliable process.
Unlocking the Future of AI Agents with Large Language Models
Bringing Continuous Delivery to the DoD
1. # B 2 0 C O N
ITSM DEVOPS CONFERENCE
Bringing Continuous
Delivery to the DoD
G e n e G o t i m e r
S e n i o r A r c h i t e c t , C o v e r o s , I n c .
#B 2 0 CON
2. # B 2 0 C O N
• COTS product integration for DoD
– custom Python glue
– and Java, PHP, Perl
• Releases every 6 months or so
– Freeze 2-4 weeks in advance
– Deploy Friday evening to Sunday
afternoon
– Repair broken functionality Monday
and Tuesday (and on)
• Barely starting Agile
– Daily Stand-ups
• (really daily status calls)
– 2-week Sprints
– Good, pruned backlog
– No automated testing
– No unit tests
– No continuous integration
The Project
3. # B 2 0 C O N
• Development – Local
– 2 Developers
– 1 Business Analyst
– 1 Project Manager
• DISA PMO
– 1 Program Manager
– 1 Chief Engineer
– 1 Program Director
– 1 Systems Engineer
• Test and Integration – Remote
– 4-6 Testers
– 4-6 Integrators
• including security experts
– 1 Information Assurance
• Off-team
– Systems Administrators
• hardware and software
The Delivery Team
4. # B 2 0 C O N
The Problem
¯_(ツ)_/¯
“It works on my machine!”
- Every developer, at some point
= HIGH RISK DEPLOYS
5. # B 2 0 C O N
DevOps is…
“How long would it take your organization to deploy a change that
involves just one single line of code?
Do you do this on a repeatable, reliable basis?”
- Mary and Tom Poppendieck
Implementing Lean Software Development: From Concept to Cash
6. # B 2 0 C O N
DevOps is…
“The goal of DevOps is not just to increase the rate of change, but
to successfully deploy features into production without causing
chaos and disrupting other services, while quickly detecting and
correcting incidents when they occur.”
- Gene Kim
Top 11 Things You Need to Know About DevOps
7. # B 2 0 C O N
Continuous Delivery
• Make releasing a business decision, not a technical decision
• High-confidence releases
– Small releases
– Fully tested
– No expectation of problems
• Hotfix releases
– Possible, no more than moderate risk and moderate coordination
8. # B 2 0 C O N
Continuous Deployment
• Continuous Deployment was not a goal
9. # B 2 0 C O N
• Started with things that were in
our control
– Dev and Test environments
– Development process
• Make changes behind the scenes
– Free/open source tools
– Easy to integrate into our CI system
– Small changes
• Disclose the changes when there
was a win
– Highlight ease of use
– Use as justification for higher
environments
The Approach
10. # B 2 0 C O N
1. Continuous Integration
2. Functional Testing
3. Automated Deploys
4. Security Testing
5. Performance
6. Culture Clash
4½ Years
• September 2009 – March 2014
The Journey
11. # B 2 0 C O N
• Trouble explaining “integration”
– between two or more developers
– not between systems
• Set up SecureCI one afternoon
• Explained the advantages later
• Wired to the ALM tool we had
– Jenkins (Hudson at the time)
– Nexus
– SonarQube (Sonar at the time)
– Automated builds
• Ant, Maven
• PMD, FindBugs, Checkstyle
• Cobertura
• Later added Python tools
1. Continuous Integration
12. # B 2 0 C O N
2. Functional Testing
• Functional testing was done manually
– from a script written in Microsoft Word
• We waited a year before staging a coup
– we didn’t want to encroach on their domain
• Demo of Selenium
– demonstrated record-and-playback through the Selenium IDE
– we recorded the first set of tests
– then turned it back over to the test team
Sound from soundbible.com, CC BY 3.0 US
13. # B 2 0 C O N
2. Functional Testing
• They argued later that automated testing was ineffective
– the automated script (singular) only worked one time, then needed to
be re-recorded when any changes got made to the app
Lesson Learned:
Automated testing isn’t just about replacing
manual tests with a test framework.
It requires a different way of thinking.
14. # B 2 0 C O N
2. Functional Testing
• We took it back
• Rewrote existing tests in Java
• Showed our business analyst how to clone-and-mutate the Java
tests
• Started with JUnit, but went to TestNG
– better tagging and parameterization
– pre-test run initialization
15. # B 2 0 C O N
2. Functional Testing
• Development team had more confidence in releases
• Also began testing user roles
– Security testing = what can this type of user NOT do
Lesson Learned:
Should have focused on demonstrating that
there were fewer escaped defects.
It was hard to point to a clear benefit.
16. # B 2 0 C O N
• Project Manager came across the
book in a book store
• Everything made so much sense
• Logical extension of what we
were trying to do
• Addressed a lot of the issues we
were running into
• No money or time for an effort,
so we adopted it as our long-
term goal
Continuous Delivery
17. # B 2 0 C O N
3. Automated Deploys
• Started with automating a Drupal web
server install
– new system, not yet in production
– database server was easy, so we skipped it
for now
• Then automated the manual COTS install
• Then started reverse engineering the
broken COTS installer
18. # B 2 0 C O N
3. Automated Deploys
• Down the road, realized we could automate everything
– Doesn’t just reduce risk, also speeds up the process
Lesson Learned:
Automate everything- even the easy stuff.
When it is easy to install, you’ll stumble
across more reasons to install it.
Go from Why? to Why not?
19. # B 2 0 C O N
• No Puppet Enterprise Server
– just manually ran puppet apply
from the command line
– every system (DB, Web server, SVN
server, ALM tool) used the same
puppet apply command
• Vagrant would have been helpful
for local deploys
– Just hadn’t heard of it
3. Automated Deploys
20. # B 2 0 C O N
4. Security Testing
• Noticed extra processes running
• Dev system in cloud with default
password
• Tested Security Blanket
– just purchased by Raytheon
– couldn’t get it purchased
21. # B 2 0 C O N
4. Security Testing
• Decided we needed at least some security in dev
– System hardening
– Web application scanning
• We knew it couldn’t replace the “official” testing
– plus, we didn’t want to encroach on their domain
22. # B 2 0 C O N
4. Security Testing
• Knew we had some good base for security
– CI, static analysis, user role testing
• Wanted a security scanner
– at the time, none worked with client certificates out of the box
• Found w3af
– Python
– customizable
– client certificate support was there, but not exposed
– handed it over to the “security experts” on the integration team
23. # B 2 0 C O N
4. Security Testing
Found 0 vulnerabilities!
24. # B 2 0 C O N
4. Security Testing
• Never got past the login screen
• Never read the output or log
• So we took it back
– Eventually had problems getting customized w3af
to work properly
– Switched to OWASP ZAP, run manually
• Security team focused on STIG and SELinux
– that was their expertise anyway
25. # B 2 0 C O N
4. Security Testing
• Lost a lot of faith in us
• Information Assurance isn’t the same as Security
Lesson Learned:
Protect every system, everywhere.
Many hacks are just there for the system,
not the data.
26. # B 2 0 C O N
• Over a few days, implemented
OpenSCAP in Jenkins for STIG
– immediately found issues
– started adding Puppet manifests for
remediation
• Started using Nikto2 for web
server scanning
– immediately found issues
• Started running weekly scans of
dev and test using OpenVAS
– no immediate issues, but started
seeing package security updates
before they became IAVMs
• Discovered SELinux was in
permissive mode
– had never been in enforcing
4+. Security Testing
27. # B 2 0 C O N
4+. Security Testing
• Easier audits
• Proactive security upgrades
• Much better relationship with the data center
Lesson Learned:
Benefits of security testing go
beyond increased security.
28. # B 2 0 C O N
• Applying STIG to database server
– seemed like it was getting slower
• Used JMeter to get baseline
• Took rough breakdown of most
common queries
• Repeated as a 15-minute test
• Monitored trend
• Added similar testing to
functional tests, another 15 mins
• Also, number of functional tests
was growing slowly
• Watched functional test elapsed
time as rough guide
5. Performance
29. # B 2 0 C O N
5. Performance
• Watching trends can be very worthwhile
• Some testing can be almost as valuable as full testing
Lesson Learned:
A baseline can be a great safety net.
30. # B 2 0 C O N
• Continuous Delivery was being
openly discussed
– PMO had just started thinking of it
as a clear plan
– Kept asking when “continuous
delivery” would be delivered, and
how it would be packaged
• Test and Integration started
complaining
– 3½ of us were pushing the 12+ of
them too hard
– moving too fast
– not a risk or control complaint,
merely effort
• People on test and integration
team started leaving
– including “Burt”
6. Culture Clash
31. # B 2 0 C O N
6. Culture Clash
• Benefits were growing clear
• Effort was minimal
• No active resistance
Lesson Learned:
Do not underestimate cultural inertia.
Some will not or cannot ever make the mental shift.
32. # B 2 0 C O N
• Test and Integration decided not
to renew their contract
– all remaining personnel ended
project with a month
• Security issue found the following
week
– deployed 3 days later
• Went back to 2-week deploy
cycles, sometimes faster
• Left 3 people on development
team
– One went back to take over for the
test and integration team as hands-
on-keyboard
– BA left project and another came in
½ time for testing
• Dropped into maintenance mode
The Aftermath
33. # B 2 0 C O N
• Development – Local
– 1 Developer
– 1 Release Manager
– ½ Tester
• DISA PMO
– 1 Program Manager
– 1 Chief Engineer
– 1 Program Director
– 1 Systems Engineer
• Test and Integration – Remote
– 1 Information Assurance
• Off-team
– Systems Administrators
• hardware and software
The Delivery Team
34. # B 2 0 C O N
• Barely Agile
– Maintenance only
– Kanban-ish
• tracking work in progress
– Daily Stand-ups
• (really daily status calls)
– 2-week Sprints
• Releases prepared every 2 weeks
– Soft freeze Thursday for Friday
release
– Deploy Friday evening
– 100% working functionality Friday
evening
– Non-event
The Project
35. # B 2 0 C O N
The Project
• Puppet took the configuration parameters
– from 200+ untracked values
– to ~30 Hiera-controlled values
• Biggest coordination issue: 72 hours for user messaging
• Biggest time consumer: 3-6 hours for VM clones
36. # B 2 0 C O N
My Advice
Lessons Learned:
DevOps and Continuous Delivery are not a goal.
Do not set out to do DevOps or CD.
Remove road blocks and bottlenecks.
Fix quality issues.
Be more responsive to change.
37. # B 2 0 C O N
My Advice
Lessons Learned:
Adopt change incrementally.
As you build a repeatable, reliable process for
delivering software, CD will “magically” appear.
38. # B 2 0 C O N
My Advice
• Read Continuous Delivery and The Phoenix Project
39. # B 2 0 C O N
• Automated deploys
– more valuable than just reducing
risk
• Vagrant
• Some security scanning earlier
– do not just assume someone else is
doing it
• Some performance testing earlier
– some is a lot better than none
– maybe almost as good as a lot
• We relied on client-side
certificates for authentication
– EJBCA should have been set up
immediately
• Upgrades are a huge time sink
– components, libraries, applications,
system software
– add tools to track it as early as
possible
Missed Opportunities
40. # B 2 0 C O N
• Jenkins
• Puppet (no Puppet Enterprise)
– 2 puppet apply commands per
server
• one --noop for system audit
• one for deploy
• Security
– OpenSCAP (every deploy, minutes)
– OpenVAS (every weekend, hours)
• included Nikto2
• used Kali Linux
– OWASP Dependency Check (on-
demand, many minutes)
– OWASP Zed Attack Proxy (on-
demand, few days)
– Full role-based Selenium test
coverage (every deploy, overnight)
• 10k+ Selenium tests via TestNG
The Tool Chain
41. # B 2 0 C O N
• Testing
– TestNG for Java unit tests
– Nose for Python unit tests
– Mockito/Mockito for Python
• JMeter
– for some representative
performance tests
• Static Analysis - Java
– PMD
– FindBugs
– Checkstyle
– Cobertura
– SonarQube
• Static Analysis - Python
– Pylint
– coverage.py
The Tool Chain
42. # B 2 0 C O N
Gene Gotimer
gene.gotimer@coveros.com
@CoverosGene
Questions?
43. # B 2 0 C O N W W W. B E Y O N D 2 0 C O N F E R E N C E . C O M
THANKS FOR JOINING THE SESSION!
LET US KNOW WHAT YOU THOUGHT.
ITSM DEVOPS CONFERENCE
Editor's Notes
It isn't always feasible to get top-level buy-in to say "let's do DevOps." I'll discuss techniques and tools we've used to bring a DevOps mindset and Continuous Delivery practices into an environment that wasn't already Agile.
This is a war story. I'll talk about how we were able to start in development, where we had the most control, with a "let's starting being Agile" initiative and working on "why is continuous integration important?" From there, we incrementally brought our practices through "higher environments" until the project was confidently delivering working, QA-tested, security-tested releases, ready for production every two weeks.
Coveros is a consulting company that helps organizations build better software. We provide software development, application security, QA/testing, and software process improvement services. Coveros focuses on organizations that must build and deploy software within the constraints of significant regulatory or compliance requirements. The primary markets we serve include: DoD, Homeland Security & associated critical infrastructure companies, Healthcare providers, and Financial services institutions
There were other people on the team, but these were the technical people directly involved in getting features and releases out the door.
Test and Integration team was responsible for all testing, all installation processes, all security testing/scanning, and coordination with Operations. We could not talk to the sysadmins. No hands on keyboards.
Typical government contract, the Test/Integration/Security team was a different contract, although we worked well together and didn’t have near as many issues as some cross-contract teams do. Or so we thought.
Definitely not trying to slam the other team. They worked better than many gov’t contractors, and their world was the traditional DoD, long-term, stable, slow-and-steady wins the race type of projects. This journey ended up being a huge culture shock to them.
-- Every developer, everywhere, at some point
In this case test and integration team.
Ran on servers that were configured differently, with different security restrictions because it was in a different data center.
High drama, high risk, lots of deliberation
We didn’t know it at the time, but in retrospect we had every anti-DevOps stereotype: painful releases, so do fewer, with more changes
Also, deliberate throwing over the wall between integrator testing/documenting the deploy and integrator doing the deploy. That is how the team was making sure the deploy notes were complete, if someone could install the software sight unseen with the documentation they had prepared. They saw that as part of separation of duties.
Everyone has their own definition, so we were thinking of it as
So when I say we got to a Continuous Delivery process, this was what we achieved
This is the approach that worked for us.
THIS IS NOT PRESCRIPTIVE! Your situation is different, so your approach may be different.
Also, while it is nothing we did, we had air cover from a really, really strong advocate and a directive to be an “exemplar” project, and show other DoD projects that they could be Agile and show how to do it. We never would have succeeded without a champion to buy us the time and flexibility to undertake the changes to the process we made.
In all cases, we started with the pieces that were within our control (dev/test) and showed the value there as justification to push out further to “higher” environments and outside our immediate team.
A lot more overlap than I’m going to describe, but it did happen roughly in these waves over 4 years.
When we started, we were not driving to a CD process. We didn’t even know what CD was.
Continuous integration to them was a two week test cycle that could be kicked off with roughly 6 weeks of lead time run once or twice a year. We explained that was neither continuous nor developer integration.
They didn’t see the value in CI, but didn’t see any harm since it was just an afternoon to get Jenkins set up anyway. We stuck with primarily open source software, because this wasn’t an explicitly funded effort. Plus it made it instant (no lead time for procurement).
This gave us a strong basis for CD later, although we didn’t know it at the time.
Lesson learned: CI is valuable, but outside the dev team it isn’t obvious. Also, the biggest advantage to open-source tools is often acquisition time, not acquisition cost.
After a few months of CI, functional testing became the biggest bottleneck and showed the least value.
Lesson learned: Automated testing isn’t just replacing manual with a test framework. There is a different way of thinking.
The test team was happy not to be burdened with testing.
Since it was COTS, focused on testing system interfaces, not application functionality
Positive and negative user role testing was a great idea in retrospect. Strong basis for security and highest risk point when adding new functionality.
Lesson learned: Should have focused on demonstrating that there were fewer escaped defects. It was hard to point to a clear win.
Project manager dropped to ½ time to free up budget, hired release engineer as Puppet “expert” even though he didn’t know Puppet at the time.
Chose Puppet almost at random. Chef or Ansible would have worked just as well. None of them is a wrong choice, and anything is better than nothing.
Lucky timing: Integration team was focusing or had just finished a 6-month, full team effort to update OS from 32-bit to 64-bit.
COTS install was a crap shoot, didn’t always work, took several days, never seemed to end up installed the same way. When we called the vendor, they confirmed their professional services had a 50% success rate. Their solution was just to try again. It almost always works on the second try.
Used Drupal success as a clear win for automating COTS install.
Integration team was happy not to be burdened with integration or documenting the install process.
COTS vendor eventually called us and asked for our Puppet code. We said no. Largely out of spite.
Lesson learned: automate everything, even the easy stuff. It isn’t just to reduce risk, but also to speed things up. If you can install it easily, you will stumble across reasons to install it more often. We went from Why? to Why not?
Couldn’t easily get funds for license, extra server, nor accreditation for Puppet Enterprise.
Also, Vagrant would have been useful here had we known about it. With so few developers, coordination was easy and we almost never had conflicts. But Vagrant would have been easier if even one more dev added, especially if not colocated.
Had a hacker get into a dev system with a default password. Just an email bot, not a directed attack.
But we realized how lax we were with security, even in the safety of a dev env.
Again, open source acquisition was faster. Raytheon had just purchased Security Blanket and no one knew how to sell it to us.
3 months of effort, finally got results
Announced on daily call
Our non-technical BA was able to see the issue right away.
Never got past the login screen
But didn’t start at the beginning, so they even missed a XSS bug on the home page
The security guys were happy not to be burdened with security testing. They were IA, really checklist guys anyway.
Security Technical Implementation Guide
Lesson learned: IA not the same as security. And protect every system everywhere, it doesn’t matter if it has “production” data, many hackers just want the system.
Pattern recognition began to set in.
In their defense, this was automated vs. manual checks, not really incompetence
Found SELinux when Puppet code ran and assumed SELinux was enforcing, COTS product could not work in enforcing
Lesson learned: The benefits aside from increased security are significant: easier audits, proactive security upgrades on our schedule, and in the long term a much better relationship with the data center Ops guys.
We were applying STIG to database settings and noticed the database got slower
We never got around to adding true L&P or stress, or even response times.
Lesson learned: Even without formal guidance, a baseline is a great safety net. A relatively short test to show the trend over time is very worthwhile.
Remember, we were doing the development, testing, writing functional tests, security scans, and automating the deploys
“Burt”- no last name because I never met him in 2-2½ years. Nor heard of him, before or after the status call when they announced he was leaving. He wasn’t on the call that day. Never spoke up on the daily status call. Never showed up to a release planning meeting that happened every 6 months, sometimes in their offices. No tasks assigned nor delivered. Never supported a deploy. They were sad to see him go.
No back filling positions, just let the test and integration team atrophy.
Lesson learned: No matter how many advantages and benefits you show, even if there is little effort to be expended, some people/teams/orgs will never make the mental shift. It wasn’t active resistance, they just couldn’t/didn’t make the mental shift.
These results have been reinforced by our experiences at other gov’t agencies and commercial clients
5 years in, 4 years since CI introduced. May have been a little more notice, but I don’t believe so.
Used to have 2 deploys a month split between two sets of properties, dwindled to no more than once a month due to “moving too fast”
Smaller local team, only 1 on remote
to ~30 Hiera-controlled through standardization and composition
Jeff Payne- “Don’t do Agile, you have to be Agile”
Not focusing on DevOps/CD as a goal will help you prioritize what needs to be improved and will show you benefits sooner
Incremental adoption
less culture shock
more visible and concrete benefits sooner
Monty Hall Let's Make a Deal (1963–1977)
said “Actually, I'm an overnight success, but it took twenty years.”http://www.brainyquote.com/quotes/quotes/m/montyhall192769.html#QTimGb64Le97MELL.99
5 envs: dev, test, uat, staging/RACE, production
OWASP Dep Check every new library and periodically
OWASP ZAP any change in security posture
Enough functional testing, regression testing, performance testing, and security testing to give us confidence