SpringOne 2021
Session Title: Platform Agility, Reliability, and Security: Can You Really Have it All in the Federal Sector?
Speakers: Dan Loomis, Product Manager at VMware; Matthew Goehring, Platform Engineer at Scientific Research Corporation
1. Platform Agility, Reliability, and Security
Can you really have it all in the Federal sector?
Dan Loomis - Product Manager, VMware
Matt Goehring - Platform Engineer, SRC
2. Section 31 @ Kobayashi Maru
Kobayashi Maru
Section 31
Thousands of users
United States Space Force
Combined Space Operations Center (CSpOC)
7 application teams, 9+ applications, 30+ developers
Cloud Native Platform, built on Cloud Foundry
Strategy - Planning - Tasking - Support
Space Tasking Cycle
Delivers capabilities that support
3. Kobayashi Maru @ SpringOne
Platform Agility, Reliability, and Security: Can you really have it all in the Federal sector?
Driveway to Highway: Driving Outcomes with Infrastructure as a Product
Saving the DoD $800M: How Portfolio Management is the Missing Link Between Agile and Waterfall
4. The Team
Tachyon @ San Diego
CSpOC
Kobayashi Maru
Tachyon @ Colorado
Tachyon @ NC
Tachyon @ NY
5. The Section 31 software factory operates like a modern cloud-native commercial enterprise.
Meaning…
Our software teams continuously push to deliver capabilities to production with the speed and reliability of a startup. This creates tension when overlaying the strict security requirements of the U.S. Space Force. So...
The platform and infrastructure teams must adapt and help bridge these two worlds.
The balance between Agility, Reliability, and Security
Security
Reliability
Agility
Tension!
6. What we provide
Tanzu Application Service (part of Pivotal Cloud Foundry)
● Ops Manager: LTS
● Tanzu Application Service: LTS by Sept. 2021
● MySQL: database
● RabbitMQ: messaging
● Spring Cloud Services: configuration
● CredHub: secrets
● MinIO: file storage
● Concourse: for Platform Automation and app deployments
Six non-production and production foundations on multiple IaaSs and security enclaves
The developer-ready Azure environment was built in weeks, not months!
7. How we operate like an XP team
Outcome-based roadmapping
Iteration planning (in a backlog)
Daily standups
Pairing
Pair negotiation
Story acceptance
User-centered design
Retrospective
WEEKLY
Product Management Team
Platform Engineering Team
Product Management also drives:
● Branding and marketing
● Stakeholder engagement
● Process improvements
8. Where we work
Source: https://en.m.wikipedia.org/wiki/File:Intel_GreenDoor.jpg
Our customer, the software teams...
...and where we work
9. What we do
Onsite Operations
● Platform and app monitoring
● Application deployments
● Platform security and product patching
● Incident management
Remote Engineering
● Automation!
● Standing up new foundations
● Standing up new products
● Enabling software team requirements
● Reducing toil
Cleared people working in disconnected environments, with rotations due to COVID.
Uncleared people working at remote locations. No travel due to COVID.
Initial design and engineering in our unclassified IL4 development environment
High-side implementation and day 2 ops
10. How we deploy
[Diagram: foundations labeled Non-Prod, Staging, and Prod across AWS, Azure, and vSphere; environments labeled Dev, Acceptance, and Pre-Release]
Platform Automation enables: Paving, Patching, Upgrades. Product version and configuration parity across all foundations. Maintenance during regular business hours.
Dev-Sec-Rel App Pipeline
● Build
● Security Scans
● Release Management
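The parity goal on this slide (the same product versions on every foundation) can be illustrated with a small drift check. This is a minimal sketch, not the team's tooling: the `FOUNDATIONS` inventory, the foundation and product names, the version strings, and the `find_version_drift` helper are all hypothetical. In practice the inventory would have to be exported from each foundation by some means the talk does not describe.

```python
from collections import defaultdict

# Hypothetical inventory: foundation name -> {product: installed version}.
# Names and versions are illustrative only, not taken from the talk.
FOUNDATIONS = {
    "non-prod": {"ops-manager": "1.0.0", "tas": "1.2.0", "mysql": "3.1.0"},
    "staging": {"ops-manager": "1.0.0", "tas": "1.2.0", "mysql": "3.1.0"},
    "prod": {"ops-manager": "1.0.0", "tas": "1.1.9", "mysql": "3.1.0"},
}


def find_version_drift(foundations):
    """Return {product: {version: [foundations]}} for products not at parity.

    A product is out of parity if foundations disagree on its version,
    or if it is missing from at least one foundation.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for foundation, products in foundations.items():
        for product, version in products.items():
            seen[product][version].append(foundation)

    all_foundations = set(foundations)
    drift = {}
    for product, versions in seen.items():
        installed_on = {f for names in versions.values() for f in names}
        if len(versions) > 1 or installed_on != all_foundations:
            drift[product] = {v: sorted(names) for v, names in versions.items()}
    return drift


if __name__ == "__main__":
    for product, versions in sorted(find_version_drift(FOUNDATIONS).items()):
        print(product, versions)
```

A check like this could run as one step in an automation pipeline and fail the build when any product is reported, which is one way to keep drift visible between patch cycles.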
11. How we automate
Terraform for standing up the IaaS
Platform Automation + Concourse for managing:
● New foundations
● New products
● Product and security patching
● Application deployments
12. How Tanzu Application Service helps
● Cloud Foundry (CF) is an accredited and trusted software product, so delivering new capabilities on CF is relatively fast.
● A small platform team can scale up to more foundations, apps, and users.
● BOSH keeps the VMs up and running -> everything runs on auto-pilot.
● Regular patching is a non-event: three Availability Zones, apps deployed as multiple instances, and most services running HA. Again, BOSH takes care of patching the VMs.
● Most platform products are self-serve, e.g. MySQL, RabbitMQ, etc.
● The same platform experience can be stood up on multiple IaaS and security enclaves.
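Why multiple instances across three Availability Zones make patching a non-event can be shown with a toy model. This is a simplified sketch and not BOSH's actual update algorithm: it assumes instances are spread round-robin across AZs and that one whole AZ is taken down at a time. The AZ names and both helper functions are hypothetical.

```python
# Simplified model, not BOSH's actual update algorithm.
# Assumption: one whole AZ is patched at a time; AZ names are illustrative.
AZS = ["az1", "az2", "az3"]


def place_instances(count, azs):
    """Spread `count` app instances across the AZs round-robin."""
    return [azs[i % len(azs)] for i in range(count)]


def min_available_during_patch(placement, azs):
    """Return the fewest instances still serving while any single AZ is patched."""
    return min(
        sum(1 for az in placement if az != patched_az)
        for patched_az in azs
    )


if __name__ == "__main__":
    for count in (1, 2, 3, 6):
        placement = place_instances(count, AZS)
        worst = min_available_during_patch(placement, AZS)
        print(f"{count} instance(s): at least {worst} available during patching")
```

Under these assumptions, a single-instance app drops to zero available instances when its AZ is patched, while any app with two or more instances keeps serving throughout. That is the reason the slide pairs multiple instances with multiple AZs.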
13. Other Federal Challenges
● It’s almost impossible for our application teams to release and support their own applications in production, since many do not have clearances. This puts more pressure on the Tachyon team to do this on their behalf.
● Not everyone on the platform team can get into these disconnected, secure networks to do production work.
● Requests that flow outside of the program can take weeks or months to approve.
● Incident management - can’t talk about it in unclassified forums!
● Lingering pressure to get to K8s.
14. Lessons Learned
● Managing the platform as a product really helped us deliver the capabilities our developers want.
● Being part of a software factory helped us align to a cloud-native mindset.
● Healthy relationships are HUGE - especially with Cyber and teams outside of our program. And our customers of course. :-)
● Don’t assume cross-agency or department processes will be fast.
● Follow happy-path processes first, even if they’re frustratingly inefficient. This establishes trust; worry about optimizing processes later.
Platform Agility, Reliability, and Security - yes, these are achievable outcomes!