Platform Agility, Reliability, and Security
Can you really have it all in the Federal sector?
Dan Loomis - Product Manager, VMware
Matt Goehring - Platform Engineer, SRC
Section 31 @ Kobayashi Maru
Kobayashi Maru
Section 31
Thousands of users
United States Space Force
Combined Space Operations Center (CSpOC)
7 application teams, 9+ applications, 30+ developers
Cloud Native Platform built on Cloud Foundry
Delivers capabilities that support the Space Tasking Cycle: Strategy - Planning - Tasking - Support
Kobayashi Maru @ SpringOne
● Platform Agility, Reliability, and Security: Can you really have it all in the Federal sector?
● Driveway to Highway: Driving Outcomes with Infrastructure as a Product
● Saving the DoD $800M: How Portfolio Management is the Missing Link Between Agile and Waterfall
The Team
● Tachyon @ San Diego
● Tachyon @ Colorado
● Tachyon @ NC
● Tachyon @ NY
● CSpOC
● Kobayashi Maru
The Section 31 software factory operates like a modern cloud-native commercial enterprise.
Meaning...
Our software teams continuously push to deliver capabilities to production with the speed and reliability of a startup. This creates tension when overlaid with the strict security requirements of the U.S. Space Force.
So...
The platform and infrastructure teams must adapt and help bridge these two worlds.
The balance between Agility, Reliability, and Security
[Diagram: Security, Reliability, and Agility, with "Tension!" between them]
What we provide
Tanzu Application Service (part of Pivotal Cloud Foundry)
● Ops Manager: LTS
● Tanzu Application Service: LTS by Sept. 2021
● MySQL: database
● RabbitMQ: messaging
● Spring Cloud Services: configuration
● CredHub: secrets
● MinIO: file storage
● Concourse: for Platform Automation and app deployments
Six non-production and production foundations on multiple IaaSs and security enclaves
The developer-ready Azure environment was built in weeks, not months!
How we operate like an XP team
Outcome-based roadmapping
Iteration planning (in a backlog)
Daily standups
Pairing
Pair negotiation
Story acceptance
User-centered design
Retrospective
[Diagram: a WEEKLY cycle of the practices above, shared by the Product Management Team and the Platform Engineering Team]
Product Management also drives:
● Branding and marketing
● Stakeholder engagement
● Process improvements
Where we work
Source: https://en.m.wikipedia.org/wiki/File:Intel_GreenDoor.jpg
Our customer, the software teams... ...and where we work
What we do
Onsite Operations
● Platform and app monitoring
● Application deployments
● Platform security and product patching
● Incident management
Remote Engineering
● Automation!
● Standing up new foundations
● Standing up new products
● Enabling software team requirements
● Reducing toil
Onsite: cleared people working in disconnected environments, with rotations due to COVID. High-side implementation and day 2 ops.
Remote: uncleared people working at remote locations. No travel due to COVID. Initial design and engineering in our unclassified IL4 development environment.
How we deploy
[Diagram: foundations on AWS, Azure, and vSphere across Non-Prod, Staging, and Prod]
● Dev
● Acceptance
● Pre-Release
Platform Automation enables: Paving, Patching, Upgrades. Product version and configuration parity across all foundations. Maintenance during regular business hours.
Dev-Sec-Rel App Pipeline
● Build
● Security Scans
● Release Management
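One way to picture "product version and configuration parity across all foundations" is a small drift check. The sketch below is only an illustration, not the team's tooling: the function name, the foundation names, and every version number are hypothetical, and it assumes the installed product versions for each foundation have already been collected into a plain dictionary.

```python
from collections import defaultdict


def find_version_drift(foundations):
    """Return {product: {version: [foundations]}} for products whose versions differ.

    `foundations` maps a foundation name to a {product: version} dict.
    Products that report the same version everywhere are left out.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for foundation, products in foundations.items():
        for product, version in products.items():
            seen[product][version].append(foundation)
    return {
        product: {version: sorted(names) for version, names in by_version.items()}
        for product, by_version in seen.items()
        if len(by_version) > 1  # more than one version in use means drift
    }


# Hypothetical inventory: names and versions are made up for illustration.
inventory = {
    "dev": {"tas": "2.11.3", "mysql": "2.10.1"},
    "staging": {"tas": "2.11.3", "mysql": "2.10.1"},
    "prod": {"tas": "2.11.2", "mysql": "2.10.1"},
}
drift = find_version_drift(inventory)
print(drift)
```

With this made-up inventory, only `tas` is reported, because `prod` is one patch behind. A product missing entirely from a foundation is not flagged by this sketch.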
How we automate
Terraform for standing up the IaaS
Platform Automation + Concourse for managing:
● New foundations
● New products
● Product and security patching
● Application deployments
How Tanzu Application Service helps
● Cloud Foundry (CF) is an accredited and trusted software product, so delivering new capabilities on CF is relatively fast.
● A small platform team can scale up to more foundations, apps, and users.
● BOSH keeps the VMs up and running -> everything runs on auto-pilot.
● Regular patching is a non-event: three Availability Zones, apps deploying multiple instances, and most services running HA. Again, BOSH takes care of patching the VMs.
● Most platform products are self-serve, e.g. MySQL, RabbitMQ, etc.
● The same platform experience can be stood up on multiple IaaSs and security enclaves.
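Why patching can be a non-event with three Availability Zones and multiple app instances can be shown with a toy calculation. This is a simplified model and not how BOSH actually schedules updates: it assumes one whole AZ is taken down at a time, and the function name and instance layouts are hypothetical.

```python
def min_available_during_rolling_patch(instances_by_az):
    """Lowest number of instances still serving while one AZ at a time is patched."""
    total = sum(instances_by_az.values())
    # Taking an AZ offline removes exactly that AZ's instances from service.
    return min(total - count for count in instances_by_az.values())


# Hypothetical layouts for illustration.
spread = {"az1": 2, "az2": 2, "az3": 2}
single = {"az1": 2}

print(min_available_during_rolling_patch(spread))  # 4
print(min_available_during_rolling_patch(single))  # 0
```

Under this model, an app spread across three AZs never drops below four of its six instances, while an app confined to one AZ has an outage during patching.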
Other Federal Challenges
● It’s almost impossible for our application teams to release and support their own applications in production, since many do not have clearances. This puts more pressure on the Tachyon team to do this on their behalf.
● Not everyone on the platform team can get into these disconnected, secure networks to do production work.
● Requests that flow outside of the program can take weeks or months to approve.
● Incident management - can’t talk about it in unclassified forums!
● Lingering pressure to get to K8s.
Lessons Learned
● Managing the platform as a product really helped us deliver the capabilities our developers want.
● Being part of a software factory helped us align to a cloud-native mindset.
● Healthy relationships are HUGE - especially with Cyber and teams outside of our program. And our customers of course. :-)
● Don’t assume cross-agency or department processes will be fast.
● Follow happy-path processes first, even if they’re frustratingly inefficient. This establishes trust... worry about optimizing processes later.
Platform Agility, Reliability, and Security - yes, these are achievable outcomes!
Thank you
