Container Patching
Making It Less Gross Than The Seattle Gum Wall
Greg Castle
@mrgcastle, @gregcastle@infosec.exchange
GKE Security, Google Cloud
Weston Panther
GKE Security, Google Cloud
Simple View Of Patching
Day 0: scanner detects → Day 15: maintainer patches → Day 30: production patched
FedRAMP targets by severity:
● CRITICAL/HIGH: 30 days
● Medium: 90 days
● Low: 180 days
A Trip Down Empathy Lane
2 weeks 🛡 …Bi-weekly cluster scan…
2 weeks 🛡 Hey web team, webfrontend is missing 2 CRITICAL patches
3 weeks 🛡 Friendly ping?
3 weeks 🕸 Not our code, maybe django base container?
3 weeks 🛡 Django container team, can you patch?
4 weeks 🐍 These vulns are in perl, we don’t even use perl, do we need to patch?
4 weeks 🛡 Yes, or better yet, remove perl.
OUT OF FedRAMP/PCI SLO
5 weeks 🐍 Patched 🎉 acme-django:v2.1.1
5 weeks 🛡 Hey web team, rebuild with acme-django:v2.1.1
6 weeks 🕸 Done!
7 weeks 🛡 Still running the old version?
8 weeks 🕸 Forgot to update the K8s manifest. Done!
9 weeks 🛡 …Still no? Also there’s three new HIGH vulns, but let’s get this done first.
10 weeks 🕸 Had to soak in QA first, updated for prod rollout.
11 weeks 🛡 Fixed! Who else runs django apps..?
Why It’s Gross
● Humans at every step
● Which layer needs patching?
● No inventory
● Patching unused code
● Vulns arrive faster than patches
= Slow, incomplete, unscalable patching

Is the majority of the industry doing better than this today?
88% of respondents: “Challenging to ensure containerized applications are free from vulnerabilities”
https://www.slim.ai/blog/container-report-2022/
GKE Container Patching
Case Study
Enforcement Points
Prevent: minimal containers
Detect: scanning capability/coverage
Fix: ownership, dependencies, release
Monitor: dashboards, alerting, escalations
What Containers?
● Vendor/MSP containers
● Containers you rebuild
● K8s manifests you update
What Do We Know Anyway?
Patching for 1000s of containers across GKE, Anthos and adjacent products
But…our environment constraints help a lot:
● Mandatory use of compiled language
● Mandatory container repo
● Mandatory base images
● Control over code/config pre-submit
● Control over release
Container/K8s Delivery Pipeline
[Pipeline diagram: Dev → Source → Container Build → Package (Staging Repo → Prod Repo) → Deploy → Run, feeding an Inventory]
Good Start: Runtime Detection
[Same pipeline diagram; a runtime scanner at the Run stage provides detection and inventory]
Better: Prevention Complementing Detection
[Same pipeline diagram; prevention checks sit at the Dev/Source and Build/Package stages]
Prevention requires integration with the pipeline
Prevent
Prevent: Problems
● So many containers
● So many dependencies
● Meeting SLO is hard without reducing volume
Prevent: Strategy
● Standardize base containers
● Minimal containers: less code, fewer vulns, less patching
● Remove unused code: separate build and runtime images (see the sketch below)
● Two approaches:
○ Start small: Scratch, Distroless, Wolfi/Chainguard Images
○ Slim down: SlimToolkit
● Challenge: apply consistently everywhere
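As an illustration of the separate build/runtime idea, a minimal multi-stage Dockerfile might look like this (a hypothetical Go service; paths and image tags are illustrative, not the actual GKE build):

```dockerfile
# Build stage: full toolchain, never shipped to production
FROM golang:1.20 AS build
WORKDIR /src
COPY . .
# Static binary so it can run on a distroless base
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

# Runtime stage: distroless, no shell, no package manager
FROM gcr.io/distroless/static-debian11:nonroot
COPY --from=build /out/server /server
ENTRYPOINT ["/server"]
```

Everything only the compiler needs stays in the build stage, so it never shows up in a scan of the shipped image.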
Our Solution
● Standardize on Distroless
○ Just enough to run golang binaries
● All containers in a single registry
○ Inventory
○ Availability
Our Solution
[Diagram: a pre-submit check on Dev source verifies that every image in deployment.yaml matches image = gcr.io/gke-release/*, and that the container is distroless, i.e. its /etc/os-release has HOME_URL=https://github.com/GoogleContainerTools/distroless]
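A minimal sketch of the registry half of such a pre-submit check (hypothetical, not the actual GKE tooling; a real check would parse the YAML properly rather than matching lines):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// allowedPrefix is the registry all production images must come from.
const allowedPrefix = "gcr.io/gke-release/"

func main() {
	f, err := os.Open("deployment.yaml")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	failed := false
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "image:") {
			continue // only inspect image references
		}
		img := strings.Trim(strings.TrimSpace(strings.TrimPrefix(line, "image:")), `"'`)
		if !strings.HasPrefix(img, allowedPrefix) {
			fmt.Printf("DENY: %s is not under %s\n", img, allowedPrefix)
			failed = true
		}
	}
	if failed {
		os.Exit(1) // fail the pre-submit
	}
}
```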
Alternatives
[Pipeline diagram: “is distroless” checks at Container Build and at the Staging/Prod repos; the image = gcr.io/gke-release/* check at Deploy]
Alternatives: Admission
[Diagram: Container Build produces an isDistroless attestation; at Deploy, an admission controller verifies the attestation and that image = gcr.io/gke-release/*]
On GKE: use Binary Authorization
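One off-the-shelf way to get the registry restriction at admission is the Gatekeeper AllowedRepos policy linked at the end. Assuming the K8sAllowedRepos ConstraintTemplate from the gatekeeper-library is installed, a constraint might look like this (name and repo prefix illustrative):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: images-must-come-from-gke-release
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos:
      - "gcr.io/gke-release/"
```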
Demo: Admission
Prevent: Summary
● Identify and use enforcement points
● Standardize on patchable base containers
● Standardize on container registries for inventory
Detect
Detect: Problems
● Which containers to scan?
● Which scanner?
○ Different coverage
○ Different vuln sources
○ Duplicate handling
○ Filtering noise
● Which layer has the vuln?
Which Container? Our Solution
[Diagram: the scanner works from the list of containers in production rather than all container images; a Dev pre-submit check requires the image to be fully patched]
Which Container? Alternatives
[Diagram: scanning the prod repo gives an inventory, but what is actually running in production? A scanner DaemonSet at the Run stage can build the list of containers in production]
Which Scanner?
● Language pack scanning: scan programs in your container
○ Rust Cargo.lock
○ Python egg files
○ Go binaries / go.mod
○ etc.
● SBOM consumption: scanners are starting to support SBOMs
● VEX support: filter out remediated vulnerabilities based on VEX
● Supplemental CVE sources: more vulns from more places
○ OS vendor feeds
○ GitHub Advisory Database
○ Language-specific DBs (vuln.go.dev)
Which Scanner?
● Base image detection: try to determine the base images
○ From metadata in the image manifest
○ From the Dockerfile
● Reachability analysis: try to figure out if the code is actually in use
○ Typically uses source
○ Can use the symbol table in a binary
● Additional scans: scan all the things
○ CIS benchmarks
○ Hardcoded keys
○ Misconfigurations (root user, host volume mounts, etc.)
Which Scanner? False Positives vs. Coverage
Which one is correct?
● Vulnerable module (golang.org/x/crypto/ssh)
● Built with an old golang version (1.18.1)
● On an old debian base (buster-20210208)
Which Scanner? Our Solution
Public containers: probably “more than one”
Identify gaps and false positives
See what our customers see
Detect: Noise
● CVEs that will never be patched (debian CVE-2004-0971, CVE-2005-2541, CVE-2010-4756)
● Ancient low priority vulns without patches (debian CVE-2011-4116, CVE-2016-2781)
● OS vendor has a lower rating than NVD (debian CVE-2022-37434)
● CVE is for a different architecture (golang CVE-2021-38297)
● CVEs that are clearly overrated (CVE-2020-29363: 9.8 down to 7.5)
Who can silence what:
● User control: codepath is unused
● Scanner control: recent CVEs with no patch
Noise: Golang Specific
● Problem: every vuln in the whole version or module is detected
● Solution: govulncheck can report only reachable vulnerabilities
● go.dev/blog/vuln
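For reference, a typical govulncheck run over a module looks like this:

```
go install golang.org/x/vuln/cmd/govulncheck@latest
govulncheck ./...
```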
Demo: govulncheck
Detect: Summary
● Take advantage of new advances in coverage
● Look to your scanner vendor to help with noise
● Use silence/ignore where it fits your threat model (example below)
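For example, trivy (one of the open-source scanners linked at the end) silences accepted findings via a .trivyignore file; the CVE choices and reasons below are purely illustrative:

```
# .trivyignore: findings accepted per our threat model
# Unused codepath (reachability reviewed)
CVE-2021-38297
# No patch available yet; tracked in the bug system
CVE-2022-37434
```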
Fix
Problems: Multi-layer Complex Process
[Flowchart: scanner reports CVE-2023-123 in kube-proxy → fixable in this container? If no, find the parent and repeat. If yes, find the owner, file a bug / send a PR, and wait for the rebuild → dependent containers? If yes, repeat for each; if no, done]
Our Solution: Base Images
[Diagram: base image chain gke-distroless → debian-base → debian-iptables, each advancing through versions v1, v2, v3, v4]
1. Scan the latest base images
2. If fixable vulns, rebuild
3. Repeat for eternity
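The loop itself is simple enough to sketch; everything below is hypothetical glue (scanImage and rebuildAndPush stand in for your scanner API and build system):

```go
package main

import (
	"log"
	"time"
)

// Hypothetical helpers: scanImage would query your scanner,
// rebuildAndPush would trigger your build system.
func scanImage(image string) (fixableVulns int) { return 0 }
func rebuildAndPush(image string) error         { return nil }

func main() {
	baseImages := []string{"gke-distroless", "debian-base", "debian-iptables"}
	for { // repeat for eternity
		for _, img := range baseImages {
			if scanImage(img) > 0 {
				if err := rebuildAndPush(img); err != nil {
					log.Printf("rebuild of %s failed: %v", img, err)
				}
			}
		}
		time.Sleep(24 * time.Hour)
	}
}
```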
Our Solution: Ownership
[Diagram: “find owner” resolves through an image ownership file; a Dev pre-submit check ties every image in deployment.yaml to an entry in that file]
Our Solution: Simplified Process
[Flowchart: scanner reports CVE-2023-123 in kube-proxy → find owner → file bug / send PR → wait for rebuild → done]
Summary: Fix
● Track container parent-child relationships
● Automate patching base images
● Comprehensive inventory and ownership
● Use existing ticket systems to track
Monitor
Monitor: Problems
● Container isn’t patched: who is watching? Who do we escalate to?
● Which containers have the CVE? Which applications use this container?
● Are we meeting our SLOs? What are the gaps and pain points?
● Is CVE-123 patched? Has it rolled out everywhere?
What gets measured…
Our Solution
[Flowchart: scanner detects CVE → new finding? If yes, file a bug; if no, find the existing bug and add a comment. Nearing SLO? Escalate. Past SLO? Escalate again]
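A sketch of that triage loop in Go (the Finding and Tracker types are hypothetical stand-ins for your scanner output and bug system):

```go
package monitor

import "time"

// Finding is one scanner result; SLODeadline comes from severity targets.
type Finding struct {
	CVE, Image  string
	SLODeadline time.Time
}

// Tracker abstracts whatever bug system you already use.
type Tracker interface {
	FindBug(cve, image string) (bugID string, ok bool)
	FileBug(cve, image string) (bugID string)
	AddComment(bugID, msg string)
	Escalate(bugID string)
}

// triage applies the flow above to a single finding.
func triage(t Tracker, f Finding) {
	bugID, ok := t.FindBug(f.CVE, f.Image)
	if !ok {
		bugID = t.FileBug(f.CVE, f.Image) // new finding: file a bug
	} else {
		t.AddComment(bugID, "still unpatched") // known finding: update it
	}
	if time.Until(f.SLODeadline) < 7*24*time.Hour { // nearing or past SLO
		t.Escalate(bugID)
	}
}
```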
Monitor: Composition
[Diagram: composition mapping from CVEs → containers → applications → GKE version, answering e.g. “where is CVE-2021-44228?”]
Monitor: Visibility
[Timeline: new image created (clock starts) → PR merged → manifest updated → qualification starts → rollout begins → patched]
Measure each step to find pain points. Dashboards provide status at a glance; metrics track progress over time.
Active CVE Count By Image:
Image Name   Image Tag   # Fixable CVEs
fake-image   v1.0.1      55
fake-image   v1.0.3      45
demo-image   v3.5        20
nginx        1.22.1      10
Monitor: Alternatives
● Inventory: scanners
● Composition:
○ Lyft: Cartography graph database
○ SBOMs / GUAC
○ Ignore layers, just patch: copacetic, crane rebase (commands below)
● SLO:
○ Bug management software
○ Track commits and rollouts
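For reference, the layer-agnostic patching tools above are driven roughly like this (image names and report path illustrative; flags may change, so check each tool's docs):

```
# copacetic: patch OS packages in an image using a trivy scan report
copa patch -i docker.io/library/nginx:1.21.6 -r nginx.scan.json -t 1.21.6-patched

# crane: swap an image onto a newer base without a rebuild
crane rebase myrepo/app:v1 \
  --old_base gcr.io/distroless/static:old \
  --new_base gcr.io/distroless/static:latest \
  --tag myrepo/app:v1-rebased
```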
Summary: Monitor
● Track SLOs over time
● Track patch/release stages to identify bottlenecks
● Use existing systems for escalation/dashboarding
Summary
● Standardize on registries and minimal containers
● Enforce as far left as possible
● Scanners for inventory + visibility
● Record ownership of containers
● Auto-patch if possible
● Tickets to track/escalate
Prefer automation (doing) over telling
Links
● Demo code
● Slim.ai container report
● Lyft patching blogpost
● Separate build and runtime images
● Small images: Scratch, Distroless, Wolfi/Chainguard Images
● SlimToolkit
● AllowedRepos Gatekeeper policy
● Sigstore: signing, policy controller
● GKE Binary Authorization: attestations, image policy
● Opensource scanners: trivy, clair
● Google Container Analysis
● GUAC
● The Seattle Gum Wall
Feedback
@mrgcastle
@gregcastle@infosec.exchange
Appendix: Feature Request Wishlist Idea
[Mockups: “Prisma Cloud has received 150 reports that dispute this severity” / “Aqua Security has received 231 reports that dispute this severity” / “Google Container Analysis has received 109 reports that dispute this severity”]
If enough users report a critical vuln as inaccurate, the scanner manually evaluates, updates the severity for all their users, and works with NIST to correct NVD.

Container Patching: Cloud Native Security Con 2023

  • 1.
    Container Patching Making ItLess Gross Than The Seattle Gum Wall Greg Castle @mrgcastle, @gregcastle@infosec.exchange GKE Security, Google Cloud Weston Panther GKE Security, Google Cloud
  • 2.
    Simple View OfPatching 0 days Scanner detects 15 days Maintainer patches 30 days Production patched Severity FedRAMP Targets CRITICAL/ HIGH 30 days Medium 90 days Low 180 days
  • 3.
    A Trip DownEmpathy Lane 2 weeks 🛡 …Bi-weekly cluster scan… 2 weeks 🛡 Hey web team, webfrontend is missing 2 CRITICAL patches 3 weeks 🛡 Friendly ping? 3 weeks 🕸 Not our code, maybe django base container? 3 weeks 🛡 Django container team, can you patch? 4 weeks 🐍 These vulns are in perl, we don’t even use perl, do we need to patch?
  • 4.
    A Trip DownEmpathy Lane 4 weeks 🛡 Yes, or better yet, remove perl. OUT OF FedRAMP/PCI SLO 5 weeks 🐍 Patched 🎉 acme-django:v2.1.1 5 weeks 🛡 Hey web team, rebuild with acme-django:v2.1.1 6 weeks 🕸 Done! 7 weeks 🛡 Still running the old version?
  • 5.
    A Trip DownEmpathy Lane 8 weeks 🕸 Forgot to update the K8s manifest. Done! 9 weeks 🛡 …Still no? Also there’s three new HIGH vulns, but let’s get this done first. 10 weeks 🕸 Had to soak in QA first, updated for prod rollout. 11 weeks 🛡 Fixed! Who else runs django apps..?
  • 6.
    Why It’s Gross Humansat every step Which layer needs patching? No inventory Patching unused code Vulns faster than patches = Slow, incomplete, unscalable patching
  • 7.
    Is the majorityof the industry doing better than this today?
  • 8.
    88% of respondents: “Challengingto ensure containerized applications are free from vulnerabilities” https://www.slim.ai/blog/container-report-2022/
  • 9.
    GKE Container Patching CaseStudy Enforcement Points Prevent: minimal containers Detect: scanning capability/coverage Fix: ownership, dependencies, release Monitor: dashboards, alerting, escalations
  • 10.
    What Containers? Vendor/MSP containers Containersyou rebuild K8s manifests you update
  • 11.
    What Do WeKnow Anyway? Patching for 1000s of containers across GKE, Anthos and adjacent products But…our environment constraints help a lot: ● Mandatory use of compiled language ● Mandatory container repo ● Mandatory base images ● Control over code/config pre-submit ● Control over release
  • 12.
    Container/K8s Delivery Pipeline Source Container Build PackageRun Deploy Staging Repo Prod Repo Inventory Dev
  • 13.
    Good Start: RuntimeDetection Source Container Build Package Run Deploy Staging Repo Prod Repo Inventory Dev Runtime scanner: detection and inventory
  • 14.
    Better: Prevention ComplementingDetection Source Container Build Package Run Deploy Staging Repo Prod Repo Inventory Dev Prevention requires integration with pipeline
  • 15.
  • 16.
    Prevent: Problems So manycontainers So many dependencies Meeting SLO is hard without reducing volume #Prevent
  • 17.
    Prevent: Strategy ● Standardizebase containers ● Minimal containers: Less code, less vulns, less patching ● Remove unused code: separate build and runtime images ● Two approaches: ○ Start small: Scratch, Distroless, Wolfi/Chainguard Images ○ Slim down: SlimToolkit ● Challenge: apply consistently everywhere #Prevent
  • 18.
    Our Solution ● Standardizeon Distroless ○ Just enough to run golang binaries ● All containers in a single registry ○ Inventory ○ Availability #Prevent
  • 19.
    Our Solution Source Dev Pre-SubmitCheck image = gcr.io/gke-release/* deployment.yaml Container /etc/os-release HOME_URL=https://github.com/ GoogleContainerTools/distroless #Prevent
  • 20.
    Is distroless Alternatives Container Build Package Run Deploy StagingRepo Prod Repo Is distroless image = gcr.io/gke-release/* #Prevent
  • 21.
    Alternatives: Admission Container Build Run Deploy Prod Repo isDistroless attestation Admission image= gcr.io/gke-release/* Verify isDistroless attestation #Prevent On GKE: Use Binary Authorization
  • 22.
  • 23.
    Prevent: Summary ● Identifyand use enforcement points ● Standardize on patchable base containers ● Standardize on container registries for inventory #Prevent
  • 24.
  • 25.
    Detect: Problems ● Whichcontainers to scan? ● Which scanner? ○ Different coverage ○ Different vuln sources ○ Duplicate handling ○ Filtering noise ● Which layer has the vuln? #Detect
  • 26.
    All Container Images WhichContainer? Our Solution Source Dev List of Containers in Production Scanner Pre-Submit Check Image Fully Patched #Detect
  • 27.
    Which Container? Alternatives Inventory Listof Containers In Production Scanner In registry, but what is running in production? Daemonset Prod Repo Run #Detect
  • 28.
    Language Pack Scanning SBOM Consumption VEXSupport Supplemental CVE Sources Scan programs in your container: ● Rust Cargo.lock ● Python egg files ● Go binaries / go.mod ● etc. Scanners are starting to support SBOMs Filter out remediated vulnerabilities based on VEX More vulns from more places ● OS vendor feeds ● Github Advisories Database ● Language-specific DBs (vuln.go.dev) #Detect Which Scanner?
  • 29.
    Base Image Detection Reachability Analysis Additional Scans Tryto determine the base images: ● From metadata in image manifest ● From the Dockerfile Try to figure out if the code is actually in use: ● Typically uses source ● Can use symbol table in a binary Scan all the things: ● CIS benchmarks ● Hardcoded keys ● Misconfigurations (root user, host volume mounts, etc.) #Detect Which Scanner?
  • 30.
    False Positives vs.Coverage #Detect Which one is correct? ● Vulnerable module (golang.org/x/crypto/ssh) ● Built with old golang version (1.18.1) ● On old debian base (buster-20210208)
  • 31.
    Which Scanner? OurSolution Public containers: probably “more than one” Identify gaps and false positives See what our customers see #Detect
  • 32.
    Detect: Noise ● CVEsthat will never be patched (debian CVE-2004-0971, CVE-2005-2541, CVE-2010-4756) ● Ancient low priority vulns without patches (debian CVE-2011-4116, CVE-2016-2781) ● OS vendor has a lower rating than NVD (debian CVE-2022-37434) ● CVE is for a different architecture (golang CVE-2021-38297) ● CVEs that are clearly overrated (CVE-2020-29363: 9.8 down to 7.5) #Detect User Control Scanner Control ● Codepath is unused ● Recent CVEs with no patch
  • 33.
    Noise: Golang Specific ●Problem: all vulns in whole version or module detected ● Solution: govulncheck can report only reachable vulnerabilities ● go.dev/blog/vuln #Detect
  • 34.
  • 35.
    Detect: Summary ● Takeadvantage of new advances in coverage ● Look to your scanner vendor to help with noise ● Use silence/ignore where it fits threat model #Detect
  • 36.
  • 37.
    Problems: Multi-layer ComplexProcess CVE 2023-123 in kube-proxy Scanner This Container? Find Owner File Bug/ Send PR Wait For Rebuild Dependent Containers? Done Find Parent yes no no yes #Fix
  • 38.
    Our Solution: BaseImages #Fix gke-distroless debian-base debian-iptables v1 v1 v1 v2 v2 v3 v3 1. Scan the latest base images 2. If fixable vulns, rebuild 3. Repeat for eternity v4
  • 39.
    Our Solution: Ownership #Fix FindOwner Source Dev Pre-Submit Check Image Ownership File deployment.yaml
  • 40.
    Our Solution: SimplifiedProcess CVE 2023-123 in kube-proxy Scanner Find owner File Bug/ Send PR Wait for rebuild Done #Fix
  • 41.
    Summary: Fix ● Trackcontainer parent-child relationships ● Automate patching base images ● Comprehensive inventory and ownership ● Use existing ticket systems to track #Fix
  • 42.
  • 43.
    Monitor: Problems #Monitor Container isn’tpatched - who is watching? Who do we escalate to? Which containers have the CVE? Which applications use this container? Are we meeting our SLOs? What are the gaps and pain points? Is CVE-123 patched? Has it rolled out everywhere? What gets measured…
  • 44.
    Our Solution #Monitor New Finding? Scanner DetectsCVE Find Existing Bug yes Nearing SLO? Escalate no Bug Filed Add Comment Past SLO? yes yes
  • 45.
  • 46.
    Monitor: Visibility #Monitor New Image Created PRMerged Manifest Updated Qualification Starts Rollout Begins Patched Clock starts Dashboards provide status at-a-glance Track progress with metrics over time Measure each step to find pain points Active CVE Count By Image Image Name Image Tag # Fixable CVEs fake-image v1.0.1 55 fake-image v1.0.3 45 demo-image v3.5 20 nginx 1.22.1 10
  • 47.
    Monitor: Alternatives ● Inventory:Scanners ● Composition: ○ Lyft: Cartography graph database ○ SBOMs / GUAC ○ Ignore layers, just patch: copacetic, crane rebase ● SLO: ○ Bug management software ○ Track commits and rollouts #Monitor
  • 48.
    Summary: Monitor ● TrackSLOs over time ● Track patch/release stages to identify bottlenecks ● Use existing systems for escalation/dashboarding #Monitor
  • 49.
    Summary ● Standardize onregistries and minimal containers ● Enforce as far left as possible ● Scanners for inventory + visibility ● Record ownership of containers ● Auto-patch if possible ● Tickets to track/escalate Prefer automation (doing) over telling
  • 50.
    Links ● Demo code ●Slim.ai container report ● Lyft patching blogpost ● Separate build and runtime images ● Small images: Scratch, Distroless, Wolfi/Chainguard Images ● SlimToolkit ● AllowedRepos Gatekeeper policy ● Sigstore: signing, policy controller ● GKE Binary Authorization: attestations, image policy ● Opensource scanners: trivy, clair ● Google Container Analysis ● GUAC ● The Seattle Gum Wall Feedback @mrgcastle @gregcastle@infosec.exchange
  • 51.
    Appendix: Feature RequestWishlist Idea #Detect Prisma Cloud has received 150 reports that dispute this severity Aqua Security has received 231 reports that dispute this severity Google Container Analysis has received 109 reports that dispute this severity If enough users report a critical vuln as inaccurate, the scanner manually evaluates, updates the severity for all their users, and works with NIST to correct NVD #Detect