The NotPetya, SolarWinds, and Kaseya cybersecurity attacks were all executed by injecting malicious code into software shipped by vendors to thousands of companies. These attacks have made the public more aware of the importance of secure software supply chains. But the path from awareness to ensuring a secure supply chain is long. Developers have gotten used to the convenience of easily downloading third-party software into containers, and it is challenging to tighten supply chain security in a company with a sprawl of open-source components.
Scling is a small data engineering startup, and since we ask our customers to entrust us with their data, we must take security seriously. We have been securing our software supply chain since the company was founded. We have no venture capital, and our customers expect quick development iteration cycles, so we have solved supply chain security with minimal effort and minimal impact on developer productivity. In this presentation, we share how we have addressed the different supply chain attack vectors, e.g. Python and JVM packages, with technical solutions. We will present how we automate third-party software upgrades to stay up to date with security patches while minimising the risk of downloading rogue code.
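As a rough illustration of one such technical control, the sketch below verifies a downloaded package artifact against a hash-pinned lock file before installation, so that a compromised mirror or a hijacked release cannot slip rogue code in unnoticed. The file names and lock-file format are illustrative assumptions, not Scling's actual tooling.

```python
# Minimal sketch, assuming a hash-pinned lock file such as
# {"requests-2.31.0.tar.gz": "sha256:..."}; not Scling's actual tooling.
import hashlib
import json
from pathlib import Path

def verify_artifact(artifact: Path, lock_file: Path = Path("requirements.lock.json")) -> bool:
    """Return True only if the artifact's sha256 matches the pinned digest."""
    pinned = json.loads(lock_file.read_text())
    digest = "sha256:" + hashlib.sha256(artifact.read_bytes()).hexdigest()
    return pinned.get(artifact.name) == digest
```

In practice, package managers offer this protection natively, e.g. pip's --require-hashes mode or Gradle's dependency verification, so automated upgrades mainly need to regenerate and review the pinned hashes.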
DataOps requires a cultural shift that brings the principles of lean manufacturing and DevOps to data analytics. It breaks down silos between developers, data scientists, and operators, resulting in rapid cycle times and low error rates.
At Spotify in 2013, the concept of DataOps did not exist but the Swedish company needed a way to align the people, processes, and technologies of the data organization to accelerate the development of high-quality analytics. The result was a Swedish-style DataOps, influenced by Scandinavian culture and agile principles, that enabled the company to become a true data-driven leader.
The quality of data-powered applications depends not only on code, but also on collected data, as well as models trained on data. This renders traditional quality assurance inadequate. We will take a look in our toolbox for more holistic tactics that bridge the gap between code and data quality assurance.
If we could only predict the future of the software industry, we could make better investments and decisions. We could waste fewer resources on technology and processes we know will not last, or at least be conscious in our decisions to choose solutions with a limited lifetime. It turns out that for data engineering, we can predict the future, because it has already happened. Not in our workplace, but at a few leading companies that are blazing ahead. It has also already happened in the neighbouring field of software engineering, which is two decades ahead of data engineering regarding process maturity. In this presentation, we will glimpse into the future of data engineering. Data engineering has gone from legacy data warehouses with stored procedures, to big data with Hadoop and data lakes, on to a new form of modern data warehouses and low-code tools, aka "the modern data stack". Where does it go from here? We will look at the points where data leaders differ from the crowd and combine them with observations on how software engineering has evolved, to see that it points towards a new, more industrialised form of data engineering - "data factory engineering".
Modern data processing environments resemble factory lines, transforming raw data to valuable data products. The lean principles that have successfully transformed manufacturing are equally applicable to data processing, and are well aligned with the new trend known as DataOps. In this presentation, we will explain how applying lean and DataOps principles can be implemented as technical data processing solutions and processes in order to eliminate waste and improve data innovation speed. We will go through how to eliminate the following types of waste in data processing systems:
* Cognitive waste - unclear source of truth, dependency sprawl, duplication, ambiguity.
* Operational waste - overhead for deployment, upgrades, and incident recovery.
* Delivery waste - friction and delay in development, testing, and deployment.
* Product waste - misalignment with business value, detachment from use cases, push-driven development, vanity quality assurance.
We will primarily focus on technical solutions, but some of the waste mentioned requires organisational refactoring to eliminate.
DataOps is the transformation of data processing from a craft with manual processes to an automated data factory. Lean principles, which have proven successful in manufacturing, are equally applicable for data factories. We will describe how lean principles can be applied in practice for successful data processing.
Garbage in, garbage out - we have all heard about the importance of data quality. Having high quality data is essential for all types of use cases, whether it is reporting, anomaly detection, or avoiding bias in machine learning applications. But where does high quality data come from? How can one assess data quality, improve quality if necessary, and prevent bad quality from slipping in? Obtaining good data quality involves several engineering challenges. In this presentation, we will go through tools and strategies that help us measure, monitor, and improve data quality. We will enumerate the factors in data collection and data processing that can cause data quality issues, and we will show how to use engineering to detect and mitigate data quality problems.
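As a small, hedged illustration of what such measurement can look like in code (my own sketch, not the specific tooling covered in the talk), a batch job can compute a few quality metrics and refuse to publish data that falls outside agreed thresholds; the column names and thresholds below are assumptions.

```python
# Illustrative data quality gate for a batch of records; thresholds are assumptions.
from dataclasses import dataclass

import pandas as pd

@dataclass
class QualityReport:
    rows: int
    null_fraction: float
    duplicate_fraction: float

def assess(df: pd.DataFrame, key: str = "user_id") -> QualityReport:
    rows = len(df)
    null_fraction = float(df[key].isna().mean()) if rows else 1.0
    duplicate_fraction = float(df.duplicated(subset=[key]).mean()) if rows else 0.0
    return QualityReport(rows, null_fraction, duplicate_fraction)

def quality_gate(report: QualityReport, min_rows: int = 1000, max_nulls: float = 0.01) -> None:
    # Failing the job loudly is preferable to silently propagating bad data downstream.
    if report.rows < min_rows or report.null_fraction > max_nulls:
        raise ValueError(f"Data quality gate failed: {report}")
```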
Aws uk ug #8 not everything that happens in vegas stay in vegas - Peter Mounce
This document discusses various topics related to DevOps practices at different companies:
1. Netflix prioritizes speed of innovation and availability over running costs when developing software. They found this approach ended up costing less than expected.
2. Riot Games uses tools like Chef to deploy their massively multiplayer online game League of Legends to the cloud. This helps them solve launch issues and scale efficiently.
3. Many companies like Netflix, Riot Games, and Kickstarter test new code and configurations in production at a large scale to continuously improve their systems and user experience.
4. Centralized logging services are important for developers to more easily monitor systems, debug issues, and reduce time spent on call.
In data science, the scientific part is often forgotten - popular workflows, tools, and practices tend to yield experiments that cannot be repeated. Experiments that are not reliable cannot tell us whether changes improve products or not. What works fine during initial development is inadequate for sustainable development of machine learning products. In this presentation, you will learn:
- Why reproducibility matters for data science.
- The practices and workflows that cause reproducibility problems.
- How to build technical environments and processes that enable reproducibility and iterative development of machine learning products (a minimal sketch of one building block follows below).
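The sketch below is my own minimal illustration of one reproducibility building block, not material from the talk: pinning the sources of randomness and recording the environment of a run so it can be repeated and compared later.

```python
# Minimal sketch (illustrative): pin randomness and record run metadata.
import json
import random
import sys

import numpy as np

def reproducible_run(seed: int = 42) -> dict:
    random.seed(seed)
    np.random.seed(seed)
    # ... train and evaluate the model here ...
    run_metadata = {
        "seed": seed,
        "python": sys.version,
        "numpy": np.__version__,
        "argv": sys.argv,
    }
    with open("run_metadata.json", "w") as f:
        json.dump(run_metadata, f, indent=2)
    return run_metadata
```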
Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
The lines between Development and Operations have become blurry, and many skills need to be held by both sides. In this talk we will go through the considerations involved in creating development and production environments, covering Continuous Integration, Continuous Deployment and the buzzword "DevOps", along with some real implementations in the industry. And of course we cannot leave out the real enabler of the whole deal, "The Cloud", which gives us a toolset that makes life much easier when implementing all of these practices.
Did you know that the tech elite does not work at all like you do? Most people don't, and don't want to know. The State of DevOps report concluded that there is a 1000x span in delivery time and reliability between the elite and low performers. There is a similar gap for delivery time of data or ML pipelines to production. The gap in ability to compute datasets is higher, somewhere around a million times. We call this the data divide or the AI divide. It is widening over time, since most companies are not aware of its width.
We will share the principles we applied in the most successful Scandinavian crossing of the data divide. We never explicitly shared or described the principles at the time, nor fully understood them, but it is long overdue to enumerate them explicitly.
The presentation will likely be uncomfortable and surprising, because it does not match what you do and what your vendors say. You will have little practical use for the information, since you cannot apply the principles: they contradict many contemporary trends and popular technologies on the market, and you would be unable to overcome the forces of trends, popularity, and messages from vendors. They worked beautifully for us at the time.
1) Google Cloud provides a global infrastructure with regions launching rapidly around the world. Its network is designed for scale and performance without bottlenecks.
2) BigQuery provides petabyte-scale analytics powered by Colossus storage, Capacitor compression, and the high-bandwidth Jupiter network. It can process queries involving trillions of rows in seconds.
3) Google invests heavily in security, offering layers of protection for networks, applications, and data from threats like DDoS attacks. It also has a large partner ecosystem around compliance, privacy, and security.
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric... - Haggai Philip Zagury
The overwhelming growth of technologies in the Cloud Native foundation overtook our toolbox and completely changed (well, really enhanced) the Developer Experience.
In this talk, I will share my personal journey from the "Operator to Developer's chair" and the practices that helped me along the way as a Cloud-Native Dev ;)
Last Conference 2017: Big Data in a Production Environment: Lessons Learnt - Mark Grebler
Presentation at the 2017 LAST (Lean, Agile, Systems Thinking) Conference.
A presentation about the challenges involved in building a production Big Data system used directly by customers.
Urs Hoelzle, Vice President, Google
Summary
● Google operates two large backbone networks
○ Internet-facing backbone (user traffic)
○ Datacenter backbone (internal traffic)
● Managing large backbones is hard
● OpenFlow has helped us improve backbone performance and reduce backbone complexity and cost
● I'll tell you how
ONS2015: http://bit.ly/ons2015sd
ONS Inspire! Webinars: http://bit.ly/oiw-sd
Watch the talk (video) on ONS Content Archives: http://bit.ly/ons-archives-sd
Vladislav Supalov introduces data pipeline architecture and workflow engines like Luigi. He discusses how custom scripts are problematic for maintaining data pipelines and recommends using workflow engines instead. Luigi is presented as a Python-based workflow engine that was created at Spotify to manage thousands of daily Hadoop jobs. It provides features like parameterization, email alerts, dependency resolution, and task scheduling through a central scheduler. Luigi aims to minimize boilerplate code and make pipelines testable, versioning-friendly, and collaborative.
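To make the description concrete, here is a minimal, hedged Luigi sketch (my own illustration, not taken from the original slides) showing a parameterised task, an explicit dependency, file targets, and how the scheduler resolves the dependency graph.

```python
# Minimal Luigi pipeline sketch: two tasks linked by requires()/output().
import luigi

class ExtractLogs(luigi.Task):
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"data/raw/{self.date}.log")

    def run(self):
        with self.output().open("w") as f:
            f.write("raw log lines...\n")

class DailyReport(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        return ExtractLogs(date=self.date)

    def output(self):
        return luigi.LocalTarget(f"data/reports/{self.date}.txt")

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            dst.write(f"{len(src.readlines())} lines processed\n")

# Run with e.g.: luigi --module this_module DailyReport --date 2024-01-01 --local-scheduler
```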
An Analytics Engineer’s Guide to Streaming With Amy Chen | Current 2022 - Hosted by Confluent
What happens to the modern data stack (MDS) and analytics as a whole when streaming becomes accessible? For years, the MDS has been centered around batch-based workflows with dbt at its core, introducing software engineering best practices to analysts. But now with even major data warehouses like Snowflake getting in the game, expanding their streaming capabilities, what does that mean?
In this talk, we will explore what streaming in a batch-based analytics world should look like. How does that change your thoughts about implementing testing and performance optimization in your data pipelines? Do you still need dbt? And the question that we are all asking: do you really need a real-time dashboard?
Fluent 2018: Tracking Performance of the Web with HTTP Archive - Paul Calvano
Have you ever thought about how your site’s performance compares to the web as a whole? Or maybe you’re curious how popular a particular web feature is. How much is too much JavaScript? The HTTP Archive has been keeping track of how the web is built since 2010. It enables you to find answers to questions about the state of the web past and present.
Paul Calvano explores how the HTTP Archive works, how people are using this dataset, and some ways that Akamai has leveraged data within the HTTP Archive to help its customers.
Multiplier Effect: Case Studies in Distributions for Publishers - Jon Peck
Join members from both Four Kitchens and Meredith Agrimedia as they discuss the experience of migration and relaunch of the digital presence of two magazines: Successful Farming at Agriculture.com and WOOD Magazine at woodmagazine.com.
We'll start by discussing the scope of the projects, delve into the commonalities and differences, explore their common advertising and analytics implementation, and analyze the unified distribution that supports both brands. By developing the infrastructure simultaneously, brand-agnostic functionality became a priority which in turn created a more modular and flexible system that facilitated open-sourcing and cross-organizational sharing. Thanks to the codebase approach and experience, the first site took about 6 months and the second took less than 6 weeks.
The document discusses 7 habits of data-effective companies. It describes how companies have evolved through different digital maturity phases, from analog to born-digital. The key differences observed between phases include impact on cost, value extraction, and capabilities. The 7 habits discussed are: treating data processing as an industrial process, focusing on latency and waste reduction, being use case driven and value stream aligned, initially centralizing data, architecting for failure and sharing, treating it as a software engineering problem, and following the Unix philosophy of building specialized components. The document provides examples and illustrations for each habit.
Introduction to Data Engineer and Data Pipeline at Credit OK - Kriangkrai Chaonithi
The document discusses the role of data engineers and data pipelines. It begins with an introduction to big data and why data volumes are increasing. It then covers what data engineers do, including building data architectures, working with cloud infrastructure, and programming for data ingestion, transformation, and loading. The document also explains data pipelines, describing extract, transform, load (ETL) processes and batch versus streaming data. It provides an example of Credit OK's data pipeline architecture on Google Cloud Platform that extracts raw data from various sources, cleanses and loads it into BigQuery, then distributes processed data to various applications. It emphasizes the importance of data engineers in processing and managing large, complex data sets.
These are the slides of the second talk of the first Tech Talk@TransferWise Singapore, which happened on the 23rd of November 2017.
These slides share how TransferWise codebase is moving from a monolith architecture to a microservices architecture.
This document discusses building a data platform in the cloud. It covers the evolution of data platforms from monolithic architectures to distributed event-driven architectures using a data lake. Key aspects of a cloud data platform include collecting and persisting all data in a data lake for standardized access, near real-time processing using streaming technologies, and building the platform using either fully managed or DIY/hybrid approaches on AWS. Design principles focus on event-driven separation of data producers and consumers and choosing the right technology for the problem.
DataOps is a methodology and culture shift that brings the successful combination of development and operations (DevOps) to data processing environments. It breaks down silos between developers, data scientists, and operators, resulting in lean data feature development processes with quick feedback. In this presentation, we will explain the methodology, and focus on practical aspects of DataOps.
This document discusses the steps to building a cloud native practice. It begins with introducing the speaker and what cloud native means. The 12 steps then cover: 1) version control, 2) continuous integration pipelines, 3) stateless applications, 4) containerization, 5) common services, 6) Kubernetes, 7) observability, 8) monitoring, 9) domain-driven design, 10) microservices and serverless architectures, 11) cloud strategies, and 12) reconstructing architectures with a focus on responsibilities of architects and challenges of open source.
The document discusses Intuit's transition to using canary releases in Kubernetes instead of a separate performance environment. It describes how Intuit collects metrics during canary releases to detect performance issues before fully deploying to production. The canary analysis model measures pod resource usage, JVM metrics, and application metrics to compute a score. Intuit aims to refine the model and scale the canary release process by integrating with tools like Argo Rollouts, Prometheus, and a service mesh.
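As a rough, hedged sketch of what a canary scoring model can look like (a simplification of the general idea, not Intuit's actual model), the snippet below compares canary metrics against the baseline, combines them into a weighted score, and gates promotion on a threshold; the metric names and weights are assumptions.

```python
# Illustrative canary scoring: weighted comparison of canary vs baseline metrics.
from typing import Mapping

WEIGHTS = {"cpu": 0.3, "error_rate": 0.5, "latency_p99": 0.2}  # assumed weights

def canary_score(canary: Mapping[str, float], baseline: Mapping[str, float]) -> float:
    """Return a score in [0, 100]; higher means the canary behaves like the baseline."""
    score = 0.0
    for metric, weight in WEIGHTS.items():
        # ratio > 1 means the canary is worse than the baseline for this metric
        ratio = max(canary[metric], 1e-9) / max(baseline[metric], 1e-9)
        score += weight * min(1.0, 1.0 / ratio) * 100
    return score

def promote(canary: Mapping[str, float], baseline: Mapping[str, float], threshold: float = 90.0) -> bool:
    return canary_score(canary, baseline) >= threshold
```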
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/how-do-we-enable-edge-ml-everywhere-data-reliability-and-silicon-flexibility-a-presentation-from-edge-impulse/
Zach Shelby, Co-founder and CEO of Edge Impulse, presents the “How Do We Enable Edge ML Everywhere? Data, Reliability and Silicon Flexibility” tutorial at the May 2022 Embedded Vision Summit.
In this talk, Shelby reveals insights from the company’s recent global edge ML developer survey, which identified key barriers to machine learning adoption, and shares the company’s vision for how the industry can overcome these obstacles. Unsurprisingly, the first critical obstacle identified by the survey is data. But the issue isn’t simply a lack of massive datasets, as is often assumed. On the contrary, the biggest opportunities in ML will be enabled by highly custom, industry-specific and even user-specific data. We need to master data lifecycle and active learning techniques that enable developers to move quickly from “zero to dataset.”
The real and perceived inability of today’s ML algorithms to reach the ultra-high accuracy needed in industrial systems is another key barrier. New techniques for explainable ML, better testing, sensor fusion and model fusion will increasingly allow developers to achieve industrial-grade reliability. Finally, in order to accelerate ML adoption in embedded products, we must recognize that most developers can’t immediately upgrade their systems to use the latest chips — a problem that is compounded by today’s chip shortages. To enable ML everywhere, we have to find ways to deploy ML on today’s silicon, while ensuring a smooth transition to new devices with AI acceleration in the future.
End-to-end pipeline agility - Berlin Buzzwords 2024 - Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
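One way to picture end-to-end pipeline testing (a hedged sketch of the general approach, not necessarily the exact technique used in the talk) is to point the workflow orchestrator at a temporary directory and let it resolve and run every task on small test inputs, asserting on the final output.

```python
# Hedged sketch: run a whole (tiny) pipeline end-to-end in a test via the orchestrator.
import tempfile
from pathlib import Path

import luigi

class Ingest(luigi.Task):
    base = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"{self.base}/raw.txt")

    def run(self):
        with self.output().open("w") as f:
            f.write("a\nb\nc\n")

class Aggregate(luigi.Task):
    base = luigi.Parameter()

    def requires(self):
        return Ingest(base=self.base)

    def output(self):
        return luigi.LocalTarget(f"{self.base}/count.txt")

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            dst.write(str(len(src.readlines())))

def test_pipeline_end_to_end():
    with tempfile.TemporaryDirectory() as tmp:
        assert luigi.build([Aggregate(base=tmp)], local_scheduler=True)
        assert Path(tmp, "count.txt").read_text() == "3"
```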
Schema on read is obsolete. Welcome metaprogramming..pdf - Lars Albertsson
How fast can you modify your data collection to include a new field, make all the necessary changes in data processing and storage, and then use that field in analytics or product features? For many companies, the answer is a few quarters, whereas others do it in a day. This data agility latency has a direct impact on companies' ability to innovate with data. Schema-on-read has been a key strategy to lower that latency - as the community has shifted towards storing data outside relational databases, we no longer need to make a series of schema changes through the whole data chain, coordinated between teams to minimise operational risk. Schema-on-read comes with a cost, however. Errors that we used to catch during testing or in early test deployments can now sneak into production undetected and surface as product errors or hard-to-debug data quality problems later than with schema-on-write solutions.
In this presentation, we will show how we have rejected the tradeoff between slow schema change rate and quality to achieve the best of both worlds. By using metaprogramming and versioned pipelines that are tested end-to-end, we can achieve fast schema changes with schema-on-write and the protection of static typing. We will describe the tools in our toolbox - Scalameta, Chimney, Bazel, and custom tools. We will also show how we leverage them to take static typing one step further and differentiate between domain types that share representation, e.g. EmailAddress vs ValidatedEmailAddress or kW vs kWh, while maintaining harmony with data technology ecosystems.
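The tooling described in the talk is Scala-based (Scalameta, Chimney, Bazel). As a language-neutral illustration of the closing idea, distinguishing domain types that share a runtime representation, here is a small Python sketch using typing.NewType; the type names are illustrative, not taken from the talk.

```python
# Illustrative only: distinct domain types over shared representations.
from typing import NewType

EmailAddress = NewType("EmailAddress", str)
ValidatedEmailAddress = NewType("ValidatedEmailAddress", str)
KilowattHours = NewType("KilowattHours", float)

def validate(address: EmailAddress) -> ValidatedEmailAddress:
    if "@" not in address:
        raise ValueError(f"not an email address: {address}")
    return ValidatedEmailAddress(address)

def send_invoice(to: ValidatedEmailAddress, consumed: KilowattHours) -> None:
    # A static type checker rejects a raw EmailAddress or a bare float here,
    # even though both share the same runtime representation.
    print(f"Invoicing {to} for {consumed} kWh")
```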
Industrialised data - the key to AI success.pdf - Lars Albertsson
The DORA research concluded that there are orders of magnitude difference in delivery KPIs between leaders and the incumbents. In this presentation, we will describe the corresponding "data divide" in capabilities in data engineering, and how the leading companies have adopted an industrial approach to data management, enabling them to leap so far ahead. We will explain why "data industrialisation" is a key factor for succeeding going from AI prototypes to sustainable value from AI in production. We will also describe a path for companies outside the technology elite to cross the data divide into the industrialised data realm and share some very honest learnings from helping companies go that path.
Scalameta is a library for static analysis and processing of Scala source code, which supports syntactic and semantic analysis. In this presentation, we explain how Scalameta works, and how you can use Scalameta for custom code analysis. We demonstrate how we have used Scalameta to automate schema management and privacy protection.
How to not kill people - Berlin Buzzwords 2023.pdf - Lars Albertsson
With the rise of artificial intelligence, we give more control of our lives to software. We thereby introduce new risks, and the fatal Uber crash in 2018 is the first example of AI causing an accidental death. It will be up to us as software engineers to build systems safe and reliable enough to entrust with important decisions. Our culture, however, includes praising companies that move fast and break things (Facebook), celebrate principled confrontation (Uber), fake self-driving demonstrations (Tesla), and are right, a lot (Amazon). As an industry, we need to radically improve to meet the challenge, or more people will die.
In this presentation, we will look at aviation - the industry most successful at continuously improving safety - and attempt to learn. We will look at aviation safety principles, compare with similar practices in software engineering, and see how we can translate safety principles that have worked well in aviation to the software engineering domain.
Video: https://youtu.be/IitY9yZFPSA
The document discusses various challenges with artificial intelligence (AI) systems, including ensuring their decisions and outputs are sensible given the inputs. It notes that while machines can handle rules-based decisions, they struggle with new situations unlike humans. When trained on examples, AI may produce unreasonable outputs if the input data is not sensible. Proper anticipation and preparation is needed to address issues like an age-height prediction model providing implausible results for outliers. The document also states that AI is very difficult to develop and protect against adversarial attacks, and that its societal impact will be massive, requiring regulations and collaboration between technology, legal and political fields.
The right side of speed - learning to shift left - Lars Albertsson
Many disciplines are on the wrong side of speed - there is a tradeoff between development speed and security, data science, compliance, etc. Let us look at disciplines that have succeeded in shifting left by integrating with development, and learn successful patterns: testing, DevOps, agile, DataOps.
Mortal analytics - Covid-19 and the problem of data quality - Lars Albertsson
Social media are full of Covid-19 graphs, each pointing to an "obvious" conclusion that fits the author's agenda. Unfortunately, even the official sources publish analytics that point at incorrect conclusions. Bad data quality has become a matter of life and death.
We look at the quality problems with official Covid-19 data presentations. The problems are common in all domains, and solutions are known, but not widespread. We describe tools and patterns that data mature companies use to assess and improve data quality in similar situations. Mastering data quality and data operations is a prerequisite for building sustainable AI solutions, and we will explain how these patterns fit into machine learning product development.
New times, new hype. Buzzwords like big data and Hadoop have been replaced by AI and machine learning. But it's not technology, old or new, nor machine learning that separates the companies that get value from data from the companies that struggle.
When big data was at its peak, several young, technology-intensive companies absorbed big data successfully. They acquired large Hadoop clusters, learned to master data, and created valuable products with machine learning. However, big data has had limited impact at traditional companies, and the list of long and expensive data lake and Hadoop projects is long.
The key to implementing successful projects that transform data into business value is to democratise data - making it accessible and easy to use within an organisation.
Eventually, time will kill your data processing - Lars Albertsson
Race conditions and intermittent failures, daylight saving time, time zones, leap seconds, and overload conditions - time is a factor in many of the most annoying problems in computer systems. Data engineering is not exempt from problems caused by time, but also has a slew of unique problems. In this presentation, we will enumerate the time-related problems that we have seen cause trouble in data processing system components, including data collection, batch processing, workflow orchestration, and stream processing. We will provide examples of time-related incidents, and also tools and tricks to avoid timing issues in data processing systems.
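One of the simplest defences against several of these failure modes (a general pattern, stated here as my own minimal sketch rather than the talk's specific recommendations) is to keep all timestamps inside the pipeline in UTC and convert to local time only at the edges.

```python
# Minimal sketch: UTC everywhere inside the pipeline, local time only at the edges.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def event_time_utc() -> datetime:
    # Record timestamps in UTC at collection time.
    return datetime.now(tz=timezone.utc)

def daily_partition(ts_utc: datetime) -> str:
    # Bucketing by UTC date gives unambiguous, DST-free partitions.
    return ts_utc.strftime("%Y-%m-%d")

def to_local_for_display(ts_utc: datetime, tz_name: str = "Europe/Stockholm") -> str:
    # Convert to a local timezone only when presenting to users.
    return ts_utc.astimezone(ZoneInfo(tz_name)).isoformat()
```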
Eventually, time will kill your data pipeline - Lars Albertsson
The document discusses challenges related to handling time in data pipelines. It begins by introducing different categories of data from a time perspective, including facts, state, and claims data. It then discusses issues related to calendars, time zones, clocks, and how these impact data quality. Specific challenges discussed include joining data with different time scopes, handling late or incomplete data, and expressing business logic that requires looking at historical data. Various patterns and anti-patterns are presented for handling these challenges, such as using offline replicas for database dumps, ingest time bucketing of events, and recursive or striding dependencies to allow for backfilling historical data.
This document discusses using Kubernetes as a data platform. It describes using use case driven development to build the initial platform, focusing on simple use cases that provide value. It also covers onboarding new data sources, an overview of the data platform architecture including data lakes and batch/online services, deployment approaches both on-premise and cloud native, and addressing challenges like GDPR compliance and autoscaling. Lessons learned include selecting cloud infrastructure based on data locations and using Kubernetes for its support and to avoid maintaining separate clusters.
Many companies start their big data and AI journey by hiring a team of data scientists, give them some data, and expect them to work their miracles. Although it may yield results, it is not an efficient way to use data scientists. We will explain the problems that occur, and how to adapt the context to get business value from data scientists.
- Why data science teams might fail to deliver results
- What data scientists need to be efficient
- What talent you need in addition to data scientists
Big data is primarily associated with AI and new technology. It is as much a revolution in cooperation patterns, however. Big data entails the democratisation of data within an organisation, enabling agile, data-driven innovation in a manner that was previously unavailable. Knowing this, how can you work as an organisation to harvest the fruits and what can go wrong?
Privacy and personal integrity have become a focus topic, due to the upcoming GDPR deadline in May 2018. GDPR puts limits on data storage, retention, and access, and also gives users the right to have their data deleted and to get information about the data stored. This constrains technical solutions, and makes it challenging to build systems that efficiently make use of sensitive data. This talk provides an engineering perspective on privacy. We highlight pitfalls and topics that require early attention. We describe technical patterns for complying with the "right to be forgotten" without sacrificing the ability to use data for product features. The content of the talk is based on real world experience from handling privacy protection in large scale data processing environments.
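One commonly cited pattern in this space is crypto-shredding: encrypt each user's personal data with a per-user key and delete the key when the user asks to be forgotten, which renders every stored copy unreadable. The sketch below is my own illustration of that pattern using the cryptography package, not necessarily the specific patterns presented in the talk.

```python
# Crypto-shredding sketch: per-user keys; deleting a key "forgets" that user's data.
from cryptography.fernet import Fernet

class PerUserVault:
    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}  # in practice, a managed key store

    def encrypt(self, user_id: str, plaintext: bytes) -> bytes:
        key = self._keys.setdefault(user_id, Fernet.generate_key())
        return Fernet(key).encrypt(plaintext)

    def decrypt(self, user_id: str, token: bytes) -> bytes:
        return Fernet(self._keys[user_id]).decrypt(token)

    def forget(self, user_id: str) -> None:
        # Deleting the key shreds every ciphertext for this user, wherever it is stored.
        self._keys.pop(user_id, None)
```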
Test strategies for data processing pipelines, v2.0 - Lars Albertsson
This talk will present recommended patterns and corresponding anti-patterns for testing data processing pipelines. We will suggest technology and architecture to improve testability, both for batch and streaming processing pipelines. We will primarily focus on testing for the purpose of development productivity and product iteration speed, but briefly also cover data quality testing.
Many companies have data with great potential. There are many ways to go wrong with Big Data projects, however; the difference between a successful and a failed project can be huge, both in cost and return on investment. In this talk, we will describe the most common pitfalls, and how to avoid them. You will learn to:
- Be aware of the existing risk factors in your organisation that may cause a data project to fail.
- Learn how to recognise the most common and costly causes of project failure.
- Learn how to avoid or mitigate project problems in order to ensure return on investment in a lean manner.
This talk provides an engineering perspective on privacy protection. The intended audience is architects, developers, data scientists, and engineering managers that build applications handling user data. We highlight topics that require attention at an early design stage, and go through pitfalls and potentially expensive architectural mistakes. We describe a number of technical patterns for complying with privacy regulations without sacrificing the ability to use data for product features. The content of the talk is based on real world experience from handling privacy protection in large scale data processing environments.
As companies adopt data processing technologies and add data-driven features to user-facing products, the need for effective automated test techniques for data processing applications increase. We go through anatomy of scalable data streaming applications, and how to set up test harnesses for reliable integration testing of such applications. We cover a few common anti-patterns that make asynchronous tests fragile, and corresponding patterns for remediation. We will also mention virtualisation components suitable for our testing scenarios.
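A typical remedy for fragile asynchronous tests (a general pattern, sketched here as my own illustration rather than the talk's exact recommendations) is to replace fixed sleeps with bounded polling for an observable condition; the example call at the bottom uses a hypothetical output_topic helper.

```python
# Bounded polling instead of fixed sleeps in asynchronous integration tests.
import time
from typing import Callable

def await_condition(predicate: Callable[[], bool],
                    timeout_s: float = 30.0,
                    interval_s: float = 0.1) -> None:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(interval_s)
    raise AssertionError(f"condition not met within {timeout_s}s")

# Example (output_topic is a hypothetical test helper wrapping the output stream):
# await_condition(lambda: output_topic.message_count() >= 1)
```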
A primer on building real time data-driven products - Lars Albertsson
This document provides an overview of building real-time data products using stream processing. It discusses why stream processing is useful for reacting to data with latencies from 1 second to 1 hour. Key aspects covered include using a unified log to decouple producers and consumers, common stream processing building blocks like filtering and joining, and technologies like Spark Streaming, Kafka Streams, and Flink. The document also addresses challenges like out-of-order events and software bugs, and architectural patterns for handling imperfections in streams.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
3. www.scling.com
What do we contribute?
● Internet, digitalisation + many good little things
● Ability to measure and manipulate populations at scale
● Monetising bad security
○ Stolen CPU cycles → money
○ Ransomware
3
https://spinbackup.com/blog/24-biggest-ransomware-attacks-in-2019/
https://blog.chainalysis.com/reports/2022-crypto-crime-report-preview-ransomware/
https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election
4. www.scling.com
Security vs productivity
● Risk-management rarely wins
● Employees have conflicting definitions of success
4
Revenue generation, features, delivery speed
vs
Security reviews, pentests, password reauthentication, phishing campaigns, firewalls, …
5. www.scling.com
Security AND productivity
A simple recipe for application security:
- While we value items on the right, we value items on the left more.
- Invent alternatives that are aligned with speed
- Give employees aligned definitions of success
5
Left: SSO, password managers, infrastructure as code, hardware MFA, ephemeral containers, …
Right: security reviews, pentests, password reauthentication, phishing campaigns, firewalls, …
7. www.scling.com
Quality and ops
7
Aligning quality with speed: TDD, test automation, XP, agile, cross-functional teams, trunk-based development, continuous integration, continuous delivery, DevOps, dev-friendly ops tooling, containers
8. www.scling.com
Manual, mechanised, industrialised
8
● Manual: muscle-powered, few tools, human touch for every step
● Mechanised: direct human control, machine tools, low investment with direct return
● Industrialised: scaled processes, machine tools; challenges: scale, logistics, legal, organisation, faults, ...
9. www.scling.com
IT craft to factory
9
Craft side: waterfall security, traditional application delivery, traditional operations, traditional QA, hand-managed infrastructure
Factory side: DevSecOps, agile, containers, DevOps, CI/CD, infrastructure as code
10. www.scling.com
Quality, speed - choose two
10
● Toyota: Low defect rates AND high margins per vehicle
● State of DevOps report: High reliability AND high deployment rate
○ We have industrialised software engineering
Quality vs speed → quality AND speed (1000x span in availability metrics)
11. www.scling.com
Themes of good presentations, IMHO
● We have seen lots of X / X from a different angle. Here are some patterns.
● We have context Y. Here is how we work.
● We did a thing Z. Here is what we learnt.
11
We need to share how we work in order to make faster progress.
13. www.scling.com
Data industrialisation
13
Diagram: DW → enterprise big data failures → "modern data stack" (traditional workflows, new technology; the 4GL / UML phase of data engineering) → "data factory engineering". Annotations: ~10 year capability gap, data engineering education.
14. www.scling.com
How data leaders work
14
Diagram: online systems feed data into a data platform & lake; data processed offline in the "data factory" flows back as data innovation & functionality. Scale indicators: 100+K daily datasets, 30% of staff are daily BigQuery users. Value from data!
16. www.scling.com
Efficiency is sacred
● Productivity is our unique selling point
○ Client value from data is unpredictable
○ Clients don't know what they want
○ Quick experiments & pivot
● Minimal operational overhead
○ Pipelines / person
○ Datasets / day / person
● Nothing must undermine our USP
16
17. www.scling.com
Our security strategy
● Invest where it improves productivity
○ Cloud single sign on
○ Cloud identity management
○ Workload identities over secret tokens
○ Hardware multifactor authentication
○ Infrastructure as code
○ Patch management *
● Homogeneity over autonomy
○ Few technologies
○ Few processes
○ Processes encoded in code *
17
● Minimal attack surface *
● Strict asset management
○ Digital assets as code
○ Process to align assets with code
○ Explicit manual asset management
● Lean on Google
18. www.scling.com
Minimising attack surfaces
● Few ecosystems
○ Ubuntu
○ Scala + Spark
○ Python
● Few components
○ Reuse over perfect match
● Few versions
○ Single version per third party component
○ Opens gates to dependency hell *
■ Control or autonomous cells
18
20. www.scling.com
Which version?
● Version specifications
○ Exact version
■ Good for application stability
○ Range
○ Latest
■ Good for patch latency
● Specification choice tradeoffs
○ Provider trust
○ Patch latency
20
● Upgrade tradeoffs
○ Vulnerability patching
○ Rogue code
○ Bugs fixed
○ Bugs introduced
○ Necessary work
● Our goal:
○ Exact version
○ Transitive dependencies locked
○ Automatically updated
● Let's pursue!
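To make the goal of exact versions with locked transitive dependencies enforceable rather than aspirational, a small guard in the build can reject lock files that contain anything other than exact pins. Below is a minimal sketch in Python, assuming a pip-style requirements lock file; the file name, regex, and script are illustrative, not Scling's actual tooling.

```python
# Hypothetical CI guard: fail the build if any dependency in the lock file
# is not pinned to an exact version ("pkg==1.2.3").
import re
import sys

PIN_RE = re.compile(r"^[A-Za-z0-9._\[\]-]+==\S+")

def unpinned_lines(lock_file: str) -> list:
    bad = []
    with open(lock_file) as f:
        for raw in f:
            line = raw.strip()
            # Skip blanks, comments, hash continuation lines, and includes.
            if not line or line.startswith(("#", "--hash", "-r")):
                continue
            if not PIN_RE.match(line):
                bad.append(line)
    return bad

if __name__ == "__main__":
    offending = unpinned_lines(sys.argv[1] if len(sys.argv) > 1 else "requirements.txt")
    if offending:
        print("Not exact-pinned:", *offending, sep="\n  ")
        sys.exit(1)
```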
21. www.scling.com
Levels of up to date
● No new version of A exists
● New A version exists. Application verified ok with upgrade.
● New A version exists. Unclear whether upgrade breaks application.
● New A version exists. Upgrade breaks application.
○ We use a deprecated API.
○ New version has bug.
● New A version exists. Upgrade breaks dependency B.
○ New version of B exists.
○ No new version of B exists.
○ A and B must atomically upgrade
21
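These levels come back later as the outcomes an upgrade bot can actually observe. As a rough illustration (the names are hypothetical, not taken from Scling's code), they can be encoded as a small enumeration that the rest of the automation reasons about:

```python
# Illustrative classification of "levels of up to date" for a dependency A.
from enum import Enum, auto

class UpgradeStatus(Enum):
    NO_NEW_VERSION = auto()      # no new version of A exists
    VERIFIED_OK = auto()         # new A version exists, application verified ok
    UNVERIFIED = auto()          # new A version exists, unclear if it breaks us
    BREAKS_APPLICATION = auto()  # we use a deprecated API, or the new version has a bug
    BREAKS_DEPENDENCY_B = auto() # upgrading A breaks dependency B
```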
22. www.scling.com
A bot friendly task
● There is some order that moves us forward through hell
● Slow trial and error cycle
○ Compile or test takes minutes
● There are bots
○ Dependabot, Scala steward
■ Way too complex (100/20 KLOC, 1000s lines of doc / examples)
○ Do not cover our needs
■ Application correctness
■ Our ecosystems
22
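The core of such a bot is a slow but simple trial-and-error loop: apply one candidate upgrade, let the full test suite judge it, keep it on success and revert on failure. A minimal sketch, assuming a Bazel monorepo and git; the helper names are hypothetical and a real bot needs far more bookkeeping:

```python
# Hypothetical trial-and-error upgrade step. The strong process (all tests
# run on every change) is what makes the test suite a usable oracle here.
import subprocess

def run_tests() -> bool:
    """Run the full test suite; Bazel's caching keeps repeated runs cheap."""
    return subprocess.run(["bazel", "test", "//..."]).returncode == 0

def try_upgrade(apply_upgrade, description: str) -> bool:
    """Apply one candidate upgrade; commit it if tests pass, revert otherwise."""
    apply_upgrade()  # e.g. rewrite one entry in a lock file or version.bzl
    if run_tests():
        subprocess.run(["git", "commit", "-am", f"auto-upgrade: {description}"], check=True)
        return True
    subprocess.run(["git", "checkout", "--", "."], check=True)  # throw the attempt away
    return False
```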
23. www.scling.com
With a strong process
● we can reason and automate
○ Trial and error forward
● Process strength
○ Faulty change is detected before prod
○ Non-code changes unlikely to affect correctness
○ Self-bootstrapping
23
24. www.scling.com
Strong process challenges
● Everything not covered by tests
● Test infrastructure / setup defined by code
○ How to test?
○ How to bootstrap?
● Non-deterministic processes / components
○ Mostly deterministic is ok
24
Extended test suite:
● Testsuite bootstrap
● Continuous deployment testsuite
● Non-production functionality
○ Dev tooling
○ Web
○ …
25. www.scling.com
Our build process
● Monorepo + trunk-based
○ Platforms + all client code and pipelines
○ Single version of platform
● All tests verified* for every change
○ Tests do not require cloud resources
● Build + test speed challenging
○ Spark → seconds of startup time → slow tests
● Simple recipe for speed:
○ Avoid doing things → caching
○ Do things in parallel
25
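The caching half of that recipe boils down to keying work on a hash of its inputs and skipping it when nothing changed; Bazel applies this per build and test action. A toy illustration of the idea in Python, with a made-up cache layout:

```python
# Toy content-addressed cache: skip an action when its inputs are unchanged.
import hashlib
import os

def cache_key(input_files: list) -> str:
    digest = hashlib.sha256()
    for path in sorted(input_files):
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()

def run_cached(input_files: list, action, cache_dir: str = ".cache") -> str:
    os.makedirs(cache_dir, exist_ok=True)
    marker = os.path.join(cache_dir, cache_key(input_files))
    if os.path.exists(marker):
        return "cache hit: avoided doing things"
    action()
    open(marker, "w").close()  # record that this exact input set succeeded
    return "executed"
```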
26. www.scling.com
Bazel
● Designed for monorepos & strong process
○ Lazy tree evaluation
○ Isolated sandboxes
● Unmatched performance features
○ Isolation → reliable caching
○ Test result caching
○ Remote caching
○ Parallelism
○ Remote execution
26
● Great for stuff used by Google
● Catching up on
○ Docker
○ Scala
○ Third-party dependencies
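To make the Bazel point concrete: each target declares its sources and dependencies, so Bazel can evaluate the build tree lazily, run tests in isolated sandboxes in parallel, and reuse any cached result whose inputs did not change. A hypothetical BUILD file sketch (Starlark, which is syntactically a Python subset); the target names and paths are invented for the example:

```python
# BUILD - hypothetical targets for one pipeline in the monorepo.
load("@rules_python//python:defs.bzl", "py_library", "py_test")

py_library(
    name = "pipeline",
    srcs = ["pipeline.py"],
    deps = ["//platform:datasets"],  # illustrative internal dependency
)

py_test(
    name = "pipeline_test",
    srcs = ["pipeline_test.py"],
    deps = [":pipeline"],
    size = "small",  # small, hermetic tests benefit most from result caching
)
```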
27. www.scling.com
Dependency version control
● Transitive, locked
○ Python
○ JVM
○ Lock files in version control
● Not transitive, locked
○ Direct downloads
○ Bazel plugins
○ Container base images
○ version.bzl file
■ → bazel, python, bash
27
● Apt packages
○ Latest*
● Some Google components
○ VM base images, misc
○ Latest
● Employee devices
○ Manual
● Unmanaged leftovers
○ SaaS
○ Otherwise minimal exposure
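For the dependencies that have no lock-file ecosystem, a single version.bzl with pinned versions and digests can act as the source of truth, loadable from Bazel and exportable to Python and bash. A minimal sketch with invented names, versions, and digests, not Scling's actual file:

```python
# version.bzl (Starlark) - hypothetical pins for directly downloaded artefacts,
# Bazel plugins, and container base images.
UBUNTU_BASE_IMAGE = "ubuntu:22.04@sha256:<digest pinned here>"
RULES_PYTHON_VERSION = "0.31.0"
RULES_PYTHON_SHA256 = "<sha256 of the release archive>"

# One dict so Bazel rules, Python scripts, and bash wrappers can consume
# the same pins without duplicating them.
VERSIONS = {
    "ubuntu_base_image": UBUNTU_BASE_IMAGE,
    "rules_python": RULES_PYTHON_VERSION,
}
```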
32. www.scling.com
Can we make apt install deterministic?
● apt-get typically provides latest
○ Determined by Packages.gz
○ Download during build breaks determinism & caching?
● Distroless bazel package_manager:
○ Exact Packages.gz specification
○ Debian: Versioned Packages.gz
○ Ubuntu: Only latest Packages.gz
● Compromise on determinism
○ Download Packages.gz before build
○ Caching still ok
● Not running apt scripts seemed to work. For a while.
○ Subtle low-level container failures
○ Abandoned
32
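The compromise described above can be as small as a pre-build step that downloads Packages.gz once and records exact versions for the packages the image needs, so the build itself installs with "pkg=version" pins and stays cacheable. A hedged Python sketch; the mirror URL and package set are illustrative:

```python
# Hypothetical pre-build snapshot: resolve exact apt versions from one
# Packages.gz download, then feed "pkg=version" pins to the image build.
import gzip
import urllib.request

PACKAGES_GZ = "http://archive.ubuntu.com/ubuntu/dists/jammy/main/binary-amd64/Packages.gz"
WANTED = {"curl", "ca-certificates"}  # illustrative package set

def snapshot_versions(url: str, wanted: set) -> dict:
    raw = gzip.decompress(urllib.request.urlopen(url).read()).decode("utf-8")
    versions, name = {}, None
    for line in raw.splitlines():
        if line.startswith("Package: "):
            name = line.split(": ", 1)[1]
        elif line.startswith("Version: ") and name in wanted:
            versions[name] = line.split(": ", 1)[1]
    return versions

if __name__ == "__main__":
    for pkg, version in sorted(snapshot_versions(PACKAGES_GZ, WANTED).items()):
        print(f"{pkg}={version}")  # later consumed as: apt-get install pkg=version
```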
33. www.scling.com
Scling collaboration models
33
● Single unified platform
○ Monorepo + trunk-based process
○ Separate instance per client
○ All test suites run on every change
● Factories are adapted to constraints and important properties
○ Ok: Security, risk, quality, availability, compliance
○ No: Preferred technology, work processes
Refinement factory
● Raw data in
● Valuable data out
● Non-technical clients
● "Easy" domain
Joint factory
● Hybrid teams
● Domain experts
● Data apprentices
● Scling runs data platform
Client factory
● Start as joint factory
● Goal: Client independent
34. www.scling.com
Divided, multi-tenant platform
34
Diagram: Orion, the base data platform, runs on GCP (but is portable to other clouds) with an isolated instance per client. Saturn provides non-essential operational tooling. CLI tools: "ion" and "scli".
40. www.scling.com
Resolution classifications
● No new version of A exists
● New A version exists. Application verified ok with upgrade.
● New A version exists. Unclear whether upgrade breaks application.
● New A version exists. Upgrade breaks application.
○ We use a deprecated API.
○ New version has bug.
● New A version exists. Upgrade breaks dependency B.
○ New version of B exists.
○ No new version of B exists.
○ A and B must atomically upgrade
40
Mapped outcomes from the diagram: not found, success, test failure; several of the failure cases are transient.
47. www.scling.com
Google SLSA evaluation
● Supply-chain Levels for Software Artifacts
○ Maturity model
● SLSA 1: yes
● SLSA 2: yes
● SLSA 3: some
○ Prioritising speed over Ephemeral Environment, Isolated, Non-Falsifiable
● SLSA 4: some
○ Parameterless
○ Dependencies complete (except apt)
47
48. www.scling.com
Concluding remarks
● Challenges?
○ Operational tuning to balance rate vs €
○ Google cloud_sql_proxy patch update took us down
○ Diva dependencies need custom solutions
○ Which test failure to address?
● Future?
○ Upgrade conditional on container scanning?
○ Dead dependency detection?
● Open source? No.
○ Specific to our environment
○ Bot is easy. Just do it.
○ Strong process challenging. But rewarding.
○ Offer: A copy of the code for a C-level lunch date. :-)
48