1. Software rotting or
why you need to change your approach to security
Giulio Vian
16 May 2022
@giulio_vian
https://www.getlatestversion.eu
http://blog.casavian.eu
https://www.slideshare.net/giuliov
https://github.com/giuliov
2. Executive Summary
Software decays rapidly, and decay rate
is speeding up.
Security is the main force, but not the
only one.
We must improve tooling and practices
to cope with this increased velocity.
Technical Inflation helps Management
understand what is going on.
Assume you know what SCA or SAST is
Image source: Public Domain
3. Hardware spec:
1 KB RAM
4 KB ROM
First computer Past employers Communities
Giulio Vian Principal DevOps Engineer
@giulio_vian
giuliovdev@hotmail.com
15. How broadly?
How many teams, repos,
and pipelines?
My company has 3,000 repos
across 100 teams, storing over
13 million lines of code, and
using 2,800 pipelines
A single vulnerability may affect 10s of teams
and 100s of repos
Are they distributed or
centralized?
Image: The Crowd For DMB 1 by Moses
16. Finding code,
automated
SCA† tools are
pipeline–bound
Rarely built code
Pipeline does not work anymore
† Software Composition Analysis
Image: Automated storage and retrieval system using TGW Stingray
by TGWmechanics
17. Fixing code
Scan all repositories
Patch code
Latest or specific version
Can be automated?
Image: robotic arm in the Conrad Prebys Center for Chemical Genomics
by Josh Baxt
18. Can you
expedite?
Separation of Duties
Regulation / audit requirement
Slows 0-day patching
Tightly controlled usage
Automated checks
Single commit with limited
churn
Additional approvers for
quick turnaround
Image courtesy of SpaceX
21. Affected by
Vulnerability
Application stack
Container images
Virtual Machine images
Application itself
Application code
Libraries
Internal
3rd party
Self-contained run-time
Application
Run-time
OS
libraries
Base
image
Self-
contained
22. Base images
vmdk, VHD, VDI, OVA, …
AMI , VHD
Docker, OCI, ACI, …
Application
Run-time
OS
libraries
Base
image
23. App Platform shift
Chrome: 1 month; patched after 14 days
Node.JS: 30 months (LTS), 6 months; patched every 25 days
Go: 6 months; patched every 26 days. Two major releases supported.
MongoDB: 30 months; patched every 5 weeks
.NET: 3 years (LTS), 18 months; patched every 6 weeks
Java: 3 years (LTS), 6 months; patched every 12 weeks
24. Redeploy.
Every. Day.
Simplest pattern
Once automated
patching is in place
Must cover rollback scenario
Zero-downtime deploy
in place
Consider pipeline
resources
Image: the gerbil wheel pose by dbgg1979
25. Bill of Materials
on steroids
Reverse indexes
Library → Binaries [SCA tool]
O.S. API → Binaries [SAST tool]
Binary → Pipelines [artifact store]
Pipeline → Repo(s) [pipeline tool]
Pipeline
Binaries
Production
Libraries
Repo(s)
27. Costs
Investment to optimize
patching and
deployment processes
Increased on-going cost
to rebuild as needed
On top of SCA & other
security tooling
Image by Jayne Simmons
28. Technical Debt
«describes the consequences
of software development
actions that intentionally or
unintentionally prioritize
client value and/or project
constraints such as delivery
deadlines, over more
technical implementation
and design considerations.»
Holvitie J., Licorish S.A., et al. - Technical
debt and agile software development
practices and processes – Information and
Software Technology, issue 96 (2018) p.142
Image by ThoBel-0043
29. Technical
Inflation
Unintended reduction
in value of a software
product over time,
independent of source
code changes.
Depreciation does not
capture two elements:
Unintentionality
Value can be restored
Image source: Max Pixel
31. Track progress
Security SLA
Mean time to implement a
security Fix
From notice (e.g. CVE) to dev
Mean Time to Patch
Production
From dev to prod
Image by Tumisu
36. References (2/5)
https://heartbleed.com/
Why Every Business Is a Software Business — Watts S. Humphrey Informit, Feb 22, 2002
http://www.informit.com/articles/article.aspx?p=25491
https://en.wikipedia.org/wiki/Watts_Humphrey
https://www.sonatype.com/resources/state-of-the-software-supply-chain-2021
https://www.shopify.com/enterprise/global-ecommerce-statistics
https://blog.cloudflare.com/popular-domains-year-in-review-2021/
https://radar.cloudflare.com/year-in-review-2021
https://snyk.io/blog/net-open-source-security-insights/
https://www.contrastsecurity.com/the-state-of-the-oss-report-2021
https://octoverse.github.com/static/github-octoverse-2020-security-report.pdf
42. Impact
IBM
Lost business represented 38% of total breach cost.
287 Average number of days to identify and
contain a data breach.
Ransomware attacks cost an average of $4.62 million.
43. Inflation
Inflation refers to a general
progressive increase in prices of
goods and services in an
economy […] consequently,
inflation corresponds to a
reduction in the purchasing
power of money.
Source: Wikipedia
Image: Public Domain
44. Costs
This is not keeping the lights on
It is more similar to insurance
Requires CI/CD maturity
All component build, test and deploy automated
Preventive updates minimize fast-track usage
More red-taping until tools catch up
Welcome everybody to this session about Software Rotting. My name is Giulio Vian and I will start with a brief overview so you can see if this session suits your taste.
Today, I hope to convince you that we have serious problems in the way we patch and deploy applications, problems that we must address as an industry. At its core, an application that works perfectly today is a huge risk tomorrow.
That’s why I speak of decay and rotting: it is not a slow process. Wear, erosion, rust… they do not convey the urgency and the work required to prevent decay.
#1 unless you put it in a fridge or in a can, it starts smelling very soon
#2 those other processes take time, while rotting requires quick action to stop it
I am not sure how big an effort it is to fix processes and tools to cope with security-related problems, the ones this audience is acquainted with. Security is the main driver, although not the only one.
To change processes and invest in tools, we have to speak to leadership and executives using a simple but effective vocabulary, so I suggest using the word inflation to convey the idea and start a discussion.
As you may have guessed, this presentation is a bit visionary and high-level: I will talk about industry trends and process, not technology. For those interested in technology details, I recommend the sessions of my friends Michael Kaufmann and Matteo Emili.
Now you have a couple of minutes to switch if you are not interested.
Who am I?
I work at Unum, a Fortune 500 company, with more than a thousand people in IT.
I have studied DevOps for over 10 years, so no, I am not an InfoSec professional. One thing I learned over the years: I try to solve a new problem each day, but some issues take years to go away.
Microsoft has awarded me Most Valuable Professional in the Azure DevOps category for the last few years. I speak at international conferences.
If you want to discuss today’s ideas or other DevOps topics, you can reach me on Twitter as giulio_vian or email me directly.
Today’s presentation has four sections:
The main problem (security)
How it impacts developers’ work
and Operations’ work
and the DevOps perspective, so the overall impact
I bet you already know this, maybe you saw bits and pieces of it, so a recap should be useful to grasp the overall picture.
What do these vulnerabilities have in common? They affected widely used libraries, generating major security storms.
Log4J is a Java library used for two decades
OpenSSL a C library at the core of HTTPS
pac-resolver a piece of Javascript to configure an HTTP client
WinSCPHelper a .NET library
Each language stack had/has a major issue in a library.
…display the same pattern, even more.
Why?
Apps increasingly use a lot of open source libraries. And those libraries have vulnerabilities.
Open a parenthesis.
There are substantial differences in the number and depth of dependencies across developer stacks.
An average JavaScript app uses hundreds of libraries, while a .NET app uses only a couple of dozen; Java is somewhere in between.
I could not find data about other platforms. What is the state of Python, Go, Ruby, etc.?
Close parenthesis.
You knew this was coming, eh?
Both graphs illustrate that we, as an industry, aren’t exactly great at reacting and fixing our applications.
The one on the left is data about OSS projects.
The one on the right is more interesting because it is based on telemetry data, which gives more significant insight into IT organizations.
Fixing a library vulnerability is the developers’ responsibility, but how does it work in practice? Let’s see Kevin’s perspective.
Here we discuss how to identify:
1. the code that needs to be patched
2. the pipeline that releases that code to Production
and some issues that one may face:
If more than one branch can reach prod, which one do you choose?
How do you match the exact version of the code?
Does Software Composition Analysis kick in only through pipelines? Is it triggered by the deploy pipeline?
The deploy pipeline hasn’t been used in months and doesn’t work anymore (e.g. a token expired, or a suitable agent is no longer available)
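One way to catch the "pipeline does not work anymore" case before an emergency is to flag pipelines that have not succeeded recently. This is a minimal sketch: the pipeline names, the 90-day threshold, and the idea of exporting the last successful run date from your pipeline tool are all assumptions for illustration.

```python
from datetime import date


def stale_pipelines(last_success: dict[str, date], today: date,
                    max_age_days: int = 90) -> list[str]:
    """Return the pipelines whose last successful run is older than max_age_days."""
    return sorted(
        name
        for name, ran_on in last_success.items()
        if (today - ran_on).days > max_age_days
    )


# Hypothetical data, as it might be exported from a pipeline tool.
last_success = {
    "billing-release": date(2022, 5, 1),
    "legacy-portal": date(2021, 11, 2),
}
print(stale_pipelines(last_success, today=date(2022, 5, 16)))
```

Pipelines on this list are the ones to run, and repair, before a 0-day forces you to.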
It does not take a big organization to have a lot of objects to handle.
The trend in architecture is micro-services, which translates into a lot of independent code. It is quite normal to see a 1:10 ratio or higher, i.e. one developer works on at least ten different repositories, each independently built and deployed.
Things get worse if teams manage their own CI/CD infrastructure. You must reach every team and ask them to redeploy!
Distributed teams => time zones, delays in communication.
Can it be automated? <pause> To my knowledge, some tools do part of the work, like GitHub Dependabot.
It scans sources and proposes changes via a pull-request mechanism.
It does not support all package managers, though, and some features require GitHub.
And clearly we need to tell it which version is the correct one to use. We have seen toolchain attacks where the fix was to roll back, haven’t we?
Once the code is fixed, all we care about is deploying it to Production, but many of us work in regulated industries where a Release Management role, separate from developers, may be required by SOX, Basel, HIPAA, and so on.
You need speed when it is a 0-day exploit. You should be able to deploy a patch within hours of its release by a 3rd party (an OSS project or a vendor).
Thus, your organisation needs a special type of pipeline called fast-track or expedite.
These must balance audit requirements with speed, so they must be restricted to urgent patching.
Only a new CVE or a communication from the Security team can enable them, with all approvals granted in advance.
They have special checks, for example that changes are limited to build scripts (pom.xml, build.gradle, *.csproj, Makefile, package.json, … name it) or a few lines of code. This prevents changes in functionality.
It should be clear that it is impossible to implement a fast-track/expedite pipeline without a thorough regression-test suite.
Sometimes security patches introduce subtle changes in behaviour. Without a good test harness, you increase the risk of disrupting production.
A high percentage of code coverage is a good indicator that the tests minimize the risk.
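The "limited churn" check of a fast-track pipeline could look roughly like this. It is a sketch under stated assumptions: the 10-line threshold and the shape of the input (file path mapped to changed-line count) are hypothetical, and a real gate would read them from the pull request.

```python
from fnmatch import fnmatchcase

# Build-script patterns taken from the list above; extend for your stacks.
BUILD_FILE_PATTERNS = ["pom.xml", "build.gradle", "*.csproj", "Makefile", "package.json"]
MAX_CODE_LINES = 10  # assumed threshold for "a few lines of code"


def is_build_file(path: str) -> bool:
    """True when the file name matches one of the build-script patterns."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(name, pattern) for pattern in BUILD_FILE_PATTERNS)


def qualifies_for_fast_track(changes: dict[str, int]) -> bool:
    """changes maps each modified file path to its number of changed lines.

    Build scripts may change freely; everything else counts against the limit.
    """
    code_lines = sum(
        lines for path, lines in changes.items() if not is_build_file(path)
    )
    return code_lines <= MAX_CODE_LINES


print(qualifies_for_fast_track({"pom.xml": 4}))
print(qualifies_for_fast_track({"src/Main.java": 250}))
```

A change that fails this check simply goes through the regular release process.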
I think I have spoken enough about developers. Let’s see what this means for Steven.
Kevin stops and thinks: I need to look at my pom.xml (build.gradle, *.csproj, Makefile, package.json, … name it) for references to Log4J (or whatever is vulnerable). Oh, but I use SLF4J, which in turn… indirect dependencies! I need a tool just to find all possible references recursively. Oh oh, our Tomcat configuration is using Log4J! I must check more than my JAR file, says Kevin.
And rebuild the Docker container…
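Kevin’s recursive search for indirect dependencies can be sketched as a depth-first walk over a dependency graph. The graph below is illustrative only; in practice it would come from your build tool or SCA tool, and the function name is hypothetical.

```python
def paths_to(graph: dict[str, list[str]], root: str, target: str) -> list[list[str]]:
    """Return every dependency chain that leads from root to target."""
    found: list[list[str]] = []

    def walk(node: str, trail: list[str]) -> None:
        trail = trail + [node]
        if node == target:
            found.append(trail)
            return
        for dependency in graph.get(node, []):
            if dependency not in trail:  # guard against cycles
                walk(dependency, trail)

    walk(root, [])
    return found


# Illustrative graph: the app never references log4j-core directly.
graph = {
    "my-app": ["slf4j-api", "log4j-slf4j-impl"],
    "log4j-slf4j-impl": ["slf4j-api", "log4j-api", "log4j-core"],
    "log4j-core": ["log4j-api"],
}
for chain in paths_to(graph, "my-app", "log4j-core"):
    print(" -> ".join(chain))
```

The printed chain shows which direct dependency pulls in the vulnerable library, which is the one Kevin has to upgrade or override.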
Nowadays containers are a common vehicle to package applications, so we have an additional piece to manage, and we use automated pipelines to build container images.
The base image version may be in the source Dockerfile or passed as a pipeline parameter. In any case the Ops person must have a grasp of the running pipelines or ask the developers.
This won’t happen frequently, or will it?
You can reuse the stats I showed you to make people aware
.NET Core 3.1
3.1.0 December 3, 2019
3.1.22 December 14, 2021
got 22 patch releases in 3 years i.e. every 45 days/6 weeks
Node v14 (Fermium)
Active LTS start 2020-10-27 v14.15.0
2022-02-01, Version 14.19.0
total 19 releases in 463 days or 66 weeks i.e. every 24.4 days
JDK 11
Java SE 11 (LTS) September 25, 2018
11.0.13+8 (GA), October 19th 2021
total 13 releases(updates) in 1121 days i.e. every 12.3 weeks or 86.2 days
Go 1.16 released 2021-02-16
go1.16.14 (released 2022-02-10)
total 14 updates in 360 days i.e. every 26 days
go1 (released 2012-03-28) -> go1.17 (released 2021-08-16)
17 major releases in 3429 days or 490 weeks
MongoDB 5.0
5.0.0 - Jul 13, 2021
5.0.6 - January 31, 2022
total 6 releases in 203 days or 29 weeks i.e. every 4.8 weeks
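The averages above are simple arithmetic that you can redo for your own stack. A minimal sketch follows; it assumes the day counts include both the first and the last release date, which is what the figures in these notes appear to do, and the function name is hypothetical.

```python
from datetime import date


def average_patch_interval(first: date, last: date, releases: int) -> float:
    """Average number of days between releases, counting both end dates."""
    days = (last - first).days + 1
    return days / releases


# Dates and release counts are the ones quoted in the notes above.
node14 = average_patch_interval(date(2020, 10, 27), date(2022, 2, 1), 19)
go116 = average_patch_interval(date(2021, 2, 16), date(2022, 2, 10), 14)
mongo5 = average_patch_interval(date(2021, 7, 13), date(2022, 1, 31), 6)

print(f"Node v14: every {node14:.1f} days")
print(f"Go 1.16: every {go116:.1f} days")
print(f"MongoDB 5.0: every {mongo5 / 7:.1f} weeks")
```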
A simple pattern would be to refresh dependencies every night and redeploy. There are important caveats, though, that severely restrict its applicability.
Some stacks are more fragile than others (JavaScript/npm) and automatic updates may very easily break applications.
Deploying a new version is like running into a wall if the pipeline has no automated testing, or testing is poor.
Lastly, rebuilding everything has a cost (some ballpark measures: Microsoft-hosted is $40 per parallel job, GitLab $10 per 1,000 minutes, others have more complex formulas).
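A back-of-the-envelope estimate helps when discussing the rebuild cost. This sketch uses the per-minute price quoted above and the 2,800 pipelines from the earlier slide; the 10 minutes per run is purely an assumption, and free tiers or volume discounts are ignored.

```python
def monthly_rebuild_cost(pipelines: int, minutes_per_run: float,
                         price_per_1000_minutes: float, days: int = 30) -> float:
    """Cost of running every pipeline once a day for the given number of days."""
    total_minutes = pipelines * minutes_per_run * days
    return total_minutes / 1000 * price_per_1000_minutes


# 2,800 pipelines, an assumed 10 minutes per run, $10 per 1,000 minutes.
cost = monthly_rebuild_cost(pipelines=2800, minutes_per_run=10,
                            price_per_1000_minutes=10.0)
print(f"${cost:,.0f} per month")
```

Replace the assumed run time with a measured average from your own pipelines before quoting any number to management.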
Current tooling may offer some information, but a well-rounded process needs a lot of cross-reference data.
Dependency management is a weak spot in general; SCA (Software Composition Analysis) can identify vulnerabilities in libraries.
Use of O.S. APIs may be caught by security scans.
An artifact management tool can track the source (build) of binaries, if properly used.
Pipelines know which repositories they use; what we need here is the ability to call a REST API that tells us the dependency.
If you can use such tools, great. Maybe you need to follow a few conventions and write some query tools.
In the worst scenario, you have to build and maintain your own database.
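If you do end up maintaining your own database, the core of it is a chain of reverse indexes. The sketch below walks from a vulnerable library to the repositories that must be patched; all names and data are hypothetical, and in practice each index would be filled from the SCA tool, the artifact store and the pipeline tool.

```python
# Hypothetical reverse indexes, one per tool.
library_to_binaries = {"log4j-core": ["billing.jar", "portal.war"]}      # SCA tool
binary_to_pipelines = {"billing.jar": ["billing-release"],               # artifact store
                       "portal.war": ["portal-release"]}
pipeline_to_repos = {"billing-release": ["billing-service"],             # pipeline tool
                     "portal-release": ["portal-web", "shared-ui"]}


def repos_affected_by(library: str) -> list[str]:
    """Follow library -> binaries -> pipelines -> repos and return the repos."""
    repos: set[str] = set()
    for binary in library_to_binaries.get(library, []):
        for pipeline in binary_to_pipelines.get(binary, []):
            repos.update(pipeline_to_repos.get(pipeline, []))
    return sorted(repos)


print(repos_affected_by("log4j-core"))
```

The same walk also yields the pipelines to re-run once the repositories are patched.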
Annie’s perspective
Security is a cost, and this continuous updating is expensive, right?
I won’t lecture you on all the costs of neglecting security and quality.
A good source is the 2018 report of the [US] Council of Economic Advisers titled “The Cost of Malicious Cyber Activity to the U.S. Economy”. It lists 13 different economic impacts of an adverse incident [see below].
So how can we justify the budget increase?
Loss of IP
Loss of strategic information
Reputational damage
Increased cost of capital
Cybersecurity improvements
Loss of data and equipment
Loss of revenue
Public relations
Regulatory penalties
Customer protection
Breach notification
Court settlement fees
Forensics
Can we explain it using technical debt?
What is technical debt, precisely? Although the term was first used by Ward Cunningham in 1992, I found that scholars have a more precise definition.
{{read}} It clearly does not match the observation: software decays because of external changes, not because of developers’ actions.
Should we use another term?
Johannes Holvitie, Sherlock A. Licorish, Rodrigo O. Spínola, et al. - Technical debt and agile software development practices and processes: An industry practitioner survey - Information and Software Technology, issue 96 (2018) p.142
I think so, and I suggest talking about inflation. Like all metaphors, it has limits: it is impossible to restore the value of a currency the way you can with software (unless you remove some digits and convert to a new currency).
All the mechanisms suggested before (expedite pipelines, dependency metabase) plus
Reducing the run-times
Strong policies for quality
I hope I have demonstrated that there is an urgent need to automate rebuilding and redeploying applications and their underlying stacks at scale. We find new flaws in dependencies every day. Do we have enough resources to manage this at scale, or is the process still heavily manual? How quickly can we react without automation? Have we covered the entire application portfolio?
Solving this problem requires all parties to collaborate: development, operations, management, and vendors.
I hope I helped you understand toolchain problems in a broader way, and that it will be easier for you to discuss them with colleagues, managers and leadership.
I am open for questions. If you prefer, you can reach me via Twitter @giulio_vian or email me directly at giuliovdev@hotmail.com.
…are there tools to support me and detect vulnerabilities in the code I deliver? Yes, there are BLAH
But Giulio does not lose heart and knows how to find vulnerabilities in the code and in the libraries used: SAST and SCA!
Static Application Security Testing (SAST) analyzes the source code for errors such as missing input validation or SQL injection.
Software Composition Analysis (SCA) analyzes binaries or sources to identify the versions of the libraries in use and checks a continuously updated database for known vulnerabilities. SCA tools that inspect binaries can also identify run-time or operating system components with known vulnerabilities.
Giulio has no budget, so he will use an open source or freemium version for his research.
And let’s close the parenthesis.
https://docs.docker.com/engine/scan/
Some studies quantify the cost in millions of dollars.
I mentioned before that a successful attack can significantly impact the bottom line, didn’t I?
Implementing the conventions and tools described before can be expensive, depending on your situation.
How to justify this work? It is balanced against the risk of going under because of an attack.
The last item is the reduction of velocity mentioned in the “Business Impact” slide.