GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look at Open Source Software Supply Chains

Gene Kim
Author, Researcher
“The Unicorn Project,”
Co-Author: “The Phoenix Project,” “DevOps Handbook,”
“Accelerate”
Exemplars, Laggards, and Hoarders
A Data-driven Look at Open Source Software Supply Chains
Dr. Stephen Magill
CEO, MuseDev
Principal Scientist, Galois, Inc.
@stephenmagill

@stephenmagill @RealGeneKim
• Open Source Software is everywhere
• Nat Friedman, CEO, GitHub: “99% of new software projects include open
source”
• How do these teams you depend on manage updates / security /
testing?
• “You are inviting thousands of developers into your code” when you use open
source dependencies
• Will they help or hurt you? (Erica Brescia, COO, GitHub)
• Which practices correspond to good component security outcomes
and therefore good security for your software?
Problem Statement

State of DevOps Research
• State of DevOps Report (2013-2019)
• Dr. Nicole Forsgren, Jez Humble, Gene Kim
• Cross population study spanning over 35K respondents
• Identified “IT performance” and the factors that predict it:
• Deployment Frequency
• Deployment Lead Time
• Deploy Success Rate
• Mean Time to Restore
Source: Google/DORA: 2018 State Of DevOps Report:
https://cloudplatformonline.com/2018-state-of-devops.html

• Our goal: Study what structures and practices are correlated
with exemplary outcomes (fast time to update, fast time to
remediate security vulnerabilities)
• Will we find the same trends we do in the enterprise, with faster
delivery correlating with good “business” outcomes?
Goals

The Opportunity: Study the Java Maven Ecosystem!

Clojure
Haskell

Dr. Stephen Magill (Galois)
Gene Kim (IT Revolution)
Bruce Mayhew (Sonatype)
Gazi Mahmud (Sonatype)
Thanks also to:
Kevin Witten, Derek Weeks,
and Matt Howard

• Hypothesis 1: Projects that release frequently have better
outcomes.
• Hypothesis 2: Projects that update dependencies more
frequently are generally more secure.
• Hypothesis 3: Projects with fewer dependencies will stay more
up to date.
• Hypothesis 4: More popular projects will be better about staying
up to date.
Hypotheses *

310,888 Java components
4.2 Million artifacts (JARs)
6,952 GitHub repos
27,704 (8.9%) with known vulnerabilities
Maven Central

TODO: Dependency Graph
Visualization

Maven Central
• Recent
• Connected
• Correct Versioning
• All of the above for dependencies
• Updated Dependencies
13.6% of the total population

Attribute              Measure
Popularity             Avg. daily Central Repository downloads
Release Frequency      Avg. period between releases
Development Activity   Avg. commits per month
Size of Team           Avg. unique monthly contributors
Presence of CI         Presence of popular cloud CI systems
Foundation Support     Associated with an open source foundation
Security               Based on reported vulnerabilities
Update Lag             Based on dependency updates

• Popularity
• Main component: Average number of downloads per day from The Central Repository.
• Also used the Libraries.io dataset: Number of GitHub stars, forks, and pull requests.
• Sonatype Nexus IQ Server: Popularity score based on how frequently components are seen by the Nexus IQ
repository scanning service
• Commit activity
• SCM Commits per Month – average number of commits per month (Perceval)
• Developer Team Size – average number of unique developers committing each month (Perceval)
• (8 core VM scanning repositories for three days: Clojure wrapper around Perceval and jq)
• Presence of Continuous Integration (CI): as measured by the detection of any CI-related
configuration files in the source code repository (e.g., Travis, Jenkins, CircleCI, etc.).
• Clojure program retrieving HTML from GitHub repo, regular expressions to detect CI
Data Gathered: Repositories*
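The CI detection described above can be sketched as follows. This is a minimal illustration, not the study's Clojure program: it matches file paths instead of scraping GitHub HTML, and the pattern list is an assumption that covers only the three systems named on the slide, using their conventional config file names.

```python
import re

# Conventional config file names for each CI system. The pattern list is
# an assumption for illustration, not the list used in the study.
CI_PATTERNS = {
    "Travis": re.compile(r"(^|/)\.travis\.yml$"),
    "Jenkins": re.compile(r"(^|/)Jenkinsfile$"),
    "CircleCI": re.compile(r"(^|/)\.circleci/config\.yml$"),
}


def detect_ci(paths):
    """Return the sorted names of CI systems whose config file appears in paths."""
    found = set()
    for path in paths:
        for name, pattern in CI_PATTERNS.items():
            if pattern.search(path):
                found.add(name)
    return sorted(found)


# Hypothetical repository listing, for illustration only.
example_paths = ["pom.xml", ".travis.yml", "src/main/java/App.java"]
print(detect_ci(example_paths))
```

A repository counts as "CI present" when the returned list is non-empty.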
We used the CHAOSS Perceval utility to gather GitHub commit data: the number of commits per month for twelve months, as well as the number of unique developers committing during each month.
Thank you to CHAOSS and Libraries.io for your amazing tools and data!
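The two commit-activity measures can be sketched as below, assuming the commit data has already been reduced to (author, date) pairs. The function names and the sample log are hypothetical; the sketch does not call Perceval, and months with no commits are simply absent from the averages.

```python
from collections import defaultdict
from datetime import date


def monthly_activity(commits):
    """Group (author, commit_date) pairs by calendar month.

    Returns {(year, month): (commit_count, unique_author_count)}.
    """
    counts = defaultdict(int)
    authors = defaultdict(set)
    for author, day in commits:
        key = (day.year, day.month)
        counts[key] += 1
        authors[key].add(author)
    return {key: (counts[key], len(authors[key])) for key in counts}


def averages(activity):
    """Average commits per month and average unique developers per month."""
    months = len(activity)
    total_commits = sum(c for c, _ in activity.values())
    total_devs = sum(d for _, d in activity.values())
    return total_commits / months, total_devs / months


# Hypothetical commit log (author, date); not data from the study.
log = [
    ("alice", date(2019, 1, 3)),
    ("bob", date(2019, 1, 9)),
    ("alice", date(2019, 1, 20)),
    ("alice", date(2019, 2, 2)),
]
activity = monthly_activity(log)
print(averages(activity))
```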

• Support Type: support for the component comes from an open source foundation, a
commercial organization, or is not officially supported by any organization (e.g., a
personal project).
• Number of Dependencies: the maximum count of dependencies for any given component
across all versions in the study period, as measured by the dependencies in the Maven
pom.xml file.
• Stale Dependencies (fewer is better): the average percentage of out-of-date component
dependencies (i.e., a newer version has been released) present when the component has
a new release.
• Release Period (shorter is better): average time in days each component version spends
as the “current” release. A shorter average release period equates to more frequent
releases.
Data Gathered: Project-Level *
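The release period and stale dependency measures above can be sketched as follows. The input shapes are assumptions: release dates as a plain list, and each release's dependencies as (pinned version, latest available version) pairs already resolved from pom.xml. The sketch averages gaps between consecutive releases, so the time the newest version spends as "current" is not counted, and it assumes every release has at least one dependency.

```python
from datetime import date


def release_period_days(release_dates):
    """Average days between consecutive releases (needs at least two releases)."""
    ordered = sorted(release_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def stale_percentage(releases):
    """Average percentage of out-of-date dependencies across releases.

    Each release is a list of (pinned_version, latest_available_version)
    pairs, one per dependency, as of that release date.
    """
    per_release = []
    for deps in releases:
        stale = sum(1 for pinned, latest in deps if pinned != latest)
        per_release.append(100.0 * stale / len(deps))
    return sum(per_release) / len(per_release)


# Hypothetical component history, for illustration only.
dates = [date(2019, 1, 1), date(2019, 1, 31), date(2019, 3, 2)]
releases = [
    [("1.0", "1.0"), ("2.1", "2.3")],                                # 1 of 2 stale
    [("1.0", "1.1"), ("2.3", "2.3"), ("0.9", "0.9"), ("3.0", "3.0")],  # 1 of 4 stale
]
print(release_period_days(dates), stale_percentage(releases))
```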

ITPERF (2013-2019) metrics, each followed by its Software Supply Chain (2019) counterparts
Deployment Frequency
• Commits / month *
• Releases / month
• Commits / dev / month *
Deployment Lead Time
• PR lead time
• Issue resolution time
Deploy Success Rate
• API breakage rate, build breakage rate, PR breakage rate
Mean Time to Restore
• MTTR (mean time to remediate security vulnerabilities) *
• MTTU (mean time to update available components) *
• Age of stale / vulnerable dependencies *
Org Perf
• Stars / Popularity / Download count *
Thoughts On ITPERF <-> SSC Metrics
* Explored in this year’s research: 2019 State of the Software Supply Chain

Hypothesis 1
Projects that release frequently have better outcomes.
(State of DevOps Report shows decisively that
shorter deployment lead times and
higher release frequency
improve outcomes)

Hypothesis 1
Projects that release frequently have better outcomes.
(VALIDATED)

Projects that release most frequently (top 20%):
are 5x more popular (Maven Central downloads, GitHub stars and forks)
have 79% more developers
have 12% greater foundation support rates.

Attribute              Measure
Popularity             Avg. daily Central Repository downloads
Release Frequency      Avg. period between releases
Development Speed      Avg. commits per month
Size of Team           Avg. unique monthly contributors
Presence of CI         Presence of popular cloud CI systems
Foundation Support     Associated with an open source foundation
Security               Based on reported vulnerabilities
Update Speed           Based on dependency updates
Dependency-Level Metrics

Security: Time to Remediate (TTR)

B Vulnerable Time

C Vulnerable Time

C Remediation Time

Security: Time to Update (TTU)
C Update Time (for B)

Security: Time to Update (TTU)
C Update Time (for A)

Security: Stale Dependencies
Stale Dependency

The Key Dependency Metrics
(per-update)
Time to Remediate
Time to Update
Stale Dependencies

The Key Dependency Metrics
(per-project)
Median Time to Remediate
Median Time to Update
Median Stale Dependencies
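A minimal sketch of how the per-update metrics roll up into per-project medians. The definitions here are assumptions, since the slides show them only as diagrams: time to update is taken as the days between a dependency's new release and the component release that adopts it, and time to remediate is the same interval restricted to updates that fix a known vulnerability. All names and dates are hypothetical.

```python
from datetime import date
from statistics import median


def days_to_adopt(dependency_released, component_released):
    """Days from a dependency release to the component release adopting it."""
    return (component_released - dependency_released).days


def project_medians(updates):
    """Median TTU over all updates and median TTR over security-fix updates.

    Each update is (dependency_released, component_released, fixes_vulnerability).
    Median TTR is None when the project has no security-fix updates.
    """
    ttu = [days_to_adopt(dep, comp) for dep, comp, _ in updates]
    ttr = [days_to_adopt(dep, comp) for dep, comp, fixes in updates if fixes]
    return median(ttu), (median(ttr) if ttr else None)


# Hypothetical update history for one project.
history = [
    (date(2019, 1, 1), date(2019, 1, 11), False),  # adopted after 10 days
    (date(2019, 2, 1), date(2019, 2, 6), True),    # security fix, 5 days
    (date(2019, 3, 1), date(2019, 3, 31), False),  # adopted after 30 days
]
print(project_medians(history))
```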

Time to Remediate Security Vulnerabilities*

Time to Remediate Security Vulnerabilities
Do these update
quickly in general?

Time to Remediate vs. Time to Update Dependencies (TTU)

Time to Remediate (TTR) vs. Time to Update (TTU) *
Pearson correlation 0.6

Most projects stay secure by staying up to date.
55% have MTTR and MTTU within 20% of each other.
Only 15% of projects with worse than average MTTU
manage to maintain better than average MTTR.
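The two statistics above can be sketched as follows: the Pearson correlation between per-project MTTR and MTTU, and the share of projects whose MTTR and MTTU are close. The input values are hypothetical, and "within 20% of each other" is interpreted here as a difference of at most 20% of MTTU, which is an assumption about how the study measured it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def share_within(mttr, mttu, tolerance=0.20):
    """Fraction of projects whose MTTR differs from MTTU by at most tolerance * MTTU."""
    close = sum(1 for r, u in zip(mttr, mttu) if abs(r - u) <= tolerance * u)
    return close / len(mttr)


# Hypothetical per-project values in days; not data from the study.
mttu = [10, 20, 30, 40]
mttr = [11, 19, 45, 20]
print(pearson(mttr, mttu), share_within(mttr, mttu))
```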

Hypothesis 2
Projects that update dependencies more frequently
are generally more secure.

Hypothesis 2
Projects that update dependencies more frequently
are generally more secure.
(VALIDATED)
* PrimeFaces CVE-2017-1000486: fixed in 2/2016 but not reported as a CVE until it was published on 1/3/2018; cryptominers started exploiting it (Source: Jeremy Long, @ctxt)

Hypothesis 3
Projects with fewer dependencies will stay more up to date.

Hypothesis 3
Projects with fewer dependencies will stay more up to date.
(REFUTED)
Components with more dependencies actually have better MTTU.

More dependencies correlate with larger development teams.
Larger development teams have 50% faster MTTU and release 2.6x more frequently.

Hypothesis 4
More popular projects will be better about staying up to date.

Not all popular projects are exemplary and release fast
(10-10K downloads per day)

Hypothesis 4
More popular projects will be better about staying up to date.
(REFUTED)
There are plenty of popular components with poor MTTU.
Popularity does not correlate with MTTU.
The most popular projects are not statistically different
from others with respect to MTTU.

Number of stars or number of forks
IS NOT AN EFFECTIVE HEURISTIC
for selecting which components to use
(if security is important to you)

5 Behavioral Clusters for OSS “Suppliers”
• Small Exemplar (606): Small development teams (1.6 devs), exemplary MTTU.
• Large Exemplar (595): Large development teams (8.9 devs), exemplary MTTU, very likely to be foundation supported, 11x more popular.
• Laggards (521): Poor MTTU, high stale dependency count, more likely to be commercially supported.
• Features First (280): Frequent releases, but poor TTU. Still reasonably popular.
• Cautious (429): Good TTU, but seldom completely up to date.

• We conducted a survey that 658 respondents completed; three clusters emerged,
which we called the “high, medium, and low update pain” clusters
• “Low pain” cluster compared with the “high pain” cluster:
• Updating dependencies is painful: 3.2x less likely to strongly agree
• Updating vulnerable components is painful: 2.6x less likely
• We schedule updating dependencies as part of our daily work: 10x more likely
• We strive to use the latest version (or latest-N) of all our dependencies: 6.2x more likely
• We use some process to add a new dependency (e.g., evaluate, approve, standardize,
etc.): 11x more likely
• We have a process to proactively remove problematic or unused dependencies: 9.3x
more likely
• We have automated tools to track, manage, and/or ensure policy compliance of our
dependencies: 12x more likely
Exemplars: Survey Data (N=658) *

Dr. Stephen Magill (Galois)
Gene Kim (IT Revolution)
Bruce Mayhew (Sonatype)
Gazi Mahmud (Sonatype)
Thanks also to:
Kevin Witten, Derek Weeks,
and Matt Howard
stephen@muse.dev

• Study breaking changes further
• Look at transitive dependencies
• Identify leading indicators, use techniques to assert causation
Year 2 Goals

• Ways to detect breaking changes
• Outcomes resulting from Dependabot pull requests?
• For which components are updates quickly and painlessly applied?
• For which components are updates never applied (i.e., because they break everything)?
• Which components have a disciplined and immutable API that allows for easier
upgrades?
• E.g., Clojure programming language and standard library have had virtually no breaking changes in
12 years
• E.g., React-native: “4 months after not touching it, it no longer builds if you update all the
dependencies”
• Get data on pull request lead time and issue resolution time
• (DONE? Thank you code.gov!)
• Authoritative list of foundation-supported projects?
Help We’re Looking For

Quick Takeaways
Integrate updating dependencies into your daily work!
Contribute dependency updates to components you use!
Don’t make decisions based solely on popularity!
Tell us what hypotheses you would like to see investigated!
