7. Utilizing billions of parts from open source communities... 80% to 90% of modern apps consist of assembled components.
8. “The Japanese auto manufacturer buys 80% of his stamping requirements from contract metal stampers. The reverse is true in the U.S.”
W. Edwards Deming, Out of the Crisis, 1982
23. Warehouses, Manufacturers, Finished Goods
6.1% of component downloads are vulnerable
5.6% of components in repository managers are vulnerable
6.8% of components in applications are vulnerable
24. SOURCING PRACTICES ARE IMPROVING
[Chart: active Nexus Repository instances by year, 2010 to 2017; vertical axis from 25,000 to 125,000]
25. INSPECTION PRACTICES ARE IMPROVING
[Chart: Nexus Repository, Nexus Repository w/ Repository Health Check, and repositories scanned w/ Repository Health Check, by year, 2010 to 2017; vertical axis from 25,000 to 125,000]
26. NEWER COMPONENTS MAKE BETTER SOFTWARE
Analysis of components in 25,000 application scans
[Chart: components by year and defect density versus component age in years (1 to 11); vertical axis from 5% to 25%]
3X HIGHER DEFECT DENSITY
27. OLDER COMPONENTS DIE OFF
Analysis of components in 25,000 application scans
[Chart: inactive projects (% on latest version) versus component age in years (1 to 11); vertical axis from 5% to 25%]
28. 8 years later, vulnerable versions of Bouncy Castle were downloaded… 5.8M times
CVE-2007-6721
CVSS Base Score: 10.0 (HIGH)
Impact Subscore: 10.0
Exploitability Subscore: 10.0
[Timeline: 2007 to 2015]
USE THE HIGHEST QUALITY PARTS
30. “Governance processes that depend on manual inspection are guaranteed to fail.”
Diego Lo Giudice, DevOps Analyst, Forrester, November 2016
31. “Quality comes not from inspection, but from improvement of the production process.”
W. Edwards Deming, Out of the Crisis, 1982
33. Elegant Procurement Trio
Ingredients: Anything sold must provide a Bill of Materials of 3rd Party and Open Source Components
Hygiene & Avoidable Risk: Cannot use known vulnerable components
Remediation: Must be patchable/updateable
“Cease dependence on mass inspection.”
Inspection does not improve quality. Nor guarantee quality. Inspection is too late.
Harold F. Dodge: “You cannot inspect quality into a product”
Automatic inspection and recording require constant vigil.
Jessica: Hi everyone, and welcome to today's presentation, where we're going to be sharing the findings of our 2016 State of the Software Supply Chain Report. Before we get started, I'd like to review a few logistics with you. First, we are recording today's presentation, so we will make the recording and slides available to all registrants a little later today. We are also encouraging questions throughout the presentation. There are a couple of different ways that you can ask questions: there's a Q&A box and a chat window on your WebEx panel in the lower right-hand corner of your screen. Please ask questions throughout the presentation. Later in the presentation, we'll be taking Q&A. We also encourage questions on Twitter using the hashtag Sonatype.
[00:01:00] Let's go ahead and get started. Today I'm joined by Derek Weeks, our VP and DevOps advocate, who is also the author of our second annual 2016 State of the Software Supply Chain Report. I'm excited to be joined by Derek. Derek has been spending the better part of the last month or two poring over the data and the findings that you'll hear today, and I know he's excited to share them. Welcome, Derek.
Derek Weeks: Thanks, Jessica. I'm really excited to present the findings that we have from the State of the Software Supply Chain Report this year. As Jessica said, we've spent the last few months poring through the data, and documenting the results. We're going to spend just about thirty minutes today reviewing some of the highlights from the report itself.
In the application economy, companies that build great software will own the future…
[00:09:00] One of the things that we see, and we documented in the report, is that the best organizations out there that recognize that they use these software supply chains, and have a software supply chain, are taking key principles from the DevOps practices, and all the way back to W. Edwards Deming, who said that in order to improve the quality of products that are being developed, you need to use the fewest and best suppliers out there to get the highest quality parts and to use those in the product that you have. And when you're using those parts, you need to be able to track, and trace, and identify where those parts have been used, and in which software applications, or which products. Because if any defect or security vulnerability might be found in one of those components over time, you need to be able to quickly identify them, to reduce your mean time to identify and repair those vulnerable or defective components within your supply chain. You'll see throughout the report where we've documented various organizations, like Intuit and Capital One, that are applying these kinds of practices within their own organizations to improve the software that they're delivering.
Unfortunately, not all parts are equal...
Some are healthy, some are not…
…and all go bad over time (like milk, not like wine).
[00:02:00] Before we get into the findings, I wanted to talk to you about one of the themes that we had as we were going through the information in the report. Also, because this is our second report, building in some feedback that we had from last year's report, is this is really about making the invisible things visible. Many of the organizations out there that have software supply chains, and that's most of you listening to this conversation, have software supply chains, but you don't have an idea of the volume of components, open-source and third party components, that your organization is consuming, nor is there good enough visibility to the quality of those components flowing into and across your software supply chain. One of the key reasons why we did this report was to begin to reveal those things, and make them easier to understand. But also to establish some benchmarks for performance and behavior out there, that you can compare your own organization against. Then also compare them against some of the best practices that we've highlighted throughout the report, some of which I'll share today.
19,800 GAVS
5589 unique
4.4%
737,000
[00:03:00] A story that comes to my mind over and over again, as we talk about the data that was revealed in this report, was there's a large healthcare organization, and that organization has a couple of thousand developers. That organization, from the data that we saw within our analysis, downloaded 16 million open source components last year into their organization. That is a huge number, but when we looked even further into that number, not only were they consuming 16 million parts, but when I looked at which parts they were, only about five thousand of those parts,
[00:04:00] including all the different versions of those components that they were consuming, were unique components. They were taking five thousand unique components and downloading them 16 million times across the whole year. When we looked further into the quality of those components that they were consuming, out of the 16 million downloads, 4.4% of their downloads had a known security vulnerability associated with them. Now, 4.4% doesn't sound like a lot, but when you look, that's one in twenty-five of their components, or 773 thousand component downloads, that included a known security vulnerability. Then that threatens the quality, and really impacts the quality of the software that that organization is building, and delivering to their customers, some of which are in mission critical applications.
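As a rough check of the arithmetic in this story, here is a minimal sketch. It uses only the rounded figures quoted in the talk (16 million downloads, 4.4% with a known vulnerability); the exact totals behind the quoted count are not given, so the result lands in the same range as that count without matching it exactly.

```python
# Rounded figures as quoted in the talk; the unrounded totals are not available here.
total_downloads = 16_000_000
vulnerable_rate = 0.044  # 4.4% of downloads had a known vulnerability

# Estimated number of downloads carrying a known vulnerability.
vulnerable_downloads = round(total_downloads * vulnerable_rate)

# The same rate expressed as "one in N" downloads.
one_in_n = 1 / vulnerable_rate

print(f"Vulnerable downloads: about {vulnerable_downloads:,}")
print(f"Roughly one in {one_in_n:.0f} downloads")
```

The point is the scale: a rate that sounds small still means hundreds of thousands of vulnerable downloads at this volume.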
This provides a huge benefit to the organizations that are utilizing the components, because it provides a massive productivity gain. We can develop more faster, we can leverage the innovations of others, and these open-source projects that have built some terrific code that we can leverage and use in our own applications.
There's a tremendous productivity boost, but there's also risk, as you'll see in blind consumption practices. We'll get into that later.
[00:07:00] In the report this year, we were able to do a deep dive on 3000 high performance software development organizations to understand what were those organizations consuming? How many components were they consuming? What are the quality of those components? Again, we're going to share that with you, and there's more detail shared in the report, so that you, as an organization can use some of this data to even benchmark your own performance or behavior as an organization. We also took a deep dive in looking at 25000 different applications, and to look at the components that were used within those, to get a sense of how components are used within applications, but again, what the quality metrics around those were. I'm going to share and reveal some of those findings with you today.
Say hello to YOUR software supply chain, not “the software supply chain”; personalizing it more for the audience.
For those of you that are unfamiliar with a software supply chain, it's really an analog to the traditional supply chains used in manufacturing today. Those supply chains have suppliers that are building components. In the case of software development, those are the open-source projects that are building components, and making them freely available to developers around the world.
[00:08:00] They're able to store and distribute those components in the large central warehouses, like the central repository that Sonatype is responsible for managing, but also repositories like rubygems.org, pypi.org, the NuGet Gallery, etc. This is where the components are stored and available to the manufacturers, that are really the software development teams, that are consuming these components and downloading these components over the years. Those components are then used to create the finished goods, or the software applications, that organizations are then delivering to their customers. We'll continue to use this supply chain analogy for the software supply chain, then compare and contrast what's happening in traditional manufacturing to what's happening in software today.
I'm going to walk you through some of the findings of the report. In the first part of the report, we talk about how organizations are feasting on this massive supply of open-source components.
There's a really interesting site out there called modulecounts.com. It has a simple value: it keeps track of the number of different components, or packages, that are available across the different development languages, from PyPI, to NuGet, to Bower, to Maven components, etc. And it shows the increase in the number of these components that are available to the developer ecosystem, or the developer population, over time. We used some data from that site to see that over a thousand new open-source projects were created each day. People delivering a new kind of software, a new kind of component.
Then, from the general population of all open-source projects worldwide, we were able to estimate that ten thousand new versions of components are introduced every day. There's this huge supply of components entering the ecosystem, and available to our software supply chains. When we look at the central repository that Sonatype manages, of Maven-style or Java open-source components, we looked across 380 thousand open-source projects, and found that on average those projects were releasing fourteen new versions of their components every year. That's great from a supply chain aspect, that the suppliers are very active, actively releasing new software, actively releasing new innovations, and actively improving the software that they're making available to developers worldwide.
385,000 packages
1.6 billion downloads per week
104,000 publishing users
322,000 registered users (i.e. intending to publish)
5.9 million total users (i.e. downloading only)
There are about 11M JavaScript developers worldwide
~55% of them currently use npm
More than 1.5 million unique IPs access the service
As discussed, we don’t capture all of the sorts of cohort analysis I know you’d like, but as a start, we do easily know npm Registry usage by Node version and npm client version; see charts
The other staggering number that ... It's almost hard to get your mind around how big this is, that we saw was Sonatype's central repository received 31 billion download requests last year. The previous year was 17 billion. These numbers are huge. To put this in scale, and think about what this means, there are only about 10 million Java developers on the planet that are requesting these specific components from the central repository. If you think about 10 million developers requesting 31 billion parts to use in building their software, it's amazing to see the volume of consumption by the different organizations out there. The volume of consumption is one particular part that we detail in the report, but we also want to make people aware that the quality of parts is not equal across all of those downloads. Of course you have a number of different versions of components that are going out there. A jQuery component from six years ago is not going to be as good as the versions that were released today. The quality can differ by age, by license type of those components, and also by things like security defects.
[00:14:00] One of the things that we measured year over year, and we do do some year over year comparisons throughout the report, is that 6.2% of the downloads from the central repository last year out of the billions of downloads, had a known security vulnerability in them. This past year we saw 6.1% of the downloads had a known vulnerability. That's about one in sixteen of every component download has a known vulnerability in it.
Also, I wanted to dive down into, what does this really mean for your organization in particular? In some sense, who cares if there are 31 billion downloads of components, because your organization is not downloading that many. When we looked at the 3000 different development organizations, and what they were consuming, then you get ... Here's the impactful number for my organization. On average, we saw nearly a quarter of a million components being downloaded from each of these organizations. When you further look into those components, just as I had on the one company that I mentioned up front in the conversation, out of all of those downloads, there were only just over 5000 versions of all the components that were downloaded. If you strip out all of the versions of components that were downloaded, and you just look at which open-source project I'm pulling a component from, that whittles down to only 2000 unique components from individual suppliers.
One way to think about this is, "I've decided not to write this piece of software for my application, I've outsourced it effectively to an open-source project, and I'm consuming that into my organization. There's a massive benefit to doing that, because I didn't have to write that code and it was freely available to me." But you do see some inefficiencies within these practices, and that those 2000 components, if you remove all the different versions of those you are consuming, you've downloaded those over 100 times each, on average. We do see some inefficiencies there, which we could work on building into software supply chains and making them more efficient.
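The "over 100 times each" figure follows directly from the averages quoted above. A minimal sketch, assuming "nearly a quarter of a million" is rounded up to 250,000 and "just over 5000" is rounded to 5,000:

```python
# Approximate per-organization averages from the talk (rounded; assumptions).
downloads_per_org = 250_000   # "nearly a quarter of a million" downloads
unique_versions = 5_000       # "just over 5000" distinct component versions
unique_components = 2_000     # distinct components once versions are collapsed

# Average number of times each distinct item was downloaded.
downloads_per_version = downloads_per_org / unique_versions
downloads_per_component = downloads_per_org / unique_components

print(f"Downloads per unique version:   {downloads_per_version:.0f}")
print(f"Downloads per unique component: {downloads_per_component:.0f}")
```

With these rounded inputs, each unique component is fetched about 125 times on average, which is the inefficiency the talk refers to.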
[00:17:00] As we mentioned before, all parts are not created equal. One of the things that we found within these 3000 organizations was 17000 of their component downloads had known vulnerabilities, or nearly 7 1/2% had security vulnerabilities associated with them. Again, this is something that we want to bring to light to organizations. Most organizations, when they've seen our data from the previous year's report, say, "I had no idea that our organization was consuming this many components. I knew that we were consuming, and using open-source components, I didn't realize the volume. When I look at the particular quality aspects of those, security being one, I'm somewhat surprised by the number that we would be consuming, and maybe we need to have better practices around that."
[00:18:00] Part of those practices are how much hygiene are we building into our software supply chain? This year's report allowed us to get visibility from the downloads from the central warehouses, being 6% were known vulnerable, to components that were downloaded to repository managers. Imagine a local warehouse, if you will, for component parts used by developers. 5.6% of those downloads were known vulnerable. Then the finished goods, across the 25000 applications that we analyze, 6.8% of those components were known vulnerable. That means that the components that were downloaded ended up in the finished goods, or in the applications that are being shipped and shared with customers. Meaning, there's not enough vetting taking place from where we're sourcing components and bringing them into our organizations to what's ending up in the final products.
It is said that software components age like milk, not wine. Analysis of the scanned applications revealed that the latest versions of components had the lowest percentage of known defects. Components under three years in age represented 38% of parts used in the average application with security defect rates under 5%. By comparison, components between five and seven years old had 2x the known security defect rate. The 2016 Verizon Data Breach Investigations Report confirms that the vast majority of successful exploits last year were from CVEs published 1998-2013. Combining the Verizon data with Sonatype’s analysis further demonstrates the economic value of using newer, higher quality components.
[00:19:00] In the analysis of the 25000 applications, we not only looked at whether there is a security defect within those, but we recognized that there was a relationship between the security vulnerabilities and the age of the components. Let me walk you through what this particular chart from the report tells you. It says out of all the applications we analyzed, there are different components used, and those components have varying ages, from one-year-old to eleven-year-old components. You'll see in year two, for example, just over 20% of the components in all of these applications are two years old. One-year-old components make up maybe 18% of the overall application footprint. Those particular components, you'll see on the red line within the report, have a defect density, or security vulnerability rate, hovering at just about 5%. When you look at the components that are six, seven, eight, nine, ten years old within this analysis, what you see is the defect density actually rises to almost 15%. Applications that are using components that are much older have a three times higher defect density rate than the younger components, which would tell us if we're going to use components, let's use the newest versions of those components versus using the older ones.
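To make "defect density by component age" concrete, here is a small sketch of how such a figure could be computed from scan results. The function name and the sample data are made up for illustration; they are not the report's data set or method, and the sample is chosen only to mirror the roughly 5% versus 15% contrast described in the talk.

```python
from collections import defaultdict

def defect_density_by_age(components):
    """components: iterable of (age_years, has_known_vulnerability) pairs."""
    totals = defaultdict(int)
    vulnerable = defaultdict(int)
    for age, is_vulnerable in components:
        totals[age] += 1
        if is_vulnerable:
            vulnerable[age] += 1
    # Share of components at each age that carry a known vulnerability.
    return {age: vulnerable[age] / totals[age] for age in sorted(totals)}

# Hypothetical sample: 20 two-year-old parts (1 vulnerable)
# and 20 eight-year-old parts (3 vulnerable).
sample = [(2, i < 1) for i in range(20)] + [(8, i < 3) for i in range(20)]

density = defect_density_by_age(sample)
ratio = density[8] / density[2]

for age, share in density.items():
    print(f"{age}-year-old components: {share:.0%} known vulnerable")
print(f"Older parts are about {ratio:.1f}x as likely to be vulnerable")
```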
APPLICATION VULNERABILITY DENSITY IS 6.8%
COMPONENTS >2 YEARS OLD ACCOUNT FOR 62% OF ALL COMPONENTS
COMPONENTS > 2 YEARS OLD ACCOUNT FOR 77% OF THE RISK
Versions that were seven years or older made up approximately 18% of the component footprint of the 25,000 scans. For the older components, analysis showed that as many as 23% were on the latest version -- meaning, the open source projects for those components were inactive or dead. Discovery of components with known security vulnerabilities or other defects used in applications is not something anyone desires. Unfortunately, when these defects are discovered in older components, chances of remediating the issue by upgrading to a newer component version are greatly diminished. If a new version does not exist, the only options are to keep the vulnerable component in the application, switch to a newer like component from another open source project, or to code the functionality required from scratch in order to replace the defect. None of these options comes without a significant cost or impact on quality.
The other thing that was really interesting that we found in this analysis, was we wanted to understand, are people using the latest versions of components that are available? The data, when we look again at this chart, we can see that components that are one year old, this chart will tell us that just under 15% of those components are on the latest version. Within the first year of a release, if a project is releasing 14 releases on average, you can see not all of the components that are one year old would necessarily be on the latest version.
When you look at components that are eight, nine, and ten years old, in the nine year old case, 20% of the components that are nine years old were on the latest version. Meaning there's no newer version of that component that has been available or released in the last nine years. You'll remember from the previous slide, the components that are old have three times higher defect density or security vulnerability that is associated with them. If there is a vulnerability in that component, and yet it hasn't released a new version in the last nine years, there is no remediation path for that particular component. You can't replace it with a newer one from the project. You would look at, am I using the highest quality suppliers for my component if this supplier hasn't released a new version? I might want to consider using an alternative supplier for that functionality, for that component in my application.
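The reasoning above (a vulnerable component that is already on its project's latest version has no upgrade path) can be sketched as a simple check. The `Component` record, its fields, and the example names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Component:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    version: str
    latest_version: str
    known_vulnerabilities: int

def remediation_path(component: Component) -> str:
    """Classify what can be done about a component in use."""
    if component.known_vulnerabilities == 0:
        return "ok"
    if component.version != component.latest_version:
        # The project has published something newer that may contain a fix.
        return "upgrade"
    # Vulnerable and already on the latest release: the project offers no fix,
    # so the options are an alternative supplier or writing the code yourself.
    return "replace"

stale = Component("example-parser", "1.4", "1.4", known_vulnerabilities=1)
behind = Component("example-crypto", "1.38", "1.54", known_vulnerabilities=1)

print(stale.name, "->", remediation_path(stale))
print(behind.name, "->", remediation_path(behind))
```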
The CVE is dated 2007. Therefore, I changed the original year noted as 2009 above the line to 2007.
Changed the left side to 2015 from 2014. I updated the stats with the newest data showing 5.7M “bad” Bouncy Castle downloads.
WMY.
My talking point for this slide, apart from the facts, is ‘Why is this happening?’ It’s because the supply chain is so complex that it’s impossible to keep up with this manually. Developers aren’t being deliberately careless; they just don’t have the tools to help them keep on top of this. It’s so complex that it can only be done in an automated fashion.
Bill
Additional talking point is the age factor and the fact that these are unnecessary risks to be taking.
The other thing that's really important to know is that not only are there suppliers that have not released projects in many years, but the quality within their releases also differs dramatically. A really good supplier that I want to highlight, and we detail this in the report as well, is Bouncy Castle. This is a cryptographic library, very commonly used among Java developers. In 2007 this cryptographic library had a level ten security vulnerability announced in it. That meant that the cryptographic library was exploitable, it was known to be broken, people could go in and hack that application. The developers wanted to keep the communications within that application private by using the cryptographic library, but Bouncy Castle came out and quickly introduced a new version of that component. A safer, fixed version. Over the years, there have been multiple defects found within this project, but they've always quickly come out and repaired those vulnerabilities, and made new ones available.
When we looked at the number of downloads of Bouncy Castle components last year, we saw that there were 17 million downloads of all the versions of this particular component. But of those downloads, 33%, or 5.8 million, were of known vulnerable versions, when newer, secure versions of those components were available. It's not just about using the best suppliers, but using the best versions of parts from those suppliers that we need to keep in mind.
If we look further into the reports, the last section of the report talks about the practices of supply chain management gaining traction. This is not just Sonatype talking about software supply chain management, and software supply chain automation, there are a number of organizations around the world highlighted throughout the report that talk about, we need new rules of procurement.
What are we bringing into our organization, whether that's developers that are sourcing components for use in our own applications, or whether that's software that we as an organization are buying from someone else? That is establishing new rules of procurement to talk about what components are in the software. Are there any known vulnerable or risky components in that software that we need to be aware of? And if there are vulnerable components, are those able to be remediated over time? Are my vendors going to tell me, or are our development or operations teams going to be able to identify components that have gone bad over time so that we can fix those quickly?
The other thing that we document, and make visible in the report is the cost that organizations could incur to remediate vulnerabilities for old components, or risky license types within their applications. There's a calculator that we built online, again this is free to use for anyone ... but taking statistics from the report and showing an average application has about 100 components in it. 6.8% of those components, we know, are vulnerable or have at least one security vulnerability associated with them. A number more have license risks that would be associated with them. Again, components could have even more than one vulnerability associated with them. The calculator could help us see that an organization with 2000 applications across its portfolio, to remediate just 10% of the vulnerabilities in those components, picking maybe the worst vulnerabilities across their portfolio, could cost them seven million dollars to do this. Another way of looking at this is, it's not just a cost of seven million dollars, but it's seven million dollars you're really pulling from your innovation budget. You don't want to have development teams going back and doing rework, and creating waste, or managing and reducing the technical debt, and the security debt into the organization. You would rather spend that seven million innovating new software, delivering new software.
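The figures in this example can be worked through step by step. The sketch below uses only the numbers quoted in the talk; the online calculator's actual inputs and formula are not described here, so the per-component cost is simply back-calculated from the seven million dollar total and should be read as an implied average, not as the calculator's method.

```python
# Figures quoted in the talk.
applications = 2_000
components_per_app = 100
vulnerable_rate = 0.068          # 6.8% of components have a known vulnerability
share_remediated = 0.10          # fix only the worst 10%
total_cost_usd = 7_000_000

# Vulnerable components across the whole portfolio.
vulnerable_components = round(applications * components_per_app * vulnerable_rate)

# How many of those would be remediated.
remediated = round(vulnerable_components * share_remediated)

# Implied average cost per remediated component (back-calculated, an assumption).
implied_cost_per_fix = total_cost_usd / remediated

print(f"Vulnerable components in portfolio: {vulnerable_components:,}")
print(f"Components remediated (10%):        {remediated:,}")
print(f"Implied cost per remediation:       ${implied_cost_per_fix:,.0f}")
```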
I think one of the key lessons that comes out of this visibility from the report is the way to manage your software supply chain. There are many organizations out there that we're aware of that have policies in place on what kind of components you can bring into your development organization, or development operations. You're not allowed to use parts with GPL licenses, or you're not allowed to use components with known security vulnerabilities in them, because you're following the different guidelines from those industry groups or agencies that I had reviewed earlier. When you understand that we're not just bringing in a couple hundred, or a couple thousand software components into the organization, but we're bringing in nearly a quarter of a million, you can't manually review what's coming in the door in any way that would not dramatically slow down your development operations. Developers don't want to wait for two weeks to understand, is this component approved for use or not?
What we need to do is look for solutions that are automated, that tell a developer right from the beginning, if you're going to pick this particular part, is it good or bad? Does it meet our policy within our organization or not? It's almost like going to the grocery store. You pick a product off the shelf, then you see the food label on that, and it tells you the quality and the ingredients, and the calories in that. You know if it's good or bad for you, or you have a better idea. We need to make that kind of information immediately available to developers so that they can build quality in from the start and reduce the amount of rework or waste that happens within our software supply chains. We need to automate the production of a software bill of materials to make it easier to understand what we use, and also where we've used components over time.
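An automated policy check of the kind described here might look like the following sketch. Everything in it is illustrative: the `Part` record, the banned-license list, and the component names are assumptions, not any product's actual data model or API. It applies the two example rules mentioned in the talk (no known vulnerabilities, no GPL licenses) and emits a simple bill of materials.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Part:
    # Hypothetical component record for this sketch.
    name: str
    version: str
    license: str
    known_vulnerabilities: int

# Example policy only; each organization defines its own.
BANNED_LICENSES = {"GPL-2.0", "GPL-3.0"}

def violations(part: Part) -> list[str]:
    """Return the policy rules this part breaks (empty list means acceptable)."""
    problems = []
    if part.known_vulnerabilities > 0:
        problems.append("known security vulnerability")
    if part.license in BANNED_LICENSES:
        problems.append(f"license {part.license} not allowed")
    return problems

def bill_of_materials(parts: list[Part]) -> list[dict]:
    """List every part in use, with its license and any policy violations."""
    ordered = sorted(parts, key=lambda p: (p.name, p.version))
    return [
        {
            "component": f"{p.name}:{p.version}",
            "license": p.license,
            "violations": violations(p),
        }
        for p in ordered
    ]

parts = [
    Part("example-web", "2.3.1", "Apache-2.0", known_vulnerabilities=2),
    Part("example-json", "1.0.0", "MIT", known_vulnerabilities=0),
    Part("example-report", "4.1", "GPL-3.0", known_vulnerabilities=0),
]

for entry in bill_of_materials(parts):
    status = "; ".join(entry["violations"]) or "ok"
    print(f'{entry["component"]} ({entry["license"]}): {status}')
```

Running a check like this at the moment a developer selects a component, instead of in a later manual review, is what gives the "food label" effect the talk describes.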
First of all… when you can clearly see the threat levels of components in your IDE, you can easily shift to a safer one.
The area here in the lower right works like a slider… you simply slide to the right to identify a safer, accepted version of a component.
So you see, you not only see a potential problem early on, but you also see the solution.
Better yet…
=========
Click onto pane and zoom in and zoom out
Guide your eyes to the RIGHT….
This is a normal Developer IDE called Eclipse…
Sonatype made a PLUGIN within it to show a developer the component BEFORE they choose or commit to ELECTIVE/AVOIDABLE Risk/AttackSurface/Complexity/LegalIssues …
The RED chain (e.g.) is every version of struts2-core…. And if you move RIGHT far enough…. It will lack KNOWN CRITICAL vulnerabilities.
The Green bar charts are the download popularity… which doesn’t speak at all to SECURITY… but may give people more comfort that it is stable and being used.
License risk is based on self-defined policy: we track if the use of this license can cause your whole website to now be FREE common open source, like GPL… which might be very bad for you… and a DIFFERENT type of risk…
By approaching their open-source component inventory that was flowing through their supply chain and saying, "We're going to use the best component suppliers, the best open-source projects that have the best hygiene, and rely on those within our organization," this insurer was able to reduce the number of defects from ten to four per 10000 lines of code. When they looked within each of those projects, and said, "We're not just going to accept any version of a component from that project, but we're going to rely on the best versions of those components from those projects," they were able to reduce their defect density from four to one per 10000 lines of code. Then they were able to continue to maintain the productivity of developers by using a software bill of materials to help them track and trace components over time across their software supply chain.
The end result was their developers were 30% more productive following these practices because they had streamlined the operation within their software supply chain, made them more efficient, and also improved the quality of what they were consuming.
When do you have an hour to spare?
With that said, there's a lot more detail that you can get to with what's happening in your software supply chain, and some of the benchmarks of what others are doing, and some of the case studies that we've put in there from various organizations. That is freely available to all of you, in the follow-up email you will receive from this webinar. If you haven't downloaded the report yet, we will make a copy available to all of you.
First, I want to thank all of you for attending, and listening to this summary of some of the data that we have from the report.