Utilizing billions of parts from open source communities...
80% to 90% of modern apps consist of assembled components.
7.
“The Japanese auto manufacturer buys 80% of his stamping requirements from contract metal stampers. The reverse is true in the U.S.”
W. Edwards Deming
Out of the Crisis
1982
1,000 new projects per day
10,000 new versions per day
14 releases per year, on average, per project
14.
Dockerized Apps: 460K
3,000% growth in 2 years
Official Repos: 100+
20% of all image pulls
Growing user demand for
commercial, supported and
licensed content
Source: DockerCon 2016 Keynote
15.
THE SSC INDEX
[Chart: Open source component download requests, The Central Repository, 2008-2016]
Warehouses: 6.1% of component downloads are vulnerable
Manufacturers: 5.6% of components in repository managers are vulnerable
Finished Goods: 6.8% of components in applications are vulnerable
23.
ALMOST 4-IN-10 RUN WITHOUT AN OPEN SOURCE POLICY
Q: Does your organization have an open source policy?
2014: 57% YES
2017: 58% YES
Source: 2014 Sonatype Open Source Development and Application Security Survey, and Sonatype’s 2017 DevSecOps Community survey
24.
SOURCING AND INSPECTION ARE IMPROVING
[Chart: Nexus Repository instances, and Nexus Repository instances with Repository Health Check, by year, 2010-2017]
25.
NEWER COMPONENTS MAKE BETTER SOFTWARE
Analysis of components in 25,000 application scans
[Chart: components by year and defect density, by component age in years (1-11)]
3X HIGHER DEFECT DENSITY
26.
OLDER COMPONENTS DIE OFF
Analysis of components in 25,000 application scans
[Chart: inactive projects (% on latest version), by component age in years (1-11)]
27.
9 years later, vulnerable versions of Bouncy Castle were downloaded… 11M times
CVE-2007-6721 (timeline: 2007 to 2016)
CVSS Base Score: 10.0 (HIGH)
Impact Subscore: 10.0
Exploitability Subscore: 10.0
USE THE HIGHEST QUALITY PARTS
28.
WHERE BITS & BYTES MEET FLESH & BLOOD
23,476,966
Thanks to Josh Corman, Atlantic Council
29.
WHERE BITS & BYTES MEET FLESH & BLOOD
18,330,958
Thanks to Josh Corman, Atlantic Council
30.
“Quality comes not from inspection, but from improvement of the production process.”
W. Edwards Deming
Out of the Crisis
1982
Elegant Procurement Trio
Ingredients: Anything sold must provide a Bill of Materials of 3rd party and open source components
Hygiene & Avoidable Risk: Cannot use known vulnerable components
Remediation: Must be patchable/updateable
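The three procurement rules above can be read as a simple acceptance check on a supplier's delivery. The sketch below is a minimal illustration, assuming a hypothetical `Component` record with a list of known vulnerability IDs and an `updatable` flag; real bills of materials and vulnerability data come from tooling and feeds not shown here, and the component names and CVE placeholder are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """One bill-of-materials entry (hypothetical schema for illustration)."""
    name: str
    version: str
    known_vulnerabilities: list[str] = field(default_factory=list)
    updatable: bool = True


def check_procurement(bom: list[Component] | None) -> list[str]:
    """Apply the three procurement rules; an empty result means the delivery passes."""
    if bom is None:
        # Ingredients: the supplier did not provide a bill of materials.
        return ["Ingredients: no bill of materials provided"]
    violations = []
    for c in bom:
        if c.known_vulnerabilities:
            # Hygiene & avoidable risk: known vulnerable components are not accepted.
            ids = ", ".join(c.known_vulnerabilities)
            violations.append(f"Hygiene: {c.name} {c.version} has known vulnerabilities ({ids})")
        if not c.updatable:
            # Remediation: every component must be patchable/updateable.
            violations.append(f"Remediation: {c.name} {c.version} cannot be patched or updated")
    return violations


# Usage with made-up components and a placeholder vulnerability ID:
delivery = [
    Component("example-parser", "1.2", known_vulnerabilities=["CVE-0000-0000"]),
    Component("example-logger", "3.1", updatable=False),
    Component("example-utils", "2.0"),
]
for problem in check_procurement(delivery):
    print(problem)
```

Returning a list of reasons rather than a single boolean lets a reviewer see which of the three rules a delivery failed.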
#3 “Cease dependence on mass inspection.”
Inspection does not improve quality, nor does it guarantee quality. Inspection is too late.
Harold F. Dodge: “You cannot inspect quality into a product”
Automatic inspection and recording require constant vigil.
#4 In the application economy, companies that build great software will own the future…
#9 [00:09:00] One of the things we see, and documented in the report, is that the best organizations recognize that they have a software supply chain, and they are applying key principles from DevOps practices that reach all the way back to W. Edwards Deming. Deming taught that to improve the quality of the products you develop, you should use the fewest and best suppliers to get the highest quality parts, and use those parts in your product. When you use those parts, you need to be able to track and trace where they have been used: in which software applications, or which products. If a defect or security vulnerability is later found in one of those components, you need to identify it quickly, reducing your mean time to identify and repair the vulnerable or defective components within your supply chain. Throughout the report, we document organizations like Intuit and Capital One that are applying these practices to improve the software they deliver.
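Tracking where parts have been used amounts to keeping a reverse index from each component to the applications that contain it, so a newly disclosed vulnerability maps to affected applications in one lookup. A minimal sketch, assuming each application's component list is already known as Maven-style `group:artifact:version` strings; the application names and coordinates are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def build_where_used(app_components: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert application -> components into component -> applications."""
    index: dict[str, set[str]] = defaultdict(set)
    for app, components in app_components.items():
        for coordinate in components:
            index[coordinate].add(app)
    return dict(index)


def affected_apps(index: dict[str, set[str]], vulnerable: set[str]) -> set[str]:
    """Return every application that contains at least one vulnerable coordinate."""
    hits: set[str] = set()
    for coordinate in vulnerable:
        hits |= index.get(coordinate, set())
    return hits


# Usage with hypothetical data:
apps = {
    "billing-service": ["org.example:json-lib:1.4", "org.example:http-client:2.0"],
    "web-portal": ["org.example:json-lib:1.4"],
    "batch-jobs": ["org.example:csv-lib:0.9"],
}
index = build_where_used(apps)
print(sorted(affected_apps(index, {"org.example:json-lib:1.4"})))
# prints ['billing-service', 'web-portal']
```

The lookup itself is trivial; the hard part in practice is keeping the index current as builds change, which is what shortens the mean time to identify.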
#10 Unfortunately, not all parts are equal...
Some are healthy, some are not…
…and all go bad over time (like milk, not like wine).
#11 [00:02:00] Before we get into the findings, I want to talk about one of the themes that emerged as we went through the information in the report. Because this is our second report, we also built in feedback on last year's report. The theme is making the invisible visible. Most of you listening to this conversation work in organizations that have software supply chains, but you don't know the volume of open-source and third-party components your organization consumes, nor do you have good visibility into the quality of the components flowing into and across your software supply chain. One key reason we did this report was to reveal those things and make them easier to understand. We also wanted to establish benchmarks for performance and behavior that you can compare your own organization against, and to compare those against some of the best practices highlighted throughout the report, some of which I'll share today.
#12 There's a tremendous productivity boost, but there's also risk, as you'll see in blind consumption practices. We'll get into that later.
[00:07:00] In this year's report, we took a deep dive into 3,000 high-performance software development organizations to understand what they were consuming: how many components, and of what quality. We'll share that with you today, and the report has more detail, so your organization can use this data to benchmark its own performance and behavior. We also looked at the components used in 25,000 different applications to understand how components are used within applications and what their quality metrics were. I'm going to share some of those findings with you today.
#13 Say hello to YOUR software supply chain, not “the software supply chain”; personalizing it more for the audience.
For those of you unfamiliar with the software supply chain, it's really an analog of the traditional supply chains used in manufacturing today. Those supply chains have suppliers that build components. In software development, the suppliers are the open-source projects that build components and make them freely available to developers around the world.
[00:08:00] Those components are stored and distributed through large central warehouses, like the Central Repository that Sonatype manages, but also repositories like rubygems.org, pypi.org, and the NuGet Gallery. This is where components are stored and made available to the manufacturers, which are the software development teams consuming and downloading these components. Those components are then used to create the finished goods: the software applications that organizations deliver to their customers. We'll continue to use this analogy for the software supply chain, and compare and contrast what happens in traditional manufacturing with what is happening in software today.
#14 There's a really interesting site called modulecounts.com. It has a simple purpose: it keeps track of the number of components, or packages, available for different development languages, from PyPI to NuGet to Bower to Maven, and shows how the number of components available to the developer population grows over time. Using data from that site, we found that over a thousand new open-source projects were created each day: people delivering a new kind of software, a new kind of component.
Then, across the general population of open-source projects worldwide, we estimated that ten thousand new versions of components are introduced every day. There is a huge supply of components entering the ecosystem and available to our software supply chains. In the Central Repository that Sonatype manages, which holds Maven-style Java open-source components, we looked across 380,000 open-source projects and found that, on average, those projects released fourteen new versions of their components every year. From a supply chain perspective that's great: the suppliers are very active, releasing new software and new innovations, and improving the software they make available to developers worldwide.
#17 385,000 packages
1.6 billion downloads per week
1 million requests/hour
5.9 million total users
#19 The other staggering number, one that is almost hard to get your mind around, is that Sonatype's Central Repository received 31 billion download requests last year. The previous year it was 17 billion. These numbers are huge. To put this in scale, there are only about 10 million Java developers on the planet requesting components from the Central Repository. Think about 10 million developers requesting 31 billion parts to build their software; the volume of consumption across organizations is amazing. We detail that volume in the report, but we also want to make people aware that the quality of parts is not equal across all of those downloads. Many different versions of each component are out there, and a jQuery component from six years ago is not going to be as good as the versions released today. Quality can differ by age, by license type, and by things like security defects.
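The scale comparison above is simple division. The sketch below reproduces it with the talk's rounded figures; attributing every request to one of the ~10 million Java developers follows the talk's own framing, so the per-developer figure is only a rough average of one aggregate divided by another.

```python
# Rounded figures quoted in the talk.
downloads_last_year = 31_000_000_000
downloads_previous_year = 17_000_000_000
java_developers = 10_000_000

# Average requests per developer if all requests are attributed to those developers.
per_developer = downloads_last_year / java_developers

# Year-over-year growth in download requests.
growth = downloads_last_year / downloads_previous_year - 1

print(f"{per_developer:,.0f} requests per developer per year")  # 3,100 requests per developer per year
print(f"{growth:.0%} year-over-year growth")                    # 82% year-over-year growth
```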
#20 [00:14:00] One of the things we measured year over year (the report includes several year-over-year comparisons) is the share of Central Repository downloads with a known security vulnerability. In the previous year, 6.2% of the billions of downloads had a known security vulnerability; this past year, 6.1% did. That's about one in every sixteen component downloads.
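The "one in sixteen" figure is the reciprocal of the vulnerable share. A quick check, which also applies the 6.1% rate to the 31 billion requests mentioned earlier to estimate the absolute number of vulnerable downloads; that product is a back-of-the-envelope estimate, not a figure stated in the report.

```python
vulnerable_share = 0.061                 # 6.1% of downloads had a known vulnerability
one_in_n = 1 / vulnerable_share          # about 16.4, i.e. roughly one in sixteen

total_requests = 31_000_000_000
# Integer arithmetic: 61 vulnerable downloads per 1,000.
vulnerable_requests = total_requests * 61 // 1000

print(f"one in {one_in_n:.1f}")                          # one in 16.4
print(f"{vulnerable_requests:,} vulnerable downloads")   # 1,891,000,000 vulnerable downloads
```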
#21 I also want to dig into what this really means for your organization in particular. In some sense, who cares if there are 31 billion downloads of components, since your organization is not downloading that many? When we looked at what the 3,000 development organizations were consuming, we got the number that matters for an individual organization: on average, each organization downloaded nearly a quarter of a million components. Looking further into those downloads, just as with the one company I mentioned at the start of the conversation, there were only just over 5,000 distinct versions across all the components downloaded. If you set the versions aside and just look at which open-source projects you are pulling components from, that whittles down to only 2,000 unique components from individual suppliers.
One way to think about this: "I've decided not to write this piece of software for my application. I've effectively outsourced it to an open-source project, and I'm consuming that component in my organization. There's a massive benefit to doing that, because I didn't have to write that code and it was freely available to me." But these practices also show some inefficiencies: if you collapse all the versions you consume down to those 2,000 components, you've downloaded each one over 100 times, on average. That is something we can work on to make software supply chains more efficient.
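The funnel from downloads to versions to unique components can be computed directly from a download log. A minimal sketch, assuming a hypothetical log of `(group, artifact, version)` tuples with one entry per download; the tiny sample is made up, and the final line simply reuses the talk's rounded figures (treating "nearly a quarter of a million" as ~250,000).

```python
from __future__ import annotations


def summarize_downloads(log: list[tuple[str, str, str]]) -> dict[str, float]:
    """Count downloads, distinct versions, and distinct components in a download log."""
    unique_versions = set(log)                          # distinct (group, artifact, version)
    unique_components = {(g, a) for g, a, _ in log}     # versions collapsed per component
    avg = len(log) / len(unique_components) if unique_components else 0.0
    return {
        "downloads": len(log),
        "unique_versions": len(unique_versions),
        "unique_components": len(unique_components),
        "avg_downloads_per_component": avg,
    }


sample = [
    ("org.example", "json-lib", "1.4"),
    ("org.example", "json-lib", "1.4"),
    ("org.example", "json-lib", "1.5"),
    ("org.example", "csv-lib", "0.9"),
]
print(summarize_downloads(sample))

# With the talk's rounded figures (~250,000 downloads, ~2,000 components):
print(250_000 / 2_000)  # 125.0, consistent with "over 100 times each"
```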
#22 [00:17:00] As we mentioned before, not all parts are created equal. Within these 3,000 organizations, 17,000 of their component downloads, or nearly 7.5%, had known security vulnerabilities. Again, this is something we want to bring to light. Most organizations that saw our data from last year's report said, "I had no idea our organization was consuming this many components. I knew we were using open-source components, but I didn't realize the volume. When I look at their quality, security being one aspect, I'm somewhat surprised by how many vulnerable components we consume, and maybe we need better practices around that."
#23 [00:18:00] Part of those practices is how much hygiene we build into our software supply chain. This year's report gave us visibility at each stage. Of the downloads from the central warehouses, 6% were known vulnerable. For components downloaded into repository managers (think of a local warehouse of component parts used by developers), 5.6% were known vulnerable. In the finished goods, across the 25,000 applications we analyzed, 6.8% of components were known vulnerable. That means the vulnerable components that were downloaded ended up in the finished goods, the applications being shipped and shared with customers. In other words, not enough vetting takes place between where we source components and what ends up in the final products.
#26 It is said that software components age like milk, not wine. Analysis of the scanned applications revealed that the latest versions of components had the lowest percentage of known defects. Components under three years old represented 38% of parts used in the average application and had security defect rates under 5%. By comparison, components between five and seven years old had twice the known security defect rate. The 2016 Verizon Data Breach Investigations Report confirms that the vast majority of successful exploits last year were against CVEs published from 1998 to 2013. Combining the Verizon data with Sonatype's analysis further demonstrates the economic value of using newer, higher-quality components.
[00:19:00] In analyzing the 25,000 applications, we looked not only at whether they contained security defects; we also found a relationship between security vulnerabilities and the age of the components. Let me walk you through what this chart from the report shows. Across all the applications we analyzed, components range in age from one to eleven years old. In year two, for example, just over 20% of the components in these applications are two years old, and one-year-old components make up roughly 18% of the overall application footprint. For those young components, the red line in the chart shows a defect density (the share with known security vulnerabilities) hovering around 5%. For components that are six, seven, eight, nine, or ten years old, the defect density rises to almost 15%. Older components have a three times higher defect density than younger ones, which tells us that if we're going to use components, we should use their newest versions rather than older ones.
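Defect density by age, as described for the chart, is the share of components in each age bucket that carry a known vulnerability. The sketch below computes it from `(age_years, has_known_vulnerability)` pairs; the sample data is synthetic, chosen only to echo the roughly 5% versus 15% contrast described above.

```python
from __future__ import annotations

from collections import defaultdict


def defect_density_by_age(components: list[tuple[int, bool]]) -> dict[int, float]:
    """Return {age_in_years: fraction of components at that age with a known vulnerability}."""
    totals: dict[int, int] = defaultdict(int)
    vulnerable: dict[int, int] = defaultdict(int)
    for age, has_vuln in components:
        totals[age] += 1
        if has_vuln:
            vulnerable[age] += 1
    return {age: vulnerable[age] / totals[age] for age in sorted(totals)}


# Synthetic data: 1 of 20 one-year-old components vulnerable, 3 of 20 eight-year-old ones.
data = [(1, i < 1) for i in range(20)] + [(8, i < 3) for i in range(20)]
density = defect_density_by_age(data)
print(density)                             # {1: 0.05, 8: 0.15}
print(round(density[8] / density[1], 2))   # 3.0
```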
#27 APPLICATION VULNERABILITY DENSITY IS 6.8%
COMPONENTS >2 YEARS OLD ACCOUNT FOR 62% OF ALL COMPONENTS
COMPONENTS >2 YEARS OLD ACCOUNT FOR 77% OF THE RISK
Versions that were seven years or older made up approximately 18% of the component footprint of the 25,000 scans. Of these older components, as many as 23% were on the latest version, meaning the open source projects behind them were inactive or dead. Nobody wants to discover components with known security vulnerabilities or other defects in their applications. Unfortunately, when these defects are found in older components, the chances of remediating the issue by upgrading to a newer version are greatly diminished. If no newer version exists, the only options are to keep the vulnerable component in the application, switch to a similar component from another open source project, or write the required functionality from scratch to replace the defective component. None of these options comes without significant cost or impact on quality.
Another really interesting question we looked at in this analysis was whether people are using the latest available versions of components. Looking again at this chart, just under 15% of one-year-old components are on the latest version. If a project ships 14 releases a year on average, it makes sense that not every one-year-old component would be on the latest version.
Now look at components that are eight, nine, and ten years old. In the nine-year-old case, 20% of the components were on the latest version, meaning the project has not released a newer version in the last nine years. Remember from the previous slide that old components have three times higher defect density. If such a component has a vulnerability and the project hasn't released a new version in nine years, there is no remediation path: you can't replace it with a newer version from the same project. You have to ask whether you are using the highest quality supplier for that component if the supplier hasn't released a new version, and you might consider an alternative supplier for that functionality in your application.
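The reasoning above (is the component vulnerable, is it already on the latest version, and how old is it) can be written as a small triage rule. A minimal sketch; the `InUse` record and the seven-year `STALE_YEARS` threshold are hypothetical choices for illustration, with the threshold loosely echoing the "seven years or older" group in the analysis.

```python
from __future__ import annotations

from dataclasses import dataclass

STALE_YEARS = 7  # hypothetical cut-off for treating a project as inactive


@dataclass
class InUse:
    name: str
    version_in_use: str
    latest_version: str
    age_years: int
    has_known_vuln: bool


def remediation_advice(c: InUse) -> str:
    if not c.has_known_vuln:
        return "ok"
    if c.version_in_use != c.latest_version:
        # A newer release exists; it still has to be checked, since newer is not always fixed.
        return f"upgrade candidate: {c.latest_version}"
    if c.age_years >= STALE_YEARS:
        # Vulnerable, already on the newest release, and no release for years.
        return "no remediation path: consider an alternative supplier"
    return "no fixed release yet: monitor the project or mitigate"


print(remediation_advice(InUse("example-crypto", "1.0", "1.0", 9, True)))
# prints: no remediation path: consider an alternative supplier
```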
#28 In 2016 there were 197 GAVs (Maven groupId:artifactId:version coordinates) related to Bouncy Castle, downloaded a total of 23,412,020 times. 61 of those GAVs were insecure, and those were downloaded 11,181,493 times.
#29 For commons-collections, there were 25 GAVs downloaded a total of 23,476,966 times. 7 of those GAVs were insecure, and those were downloaded 18,330,958 times.
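The GAV figures above can be turned into shares with a few lines of arithmetic. The sketch below uses only the numbers quoted in these notes.

```python
# (total GAVs, insecure GAVs, total downloads, downloads of insecure GAVs), as quoted above.
stats = {
    "bouncycastle": (197, 61, 23_412_020, 11_181_493),
    "commons-collections": (25, 7, 23_476_966, 18_330_958),
}

for name, (gavs, insecure_gavs, downloads, insecure_downloads) in stats.items():
    version_share = insecure_gavs / gavs
    download_share = insecure_downloads / downloads
    print(f"{name}: {version_share:.0%} of versions insecure, {download_share:.1%} of downloads insecure")

# bouncycastle: 31% of versions insecure, 47.8% of downloads insecure
# commons-collections: 28% of versions insecure, 78.1% of downloads insecure
```

In both cases the insecure versions are a minority of the available versions, yet they account for a much larger share of the downloads.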
#32 The last section of the report covers software supply chain management practices that are gaining traction. This isn't just Sonatype talking about software supply chain management and automation; the report highlights a number of organizations around the world saying we need new rules of procurement.
#33 What are we bringing into our organization, whether that's components our developers source for our own applications or software we buy from someone else? New rules of procurement ask: What components are in the software? Are there any known vulnerable or risky components we need to be aware of? And if there are vulnerable components, can they be remediated over time? Will my vendors tell me, or will our development or operations teams be able to identify, components that have gone bad over time so we can fix them quickly?
#35 I think one of the key lessons from the visibility this report provides is how to manage your software supply chain. Many organizations we know of have policies on what kinds of components can be brought into development: you're not allowed to use parts with GPL licenses, or components with known security vulnerabilities, because you're following guidelines from the industry groups or agencies I reviewed earlier. But once you understand that you're not bringing in a couple hundred or a couple thousand software components, but nearly a quarter of a million, it's clear you can't manually review what's coming in the door without dramatically slowing down development. Developers don't want to wait two weeks to find out whether a component is approved for use.
What we need are automated solutions that tell a developer, right from the beginning, whether a particular part is good or bad: does it meet our organization's policy or not? It's like going to the grocery store. You pick a product off the shelf, read the food label, and it tells you the quality, the ingredients, and the calories. You know whether it's good or bad for you, or at least you have a better idea. We need to make that kind of information immediately available to developers so they can build quality in from the start and reduce rework and waste in our software supply chains. We need to automate the production of a software bill of materials, to make it easier to understand what we use and where we've used components over time.
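Automating the bill of materials can start from the dependency coordinates a build already resolves. A minimal sketch, assuming the dependencies are available as Maven `group:artifact:version` strings with no classifiers; the output is loosely modeled on the CycloneDX JSON format with Package URLs (purls), but it is a simplified illustration that has not been validated against the CycloneDX schema, and the application name is hypothetical.

```python
from __future__ import annotations

import json


def to_purl(group: str, artifact: str, version: str) -> str:
    """Package URL for a Maven component: pkg:maven/<group>/<artifact>@<version>."""
    return f"pkg:maven/{group}/{artifact}@{version}"


def build_sbom(app_name: str, gavs: list[str]) -> dict:
    """Build a minimal CycloneDX-style document listing each unique dependency once."""
    components = []
    for gav in sorted(set(gavs)):  # de-duplicate and sort for stable output
        group, artifact, version = gav.split(":")  # assumes exactly group:artifact:version
        components.append({
            "type": "library",
            "group": group,
            "name": artifact,
            "version": version,
            "purl": to_purl(group, artifact, version),
        })
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "metadata": {"component": {"type": "application", "name": app_name}},
        "components": components,
    }


sbom = build_sbom("web-portal", [
    "org.apache.commons:commons-collections4:4.1",
    "org.apache.commons:commons-collections4:4.1",  # duplicate, listed once
])
print(json.dumps(sbom, indent=2))
```

Generating the document on every build, rather than by hand, is what keeps the where-used record in step with what actually ships.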
#37 First of all… when you can clearly see the threat levels of components in your IDE, you can easily shift to a safer one.
The area here in the lower right works like a slider… you simply slide to the right to identify a safer, accepted version of a component.
So you not only see a potential problem early on, but you also see the solution.
Better yet…
=========
Click onto pane and zoom in and zoom out
Guide your eyes to the RIGHT….
This is a normal Developer IDE called Eclipse…
Sonatype made a PLUGIN within it to show a developer the component BEFORE they choose or commit to ELECTIVE/AVOIDABLE Risk/Attack Surface/Complexity/Legal Issues…
The RED chain (e.g.) is every version of struts2-core… And if you move RIGHT far enough… you reach versions without KNOWN CRITICAL vulnerabilities.
The Green bar charts are the download popularity… which doesn’t speak at all to SECURITY… but may give people more comfort that it is stable and being used.
License risk is based on a self-defined policy. We track whether using this license could cause your whole website to become free, common open source, as with GPL… which might be very bad for you… and a DIFFERENT type of risk…
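The slider behavior described above, moving right through a component's versions until one has no known critical vulnerabilities and satisfies license policy, can be sketched as a forward search. This is not Sonatype's plugin logic; the `VersionInfo` record, the per-version maximum CVSS score, the license deny list, and the version history are hypothetical inputs, and the 9.0 cut-off follows the CVSS v3 "Critical" severity band.

```python
from __future__ import annotations

from dataclasses import dataclass

CRITICAL_CVSS = 9.0                   # CVSS v3 rates 9.0-10.0 as "Critical"
DENIED_LICENSES = {"GPL-3.0-only"}    # hypothetical organizational policy (SPDX identifier)


@dataclass
class VersionInfo:
    version: str
    max_cvss: float   # highest CVSS score among known vulnerabilities; 0.0 if none
    license: str


def is_acceptable(v: VersionInfo) -> bool:
    return v.max_cvss < CRITICAL_CVSS and v.license not in DENIED_LICENSES


def nearest_safe_version(versions: list[VersionInfo], current: str) -> str | None:
    """Scan from the current version toward newer ones (list is ordered oldest to newest)."""
    names = [v.version for v in versions]
    start = names.index(current)  # raises ValueError if current is not in the list
    for v in versions[start:]:
        if is_acceptable(v):
            return v.version
    return None  # nothing to the right is acceptable


history = [
    VersionInfo("1.0", 10.0, "Apache-2.0"),
    VersionInfo("1.1", 9.8, "Apache-2.0"),
    VersionInfo("1.2", 5.3, "Apache-2.0"),
    VersionInfo("2.0", 0.0, "GPL-3.0-only"),
]
print(nearest_safe_version(history, "1.0"))  # 1.2
```

In this made-up history, version 1.2 still carries a medium-severity issue; the rule only rejects critical ones, mirroring the plugin description, and skips 2.0 on license grounds.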
#40 When do you have an hour to spare?
With that said, there's a lot more detail in the report about what's happening in your software supply chain, benchmarks of what others are doing, and case studies from various organizations. The report is freely available to all of you, including through the follow-up email you will receive from this webinar. If you haven't downloaded the report yet, we will make a copy available to all of you.
First, I want to thank all of you for attending, and listening to this summary of some of the data that we have from the report.