Charlotte Gayton's OpenChain ISO 18974 Dissertation

Department of Computer Science
Submitted in part fulfilment for the degree of BEng.
Implementing OpenChain
ISO/IEC 18974, the Open Source
Compliance Standard for
Improving Security Assurance
Charlotte Gayton
Version 2.0, 2024-March
Supervisor: Dr. Konstantinos Barmpis

to be completed
Acknowledgements
to be completed

Contents
1 Abbreviations vii
Executive Summary viii
2 Introduction 1
2.1 Project aim . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 Objectives to Achieve Project Aim . . . . . . . . . . . . . . 2
2.2.1 The OpenChain ISO/IEC 18974 Standard . . . . . . 3
2.3 Why Open-Source Software Security is Important . . . . . . 3
2.4 Approach to the project . . . . . . . . . . . . . . . . . . . . 4
3 Background 5
3.1 SBOMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 Vulnerability Databases . . . . . . . . . . . . . . . . . . . . 5
3.3 Vulnerability Scanners . . . . . . . . . . . . . . . . . . . . . 6
3.4 DevOps processes . . . . . . . . . . . . . . . . . . . . . . 7
4 Methodology 8
4.1 Requirements Engineering . . . . . . . . . . . . . . . . . . 8
4.1.1 Feasibility Study . . . . . . . . . . . . . . . . . . . . 9
4.1.2 Requirements Elicitation . . . . . . . . . . . . . . . 10
4.1.3 Requirements Specification . . . . . . . . . . . . . . 12
4.1.4 Requirements Verification and Validation . . . . . . . 13
4.1.5 Requirements Management . . . . . . . . . . . . . 14
4.2 Picking a Vulnerability Tool . . . . . . . . . . . . . . . . . . 14
4.3 Continous Integration (CI) . . . . . . . . . . . . . . . . . . . 15
4.4 Evaluation Process . . . . . . . . . . . . . . . . . . . . . . 16
4.5 Ethical Considerations . . . . . . . . . . . . . . . . . . . . 16
5 Design and Architecture 17
5.1 High-Level Architecture of the System . . . . . . . . . . . . 17
5.2 Vulnerability Scanner . . . . . . . . . . . . . . . . . . . . . 19
5.3 Analysing Vulnerability Data . . . . . . . . . . . . . . . . . 20
5.4 Publishing Results to Datalake . . . . . . . . . . . . . . . . 21
5.5 Designing the Backstage Site . . . . . . . . . . . . . . . . . 21
6 Implementation 23
6.1 GitHub Actions Workflow . . . . . . . . . . . . . . . . . . . 23
iii

Contents
6.2 Changing vulnerability scanner tool . . . . . . . . . . . . . 24
6.3 Backstage site . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.4 Creating CI/CD tests . . . . . . . . . . . . . . . . . . . . . 25
6.5 After first meeting with JD . . . . . . . . . . . . . . . . . . . 25
7 Evaluation 26
7.1 Interview with Endjin . . . . . . . . . . . . . . . . . . . . . 26
7.2 System Functionality . . . . . . . . . . . . . . . . . . . . . 27
7.3 Functional and Non-Functional Requirements . . . . . . . . 28
7.4 Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
A Endjin’s Feedback 30
B Fulfillment of Functional Requirements 32
C Fulfillment of Non-Functional Requirements 34
D GitHub Actions Workflow Diagram 35
E Screenshot of project board on GitHub 36
iv

List of Figures
5.1 System architecture design diagram . . . . . . . . . . . . . 18
D.1 Diagram representing the different steps in the GitHub Ac-
tions workflow script . . . . . . . . . . . . . . . . . . . . . . 35
E.1 Screenshot of project board on GitHub . . . . . . . . . . . . 36
v

List of Tables
4.1 Table answering feasibility questions . . . . . . . . . . . . . 9
4.2 Table showing the requirements of the OpenChain specifica-
tion with the decision I made on whether to implement each
one . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3 Table showing the functional requirements . . . . . . . . . . 12
4.4 Table showing the non-functional requirements . . . . . . . 13
A.1 Table showing Endjin’s feedback . . . . . . . . . . . . . . . 30
B.1 Table showing the functional requirements and whether they
have been fulfilled . . . . . . . . . . . . . . . . . . . . . . . 32
C.1 Table showing the non-functional requirements and whether
they have been fulfilled . . . . . . . . . . . . . . . . . . . . 34
vi

1 Abbreviations
OSS Open-Source Software
SBOM Software Bill of Materials
CVE Common Vulnerabilities and Exposures
NVD National Vulnerability Database
ADR Architecture Design Record
CI Continuous Integration
CD Continuous Development
SPDX System Package Data Exchange
DevOps Development Operations
vii

Executive Summary
With a world that heavily relies on technology, software is being developed
extensively and being used in all facets of life. Lots of the software available
is built up from open-source software (OSS) that is distributed freely under
a specific license which outlines the permissions of usage. Whilst this
speeds up development, saving time and money, it can also come with
risks.
Vulnerabilities can be found in open-source software which are subject to
be exploited, data could be leaked or systems could have access gained.
The open-source community and developers are generally good at patching
these issues quickly, but each user of the software has to individually update
their dependencies or else they could still be exposed to the risk.
This is where the OpenChain ISO/IEC 18974 specification comes in, for
ensuring security along the open-source software supply chain. The stand-
ard defines the ‘what’ and ‘why’ parts of the program, which can help lead
organisations to pick up best practices.
Endjin is a data consultancy whose experties lie in modern data engineering
solutions. They have both their own, and use a variety of open-source
software in their projects. This means it is crucial to them that their systems
are secure, and that they can have confidence that the software they are
using is safe.
The aim for this project is to work on the autonomous processes for Endjin
which will aid the progress in working toward the OpenChain specification.
The stakeholders at Endjin want a system that will integrate well with their
current systems.
One of the most crucial parts of this project is understanding the require-
ments from Endjin to ensure the product fully meets their needs, this means
also breaking down and understanding the requirements of the OpenChain
specification. To define and manage the requirements throughout the pro-
cess of this project, I use Requirements Engineering practices. This is the
process for identfiying, analysing, specifying and managing the needs and
expectations of the stakeholders. Due to the aim of the project focusing
on the OpenChain specification it is important that the functionalities and
viii

Executive Summary
the build of the system reflects what is defined as a standard, whilst also
fulfilling Endjin’s internal requirements.
State-of-the-art open-source vulnerability scanners are readily available,
presenting one of the project’s key considerations: selecting the most suit-
able vulnerability scanner. The scanner needs to fulfil all the requirements
defined. Initially, the right tool for the project was a tool called ‘Bomber’
which is
The system design is a workflow that has individual modular steps which
pass data from one to another. This makes each components easier
to understand, maintain and debug. The ‘vulnerability scanner’ step of
the workflow is responsible for getting the SBOMs and scanning their
components for vulnerabilities. This stage will incorporate the vulnerability
scanner tool which was chosen in the methodology.
The next stage of the design is the analysis on the vulnerability report. For
clarity, some data cleansing and manipulation is required at this step to
ensure the data is easy to understand and can be used in the most useful
way. Then it is split into a number of different reports which can be used for
different purposes. The patch report contains the components and whether
they have a fixed version, and then an extra layer of checking on top to see
whether that update is a patch, minor or major as these could be dealt with
differently. The other is a simplified version of the report for which the data
will be used on the central site. The final report is a summarised verison of
the report containing the count of how many vulnerabilities for each severity
there are.
The final stage of the workflow uploads these results to the Azure data lake,
which will be accessible by the central site, and can be used to store all the
data.
ix

2 Introduction
As the world is quickly developing, the use of technology is growing more
so. This means large amounts of investment are being thrown at new
projects, and everyone is working as fast as they can to keep up with the
industry leaders. Technology is apparent in many different sectors: finance,
medicine, business, education, and so many more. As technology contin-
ues to advance, the demand for software grows, leading to an increasing
number of organisations creating and developing software. Open-source
software (OSS) defines free distribution and access to the source code
of software [1]. This software is released under different licenses, each
outlining its terms of usage; the most common types include permissive
and copyleft.
OSS is becoming more popular as it is cheap and easy to use, can easily
be integrated into developing software, and is constantly being updated.
However, many risks come with using OSS, which need to be taken into
account when using it, either personally or in an organisation. These risks
include both legal and security issues. Regardless of the project’s size,
whether it is a student’s dissertation or government top-secret documents,
any existing vulnerability in the code could potentially be exploited. While,
for hackers, targeting government information may be more ’beneficial’, no
software should be left untracked if the contents or data are important.
With OSS being very new, there are still standards, along with laws, being
developed, meaning new tools and processes are being created daily.
Every organisation will have different software that is written in different
languages and tackles different issues with different risks, so therefore, how
are we able to create tools that are general enough for everyone to use?
This is the issue companies are running into when trying to become safer
and more compliant using OSS. The software available that can manage
this is not always perfectly suited to their business. And so, this is where
the idea of my project stems: integrating processes for Endjin (discussed
later) that will ensure OSS safety.
1

2 Introduction
2.1 Project aim
This project aims to aid in the implementation of the OpenChain ISO/IEC
18974 standard for Endjin, a data consulting company that specialises in
data engineering solutions. [2] They have a large range of open-source
software that assists both their work and the work of their customers.
The OpenChain project is a set of standards defined by the Linux Found-
ation that focus on the best practices for using and creating open-source
software. These standards aim to "reduce friction" and "increase efficiency"
when using open-source software [3]. There are two main standards they
maintain:
• OpenChain ISO/IEC 5230: The international standard for open
source license compliance programmes [4]
• OpenChain ISO/IEC 18974: The industry standard for open source
security assurance programmes [3]
As part of my degree, I completed a year in industry at Endjin. Over
the year, I implemented the OpenChain ISO/IEC 5230 standard, which
focuses on open-source license compliance. I defined and implemented
methods to create, track, and manage Endjin’s open-source software across
their code base. This included generating a software bill of materials
(SBOM, discussed later in the report). Endjin found this project as a whole
successful; they can now track all the licenses they are using across their
code base, instilling confidence that the software they are developing and
using is safe for both themselves and their customers to use.
As a result of this, Endjin was keen to get started on the second of the two
standards, OpenChain ISO/IEC 18974, and wanted a system that would
integrate well with their current work.
2.2 Objectives to Achieve Project Aim
The main objective of this project, as defined above, is to create software
for Endjin that helps them work towards achieving the OpenChain ISO/IEC
18974 standard.
The main stakeholder for this project is James Dawson [5], the Principle
Engineer at Endjin, who specialises as a DevOps consultant. He is in
charge of the DevOps processes across Endjin’s codebase and has a main
interest in the development of the specification. Ultimately, the biggest
priority for the project is to fulfil the specific requirements and desires of the
2

2 Introduction
stakeholders.
2.2.1 The OpenChain ISO/IEC 18974 Standard
This newly developed standard, focusing on security assurance, has been
developed since noticing OpenChain ISO/IEC 5230 being used in the
security domain [6], therefore deciding to create a standard based solely
on security. The specification [6] defines the ‘what’ and ‘why’ parts of
the programme, rather than the ‘how’ and the ‘when’. Instead of explicitly
defining what each organisation needs to do, they lay out the different
processes and documents that are needed because each case is unique.
The aim of this project is to target the implementation aspect of the spe-
cification rather than the policies and documents that Endjin would need to
generate. Consequently, only a segment of the specification relates to this
project:
• ‘It focuses on a narrow subset of primary concern: checking open-
source software against publicly known security vulnerabilities like
CVEs, GitHub/GitLab vulnerability reports, and so on’
• ‘A process shall exist for creating and maintaining a bill of materials
that includes each open source software component from which the
supplied software is comprised’
The above extracts from the specification set out the main objectives for
the project, checking all of Endjin’s software for publicly known security
vulnerabilities.
2.3 Why Open-Source Software Security is
Important
Although OSS is cheap and easy to use, it comes with a number of risks that
organisations or users need to be aware of in order to use the code safely.
OSS can be found everywhere [7], making up over 80% of the software
code in use in modern applications. Whole applications themselves can
be open-source, such as the Android operating system for mobiles, which
is used by over 3.3 billion people[8]. Even though this system itself is
just one piece of open-source software, it could potentially use hundreds
or thousands of different OSS libraries and packages. These packages
could use more OSS licenses and packages; this is a demonstration of the
software supply chain.
3

2 Introduction
The term ‘software supply chain’ refers to the representation of depend-
encies that exist within a software component. While that programme will
have code of its own, it is likely that it also uses open-source software to
build up certain components. These components could also have OSS
dependencies, therefore creating this idea of a software supply chain, which
commonly appears as a tree shape.
The log4j attack is one of the most well-known vulnerabilities across the
world. Log4j is a Java logging framework that is listed as one of the top
100 critical open source software projects [9] and is used commonly across
software and website applications. The vulnerability, which was discovered
in 2021, was easy to exploit, requiring very little expertise [10]. Because it
is such a widely used piece of software, many organisations were unaware
that this software was being used in their supply chain. A patch was
released very quickly, allowing users to update their software; however, it
was discovered that even two years after the attack, 2.8% of applications
were still using the un-patched version of Log4j [11]. In a study [11] it was
found that 79% of the time, developers didn’t update the dependencies they
were using after they added them to the software, which is apparent when
it is estimated that a third of applications currently are using a vulnerable
version of Log4j.
There are several companies that have already completed the OpenChain
ISO/IEC 18974 standard. Blackberry was the first in America, [12], and is
focused on ‘building a more resilient and trusted software supply chain’.
2.4 Approach to the project
Initially, when starting this project, I intended to develop a tool that Endjin
could use that would scan a representation of their code base for any
vulnerabilities and give a report based on its findings. However, after re-
searching the current state-of-the-art, I came across many already existing
tools that did exactly what I had planned, if not more so. Although the pro-
cess of building this tool from the ground up could have been a fascinating
experience in data gathering and manipulation, it doesn’t align with the
project’s best interests. There are better solutions out there, and some are
open-source too.
This changed my approach. Having intended to initially develop a singular
tool, I now decided to develop an autonomous process for Endjin that would
run and ingest the results of a vulnerability scanner built up from Endjin’s
current architecture (discussed in more detail in the background section).
4

3 Background
3.1 SBOMs
A Software Bill of Materials (SBOM) serves as an inventory of all compon-
ents and dependencies within a given software. It defines all the parts
that make up the specified application. As SBOMs have become more
popular in the open-source community, there have been more standards
and formats emerging, so implementing and standardising these security
standards will be much easier. The two main leading SBOM formats are
SPDX and CycloneDx.
Endjin uses a tool called Covenant [13] made by Patrik Svensson. SPDX
SBOMs can be generated with a custom alteration that includes some
extra meta data. By integrating this tool into Endjin’s build processes, an
updated version is automatically produced each night and uploaded to their
database. Once there, the information undergoes a cleansing process,
preparing it for analysis in accordance with the OpenChain ISO/IEC 5230
procedures.
SBOMs can be used for each piece of software to track components and
dependencies; these files can be analysed and checked to gain insight into
a system as a whole.
3.2 Vulnerability Databases
Once vulnerabilities are identified, they are recorded and published in a
vulnerability database, alerting users to known software issues. One of
the most prominent databases is the [14] National Vulnerability Database,
which includes security-related software flaws, product names, and impact
metrics, aiding in automated vulnerability detection.
The database includes Common Vulnerabilities and Exposures (CVEs)
which are, as referenced to by NIST [15], a dictionary of vulnerabilities.
They are assigned an ID so they can be easily searched for and identified.
5

3 Background
Once the CVEs have been defined the NVD [14], which is managed by
NIST, will analyse each entry and upload it to the NVD.
GitHub’s Advisory Database, which includes CVEs and advisories from
open-source software, integrates well with GitHub’s automatic scanner
and adviser. However, it may not be as effective for Endjin as not all their
software is hosted on GitHub.
While vulnerability databases are useful, they also alert potential hackers to
known vulnerabilities, highlighting the need for automated processes that
scan for vulnerabilities and update software promptly.
3.3 Vulnerability Scanners
By producing SBOMs for software, it’s possible to accurately represent an
entire codebase as a list of distinct elements. This is only the beginning
of what is required for security compliance, as these components need
to be checked against real vulnerability data. As discussed above, this
information can be collected from vulnerability databases. A large number
of these databases feature APIs that facilitate the downloading or querying
of their contents. Existing state-of-the-art open-source tools can perform
this exact function, often with additional features.
Certain vulnerability scanners rely on a single database, which could lead
to a single point of failure. This could result in overlooking a vulnerability
that another database might have detected. Therefore, it’s crucial to select
a scanner that gathers data from a reliable source to avoid missing any
potential vulnerabilities. The size of the system should also be a considera-
tion when selecting a tool, as some scanners may not have the capability
to handle larger systems and could operate too slowly.
The tools ‘Grype’ [16] and ‘Syft’ [17] are examples of products created by
Anchore Inc. [18] Syft is a CLI tool and library that generates an SBOM from
container images and filesystems. Endjin’s predominantly used language
is C#, which isn’t supported by Syft. Anchore’s other product, Grype, can
take a range of different SBOMs and scan them for vulnerabilities, checking
the results against an external database. This could cover the vulnerability
scanning section of my project.
Another tool I found to scan vulnerabilities is called ’Bomber’ [19], an open-
source repository that takes either SPDX [20], CycloneDx, or Syft [17]
SBOM formats. It uses multiple different vulnerability information providers:
OSV [21], Sonatype OSS index [22], and Snyk [23]. You have to pay for
Snyk, while with Sonatype you register for free, and OSV is fully free.
6

3 Background
Docker Scout [24], a built-in feature of Docker, automatically scans con-
tainer images for packages with vulnerabilities. It displays an assigned
CVE, providing details like the vulnerability’s severity and the error’s version
number.
There are several vulnerability scanners that, for this project, aren’t con-
sidered given they are not open-source software; these included FOSSA’s
vulnerability scanner [25] and Vigilant Ops [26], which have tools that create
and scan SBOMs.
3.4 DevOps processes
As it evolved, what was initially conceived as a software development
project has morphed into a DevOps initiative. The primary objective is now
to aid Endjin in enhancing their DevOps workflows.
DevOps is a methodology that encourages communication, automation,
integration, and rapid feedback cycles[27]. A DevOps platform brings
together lots of different tools and processes into one system that can be
maintained[27]. It is built on top of the principles of Agile development,
which focus on incremental and iterative processes[27].
Continuous Integration (CI) and Continuous Development (CD), also known
as CI/CD, play a key part in following and developing DevOps processes.
CI/CD are the processes that automate code management, from running
automated tests to deploying and building the software[28].
This project will support Endjin’s CI/CD processes, which will involve check-
ing software for vulnerabilities. This can be for the codebase as a whole,
or the system developed can be used on individual repositories which can
be checked before being merged into the main branch of code. In terms of
DevOps, this is an autonomous process that can integrate with the rest of
Endjin’s DevOps platform, meaning it will be tied in with the rest of Endjin’s
software.
7

4 Methodology
To design and evalutate the project, I will use a range of different methodo-
logies which will help to achieve the project’s objectives. For each, I justify
any variations in methods I have made to better suit this project and each
method is supported by the appropriate literature.
The first, Requirements Engineering ensures that the stakeholders of the
project get the system that they need and that based on their requirements
the appropriate strategy is taken.
AGILE
4.1 Requirements Engineering
Requirements Engineering is the process of identifying, analysing, spe-
cifying, and managing the needs and expectations of the stakeholders to
implement a software system effectively[29].
The five Requirement Engineering processes to be applied include con-
ducting a feasibility study, eliciting and specifying requirements, verifying
and validating these requirements for accuracy and stakeholder alignment,
and managing these requirements through documentation, tracking, and
stakeholder communication throughout the development lifecycle.
Somerville’s ‘Software Engineering’ book [30] outlines four unique stages
of the Requirements Engineering process. The second process, ‘Require-
ments elicitation and analysis’, talks through two processes about collecting
and deciding on requirements. In the context of my project, where there
is a focus on understanding the requirements from different sources, I
have decided to look at both the elicitation and analysis separately. This
will allow me to more clearly distinguish between the two and therefore
better understand the requirements. I will name the ‘requirements analysis’
section ‘requirements specification’ instead because it better encompasses
both the analysis of requirements and the formalisation of them.
8

4 Methodology
4.1.1 Feasibility Study
This stage of the process aims to answer questions that will help determine
whether the investment of time, resources, and effort into creating and
developing this system will align with the objectives and constraints of both
Endjin and the developer.
Having prepared some questions ahead of the initial meeting with Endjin, I
could work with the stakeholders to understand the scope and objectives of
this project. This is an important stage of the process because it determines
the feasibility of the work that could take place. Initiating a project without
fully understanding its requirements can lead to wasted time and resources,
especially if unforeseen requirements emerge later in the process. It is
natural for systems and requirements to evolve; however, having a good
understanding of the project from the beginning is crucial for selecting the
right approach.
Table 4.1: Table answering feasibility questions
Question Discussion
Does the system contribute to-
wards Endjin’s objectives of being
OpenChain compliant
Yes, being able to track and manage vulnerab-
ilities across Endjin’s codebase will move them
closer to being OpenChain ISOIEC 18974 com-
pliant
Can the system be integrated with
Endjin’s current architecture?
After discussing Endjin’s current architecture, I
can use the same systems and processes as
them to ensure they work together
Can the system be developed
within the time constraints of this
project?
Yes, the base system should be possible in time,
and any leftover time can be contributed to extra
features
Does this system require new tech-
nology that hasn’t been used by
Endjin before?
It does; it requires an open-source vulnerability
scanner, which will need to fulfil a set of require-
ments
Does the system require techno-
logy that isn’t available within the
project’s scope?
No, the technology needed (GitHub and Azure)
has limited availability due to a lack of funds, but
will be enough for this project
What benefit will the OpenChain
specification be to Endjin?
Being compliant means Endjin can be more con-
fident in the open-source software they use and
can use their current architecture as an offering
to future clients
The questions answered in Table 4.1 demonstrate that this project is feasible
to carry out and highlight the appropriate technology available to create a
system that is architecturally similar to Endjins. This means that the project
can viably continue, and requirements can be elicited from Endjin. Knowing
the system is feasible increases the likelihood of success for the project, as
initiating a project that might be out of scope or unsupported by available
technology would be counterproductive.
9

4 Methodology
4.1.2 Requirements Elicitation
Requirements Elicitation is the stage of the process that collects information
regarding the requirements of the system. There are usually a range of
different methods of identifying these, such as surveys, focus groups, and
prototyping. For this project, I will be conducting interviews with Endjin
to understand the current state of their system and the features they are
expecting. As this software is being used in a system for a company that is
currently operating in the industry, their business needs and expectations
for the project must be properly accounted for.
In these interviews with James Dawson [5], the main stakeholder for this
project, I discussed the current state of their DevOps architecture and
learned how their different components currently interact with one another.
This was crucial to learn so I could tailor my solution to be as similar to their
current system as possible and achieve a seamless transition from one to
the other. He outlined a few different requirements and expectations:
• Understanding the size of the vulnerability: Given a vulnerability, what
is its severity, and how quickly or urgently should it be fixed and
looked at?
• How big of an impact is an update? Is it just a minor update, or
will it be a major update that could potentially break current code if
updated?
• Be able to view the current state of each component and whether
there are any vulnerabilities on a site they can easily access.
• Be notified of any vulnerabilities that require a serious change or
could break code if updated.
Because Endjin both develops its own open-source software and uses third-
party dependencies, it is a priority to them that they have the assurance
that their code isn’t vulnerable. This is especially important because Endjin
are consultants, meaning they develop solutions for companies and provide
help to them on specialist subjects. A lot of this involves open-source
software and developing solutions for customers using their own OSS and
freely available OSS.
One of their requirements was that I be led by the OpenChain specific-
ation to make my decisions. Understanding and applying the guidance
from the specification to Endjin would be the most important. Given that
ISO/IEC 18974 primarily serves as a prescriptive standard, there are very
generic aims and goals that can be interpreted differently depending on the
organisation.
10

4 Methodology
Understanding the requirements of the OpenChain 18974
Specification
The OpenChain specification [6] encompasses several aspects, including
awareness and policy. These elements identify who is accountable for
implementing changes and ensuring no vulnerabilities exist. It also involves
training staff members to keep them informed about what changes have
been made, the reasons behind these changes, and any actions they
may need to take to maintain the company’s compliance. While these
are extremely important to become compliant and to be assured that they
are as safe as possible, the main focus will instead be on the high-level
requirements set out by James Dawson mentioned above. This is because
the remaining parts of the specification are primarily for the company’s
discretion, while this project’s focus is on the implementation aspect.
Table 4.2 shows the main implementation practices available to Endjin. The
requirements defined by OpenChain for the security assurance specification
are listed under the ‘Requirement’ column. The second column contains
the decision on whether to cover the implementation as part of the project.
Table 4.2: Table showing the requirements of the OpenChain specification
with the decision I made on whether to implement each one
Requirement Decision
Method to identify structural and technical threats to the supplied
software;
No
Method for detecting the existence of known vulnerabilities in
supplied software;
Yes
Method for following up on identified known vulnerabilities; Yes
Method to communicate identified known vulnerabilities to the cus-
tomer base when warranted;
No
Method for analysing supplied software for newly published known
vulnerabilities post-release of the supplied software;
No
Method for continuous and repeated security testing to be ap-
plied for all supplied software before release;
Yes
Methods to verify that identified risks will have been addressed before
the release of supplied software;
No
Method to export information about identified risks to third
parties as appropriate.
Yes
Internally, Endjin will need to make extra decisions regarding the require-
ments, which would be beyond the scope of this project due to developing
this system externally for Endjin. While these requirements cannot fully be
implemented in the design of the system, the foundation can be created to
support Endjin’s further development.
The requirements that discuss identifying risks before and after supplied
software is released can be partially implemented in this system because
11

4 Methodology
there will be the functionality to scan software for vulnerabilities; however, it
will require some extra steps along with this implementation from Endjin to
fully meet these requirements. Similarly to the requirement, which defines
a method to communicate vulnerabilities to the customer base, whilst this
system itself won’t communicate these changes, it can still create a report
that can be sent to customers. The responsibility, however, to send these
to the customers will be of Endjin.
The requirements appropriate and in scope for this project include creating
methods for detecting the existence of known vulnerabilities and displaying
this information in a useful way. This includes having a method that will
allow for continuous integration and testing. Finally, to have a method to
generate a report that Endjin can send to appropriate third parties.
4.1.3 Requirements Specification
Requirements specification is the process of formalising the data collected
in Requirements Elicitation. It encompasses the analysis stage to prioritise
and structure requirements into easy-to-manage tables. These will describe
the expected functionalities of the system and set expectations for how they
are run, grouped by functional and non-functional requirements.
Firstly, the functional requirements (Table 4.3), which define how the system
must work [31]. This includes all the functionalities defined by Endjin and
the OpenChain specification, which together will guide and shape the
project.
Table 4.3: Table showing the functional requirements
No. Requirement
1 Collect SBOMs from the cloud;
2 Convert SBOMs to the correct format;
3 Scan each SBOM for vulnerabilities;
4 Store vulnerability data;
5 Cleanse vulnerability data to improve clarity;
6 Generate a report with patch number recommendations;
7 List update types in patch recommendations e.g. patch, minor, major;
8 Display data on a central site;
9 Assign severity scores to identified vulnerabilities;
10 Include version number and patch recommendations for vulnerabilities identified;
11 Identify the source SBOM for each vulnerability;
12 Store vulnerability reports in the cloud;
13 Automatically run the system upon code changes;
14 Users can sort data from ascending to descending severity on the output site;
15 Seamlessly integrate with Endjin’s CI/CD pipelines;
16 Able to scale for varying numbers of SBOMs;
17 Keep vulnerability data private;
12

4 Methodology
The success of the project is not only determined by meeting Endjin’s
requirements but also by ensuring easy integration into their current code-
base, with the overarching goal of minimising additional work for Endjin.
The non-functional requirements (Table 4.4) describe how the system
works; they don’t affect the functionality of the system; however, they are
focused on the system’s usability and robustness.
Table 4.4: Table showing the non-functional requirements
No. Requirement
1 Able to handle a varying number of SBOMs;
2 Should be reliable;
3 Should integrate seamlessly with Endjin’s current architecture;
4 Should be easy to maintain, especially for Endjin;
5 The system should be secure;
4.1.4 Requirements Verification and Validation
This stage of the Requirement Engineering process is where the require-
ments are reviewed and checked so that, together, they support the stake-
holder’s overall understanding of the system and the communicated re-
quirements. This process validates the specified requirements, ensuring
no functionalities are overlooked. The completion of these indicates the
system’s coverage and readiness. To verify the requirements, it must be
considered whether they are completable within the scope of the project
and to ensure that the requirements don’t collide with one another [29].
To validate the requirements, I engaged with the stakeholders to review both
the functional and non-functional requirements derived from the interviews.
This ensured we were all aligned with our understanding of the system’s
expected functionalities.
To verify the requirements were suitable for the system as a whole, I re-
viewed the requirements to ensure there were no inconsistencies and that
they were testable so that during implementation they could be demon-
strated to prove that the functionality exists and works.
I reviewed and conducted this step of the Requirements Engineering
method multiple times during the process of my project to ensure the
requirements were still relevant. Further along in the project, when there
is more knowledge about the system as a whole, it could change the
perspective on these requirements, highlighting some that may not be
possible or could have conflicts. Additionally, I checked in with the stake-
holders regularly to discuss the progress of the project and to ensure these
13

4 Methodology
requirements were still a reflection of their expectations.
4.1.5 Requirements Management
Requirements Management is an important part of Requirements Engin-
eering to track the changes that could occur. The requirements that are
designed are crucial to the development and success of the project, so hav-
ing a versioned history of these helps to keep track of what was completed,
and what still needs to be done.
I utilised GitHub Projects [32] within my repository to monitor the progress
of my implementation. Projects are made up of ‘issues’ and ‘pull requests’
which can be categorised on a visual board. I used the columns ‘To-
Do’, ‘In Progress’, and ‘Done’. Each issue was a card on the board that
represented a task that needed to be done. Some of the requirements were
better broken down into smaller tasks, which made it easier to measure
the progress when working toward a specific requirement. For each of the
tasks I assigned a ‘Task Size’ to label how big the task was, and therefore
how long it might take to complete.
This management board was useful during the process of the development
of the system. Being a singular person working on this project, it was useful
to be able to track my progress, both for Endjin and myself, because they
can see what features are currently in progress and how the project is
coming along.
4.2 Picking a Vulnerability Tool
When researching an appropriate tool for my system, it needed to fulfil
all the requirements of my system. This includes being able to scan an
SPDX-format SBOM for components and using vulnerability data from an
external database to check these for any security threats. The tool would
also have to identify what the threat is and, if there is a patch, communicate
this information.
Initially, the first tool I came across, as discussed in my background section,
was Bomber, the vulnerability scanner. There were a few different issues
with this tool. Initially, the tool scanned a folder full of various SBOMs. To
save time, it extracted all components from the SBOMs, removed duplicates,
and then queried each unique component against a database. Despite this
efficiency, the workflow, as discussed later, still consumed a considerable
14

4 Methodology
amount of time. When the system combined all the components, it lost
crucial information about which SBOM each component was associated
with. This information is vital because, if vulnerabilities are detected, it’s
necessary to pinpoint the repository to which the component belongs, and
a component might be present in multiple repositories. The workaround
for this was to create a script that mapped the components back to each
repository.
However, further into the project, I realised the tool didn’t identify the re-
commended patch numbers for vulnerable components. The requirement
to have patch versions is a prioritised requirement for both Endjin and
the OpenChain specification. Having this number means the system can
automate updates, meaning that patches can be fixed instantly. The stake-
holders specifically wanted this information and insight into the size of the
fix (e.g., patch, minor, or major), and the system couldn’t be a success
for any of the objectives without this feature. Therefore, I decided to find
a new tool that would more accurately fit the requirements of my system.
While this would take more time to implement and rewrite some of the data
manipulation already done, it was an essential step required for the system
to operate as anticipated.
This highlighted an important part of the project, indicating that more re-
search and testing should have been conducted on the tool before selection,
as all subsequent work and analysis were based on this tool. From then on
in the project, I made sure to review any decision that was being made and
check it against the requirements to ensure it was right for the project.
The new open-source scanner I found, called Trivy, can access multiple
vulnerability databases and provide much more thorough information than
Bomber did; this included patch version numbers for vulnerabilities. After
having found and tested this new tool against Endjin’s SBOMs, I decided
it was a much better fit for this project and therefore updated my data
manipulation to fit the new schema.
4.3 Continous Integration (CI)
Another methodology used was continuous integration (CI). While the
project itself is supporting Endjin’s CI, there have also been CI methods
used in the development of this system.
I have designed some tests that will run when there is a change to the code,
a branch is created, or a branch is about to be merged into the main branch.
If these tests pass, it indicates that the changes to the code haven’t caused
15

4 Methodology
any errors and that the code should run as expected.
When I push a change to GitHub, the pipeline will automatically run, check-
ing that each part of the system works. If a success, the workflow will show
a green tick, and if not, a red cross. This shows if any of the changes
that were made are breaking changes or a change that wasn’t correctly
implemented. The design of the system ensures that each stage of the
pipeline relies on the stage before, so if one part fails, so does the whole
pipeline. This is great for diagnosing problems because it makes it easy
to see where the workflow failed, and generally, there is an accompanying
message about the error.
4.4 Evaluation Process
4.5 Ethical Considerations
It is important ethically and in regards to saftey that the practices along this
project are considered. There are several different factors when it comes
to working with data in this context that may need to be addressed.
Endjin is giving access to a portion of their SBOM data that can be used to
test and develop the system with. Whilst this data has been given for the
purpose of this project, it is important that it isn’t misused or lost.
Throughout the process of this project Endjin, the owners of the data and
the stakeholders of the project, have access to the working repository and
are able to see the code and what it is being used for, creating a good level
of transparency.
It is also important to consider the security of the project, in both the tools
that are being used and the software that is hosting the system. When
making decisions about the technology being used in this project, I will
consider the security implications.
16

5 Design and Architecture
Endjin currently has many DevOps processes running that manage their
codebase and ensure the software is up-to-date and in line with their
internal set requirements. The majority of these processes are Continuous
Integration (CI) and Continuous Deployment (CD) processes. These are
run as GitHub Action workflows, which can make changes to repositories
and generate and report data.
Endjin’s implementation of the first OpenChain standard ISO/IEC 5230
runs majoritively through different GitHub Actions workflows, running the
SBOM generator tool Covenant directly onto their repositories and then
using these to generate a report of their license statuses, and then saving
these SBOMs to the datalake to save in-case of further use.
The design of this system will mirror Endjin’s implementation of the first
OpenChain standard (ISO/IEC 5230) whilst bringing it in line with the new
standard (ISO/IEC 18974). This will make maintenance and further devel-
opment of the system much easier if the two systems can work concurrently
with one another so Endjin can be familiar with the system when moving it
over to their codebase and start running it. This design philosophy helps
achieve the functional requirement of a seamless transition.
Figure 5.1 represents the proposed system design as a whole, and how
the different components and data in the system interact with one another.
5.1 High-Level Architecture of the System
For the system to integrate with Endjin’s existing processes, I intend to
design a workflow that will run in GitHub Actions, similar to Endjin’s pro-
cesses.
To represent each repository, an SBOM will contain the components used
in each system and their version numbers. This is what will be needed to
find whether there are any known vulnerabilities in the system. Being able
to get SBOMs as a source of data from Endjin means I can get an overview
of the system without needing the source code itself. In my system I will
17

take these from my replicated Azure cloud storage container, therefore,
when Endjin fully implements it into their systems, it will be easy for them
to feed their SBOMs in as it will be in the same folder structure.
For maintainability, Endjin has processes in place that ensure components
are up-to-date, without any major breaking changes. This ensures that
the software can support changing versions, and will be easy for Endjin to
continue using in the future. GitHub Actions automatically have updates to
the software and hardware that runs the workflows meaning Endjin won’t
need to worry about keeping this up-to-date.
Figure 5.1: System architecture design diagram
In terms of scalability, this system architecture is flexible to the amount of
inputs required. A folder containing either one up to thousands of different
SBOMs could be fed into the workflow, which would each individually get
scanned and have a report generated. Being run as a workflow in GitHub
Actions, the hardware capabilities are flexible also, as it will be run on an
external service, rather than being limited by the hardware that Endjin may
have.
The system as a whole is a workflow, that starts with the SBOMs as an
input (representing Endjin’s codebase) and produces a set of reports that
can be used for different purposes relating to the OpenChain specification.
I will talk more about these in the sections below.
As shown in Figure 5.1 I decided to bring modularity into the system
because it helps to bring cohesion. This makes it easier to understand,
18

maintain, and debug. For each modular section, all the components are in
one place.
Additionally, the modular architecture of the system enhances testing as it
allows for targeted and more specific testing for different components. By
having separate components, it isolates their processes from one another,
making it easier to see how they communicate. This helps with debugging
because errors are isolated to the section in which they occurred, making
it more obvious what the problem could be. This works toward the non-
functional requirements that require the system to be reliable and easy to
maintain.
5.2 Vulnerability Scanner
The ‘Vulnerability Scanner’ section of the workflow is responsible for getting
the SBOMs and scanning their components for vulnerabilities. This stage
of the process builds the foundation of the rest of the pipeline because it
scans for vulnerabilities which is one of the most crucial requirements.
The SBOM data will be stored in the Azure Data Lake, as replicated by
Endjin’s current architecture, meaning it will be easy for them to transition to
using this system. This data is stored in nested directories which represent
the different GitHub organisations and repositories they belong to and
so will need to flatten the data to ensure the scanner can access all the
SBOMs. ‘Flattening’ will mean taking each of the SBOMs from their nested
positions and storing them in one directory instead. This will all take place
in the ‘Flattening Input Data’ script.
The SBOMs, at the start of the system, will be in the Covenant custom
format, which will make it difficult for a vulnerability scanner to extract the
required information. Therefore, they will need to be transformed into an
SPDX format because it is the most similar to the current custom format,
and is one of the most popular SBOM formats, meaning there’s a good
chance the scanner will work well.
Once the SBOMs are in SPDX format, they can be given to the vulnerability
scanner tool, which will scan the SBOMs and produce the vulnerability
report. Now that the vulnerabilities have been scanned, this stage of the
workflow has finished. As mentioned earlier in the report, I had previously
chosen a different vulnerability scanner, however as it wasn’t fit for purpose
I implemented a different one instead.
The only data outputted at the end of this section is the vulnerability report.
19

I grouped these parts because they are the only components that deal
with the raw SBOMs, and after this stage, the only information that is
handled will be the vulnerability report. As mentioned above, the benefits
of modularity mean any errors that may occur in this stage of workflow will
be isolated from the other stages, that have different goals.
5.3 Analysing Vulnerability Data
This stage processes the information received about the vulnerabilities
detected. This part aims to cleanse the data and create some meaningful
reports which can be used as part of a report, or to support Endjin’s DevOps
processes.
The data inputted at this stage of the workflow is the data that is outputted
at the end of the last stage the ‘Vulnerability Scanner’. The vulnerability
report will be imported by the ‘Analysing Vulnerability Data’ script, which is
then used for analysis.
Each SBOM is represented by an individual vulnerability report. The first
step the ‘Analysing Vulnerability Data’ script will take is to merge these
into one table. This is beneficial as data will no longer be coming from
multiple sources, and will be more efficient to spot trends or patterns in
the data. The functional requirement for sorting data and displaying it on
a central site will be more achievable by having data that is easy to apply
and access.
The data that is generated by the Vulnerability Scanner will be nested, and
therefore less easy to work with. It means specific data is more easily
accessed and will make it easier to integrate with other tools later in the
workflow. This is important to fulfill the objective to display the data on
a central site and to produce a patch report. The flattened data will be
easier to manipulate and apply business logic to, for example, deciding on
a specific version to update.
After the data has been flattened it can be filtered and cleansed to remove
the data that is no longer needed. To create a central site and to store
vulnerabilities in the cloud, this data needs to be useful. The site, for
example, needs only the data it will use, as anything else could cause
complications and an unnecessary use of storage.
The functional requirement ‘generate a report with patch number recom-
mendations’ can be satisfied with the step above, but will need some extra
logic to determine how big of a change the fixed version number is. Adding
this information as another column to the table will allow Endjin’s current
20

DevOps processes to automatically update their components with the right
version numbers.
5.4 Publishing Results to Datalake
Once all the reports have been generated from the ‘Analysing Vulnerability
Data’ stage, these need to be stored somewhere that the information can
be accessed and used. Endjin’s current architecture heavily uses the Azure
storage data lake to store lots of their data. This includes their SBOM
data and the analysed results. Therefore, to follow Endjin’s architecture as
closely as possible, meaning more of a seamless transition for them to add
this tool to their system, the output data will be stored in the cloud.
Aligning with Endjin’s current architecture isn’t the only benefit of saving
the data to the cloud, it also enables the central site (as defined in the func-
tional requirements) to access this information more easily. The site is an
important part of the project, which displays the results of the vulnerability
data so that Endjin can see the current state of its codebase concerning
safety.
5.5 Designing the Backstage Site
Whilst the central site (as defined in the requirements) isn’t the main focus
of the project, it is important to demonstrate how the data can be visualised.
The value of the data can be assessed when correctly displayed because
it is an important step in understanding the vulnerabilities in the system,
which is the main purpose of this project.
Endjin uses an open-source framework called Backstage for building de-
veloper platforms [23], created by Spotify. Endjin uses it to track the de-
pendencies and components in their systems. It was designed to increase
the ability to centralising information and use the same format throughout,
maintaining consistency. As Endjin currently uses it as a development tool,
therefore it would be the ideal place to create an overall report page for the
vulnerability information.
The Backstage site would ideally display a summary of how many vulnerab-
ilities are in each severity band, and also a more in-depth view of the report
as a whole. The summary allows the reviewer to understand what the state
of the system is, which may influence whether they take action or not, for
example, if they see there are 5 critical vulnerabilities then it indicates there
21

needs to be an urgent update, however, if there are only 3 medium ones,
these could be scheduled for the upcoming week. This information would
be pulled from the datalake and then used to display the data, this means
there will need to be authentication for the site so the site can safely access
the information.
22

6 Implementation
6.1 GitHub Actions Workflow
To individually group and run the scripts in this system, as described in
the design, I used GitHub Actions. The design describes different parts of
the system running individually and passing information along. I separated
these into separate jobs in the GitHub Actions workflow that can have
artifacts passed along the system. To do this I used a singular GitHub
Actions script which described each individual job, including which scripts
needed to be run, and also set up some extra conditions. Artifacts [33]
are a collection of files that allow information to be shared between jobs in
a workflow. For this project, artifacts can be used to pass data about the
SBOMs and vulnerability report between different stages of the process.
This means that data doesn’t need to be uploaded to the cloud between
steps, and the information doesn’t need to be written into the code of the
repository. It makes it easier to individually run these jobs, which makes it
easier to debug them, as mentioned in the design section of the report. At
the end of the system, these artifacts get published to the repository as a
release so that they can be checked as part of an end-to-end process?
Two separate parts of the workflow both connect to the Azure datalake
to either pull or push infomration. Access to information in the data lake
requires a token of access, the most relevant in this scenario is using a
SAS token. In Azure I can generate one of these codes and then add it
to the repositories secrets, which can then be referenced as environment
variables when the workflow runs. This makes it more secure to connect
to the data lake. Even if the repository is public, the token isn’t available
for anyone to see, even the user that added the token, so it is very safe.
Commands can then be made to the data lake which can get and put
information.
23

6 Implementation
6.2 Changing vulnerability scanner tool
The vulnerbaility tool is a fundamental component to this project. As
mentioned both in the methodology and the design, the vulnerability tool
being used in this project is open-source and therefore needs to be imported
into the workflow. To do this we can use a simple command which will find
the latest version of the tool:
As discussed in the methodology, I had decided to switch to a different vul-
nerbaility tool halfway through the project. Whilst this was hugely beneficial
for the project, it meant that there would need to be changes to the rest of
the system. Whilst it was extremely easy to change the tool in the GitHub
Actions process, there had to be some data processing changes in the
Python scripts as part of the system. This was mainly because the output
of the first tool had a different output to the new one, so needed some
different processing to get the same information and structure as before.
This meant though I only had to make change to one job in the system, the
analysing data section. This mean that I could test that any changes to this
part hadn’t affected other parts of the system. Whilst this did cause more
time and effort to change the tool, after review the design only reduced the
time that was needed to change the tool, however didn’t completely remove
the effort needed for the change. I don’t think any design changes would
have impacted this though, as the data was very different.
6.3 Backstage site
The Backstage site, as mentioned in the Design section, is the central
site that aims to fulfill the functional requriment of having a central site for
the data to be displayed. The final two files in the workflow; ’summarised
vulnerabilities’ and ’simplified vulnerability report’ are the ones that will be
displayed on the site.
Due to the nature of the Backstage site, it is stored in a separate repository
and has no relation to the worklflow. To access the data which is processed
by part of the workflow, the site needs access to the data lake. Endjin gives
the site access by having a login mechanism on the site, which connects a
user to a Microsoft account which in turn will provide access to the data (if
they have it). Having created a data lake for the purpose of this project, my
intention was to replicate Endjin’s idea and connect to the project’s data
lake. This work would have been useful to demonstrate how the site can get
the latest information for this project. Timing-wise, Endjin weren’t available
for questions about this part of how they implemented the authentication
24

6 Implementation
part of the site, and due to the timing of the project I simply had to drag
the inputs of the files into the code, to demonstrate how the data can be
presented. For Endjin, using this plugin in their own Backstage site, they
would need to copy the plugin over and it should run smoothly. As they
already have implemented the authentication they wouldn’t need any of my
code to make it work. Whilst the functional requirement for creating the
central site isn’t fully functional, it still demonstrates the capability of the
software and what the data would look like if it had been pulled from the
datalake.
6.4 Creating CI/CD tests
6.5 After first meeting with JD
Toward the end of the project, once most of the implementation was done
I had a meeting with James Dawson to discuss the progress and to see
if there were any misalignments with the requirements. I did a full walk-
through of the project from start to finish and questioned whether the
system was fit for purpose. On the whole, everything looked as though it
was on track to meet the requirements, however, there were a few changes
to the data analysis part of the system that would dramatically increase the
usefulness of the system. Whilst these were small changes regarding the
amount of information being displayed, and some extra data manipulation
to display more meaningful data, these would have a big impact in the
usability and quality of the system.
25

7 Evaluation
The evaluation section of this report is crucial for reviewing the methodo-
logies used, design and implementation of this project. To measure this,
there are a set of results which will reflect the success of the project. The
objectives of this project was to create a system for Endjin which would aid
their processes in becoming OpenChain ISO/IEC 18974 compliant. This
included creating a workflow which would seamlessly integrate with their
existing processes. However, ultimately the main goal is to fulfil the specific
requirements and desires of the stakeholders.
After completing the project implementation, the outcome is a robust system
which is designed to process SBOMs efficiently. An SBOM is inputted into
the system, which is then scanned for vulnerabilities. The scanned data
undergoes manipulation and analysis and is transformed into multiple
reports, designed for different purposes (e.g. Backstage site or 3rd party
reports). The reports are stored in the data lake. The system operates
seamlessly from start to finish, and can proccess varying quantities of
SBOMs without any errors.
7.1 Interview with Endjin
Once the implementation had been finished and concluded, I organised
another meeting with Endjin to give them a demonstration of the system as
a whole. Being the main stakeholders of this project, it was important that
the implementation of the system would be as expected. This reflects the
project aim, to create a system for Endjin which will seamlessly integrate
into their current architecture.
After giving a demonstration of the system functionality, I asked a set of
questions to the stakeholders at Endjin which would help understand their
thoughts on the system as a whole, and how easy it would be for them to
integrate into their current architecture.
Endjin, as described in the interview, said they thought they system would
integrate well with their current architecture, and they said they could "plumb
it straight in". The core functional core functional requirements that were
26

7 Evaluation
provided were "spot on", and having seen the data outputted, some further
refining of the SBOMs that are being used could be done to make the data
clearer. Taking this into account, I adjusted some of the system which
would then only display the most recent SBOMs. When asked whether
they would use the system, they agreed and said they believe it would
get them closer to becoming OpenChain ISO/IEC 18974 compliant, and
have already started integrating it into their current systems. Appendix A
contains the table of this feedback from Endjin in detail, whilst the above
describes a summary of their feedback.
After summarising the interview with Endjin, it is clear that the stakeholders
are pleased with the system presented and it fulfils the requirements initially
set out. This reflects the success of the Requirements Engineering method
in defining and managing requirements along the proccess of designing
and implementing the system. The requirements engineering ensured that,
as the developer of the system, the stakeholder’s ideas and needs for the
system were recognised and reflected. Having had several meetings with
Endjin throughout this project regarding their ideas, the main points were
that the project is led by the OpenChain specification and that it was easy
to integrate into their architecture. Understanding and discussing the state
of their current systems allowed me to tailor the system to their architecture
specifically, as defined by the aim of this project to create a system that
Endjin can use to scan their codebase for vulnerabilities. Whilst there are
already state-of-the-art tools that exist, Endjin needs a custom tool that
will work specifically for their codebase. Having defined and reviewed the
requirements throughout the stage of the process ensured that Endjin got
the system that was curated for them.
7.2 System Functionality
To ensure the system ran as expected I conducted a manual checking test
which compared one SBOM to a vulnerability database to check whether
all the vulnerabilities for the components in the SBOM were discovered.
This test returned successful, identifying all the vulnerabilities. This gives
confidence that the system works as expected and, as described as a
non-functional requirement, is reliable.
Checks also included manually checking the output of the files which were
generated as part of the workflow. These were uploaded to the data lake,
so I checked they contained the right data types, and that the data made
sense, e.g. the column containing the severity scores actually contained
values which described the severity of the vulnerability.
27

7 Evaluation
Finally, when there are changes to the workflow’s code, automatic CI/CD
tests are run which checks the unit tests to make sure none of the existing
code’s functinoality had been broken. This insured confidence when making
changes to the system that the system was still functioning as it should be.
7.3 Functional and Non-Functional
Requirements
After reviewing the above results, from the interview with Endjin regarding
their thoughts on the system, and the measured functionality of the system,
I created a table for both the functional (Appendix B) and non-functional
requirements (Appendix C) which described and justify their completeness.
These results can represent whether the system was a success by following
the requriements which represent the project aim and objectives.
All the functional requirements B were system were met, meaning the
system as a whole should represent the aim and objectives of the project.
As mentioned earlier in the project, the first vulnerability tool didn’t satisfy
one of the functional requirements which would have meant the system
didn’t reflect the stakeholders needs and the overall aim of the project.
Choosing a new tool, which looked for fixed version numbers, meant that
all the functional requirements were satisfied. Whilst changing the tool
took more time and effort, it ensured the success of the project because
otherwise Endjin wouldn’t be able to manage the vulnerabilities in their
system by using the version numbers to update each component. Whilst
harder to establish due to some of these being subjective, all the non-
functonal requirements B were met. This meant the system was reliable
and will integrate seamlessly with Endjin’s architecture.
Once implementing the new vulnerability scanner, Trivy, it was clear that
more research and testing should have been conducted on the scanners
before deciding on one. It was significantly faster, looked at more databases,
scanned SBOMs individually, and importantly produced the fixed version
number for the component with the vulnerability.
The design of the system meant that each different stage of the produced
workflow would be modular. Having a modular system made debugging
easier, and allowed for more targeted testing on specific components. It
meant that the minimum amount of information required for each stage was
provided, reducing the risk of error. From the stakeholder’s point of view,
when integrating a new workflow into their current system, if it is clear what
each module of the system does, it helps their understanding of how the
system works. When changing vulnerability scanners, due to the system’s
28

7 Evaluation
modularity it meant that not much of the code for the rest of the system
needed to be changed.
7.4 Synthesis
Following the feedback from Endjin regarding the implementation and re-
quirements for this project, along with the break-down of the functional and
non-functional requirements and the evidence from the overall functionality
of the system, it is clear that a workflow has been created that will be
beneficial to Endjin to move closer to becoming OpenChain ISO/IEC 18974
compliant. The workflow collects the SBOMs (which represent all the com-
ponents across Endjin’s codebase) and produces a vulnerability report that
provides information on the vulnerabilities found for certain components.
The OpenChain specification defines processes that should check ‘open
source Software against publicly known security vulnerabilities like CVEs,
GitHub/GitLab vulnerability reports’ [6], which is exactly what this workflow
does. As mentioned in the review with Endjin, the stakeholders say that
this will bring them closer to becoming OpenChain compliant from an
implementation stance, and they will integrate this into their current systems,
in fact have already started working on it. The system will ensure the open-
source software they use has been checked for vulnerabilities, and the
resulting data can be used to update their components autonomously.
The OpenChain ISO/IEC 18974 is a brand new specification meaning
there is room for growth and development around the management of open-
source software from a security perspective. There are many state of the art
tools that currently exist for scanning vulnerabilities, however the application
of the specification to the individual organisation is what makes this project
unique. Now that Endjin has a new system which seamlessly integrates
with their current architecture, it will be easy to add extra functionality if the
state of the art develops or if there are any changes to the standard.
As mentioned in the project aim, this project aims to create a system for
Endjin that will aid the implementation of the OpenChain specification and
can be integrated seamlessly into their current architecture. A lot of the
decisions around the design of this project were based on the requirement
for the workflow to be similar to Endjin’s current architecture, these were
then reflected in the implementation. Endjin, as stakeholders in this project,
state that it will be easy for them to maintain and integrate into their current
systems.
29

A Endjin’s Feedback
Table A.1: Table showing Endjin’s feedback
30

A Endjin’s Feedback
Question Endjin’s Thoughts
Do you think the system
aligns well with your cur-
rent architecture (DevOps/-
CI/CD)?
Yes. They said the fact that it was in GitHub Actions
was really good, they should be able to ‘plumb it straight
in’. The way the system is designed will allow it to work
for both a single SBOM, which can be checked in a PR
to see if there have been any changes, and for a whole
system-level view. Endjin will have to create the outer
shell to get the instant feedback built into their system,
which I wouldn’t have been able to handle myself
How easy will it be for you to
integrate this tool into your
current architecture? Will
you have to change much
to get it up and running?
Once they have integrated it into their current archi-
tecture they will have a better idea. However, at the
moment it looks good and doesn’t seem like anything
will stop them from being able to implement it.
Do you feel this software
covers your requirements?
If not, is there anything that
is missing? If yes, what
functionalities are missing?
Yes. For the core functional requirements, it’s spot on.
They asked how well it fulfils the OpenChain specifica-
tion, and if so then yes. They said there could be some
refining of the SBOMs that are being used, as some of
them are older scans that may not be useful to visualise
in Backstage.
Do you think you will use
this system?
Yes. Integrate it into current CI/CD processes, use it to
look at a singular SBOM when any changes have been
made to the repository to get feedback before merging,
and also use it as a whole to see the whole current state
of the system. Will use the patch report to automate
updating components. Will use the Backstage plugin
as part of their Backstage site to track vulnerabilities.
Will the system help you get
closer to becoming Open-
Chain ISOIEC 18974 com-
pliant?
Yes
Is the Backstage site easy
to understand?
Yes, maybe some hyperlinks would be useful that link
to the repository in GitHub that the vulnerability belongs
to. Otherwise, it follows the Backstage general format-
ting and will be easy to integrate into their current site.
Would have been nice to see the authentication to data
lake demonstrated, but doesn’t affect any work they
need to do
Do you feel the functional-
ities are a reflection of the
requirements you specified
in our initial interview?
Yes
Were the tasks easy to fol-
low on the GitHub project
board?
Yes.
Does this system make you
more confident you have
knowledge of the vulnerabil-
ities across your codebase?
Once implemented into its own system it will, yes.
Please feel free to add any
more comments regarding
the final product of this sys-
tem:
“Really good job”
31

B Fulfillment of Functional
Requirements
Table B.1: Table showing the functional requirements and whether they
have been fulfilled
32

B Fulfillment of Functional Requirements
No. Requirement Fulfilled
1
Collect SBOMs from the
cloud;
Yes. The SBOMs are collected from the cloud at
the beginning of the system
2
Convert SBOMs to the cor-
rect format;
Yes. SBOMs are converted to SPDX in the ‘scan
SBOMs’ script
3
Scan each SBOM for vul-
nerabilities;
Yes. Each SBOM is scanned by a tool called ‘Trivy’
to detect vulnerabilities
4 Store vulnerability data;
Yes. The vulnerability data is stored in the data lake
at the end of the workflow
5
Cleanse vulnerability data
to improve clarity;
Yes. The vulnerability data is cleansed and split
into different documents to improve clarity and to fit
purpose
6
Generate a report with
patch number recommend-
ations;
Yes. A report called ‘patch-report’ is generated
which checks the patch version and determines the
size of the update
7
List update types in
patch recommendations
e.g. patch, minor, major;
Yes. Listed as a dictionary in a column in the report
generated called ‘patch-report’
8
Display data on a central
site;
Yes. Demonstrated data can be displayed on a site
used by Endjin for internal development. Whilst the
data isn’t automatically pulled from the data lake,
it demonstrates how the data can be displayed to
provide value
9
Assign severity scores to
identified vulnerabilities;
Yes. The vulnerability scanner assigns either ‘Crit-
ical’, ‘High’, ‘Moderate’, ‘Low’, or ‘Unassigned’ to
each vulnerability
10
Include version number and
patch recommendations for
vulnerabilities identified;
Yes. These are collected by the vulnerability scan-
ner and included in all the output tables
11
Identify the source SBOM
for each vulnerability;
Yes. Each row in the data representing a vulnerabil-
ity has a value representing the SBOM the compon-
ent came from
12
Store vulnerability reports
in the cloud;
Yes. The reports are stored in an Azure data lake
which is also a part of Endjin’s architecture
13
Automatically run the sys-
tem upon code changes;
Yes. When changes to the code are pushed to the
repository, the workflow will automatically run
14
Users can sort data from
ascending to descending
severity on the output site;
Yes. There is a table that can be sorted
15
Seamlessly integrate with
Endjin’s CI/CD pipelines;
Yes. Following Endjin’s review, it follows their current
architecture and will be quick and easy for them to
integrate with
16
Able to scale for varying
numbers of SBOMs;
Yes. This tool can be used for a singular SBOM or
many SBOMs. Endjin pointed out this feature which
could be used to check a singular repository before
merging a branch.
17
Keep vulnerability data
private;
Yes. All the data given is stored solely on the data
lake, which can only be accessed by the GitHub
Actions script and the owner
33

C Fulfillment of Non-Functional
Requirements
Table C.1: Table showing the non-functional requirements and whether they
have been fulfilled
No. Requirement Fulfilled
1
Able to handle a vary-
ing number of SBOMs;
Yes. This tool can be used for a singu-
lar SBOM or many SBOMs. Endjin poin-
ted out this feature which could be used to
check a singular repository before merging
a branch.
2 Should be reliable;
3
Should integrate seam-
lessly with Endjin’s cur-
rent architecture;
Yes. Following Endjin’s review, it follows
their current architecture and will be quick
and easy for them to integrate with
4
Should be easy to
maintain, especially for
Endjin;
Yes. Following Endjin’s current architec-
ture, they use the same technology intern-
ally, meaning it will be easy to update and
maintain
5
The system should be
secure;
Yes. The information being taken from the
data lake is being accessed by a SAS token,
which is only available to the GitHub Actions
workflow which keeps it as a secret
34

D GitHub Actions Workflow
Diagram
Figure D.1: Diagram representing the different steps in the GitHub Actions
workflow script
35

E Screenshot of project board on
GitHub
Figure E.1: Screenshot of project board on GitHub
36

Bibliography
[1] L. Zhao and S. Elbaum, ‘Quality assurance under the open source
development model,’ Journal of Systems and Software, vol. 66,
no. 1, pp. 65–75, 2003, ISSN: 0164-1212. DOI: https : / / doi . org /
10.1016/S0164-1212(02)00064-X. [Online]. Available: https://www.
sciencedirect.com/science/article/pii/S016412120200064X.
[2] Endjin.com, 2022. [Online]. Available: https://endjin.com/.
[3] Openchain iso/iec 18974 - security assurance, OpenChain, 2023.
[Online]. Available: https : / / www. openchainproject . org / security -
assurance.
[4] Openchain iso/iec 5230 - license compliance, OpenChain, 2023.
[Online]. Available: https : / / www. openchainproject . org / license -
compliance.
[5] J. Dawson, James dawson - principal i. [Online]. Available: https:
//endjin.com/who-we-are/our-people/james-dawson/.
[6] 2023. [Online]. Available: https://github.com/OpenChain-Project/
Security-Assurance-Specification/blob/main/Security-Assurance-
Specification/DIS-18974/en/DIS-18974.md.
[7] J. Lindner, Open source software statistics [fresh research] gitnux,
2023. [Online]. Available: https://gitnux.org/open-source-software-
statistics/.
[8] R. Shewale, Android statistics for 2024 (market share & users), 2023.
[Online]. Available: https://www.demandsage.com/android-statistics/.
[9] [Online]. Available: https://logging.apache.org/log4j/2.x/.
[10] [Online]. Available: https://www.ncsc.gov.uk/information/log4j-
vulnerability-what-everyone-needs-to-know.
[11] C. Eng, B. Roche, N. Trauben, R. Haynes and C. Eng, State of log4j
vulnerabilities: How much did log4shell change? [Online]. Available:
https://www.veracode.com/blog/research/state-log4j-vulnerabilities-
how-much-did-log4shell-change.
[12] [Online]. Available: https://www.blackberry.com/us/en/company/
newsroom/press-releases/2022/blackberry-announces-first-openchain-
security-assurance-specification-conformance-in-the-americas.
[13] 2023. [Online]. Available: https://github.com/patriksvensson/covenant.
37

Bibliography
[14] 2022. [Online]. Available: https://nvd.nist.gov/.
[15] 2022. [Online]. Available: https://nvd.nist.gov/general/cve-process#:~:
text=Founded%20in%201999%2C%20the%20CVE,Infrastructure%
20Security%20Agency%20(CISA)..
[16] 2023. [Online]. Available: https://github.com/anchore/grype.
[17] 2023. [Online]. Available: https://github.com/anchore/syft.
[18] 2023. [Online]. Available: https://anchore.com/.
[19] 2023. [Online]. Available: https://github.com/devops-kung-fu/bomber.
[20] 2021. [Online]. Available: https://spdx.dev/.
[21] 2022. [Online]. Available: https://osv.dev/.
[22] I. Sonatype, Sonatype oss index, 2018. [Online]. Available: https:
//ossindex.sonatype.org/.
[23] 2023. [Online]. Available: https://snyk.io/.
[24] 2023. [Online]. Available: https://docs.docker.com/scout/.
[25] 2023. [Online]. Available: https://fossa.com/product/open-source-
vulnerability-management.
[26] 2023. [Online]. Available: https://www.vigilant-ops.com/products/.
[27] GitLab, What is devops? 2023. [Online]. Available: https://about.
gitlab.com/topics/devops/.
[28] GitLab, What is ci/cd? 2023. [Online]. Available: https://about.gitlab.
com/topics/ci-cd/.
[29] GfG, Requirements engineering process in software engineering,
2024. [Online]. Available: https://www.geeksforgeeks.org/software-
engineering-requirements-engineering-process/.
[30] I. Sommerville, Software engineering. Addison-Wesley, 2007.
[31] [Online]. Available: https://enkonix.com/blog/functional-requirements-
vs-non-functional/.
[32] [Online]. Available: https://docs.github.com/en/issues/organizing-
your-work-with-project-boards/managing-project-boards/about-
project-boards.
[33] [Online]. Available: https : / / docs. github. com / en / actions / using -
workflows/storing-workflow-data-as-artifacts.
38

Charlotte Gayton's OpenChain ISO 18974 Dissertation

More Related Content

Similar to Charlotte Gayton's OpenChain ISO 18974 Dissertation

Recently uploaded

Charlotte Gayton's OpenChain ISO 18974 Dissertation