© AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org
AboutCode and beyond:
End-to-end SCA with open
source code and open data
Philippe Ombredanne,
Lead maintainer of AboutCode and CTO of nexB, Inc.
Agenda
● About AboutCode & nexB
● Software Composition Analysis
○ Vulnerabilities AND licensing
○ Proprietary problems
● The AboutCode stack
● New projects
● Roadmap
● Questions?
About me
● On a mission to make it easier and safer to reuse FOSS code with
best-in-class open source Software Composition Analysis (SCA)
tools, data, and standards for open source discovery, license & security
compliance
● Lead maintainer of AboutCode projects (ScanCode, DejaCode,
VulnerableCode and others)
● Factoids
○ In 2010, I said that Docker technology would never succeed
○ Signed off on the largest deletion of code in the Linux kernel
(but these were only license comments)
● CTO and co-founder of nexB, Inc.
○ pombredanne@nexb.com
○ GitHub: https://github.com/pombredanne
○ LinkedIn: https://www.linkedin.com/in/philippeombredanne
○ Often assisted by Chihuahua Technical Advisor
AboutCode and nexB
● AboutCode's FOSS-first mission: FOSS for FOSS
○ Open source tools and open knowledge base (AboutCode stack)
○ Simple and practical standards (Package-URL)
○ Applications for Legal & Business users (DejaCode) with APIs for everything
● Trusted experts in Software Composition Analysis (SCA) since 2007
○ Creator of Package-URL: https://github.com/package-url
○ Co-founders of SPDX: https://spdx.org
○ Contributors to CycloneDX: https://cyclonedx.org
○ Co-founders of ClearlyDefined: https://clearlydefined.io
● nexB provides professional services and support for SCA
○ 800+ SCA projects completed to date with 100% customer satisfaction
○ Sponsored development for AboutCode projects
○ Technical support and advisory for SCA tools implementations and deployments
Software Composition Analysis
Software Composition Analysis (SCA)
● Identification – Identify distinct “units” of third-party software used in a product or project and their provenance
● Licensing – Determine the licensing for each software unit
● Security – Identify known security vulnerabilities for each software unit
● Quality – Evaluate the quality of a software unit based on software development data, such as number of bugs, fixes, etc. - this is the domain of the CHAOSS project
● Read "SCA the FOSS Way" for more information: https://www.nexb.com/software-composition-analysis/
Software Composition Analysis needs to be a core competency for any software development organization.
● Embed it in the software development workflow from design through release - as it is in manufacturing
● The choice of SCA tools will depend on your platform, stack and product
SCA: Vulnerabilities AND licensing
● Most SCA tools focus on either vulnerabilities OR licensing
○ Current focus is on security vulnerabilities because of perceived higher risk
● The communities of interest are separate - security vs legal - but converging
● License data may be complex, yet mostly stable over time
○ But very few tools get it right. Accuracy is still a major, unsolved problem
● Dependency graphs are highly dynamic and demand constant care
○ They impact the stability of licensing and vulnerability information
● Vulnerability data is complex, but extremely dynamic - if included directly in an SBOM, it may be wrong by the time you receive the SBOM
● Most SCA security tools are lightweight with respect to both provenance and licensing, and focus on the easy things
You need SCA coverage for vulnerabilities AND licensing - plus quality.
SCA: Proprietary tools and data
● Increasingly expensive with the surge of interest in SBOMs and pricing
based on number of developers
● Large companies may be able to “afford” proprietary SCA scanning tools,
but they do not scale across the FOSS supply chain
○ The cost of scan curation is prohibitive with high false positive rates and poor license detection
accuracy
● Most current data about FOSS packages and vulnerabilities is proprietary
○ Vendors may offer some free or open source tools but you must pay for access to their data
○ Barrier to community access and analysis
● Many vendors use some open source for marketing only - “fauxpen source”
○ Complex and restrictive licenses
○ No contributions back to the community
SCA: Open source tools and data
● There are many open source SCA tools and some databases:
○ License compliance focus: ORT, FOSSology, SW360
○ Vulnerability and SBOM focus: CycloneDX, Dependency-Check, Syft/Grype (Anchore), Trivy (Aqua Security)
● So, why did we develop AboutCode?
Why AboutCode? [1]
● Free and open source software AND free and open data
○ FOSS for FOSS
○ Open knowledge base with open data for licenses, packages and vulnerabilities
● Modular and integrated best-in-class SCA tools for developers
○ Tackling the harder code analysis problems so you do not have to
○ PURL-based for easier integration in/out
● Bespoke pipelines enable true end-to-end automation
○ Working towards management by exception to focus on the complex cases of origin and license
○ Decentralized analysis, close to the developers
● Management web app for centralized policies, curations and compliance workflows and data
○ Supports engineering, business and legal stakeholders with features tailored for each using common/shared information
Why AboutCode? [2]
● The state of SCA tooling accuracy is not great
● We recently made a large-scale comparison of many container scanners
○ Both FOSS and commercial
○ Using SBOMs as a way to compare scans of the same container images
● Commercial tools are making up packages, "hallucinating" PURLs
● Most tools are only skin deep: they look only at package manifests and databases
● Beyond package origin, the quality of reported licenses is plain bad and misleading
○ In most cases this is a grep on the declared license of package manifests
● Several tools created invalid SBOMs
● We can do better!
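Comparing scanners through their SBOMs can start with something as simple as comparing the sets of PURLs each one reports for the same image. The following is a minimal sketch, assuming CycloneDX JSON input where each entry under `components` may carry a `purl` field. The two sample documents and the function names are hypothetical and are not taken from the comparison described in the slides.

```python
import json


def purls_from_cyclonedx(sbom_text):
    """Return the set of PURLs declared by the components of a CycloneDX JSON SBOM."""
    sbom = json.loads(sbom_text)
    return {c["purl"] for c in sbom.get("components", []) if c.get("purl")}


def compare_sboms(sbom_a, sbom_b):
    """Compare two SBOMs of the same image and report where the scanners disagree."""
    a, b = purls_from_cyclonedx(sbom_a), purls_from_cyclonedx(sbom_b)
    return {
        "common": sorted(a & b),
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


# Hypothetical outputs of two scanners for the same container image.
scanner_a = json.dumps({"bomFormat": "CycloneDX", "components": [
    {"name": "zlib1g", "purl": "pkg:deb/debian/zlib1g@1.2.13"},
    {"name": "openssl", "purl": "pkg:deb/debian/openssl@3.0.11"},
]})
scanner_b = json.dumps({"bomFormat": "CycloneDX", "components": [
    {"name": "zlib1g", "purl": "pkg:deb/debian/zlib1g@1.2.13"},
    {"name": "mystery", "purl": "pkg:generic/mystery@0.0.0"},
]})

report = compare_sboms(scanner_a, scanner_b)
print(report)
```

Packages that appear in only one report are the ones worth a closer look: they may be missed detections on one side or made-up packages on the other.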
Introducing the AboutCode stack
AboutCode: Who is using it?
Many organizations and most SCA providers use AboutCode tools,
libraries or standards:
○ Most free software and open source foundations
○ Five of the top big tech companies
○ A leading database company and a leading Linux company
○ European and US government agencies
○ All major European car manufacturers and most of their vendors
○ Major US chip and microprocessor providers
○ Four leading European industrial companies
○ All SBOM and VEX standards
○ All open source SCA and SBOM tools
○ Most proprietary SCA, SBOM or code hosting tools
[Diagram: the AboutCode stack]
● SCA Tools (ScanCode): Scan, Match, Analysis pipelines, Binary analysis, Dependency analysis
● Management Apps (DejaCode): Policies, Curations, Software inventory, Workflows, SBOMs, Custom reports
● Open Knowledge Base: Licenses, Packages, Vulnerabilities
Benefits of the AboutCode stack
● Supports safe and compliant use of FOSS, with FOSS
○ Recognized worldwide as best-in-class tools
○ Modular design for adaptation to development team processes, tools and environment
○ Coverage for all languages and frameworks
○ Package URL (PURL) used throughout as the package identifier
○ Code AND data licensed under open source licenses, no gimmicks
● Reduce licensing and vulnerability risks from using FOSS or other third-party software components
○ Share risk management responsibilities among business, legal, engineering and security teams
○ Provide a comprehensive view of open source and other third-party components used in your software
● Active community of contributors and users, including many FOSS tools
● Technical support, implementation, advisory services available from nexB
AboutCode also needs your help!
● Contribute to an AboutCode project with code, documentation, use cases, bug reports
■ https://github.com/nexB
● Sponsor AboutCode project maintainers
○ Accelerate development of new features and fund contributors
■ https://github.com/sponsors/nexB
● Buy support, implementation, and advisory services from nexB to pay the maintainers
■ https://nexb.com
● Join the community:
■ https://www.aboutcode.org/
■ https://gitter.im/aboutcode-org/discuss
"Dependency" by xkcd, used under CC BY-NC 2.5 /
Modified text from original
AboutCode: New Projects
Binary, deployment analysis: back2source
● Pipeline input: sources and binaries
● Collect symbols and identifiers from sources and binaries
○ Parse Java bytecode, ELF, DWARF, WinPE, Mach-O and JS map files; collect literals and source symbols
● Map and match these symbols from binaries back to source
● If not mapped, fall back to code matching against the PurlDB
● Report discrepancies
○ Code that is found in binaries and NOT in the source
● Work in progress, BUT a version of the code that predates the xz-utils incident was able to detect the xz-utils problems and tagged the problematic, malicious build script as "require review"! Yeah!
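The discrepancy report at the heart of back2source can be pictured as a set difference between what the binaries contain and what the sources declare. This is a minimal sketch under strong assumptions: the binary symbols are taken as already extracted, and source symbols are collected with a crude identifier regex rather than a real parser. All names and sample data are hypothetical and do not reflect the actual back2source pipeline.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def source_symbols(source_texts):
    """Collect identifier-like tokens from source texts (a crude stand-in for real parsing)."""
    symbols = set()
    for text in source_texts:
        symbols.update(IDENTIFIER.findall(text))
    return symbols


def unmapped_binary_symbols(binary_symbols, source_texts):
    """Return binary symbols with no counterpart in the sources, sorted for review."""
    return sorted(set(binary_symbols) - source_symbols(source_texts))


# Hypothetical inputs: symbols as a binary parser might report them, and two source files.
binary_syms = ["compress_block", "crc32_update", "backdoor_hook"]
sources = [
    "int compress_block(char *buf) { return crc32_update(buf); }",
    "unsigned crc32_update(char *buf) { return 0; }",
]

# Symbols found in the binary and NOT in the source.
to_review = unmapped_binary_symbols(binary_syms, sources)
print(to_review)
```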
New project: CRAVEX [1]
● Goal is to automate application vulnerability management
● ... AND regulatory compliance reporting
● Built for open source projects and small businesses as a free and open
solution to comply with the emerging regulatory mandates (SBOMs,
CRA) with minimal friction and costs
● Package- and software product-centric management of vulnerabilities
● Web-based, database-backed application to collect, track, and triage
FOSS package vulnerabilities and determine their exploitability
○ Rank based on urgency, assess remediation
○ Create VEX reports
New project: CRAVEX [2]
● Import SBOM and scans for one or more apps, products & components
● Schedule vulnerability lookups and store the results in the database
● Web UI to rank and prioritize package vulnerabilities based on
○ Multiple scores
○ Rule-based automation
○ Vulnerable code reachability and exploitability
○ Usage context
● Export the results of the vulnerabilities triage and processing as VEX
documents and attestations
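Ranking on several factors at once can be pictured as a small scoring function. This is only a sketch under stated assumptions: the `Finding` fields, the weights and the rule for unreachable code are all hypothetical and are not taken from CRAVEX, and the vulnerability identifiers and PURLs are made up.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    vuln_id: str
    purl: str
    severity: float            # base severity score, 0 to 10
    exploit_likelihood: float  # estimated probability of exploitation, 0 to 1
    reachable: bool            # is the vulnerable code reachable in this product?
    context: str               # "production", "internal" or "development"


# Hypothetical weights for the usage context; not taken from CRAVEX.
CONTEXT_WEIGHT = {"production": 1.0, "internal": 0.6, "development": 0.3}


def urgency(finding):
    """Combine several scores and simple rules into one urgency value."""
    score = finding.severity * (0.5 + 0.5 * finding.exploit_likelihood)
    if not finding.reachable:
        score *= 0.2  # rule: unreachable vulnerable code is far less urgent
    return score * CONTEXT_WEIGHT[finding.context]


def rank(findings):
    """Return findings sorted from most to least urgent."""
    return sorted(findings, key=urgency, reverse=True)


findings = [
    Finding("VULN-0001", "pkg:pypi/example-lib@1.0.0", 9.8, 0.02, False, "production"),
    Finding("VULN-0002", "pkg:npm/example-pkg@2.3.1", 7.5, 0.90, True, "production"),
    Finding("VULN-0003", "pkg:maven/org.example/demo@4.1", 9.0, 0.50, True, "development"),
]
ranked_ids = [f.vuln_id for f in rank(findings)]
print(ranked_ids)
```

In this sample the highest base severity ends up last, because the vulnerable code is not reachable. That is the kind of reordering that reachability and usage context are meant to produce.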
New project: Code reachability
● Upcoming companion to CRAVEX
● Goal: help prioritize vulnerabilities based on actual local exploitability
● Use multiple factors to help better qualify the urgency
● Symbols-based reachability of the vulnerable code
● Call graph-based reachability of the vulnerable code
● Integrate local context to assign exploitability priorities
○ Development or internal tool vs. production software or consumer device
● Integrate existing excellent FOSS efforts in the space
○ Eclipse Steady, Joern, Chen
New project: GenAI Code Search [1]
Question: How to safely reuse AI-generated code?
● AI-generated code is a wonderful productivity booster
● Thought experiment
○ Build a small LM from only GPL-licensed code from the GNU project.
○ Add Gen-AI on top. Is the generated code derived from the GPL-licensed code?
● AI-generated code may violate licenses and copyrights
● AI-generated code may copy vulnerable code sections
● Some large businesses and open source foundations have defined policies regarding AI-generated code, in some cases prohibiting its use.
New project: GenAI Code Search [2]
● Fight fire with fire
○ An approach we considered is to use GenAI to regenerate the code under analysis and compute the similarity between regenerated and original code
○ Impractical as too expensive and too slow
● Find similar code fragments
○ The focus of this project
● Traditional code fragment matching does not work for AI-generated code
○ The code is broken into chunks using a content-defined heuristic
○ Chunks are matched exactly using a checksum
○ BUT, AI-generated code is seldom exactly the same as indexed FOSS code
○ Existing solutions have ever-growing indexes with more fragments to avoid false negatives
○ Furthermore, precision and recall are frozen in the choice of parameters for the index
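The limitation of traditional fragment matching is easy to reproduce. The sketch below assumes a toy content-defined heuristic (a chunk is closed after any line whose CRC32 is a multiple of `modulus`) and SHA-1 checksums for exact matching. These choices are illustrative assumptions, not how any particular tool chunks code.

```python
import hashlib
import zlib


def chunks(code, modulus=3):
    """Split code into chunks, closing a chunk after any line whose CRC32 is a multiple of `modulus`."""
    current, result = [], []
    for line in code.splitlines():
        line = line.strip()
        if not line:
            continue
        current.append(line)
        if zlib.crc32(line.encode("utf-8")) % modulus == 0:
            result.append("\n".join(current))
            current = []
    if current:
        result.append("\n".join(current))
    return result


def checksums(code):
    """Return the set of SHA-1 checksums of the chunks of `code`."""
    return {hashlib.sha1(c.encode("utf-8")).hexdigest() for c in chunks(code)}


def exact_match_ratio(query, index):
    """Fraction of the query chunk checksums that are present in the index."""
    query_sums = checksums(query)
    if not query_sums:
        return 0.0
    return len(query_sums & index) / len(query_sums)


indexed_code = """
def total(items):
    result = 0
    for item in items:
        result += item.price
    return result
"""
# Same logic with every identifier renamed, as regenerated code often looks.
renamed_code = """
def compute_sum(products):
    acc = 0
    for product in products:
        acc += product.price
    return acc
"""

index = checksums(indexed_code)
print(exact_match_ratio(indexed_code, index))
print(exact_match_ratio(renamed_code, index))
```

The verbatim copy matches every chunk, while the renamed variant matches none, even though the two functions are the same code for any practical licensing purpose.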
New project: GenAI Code Search [3]
● A new approach to approximate code fragment matching
● Fingerprint-based
○ Helps scale the index, but also the query, as a whole codebase (gigabytes in size) is the query
○ Traditional information retrieval with inverted indexes does not work for queries this large
● Approximate, fuzzy fingerprinting
○ Using a new algorithm that enables matching code that was never indexed
● Furthermore, a tunable fingerprint
○ Can be tuned at query time for precision and recall
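The slides do not describe the new fingerprinting algorithm, so the sketch below substitutes MinHash over token shingles, a well-known approximate technique, purely to illustrate what "approximate" and "tunable at query time" can mean. It is not the AboutCode algorithm, and all function names and parameters are assumptions.

```python
import hashlib
import re


def shingles(code, size=3):
    """Return the set of overlapping `size`-token windows of the code."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    if len(tokens) < size:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def _hash(seed, text):
    digest = hashlib.sha1(f"{seed}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprint(code, num_hashes=64):
    """MinHash fingerprint: for each seed, keep the smallest hash over all shingles."""
    items = shingles(code)
    if not items:
        return []
    return [min(_hash(seed, s) for s in items) for seed in range(num_hashes)]


def similarity(fp_a, fp_b):
    """Estimate the Jaccard similarity as the fraction of equal fingerprint slots."""
    if not fp_a or not fp_b:
        return 0.0
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def is_match(fp_a, fp_b, threshold):
    """The threshold is chosen at query time: lower favors recall, higher favors precision."""
    return similarity(fp_a, fp_b) >= threshold


indexed = fingerprint("for item in items: total += item.price")
variant = fingerprint("for item in items: acc += item.price")
unrelated = fingerprint("while True: pass")

print(similarity(indexed, indexed))
print(similarity(indexed, variant))
print(similarity(indexed, unrelated))
```

Because the fingerprint is compared slot by slot instead of through a single checksum, the renamed variant still scores as partly similar, and the same stored fingerprints can be queried with a strict or a permissive threshold without rebuilding the index.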
New project idea: Open Containers KB
● Container composition is a mess. Most tools are just plain bad
○ Lack basic tracing of license and package (and therefore vulnerabilities)
● Image builders, OS and distro vendors do not seem to care
○ Official images are sometimes not compliant or not traceable
○ Package volume amplifies vulnerability and license issues
○ Source of binary packages disappears
● We can do better!
● Project idea: create a mini consortium to do a proper, automated and correct SCA of key public base images
● Share these as open data
● Work with upstream to clean up their act
AboutCode Roadmap
Roadmap for AboutCode: ScanCode Toolkit
● Build single-executable standalone apps for ScanCode for easier
deployment in CI/CD
● Improve copyright and license detection speed
● Build smaller single-purpose tools and libraries from
"mono repo"
● Improve data models for Packages and
Dependencies/Requirements
● Parse more package manifests and lock files
● Improve support for license exceptions (WITH)
● Move inconclusive, unknown license detection to clues
● Add post-processing to rematch using SPDX matching
guidelines
Roadmap for AboutCode: SCA Tools
● Integrate with CI and other tools
○ Create pre-configured CI/CD integrations for the main CI systems (GitHub, GitLab, Jenkins)
● Extend binary analysis and deployment tracing workflows
○ Support ELF/Native, Go, Ruby, Android in addition to Java and JS
○ Find the exact subset of the code that is deployed and used in production
● Automate analysis review in ScanCode.io
○ End-to-end automated pipelines for embedded devices, Android and C/C++
○ Multi-stack deployment analysis for Java, JS, C/C++
○ Report TODO items to review only "by exception"
Roadmap for AboutCode: Code matching
● Code match smart ranking and disambiguation
○ Avoid false positives
● Accurately match to the correct package version
● Match code snippets approximately
○ Using our new approximate fingerprinting
○ Integrate other code matching schemes from SWH and SCANOSS
● Match source symbols and binary symbols to sources and binaries
● New matching pipelines
● Decentralized curation and corrections using in-codebase ABOUT files
Roadmap for AboutCode: Other SCA Tools
● Compare scans to focus review work on changes only (DeltaCode)
● APIs and CLI to query all the things by PURL from the KB (purl2all)
● More code inspectors
○ Lightweight package dependency resolution
○ Dedicated ecosystem-focused libraries
● New lightweight package-inspector
○ Single executable to find packages and dependencies
● Trace build execution to find the exact subset of source code that is deployed and used (TraceCode)
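Comparing two scans so that review effort goes only to what changed can be sketched as a diff over per-file results. The example assumes a deliberately simplified scan shape, a mapping from file path to detected license expression. This is a hypothetical format, not the ScanCode JSON output, and `compare_scans` is an illustrative name rather than the DeltaCode API.

```python
def compare_scans(old, new):
    """Classify files by how their detected license changed between two scans."""
    delta = {"added": [], "removed": [], "changed": [], "unchanged": []}
    for path in sorted(set(old) | set(new)):
        if path not in old:
            delta["added"].append(path)
        elif path not in new:
            delta["removed"].append(path)
        elif old[path] != new[path]:
            delta["changed"].append(path)
        else:
            delta["unchanged"].append(path)
    return delta


# Hypothetical scan results for two releases of the same codebase.
scan_v1 = {"src/app.py": "apache-2.0", "src/util.py": "mit", "lib/old.c": "bsd-new"}
scan_v2 = {"src/app.py": "apache-2.0", "src/util.py": "gpl-2.0", "lib/new.c": "mit"}

delta = compare_scans(scan_v1, scan_v2)
to_review = delta["added"] + delta["changed"]
print(to_review)
```

Only the added and changed files need a reviewer's attention; unchanged files can inherit the conclusions of the previous review.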
Roadmap for AboutCode: Management Apps
● Add support for CycloneDX 1.5 and 1.6 and SPDX 3.0
● Create new review automation apps:
○ License detection review
○ Code match review
○ Vulnerability review
● Overall goal is to reduce review and curation work
○ Extend license clarity scoring to code matches with origin clarity scoring
○ "Auto conclude" matches that are conclusive
● New app for advanced vulnerability management and
support for CRA (Cyber Resilience Act) compliance
○ Automated triage of vulnerabilities and workflow triggers
○ VEX creation, VEX import and export (Vulnerability Exploitability Exchange)
with CSAF and CycloneDX
Roadmap for AboutCode: Licenses
● Extend License data with compatibility matrix
● Add new license aliases dataset
● Add more extensive tagging and categorization
● Extend License data with improved exception details
○ To disambiguate license detections of L/GPL with/without exceptions
● Extend License data with improved "or later" details
○ To disambiguate detection of "or later" notices with their primary texts
● Add "key phrases" to all license detection rules
● Add variable text segments to license rules
● Add Fedora alternative SPDX identifiers
● Work with CycloneDX to become their license reference
Roadmap for AboutCode: Vulnerabilities
● Extend Non-vulnerable dependency resolution
○ Beyond Python - add Java and JS
● Extend vulnerability data with new upstream data sources
● Add fix commit details and support for vulnerability reachability
● Mine the graph to surface related package fixes
● Mine git logs, issues and forums to enrich vulnerability data
● Surface inconsistencies and conflicts between different advisory
data sources (VulnTotal throughout)
● Add source/binary discrepancy data (from back2source)
Roadmap for AboutCode: Packages
● Confirm the true origin of code to avoid ambiguous matches
● Supply chain package verification
○ Map deployed binary packages to their corresponding source code
○ Find suspicious code drift between package versions
● Mine extensive list of "off registry" packages
○ Common native C/C++ code and libraries for embedded
○ Glibc, Busybox, zlib, etc. that are not published on ecosystem package registries
● Collect code symbols from source and binaries (for matching)
● On demand, just in time code mining to build your KB on the fly
● Federated, decentralized shared KB data with Git and ActivityPub
○ Share scans, vulnerabilities, origin facts and curations
○ Scan once, analyze once and collaborate on reviews to clear out the junk!
The AboutCode Stack
The AboutCode stack for SCA [1]
● Web-based enterprise management application
○ DejaCode for ensuring license and security compliance
● SCA tools for identifying third-party code and determining
code license and origin
○ ScanCode is the leading code scanner for software component, package
and dependency identification, and license detection
○ MatchCode is a new tool for package and file matching
○ container-inspector: analysis tool for Docker & other images
○ nuget-inspector and python-inspector for in-depth dependency resolution
○ many other libraries
○ See https://aboutcode.org for an overview of AboutCode projects
○ See https://github.com/nexB for the code
The AboutCode stack for SCA [2]
● Open knowledge base with open data for licenses, packages
and vulnerabilities
○ LicenseDB - open source and other public licenses at:
https://scancode-licensedb.aboutcode.org/
○ PurlDB - package data at: https://public.purldb.io/api/packages/
○ VulnerableCode - aggregated vulnerability data and comprehensive
vulnerability reporting at: https://public.vulnerablecode.io/
● Standards
○ Package-URL: Specification and tools for identifying packages at:
https://github.com/package-url
○ Univers: Parse and compare package versions and ranges at:
https://github.com/nexB/univers
The AboutCode stack: ScanCode Toolkit
● Industry-leading scanning engine
● License detection with multiple techniques, rule-based
● Copyright notices with NLP
● Identify packages
○ Normalize all the package metadata
○ Includes dependencies and package license detection
○ Package manifests, system package databases and lockfile parsing
● New summarization and license clarity scoring
○ Identify and focus curation on actual licensing issues
● Accuracy is paramount
○ An incorrect license detection is treated as a bug
● ABOUT files for curations/corrections stored in the codebase
The AboutCode stack: ScanCode.io
● Web-based scanning server using ScanCode
○ Smarter scripted scanning in multiple steps
● Specialized pipelines for customized analysis
○ Tag items that need your review
○ Pipeline for best-in-class container and VM scanning
● Unique deployment analysis using binary analysis
○ Map binaries back to their sources
● Code matching integrated with the knowledge base
○ Starting with exact and approximate file matching
● Integrated enrichment of the knowledge base
○ Collect and pre-scan all the packages that you use
○ Watch and collect new versions continuously
The AboutCode stack: MatchCode
● New Web-based code matching server
● Includes mining for custom knowledge base
○ All package ecosystems and Linux distros
● Smarter matching in multiple steps
○ Whole tree, exact file, approximate tree and file
○ Coming up: snippet matching, with a twist for AI-Generated code
● Pipeline for ranking and picking best matches
● A different matching approach
○ Exact matching demands a constantly growing index
○ Approximate matching can match software that is NOT indexed
○ Top down rather than bottom up
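The "top down" order of the matching steps can be sketched with plain checksums: try to match a whole tree first, and only fall back to file-by-file matching when that fails. This is a hypothetical illustration of the first two steps only. The index layout, function names and sample packages are assumptions and do not describe the MatchCode implementation.

```python
import hashlib


def file_sha1(content):
    """Checksum of a single file's bytes."""
    return hashlib.sha1(content).hexdigest()


def tree_sha1(files):
    """Checksum of a whole tree: hash of the sorted (path, file checksum) pairs."""
    h = hashlib.sha1()
    for path in sorted(files):
        h.update(f"{path}:{file_sha1(files[path])}\n".encode("utf-8"))
    return h.hexdigest()


def build_index(packages):
    """Index each package by its tree checksum and by each file checksum."""
    trees, singles = {}, {}
    for purl, files in packages.items():
        trees[tree_sha1(files)] = purl
        for content in files.values():
            singles.setdefault(file_sha1(content), purl)
    return trees, singles


def match(files, trees, singles):
    """Try a whole-tree match first, then fall back to exact file matches."""
    purl = trees.get(tree_sha1(files))
    if purl:
        return {"step": "tree", "matches": {path: purl for path in files}}
    matches = {
        path: singles[file_sha1(content)]
        for path, content in files.items()
        if file_sha1(content) in singles
    }
    return {"step": "file", "matches": matches}


# Hypothetical indexed package and two codebases to match.
packages = {"pkg:generic/demo@1.0": {"a.c": b"int a;", "b.c": b"int b;"}}
trees, singles = build_index(packages)

vendored_copy = {"a.c": b"int a;", "b.c": b"int b;"}
modified_copy = {"a.c": b"int a;", "b.c": b"int b; /* patched */"}

whole = match(vendored_copy, trees, singles)
partial = match(modified_copy, trees, singles)
print(whole["step"], partial["step"], partial["matches"])
```

An unmodified vendored copy is resolved in a single lookup, and only the modified copy needs the finer-grained step. Files left unmatched at that point are the candidates for approximate matching.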
The AboutCode stack: Other projects
● Inspectors: tech-specific tools and dependency resolvers
○ Container and VM images, Debian, ELF and DWARF, NuGet, Python, source
● aboutcode-toolkit: Generate Attribution Notices
○ Using scans or ABOUT files as input
● package-url (PURL): URL string to identify a software package
○ Adopted by CSAF, CycloneDX, SPDX and the whole SCA ecosystem
○ Now part of the CVE specification v5.1
○ Recommended by US CISA and German BSI
● univers: parse and compare package versions and version ranges
● license-expression: parse and compare License expressions
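A PURL has the general shape `pkg:type/namespace/name@version?qualifiers#subpath`. The sketch below splits such a string into its components using only the standard library. It is a simplified illustration that skips the type-specific normalization rules of the Package-URL specification, and `parse_purl` is a hypothetical helper, not the API of the official packageurl libraries.

```python
from urllib.parse import parse_qsl, unquote


def _split_last(text, sep):
    """Split on the last occurrence of `sep`; the tail is None when `sep` is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, None)


def parse_purl(purl):
    """Split a PURL string into its components (simplified, no per-type rules)."""
    remainder, subpath = _split_last(purl, "#")
    remainder, qualifiers = _split_last(remainder, "?")
    scheme, _, remainder = remainder.partition(":")
    if scheme.lower() != "pkg":
        raise ValueError(f"not a PURL: {purl!r}")
    package_type, _, remainder = remainder.strip("/").partition("/")
    if not package_type or not remainder:
        raise ValueError(f"missing type or name: {purl!r}")
    remainder, version = _split_last(remainder, "@")
    namespace, name = _split_last(remainder, "/")
    if name is None:  # no "/" left: everything is the name, there is no namespace
        namespace, name = None, namespace
    return {
        "type": package_type.lower(),
        "namespace": unquote(namespace) if namespace else None,
        "name": unquote(name),
        "version": unquote(version) if version else None,
        "qualifiers": dict(parse_qsl(qualifiers)) if qualifiers else {},
        "subpath": subpath,
    }


deb = parse_purl("pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie")
npm = parse_purl("pkg:npm/%40angular/animation@12.3.1")
pypi = parse_purl("pkg:pypi/django@1.11.1")
print(deb)
```

Because every tool in the stack can reduce a package to the same small record, the PURL works as the join key between scans, SBOMs and vulnerability data.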
Build tracing: TraceCode
● Input: buildable codebase
● Run build under "strace" and collect the trace
○ All kernel syscalls that open, close, write to files, spawn processes
● Reconstruct build graph
○ Determine the subset of the sources used in deployment
● Then Scan and Match the source subset
● Useful, but still marginal usage as it requires a lot of tuning
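The core of the tracing idea is to turn a syscall trace into the list of files a build actually consumed. The sketch below assumes trace lines that approximate what `strace -f` writes for `open`/`openat` calls; the sample lines are hand-written, and the real TraceCode reconstructs a full build graph rather than the simple read/write sets used here.

```python
import re

# Matches lines such as: 101 openat(AT_FDCWD, "src/main.c", O_RDONLY) = 3
OPEN_CALL = re.compile(
    r'^(?:\d+\s+)?open(?:at)?\((?:AT_FDCWD, )?"(?P<path>[^"]+)", (?P<flags>[A-Z_|]+)'
    r'[^)]*\)\s+=\s+(?P<result>-?\d+)'
)


def traced_files(trace_lines):
    """Return the sets of paths successfully opened for reading and for writing."""
    reads, writes = set(), set()
    for line in trace_lines:
        m = OPEN_CALL.match(line)
        if not m or int(m.group("result")) < 0:
            continue  # not an open call, or the open failed
        flags = m.group("flags").split("|")
        if "O_WRONLY" in flags or "O_RDWR" in flags:
            writes.add(m.group("path"))
        else:
            reads.add(m.group("path"))
    return reads, writes


def source_inputs(trace_lines):
    """Files that were read but never written by the build: the original inputs."""
    reads, writes = traced_files(trace_lines)
    return sorted(reads - writes)


# Hand-written sample approximating a traced compile step followed by a link step.
trace = [
    '101 openat(AT_FDCWD, "src/main.c", O_RDONLY) = 3',
    '101 openat(AT_FDCWD, "src/util.h", O_RDONLY) = 4',
    '101 openat(AT_FDCWD, "src/missing.h", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '101 openat(AT_FDCWD, "build/main.o", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5',
    '102 openat(AT_FDCWD, "build/main.o", O_RDONLY) = 3',
    '102 openat(AT_FDCWD, "build/app", O_WRONLY|O_CREAT|O_TRUNC, 0777) = 4',
]

inputs = source_inputs(trace)
print(inputs)
```

Intermediate files such as `build/main.o` drop out because the build itself wrote them, leaving only the source subset that is worth scanning and matching.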
The AboutCode stack: Open Data [1]
● Licenses: 2,000+ licenses and 35,000 rules
○ ScanCode LicenseDB has the basic license data
○ ScanCode Toolkit has the license detection rules
○ DejaCode is synchronized with LicenseDB and adds License Conditions
○ All licenses have SPDX identifiers, with the “LicenseRef-scancode” namespace for the many licenses not included in the SPDX License List (currently 567 licenses)
● No known alternative with comparable depth and breadth
The AboutCode stack: Open Data [2]
● Packages: 21M+ packages, with their files and fingerprints
○ PURL-based
○ Public PurlDB is at: https://public.purldb.io/api/packages/
○ All major ecosystems and distributions - sources AND binaries
○ Built-in mining of all package ecosystems, not half-baked
○ Also just-in-time, on-demand data collection
○ Collect, scan, and index all the package sources, binaries and VCS repos
○ Index with code fingerprints used for code matching
● Other package databases:
○ Software Heritage, ClearlyDefined, deps.dev (Google)
○ Centralized and too big to share
○ No on-premises option for private operations (too big again)
The AboutCode stack: Open Data [3]
● Vulnerabilities: 760K+ packages and 240K+ vulnerabilities
○ PURL-based
○ Public VulnerableCodeDB is at: https://public.vulnerablecode.io/
○ All major ecosystems and vulnerability DBs aggregated and correlated
○ Discover relations (and inconsistencies) in data from mining the graph
● Other vulnerability databases:
○ OSV (reuses some AboutCode code too), GitHub, GitLab, NVD
○ Often contain conflicting data for vulnerable ranges, fixed versions or affected packages
○ Comparison made possible with VulnTotal to query vulnerable version ranges given a PURL
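Conflicting vulnerable ranges become visible as soon as the same package version is checked against each source. The sketch below assumes plain dotted numeric versions and half-open ranges from an introduced version up to, but excluding, a fixed version. The source names and ranges are made up, and this is neither the univers API nor the VulnTotal implementation.

```python
def parse_version(text):
    """Parse a dotted numeric version such as '1.4.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def is_affected(version, introduced, fixed):
    """True when introduced <= version < fixed."""
    return parse_version(introduced) <= parse_version(version) < parse_version(fixed)


def compare_sources(version, advisories):
    """Return each source's verdict for `version` and whether the sources disagree."""
    verdicts = {
        source: is_affected(version, introduced, fixed)
        for source, (introduced, fixed) in advisories.items()
    }
    return verdicts, len(set(verdicts.values())) > 1


# Hypothetical ranges published by two advisory sources for the same vulnerability.
advisories = {
    "source-a": ("1.0.0", "1.4.2"),
    "source-b": ("1.0.0", "1.4.0"),
}

verdicts, conflict = compare_sources("1.4.1", advisories)
print(verdicts, conflict)
```

Real ecosystems use many version schemes that do not compare as simple integer tuples, which is why a dedicated library for versions and ranges is needed in practice.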
● PURLs (Package URLs) are wonderful
The AboutCode stack: DejaCode [1]
Integrate all tools and data in one web-based application for SCA
and compliance management
● Manage product and component Inventories
● Curate code origin and licenses
● Define and apply license policies
● Launch scans and access the Knowledge Base
● Identify package vulnerabilities
● Consume and enrich SBOMs (CycloneDX or SPDX)
● Generate FOSS compliance documents, such as product
Attribution Notices and SBOMs (CycloneDX or SPDX)
The AboutCode stack: DejaCode [2]
Integrate all tools and data in one web-based application for SCA
and compliance management
● Standard and custom reports
● JSON API and webhooks
● Built-in basic workflows
● Integrated with AboutCode SCA Tools and Open Knowledge
Base
Management
Apps
Open
Knowledge
Base
SCA Tools

More Related Content

Similar to OpenChain Webinar: AboutCode and Beyond - End-to-End SCA

Managing Open Source Software Supply Chains
Managing Open Source Software Supply ChainsManaging Open Source Software Supply Chains
Managing Open Source Software Supply Chains
nexB Inc.
 
Open Source Software: What Are Your Obligations?
Open Source Software: What Are Your Obligations? Open Source Software: What Are Your Obligations?
Open Source Software: What Are Your Obligations?
Source Code Control Limited
 
Opensource wildey
Opensource wildeyOpensource wildey
Opensource wildey
Richard Jobity
 
Managing Software Inventories & Automating Open Source Software Compliance
Managing Software Inventories & Automating Open Source Software ComplianceManaging Software Inventories & Automating Open Source Software Compliance
Managing Software Inventories & Automating Open Source Software Compliance
nexB Inc.
 
Enterprise-Grade DevOps Solutions for a Start Up Budget
Enterprise-Grade DevOps Solutions for a Start Up BudgetEnterprise-Grade DevOps Solutions for a Start Up Budget
Enterprise-Grade DevOps Solutions for a Start Up Budget
DevOps.com
 
Open source software governance with DejaCode
Open source software governance with DejaCodeOpen source software governance with DejaCode
Open source software governance with DejaCode
nexB Inc.
 
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
sparkfabrik
 
Open Source evaluation: A comprehensive guide on what you are using
Open Source evaluation: A comprehensive guide on what you are usingOpen Source evaluation: A comprehensive guide on what you are using
Open Source evaluation: A comprehensive guide on what you are using
All Things Open
 

OpenChain Webinar: AboutCode and Beyond - End-to-End SCA

  • 1. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org AboutCode and beyond: End-to-end SCA with open source code and open data Philippe Ombredanne, Lead maintainer of AboutCode and CTO of nexB, Inc.
  • 2. Agenda
    ● About AboutCode & nexB
    ● Software Composition Analysis
      ○ Vulnerabilities AND licensing
      ○ Proprietary problems
    ● The AboutCode stack
    ● New projects
    ● Roadmap
    ● Questions?
  • 3. About me
    ● On a mission to enable easier and safer reuse of FOSS code with best-in-class open source Software Composition Analysis (SCA) tools, data, and standards for open source discovery, license & security compliance
    ● Lead maintainer of AboutCode projects (ScanCode, DejaCode, VulnerableCode and others)
    ● Factoids
      ○ In 2010, I said that Docker technology would never succeed
      ○ Signed off on the largest deletion of code in the Linux kernel (but these were only license comments)
    ● CTO and co-founder of nexB, Inc.
      ○ pombredanne@nexb.com
      ○ GitHub: https://github.com/pombredanne
      ○ LinkedIn: https://www.linkedin.com/in/philippeombredanne
      ○ Often assisted by Chihuahua Technical Advisor
  • 4. AboutCode and nexB
    ● AboutCode's FOSS-first mission: FOSS for FOSS
      ○ Open source tools and open knowledge base (AboutCode stack)
      ○ Simple and practical standards (Package-URL)
      ○ Applications for Legal & Business users (DejaCode) with APIs for everything
    ● Trusted experts in Software Composition Analysis (SCA) since 2007
      ○ Creator of Package-URL: https://github.com/package-url
      ○ Co-founders of SPDX: https://spdx.org
      ○ Contributors to CycloneDX: https://cyclonedx.org
      ○ Co-founders of ClearlyDefined: https://clearlydefined.io
    ● nexB provides professional services and support for SCA
      ○ 800+ SCA projects completed to date with 100% customer satisfaction
      ○ Sponsored development for AboutCode projects
      ○ Technical support and advisory for SCA tool implementations and deployments
  • 5. Software Composition Analysis
  • 6. Software Composition Analysis (SCA)
    ● Identification: Identify distinct “units” of third-party software used in a product or project and their provenance
    ● Licensing: Determine the licensing for each software unit
    ● Security: Identify known security vulnerabilities for each software unit
    ● Quality: Evaluate the quality of a software unit based on software development data, such as number of bugs, fixes, etc. This is the domain of the CHAOSS project
    ● Read "SCA the FOSS Way" for more information: https://www.nexb.com/software-composition-analysis/
    Software Composition Analysis needs to be a core competency for any software development organization.
    ● Embed in the software development workflow from design through release, as it is in manufacturing
    ● The choice of SCA tools will depend on your platform, stack and product
  • 7. SCA: Vulnerabilities AND licensing
    ● Most SCA tools focus on either vulnerabilities OR licensing
      ○ Current focus is on security vulnerabilities because of perceived higher risk
    ● The communities of interest are separate (security vs legal) but converging
    ● License data may be complex, yet mostly stable over time
      ○ But very few tools get it right. Accuracy is still a major, unsolved problem
    ● Dependency graphs are highly dynamic and demand constant care
      ○ They impact the stability of licensing and vulnerability information
    ● Vulnerability data is complex, but extremely dynamic: if included directly in an SBOM, it may be wrong by the time you receive the SBOM
    ● Most SCA security tools are lightweight with respect to both provenance and licensing, and focus on the easy things
    You need SCA coverage for vulnerabilities AND licensing, plus quality.
  • 8. SCA: Proprietary tools and data
    ● Increasingly expensive with the surge of interest in SBOMs and pricing based on number of developers
    ● Large companies may be able to “afford” proprietary SCA scanning tools, but they do not scale across the FOSS supply chain
      ○ The cost of scan curation is prohibitive with high false positive rates and poor license detection accuracy
    ● Most current data about FOSS packages and vulnerabilities is proprietary
      ○ Vendors may offer some free or open source tools but you must pay for access to their data
      ○ Barrier to community access and analysis
    ● Many vendors use some open source for marketing only: “fauxpen source”
      ○ Complex and restrictive licenses
      ○ No contributions back to the community
  • 9. SCA: Open source tools and data
    ● There are many open source SCA tools and some databases:
      ○ License compliance focus: ORT, FOSSology, SW360
      ○ Vulnerability SBOM focus: CycloneDX, Dependency Check, Syft/Grype (Anchore), Trivy (Aqua Security)
    ● So, why did we develop AboutCode?
  • 10. Why AboutCode? [1]
    ● Free and open source software AND free and open data
      ○ FOSS for FOSS
      ○ Open knowledge base with open data for licenses, packages and vulnerabilities
    ● Modular and integrated best-in-class SCA tools for developers
      ○ Tackling the harder code analysis problems so you do not have to
      ○ PURL-based for easier integration in/out
    ● Bespoke pipelines enable true end-to-end automation
      ○ Working towards management by exception to focus on the complex cases of origin and license
      ○ Decentralized analysis, close to the developers
    ● Management web app for centralized policies, curations and compliance workflows and data
      ○ Supports engineering, business and legal stakeholders with features tailored for each using common/shared information
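To make the "PURL-based" point concrete: a Package URL has the general shape `pkg:type/namespace/name@version?qualifiers#subpath`. The following is a minimal sketch of splitting such a string into its components, not the reference implementation. The names `SimplePurl` and `parse_purl` are hypothetical and chosen for illustration, and type-specific normalization rules from the specification are deliberately left out. In practice a maintained library such as packageurl-python would be the safer choice.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import unquote


@dataclass(frozen=True)
class SimplePurl:
    """Hypothetical container for the components of a Package URL."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str] = None
    qualifiers: Dict[str, str] = field(default_factory=dict)
    subpath: Optional[str] = None


def parse_purl(purl: str) -> SimplePurl:
    """Split a purl string into components (simplified, no per-type rules)."""
    remainder = purl.strip()

    # The subpath is everything after the last '#'.
    subpath = None
    if "#" in remainder:
        remainder, raw_subpath = remainder.rsplit("#", 1)
        subpath = unquote(raw_subpath.strip("/")) or None

    # Qualifiers are key=value pairs after the last '?'.
    qualifiers: Dict[str, str] = {}
    if "?" in remainder:
        remainder, query = remainder.rsplit("?", 1)
        for pair in query.split("&"):
            key, sep, value = pair.partition("=")
            if sep and value:
                qualifiers[key.lower()] = unquote(value)

    # The scheme must be 'pkg'.
    scheme, sep, remainder = remainder.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"missing 'pkg:' scheme in {purl!r}")

    # The package type is the first path segment.
    remainder = remainder.strip("/")
    package_type, sep, remainder = remainder.partition("/")
    if not sep or not package_type or not remainder:
        raise ValueError(f"missing type or name in {purl!r}")

    # The version follows the last '@'.
    version = None
    if "@" in remainder:
        remainder, raw_version = remainder.rsplit("@", 1)
        version = unquote(raw_version) or None

    # The name is the last path segment; anything before it is the namespace.
    namespace, _, name = remainder.rpartition("/")
    if not name:
        raise ValueError(f"missing name in {purl!r}")

    return SimplePurl(
        type=package_type.lower(),
        namespace=unquote(namespace) or None,
        name=unquote(name),
        version=version,
        qualifiers=qualifiers,
        subpath=subpath,
    )


if __name__ == "__main__":
    print(parse_purl("pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie"))
```

Because every tool in a pipeline can key its data on the same string, results from a scanner, a vulnerability database and a policy application can be joined without per-ecosystem name mapping.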
  • 11. Why AboutCode? [2]
    ● The state of SCA tooling accuracy is not great
    ● We recently made a large-scale comparison of many container scanners
      ○ Both FOSS and commercial
      ○ Using SBOMs as a way to compare scans of the same container images
    ● Commercial tools are making up packages, "hallucinating" PURLs
    ● Most look only skin deep, only looking at package manifests and DB
    ● Beyond package origin, the quality of reported licenses is plain bad and misleading
      ○ In most cases this is a grep on the declared license of package manifests
    ● Several tools created invalid SBOMs
    ● We can do better!
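The gap between a declared license and what the code actually contains can be illustrated with a toy comparison. This is only a sketch under stated assumptions: the manifest is a plain dict with a `license` key, file contents are passed in as strings, and detection is limited to `SPDX-License-Identifier:` tags. The function names are hypothetical, and a real scanner does far more than look for these tags.

```python
import re
from typing import Dict, Set

# Matches an SPDX tag and captures the expression that follows on the same line.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([A-Za-z0-9.+\-() ]+)")


def declared_license(manifest: Dict[str, str]) -> Set[str]:
    """What a manifest-only tool would report: the declared license field."""
    value = manifest.get("license")
    return {value} if value else set()


def detected_licenses(files: Dict[str, str]) -> Set[str]:
    """License expressions found in SPDX tags inside the file contents."""
    found: Set[str] = set()
    for text in files.values():
        for match in SPDX_TAG.finditer(text):
            found.add(match.group(1).strip())
    return found


def undeclared_licenses(manifest: Dict[str, str], files: Dict[str, str]) -> Set[str]:
    """Licenses present in the code but absent from the manifest."""
    return detected_licenses(files) - declared_license(manifest)


if __name__ == "__main__":
    # Illustrative data only: a package declared as MIT that vendors a GPL file.
    manifest = {"name": "demo", "license": "MIT"}
    files = {
        "index.js": "// SPDX-License-Identifier: MIT\nmodule.exports = {};\n",
        "vendor/helper.c": "/* SPDX-License-Identifier: GPL-2.0-only */\nint x;\n",
    }
    print(undeclared_licenses(manifest, files))
```

A tool that only reads the manifest would report MIT for this package and miss the vendored file entirely, which is the failure mode the slide describes.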
  • 12. Introducing the AboutCode stack
  • 13. AboutCode: Who is using it?
    Many organizations and most SCA providers use AboutCode tools, libraries or standards:
      ○ Most free software and open source foundations
      ○ Five of the top big tech companies
      ○ A leading database company and a leading Linux company
      ○ European and US government agencies
      ○ All major European car manufacturers and most of their vendors
      ○ Major US chip and microprocessor providers
      ○ Four leading European industrial companies
      ○ All SBOM and VEX standards
      ○ All open source SCA and SBOM tools
      ○ Most proprietary SCA, SBOM or code hosting tools
  • 14. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org SCA Tools Management Apps Open Knowledge Base
  • 15. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org SCA Tools Management Apps Open Knowledge Base ScanCode DejaCode Licenses Packages Vulnerabilities Scan Match Analysis pipelines Policies Curations Software inventory Workflows SBOMs Custom reports Binary analysis Dependency analysis
  • 16. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Supports safe and compliant use of FOSS, with FOSS ○ Recognized worldwide as best-in-class tools ○ Modular design for adaptation to development team processes, tools and environment ○ Coverage for all languages and frameworks ○ Package URL (PURL) used throughout as the package identifier ○ Code AND data licensed under open source licenses, no gimmicks ● Reduce licensing and vulnerability risks from using FOSS or other third-party software components ○ Share risk management responsibilities among business, legal, engineering and security teams ○ Provide a comprehensive view of open source and other third-party components used in your software ● Active community of contributors and users, including many FOSS tools ● Technical support, implementation, advisory services available from nexB Benefits of the AboutCode stack 16
  • 17. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Contribute to an AboutCode project with code, documentation, use cases, bug reports ■ https://github.com/nexB ● Sponsor AboutCode project maintainers ○ Accelerate development of new features and fund contributors ■ https://github.com/sponsors/nexB ● Buy support, implementation, and advisory services from nexB to pay the maintainers ■ https://nexb.com ● Join the community: ■ https://www.aboutcode.org/ ■ https://gitter.im/aboutcode-org/discuss AboutCode also needs your help! 17 "Dependency" by xkcd, used under CC BY-NC 2.5 / Modified text from original
  • 18. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org AboutCode: New Projects
  • 19. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Binary, deployment analysis: back2source ● Pipeline input: sources and binaries ● Collect symbols and identifiers from sources and binaries ○ Parse Java bytecode, ELF, DWARF, WinPE, Mach-O, JS map files; collect literals and source symbols ● Map and match these symbols from binaries back to source ● If not mapped, fall back to code matching against the PurlDB ● Report discrepancies ○ Code that is found in binaries and NOT in the source ● Work in progress, BUT the code as it existed before the xz incident was able to detect the xz-utils problems and tagged the problematic, malicious build script as "require review"! Yeah! SCA Tools
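The "report discrepancies" step on the back2source slide can be pictured as a set difference between symbols found in binaries and symbols found in sources. The following is only a minimal sketch under that assumption; the function name and the symbol names are hypothetical and this is not the back2source implementation.

```python
def find_unmapped_symbols(binary_symbols, source_symbols):
    """Return the symbols seen in binaries that have no counterpart in sources."""
    return sorted(set(binary_symbols) - set(source_symbols))


# Hypothetical symbol sets, as they could be collected from sources and binaries.
source_symbols = {"decode_block", "encode_block", "crc64"}
binary_symbols = {"decode_block", "encode_block", "crc64", "injected_hook"}

# Anything listed here exists in the shipped binary but not in the sources,
# and would be a candidate for a "require review" tag.
discrepancies = find_unmapped_symbols(binary_symbols, source_symbols)
print(discrepancies)
```

A real pipeline would first try richer mappings (paths, debug info, literals) and only then fall back to code matching, as the slide describes.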
  • 20. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project: CRAVEX [1] ● Goal is to automate application vulnerability management ● ... AND regulatory compliance reporting ● Built for open source projects and small businesses as a free and open solution to comply with the emerging regulatory mandates (SBOMs, CRA) with minimal friction and costs ● Package- and software product-centric management of vulnerabilities ● Web-based, database-backed application to collect, track, and triage FOSS package vulnerabilities and determine their exploitability ○ Rank based on urgency, assess remediation ○ Create VEX reports
  • 21. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project: CRAVEX [2] ● Import SBOM and scans for one or more apps, products & components ● Schedule vulnerability lookups and store the results in the database ● Web UI to rank and prioritize package vulnerabilities based on ○ Multiple scores ○ Rule-based automation ○ Vulnerable code reachability and exploitability ○ Usage context ● Export the results of the vulnerabilities triage and processing as VEX documents and attestations 21
  • 22. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project: Code reachability ● Upcoming companion to CRAVEX ● Goal: help prioritize vulnerabilities based on actual local exploitability ● Use multiple factors to better qualify the urgency ● Symbol-based reachability of the vulnerable code ● Call graph-based reachability of the vulnerable code ● Integrate local context to assign exploitability priorities ○ Development or internal tool vs. production software or consumer device ● Integrate existing excellent FOSS efforts in the space ○ Eclipse Steady, Joern, Chen
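Call graph-based reachability can be illustrated with a breadth-first traversal from the application entry points. This is a minimal sketch, assuming the call graph is already available as a plain mapping of caller to callees; all function names below are hypothetical.

```python
from collections import deque


def reachable(call_graph, entry_points):
    """Return every function reachable from the entry points (breadth-first)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["parse_args", "load_config"],
    "load_config": ["yaml_load"],
    "yaml_load": ["unsafe_construct"],
    "legacy_export": ["xml_parse"],  # never called from main
}
# Hypothetical functions known to contain vulnerable code.
vulnerable = {"unsafe_construct", "xml_parse"}

reached = reachable(call_graph, ["main"])
reachable_vulns = sorted(vulnerable & reached)
print(reachable_vulns)
```

A vulnerable function that is never reached, such as `xml_parse` here, could be ranked lower than one on a live call path; local context (development tool vs. production software) would then adjust the priority further.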
  • 23. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project: GenAI Code Search [1] Question: How to safely reuse AI-generated code? ● AI-generated code is a wonderful productivity booster ● Thought experiment ○ Build a small language model from only GPL-licensed code from the GNU project. ○ Add GenAI on top. Is the generated code derived from the GPL-licensed code? ● AI-generated code may violate licenses and copyrights ● AI-generated code may copy vulnerable code sections ● Some large businesses and open source foundations have defined policies regarding AI-generated code, in some cases prohibiting its use.
  • 24. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project: GenAI Code Search [2] ● Fight fire with fire ○ An approach we considered is to use GenAI to regenerate the code under analysis and compute the similarity between the regenerated and original code ○ Impractical: too expensive and too slow ● Find similar code fragments ○ The focus of this project ● Traditional code fragment matching does not work for AI-generated code ○ The code is broken into chunks using a content-defined heuristic ○ Chunks are matched exactly using a checksum ○ BUT AI-generated code is seldom exactly the same as indexed FOSS code ○ Existing solutions have ever-growing indexes with more fragments to avoid false negatives ○ Furthermore, precision and recall are frozen by the choice of parameters for the index
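The traditional approach described on this slide (content-defined chunking followed by exact checksum matching) can be sketched as below. The window size, the boundary rule and the sample code are illustrative assumptions, not the parameters of any particular tool.

```python
import hashlib

WINDOW = 4   # bytes examined to decide on a chunk boundary (illustrative value)
MASK = 0x07  # cut when the low 3 bits of the window sum are zero (illustrative)


def chunks(data: bytes):
    """Split data at boundaries that depend only on the local content."""
    start = 0
    for end in range(WINDOW, len(data) + 1):
        if (sum(data[end - WINDOW:end]) & MASK) == 0:
            yield data[start:end]
            start = end
    if start < len(data):
        yield data[start:]


def fingerprints(data: bytes):
    """Checksum each chunk; matching is exact on these checksums."""
    return {hashlib.sha1(chunk).hexdigest() for chunk in chunks(data)}


def match_ratio(query: bytes, index: set) -> float:
    """Fraction of the query chunks whose checksum is present in the index."""
    fps = fingerprints(query)
    return sum(1 for fp in fps if fp in index) / len(fps)


original = b"def add(a, b):\n    result = a + b\n    return result\n"
# The same function with one variable renamed, as a code generator might emit it.
altered = original.replace(b"result", b"acc")

index = fingerprints(original)
print(match_ratio(original, index), match_ratio(altered, index))
```

An identical copy matches fully, while every chunk touched by the rename gets a different checksum and is missed. That is the weakness the slide points out for AI-generated code, and why exact-match indexes keep growing to compensate.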
  • 25. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project: GenAI Code Search [3] ● A new approach to approximate code fragment matching ● Fingerprint-based ○ Helps scale the index, and also the query, since a whole codebase (gigabytes in size) is the query ○ Traditional information retrieval with inverted indexes does not work for queries this large ● Approximate, fuzzy fingerprinting ○ Using a new algorithm that enables matching code that was never indexed ● Furthermore, a tunable fingerprint ○ Can be tuned at query time for precision and recall
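The slides do not describe the new fingerprinting algorithm, so the sketch below substitutes a well-known technique, SimHash over token shingles, purely to illustrate what "approximate" and "tunable at query time" can mean. The shingle size, the 64-bit width and the sample snippets are assumptions.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumption)


def simhash(text: str) -> int:
    """Compute a SimHash fingerprint from 3-token shingles of the text."""
    tokens = re.findall(r"\w+", text)
    shingles = [" ".join(tokens[i:i + 3]) for i in range(max(1, len(tokens) - 2))]
    counts = [0] * BITS
    for shingle in shingles:
        digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        for bit in range(BITS):
            counts[bit] += 1 if (value >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_match(a: int, b: int, max_distance: int) -> bool:
    """max_distance is chosen at query time: lower favors precision, higher favors recall."""
    return hamming(a, b) <= max_distance


indexed = "def add(a, b):\n    result = a + b\n    return result\n"
generated = "def add(x, y):\n    total = x + y\n    return total\n"

distance = hamming(simhash(indexed), simhash(generated))
print(distance, is_match(simhash(indexed), simhash(generated), max_distance=16))
```

Unlike an exact checksum, the fingerprint of a modified fragment stays comparable to the original, and the acceptance threshold is a query parameter rather than something frozen into the index.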
  • 26. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project idea: Open Containers KB ● Container composition is a mess. Most tools are just plain bad ○ They lack basic tracing of licenses and packages (and therefore vulnerabilities) ● Image builders, OS and distro vendors do not seem to care ○ Official images are sometimes not compliant or not traceable ○ Package volume amplifies vulnerability and license issues ○ Sources of binary packages disappear ● We can do better! ● Project idea: create a mini consortium to do a proper, automated and correct SCA of key public base images ● Share these as open data ● Work with upstream to clean up their act
  • 27. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org AboutCode Roadmap
  • 28. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Roadmap for AboutCode: ScanCode Toolkit ● Build single-executable standalone apps for ScanCode for easier deployment in CI/CD ● Improve copyright and license detection speed ● Build smaller single-purpose tools and libraries from the "mono repo" ● Improve data models for Packages and Dependencies/Requirements ● Parse more package manifests and lock files ● Improve support for license exceptions (WITH) ● Move inconclusive, unknown license detection to clues ● Add post-processing to rematch using SPDX matching guidelines SCA Tools
  • 29. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Roadmap for AboutCode: SCA Tools ● Integrate with CI and other tools ○ Create pre-configured CI/CD integrations with the main CI systems (GitHub, GitLab, Jenkins) ● Extend binary analysis and deployment tracing workflows ○ Support ELF/Native, Go, Ruby, Android in addition to Java and JS ○ Find the exact subset of the code that is deployed and used in production ● Automate analysis review in ScanCode.io ○ End-to-end automated pipelines for embedded devices, Android and C/C++ ○ Multi-stack deployment analysis for Java, JS, C/C++ ○ Report TODO items to review only "by exception" SCA Tools
  • 30. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Code match smart ranking and disambiguation ○ Avoid false positives ● Accurately match to the correct package version ● Match code snippets approximately ○ Using our new approximate fingerprinting ○ Integrate other code matching schemes from SWH and SCANOSS ● Match source symbols and binary symbols to sources and binaries ● New matching pipelines ● Decentralized curation and corrections using in-codebase ABOUT files Roadmap for AboutCode: Code matching 30 30 SCA Tools
  • 31. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Compare scans to focus review work on changes only (DeltaCode) ● APIs and CLI to query all the things by PURL from the KB (purl2all) ● More code inspectors ○ Lightweight package dependency resolution ○ Dedicated ecosystem-focused libraries ● New lightweight package-inspector ○ Single executable to find packages and dependencies ● Trace build execution to find the exact subset of source code that is deployed and used (TraceCode) Roadmap for AboutCode: Other SCA Tools 31 31 SCA Tools
  • 32. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Roadmap for AboutCode: Management Apps ● Add support for CycloneDX 1.5 and 1.6 and SPDX 3.0 ● Create new review automation apps: ○ License detection review ○ Code match review ○ Vulnerability review ● Overall goal is to reduce review and curation work ○ Extend license clarity scoring to code matches with origin clarity scoring ○ "Auto conclude" matches that are conclusive ● New app for advanced vulnerability management and support for CRA (Cyber Resilience Act) compliance ○ Automated triage of vulnerabilities and workflow triggers ○ VEX (Vulnerability Exploitability eXchange) creation, import and export with CSAF and CycloneDX Management Apps
  • 33. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Extend License data with compatibility matrix ● Add new license aliases dataset ● Add more extensive tagging and categorization ● Extend License data with improved exception details ○ To disambiguate license detections of L/GPL with/without exceptions ● Extend License data with improved "or later" details ○ To disambiguate detection of "or later" notices with their primary texts ● Add "key phrases" to all license detection rules ● Add variable text segments to license rules ● Add Fedora alternative SPDX identifiers ● Work with CycloneDX to become their license reference Roadmap for AboutCode: Licenses 33 33 Open Knowledge Base
  • 34. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Roadmap for AboutCode: Vulnerabilities 34 ● Extend Non-vulnerable dependency resolution ○ Beyond Python - add Java and JS ● Extend vulnerability data with new upstream data sources ● Add fix commit details and support for vulnerability reachability ● Mine the graph to surface related package fixes ● Mine git logs, issues and forums to enrich vulnerability data ● Surface inconsistencies and conflicts between different advisory data sources (VulnTotal throughout) ● Add source/binary discrepancy data (from back2source) 34 Open Knowledge Base
  • 35. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Confirm the true origin of code to avoid ambiguous matches ● Supply chain package verification ○ Map deployed binary packages to their corresponding source code ○ Find suspicious code drift between package versions ● Mine extensive list of "off registry" packages ○ Common native C/C++ code and libraries for embedded ○ Glibc, Busybox, zlib, etc. that are not published on ecosystem package registries ● Collect code symbols from source and binaries (for matching) ● On demand, just in time code mining to build your KB on the fly ● Federated, decentralized shared KB data with Git and ActivityPub ○ Share scans, vulnerabilities, origin facts and curations ○ Scan once, analyze once and collaborate on reviews to clear out the junk! Roadmap for AboutCode: Packages 35 35 Open Knowledge Base
  • 36. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode Stack
  • 37. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode stack for SCA [1] 37 SCA Tools Management Apps Open Knowledge Base ● Web-based enterprise management application ○ DejaCode for ensuring license and security compliance ● SCA tools for identifying third-party code and determining code license and origin ○ ScanCode is the leading code scanner for software component, package and dependency identification, and license detection ○ MatchCode is a new tool for package and file matching ○ container-inspector: analysis tool for Docker & other images ○ nuget-inspector and python-inspector for in-depth dependency resolution ○ many other libraries ○ See https://aboutcode.org for an overview of AboutCode projects ○ See https://github.com/nexB for the code
  • 38. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode stack for SCA [2] 38 SCA Tools Management Apps Open Knowledge Base ● Open knowledge base with open data for licenses, packages and vulnerabilities ○ LicenseDB - open source and other public licenses at: https://scancode-licensedb.aboutcode.org/ ○ PurlDB - package data at: https://public.purldb.io/api/packages/ ○ VulnerableCode - aggregated vulnerability data and comprehensive vulnerability reporting at: https://public.vulnerablecode.io/ ● Standards ○ Package-URL: Specification and tools for identifying packages at: https://github.com/package-url ○ Univers: Parse and compare package versions and ranges at: https://github.com/nexB/univers
  • 39. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Industry-leading scanning engine ● License detection with multiple techniques, rule-based ● Copyright notices with NLP ● Identify packages ○ Normalize all the package metadata ○ Includes dependencies and package license detection ○ Package manifests, system package databases and lockfile parsing ● New summarization and license clarity scoring ○ Identify and focus curation on actual licensing issues ● Accuracy is paramount ○ An incorrect license detection is treated as a bug ● ABOUT files for curations/corrections stored in the codebase The AboutCode stack: ScanCode Toolkit 39 SCA Tools
  • 40. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Web-based scanning server using ScanCode ○ Smarter scripted scanning in multiple steps ● Specialized pipelines for customized analysis ○ Tag items that need your review ○ Pipeline for best-in-class container and VM scanning ● Unique deployment analysis using binary analysis ○ Map binaries back to their sources ● Code matching integrated with the knowledge base ○ Starting with exact and approximate file matching ● Integrated enrichment of the knowledge base ○ Collect and pre-scan all the packages that you use ○ Watch and collect new versions continuously The AboutCode stack: ScanCode.io 40 SCA Tools Open Knowledge Base
  • 41. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode stack: MatchCode ● New web-based code matching server ● Includes mining for a custom knowledge base ○ All package ecosystems and Linux distros ● Smarter matching in multiple steps ○ Whole tree, exact file, approximate tree and file ○ Coming up: snippet matching, with a twist for AI-generated code ● Pipeline for ranking and picking the best matches ● A different matching approach ○ Exact matching demands a constantly growing index ○ Approximate matching can match software that is NOT indexed ○ Top down rather than bottom up SCA Tools Open Knowledge Base
  • 42. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode stack: Other projects ● Inspectors: tech-specific tools and dependency resolvers ○ Container and VM images, Debian, ELF and DWARF, NuGet, Python, source ● aboutcode-toolkit: generate attribution notices ○ Using scans or ABOUT files as input ● package-url (PURL): URL string to identify a software package ○ Adopted by CSAF, CycloneDX, SPDX and the whole SCA ecosystem ○ Now part of the CVE specification v5.1 ○ Recommended by US CISA and German BSI ● univers: parse and compare package versions and version ranges ● license-expression: parse and compare license expressions SCA Tools
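For readers new to PURL, a Package-URL has the shape `pkg:type/namespace/name@version?qualifiers#subpath`. The parser below is a deliberately simplified, standard-library-only sketch of that shape. It skips the type-specific normalization rules of the specification, and the `Purl` and `parse_purl` names are hypothetical; real projects should use the libraries published under the package-url organization.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qsl, unquote


class Purl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: dict
    subpath: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Parse a PURL string into its components (simplified, no normalization)."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme != "pkg":
        raise ValueError(f"not a Package-URL: {purl!r}")
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    ptype, sep, path = rest.strip("/").partition("/")
    if not sep or not path:
        raise ValueError(f"missing type or name: {purl!r}")
    # The version follows the last "@"; an "@" inside a namespace is percent-encoded.
    path, at, version = path.rpartition("@")
    if not at:
        path, version = version, ""
    namespace, _, name = path.rpartition("/")
    return Purl(
        type=ptype.lower(),
        namespace=unquote(namespace) or None,
        name=unquote(name),
        version=unquote(version) or None,
        qualifiers=dict(parse_qsl(query)),
        subpath=subpath or None,
    )


print(parse_purl("pkg:maven/org.apache.commons/commons-lang3@3.12.0?type=jar"))
print(parse_purl("pkg:npm/%40angular/animation@12.3.1"))
```

Because the same string identifies a package across scanners, SBOMs and vulnerability databases, it can serve as the join key between tools.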
  • 43. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Input: buildable codebase ● Run build under "strace" and collect the trace ○ All kernel syscalls that open, close, write to files, spawn processes ● Reconstruct build graph ○ Determine the subset of the sources used in deployment ● Then Scan and Match the source subset ● Useful, but still marginal usage as it requires a lot of tuning Build tracing: TraceCode 43 SCA Tools
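The build-tracing idea can be sketched by parsing `strace -f` style output for successful `open`/`openat` calls and separating files the build read from files it wrote. The trace lines below are hypothetical samples, and the regular expression handles only this simple form; it is not the TraceCode implementation.

```python
import re

# Matches lines such as: 101 openat(AT_FDCWD, "src/main.c", O_RDONLY) = 3
OPEN_CALL = re.compile(
    r'open(?:at)?\((?:AT_FDCWD, )?"(?P<path>[^"]+)", (?P<flags>[A-Z_|]+)[^)]*\)'
    r"\s*=\s*(?P<ret>-?\d+)"
)


def files_read_and_written(trace_lines):
    """Return (reads, writes) for the open calls that succeeded."""
    reads, writes = set(), set()
    for line in trace_lines:
        match = OPEN_CALL.search(line)
        if not match or int(match.group("ret")) < 0:
            continue  # not an open call, or the call failed (e.g. ENOENT)
        flags = match.group("flags").split("|")
        if "O_WRONLY" in flags or "O_RDWR" in flags:
            writes.add(match.group("path"))
        else:
            reads.add(match.group("path"))
    return reads, writes


# Hypothetical trace of a small C build.
trace = [
    '101 openat(AT_FDCWD, "src/main.c", O_RDONLY) = 3',
    '101 openat(AT_FDCWD, "src/util.h", O_RDONLY|O_CLOEXEC) = 4',
    '101 openat(AT_FDCWD, "src/missing.h", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '101 openat(AT_FDCWD, "build/main.o", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5',
    '102 openat(AT_FDCWD, "build/main.o", O_RDONLY) = 3',
    '102 openat(AT_FDCWD, "build/app", O_RDWR|O_CREAT|O_TRUNC, 0777) = 4',
]

reads, writes = files_read_and_written(trace)
# Files read but never produced by the build are the original inputs.
used_sources = sorted(reads - writes)
print(used_sources)
```

Only the files in `used_sources` would then need to be scanned and matched, which is the point of tracing the build.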
  • 44. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode stack: Open Data [1] ● Licenses: 2,000+ licenses and 35,000 rules ○ ScanCode LicenseDB has the basic license data ○ ScanCode Toolkit has the license detection rules ○ DejaCode is synchronized with LicenseDB and adds License Conditions ○ All licenses have SPDX identifiers, using the "LicenseRef-scancode" namespace for the many licenses not included in the SPDX License List (currently 567 licenses) ● No known alternative with comparable depth and breadth Open Knowledge Base
  • 45. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode stack: Open Data [2] ● Packages: 21M+ packages, files and their fingerprints ○ PURL-based ○ Public PurlDB is at: https://public.purldb.io/api/packages/ ○ All major ecosystems and distributions - sources AND binaries ○ Built-in mining of all package ecosystems, not half-baked ○ Also just-in-time, on-demand data collection ○ Collect, scan, and index all the package sources, binaries and VCS repos ○ Index with code fingerprints used for code matching ● Other package databases: ○ Software Heritage, ClearlyDefined, deps.dev (Google) ○ Centralized and too big to share ○ No on-premises option for private operations (too big again) Open Knowledge Base
  • 46. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Vulnerabilities: 760K+ packages and 240K+ vulnerabilities ○ PURL-based ○ Public VulnerableCodeDB is at: https://public.vulnerablecode.io/ ○ All major ecosystems and vulnerability DBs aggregated and correlated ○ Discover relations (and inconsistencies) in data from mining the graph ● Other Vulnerability databases: ○ OSV (reuses some AboutCode code too), GitHub, GitLab, NVD ○ Often contain conflicting data for vulnerable ranges, fixed versions or affected packages ○ Comparison made possible with VulnTotal to query vulnerable version ranges given a PURL The AboutCode stack: Open Data [3] 46 Open Knowledge Base
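The kind of cross-source comparison mentioned for VulnTotal can be pictured as follows: given the versions of one package that each advisory source reports as affected, list the versions on which the sources disagree. The data shape, the source names and the versions are hypothetical and do not reflect the VulnTotal API or any real advisory.

```python
def disagreements(affected_by_source):
    """Map each version flagged by some, but not all, sources to the sources flagging it."""
    all_versions = set().union(*affected_by_source.values())
    conflicts = {}
    for version in sorted(all_versions):  # plain string sort, enough for this sketch
        flagged_by = sorted(
            source for source, versions in affected_by_source.items()
            if version in versions
        )
        if len(flagged_by) != len(affected_by_source):
            conflicts[version] = flagged_by
    return conflicts


# Hypothetical affected versions for a single PURL, per advisory source.
affected_by_source = {
    "source_a": {"1.0.0", "1.0.1", "1.1.0"},
    "source_b": {"1.0.0", "1.0.1"},
    "source_c": {"1.0.1", "1.1.0"},
}

conflicts = disagreements(affected_by_source)
print(conflicts)
```

Versions that every source agrees on drop out of the result, leaving only the inconsistencies that need a human or a rule to resolve. A real comparison would use proper version range semantics, for example with univers, rather than explicit version sets.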
  • 47. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● PURLs (Package URLs) are wonderful Open Knowledge Base
  • 48. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode stack: DejaCode [1] 48 Integrate all tools and data in one web-based application for SCA and compliance management ● Manage product and component Inventories ● Curate code origin and licenses ● Define and apply license policies ● Launch scans and access the Knowledge Base ● Identify package vulnerabilities ● Consume and enrich SBOMs (CycloneDX or SPDX) ● Generate FOSS compliance documents, such as product Attribution Notices and SBOMs (CycloneDX or SPDX) Management Apps Open Knowledge Base SCA Tools
  • 49. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode stack: DejaCode [2] 49 Integrate all tools and data in one web-based application for SCA and compliance management ● Standard and custom reports ● JSON API and webhooks ● Built-in basic workflows ● Integrated with AboutCode SCA Tools and Open Knowledge Base Management Apps Open Knowledge Base SCA Tools