Copyright © nexB Inc. License: CC-BY-SA-4.0
“State of the Tooling” in Open Source Automation
OpenChain German work group
Philippe Ombredanne, AboutCode.org nexB Inc.
Philippe Ombredanne
► Project lead and maintainer for VulnerableCode, ScanCode and AboutCode
► Creator of Package URL, co-founder of SPDX & ClearlyDefined
► FOSS veteran, long-time Google Summer of Code mentor
► Co-founder and CTO of nexB Inc., makers of DejaCode
► Weird facts and claims to fame
● Signed off on the largest deletion of lines of code in the Linux kernel (but these were only comments)
● Unrepentant code hoarder: had 60,000+ GitHub forks, now down to only 20K
► pombredanne@nexb.com irc:pombreda
Why open source compliance tooling?
▷ Because open source for open source: This is the way!
● Dogfooding
▷ Free as in beer and freedom of course
● Code of course, but do not forget the data!
▷ Key to enable right-sized automation for your open chain
▷ Best-in-class tools in several areas
Key trends (1) Time to retool?
▷ 3rd wave of Compliance tooling creation and adoption underway
● 1st wave was commercial
● 2nd wave was centered on license compliance and legal
● 3rd wave will be centered on developers and appsec
■ Eventually balanced and holistic FOSS solutions
▷ TODO: Review your existing approach and retool
Key trends (2)
▷ Security is top of mind
● SBOMs are everywhere, but for what? Few can process them
▷ And license compliance is not yet solved
● Still a lot of work left for automation
● Emerging scripting platforms to capture your pipelines
■ Orchestrate many tools
▷ Open data and data sharing will happen
● Everybody wants it, but also everyone wants to control it
● Centralized or decentralized?
Key trends (3)
▷ Software health, quality, sustainability are not yet on the radar
▷ FOSS GUI/Web apps are still badly missing
▷ Slowly the analysis of builds and binaries will displace source-only scans
▷ Dependency tracking is not yet solved at scale
Key trends (4) Best tools are FOSS
▷ The leading tools are mostly FOSS first
● License detection
● Container analysis
● Package detection
● Dependency tracking and resolution
▷ But BEWARE
● Lots of tools are shallow and look only skin deep
■ Barely suitable for serious license or security work
● Do your homework and try the tools: they are open after all
Key trends (5) Poor data quality
▷ Vulnerability and package databases are the new rush
● Open or commercial vulnerability databases with supposedly "premium" content
● But BEWARE of the data quality. Size DOES NOT matter.
■ Made up packages, made up versions
■ Not worth their price: Compare and include open solutions!
▷ Every commercial tool now includes license data
● License data derived from package manifest is NOT ENOUGH
● Built-in policies are impractical: is GPL always bad??
Key trends (6) PURL is the essential glue
PURL is emerging as the glue to avoid lock-in!
● Started as a way to support package ids in ScanCode and VulnerableCode, now everywhere
○ CycloneDX
○ SPDX including just released GitHub SPDX SBOMs features
○ Google OSV
○ Sonatype OSSIndex
○ New PurlDB, MatchCode
○ Most FOSS tools such as ORT, Fosslight, DependencyTrack, Anchore, Tern and most of the open (and proprietary) SCA and Infosec/Appsec tools
● Coming to the NVD in version 5.1!!
● Key vector for interop: if two tools speak PURL, integration is made easier
● Demand its adoption by your vendors and projects
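
To make the "if two tools speak PURL" point concrete, here is a minimal sketch of splitting a Package URL of the form `pkg:type/namespace/name@version?qualifiers#subpath` into its parts. The function name `parse_purl` is hypothetical. The parsing is deliberately simplified and skips the per-type normalization rules of the PURL specification. A real integration would more likely rely on a maintained PURL library.

```python
from urllib.parse import unquote


def parse_purl(purl: str) -> dict:
    """Split a simple Package URL into its components (simplified sketch)."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a purl: {purl!r}")
    rest = rest.lstrip("/")

    # Peel the optional parts off the right: #subpath, ?qualifiers, @version.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    qualifiers = dict(pair.split("=", 1) for pair in query.split("&") if pair)
    path, at, version = rest.rpartition("@")
    if not at:  # no "@": the purl carries no version
        path, version = rest, ""

    # The first path segment is the type, the last is the name,
    # anything in between is the namespace.
    type_, _, remainder = path.partition("/")
    namespace, _, name = remainder.rpartition("/")
    return {
        "type": type_.lower(),
        "namespace": unquote(namespace) or None,
        "name": unquote(name),
        "version": unquote(version) or None,
        "qualifiers": {k: unquote(v) for k, v in qualifiers.items()},
        "subpath": subpath or None,
    }


if __name__ == "__main__":
    print(parse_purl("pkg:npm/%40angular/core@16.2.0"))
    print(parse_purl("pkg:maven/org.apache.commons/commons-lang3@3.12.0?type=jar"))
```

Two tools that both emit such identifiers can join their results on the parsed fields instead of guessing at each other's package naming.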
Key insights (1): Share the data!
"I would like to have automation to avoid repeat work when re-running tools"
"Let's avoid re-running scans, share them and reuse them instead"
● Everyone wants to share and reuse data from scans, and origin and license data
○ Speed up origin and license review
○ Avoid redoing the scans and the same review either inside my org or across orgs
● But "It is hard to overcome lawyers’ objections to sharing data such as license conclusions and curations"
● And how to trust the scans and curations? And deal with different policies and standards for conclusions and curations? (specifically about licensing)
● What is the motivation and ease for public data sharing?
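
One way to read "avoid re-running scans, share them and reuse them instead" is a content-addressed cache: key each scan result by the hash of the scanned file plus the tool version, and only run the scanner on a miss. The sketch below assumes a hypothetical `scan` callable that returns JSON-serializable data. The names `cached_scan`, `cache_dir` and `tool_version` are illustrative and not taken from any existing tool.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_scan(
    path: Path,
    scan: Callable[[Path], dict],
    cache_dir: Path,
    tool_version: str,
) -> dict:
    """Return a stored scan result for identical content, else scan and store it."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    # The tool version is part of the key, so results from an older scanner
    # release are never mistaken for current ones.
    entry = cache_dir / f"{tool_version}-{digest}.json"
    if entry.exists():
        return json.loads(entry.read_text())
    result = scan(path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry.write_text(json.dumps(result))
    return result
```

Because the key depends only on content and tool version, the same cache directory could in principle be shared inside an organization or across organizations. The trust and policy questions raised above are not addressed by this sketch.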
Key insights (2): Open the data!
● Open data (i.e., freely and openly licensed data about FOSS) is emerging
○ The "too big to share" argument will not hold
● Eventually open, community-curated FOSS package "knowledge bases" will become the norm and supplant proprietary, closed source alternatives
● We should share raw scanner/tool outputs first
● We should fix upstream licensing issues, upstream
● The centralized approach does not work well
○ Too big to share
○ Out of date
○ Lack of trust in centralized control
Key insights (3) Licensing != Security?
License and Vulnerability are like oil and vinegar
● Even if the core process is code origin determination, the constituencies are not the same (yet)
○ License folks care less about Vulnerabilities
○ Security folks care less about Licenses
● FOSS projects that cater to both should provide differentiated documentation for each audience
● Some core tools are the same, but users are different
● Expect a convergence of the two aspects in the future
● Until then, advice to OSPOs:
○ Handle both domains
○ But adapt your language to each constituent/persona
Key insights (4) License Compatibility
Multiple FOSS projects try to solve license compatibility
● FLICT, OSADL, Hermine Oniro
● Automating license conflicts/compatibility checks is a real problem at scale
● Projects may work together and eventually some conventions will emerge
● Key domains
○ Help legal understand/zoom in on key license concerns
○ What is the effect of multiple licenses?
○ How to surface license compatibility issues
● Effective/resulting license inference and compatibility is a policy issue
○ But tooling can automate the grunt work
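
The split between "policy issue" and "grunt work" can be sketched as follows: the policy is a plain lookup table owned by legal, and the tool only enumerates license pairs and reports those the policy does not clear. The table entries and verdict labels below are hypothetical placeholders chosen for illustration. They are not legal conclusions and are not taken from FLICT, OSADL or Hermine.

```python
from itertools import combinations

# Hypothetical policy table: each unordered pair of SPDX license identifiers
# maps to a verdict decided by your own legal policy. Placeholder values only.
POLICY = {
    frozenset({"MIT", "Apache-2.0"}): "ok",
    frozenset({"MIT", "GPL-2.0-only"}): "ok",
    frozenset({"Apache-2.0", "GPL-2.0-only"}): "review",
}


def license_pairs_to_review(licenses, policy=POLICY):
    """List every license pair whose policy verdict is anything other than 'ok'."""
    flagged = []
    for a, b in combinations(sorted(set(licenses)), 2):
        # Pairs missing from the table are reported as 'unknown', not assumed fine.
        verdict = policy.get(frozenset({a, b}), "unknown")
        if verdict != "ok":
            flagged.append((a, b, verdict))
    return flagged
```

Treating unknown pairs as findings keeps the tool from silently approving combinations that nobody has reviewed.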
Key insights (5) Snippets and matching?
● Does copying a snippet of code really matter?
○ Have you looked at the big rocks first? e.g., whole libraries
○ Are you ready to pay the price in time and/or cash?
Image credits: https://www.integrativenutrition.com/
Key insights (5) Snippets and matching? (continued)
● Domain has been abandoned by commercial vendors
○ Snyk has spun off FOSSID
○ Synopsys mostly abandoned Protex
● One new entrant with open source code but proprietary data: SCANOSS
● Snippets may not matter (too much)
● But AI/ML-generated code snippets anyone?
○ Will artificial general intelligence (AGI) make snippets both more relevant and useless at the same time, when everyone can generate the same boilerplate derived from everyone's code?
● Yet code matching can speed up the analysis when done right (find big rocks first)
○ Reuse previous analysis based on matching code: WIP with MatchCode
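
A rough illustration of "find big rocks first": before any snippet-level matching, count how many files in a codebase are exact copies of files from already-analyzed packages. The sketch assumes a hypothetical prebuilt `index` mapping SHA-1 content hashes to package identifiers. It is not how MatchCode works internally, only the general idea of exact matching.

```python
import hashlib
from collections import Counter
from pathlib import Path


def rank_packages_by_exact_matches(codebase: Path, index: dict[str, str]) -> list[tuple[str, int]]:
    """Count files whose content hash appears in the index, grouped by package."""
    hits: Counter = Counter()
    for candidate in codebase.rglob("*"):
        if not candidate.is_file():
            continue
        sha1 = hashlib.sha1(candidate.read_bytes()).hexdigest()
        package = index.get(sha1)
        if package:
            hits[package] += 1
    # Packages with the most matched files come first: the "big rocks".
    return hits.most_common()
```

Packages at the top of the list are candidates for reusing an earlier origin and license review wholesale, leaving only the unmatched remainder for deeper analysis.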
Key insights (6) SBOM, mehBOM?
● SBOMs are everywhere
○ GitHub can even create these directly from a repo
○ But what about data quality (depth and breadth)?
○ But what about using proper machine-readable identifiers (license, PURL)?
● Hi-Fi or Lo-Fi SBOMs?
● Every tool creates SBOMs but then what?
○ 2 out of 50+ folks were effectively consuming SBOMs
● Big gaps in tool-to-tool integration
● Too much over-engineering, and under-specification
● Advice: Ignore the SPDX vs. CycloneDX feud and embrace both, with PURL
○ Feel free to ignore SWID
○ SBOM is just a reporting format
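
The question about machine-readable identifiers can be turned into a quick check. The sketch below assumes an SBOM already loaded as a dictionary in a CycloneDX-style JSON layout, with a `components` list whose entries may carry a `purl` and a `licenses` array holding either `license.id` or `expression`. The function name `sbom_identifier_gaps` is hypothetical, and the check says nothing about whether the identifiers present are correct.

```python
def has_license_id(component: dict) -> bool:
    """True if the component carries an SPDX-style license id or expression."""
    for entry in component.get("licenses", []):
        if entry.get("expression") or entry.get("license", {}).get("id"):
            return True
    return False


def sbom_identifier_gaps(bom: dict) -> dict:
    """Report which components lack a PURL or a machine-readable license."""
    components = bom.get("components", [])
    return {
        "total": len(components),
        "missing_purl": [c.get("name", "?") for c in components if not c.get("purl")],
        "missing_license": [c.get("name", "?") for c in components if not has_license_id(c)],
    }
```

A high share of components in either list is one signal of a "Lo-Fi" SBOM that downstream tools will struggle to consume.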
Follow up on collaboration opportunities?
● Collaborate: FOSS projects for license conflict/compatibility checking (FLICT/OSADL/Hermine), on data and standards
● Create: A live inventory of all FOSS tools and their capabilities
● Share: Approaches to dependency detection/resolution/processing
● Define: Evolve a standard/schema for tool-to-tool technical scan data sharing
● DATA: Exchange data!
Credits
▷ Presentation template by SlidesCarnival licensed under CC-BY-4.0
▷ Photograph by Unsplash licensed under Unsplash License
▷ Other content licensed under CC-BY-SA-4.0