Practical Compliance
using Open Source Projects
Philippe Ombredanne,
Lead maintainer of AboutCode
© AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org
Agenda
1. OpenChain and Friends
2. Recent Community Events
3. Project updates and demos
○ AI-gen Code Matching
○ Vulnerability Management
○ Matching binaries back to sources!
○ An architecture to share data
○ Actionable SBOMs with validated PURLs
4. What's next?
OpenChain and Friends
● Great to bring the community and
experts together in one room!
○ April 7-9, 2025 in Stuttgart, Germany
○ ORT Community Day 2025 on Day 3:
https://github.com/oss-review-toolkit/ort/wiki/ORT-Community-Day-2025
○ Event recap from Marcel Kurzmann (Bosch):
https://opensource.bosch.com/stories/openchain-event-and-eastern-europe-initiative/
○ All slides available here:
https://github.com/Open-Source-Compliance/Sharing-creates-value/blob/openchain_and_friends_2025_prep/Events/OpenChainAndFriends/OpenChainAndFriends_Program_2025.md
© e-mobil BW/ KD Busch at OpenChain and Friends
in Stuttgart, Germany on April 07, 2025
● Interesting takeaways
○ Compliance can be automated with
agentic AI, but only with correct data!
■ Oscar Valenzuela (Amazon)
○ Graphs provide traceability for software
supply chains and CRA compliance
■ Thomas Aynaud (Software Heritage)
○ Open automation and CRA compliance
for SMEs with OCCTET
■ Martin von Willebrand (DoubleOpen), Andreas
Kotulla (Bitsea), Michael Plagge (Eclipse Foundation)
Community feedback from recent events
● AI-generated code and security are
top of mind
○ At an international gathering of open
source lawyers and licensing experts,
nobody actually talked about licensing …
○ GenAI is a compliance issue
○ Security is a compliance issue
● But licensing is NOT solved!
Photo by Salve J. Nilsen at FOSDEM 2025 Fringe: FOSS license and security compliance
tools workshop in Brussels, Belgium on January 31, 2025, used under CC BY-SA 4.0
● Tools are not interoperable...
○ Confusing or overlapping features
○ Users can't decide what is best for their
system or processes
○ We are all reinventing the wheel
● And not sharing data
○ Every organization is rescanning the
same packages for the same data
○ With the same tools
Project updates for
practical compliance
● AI-Generated Code Search offers a new, faster, and entirely open
approach to approximate code matching that returns accurate results
efficiently
○ Works for whole-file, whole-code-tree, and snippet matching, and for detecting AI-generated code
○ Fingerprint-based matching scales both the index and the query, since a whole
codebase (GBs) can be the query
■ Tunable fingerprint at query time for precision and recall
○ Approximate, fuzzy fingerprinting enables matching code that was never indexed
○ EU-funded through the NGI Search fund with financial support from the European
Commission's Next Generation Internet programme
■ https://github.com/aboutcode-org/ai-gen-code-search
Matching, backed by (actually) open data
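To make the idea of approximate, fingerprint-based matching concrete, here is a minimal sketch. It is not the algorithm used by AI-Generated Code Search: it swaps in a generic technique (hashing runs of k tokens and comparing the resulting sets with Jaccard similarity). All function names, the value of `k`, and the `threshold` are assumptions chosen for illustration.

```python
import hashlib
import re


def tokenize(code: str) -> list[str]:
    """Split code into words and single punctuation marks, ignoring whitespace."""
    return re.findall(r"\w+|[^\w\s]", code)


def fingerprint(code: str, k: int = 4) -> set[int]:
    """Hash every run of k consecutive tokens (a k-gram) into a set of integers."""
    toks = tokenize(code)
    if not toks:
        return set()
    last_start = max(len(toks) - k, 0)
    grams = (" ".join(toks[i:i + k]) for i in range(last_start + 1))
    return {
        int.from_bytes(hashlib.sha1(g.encode("utf-8")).digest()[:8], "big")
        for g in grams
    }


def similarity(a: set[int], b: set[int]) -> float:
    """Jaccard similarity: shared k-grams divided by all distinct k-grams."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_matches(
    query: str,
    index: dict[str, set[int]],
    threshold: float = 0.5,  # hypothetical query-time knob
    k: int = 4,              # must equal the k used to build the index
) -> list[tuple[str, float]]:
    """Return indexed names whose similarity to the query meets the threshold."""
    qfp = fingerprint(query, k)
    scored = ((name, similarity(qfp, fp)) for name, fp in index.items())
    return sorted(
        ((name, score) for name, score in scored if score >= threshold),
        key=lambda item: -item[1],
    )


# Index one "upstream" file, then query with a reformatted copy of it.
upstream = "function add(a, b) { return a + b; }"
index = {"add.js": fingerprint(upstream)}
generated = "function add(a, b) {\n    return a + b;\n}"
matches = best_matches(generated, index)
```

In this sketch a lower `threshold` or a smaller `k` favors recall, while higher values favor precision. Because whitespace is dropped during tokenization, the reformatted copy still matches its upstream file.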
ChatGPT, please generate some code...
Match with ScanCode, PurlDB, and AboutCode
1. Index the top 50,000 npm JavaScript packages in PurlDB
2. Prompt ChatGPT (4o) to generate code for a PURL of the top 100
3. Run code matching (MatchCode) in ScanCode.io
4. Results: more than 20% have strikingly similar code fragments matched to upstream
○ See https://ai-gen-code-search.readthedocs.io/
● What's in the box?
○ Exact and approximate matching of packages, whole code trees, whole files, and code fragments (snippets)
○ Specifically tuned for AI-generated code search, but works well on non-AI generated code
● GenAI is a wonderfully effective copy/pasting machine
Demo:
Detect GenAI code
with AI-Generated Code Search
Efficient vulnerability management
● Cyber Resilience Application for Vulnerability Exploitability Exchange
(CRAVEX) automates vulnerability triage and compliance reporting
○ Built for open source projects and small businesses as a free and open solution to comply with
the emerging regulatory mandates (SBOMs, CRA) with minimal friction and costs
○ EU-funded through the NLnet and NGI0 funds with financial support from the European
Commission's Next Generation Internet programme
■ https://nlnet.nl/project/CRAVEX
● Extension of the AboutCode stack, integrated with open data
○ Web-based, database-backed application to collect, track, and triage FOSS package
vulnerabilities, determine their exploitability, and generate reporting as VEX documents
■ Package- and software product-centric management of vulnerabilities
Vulnerability management for compliance
1. Load an SBOM in DejaCode
○ The SBOM is then continuously watched for new vulnerabilities
2. Triage and dispose of vulnerabilities
3. Export VEX and SBOM
○ Both CycloneDX and SPDX formats!
● What's in the box?
○ Import and combine SBOMs from multiple formats
■ Or scan with ScanCode
○ Get PURL-keyed vulnerability advisories from VulnerableCode
○ Manage triage of vulnerabilities end to end to assess impact
■ Soon with advanced code reachability to automate triage!
○ Export VEX documents
■ Such as CSAF
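The triage-then-export flow above can be sketched as a small data model. This is an illustration only, not the DejaCode or CRAVEX API: the `Triage` class and `to_vex_entries` function are hypothetical. The output is loosely modeled on the `vulnerabilities` entries of a CycloneDX document, is not a complete or schema-validated VEX file, and assumes each component's `bom-ref` is its PURL.

```python
from dataclasses import dataclass

# Values of the CycloneDX vulnerability "analysis.state" field.
ANALYSIS_STATES = {
    "resolved",
    "resolved_with_pedigree",
    "exploitable",
    "in_triage",
    "false_positive",
    "not_affected",
}


@dataclass
class Triage:
    """One triage decision for one vulnerability in one package (hypothetical model)."""

    purl: str
    vulnerability_id: str
    state: str = "in_triage"
    detail: str = ""

    def __post_init__(self) -> None:
        if self.state not in ANALYSIS_STATES:
            raise ValueError(f"unknown analysis state: {self.state!r}")


def to_vex_entries(triages: list[Triage]) -> list[dict]:
    """Convert triage decisions to CycloneDX-style vulnerability entries."""
    return [
        {
            "id": t.vulnerability_id,
            "analysis": {"state": t.state, "detail": t.detail},
            # Assumption: the SBOM uses the PURL as the component bom-ref.
            "affects": [{"ref": t.purl}],
        }
        for t in triages
    ]


decisions = [
    Triage(
        purl="pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1",
        vulnerability_id="CVE-2021-44228",
        state="exploitable",
        detail="JNDI lookups are reachable from logged user input.",
    )
]
entries = to_vex_entries(decisions)
```

Keying every decision by PURL is what lets advisories, SBOM components, and exported VEX statements line up without manual mapping.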
Demo:
Efficient vulnerability
management with CRAVEX
Matching binaries back to sources!
● back2source automates matching binaries to their sources
○ Or source to source.... XZ Utils anyone?
○ EU-funded in part through the NLnet and NGI0 funds with financial support from the European
Commission's Next Generation Internet programme
■ https://nlnet.nl/project/Back2source/
● Extension of the AboutCode stack, integrated with open data
○ Package- and software product-centric management of vulnerabilities
Binary analysis of deployed artifacts
● 3,000+ instances of Log4j (CVE-2021-44228) remain in shaded
form in Maven Central
○ Scanners ignore that
○ There is NO CVE
○ Everything is going to be fine!
● back2source is a ScanCode.io pipeline
○ Automate (lightweight) decompilation and reversing of binaries
○ Match back to the assumed source code
○ Find missing sources!
○ Also match against the larger PurlDB (millions of packages)
● Result = detect shaded JARs
○ Like unknown-unknown copies of Log4j still in the wild!
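As a much simpler illustration of the shaded-JAR problem, the sketch below looks only at entry names inside a JAR. It is not how back2source works (the slides describe decompilation and matching against sources and PurlDB). It assumes the relocated copy keeps the original `org/apache/logging/log4j/...` path under a new prefix, so it misses renamed packages and nested JARs. The function name and the sample entries are hypothetical.

```python
import io
import zipfile

# Class that implements the JNDI lookup abused in CVE-2021-44228.
MARKER = "org/apache/logging/log4j/core/lookup/JndiLookup.class"


def find_log4j_copies(jar) -> list[tuple[str, bool]]:
    """Return (entry name, is_relocated) for each copy of the marker class.

    `jar` may be a filesystem path or a binary file-like object.
    """
    hits = []
    with zipfile.ZipFile(jar) as archive:
        for name in archive.namelist():
            if name == MARKER:
                hits.append((name, False))
            elif name.endswith("/" + MARKER):
                # Same class path under an extra prefix: a likely shaded copy.
                hits.append((name, True))
    return hits


# Build a tiny in-memory "JAR" with one unrelated class and one relocated copy.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("com/example/App.class", b"")
    archive.writestr("shaded/" + MARKER, b"")
buffer.seek(0)

hits = find_log4j_copies(buffer)
```

A name-based check like this is cheap but fragile, which is the motivation for matching on code content instead.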
Demo:
Find binary, shaded log4j
with back2source
● FederatedCode is a federated database of package metadata
○ Decentralized data shared in many git repositories
■ Curation and pub/sub layer over ActivityPub
○ Keyed by PURL!
■ Metadata, dependencies, and vulnerabilities
■ And licenses, and scans!
○ EU-funded in part through the NLnet and NGI0 funds with financial support from the
European Commission's Next Generation Internet programme
■ https://nlnet.nl/project/FederatedSoftwareMetadata
○ Development at: https://github.com/aboutcode-org/federatedcode
Architecture to share data
Deploying FederatedCode to share it all
● OCCTET is a FOSS compliance toolkit to simplify compliance for SMEs
○ Collaboration between Eclipse Foundation, Bitsea, EXPERTWARE, the EUROPEAN DIGITAL
SME ALLIANCE, AboutCode, RED ALERT LABS, and DoubleOpen
○ EU-funded through the Digital Europe Programme (DIGITAL) / European Cybersecurity
Competence Centre (ECCC)
■ https://occtet.eu/
● Integrated open data and open tools, based on the AboutCode stack,
and open CRA compliance process guide with:
○ All the raw scan data for packages and vulnerabilities
○ Reduced compliance costs and efforts
■ Streamlined conformity assessment and reduced due diligence burden via shared
database of conformity assessment attestation of FOSS
● PURLValidator validates PURL syntax and checks PURLs against known PURLs by
exposing PurlDB's reference data of 20M+ PURLs
■ An accessible, single source of truth for the security and SBOM ecosystem at large that improves
the quality and accuracy of PURLs in use, which is imperative for CRA compliance
○ EU-funded through the NGI0 Commons fund with financial support from the European
Commission's Next Generation Internet programme
■ https://nlnet.nl/project/purlvalidator/
● Better PURL project extends the syntax validation and verifies the
existence of a known PURL, exposed as tools and utilities
○ Ensures package reference data is consistent across SBOMs and formats
○ Funded through corporate sponsors
Actionable SBOMs with validated PURLs
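The two checks described above, syntax validation and existence of a known PURL, can be sketched as follows. This is not the PURLValidator or PurlDB API. `parse_purl` is a simplified reading of the PURL syntax (`pkg:type/namespace/name@version?qualifiers#subpath`) that ignores qualifiers and subpath and applies no type-specific normalization. The `known` set is a hypothetical local stand-in for PurlDB's reference data.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import unquote

# A type starts with a letter, followed by letters, digits, '.', '+' or '-'.
TYPE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9.+-]*$")


class Purl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Parse a PURL string, raising ValueError on a syntax error."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError("a PURL must start with the 'pkg:' scheme")
    rest = rest.rsplit("#", 1)[0]  # drop the subpath (ignored in this sketch)
    rest = rest.rsplit("?", 1)[0]  # drop the qualifiers (ignored in this sketch)
    rest = rest.strip("/")
    ptype, sep, rest = rest.partition("/")
    if not TYPE_RE.match(ptype):
        raise ValueError(f"invalid package type: {ptype!r}")
    if not sep or not rest:
        raise ValueError("a PURL must have a name")
    version = None
    if "@" in rest:
        rest, raw_version = rest.rsplit("@", 1)
        version = unquote(raw_version) or None
    parts = rest.rsplit("/", 1)
    name = unquote(parts[-1])
    if not name:
        raise ValueError("a PURL must have a name")
    namespace = unquote(parts[0]) if len(parts) == 2 and parts[0] else None
    return Purl(ptype.lower(), namespace, name, version)


def is_known(purl: str, known: set[Purl]) -> bool:
    """Existence check: is the parsed PURL in the reference set?"""
    return parse_purl(purl) in known


# Hypothetical reference data; a real check would query a service such as PurlDB.
known = {parse_purl("pkg:npm/lodash@4.17.21")}
```

Separating the syntax check from the existence check mirrors the split described on the slide: a PURL can be well-formed and still point at a package or version that does not exist.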
What's next?
Collaborative knowledge sharing
● Developing validated and curated databases together
○ How to enable efficiencies for OSSelot and ClearlyDefined
● Federated system for vulnerabilities
○ How to avoid the CVE apocalypse
● Establishing benchmark repositories
○ How to set the pace for what we should expect from tools
● Beach cleaning
○ Scan large FOSS ecosystem registries for inconsistencies
■ Maven
■ Rust
● Create a shared registry of C/C++ packages
○ Keyed by PURL
● Engage with the community
○ AboutCode Slack:
https://join.slack.com/t/aboutcode-org/shared_invite/zt-2hjzc448i-SZULSuI0~h6YNSUnBWlAqA
○ Join OpenChain activities:
https://openchainproject.org/participate
Solve the problem(s) with open source tools
and open data
● More work to build a complete
end-to-end compliance solution:
○ Compliance of open source projects
with the CRA
○ Security by design and by default
● Start small and avoid complexity
○ Waste of resources
● Contribute to open source projects
○ https://github.com/aboutcode-org
○ https://github.com/Open-Source-Compliance
● Contribute code, documentation, bug reports
■ https://github.com/aboutcode-org
● Sponsor project maintainers
○ Accelerate development of new features and fund contributors
■ https://github.com/sponsors/aboutcode-org
○ Buy support, implementation, and advisory services to pay the
AboutCode maintainers
■ Email pombredanne@aboutcode.org
● Join the community:
■ https://join.slack.com/t/aboutcode-org/shared_invite/zt-2hjzc448i-SZULSuI0~h6YNSUnBWlAqA
■ https://gitter.im/aboutcode-org/discuss
AboutCode also needs your help!
"Dependency" by xkcd, used under CC BY-NC 2.5 /
Modified text from original
● The future? Let's work together!
○ New foundation to help you secure and automate your software
supply chains with open tools, open data, and open standards
○ Goal = Resolve code origin, licensing, and vulnerabilities as a
sustained and federated effort, so:
■ You can focus on the processes, policies, and automation –
not data corrections and low-level rescanning
■ Software teams can better support and secure their
software supply chains efficiently with reliable automation
○ Founding members to be announced soon!
■ Email hello@aboutcode.org for more information
AboutCode also needs your help!
"Dependency" by xkcd, used under CC BY-NC 2.5 /
Modified text from original
Questions?
Philippe Ombredanne
Lead Maintainer,
AboutCode
Connect on LinkedIn!
Project Nayuki (© 2024) QR-Code-generator [Source code]
(MIT). https://github.com/nayuki/QR-Code-generator

OpenChain Webinar - AboutCode - Practical Compliance in One Stack – Licensing, Vulnerabilities, and More
