© 2016-2020 Siemens AG, Linux Foundation - CC-BY-SA 4.0
The FOSSology Project
FOSSology & GSOC Journey
Shaheem Azmal M MD <shaheem.azmal@siemens.com>
Gaurav Mishra <mishra.gaurav@siemens.com>
Agenda
• FOSSology introduction
• New features since last year
• FOSSology Scanning in CI
• GSoC
• Conclusion
The Problem Actually
Distributing open source software requires you to:
∙ Provide licenses of involved software
∙ Provide copyright statements of involved authors
∙ Provide disclaimers
∙ … and much more
You know these examples
It is about finding licenses
∙ License texts
∙ References to licenses
∙ Written texts explaining licensing
∙ License relevant statements
Finding Licenses
What is FOSSology?
A Web server application for license and copyright compliance of software components.
FOSSology Project
https://www.fossology.org/
∙ First published in 2008 under GPL-2.0
∙ 2015: Linux Foundation collaboration project
∙ Web server based and command line interfaces
∙ Scanning agents searching for license and copyright relevant hits (and more …)
∙ A multi-user / multi-tenant Web UI for review and for organizing clearing jobs
FOSSology Development
https://www.github.com/fossology/fossology
▪ Standard Web application stack:
  ▪ Linux, Apache 2, PostgreSQL, PHP
  ▪ Web-based UI in PHP, but scanners written in C / C++
▪ Two ways to interact:
  ▪ Web user interface
  ▪ Command line utilities
How does FOSSology work?
• Uploading source code archive (*.zip, *.tar.gz, etc.)
• Agents scan for license relevant text
  • Copyrights, Export Control (ECC), your keywords to look for, etc.
• Review scanner results for wrong license classification
• Review other scanner findings (copyrights, ECC)
• Result of the “clearing”
  • SPDX reporting
  • Generated notice or readme file
  • debian-copyright
Upload Component
Agents Scanning
Review Results
Generate Reporting
Pass Report to Client
FOSSology Feature Overview
A Web server application for license and copyright compliance of software components.
License Scan features
∙ Regular expression scanner
∙ Text similarity scanner
∙ License (text) management
∙ Aggregation of licenses in hierarchical view
∙ License histogram
∙ Supporting concluded vs. found license
∙ Bulk processing of files with same licensing
∙ Reuse of license conclusions
Other features
▪ Copyright, authorship statements scanner
▪ Export control and customs scanner
▪ Command line interfaces
▪ Reporting
  ▪ SPDX RDF and tag-value
  ▪ Debian-copyright
  ▪ Plain text output
▪ Sorting files into buckets
▪ User, group and upload management
New Features of FOSSology since last year
▪ Update licenses from SPDX.
▪ Integration with corporate LDAP authentication.
▪ Change permissions for multiple uploads with a single click.
▪ Export all found copyright statements as CSV.
▪ Improvement of analysis report and standalone operations for different agents.
▪ New agent, ojo, to scan for SPDX-License-Identifier tags.
▪ Many improvements to the FOSSology REST API.
▪ And many more …
FOSSology Scanning In CI
Power of Open Source, benefits of automation
WHY?
Is the current way good enough?
Lots of code change
Preparation for release
Release
Perform license and copyright scanning
License conflict
Go or no go decision
New way
Ease the load with automation
New change
Bug fix
Feature
Continuous scan
Licenses
Copyrights
Keywords
Release Audit
Smooth
• Easy, fewer changes
Failure
Changes required
.gitlab-ci.yml
whitelist.json
Check out the documentation:
https://github.com/fossology/fossology/wiki/FOSSology-as-CI-scanner
.travis.yml
Pipeline status
GitLab
License check failure Oll Korrect
Pipeline status
Travis
License check failure Oll Korrect
Output
License failure
Output
Copyright failure
Potential whitelist file
Scanner availability
The following scanners are shipped with the runner
nomos
• Most trusted scanner in FOSSology
• Uses regular expressions and heuristics
ojo
• SPDX License Identifier scanner
• Can find licenses attached using WITH, AND, OR
• Uses regular expressions
• Lightning fast
copyright
• Very low false negative findings
• Can find email addresses and URLs too
• Uses regular expressions
keyword
• Helps in finding potentially harmful keywords like: licensed, modify it under, etc.
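The ojo idea of finding SPDX-License-Identifier tags with regular expressions can be sketched in a few lines of Python. This is a minimal illustration only: the pattern and the function name are assumptions made for this example, not the expressions or interfaces that ojo itself uses.

```python
import re

# Matches a tag such as "SPDX-License-Identifier: MIT OR Apache-2.0".
# Illustrative assumption: this is not the pattern ojo ships with.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([A-Za-z0-9.+\-() ]+)")

# Operators that combine identifiers inside an SPDX license expression.
OPERATORS = {"AND", "OR", "WITH"}


def find_spdx_licenses(text):
    """Return license and exception identifiers named in SPDX tags, in order of first appearance."""
    found = []
    for match in SPDX_TAG.finditer(text):
        # Parentheses only group sub-expressions, so treat them as separators.
        expression = match.group(1).replace("(", " ").replace(")", " ")
        for token in expression.split():
            if token.upper() not in OPERATORS and token not in found:
                found.append(token)
    return found
```

Calling find_spdx_licenses on a file's text returns every identifier joined by WITH, AND or OR, which is the list a CI check could then compare against the whitelist.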
Diff scanning
• Default scanning mode
• Scan only the diff created by the merge request
• Reduced set of data to scan
• Faster feedback at commit level for developers creating the changes
• Good for build CI pipelines
Repo scan
• Enabled with the repo flag
• Scan the complete repo at that particular commit
• Provides a good overview of the repo for audit work
• Can be scheduled with cron to run at set intervals
• Good for release/tag pipelines
Scanning modes
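Diff scanning looks only at what a merge request adds. A minimal sketch of that idea in Python, assuming the change is available as git-style unified diff text (with "diff --git" headers); the function name is hypothetical and the CI runner's own implementation may work differently.

```python
def added_lines_by_file(diff_text):
    """Collect the lines added by a git-style unified diff, grouped by target file."""
    added = {}
    current = None
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # A new file section starts; forget the previous file and hunk.
            current, in_hunk = None, False
        elif not in_hunk and line.startswith("+++ "):
            path = line[4:].strip()
            # Deleted files point to /dev/null and contribute no added lines.
            if path != "/dev/null":
                current = path[2:] if path.startswith("b/") else path
                added.setdefault(current, [])
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and current is not None and line.startswith("+"):
            added[current].append(line[1:])
    return added
```

Only the returned lines would be handed to the scanners, which is what keeps the data set small and the feedback fast. A repo scan would instead walk every file at the given commit.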
Whitelisting
licenses
• List of licenses which are whitelisted
• Each license needs to be explicitly listed to avoid false negatives
exclude
• Files to exclude from scan
• Configuration or test folders
• Understands file glob wildcard characters
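How the two whitelist keys could be applied to scanner findings is sketched below in Python. Only the key names "licenses" and "exclude" come from the slide; the sample values, the findings structure and the function name are assumptions made for this example, and the real runner may match paths differently.

```python
import fnmatch
import json

# Hypothetical whitelist content; only the two key names come from the slide.
WHITELIST_JSON = """
{
  "licenses": ["MIT", "GPL-2.0-only"],
  "exclude": ["tests/*", "*.json"]
}
"""


def check_findings(findings, whitelist):
    """Return {path: [licenses that are not whitelisted]} for files that are not excluded."""
    allowed = set(whitelist.get("licenses", []))
    patterns = whitelist.get("exclude", [])
    failures = {}
    for path, licenses in findings.items():
        # Note: fnmatch lets "*" match "/" as well, so "tests/*" covers nested folders.
        if any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns):
            continue
        # Exact string comparison: every accepted license must be listed explicitly.
        unknown = sorted(set(licenses) - allowed)
        if unknown:
            failures[path] = unknown
    return failures


whitelist = json.loads(WHITELIST_JSON)
```

A pipeline step built on this would fail when check_findings returns a non-empty result and pass otherwise.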
Benefits
Time
Frequent checks
Faster audit
Faster release
Fewer changes
Fewer errors
Sample projects and pipelines
GitLab:
https://gitlab.com/GMishx/fossology/-/merge_requests/2/pipelines
https://gitlab.com/GMishx/fossology/-/merge_requests/3/pipelines
Travis:
https://github.com/GMishx/fossology/pulls
https://travis-ci.com/github/GMishx/fossology/builds/173617637
https://travis-ci.com/github/GMishx/fossology/builds/173617688
Pull Request:
https://github.com/fossology/fossology/pull/1736
What is Google Summer of Code?
GSoC is an international annual program in which Google awards stipends to students who successfully complete a software coding project for an open source organization during the summer.
Disclaimer: All third party logos and icons referenced by this slide are the property of their respective owners. They are just used to highlight the UI.
GSoC Timeline
Preparations
Gathering ideas
● Involvement of community.
● Using GitHub issues
Finalizing
Idea selection for GSoC
● Filtering ideas
● Moving to Wiki
● Labelling issues
Application
Applying for GSoC
● Preparation of application
● New channel in Slack
Proposals
Collaboration on selection
● Gathering proposals
● Slack, email, GitHub issue
Background Google Summer of Code
For Students
∙ Experience in writing code
∙ Collaborate in an OSS project
∙ Work in a distributed environment
∙ Internship experience
∙ Internship stipend by Google
For Mentors
▪ Positive visibility
▪ Meet new students
▪ Extend the OSS community
▪ Experience distributed collaboration
GSoC 2018 & 2019
• Aman Jain: Atarashi
• Vivek: Spasht
• Sandeep: Software Heritage Agent
• Ayush: Atarashi Agent
GSoC 2020
• Ayush: Code Comment
• Kaushlendra: Atarashi enhancement
• Darshan: Dashboard
Reporting
Project Goals:
Weekly Progress Report:
Milestone achieved:
First Evaluation
Second Evaluation
Third Evaluation
● Student involvement in coding
● Bi-weekly meetings for interested students to discuss the community progress.
How students are helping us after GSoC:
● Messengers of FOSSology
● Mentoring interested students
● And continuous collaboration
Post GSoC
Atarashi
A step towards a non-rule-based standalone command line scanner … (https://github.com/fossology/atarashi)
Different methods for scanning license statements
• Unlike rule-based approaches, like Nomos, Atarashi implements multiple text statistics and information retrieval algorithms.
Distance finding algorithms
• Word Frequency Similarity
• Term frequency-inverse document frequency (tf-idf)
• Damerau–Levenshtein distance
• N-grams
Similarity finding algorithms
• Score Similarity
• Cosine Similarity
• Dice Similarity
• Bi-gram Cosine Similarity
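Two of the similarity measures named above, cosine similarity and Dice similarity, can be sketched with the Python standard library alone. These are the textbook definitions on token counts and token sets; whether Atarashi computes them in exactly this way is an assumption, and all function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(words):
    """Turn a word list into overlapping word pairs (2-grams)."""
    return [" ".join(pair) for pair in zip(words, words[1:])]


def cosine_similarity(left, right):
    """Cosine of the angle between two term-frequency vectors."""
    a, b = Counter(left), Counter(right)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def dice_similarity(left, right):
    """Sorensen-Dice coefficient on the sets of distinct terms."""
    a, b = set(left), set(right)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))
```

Passing bigrams(tokens(text)) instead of tokens(text) to cosine_similarity gives a bi-gram cosine score, which also takes word order into account.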
Atarashi: Workflow
Process input file
• Extract comments
• Normalize text
Match SPDX headers and SPDX identifiers
Apply distance finding algorithms
Rank results based on similarity
Generate the output
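The workflow steps above can be sketched end to end in Python. Everything here is a simplified assumption for illustration: the comment pattern is crude, the two reference texts are short stand-ins for full license texts, the scoring is a plain word-frequency overlap, and none of the names are taken from the Atarashi code base.

```python
import re
from collections import Counter

# Tiny stand-in corpus; a real scanner would use full reference license texts.
REFERENCE_TEXTS = {
    "MIT": "permission is hereby granted free of charge to any person obtaining a copy",
    "GPL-2.0-only": "this program is free software you can redistribute it and or modify it "
                    "under the terms of the gnu general public license",
}

# Crude comment matcher for /* ... */, // ... and # ... styles.
COMMENT = re.compile(r"/\*(.*?)\*/|//([^\n]*)|#([^\n]*)", re.DOTALL)
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([A-Za-z0-9.+\-]+)")


def extract_comments(source):
    """Step 1a: keep only the comment text of a source file."""
    return " ".join(part for groups in COMMENT.findall(source) for part in groups if part)


def normalize(text):
    """Step 1b: lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_frequency_similarity(words, reference):
    """Share of words the two texts have in common, counting repeats."""
    a, b = Counter(words), Counter(reference)
    total = max(sum(a.values()), sum(b.values()))
    return sum((a & b).values()) / total if total else 0.0


def scan(source):
    """Return (license, score) pairs, best match first."""
    comments = extract_comments(source)
    tag = SPDX_TAG.search(comments)
    if tag:
        # Step 2: an explicit SPDX identifier settles the question directly.
        return [(tag.group(1), 1.0)]
    words = normalize(comments)
    # Steps 3 and 4: score against every reference text, then rank.
    scores = {name: word_frequency_similarity(words, normalize(text))
              for name, text in REFERENCE_TEXTS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The list returned by scan corresponds to the final "generate the output" step; a real tool would serialize it, for example as JSON.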
Integration with ClearlyDefined (Spasht) and Software Heritage.
Making conclusions easier
FOSS-dash extracts meaningful data from fossology_DB and exports those metrics to InfluxDB, a time series database. Grafana is then used to query those metrics and visualize them as charts and graphs in the dashboard.
Dashboard
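InfluxDB accepts data points as text in its line protocol, so exporting a metric comes down to formatting one line per point. A minimal Python sketch of that formatting follows; the measurement, tag and field names in the usage note are hypothetical, and the metrics FOSS-dash actually exports are not described in this deck.

```python
def _escape(text, chars):
    """Backslash-escape the characters that are special in line protocol keys and tag values."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value):
    """Render a Python value as a line protocol field value."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(_escape(key, ",= ") + "=" + _field_value(value)
                    for key, value in fields.items())
    return f"{head} {body} {timestamp_ns}"
```

For example, to_line_protocol("license_count", {"license": "MIT"}, {"files": 42}, 1600000000000000000) yields a line that could be written to InfluxDB and later charted in Grafana.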
Thank you for your attention!
© 2016-2020 Siemens AG, The Linux Foundation
CC-BY-SA 4.0
https://creativecommons.org/licenses/by-sa/4.0/
Internet
https://www.fossology.org
GitHub
https://github.com/fossology/fossology
Further Links
https://www.spdx.org
https://www.openchainproject.org
https://github.com/eclipse/sw360
Contact:
FOSSology Mailing list
• fossology@fossology.org
Email us
• shaheem.azmal@siemens.com
• mishra.gaurav@siemens.com