Software Ecosystems = Big Data!
Prof. Dr. Tom Mens
Software Engineering Lab
tom.mens@umons.ac.be
Big Data Analytics of
Software Ecosystem Health
A software ecosystem is a
collection of [interdependent]
software projects that are
developed and evolve together in
the same environment.
Mircea Lungu
(PhD, 2008)
Software Ecosystems = Big Data
The 4 Vs: Volume, Velocity, Variety, Veracity
Volume: software ecosystems involve huge
quantities of data
Debian (Linux distribution)
Archive http://snapshot.debian.org containing daily
snapshots of packages, maintainers, dependencies, ...
"Snapshot keeps growing. We are now at approximate 60TB of files.
This made it necessary to break up the RAID-1 mirror across two
external storage arrays ..., and it also meant we needed more machines
(now six) at our mirrorsite ..."
Debian bug tracker http://methyer.ethz.ch/bts/
• 122 thousand active bugs; 779 thousand archived bugs
Debian security tracker
• 29 thousand security vulnerabilities
Example: npm, the software package manager for JavaScript
Created in 2010.
In 2017:
• 3.5TB of storage required for hosting 500K packages
• 2.3 million opened GitHub pull requests for JavaScript
repositories
March 2018:
• ~0.7 million packages
• ~4.4 million package releases
• ~19.8 million (runtime) package dependencies
GHTorrent: (partial) data dump of GitHub from April 2018
• 24.1 million users
• 83.6 million projects
• 67.4 million issues
• 34 million pull requests
• 930 million commits
RubyGems software package manager for Ruby (since 2004)
Ecosystem size in March 2018:
• ~144 thousand packages
• ~825 thousand package releases
• ~2 million (runtime) package dependencies
Some data for the Ruby on Rails project:
68,980 commits
346 releases
3,570 contributors
> 11k issues; > 21k pull requests; >16k forks on GitHub
> 11k dependent packages; > 458k dependent repositories
Variety: software ecosystems involve very
heterogeneous data sources
• Structured data: source code, dependency graphs,
version control systems, ...
• Semi-structured data: e.g. mailing lists, online surveys,
social media, Q&A websites, ...
• Unstructured data: unformatted text, video and voice
recordings of interviews, field notes
Heterogeneity
• Source code, packaging metadata, models,
documentation, tests, databases, bug and issue reports, ...
• Multiple programming/natural languages
• Cultural differences
Veracity: software ecosystem analysis
requires dealing with uncertain, inconsistent,
invalid and missing data
Examples:
• Missing: Corrupted or lost historical data (voluntarily or
not).
E.g., removed projects/user profiles; "rebasing" the
version history
• Uncertain: Different data sources may disagree → which
one is correct?
• Invalid/inconsistent: Especially data produced by humans
Velocity: software ecosystems are growing rapidly
• New commits are made to GitHub several times every second
• For web-based development analytics dashboards, or
automated recommendation systems, using the most recent
data is important to make informed decisions
[Figure: number of packages (log scale) and number of dependencies (log scale) per year, 2012-2017, for cargo, cpan, cran, npm, nuget, packagist and rubygems]
A. Decan, T. Mens, Ph. Grosjean. An Empirical Comparison of Dependency Network
Evolution in Seven Software Packaging Ecosystems. Empirical Software Engineering, 2018
Many Challenges
• How to retrieve data?
• Data provider services (e.g. libraries.io, GHTorrent, GitHubArchive, ...)
• How to avoid "abusing" their APIs?
• How to deal with changes in APIs?
• How to deal with data veracity?
• How to analyse data?
• Storing and sharing such amounts of data requires specific hardware
• Processing data can take a lot of time; several weeks is not uncommon
• How to report such huge amounts of data?
• Aggregating results
• Adapted visualisation techniques
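
One common way to avoid "abusing" a data provider's API is to retry politely, waiting longer after each signal that the service is overloaded. The sketch below is only an illustration under stated assumptions: `fetch_with_backoff`, `RateLimited` and the `fetch` callable are hypothetical names and do not belong to any of the services listed above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical signal that the provider asked us to slow down."""


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), doubling the wait after each rate-limit signal."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the last allowed attempt
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so that the waiting behaviour can be checked without actually pausing; a real client would also need to honour whatever limits the provider documents.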
Challenges continued
• How to combine data originating from different sources?
• O(n*m) time complexity is not affordable because n and m
are too large
• How to identify "interesting" and "relevant" data?
• Need new data cleaning and data mining techniques
capable of dealing with this amount of data
• Providing "incremental" solutions that keep the extracted
data up-to-date
• Dealing with identities of individuals
• Identity merging
• Preserving privacy and anonymity
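
When two data sources share a key, the O(n*m) pairwise comparison mentioned above can often be avoided by indexing one source and scanning the other once. The following is a minimal sketch of such a hash join; the record layout, the `repo` key and the sample values are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]


def hash_join(
    left: Iterable[Record], right: Iterable[Record], key: str
) -> Iterator[Tuple[Record, Record]]:
    """Yield (left, right) pairs whose `key` values match.

    Builds an index over `left` once, then scans `right` once, so the
    expected cost is O(n + m + number of matches) instead of O(n * m).
    """
    index: Dict[str, List[Record]] = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    for row in right:
        for match in index.get(row[key], []):
            yield match, row


# Invented sample data: package metadata and repository metadata.
packages = [{"name": "a", "repo": "x/a"}, {"name": "b", "repo": "x/b"}]
repos = [{"repo": "x/b", "stars": "10"}, {"repo": "x/c", "stars": "3"}]
pairs = list(hash_join(packages, repos, "repo"))
```

This only works when the sources agree on an exact key; fuzzy matching (for example, merging identities with slightly different names) needs additional techniques.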
Research Context
• Today over 80 percent of all software in any technology product
or service is open source software (OSS).
• CHAOSS focuses on creating analytics and metrics to help
define OSS community health.
https://chaoss.community
"The CHAOSS community is developing metrics, methodologies, and
software for expressing open source project health and sustainability. By
doing so, CHAOSS seeks to improve the transparency of open source
project health and sustainability so that relevant stakeholders can make
more informed decisions about open source project engagement."
University of Mons, Laval University, Polytechnique Montréal
www.secohealth.org
@secohealth
2017-2019
1. Determine indicators of software health issues
2. Predict the impact and propagation of health issues
3. Provide recommendations and guidelines (best practices) to avoid future
software health problems
Ecosystem Health Issues
Technical
• Bugs
• Security vulnerabilities
• Dependency problems
• Abandoned or outdated software
• Redundant or duplicated code
• Incompatible software licences
• ...
Social
• Lack of communication / interaction
• Social conflicts
• Contributor abandonment
• Insufficient diversity
• Cultural differences
• ...
Example: leftpad
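
To gauge how far a problem in a single package (such as the leftpad example) can spread, one can compute the set of packages that depend on it directly or transitively. The sketch below uses a breadth-first search over a toy dependency mapping; apart from leftpad, all package names are made up, and `transitive_dependents` is a hypothetical helper.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def transitive_dependents(deps: Dict[str, List[str]], target: str) -> Set[str]:
    """Return every package that directly or indirectly depends on target.

    `deps` maps a package name to the packages it depends on.
    """
    # Invert the graph: dependency -> packages that require it.
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for package, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].add(package)

    affected: Set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for package in dependents.get(current, ()):
            if package not in affected:
                affected.add(package)
                queue.append(package)
    return affected


# Toy dependency network (invented for illustration).
toy_deps = {
    "app": ["framework"],
    "framework": ["string-utils"],
    "string-utils": ["leftpad"],
    "cli": ["leftpad"],
    "unrelated": [],
}
affected = transitive_dependents(toy_deps, "leftpad")
```

In this toy network only two packages use leftpad directly, yet four are affected once transitive dependencies are taken into account.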
seco-assist.github.io
@seco-assist
2018-2021
SECO-ASSIST aims to provide novel software recommendation
techniques to address the software ecosystem challenges of
longevity, scale, heterogeneity, and community. This will be
achieved by combining socio-technical analysis, database usage
analysis, library evolution and software test automation.
UMONS, UNamur, UAntwerpen, VUB
Tom Mens, Anthony Cleve, Coen De Roover, Serge Demeyer
• Improve software testing and prevent bugs
• Improve software library reuse
• Optimise database usage
• Improve developer team interaction
SECO-ASSIST Goals
Improve social health
• Retain key contributors and attract new ones
• Predict abandoners and find replacements
• Identify toxic contributors
• Ensure sufficient diversity
SECO-ASSIST Goals
Improve technical health
• Better software tests, taking into account the software
dependencies
→ fewer bugs and security issues
• Higher productivity and quality by using reusable
software libraries
• Increased maintainability by supporting upgrades and
migrations (to new libraries, other technologies, ...)
Current Research
Empirical studies on historical software ecosystem data
to analyse and understand
• Software contributor retention and abandonment
• The propagation of health problems through technical
dependencies in a software ecosystem
• The impact of "technical lag" caused by outdated
dependencies
• The impact of security vulnerabilities
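
"Technical lag" can be quantified in several ways. As a minimal sketch, and assuming a time-based measure, the code below computes the difference between the release date of the version in use and the date of the most recent release. The release history is invented, and `time_lag` is a hypothetical helper rather than the measure used in the studies cited in this deck.

```python
from datetime import date, timedelta
from typing import Dict


def time_lag(releases: Dict[str, date], used_version: str) -> timedelta:
    """Time between the release of used_version and the newest release."""
    newest_release_date = max(releases.values())
    return newest_release_date - releases[used_version]


# Invented release history for a hypothetical package.
releases = {
    "1.0.0": date(2017, 1, 10),
    "1.1.0": date(2017, 6, 1),
    "2.0.0": date(2018, 3, 1),
}
lag = time_lag(releases, "1.1.0")
```

A version-based measure (how many releases behind) could be derived from the same data by counting the releases published after the one in use.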
Analysing Security Vulnerabilities in npm
How long do packages
remain vulnerable?
It takes a long time before vulnerabilities are
removed from a package.
When are vulnerabilities fixed?
+ Most vulnerabilities are quickly fixed after their discovery.
- ~20% of vulnerabilities take more than 1 year to be fixed.
When are vulnerabilities fixed
in dependent packages?
Dependent packages remain vulnerable much
longer! Package maintainers must use security
monitoring tools, and adapt their dependency
constraints to quickly benefit from security fixes
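
Whether a dependent package picks up a security fix automatically depends on how its dependency constraint is written. The sketch below checks this for a simplified subset of npm-style constraints (exact pin, `~` and `^`) on plain `major.minor.patch` versions. It is an illustration, not a full semantic versioning implementation, and the function names are hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def upper_bound(constraint: str) -> Version:
    """Exclusive upper bound for a '^x.y.z' or '~x.y.z' constraint."""
    major, minor, patch = parse(constraint[1:])
    if constraint[0] == "~":
        return (major, minor + 1, 0)
    # Caret: the left-most non-zero component must not change.
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def accepts(constraint: str, candidate: str) -> bool:
    """True if candidate satisfies the (simplified) constraint."""
    if constraint[0] not in "^~":
        return parse(constraint) == parse(candidate)  # exact pin
    lower = parse(constraint[1:])
    return lower <= parse(candidate) < upper_bound(constraint)
```

For example, if a vulnerability in version 1.2.3 is fixed in 1.2.4, a dependent that pins `1.2.3` stays vulnerable until its maintainer edits the constraint, whereas `~1.2.3` or `^1.2.3` would accept the fixed release.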
References
• E. Constantinou, T. Mens. An Empirical Comparison of Developer
Retention in the RubyGems and npm Software Ecosystems.
Innovations in Systems and Software Engineering, 2017
• A. Decan, T. Mens, Ph. Grosjean. An Empirical Comparison of
Dependency Network Evolution in Seven Software Packaging
Ecosystems. Empirical Software Engineering, 2018
• A. Zerouali, E. Constantinou, T. Mens, G. Robles, J. Gonzalez-Barahona.
An empirical analysis of technical lag in npm package
dependencies. ICSR 2018
• A. Decan, T. Mens, E. Constantinou. On the impact of security
vulnerabilities in the npm package dependency network. MSR
2018
Questions?

Empirically Analysing the Socio-Technical Health of Software Package Managers
 

Software Ecosystems = Big Data

  • 2. Prof. Dr. Tom Mens Software Engineering Lab tom.mens@umons.ac.be Big Data Analytics of Software Ecosystem Health
  • 3. A software ecosystem is a collection of [interdependent] software projects that are developed and evolve together in the same environment. Mircea Lungu (PhD, 2008)
  • 4. Software Ecosystems = Big Data Volume Velocity Variety Veracity 4V
  • 5. Software Ecosystems = Big Data Volume: software ecosystems involve huge quantities of data Debian (Linux distribution) Archive http://snapshot.debian.org containing daily snapshots of packages, maintainers, dependencies, ... "Snapshot keeps growing. We are now at approximate 60TB of files. This made it necessary to break up the RAID-1 mirror across two external storage arrays ..., and it also meant we needed more machines (now six) at our mirrorsite ..." Debian bug tracker http://methyer.ethz.ch/bts/ • 122 thousand active bugs; 779 thousand archived bugs Debian security tracker • 29 thousand security vulnerabilities
  • 6. Software Ecosystems = Big Data Volume: software ecosystems involve huge quantities of data Example: software package manager for JavaScript Created in 2010. In 2017: • 3.5TB of storage required for hosting 500K packages • 2.3 million opened GitHub pull requests for JavaScript repositories March 2018: • ~0.7 million packages • ~4.4 million package releases • ~19.8 million (runtime) package dependencies
  • 7. Software Ecosystems = Big Data Volume: software ecosystems involve huge quantities of data GHTorrent: (partial) data dump of GitHub in April 2018 • 24.1 million users • 83.6 million projects • 67.4 million issues • 34 million pull requests • 930 million commits
  • 8. Software Ecosystems = Big Data Volume: software ecosystems involve huge quantities of data RubyGems software package manager for Ruby (since 2004) Ecosystem size in March 2018: • ~144 thousand packages • ~825 thousand package releases • ~2 million (runtime) package dependencies Some data for the Ruby on Rails project: 68,980 commits 346 releases 3,570 contributors > 11k issues; > 21k pull requests; >16k forks on GitHub > 11k dependent packages; > 458k dependent repositories
  • 9. Software Ecosystems = Big Data Variety: software ecosystems involve very heterogeneous data sources • Structured data: source code, dependency graphs, version control systems, ... • Semi-structured data: e.g. mailing lists, online surveys, social media, Q&A websites, ... • Unstructured data: unformatted text, video and voice recordings of interviews, field notes Heterogeneity • Source code, packaging metadata, models, documentation, tests, databases, bug and issue reports, ... • Multiple programming/natural languages • Cultural differences
  • 10. Software Ecosystems = Big Data Veracity: software ecosystem analysis requires dealing with uncertain, inconsistent, invalid and missing data Examples: • Missing: Corrupted or lost historical data (voluntarily or not). E.g., removed projects/user profiles; "rebasing" the version history • Uncertain: Different data sources may disagree → which one is correct? • Invalid/inconsistent: Especially data produced by humans
  • 11. Software Ecosystems = Big Data Velocity: software ecosystems are growing rapidly • New commits are made to GitHub several times every second • For web-based development analytics dashboards, or automated recommendation systems, using the most recent data is important to make informed decisions [Figure: number of packages (log scale) and number of dependencies (log scale), 2012 to 2017, for cargo, cpan, cran, npm, nuget, packagist and rubygems] A. Decan, T. Mens, Ph. Grosjean. An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging Ecosystems. Empirical Software Engineering, 2018
  • 12. Software Ecosystems = Big Data Many Challenges • How to retrieve data? • Data provider services (e.g. libraries.io, GHTorrent, GitHubArchive, ...) • How to avoid "abusing" their APIs? • How to deal with changes in APIs? • How to deal with data veracity? • How to analyse data? • Storing and sharing such amounts of data requires specific hardware • Processing data can take a lot of time; several weeks not uncommon • How to report such huge amounts of data? Aggregating results Adapted visualisation techniques
  • 13. Software Ecosystems = Big Data Challenges continued • How to combine data originating from different sources? O(n*m) time complexity not affordable because n and m too large • How to identify "interesting" and "relevant" data? • Need new data cleaning and data mining techniques capable of dealing with this amount of data • Providing "incremental" solutions that keep the extracted data up-to-date • Dealing with identities of individuals • Identity merging • Preserving privacy and anonymity
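The point about O(n*m) being unaffordable can be illustrated with a hash join, one common way to combine two data sources in roughly O(n + m) time (plus the size of the output) instead of comparing every pair of records. This is only a minimal sketch: the record fields (`repo_url`, `package`, `stars`) and the sample data are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dict records on `key` without comparing all pairs."""
    # Build an index over one source: a single pass over its m records.
    index = defaultdict(list)
    for record in right:
        index[record[key]].append(record)

    # Probe the index with the other source: a single pass over its n records.
    joined = []
    for record in left:
        for match in index.get(record[key], []):
            joined.append({**record, **match})
    return joined


# Hypothetical records from two different data sources.
packages = [
    {"repo_url": "github.com/a/x", "package": "x"},
    {"repo_url": "github.com/b/y", "package": "y"},
]
repositories = [
    {"repo_url": "github.com/a/x", "stars": 10},
    {"repo_url": "github.com/c/z", "stars": 3},
]

result = hash_join(packages, repositories, "repo_url")
print(result)
```

Only records whose key appears in both sources are kept. In practice the hard part is usually agreeing on a reliable shared key, which ties back to the veracity and identity-merging challenges listed above.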
  • 14. Research Context • Today over 80 percent of all software in any technology product or service is open source software (OSS). • CHAOSS focuses on creating analytics and metrics to help define OSS community health. https://chaoss.community "The CHAOSS community is developing metrics, methodologies, and software for expressing open source project health and sustainability. By doing so, CHAOSS seeks to improve the transparency of open source project health and sustainability so that relevant stakeholders can make more informed decisions about open source project engagement."
  • 15. University of Mons Laval University Polytechnique Montréal Université de Mons www.secohealth.org @secohealth 2017-2019
  • 17. Best Practices (over time) 1. Determine indicators of software health issues 2. Predict the impact and propagation of health issues 3. Provide recommendations and guidelines to avoid future software health problems
  • 18. Ecosystem Health Issues Technical: • Bugs • Security vulnerabilities • Dependency problems • Abandoned or outdated software • Redundant or duplicated code • Incompatible software licences • ... Social: • Lack of communication / interaction • Social conflicts • Contributor abandonment • Insufficient diversity • Cultural differences • ...
  • 21. seco-assist.github.io @seco-assist 2018-2021 SECO-ASSIST aims to provide novel software recommendation techniques to address the software ecosystem challenges of longevity, scale, heterogeneity, and community. This will be achieved by combining socio-technical analysis, database usage analysis, library evolution and software test automation.
  • 23. • Improve software testing and prevent bugs • Improve software library reuse • Optimise database usage • Improve developer team interaction UMONS UNamur UAntwerpen VUB
  • 24. SECO-ASSIST Goals Improve social health • Retain key contributors and attract new ones • Predict abandoners and find replacements • Identify toxic contributors • Ensure sufficient diversity
  • 25. SECO-ASSIST Goals Improve technical health • Better software tests, taking into account the software dependencies → fewer bugs and security issues • Higher productivity and quality by using reusable software libraries • Increased maintainability by supporting upgrades and migrations (to new libraries, other technologies, ...)
  • 26. Current Research Empirical studies on historical software ecosystem data to analyse and understand • Software contributor retention and abandonment • The propagation of health problems through technical dependencies in a software ecosystem • The impact of "technical lag" caused by outdated dependencies • The impact of security vulnerabilities
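To make the notion of "technical lag" caused by outdated dependencies more concrete, the sketch below computes one simple time-based variant: the number of days between the release date of the dependency version in use and the release date of the newest available version. This definition is an assumption chosen for illustration and is not necessarily the measure used in the cited studies; the release history is made up.

```python
from datetime import date


def time_lag_days(releases, used_version):
    """Days between the release of the version in use and the newest release."""
    latest_release_date = max(releases.values())
    return (latest_release_date - releases[used_version]).days


# Hypothetical release history of one dependency (version -> release date).
releases = {
    "1.0.0": date(2017, 1, 10),
    "1.1.0": date(2017, 6, 1),
    "2.0.0": date(2018, 3, 1),
}

lag = time_lag_days(releases, "1.1.0")
print(lag)
```

A package that depends on the newest release has a lag of zero under this definition. Aggregating such values over all dependencies of a package would be one possible way to summarise how outdated it is.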
  • 28. How long do packages remain vulnerable? It takes a long time before vulnerabilities are removed from a package.
  • 29. When are vulnerabilities fixed? + Most vulnerabilities are quickly fixed after their discovery. - ~20% of vulnerabilities take more than 1 year to be fixed.
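A figure such as "the share of vulnerabilities that take more than 1 year to be fixed" could be computed along the following lines. This is a rough sketch under stated assumptions: the record layout (`discovered`, `fixed`) and the sample data are invented for illustration and are unrelated to the study's dataset, and vulnerabilities that are still unfixed are simply left out of the calculation.

```python
from datetime import date


def share_fixed_after(vulnerabilities, days=365):
    """Fraction of fixed vulnerabilities whose fix took more than `days` days."""
    durations = [
        (v["fixed"] - v["discovered"]).days
        for v in vulnerabilities
        if v["fixed"] is not None  # unfixed vulnerabilities are excluded
    ]
    if not durations:
        return 0.0
    return sum(d > days for d in durations) / len(durations)


# Made-up vulnerability records.
vulnerabilities = [
    {"discovered": date(2016, 1, 1), "fixed": date(2016, 1, 15)},
    {"discovered": date(2016, 3, 1), "fixed": date(2017, 6, 1)},
    {"discovered": date(2017, 2, 1), "fixed": date(2017, 2, 3)},
    {"discovered": date(2017, 5, 1), "fixed": date(2017, 8, 1)},
    {"discovered": date(2017, 9, 1), "fixed": None},
]

share = share_fixed_after(vulnerabilities)
print(share)
```

Excluding unfixed vulnerabilities understates how long fixes take; a survival-analysis style treatment would handle them more carefully, but that is beyond this sketch.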
  • 30. When are vulnerabilities fixed in dependent packages? Depending packages are vulnerable much longer! Package maintainers must use security monitoring tools, and adapt their dependency constraints to quickly benefit from security fixes
  • 31. References • E. Constantinou, T. Mens. An Empirical Comparison of Developer Retention in the RubyGems and npm Software Ecosystems. Innovations in Systems and Software Engineering, 2017 • A. Decan, T. Mens, Ph. Grosjean. An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging Ecosystems. Empirical Software Engineering, 2018 • A. Zerouali, E. Constantinou, T. Mens, G. Robles, J. Gonzalez-Barahona. An empirical analysis of technical lag in npm package dependencies. ICSR 2018 • A. Decan, T. Mens, E. Constantinou. On the impact of security vulnerabilities in the npm package dependency network. MSR 2018