
Empirically Analysing the Socio-Technical Health of Software Package Managers

Invited presentation at Concordia University (Montreal, Canada) by Eleni Constantinou and Tom Mens on recent research about the socio-technical health issues in software package management ecosystems.
Abstract: The large majority of today’s software relies on open source software components. Such components are typically distributed through package managers for a wide variety of programming languages, and developed and maintained through online distributed software development services like GitHub. Software component repositories are perceived as software ecosystems that constitute complex and evolving socio-technical software dependency networks. Because of their complexity and evolution, these ecosystems tend to suffer from a wide variety of software health issues that can be either technical or social in nature. Examples of such issues include ecosystem fragility due to exponential growth and transitive dependencies; the abundance of outdated, unmaintained or obsolete software components; the prolonged presence of unfixed bugs and security vulnerabilities; the abandonment or high turnover of key contributors; suboptimal collaboration between contributors; and many more. This presentation will report on our past and ongoing empirical research that studies such health factors within and across different software packaging ecosystems (such as npm, RubyGems, Cargo, CRAN, CPAN). We provide empirical evidence of some of the health problems, compare their presence across different ecosystems, and suggest ways to reduce their potential impact by providing concrete guidelines and tools. The presented research is being conducted by researchers of the Software Engineering Lab at the University of Mons in the context of two ongoing projects, SECOHealth and SECO-ASSIST, which aim to analyse and improve the health of software ecosystems.

Published in: Software

  1. 1. Tom Mens Eleni Constantinou Alexandre Decan Ahmed Zerouali Software Engineering Lab tom.mens@umons.ac.be @tom_mens econst@gmail.com @eleni_const Empirically Analysing the Socio-Technical Health of Software Package Managers
  2. 2. SECO-ASSIST “Excellence of Science” Research Project 2018-2021 secoassist.github.io @secoassist
  3. 3. Tom Mens University of Mons Eleni Constantinou University of Mons Bram Adams Polytechnique Montreal Josianne Marsan Laval University Towards an interdisciplinary, socio-technical methodology and analysis of the health of software ecosystems secohealth.github.io @secohealth
  4. 4. SOCIO-TECHNICAL A Software Ecosystem is X
  5. 5. TECHNICAL HEALTH Ecosystem Health Issues • Outdated dependencies • Security vulnerabilities • Bugs • Dependency hell • Co-installability issues • Backward incompatibility • Abandonment of contributors • Lack of communication / interaction • Social conflicts • Insufficient diversity
  6. 6. Comparing the health of dependency networks of popular software package managers
  7. 7. Comparing the health of dependency networks of popular software package managers A Decan et al. (Feb. 2019) An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging Ecosystems. Emp. Softw. Eng. 24(1), Springer A Decan, T Mens (May 2019) What do package dependencies tell us about semantic versioning? IEEE Trans. Softw. Eng. [In Press] A Zerouali et al. (Feb. 2019) A formal framework for measuring technical lag in component repositories – and its application to npm. J. Softw. Evolution and Process A Decan et al. (2018) On the impact of security vulnerabilities in the npm package dependency network. Int’l Conf. MSR E Constantinou et al. (2017) An empirical comparison of developer retention in the Rubygems and npm software ecosystems. Innovations in Systems and Softw. Eng. 13(2-3) E Constantinou, T Mens (2017) Socio-technical evolution of the Ruby ecosystem in GitHub. Int’l Conf. SANER
  8. 8. Libraries.io monitors 3,903,576 open source packages across 36 different package managers
  9. 9. Characterising the evolution of package dependency networks Decan & Mens (2019) An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging Ecosystems. Empirical Software Engineering 830K packages – 5.8M package versions – 20.5M dependencies (April 2017)
  10. 10. Characterising the evolution of package dependency networks • Growth rate • Package update frequency • Connectedness of dependency network • Prevalence of transitive dependencies
  11. 11. Motivation Package updates may cause many maintainability issues or even failures in dependent packages. "Especially with respect to package dependencies, the risk of things breaking at some point due to the fact that a version of a dependency has changed without you knowing about it is immense. That actually cost us weeks and months in a couple of professional projects I was part of."
  12. 12. March 2016 Unexpected removal of left-pad caused > 2% of all packages to become uninstallable (> 5,400 packages) November 2010 Release 0.5.0 of i18n broke dependent package ActiveRecord that was transitively required by >5% of all packages Motivation
  13. 13. Example: left-pad
  14. 14. Package dependency networks grow over time
  15. 15. Dependency networks grow over time
  16. 16. Package changes are frequent Findings • #package updates grows over time • >50% of package releases are updated within 2 months • Required and young packages are updated more frequently Changeability index: Maximal value n such that there exist n packages having been updated at least n times during the last month.
  17. 17. Most packages depend on other packages Findings • 60% to 80% of all packages are connected • A stable minority (20%) of required packages collect over 80% of all reverse dependencies • # npm dependencies grows much faster Reusability index: Maximal value n such that there exist n required packages having at least n dependent packages.
  18. 18. Proportion of top-level packages by depth of dependency tree Dependency network complexity Most of the complexity is deeply hidden … in the transitive dependencies. Over 50% of top-level packages have a deep dependency tree.
  19. 19. Package changes may have important transitive impact Evolution of 5-Impact Index Findings • Dependent packages have few direct but many transitive dependencies • Ratio of indirect over direct dependencies increases over time P-Impact Index : Number of packages that are transitively required by at least P% of all packages.
  20. 20. Avoid packages with outdated dependencies 1 out of 3 dependents never update their dependency A Zerouali et al. (Feb. 2019) A formal framework for measuring technical lag in component repositories – and its application to npm. Wiley Journal on Software Evolution and Process
  21. 21. Avoid packages with outdated dependencies https://chaoss.community Should package maintainers upgrade their dependencies to more recent versions? • Upgrades benefit from bug and security fixes • Upgrading allows new features to be used • Upgrading requires effort • Upgrading may introduce breaking changes
  22. 22. Semantic Versioning • For package providers: Inform your dependents about which releases are backwards incompatible • For package consumers: Decide and control which newer dependency releases are permitted major minor patch 3 9 2 Breaking changes Backwards compatible changes Bug fixes
  23. 23. Technical Lag in npm https://chaoss.community Technical lag measures how outdated a package or dependency is w.r.t. the “ideal” situation, where “ideal” = “most recent”; “most secure”; “most stable”; “most compatible”; …
  24. 24. All Dependencies Outdated Dependencies Technical Lag in npm Outdatedness is related to the type of dependency constraint being used Strict constraints represent about 20% of all dependencies, but about 33% of all outdated dependencies
  25. 25. Suggestion: Rely on semantic versioning when defining dependency constraints A Decan, T Mens (May 2019) What do package dependencies tell us about semantic versioning? Transactions on Software Engineering, IEEE [In Press] Does npm behave similarly to other package managers (Cargo, Packagist and RubyGems)? January 2018 dataset from libraries.io:
  26. 26. Suggestion: Rely on semantic versioning when defining dependency constraints Different package managers interpret version constraints in different ways: More restrictive than semver More permissive than semver
  27. 27. Suggestion: Rely on semantic versioning when defining dependency constraints Proportion of dependency constraints (for package releases ≥1.0.0) that are semver-compliant, more permissive, or more restrictive
  28. 28. Suggestion: Rely on semantic versioning when defining dependency constraints • Cargo, npm and Packagist are mostly semver-compliant. • All considered ecosystems become more compliant over time. • More than 16% of the constraints in npm, Packagist and RubyGems are restrictive, preventing backward compatible upgrades from being automatically adopted.
  29. 29. Security vulnerabilities Vulnerable code introduced in 2012 • Allowed anyone on the Internet to read the memory of the systems protected by OpenSSL • Simple programming mistake Discovered and traced in April 2014 • 0.5M servers certified by trusted authorities were believed to be affected
  30. 30. Security vulnerabilities Vulnerability introduction Vulnerability discovery Vulnerability publication Vulnerability fixed time
  31. 31. Security vulnerabilities in OSS
  32. 32. Security vulnerabilities in npm Package metadata from libraries.io (November 2017): 610,097 packages; 4,202,099 package releases; 20,240,402 runtime package dependencies. Vulnerabilities from snyk.io: 399 vulnerabilities; 269 affected packages; 14,931 releases of affected packages; 6,752 affected releases.
  33. 33. When are vulnerabilities discovered? Vulnerability introduction Vulnerability discovery Vulnerability publication Vulnerability fixed time
  34. 34. When are vulnerabilities discovered? >40% of all vulnerabilities are not discovered even 2 years after their introduction, regardless of their severity.
  35. 35. Tool support should strive to discover vulnerabilities sooner. When are vulnerabilities discovered?
  36. 36. When are vulnerabilities fixed? Vulnerability introduction Vulnerability discovery Vulnerability publication Vulnerability fixed time
  37. 37. When are vulnerabilities fixed? + Most vulnerabilities are quickly fixed after their discovery. - ~20% of vulnerabilities take more than 1 year to be fixed. Most vulnerabilities are fixed after the reported discovery date but before they become public.
  38. 38. When are vulnerabilities fixed? Vulnerabilities must be fixed early/before public announcement. Unmaintained vulnerable packages should be deprecated.
  39. 39. When are vulnerabilities fixed in dependent packages? Vulnerable packages: 269 vulnerable packages; 14,931 releases of vulnerable packages; 6,752 vulnerable releases; 133,602 dependent packages; 72,470 dependent packages affected by the vulnerable packages.
  40. 40. Unfixed dependent packages • Improper or too restrictive use of dependency constraints • Dependent package is no longer actively maintained • Maintainers of dependent packages are not aware of the vulnerability or the fix • Fixed version of the dependency contains incompatible changes
  41. 41. Security vulnerabilities in npm • Use security monitoring tools • Deprecate unmaintained obsolete packages • Use better versioning and security policies
  42. 42. Socio-technical evolution [Diagram: release histories (r1.0.0, r1.1.0, r1.2.0, r1.3.0, ...) of Projects 1 to 5]
  43. 43. Socio-technical evolution 144K packages 70K repositories 42K contributors E Constantinou, T Mens (Feb 2017) Socio-technical evolution of the Ruby ecosystem in GitHub. SANER 2017
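  44. 44. Socio-technical evolution metrics $\mathit{TeamRenewal}(t) = \frac{|\mathit{Joiners}(t)|}{|\{c \mid \mathit{isContr}(c, t)\}|}$; $\mathit{TeamAbandonment}(t) = \frac{|\mathit{Leavers}(t)|}{|\{c \mid \mathit{isContr}(c, t-1)\}|}$; $\mathit{ProjectRenewal}(t) = \frac{|\mathit{NewProjects}(t)|}{|\{p \mid \mathit{isActive}(p, t)\}|}$; $\mathit{ProjectAbandonment}(t) = \frac{|\mathit{ObsoleteProjects}(t)|}{|\{p \mid \mathit{isActive}(p, t-1)\}|}$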
  45. 45. Social analysis [Plot: TeamRenewal and TeamAbandonment per year, 2008 to 2016; y-axis from 0 to 1]
  46. 46. Technical analysis [Plot: ProjectRenewal and ProjectAbandonment per year, 2008 to 2016; y-axis from 0 to 1]
  47. 47. Socio-technical impact [Plot: Specialization per year, 2008 to 2015; y-axis from 0 to 4] Specialization (Kullback-Leibler divergence) Project i importance for contributor Cj Project i importance in ecosystem
  48. 48. Abandoner characteristics How to determine the characteristics of contributors abandoning the ecosystem? Survival analysis Population: contributors in an ecosystem Event: leaving the ecosystem E Constantinou, T Mens (2017) An empirical comparison of developer retention in the Rubygems and npm software ecosystems. Innovations in Systems and Software Engineering 13(2-3), Springer
  49. 49. Abandoner characteristics ~ 144K packages ~ 32K contributors ~ 462K packages ~ 64K contributors Measurements on: • Technical activity • Social activity Intensity Frequency Duration of inactivity
  50. 50. Abandoner characteristics [Survival curves for npm and RubyGems: survival probability (0 to 1) against duration of commit activity (months). One pair of panels groups contributors by Social inactivity / Social activity / Social abandoner; the other pair by Very Short / Short / Long / Very Long]
  51. 51. Abandoner characteristics Developers tend to abandon the ecosystem sooner if they: • do not communicate • communicate or commit less intensively • communicate or commit less frequently • do not communicate or do not commit for a longer period of time
  52. 52. Activity prediction
  53. 53. Summary
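
The changeability index (slide 16) and the reusability index (slide 17) share the same shape: the maximal n such that n items each reach a count of at least n. A minimal sketch of that computation follows; the function name `h_style_index` and all package names and counts are hypothetical, chosen only for illustration.

```python
from typing import Iterable


def h_style_index(counts: Iterable[int]) -> int:
    """Return the largest n such that at least n of the counts are >= n."""
    ordered = sorted(counts, reverse=True)
    n = 0
    for rank, value in enumerate(ordered, start=1):
        if value < rank:
            break
        n = rank
    return n


# Hypothetical data: number of updates per package during the last month.
updates_last_month = {"pkg-a": 5, "pkg-b": 3, "pkg-c": 3, "pkg-d": 1, "pkg-e": 0}
changeability = h_style_index(updates_last_month.values())

# Hypothetical data: number of dependent packages per required package.
dependents = {"lib-x": 40, "lib-y": 2, "lib-z": 1}
reusability = h_style_index(dependents.values())

print(changeability, reusability)  # 3 2
```

Sorting in descending order means the loop can stop at the first rank whose count falls short, since every later count is smaller still.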
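
The P-Impact Index from slide 19 counts the packages that are transitively required by at least P% of all packages. The sketch below computes it on a tiny, made-up dependency graph; the graph, the function names and the choice to count every package in the graph as "all packages" are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, List, Set


def transitive_dependencies(pkg: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Collect every package reachable from pkg by following dependencies."""
    seen: Set[str] = set()
    stack = list(deps.get(pkg, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, []))
    return seen


def p_impact_index(deps: Dict[str, List[str]], p: float) -> int:
    """Number of packages transitively required by at least p% of all packages."""
    packages = set(deps) | {d for targets in deps.values() for d in targets}
    if not packages:
        return 0
    required_by = {name: 0 for name in packages}
    for pkg in packages:
        for dep in transitive_dependencies(pkg, deps):
            if dep != pkg:  # ignore self-references introduced by cycles
                required_by[dep] += 1
    threshold = p / 100 * len(packages)
    return sum(1 for count in required_by.values() if count >= threshold)


# Hypothetical graph: two applications share one library, which needs a core package.
example = {"app1": ["lib"], "app2": ["lib"], "lib": ["core"], "core": []}
print(p_impact_index(example, 50))  # 2 (lib and core)
print(p_impact_index(example, 75))  # 1 (core only)
```

In this example `core` has no direct dependents other than `lib`, yet it is the only package reaching the 75% threshold, which is the transitive effect the slide points at.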
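
Slide 22 describes the major.minor.patch scheme, and slide 28 notes that restrictive constraints prevent backward compatible upgrades from being adopted automatically. The following sketch contrasts a semver-style constraint with a strict (pinned) one. It assumes plain `major.minor.patch` version strings at or above 1.0.0 with no pre-release tags; the function names are hypothetical and it does not reproduce the constraint syntax of any particular package manager.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Split 'major.minor.patch' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def semver_allows(installed: str, candidate: str) -> bool:
    """Accept any release that is not older and keeps the same major version."""
    old, new = parse(installed), parse(candidate)
    return new[0] == old[0] and new >= old


def strict_allows(installed: str, candidate: str) -> bool:
    """A pinned constraint accepts exactly one release."""
    return parse(installed) == parse(candidate)


print(semver_allows("3.9.2", "3.10.0"))  # True: minor release, backwards compatible
print(semver_allows("3.9.2", "4.0.0"))   # False: major release, may break
print(strict_allows("3.9.2", "3.9.3"))   # False: even a bug fix is rejected
```

Comparing versions as integer tuples avoids the string-ordering pitfall where "3.10.0" would sort before "3.9.2".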

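
Slide 23 defines technical lag as how outdated a dependency is with respect to an "ideal" release. As a rough illustration, the sketch below takes "ideal" to mean "most recent by release date" and measures lag both in days and in number of missed releases. The release dates and function names are invented for the example, and the published framework is more general than this.

```python
from datetime import date
from typing import Dict


def time_lag_days(releases: Dict[str, date], used: str) -> int:
    """Days between the release in use and the most recent release."""
    latest = max(releases.values())
    return (latest - releases[used]).days


def version_lag(releases: Dict[str, date], used: str) -> int:
    """Number of releases published after the release in use."""
    used_date = releases[used]
    return sum(1 for released in releases.values() if released > used_date)


# Hypothetical release history of one dependency.
history = {
    "1.0.0": date(2018, 1, 10),
    "1.1.0": date(2018, 3, 1),
    "2.0.0": date(2018, 6, 15),
}

print(time_lag_days(history, "1.1.0"))  # 106
print(version_lag(history, "1.1.0"))    # 1
print(time_lag_days(history, "2.0.0"))  # 0
```

Swapping the `max` over dates for another selection rule (for instance the most recent release without known vulnerabilities) would give the "most secure" variant mentioned on the slide.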
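
The team metrics on slide 44 are ratios over sets of contributors in consecutive periods. The sketch below assumes that joiners are contributors active at t but not at t-1, and that leavers are contributors active at t-1 but not at t; the slide does not spell out these definitions, so treat them, the function names and the contributor names as assumptions.

```python
from typing import Set


def team_renewal(previous: Set[str], current: Set[str]) -> float:
    """Share of the contributors at t who were not contributors at t-1."""
    if not current:
        return 0.0
    return len(current - previous) / len(current)


def team_abandonment(previous: Set[str], current: Set[str]) -> float:
    """Share of the contributors at t-1 who are no longer contributors at t."""
    if not previous:
        return 0.0
    return len(previous - current) / len(previous)


# Hypothetical contributor sets for two consecutive periods.
contributors_2015 = {"ann", "bob", "eve"}
contributors_2016 = {"bob", "eve", "dan", "kim"}

print(team_renewal(contributors_2015, contributors_2016))      # 0.5
print(team_abandonment(contributors_2015, contributors_2016))  # 1 of 3 left
```

The project-level metrics follow the same pattern with sets of active projects in place of sets of contributors.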
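
Slide 47 measures contributor specialization as a Kullback-Leibler divergence between the importance of each project for a contributor and its importance in the ecosystem. The sketch below shows the standard divergence on two discrete distributions. How "importance" is measured is not stated on the slide, so the example values are purely illustrative, and the base-2 logarithm is an assumption.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0:
            raise ValueError("q must be positive wherever p is positive")
        total += p_i * math.log2(p_i / q_i)
    return total


# Hypothetical importance of two projects in the ecosystem as a whole.
ecosystem = [0.5, 0.5]

generalist = [0.5, 0.5]  # effort spread like the ecosystem
specialist = [1.0, 0.0]  # all effort on a single project

print(kl_divergence(generalist, ecosystem))  # 0.0
print(kl_divergence(specialist, ecosystem))  # 1.0
```

A contributor whose effort mirrors the ecosystem scores 0, and the score grows as effort concentrates on fewer projects than the ecosystem-wide distribution would suggest.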