On the health of the npm packaging ecosystem

Presentation at DrupalCamp 2018 (Ghent) by Tom Mens (University of Mons) about lessons learned and guidelines based on a historical empirical analysis of the npm JavaScript packaging ecosystem, and the impact of technical problems in its package dependency network. This work is part of the SECOHealth and SECO-ASSIST research projects, co-financed by the FNRS-FRS.

  1. 1. Tom Mens, University of Mons, Belgium On the health of the npm packaging ecosystem
  2. 2. On the health of the packaging ecosystem Guidelines and lessons learned based on historical software data analytics Tom Mens Software Engineering Lab tom.mens@umons.ac.be T Mens E Constantinou A Decan @tom_mens
  3. 3. Research Context • Today over 80% of all software in any technology product or service is open source software (OSS). • CHAOSS focuses on creating analytics and metrics to help define OSS community health. https://chaoss.community "The CHAOSS community is developing metrics, methodologies, and software for expressing open source project health and sustainability. By doing so, CHAOSS seeks to improve the transparency of open source project health and sustainability so that relevant stakeholders can make more informed decisions about open source project engagement."
  4. 4. www.secohealth.org @secohealth Bilateral Research Project Wallonia-Canada 2017-2019
  5. 5. seco-assist.github.io @seco-assist "Excellence of Science" Research Project 2018-2021
  6. 6. Software Ecosystem Health Issues Technical: • Dependency problems • Unmaintained or outdated libraries • Security vulnerabilities • Bugs • Technical debt • Incompatible software licenses • ... Social: • Contributor abandonment / Bus factor • Lack of communication / interaction • Insufficient social diversity • Social conflicts • Cultural differences • ...
  7. 7. Motivation: leftpad
  8. 8. Motivation: dependency hell
  9. 9. Motivation: dependency hell
  10. 10. Motivation: dependency hell
  11. 11. Most packages depend on another one. ~60% in April 2016 Motivation: dependency hell
  12. 12. Motivation: micropackages
  13. 13. Motivation: breaking changes
  14. 14. Motivation: Security vulnerabilities Equifax security exploit in 2017 “attackers entered its system in mid-May through a web-application vulnerability that had a patch available in March. In other words, the credit-reporting giant had more than two months to take precautions that would have defended the personal data of 143 million people from being exposed. It didn’t.” Wired Magazine, “Equifax Has No Excuse”, September 2017 “Patching the security hole was labor intensive and difficult, in part because it involved downloading an updated version of Struts and then using it to rebuild all apps that used older, buggy Struts versions. Some websites may depend on dozens or even hundreds of such apps, which may be scattered across dozens of servers on multiple continents. Once rebuilt, the apps must be extensively tested before going into production to ensure they don’t break key functions on the site.” Ars Technica, “Failure to patch two-month-old bug led to massive Equifax breach”, September 2017
  15. 15. Understanding through Big Data Analytics npm = software package manager for JavaScript since 2010. In 2017: • 3.5 TB of storage required for hosting 500K packages • 2.3 million opened GitHub pull requests for JavaScript repositories. We analysed: • ~462 thousand packages • ~3 million package releases • ~13.6 million (runtime) package dependencies
  16. 16. Ecosystems grow rapidly For npm: Exponential growth of • #packages • #package updates • #dependencies # new packages per trimester # package updates per trimester Total # package dependencies
  17. 17. Ecosystems grow rapidly Package updates can be the cause of many maintainability issues or even failures in dependent packages ! # new packages per trimester # package updates per trimester Total # package dependencies
  18. 18. Issues in packages may have high transitive impact Average dependency depth for top-level packages Proportional dependency depth for top-level packages Many "top-level" packages have a high number of indirect (transitive) dependencies
  19. 19. Issues in packages may have high transitive impact March 2016: Unexpected removal of left-pad caused > 2% of all packages to break (> 5,400 packages) Number of packages that are transitively required by at least 5% of all packages
  20. 20. Lesson learned: Be wary of transitive dependencies! • Developers are often unaware of transitive dependencies • It just takes one such transitive package to break or compromise your software! Monitoring tools may help to detect and address such dependency issues
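The lesson on slide 20 can be made concrete with a small sketch. It assumes a hypothetical, hand-written dependency map (the package names other than left-pad are invented for illustration) and uses a plain breadth-first traversal; it is not the analysis pipeline used in the study.

```python
from collections import deque

# Hypothetical direct-dependency map: package -> packages it requires at runtime.
DIRECT_DEPS = {
    "my-app": ["web-framework", "logger"],
    "web-framework": ["router", "string-utils"],
    "logger": ["string-utils"],
    "router": ["left-pad"],
    "string-utils": ["left-pad"],
    "left-pad": [],
}


def transitive_dependencies(package, direct_deps):
    """Return every package reachable from `package`, excluding itself."""
    seen = set()
    queue = deque(direct_deps.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(direct_deps.get(dep, []))
    seen.discard(package)  # guard against dependency cycles
    return seen


def transitive_dependents(package, direct_deps):
    """Return every package that directly or indirectly requires `package`."""
    return {
        other
        for other in direct_deps
        if other != package
        and package in transitive_dependencies(other, direct_deps)
    }
```

In this toy graph, `my-app` declares only two direct dependencies but `transitive_dependencies("my-app", DIRECT_DEPS)` contains five packages, and `transitive_dependents("left-pad", DIRECT_DEPS)` contains every other package, which is the kind of hidden reach the left-pad incident exposed.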
  21. 21. Security vulnerabilities • When are vulnerabilities discovered in npm? • When are vulnerabilities fixed in npm? • When do dependent packages adopt a fixed release? SOURCE: A Decan, T Mens, E Constantinou (2018) IEEE Int'l Conf. Mining Software Repositories "On the impact of security vulnerabilities in the npm package dependency network" "37% of websites include a JavaScript library with a known open source vulnerability." T. Lauinger et al. "Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web", NDSS 2017.
  22. 22. Vulnerability introduction Vulnerability discovery Vulnerability publication Vulnerability fixed time When are vulnerabilities discovered in npm ?
  23. 23. When are vulnerabilities discovered in npm ? >40% of all vulnerabilities are not discovered even 2 years after their introduction, regardless of their severity. It takes a long time to discover vulnerabilities regardless of their severity
  24. 24. Vulnerability introduction Vulnerability discovery Vulnerability publication Vulnerability fixed time When are vulnerabilities fixed in npm ?
  25. 25. When are vulnerabilities fixed in npm? Most vulnerabilities are fixed quickly, and before becoming public. 1 out of 5 takes more than a year to be fixed → unmaintained packages that should be deprecated
  26. 26. When do dependent packages adopt a fixed release? 1 out of 3 dependents never update their dependency on a vulnerable package to a fixed release • Improper or too restrictive use of dependency constraints • Dependent package is no longer actively maintained • Maintainers of dependent packages are unaware of the vulnerability or the fix • Fixed package version is incompatible
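The timeline from slides 22 to 25 (introduction, discovery, publication, fix) can be expressed as a small data structure. This is a minimal sketch: the `Vulnerability` class, its field names, and the example dates are assumptions made for illustration, not the data model of the cited study.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Vulnerability:
    introduced: date       # first release containing the vulnerable code
    discovered: date       # date the vulnerability was reported
    published: date        # date the report became public
    fixed: Optional[date]  # first release with a fix, None if still unfixed

    def days_to_discover(self) -> int:
        """Days between introduction and discovery."""
        return (self.discovered - self.introduced).days

    def days_to_fix(self) -> Optional[int]:
        """Days between discovery and fix, or None if still unfixed."""
        if self.fixed is None:
            return None
        return (self.fixed - self.discovered).days

    def fixed_before_publication(self) -> bool:
        """True when a fix was released no later than the public disclosure."""
        return self.fixed is not None and self.fixed <= self.published


# Hypothetical example: found more than two years after introduction,
# then fixed two weeks later, before the report became public.
example = Vulnerability(
    introduced=date(2015, 1, 10),
    discovered=date(2017, 3, 1),
    published=date(2017, 4, 1),
    fixed=date(2017, 3, 15),
)
```

Here `example.days_to_discover()` exceeds 730 days while `example.days_to_fix()` is 14, mirroring the pattern reported on the slides: slow discovery, quick fix.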
  27. 27. Technical Lag (a.k.a. dependency freshness) Goal • Study, at an ecosystem level, how outdated npm software packages are with respect to their upstream dependencies. • Study to what extent semantic versioning is respected. SOURCE: A Decan, T Mens, E Constantinou (2018) IEEE Int'l Conf. Software Maintenance and Evolution "On the evolution of technical lag in the npm package dependency network" Technical lag is caused by dependency constraints preventing the use of a more recent package version
  28. 28. Technical Lag Main findings • 1 out of 4 package dependencies suffers from technical lag • 1 out of 4 package releases has a technical lag of more than 9 months • Minor and patch updates tend to increase technical lag, even though they are supposed to be backward compatible • Major updates tend to reduce technical lag
  29. 29. Technical Lag Actionable results • Appropriate use of version constraints could reduce technical lag in 17% of all releases • Dependency monitoring tools should inform developers of technical lag and help to reduce it. • Package maintainers should help dependent packages to upgrade to new releases as easily as possible. • Package maintainers should backport important bug and security fixes to earlier major releases.
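One way to read the technical lag slides is as a time difference: how much newer is the latest release of a dependency than the newest release the constraint still allows? The sketch below assumes that time-based reading, a hypothetical release history, and a simplified caret rule (`^base`, only for major versions of 1 or higher); it is an illustration, not the measurement used in the paper.

```python
from datetime import date

# Hypothetical release history of one dependency: (version, release date).
RELEASES = [
    ((1, 0, 0), date(2017, 1, 10)),
    ((1, 1, 0), date(2017, 3, 5)),
    ((1, 1, 1), date(2017, 4, 20)),
    ((2, 0, 0), date(2017, 9, 1)),
    ((2, 1, 0), date(2018, 2, 15)),
]


def satisfies_caret(version, base):
    """Simplified caret rule (^base) for major >= 1: same major, not older than base."""
    return version[0] == base[0] and version >= base


def technical_lag_days(releases, base):
    """Days between the newest release allowed by ^base and the highest-numbered release."""
    allowed = [(v, d) for v, d in releases if satisfies_caret(v, base)]
    if not allowed:
        raise ValueError("no release satisfies the constraint")
    _, allowed_date = max(allowed)   # tuples compare by version first
    _, latest_date = max(releases)
    return (latest_date - allowed_date).days
```

With this data, a dependent constrained to `^1.0.0` is stuck on 1.1.1 and `technical_lag_days(RELEASES, (1, 0, 0))` returns 301 days, whereas moving the constraint to `^2.0.0` brings the lag to 0. The constraint, not the dependency itself, is what creates the lag.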
  30. 30. Be prudent ! • Only add a dependency if it is really needed • Avoid too many (transitive) dependencies • Avoid adding dependencies to problematic packages • too high technical lag • security vulnerabilities • unmaintained or deprecated packages Guidelines and lessons learned
  31. 31. Be agile ! • Detect and fix vulnerabilities early • Embrace semantic versioning • Use (transitive) dependency monitoring tools to review your dependencies regularly • Integrate these tools in your Continuous Integration process Guidelines and lessons learned
  32. 32. Be communicative ! • Inform your dependents about • incompatible upgrades: by adhering to semantic versioning • planned updates • deprecated features • Help your dependents to upgrade more easily • Provide (automated) migration guidelines • Provide alpha/beta releases • Test your changes on dependents before releasing updates Guidelines and lessons learned
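The advice to signal incompatible upgrades through semantic versioning depends on dependents being able to tell what kind of update a release is. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` version strings without pre-release or build tags (the function names are hypothetical):

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def classify_update(old, new):
    """Classify the step from `old` to `new` as 'major', 'minor' or 'patch'."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError("new version must be greater than old version")
    if new_v[0] != old_v[0]:
        return "major"  # may contain incompatible changes
    if new_v[1] != old_v[1]:
        return "minor"  # new functionality, meant to be backward compatible
    return "patch"      # bug fixes, meant to be backward compatible
```

A monitoring script could, for example, flag every `classify_update(...) == "major"` result for manual review while letting minor and patch updates through automatically.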
  33. 33. SoHeal 2019 2nd International ICSE Workshop on Software Health Montreal, Canada, 28 May 2019 • Position papers: 1 February 2019 • Industry/practitioner talk proposals: 15 February 2019 https://soheal.github.io @iw_soheal What? Software Health encompasses many socio-technical aspects: success, longevity, growth, resilience, survival, diversity, sustainability, popularity, inclusiveness ... Why? • Raise awareness of software health • Present tools, methods, practical experiences, ... • Advance body of knowledge on software health. Who? Open Source Community Members, Industry and Academia
