On Popularity and Quality Metrics of npm Packages

Postdoc Researcher
Oct. 24, 2022

Editor's Notes

  1. One of the most crippling choices that new developers, and even experienced ones, face is deciding which programming language to work in, which frameworks to use, and which libraries to learn. With literally thousands of libraries to choose from, each with its own pros and cons, it can be difficult to decide what to learn.
  2. Why it is important to pick the right software: often you can find many open-source options that appear to fit your need, but picking the wrong one can have expensive consequences. Learning new software and integrating it into your project takes a lot of time, and time is money. Among the questions developers ask when choosing new software are: SQ: Is this software library well tested and well written? SF: What functionality does it provide? SSD: Is it well documented? SP: Is it used by a lot of people?
  3. Out of all these reasons, popularity seems to be the most influential factor.
  4. Researchers interviewed developers involved in open source software ecosystems about their reasons for selecting software, and most answers were related to popularity and community reputation.
  5. But does this factor imply good software quality? Do popular software packages, for example in JavaScript, have good development quality? Let’s take an example: Sinon, a testing package ranked 14th, has good test coverage and all builds passing.
  6. Chai, which is also a testing library, has failing builds and lower test coverage than the package ranked 14th.
  7. To check whether this holds for many libraries, and whether there is indeed a link between software quality and popularity, we investigated this issue for packages hosted on the npm package manager. We chose npm because it is now the largest package registry in the world, and because JavaScript is one of the most used programming languages.
  8. We used two open source package tracking tools: libraries.io, which contains metadata on package dependencies extracted from 23 package managers, and npms.io, which is an open source …..
  9. The scores are calculated using many different metrics.
  10. For the data extraction, we had the choice between downloading the prepared data dump of 15 June 2017 and using the libraries.io API to get the latest information. Since little time had passed between June and October, and because libraries.io has a rate limit, we used the available dump. For npms.io, we used its API; downloading the data from npms.io was also really fast.
  11. After combining the data from both sources, of the 516,705 packages in the libraries.io dataset of 15 June 2017, we found 308,777 in npms.io as well. We observed that all packages in npms.io are hosted on GitHub, which is of great value to us, since our purpose is to analyze packages that evolve in the same.
  12. To empirically study the relationship between software quality and popularity, we consider the following research questions.
  13. To answer these questions, we used only Python for extracting, cleaning, and preparing the data, as well as for the analysis. To play with the data we used:
  14. Our aim with this preliminary question is to better understand the concept of popularity. To study popularity, we rely on the popularity score of npms.io. This score includes, among other metrics, the number of other npm packages that directly depend on a package. We also rely on the number of dependent repositories, a metric extracted from libraries.io; it counts the number of Git repositories that do not correspond to an npm package yet depend on the npm package under consideration. The package scores computed by npms.io are values between 0 and 1. To facilitate comparison, we normalize the libraries.io metric to a value between 0 and 1 as well. As shown in this figure, the scatter plot of npm package popularity in terms of community interest against the number of dependent repositories reveals a correlation between both kinds of popularity. To confirm this we calculated the … and found a strong correlation at R=0.81.
  15. We also found that.
  16. For the first question, we verified whether there is quantitative evidence of a relation between popularity, in terms of community interest and dependent external repositories, and quality in terms of …..
  17. We observe that most npm packages have low popularity within the community and very few external repositories depending on them, while most of them have a good quality score. We also checked statistically whether packages that do not use any dependency are different, but we could not find a statistically significant difference.
  18. To calculate the quality scores, high weights were given to carefulness and testing. To take a closer look at how these two metrics are distributed, we divided packages into quintiles by their popularity score. We found statistically that, for most categories, carefulness is higher than testing. We also checked whether carefulness and testing correlate with popularity across all packages, and we found only a weak linear correlation.
  19. After that, we studied the relation between maintenance activity and popularity. We expected packages under active maintenance to be more popular than packages that are no longer maintained. When checking the source code of npms.io, we found that it has difficulty evaluating packages whose repository has issues disabled or has zero issues. That is why, for this particular research question, we filtered them out. We investigated the relationship between releasing, committing, and fixing issues. We grouped all packages considered for this analysis into two categories of equal size, based on the median value of the commit frequency. We found that npm packages that commit frequently have good issue-fixing scores and also release frequently.
  20. Using the maintenance score, we checked whether we could find different distributions of the number of dependent npm packages and repositories. Similar to what we did before, we divided packages into quintiles according to their maintenance score. As shown in the figure, we could not find a relevant difference between the distributions, which means that maintenance does not have a large impact on the popularity of npm packages.
  21. To find out how deprecated npm packages are being handled, we identified all npm packages in the libraries.io dataset that have a “deprecated” status. Among all packages we found only 768. After that, we manually analyzed the descriptions of all packages that have the string ‘deprecat’ in their description. We filtered out packages designed to handle deprecation and found 1522 more deprecated packages. Of these deprecated packages, only 836 were found in our dataset.
  22. We analyzed their scores and popularity.
  23. After that, to find out how front-end packages differ, we extracted all packages hosted on Bower, the package manager dedicated to the front end. We then identified which of these packages are also on npm. Finally, we found 20,210 packages that are hosted on both Bower and npm and are in our dataset.
  24. For these packages and the other packages hosted on npm, we compared their maintenance, quality, and popularity scores. We found that front-end packages differ in size, age, and popularity: they are more popular than the other packages.
  25. Our results could be different when relying on other metrics that have been defined and implemented in a different way to quantify quality or popularity, since we only used metrics already computed by npms.io and libraries.io. We did not differentiate or classify the npm packages by their category or domain, which may impact our findings.
  26. This work presented an empirical analysis of software package popularity and quality in npm. Using the available data on libraries.io and npms.io, two open source services that provide software dependency tracking, we analyzed the characteristics of open source npm packages in order to investigate the relationship between quality and popularity within the npm ecosystem.
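
The combination step described in note 11 can be sketched roughly as follows. This is a minimal illustration, assuming each source has already been loaded into a dictionary keyed by package name. The function name `combine_sources`, the field names, and the sample records are hypothetical and are not taken from the original analysis.

```python
def combine_sources(libraries_io, npms_io):
    """Keep only packages present in both sources and merge their fields.

    Both arguments are assumed to be dicts mapping package name -> dict of fields.
    """
    common = libraries_io.keys() & npms_io.keys()
    return {name: {**libraries_io[name], **npms_io[name]} for name in sorted(common)}


# Hypothetical sample records, for illustration only.
libraries_io = {
    "pkg-a": {"dependent_repos": 120},
    "pkg-b": {"dependent_repos": 3},
    "pkg-only-in-libraries": {"dependent_repos": 0},
}
npms_io = {
    "pkg-a": {"popularity": 0.62, "quality": 0.80},
    "pkg-b": {"popularity": 0.05, "quality": 0.91},
}

combined = combine_sources(libraries_io, npms_io)
```

Packages that appear in only one source are dropped, which mirrors the reduction in package count reported in note 11.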
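
The normalization and correlation check in note 14 might look like the sketch below. The notes do not say how the normalization was done or which correlation coefficient was calculated, so min-max scaling and Pearson's r are assumptions made here for illustration. The helper names and the sample values are hypothetical.

```python
import math


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed; the notes elide which one was used)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: dependent repository counts and popularity scores.
dependent_repos = [0, 5, 20, 100]
popularity = [0.01, 0.10, 0.30, 0.90]

normalized_repos = min_max_normalize(dependent_repos)
r = pearson_r(normalized_repos, popularity)
```

Pearson's r is unchanged by min-max scaling, so under these assumptions the normalization matters for plotting both metrics on a shared axis and not for the coefficient itself.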
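
Notes 18 to 20 group packages into quintiles by a score, and into two equal halves around the median. One way to sketch this is a rank-based split, shown below. The function name `split_into_groups`, the tie-breaking by package name, and the sample scores are assumptions for illustration; the original analysis may have assigned group boundaries differently.

```python
def split_into_groups(scores, groups=5):
    """Assign each package a group index 0..groups-1 by ascending score rank.

    `scores` maps package name -> score. Ties are broken by package name
    (an arbitrary choice made here so that the groups stay equal in size).
    """
    ranked = sorted(scores, key=lambda name: (scores[name], name))
    n = len(ranked)
    return {name: (rank * groups) // n for rank, name in enumerate(ranked)}


# Hypothetical popularity scores for ten packages.
popularity = {f"pkg-{i}": i / 10 for i in range(10)}

quintiles = split_into_groups(popularity, groups=5)  # five groups of two packages
halves = split_into_groups(popularity, groups=2)     # split around the median
```

With `groups=2` the same helper gives the median split used for commit frequency in note 19.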
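
The two-stage identification of deprecated packages in note 21 can be sketched as below. This is only an illustration: the field names `status` and `description`, the function name `find_deprecated`, and the sample records are assumptions. The second set contains candidates only, since the notes say the descriptions were reviewed manually to exclude packages that exist to handle deprecation.

```python
def find_deprecated(packages):
    """Return (flagged, candidates).

    flagged:    packages whose status field says "deprecated".
    candidates: other packages whose description contains "deprecat";
                these still need manual review.
    """
    flagged = set()
    candidates = set()
    for name, meta in packages.items():
        status = (meta.get("status") or "").lower()
        description = (meta.get("description") or "").lower()
        if status == "deprecated":
            flagged.add(name)
        elif "deprecat" in description:
            candidates.add(name)
    return flagged, candidates


# Hypothetical records, for illustration only.
packages = {
    "old-widget": {"status": "Deprecated", "description": "A widget"},
    "legacy-tool": {"status": None, "description": "DEPRECATED: use new-tool instead"},
    "deprecation-helper": {"status": None, "description": "Utilities to print deprecation warnings"},
    "fresh-lib": {"status": None, "description": "A maintained library"},
}

flagged, candidates = find_deprecated(packages)
```

In this sample, `deprecation-helper` lands among the candidates even though it is not itself deprecated, which is the kind of false positive the manual review in note 21 is meant to remove.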