Analyzing rich club behavior in open source projects

The network of collaborations in an open source project can reveal relevant emergent properties that influence its prospects of success.
In this work, we analyze open source projects to determine whether they exhibit a rich-club behavior, i.e., a phenomenon where contributors with a high number of collaborations (i.e., strongly connected within the collaboration network)
are likely to cooperate with other well-connected individuals. The presence or absence of a rich-club has an impact on the sustainability and robustness of the project.

For this analysis, we build and study a dataset with the 100 most popular projects on GitHub, exploiting connectivity patterns in the graph structure of collaborations that arise from commits, issues and pull requests. Results show that rich-club behavior is present in all the projects, but only a few of them have an evident club structure. We compute coefficients both for single-source graphs and for the overall interaction graph, showing that rich-club behavior varies across different layers of software development. We provide possible explanations of our results, as well as implications for further analysis.

Published in: Software

  1. Analyzing Rich-Club Behavior in Open Source Projects
     OpenSym 2019, the 15th International Symposium on Open Collaboration, Skövde, Sweden
     Mattia Gasparini (1), Javier Luis Cánovas Izquierdo (2), Robert Clarisó (2), Marco Brambilla (1), Jordi Cabot (2)
     (1) Politecnico di Milano, (2) Universitat Oberta de Catalunya
  2. Introduction
     • Git and GitHub data are used to analyze the evolution, success and management of open source software.
     • Define developers' behavioral patterns.
     • Discover how collaborations between developers work.
  3. Problem Statement
     • Analysis of collaboration networks
     • Commits, issues and pull requests as sources
     • Discover the presence of specific collaboration structures: rich-clubs
  4. Rich-club coefficient
     • Graph structural property: it represents the tendency of well-connected nodes (i.e., hubs) to interact with other well-connected nodes.
     • Formulation:
       $\phi(k) = \frac{2 E_k}{N_k (N_k - 1)}$
       $\rho(k) = \frac{\phi(k)}{\phi_{random}(k)}$
       $E_k$: number of edges between nodes of degree greater than or equal to $k$
       $N_k$: number of nodes with degree greater than or equal to $k$
       $\phi(k)$: rich-club coefficient
       $\rho(k)$: normalized rich-club coefficient
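
The formulation above can be evaluated directly from an edge list. The following is a minimal sketch in plain Python, assuming a simple undirected graph (self-loops and repeated edges are ignored); the function name `rich_club_phi` and the toy edge list are illustrative and do not come from the slides.

```python
from collections import defaultdict


def rich_club_phi(edges, k):
    """Unnormalized rich-club coefficient phi(k) = 2*E_k / (N_k*(N_k - 1)).

    edges: iterable of (u, v) pairs of an undirected graph.
    Returns None when fewer than two nodes have degree >= k,
    because the denominator would be zero.
    """
    # Collapse repeated edges and drop self-loops (simple-graph assumption).
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}

    degree = defaultdict(int)
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1

    # N_k: nodes with degree greater than or equal to k.
    club = {node for node, d in degree.items() if d >= k}
    n_k = len(club)
    if n_k < 2:
        return None

    # E_k: edges whose two endpoints are both in the club.
    e_k = sum(1 for edge in unique_edges if edge <= club)
    return 2 * e_k / (n_k * (n_k - 1))


# Toy graph (hypothetical): a triangle a-b-c plus a pendant node d attached to a.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
phi_at_2 = rich_club_phi(toy_edges, 2)  # club {a, b, c} is fully connected
```

In the toy graph the three nodes of degree at least 2 form a triangle, so all 3 of the 3 possible edges exist and `phi_at_2` equals 1.0. The normalized value $\rho(k)$ additionally requires $\phi_{random}(k)$ from a randomized graph, which this sketch does not compute.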
  5. Related Work
     • Rich-club phenomenon for a specific project [2], or for a single FLOSS community [3].
     • Study of the presence of a rich-club effect across the whole GitHub social network [4].
     • Analysis on open source communities exploiting email exchanges among participants [5].
     [2] Weifeng Pan, Bing Li, Yutao Ma, and Jing Liu. 2011. Multi-granularity evolution analysis of software using complex network theory.
     [3] Guido Conaldi. 2010. Flat for the few, steep for the many: Structural cohesion and Rich-Club effect as measures of hierarchy and control in FLOSS communities.
     [4] Antonio Lima, Luca Rossi, and Mirco Musolesi. 2014. Coding Together at Scale: GitHub as a Collaborative Social Network.
     [5] Sergi Valverde and Ricard V. Solé. 2007. Self-organization versus hierarchy in open-source social networks.
  6. Case Study
     • Top-100 starred projects in 2016 on GitHub
     • 926K commits produced by 50K Git users
     • 1.3M issue-related events generated by 118K GitHub users
     • 280K pull-request-related events generated by 20K GitHub users
  7. Analysis Pipeline
  8. Data Collection & Preprocessing
     • Git repository cloning for commit data, using Gitana
     • GitHub activity for issues and PRs, obtained by querying GHArchive
     • Duplicity and clashing problem
  9. Graphs Construction
     • Definition of 4 undirected graphs:
       a. PR graph
       b. Commits graph
       c. Issues graph
       d. Supergraph (a + b + c)
     • Nodes: users
     • Edges connect a pair of users if they interacted on the same element (issue, PR, file)
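
The construction rule above (one node per user, one edge per pair of users who touched the same element) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `(user, element)` record format, the function name `build_graph`, the sample data, and the reading of "a + b + c" as an unweighted union of edge sets are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(interactions):
    """Build an undirected, unweighted edge set from (user, element) records.

    Two users are linked when they interacted on the same element
    (an issue, a pull request, or a file).
    """
    users_by_element = defaultdict(set)
    for user, element in interactions:
        users_by_element[element].add(user)

    edges = set()
    for users in users_by_element.values():
        # Every pair of users sharing the element gets one edge.
        for u, v in combinations(sorted(users), 2):
            edges.add(frozenset((u, v)))
    return edges


# Hypothetical interaction records for one project.
pr_graph = build_graph([("ann", "pr1"), ("bob", "pr1"), ("cat", "pr2")])
commits_graph = build_graph(
    [("ann", "main.js"), ("bob", "main.js"), ("cat", "main.js")]
)
issues_graph = build_graph([("bob", "issue7"), ("dan", "issue7")])

# Assumption: the supergraph is the union of the three edge sets.
supergraph = pr_graph | commits_graph | issues_graph
```

Representing each edge as a `frozenset` keeps the graph undirected, so `("ann", "bob")` and `("bob", "ann")` collapse into a single edge, and the union drops edges that appear in more than one layer.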
  10. Graphs Example (figure): Materialize PR graph (a), commits graph (b), issues graph (c) and supergraph (d)
  11. Rich-club Coefficient Calculation
     • Calculation using the algorithm implementation included in NetworkX [6]
     • Normalized coefficient $\rho(k)$: the rich-club effect is relevant if $\rho(k) > 1$
     • Discard networks for which randomization fails
     [6] https://networkx.github.io/documentation/stable/reference/algorithms/rich_club.html
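
A minimal sketch of this step with NetworkX is shown below. It calls `networkx.rich_club_coefficient` with `normalized=True`, which compares the graph against a degree-preserving randomization. The wrapper name `normalized_rich_club`, the choice to return `None` when randomization fails, and the exceptions caught are assumptions; the slides do not say how failures were detected. Note also that the NetworkX documentation defines the club at degree strictly greater than k, whereas slide 4 uses greater than or equal to k, so thresholds may be shifted by one between the two conventions.

```python
import networkx as nx


def normalized_rich_club(graph, q=100, seed=0):
    """Return {k: rho(k)} for an undirected graph, or None if it cannot be computed.

    The graph is copied as a simple graph and self-loops are removed,
    since the NetworkX implementation rejects graphs with self-loops.
    """
    simple = nx.Graph(graph)
    simple.remove_edges_from(list(nx.selfloop_edges(simple)))
    try:
        return nx.rich_club_coefficient(simple, normalized=True, Q=q, seed=seed)
    except (nx.NetworkXException, ZeroDivisionError):
        # Assumption: treat a failed randomization (graph too small, too many
        # failed swaps, or a zero coefficient in the random graph) as
        # "coefficient not defined" and discard the network.
        return None


# A 3-node path is too small for the edge-swap randomization.
too_small = normalized_rich_club(nx.path_graph(3))

# A standard example graph shipped with NetworkX.
rho = normalized_rich_club(nx.karate_club_graph())
```

With this wrapper, a project whose graph yields `None` would simply be left out of the later statistics, mirroring the "discard" rule on the slide.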
  12. Rich-club Coefficient Results
     • 60 projects have a defined coefficient for the supergraph.
     • Each graph presents a rich-club effect, since $\rho(k) > 1$ for some $k$.
  13. Materialize [7]: Rich-Club Supergraph Coefficient
     The maximum normalized coefficient (k = 49) corresponds to the maximum club effect, among nodes of degree at least 49.
     [7] https://materializecss.com
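
Locating the threshold with the maximum normalized coefficient, and the users that make up the corresponding club, can be sketched as below. The function name `strongest_club` and all numbers are hypothetical and are not taken from the Materialize results; club membership follows the "degree at least k" reading used on this slide.

```python
def strongest_club(rho, degree):
    """Return (k_star, rho(k_star), members) for the threshold with the largest rho.

    rho:    dict mapping degree threshold k -> normalized coefficient rho(k)
    degree: dict mapping user -> degree in the same graph
    """
    k_star = max(rho, key=rho.get)
    # Club members: users whose degree is at least k_star.
    members = {user for user, d in degree.items() if d >= k_star}
    return k_star, rho[k_star], members


# Hypothetical values for illustration only.
rho = {1: 0.98, 2: 1.10, 3: 1.35, 4: 1.20}
degree = {"ann": 5, "bob": 3, "cat": 2, "dan": 4, "eve": 1}
k_star, value, members = strongest_club(rho, degree)
```

Here the maximum is reached at `k_star = 3`, so the club consists of the three users with degree 3 or more.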
  14. Materialize: Supergraph
  15. Swift [8]: Rich-Club Supergraph Coefficient
     [8] https://swift.org/
  16. Swift: Supergraph
  17. Rich-club Coefficient Results
  18. Further insights: Maximum coefficient distribution
     • Distribution of the maximum rich-club coefficient for each type of graph across the studied projects.
     • Mean value around 1 for the issues and commits graphs: weak rich-club presence.
     • Mean value around 1.4 for the PR graphs: strong rich-club presence.
  19. Further insights: Multi-club users
     • 25 of the 60 projects have a set of users belonging to multiple rich-clubs.
     • Distribution of multi-club users across the 25 projects.
     • These developers form a community with strong influence at each level of the project.
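
Identifying multi-club users can be sketched as a simple membership count. This sketch assumes that "multiple rich-clubs" means the clubs found in the different graphs (PR, commits, issues) of the same project, and that each club is already available as a set of users; the function name `multi_club_users` and the sample data are hypothetical.

```python
from collections import Counter


def multi_club_users(clubs_by_graph, min_clubs=2):
    """Return the users that belong to at least `min_clubs` rich-clubs.

    clubs_by_graph: dict mapping a graph name to the set of users in its club.
    """
    membership = Counter()
    for members in clubs_by_graph.values():
        # set() guards against a user listed twice in the same club.
        membership.update(set(members))
    return {user for user, count in membership.items() if count >= min_clubs}


# Hypothetical clubs for one project.
clubs = {
    "pr": {"ann", "bob"},
    "commits": {"ann", "cat"},
    "issues": {"ann", "bob", "dan"},
}
shared = multi_club_users(clubs)
```

In this example `ann` appears in three clubs and `bob` in two, so both are multi-club users, while `cat` and `dan` are not.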
  20. Conclusions
     First systematic evaluation of rich-club behavior in open source projects:
     • 60% of the projects show rich-clubs in the supergraph, mostly with a slight effect.
     • Rich-club behavior could undermine the open paradigm, but the phenomenon requires further analysis.
     • The strong rich-club presence in PR graphs may stem from the criticality of the activity.
     • 25 of the 60 projects have users belonging to multiple rich-clubs.
  21. Future Work
     • Weighted rich-club coefficient
     • Rich-club effect at module and ecosystem level
     • Time dimension to highlight temporal clubs
  22. Questions?
