Social Interactions around Cross-System Bug Fixings: The Case of FreeBSD and OpenBSD
  Gerardo Canfora, Luigi Cerulo, Marta Cimitile, Massimiliano Di Penta
  firstname.lastname@example.org
Context
  Source code is often reused across different systems:
    Unixes (FreeBSD, OpenBSD, Linux)
    Office applications (NeoOffice, OpenOffice)
    Desktop environment apps (KDE or GNOME apps)
  Maintenance might require propagating bug fixes from one system to another
  We call this “Cross-System Bug Fixing” (CSBF)
  Example:
    FreeBSD, 1996/01/19, file ip_icmp.h:
      – “Added definitions for ICMP router discovery. Reviewed by: wollman”
    OpenBSD, 1996/08/02, file ip_icmp.h:
      – “ICMP Router Discovery definitions; from FreeBSD”
What we propose
  A method to track CSBFs
  A study of the social characteristics and development activity of CSBF committers
    Social characteristics: degree, betweenness, brokerage
    Development activity: commits, lines changed
Detecting CSBF - I
  Step 1: mine cross-referencing commits
    openbsd, atphy.c, 2008/09/25 20:47:16, brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
  Step 2: mine commits previously performed on files with the same name in the other system
    freebsd, atphy.c, 2008/05/19 01:12:10, yongari, Add Attansic/Atheros F1 PHY driver.
    openbsd, atphy.c, 2008/09/25 20:47:16, brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
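Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Commit` record, the `MENTIONS_OTHER` patterns, and the function names are hypothetical, and "cross-referencing" is approximated here as "the commit note names the other system", which may be looser or stricter than the matching actually used in the study.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Commit(NamedTuple):
    system: str      # "freebsd" or "openbsd"
    file: str        # file name as it appears in the log
    date: datetime
    author: str
    note: str


# Assumption: a commit cross-references the other system when its note names it.
MENTIONS_OTHER = {
    "openbsd": re.compile(r"\bfreebsd\b", re.IGNORECASE),
    "freebsd": re.compile(r"\bopenbsd\b", re.IGNORECASE),
}


def referring_commits(commits):
    """Step 1: keep commits whose note mentions the other system."""
    return [c for c in commits if MENTIONS_OTHER[c.system].search(c.note)]


def candidate_sources(ref, commits):
    """Step 2: earlier commits on a file with the same name in the other system."""
    return [c for c in commits
            if c.system != ref.system and c.file == ref.file and c.date < ref.date]


# The two log records shown on the slide.
log = [
    Commit("freebsd", "atphy.c", datetime(2008, 5, 19, 1, 12, 10), "yongari",
           "Add Attansic/Atheros F1 PHY driver."),
    Commit("openbsd", "atphy.c", datetime(2008, 9, 25, 20, 47, 16), "brad",
           "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"),
]

refs = referring_commits(log)
candidates = candidate_sources(refs[0], log)
```

Here `refs` holds the OpenBSD commit by brad, and `candidates` holds the earlier FreeBSD commit by yongari on the same file name.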
Detecting CSBF - II
  Step 3: compute file similarity with clone detection
    Tool: CCFinder
    Threshold: at least 10% of cloned lines
  Step 4: take the previous change whose commit note has the highest textual similarity
    Vector space model with cosine similarity; threshold (0.20) to filter out unrelated commits
    Example (similarity = 0.72):
      Add Attansic/Atheros F1 PHY driver.
      Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
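The two thresholds in Steps 3 and 4 can be illustrated with a small Python sketch. It assumes raw term-frequency vectors over lowercased alphanumeric tokens, with no stop-word removal or term weighting; the slide does not say which weighting was used, so this sketch is not expected to reproduce the 0.72 score. Clone detection itself is not performed here: `is_cloned` only applies the 10% threshold to line counts that a tool such as CCFinder would supply. All function names are hypothetical.

```python
import math
import re
from collections import Counter

CLONE_THRESHOLD = 0.10  # slide: at least 10% of cloned lines
NOTE_THRESHOLD = 0.20   # slide: cosine similarity cut-off for commit notes


def tokens(note):
    """Lowercased alphanumeric tokens of a commit note (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", note.lower())


def cosine(note_a, note_b):
    """Cosine similarity between raw term-frequency vectors of two notes."""
    va, vb = Counter(tokens(note_a)), Counter(tokens(note_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def is_cloned(cloned_lines, total_lines):
    """Step 3 filter, given line counts produced by a clone detector."""
    return total_lines > 0 and cloned_lines / total_lines >= CLONE_THRESHOLD


def best_link(ref_note, candidate_notes):
    """Step 4: most similar earlier note above the threshold, or None."""
    scored = [(cosine(ref_note, n), n) for n in candidate_notes]
    scored = [s for s in scored if s[0] >= NOTE_THRESHOLD]
    return max(scored, default=None)


freebsd_note = "Add Attansic/Atheros F1 PHY driver."
openbsd_note = "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"
link = best_link(openbsd_note, [freebsd_note, "Fix typo in comment."])
```

With these assumptions the FreeBSD note is selected as the link and the unrelated note is filtered out, since it shares no token with the OpenBSD note.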
Building Committers Network
  We extract communication from mailing lists
    Bug fixing mailing lists
  Heuristic similar to that of Bird et al. to map inconsistent names / emails
    Also used to map committer IDs to mailing list names / emails
  Nodes of the network are labeled as:
    Committer / other mailing list contributor
    CSBF committer
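To give a feel for what identity unification involves, here is a deliberately simplified Python sketch. It is not the heuristic of Bird et al. and not necessarily what the study implemented: it assumes two identities belong to the same person when they share a normalised full name or an e-mail local part, and merges them with a union-find structure. The sample identities are invented for illustration, and matching on the local part alone can produce false merges (for example, generic addresses).

```python
import re
from collections import defaultdict


def match_keys(name, email):
    """Hypothetical matching keys: normalised name and e-mail local part."""
    found = set()
    norm = " ".join(re.findall(r"[a-z]+", name.lower()))
    if norm:
        found.add(("name", norm))
    local = email.split("@")[0].lower()
    if local:
        found.add(("local", local))
    return found


def merge_identities(identities):
    """Group (name, email) pairs that share at least one matching key."""
    parent = list(range(len(identities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}
    for i, (name, email) in enumerate(identities):
        for key in match_keys(name, email):
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    groups = defaultdict(list)
    for i, ident in enumerate(identities):
        groups[find(i)].append(ident)
    return list(groups.values())


# Invented identities, for illustration only.
people = merge_identities([
    ("Jane Doe", "jdoe@example.org"),
    ("jane doe", "jane@example.net"),
    ("J. Doe", "jdoe@example.com"),
    ("John Smith", "smith@example.org"),
])
```

The same idea extends to committer IDs by treating the ID as one more key to match against e-mail local parts.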
Empirical Study
  Goal: analyze the phenomenon of CSBFs
  Purpose: understand its relevance with respect to the social characteristics of the involved developers
  Context: CVS repositories and mailing list archives of FreeBSD and OpenBSD
    Period: 1993-2009 (FreeBSD), 1998-2009 (OpenBSD)
    Commits: 119,000 (FreeBSD), 70,000 (OpenBSD)
Research Questions
  RQ1: How do the source code committers and contributors of the two systems overlap?
  RQ2: How frequent is the phenomenon of CSBFs?
  RQ3: Who are the contributors involved in CSBFs?
  RQ4: Are mailing list contributors involved in CSBFs more active than others?
RQ1 – Team overlap
                                              FreeBSD  OpenBSD  Both
    Committers                                    383      211    26
    Mailing list contributors                    8035     3843   359
    Committers and mailing list contributors      213      122    17
  The two projects have less than 10% of contributors in common
  → the development teams of FreeBSD and OpenBSD are largely distinct
RQ2 – Commit filtering
  [Figure: bar chart of referring commits, cloned files, and linked commits for FreeBSD and OpenBSD; y-axis from 0 to 1000; data labels in the chart: 933, 439, 296, 133, 120, 59]
  At the end of the filtering, not that many, but...
RQ2 – Cloned lines in CSBF files
  [Figure: cloned lines in CSBF files; panels: C source files, header files]
  Percentage smaller for .h files
  Use of preprocessor conditionals to make header files system-dependent:
    #if defined(__FreeBSD__)
RQ3 – CSBF Graph (excerpt)
  Blue/cyan: FreeBSD
  Red/orange: OpenBSD
  Yellow: common
RQ3 – social characteristics
  Importance in terms of (in/out) degree: number of (incoming/outgoing) communication links
  Betweenness: number of communications for which the node lies on the shortest path
  Brokerage metrics: useful to analyze the communication between two clusters
    B is a coordinator
    B is a gatekeeper
    B is a representative
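The degree and brokerage metrics can be sketched in plain Python. The role definitions below are an assumption: they follow the standard Gould-Fernandez brokerage roles, where B brokers a path A -> B -> C with no direct A -> C link. B is a coordinator when A, B, and C are all in the same group, a gatekeeper when A is in another group while B and C share one, and a representative when A and B share a group and C is in another. The slide's own diagrams are not available here, so the study's exact definitions may differ. Node names, group labels, and function names are hypothetical. Betweenness is left out of the sketch; a graph library such as NetworkX provides it as `betweenness_centrality`.

```python
from collections import Counter


def degrees(edges):
    """In-degree and out-degree of every node in a directed edge list."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg


def brokerage(edges, group):
    """Count coordinator / gatekeeper / representative roles per broker B."""
    edge_set = set(edges)
    successors = {}
    for src, dst in edge_set:
        successors.setdefault(src, set()).add(dst)

    roles = {}
    for a, b in edge_set:
        for c in successors.get(b, ()):
            # Need three distinct nodes and no direct A -> C link.
            if a == b or c in (a, b) or (a, c) in edge_set:
                continue
            ga, gb, gc = group[a], group[b], group[c]
            if ga == gb == gc:
                role = "coordinator"
            elif ga != gb and gb == gc:
                role = "gatekeeper"
            elif ga == gb and gb != gc:
                role = "representative"
            else:
                continue  # other brokerage roles are not listed on the slide
            roles.setdefault(b, Counter())[role] += 1
    return roles


# Toy network: f* are FreeBSD contributors, o* are OpenBSD contributors.
group = {"f1": "FreeBSD", "f2": "FreeBSD", "f3": "FreeBSD",
         "o1": "OpenBSD", "o2": "OpenBSD"}
edges = [("f1", "f2"), ("f2", "f3"), ("o1", "f2"), ("f2", "o2")]

indeg, outdeg = degrees(edges)
roles = brokerage(edges, group)
```

In this toy network `f2` plays each of the three roles once: coordinator for f1 -> f2 -> f3, representative for f1 -> f2 -> o2, and gatekeeper for o1 -> f2 -> f3.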
RQ3 – social characteristics
  [Figure: bar chart comparing CSBF contributors and others on Degree, In-degree, Out-degree, Betweenness / 1000, Coordinator / 10, Gatekeeper, and Representative]
  All differences statistically significant
  High effect size (Cohen's d > 1)
  Contributors involved in CSBF have a higher importance in the communication and in the flow of communication between systems
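For readers unfamiliar with the effect size quoted above, Cohen's d is the difference between two group means divided by a standard deviation. The sketch below assumes the common pooled-standard-deviation variant; the slides do not say which variant was used. The function name and the sample values are illustrative and are not study data.

```python
import math
from statistics import mean, variance


def cohen_d(xs, ys):
    """Cohen's d with a pooled sample standard deviation (assumed variant)."""
    nx, ny = len(xs), len(ys)
    pooled_sd = math.sqrt(
        ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    )
    return (mean(xs) - mean(ys)) / pooled_sd


# Toy values only, e.g. a metric measured for two small groups.
d = cohen_d([2, 4, 6], [1, 2, 3])
```

By the usual rule of thumb, values of d around 0.8 or above are read as a large effect, which is why d > 1 is reported as a high effect size.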
RQ4 – change activity of CSBF committers and others
  [Figure: two bar charts, LOC added/removed and Commits, comparing CSBF committers and others for FreeBSD and OpenBSD]
  All differences statistically significant
  High effect size (Cohen's d ∼ 1)
  Contributors involved in CSBF are more active than others
Conclusions and Work-in-Progress
  We proposed a method to mine CSBFs
  We reported a study on FreeBSD and OpenBSD where:
    The development teams are almost disjoint
    There is a small, though not negligible, portion of CSBFs
    Committers involved in CSBFs have
      – Higher social importance
      – Higher brokerage level
      – Higher activity in source code commits
  Work-in-progress:
    Better approaches to identify implicit CSBFs, tracking and linking changes occurring on both systems
    A more extensive study on less obvious cases