Dipenta msr2011-csbf

Social Interactions around Cross-System Bug Fixings: the Case of FreeBSD and OpenBSD

Presentation Transcript

  • Social Interactions around Cross-System Bug Fixings: The Case of FreeBSD and OpenBSD
    Gerardo Canfora, Luigi Cerulo, Marta Cimitile, Massimiliano Di Penta
    dipenta@unisannio.it
  • Context
      • Source code is often reused across different systems
          – Unixes (FreeBSD, OpenBSD, Linux)
          – Office applications (NeoOffice, OpenOffice)
          – Desktop environment apps (KDE or GNOME apps)
      • Maintenance may require propagating bug fixes from one system to another
      • We call this “Cross-System Bug Fixing” (CSBF)
      • Example:
          – FreeBSD, 1996/01/19, file ip_icmp.h: “Added definitions for ICMP router discovery. Reviewed by: wollman”
          – OpenBSD, 1996/08/02, file ip_icmp.h: “ICMP Router Discovery definitions; from FreeBSD”
  • What we propose
      • A method to track CSBFs
      • A study of the social characteristics and development activity of CSBF committers
          – Social characteristics: degree, betweenness, brokerage
          – Development activity: commits, lines changed
  • Detecting CSBF - I
      • Step 1: mine cross-referencing commits
          – openbsd, atphy.c,2008/09/25 20:47:16,brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
      • Step 2: mine commits previously performed on files with the same name in the other system
          – freebsd,atphy.c,2008/05/19 01:12:10,yongari, Add Attansic/Atheros F1 PHY driver.
          – openbsd, atphy.c,2008/09/25 20:47:16,brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
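Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (system, file, timestamp, author, note) is inferred from the sample lines on the slide, and the Step 1 test (the commit note mentions the other system's name) is an assumed heuristic, since the slides do not spell out the exact matching rule. The names `Commit`, `parse`, `is_cross_reference`, and `earlier_candidates` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Commit(NamedTuple):
    system: str
    file: str
    when: datetime
    author: str
    message: str


# The two systems studied; each maps to the system it may borrow from.
OTHER = {"openbsd": "freebsd", "freebsd": "openbsd"}


def parse(line: str) -> Commit:
    # Split on the first four commas only, so commas inside the note survive.
    system, file, when, author, message = (p.strip() for p in line.split(",", 4))
    stamp = datetime.strptime(when, "%Y/%m/%d %H:%M:%S")
    return Commit(system.lower(), file, stamp, author, message)


def is_cross_reference(commit: Commit) -> bool:
    # Step 1 (assumed heuristic): the commit note names the other system.
    pattern = OTHER[commit.system]
    return re.search(pattern, commit.message, re.IGNORECASE) is not None


def earlier_candidates(ref: Commit, commits: list) -> list:
    # Step 2: earlier commits on a file with the same name in the other system.
    return [c for c in commits
            if c.system == OTHER[ref.system]
            and c.file == ref.file
            and c.when < ref.when]


# The two sample records shown on the slide.
LOG = [
    "freebsd,atphy.c,2008/05/19 01:12:10,yongari, Add Attansic/Atheros F1 PHY driver.",
    "openbsd, atphy.c,2008/09/25 20:47:16,brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@",
]

commits = [parse(line) for line in LOG]
for ref in filter(is_cross_reference, commits):
    for cand in earlier_candidates(ref, commits):
        print(ref.system, ref.author, "<-", cand.system, cand.author, cand.file)
```

Matching on the bare system name will also catch notes that mention the other system for unrelated reasons, which is presumably why the later steps filter the candidates further.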
  • Detecting CSBF - II
      • Step 3: compute file similarity with clone detection
          – Tool: CCFinder
          – Threshold: at least 10% of cloned lines
      • Step 4: take the previous change whose commit note has the highest textual similarity
          – Vector space model of the commit notes
          – Cosine similarity, with a threshold (0.20) to filter out unrelated commits
          – Example (similarity = 0.72):
              “Add Attansic/Atheros F1 PHY driver.”
              “Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@”
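Step 4 can be illustrated with a minimal cosine similarity over raw term counts. The slides do not say which term weighting, stemming, or stop-word handling was used, so this sketch will not reproduce the 0.72 reported above; it only shows the mechanics. The tokenizer, the `>=` comparison against the threshold, and the function names are assumptions.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list:
    # Assumed tokenizer: lowercase alphanumeric runs, no stemming or stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: str, b: str) -> float:
    # Cosine of the angle between the two term-count vectors.
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(ref_note: str, candidate_notes: list, threshold: float = 0.20):
    # Keep candidates at or above the threshold, then take the most similar one.
    scored = [(cosine(ref_note, note), note) for note in candidate_notes]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return max(kept, key=lambda pair: pair[0], default=None)


ref_note = "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"
candidates = [
    "Add Attansic/Atheros F1 PHY driver.",
    "Fix typo in comment.",  # hypothetical unrelated note
]
print(best_match(ref_note, candidates))
```

With these inputs the unrelated note shares no terms with the reference note and is dropped by the threshold, while the FreeBSD note is kept.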
  • Building Committers Network
      • We extract communication from mailing lists
          – Bug fixing mailing lists
      • A heuristic similar to that of Bird et al. [2006] maps inconsistent names / emails to a single contributor
      • The heuristic is also used to map committer IDs to mailing list names / emails
      • Nodes of the network are labeled as:
          – Committer / other mailing list contributor
          – CSBF committer
  • Empirical Study Goal: analyze the phenomenon of CSBFs Purpose: understanding its relevance with respect to the social characteristics of the involved developers Context: CVS repositories and mailing lists archives of FreeBSD and OpenBSD  Period: 1993-2009 (FreeBSD), 1998-2009 (OpenBSD)  Commits: 119,000 (FreeBSD), 70,000 (OpenBSD)
  • Research Questions
      • RQ1: How do the source code committers and contributors of the two systems overlap?
      • RQ2: How frequent is the phenomenon of CSBFs?
      • RQ3: Who are the contributors involved in CSBFs?
      • RQ4: Are mailing list contributors involved in CSBFs more active than others?
  • RQ1 – Team overlap

                                                  FreeBSD   OpenBSD   Both
      Committers                                      383       211     26
      Mailing list contributors                      8035      3843    359
      Committers and mailing list contributors        213       122     17

      • The two projects have less than 10% of their contributors in common → the development teams of FreeBSD and OpenBSD are largely disjoint
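The "less than 10%" figure can be checked with a short calculation. The slide does not say which denominator was used, so this sketch makes two assumptions: the per-system counts include the people who appear in both systems, and the overlap is measured against the number of distinct people across the two systems. The function name `shared_fraction` is hypothetical.

```python
def shared_fraction(n_freebsd: int, n_openbsd: int, n_both: int) -> float:
    # Assumption: per-system counts include the shared people, so the number
    # of distinct people is the sum minus the shared ones.
    distinct = n_freebsd + n_openbsd - n_both
    return n_both / distinct


# Counts taken from the RQ1 table: (FreeBSD, OpenBSD, Both).
ROWS = {
    "Committers": (383, 211, 26),
    "Mailing list contributors": (8035, 3843, 359),
    "Committers and mailing list contributors": (213, 122, 17),
}

for label, counts in ROWS.items():
    print(f"{label}: {shared_fraction(*counts):.1%}")
```

Under these assumptions every row stays below 10%. Measured against a single system's count instead, some rows exceed 10% (for example 26 of 211 OpenBSD committers), so the choice of denominator matters when reading the claim.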
  • RQ2 – Commit filtering
      [Bar chart: referring commits, cloned files, and linked commits for FreeBSD and OpenBSD; bar labels 933, 439, 296, 133, 120, 59]
      • After the filtering, not that many commits remain, but...
  • RQ2 – Cloned lines in CSBF files
      [Figure: percentage of cloned lines; panels: C source files, header files]
      • The percentage is smaller for .h files
      • Preprocessor conditionals are used to make header files system-dependent
          – #if defined(__FreeBSD__)
  • RQ3 – CSBF Graph (excerpt)
      [Figure: network graph. Blue/cyan: FreeBSD; red/orange: OpenBSD; yellow: common]
  • RQ3: social characteristics
      • Importance in terms of
          – (In/out) degree: number of (incoming/outgoing) communication links
          – Betweenness: number of shortest communication paths that pass through the node
          – Brokerage metrics: useful to analyze the communication between two clusters
      [Figure: brokerage roles of a node B: B is a coordinator, B is a gatekeeper, B is a representative]
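The degree and brokerage metrics can be sketched as follows. The slides name the roles but do not define them, so this example assumes the usual Gould-Fernandez reading: B brokers a pair (A, C) when A talks to B, B talks to C, and A does not talk to C directly; B is a coordinator when all three belong to the same group, a gatekeeper when only A is an outsider, and a representative when only C is an outsider. The toy network, the node names, and the function names are hypothetical.

```python
from collections import Counter


def degrees(edges):
    # In-degree and out-degree of each node in a directed communication graph.
    indeg, outdeg = Counter(), Counter()
    for src, dst in set(edges):
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg


def brokerage_roles(edges, group):
    # Count, for each node B, the (A, C) pairs it brokers, split by role.
    edge_set = set(edges)
    preds, succs = {}, {}
    for src, dst in edge_set:
        succs.setdefault(src, set()).add(dst)
        preds.setdefault(dst, set()).add(src)
    roles = {}
    for b in group:
        counts = Counter()
        for a in preds.get(b, ()):
            for c in succs.get(b, ()):
                # Assumed condition: three distinct nodes, no direct A -> C link.
                if a == b or c == b or a == c or (a, c) in edge_set:
                    continue
                ga, gb, gc = group[a], group[b], group[c]
                if ga == gb == gc:
                    counts["coordinator"] += 1
                elif ga != gb and gb == gc:
                    counts["gatekeeper"] += 1
                elif ga == gb and gb != gc:
                    counts["representative"] += 1
                else:
                    counts["other"] += 1  # roles not listed on the slide
        roles[b] = counts
    return roles


# Hypothetical toy network: b sits between FreeBSD and OpenBSD people.
GROUP = {"a1": "FreeBSD", "b": "FreeBSD", "c1": "FreeBSD",
         "o1": "OpenBSD", "o2": "OpenBSD"}
EDGES = [("a1", "b"), ("b", "c1"), ("o1", "b"), ("b", "o2")]

indeg, outdeg = degrees(EDGES)
print("b degrees:", indeg["b"], outdeg["b"])
print("b roles:", dict(brokerage_roles(EDGES, GROUP)["b"]))
```

Betweenness is left out of the sketch because it needs an all-pairs shortest-path computation; a graph library would normally be used for that.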
  • RQ3 – social characteristics
      [Bar chart comparing CSBF contributors with others on: Representative, Gatekeeper, Coordinator (/10), Betweenness (/1000), Out-degree, In-degree, Degree]
      • All differences are statistically significant
      • High effect size (Cohen's d > 1)
      • Contributors involved in CSBF have a higher importance in the communication and in the flow of communication between systems
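For readers unfamiliar with the effect size quoted above, Cohen's d is the difference between two group means divided by a standard deviation. The slides do not say which variant was used; this sketch assumes the common pooled-standard-deviation form, and the sample values are hypothetical, not data from the study.

```python
import math
import statistics


def cohen_d(xs, ys):
    # Difference of means divided by the pooled sample standard deviation.
    nx, ny = len(xs), len(ys)
    pooled_var = ((nx - 1) * statistics.variance(xs)
                  + (ny - 1) * statistics.variance(ys)) / (nx + ny - 2)
    return (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(pooled_var)


# Hypothetical degree values for two small groups of contributors.
csbf_degrees = [2, 4, 6]
other_degrees = [1, 2, 3]
print(round(cohen_d(csbf_degrees, other_degrees), 2))
```

By the usual rule of thumb, values around 0.8 or more are read as a large effect, which is consistent with the slide calling d > 1 a high effect size.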
  • RQ3 – committers with highest social metrics
  • RQ4 – change activity of CSBF committers and others
      [Bar charts: LOC added/removed and commits, for CSBF committers vs. others, in FreeBSD and OpenBSD]
      • All differences are statistically significant
      • High effect size (Cohen's d ∼ 1)
      • Contributors involved in CSBF are more active than others
  • Conclusions and Work-in-Progress
      • We proposed a method to mine CSBFs
      • We reported a study on FreeBSD and OpenBSD where:
          – The development teams are almost disjoint
          – There is a small, though not negligible, portion of CSBFs
          – Committers involved in CSBFs have
              – Higher social importance
              – Higher brokerage level
              – Higher activity in source code commits
      • Work-in-progress:
          – Better approaches to identify implicit CSBFs, tracking and linking changes occurring on both systems
          – A more extensive study of less obvious cases