Dipenta msr2011-csbf
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share

Dipenta msr2011-csbf

  • 559 views
Uploaded on

Social Interactions around Cross-System Bug Fixings: the Case of FreeBSD and OpenBSD

Social Interactions around Cross-System Bug Fixings: the Case of FreeBSD and OpenBSD

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
559
On Slideshare
559
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
3
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Social Interactions aroundCross-System Bug Fixings: The Case of FreeBSD and OpenBSD Gerardo Canfora, Luigi Cerulo,Marta Cimitile, Massimiliano Di Penta dipenta@unisannio.it
  • 2. Context  Source code is often reused across different systems  Unixes (FreeBSD, OpenBSD, Linux)  Office applications (NeoOffice, OpenOffice)  Desktop environment apps (KDE or GNOME apps)  Maintenance might require to propagate bug fixings  We call this “Cross System Bug Fixing” (CSBF)  Example:  FreeBSD, 1996/01/19, file ip_icmp.h: – “Added definitions for ICMP router discovery. Reviewed by: wollman  OpenBSD, 1996/08/02, file ip_icmp.h: – “ICMP Router Discovery definitions; from FreeBSD”
  • 3. What we propose  A method to track CSBFs  A study on the social characteristics and development activity made by CSBF committers  degree, betweenness, brokerage  commits, lines changed
  • 4. Detecting CSBF - I  Step 1: mining cross-referencing commits  openbsd, atphy.c,2008/09/25 20:47:16,brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@  Step 2: mine commits previously performed on files with same name in the other system  freebsd,atphy.c,2008/05/19 01:12:10,yongari, Add Attansic/Atheros F1 PHY driver.  openbsd, atphy.c,2008/09/25 20:47:16,brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
  • 5. Detecting CSBF - II  Step 3: compute file similarity with clone detection  CCFinder  Threshold: at least 10% of cloned lines  Step 4: take the previous change with the highest textual similarity in the commit note  Use of Vector Space models  Cosine similarity; threshold (0.20) to filter out unrelated commits Add Attansic/Atheros F1 PHY driver. = 0.72 Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
  • 6. Building Committers Network  We extract communication from mailing lists  Bug fixing mailing lists  Heuristic similar to the one of Bird et al. [2006] to map inconsistent namings / emails  Also, to map committer Ids to mailing list names/emails  Nodes of the network labeled as:  Committer / other mailing list contributors  CSBFs committer
  • 7. Empirical Study Goal: analyze the phenomenon of CSBFs Purpose: understanding its relevance with respect to the social characteristics of the involved developers Context: CVS repositories and mailing lists archives of FreeBSD and OpenBSD  Period: 1993-2009 (FreeBSD), 1998-2009 (OpenBSD)  Commits: 119,000 (FreeBSD), 70,000 (OpenBSD)
  • 8. Research Questions  RQ1: How do the source code committers and contributors of the two systems overlap?  RQ2: How frequent is the phenomenon of CSBFs?  RQ3: Who are the contributors involved in CSBFs?  RQ4: Are mailing list contributors involved in CSBFs more active than others?
  • 9. RQ1 – Team overlap FreeBSD OpenBSD Both Committers 383 211 26 Mailing list contribs 8035 3843 359 Committers and 213 122 17 mailing list contributors The two projects have less than 10% of common contributors → the development team of Free and Open BSD is really different
  • 10. RQ2 – Commit filtering 1000 933 900 800 700 600 500 439 400 296 300 200 133 120 100 59 0 FreeBSD OpenBSD Referring commits Cloned files Linked commits At the end of the filtering not that many but...
  • 11. RQ2 – Cloned lines in CSBF files C source files header files  Percentage smaller for .h files  Use of preprocessor conditional to make header files system- dependent  #if defined(__FreeBSD__)
  • 12. RQ3 – CSBF Graph (excerpt)Blue/cyan: FreeBSDRed/orange: OpenBSDYellow: common
  • 13. RQ3: social characteristics  Importance in terms of  (in/out) degree: number of (incoming/outcoming) communication links  Betweenness: number of communications for which the node is in the short path  Brokerage metrics: useful to analyze the communication between two clusters B is a coordinator B is a gatekeeper B is a representative
  • 14. RQ3 – social characteristics Representative Gatekeeper 12 Coordinator /10 10 Betweenness / 1000 8 Out-degree Column 1 6 In-degree Column 2 Column 3 4 Degree 2 0 5 10 15 20 25 30 35 40 45 50 0 Row 1 CSBF Row 2 Others Row 3 Row 4  All differences statistically significant  High effect size (Cohen d>1)  Contributors involved in CSBF have a higher importance in the communication and in the flow of communication between systems
  • 15. RQ3 – committers with highestsocial metrics
  • 16. RQ4 – change activity of CSBFcommitters and others LOC added/removed Commits40000 1500 100020000 500 0 0 FreeBSD OpenBSD FreeBSD OpenBSD CSBF Others CSBF Others  All differences statistically significant  High effect size (Cohen d∼1)  Contributors involved in CSBF are more active than others
  • 17. Conclusions and Work-in-Progress  We proposed method to mine CSBF  We reported a study on FreeBSD and OpenBSD where:  Development team is almost disjoint  There is a small, though not negligible portion of CSBF  Committers involved in CSBF have – Higher social importance – Higher brokerage level – Higher activity in source code commits  Work-in-progress:  Better approaches to identify implicit CSBF, tracking and linking changes occurring on both systems  More extensive study on less obvious cases