
Dipenta msr2011-csbf

Social Interactions around Cross-System Bug Fixings: the Case of FreeBSD and OpenBSD


Transcript of "Dipenta msr2011-csbf"

  1. Social Interactions around Cross-System Bug Fixings: The Case of FreeBSD and OpenBSD
     Gerardo Canfora, Luigi Cerulo, Marta Cimitile, Massimiliano Di Penta
     dipenta@unisannio.it
  2. Context
     - Source code is often reused across different systems
       - Unixes (FreeBSD, OpenBSD, Linux)
       - Office applications (NeoOffice, OpenOffice)
       - Desktop environment apps (KDE or GNOME apps)
     - Maintenance may require propagating bug fixes from one system to another
       - We call this “Cross-System Bug Fixing” (CSBF)
     - Example:
       - FreeBSD, 1996/01/19, file ip_icmp.h: “Added definitions for ICMP router discovery. Reviewed by: wollman”
       - OpenBSD, 1996/08/02, file ip_icmp.h: “ICMP Router Discovery definitions; from FreeBSD”
  3. What we propose
     - A method to track CSBFs
     - A study of the social characteristics and development activity of CSBF committers
       - degree, betweenness, brokerage
       - commits, lines changed
  4. Detecting CSBF - I
     - Step 1: mine cross-referencing commits
       - openbsd, atphy.c, 2008/09/25 20:47:16, brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
     - Step 2: mine commits previously performed on files with the same name in the other system
       - freebsd, atphy.c, 2008/05/19 01:12:10, yongari, Add Attansic/Atheros F1 PHY driver.
       - openbsd, atphy.c, 2008/09/25 20:47:16, brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
  5. Detecting CSBF - II
     - Step 3: compute file similarity with clone detection
       - CCFinder
       - Threshold: at least 10% of cloned lines
     - Step 4: take the previous change whose commit note has the highest textual similarity
       - Use of Vector Space models
       - Cosine similarity; threshold (0.20) to filter out unrelated commits
       - Example: similarity = 0.72 between “Add Attansic/Atheros F1 PHY driver.” and “Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@”
  6. Building Committers Network
     - We extract communication from mailing lists
       - Bug fixing mailing lists
     - A heuristic similar to that of Bird et al. [2006] maps inconsistent names and email addresses to one identity
       - It also maps committer IDs to mailing list names and emails
     - Nodes of the network are labeled as:
       - Committer / other mailing list contributor
       - CSBF committer
  7. Empirical Study
     - Goal: analyze the phenomenon of CSBFs
     - Purpose: understand its relevance with respect to the social characteristics of the involved developers
     - Context: CVS repositories and mailing list archives of FreeBSD and OpenBSD
       - Period: 1993-2009 (FreeBSD), 1998-2009 (OpenBSD)
       - Commits: 119,000 (FreeBSD), 70,000 (OpenBSD)
  8. Research Questions
     - RQ1: How do the source code committers and contributors of the two systems overlap?
     - RQ2: How frequent is the phenomenon of CSBFs?
     - RQ3: Who are the contributors involved in CSBFs?
     - RQ4: Are mailing list contributors involved in CSBFs more active than others?
  9. RQ1 – Team overlap

                                                FreeBSD  OpenBSD  Both
     Committers                                     383      211    26
     Mailing list contributors                     8035     3843   359
     Committers and mailing list contributors       213      122    17

     - The two projects share less than 10% of their contributors → the FreeBSD and OpenBSD development teams are largely distinct
  10. RQ2 – Commit filtering
     [Figure: bar chart of referring commits, cloned files, and linked commits for FreeBSD and OpenBSD; bar labels as extracted: 933, 439, 296, 133, 120, 59]
     - At the end of the filtering, not that many commits remain, but...
  11. RQ2 – Cloned lines in CSBF files
     [Figure: percentage of cloned lines in CSBF files, one panel for C source files and one for header files]
     - Percentage smaller for .h files
     - Use of preprocessor conditionals to make header files system-dependent
       - #if defined(__FreeBSD__)
  12. RQ3 – CSBF Graph (excerpt)
     - Blue/cyan: FreeBSD
     - Red/orange: OpenBSD
     - Yellow: common
  13. RQ3: social characteristics
     - Importance in terms of
       - (in/out) degree: number of (incoming/outgoing) communication links
       - Betweenness: number of communications for which the node is on the shortest path
     - Brokerage metrics: useful to analyze the communication between two clusters
       [Diagrams: B is a coordinator; B is a gatekeeper; B is a representative]
  14. RQ3 – social characteristics
     [Figure: bar chart comparing CSBF contributors and others on degree, in-degree, out-degree, betweenness (/1000), coordinator (/10), gatekeeper, and representative]
     - All differences statistically significant
     - High effect size (Cohen's d > 1)
     - Contributors involved in CSBF have a higher importance in the communication and in the flow of communication between systems
  15. RQ3 – committers with highest social metrics
  16. RQ4 – change activity of CSBF committers and others
     [Figure: two bar charts, LOC added/removed and Commits, each comparing CSBF committers and others for FreeBSD and OpenBSD]
     - All differences statistically significant
     - High effect size (Cohen's d ≈ 1)
     - Contributors involved in CSBF are more active than others
  17. Conclusions and Work-in-Progress
     - We proposed a method to mine CSBFs
     - We reported a study on FreeBSD and OpenBSD where:
       - The development teams are almost disjoint
       - There is a small, though not negligible, portion of CSBFs
       - Committers involved in CSBFs have
         – Higher social importance
         – Higher brokerage level
         – Higher activity in source code commits
     - Work-in-progress:
       - Better approaches to identify implicit CSBFs, tracking and linking changes occurring on both systems
       - More extensive study on less obvious cases
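
Steps 1 and 2 on slide 4 can be illustrated with a minimal Python sketch. The `Commit` record, the `OTHER` lookup, and the rule "the commit note names the other system" are assumptions made for illustration; the slides do not say how cross-references were actually matched. The two sample records are the ones shown on the slide.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    system: str     # "freebsd" or "openbsd"
    path: str       # file name, e.g. "atphy.c"
    when: datetime  # commit timestamp
    author: str
    note: str       # commit message


# Hypothetical rule: a commit is cross-referencing when its note
# mentions the name of the other system.
OTHER = {"openbsd": "FreeBSD", "freebsd": "OpenBSD"}


def is_cross_reference(commit):
    """Step 1: does the commit note name the other system?"""
    pattern = r"\b" + OTHER[commit.system] + r"\b"
    return re.search(pattern, commit.note, re.IGNORECASE) is not None


def candidate_sources(ref, commits):
    """Step 2: earlier commits on a same-named file in the other system."""
    return [
        c for c in commits
        if c.system != ref.system and c.path == ref.path and c.when < ref.when
    ]


commits = [
    Commit("freebsd", "atphy.c", datetime(2008, 5, 19, 1, 12, 10), "yongari",
           "Add Attansic/Atheros F1 PHY driver."),
    Commit("openbsd", "atphy.c", datetime(2008, 9, 25, 20, 47, 16), "brad",
           "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"),
]

referring = [c for c in commits if is_cross_reference(c)]
links = {ref: candidate_sources(ref, commits) for ref in referring}
```

Here `links` maps each referring commit to the earlier commits it may have been derived from; Steps 3 and 4 then narrow those candidates.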
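
Step 4 on slide 5 compares commit notes with a vector space model and cosine similarity. The sketch below assumes raw term-frequency vectors and a simple lowercase tokenizer; the slides do not state the term weighting, so the scores it produces will not necessarily match the 0.72 reported for the example pair. Only the 0.20 threshold comes from the slides; `term_vector` and `best_match` are illustrative names.

```python
import math
import re
from collections import Counter

THRESHOLD = 0.20  # from the slides: filters out unrelated commits


def term_vector(text):
    """Raw term-frequency vector (an assumed weighting scheme)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(ref_note, candidate_notes):
    """Return (score, note) for the most similar candidate, or None."""
    ref_vec = term_vector(ref_note)
    scored = [(cosine(ref_vec, term_vector(n)), n) for n in candidate_notes]
    scored = [pair for pair in scored if pair[0] >= THRESHOLD]
    return max(scored, default=None)


ref = "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"
candidates = [
    "Add Attansic/Atheros F1 PHY driver.",
    "Fix typo in manual page.",
]
match = best_match(ref, candidates)
```

With these inputs the unrelated note falls below the threshold and is discarded, so `match` holds the Attansic driver commit.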
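
The "less than 10%" claim on slide 9 can be checked with simple arithmetic. The sketch assumes that the FreeBSD and OpenBSD columns each already include the people counted under "Both", so the union is the sum of the two minus the shared count; the slides do not spell this out. The helper name `overlap_share` is illustrative.

```python
def overlap_share(freebsd, openbsd, both):
    """Shared people as a fraction of everyone in either project.

    Assumes the per-project counts include the shared people.
    """
    return both / (freebsd + openbsd - both)


# Counts taken from the RQ1 table: (FreeBSD, OpenBSD, Both)
rows = {
    "Committers": (383, 211, 26),
    "Mailing list contributors": (8035, 3843, 359),
    "Committers and mailing list contributors": (213, 122, 17),
}

shares = {name: overlap_share(*counts) for name, counts in rows.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under the alternative reading, where the per-project columns exclude shared people, the denominators only grow, so the shares stay below 10% either way.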
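
The degree and brokerage metrics on slide 13 can be sketched on a toy directed communication graph. The role definitions below follow the usual Gould and Fernandez formulation (a broker B sits on a path A → B → C where A has no direct link to C); this is an assumption, since the slides show the roles only as diagrams. Node names and group labels are hypothetical, and betweenness is left out to keep the sketch short.

```python
from collections import defaultdict


def degrees(edges):
    """In-degree and out-degree of every node in a directed edge list."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg


def brokerage_roles(edges, group):
    """Count coordinator / gatekeeper / representative roles per broker."""
    edge_set = set(edges)
    successors = defaultdict(set)
    for src, dst in edge_set:
        successors[src].add(dst)

    roles = defaultdict(lambda: defaultdict(int))
    for a, b in edge_set:
        for c in successors[b]:
            # Need three distinct nodes and no direct a -> c link.
            if a == b or c in (a, b) or (a, c) in edge_set:
                continue
            ga, gb, gc = group[a], group[b], group[c]
            if ga == gb == gc:
                role = "coordinator"       # all three in one group
            elif ga != gb and gb == gc:
                role = "gatekeeper"        # outsider reaches the group via b
            elif ga == gb and gb != gc:
                role = "representative"    # b speaks for its group outward
            else:
                continue                   # other roles are not counted here
            roles[b][role] += 1
    return roles


# Hypothetical contributors: f* belong to FreeBSD, o* to OpenBSD.
group = {"f1": "FreeBSD", "f2": "FreeBSD", "f3": "FreeBSD",
         "o1": "OpenBSD", "o2": "OpenBSD"}
edges = [("f1", "f2"), ("f2", "f3"), ("o1", "f2"), ("f2", "o2")]

indeg, outdeg = degrees(edges)
roles = brokerage_roles(edges, group)
```

In this toy graph `f2` is the only broker: it coordinates within FreeBSD, acts as gatekeeper for `o1`, and represents FreeBSD towards `o2`.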
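
Slides 14 and 16 report effect sizes as Cohen's d. The sketch below uses the common pooled-standard-deviation form of d, which is an assumption because the slides do not say which variant was used. The two samples are made-up numbers for illustration only, not data from the study.

```python
import math
import statistics


def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


# Hypothetical metric values (e.g. degree) for two groups of contributors.
csbf_contributors = [12, 15, 9, 14, 11]
other_contributors = [4, 6, 3, 7, 5]

d = cohens_d(csbf_contributors, other_contributors)
```

By the usual convention a d of about 0.8 or more counts as a large effect, which is the sense in which the slides call d > 1 and d ≈ 1 "high".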