Social Interactions around Cross-System Bug Fixings:
The Case of FreeBSD and OpenBSD
Gerardo Canfora, Luigi Cerulo, Marta Cimitile, Massimiliano Di Penta
dipenta@unisannio.it
Context
  Source code is often reused across different systems
    Unixes (FreeBSD, OpenBSD, Linux)
    Office applications (NeoOffice, OpenOffice)
    Desktop environment apps (KDE or GNOME apps)
  Maintenance may require propagating bug fixes from one system to another
    We call this “Cross System Bug Fixing” (CSBF)


  Example:
     FreeBSD, 1996/01/19, file ip_icmp.h:
       – “Added definitions for ICMP router discovery. Reviewed by:
         wollman”
     OpenBSD, 1996/08/02, file ip_icmp.h:
       – “ICMP Router Discovery definitions; from FreeBSD”
What we propose
  A method to track CSBFs
  A study of the social characteristics
   and development activity of
   CSBF committers
    degree, betweenness, brokerage
    commits, lines changed
Detecting CSBF - I
  Step 1: mining cross-referencing commits
    openbsd, atphy.c,2008/09/25 20:47:16,brad,
     Add a driver for the Attansic F1 PHY. From FreeBSD via
     kevlo@
  Step 2: mining commits previously performed on files
   with the same name in the other system
    freebsd,atphy.c,2008/05/19 01:12:10,yongari,
     Add Attansic/Atheros F1 PHY driver.
    openbsd, atphy.c,2008/09/25 20:47:16,brad,
     Add a driver for the Attansic F1 PHY. From FreeBSD via
     kevlo@
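Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' tool: the tuple layout mirrors the comma-separated examples above, and the names `LOG`, `refers_to`, and `candidates` are hypothetical. Matching the other system's name anywhere in the note is an assumed, deliberately crude criterion.

```python
import re
from datetime import datetime

# Hypothetical record layout, modeled on the examples above:
# (system, file, timestamp, committer, commit note)
LOG = [
    ("freebsd", "atphy.c", "2008/05/19 01:12:10", "yongari",
     "Add Attansic/Atheros F1 PHY driver."),
    ("openbsd", "atphy.c", "2008/09/25 20:47:16", "brad",
     "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"),
]

OTHER = {"freebsd": "openbsd", "openbsd": "freebsd"}
FMT = "%Y/%m/%d %H:%M:%S"

def refers_to(system, note):
    """Step 1: does the commit note mention the other system by name?"""
    return re.search(r"\b%s\b" % OTHER[system], note, re.IGNORECASE) is not None

def candidates(commit, log):
    """Step 2: earlier commits on a same-named file in the other system."""
    system, fname, stamp, _, _ = commit
    when = datetime.strptime(stamp, FMT)
    return [c for c in log
            if c[0] == OTHER[system] and c[1] == fname
            and datetime.strptime(c[2], FMT) < when]

# Cross-referencing commits, then their candidate source commits by committer.
referring = [c for c in LOG if refers_to(c[0], c[4])]
links = {c[3]: [p[3] for p in candidates(c, LOG)] for c in referring}
```

Here the OpenBSD commit by brad is the only cross-referencing commit, and the earlier FreeBSD commit by yongari is its only candidate.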
Detecting CSBF - II
  Step 3: compute file similarity with clone detection
    CCFinder
    Threshold: at least 10% of cloned lines
  Step 4: take the previous change with the highest
   textual similarity in the commit note
    Use of Vector Space models
    Cosine similarity; threshold (0.20) to filter out unrelated
     commits

     Example: cosine similarity = 0.72 between
       “Add Attansic/Atheros F1 PHY driver.”
       “Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@”
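A minimal sketch of Step 4, assuming raw term-frequency vectors and a very simple tokenizer. The slides do not say which term weighting or preprocessing was used, so `tokens`, `cosine`, and `best_match` are illustrative names, and the scores need not match the ones reported.

```python
import math
import re
from collections import Counter

def tokens(note):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", note.lower())

def cosine(a, b):
    """Cosine similarity between raw term-frequency vectors of two commit notes."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def best_match(note, previous_notes, threshold=0.20):
    """Most similar earlier note, or None when the best score is below the threshold."""
    scored = [(cosine(note, p), p) for p in previous_notes]
    score, match = max(scored, default=(0.0, None))
    return (match, score) if score >= threshold else (None, score)

match, score = best_match(
    "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@",
    ["Add Attansic/Atheros F1 PHY driver.", "Fix typo in man page"],
)
```

With this unweighted scheme the two atphy.c notes score about 0.59 rather than 0.72, presumably because the study used a different weighting or preprocessing. The pair still clears the 0.20 threshold.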
Building Committers' Network
  We extract communication from mailing
   lists
    Bug fixing mailing lists
  Heuristic similar to that of Bird et al.
   [2006] to unify inconsistent names and
   email addresses
    Also used to map committer IDs to mailing list
     names/emails
  Nodes of the network labeled as:
    Committer / other mailing list contributors
    CSBFs committer
Empirical Study
 Goal: analyze the phenomenon of CSBFs
 Purpose: understanding its relevance with
  respect to the social characteristics of the
  involved developers
 Context: CVS repositories and mailing list
  archives of FreeBSD and OpenBSD
   Period: 1993-2009 (FreeBSD), 1998-2009
    (OpenBSD)
   Commits: 119,000 (FreeBSD), 70,000 (OpenBSD)
Research Questions
  RQ1: How do the source code committers
   and contributors of the two systems
   overlap?
  RQ2: How frequent is the phenomenon of
   CSBFs?
  RQ3: Who are the contributors involved in
   CSBFs?
  RQ4: Are mailing list contributors involved
   in CSBFs more active than others?
RQ1 – Team overlap
                                             FreeBSD  OpenBSD  Both
  Committers                                     383      211    26
  Mailing list contributors                     8035     3843   359
  Committers and mailing list contributors       213      122    17


  The two projects have fewer than 10% of
 their contributors in common →
 the development teams of FreeBSD and
 OpenBSD are largely distinct
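The arithmetic behind the overlap figure can be checked with a short sketch. The counts come from the table above; taking the union of the two teams as the denominator is an assumption, since the slide does not state which denominator was used.

```python
# Counts from the RQ1 table: (FreeBSD, OpenBSD, both)
COUNTS = {
    "Committers": (383, 211, 26),
    "Mailing list contributors": (8035, 3843, 359),
    "Committers and mailing list contributors": (213, 122, 17),
}

def overlap_share(freebsd, openbsd, both):
    """Share of people active in both projects, relative to the union.

    Assumption: the denominator is the union of the two teams.
    """
    return both / (freebsd + openbsd - both)

shares = {role: overlap_share(*c) for role, c in COUNTS.items()}
for role, share in shares.items():
    print(f"{role}: {share:.1%}")
```

Under this reading every row stays well below 10%.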
RQ2 – Commit filtering
  [Bar chart: items remaining after each filtering step]
                       FreeBSD  OpenBSD
  Referring commits        439      933
  Cloned files             133      296
  Linked commits            59      120

  At the end of the filtering not that many remain, but...
RQ2 – Cloned lines in CSBF files
  [Figure: cloned lines in CSBF files; panels: C source files, header files]
  The percentage is smaller for .h files
  Preprocessor conditionals are used to make header files
   system-dependent
    #if defined(__FreeBSD__)
RQ3 – CSBF Graph (excerpt)
Blue/cyan: FreeBSD
Red/orange: OpenBSD
Yellow: common
RQ3: social characteristics
  Importance in terms of
    (in/out) degree: number of (incoming/outgoing)
     communication links
    Betweenness: number of communications for which the
     node is on the shortest path
  Brokerage metrics: useful to analyze the
   communication between two clusters

                                B is a coordinator

                                B is a gatekeeper

                                B is a representative
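The degree and brokerage metrics can be sketched as below. The role definitions follow the standard Gould-Fernandez scheme: for a path A → B → C with no direct A → C link, B is a coordinator if all three share a group, a gatekeeper if only A is outside B's group, and a representative if only C is outside. That the study used exactly these definitions is an assumption, the function names and toy data are hypothetical, and betweenness is omitted for brevity.

```python
from collections import defaultdict

def degrees(edges):
    """In- and out-degree per node for directed (sender, receiver) links."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in set(edges):
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg

def brokerage(edges, group):
    """Count brokerage roles of each node B over paths A -> B -> C."""
    links = set(edges)
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in links:
        succ[src].add(dst)
        pred[dst].add(src)
    roles = defaultdict(lambda: defaultdict(int))
    for b in set(succ) & set(pred):
        for a in pred[b]:
            for c in succ[b]:
                # Needs three distinct nodes and no direct A -> C link.
                if a == c or a == b or c == b or (a, c) in links:
                    continue
                ga, gb, gc = group[a], group[b], group[c]
                if ga == gb == gc:
                    roles[b]["coordinator"] += 1
                elif gb == gc != ga:
                    roles[b]["gatekeeper"] += 1
                elif ga == gb != gc:
                    roles[b]["representative"] += 1
    return roles

# Toy network, NOT data from the study.
group = {"ann": "freebsd", "bob": "freebsd", "cat": "freebsd", "dan": "openbsd"}
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "bob"), ("bob", "dan")]
indeg, outdeg = degrees(edges)
roles = brokerage(edges, group)
```

In the toy network bob plays each of the three roles once: coordinator for ann → cat, representative for ann → dan, and gatekeeper for dan → cat.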
RQ3 – social characteristics
  [Bar chart comparing CSBF committers and others on: Degree, In-degree,
   Out-degree, Betweenness / 1000, Coordinator / 10, Gatekeeper, Representative]



  All differences statistically significant
  Large effect size (Cohen's d > 1)
  Contributors involved in CSBF have a higher importance in
   the communication network and in the flow of communication
   between systems
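For reference, a minimal sketch of how an effect size such as Cohen's d can be computed, assuming the common pooled-standard-deviation form. The slides do not say which variant or which significance test was used, and the sample values below are hypothetical, not data from the study.

```python
import math
from statistics import mean, variance

def cohens_d(xs, ys):
    """Cohen's d using a pooled standard deviation of the two samples."""
    nx, ny = len(xs), len(ys)
    pooled = math.sqrt(
        ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    )
    return (mean(xs) - mean(ys)) / pooled

# Hypothetical degree values, NOT data from the study.
csbf = [12, 15, 9, 14, 11]
others = [4, 6, 3, 5, 7]
d = cohens_d(csbf, others)
```

By the usual convention, values of d around 0.8 or above are read as a large effect.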
RQ3 – committers with highest social metrics
RQ4 – change activity of CSBF committers and others
  [Bar charts: LOC added/removed and Commits, for FreeBSD and OpenBSD,
   CSBF committers vs. others]
    All differences statistically significant
    Large effect size (Cohen's d ∼ 1)
    Contributors involved in CSBF are more active
     than others
Conclusions and Work-in-Progress
  We proposed a method to mine CSBFs
  We reported a study on FreeBSD and OpenBSD where:
    The development teams are almost disjoint
    There is a small, though not negligible, portion of CSBFs
    Committers involved in CSBF have
     – Higher social importance
     – Higher brokerage level
     – Higher activity in source code commits
  Work-in-progress:
    Better approaches to identify implicit CSBF, tracking and
     linking changes occurring on both systems
    More extensive study on less obvious cases
