Mining Development Repositories
         to Study the Impact of
   Collaboration on Software Systems


                          Nicolas Bettenburg
                                nicbet@cs.queensu.ca
                                      SOFTWARE ANALYSIS
                                       & INTELLIGENCE LAB




Wednesday, 11 April, 12                                     1
Software Development is a Social Activity

              The structure of source code mirrors the structure of the
              organization that produces it. [Conway:Datamation:1968]


              Developers spend a large part of their work day
              communicating with fellow developers. [Begel:ICSE:2010]




Communication is Critical for Success

                          Communication is the most frequently cited
                          problem in distributed development.
                                                   [Grinter:GROUP:1999]
                                                   [Bird:ACMComm:2009]




Research Hypothesis

                 “The collaboration between stakeholders
               impacts the code quality and the development
                    community of a software system.”




Proposed Approach


                          I. Extraction of communication data



                          II. Study impact on software quality



           III. Study impact on development community


Available Knowledge in Data



   Version Control Systems               Mailing Lists                Issue Tracking Systems




                              Communication Data
                                 •   Source Code Comments
                                 •   Change-Log Messages
                                 •   Developer Emails & Discussions
                                 •   Support Dialogues


Communication Data Exists
                          Mainly as Unstructured Data

                   Eclipse #150222:
                   In this report, you have defined a parameter named blocksize,
                   which is given a value of "7|D|1|D". In open script of data set,
                   there are below lines code:

                   <script begin>
                   token=Packages.java.util.StringTokenizer(params["blocksize"],"|");
                   vec=new Packages.java.util.Vector();
                   while(token.hasMoreTokens()){
                      vec.addElement(token.nextToken());
                   }
                   params["DateRange"]=java.lang.Integer.parseInt(vec.elementAt(0));
                   </script end>

                   Since the value of params["blocksize"] is "7|D|1|D", vec.elementAt(0)
                   is "7", and then it can not be parsed to int value. In 1.0.1,
                   the value of params["blocksize"] might be 7|D|1|D, so it can be
                   parsed to int value of 7.



                     Extraction and processing of unstructured
                     data is challenging. [MUD:Workshop:2010]
Mining Collaboration Data

                                          [Bettenburg:ICPC:2011]

     [Figure: first page of "A Lightweight Approach to Uncover Technical Information
      in Unstructured Data" (Nicolas Bettenburg, Bram Adams, Ahmed E. Hassan, Queen's
      University; Michel Smidt, University of Bremen), including an Eclipse bug report
      that mixes natural language, steps to reproduce, and source code]

                                 •   Use Spellchecking
                                 •   Empirical validation
                                 •   Improved on state of the art
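The "Use Spellchecking" idea can be illustrated with a toy filter: a line whose tokens a spell checker mostly recognizes is treated as natural language, while a line full of unrecognized tokens (identifiers, code, stack traces) is treated as technical. This is a minimal sketch, not the ICPC 2011 implementation: the hand-made `DICTIONARY` is a hypothetical stand-in for a real spell checker's word list, and the whitespace tokenization, line granularity, and 0.5 `threshold` are assumptions.

```python
# Hypothetical mini word list standing in for a real spell checker's dictionary.
DICTIONARY = frozenset({
    "any", "status", "on", "this", "bug", "press", "to", "bring", "down",
    "the", "help", "menu", "is", "missing", "its", "mnemonic", "for",
    "we", "should", "fix", "it", "properly", "if",
})

# Sentence punctuation that a spell checker would ignore at word boundaries.
EDGE_PUNCTUATION = ".,;:!?\"'"


def unknown_ratio(line: str, dictionary: frozenset) -> float:
    """Fraction of whitespace-separated tokens the 'spell checker' rejects."""
    tokens = [t.strip(EDGE_PUNCTUATION) for t in line.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0  # nothing to judge
    unknown = sum(1 for t in tokens if t.lower() not in dictionary)
    return unknown / len(tokens)


def split_technical(text: str, dictionary: frozenset, threshold: float = 0.5):
    """Split the non-blank lines of text into (natural_language, technical)."""
    natural, technical = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if unknown_ratio(line, dictionary) > threshold:
            technical.append(line)
        else:
            natural.append(line)
    return natural, technical


if __name__ == "__main__":
    report = "\n".join([
        "Any status on this bug?",
        "for (int i = 0; i < items.length; i++) {",
        "Press Alt+H to bring down the Help menu",
        "IContributionItem items[] = getItems();",
    ])
    prose, code = split_technical(report, DICTIONARY)
    print("natural:", prose)
    print("technical:", code)
```

Classifying whole lines keeps the sketch short; a real tool needs finer granularity for text such as `Press Alt+H`, where one technical token sits inside an otherwise natural sentence.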
Proposed Approach


                          I. Extraction of communication data



                          II. Study impact on software quality



           III. Study impact on development community


Quantify Impact on Quality: Idea


             Extracted Communication Data
                                   compute


                             Social Metrics

                             measure relationships


                          Post-Release Defects



4 Dimensions of Measures

               •   Discussion CONTENT
               •   Social STRUCTURES
               •   Measures of WORKFLOW
               •   Communication DYNAMICS
Conceptual Approach

               •   Measure discussion metrics during the 6 months before a release
               •   Measure post-release bugs during the 6 months after the release
               •   Link the two using statistical models
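The two 6-month windows around a release can be made concrete with a small helper. This is a sketch under stated assumptions: events are hypothetical `(timestamp, kind)` pairs, "six months" is approximated as 182 days, messages in the pre-release window feed the discussion metrics, and bug reports in the post-release window count as post-release bugs.

```python
from datetime import datetime, timedelta

# Assumption: "6 months" approximated as 182 days.
WINDOW = timedelta(days=182)


def split_windows(events, release):
    """Return (pre_release_messages, post_release_bugs) around a release date.

    events: iterable of (timestamp, kind) pairs, kind being "message" or "bug".
    Messages count if they fall in [release - WINDOW, release);
    bugs count if they fall in [release, release + WINDOW).
    """
    pre, post = [], []
    for ts, kind in events:
        if kind == "message" and release - WINDOW <= ts < release:
            pre.append(ts)
        elif kind == "bug" and release <= ts < release + WINDOW:
            post.append(ts)
    return pre, post


if __name__ == "__main__":
    release = datetime(2011, 6, 1)  # hypothetical release date
    events = [
        (datetime(2011, 1, 15), "message"),  # inside the pre-release window
        (datetime(2010, 1, 15), "message"),  # too early, ignored
        (datetime(2011, 7, 1), "bug"),       # post-release bug
        (datetime(2011, 5, 1), "bug"),       # found before release, not counted
    ]
    print(split_windows(events, release))
```

Keeping the windows half-open means an event exactly on the release date is counted once, on the post-release side.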
Findings of our work

               (1) Social metrics explain post-release defects
               as well as code metrics do.

              (2) The combination of social metrics and code metrics
              is cumulative: together they explain more than either alone.

              (3) We identify factors that have positive and
              negative relationships with defects.

                                                    [ICPC‘2010] (Best Paper)
                                                    [JEMSE?]
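Findings (1) and (2) concern how much of the variation in defects each set of metrics explains. One common way to compare that, used here purely as an illustration, is the R² of ordinary least-squares models fitted on social metrics only, code metrics only, and both together. The function below is a hand-rolled sketch (normal equations solved by Gauss-Jordan elimination, so no third-party packages are needed); the study's actual models and goodness-of-fit measures may differ, and the toy data is invented.

```python
def r_squared(rows, y):
    """In-sample R^2 of an ordinary least-squares fit of y on rows (plus intercept).

    rows: one tuple of predictor values per observation (e.g. per file).
    Assumes the predictors are not collinear, so X^T X is invertible.
    """
    X = [[1.0, *map(float, row)] for row in rows]
    k = len(X[0])
    # Augmented normal equations [X^T X | X^T y].
    aug = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(k):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [a - factor * p for a, p in zip(aug[i], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    mean_y = sum(y) / len(y)
    ss_res = sum(
        (yi - sum(b * xv for b, xv in zip(beta, x))) ** 2 for x, yi in zip(X, y)
    )
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented toy data: defects = 2 * social + 3 * code for six files.
    social = [1, 2, 3, 4, 5, 6]
    code = [2, 1, 4, 3, 6, 5]
    defects = [8, 7, 18, 17, 28, 27]
    print("social only:", r_squared([(s,) for s in social], defects))
    print("code only:  ", r_squared([(c,) for c in code], defects))
    print("combined:   ", r_squared(list(zip(social, code)), defects))
```

Adding predictors never lowers in-sample R², so a combined model always scores at least as well as either single-source model; whether the gain is meaningful depends on its size, ideally judged on held-out data.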
Proposed Approach


                          I. Extraction of communication data



                          II. Study impact on software quality



           III. Study impact on development community


Available Knowledge in Data



     Code Review Systems            Mailing Lists    Issue Tracking Systems




                             Data on Management
                             of Code Contributions


Contribution Management

     Patch → Submission → Review --OK--> Verification --OK--> Integration → Project Repository

     Review and Verification return Feedback on the patch when a check does not pass.
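The contribution flow above can be modelled as a small state machine, a convenient basis for mining where patches stall or bounce back. This is a sketch: the stage names mirror the diagram, but the `advance` function, its `ok` flag, and the rule that a failed review or verification returns the patch to `PATCH` as feedback are assumptions.

```python
from enum import Enum


class Stage(Enum):
    PATCH = "patch"
    SUBMISSION = "submission"
    REVIEW = "review"
    VERIFICATION = "verification"
    INTEGRATION = "integration"  # merged into the project repository


# Next stage when a step succeeds (the "OK" arrows).
ON_OK = {
    Stage.PATCH: Stage.SUBMISSION,
    Stage.SUBMISSION: Stage.REVIEW,
    Stage.REVIEW: Stage.VERIFICATION,
    Stage.VERIFICATION: Stage.INTEGRATION,
}

# Assumption: only these stages can fail and send feedback back to the patch.
GIVES_FEEDBACK = {Stage.REVIEW, Stage.VERIFICATION}


def advance(stage: Stage, ok: bool = True) -> Stage:
    """Move a contribution one step; `ok` is only consulted at REVIEW/VERIFICATION."""
    if stage is Stage.INTEGRATION:
        return stage  # terminal: the patch is in the repository
    if not ok and stage in GIVES_FEEDBACK:
        return Stage.PATCH  # feedback loop: the patch must be reworked
    return ON_OK[stage]


def count_feedback_rounds(outcomes):
    """Replay step outcomes starting from PATCH; return (final stage, feedback rounds)."""
    stage, rounds = Stage.PATCH, 0
    for ok in outcomes:
        nxt = advance(stage, ok)
        if nxt is Stage.PATCH and stage in GIVES_FEEDBACK:
            rounds += 1
        stage = nxt
    return stage, rounds


if __name__ == "__main__":
    # One rejected review, then a clean pass through to integration.
    print(count_feedback_rounds([True, True, False, True, True, True, True]))
```

Counting feedback rounds per patch is one example of a per-contribution measure that could later feed a statistical model.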
Studying Impact on Community through
                        Contribution Management



   Goal:
   Use statistical models to study how communication (and communication
   anomalies) affects contributors, reviewers, verifiers, and the software.

   Example:
   Reviewers leaving the community due to a lack of feedback
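To make the example concrete, the sketch below flags reviewers who have gone quiet after their most recent reviews received no response. Everything here is hypothetical: the `Review` record, the 90-day inactivity cut-off, and treating "no response to any of the last `last_n` reviews" as lack of feedback. The stated goal is to study such effects through statistical models; this fixed rule only illustrates the data involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Review:
    """A hypothetical review record mined from a code review system."""
    reviewer: str
    when: datetime
    got_response: bool  # did the patch author respond to this review?


def quiet_unanswered_reviewers(reviews, now, inactive_for=timedelta(days=90), last_n=3):
    """Names of reviewers inactive for at least `inactive_for` whose last
    `last_n` reviews all went unanswered, in alphabetical order."""
    by_reviewer = {}
    for rev in reviews:
        by_reviewer.setdefault(rev.reviewer, []).append(rev)
    flagged = []
    for name, history in sorted(by_reviewer.items()):
        history.sort(key=lambda rev: rev.when)
        inactive = now - history[-1].when >= inactive_for
        unanswered = not any(rev.got_response for rev in history[-last_n:])
        if inactive and unanswered:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    now = datetime(2012, 4, 11)
    reviews = [
        Review("alice", datetime(2011, 10, 1), False),
        Review("alice", datetime(2011, 10, 15), False),
        Review("bob", datetime(2012, 4, 1), True),
    ]
    print(quiet_unanswered_reviewers(reviews, now))
```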
Available Knowledge in Data



   Version Control Systems          Mailing Lists   Issue Tracking Systems




                              Workflow Information
                                Social Networks


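One common way to derive a social network from issue-tracker data is to connect developers who comment on the same issue, weighting each edge by the number of issues they share. The sketch below assumes a hypothetical input that maps issue IDs to commenter names. It shows one possible construction, not necessarily the one used in this work.

```python
from collections import Counter
from itertools import combinations


def co_comment_network(issue_commenters):
    """Undirected weighted edges between developers who commented on the same issue.

    issue_commenters: mapping of issue id -> iterable of commenter names.
    Returns a Counter mapping (dev_a, dev_b), with dev_a < dev_b, to shared-issue counts.
    """
    edges = Counter()
    for commenters in issue_commenters.values():
        # Deduplicate so repeated comments by one person do not add self-pairs.
        for a, b in combinations(sorted(set(commenters)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct collaborators per developer."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


if __name__ == "__main__":
    issues = {101: ["ann", "bob", "ann"], 102: ["bob", "cho"], 103: ["ann", "bob"]}
    net = co_comment_network(issues)
    print(net)
    print(degree(net))
```

Building one such network per time period and comparing them is a simple starting point for studying how communities evolve.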
Evolution of Code-Knowledge Communities

     [Figure: developer network whose nodes are developer user names (e.g. timeless,
      dveditz, beltzner, mconnor), shown alongside code areas labelled "Internet Explorer"
      and "XML Parser"]
                                                                                                                                                                                                                                                                                   klaas1988
                                                                                                                 ehsan     stephen.donner
                                                                                            me.at.work
                                                                                                                                                       phiw
                                                                      hskupin
                                                                                                                  ctalbert
                                                                                       tchung                                                              tomer

                                                                                                     marcia                                                              timwi                                                                                    rotis
                                                                                                                                                                                                                                                   uliss

                                                                                                                                       sylvain.pasche
                                                                                                                                                         bugzilla
                                                                                                                             marco.zehe                                                                                                                 cl-bugs-new2



                          JavaScript
                                                                                                                  tonglebeak
                                                                                                     abillings                                                                                                                                                info                             UI
                            Engine
                                                                              deletesoftware                                                   anselm.meyer

                                                                                                                  eddy_nigg
                                                                                                                                                                                                                                                              matt
                                                                                                                                                   RainerStroebel
                                                                samuel.sidler+old                                                       alex
                                                                                                hasham8888

                                                                                                                                                                                                                                             aarobertxtr
                                                                                                                                                                                                                                 manujsabarwal           johnjbarton

                                                                                    myles7897
                                                                                                                                           paulc
                                                                                                                                                                                                                                                    shaver
                                                                                                                                                                                                                                      smichaud


                                                                                                                                 mozilla
                                                                                                                                            zhangchunlin                                                                                                      dtownsend
                                                                                                                                                                                                                                            jdaggett
                                                                                                                                                              kbrosnan

                                                                                                                                                                                                                                                       bzbarsky
                                                                                                                                                    sdwilsh




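The slide shows this network only as a picture and does not say how it was derived. Here is a minimal sketch of one way to build such a network, under stated assumptions. It assumes each issue-tracker comment has already been reduced to a hypothetical (period, contributor, component) record. It treats the contributors who comment on reports filed against a component as that component's code-knowledge community, then compares two periods to see who joined or left. The record format, the `min_comments` threshold, and the functions `communities_for` and `evolution` are illustrative assumptions, not the thesis's method.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record format: (period, contributor ID, component of the issue
# report the contributor commented on). Not a format defined in the thesis.
Record = Tuple[str, str, str]
Communities = Dict[str, Set[str]]  # component -> contributor IDs


def communities_for(records: Iterable[Record], period: str,
                    min_comments: int = 1) -> Communities:
    """Group contributors by component for one period.

    An edge (contributor, component) is kept only if the contributor
    commented at least `min_comments` times on that component's reports.
    """
    weights = Counter((who, comp) for p, who, comp in records if p == period)
    groups: Communities = defaultdict(set)
    for (who, comp), count in weights.items():
        if count >= min_comments:
            groups[comp].add(who)
    return dict(groups)


def evolution(before: Communities,
              after: Communities) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Per component (sorted by name): (contributors who joined, who left)."""
    changes: Dict[str, Tuple[Set[str], Set[str]]] = {}
    for comp in sorted(set(before) | set(after)):
        old, new = before.get(comp, set()), after.get(comp, set())
        changes[comp] = (new - old, old - new)
    return changes


if __name__ == "__main__":
    # Invented toy records for illustration only; not data from the study.
    records = [
        ("2010-H1", "alice", "XML Parser"),
        ("2010-H1", "alice", "XML Parser"),
        ("2010-H1", "bob", "XML Parser"),
        ("2010-H1", "bob", "JavaScript Engine"),
        ("2010-H2", "bob", "JavaScript Engine"),
        ("2010-H2", "carol", "JavaScript Engine"),
        ("2010-H2", "carol", "UI"),
    ]
    first = communities_for(records, "2010-H1")
    second = communities_for(records, "2010-H2")
    for comp, (joined, left) in evolution(first, second).items():
        print(f"{comp}: joined={sorted(joined)} left={sorted(left)}")
```

On the toy records, this reports that carol joined the JavaScript Engine and UI communities while alice and bob left XML Parser. Raising `min_comments` is one simple, assumed way to filter out drive-by commenters before comparing periods.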
Thesis Progress


          • Tools and techniques for mining communication repositories
          • Empirical validation of the presented tools and techniques
          • Empirical validation of the relationship between collaboration
                 and software quality
          • Empirical validation of the relationship between collaboration
                 and development teams




Points for Discussion


          • How should code-knowledge communities be evaluated (what is the
                 ground truth)?
          • Is the approach applicable to industrial settings, where almost no
                 communication records are available?
          • Can the work be extended to defect prediction?
          • What are the practical implications for management,
                 moderation, staffing, ...?

