Mining Development Repositories to Study the Impact of Collaboration on Software Systems

Talk given at the 2011 ESEC/FSE


  1. Mining Development Repositories to Study the Impact of Collaboration on Software Systems. Nicolas Bettenburg, nicbet@cs.queensu.ca, Software Analysis & Intelligence Lab. Wednesday, 11 April, 12
  2. Software Development is a Social Activity. Source code stands in direct relation to organizational structure. [Conway:Datamation:1968] Developers spend a large part of their work day communicating with fellow developers. [Begel:ICSE:2010]
  3. Communication is Critical for Success. Communication is the most referenced problem in distributed development. [Grinter:GROUP:1999] [Bird:ACMComm:2009]
  4. Research Hypothesis: “The collaboration between stakeholders impacts the code quality and the development community of a software system.”
  5. Proposed Approach: I. Extraction of communication data. II. Study impact on software quality. III. Study impact on development community.
  9. Available Knowledge in Data: Version Control Systems, Mailing Lists, Issue Tracking Systems. Communication Data: • Source Code Comments • Change-Log Messages • Developer Emails & Discussions • Support Dialogues
  10. Communication Data Exists Mainly as Unstructured Data. Example bug report (Eclipse #150222): In this report, you have defined a parameter named blocksize, which is given a value of "7|D|1|D". In open script of data set, there are below lines code: <script begin> token=Packages.java.util.StringTokenizer(params["blocksize"],"|"); vec=new Packages.java.util.Vector(); while(token.hasMoreTokens()){ vec.addElement(token.nextToken()); } params["DateRange"]=java.lang.Integer.parseInt(vec.elementAt(0)); </script end> Since the value of params["blocksize"] is "7|D|1|D", vec.elementAt(0) is "7", and then it can not be parsed to int value. In 1.0.1, the value of params["blocksize"] might be 7|D|1|D, so it can be parsed to int value of 7. Extraction and processing of unstructured data is challenging. [MUD:Workshop:2010]
  11. Mining Collaboration Data [Bettenburg:ICPC:2011]: • Use spell checking tools to untangle technical information from natural language text • Empirical validation • Improved on state of the art. [Figure: first page of the paper "A Lightweight Approach to Uncover Technical Information in Unstructured Data" by Nicolas Bettenburg, Bram Adams, Ahmed E. Hassan (Software Analysis and Intelligence Lab, Queen's University, Kingston, Ontario, Canada) and Michel Smidt (Dept. of Computer Science, University of Bremen, Germany), overlaid with an Eclipse bug report that mixes reproduction steps, natural language comments, and Java source code.] Abstract of the paper: Developer communication through email, chat, or issue report comments consists mostly of largely unstructured data, i.e., natural language text, mixed with technical information such as project-specific jargon, abbreviations, source code patches, stack traces and identifiers. These technical artifacts represent a valuable source of knowledge on the technical part of the system, with a wide range of applications from establishing traceability links to creating project-specific vocabularies. However, the free-style delimiters between natural language and technical content make the mining of technical artifacts challenging. As a first step towards a general-purpose technique to extracting all kinds of technical information from unstructured data, we present a lightweight approach to untangle technical artifacts and natural language text. Our approach is based on existing spell checking tools, which are well-understood, fast, readily available across platforms and impartial to different kinds of technical artifacts. Through a handcrafted benchmark, we demonstrate that our approach is able to successfully uncover a wide range of technical information in unstructured data.
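The spell-checking idea from this slide can be illustrated with a small sketch. This is not the authors' implementation: the tiny `KNOWN_WORDS` set is a hypothetical stand-in for a real spell checker's dictionary, and the function names and the 0.5 threshold are assumptions chosen only for illustration. A line is flagged as technical when most of its tokens would be reported as misspelled.

```python
# Hypothetical stand-in for a real spell checker's dictionary (assumption:
# a real setup would query an existing spell checking tool instead).
KNOWN_WORDS = {
    "the", "value", "of", "is", "and", "then", "it", "can", "not", "be",
    "parsed", "to", "int", "in", "a", "this", "report", "you", "have",
    "defined", "parameter", "named", "which", "given", "since", "so",
}


def misspelled_ratio(line: str) -> float:
    """Fraction of whitespace-separated tokens not found in the dictionary."""
    tokens = line.split()
    if not tokens:
        return 0.0
    unknown = 0
    for token in tokens:
        # Drop surrounding punctuation so "value." still matches "value".
        word = token.strip(".,:;!?\"'").lower()
        if word not in KNOWN_WORDS:
            unknown += 1
    return unknown / len(tokens)


def classify_line(line: str, threshold: float = 0.5) -> str:
    """Label a line 'technical' if most tokens look misspelled (threshold is an assumption)."""
    return "technical" if misspelled_ratio(line) > threshold else "natural"


for text in [
    "Since the value of this parameter is given, it can not be parsed to int value.",
    "vec.addElement(token.nextToken());",
]:
    print(classify_line(text), "|", text)
```

Source code and stack traces rarely consist of dictionary words, so even this crude ratio separates the two example lines; a practical version would need a full dictionary and a tuned threshold.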
  18. Quantify Impact on Quality: Idea. Extracted Communication Data -> (compute) -> Social Metrics -> (measure relationships) -> Post-Release Defects
  19. 4 Dimensions of Measures: Discussion Content, Social Structures, Communication Dynamics, Workflow
  20. Conceptual Approach: measure discussion metrics (6 months) and post-release bugs (6 months) along a timeline, then link the two using statistical models.
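The conceptual approach above can be sketched roughly as follows. The slide does not say which statistical models were used, so Spearman rank correlation appears here only as a simple stand-in. The release date, the per-component event dates, the assumption that discussion is measured in the six months before the release, and all function names are hypothetical.

```python
from datetime import date, timedelta


def count_in_window(events, start, end):
    """Count events whose date falls in the half-open window [start, end)."""
    return sum(1 for d in events if start <= d < end)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical data: message dates and bug-report dates per component.
release = date(2011, 1, 1)
half_year = timedelta(days=182)  # approximation of "6 months"
components = {
    "ui": ([date(2010, 8, 1), date(2010, 9, 5), date(2010, 12, 1)],
           [date(2011, 2, 1), date(2011, 3, 1)]),
    "parser": ([date(2010, 10, 1)], [date(2011, 4, 1)]),
    "engine": ([], []),
}
messages = [count_in_window(m, release - half_year, release) for m, _ in components.values()]
bugs = [count_in_window(b, release, release + half_year) for _, b in components.values()]
print(spearman(messages, bugs))
```

A rank correlation only shows whether more discussion goes along with more defects; regression models would be needed to compare the explanatory power of social and code metrics.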
  24. Findings of our work: (1) Social metrics explain post-release defects as well as code metrics. (2) The combination of social metrics and code metrics is cumulative. (3) Identify factors that have positive and negative relationships with defects. [ICPC‘2010] (Best Paper) [JEMSE?]
  31. Available Knowledge in Data: Code Review Systems, Mailing Lists, Issue Tracking Systems. Data on Management of Code Contributions
  32. Contribution Management (workflow diagram): Patch Submission -> Review -> (OK) -> Verification -> (OK) -> Integration into the Project Repository; Review and Verification each return Feedback.
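The workflow on this slide can be read as a small state machine. The sketch below is one possible interpretation, not the author's model: the stage names follow the slide, while the `advance` function and the rule that negative feedback sends a patch back to submission are assumptions.

```python
from enum import Enum


class Stage(Enum):
    SUBMISSION = "submission"
    REVIEW = "review"
    VERIFICATION = "verification"
    INTEGRATION = "integration"


# Stage reached when the current stage ends with "OK".
NEXT_STAGE = {
    Stage.SUBMISSION: Stage.REVIEW,
    Stage.REVIEW: Stage.VERIFICATION,
    Stage.VERIFICATION: Stage.INTEGRATION,
}


def advance(stage: Stage, ok: bool = True) -> Stage:
    """Return the next stage of a patch.

    Assumption: a patch that fails review or verification returns to
    submission together with feedback; the slide only shows "Feedback"
    arrows without naming their target.
    """
    if stage is Stage.INTEGRATION:
        raise ValueError("patch is already integrated")
    if stage is Stage.SUBMISSION:
        return Stage.REVIEW  # a submitted patch always goes to review
    return NEXT_STAGE[stage] if ok else Stage.SUBMISSION


# Walk one patch through the workflow with a single rejected review.
stage = Stage.SUBMISSION
for outcome in [True, False, True, True, True]:
    stage = advance(stage, outcome)
    print(stage.name)
```

Logging each transition with a timestamp and the people involved would yield the kind of contribution-management data the following slides propose to study.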
  33. Studying Impact on Community through Contribution Management. Goal: Use statistical models to study how contributors, reviewers, verifiers and the software are impacted by communication (anomalies). Example: Reviewers leaving the community due to lack of feedback.
  34. Available Knowledge in Data: Version Control Systems, Mailing Lists, Issue Tracking Systems. Workflow Information, Social Networks
  35. Evolution of Code-Knowledge Communities. [Figure: network graph of developer accounts (e.g., timeless, bzbarsky, dveditz, gavin.sharp) grouped into communities with labels such as Internet Explorer, XML Parser, JavaScript Engine, and UI.]
  36. Thesis Progress: • Tools and techniques for mining communication repositories • Empirical validation of presented tools and techniques • Empirical validation of relationship between collaboration and software quality • Empirical validation of relationship between collaboration and development teams
  41. Points for Discussion: • How to evaluate code-knowledge communities (ground truth)? • Applicability to industrial settings (almost no communication data records available)? • Extend work to defect prediction? • Practical implications: management, moderation, staffing, ...?