How Developers’ Collaborations Identified from Different Sources Tell us About Code Changes

Written communications recorded through channels such as mailing lists or issue trackers, but also code co-changes, have been used to identify emerging collaborations in software projects. Such data have also been used to identify the relation between developers’ roles in communication networks and source code changes, or to identify mentors aiding newcomers to evolve the software project. However, the results of such analyses may differ depending on the communication channel being mined. This paper investigates how collaboration links vary and complement each other when they are identified through data from three different kinds of communication channels, i.e., mailing lists, issue trackers, and IRC chat logs. The study also investigates how such links overlap with links mined from code changes, and how the use of different sources would influence (i) the identification of project mentors, and (ii) the presence of a correlation between the social role of a developer and her changes. Results of a study conducted on seven open source projects indicate that the overlap of communication links between the various sources is relatively low, and that the application of networks obtained from different sources may lead to different results.
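The abstract treats collaboration links as something mined from each channel. The following is a minimal sketch of one way to build such a network. It assumes, hypothetically, that two developers are linked whenever they post in the same mailing-list thread, issue, or chat session. The slides do not state the exact linking rule, and identities are assumed to be already merged across aliases. The input format and the function name `build_network` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def build_network(threads):
    """Build an undirected, weighted collaboration network for one channel.

    `threads` (hypothetical input format) maps a thread, issue, or chat-session
    id to the developer ids who posted in it. Two developers get one unit of
    link weight for every thread they share.
    """
    weights = defaultdict(int)
    for participants in threads.values():
        # set() drops repeated posts by the same person; sorted() gives each
        # unordered pair one canonical key, e.g. ("alice", "bob").
        for a, b in combinations(sorted(set(participants)), 2):
            weights[(a, b)] += 1
    return dict(weights)


mail_threads = {
    "t1": ["alice", "bob", "alice"],
    "t2": ["bob", "carol"],
    "t3": ["bob", "alice"],
}
print(build_network(mail_threads))
```

Running it prints `{('alice', 'bob'): 2, ('bob', 'carol'): 1}`. The sketch ignores direction. A reply-to relation on a mailing list would instead yield directed links.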
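The study compares both the developers and the links found in each channel. Slides 40–42 report pairs such as “ISSUE and MAIL” and “MAIL and ISSUE” with different percentages. Because those pairs differ, one plausible reading is an asymmetric overlap: the share of the first source’s elements that also appear in the second. That reading is assumed here, not taken from the paper. A minimal sketch:

```python
def relative_overlap(first, second):
    """Fraction of the elements of `first` that also occur in `second`.

    Asymmetric by design: |first ∩ second| / |first|; 0.0 if `first` is empty.
    """
    first, second = set(first), set(second)
    return len(first & second) / len(first) if first else 0.0


def undirected(edges):
    """Map (x, y) and (y, x) to the same frozenset so links can be compared."""
    return {frozenset(edge) for edge in edges}


issue_devs = {"alice", "bob", "carol", "dave"}
mail_devs = {"alice", "bob", "erin"}
print(relative_overlap(issue_devs, mail_devs))  # 2 of the 4 issue developers
print(relative_overlap(mail_devs, issue_devs))  # 2 of the 3 mail developers

issue_links = [("alice", "bob"), ("bob", "carol")]
mail_links = [("bob", "alice"), ("alice", "erin")]
print(relative_overlap(undirected(issue_links), undirected(mail_links)))
```

The same function works for developer sets and for link sets. Only the link case needs `undirected`, so that a reply in either direction counts as the same collaboration.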

  1. How Developers’ Collaborations Identified from Different Sources Tell us About Code Changes. Sebastiano Panichella, Gabriele Bavota, Massimiliano Di Penta, Gerardo Canfora, Giuliano Antoniol. 10/3/2014
  2. Outline: Context and Motivations (Software Development); Case Study (Seven Open Source Projects); Results (Evaluation of Developers’ Collaboration Identified from Different Sources; Application of Networks Obtained from Different Sources)
  3. Different Sources of Information… “…In everybody’s experience, different communication channels play different, sometimes complementary sometimes alternative, roles: news can be gathered (and shared) from the radio, by reading a newspaper, watching a TV broadcast or surfing blogs.”
  4. Academic Paper Preparation (slides 4–8)
  9. Academic Paper Preparation: “Study Design”
  10. Academic Paper Preparation: “Study Design”, “Results”
  11. Academic Paper Preparation: “Abstract and Introduction”
  12. Academic Paper Preparation: “Results”, “Conclusion” (slides 12–13)
  14. Once the Paper is Ready… We face an important decision: determining the right order of the authors…
  15. Focusing on a single source (slides 15–16)
  17. Focusing on a single source: “I would put Bavota as first author…”
  18. Focusing on a single source: “I would put Panichella as first author…”
  19. Merging all the sources: “I would put Panichella as first author…” “I say Panichella…” “I would put Panichella as first author…” “I would put Bavota as first author…” “I say Bavota…”
  20. Software Development Environment (slides 20–21)
  22. Software Development Environment. Example: Hibernate OSS Project
  23. Previous Work… Bird et al., MSR 2006
  24. Previous Work… Canfora et al., FSE 2012
  25. Previous Work… Guzzi et al., MSR 2013
  26. Previous Work… Elliott et al., ACM GROUP 2003
  27. How Do Developers’ Collaboration Networks Identified from Different Sources Differ? Sources: IRC chat logs, issue tracker, mailing list, versioning system
  28. Case Study. Goal: investigating how different communication channels provide different views of developers’ interactions, and whether using such information in recommender systems could produce different results. Research questions:
      • RQ1: To what extent do developers discuss through the different communication channels?
      • RQ2: How do the inferred links between developers overlap when using different sources of information?
      • RQ3: How do social network metrics change when using different sources, and how would this impact on using such information to build recommenders?
  29. Context - Objects (analyzed projects):

      Project         Period                KLOC
      Apache HTTPD    June 2011–June 2013   2,021–2,240
      Apache CXF      June 2011–June 2013   593–771
      Hibernate       June 2011–June 2013   984–1,096
      Infinispan      June 2011–June 2013   146–286
      Apache Lucene   June 2011–June 2013   198–437
      Samba           June 2010–June 2012   1,278–1,426
      Weld            June 2011–June 2013   108–139
  30. Data Extraction (slides 30–31)
  32. Data Extraction [figure labels: Class 1, Class 2, Class 3, Class 4] (slides 32–33)
  34. Data Extraction
  35. Data Extraction: identifying people who use more than one source (slides 35–36)
  37. RQ1: To what extent do developers discuss through the different communication channels? Projects: Apache CXF, Hibernate, Infinispan, Apache Httpd, Apache Lucene, Samba, Weld. Developers mainly use two of the three communication channels, whereas the third is used only sporadically. While in the past developers used email as their main communication channel, nowadays they make heavy use of chats or issue trackers. (slides 37–39)
  40. Developers Overlap between Different Sources (Apache CXF, Hibernate, Infinispan, Apache Httpd, Apache Lucene, Samba, Weld): ISSUE and CHAT 35% < ISSUE and MAIL 56%; MAIL and CHAT 50% < MAIL and ISSUE 86%
  41. RQ2: How do the inferred links between developers overlap when using different sources of information? (Apache Httpd, Apache CXF, Hibernate, Infinispan, Apache Lucene, Samba, Weld) Same figures as slide 40: ISSUE and CHAT 35% < ISSUE and MAIL 56%; MAIL and CHAT 50% < MAIL and ISSUE 86%
  42. RQ2: How do the inferred links between developers overlap when using different sources of information? (Apache Httpd, Apache CXF, Hibernate, Infinispan, Apache Lucene, Samba, Weld): ISSUE and CHAT 26% < ISSUE and MAIL 38%; MAIL and CHAT 20% < MAIL and ISSUE 30%
  43. During an IRC Chat Meeting: “is there a better way? dunno like I said this is brainstorming and I have not given lots of thought to these cases” “but we also need to create the attributes and values in the entity binding..”
  44. During an IRC Chat Meeting, activities discussed (slides 44–46):
      1) Brainstorming: “is there a better way? dunno like I said this is brainstorming and I have not given lots of thought to these cases”
      2) Planning (e.g., testing activities): “however planning a pure standalone test suite would make things easier...”
      3) Opening an issue: “okay I think it is a bug and I’m going to create a jira first”
  47. Similarity Measure of Topics Extracted from Different Communication Channels (slides 47–48):

      Project         issues vs. mails   issues vs. chat   mails vs. chat
      Apache Httpd    0.17               0.09              0.06
      Apache CXF      0.86               0.11              0.01
      Hibernate       0.11               0.02              0.03
      Infinispan      0.07               0.03              0.03
      Apache Lucene   0.08               0.03              0.02
      Samba           0.06               0.02              0.02
      Weld            0.11               0.04              0.03

      Slide 48 annotates the rows with > and ≥ to compare the columns (e.g., Infinispan: 0.07 > 0.03 ≥ 0.03).
  49. RQ3: How do social network metrics change when using different sources, and how would this impact on using such information to build recommenders? (slides 49–50)
      Social network metrics: identifying high-degree developers; identifying mentors (Canfora et al., FSE 2012).
      Social network metrics vs. code changes: correlation between social roles and change activities (replicating the study by Bird et al., MSR 2006).
  51. Mentors Overlap between Different Sources (Apache CXF, Hibernate, Infinispan, Apache Httpd, Apache Lucene, Samba, Weld): considering all sources 41% < MAIL and ISSUE 47% (slides 51–53)
  54. High Degree Contributors Overlap between Different Sources (Apache CXF, Hibernate, Infinispan, Apache Httpd, Apache Lucene, Samba, Weld): considering all sources 36% < MAIL and ISSUE 46%; the slides also keep the mentors figures (considering all sources 41% < MAIL and ISSUE 47%) (slides 54–55)
  56. Ohloh Kudos Score. Kudos score: level of appreciation or respect of a developer working for a project, based on the judgement of other project members. http://www.ohloh.net/p/apache/contributors
  57. Issue, Chat and Email to Identify Leaders [bar chart: precision in recommending leaders using MAIL, ISSUE, and CHAT, for Hibernate, Samba, Apache Lucene, and Apache HTTPD]
  58. Replication of the Work by Bird et al. (MSR 2006): “Developers who actually commit changes, play much more significant roles in the email community than non-developers”
  59. Social Network Metrics vs. Source Code Changes
  60. Social Network Metrics vs. Source Code Changes: Code Metrics, SNA Metrics (slides 60–62)
  63. Social Network Metrics vs. Source Code Changes: Code Metrics, SNA Metrics. Do the results vary when we consider, for example, issue trackers?
  64. Social Network Metrics vs. Source Code Changes: Code Metrics, SNA Metrics
  65. Social Network Metrics vs. Source Code Changes (slides 65–68)
  69. Conclusion (slides 69–73)
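Slides 35–36 mention identifying people who use more than one source. The slides do not say how identities were matched across sources. The sketch below uses a deliberately crude, hypothetical heuristic, only to show the bookkeeping: lower-case the identity, drop any e-mail domain, and keep letters only. Real identity merging usually needs more than this, e.g., full-name matching and manual checks. `normalize` and `multi_source_people` are illustrative names.

```python
import re
from collections import defaultdict


def normalize(identity):
    """Hypothetical matching key: local part of an e-mail/login/nick, letters only."""
    local = identity.split("@", 1)[0].lower()
    return re.sub(r"[^a-z]", "", local)


def multi_source_people(sources):
    """Return {normalized key: set of sources} for keys seen in 2+ sources.

    `sources` maps a source name ("mail", "issue", "chat") to raw identities.
    """
    seen = defaultdict(set)
    for source, identities in sources.items():
        for identity in identities:
            seen[normalize(identity)].add(source)
    return {key: found for key, found in seen.items() if len(found) > 1}


sources = {
    "mail": ["Alice.Smith@example.org", "bob@example.org"],
    "issue": ["alice_smith", "carol"],
    "chat": ["AliceSmith", "carol"],
}
print(multi_source_people(sources))
```

Here `alicesmith` is found in all three sources and `carol` in two, while `bob` appears only on the mailing list. The heuristic would also wrongly merge distinct people with the same name, which is why it is only a sketch.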
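Slides 47–48 report a similarity measure between the topics discussed in different channels, but the slides do not say how it is computed. As an illustration only, the sketch below swaps in a simple, common measure: cosine similarity between bag-of-words term-frequency vectors of the text from two channels. It applies no stemming, stop-word removal, or topic modeling, so its values are not comparable to the table.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag of lower-cased alphabetic tokens with their frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


issues = "entity binding bug in the test suite"
chat = "brainstorming a standalone test suite"
print(round(cosine(term_vector(issues), term_vector(chat)), 3))
```

The two snippets share only "test" and "suite" out of 7 and 5 distinct tokens, so the script prints `0.338` (that is, 2/√35).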
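Slides 49–50 and 54–55 compare the high-degree developers found in each network. The following is a minimal sketch. It assumes that "high-degree" means the top k developers by number of distinct collaborators. The value of k and the alphabetical tie-break are illustrative choices, not taken from the study. The overlap is measured as the share of one top-k set found in the other.

```python
from collections import Counter


def degrees(edges):
    """Number of distinct collaborators per developer (undirected, no self-loops)."""
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    deg = Counter()
    for a, b in unique:
        deg[a] += 1
        deg[b] += 1
    return deg


def top_k(edges, k):
    """The k highest-degree developers; ties are broken alphabetically."""
    deg = degrees(edges)
    ranked = sorted(deg, key=lambda dev: (-deg[dev], dev))
    return set(ranked[:k])


mail_edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("bob", "carol")]
issue_edges = [("carol", "alice"), ("carol", "erin"), ("carol", "dave"), ("alice", "erin")]

k = 2
top_mail, top_issue = top_k(mail_edges, k), top_k(issue_edges, k)
print(sorted(top_mail), sorted(top_issue), len(top_mail & top_issue) / k)
```

This prints `['alice', 'bob'] ['alice', 'carol'] 0.5`. Only `alice` is among the two best-connected developers in both networks.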
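Slides 56–57 evaluate, for each channel, the precision in recommending project leaders, using the Ohloh kudos score as a measure of appreciation. The following is a minimal sketch. It assumes, hypothetically, that the ground-truth leaders are the k developers with the highest kudos score. It also assumes the recommendation is a list of developers taken from one channel’s network. The names and scores are made up.

```python
def leaders_from_kudos(kudos, k):
    """Top-k developers by kudos score (hypothetical ground truth); ties alphabetical."""
    ranked = sorted(kudos, key=lambda dev: (-kudos[dev], dev))
    return set(ranked[:k])


def precision(recommended, relevant):
    """Share of recommended developers that are actual leaders (0.0 if none recommended)."""
    recommended = set(recommended)
    return len(recommended & set(relevant)) / len(recommended) if recommended else 0.0


kudos = {"alice": 10, "carol": 9, "frank": 9, "bob": 4, "dave": 2}
leaders = leaders_from_kudos(kudos, 3)
recommended_from_mail = ["alice", "bob", "carol", "dave", "erin"]
print(sorted(leaders), precision(recommended_from_mail, leaders))
```

This prints `['alice', 'carol', 'frank'] 0.4`. Two of the five recommended developers are leaders. Repeating the computation with lists drawn from the issue and chat networks gives one precision value per channel, as in the slide's chart.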
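Slides 58–64 replicate Bird et al. by relating developers’ social network (SNA) metrics to code metrics, and ask whether the results vary across sources. A rank correlation is a common choice for count data like these. The sketch below computes Spearman’s rho in plain Python as the Pearson correlation of average ranks, so ties are handled. The slides do not state whether the study used exactly this statistic, and the per-developer numbers are made up. In practice, `scipy.stats.spearmanr` computes the same coefficient along with a p-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho of two equally long sequences (nan if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


# Hypothetical per-developer values: degree in the mail network vs. commits.
mail_degree = [12, 3, 7, 7, 1]
commits = [150, 20, 60, 45, 2]
print(round(spearman(mail_degree, commits), 3))
```

This prints `0.975`. One way to check whether the choice of source changes the conclusion, as slide 63 asks, is to compute rho once per network (mail, issue, chat) against the same code metric.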
