Sattose talk

471 views

Published on

Software engineering is inherently a collaborative venture, involving many stakeholders that coordinate their efforts to produce large software systems. While importance of human aspects in software engineering has been recognised already in the 1970s, emergence of open source software (late 1990s) and platforms such as Stack Overflow and GitHub (late 2000s) enabled application of empirical methods to study of human aspects of software engineering.

In the first part of the talk we present a selection of recent results pertaining to two main
questions: who are the software developers and in what kind of activities they engage. The second part of the talk focuses on tools and techniques that have been used to obtain
the aforementioned results.

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
471
On SlideShare
0
From Embeds
0
Number of Embeds
16
Actions
Shares
0
Downloads
3
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Sattose talk

  1. 1. Human  Aspects  of  So0ware   Engineering   Alexander  Serebrenik   Eindhoven  University  of   Technology,  NL  
  2. 2. 1970s   …   1998   2008   2014   SocialCom,   SocInfo,  SBP,   CHASE   MSR   books,   conferences  
  3. 3. MS  Windows   post-­‐release   fault  predic5on   Precision   predicted  &   correct  /  predicted   Recall   predicted  &  correct  /   correct     Code  churn   78.6%   79.9%   Code   complexity   79.3%   66.0%   Code  coverage   83.8%   54.5%   Code   dependencies   74.4%   69.9%   Organiza5onal   structure   86.2%   84.0%   Socio-­‐technical   network   76.9%   70.5%  
  4. 4. >  90%  in  WordPress  &  Drupal   >  95%  in  FLOSS  surveys   >  87%  in  GNOME   >  70%  in  soBware-­‐related  jobs  (NSF)   MEN  
  5. 5. FLOSS   2013   YOUNG(?)   median   Is  Programming  Knowledge  Related  to  Age?  An  Explora8on  of  Stack  Overflow.  Morrison,  P.,  Murphy-­‐Hill,  E.   MSR  2013.  FLOSS  2013:  A  survey  dataset  about  free  soPware  contributors:  challenges  for  cura8ng,  sharing   and  combining.    Robles,  G.,  Arjona-­‐Reina,  L.,  Vasilescu,  B.,  Serebrenik,  A.,  Gonzalez-­‐Barahona,  J.M.  MSR  2014  
  6. 6. FLOSS   2013  
  7. 7. FLOSS   2013  
  8. 8. FLOSS   2013   Europe,US,CA,AU Brazil/Argen5na  
  9. 9. PAGE 1508/07/14 .cpp .po .jpg /test/ /library/ .doc makefile .sql .conf
  10. 10. Occasional contributors Frequent contributors On the variation and specialization of workload---A case study of the Gnome ecosystem community Vasilescu, B., Serebrenik, A., Goeminne, M., Mens, T. Empirical Sw Engg
  11. 11. For  which  of  those  ac[vi[es  would     Joe  Average    be  more  ac[ve  than     Jane  Average?     For  which  ac[vi[es  no  differences  would   be  observed?    
  12. 12. PAGE 1808/07/14 .cpp .po .jpg /test/ /library/ .doc makefile .sql .conf
  13. 13. PAGE 1908/07/14 .cpp .po .jpg /test/ /library/ .doc makefile .sql .conf
  14. 14. PAGE 2008/07/14 .cpp .po .jpg /test/ /library/ .doc makefile .sql .conf No differences in preferences for activities, differences in the amount of commits
  15. 15. sample Engage for longer Ask more questions No diff in #answers Women can contribute to SO but choose not to! Gender, representation and online participation: A quantitative study, Vasilescu, B., Capiluppi, A., and Serebrenik, A., Interacting with Computers
  16. 16. sample No significant differences in #questions, #answers, length of engagement Disengagement of women is activity/ platform specific!
  17. 17. •  More  ac[ve  commi_ers  ask  more  on  SO   •  More  ac[ve  askers  commit  more  on  GitHub   StackOverflow  and  GitHub:  Associa8ons  between  soPware  development  and  crowdsourced  knowledge,   Vasilescu,  B.,  Filkov,  V.  and  Serebrenik,  A.,  In  Social  Compu8ng,  2013,  IEEE.    
  18. 18. StackOverflow  and  GitHub:  Associa8ons  between  soPware  development  and  crowdsourced  knowledge,   Vasilescu,  B.,  Filkov,  V.  and  Serebrenik,  A.,  In  Social  Compu8ng,  2013,  IEEE.    
  19. 19. •  Contributors  to  both  communi[es  (more  likely   developers  than  non-­‐developers)  are  more  ac[ve   than  those  who  focus  on  just  one.   How  Social  Q&A  sites  are  changing  knowledge  sharing  in  open  source  soPware  communi8es,     Vasilescu,  B.,  Serebrenik,  A.,  Devanbu,  P.T.  and  Filkov,  V.,  In  CSCW  2014,  ACM.     help  
  20. 20. How  Social  Q&A  sites  are  changing  knowledge  sharing  in  open  source  soPware  communi8es,     Vasilescu,  B.,  Serebrenik,  A.,  Devanbu,  P.T.  and  Filkov,  V.,  In  CSCW  2014,  ACM.     help  
  21. 21. Tools  and   Techniques  
  22. 22. PAGE 3008/07/14 Euphegenia Doubtfire, euphegenia@hotmail.com Robin Williams, robinw@gmail.com
  23. 23. PAGE 3108/07/14 Ordering Rajesh Sola Sola Rajesh Spelling: misspelling, diacritics, punctuation Rene Engelhard Fene Engelhard Démurget Demurget J. A. M. Carneiro J A M Carneiro Middle initials, patronyms, nicknames, additional surnames, incomplete names Daniel M. Mueth Daniel Mueth Alexander Alexandrov Shopov Alexander Shopov Carlos Garnacho Parro Carlos Garnacho Jacob “Ulysses” Berkman Jacob Berkman A S Alam Amanpreet Singh Alam Name variants: transliteration, diminutives Γιωργοσ Georgios Mike Gratton Michael Gratton Software-specific: usernames, projects, tooling artefacts mrhappypants Aaron Brown Arturo Tena/libole2 Arturo Tena (16:06) Alex Roberts Alex Roberts Mix Any combination of those
  24. 24. Jane  Smith   Jane  Alice   Smith,   jsmith@...   J.  Smith,   jane.smith@...   Jane   Smith,   jane@...   iden[ty  merging/ iden[ty  reconcilia[on/   developer  matching  
  25. 25. <Jane,  jsmith@gmail.com>     <Jane  Smith,  jsmith@gmail.com>     Ø iden[cal  mail  addresses  è  the  same  person   Ø also  useful  for  MD5  hashes  in  Stack  Overflow     <Jane  Smith,  jsmith@gmail.com>   <Jane  Smith,  jsmith@yahoo.com>   Ø likely  to  be  the  same  person  but  might  give  false   posi[ves  if  these  are  different  Janes…  
  26. 26. Latent  Seman[c  Analysis   1   ..   ..   ..   1   ..   ..   ..   1   ..   ..   ..   ?   ..   ..   ..   1   ..   ..   ..   john   johnd   joseph   jdoe   doe   johnd@domainA:      {john,  johnd,      joseph,  doe}   <John  Doe,                johnd@domainA>   <John  Joseph  Doe,  johnd@domainA>   Document-­‐term  matrix  
  27. 27. Latent  Seman[c  Analysis   1   ..   ..   ..   1   ..   ..   ..   1   ..   ..   ..   3/4   ..   ..   ..   1   ..   ..   ..   john   johnd   joseph   jdoe   doe   johnd@domainA:      {john,  johnd,      joseph,  doe}   <John  Doe,                johnd@domainA>   <John  Joseph  Doe,  johnd@domainA>   Document-­‐term  matrix   max  similarity(jdoe,      {john,  johnd,  joseph,  doe})     =  similarity(jdoe,  doe)     =  1  –  Levenshtein(jdoe,  doe)  /      max(  length(jdoe),  length(doe))   =  1  –  1/4  =  3/4  
  28. 28. Latent  Seman[c  Analysis   1   ..   ..   ..   1   ..   ..   ..   1   ..   ..   ..   3/4   ..   ..   ..   1   ..   ..   ..   john   johnd   joseph   jdoe   doe   Inverse  document  frequency     (Singular  value  decomposi[on)     Rank  (noise)  reduc[on     Cosine  between  documents     Merge  similar  documents  
  29. 29. h_p://uberpython.wordpress.com/2012/01/01/precision-­‐recall-­‐sensi[vity-­‐and-­‐specificity/   Which  technique  is  be_er?  
  30. 30. h_p://uberpython.wordpress.com/2012/01/01/precision-­‐recall-­‐sensi[vity-­‐and-­‐specificity/  
  31. 31. Most  contributors  have  only  one  “iden[ty”:  all  techniques   are  good  enough  for  the  average  case.   Advanced  techniques  are  be_er  in  presence  of  noise.   Choose  your  weapons  wisely!  
  32. 32. /  SET  /  W&I   PAGE  40   08/07/14   What  is   your   gender?  
  33. 33. Name  +   Loca[on  =   Gender  
  34. 34. Lonzo  ⇒  Alonzo     w35l3y  ⇒  wesley     Name  +   Loca[on  =   Gender  
  35. 35. <[tle>Ben  Kamens</[tle>   …   <h1>We’re  willing  to   be  embarrassed  about  what   we  <em>haven’t</ em>  done…</h1>   Heuris8cs:     [tle  +  first  h1   Ben  Kamens  We’re  willing  to  be   embarrassed  about  what  we   haven’t  done…   <PERSON>Ben  Kamens</PERSON>   We’re  willing  to  be  embarrassed   about  what  we  haven’t  done…   Stanford  Named   En8ty  Tagger  
  36. 36. Quality  of  gender  resolu[on:  Survey   Self-­‐ iden[fica[on   As  inferred     Total   M   F   ?   M   60   3   43   106   F   2   5   4   11   Self-­‐ iden[fica[on   As  inferred     Total   M   F   ?   M   90   3   13   106   F   2   9   0   11   +  avatars,  other   social  media   sites  (manually)    
  37. 37. SANER  2015   22nd  Interna[onal  Conference   on  So0ware  Analysis,  Evolu[on   and  Reengineering     •  Abstracts:  November  7,  2015   •  Papers:  November  14,  2015  

×