Linking E-Mails and Source Code Artifacts

Slides of the presentation given at ICSE 2010 (http://www.sbs.co.za/ICSE2010/) on the paper (http://www.inf.usi.ch/faculty/lanza/Downloads/Bacc2010b.pdf).

  1. 1. Linking E-Mails and Source Code Artifacts. Alberto Bacchelli, Michele Lanza (REVEAL @ Faculty of Informatics, University of Lugano); Romain Robbes (PLEIAD @ DCC, University of Chile)
  11. 11. E-mails are precious for software engineering
  12. 12. E-mails are the “bread and butter of project communication” - Karl Fogel, creator of the Subversion project [Chart: Number of e-mails over time, Jun-95 to Mar-10; y-axis from 0 to 16,000]
  14. 14. E-mails are widely used and highly effective [Chart: Effectiveness vs. Frequency of usage for E-mails, Planned Meetings, Unplanned Meetings, Internal Documents, Bug Database, External Documents, Phone, Web, IM, Other. Source: Maintaining mental models: a study of developer work habits, LaToza, Venolia, DeLine, ICSE 2006]
  15. 15. E-mails are people-centric information used to exchange knowledge
  19. 19. Recovering Traceability Links - State of the Art: Vector Space Model, Probabilistic Model, Latent Semantic Indexing. Cited work: Antoniol, Canfora, Casazza, De Lucia, Merlo (TSE 2002); Marcus and Maletic (ICSE 2003)
  22. 22. Recovering Traceability Links: Vector Space Model, Latent Semantic Indexing
  23. 23. Without robust, well-designed time-tested, and, eventually well-established and accepted benchmarks, research on application of IR methods to problems in Software Engineering will not reach its full potential. - Alex Dekhtyar and Jane Huffman Hayes, ICSM 2006
  24. 24. Without benchmarks, Software Engineering will not reach its full potential.
  31. 31. Benchmarking the Link

      System    Language      Releases  Entities  E-Mails
      ArgoUML   Java          11        18,252    355
      Augeas    C             17         8,042    281
      Away3D    ActionScript   9         2,351    370
      Freenet   Java          30        37,878    379
      Habari    PHP5          12         1,105    374
      JMeter    Java          20        11,105    380
  40. 40. The Miler Web Application (release history)
  44. 44. Benchmarking the Link (benchmark table repeated) http://miler.inf.usi.ch
  45. 45. Vector Space Model
  51. 51. Vector Space Model: term-by-document matrix (terms t1 ... tC as rows, documents D1 ... DN as columns)

            D1    D2    D3    ...   DN
      t1    0     1     2     ...   0
      t2    0     0     1     ...   4
      ...
      tC    1     2     0     ...   0

  53. 53. Vector Space Model: the documents are e-mails E1 ... EN (same counts as above)
  54. 54. Vector Space Model: term frequency

            E1    E2    E3    ...   EN
      t1    0     0.3   0.3   ...   0
      t2    0     0     0.01  ...   0.5
      ...
      tC    0.02  0.4   0     ...   0

  56. 56. Vector Space Model: term frequency, inverse document frequency

            E1    E2    E3    ...   EN
      t1    0     0.01  0.01  ...   0
      t2    0     0     0.01  ...   0.5
      ...
      tC    0.02  0.4   0     ...   0

  57. 57. Vector Space Model: query Q added as a column

            E1    E2    E3    ...   EN    Q
      t1    0     0.01  0.01  ...   0     0
      t2    0     0     0.01  ...   0.5   0.2
      ...
      tC    0.02  0.4   0     ...   0     0.01

  62. 62. Vector Space Model: e-mails matched against Q: E1, E3, E7
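
The weighting and matching steps on the Vector Space Model slides can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the length-normalized term frequency, the logarithmic idf, the cosine similarity, and the names `tf_idf_vectors`, `cosine`, `link`, and the toy `emails` data are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (vocabulary, idf, tf-idf vectors) for a list of tokenized documents."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, idf, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link(query_tokens, docs, threshold):
    """Return indices of documents whose similarity to the query reaches the threshold."""
    vocab, idf, vectors = tf_idf_vectors(docs)
    counts = Counter(query_tokens)
    length = max(len(query_tokens), 1)
    # The query is weighted like one more column of the matrix (column Q on the slides).
    q = [counts[t] / length * idf[t] for t in vocab]
    return [i for i, vec in enumerate(vectors) if cosine(q, vec) >= threshold]


# Hypothetical toy data: three tokenized e-mails and a class-name query.
emails = [
    ["the", "sampler", "class", "fails"],
    ["release", "notes", "for", "jmeter"],
    ["httpsampler", "sampler", "timeout"],
]
linked = link(["sampler"], emails, threshold=0.1)
```

Raising `threshold` trades recall for precision, which is the knob varied on the threshold slides that follow.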
  66. 66. VSM on JMeter - Choosing query type and threshold [Chart: F-Measure (0 to 0.4) vs. Threshold (0.01 to 0.91); query types: entire content, classname&package, classname]
  67. 67. VSM on JMeter - Best configuration results [Chart: precision, recall, f-measure (0 to 1.0) vs. Threshold (0.01 to 0.91)]
  69. 69. VSM - Best configuration results [Chart: F-Measure (0 to 0.4) vs. Threshold (0.01 to 0.91); systems: ArgoUML, Freenet, JMeter, Away3D, Habari, Augeas]
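
The precision, recall, and F-measure curves plotted against the threshold can be computed along these lines. This is a sketch under assumptions: the F-measure is taken to be the balanced harmonic mean of precision and recall, and the function names, the `scores` dictionary, and the `relevant` set are hypothetical illustration data, not benchmark values.

```python
def precision_recall_f(retrieved, relevant):
    """Precision, recall, and balanced F-measure of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def sweep(scores, relevant, thresholds):
    """Evaluate every threshold: a candidate link is kept when its score reaches it."""
    results = {}
    for threshold in thresholds:
        retrieved = {pair for pair, score in scores.items() if score >= threshold}
        results[threshold] = precision_recall_f(retrieved, relevant)
    return results


# Hypothetical similarity scores for (e-mail, class) pairs and the true links.
scores = {
    ("E1", "Sampler"): 0.9,
    ("E2", "Sampler"): 0.4,
    ("E3", "Sampler"): 0.2,
    ("E4", "Sampler"): 0.05,
}
relevant = {("E1", "Sampler"), ("E3", "Sampler")}
results = sweep(scores, relevant, thresholds=[0.1, 0.5])
```

In this toy case the low threshold keeps every true link but admits a false one, while the high threshold is fully precise and misses half of the links.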
  70. 70. Latent Semantic Indexing
  76. 76. Latent Semantic Indexing ‣ Synonymy: NSUML = NSUMLModelFacade ‣ Polysemy: dialog = Dialog
  78. 78. Latent Semantic Indexing: term-by-e-mail matrix (terms t1 ... tC as rows, e-mails E1 ... EN as columns)

            E1    E2    ...   EN
      t1    0     1     ...   0
      t2    0     0     ...   4
      ...
      tC    1     2     ...   0

  81. 81. Latent Semantic Indexing: Singular Value Decomposition yields a topic-by-e-mail matrix (topics tpc1 ... tpcK as rows)

            E1    E2    ...   EN
      tpc1  0     0.02  ...   0
      tpc2  0     0     ...   0.4
      ...
      tpcK  0.1   0.2   ...   0
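
In the usual textbook formulation of LSI, the step from terms to topics is a truncated singular value decomposition. The notation below is an assumption chosen to match the slide labels (C terms, N e-mails, K topics); the slides themselves do not give the formula.

```latex
% A is the C x N term-by-e-mail matrix; keep only the K largest singular values.
A \;\approx\; A_K \;=\; U_K \, \Sigma_K \, V_K^{\top},
\qquad
U_K \in \mathbb{R}^{C \times K},\quad
\Sigma_K \in \mathbb{R}^{K \times K},\quad
V_K \in \mathbb{R}^{N \times K}

% E-mail j (column a_j of A) and a query q are mapped into the same K-dimensional topic space:
\hat{e}_j \;=\; \Sigma_K^{-1} U_K^{\top} a_j ,
\qquad
\hat{q} \;=\; \Sigma_K^{-1} U_K^{\top} q

% and compared by cosine similarity:
\operatorname{sim}(\hat{q}, \hat{e}_j) \;=\;
\frac{\hat{q}^{\top} \hat{e}_j}{\lVert \hat{q} \rVert \, \lVert \hat{e}_j \rVert}
```

Because terms that co-occur load on the same topics, two different terms can end up close in topic space, which is how LSI is meant to address the synonymy case shown earlier.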
  85. 85. LSI - Choosing the number of topics and query type [Chart: F-Measure (0 to 0.4) vs. Number of topics (10 to 350); query types: entire content, classname&package, classname]
  86. 86. LSI on JMeter - Best configuration results [Chart: precision, recall, f-measure (0 to 0.8) vs. Threshold (0.01 to 0.91)]
  88. 88. LSI - Best configuration results [Chart: F-Measure (0 to 0.60) vs. Threshold (0.01 to 0.91); systems: ArgoUML, Freenet, JMeter, Away3D, Habari, Augeas]
  90. 90. Example e-mail:

      Subject: What replaces PluggableImport and Generator2? (and other language module questions)
      From: Tom Morris tfmo...@gmail.com
      Date: September 23, 2006 - 13:12:51

      We're trying to implement support in ArgoEclipse for reverse engineering which means that we need to deal with the PluggableImport interface. It doesn't really make sense to modify that interface because it is deprecated, but I can't figure out what replaces it. The comments say to register with org.argouml.uml.reveng.Import but that class has no registration method. Additionally, it itself depends on the deprecated PluggableImport interface.

      On the code generation side of things, Generator2 has been deprecated in favor of CodeGenerator, but they don't appear to have equivalent functionality, so I don't understand how this is meant to work.

      Are there examples of modules which have been converted to the new structure? Is there a design discussion somewhere which describes how to convert old style modules to new style modules? Who's working on this stuff? I'm happy to help if I can get an idea of what the design direction is.

      Tom
  91. 91. Text Matching
  97. 97. Text Matching: Entity Name; dictionary word? no: Name case sensitive; yes: Regular Expression
  105. 105. Text Matching - Regular Expression: Classname with ". / space" on each side; file extensions java, class, as, php, c; package with ". / space" on each side
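
A regular expression built from the pieces on these slides might look like the sketch below. The exact expression used by the authors is not shown in the transcript, so the ordering (package before class name), the optional extension group, and the names `SEPARATOR`, `EXTENSIONS`, and `entity_pattern` are assumptions for illustration.

```python
import re

SEPARATOR = r"[./\s]"  # '.', '/', or whitespace, as listed on the slides
EXTENSIONS = ("java", "class", "as", "php", "c")


def entity_pattern(class_name, package_tail=None):
    """Compile a case-sensitive pattern for a class name in delimited context.

    package_tail is the last segment of the package name; when given, it must
    appear directly before the class name, joined by '.' or '/'.
    """
    name = re.escape(class_name)
    extensions = "|".join(EXTENSIONS)
    if package_tail:
        prefix = rf"(?:^|{SEPARATOR}){re.escape(package_tail)}[./]"
    else:
        prefix = rf"(?:^|{SEPARATOR})"
    # After the name: an optional file extension, then a separator or end of text.
    suffix = rf"(?:\.(?:{extensions}))?(?:{SEPARATOR}|$)"
    return re.compile(prefix + name + suffix)


dialog = entity_pattern("Dialog")
qualified = entity_pattern("Import", package_tail="reveng")
```

For example, `dialog.search("see Dialog.java for details")` finds a match, while `dialog.search("the DialogTree class")` does not, because the character after `Dialog` is neither a separator nor the start of a listed extension.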
  108. 108. Text Matching: dictionary word? Examples: Dialog, DialogTree
  110. 110. Text Matching: Entity Name; CamelCase? yes: Name case sensitive; no: Regular Expression
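
The decision shown on these slides, with the CamelCase check standing in for a dictionary lookup, can be sketched as follows. The reading of the branches (compound names matched by name alone, single-word names matched only in a stricter context) is inferred from the slide sequence, and `is_camel_case`, `mentions`, and the requirement of a package segment in the strict branch are assumptions for illustration.

```python
import re


def is_camel_case(name):
    """True for compound names such as DialogTree: a lowercase letter followed by an uppercase one."""
    return re.search(r"[a-z][A-Z]", name) is not None


def mentions(class_name, package_tail, text):
    """Decide whether text refers to the class, choosing the matcher by name shape."""
    name = re.escape(class_name)
    if is_camel_case(class_name):
        # Compound names are unlikely to be ordinary words, so a
        # case-sensitive whole-token match on the name is considered enough.
        pattern = rf"(?<!\w){name}(?!\w)"
    else:
        # Single-word names such as Dialog may be plain English, so require
        # the last package segment right before the name (assumed strictness).
        package = re.escape(package_tail)
        pattern = (
            rf"(?:^|[./\s]){package}[./]{name}"
            r"(?:\.(?:java|class|as|php|c))?(?:[./\s]|$)"
        )
    return re.search(pattern, text) is not None
```

With this sketch, `mentions("DialogTree", "ui", "The DialogTree widget leaks")` is true, while `mentions("Dialog", "ui", "Please close the Dialog first")` is false because the single-word name appears without its package context.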
  121. 121. Text Matching results [Chart: Precision vs. Recall, both axes 0 to 0.8]

      System    Language      Precision  Recall  F
      ArgoUML   Java          0.61       0.64    0.63
      Freenet   Java          0.59       0.59    0.59
      JMeter    Java          0.59       0.65    0.62
      Away3D    ActionScript  0.41       0.72    0.52
      Habari    PHP5          0.49       0.38    0.43
      Augeas    C             0.15       0.64    0.24
  125. 125. Overall results [Charts: Precision vs. Recall, both axes 0 to 0.8, comparing VSM, LSI, and Text Matching; one panel per system: ArgoUML, JMeter, Away3D, Habari, Augeas, Freenet]
