Fast Feedback Cycles in Empirical Software Engineering Research: research and engineering challenges

Gathering empirical knowledge is a time-consuming and error-prone task, and the results of empirical studies are often outdated soon after publication by new technological solutions. As a result, the impact of empirical results on software engineering practice is not guaranteed. The empirical community is aware of this problem, and in recent years there has been much discussion on how to improve the impact of empirical results on industrial practice. The discussion has often focused on data mining techniques and the analysis of software engineering data, a concept frequently labeled “Empirical Software Engineering 2.0”.

Starting from the current status of the discussion on this topic, we propose a way to use massive data analysis as a problem-driven analysis technique and, more importantly, as a means to improve the knowledge-sharing process between research and industry. Our assertion is that automatic data mining and analysis, in conjunction with the emerging concepts of the lean economy, the wisdom of crowds, and open communities, can enable fast feedback cycles between researchers and practitioners (and among researchers as well) and consequently improve the transfer of empirical results into industrial practice.

We identify the key concepts for gathering fast feedback in empirical software engineering by following an experience-based line of argumentation. Based on these key concepts, we design and present an approach to fast feedback cycles in empirical software engineering. We identify the resulting challenges and derive a research roadmap in the form of a list of open research and engineering challenges.

Our results are not yet validated, as they require a broader discussion in the community. To this end, they serve as a basis to foster discussion and collaboration within the research community.


Fast Feedback Cycles in Empirical Software Engineering Research: research and engineering challenges

  1. 1. Fast Feedback Cycles in Empirical Software Engineering Research: Research and Engineering Challenges. Dr. Antonio Vetro', Technische Universität München, Germany. 24 February 2014, VU Amsterdam, Software and Services (S2) Research Group. vetro@in.tum.de @phisaz
  2. 2. Outline • What this talk is about: Empirical Software Engineering Research • Feedback cycles • Fast Feedback Cycles in Software Engineering Research
  3. 3. Outline • What this talk is about: Empirical Software Engineering Research • Feedback cycles • Fast Feedback Cycles in Software Engineering Research
  4. 4. ! Fast Feedback Cycles in Empirical Software Engineering Research
  5. 5. ! Fast Feedback Cycles in Empirical Software Engineering Research
  6. 6. Empirical cycle Source: Empirical Practice in Software Engineering, Andreas Jedlitschka, Liliana Guzmán, Jessica Jung, Constanza Lampasona, Silke Steinbach, in Perspectives on the Future of Software Engineering, 2013, pp 217-233
  7. 7. Big Picture, 3rd layer: Methods. [Diagram of research methods: Theory/System of theories, Formal/Conceptual Analysis, Deduction, Grounded Theory, Theory Building; Exploratory methods (Case & Field Studies, Data Analysis, Survey and Interview Research, Ethnographic Studies, Folklore Gathering) and Confirmatory methods (Case & Field Studies, Experiments, Simulations); (Tentative) Hypotheses, Falsification/Support, Pattern Building, Induction, Observations/Evaluations, Study Population.] Further reading: Vessey et al., A unified classification system for research in the computing disciplines. Note: for now, prototyping is not part of this “method view” (so reference models aren’t either).
  8. 8. Traditional approach • Benefits: – scientific tool for evaluation, validation, discovery – possible theory building
  9. 9. Traditional approach • Benefits: – scientific tool for evaluation, validation, discovery – possible theory building • Drawbacks – long, difficult knowledge generation and transfer to industry – results sensitive to context variables and time – it doesn’t fit the new paradigm of data streams – it doesn’t keep pace with innovation – lack of flexibility (e.g., you cannot change the study design after the fact)
  10. 10. Traditional approach: the need for speed • Benefits: – scientific tool for evaluation, validation, discovery – possible theory building • Drawbacks – long, difficult knowledge generation and transfer to industry – results sensitive to context variables and time – it doesn’t fit the new paradigm of data streams – it doesn’t keep pace with innovation – lack of flexibility (e.g., you cannot change the study design after the fact)
  11. 11. Empirical Software Engineering 2.0
  12. 12. Empirical Software Engineering 2.0
  13. 13. Empirical Software Engineering 2.0. Software archives, data from any artefact, data always available, instantaneous results (MSR 2007, Zeller A.).
  14. 14. Empirical Software Engineering 2.0. Software archives, data from any artefact, data always available, instantaneous results (MSR 2007, Zeller A.). Data mining on multiple sources, domain knowledge (case studies), adaptive agents: mining + monitor + repair, local models (ICSE 2011, Menzies T.).
  15. 15. Empirical Software Engineering 2.0. Software archives, data from any artefact, data always available, instantaneous results (MSR 2007, Zeller A.). Data mining on multiple sources, domain knowledge (case studies), adaptive agents: mining + monitor + repair, local models (ICSE 2011, Menzies T.). Hybrid approach: manual hypothesis testing + speed of mining, data-driven decisions (IEEE SW 2012, Shull F.).
  16. 16. Empirical Software Engineering 2.0. Software archives, data from any artefact, data always available, instantaneous results (MSR 2007, Zeller A.). Data mining on multiple sources, domain knowledge (case studies), adaptive agents: mining + monitor + repair, local models (ICSE 2011, Menzies T.). Hybrid approach: manual hypothesis testing + speed of mining, data-driven decisions (IEEE SW 2012, Shull F.). Collaborative effort in the hypothesis-testing process, iterative model building (IEEE SW 2013, Shull F.).
  17. 17. The path towards EMSE 2.0
  18. 18. The path towards EMSE 2.0. EMSE 1.0: case studies (watch, don't touch); experiments (vary a few conditions in a project); simple analyses (a little ANOVA, regression, maybe a t-test).
  19. 19. The path towards EMSE 2.0. EMSE 1.0: case studies (watch, don't touch); experiments (vary a few conditions in a project); simple analyses (a little ANOVA, regression, maybe a t-test). EMSE 2.0: data generators (case studies, experiments, data streams); data analysis (10K possible data miners); crowd-sourcing (10K possible analysts). Adapted from Tim Menzies, Forrest Shull, “Empirical Software Engineering 2.0”.
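To make the contrast concrete, the following is a minimal sketch (in Python) of the kind of “simple analysis” that characterises EMSE 1.0: one small, hand-collected dataset, a t-test, and a simple regression. The numbers are invented for illustration and do not come from any study.

    # Minimal sketch of an "EMSE 1.0"-style analysis: one small, hand-collected
    # dataset, a two-sample t-test, and a simple linear regression.
    # All numbers are illustrative only.
    from scipy import stats

    # Defect counts per module for two treatments in a hypothetical experiment.
    defects_with_reviews = [3, 2, 4, 1, 2, 3, 2]
    defects_without_reviews = [5, 4, 6, 3, 5, 4, 6]

    # Did code reviews make a difference? (two-sample t-test)
    t_stat, p_value = stats.ttest_ind(defects_with_reviews, defects_without_reviews)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

    # Does module size predict defect count? (simple linear regression)
    module_size_kloc = [1.2, 0.8, 1.5, 0.5, 0.9, 1.1, 0.7]
    result = stats.linregress(module_size_kloc, defects_with_reviews)
    print(f"slope = {result.slope:.2f}, r^2 = {result.rvalue**2:.2f}")

Such an analysis is sound but slow to produce and hard to repeat at the pace of modern development, which is exactly the gap EMSE 2.0 targets.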
  20. 20. What this talk is about. [Same methods diagram as before: Theory/System of theories, Formal/Conceptual Analysis, Grounded Theory, Theory Building; Exploratory methods (Case & Field Studies, Data Analysis, Survey and Interview Research, Ethnographic Studies, Folklore Gathering) and Confirmatory methods (Case & Field Studies, Experiments, Simulations); (Tentative) Hypotheses, Falsification/Support, Pattern Building, Observations/Evaluations, Study Population.] Further reading: Vessey et al., A unified classification system for research in the computing disciplines. Note: for now, prototyping is not part of this “method view” (so reference models aren’t either).
  21. 21. What this talk is about. [Reduced methods diagram, restricted to the elements relevant to this talk: exploratory Data Analysis, Pattern Building from Observations/Evaluations of a Study Population, (Tentative) Hypotheses, Falsification/Support, Theory Building, Theory/System of theories.] Further reading: Vessey et al., A unified classification system for research in the computing disciplines.
  22. 22. Outline • What this talk is about: Empirical Software Engineering Research • Feedback cycles • Fast Feedback Cycles in Software Engineering Research
  23. 23. Fast Feedback Cycles in Empirical Software Engineering Research
  24. 24. Fast Feedback Cycles in Empirical Software Engineering Research
  25. 25. Lean and Agile
  26. 26. Earned value in iterations Source: Hakan Erdogmus. 2005. The Economic Impact of Learning and Flexibility on Process Decisions. IEEE Softw. 22, 6 (November 2005), 76-83. DOI=10.1109/MS.2005.165 http://dx.doi.org/10.1109/MS.2005.165
  27. 27. Earned value in iterations Our scope is: knowledge Source: Hakan Erdogmus. 2005. The Economic Impact of Learning and Flexibility on Process Decisions. IEEE Softw. 22, 6 (November 2005), 76-83. DOI=10.1109/MS.2005.165 http://dx.doi.org/10.1109/MS.2005.165
  28. 28. Big Data
  29. 29. In the web 2.0: Feedback Cycles + Big Data
  30. 30. In the web 2.0 : Feedback Cycles + Big Data
  31. 31. In the web 2.0 : Big Data + Feedback Cycles Implies crowd sourcing
  32. 32. End, finally, back to Software Engineering
  33. 33. Big Data in SE: example from agile development
  34. 34. Big Data in SE: example from agile development Stories Metrics from stories (e.g. , Req Smells) …
  35. 35. Big Data in SE: example from agile development Story points Tasks Features Dependencies Metrics from stories Estimations … Stories Metrics from stories (e.g. , Req Smells) …
  36. 36. Big Data in SE: example from agile development Implementation time Code Features from Code Metrics from Code Bugs Changes … Story points Tasks Features Dependencies Metrics from stories Estimations … Stories Metrics from stories (e.g. , Req Smells) …
  37. 37. Big Data in SE: example from agile development Implementation time Code Features from Code Metrics from Code Bugs Changes … UATs Story points Tasks Features Dependencies Metrics from stories Estimations … Stories Metrics from stories (e.g. , Req Smells) …
  38. 38. Big Data in SE: example from agile development Implementation time Code Features from Code Metrics from Code Bugs Changes … Story points Tasks Features Dependencies Metrics from stories Estimations … UATs Customer A Stories Metrics from stories (e.g. , Req Smells) …
  39. 39. Big Data in SE: example from agile development Implementation time Code Features from Code Metrics from Code Bugs Changes … Story points Tasks Features Dependencies Metrics from stories Estimations … UATs Customer A Stories Metrics from stories (e.g. , Req Smells) …
  40. 40. Big Data in SE: example from agile development Implementation time Code Features from Code Metrics from Code Bugs Changes … Story points Tasks Features Dependencies Metrics from stories Estimations … UATs Customer A Stories Metrics from stories (e.g. , Req Smells) …
  41. 41. Big Data in SE: example from agile development Implementation time Code Features from Code Metrics from Code Bugs Changes … Story points Tasks Features Dependencies Metrics from stories Estimations … UATs Customer A + Feedback Stories Metrics from stories (e.g. , Req Smells) … Acceptance Motivations Improvements Bugs …
  42. 42. Big Data in SE: example from agile development Implementation time Code Features from Code Metrics from Code Bugs Changes … Story points Tasks Features Dependencies Metrics from stories Estimations … UATs Customer A Discussion Problems in Sprints … Stories Metrics from stories (e.g. , Req Smells) … + Feedback Acceptance Motivations Improvements Bugs …
  43. 43. Big Data in SE: example from agile development Implementation time Code Features from Code Metrics from Code Bugs Changes … Story points Tasks Features Dependencies Metrics from stories Estimations … Discussion Problems in Sprints … UATs Customer A Discussion Problems in Sprints … Stories Metrics from stories (e.g. , Req Smells) … + Feedback Acceptance Motivations Improvements Bugs …
  44. 44. Big Data in SE: example from agile development Discussion Problems in Sprints … Story points Tasks Features Dependencies Metrics from stories Estimations … Implementation time Code Features from Code Metrics from Code Bugs Changes … Discussion Problems in Sprints … UATs Customer A Discussion Problems in Sprints … Stories Metrics from stories (e.g. , Req Smells) … + Feedback Acceptance Motivations Improvements Bugs …
  45. 45. Big Data in SE: example from agile development Discussion Problems in Sprints … Story points Tasks Features Dependencies Metrics from stories Estimations … Implementation time Code Features from Code Metrics from Code Bugs Changes … Discussion Problems in Sprints … Discussion Problems in Sprints … UATs Customer A Discussion Problems in Sprints … Stories Metrics from stories (e.g. , Req Smells) … + Feedback Acceptance Motivations Improvements Bugs …
  46. 46. Big Data in SE: example from agile development. Stories (metrics from stories, e.g. Req Smells, …); story points, tasks, features, dependencies, estimations, …; implementation time, code, features from code, metrics from code, bugs, changes, …; UATs; Customer A; discussion, problems in sprints, …; + Feedback (acceptance, motivations, improvements, bugs, …). Even simple processes produce a huge amount of interconnected information.
  47. 47. Big Data in SE: example from agile development. Stories (metrics from stories, e.g. Req Smells, …); story points, tasks, features, dependencies, estimations, …; implementation time, code, features from code, metrics from code, bugs, changes, …; UATs; Customer A; discussion, problems in sprints, …; + Feedback (acceptance, motivations, improvements, bugs, …). Even simple processes produce a huge amount of interconnected information. Even simple mechanisms can collect valuable feedback.
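As a rough illustration of how these interconnected artifacts could be brought together for analysis, the sketch below joins hypothetical story, code, and customer-feedback records on a shared story ID using pandas. Every column name and value is an assumption made for illustration.

    # Sketch: joining heterogeneous sprint artifacts (stories, code, customer
    # feedback) into one analysis table. All names and values are hypothetical.
    import pandas as pd

    stories = pd.DataFrame({
        "story_id": [101, 102, 103],
        "story_points": [3, 8, 5],
        "req_smells": [0, 4, 1],   # e.g. vague or ambiguous wording in the story
    })

    code = pd.DataFrame({
        "story_id": [101, 102, 103],
        "impl_hours": [6, 30, 11],
        "bugs": [0, 3, 1],
    })

    feedback = pd.DataFrame({
        "story_id": [101, 102, 103],
        "uat_passed": [True, False, True],
        "customer_comment": ["ok", "missing edge cases", "ok"],
    })

    # One row per story, linking requirements quality to downstream outcomes.
    dataset = stories.merge(code, on="story_id").merge(feedback, on="story_id")
    print(dataset)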
  48. 48. Big Data + Feedback in SE : example from agile development
  49. 49. Big Data + Feedback in SE : example from agile development What to mine – data from any step (also from past projects)
  50. 50. Big Data + Feedback in SE : example from agile development What to mine – data from any step (also from past projects) Implementation time Code Features from Code Metrics from Code Bugs Changes … Story points Tasks Features Dependencies Metrics from stories Estimations … UATs Discussion time Problems in Sprints (list) …
  51. 51. Big Data + Feedback in SE : example from agile development What to mine – data from any step (also from past projects) What we try to find – Indicators for problems • Maintenance problems • Wrong Effort estimation • Test effort • UAT outcome • Development problems – New patterns (relationships:“what”, not “why”) Implementation time Code Features from Code Metrics from Code Bugs Changes … Story points Tasks Features Dependencies Metrics from stories Estimations … UATs Discussion time Problems in Sprints (list) …
  52. 52. Big Data + Feedback in SE : example from agile development What to mine – data from any step (also from past projects) What we try to find – Indicators for problems • Maintenance problems • Wrong Effort estimation • Test effort • UAT outcome • Development problems – New patterns (relationships:“what”, not “why”)
  53. 53. Big Data + Feedback in SE : example from agile development What to mine – data from any step (also from past projects) What we try to find – Indicators for problems • Maintenance problems • Wrong Effort estimation • Test effort • UAT outcome • Development problems – New patterns (relationships:“what”, not “why”)
  54. 54. Big Data + Feedback in SE : example from agile development What to mine – data from any step (also from past projects) What we try to find – Indicators for problems • Maintenance problems • Wrong Effort estimation • Test effort • UAT outcome • Development problems – New patterns (relationships:“what”, not “why”)
  55. 55. Big Data + Feedback in SE : example from agile development What to mine – data from any step (also from past projects) What we try to find – Indicators for problems • Maintenance problems • Wrong Effort estimation • Test effort • UAT outcome • Development problems – New patterns (relationships:“what”, not “why”)
  56. 56. Big Data + Feedback in SE: example from agile development. What to mine: data from any step (also from past projects). What we try to find: indicators for problems (maintenance problems, wrong effort estimation, test effort, UAT outcome, development problems); new patterns (relationships: “what”, not “why”). Results are continuously visualised to stakeholders, so we can check them and collect fast feedback.
  57. 57. Big Data + Feedback in SE: example from agile development. What to mine: data from any step (also from past projects). What we try to find: indicators for problems (maintenance problems, wrong effort estimation, test effort, UAT outcome, development problems); new patterns (relationships: “what”, not “why”). Results are continuously visualised to stakeholders, so we can check them and collect fast feedback. Fast feedback enables: iterative local model building; knowledge earned value, as in the lean approach; input for follow-up studies.
  58. 58. Big Data + Feedback in SE: example from agile development. What to mine: data from any step (also from past projects). What we try to find: indicators for problems (maintenance problems, wrong effort estimation, test effort, UAT outcome, development problems); new patterns (relationships: “what”, not “why”). Results are continuously visualised to stakeholders, so we can check them and collect fast feedback. Fast feedback enables: iterative local model building; knowledge earned value, as in the lean approach; input for follow-up studies.
  59. 59. Big Data + Feedback in SE: example from agile development. What to mine: data from any step (also from past projects). What we try to find: indicators for problems (maintenance problems, wrong effort estimation, test effort, UAT outcome, development problems); new patterns (relationships: “what”, not “why”). Results are continuously visualised to stakeholders, so we can check them and collect fast feedback. Fast feedback enables: iterative local model building; knowledge earned value, as in the lean approach; input for follow-up studies. Risk minimisation, focus on value.
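To make “indicators for problems” and the “what, not why” idea concrete, here is a small hypothetical sketch: it correlates a requirements-smell count with implementation time and tries a crude decision rule against UAT outcomes, the kind of candidate indicator that would then be visualised to stakeholders for fast feedback. Columns, values, and the threshold are assumptions, not results.

    # Sketch: deriving candidate indicators ("what", not "why") from joined
    # sprint data. Columns, values, and the threshold are hypothetical.
    import pandas as pd

    dataset = pd.DataFrame({
        "req_smells": [0, 4, 1, 5, 2, 0, 3],
        "impl_hours": [6, 30, 11, 28, 14, 7, 22],
        "uat_passed": [True, False, True, False, True, True, False],
    })

    # Candidate indicator 1: do smelly stories take longer to implement?
    corr = dataset["req_smells"].corr(dataset["impl_hours"])
    print(f"correlation(req_smells, impl_hours) = {corr:.2f}")

    # Candidate indicator 2: a crude decision rule, shown to stakeholders for
    # fast feedback and then refined or discarded.
    rule = dataset["req_smells"] >= 3
    accuracy = (rule == ~dataset["uat_passed"]).mean()
    print(f"'>= 3 smells predicts UAT failure' holds in {accuracy:.0%} of stories")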
  60. 60. Outline • What this talk is about: Empirical Software Engineering Research • Feedback cycles • Fast Feedback Cycles in Software Engineering Research
  61. 61. Fast Feedback Cycles in Empirical Software Engineering Research
  62. 62. Process to enable fast feedback cycles. [Diagram relating Data sources, the Automatic Data Analysis, Facts/hypotheses (testable knowledge), the Experience base (tested knowledge), and Stakeholders (implicit/explicit feedback) through trigger, produce, tune, edit, check, “is model for”, and view/edit relations.]
  63. 63. Process to enable fast feedback cycles. Data sources: any data collected from software development, execution, and maintenance; stream and snapshot data; data from past studies is also considered.
  64. 64. Process to enable fast feedback cycles. Automatic Data Analysis: data mining techniques are applied to the data sources and used to reveal, strengthen, or reject hypotheses.
  65. 65. Process to enable fast feedback cycles. Facts, hypotheses (testable knowledge): facts and hypotheses derived from the data analyses.
  66. 66. Process to enable fast feedback cycles. Stakeholders (implicit/explicit feedback): industry and research collaborators.
  67. 67. Process to enable fast feedback cycles. Experience base (tested knowledge): a collection of tested knowledge and well-established theories in the field.
  68. 68. Process to enable fast feedback cycles. [Full process diagram, as on slide 62.]
  69. 69. [Process diagram, labels only: Data sources, Automatic Data Analysis, Tune, Produce, Edit?, Facts/hypotheses (testable knowledge), Check, Is model for, Feed, View/edit, Stakeholders (implicit/explicit feedback).]
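The following is a schematic sketch of how such a cycle could be wired up: automatic analyses over the data sources produce candidate facts, stakeholders accept or reject them, accepted facts feed the experience base, and rejections feed the next tuning round. All class names and the in-memory experience base are assumptions made for illustration; the process itself does not prescribe an implementation.

    # Schematic sketch of the fast feedback cycle (all names are hypothetical).
    from dataclasses import dataclass, field

    @dataclass
    class Fact:
        statement: str          # testable knowledge, e.g. a mined association
        confidence: float       # strength assigned by the automatic analysis
        accepted: bool = False  # set by stakeholder feedback

    @dataclass
    class ExperienceBase:
        tested: list = field(default_factory=list)  # tested knowledge, theories

        def check(self, fact: Fact) -> bool:
            # Placeholder consistency check against already-tested knowledge.
            return fact.statement not in {f.statement for f in self.tested}

    def analyse(data_sources) -> list:
        # Stand-in for the automatic data analysis (mining) step.
        return [Fact("high req-smell count ~ longer implementation time", 0.7)]

    def cycle(data_sources, experience_base, ask_stakeholder):
        for fact in analyse(data_sources):
            if experience_base.check(fact) and ask_stakeholder(fact):
                fact.accepted = True
                experience_base.tested.append(fact)  # feed the tested knowledge
            # rejected facts feed back into tuning the next analysis round

    # Example run with a stakeholder who accepts facts above 0.5 confidence.
    eb = ExperienceBase()
    cycle(data_sources=None, experience_base=eb,
          ask_stakeholder=lambda f: f.confidence > 0.5)
    print([f.statement for f in eb.tested])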
  70. 70. Fast Feedback Cycles in Empirical Software Engineering Research
  71. 71. Fast Feedback Cycles in Empirical Software Engineering Research Research and Engineering challenges
  72. 72. Process to enable fast feedback cycles. Objective (Data sources): any data collected from software development, execution, and maintenance; stream and snapshot data; data from past studies is also considered. Challenges: integration of different datasets; guaranteeing high data quality; applicability of temporal abstractions.
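As a hint of what the data-quality and temporal-abstraction challenges look like in practice, this sketch checks a hypothetical event stream for missing values and out-of-order timestamps and then rolls the events up per sprint; all field names and values are assumptions.

    # Sketch: basic data-quality checks and a simple temporal abstraction
    # (a per-sprint roll-up) over a hypothetical event stream.
    import pandas as pd

    events = pd.DataFrame({
        "timestamp": pd.to_datetime(["2014-01-03", "2014-01-02", "2014-01-10"]),
        "sprint": [1, 1, 2],
        "bugs_reported": [2, None, 1],
    })

    # Data quality: missing values and ordering problems must be surfaced,
    # not silently ignored.
    print("missing values per column:\n", events.isna().sum())
    print("timestamps in order:", events["timestamp"].is_monotonic_increasing)

    # Temporal abstraction: collapse the event stream into one row per sprint.
    per_sprint = events.sort_values("timestamp").groupby("sprint").agg(
        events=("timestamp", "count"),
        bugs_reported=("bugs_reported", "sum"),
    )
    print(per_sprint)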
  73. 73. Process to enable fast feedback cycles. Objective (Automatic Data Analysis): data mining techniques are applied to the data sources and used to reveal, strengthen, or reject hypotheses. Challenges: appropriate selection and tuning of techniques (also automatic and iterative); appropriate selection of response variables; incorporation of a priori knowledge and human feedback; exploration should take into account industry needs (usually short-term).
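One possible reading of “appropriate selection and tuning of techniques (also automatic and iterative)” is cross-validated model selection; the sketch below picks among a few scikit-learn learners on synthetic data. It only illustrates the idea: the learners, features, and data are assumptions, not a recommended setup.

    # Sketch: automatic, iterative selection among candidate mining techniques
    # via cross-validation. Data and feature meanings are hypothetical.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))     # e.g. req_smells, story_points, code churn
    y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # e.g. "UAT failed?"

    candidates = {
        "tree": DecisionTreeClassifier(max_depth=3),
        "logreg": LogisticRegression(),
        "naive_bayes": GaussianNB(),
    }

    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    print(scores, "-> selected:", best)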
  74. 74. Process to enable fast feedback cycles. Objective (Facts, hypotheses / testable knowledge): facts and hypotheses derived from the data analyses; facts not supported by statistical significance are not rejected. Challenges: meaningfulness of the generated knowledge; consistency checks (formal, semantic) with human expertise and the experience base; representation issues (see experience base).
  75. 75. Process to enable fast feedback cycles. Objective (Experience base / tested knowledge): a collection of tested knowledge and well-established theories in the field. Challenges: representation of the information in an easily query-able, representable, and modifiable way; representation of uncertainty and soft constraints; protection of sensitive data.
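A toy sketch of what an easily query-able experience-base entry with explicit uncertainty (a confidence value standing in for soft constraints) might look like; the schema is entirely an assumption.

    # Toy sketch of an experience-base entry: query-able, modifiable, and
    # carrying explicit uncertainty. The schema is hypothetical.
    from dataclasses import dataclass

    @dataclass
    class Evidence:
        claim: str          # e.g. "requirement smells correlate with rework"
        context: dict       # where the claim was observed (domain, team size, ...)
        confidence: float   # 0..1, a soft degree of belief rather than a hard fact
        sources: list       # studies or analyses supporting the claim

    base = [
        Evidence("requirement smells correlate with rework",
                 {"domain": "agile business information systems"}, 0.6,
                 ["hypothetical study A"]),
    ]

    # A query: all claims relevant to a given context, above a confidence threshold.
    relevant = [e for e in base
                if "agile" in e.context.get("domain", "") and e.confidence >= 0.5]
    print([e.claim for e in relevant])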
  76. 76. Process to enable fast feedback cycles. Objective (Stakeholders): industry and research collaborators. Challenges: value both types of stakeholders; collect and give meaningful fast feedback; create and test pragmatic feedback mechanisms, e.g., shared dashboards, interactive visualisations, …
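As one possible shape for such a feedback mechanism, the toy sketch below records explicit feedback (an accept/reject verdict with a comment) and implicit feedback (e.g. a view event) on a mined hypothesis, as a shared dashboard might log it; all identifiers and fields are hypothetical.

    # Toy sketch of explicit and implicit stakeholder feedback on one mined
    # hypothesis, as a shared dashboard might record it. All fields hypothetical.
    from datetime import datetime, timezone

    feedback_log = []

    def record_feedback(hypothesis_id, stakeholder, kind, payload):
        # kind: "explicit" (accept/reject + comment) or "implicit" (e.g. a view)
        feedback_log.append({
            "hypothesis": hypothesis_id,
            "stakeholder": stakeholder,
            "kind": kind,
            "payload": payload,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    record_feedback("H-17", "dev-team-A", "implicit", {"event": "viewed"})
    record_feedback("H-17", "researcher-B", "explicit",
                    {"verdict": "reject", "comment": "confounded by team size"})

    explicit = [f for f in feedback_log if f["kind"] == "explicit"]
    print(f"{len(feedback_log)} feedback events, {len(explicit)} explicit")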
  77. 77. Some useful and inspiring references
  Books:
  • Münch, Jürgen; Schmid, Klaus (Eds.), Perspectives on the Future of Software Engineering: Essays in Honor of Dieter Rombach, 2013, XVI, 366 p.
  • Mayer-Schonberger, V.; Cukier, K. (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Houghton Mifflin Harcourt. ISBN: 0544002695, 9780544002692.
  • Ries, Eric (2011). The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. New York: Crown Business.
  • Biffl, S.; Aurum, A.; Boehm, B.; Erdogmus, H.; Grünbacher, P. (Eds.), Value-Based Software Engineering, 2006, XXII, 388 p.
  Articles:
  • Hassan, A.E.; Hindle, A.; Runeson, P.; Shepperd, M.; Devanbu, P.; Sunghun Kim, Roundtable: What's Next in Software Analytics, IEEE Software, vol. 30, no. 4, pp. 53-56, July-Aug. 2013.
  • Forrest Shull: Getting an Intuition for Big Data. IEEE Software 30(4): 3-6 (2013).
  • J. Münch, F. Fagerholm, P. Johnson, J. Pirttilahti, J. Torkkel, and J. Järvinen, “Creating Minimum Viable Products in Industry-Academia Collaborations,” in Proceedings of the Lean Enterprise Software and Systems Conference (LESS 2013), Galway, Ireland, 2013.
  • Forrest Shull: Research 2.0? IEEE Software 29(6): 4-8 (2012).
  • Dag I. K. Sjøberg, Tore Dybå, Bente C. D. Anda, and Jo E. Hannay, Building theories in software engineering. In Forrest Shull, Janice Singer, and Dag I. K. Sjøberg (Eds.), Guide to Advanced Empirical Software Engineering, chapter 12, pages 312-336. Springer London, London, 2008.
  • Victor R. Basili, Jens Heidrich, Mikael Lindvall, Jürgen Münch, Myrna Regardie, and Adam Trendowicz, GQM+Strategies - Aligning Business Strategies with Software Measurement. ESEM, pages 488-490. IEEE Computer Society, 2007.
  Presentations:
  • Tim Menzies, Forrest Shull, 2011, Empirical Software Engineering 2.0, http://www.slideshare.net/timmenzies/empirical-softwareengineering-v20
  • Thomas Zimmermann, ICSM 2010, Analytics for Software Development, http://www.slideshare.net/tom.zimmermann/analytics-forsoftware-development
  • Andreas Zeller, MSR 2007, Empirical Software Engineering 2.0: How mining software repositories changes the game for empirical software engineering research, http://msr.uwaterloo.ca/msr2007/Empirical-SE-2.0-Zeller.pdf
  78. 78. Acknowledgements. Thanks for feedback, reviews, writing, and even listening to me :), to: Daniel Méndez Fernández, Manfred Broy, Florian Grigoleit, Henning Femmer, Peter Struss, Jacob Mund, Benedikt Hauptmann, Andreas Vogelsang; Stefan Wagner; Forrest Shull; Davide Falessi; Andreas Jedlitschka and Jens Heidrich.
  79. 79. THANKS! Fast Feedback Cycles in Empirical Software Engineering Research: Research and Engineering Challenges. Dr. Antonio Vetro', Technische Universität München, Germany. 24 February 2014, VU Amsterdam, Software and Services (S2) Research Group. vetro@in.tum.de @phisaz
