Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Automation in the Bug Flow - Machine Learning for Triaging and Tracing

1,524 views

Published on

Issue management is a costly part of software development. In large projects, the continuous inflow of issue reports contributes to the information overload in a project, i.e., "a state where individuals do not have time or capacity to process all available information". In issue triaging, an initial step in issue management, a developer must be able to overview existing issue reports and easily navigate the software engineering project landscape. In this presentation, we present support for two work tasks involved in issue management: 1) issue assignment and 2) change impact analysis. We use machine learning to harness the ever-growing number of issue reports, by training recommendation systems on previous issues. Our industrial evaluations on 50,000+ issue reports in two large software development organizations indicate that automated issue assignment performs in line with current manual work. Moreover, we present how traceability from already resolved issue reports to various artifacts can be reused to jump start change impact analyses for newly submitted issues. Finally, we speculate on future ways to tame information overload into helpful software engineering recommendations.

Published in: Science

Automation in the Bug Flow - Machine Learning for Triaging and Tracing

  1. 1. Automation in the Bug Flow - MACHINE LEARNING FOR TRIAGING AND TRACING MARKUS BORG, LUND UNIVERSITY
  2. 2. Bug tracker The number of incoming bug reports can be overwhelming…
  3. 3. Bug tracker Machine Learning
  4. 4. - Final year PhD student - MSc CS and engineering - ABB software developer (3 years) process automation compilers and editors Per Runeson
  5. 5. The Challenge The Solution The Evaluation
  6. 6. Reqts. DB Issue Repo Code Repo Test DB Developers in large projects must navigate complex information landscapes that continously change Doc. DB.
  7. 7. One bug is not much of a problem…
  8. 8. Bug tracker But a large simultaneous inflow of bug reports can make the best bug tracking system sweat!
  9. 9. Making the wrong prioritizations might result in bugs on your customers
  10. 10. In a safety-critical context This talk addresses two tasks involved in issue management: 1. Issue Assignment 2. Change Impact Analysis
  11. 11. By safety-critical we refer to document-driven development with a rigid process… … prior to changing source code, a formal change impact analysis has to be conducted and reported according to an approved template.
  12. 12. We want to increase the confidence at commit time even further!
  13. 13. The Challenge The Solution The Evaluation
  14. 14. We aim to harness the intrinsic navigational value of bug reports
  15. 15. Bug tracker We leverage on the number of bug reports in the projects… Machine Learning
  16. 16. Bug tracker The more bugs, the better the machine learning gets! Machine Learning
  17. 17. Automated Issue Assignment • Goal: Useful tool deployable with minimum configuration effort • Approach: Train classifiers on historical bug reports Combine them using state-of-the-art ensemble learning Leif Jonsson
  18. 18. How to Represent a Bug Report? • Component • Severity • System Version • Submit Date • Submitter Location etc. • … And the text! Title and description.
  19. 19. Automated Change Impact Analysis • Goal: Intuitive tool to jump start analyses based on historical data Faster + more accurate analyses compared to fully manual work • Approach Step 1: Mine the history Step 2: Recommend impact for new bug fix
  20. 20. Present recommendations Amazon-style: ”Other developers working on this class also modified/tested…”
  21. 21. Construct network of previously reported impact Index textual data with
  22. 22. Calculate centrality measures
  23. 23. Automated Impact Analysis • Approach part 2: Recommend impact Find similar bugs using Apache Lucene Follow links to identify candidate impact set Req. X.Y Req. Z.Y Design Doc. X.Y Test case UTC56 Design Doc. X.Y
  24. 24. Automated Impact Analysis • Approach part 2: Recommend impact Find similar bugs using Apache Lucene Follow links to identify candidate impact set Use centrality measures to rank candidate impact 1. Requirement X.Y 2. Design Document X.Y 3. Test case UTC56 4. Design Document X.Y 5. Requirement Z.Y
  25. 25. The Challenge The Solution The Evaluation
  26. 26. Experiment: Issue Assignment • Five large datasets from two companies – Telecom and Automation – 50.000+ issue reports • 10-fold cross-validation and simulation (”replaying history”)
  27. 27. Experiment: Issue Assignment 67 17 64 28 36 • Prediction accuracy in line with human activity • Numbers depict number of teams in the projects
  28. 28. Experiment: Issue Assignment • Warning! Some systems need fresh training data
  29. 29. Experiment: Issue Assignment But the decay is not always exponential…
  30. 30. Experiment: Change Impact Analysis • Experiment with historical impact – Training set: 8 years, Test set: 2 years ImpRec presents 30% of past impact among the top-5 recommendations (40%@10, 50%@20) But what does that mean? User study needed.
  31. 31. Case Study: Change Impact Analysis • Industrial case study – Two units of analysis: Team Sweden & Team India – Tool deployed in March 2014 & August 2014 – Interviews and user log files 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 Click Distribution, top-20 hits 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 IA Google Initial result: Developers’ interaction with impact recommendations similar to Google searches
  32. 32. Case Study: Change Impact Analysis • ”Finding these past bugs was exactly what I was looking for actually” - Developer, India • ”Directing attention to potential side-effects is very important” - Project manager, Sweden
  33. 33. Conclusions
  34. 34. Automated Issue Assignment • Automated assignment as accurate as current manual work – But instantaneous! • At least 2.000 bug reports needed for training • Continously monitor the accuracy Favourable Unfavourable Static team structure Dynamic team structure Maintenance project New development
  35. 35. Automated Change Impact Analysis • Recommendation system provides a useful starting point • Recommending related issues is a popular feature – Study previous issue resolutions – Compare with previous impact analyses • Recommendation recall for impact: 30-55% – Reuse previous impact to jump-start analysis – Provide warning if probable impact is missing
  36. 36. Machine learning can guide maintenance work Much potential: - Severity prediction - Resolution times - ”Noise” filtering
  37. 37. PHOTO CREDITS Brown stink bug - Marlin E. Rice Isopods - Omoshiro Aquarium - Flickr: littoraria, coda Cubicles - Flickr: templetonelliot, ifl, danburgmurmur Eightball girl - Flickr: mobilestreetlife Evaluate - Flickr: theideadesk My wife - My wife Thank you! markus.borg@cs.lth.se cs.lth.se/markus_borg @_Troddel_

×