Analytics for smarter software development

Keynote presented at CASCON 2012

  1. Analytics for Smarter Software Development. Thomas Zimmermann, Microsoft Research, USA. © Microsoft Corporation
  3. 40 percent of major decisions are based not on facts, but on the manager’s gut. Accenture survey among 254 US managers in industry. http://newsroom.accenture.com/article_display.cfm?article_id=4777
  4. Analytics is the use of analysis, data, and systematic reasoning to make decisions. Definition by Thomas H. Davenport, Jeanne G. Harris: Analytics at Work: Smarter Decisions, Better Results.
  5. Software analytics: empower software development teams to gain and share insight from their data to make better decisions. Raymond Buse, Thomas Zimmermann: Information Needs for Software Development Analytics. ICSE 2012 SEIP Track. http://research.microsoft.com/apps/pubs/default.aspx?id=172578
  6. Smart analytics; Usage analytics; Development analytics; The future
  7. Smart analytics
  10. Jack Bauer
  11. Chloe O’Brian
  13. All he needed was a paper clip
  14. smart analytics is actionable
  16. smart analytics is real time
  17. Scene from the movie WarGames (1983).
  18. smart analytics is sharing
  19. insight, patterns, data
  22. smart analytics is diversity
  23. stakeholders, tools, questions. Stakeholders: Researcher, Developer, Tester, Dev. Lead, Test Lead, Manager
  24. stakeholders, tools, questions. Tools: Clustering, Prediction, Surveys, Qualitative Analysis, Measurements, Benchmarking, Segmenting, What-if analysis, Multivariate Analysis, Interviews
  25. stakeholders, tools, questions. Questions: build tools for frequent questions; use data scientists for infrequent questions. (Chart axes: Frequency, Questions)
  26. smart analytics is people
  27. The Decider, The Brain, The Innovator
  28. inductive engineering. The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining. Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte and Ekrem Kocaganeli. In MALETS 2011: Proceedings International Workshop on Machine Learning Technologies in Software Engineering
  29. Usage analytics
  30. Improving the Explorer for Windows 8. Explorer in Windows 7. Alex Simons: Improvements in Windows Explorer. http://blogs.msdn.com/b/b8/archive/2011/08/29/improvements-in-windows-explorer.aspx
  34. Improving the Explorer for Windows 8. Customer feedback:
      • Bring back the "Up" button from Windows XP
      • Add cut, copy, and paste into the top-level UI
      • More customizable command surface
      • More keyboard shortcuts
  35. Improving the Explorer for Windows 8. Overlay showing command usage % by button on the new Home tab
  36. Debugging in the (very) large
      • Microsoft ships software to 1 billion users
        – How do we find out when things go wrong?
      • Fix bugs regardless of source: application or OS; software, hardware, or malware
      • Prioritize bugs that affect the most users
      • Get the solutions out to users most efficiently
      • Try to prevent bugs in the first place
      K. Glerum, K. Kinshumann, S. Greenberg, G. Aul, V. Orgovan, G. Nichols, D. Grant, G. Loihle, and G. Hunt: Debugging in the (Very) Large: Ten Years of Implementation and Experience. SOSP 2009.
  37. Windows Error Reporting: !analyze
  40. Windows Error Reporting
      • billions: error reports collected
      • 1 billion: machines run WER client code
      • 100 million: reports/day processing capacity
      • many 1000s: bugs fixed
      • almost all: Microsoft product teams use it
      • over 700: companies using WER
      • 200 TB: storage
      • >60: servers
      • >10: years of use
  41. Relative number of reports per bucket and cumulative distribution for the top 20 buckets from Office 2010 ITP for a 3 week sample period.
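The chart on slide 41 plots, for each of the top buckets, its relative share of reports and the running cumulative share. A minimal Python sketch of that calculation follows. The function name `bucket_distribution` and the sample counts are hypothetical and chosen for illustration only; they are not WER data, and this is not WER's own implementation.

```python
from itertools import accumulate


def bucket_distribution(report_counts, top_n=20):
    """Return (bucket, relative share, cumulative share) for the top_n buckets.

    Shares are fractions of ALL reports, not only of the top_n buckets.
    """
    total = sum(report_counts.values())
    if total == 0:
        return []
    # Rank buckets by report count, largest first, and keep the top_n.
    ranked = sorted(report_counts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    shares = [count / total for _, count in ranked]
    cumulative = list(accumulate(shares))
    return [(bucket, share, cum)
            for (bucket, _), share, cum in zip(ranked, shares, cumulative)]


# Hypothetical bucket counts, for illustration only.
sample = {"bucket_a": 500, "bucket_b": 300, "bucket_c": 150, "bucket_d": 50}
for bucket, share, cum in bucket_distribution(sample, top_n=3):
    print(f"{bucket}: {share:.1%} of reports, {cum:.1%} cumulative")
```

A steep cumulative curve means a handful of buckets account for most reports, which is the basis for prioritizing bugs that affect the most users.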
  42. Project Gotham Racing 4
      Across all races:
      • 2 of 9 game modes were used in < 0.5% of races
      • 12 of 29 event types were used in < 1% of races
      • 50 of 134 vehicles were used in < 0.25% of races
      When looking at multiplayer races:
      • 2 of 4 game modes were used in < 2% of races
      • 7 of 16 event types were used in < 0.1% of races
      • 53 of 133 vehicles were used in < 0.25% of races
      Kenneth Hullett, Nachiappan Nagappan, Eric Schuh, John Hopson: Empirical analysis of user data in game software development. ESEM 2012.
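Statements of the form "50 of 134 vehicles were used in < 0.25% of races" come down to counting the items whose share of races falls below a threshold. The sketch below shows one way to compute that in Python. The helper `rarely_used` and the telemetry numbers are hypothetical; the study's actual data pipeline is not described on the slide.

```python
def rarely_used(usage_counts, total_races, threshold):
    """Return the items whose share of races is strictly below threshold.

    threshold is a fraction, so 0.0025 means 0.25% of races.
    """
    if total_races <= 0:
        raise ValueError("total_races must be positive")
    return sorted(
        item for item, races in usage_counts.items()
        if races / total_races < threshold
    )


# Hypothetical telemetry: number of races in which each vehicle was used.
vehicle_races = {"car_a": 4200, "car_b": 15, "car_c": 9, "car_d": 5776}
rare = rarely_used(vehicle_races, total_races=10_000, threshold=0.0025)
print(f"{len(rare)} of {len(vehicle_races)} vehicles were used in < 0.25% of races: {rare}")
```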
  43. Player progression in Halo 3. Bruce Phillips: Peering into the Black Box of Player Behavior: The Player Experience Panel at Microsoft Game Studios. GDC 2010
  44. Development analytics
  45. SWEPT datamart
      • Software Engineering Productivity Tools
      • Set of data sources pertaining to product, engineering process and organizations
      • Provides consistency of data discovery and access across product groups
      • Provides a standard platform for creating and deploying analytics
      • Informs data-driven decision making
  48. Change analysis with CRANE
      Risk
      • How risky is the fix we are about to make?
      • Which parts of the change are the riskiest?
      Test
      • Which subset of existing test cases should be executed to maximize chances of finding defects?
      • Which parts of the change will not be covered by existing tests and need new tests?
      Dependence
      • What dependent parts of the system need to be re-tested?
      • For code that exposes a public interface, which consumers of the APIs should be verified?
      Jacek Czerwonka, Rajiv Das, Nachiappan Nagappan, Alex Tarvo, Alex Teterev: CRANE: Failure Prediction, Change Analysis and Test Prioritization in Practice - Experiences from Windows. ICST 2011.
  53. Branches in Windows (diagram labels: main, integration, networking, multimedia)
      • Process overhead
      • Time delay (velocity)
      • Changes are isolated => fewer build and test breaks
      Christian Bird, Thomas Zimmermann: Assessing the Value of Branches with What-if Analysis. FSE 2012.
  54. Code flow for a single file. Orange nodes are move operations. Blue nodes are edits to the file.
  55. Branch decisions
      • How do we coordinate parallel development?
      • How do we structure the branch hierarchy?
      • Can we reduce the complexity of branching?
  56. Assessing branches. Simulate alternate branch structure to assess cost and benefit of individual branches.
      • Cost: average delay increase per edit (liveness). How much delay does a branch introduce into development?
      • Benefit: provided isolation per edit (isolation). How many conflicts does a branch prevent per edit?
  57. Simulation (what-if). Diagram labels: Child Branch, Victim Branch, Parent Branch; faster code flow; unneeded integrations removed; changes no longer isolated.
  59. Assessing branches. Scatter plot of Delay (Cost) against Provided Isolation (Benefit); each dot is a branch. Red dots are branches with high cost but low benefit. Green dots are branches with high benefit and low cost. If high-cost-low-benefit branches had been removed, changes would each have saved 8.9 days of transit time and only introduced 0.04 additional conflicts.
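Once each branch has a cost (average delay per edit) and a benefit (conflicts prevented per edit), the red/green coloring in the scatter plot is a threshold classification. The sketch below illustrates that step in Python. It is not the authors' what-if simulation, which produces the cost and benefit values in the first place. The class `BranchAssessment`, the function `classify`, the thresholds, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchAssessment:
    name: str
    delay_days_per_edit: float           # cost: average delay added per edit
    conflicts_prevented_per_edit: float  # benefit: isolation provided per edit


def classify(branch, cost_threshold, benefit_threshold):
    """Label a branch 'red' (high cost, low benefit), 'green' (high benefit,
    low cost), or 'neutral' (everything else)."""
    high_cost = branch.delay_days_per_edit >= cost_threshold
    high_benefit = branch.conflicts_prevented_per_edit >= benefit_threshold
    if high_cost and not high_benefit:
        return "red"
    if high_benefit and not high_cost:
        return "green"
    return "neutral"


# Hypothetical branches and thresholds, for illustration only.
branches = [
    BranchAssessment("feature_x", delay_days_per_edit=9.5, conflicts_prevented_per_edit=0.01),
    BranchAssessment("feature_y", delay_days_per_edit=1.2, conflicts_prevented_per_edit=0.40),
    BranchAssessment("feature_z", delay_days_per_edit=8.0, conflicts_prevented_per_edit=0.35),
]
for b in branches:
    print(b.name, classify(b, cost_threshold=5.0, benefit_threshold=0.10))
```

Branches labeled "red" are the candidates for removal in a what-if analysis; where to set the thresholds is a judgment call for each organization.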
  60. The future
  61. INTELLIGENCE IN EVERYTHING. "The models I build are based on a mix of social and computer science, statistical data and my constant travels around the world talking to people about the future. […] How do we want to make the lives of people all over the world better by infusing our lives with intelligence?" Brian David Johnson, Futurist, Intel. Photo via http://www.flickr.com/photos/intelfreepress/6793363054 Quote via http://mashable.com/2012/04/04/predictions-digital-future/
  62. CLOUD BECOMES THE NORM. "My prediction is that the term cloud will have disappeared from the phrase cloud computing by 2020, because the majority of computing will simply [be] assumed to be done in the cloud. […]" Jack Uldrich, Futurist. Photo via http://www.prweb.com/releases/2011/12/prweb9052671.htm Quote via http://mashable.com/2012/04/04/predictions-digital-future/
  63. CONNECTING THE CLOUD WITH THE CROWD. "Everything will have moved into the cloud: content, media, health records, education. Connecting the cloud with the crowd will become a huge business." Gerd Leonhard, Futurist. Photo via http://www.mediafuturist.com/about.html Quote via http://mashable.com/2012/04/04/predictions-digital-future/
  64. NEW ALGORITHMS AND TOOLS. "Predicting the future will be common for the average person […] New algorithms and tools will unlock this rich source of data, creating unprecedented insight. Cloud based tools will allow anyone to mine this data and perform what-if analysis, even using it to predict the future." Dave Evans, Cisco Chief Futurist. Photo via http://www.cisco.com/web/about/ac79/docs/bio/Dave_Evans_Exec_Bio_Final.pdf Quote via http://mashable.com/2012/04/04/predictions-digital-future/
  65. MY ANALYTICS PREDICTIONS FOR 2020
      • More + different data
      • More algorithms
      • More people (everyone mines data)
      • More roles (data scientists!)
      • More real-time
      • More social
  66. MSR 2013: Call for Papers
      International Working Conference on Mining Software Repositories
      Sponsored by IEEE TCSE and ACM SIGSOFT
      May 18-19, 2013, San Francisco, CA, USA. Co-located with ICSE 2013.
      http://msrconf.org twitter: @msrconf

      General Chair: Thomas Zimmermann, Microsoft Research, USA
      Program Co-chairs: Massimiliano Di Penta, University of Sannio, Italy; Sunghun Kim, Hong Kong University of Science and Technology, China
      Chief of Data: Daniel Germán, University of Victoria, Canada
      Challenge Chair: Alberto Bacchelli, University of Lugano, Switzerland
      Web Chair: Julius Davies, University of British Columbia, Canada
      Program Committee: To be announced. Please see the conference website.

      Software repositories such as source control systems, archived communications between project personnel, and defect tracking systems are used to help manage the progress of software projects. Software practitioners and researchers are recognizing the benefits of mining this information to support the maintenance of software systems, improve software design/reuse, and empirically validate novel ideas and techniques. Research is now proceeding to uncover the ways in which mining these repositories can help to understand software development and software evolution, to support predictions about software development, and to exploit this knowledge concretely in planning future development. The goal of this two-day working conference is to advance the research and practice of software engineering through the analysis of data stored in software repositories.

      This year, MSR solicits three types of papers: research, practice, and data papers. As in previous MSR editions, there will be a Mining Challenge and a special issue of best MSR papers in the Empirical Software Engineering journal.

      Important Dates
      • Research/practice papers: February 15, 2013 (abstracts: February 8)
      • Data papers: March 4, 2013
      • Challenge papers: March 4, 2013
  67. Call for Articles
      SOFTWARE ANALYTICS: SO WHAT?
      Special Issue of IEEE Software
      Submission Deadline: 15 December 2012
      Publication: July/August 2013

      Software analytics are studies of software that lead to actionable changes to projects. The feedback from analytics should alter decisions relating to the business, management, design, development, or marketing of software systems. These analytics can be applied to both the products of developers (design documents, code, emails between team members) and to data generated by running programs (usage patterns, economic effects of the running system). Often such analytics requires “big data” methods: visualizations or data mining of large datasets.

      In this special issue, we seek answers to seemingly simple questions: Do these software analytics really work? In practice, what has actually been achieved? For a supposedly data-driven field, there are surprisingly few exemplar case studies in the literature (of both successes and failures) in this area. Hence we have no answer for the business user (or graduate student) who asks, “In this field, what are the best and worst practices, and why?”

      The guest editors invite articles addressing the practical successes, as well as the practical drawbacks, of software analytics. Such analytics includes the application of data mining tools to SE data (but can also include combinations of automatic and manual data analysis). Topics for these submissions include but are not limited to the following:
      • the added value of software analytics to the business community (if, indeed, it exists);
      • the synergies (if any) that can be achieved by combining automatic and human insight about some industrial problems;
  68. Summary
      smart analytics is: actionable, real time, sharing, diversity, people
      Usage analytics: Improving the Explorer for Windows 8; Debugging in the (very) large; Analytics for Xbox games
      Development analytics: The SWEPT datamart; Risk Assessment with CRANE; Branchmania: too many branches
