Introduction to monte-carlo analysis for software development - Troy Magennis (Focused Objective)

Forecasting and managing software development project risks & uncertainty. Monte-carlo analysis is the tool of choice for managing risk in many fields where risk is an inherent part of doing business. This paper examines how to use monte-carlo techniques to understand and leverage risk in Software Development projects and teams.

Published in: Technology

  1. Introduction to Monte-carlo Analysis for Software Development (2011)

     Forecasting and managing software development project risks & uncertainty

     Monte-carlo analysis is the tool of choice for managing risk in many fields where risk is an inherent part of doing business. This paper examines how to use Monte-carlo techniques to understand and leverage risk in Software Development projects and teams.

     Troy Magennis, Focused Objective (FocusedObjective.com), 6/1/2011
  2. Introduction

     For software development, it is often necessary to estimate a project upfront in order to get project approval, obtain budget and hire the correct team size and skill-mix. This is often at odds with the Agile development methodology, where full upfront design and specification is avoided and delivery happens in small iterations until a backlog is completed. The desire to work iteration to iteration and choose a finite level of work each cycle is compelling, and it undeniably brings value to production earlier than a pure waterfall approach. However, the fact remains that in order to provide any value to an organization, a finite minimum level of functionality (work) needs to be delivered by a preferred date, within a budget constraint; very few companies will sign off on a project that has no target date and an open budget. Delays often incur high costs: not just development costs, but also the cost of competitors launching new features first or taking an increasing market share. Even with Agile teams, it is important for any development manager or organization to be ready to answer the following questions on an ongoing basis:

     1. How much will this product cost to develop and deliver?
     2. What is the likelihood of releasing by date x?
     3. What resources do you need to hit date x (money equals people, so the question is often how much more money do you need to hit date x)?

     This paper introduces a technique for answering these questions given the risks involved in software development and delivery. Monte-carlo analysis is a proven technique for determining the likelihood of an outcome in the face of many difficult-to-measure input criteria. Monte-carlo analysis doesn't eliminate risk, but it does give a far more satisfactory answer than the plain guesses and gut feel employed today (as to release date) in many software projects.

     What is Monte-carlo analysis?

     Monte-carlo analysis is a mathematical technique that finds the likely patterns in an equation's result given random input values that are constrained between likely real-world values for those inputs. In place of an equation, for most purposes a spreadsheet or software model of the real-world process is built, and likely (but random) inputs are fed into these models many thousands of times to find a pattern in the results.

     For example, if you know that there are one hundred software product stories (features) to develop, and that from history (or an educated estimate) the shortest time each story would take is one day and the longest is three days, then a Monte-carlo analysis would simulate in software completing these one hundred stories with a random work time of between one and three days, and it would do this thousands of times. The result would be a histogram of the total time for each simulated project. This would be similar to
  3. if you had the benefit of actually doing the project one thousand times, but the computer does this quicker. For a model this simple, the answer can be computed by simple averaging without employing Monte-carlo analysis. But as the model for developing and delivering software starts to follow a more real-world scenario, it quickly gets too complex for simple arithmetic. Defects, added scope, environment downtime and other blocking events, and staff availability are just a few of the normal day-to-day events that have a cascading impact on software delivery, and Monte-carlo analysis is the right tool to manage estimation given the unpredictable nature of these events (which nonetheless likely follow a pattern).

     The problem with traditional estimation

     Developer estimates for software stories often turn out to be inaccurate, causing more erosion of trust in organizations than any other aspect of the relationship between the business and technology teams. When a new project is explored, developers are given vague single-sentence descriptions of a vision an analyst or business owner has in their head, and asked to give an estimate. These estimates are totaled, that total is divided by a utilization rate for developers, and the result is turned into a number of weeks. From that time forward the date is fixed, and often the budget.

     Given the vague inputs and the knowledge that they will be held to this estimate, developers (or anyone put in this position) over-estimate. They add a little bit more to cover the unknowns, often doubling each estimate. Worse still, knowing that estimates are traditionally under-estimated, each time they are presented the next level of management doubles what they see or hear, either mentally or in a PowerPoint presentation. This leads to projects not being funded because of the excessive investment needed for even the smallest of features. At the other end of accuracy, all too often other staff aren't in this estimate loop. QA for testing, DevOps for release management, and graphic designers for the artistic flair often don't get the benefit of adding their input to the estimate equation, leaving the estimates (even given the contingency fudge factor) under-estimated for high-risk features.

     The organization as a whole still has the problem of needing to decide whether the cost involved in a project will give the return on investment needed to proceed. For that decision, a delivery date and the cost (staff, equipment, software licenses, etc.) of development and delivery are needed. Given no other option, the developer estimates have to be taken as an input, and therefore delivery date and budget are fixed. Through the use of Monte-carlo simulation, and the ongoing tuning of historical patterns of events within an organization, it is possible to improve the estimates without causing more estimating work for the developers or requiring more detailed specification up-front.
  4. Modeling software projects

     This paper looks at two common Agile methodologies and presents Monte-carlo models for each. Scrum is a commonly employed agile development process, and Kanban is an emerging methodology that shows great promise in predictable software delivery.

     Scrum Modeling with Excel

     Scrum delivers value through fixed-time iterations. Teams choose a set of functionality (stories) to deliver in each iteration time-box, measured in a points system. For this example we will use Microsoft Excel. The first step in Monte-carlo simulation is to build a model of a Scrum process using various Excel formulas that cascade into a final amount of story points for each simulation row. From the story points, the number of iterations, and therefore a date, can be determined.

     The inputs required for this model are:

     1. Number of stories for each story "size" (in this example, story estimates were limited to one of 1, 2, 3, 5, 8, 13, 20, 40 units by the developers. Also in this example, some stories were missing estimates, so they were spread according to the median story size of existing story estimates)
     2. The lower bound, average and high boundary adjustments to apply against each estimate size. Random numbers will fall within these boundaries, weighted towards the average. It's common for small stories to be more accurate than massive stories with lots of unknown risks, so these adjustments are entered for each estimate size level. These can be obtained by analyzing actual versus estimated data for already completed stories or, if no data exists, initially guessed
     3. Defect rate (expressed as 1 defect for x points of y size)
     4. Added scope rate (expressed as 1 story for every x points of the medium story size for the project, 5 in this example)
     5. Start date
     6. Days per iteration (work days, for example 10 for a two-week iteration cycle)
     7. Number of story points per iteration targets (team velocity: pick a lower velocity that the team will always better, a stretch-goal velocity for the upper bound, and a velocity falling between these two limits as a starting point)

     To capture this data, an input worksheet can be built in Excel, similar to that shown in Figure 1.
  5. Figure 1 - Input worksheet for Scrum simulation

     These inputs allow a simulation model to be built. The calculations required at each step are pretty simple, except for the random number generation (and this is also pretty simple in Excel). A thorough explanation of strategies for building random numbers that follow certain patterns evident in real life for a given input is covered later. For now it is just important to understand that the random numbers provided by the random number generator will, after many generations, follow a bell-curve pattern (more occurrences happen around the chosen mean value, falling off either side), with about 45% of values between the lower bound and that mean, and about 45% between that mean and the upper bound. Only a small share (about 5% at each end when the mean is centered) falls outside the specified lower and upper bounds. More advanced tuning of a model would allow specification of an exact probability curve for random numbers, and this is exactly what commercial products offer. For this example, we stay within the simple but often indicative standard bell curve (Normal Distribution), with the user being able to specify the bounds and the mean adjustment for each estimate size (the rationale being that the bigger the estimate size, the more variability, but it is wise to look at historical data and make this model conform to a team's estimation ability).

     Once a random number is generated for each estimate size bin (the number of stories with that estimate size), this number is multiplied by the total story size of that estimate, and these are summed. For
  6. example, Equation 1 shows the function that scales the random number to the chosen bounds and multiplies it by the total story size for that estimate bin, giving a total story points scenario for a single simulation event.

     =NORMINV(RAND(),Mean1,(UpperBound1-LowBound1)/3.29)*Bin1Total

     Equation 1 - Calculating the adjusted story points for an estimate size using a random number within the boundaries chosen

     The equation shown in Equation 1 is replicated for each estimate size bin, and these are summed to obtain a final "total" number of story points in an entire backlog.

     Additional story points to account for defects are calculated and added to the total. The total project points is divided by the defect scale input (the number of points over which defects are counted), multiplied by the defects per scale input, and then multiplied by the points per defect input, as shown in Equation 2.

     =((Total_Project_Points/DefectScale)*DefectsPerScale)*PointsPerDefect

     Equation 2 - Calculating the adjustment to add for defects

     Scope is often added after the project begins, whether in the form of new features, work relating to fixing production defects, or non-development specific tasks. This model simply applies a single introduced-stories value following the rate specified as an input and adds this to the story points totaled so far.

     =((Total_Points/IntroducedScale)*IntroducedPerScale)*PointsPerIntroduced

     Equation 3 - Calculating the adjustment to add for introduced scope

     The project story point total, the defect point total and the introduced scope total are summed, and this value represents the total story points to burn down over the course of a project to reach 100% complete. To determine the number of iterations it would take to complete these stories, simply divide by one of the three target points per iteration inputs, as shown in Equation 4.

     =(Total_Points + Defect_Total + Increased_Scope_Total) / (PointsPerIteration*Vacation_Adjustment)

     Equation 4 - Calculating the number of iterations required to complete ALL points. In this model, we calculate this for three different target points per iteration inputs

     Some adjustment is made at this point to account for developer vacation time, a particular problem for long-running projects with a large number of developers. An adjustment is calculated that reduces the number of points per iteration to account for this. In this model, it is simply the formula shown in Equation 5.

     =1-(AvgDevsOnVacation/TotalDevs)

     Equation 5 - Simple Vacation Adjustment Equation, normally in the range of 0.9 to 0.95

     The above equations are run many thousands of times in different rows (simply the first row is copied down in
  7. Excel to get a set of simulation results). The more times they are calculated, the firmer the probability patterns that emerge. Figure 2 shows the first five rows of many thousands. Each row determines how many iterations it will take to complete a backlog for the three target velocities, in this example 190, 200 and 220 points per iteration.

     The only remaining steps to determine completion date and probability results like those shown in Figure 3 are to calculate the calendar date, and to count how many simulation rows fall within a given number of iterations. The Figure 3 results ask the user to give a range of iterations, seven to twelve in this example. The completion date is calculated using a convenient Excel function that determines a date from a given number of workdays, optionally excluding public holidays, as Equation 6 demonstrates.

     =WORKDAY(StartingDate, DaysPerIteration * Iteration_Target, Public_Holiday_Dates)

     Equation 6 - Finding the date given number of workdays. Iteration_Target will be 7 to 12 for our example. StartingDate and DaysPerIteration are user inputs as shown in Figure 1

     To calculate the percentage probability of achieving a result at a target velocity (one of three), the equation shown in Equation 7 is used. This equation counts the number of simulation rows less than the target and divides that by the total number of simulations to find the percentage likelihood. This is done for each of the target velocities.

     Figure 2 - The results for the calculations showing the first 5 simulations of many thousands

     Figure 3 - The results showing probabilities of hitting certain dates
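The spreadsheet steps described so far (Equations 1 through 6, plus the counting step just described) can be sketched in Python as one function per simulation row. This is a hedged sketch, not the author's worksheet: every input value except the three velocities (190, 200, 220) is hypothetical, the defect and scope adjustments are assumed to apply to the randomly adjusted total, and `random.gauss` stands in for `NORMINV(RAND(), ...)`. The divisor 3.29 is roughly 2 × 1.645, so the two bounds sit about 1.645 standard deviations either side of their midpoint, which puts about 90% of draws between them when the mean is centered.

```python
import random
from datetime import date, timedelta

# Hypothetical stand-ins for the Figure 1 worksheet inputs.
BINS = [  # (total points in bin, low, mean, high adjustment multipliers)
    (300, 0.8, 1.0, 1.2),
    (500, 0.8, 1.1, 1.6),
    (400, 0.7, 1.3, 2.0),
]
DEFECT_SCALE, DEFECTS_PER_SCALE, POINTS_PER_DEFECT = 50, 1, 3
INTRO_SCALE, INTRO_PER_SCALE, POINTS_PER_INTRO = 100, 1, 5
VACATION_ADJ = 1 - (1 / 12)        # Equation 5, assuming 1 of 12 devs away
VELOCITIES = (190, 200, 220)       # target points per iteration (from the paper)
START, DAYS_PER_ITERATION = date(2011, 6, 1), 10  # hypothetical start date

def simulate_row(rng):
    """One simulation row: iterations needed at each target velocity."""
    total = sum(  # Equation 1, summed over the estimate-size bins
        rng.gauss(mean, (high - low) / 3.29) * bin_points
        for bin_points, low, mean, high in BINS
    )
    defects = (total / DEFECT_SCALE) * DEFECTS_PER_SCALE * POINTS_PER_DEFECT  # Eq. 2
    scope = (total / INTRO_SCALE) * INTRO_PER_SCALE * POINTS_PER_INTRO        # Eq. 3
    points = total + defects + scope
    return [points / (v * VACATION_ADJ) for v in VELOCITIES]                  # Eq. 4

def workday(start, days, holidays=()):
    """Date `days` working days after `start`, like Excel's WORKDAY (Eq. 6)."""
    current, remaining = start, days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current

def probability_within(rows, velocity_index, iteration_target):
    """Share of rows finishing in fewer than `iteration_target` iterations."""
    hits = sum(1 for row in rows if row[velocity_index] < iteration_target)
    return hits / len(rows)

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
rows = [simulate_row(rng) for _ in range(10000)]
for i, velocity in enumerate(VELOCITIES):
    for target in (8, 9, 10):
        chance = probability_within(rows, i, target)
        finish = workday(START, DAYS_PER_ITERATION * target)
        print(f"{velocity} pts/iteration, {target} iterations ({finish}): {chance:.0%}")
```

Keeping each equation as its own line makes it straightforward to swap in measured defect and scope rates as the project progresses.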
  8. =(COUNTIF(Velocity1Range,"< " & Iteration_Target)/COUNT(Velocity1Range))

     Equation 7 - Count the number of simulations that complete within a target, and convert to a percentage

     The results shown in Figure 3 indicate that as long as the team can maintain 200 story points per iteration, they have an 87% chance of finishing by 22 October 2009 (when this simulation was done). As a project progresses, the model can be tuned to improve confidence and accuracy. Defect rates can be determined from the bug tracking database (how many points for x number of defects raised), and the random number boundaries for each estimate size can be matched to actual prior data. By maintaining this model, the probability of hitting a given date is always available, with some rigor behind the calculation.

     Kanban Model

     Modeling a Kanban project using Excel is difficult, not because the calculations are complex, but because the interaction between stories would require at least one column per story, per simulation row, and this quickly becomes unmaintainable. A custom application makes more sense, and this article covers one such application.

     Figure 4 - Example digital Kanban Board visualizing work flowing from left to right through a process.

     Kanban divides the steps of delivering a single story into columns (called Statuses throughout this article). For example, a story might pass from Design, to Development, to Testing, to Release. The time taken for each story in each Status is recorded. Work is limited in each Status, and a new story can only be pulled from left to right when a vacant position is available (total cards within a Status are below the limit). A card system on a wall using post-it notes (or an electronic version) is used to
  9. represent stories flowing from left to right as shown in Figure 4.

     To simulate, the application takes as inputs the number of Status columns, a lower bound and upper bound for the time taken to complete stories in each status, and the limit of stories allowed in each status at one time (called the WIP Limit, or work in progress limit). In place of an actual backlog, a number of initial story cards is specified by the user as shown in Figure 5. These inputs are enough to do a simple simulation, where the application loops, simulating a given time interval each pass, for example 1 day. The simulator grabs the first few stories and populates the first status column. For each story a random time within that status's boundaries is calculated, and stories are only moved to the right when a) that time has elapsed, and b) there is an open position that keeps the number of stories within the WIP limit for that status. This process continues until all cards have traversed from the imaginary backlog to the completed stories pile, and the time taken to do this is recorded.

     Figure 5 - Kanban Simulation Setup Screen for the basic inputs.

     This type of simulation avoids the need for accurate estimates for each story by looking at the previous lower and upper bounds for completing stories and using random numbers between these boundaries. It would be a small enhancement to add the ability to have a size for each story, but this would complicate the model and may not increase accuracy in a significant way; the actual times measured on previous work are likely more indicative of future patterns. These actual ranges can be mined from any work tracking tool, and are often easy to read from a Cumulative Flow Diagram, which is a graphical representation of how many
  10. cards are in each status at any given moment.

     Defects, added scope and the time stories spend in a "Blocked" state (no test environment, questions to a stakeholder, unavailable experts) are represented by adding more stories to the backlog according to rates specified by the user, and extending story times by given user rates. Each of these real-world values can be obtained from tracking systems (the defect database, for example, or the spreadsheet holding the story data), or initially guessed from prior experience. Tuning these values over time and demonstrating to the entire team the impact of these occurrences is a great way to manage scope creep and quality issues in a team. Figure 6 shows how defects are specified in our example system, where defects can be raised in different statuses and those defects will cause a story to start back in another status for a random time between the specified boundaries.

     Figure 6 - Blocking rate, Defects, and Added scope all materially impact time and need to be simulated

     [Histogram - Completion Date Probability (Project start: 5/24/2011); x-axis: Completion Date, y-axis: Frequency]

     Figure 7 - Sample histogram of Monte-carlo simulated completion dates from a Kanban model

     Kanban simulation is carried out with the specified setup either visually for a single pass, or many hundreds or thousands of times for Monte-carlo results. Once the
  11. simulation has run, this application writes the results to Excel for further analysis, as shown in the histogram chart in Figure 7. There is no absolute result, just a pattern of the most commonly simulated completion dates; in this case early December to mid-December is the likely range.

     Kanban simulation can answer another key question: if you had to add staff, how many and what skills do you need? Assume that each Kanban status has a specific skillset, for example graphic designers in the Design status (Status 2 in this example), developers in the Dev status (Status 3), QA in the Testing status (Status 4), and release management in the DevOps status (Status 5). Then by systematically increasing and decreasing the WIP limits for each status and executing a Monte-carlo simulation run, the status that has the most impact can be determined. The example simulator supports this feature, as the results in Figure 8 show.

     Figure 8 - Kanban simulation finds what Status column increase gives the best improvement.

     Monte-carlo simulation offers advantages to teams expected to give completion dates for projects, and lets them model the uncertainties in a productive way. Whilst Monte-carlo simulation doesn't give an exact date, it shows the likely pattern and ranges that can be expected. It can also show the factors that influence that date most, through a process called Sensitivity Analysis.

     Sensitivity Analysis - What input has the most impact on date

     Sensitivity analysis answers the question of which input factor has the greatest impact on the final result. In essence, each input is increased and decreased by 10% (a consistent amount; 10% makes the math easy), one at a time, with a simulation run each time, to see how much each change impacts the final result.
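The Kanban simulation loop and the one-at-a-time WIP-limit sweep described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's application: story times are drawn uniformly between each status's bounds, the time step is one day, and defects, blocking and added scope are left out. The status names, bounds and limits in `BASE` are hypothetical.

```python
import random
import statistics

def simulate_kanban(statuses, num_stories, rng):
    """Days to move every story through all statuses, in 1-day time steps.

    statuses: list of (min_days, max_days, wip_limit) tuples, left to right.
    """
    if any(limit < 1 for _, _, limit in statuses):
        raise ValueError("every WIP limit must be at least 1")
    backlog, done, day = num_stories, 0, 0
    columns = [[] for _ in statuses]  # each card = remaining days in that status
    last = len(statuses) - 1
    while True:
        # Move finished cards, rightmost status first so freed slots are reusable.
        for s in range(last, -1, -1):
            staying = []
            for remaining in columns[s]:
                if remaining > 0:
                    staying.append(remaining)        # work not finished yet
                elif s == last:
                    done += 1                        # card leaves the board
                elif len(columns[s + 1]) < statuses[s + 1][2]:
                    low, high, _ = statuses[s + 1]
                    columns[s + 1].append(rng.uniform(low, high))
                else:
                    staying.append(remaining)        # blocked by the WIP limit
            columns[s] = staying
        # Pull new cards from the backlog into the first status.
        low, high, limit = statuses[0]
        while backlog > 0 and len(columns[0]) < limit:
            columns[0].append(rng.uniform(low, high))
            backlog -= 1
        if done == num_stories:
            return day
        day += 1
        columns = [[r - 1 for r in cards] for cards in columns]  # one day of work

def average_days(statuses, num_stories=50, runs=200, seed=7):
    """Monte-carlo average completion time for one board setup."""
    rng = random.Random(seed)
    return statistics.mean(
        simulate_kanban(statuses, num_stories, rng) for _ in range(runs)
    )

# Hypothetical board: (min days, max days, WIP limit) per status.
NAMES = ["Design", "Dev", "Testing", "DevOps"]
BASE = [(1, 3, 2), (2, 6, 3), (1, 4, 2), (0.5, 1, 1)]

baseline = average_days(BASE)
print(f"baseline: {baseline:.1f} days")
for i, (low, high, wip) in enumerate(BASE):
    trial = list(BASE)
    trial[i] = (low, high, wip + 1)   # raise one WIP limit at a time
    print(f"{NAMES[i]} WIP +1: {average_days(trial):.1f} days")
```

Changing one input per run and comparing against the baseline is the same one-at-a-time approach the sensitivity analysis section describes; the status whose change moves the average most is the one to staff first.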
  12. Figure 9 - Manual sensitivity analysis. Defects have more impact on average iterations than increased scope in our example Scrum model. Motivate the team to reduce defect rate any way they can.

     Commercial Monte-carlo simulation tools make this functionality easy to visualize in graphs, but for our Excel model it is easy to do by hand. To determine whether defects or introduced scope is having a bigger impact on the outcome, temporarily increase the defect rate and then the introduced scope rate by a percentage, and take the average number of iterations before and after each change. Figure 9 shows such a result for the Scrum model used earlier. Although close, increasing the defect rate by a percentage has more impact on the average number of iterations to complete than the same percentage change in the added scope rate. From observing many models, this is a common case, and the model has helped teams understand the impact of quality earlier when developing code. After reducing the defect rate, look for the next most important factor and improve that area. Sensitivity analysis and Monte-carlo simulation give teams the tools to demonstrate how much small improvements count.

     Relevant Random Number Generation

     Random number generation is a complex field of mathematics. For truly random numbers, a computer is the last thing you want. Random numbers generated by a computer are never truly random; they rely on algorithms that attempt to be random, but the algorithm used is repeatable given the same starting value and is therefore not random. For most purposes this won't cause an issue for the modeling we undertake, but it is important to realize that random number generators have flaws, and to avoid them if they will impact the results.

     To simulate effectively, we need sets of random numbers that fall within the likely bounds of the real-world problem. Excel helps with the function Rand(). Rand() returns random numbers from 0 to 1 with an equal chance of occurrence across those bounds, as shown in Figure 10, but obtaining a random number within a bound is just one part of the problem. In the real world, the random numbers within a range might occur more frequently around one value, or towards one end of the boundaries. If we
  13. ignored this, we would compromise accuracy in the final result. For example, when looking at the actual time taken for previous estimates, a bias towards overrunning could be the pattern. Even though the boundaries might be from 80% (20% under the estimate) to 200% (double the estimate), the majority of actual times are around 175% of the estimate. Left alone, the random number generator in Excel would evenly distribute random values across the range.

     Excel supports applying bias to the random number generation, and custom applications take this to a whole new level, offering features not only to produce sets of random numbers that fit a curve, but also to look at existing data and match a random number set to that data.

     In the Scrum model covered in the article, we forced the random number generator to follow the common Bell Curve, or Normal Distribution, as shown in Figure 11.

     [Histogram of =RAND(); x-axis: Bin, y-axis: Frequency]

     Figure 10 - Excel's RAND() function returns random numbers from the range 0 to 1 with equal probability

     [Histogram of =NORMINV(RAND(),1,1); x-axis: Bin, y-axis: Frequency]

     Figure 11 - To obtain random numbers that fit the Normal distribution (Bell curve), use the NormInv() function
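The difference between evenly spread, bell-curve and biased random numbers can be seen by drawing from each and comparing them. The sketch below uses Python's standard `random` module rather than Excel. The triangular distribution is an assumption chosen here as one simple way to express the "80% to 200%, mostly around 175%" bias from the example above; the paper does not name a distribution for that case. The first lines also show the repeatability of computer-generated random numbers: the same starting value (seed) gives the same sequence.

```python
import random
import statistics

# Same seed, same "random" number: the generator is repeatable by design.
assert random.Random(3).random() == random.Random(3).random()

rng = random.Random(3)
N = 20000
LOW, HIGH, LIKELY = 0.8, 2.0, 1.75  # actual/estimate ratios from the example

# Evenly spread, comparable to LOW + RAND() * (HIGH - LOW) in Excel.
uniform = [rng.uniform(LOW, HIGH) for _ in range(N)]

# Bell curve centered between the bounds, comparable to NORMINV(RAND(), mean, sd).
normal = [rng.gauss((LOW + HIGH) / 2, (HIGH - LOW) / 3.29) for _ in range(N)]

# Biased towards overrunning (assumption: triangular with its peak at LIKELY).
skewed = [rng.triangular(LOW, HIGH, LIKELY) for _ in range(N)]

def share_above(values, threshold=1.5):
    """Fraction of draws where actual time exceeded `threshold` x the estimate."""
    return sum(v > threshold for v in values) / len(values)

for name, values in (("uniform", uniform), ("normal", normal), ("skewed", skewed)):
    print(f"{name:8s} mean={statistics.mean(values):.2f} "
          f"share above 1.5x={share_above(values):.0%}")
```

A model fed by the uniform or centered normal draws would understate how often stories overrun compared with one fed by draws that match the observed bias.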
  14. Figure 12 - EasyFit from Mathwave Technologies is a commercial curve fitting tool that can create random numbers that fit real-world data

     "Normal" is one distribution curve of many, and commercial Monte-carlo packages support more than the basic curves.

     EasyFit is a commercial curve-fitting package that will analyze existing data and determine which probability curve fits that data. This application will then also create a set of random numbers that match this curve, allowing you to simulate with a random input that is indicative of the real-world values. One use for this type of application is looking at the frequency of previous estimates and employing those values in future Monte-carlo simulations. Figure 12 shows an actual set of estimates from a previous project. Without any better information, random numbers generated from this curve could be used to simulate estimates for similar future projects without disrupting development staff for detailed analysis. Where possible, it is recommended to analyze prior data, or to carefully consider the likely range of possible values and whether they are weighted more frequently towards one boundary than another, when choosing a random number distribution fit. If the weighting is significant, then look for commercial tools.

     Conclusion

     This article touches the surface of how to build and model software projects using Monte-carlo techniques. The ability to quickly forecast a project's most likely completion date, and the impact of adding more staff or reducing defect counts, makes this analysis an important tool in any development manager's arsenal, and places them in a position to answer with a level of
  15. unprecedented confidence the three likely ongoing upper-management questions:

     1. How much will this product cost to develop and deliver?
     2. What is the likelihood of hitting date x?
     3. What resources do you need to hit date x (money equals people, so the question is often how much more money do you need to hit date x)?

     About the Author

     Troy Magennis is founder of Focused Objective, a consulting firm that aims to improve software development practices and management through better tools and education. Troy has held positions at VP level for many companies in diverse fields including Automotive, Financial, Image Rights Management and Travel.

     For feedback, Troy can be contacted at Troy.magennis@FocusedObjective.com

     For more articles like this, visit us at http://www.FocusedObjective.com
