Introduction to Monte-carlo Analysis for Software Development

Forecasting and managing software development project risks & uncertainty

Monte-carlo analysis is the tool of choice for managing risk in many fields where risk is an inherent part of doing business. This paper examines how to use Monte-carlo techniques to understand and leverage risk in software development projects and teams.

Troy Magennis, Focused Objective (FocusedObjective.com), 6/1/2011
Introduction

For software development, it is often necessary to estimate a project upfront in order to get project approval, obtain budget and hire the correct team size and skill-mix. This is often at odds with the Agile development methodology, where full upfront design and specification is avoided and delivery happens in small iterations until a backlog is completed. The desire to work iteration to iteration and choose a finite level of work each cycle is compelling, and it undeniably brings value to production earlier than a pure waterfall approach. However, the fact remains that in order to provide any value to an organization, a finite minimum level of functionality (work) needs to be delivered by a preferred date, within a budget constraint; very few companies will sign off on a project that has no target date and an open budget. Delays often incur high cost: not just development costs, but also lost ground as competitors launch new features first or take an increasing market share. Even with Agile teams, it is important for any development manager or organization to be ready to answer the following questions on an ongoing basis:

1. How much will this product cost to develop and deliver?
2. What is the likelihood of releasing by date x?
3. What resources do you need to hit date x (money equals people, so the question is often how much more money do you need to hit date x)?

This paper introduces a technique for answering these questions given the risks involved in software development and delivery. Monte-carlo analysis is a proven technique for determining the likelihood of an outcome in the face of many difficult-to-measure input criteria. Monte-carlo analysis doesn't completely eliminate risk, but it does give a much more satisfactory answer than the plain guesses and gut feel employed today (as to release date) in many software projects.

What is Monte-carlo analysis?

Monte-carlo analysis is a mathematical technique that finds the likely patterns in an equation's result given random input values that are constrained between likely real-world values for those inputs. In place of an equation, for most purposes a spreadsheet or software model of the real-world process is built, and likely (but random) inputs are fed into this model many thousands of times to find a pattern in the results.

For example, suppose there are one hundred software product stories (features) to develop, and from history (or an educated estimate) you know that the shortest time any story takes is one day and the longest is three days. A Monte-carlo analysis would simulate in software the completion of these one hundred stories, giving each a random work time between one and three days, and it would do this thousands of times. The result would be a histogram of the total time for each simulated project. This would be similar to
if you had the benefit of actually doing the project one thousand times, but the computer does it quicker. For a model this simple, the answer can be computed by simple averaging without employing Monte-carlo analysis (one hundred stories averaging two days each gives 200 days). But as the model for developing and delivering software starts to follow a more real-world scenario, it quickly gets too complex for simple arithmetic. Defects, added scope, environment downtime and other blocking events, and staff availability are just a few of the normal day-to-day events that have a cascading impact on software delivery. Monte-carlo analysis is the right tool to manage estimation given the unpredictable nature of these events (which nonetheless tend to follow a pattern).

The problem with traditional estimation

Developer estimates for software stories often turn out to be inaccurate, and this erodes trust in the relationship between the business and technology teams more than any other aspect of that relationship. When a new project is explored, developers are given vague single-sentence descriptions of a vision an analyst or business owner has in their head, and asked to give an estimate. These estimates are totaled, that total is divided by a utilization rate for developers, and the result is turned into a number of weeks. From that time forward the date is fixed, and often the budget.

Given the vague inputs, and knowing they will be held to this estimate, developers (or anyone put in this position) over-estimate. They add a little more to cover the unknowns, often doubling each estimate. Worse still, knowing that estimates are traditionally under-estimated, each time the estimates are presented the next level of management doubles what it sees or hears, whether mentally or in a PowerPoint presentation. This leads to projects not being funded because of the excessive investment needed for even the smallest of features. At the other end of accuracy, all too often other staff aren't in the estimate loop: QA for testing, DevOps for release management and graphic designers for the artistic flair often don't get to add their input to the estimate, leaving the estimates (even with the contingency fudge factor) under-estimated for high-risk features.

The organization as a whole still needs to decide whether the cost involved in a project will give the return on investment needed to proceed. For that decision, a delivery date and the cost (staff, equipment, software licenses, etc.) of development and delivery are needed. Given no other option, the developer estimates have to be taken as an input, and therefore the delivery date and budget are fixed. Through the use of Monte-carlo simulation, and the ongoing tuning of historical patterns of events within an organization, it is possible to improve the estimates without causing more estimating work for the developers or requiring more detailed specification up-front.
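The one-hundred-story example from "What is Monte-carlo analysis?" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's tool: every story's duration is drawn uniformly between one and three days, independently of the others, and the function names, the 10,000-trial count and the seed are illustrative choices.

```python
import math
import random
import statistics
from collections import Counter


def simulate_project(rng: random.Random, num_stories: int = 100,
                     min_days: float = 1.0, max_days: float = 3.0) -> float:
    """Simulate one project: sum a uniform random duration for every story."""
    return sum(rng.uniform(min_days, max_days) for _ in range(num_stories))


def run_trials(trials: int = 10_000, seed: int = 42) -> list[float]:
    """Repeat the simulated project many times; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [simulate_project(rng) for _ in range(trials)]


def percentile(sorted_totals: list[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_totals)))
    return sorted_totals[rank - 1]


if __name__ == "__main__":
    totals = sorted(run_trials())
    print(f"mean: {statistics.mean(totals):.1f} days")
    for pct in (50, 85, 95):
        print(f"{pct}% of simulated projects finish within "
              f"{percentile(totals, pct):.1f} days")
    # Crude text histogram in 5-day buckets.
    buckets = Counter(int(t // 5) * 5 for t in totals)
    for start in sorted(buckets):
        print(f"{start:3d}-{start + 5:3d} days | {'#' * (buckets[start] // 50)}")
```

Running it shows the mean landing close to the 200 days that simple averaging predicts; what the simulation adds is the spread and the percentiles (for example, the duration that 85% of simulated projects beat), which is what matters once defects, blocking events and added scope are layered on top.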
Modeling software projects

This paper looks at two common Agile methodologies and presents Monte-carlo models for each. Scrum is a commonly employed agile development process, and Kanban is an emerging methodology that shows great promise for predictable software delivery.

Scrum Modeling with Excel

Scrum delivers value through fixed-length iterations. Teams choose a set of functionality (stories) to deliver in each iteration time-box, measured in a points system. For this example we will use Microsoft Excel. The first step in Monte-carlo simulation is to build a model of the Scrum process using Excel formulas that cascade into a final amount of story points for each simulation row. From the story points, the number of iterations, and therefore a date, can be determined.

The inputs required for this model are:

1. Number of stories for each "size" (in this example, story estimates were limited by the developers to one of 1, 2, 3, 5, 8, 13, 20 or 40 units. Also in this example, some stories were missing estimates, so they were spread according to the median story size of existing story estimates)
2. The lower bound, average and upper bound adjustments to apply against each estimate size. Random numbers will fall within these boundaries, weighted towards the average. It's common for small stories to be more accurate than massive stories with lots of unknown risks, so these adjustments are entered for each estimate size level. They can be obtained by analyzing actual-versus-estimate data of already completed stories or, if no data exists, initially guessed
3. Defect rate (expressed as 1 defect, of size y points, for every x story points)
4. Added scope rate (expressed as 1 story for every x points, sized at the medium story size for the project, 5 points in this example)
5. Start date
6. Days per iteration (work days, for example 10 for a two-week iteration cycle)
7. Number of story points per iteration targets (team velocity: pick a lower velocity the team will always beat, a stretch-goal velocity for the upper bound, and a velocity between these two limits as a starting point)

To capture this data, an input worksheet can be built in Excel, similar to that shown in Figure 1.
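For readers who want the Figure 1 worksheet in code, the seven inputs can be captured in a small Python structure. This is a sketch under loud assumptions: the class and field names are hypothetical, and the story counts, adjustment multipliers, defect and scope rates and start date are invented placeholders. Only the estimate sizes, the 5-point medium story, the 10-day iteration and the 190/200/220 velocities come from the article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SizeBin:
    """Inputs 1 and 2 for one estimate size: story count and adjustment bounds."""
    size: int      # estimate in points: 1, 2, 3, 5, 8, 13, 20 or 40
    count: int     # number of stories estimated at this size
    lower: float   # lower-bound multiplier applied to the estimate
    mean: float    # average multiplier (random draws are weighted towards it)
    upper: float   # upper-bound multiplier

    @property
    def total_points(self) -> int:
        return self.size * self.count


@dataclass(frozen=True)
class ScrumInputs:
    bins: tuple[SizeBin, ...]
    defect_every_points: float       # input 3: 1 defect for every x points...
    defect_size_points: float        # ...with each defect sized at y points
    added_story_every_points: float  # input 4: 1 added story for every x points
    added_story_size_points: float   # medium story size (5 points in the text)
    start_date: date                 # input 5
    days_per_iteration: int          # input 6: work days per iteration
    velocities: tuple[float, float, float]  # input 7: low, middle, stretch

    @property
    def backlog_points(self) -> int:
        """Unadjusted backlog: sum of size x count over every bin."""
        return sum(b.total_points for b in self.bins)

    def problems(self) -> list[str]:
        """Sanity checks a worksheet would otherwise leave to the user."""
        issues = [f"size {b.size}: expected lower <= mean <= upper"
                  for b in self.bins if not b.lower <= b.mean <= b.upper]
        low, mid, high = self.velocities
        if not low <= mid <= high:
            issues.append("velocities should be ordered low <= middle <= stretch")
        return issues


# Hypothetical placeholder values for illustration only (not the article's data).
EXAMPLE = ScrumInputs(
    bins=(
        SizeBin(1, 30, 0.8, 1.0, 1.3), SizeBin(2, 50, 0.8, 1.05, 1.4),
        SizeBin(3, 60, 0.8, 1.1, 1.5), SizeBin(5, 60, 0.8, 1.15, 1.7),
        SizeBin(8, 30, 0.9, 1.2, 1.9), SizeBin(13, 15, 0.9, 1.3, 2.0),
        SizeBin(20, 6, 1.0, 1.4, 2.2), SizeBin(40, 2, 1.0, 1.5, 2.5),
    ),
    defect_every_points=40, defect_size_points=2,
    added_story_every_points=100, added_story_size_points=5,
    start_date=date(2009, 5, 4), days_per_iteration=10,
    velocities=(190, 200, 220),
)

if __name__ == "__main__":
    print(EXAMPLE.backlog_points, EXAMPLE.problems())  # prints: 1245 []
```

Keeping the inputs in one validated place mirrors the role of the input worksheet: every simulated row reads from it, and tuning the model as real data arrives means editing only these values.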
Figure 1 - Input worksheet for Scrum simulation

These inputs allow a simulation model to be built. The calculations required at each step are pretty simple, except for the random number generation (and this is also pretty simple in Excel). A thorough explanation of strategies for building random numbers that follow patterns evident in real life for a given input is covered later. For now, it is just important to understand that the random numbers used in this model will, over many generations, follow a bell-curve pattern: more occurrences happen around the chosen mean value, falling off either side. When the mean sits midway between the bounds, about 45% of values fall between the lower bound and the mean, about 45% between the mean and the upper bound, and about 5% fall beyond each bound. More advanced tuning of a model would allow specification of an exact probability curve for random numbers, and this is exactly what commercial products offer. For this example, we stay within the simple but often indicative standard bell curve (Normal Distribution), with the user able to specify the bounds and the mean adjustment for each estimate size (the rationale being that the bigger the estimate size, the more variability; but it is wise to look at historical data and make this model conform to a team's estimation ability). Once a random number is generated for each estimate size bin (the stories with that estimate size), it is multiplied by the total story size of that bin, and these are summed. For
example, Equation 1 shows the function that draws a random adjustment within the chosen bounds and multiplies it by the total story size for that estimate bin, giving a total story points scenario for a single simulation event.

=NORMINV(RAND(),Mean1,(UpperBound1-LowBound1)/3.29)*Bin1Total

Equation 1 - Calculating the adjusted story points for an estimate size using a random number within the boundaries chosen

The equation shown in Equation 1 is replicated for each estimate size bin, and these are summed to obtain a final "total" number of story points in the entire backlog. Additional story points to account for defects are then calculated and added to the total: the total project points are divided by the defect scale (the x points in the defect rate), multiplied by the number of defects per scale, and multiplied by the points per defect, as shown in Equation 2.

=((Total_Project_Points/DefectScale)*DefectsPerScale)*PointsPerDefect

Equation 2 - Calculating the adjustment to add for defects

Scope is often added after the project begins, whether in the form of new features, work relating to fixing production defects, or non-development tasks. This model simply applies a single introduced-stories rate, specified as an input, and adds the resulting points to the story points totaled so far, as shown in Equation 3.

=((Total_Points/IntroducedScale)*IntroducedPerScale)*PointsPerIntroduced

Equation 3 - Calculating the adjustment to add for introduced scope

The project story point total, the defect point total and the introduced scope total are summed, and this value represents the total story points to burn down over the course of the project to reach 100% complete. To determine the number of iterations it would take to complete these stories, it is simply a matter of dividing by one of the three target points-per-iteration inputs, as shown in Equation 4.

=(Total_Points + Defect_Total + Increased_Scope_Total) / (PointsPerIteration*Vacation_Adjustment)

Equation 4 - Calculating the number of iterations required to complete ALL points. In this model, we calculate this for three different target points-per-iteration inputs

Some adjustment is made at this point to account for developer vacation time, a particular problem for long-running projects with a large number of developers. An adjustment factor is calculated that reduces the number of points per iteration to account for this. In this model, it is simply the formula shown in Equation 5.

=1-(AvgDevsOnVacation/TotalDevs)

Equation 5 - Simple vacation adjustment equation, normally in the range of 0.9 to 0.95

The above equations are run many thousands of times in different rows (simply copy the first row down in
Excel to get a set of simulation results). The more times they are calculated, the firmer the probability patterns that emerge. Figure 2 shows the first five rows of many thousands. Each row determines how many iterations it will take to complete the backlog for the three target velocities, in this example 190, 200 and 220 points per iteration.

The only remaining steps to determine completion date and probability results like those shown in Figure 3 are to calculate the calendar date, and how many simulation rows fall within a given number of iterations. The Figure 3 results ask the user to give a range of iterations, seven to twelve in this example. The completion date is calculated using a convenient Excel function, WORKDAY, that determines a date from a given number of workdays, optionally excluding public holidays, as Equation 6 demonstrates.

=WORKDAY(StartingDate, DaysPerIteration * Iteration_Target, Public_Holiday_Dates)

Equation 6 - Finding the date given a number of workdays. Iteration_Target will be 7 to 12 for our example. StartingDate and DaysPerIteration are user inputs as shown in Figure 1

To calculate the percentage probability of achieving a result at a target velocity (one of three), the equation shown in Equation 7 is used. This equation counts the number of simulation rows less than the target, and divides that by the total number of simulations to find the percentage likelihood. This is done for each of the target velocities.

Figure 2 - The results for the calculations showing the first 5 simulations of many thousands

Figure 3 - The results showing probabilities of hitting certain dates
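The per-row calculation (Equations 1 to 5), the probability count described above and the WORKDAY date step can also be sketched in Python. This is a hypothetical translation rather than the author's workbook: `rng.gauss(mean, sd)` stands in for NORMINV(RAND(), mean, sd), the input values are invented placeholders, both the defect and the added-scope totals are assumed to be computed from the adjusted backlog total, and the `workday` helper only approximates Excel's WORKDAY by skipping weekends and listed holidays.

```python
import random
from datetime import date, timedelta

# Hypothetical inputs (placeholders, not the article's data):
# (estimate size, story count, lower, mean, upper adjustment multiplier)
BINS = [
    (1, 30, 0.8, 1.0, 1.3), (2, 50, 0.8, 1.05, 1.4), (3, 60, 0.8, 1.1, 1.5),
    (5, 60, 0.8, 1.15, 1.7), (8, 30, 0.9, 1.2, 1.9), (13, 15, 0.9, 1.3, 2.0),
    (20, 6, 1.0, 1.4, 2.2), (40, 2, 1.0, 1.5, 2.5),
]
DEFECT_SCALE, DEFECTS_PER_SCALE, POINTS_PER_DEFECT = 40, 1, 2
INTRODUCED_SCALE, INTRODUCED_PER_SCALE, POINTS_PER_INTRODUCED = 100, 1, 5
VACATION_ADJUSTMENT = 1 - (1 / 20)   # Equation 5: 1 of 20 developers away on average
VELOCITIES = (190, 200, 220)         # the three target velocities from the text


def simulate_row(rng: random.Random) -> list[float]:
    """One spreadsheet row: iterations needed at each target velocity."""
    # Equation 1 for every bin; rng.gauss(mean, sd) stands in for
    # NORMINV(RAND(), mean, sd), with sd = (upper - lower) / 3.29.
    total_points = sum(
        rng.gauss(mean, (upper - lower) / 3.29) * size * count
        for size, count, lower, mean, upper in BINS
    )
    # Equations 2 and 3, both applied to the adjusted backlog total (assumption).
    defect_total = (total_points / DEFECT_SCALE) * DEFECTS_PER_SCALE * POINTS_PER_DEFECT
    scope_total = (total_points / INTRODUCED_SCALE) * INTRODUCED_PER_SCALE * POINTS_PER_INTRODUCED
    burn_down = total_points + defect_total + scope_total
    # Equation 4, once per target velocity.
    return [burn_down / (v * VACATION_ADJUSTMENT) for v in VELOCITIES]


def probability_within(iterations: list[float], target: int) -> float:
    """Equation 7 idea: share of rows needing fewer than `target` iterations."""
    return sum(1 for i in iterations if i < target) / len(iterations)


def workday(start: date, days: int, holidays: frozenset[date] = frozenset()) -> date:
    """Rough WORKDAY stand-in: step forward `days` weekdays, skipping holidays."""
    current, remaining = start, days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


if __name__ == "__main__":
    rng = random.Random(2011)
    rows = [simulate_row(rng) for _ in range(10_000)]
    start, days_per_iteration = date(2009, 5, 4), 10   # hypothetical start date
    for col_index, velocity in enumerate(VELOCITIES):
        column = [row[col_index] for row in rows]
        for target in range(7, 13):   # "seven to twelve" iterations, as in Figure 3
            finish = workday(start, days_per_iteration * target)
            chance = probability_within(column, target)
            print(f"{velocity} pts/iteration: {chance:6.1%} chance within "
                  f"{target} iterations (by {finish})")
```

Each call to `simulate_row` corresponds to one spreadsheet row, and the probability for a target is simply the share of rows that finish in fewer iterations, the same count-divided-by-total idea as the spreadsheet.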
=COUNTIF(Velocity1Range,"<" & Iteration_Target)/COUNT(Velocity1Range)

Equation 7 - Count the number of simulations that complete within a target, and convert to a percentage

The results shown in Figure 3 indicate that as long as the team can maintain 200 story points per iteration, they have an 87% chance of finishing by 22 October 2009 (when this simulation was done). As a project progresses, the model can be tuned to improve confidence and accuracy: defect rates can be determined from the bug-tracking database (how many points per x defects raised), and the random number boundaries for each estimate size can be set to match actual prior data. By maintaining this model, the probability of hitting a given date is always available, and some rigor has gone into calculating it.

Kanban Model

Modeling a Kanban project using Excel is difficult, not because the calculations are complex, but because the interaction between stories would require at least one column per story, per simulation row, and this quickly becomes unmaintainable. A custom application makes more sense, and this article covers one such application.

Kanban divides the steps of delivering a single story into columns (called statuses throughout this article). For example, a story might pass from Design, to Development, to Testing, to Release. The time taken for each story in each status is recorded. Work is limited in each status, and a new story can only be pulled from left to right when a vacant position is available (the total cards within a status are below the limit).

Figure 4 - Example digital Kanban Board visualizing work flowing from left to right through a process.

A card system on a wall using post-it notes (or an electronic version) is used to
represent stories flowing from left to right, as shown in Figure 4.

To simulate, the application takes as inputs the number of status columns, a lower and upper bound for the time taken to complete stories in each status, and the limit on stories allowed in each status at one time (called the WIP Limit, or work-in-progress limit). In place of an actual backlog, a number of initial story cards is specified by the user, as shown in Figure 5.

Figure 5 - Kanban Simulation Setup Screen for the basic inputs.

These inputs are enough to do a simple simulation, where the application loops, simulating a given time interval, for example 1 day. The simulator grabs the first few stories and populates the first status column. For each story a random time within that status' boundaries is calculated, and stories only move to the right when a) that time has elapsed, and b) there is an open position in the next status (its card count is below the WIP limit for that status). This process continues until all cards have traversed from the imaginary backlog to the completed stories pile, and the time taken to do this is recorded.

This type of simulation avoids needing accurate estimates for each story by looking at the previous lower and upper bounds for completing stories and using random numbers between these boundaries. It would be a small enhancement to add a size to each story, but this would complicate the model and may not increase accuracy significantly; the actual times measured on previous work are likely more indicative of future patterns. These actual ranges can be mined from any work-tracking tool, and are often easy to read from a Cumulative Flow Diagram, which is a graphical representation of how many
cards are in each status at any given moment.

Defects, added scope and the time stories spend in a "Blocked" state (no test environment, questions to a stakeholder, unavailable experts) are represented by adding more stories to the backlog according to rates specified by the user, and by extending story times by given user rates. Each of these real-world values can be obtained from tracking systems (the defect database, for example, or the spreadsheet holding the story data) or initially guessed from prior experience. Tuning these values over time, and demonstrating the impact of these occurrences to the entire team, is a great way to manage scope creep and quality issues in a team. Figure 6 shows how defects are specified in our example system, where defects can be raised in different statuses and cause a story to start back in another status for a random time between the specified boundaries.

Figure 6 - Blocking rate, Defects, and Added scope all materially impact time and need to be simulated

Figure 7 - Sample histogram of Monte-carlo simulated completion dates from a Kanban model (chart title: Histogram - Completion Date Probability, Project start: 5/24/2011; x-axis: Completion Date; y-axis: Frequency)

Kanban simulation is carried out with the specified setup either visually for a single pass, or many hundreds or thousands of times for Monte-carlo results. Once the
simulation has run, this application writes the results to Excel for further analysis, as shown in the histogram chart in Figure 7. There is no absolute result, just a pattern of the most commonly simulated completion dates; in this case early December to mid-December is the likely range.

Kanban simulation can answer another key question: if you had to add staff, how many, and with what skills? If we assume that each Kanban status has a specific skillset, for example graphic designers in the Design status (Status 2 in this example), developers in the Dev status (Status 3), QA in the Testing status (Status 4), and release management in the DevOps status (Status 5), then by systematically increasing and decreasing the WIP limits for each status and executing a Monte-carlo simulation run, the status that has the most impact can be determined. The example simulator supports this feature, as the results in Figure 8 show.

Figure 8 - Kanban simulation finds what Status column increase gives the best improvement.

Monte-carlo simulation offers advantages to teams expected to give completion dates for projects, and helps model the uncertainties in a productive way. Whilst Monte-carlo simulation doesn't give an exact date, it shows the likely pattern and ranges that can be expected, and the factors that influence that date most, through a process called Sensitivity Analysis.

Sensitivity Analysis - What input has the most impact on date

Sensitivity analysis answers the question of which input factor has the greatest impact on the final result. In essence, each input is increased and decreased by a consistent amount (10% makes the math easy), one at a time, with a simulation run each time, and the change in the final result is measured for each.
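The Kanban simulator described above is a custom application whose internals the article does not show, so the following is only a minimal Python sketch of the same idea under stated assumptions: a card's time in each status is drawn uniformly between that status' lower and upper bounds and effectively rounded up to whole days by a one-day simulation step, cards move right only when their time has elapsed and the next status is below its WIP limit, and defects, added scope and blocking are left out. `wip_sensitivity` is a hypothetical helper that raises each status' WIP limit by one, one at a time, to see which change shortens the mean completion time most; the status names and numbers are placeholders.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    name: str
    min_days: float   # lower bound of time a card spends in this status
    max_days: float   # upper bound
    wip_limit: int    # maximum cards allowed in this status at once


def simulate_kanban(statuses: list[Status], backlog_size: int,
                    rng: random.Random, day_limit: int = 10_000) -> int:
    """Simulate one pass in one-day steps; return days until every card is done."""
    columns: list[list[float]] = [[] for _ in statuses]   # remaining days per card
    backlog, done, day = backlog_size, 0, 0

    def draw(status: Status) -> float:
        return rng.uniform(status.min_days, status.max_days)

    def pull_from_backlog() -> None:
        nonlocal backlog
        while backlog > 0 and len(columns[0]) < statuses[0].wip_limit:
            columns[0].append(draw(statuses[0]))
            backlog -= 1

    pull_from_backlog()
    while done < backlog_size:
        day += 1
        if day > day_limit:
            raise RuntimeError("simulation did not finish; check the WIP limits")
        for col in columns:                         # one day of work on every card
            col[:] = [remaining - 1 for remaining in col]
        last = len(statuses) - 1
        for i in range(last, -1, -1):               # rightmost first frees space upstream
            waiting = []
            for remaining in columns[i]:
                if remaining > 0:
                    waiting.append(remaining)       # still being worked on
                elif i == last:
                    done += 1                       # leaves the board
                elif len(columns[i + 1]) < statuses[i + 1].wip_limit:
                    columns[i + 1].append(draw(statuses[i + 1]))
                else:
                    waiting.append(remaining)       # finished, but next status is full
            columns[i] = waiting
        pull_from_backlog()
    return day


def mean_completion(statuses: list[Status], backlog_size: int = 50,
                    trials: int = 500, seed: int = 7) -> float:
    """Monte-carlo: average completion days over many simulated passes."""
    rng = random.Random(seed)
    return statistics.mean(simulate_kanban(statuses, backlog_size, rng)
                           for _ in range(trials))


def wip_sensitivity(statuses: list[Status], **kwargs) -> dict[str, float]:
    """Hypothetical helper: mean days saved by raising each WIP limit by one."""
    baseline = mean_completion(statuses, **kwargs)
    saved = {}
    for i, s in enumerate(statuses):
        adjusted = list(statuses)
        adjusted[i] = Status(s.name, s.min_days, s.max_days, s.wip_limit + 1)
        saved[s.name] = baseline - mean_completion(adjusted, **kwargs)
    return saved


if __name__ == "__main__":
    board = [Status("Design", 1, 2, 2), Status("Dev", 2, 6, 3),
             Status("Testing", 1, 3, 2), Status("DevOps", 0.5, 1, 1)]
    print(f"mean completion: {mean_completion(board):.1f} days")
    for name, days in sorted(wip_sensitivity(board).items(), key=lambda kv: -kv[1]):
        print(f"+1 WIP in {name}: {days:+.1f} days saved")
```

With these placeholder numbers, Dev has the lowest throughput (three cards that each take about four and a half days), so raising its WIP limit is the change most likely to shorten the forecast; this is the same kind of question Figure 8 answers with the author's simulator.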
Figure 9 - Manual sensitivity analysis. Defects have more impact on average iterations than increased scope in our example Scrum model. Motivate the team to reduce defect rate any way they can.

Commercial Monte-carlo simulation tools make this functionality easy to visualize in graphs, but for our Excel model it is easy to do by hand. To determine whether defects or introduced scope have a bigger impact on outcome, temporarily increase the defect rate, and then the introduced scope rate, by a percentage, and compare the average number of iterations before and after each change. Figure 9 shows such a result for the Scrum model used earlier. Although close, increasing the defect rate by a percentage has more impact on the average number of iterations to complete than the same percentage change in the rate of additional scope. From observing many models, this is a common case, and the model has helped teams understand the impact of quality earlier when developing code. After reducing the defect rate, look for the next most important factor and improve that area. Sensitivity analysis and Monte-carlo simulation give teams the tools to demonstrate how much small improvements count.

Relevant Random Number Generation

Random number generation is a complex field of mathematics. For truly random numbers, a computer is the last thing you want. Random numbers generated by a computer are never truly random; they rely on algorithms that attempt to be random, but the algorithm used is repeatable given the same starting value, and therefore not random! For most purposes this won't cause an issue for the modeling we undertake, but it is important to realize that random number generators have flaws, and to avoid those flaws where they would impact the results.

To simulate effectively, we need sets of random numbers that fall within the likely bounds of the real-world problem. Excel helps with the RAND() function, which returns random numbers between 0 and 1 (greater than or equal to 0 and less than 1) with an equal chance of occurrence across that range, as shown in Figure 10. But obtaining a random number within a bound is just one part of the problem. In the real world, the random numbers within a range might occur more frequently around one value, or towards one end of the range. If we
ignored this, we would compromise accuracy in the final result. For example, when looking at the actual time taken versus previous estimates, a bias towards overrunning time could be the pattern. Even though the boundaries might run from 80% (20% under the estimate) to 200% (double the estimate), the majority of actual times might come in at around 175% of the estimate. Left alone, the random number generator in Excel would distribute random values evenly across the range.

Excel supports applying bias to the random number generation (for example, by passing RAND() through an inverse distribution function such as NORMINV), and custom applications take this to a whole new level, offering features not only to produce sets of random numbers that fit a curve, but also to look at existing data and match a random number set to that data. In the Scrum model covered in this article, we forced the random number generator to follow the common Bell Curve, or Normal Distribution, as shown in Figure 11.

Figure 10 - Excel's RAND() function returns random numbers from the range 0 to 1 with equal probability (histogram of =RAND() values; x-axis: Bin; y-axis: Frequency)

Figure 11 - To obtain random numbers that fit the Normal distribution (Bell curve), use the NORMINV() function (histogram of =NORMINV(RAND(),1,1) values; x-axis: Bin; y-axis: Frequency)
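The difference between RAND(), NORMINV(RAND(), ...) and a skewed, overrun-biased curve can be sketched with Python's standard library. Applying `NormalDist(mu, sigma).inv_cdf(u)` to a uniform draw is the same inverse-transform idea as NORMINV(RAND(), mu, sigma). For the 80% to 200% overrun example, the sketch swaps in a triangular distribution peaking at 175%; that choice is an assumption for illustration, not a curve fitted to real data or one the article prescribes. The first printed line also shows that twice the 95th-percentile z-score is about 3.29, matching the divisor in Equation 1.

```python
import random
from collections import Counter
from statistics import NormalDist, mean


def uniform_draws(rng: random.Random, n: int) -> list[float]:
    """Like Excel's RAND(): every value in [0, 1) is equally likely."""
    return [rng.random() for _ in range(n)]


def normal_draws(rng: random.Random, n: int, mu: float, sigma: float) -> list[float]:
    """Like NORMINV(RAND(), mu, sigma): feed a uniform draw through the inverse
    normal CDF (inverse transform sampling)."""
    dist = NormalDist(mu, sigma)
    draws: list[float] = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:                      # inv_cdf rejects exactly 0
            draws.append(dist.inv_cdf(u))
    return draws


def overrun_draws(rng: random.Random, n: int) -> list[float]:
    """Actual/estimate ratios from 0.8 to 2.0, peaking at 1.75. The triangular
    shape is an illustrative assumption, not the article's method."""
    return [rng.triangular(0.8, 2.0, 1.75) for _ in range(n)]


def text_histogram(values: list[float], width: float) -> None:
    """Print a crude histogram, one row per bin of the given width."""
    counts = Counter(int(v // width) for v in values)
    scale = max(counts.values()) / 40
    for k in sorted(counts):
        print(f"{k * width:8.2f} | {'#' * round(counts[k] / scale)}")


if __name__ == "__main__":
    # Bounds at +/- 1.645 standard deviations enclose the central 90% of a
    # normal curve; twice that z-score matches the 3.29 divisor in Equation 1.
    print(f"2 * z(0.95) = {2 * NormalDist().inv_cdf(0.95):.3f}")
    rng = random.Random(11)
    for label, values, width in (
        ("RAND()", uniform_draws(rng, 5000), 0.1),
        ("NORMINV(RAND(),1,1)", normal_draws(rng, 5000, 1.0, 1.0), 0.5),
        ("overrun ratio", overrun_draws(rng, 5000), 0.1),
    ):
        print(f"{label}: mean {mean(values):.2f}")
        text_histogram(values, width)
```

Replacing the triangular curve with one fitted to your own actual-versus-estimate history is the step that commercial curve-fitting tools automate.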
Figure 12 - EasyFit from Mathwave Technologies is a commercial curve-fitting tool that can create random numbers that fit real-world data

"Normal" is one distribution curve of many, and commercial Monte-carlo packages support more than the basic curves.

EasyFit is a commercial curve-fitting package that analyzes existing data and determines what probability curve fits that data. The application can then create a set of random numbers that match this curve, allowing you to simulate with a random input that is indicative of real-world values. One use for this type of application is looking at the frequency of previous estimates and employing those values in future Monte-carlo simulations. Figure 12 shows an actual set of estimates from a previous project. Without any better information, random numbers generated from this curve could be used to simulate future similar project estimates without disrupting development staff for detailed analysis. Where possible, it is recommended to analyze prior data, or to carefully consider the likely range of possible values and whether they are weighted more frequently towards one boundary than another, when choosing a random number distribution. If that weighting is significant, look for commercial tools.

Conclusion

This article only scratches the surface of how to build and model software projects using Monte-carlo techniques. The ability to quickly forecast a project's most likely completion date, and the impact of adding more staff or reducing defect counts, makes this analysis an important tool in any development manager's arsenal, and places managers in a position to answer with a level of
unprecedented confidence the three likely ongoing upper-management questions:

1. How much will this product cost to develop and deliver?
2. What is the likelihood of hitting date x?
3. What resources do you need to hit date x (money equals people, so the question is often how much more money do you need to hit date x)?

About the Author

Troy Magennis is founder of Focused Objective, a consulting firm that aims to improve software development practices and management through better tools and education. Troy has held VP-level positions at many companies in diverse fields, from Automotive and Financial to Image Rights Management and Travel.

For feedback, Troy can be contacted at Troy.magennis@FocusedObjective.com

For more articles like this, visit us at http://www.FocusedObjective.com