Confidence in Software Cost Estimation Results based on MMRE and PRED


Confidence in Software Cost Estimation Results based on MMRE and PRED - PROMISE 2008

  1. Confidence in Software Cost Estimation Results based on MMRE and PRED
       Presentation for PROMISE 2008
       Marcel Korte [email_address]
       Dan Port, University of Hawai'i at Manoa, Phone: +1-(808)-956-7494 [email_address]
  2. Table of Contents (13 May 2008)
       • Introduction
       • Approach
       • The Standard Error
       • Bootstrapping
       • The Confidence Intervals
       • Datasets and models used
       • Ex.: Bootstrapped MMREs
       • Accounting for Standard Error
       • How much confidence needed?
       • The Desharnais Problem
       • Conclusion
       • Invitation for collaboration
  3. Introduction
       • Large number of cost estimation research efforts over the last 20+ years
       • Confidence in these research results is still lacking
       • Average overrun of software projects is 30% to 40% (Moløkken, Jørgensen)
       • Various studies show inconclusive and/or contradictory results
  4. Approach
       • Software cost estimation research is based on one or more datasets
       • Yet datasets are samples, perhaps significantly biased, often outdated, and of questionable relevance
       • Empirical results based on small datasets are generalized to an entire population without considering the error inherent in doing so
       • Question: How accurate is my accuracy?
  5. The Standard Error
       • Widely used in many fields of research and well understood
       • Measure of the error in calculations based on datasets that are samples of a population
       • Has not yet been used in the field of software cost estimation
       • Many confusing, inconclusive, or contradictory results can be explained by showing that we cannot "have confidence" in them
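The slides use MMRE, PRED, and the standard error without spelling them out. The block below writes down the standard textbook definitions as a reference; the symbols are assumptions chosen here for illustration ($y_i$ for actual effort, $\hat{y}_i$ for estimated effort, $n$ for the number of projects, $B$ for the number of bootstrap replicates, $\hat\theta^{*}_{b}$ for the statistic computed on the $b$-th resample) and do not come from the presentation.

```latex
\begin{align}
\mathrm{MRE}_i &= \frac{\lvert y_i - \hat{y}_i \rvert}{y_i}, \\
\mathrm{MMRE} &= \frac{1}{n} \sum_{i=1}^{n} \mathrm{MRE}_i, \\
\mathrm{PRED}(l) &= \frac{1}{n}\,\bigl\lvert \{\, i : \mathrm{MRE}_i \le l \,\} \bigr\rvert, \\
\widehat{\mathrm{SE}}_{\mathrm{boot}}(\hat\theta) &=
  \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \bigl(\hat\theta^{*}_{b} - \bar\theta^{*}\bigr)^{2}},
  \qquad \bar\theta^{*} = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^{*}_{b}.
\end{align}
```

Here $\hat\theta$ stands for either MMRE or PRED$(l)$, so the bootstrap standard error is simply the sample standard deviation of the statistic across resamples.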
  6. Bootstrapping
       • General problem: distribution not known
       • "Computer-intensive" technique similar to Monte Carlo methods
       • Resampling with replacement to "reconstruct" the general population distribution
       • Well-accepted, straightforward approach to approximating the standard error of an estimator
       • We used 15,000 iterations in this study
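The resampling idea on the bootstrapping slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `mmre` and `bootstrap_mmre` and the effort values are hypothetical, and the sketch resamples (actual, estimate) pairs for a fixed set of estimates rather than refitting any model. The default of 15,000 iterations follows the slide; the demo call uses fewer to keep it quick.

```python
import random
import statistics


def mmre(actuals, estimates):
    """Mean magnitude of relative error over paired actual/estimated efforts."""
    return statistics.fmean(abs(a - e) / a for a, e in zip(actuals, estimates))


def bootstrap_mmre(actuals, estimates, iterations=15_000, seed=0):
    """Return (standard_error, replicates) for the MMRE via case resampling."""
    rng = random.Random(seed)
    pairs = list(zip(actuals, estimates))
    replicates = []
    for _ in range(iterations):
        # Draw as many projects as the dataset holds, with replacement.
        sample = rng.choices(pairs, k=len(pairs))
        replicates.append(mmre(*zip(*sample)))
    # The bootstrap standard error is the spread of the replicated statistic.
    return statistics.stdev(replicates), replicates


# Hypothetical efforts (person-months), for illustration only.
actual = [120.0, 45.0, 300.0, 80.0, 60.0, 210.0, 150.0, 33.0]
estimate = [100.0, 60.0, 250.0, 95.0, 40.0, 260.0, 140.0, 50.0]

se, reps = bootstrap_mmre(actual, estimate, iterations=2_000)
print(f"MMRE = {mmre(actual, estimate):.3f}, bootstrap SE = {se:.3f}")
```

Reporting the standard error next to the MMRE makes it visible how much of a difference between two models could be sampling noise.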
  7. The Confidence Intervals
       • MREs are not normally distributed
       • Underlying distribution is not known
       • The BC-percentile ("bias-corrected") method has been shown to be effective in approximating confidence intervals for the available distributions
       [Figure: Histogram of bootstrapped MMRE and log-transformed MMRE for model (A), NASA93 dataset]
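A bias-corrected percentile interval can be sketched as follows. This follows the general textbook form of the BC method (shift the percentile cut-offs by twice the bias-correction constant z0); whether the study used exactly this variant is an assumption. The function name `bc_percentile_interval` and the MRE values are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def bc_percentile_interval(data, stat, confidence=0.95, iterations=5_000, seed=0):
    """Bias-corrected (BC) percentile bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(iterations))

    # z0 measures how far the original estimate sits from the bootstrap median.
    below = sum(r < theta_hat for r in reps)
    # Keep the proportion strictly inside (0, 1) so inv_cdf is defined.
    p = min(max(below / iterations, 1 / iterations), 1 - 1 / iterations)
    norm = NormalDist()
    z0 = norm.inv_cdf(p)

    # Shift the nominal tail probabilities by 2 * z0 on the normal scale.
    alpha = 1 - confidence
    lo_q = norm.cdf(2 * z0 + norm.inv_cdf(alpha / 2))
    hi_q = norm.cdf(2 * z0 + norm.inv_cdf(1 - alpha / 2))

    def quantile(q):
        return reps[min(iterations - 1, max(0, int(q * iterations)))]

    return quantile(lo_q), quantile(hi_q)


# Hypothetical, right-skewed MRE values for one model; illustration only.
mres = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.45, 0.90, 1.40]
low, high = bc_percentile_interval(mres, statistics.fmean)
print(f"MMRE = {statistics.fmean(mres):.3f}, 95% BC interval = ({low:.3f}, {high:.3f})")
```

When the bootstrap distribution is symmetric around the estimate, z0 is close to zero and the result reduces to the plain percentile interval.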
  8. Datasets and models used
       • PROMISE datasets: COCOMO81*, COCOMONASA, NASA93, and Desharnais*
       • Models:
           • A: ln_LSR_CAT**
           • B: aSb
           • C: given_EM
           • D: ln_LSR_aSb
           • E: ln_LSR_EM
           • F: LSR_a+Sb
       * Some errors found and corrected in these datasets
       ** Purely statistical model
  9. Bootstrapped MMRE intervals 1/2
       [Figure panels: COCOMO81 dataset; COCOMONASA dataset]
  10. Bootstrapped MMRE intervals 2/2
       [Figure panels: NASA93 dataset; Desharnais dataset*]
       * Note: only models D and F used, with FP raw and FP adj
  11. Accounting for Standard Error

       Model ranking based on MMRE, not accounting for Standard Error:

       Rank  COCOMO81  COCOMONASA  NASA93
       1.    A         A           A
       2.    E         E           E
       3.    C         C           C
       4.    B         D           B
       5.    D         B           D

       Model ranking based on MMRE, accounting for Standard Error at 95% confidence level:

       Rank  COCOMO81  COCOMONASA  NASA93
       1.    A         A           A, B, C, D, E
       2.    C, E      E           -
       3.    B, D      B, C, D     -
       4.    -         -           -
       5.    -         -           -
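One way to turn confidence intervals into a ranking with ties is to let models share a rank whenever their intervals overlap. The sketch below uses a simple rule chosen for illustration (a model joins the current tier if its lower bound does not exceed the upper bound of that tier's best model); the slides do not state the exact rule the authors applied, and the interval values are hypothetical, not taken from the study.

```python
def rank_with_intervals(intervals):
    """Group models into rank tiers; models whose intervals overlap share a tier.

    `intervals` maps model name -> (lower, point_estimate, upper) for MMRE,
    where a lower MMRE is better.
    """
    ordered = sorted(intervals, key=lambda m: intervals[m][1])
    tiers = []
    for model in ordered:
        lower, _, upper = intervals[model]
        if tiers and lower <= tiers[-1]["upper"]:
            # Overlaps the best model of the current tier: not distinguishable.
            tiers[-1]["models"].append(model)
        else:
            # Clearly worse than the current tier: start a new one.
            tiers.append({"models": [model], "upper": upper})
    return [tier["models"] for tier in tiers]


# Hypothetical 95% MMRE intervals (lower, estimate, upper); illustration only.
example = {
    "A": (0.30, 0.35, 0.40),
    "B": (0.75, 0.82, 0.98),
    "C": (0.50, 0.55, 0.62),
    "D": (0.70, 0.80, 0.95),
    "E": (0.45, 0.52, 0.60),
}

for rank, models in enumerate(rank_with_intervals(example), start=1):
    print(f"{rank}. {', '.join(models)}")
```

With wide enough intervals every model lands in the first tier, which is the situation the NASA93 column of the second table shows.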
  12. How much confidence needed?
       [Figure: Bootstrapped PRED(.30) intervals (COCOMONASA dataset)]
       [Figure: Bootstrapped PRED(.30) intervals with significant differences (32% confidence level, COCOMONASA dataset)*]
       * This is a very crude example. There are more refined approaches that account for simultaneous (ANOVA-like) comparisons.
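The effect of the chosen confidence level on interval width can be illustrated with PRED(.30). The sketch below uses the plain percentile bootstrap rather than the BC method to keep it short; the function names and MRE values are hypothetical.

```python
import random


def pred(mres, level=0.30):
    """PRED(level): fraction of projects whose MRE is at most `level`."""
    return sum(m <= level for m in mres) / len(mres)


def percentile_interval(mres, confidence, level=0.30, iterations=5_000, seed=0):
    """Plain percentile bootstrap interval for PRED(level)."""
    rng = random.Random(seed)
    n = len(mres)
    reps = sorted(pred(rng.choices(mres, k=n), level) for _ in range(iterations))
    tail = (1 - confidence) / 2
    lo = reps[int(tail * (iterations - 1))]
    hi = reps[int((1 - tail) * (iterations - 1))]
    return lo, hi


# Hypothetical MRE values for one model; illustration only.
mres = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.45, 0.90, 1.40]

wide = percentile_interval(mres, confidence=0.95)
narrow = percentile_interval(mres, confidence=0.32)
print(f"PRED(.30) = {pred(mres):.3f}")
print(f"95% interval: {wide}, 32% interval: {narrow}")
```

Lowering the confidence level shrinks the intervals, so two models can appear to differ "significantly" only because much less confidence is being demanded.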
  13. The Desharnais Problem

       Model ranking not accounting for Standard Error (Desharnais, FP adj) implies contradictory results:

       Rank  MMRE  PRED(.25)
       1.    F     D
       2.    D     F

       Model ranking accounting for Standard Error (Desharnais, FP adj):

       Rank  MMRE  PRED(.25)
       1.    F, D  F, D
       2.    -     -

       • No confident interpretation is possible based on the Desharnais dataset and models D, F
  14. Conclusions 1/2
       • We applied standard, easily analyzed and replicated statistical methods: Standard Error, Bootstrapping
       • Approach has potential for increasing confidence in research results and cost estimation practice
       • Use of Standard Error can help address:
           • How can we meaningfully interpret research results that rest on intuitively appealing accuracy measures?
           • How can we make valid statistical inferences (i.e., test for significance) for results based on comparing PRED or MMRE values?
           • Estimating how many data points are needed for confident results
  15. Conclusions 2/2
       Use of Standard Error can help address (continued):
           • The different behaviors of MMRE and PRED (expanded on in the ESEM 2008 paper)
           • Determination of an adequate sample size for model calibration
           • Understanding how sample size affects model accuracy
           • Can "bad" calibration data be identified?
           • If doing model validation studies using random methods (such as jackknife, holdouts, or bootstrap), how many iterations are needed for stable results?
           • Why are some cost estimation study results contradictory, and how can these be resolved?
  16. Invitation for collaboration
       • ESEM08 paper: "Comparative Studies of the Model Evaluation Criterions MMRE and PRED in Software Cost Estimation Research" (Port, Korte)
       • There is much interesting work still to be done in this area, such as:
           • Standard error studies of non-COCOMO models
           • Refinement of "how much data is enough?" methods
           • Standard error studies of the "deviation" problem (i.e., variance in model parameters) (Menzies et al.)
           • Validation of model selection when reducing parameters (Menzies et al.)
           • Applying standard statistical methods for model accuracy (e.g., MSE, least-likelihood estimators)
       • As suggested by Tim Menzies, we are keen to "crowd source" this research, so if this presentation has inspired you in some way, contact Dan Port and let's discuss possible collaborations!
  17. Thank you!
       Marcel Korte [email_address]
       Dan Port, University of Hawai'i at Manoa, Phone: +1-(808)-956-7494 [email_address]