Promise 2011: "Local Bias and its Impacts on the Performance of Parametric Estimation Models"


Promise 2011:
"Local Bias and its Impacts on the Performance of Parametric Estimation Models"
Ye Yang, Lang Xie, Zhimin He, Qi Li, Vu Nguyen, Barry Boehm and Ricardo Valerdi.

  • These pictures show the stdMRE values and MMRE values in each data group.
  • This table shows the results of the correlation analysis. The range of stdMRE is significantly positively correlated with local bias and with local_bias*num (local bias times the number of data points in the group). Both the average stdMRE and the average MMRE are significantly positively correlated with local_bias*num. The range of stdMRE reflects the uncertainty of model performance, so we argue that the bigger the local bias is, the weaker the model performance is.

    1. Local Bias and its Impacts on the Performance of Parametric Estimation Models
        Ye Yang, Lang Xie, Zhimin He (ISCAS)
        Qi Li, Vu Nguyen, Barry Boehm (USC)
        Ricardo Valerdi (MIT/Univ. of Arizona)
        Sep. 21, 2011, Promise 2011, Banff, Canada
    2. Outline
        • Background
        • Research questions
        • Measuring local bias
        • Measuring the impacts of local bias
        • Handling local bias
        • Conclusions and future work
    3. Background
        • Continuously calibrated and validated parametric models are necessary for realistic software estimates.
        (Figure labels: model user, model maintainer, model researcher.)
    4. Background (cont.)
        • Typical parametric models are calibrated over a broad range of industry data.
        • Local calibration is advocated to improve accuracy over the default model calibration.
        • Pros and cons of local calibration (local tuning)
            • Pros: better model performance
            • Cons: the tuned model is less likely to stay fully compliant with the general model
    5. Background (cont.)
        • The evolution cycle of a parametric model
            • Mismatches between “general assumptions” and “local assumptions”
            • The resulting tuning variance has attracted increasing research attention
            • Counter-intuitive calibration results
            • Challenges in making use of unbalanced datasets for developing and evaluating a general model
    6. Example: COCOMO II model
        • COCOMO II model
        • Range of local tuning parameters:
            • Yang and Clark (CII database experience): 1 <= A <= 4
            • Menzies: (2.2 <= A <= 9.18) and (0.88 <= B <= 1.09)
        (Figure axes: Ln_effort, Ln_Size.)
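For reference, the slide names the COCOMO II model but its equation did not survive in the transcript. The standard published form is sketched below; the symbols PM (effort in person-months), Size (KSLOC), EM_i (effort multipliers) and SF_j (scale factors) follow the usual COCOMO II notation and are not taken from the slide itself.

```latex
\begin{align}
PM &= A \cdot \mathit{Size}^{E} \cdot \prod_{i} EM_i, \qquad E = B + 0.01 \sum_{j} SF_j \\
\ln PM &= \ln A + E \,\ln \mathit{Size} + \sum_{i} \ln EM_i
\end{align}
```

In log space the model is linear in ln Size, which is presumably why the slide plots Ln_effort against Ln_Size: tuning A shifts the intercept of that line and tuning B changes its slope.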
    7. Research questions
        • Research questions:
            • Is there a way to measure the local bias?
            • As historical data accumulates from multiple companies, how will the associated local bias impact the performance of the general parametric estimation model?
            • Are there any correlation patterns between local bias and model performance variation?
        • Assumptions:
            • The general parametric model follows a structure similar to that of COCOMO II.
            • In the model localization stage, constants A and B are tuned with local data.
            • In the model usage stage, the locally calibrated A and B are used for project estimation.
    8. Outline
        • Background
        • Research questions
        • Measuring local bias
        • Measuring the impacts of local bias
        • Handling local bias
        • Conclusions and future work
    9. Local Bias Definition
        • Local bias: the degree of deviation between a local model and the general model
        • In the context of the CII model (the formula is shown as an image on the slide),
        • where
            • A’ and B’ are model parameters calibrated from the local data of each organization,
            • A and B are the default constant values of the COCOMO II model (A=2.94, B=0.91), and
            • a standard size of 100 KLOC is used to normalize local bias.
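The formula itself is missing from the transcript. One form that is consistent with the description above, and with the later remark that a local bias of 2.25 corresponds to an estimate ratio of EXP(2.25) at 100 KSLOC, is the absolute log-ratio of the two models' estimates at the standard size. This is a reconstruction under those assumptions, not the authors' confirmed definition; the symbol S for the standard size and the use of an absolute value are assumptions.

```latex
\[
\mathit{local\_bias}
  = \left| \ln \frac{A' \cdot S^{B'}}{A \cdot S^{B}} \right|
  = \left| \ln \frac{A'}{A} + (B' - B)\,\ln S \right|,
  \qquad S = 100
\]
```

Under this reading a local bias of 0 means the local and general models give the same estimate at 100 KLOC, and a local bias of x means the two estimates differ by a factor of e^x.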
    10. Summary of Dataset
        (Slide graphic labels: CII 2000 Subset, After2000 Subset, CII 2010 Dataset.)
    11. Analysis procedure
        • Break the After2000 subset into 10 subsets.
        • Conduct representative local calibration to produce A’ and B’.
        • Calculate local bias and compare among groups.
        (Diagram: the CII 2010 Dataset is shown with the CII 2000 Subset, which carries the default constants A, B, and the After2000 Subset, which is grouped by Organization_ID into Subset 1, Subset 2, …, Subset n; each Subset i yields A_i’, B_i’ and local_bias_i.)
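The three steps above can be sketched as follows. This is a minimal illustration, not the authors' calibration procedure: it assumes each record is a hypothetical (organization_id, size in KSLOC, effort) tuple, it ignores effort multipliers and scale factors, it calibrates A’ and B’ by ordinary least squares on ln(effort) versus ln(size), and it uses the absolute log-ratio of estimates at 100 KSLOC as the local bias, which is an assumed reading of the definition.

```python
import math

A_DEFAULT, B_DEFAULT = 2.94, 0.91  # COCOMO II defaults quoted on the slides
STANDARD_SIZE = 100.0              # KSLOC, the normalizing size from the slides


def calibrate(projects):
    """Least-squares fit of ln(effort) = ln(A') + B' * ln(size).

    `projects` is a list of (size, effort) pairs; this simplified fit
    is an assumption, not the calibration used by the authors.
    """
    xs = [math.log(size) for size, _ in projects]
    ys = [math.log(effort) for _, effort in projects]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b_local = sxy / sxx
    a_local = math.exp(mean_y - b_local * mean_x)
    return a_local, b_local


def local_bias(a_local, b_local, a=A_DEFAULT, b=B_DEFAULT, size=STANDARD_SIZE):
    """Assumed definition: |ln(local estimate / general estimate)| at `size`."""
    return abs(math.log(a_local / a) + (b_local - b) * math.log(size))


def bias_by_organization(records):
    """Group records by organization id, calibrate each group, return its bias."""
    groups = {}
    for org_id, size, effort in records:
        groups.setdefault(org_id, []).append((size, effort))
    return {org: local_bias(*calibrate(rows)) for org, rows in groups.items()}
```

With this sketch, an organization whose projects follow the default model exactly gets a local bias of about 0, and one whose effort is consistently twice the default estimate gets ln 2.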
    12. Measuring local bias - Results
        • Parameters of local models
        • Local bias of each group
        • Local A and B differ in each group, indicating that local bias is introduced when adopting local calibration;
        • Local bias varies across groups, ranging from 0.06 to 2.25;
            • E.g. in group 9, the local model’s estimate and the CII model’s estimate differ by a factor of almost EXP(2.25) ≈ 9.49 for a standard project size of 100 KSLOC.
    13. Outline
        • Background
        • Research questions
        • Measuring local bias
        • Measuring the impacts of local bias
        • Handling local bias
        • Conclusions and future work
    14. Measuring the impacts of local bias
        • Performance assessment
            • Basic performance indicators: MMRE (mean MRE), stdMRE (standard deviation of MRE)
            • Assessment procedure:
                • Split the data set into a training set and a test set
                • Tune the model parameters on the training set
                • Evaluate model performance on the test set, giving one (MMRE, stdMRE) pair
                • Repeat the above steps 2000 times, giving 2000 (MMRE, stdMRE) pairs
            • Average MMRE, Range of MMRE, Average stdMRE, and Range of stdMRE are used to assess the performance of an estimation model.
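The assessment procedure above can be sketched as a repeated random hold-out. Several details are assumptions because the slide does not state them: the test fraction, taking "range" as maximum minus minimum, using the sample standard deviation, and tuning by a plain log-log least-squares fit on hypothetical (size, effort) pairs.

```python
import math
import random
import statistics


def fit_log_linear(train):
    """Fit ln(effort) = ln(A) + B * ln(size) by least squares (simplified tuning)."""
    xs = [math.log(size) for size, _ in train]
    ys = [math.log(effort) for _, effort in train]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return math.exp(mean_y - b * mean_x), b


def mre(actual, predicted):
    """Magnitude of relative error for a single project."""
    return abs(actual - predicted) / actual


def assess(projects, runs=2000, test_fraction=0.3, seed=0):
    """Repeat split / tune / evaluate and summarize the (MMRE, stdMRE) pairs."""
    rng = random.Random(seed)
    n_test = max(2, round(len(projects) * test_fraction))  # stdev needs >= 2 points
    mmres, std_mres = [], []
    for _ in range(runs):
        shuffled = list(projects)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        a, b = fit_log_linear(train)
        errors = [mre(effort, a * size ** b) for size, effort in test]
        mmres.append(statistics.mean(errors))
        std_mres.append(statistics.stdev(errors))
    return {
        "avg_mmre": statistics.mean(mmres),
        "range_mmre": max(mmres) - min(mmres),
        "avg_std_mre": statistics.mean(std_mres),
        "range_std_mre": max(std_mres) - min(std_mres),
    }
```

Calling `assess(projects)` on a list of (size, effort) pairs returns the four summary indicators named on the slide; a wide `range_std_mre` would indicate that performance depends strongly on which projects land in the training set.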
    15. Analysis procedure
        • First, for each group ss_i in the After2000 subset:
            • combine ss_i with the CII 2000 data set to produce a new data set ds_i;
            • assess model performance on data set ds_i and record the values of the performance indicators;
        • Then conduct correlation analysis between local bias and model performance.
        (Diagram: CII 2000 subset combined with SS1 gives a performance value and a local bias; CII 2000 subset combined with SS2 gives a performance value and a local bias; and so on; these feed the correlation analysis.)
    16. Results
        • Model performance

                        CII 2000    CII 2010
            MMRE        0.3478      0.4063
            StdMRE      0.3261      0.3401

        • Model performance decreases as new subsets are introduced.
        This reflects the uncertainty inherent in model performance when adding just a small group of new data points to the CII 2000 baseline dataset.
    17. Measuring the impacts of local bias (cont.)
        • Spearman correlation coefficients between local bias and model performance:
            • At a significance level of p < 0.05, the range of stdMRE is significantly positively correlated with local bias and with local_bias*num. Both the average stdMRE and the average MMRE are significantly positively correlated with local_bias*num.
            • The range of stdMRE reflects the uncertainty of model performance. Hence, the bigger the local bias is, the weaker the performance is.
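For readers unfamiliar with the statistic, the Spearman coefficient is the Pearson correlation of the ranks of the two variables. The sketch below is a plain standard-library implementation with average ranks for ties; it does not compute the p-value used on the slide, and the function names are illustrative rather than taken from the authors' tooling.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

In the setting of the slide, `xs` would hold the local bias (or local_bias*num) of each group and `ys` the matching performance indicator, such as the range of stdMRE; a coefficient near +1 means groups with larger bias also show larger performance variation.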
    18. Discussions
        • Two types of measures
            • Local bias:
                • Useful for bridging the potential gaps between the “model building” stage and the “model localization” stage
            • Performance measures:
                • The range and average of MMRE and stdMRE are easy to produce and reflect a certain profile of the bias’s influence
        • Two components drive the decreased model performance:
            • the degree of local bias and the number of data points associated with each additional group
    19. Implications for Parametric Model Calibration
        • Previous approaches
            • Data pre-processing
                • Reducing factors, removing outliers, etc.
            • Regression-based approaches
                • Variants of standard linear regression, incorporating a priori knowledge
            • Machine learning approaches
                • Mainly focus on optimizing model accuracy
        • Need to pay attention to balancing accuracy and stability
    20. Threats to Validity
        • Other sources of bias?
            • Chronological bias, the influence of new technologies, etc.
        • Other performance indicators?
            • PRED, MRE, etc.
        • Other parametric models?
    21. Ongoing work on handling local bias
        • Assumption:
            • A local historical data set with higher local bias presents a more divergent pattern for cost estimation, so it should be assigned a lower weight when used for model calibration.
        • Constraints for the weight distribution function Weight = F(LocalBias)
            • IF LocalBias = 0, THEN Weight = 1;
            • IF LocalBias -> +∞, THEN Weight -> 0;
            • F should be a decreasing function on the interval [0, +∞).
        • Three functions
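The three functions referred to on the slide are not preserved in the transcript. Purely as an illustration of the stated constraints, the sketch below gives three hypothetical candidates that each satisfy F(0) = 1, F(x) -> 0 as x -> +∞, and monotone decrease on [0, +∞); they are assumptions and should not be read as the authors' choices.

```python
import math


def weight_exponential(local_bias):
    """Hypothetical candidate: F(x) = exp(-x)."""
    return math.exp(-local_bias)


def weight_reciprocal(local_bias):
    """Hypothetical candidate: F(x) = 1 / (1 + x), a slower decay."""
    return 1.0 / (1.0 + local_bias)


def weight_gaussian(local_bias):
    """Hypothetical candidate: F(x) = exp(-x^2), flat near 0 and then a fast decay."""
    return math.exp(-local_bias ** 2)


def satisfies_constraints(weight_fn, grid=(0.0, 0.5, 1.0, 2.0, 5.0), far=1000.0):
    """Spot-check the slide's three constraints on a finite sample of points."""
    starts_at_one = weight_fn(0.0) == 1.0
    vanishes = weight_fn(far) < 1e-2
    values = [weight_fn(x) for x in grid]
    decreasing = all(a > b for a, b in zip(values, values[1:]))
    return starts_at_one and vanishes and decreasing
```

The candidates differ in how quickly they discount biased data: with the local bias range of 0.06 to 2.25 reported earlier, the reciprocal form still gives the most biased group a weight of roughly 0.31, while the Gaussian form reduces it to under 0.01.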
    22. Conclusions
        • A definition is provided for consistently understanding and measuring local bias;
        • The impact assessment and correlation analysis indicate that local bias can be harmful to general model performance;
        • The results offer insights for easing parametric model evolution by identifying and avoiding local bias early, in the data collection stage;
        • A better approach to handling local bias is needed.
            • E.g. employ a machine learning approach to learn local bias, and learn how to improve the model structure to counteract the bias
    23. Thank you! Contact: Ye Yang (yangye@nfs.iscas.ac.cn)
