Promise 2011: "Local Bias and its Impacts on the Performance of Parametric Estimation Models"
 

Ye Yang, Lang Xie, Zhimin He, Qi Li, Vu Nguyen, Barry Boehm and Ricardo Valerdi.

Speaker notes

  • These figures show the stdMRE values and MMRE values for each data group.
  • This table shows the results of the correlation analysis. The range of stdMRE is significantly positively correlated with local bias and with local_bias*num (local bias times the number of data points). Both the average stdMRE and the average MMRE are significantly positively correlated with local_bias*num. Since the range of stdMRE reflects the uncertainty of model performance, we argue that the bigger the local bias, the weaker the model performance.

Promise 2011: "Local Bias and its Impacts on the Performance of Parametric Estimation Models" Promise 2011: "Local Bias and its Impacts on the Performance of Parametric Estimation Models" Presentation Transcript

  • Local Bias and its Impacts on the Performance of Parametric Estimation Models Ye Yang, Lang Xie, Zhimin He (ISCAS) Qi Li, Vu Nguyen, Barry Boehm (USC) Ricardo Valerdi (MIT/Univ. of Arizona) Sep. 21, 2011 Promise 2011, Banff, Canada
  • Outline
    • Background
    • Research questions
    • Measuring local bias
    • Measuring the impacts of local bias
    • Handling Local Bias
    • Conclusions and future work
  • Background
    • Continuously calibrated and validated parametric models are necessary for realistic software estimates.
    Model user, model maintainer, model researcher
  • Background(Cont.)
    • Typical parametric models are calibrated over a broad range of industry data
    • Local calibration is advocated to improve accuracy over the default model calibration.
    • Pros and cons of local calibration (local tuning)
      • Pros: better model performance
      • Cons: the locally calibrated model is less likely to remain fully consistent with the general model
  • Background (Cont.)
    • The evolution cycle of a parametric model
      • Mismatches between “general assumptions” and “local assumptions”
      • The resulting tuning variance has caught increasing research attention
      • Counter-intuitive calibration results
      • Challenges in making use of unbalanced datasets for developing and evaluating the general model
  • Example: COCOMO II model
    • COCOMO II model
    • Range of local tuning parameters:
      • Yang and Clark: CII Database experience: 1<=A<=4
      • Menzies : (2.2 <= A <= 9.18) ^ (0.88 <= B <= 1.09)
    [Figure: Ln_effort plotted against Ln_Size]
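For reference (the equation image is not in the extracted slide), the standard COCOMO II post-architecture effort equation that local calibration tunes has the form:

```latex
PM = A \cdot \mathrm{Size}^{E} \cdot \prod_{i=1}^{17} EM_i,
\qquad E = B + 0.01 \sum_{j=1}^{5} SF_j
```

Taking logarithms gives a linear relation between Ln_effort and Ln_Size, as plotted on the slide, which is what makes A and B straightforward to calibrate from local data by linear regression.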
  • Research questions
    • Research questions:
      • Is there a way to measure the local bias?
      • As historical data accumulates from multiple companies, how will the associated local bias impact the performance of the general parametric estimation model?
      • Are there any correlation patterns between local bias and model performance variation?
    • Assumptions:
      • The general parametric model follows a structure similar to that of COCOMO II.
      • In model localization stage, constant A and constant B are tuned with local data.
      • In model usage stage, locally calibrated A and B are used for project estimation.
  • Outline
    • Background
    • Research questions
    • Measuring local bias
    • Measuring the impacts of local bias
    • Handling Local Bias
    • Conclusions and future work
  • Local Bias Definition
      • Local bias: degree of deviation between a local model and the general model
      • In the context of the CII model (see the reconstruction below):
      • where
        • A’ and B’ are the model parameters calibrated from each organization’s local data,
        • A and B are the default constant values of the COCOMO II model (A = 2.94, B = 0.91), and
        • a standard size of 100 KLOC is used to normalize the local bias.
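The equation image on this slide did not survive extraction. A reconstruction consistent with the definitions above and with the exp(2.25) ≈ 9.49 ratio quoted on a later slide is:

```latex
\mathrm{local\_bias}
  = \left|\, \ln\!\bigl(A' \cdot 100^{B'}\bigr) - \ln\!\bigl(A \cdot 100^{B}\bigr) \,\right|
  = \left|\, \ln\frac{A'}{A} + (B' - B)\,\ln 100 \,\right|
```

That is, the absolute log-ratio of the local and general models' effort estimates for a 100 KSLOC project, so exp(local_bias) is the factor by which the two estimates differ at that size.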
  • Summary of datasets: the CII 2000 subset, the After2000 subset, and the CII 2010 dataset
  • Analysis procedure
    • Break After2000 subset into 10 subsets.
    • Conduct representative local calibration to produce A’ and B’ (see the sketch after this slide).
    • Calculate local bias and compare among groups.
    [Figure: the After2000 subset is grouped by Organization_ID into subsets 1…n; each subset i is calibrated to (A_i’, B_i’), which is compared with the default constants (A, B) from the CII 2000 subset to compute local_bias_i; together these form the CII 2010 dataset.]
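A minimal sketch of the per-group calibration and local-bias steps, assuming effort values have already been adjusted for cost drivers as in standard COCOMO II calibration; the function names and sample numbers are illustrative, not from the paper:

```python
import numpy as np

A_DEFAULT, B_DEFAULT = 2.94, 0.91   # COCOMO II default constants (slide values)

def calibrate_local(size_ksloc, effort_pm):
    """Fit ln(effort) = ln(A') + B' * ln(size) on one organization's projects."""
    x = np.log(np.asarray(size_ksloc, dtype=float))
    y = np.log(np.asarray(effort_pm, dtype=float))
    b_local, ln_a_local = np.polyfit(x, y, 1)      # slope, intercept
    return np.exp(ln_a_local), b_local

def local_bias(a_local, b_local, size=100.0):
    """Deviation between the local and default models at the 100 KSLOC normalization size."""
    return abs(np.log(a_local * size ** b_local) -
               np.log(A_DEFAULT * size ** B_DEFAULT))

# Illustrative use for one group (made-up data):
a1, b1 = calibrate_local([12, 45, 80, 150], [50, 210, 400, 900])
print(a1, b1, local_bias(a1, b1))
```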
  • Measuring local bias - Results
    • Parameters of local models
    • Local bias of each group
    • Different local A and B values in each group indicate that local bias is introduced when adopting local calibration;
    • Local bias varies across groups, ranging from 0.06 to 2.25;
      • E.g., in group 9, the ratio between the local model’s estimates and the CII model’s estimates is almost EXP(2.25) ≈ 9.49 for a nominal project size of 100 KSLOC.
  • Outline
    • Background
    • Research questions
    • Measuring local bias
    • Measuring the impacts of local bias
    • Handling Local Bias
    • Conclusions and future work
  • Measuring the impacts of local bias
    • Performance assessment
      • Basic performance indicators: MMRE (mean MRE) and stdMRE (the standard deviation of MRE)
      • Assessment procedure:
      • Average MMRE, Range of MMRE, Average stdMRE, and Range of stdMRE are used to assess the performance of an estimation model.
    [Figure: assessment procedure: split the data set into a training set and a test set; tune model parameters on the training set; evaluate model performance on the test set (MMRE, stdMRE); repeat these steps 2000 times to obtain 2000 (MMRE, stdMRE) pairs, from which the Average MMRE, Range of MMRE, Average stdMRE, and Range of stdMRE are computed.]
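A sketch of this assessment loop, assuming a simple random split each round (the slide does not specify the split proportion) and the same log-space calibration as before; names and defaults are illustrative:

```python
import numpy as np

def assess(sizes, efforts, n_rounds=2000, test_fraction=0.3, seed=0):
    """Repeat split/tune/evaluate n_rounds times and summarize MMRE and stdMRE."""
    rng = np.random.default_rng(seed)
    sizes, efforts = np.asarray(sizes, float), np.asarray(efforts, float)
    mmres, stdmres = [], []
    for _ in range(n_rounds):
        idx = rng.permutation(len(sizes))
        n_test = max(1, int(test_fraction * len(sizes)))
        test, train = idx[:n_test], idx[n_test:]
        # tune model parameters on the training set (log-space linear fit)
        b, ln_a = np.polyfit(np.log(sizes[train]), np.log(efforts[train]), 1)
        pred = np.exp(ln_a) * sizes[test] ** b
        mre = np.abs(efforts[test] - pred) / efforts[test]
        mmres.append(mre.mean())      # MMRE of this round
        stdmres.append(mre.std())     # stdMRE of this round
    mmres, stdmres = np.array(mmres), np.array(stdmres)
    return {"avg_MMRE": mmres.mean(),     "range_MMRE": mmres.max() - mmres.min(),
            "avg_stdMRE": stdmres.mean(), "range_stdMRE": stdmres.max() - stdmres.min()}
```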
  • Analysis procedure
    • First, for each group ss_i in the After2000 subset:
      • combine ss_i with the CII 2000 data set to produce a new data set ds_i;
      • assess model performance on ds_i and record the values of the performance indicators;
    • Then conduct a correlation analysis between local bias and model performance (see the sketch below)
    [Figure: each combined data set (CII 2000 + SS_i) yields a model performance measurement and a local bias value; these pairs feed the correlation analysis.]
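A sketch of the correlation step using the Spearman rank correlation from scipy; the variable names are assumptions, with `num` being each group's number of data points as discussed on later slides:

```python
from scipy.stats import spearmanr

def correlate(local_bias, num, indicator):
    """Spearman correlation of a performance indicator (e.g. range of stdMRE)
    with local_bias and with local_bias * num; each entry is (rho, p-value)."""
    bias_times_num = [b * n for b, n in zip(local_bias, num)]
    return {"local_bias": spearmanr(local_bias, indicator),
            "local_bias*num": spearmanr(bias_times_num, indicator)}
```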
  • Results
    • Model performance
    • Model performance decreases as new subsets are introduced
    This reflects the uncertainty inherent in model performance when adding even a small group of new data points to the CII 2000 baseline dataset.

                 CII 2000    CII 2010
      MMRE       0.3478      0.4063
      stdMRE     0.3261      0.3401
  • Measuring the impacts of local bias(cont.)
    • Spearman correlation coefficients between local bias and model performance:
      • At the significance level of p < 0.05, the range of stdMRE is significantly positively correlated with local bias and with local_bias*num. Both the average stdMRE and the average MMRE are significantly positively correlated with local_bias*num.
      • The range of stdMRE reflects the uncertainty of model performance. Hence, the bigger the local bias, the weaker the model performance.
  • Discussions
    • Two types of measures
      • Local bias:
        • Useful to bridge the potential gaps between the “model building” stage and the “model localization” stage
      • Performance measures:
        • The range and average of MMRE and stdMRE are easy to produce and reflect a certain profile of the bias’s influence
    • Two components that drive the decreased model performance
      • the degree of local bias and the number of data points associated with each additional group
  • Implications to Parametric Model Calibration
    • Previous approaches
      • Data pre-processing
        • Reducing the number of factors, removing outliers, etc.
      • Regression-based approaches
        • Variants of standard linear regression incorporating a priori knowledge
      • Machine learning approaches
        • Mainly focus on optimizing model accuracy
    • Need to pay attention to balancing accuracy and stability
  • Threats to Validity
    • Other sources of bias?
      • chronological bias, the influence of new technologies, etc.
    • Other performance indicators?
      • PRED, MRE, etc.
    • Other parametric models?
  • Ongoing work on handling local bias
    • Assumption:
      • A local historical data set with higher local bias exhibits a more divergent pattern for cost estimation, and it should be assigned a lower weight when used for model calibration.
    • Constraints on the weight distribution function Weight = F(LocalBias):
      • IF LocalBias = 0, THEN Weight = 1;
      • IF LocalBias -> +∞, THEN Weight -> 0;
      • F should be a decreasing function on the interval [0, +∞).
    • Three candidate functions (illustrative examples satisfying these constraints are given below)
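The slide does not show the three functions themselves; purely as an illustration, decreasing functions that satisfy the stated constraints include:

```latex
F_1(x) = e^{-x}, \qquad
F_2(x) = \frac{1}{1 + x}, \qquad
F_3(x) = \frac{1}{1 + x^{2}}, \qquad x = \mathrm{LocalBias}
```

Each gives F(0) = 1, decreases monotonically on [0, +∞), and tends to 0 as LocalBias grows; they differ only in how quickly high-bias data sets are down-weighted.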
  • Conclusions
    • Providing a definition for consistently understanding and measuring local bias;
    • The impact assessment and correlation analysis verify that local bias can be harmful to general model performance;
    • Offering insights to ease parametric model evolution by identifying and avoiding local bias early on in the data collection stage;
    • A better approach for handling local bias is needed.
      • E.g., employ a machine learning approach to learn the local bias and to improve the model structure so as to counteract the bias
  • Thank you! Contact: Ye Yang (yangye@nfs.iscas.ac.cn)