PROMISE 2011: "Handling missing data in software effort prediction with naive Bayes and EM"
 

Wen Zhang, Ye Yang and Qing Wang.


    • Handling missing data in software effort prediction with naive Bayes and EM algorithm. Wen Zhang, Ye Yang, Qing Wang. Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing 100190, P.R. China. {zhangwen,ye,wq}@itechs.iscas.ac.cn. 7th International Conference on Predictive Models in Software Engineering (PROMISE), 2011.
    • Outline
      1 Introduction
      2 Naive Bayes and EM for software effort prediction
      3 Missing data handling strategies: missing data toleration strategy; missing data imputation strategy
      4 Experiments: the datasets; experiment setup; experimental results
      5 Threats
      6 Conclusion and future work
    • Effort prediction with missing data. The knowledge on software project effort stored in historical datasets can be used to develop predictive models, for example with statistical methods such as linear regression and correlation analysis, to predict the effort of new incoming projects. However, most historical effort datasets contain a large amount of missing data.
    • Effort prediction with missing data. Because most historical databases are small, the common practice of ignoring projects with missing data leads to biased and inaccurate prediction models. For these reasons, how to handle missing data in software effort datasets is becoming an important problem.
    • Sample data. The historical effort data of projects are organized as shown in the following table.
      Table: The sample data in historical project dataset.
      D     X1    ...   Xj    ...   Xn    H
      D1    x11   ...   x1j   ...   x1n   h1
      ...   ...   ...   ...   ...   ...   ...
      Di    xi1   ...   xij   ...   xin   hi
      ...   ...   ...   ...   ...   ...   ...
      Dm    xm1   ...   xmj   ...   xmn   hm
      $X_j$ ($1 \le j \le n$) denotes an attribute of project $D_i$ ($1 \le i \le m$). $h_i$ is the effort class label of $D_i$ and is derived from the real effort of project $D_i$.
    • Sample data. There are $l$ effort classes for all the projects in a dataset, that is, $h_i$ is equal to one of the elements of $\{c_1, \ldots, c_l\}$. The attributes $X_j$ are assumed to be independent of each other and to take Boolean values, so that without missing data $x_{ij} \in \{0, 1\}$.
    • Formulation of the problem. An effort dataset $Y_{com}$ containing $m$ historical projects is written as $Y_{com} = (D_1, \ldots, D_i, \ldots, D_m)^T$, where $D_i$ ($1 \le i \le m$) is a historical project and $D_i = (x_{i1}, \ldots, x_{ij}, \ldots, x_{in})^T$ is represented by $n$ attributes $X_j$ ($1 \le j \le n$). $h_i$ denotes the effort class label of project $D_i$. Each $x_{ij}$, the value of attribute $X_j$ ($1 \le j \le n$) on $D_i$, is either observed or missing. Cross-validation on effort prediction is used to evaluate the performance of the missing data handling techniques.
    • Motivation. The EM (Expectation Maximization) algorithm is a method for finding maximum likelihood or maximum a posteriori estimates of parameters in statistical models. The motivation for applying EM to naive Bayes is to add the unlabeled projects, together with their estimated effort class labels, to the labeled data set. Classification performance should then improve because more data is used to train the prediction model.
    • Labeled projects and unlabeled projects. For a labeled project $D_i^L$, its effort class is determinate: $P(h_i = c_t \mid D_i^L) \in \{0, 1\}$. For an unlabeled project $D_i^U$, $P(h_i = c_t \mid D_i^U)$ is unknown. However, if we can assign a predicted effort class to $D_i^U$, then $D_i^U$ can also be used to update the estimates $P(X_j = 0 \mid c_t)$, $P(X_j = 1 \mid c_t)$ and $P(c_t)$, and further to refine the effort prediction model $P(c_t \mid D_i)$. This process is described in Equations 1, 2, 3 and 4.
    • Estimating $P^{(\tau+1)}(X_j = 1 \mid c_t)$. The likelihood of occurrence of $X_j$ with respect to $c_t$ at iteration $\tau+1$ is updated by Equation 1 using the estimates at iteration $\tau$:
      $$P^{(\tau+1)}(X_j = 1 \mid c_t) = \frac{1 + \sum_{i=1}^{m} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)}{n + \sum_{j=1}^{n} \sum_{i=1}^{m} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)}. \tag{1}$$
      In practice, we interpret $P^{(\tau+1)}(X_j = 1 \mid c_t)$ as the probability of attribute $X_j$ appearing in a project whose effort class is $c_t$.
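As an illustration of Equation 1, the following is a minimal Python sketch, not the authors' implementation. The function name `update_likelihood`, the list-of-lists layout and the toy numbers are assumptions made for this example; `post[i][t]` stands for the current posterior of project i belonging to class t.

```python
from typing import List


def update_likelihood(x: List[List[int]], post: List[List[float]]) -> List[List[float]]:
    """Return p1[t][j], the Equation 1 estimate of P(X_j = 1 | c_t).

    x[i][j]    -- Boolean value (0 or 1) of attribute j on project i
    post[i][t] -- current posterior P(h_i = c_t | D_i)
    """
    m, n, l = len(x), len(x[0]), len(post[0])
    p1 = []
    for t in range(l):
        # Posterior-weighted count of X_j = 1 in class c_t, one entry per attribute.
        weighted = [sum(x[i][j] * post[i][t] for i in range(m)) for j in range(n)]
        # Denominator of Equation 1: n plus the weighted counts summed over all attributes.
        denom = n + sum(weighted)
        p1.append([(1.0 + w) / denom for w in weighted])
    return p1


# Toy data (made up): three projects, two attributes, two effort classes.
x = [[1, 0], [1, 1], [0, 1]]
post = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
p1 = update_likelihood(x, post)
```

For this toy input the weighted counts for the first class are 1.5 and 0.5, so the estimates are (1 + 1.5) / (2 + 2) and (1 + 0.5) / (2 + 2).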
    • Estimating $P^{(\tau+1)}(X_j = 0 \mid c_t)$. Accordingly, the likelihood of non-occurrence of $X_j$ with respect to $c_t$ at iteration $\tau+1$ is estimated by Equation 2:
      $$P^{(\tau+1)}(X_j = 0 \mid c_t) = 1 - P^{(\tau+1)}(X_j = 1 \mid c_t). \tag{2}$$
    • Estimating $P^{(\tau+1)}(c_t)$. Second, the effort class prior probability $P^{(\tau+1)}(c_t)$ is updated in the same manner by Equation 3 using the estimates at iteration $\tau$. In practice, we may regard $P^{(\tau+1)}(c_t)$ as the prior probability of class label $c_t$ appearing in all the software projects.
      $$P^{(\tau+1)}(c_t) = \frac{1 + \sum_{i=1}^{m} P^{(\tau)}(h_i = c_t \mid D_i)}{l + m}. \tag{3}$$
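The prior update of Equation 3 can be sketched in a few lines of Python. This is an illustrative sketch only; the name `update_prior` and the toy posteriors are assumptions, not part of the slides.

```python
from typing import List


def update_prior(post: List[List[float]]) -> List[float]:
    """Return the Equation 3 estimate of P(c_t) for every class t.

    post[i][t] is the current posterior P(h_i = c_t | D_i).
    """
    m, l = len(post), len(post[0])
    return [(1.0 + sum(post[i][t] for i in range(m))) / (l + m) for t in range(l)]


# Toy posteriors (made up): three projects, two effort classes.
post = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
prior = update_prior(post)
```

Because each row of `post` sums to 1, the numerators summed over all classes equal l + m, so the smoothed priors sum to 1.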
    • Estimating $P^{(\tau+1)}(h_{i'} = c_t \mid D_{i'})$. Third, the posterior probability of an unlabeled project $D_{i'}$ belonging to an effort class $c_t$ at iteration $\tau+1$ is updated using Equation 4:
      $$P^{(\tau+1)}(h_{i'} = c_t \mid D_{i'}) = \frac{P^{(\tau)}(c_t)\, P^{(\tau)}(D_{i'} \mid c_t)}{P^{(\tau)}(D_{i'})} = \frac{P^{(\tau)}(c_t) \prod_{j=1}^{n} P^{(\tau)}(x_{i'j} \mid c_t)}{\sum_{t=1}^{l} P^{(\tau)}(c_t) \prod_{j=1}^{n} P^{(\tau)}(x_{i'j} \mid c_t)}. \tag{4}$$
    • Estimating $P^{(\tau+1)}(h_{i'} = c_t \mid D_{i'})$ (continued). For labeled projects, if $x_{ij} = 1$ then $P^{(\tau)}(x_{ij} \mid c_t) = P^{(\tau)}(X_j = 1 \mid c_t)$; otherwise $x_{ij} = 0$ and $P^{(\tau)}(x_{ij} \mid c_t) = P^{(\tau)}(X_j = 0 \mid c_t)$. Likewise for unlabeled projects, if $x_{i'j} = 1$ then $P^{(\tau)}(x_{i'j} \mid c_t) = P^{(\tau)}(X_j = 1 \mid c_t)$; otherwise $x_{i'j} = 0$ and $P^{(\tau)}(x_{i'j} \mid c_t) = P^{(\tau)}(X_j = 0 \mid c_t)$. Here, $P^{(0)}(X_j = 1 \mid c_t)$ and $P^{(0)}(c_t)$ are initially estimated from the labeled projects only, at the first step of the iteration. The unlabeled projects are added to the learning process after they have been assigned probabilistic effort classes by $P^{(1)}(h_{i'} = c_t \mid D_{i'})$.
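The posterior of Equation 4, with the attribute likelihood chosen by the observed Boolean value as described above, can be sketched as follows. This is a hedged illustration; the function name `posterior` and the toy parameter values are assumptions.

```python
from typing import List


def posterior(xi: List[int], prior: List[float], p1: List[List[float]]) -> List[float]:
    """Return P(h = c_t | D) for one project, following Equation 4.

    xi       -- Boolean attribute values of the project
    prior[t] -- current estimate of P(c_t)
    p1[t][j] -- current estimate of P(X_j = 1 | c_t)
    """
    joint = []
    for t in range(len(prior)):
        score = prior[t]
        for j, v in enumerate(xi):
            # Use P(X_j = 1 | c_t) when the attribute is 1, else its complement (Equation 2).
            score *= p1[t][j] if v == 1 else 1.0 - p1[t][j]
        joint.append(score)
    total = sum(joint)  # P(D): the numerator summed over all classes
    return [s / total for s in joint]


# Toy parameters (made up): two classes, two attributes.
prior = [0.5, 0.5]
p1 = [[0.625, 0.375], [0.375, 0.625]]
post = posterior([1, 0], prior, p1)
```

For datasets with many attributes, a practical implementation would probably accumulate log-probabilities instead of a raw product to avoid underflow; the slides do not discuss this.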
    • Predicting the effort class of unlabeled projects. We iterate Equations 1, 2, 3 and 4 until their estimates converge to stable values. Then $P^{(\tau+1)}(h_{i'} = c_t \mid D_{i'})$ is used to predict the effort class of $D_{i'}$: the class $c_t \in \{c_1, \ldots, c_l\}$ that maximizes $P^{(\tau+1)}(h_{i'} = c_t \mid D_{i'})$ is regarded as the effort class of $D_{i'}$.
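Putting Equations 1 to 4 together, the loop described above might look like the following Python sketch for complete (no missing values) Boolean data. It is an illustration under stated assumptions, not the authors' code: the function names, the convergence test on the largest posterior change, the `tol` and `max_iter` defaults, and the toy data are all choices made for this example.

```python
from typing import List, Tuple


def m_step(x: List[List[int]], post: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    """Equations 1 and 3: re-estimate P(X_j = 1 | c_t) and P(c_t) from posteriors."""
    m, n, l = len(x), len(x[0]), len(post[0])
    p1 = []
    for t in range(l):
        w = [sum(x[i][j] * post[i][t] for i in range(m)) for j in range(n)]
        denom = n + sum(w)
        p1.append([(1.0 + wj) / denom for wj in w])
    prior = [(1.0 + sum(row[t] for row in post)) / (l + m) for t in range(l)]
    return prior, p1


def e_step(xi: List[int], prior: List[float], p1: List[List[float]]) -> List[float]:
    """Equation 4: posterior over effort classes for one project."""
    joint = []
    for t, score in enumerate(prior):
        for j, v in enumerate(xi):
            score *= p1[t][j] if v == 1 else 1.0 - p1[t][j]
        joint.append(score)
    total = sum(joint)
    return [s / total for s in joint]


def em_naive_bayes(x_lab, y_lab, x_unl, l, max_iter=100, tol=1e-6):
    """Return (predicted class indices, posteriors) for the unlabeled projects."""
    # Labeled projects keep fixed 0/1 posteriors throughout.
    lab_post = [[1.0 if y == t else 0.0 for t in range(l)] for y in y_lab]
    # Initial estimates come from the labeled projects only.
    prior, p1 = m_step(x_lab, lab_post)
    unl_post = [e_step(xi, prior, p1) for xi in x_unl]
    for _ in range(max_iter):
        prior, p1 = m_step(x_lab + x_unl, lab_post + unl_post)
        new_post = [e_step(xi, prior, p1) for xi in x_unl]
        # Assumed stopping rule: largest change in any posterior entry.
        change = max(
            (abs(a - b) for r, s in zip(new_post, unl_post) for a, b in zip(r, s)),
            default=0.0,
        )
        unl_post = new_post
        if change < tol:
            break
    labels = [max(range(l), key=lambda t: row[t]) for row in unl_post]
    return labels, unl_post


# Toy data (made up): four labeled and two unlabeled projects, three attributes.
x_lab = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y_lab = [0, 0, 1, 1]
x_unl = [[1, 1, 0], [0, 0, 1]]
labels, posts = em_naive_bayes(x_lab, y_lab, x_unl, l=2)
```

Classes are indexed from 0 here for convenience, whereas the slides number them from 1.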
    • Initial setting. When we use Equation 1 to estimate the likelihood of $X_j$ with respect to $c_t$, $P(X_j = 1 \mid c_t)$ or $P(X_j = 0 \mid c_t)$, we do not consider missing values among the $x_{ij}$ ($1 \le i \le m$). For each $X_j$, we can divide the whole historical dataset $D$ into two subsets, $D = \{D_{obs,j} \mid D_{mis,j}\}$, where $D_{obs,j}$ is the set of projects whose values on attribute $X_j$ are observed and $D_{mis,j}$ is the set of projects whose values on attribute $X_j$ are unobserved. We may also divide the attributes of a project $D_i$ into two subsets, $D_i = \{X_{obs,i} \mid X_{mis,i}\}$, where $X_{obs,i}$ is the set of attributes whose values are observed in project $D_i$ and $X_{mis,i}$ is the set of attributes whose values are unobserved in project $D_i$.
    • Missing data toleration strategy. This strategy is very similar to the method adopted by C4.5 to handle missing data: we ignore missing values when training the prediction model. To estimate $P^{(\tau+1)}(X_j = 1 \mid c_t)$ under this strategy, we rewrite Equation 1 as Equation 5:
      $$P^{(\tau+1)}(X_j = 1 \mid c_t) = \frac{1 + \sum_{i=1}^{|D_{obs,j}|} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)}{n + \sum_{j=1}^{n} \sum_{i=1}^{|D_{obs,j}|} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)}. \tag{5}$$
    • Missing data toleration strategy. The difference between Equations 1 and 5 is that only the projects observed on attribute $X_j$, i.e., $D_{obs,j}$, are used to estimate $P^{(\tau+1)}(X_j = 1 \mid c_t)$. Equation 2 can still be used to estimate $P^{(\tau+1)}(X_j = 0 \mid c_t)$, and Equation 3 can still be used to estimate $P^{(\tau+1)}(c_t)$.
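A minimal Python sketch of the toleration update in Equation 5 is given below. It assumes, purely for illustration, that a missing value is encoded as `None`; the function name and toy data are likewise assumptions.

```python
from typing import List, Optional


def update_likelihood_tolerant(
    x: List[List[Optional[int]]], post: List[List[float]]
) -> List[List[float]]:
    """Return p1[t][j] from Equation 5, summing only over projects observed on attribute j.

    x[i][j] is 0, 1, or None (assumed encoding of a missing value).
    """
    m, n, l = len(x), len(x[0]), len(post[0])
    p1 = []
    for t in range(l):
        weighted = [
            sum(x[i][j] * post[i][t] for i in range(m) if x[i][j] is not None)
            for j in range(n)
        ]
        denom = n + sum(weighted)
        p1.append([(1.0 + w) / denom for w in weighted])
    return p1


# Toy data (made up): two values are missing.
x = [[1, None], [1, 1], [None, 1]]
post = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
p_tol = update_likelihood_tolerant(x, post)

# Same data with the missing entries replaced by 0, for comparison.
x_zero = [[0 if v is None else v for v in row] for row in x]
p_zero = update_likelihood_tolerant(x_zero, post)
```

In this sketch `p_tol` and `p_zero` coincide, because only values equal to 1 contribute to the sums in Equation 5 as written. Under that reading, the practical effect of toleration shows up when the posterior is computed over observed attributes only.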
    • Missing data toleration strategy. Accordingly, the prediction model should be adapted from Equation 4 to Equation 6, in which the product runs only over the attributes observed in $D_{i'}$:
      $$P^{(\tau+1)}(h_{i'} = c_t \mid D_{i'}) = \frac{P^{(\tau)}(c_t)\, P^{(\tau)}(D_{i'} \mid c_t)}{P^{(\tau)}(D_{i'})} = \frac{P^{(\tau)}(c_t) \prod_{j=1}^{|X_{obs,i'}|} P^{(\tau)}(x_{i'j} \mid c_t)}{\sum_{t=1}^{l} P^{(\tau)}(c_t) \prod_{j=1}^{|X_{obs,i'}|} P^{(\tau)}(x_{i'j} \mid c_t)}. \tag{6}$$
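Equation 6 can be illustrated with a small Python sketch that skips unobserved attributes when forming the product. The `None` encoding for missing values, the function name and the toy numbers are assumptions for this example.

```python
from typing import List, Optional


def posterior_tolerant(
    xi: List[Optional[int]], prior: List[float], p1: List[List[float]]
) -> List[float]:
    """Return P(h = c_t | D) using only the observed attributes of the project (Equation 6)."""
    joint = []
    for t, score in enumerate(prior):
        for j, v in enumerate(xi):
            if v is None:
                continue  # unobserved attribute: contributes no factor
            score *= p1[t][j] if v == 1 else 1.0 - p1[t][j]
        joint.append(score)
    total = sum(joint)
    return [s / total for s in joint]


# Toy parameters (made up); the second attribute of the project is missing.
prior = [0.5, 0.5]
p1 = [[0.625, 0.375], [0.375, 0.625]]
post = posterior_tolerant([1, None], prior, p1)
```

With equal priors and only the first attribute observed, the posterior reduces to the normalized values of `p1[t][0]`.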
    • Missing data imputation strategy. The basic idea of this strategy is that unobserved values of attributes can be imputed using the observed values. Then, both observed values and imputed values are used to construct the prediction model.
    • Missing data imputation strategy. This strategy is embedded in naive Bayes and EM, and we may rewrite Equation 1 as Equation 7 to estimate $P^{(\tau+1)}(X_j = 1 \mid c_t)$:
      $$P^{(\tau+1)}(X_j = 1 \mid c_t) = \frac{1 + \sum_{i=1}^{|D_{obs,j}|} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i) + \sum_{s=1}^{|D_{mis,j}|} \tilde{x}_{sj}\, P^{(\tau)}(h_s = c_t \mid D_s)}{n + \sum_{j=1}^{n} \left\{ \sum_{i=1}^{|D_{obs,j}|} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i) + \sum_{s=1}^{|D_{mis,j}|} \tilde{x}_{sj}\, P^{(\tau)}(h_s = c_t \mid D_s) \right\}}. \tag{7}$$
    • Missing data imputation strategy. The missing value $x_{sj}$, which is the value of attribute $X_j$ on project $D_s$, is imputed by $\tilde{x}_{sj}$ using Equation 8:
      $$\tilde{x}_{sj} = \frac{\sum_{i=1}^{|D_{obs,j}|} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)}{\sum_{i=1}^{|D_{obs,j}|} P^{(\tau)}(h_i = c_t \mid D_i)}. \tag{8}$$
      Given $c_t$, $\tilde{x}_{sj}$ is a constant independent of $D_s$. We stipulate that $\tilde{x}_{sj}$ is rounded to 1 if $\tilde{x}_{sj} \ge 0.5$ and to 0 otherwise. Here, we also use Equation 3 to estimate $P^{(\tau+1)}(c_t)$.
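The imputation rule of Equation 8, including the 0.5 rounding threshold, can be sketched in Python as follows. The `None` encoding of missing values, the function name `impute_value`, the fallback to 0 when no observed project carries weight for the class, and the toy data are assumptions for illustration.

```python
from typing import List, Optional


def impute_value(
    x: List[List[Optional[int]]], post: List[List[float]], j: int, t: int
) -> int:
    """Return the rounded Equation 8 estimate of a missing value of attribute j, given class t.

    The estimate is the posterior-weighted mean of attribute j over the
    projects on which it is observed, rounded to 1 if it is >= 0.5, else 0.
    """
    observed = [i for i in range(len(x)) if x[i][j] is not None]
    num = sum(x[i][j] * post[i][t] for i in observed)
    den = sum(post[i][t] for i in observed)
    if den == 0.0:
        return 0  # assumption: no observed evidence for this class, fall back to 0
    return 1 if num / den >= 0.5 else 0


# Toy data (made up): four projects, two attributes, two classes.
x = [[1, None], [1, 1], [None, 1], [0, 0]]
post = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 0.0]]
v00 = impute_value(x, post, j=0, t=0)  # weighted mean 1.5 / 2.5 = 0.6
v10 = impute_value(x, post, j=1, t=0)  # weighted mean 0.5 / 1.5
v11 = impute_value(x, post, j=1, t=1)  # weighted mean 1.5 / 1.5 = 1.0
```

Because the imputed value depends on the class, a project with a missing attribute may receive a different fill-in value under each class when the Equation 7 sums are evaluated.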
    • Missing data imputation strategy. The prediction model $P^{(\tau+1)}(c_t \mid D_i)$ can be constructed as in Equation 9, taking the missing values into account:
      $$P^{(\tau+1)}(h_{i'} = c_t \mid D_{i'}) = \frac{P^{(\tau)}(c_t)\, P^{(\tau)}(D_{i'} \mid c_t)}{P^{(\tau)}(D_{i'})} = \frac{P^{(\tau)}(c_t) \prod_{j=1}^{n} P^{(\tau)}(x_{i'j} \mid c_t)}{\sum_{t=1}^{l} P^{(\tau)}(c_t) \prod_{j=1}^{n} P^{(\tau)}(x_{i'j} \mid c_t)}. \tag{9}$$
      Note that if $x_{i'j}$ is unobserved, its value is substituted with $\tilde{x}_{i'j}$ given by Equation 8.
    • The ISBSG dataset. The ISBSG data set (http://www.isbsg.org) has 70 attributes, and many attribute values are missing. We extract 188 projects with 16 attributes, using the criterion that each project has observed values for at least 2/3 of its attributes and each attribute is observed in at least 2/3 of the projects. 13 attributes are nominal and 3 are continuous.
    • The ISBSG dataset. We use Equation 10 to normalize the efforts of projects into $l$ (= 3) classes:
      $$c_t = \left\lfloor \frac{l \times (\mathit{effort}_{D_i} - \mathit{effort}_{min})}{\mathit{effort}_{max} - \mathit{effort}_{min}} \right\rfloor + 1. \tag{10}$$
      Table: The effort classes in ISBSG data set.
      Class No.   # of projects   Label
      1           85              Low
      2           76              Medium
      3           27              High
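Equation 10 is an equal-width binning of effort into l classes, which the following Python sketch illustrates. The function name and the effort figures are made up. As written, Equation 10 maps the project with the maximum effort to class l + 1; the sketch clamps that case to l, which is an assumption since the slides do not say how it is handled.

```python
import math


def effort_class(effort: float, effort_min: float, effort_max: float, l: int = 3) -> int:
    """Map a raw effort value to a class number in 1..l using Equation 10."""
    c = math.floor(l * (effort - effort_min) / (effort_max - effort_min)) + 1
    # Assumption: effort == effort_max would give l + 1, so clamp it to the top class.
    return min(c, l)


# Hypothetical efforts, e.g. in person-hours (values made up).
efforts = [100, 399, 400, 700, 1000]
classes = [effort_class(e, min(efforts), max(efforts)) for e in efforts]
```

With a minimum of 100 and a maximum of 1000, the class boundaries fall at 400 and 700.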
    • The CSBSG dataset. The CSBSG dataset contains 1103 projects collected from 140 organizations and 15 regions across China by the Chinese association of software industry. We extract 94 projects and 21 attributes (15 nominal and 6 continuous) with the same selection criterion as for the ISBSG data set. We use Equation 10 to normalize the efforts of projects into $l$ (= 3) classes.
      Table: The effort classes in CSBSG data set.
      Class No.   # of projects   Label
      1           27              Low
      2           31              Medium
      3           36              High
    • Experiment setup. To evaluate the proposed method comparatively, we adopt MI and MINI to impute the missing values of the selected ISBSG and CSBSG datasets. BPNN is used to classify the projects in the data sets after imputation. Our experiments are conducted with the 10-fold cross-validation technique.
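For readers unfamiliar with k-fold cross-validation, a generic Python sketch follows. It is not the authors' experimental harness: the helper names, the fixed shuffle seed, the `fit_predict` callback signature, and the choice to pool correct predictions over all folds (rather than average per-fold accuracies) are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(m: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..m-1 and deal them into k disjoint folds."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_val_accuracy(
    x: Sequence, y: Sequence, fit_predict: Callable, k: int = 10, seed: int = 0
) -> float:
    """Return the fraction of projects classified correctly over all k held-out folds.

    fit_predict(x_train, y_train, x_test) must return one predicted label per test item.
    """
    correct = 0
    for test in k_fold_indices(len(x), k, seed):
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        preds = fit_predict([x[i] for i in train], [y[i] for i in train], [x[i] for i in test])
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(x)


# Toy check (made up): the label equals the single attribute, so a rule that
# copies the attribute is always right.
x = [[i % 2] for i in range(20)]
y = [i % 2 for i in range(20)]
acc = cross_val_accuracy(x, y, lambda x_tr, y_tr, x_te: [row[0] for row in x_te])
folds = k_fold_indices(188)
```

In a semi-supervised setting such as the one in these slides, the callback could also receive the unlabeled projects; that extension is left out of this sketch.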
    • EM-T and EM-I on ISBSG dataset. The following figure illustrates the performances of the missing data toleration strategy (hereafter called EM-T) and the missing data imputation strategy (hereafter called EM-I) in handling the missing data for effort prediction on the ISBSG data set.
    • EM-T and EM-I on ISBSG dataset. [Figure: Performances of naive Bayes with EM-I and EM-T in comparison with BPNN on effort prediction using ISBSG data set. Series: EM-I, EM-T, BPNN+MI, BPNN+MINI. x-axis: # of unlabeled projects (0 to 20); y-axis: accuracy (0.6 to 0.8).]
    • EM-T and EM-I on ISBSG dataset. What we can see from the figure: Both EM-I and EM-T perform better than BPNN with either MI or MINI on classifying the projects in the ISBSG data set. The performance of naive Bayes and EM improves when unlabeled projects are added. This outcome illustrates that semi-supervised learning can improve the prediction of software effort.
    • EM-T and EM-I on ISBSG dataset. What we can see from the figure: If supervised learning is used for software effort prediction, the MINI method is favorable for imputing the missing values, but the missing data toleration strategy may not be desirable for handling missing values. The imputation strategy for missing data is more effective than the toleration strategy when naive Bayes and EM is used for predicting ISBSG software efforts.
    • EM-T and EM-I on CSBSG dataset. The following figure shows the performances of EM-T and EM-I in handling the missing data for effort prediction on the CSBSG dataset. [Figure: Performances of EM-I and EM-T in comparison with BPNN on predicting effort with different numbers of unlabeled projects using the CSBSG dataset. Series: EM-I, EM-T, BPNN+MI, BPNN+MINI. x-axis: # of unlabeled projects (0 to 8); y-axis: accuracy (0.5 to 0.8).]
    • EM-T and EM-I on CSBSG dataset. What we can see from the above figure: The better performance of EM-I over EM-T is also observed on the CSBSG data set, as on the ISBSG dataset. This further supports our conjecture that EM-I outperforms EM-T in software effort prediction. However, EM-T performs better than EM-I once the number of unlabeled projects exceeds the number at which accuracy reaches its "maxima", which differs from the ISBSG result. A possible explanation is the relatively small size of the CSBSG dataset, where the imputation strategy is more prone than the toleration strategy to introduce bias into the predictive model.
    • More experiments and hypotheses testing. More experimental results with explanations are detailed in the paper. We also conduct hypothesis testing to examine the significance of the conclusions drawn from our experiments. Interested readers may refer to the paper.
    • Threats. The threat to external validity is primarily the degree to which the attributes we used describe the projects, and how representative the ISBSG and CSBSG sample datasets are. The threats to internal validity are measurement and data effects that can bias our results, caused by using accuracy as the performance measure. The threat to construct validity is that our experiments use only a clipped subset of the attributes and projects from both the ISBSG and CSBSG datasets.
    • Conclusion. Semi-supervised learning, in the form of naive Bayes and EM, is employed to predict software effort. We propose two strategies embedded in naive Bayes and EM to handle the missing data.
    • Future work. We plan to compare the proposed techniques with other missing data imputation techniques, such as FIML and MSWR. We will develop more missing data techniques embedded in naive Bayes and EM for software effort prediction. We have already investigated the underlying mechanism of missingness (structural or unstructured) in software effort data. Building on this progress, we will tailor the missing data handling strategies to the underlying missingness mechanism of software effort data.
    • Thanks. Any further questions about the content of the slides and the paper can be sent to Mr. Wen Zhang. Email: zhangwen@itechs.iscas.ac.cn