Hybrid artificial neural network and statistical model

1 Introduction and overview

Earned value management (EVM) is a convenient and effective method of project management. It divides a project into equal basic periods (generally the basic time unit, e.g., a month or a day) and assigns the cumulative planned workload, expressed in monetary units, to each period as the planned value (PV). As the project proceeds, the portion of PV actually completed by each period is recorded as the earned value (EV). Together with the actual cost (AC) of each period and several derived indicators (e.g., CPI, SPI, CV and SV), EVM builds a framework for measuring and forecasting project performance.

However, researchers have found that EVM performs poorly when measuring project schedule. The indicator SPI is defined as EV/PV and SV as EV – PV, so they approach 1 and 0 respectively in the later part of a project even when its performance is behind the planned schedule. This is unacceptable when forecasting the total project duration, because the forecast generally approaches the planned duration whether the project performs well or badly (Vandevoorde and Vanhoucke, 2006; Lipke et al., 2008); moreover, the two indicators cannot properly reflect the project's performance status. Building on EVM, Lipke et al. developed the earned schedule (ES) method, an extension of EVM that measures the workload in time units and solves the above problem. Based on the ES method, Lipke et al. (2008) apply statistical methods and their well-established schedule performance analysis technique, IEACt, to predict the total cost and total duration of a project.

Based on IEACt, this paper proposes a new hybrid method for forecasting total project duration, which combines an artificial neural network (ANN), random number simulation and a statistical method. Experiments and test results show that our hybrid method outperforms the classic IEACt method in forecasting accuracy.
The paper is structured as follows: Section 2 gives a brief introduction to EVM; Section 3 puts forward our hybrid model and carries out the experiments; Section 4 tests the accuracy of our hybrid model against the classic IEACt; and Section 5 draws conclusions.

2 Review of EVM

An understanding of EVM is assumed in this paper. For convenience, we list the basic EVM terminology, including the ES terms that portray the project status and forecast the total duration.

Note that PV, EV, AC and ES, together with the performance indicators CPI and SPI, are cumulative by default; that is, the value for each period is calculated from the cumulative values. The periodic values are easily obtained as the difference between two adjacent cumulative values. In this paper, we denote periodic values by adding the suffix p and the period suffix t; e.g., the EV of the 5th period alone is denoted by $EV_{p,5}$. The earned schedule framework is a recent extension to EVM, designed to provide reliable and useful schedule performance information (Lipke et al., 2008; Cioffi, 2006). Its basic metric is ES, the schedule duration earned, which can be calculated as follows:
Y. Li and L. Liu

$$ES_t = i + \frac{EV_t - PV_i}{PV_{i+1} - PV_i} \qquad (1)$$

In the above definition, i is the number of whole time units already earned, i.e., the largest period for which $PV_i \le EV_t$, and the fraction is a linear interpolation that gives the portion of the next period, from $PV_i$ to $PV_{i+1}$, that has been earned. Compared with EVM metrics, ES is specially designed to meet the need for time-scale forecasting. In traditional EVM, SPI approaches 1 as a project nears completion regardless of how EV has performed, so SPI cannot properly reflect project performance. ES, by contrast, is calculated from the proportion of planned workload actually finished in each period, so it properly expresses the performance status of a project at any stage.

Table 1 Basic EVM and ES terminology

| | EVM | ES |
|---|---|---|
| Status | Earned value (EV) | Earned schedule (ES) |
| | Actual cost (AC) | Actual time (AT) |
| | Schedule variance (SV): SV = EV – PV | Schedule variance (time) (SV(t)): SV(t) = ES – AT |
| | Schedule performance index (SPI): SPI = EV / PV | Schedule performance index (time) (SPI(t)): SPI(t) = ES / AT |
| | Cost performance index (CPI) | |
| | Cost variance (CV) | |
| Forecasting | Independent estimate at complete (IEAC): IEAC = BAC / CPI | Independent estimate at complete time (IEAC(t)): IEAC(t) = PD / SPI(t); IEAC(t) = AT + (PD – ES) / PF |

3 Methodology

According to ES theory, the total project duration can be estimated by the independent estimate at completion time, IEAC(t), which has a short form and a long form, respectively:

$$IEAC_t = \frac{PD}{SPI(t)} \qquad (2)$$

$$IEAC_t = AT + \frac{PD - ES}{PF_t} \qquad (3)$$

where AT is the actual time, i.e., the current time, PD is the planned duration, and $PF_t$ is the performance factor, which is generally SPI(t) for duration forecasting. IEACt provides a convenient forecasting method for total project duration. However, it rests on an underlying assumption: the performance of the unfinished part of the project equals the cumulative SPI of the finished part, i.e., the current SPI(t). Based on SPI(t), Lipke et al. (2008) propose a statistical calculation method (Lipke et al., 2008; Lipke, 2002; National Institute of Standards and Technology E-handbook of Statistical Methods, 2006), which we summarise as follows:
$$CL = \ln index(cum) \pm Z \cdot \frac{\sigma}{\sqrt{n}} \cdot AF \qquad (4)$$

$$\sigma = \sqrt{\frac{\sum_{i} \left(\ln index_p(i) - \ln index_c\right)^2}{n - 1}} \qquad (5)$$

$$AF = \sqrt{\frac{PD - ES}{PD - ES/n}} \qquad (6)$$

$$IEAC_t = \frac{PD}{\exp(CL)} \qquad (7)$$

where CL is the confidence limit, $index_p$ refers to periodic index values and $index_c$ (written index(cum) in formula (4)) refers to cumulative index values. The index in our study is SPI. Z is the t-distribution value for the chosen level of confidence (95% in this paper); we use the t distribution instead of the normal distribution because our sample contains fewer than 30 observations. σ is the standard deviation of the logarithms of the periodic SPI values about the logarithm of the cumulative SPI, as in formula (5), n is the number of observations, and AF is the adjustment factor for a finite population, which is derived from the statistical formula $\sqrt{(N - n)/(N - 1)}$.

In this paper, we adopt this statistical method with one small change: we employ the long form instead of the short form. We also extend the IEACt assumption as follows (the current time is t):

Assumption 1: The schedule performance index of the unfinished part of the project is not exactly equal to the current cumulative $SPI_{c,t}$, as the classic IEACt assumes, but equal to a forecast value $\widehat{SPI}_{c,t+1}$, which reflects the performance of the unfinished part of the project more properly than $SPI_{c,t}$.

Assumption 2: The periodic $EV_{p,t+1}$, $AC_{p,t+1}$ and $ES_{p,t+1}$ each follow a normal distribution with parameters μ and σ², where μ is the mean of the nearest T periodic values $EV_{p,i}$, $AC_{p,i}$, $ES_{p,i}$ ($i = t - (T - 1), \ldots, t$) respectively, and σ² is their corresponding variance.

The main idea of Assumption 2 is that the performance of the unfinished part of the project is better expressed by the nearest T values $EV_{p,i}$, $AC_{p,i}$, $ES_{p,i}$ ($i = t - (T - 1), \ldots, t$) than by the whole finished part from 1 to t; i.e., the latest few periodic EV, AC and ES values are more likely to explain future performance than all the periodic values finished up to time t. Besides, we believe that the periodic EV, AC and ES of the unfinished part (especially those of the next period) are more likely to fall in the regions formed by normal random numbers with parameters μ and σ² than to be simply equal to the means of all t values of EV, AC and ES, as the traditional IEACt assumes.

For this reason, we divide the whole forecasting process into two stages. In the first stage, we employ an ANN with the back-propagation algorithm to forecast the next periodic earned schedule $\widehat{ES}_{p,t+1}$, just one step ahead, and take $\widehat{SPI}_{c,t+1} = (\widehat{ES}_{p,t+1} + ES_{c,t}) / (t + 1)$ as the future performance of the project. In Stage 2, we put the forecast value into the statistical framework to perform interval estimation of the total duration; meanwhile, we substitute $\widehat{SPI}_{c,t+1}$ for $PF_t$ in formula (3) to forecast the total project duration. The methodology of our hybrid model is summarised as follows:
1 Suppose the current time is t (t ≥ 3), and set T = 3.
2 Based on the available finished periodic values $EV_{p,i}$, $AC_{p,i}$, $ES_{p,i}$ ($i = t - (T - 1), \ldots, t$), generate three groups of normally distributed random numbers $EV_{p,R}$, $AC_{p,R}$, $ES_{p,R}$, each with its own parameters μ and σ².
3 Take $EV_{p,R}$ and $AC_{p,R}$ as the input nodes and $ES_{p,R}$ as the output node, and use the ANN to forecast $\widehat{ES}_{p,t+1}$, from which $\widehat{SPI}_{c,t+1}$ is obtained as the performance index of the unfinished part of the project.
4 Employ formulas (3) to (6) to make the interval estimation and point estimation of the total project duration at period t.
5 Set t = t + 1 and repeat from Step 2 until the project is actually finished.

3.1 ANN forecasting procedure

Although traditional models are good at accurately describing long-term trends (Sallehuddin et al., 2009; Zou et al., 2007; Kayacan et al., 2010; Yao et al., 2003), they require a large number of observations to construct the model. By contrast, forecasting within a project has far less data, in terms of both variables and observations. At the very beginning of the project, we have only two to three months of data and fewer than six indicators; furthermore, the related literature suggests that detailed project analysis is a burdensome activity. Thus, some widely used models, such as time series models and classic statistical methods, do not suit this type of forecasting.

In this paper, we employ an ANN with two input nodes, one hidden layer and one output node to forecast $ES_{p,t+1}$. Training is essential to the success of a neural network (Satish Kumar, 2006). Among the several learning algorithms available, back-propagation is the most popular and most widely implemented of all neural network paradigms. The back-propagation algorithm is therefore employed in the following experiment.
In order to construct the training set, based on Assumption 2, we generate 1,000 normally distributed random numbers for each of the periodic $EV_{p,t+1}$, $AC_{p,t+1}$ and $ES_{p,t+1}$. Each randomly selected triple of corresponding terms forms a pattern (Satish Kumar, 2006), which gives 1,000 patterns as the training set. The 1,000 triples cover as many combinations of periodic EV, AC and ES as possible, which simulates the actual performance situation. Following cross-validation theory, we randomly select the learning set, the validation set and the test set from the training set, where $EV_{p,t+1}$ and $AC_{p,t+1}$ are the input nodes and $ES_{p,t+1}$ is the output node. Using the pattern training mode, after learning and validation, the test set is fed into the trained ANN; the forecast value $\widehat{ES}_{p,t+1}$ is then obtained, and so is $\widehat{SPI}_{c,t+1}$.

In view of the random nature of our training set, for each forecast value $\widehat{ES}_{p,t+1}$ we repeat the above forecasting process 100 times and take the average as our final forecast value $\widehat{ES}_{p,t+1}$. Since our ANN is not designed for a specific project, the number of nodes in the hidden layer need not be tuned in detail; we uniformly set it to 20. Real-life project data from the Fabricom Airport system (Vandevoorde and Vanhoucke, 2006) are employed as the experiment data, and are summarised in Table 2. The whole model is programmed in Matlab 7.11.

Table 2 The brief project data of Fabricom Airport system

| AT | PV | EV | AC |
|---|---|---|---|
| 1 | 3023 | 928 | 1606 |
| 2 | 5508 | 1904 | 2766 |
| 3 | 7828 | 2467 | 4324 |
| 4 | 10098 | 3414 | 6138 |
| 5 | 12158 | 4472 | 7888 |
| 6 | 13951 | 7152 | 9835 |
| 7 | 14205 | 7476 | 10135 |
| 8 | 15933 | 9272 | 13217 |
| 9 | 17902 | 11441 | 14755 |
| 10 | 19967 | 13302 | 16656 |
| 11 | 22208 | 14699 | 18768 |
| 12 | 24286 | 15985 | 20897 |
| 13 | 26331 | 16753 | 23364 |
| 14 | 26658 | 17077 | 23664 |
| 15 | 28647 | 20318 | 26651 |
| 16 | 30989 | 23061 | 28437 |
| 17 | 33040 | 26588 | 30408 |
| 18 | 34909 | 28681 | 32012 |
| 19 | 36709 | 30135 | 34000 |
| 20 | 38016 | 31487 | 35554 |
| 21 | 38140 | 32526 | 37111 |
| 22 | 38140 | 33504 | 38468 |
| 23 | 38140 | 34513 | 39798 |
| 24 | 38140 | 36489 | 41155 |
| 25 | 38140 | 37630 | 42600 |
| 26 | 38140 | 38140 | 43983 |

Considering the S-curve and the scarcity of available cumulative samples $ES_{c,i}$ (i = 1, 2, …, t), we also employ the Grey Verhulst rolling model (Kayacan et al., 2010) to forecast $ES_{c,t+1}$; the rolling cycle is also set to 3 (equal to T in our method). However, its relative error percentage is larger than that of the ANN method. The forecasting results of the two frameworks are listed below for comparison, where the actual value is the actual cumulative $ES_{c,t+1}$ in each period.

The forecasting details are as follows. Set T = 3. At the very beginning of the forecasting, suppose only three months have been performed, i.e., the current time is t = 3, so we have three true values for each of $EV_{c,i}$, $AC_{c,i}$, $ES_{c,i}$ (i = 1, 2, 3), which also means we have $EV_{p,i}$, $AC_{p,i}$, $ES_{p,i}$ (i = 1, 2, 3). Based on these true values, we generate the random numbers and the training set according to the above framework, then forecast $\widehat{ES}_{p,t+1}$ just one step ahead, i.e., $\widehat{ES}_{p,4}$, so the forecast value $\widehat{ES}_{c,4} = \widehat{ES}_{p,4} + ES_{c,3}$ is naturally obtained.
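The first-stage procedure for t = 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the tanh hidden activation, linear output, learning rate, epoch count, z-score scaling, the 80/20 learning/test split (the validation set is omitted), the reduced pattern count and the single run instead of 100 repetitions are all choices made here to keep the example short, and every function name is hypothetical. The periodic inputs are differences of the cumulative values in Table 2, with ES computed by formula (1) and rounded to four decimals.

```python
import math
import random
import statistics

def make_patterns(ev_p, ac_p, es_p, count, rng):
    """Draw (EV, AC, ES) patterns from independent normal distributions whose
    mean and standard deviation come from the nearest T periodic values."""
    def draw(values):
        mu, sigma = statistics.mean(values), statistics.stdev(values)
        return [rng.gauss(mu, sigma) for _ in range(count)]
    return list(zip(draw(ev_p), draw(ac_p), draw(es_p)))

def zscore(values):
    """Scale values to zero mean and unit variance (an assumption of this sketch)."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sigma for v in values], mu, sigma

def hidden_outputs(w1, x1, x2):
    return [math.tanh(w[0] * x1 + w[1] * x2 + w[2]) for w in w1]

def train(patterns, rng, hidden=20, epochs=20, lr=0.01):
    """Pattern-mode (online) back-propagation for a 2-input, 1-output network."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]  # last entry is the bias
    for _ in range(epochs):
        for x1, x2, y in patterns:
            h = hidden_outputs(w1, x1, x2)
            err = sum(w2[j] * h[j] for j in range(hidden)) + w2[hidden] - y
            for j in range(hidden):
                delta = err * w2[j] * (1.0 - h[j] ** 2)  # error propagated to hidden node j
                w2[j] -= lr * err * h[j]
                w1[j][0] -= lr * delta * x1
                w1[j][1] -= lr * delta * x2
                w1[j][2] -= lr * delta
            w2[hidden] -= lr * err
    return w1, w2

def predict(net, x1, x2):
    w1, w2 = net
    h = hidden_outputs(w1, x1, x2)
    return sum(w * hj for w, hj in zip(w2, h)) + w2[-1]

def forecast_es_next(ev_p, ac_p, es_p, count=300, seed=0):
    """One-step-ahead forecast of the periodic ES, averaged over the test patterns."""
    rng = random.Random(seed)
    raw = make_patterns(ev_p, ac_p, es_p, count, rng)
    ev, _, _ = zscore([p[0] for p in raw])
    ac, _, _ = zscore([p[1] for p in raw])
    es, es_mu, es_sigma = zscore([p[2] for p in raw])
    patterns = list(zip(ev, ac, es))
    split = int(0.8 * count)
    net = train(patterns[:split], rng)
    preds = [predict(net, x1, x2) for x1, x2, _ in patterns[split:]]
    return statistics.mean(preds) * es_sigma + es_mu  # undo the scaling

# Periodic values for periods 1..3, derived from Table 2 (t = 3, T = 3).
ev_p = [928, 976, 563]
ac_p = [1606, 1160, 1558]
es_p = [0.3070, 0.3228, 0.1863]
es_c3 = 0.8161                            # cumulative ES at t = 3

es_next = forecast_es_next(ev_p, ac_p, es_p)
es_c4_hat = es_next + es_c3               # forecast cumulative ES at period 4
spi_next = es_c4_hat / 4                  # forecast SPI: (ES_p,t+1 + ES_c,t) / (t + 1)
print(round(es_c4_hat, 3), round(spi_next, 3))
```

Because the three groups of random numbers are drawn independently, the network in this sketch has little structure to learn and its forecast stays close to the mean of the nearest T periodic ES values; how the original implementation treats this point is not stated in the text.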
The comparative error percentage is calculated by

$$error\% = \frac{\left|\widehat{ES}_{c,t+1} - ES_{c,t+1}\right|}{ES_{c,t+1}} \times 100 \qquad (8)$$
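As a worked check of formulas (1) and (8), the short Python sketch below computes the cumulative ES from the PV and EV columns of Table 2. The function names are hypothetical, and two conventions are assumptions of the sketch: PV at period 0 is taken as 0, and once EV reaches the final PV the ES is set to the first period at which PV reaches that value.

```python
# Cumulative PV and EV from Table 2; index 0 is the project start (assumed 0).
PV = [0, 3023, 5508, 7828, 10098, 12158, 13951, 14205, 15933, 17902, 19967,
      22208, 24286, 26331, 26658, 28647, 30989, 33040, 34909, 36709, 38016,
      38140, 38140, 38140, 38140, 38140, 38140]
EV = [0, 928, 1904, 2467, 3414, 4472, 7152, 7476, 9272, 11441, 13302,
      14699, 15985, 16753, 17077, 20318, 23061, 26588, 28681, 30135, 31487,
      32526, 33504, 34513, 36489, 37630, 38140]

def earned_schedule(ev_t, pv):
    """Formula (1): whole periods earned plus a linear interpolation."""
    for i in range(len(pv) - 1):
        if pv[i] <= ev_t < pv[i + 1]:
            return i + (ev_t - pv[i]) / (pv[i + 1] - pv[i])
    # EV has reached the total planned value: ES equals the planned duration.
    return float(pv.index(pv[-1]))

def error_percent(forecast, actual):
    """Formula (8): absolute relative error in percent."""
    return abs(forecast - actual) / actual * 100

es_c = [earned_schedule(ev, PV) for ev in EV]
print(round(es_c[4], 4))    # cumulative ES at period 4
print(round(es_c[26], 4))   # cumulative ES at period 26
print(round(error_percent(1.09, es_c[4]), 2))
```

The values obtained for periods 4, 7 and 13 (1.1573, 2.8483 and 8.4165) agree with the actual-value column of Table 3. Applying formula (8) to the rounded ANN forecast 1.09 for period 4 gives about 5.8%, slightly below the 6.12% reported in Table 3, presumably because the table was computed from unrounded forecasts.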
Table 3 ES forecasting results of two models

| AT | Actual value ($ES_{c,t+1}$) | GM(1,1) Verhulst model $\widehat{ES}_{c,t+1}$ | Error (%) | ANN model $\widehat{ES}_{c,t+1}$ | Error (%) |
|---|---|---|---|---|---|
| 4 | 1.1573 | 0.86 | 26 | 1.09 | 6.12 |
| 5 | 1.5831 | 1.87 | 18.27 | 1.45 | 8.63 |
| 6 | 2.7086 | 2.06 | 23.77 | 1.90 | 29.82 |
| 7 | 2.8483 | 7.12 | 149.93 | 3.23 | 13.58 |
| 8 | 3.6361 | 2.84 | 21.85 | 3.33 | 8.46 |
| 9 | 4.6519 | 5.52 | 18.65 | 4.15 | 10.87 |
| 10 | 5.6380 | 5.94 | 5.44 | 5.21 | 7.54 |
| 11 | 7.2859 | 6.49 | 10.94 | 6.25 | 14.28 |
| 12 | 8.0264 | 10.30 | 28.36 | 8.03 | 0.008 |
| 13 | 8.4165 | 8.24 | 2.06 | 8.77 | 4.14 |
| 14 | 8.5810 | 8.59 | 0.1 | 9.12 | 6.32 |
| 15 | 10.1566 | 8.64 | 14.94 | 9.27 | 8.74 |
| 16 | 11.4105 | 11.51 | 0.88 | 10.93 | 4.20 |
| 17 | 13.7859 | 12.30 | 10.81 | 12.22 | 11.39 |
| 18 | 15.0145 | 18.62 | 24.01 | 14.71 | 2.04 |
| 19 | 15.6354 | 15.49 | 0.91 | 15.96 | 2.08 |
| 20 | 16.2428 | 15.90 | 2.11 | 16.57 | 2.03 |
| 21 | 16.7494 | 16.83 | 0.50 | 17.16 | 2.43 |
| 22 | 17.2483 | 17.16 | 0.50 | 17.65 | 2.35 |
| 23 | 17.7881 | 17.74 | 0.28 | 18.13 | 1.90 |
| 24 | 18.8778 | 18.37 | 2.67 | 18.64 | 1.25 |
| 25 | 19.7047 | 20.89 | 6.01 | 19.75 | 0.23 |
| 26 | 21 | 20.30 | 3.33 | 20.56 | 2.10 |

| Index | GM(1,1) Verhulst model | ANN model |
|---|---|---|
| RMSE | 1.41 | 0.57 |
| MSE | 2.0 | 0.32 |
| MAPE (%) | 16.19 | 6.55 |
| MAD | 0.89 | 0.45 |

In the next forecasting round, t = 4, we use the nearest T (= 3) true values of $EV_{c,i}$, $AC_{c,i}$, $ES_{c,i}$ (i = 2, 3, 4) to forecast $\widehat{ES}_{c,5}$. The process is repeated until the project is finished. Table 3 lists the detailed forecast values for every forecasting period under the two forecasting frameworks, based on the experiment project data. To evaluate the two frameworks, four statistical test indexes [formulas (9) to (12)] are calculated: root mean square error (RMSE), mean square error (MSE), mean absolute percentage error (MAPE) and mean absolute deviation (MAD). On all four basic forecasting performance indexes, the designed ANN model outperforms the GM(1,1) Verhulst rolling model. Hence, in the first stage of forecasting, we employ the ANN instead of the GM(1,1) Verhulst rolling model.

$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left(observed_t - predicted_t\right)^2} \qquad (9)$$

$$MSE = \frac{1}{n} \sum_{t=1}^{n} \left(observed_t - predicted_t\right)^2 \qquad (10)$$

$$MAPE = \sum_{t=1}^{n} \left|\frac{observed_t - predicted_t}{observed_t}\right| \times \frac{100}{n} \qquad (11)$$

$$MAD = \sum_{t=1}^{n} \frac{\left|observed_t - predicted_t\right|}{n} \qquad (12)$$

3.2 Statistical estimation for total duration

In Stage 2, for each period, we use the latest one-step-ahead forecast value $\widehat{ES}_{c,t+1}$ to calculate the schedule performance index $\widehat{SPI}_{c,t+1}$ as the SPI of the unfinished part, which is our Assumption 1. The benefits of doing so are as follows.

Firstly, the classic IEACt method regards the current $SPI_{c,t}$ as the SPI of the unfinished part. In essence, this is a simple averaging process, because $SPI_{c,t} = ES_{c,t}/t$; that is, $SPI_{c,t}$ is the average performance of all t periods already finished. In comparison, $\widehat{SPI}_{c,t+1}$ is a fitted value obtained by fitting the average performance of the nearest T periods, which better expresses the current performance status of the project. It extends $SPI_{c,t}$ in that we employ random number simulation to construct a scene that simulates the performance trends (the trends are calculated by fitting many different triples of EV, AC and ES, which cover more possible situations close to actual performance).

Secondly, at the current period, the extra forecast $\widehat{SPI}_{c,t+1}$ increases the sample number n in the formula compared with the classic method, which is based on n true samples. Besides, a forecast with a low error percentage has an effect similar to knowing the true value of the next period. According to the experiments of Lipke et al. (2008), the logarithm of the periodic SPI or CPI indexes approximates a normal distribution and becomes more and more stable after 30% of the project. Our one-step-ahead forecast therefore produces an effect analogous to standing at the next period to forecast the total duration, which is more likely to give a comparatively better result in both interval estimation and point estimation.

Based on the above explanations, we forecast the total duration with both our hybrid method and the classic IEACt method from the 4th period to the 26th, including both interval and point estimation. The details of each forecasting round are listed in Table 4 and shown in Figure 1.
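The second-stage calculation can be sketched in Python as below, using formulas (3) to (7) as summarised in this paper. Several points are assumptions of the sketch rather than statements from the text: the function names are hypothetical, the periodic SPI(t) is taken as the ES earned in one period divided by one period, PD = 21 is read from Table 2 as the first period at which PV reaches its final value, and the t-distribution value (4.303 for a two-sided 95% level with 2 degrees of freedom) is supplied by the caller. How the hybrid method combines the forecast with the interval formulas is not fully specified here, so the sketch only shows the classic interval and the long-form point estimate with a substituted performance factor.

```python
import math

def ieac_long(at, pd, es, pf):
    """Formula (3): long-form duration forecast with performance factor pf."""
    return at + (pd - es) / pf

def ieac_interval(es_cum, pd, z):
    """Formulas (4) to (7) with SPI(t) as the index.

    es_cum holds the cumulative ES of periods 1..n. Returns (lower, point,
    upper) forecasts of the total duration.
    """
    n = len(es_cum)
    spi_c = es_cum[-1] / n                                   # cumulative SPI(t)
    spi_p = [es_cum[0]] + [b - a for a, b in zip(es_cum, es_cum[1:])]
    sigma = math.sqrt(sum((math.log(s) - math.log(spi_c)) ** 2 for s in spi_p)
                      / (n - 1))                             # formula (5)
    af = math.sqrt((pd - es_cum[-1]) / (pd - es_cum[-1] / n))  # formula (6)
    half_width = z * sigma / math.sqrt(n) * af
    cl_low = math.log(spi_c) - half_width                    # formula (4)
    cl_high = math.log(spi_c) + half_width
    # Formula (7): a higher index limit gives a shorter forecast duration.
    return pd / math.exp(cl_high), pd / spi_c, pd / math.exp(cl_low)

PD = 21                                   # planned duration (read from Table 2)
es_c = [0.3070, 0.6298, 0.8161]           # cumulative ES for periods 1..3, formula (1)

lower, point, upper = ieac_interval(es_c, PD, z=4.303)
print(round(lower, 1), round(point, 1), round(upper, 1))

# Hybrid point estimate: substitute a forecast SPI for PF in formula (3).
spi_hat = 1.09 / 4                        # from the ANN forecast for period 4 in Table 3
print(round(ieac_long(at=3, pd=PD, es=es_c[-1], pf=spi_hat), 1))
```

With PF set to the current cumulative SPI(t), the long form (3) reduces to the short form (2), because ES divided by SPI(t) equals AT; the two forms differ only when a different performance factor is substituted.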
4 Tests

From Figure 1, we can easily observe that the interval estimation of our hybrid method is better than that of the classic IEACt. Since the two methods look similar in point estimation, a paired t-test is carried out on forecasting accuracy (error%) to test the hypotheses. The aim of this test is to check whether the means of the forecast values obtained from our hybrid method differ from those of the classic IEACt. Therefore, the following hypotheses are proposed:

H01 There is no difference between the means of the classic IEACt and the proposed hybrid method (μ1 = μ2).
H11 There is a difference between the means of the classic IEACt method and the proposed hybrid method (μ1 > μ2 or μ1 < μ2).
H02 The means of the IEACt lower boundary and that of the proposed hybrid method are equal (IEACt_lower = Hybrid_lower).
H12 The means of the IEACt lower boundary and that of the proposed hybrid method are not equal (IEACt_lower > Hybrid_lower or IEACt_lower < Hybrid_lower).
H03 The means of the IEACt upper boundary and that of the proposed hybrid method are equal (IEACt_upper = Hybrid_upper).
H13 The means of the IEACt upper boundary and that of the proposed hybrid method are not equal (IEACt_upper > Hybrid_upper or IEACt_upper < Hybrid_upper).

From the test results, the 2-tailed Sig (μ1 ≠ μ2) is $2.873 \times 10^{-4}$, and the corresponding H01 single-tailed Sig (μ1 > μ2) is $1.436 \times 10^{-4}$. The test is significant at the 95% confidence level, so we reject the null hypothesis H01.

Then, we test the interval estimation accuracy of the two models by comparing their upper and lower boundaries respectively; the results are combined in Table 4.
From the test results, the 2-tailed Sig (IEACt_lower ≠ Hybrid_lower) of the two models' lower boundaries is 0.001363, hence the corresponding H02 single-tailed Sig (IEACt_lower < Hybrid_lower) is 0.0006817. The test is significant at the 95% confidence level, so we reject the null hypothesis H02.

Similarly, the 2-tailed Sig (IEACt_upper ≠ Hybrid_upper) of the two models' upper boundaries is $2.760 \times 10^{-4}$, so the corresponding H03 single-tailed Sig (IEACt_upper > Hybrid_upper) is $1.380 \times 10^{-4}$. The test is significant at the 95% confidence level, so we reject the null hypothesis H03.

The test results show that our hybrid model outperforms the traditional IEACt in the means of both point estimation and interval estimation. For interval estimation in particular, our model achieves much more accurate estimates than IEACt in the early part of the project.
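The paired t statistic used in these tests can be reproduced with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the small data set is made up to exercise the function, and the significance value itself is not computed because the Python standard library has no t-distribution function; it would be read from a t table or a statistics package.

```python
import math
import statistics

def paired_t(a, b):
    """Paired t-test statistic for two equally long samples.

    Returns (mean difference, std. deviation, std. error of the mean, t, df).
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = statistics.mean(d)
    sd = statistics.stdev(d)          # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)
    return mean, sd, se, mean / se, n - 1

def t_from_summary(mean, sd, n):
    """t statistic from the summary columns of a paired-differences table."""
    return mean / (sd / math.sqrt(n))

# Made-up example: differences are 2, 3 and 4.
mean, sd, se, t, df = paired_t([3, 5, 7], [1, 2, 3])
print(mean, sd, round(t, 3), df)

# Consistency check against the reported paired differences
# (mean 1.936870, std. deviation 2.158306, df = 22, hence n = 23).
t_point = t_from_summary(1.936870, 2.158306, 23)
print(round(t_point, 3))
```

The second call returns a value that rounds to 4.304, the t statistic reported for the comparison of the IEAC(t) and hybrid point forecasts.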
Table 5 Paired t-test results of two model means

| Pair | Mean | Std. deviation | Std. error mean | 95% confidence interval of the difference: lower | 95% confidence interval of the difference: upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|
| Pair 1: IEAC(t) – Hybrid_forecast | 1.936870 | 2.158306 | .450038 | 1.003548 | 2.870191 | 4.304 | 22 | 0.000 |

Table 6 Paired t-test results of two model boundary means

| Pair | Mean | Std. deviation | Std. error mean | 95% confidence interval of the difference: lower | 95% confidence interval of the difference: upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|
| Pair 1: IEAC_lo – Hybrid_lo | –1.480217 | 1.937407 | .403977 | –2.318015 | –.642420 | –3.664 | 22 | .001 |
| Pair 2: IEAC_up – Hybrid_up | 2.403217 | 2.667834 | .556282 | 1.249559 | 3.556875 | 4.320 | 22 | .000 |

5 Conclusions

In this paper, we propose a new hybrid model for forecasting total project duration. With the help of random number simulation and the non-linear fitting ability of an ANN, the model simulates the possible actual combinations of EV, AC and ES, forecasts the performance factor SPI one step ahead, and uses that forecast in place of the latest SPI, which the classic IEACt regards as the PF of the unfinished part, to forecast the total duration of the project. Forecasting results and tests show that our hybrid method outperforms the classic IEACt in both point estimation and interval estimation. Of course, choosing a proper T is not an easy task: T differs between projects because their performance factors follow different trends. Nevertheless, our model provides a new way to better estimate the future SPI of a project (especially that of the next period), particularly at the beginning of a project, when the true performance status indexes are scarce. This is also the blind period of a project: as the project approaches its end, its performance status becomes almost clear, and forecasting the total duration becomes less significant. Our model therefore provides a more effective SPI estimation method for the unfinished part of a project in its early periods.

Our model is far from perfect, since it is built on the hypothesis that the three main metrics (EV, AC and ES) of the next period fall in the range of normal random numbers generated from the values of their respective nearest T finished periods. If these metrics change too sharply relative to their nearest T values, the model may not perform well. From Figure 1, we can observe that the advantages of our method are most evident in the early and middle parts of the project, whereas in the final part its advantage decreases sharply. This is because the normal random numbers simulate possible actual combinations of performance status, which is especially effective when the true performance indexes are scarce; as the number of true indexes increases, the simulation weakens. Besides, when a project approaches its later part, SPI tends to stabilise, and random numbers may disturb this trend; in this circumstance, the traditional IEACt, based on the average of all historical data, can hit the bull's-eye more easily.

Future work could extend the research to the final part of the project: since a sufficient number of performance indicators is then available, a hybrid method of time series and EVM could be tried.

Acknowledgements

The research is supported by the National Natural Science Foundation of China under Grant No. 90924020 and the PhD Program Foundation of Education Ministry of China under Contract No. 200800060005.

References

Cioffi, D.F. (2006) ‘Designing project management: a scientific notation and an improved formalism for earned value calculations’, International Journal of Project Management, Vol. 4, No. 2, pp.289–302.

Kayacan, E., Ulutas, B. et al. (2010) ‘Grey system theory-based models in time series prediction’, Expert System with Applications, Vol. 37, No. 2, pp.1784–1789.

Lipke, W. (2002) ‘A study of the normality of earned value management indicators’, Meas. News, pp.1–16.

Lipke, W. et al. (2008) ‘Prediction of project outcome the prediction of statistical methods to earned value management and earned schedule performance indexes’, International Journal of Project Management, doi:10.1016/j.ijproman.2008.2.009.

National Institute of Standards and Technology E-handbook of Statistical Methods (2006) ‘Lognormal distribution’, available at http://www.itl.nist.gov/div898/handbook/eda/section3/eda3669.htm.

Sallehuddin, R. et al. (2009) ‘Hybrid grey relational artificial neural network and auto regressive integrated moving average model for forecasting time-series data’, Applied Artificial Intelligence, Vol. 23, No. 5, pp.443–486.

Satish Kumar (2006) Neural Networks, pp.165–194, Tsinghua University Publishing Company, Beijing, China.

Vandevoorde, S. and Vanhoucke, M. (2006) ‘A comparison of different duration forecasting methods using earned value metrics’, International Journal of Project Management, Vol. 24, No. 4, pp.289–302.

Yao, A.W.L., Chi, S.C. et al. (2003) ‘An improved grey-based approach for electricity demand forecasting’, Electric Power System Research, Vol. 67, No. 3, pp.217–224.

Zou, H.F. et al. (2007) ‘An investigation and comparison of artificial neural network and time series models for Chinese food grain price forecasting’, Neurocomputing, Vol. 70, Nos. 16–18, pp.2913–2923.