The Application of Machine Learning
Algorithms for Accurate Estimation of
Software Development Effort (SDE)
Ph.D. Registration Seminar
Under the Supervision
of
Dr. Saurabh Bilgaiyan
Assistant Professor
B Kiran Kumar (1981037)
School of Computer Engineering, KIIT
University, Bhubaneswar
December 14, 2022
Seminar Outline
1. Introduction
2. Basic Concepts
3. Optimization Algorithms
4. Machine Learning
5. Related Work
6. Proposed Work
7. Plan of Action
8. Conclusion
9. References
10. Publications
Introduction
• Software effort estimation deals with predicting the effort
needed to build a software project [1]
• It covers estimates of schedule, probable cost, and the
manpower required to build a software project
• A software effort estimation approach predicts the practical
amount of effort required to develop and maintain software
from uncertain, insufficient, noisy, and inconsistent input
data [2]
Motivation
• For decades, a key question has been how accurately the cost
of a project can be measured or forecast at its start and
during the construction phase.
• If the project is not estimated at an early stage, the
probability of project failure is higher.
• According to recent surveys, a key reason for project failure
is inaccurate cost estimation in the early phases; both
underestimation and overestimation can cause severe problems
in software development [3].
• This motivates improved planning to estimate the most
accurate values for a project's performance at optimized
cost and effort.
Objectives
Based on the motivations outlined in the previous section, we
set the following objectives for our research work:
 To develop a novel feature selection method using an
optimization algorithm
 To propose a feature selection ensemble model for the
accurate prediction of software effort
 To improve the predictive ability of software effort
estimation by stacking deep learning algorithms on software
datasets
Optimization Algorithms
• In artificial intelligence, searching for the optimal
solution in a search space is a common issue in almost all
sub-fields, such as machine learning and deep learning [4].
• Traditional search techniques depend on problem-dependent
heuristics that look for approximate solutions, often with
little focus on solution quality [5].
• Metaheuristic algorithms have attracted the attention of the
artificial intelligence research community due to their
impressive capability in exploring the search space of
optimization problems [6]
Optimization Algorithms (cont..)
• Metaheuristic algorithms are conventionally categorized in
several ways [7]:
– nature-inspired vs. non-nature-inspired,
– population-based vs. single-point search,
– dynamic vs. static objective function,
– single vs. various neighborhood structures,
– memory usage vs. memory-less.
• More recently, the research community has grouped these
algorithms into categories based on their natural
inspiration, such as evolutionary algorithms (EAs), local
search (LS), swarm intelligence, physics-based,
chemistry-based, and human-based algorithms [8]
Machine Learning Techniques
• The following machine learning techniques are applied over
the various datasets considered to estimate the effort of a
software product.
• The choice of machine learning techniques for the proposed
research is based on past studies examined in the literature
survey [9].
• The techniques include decision trees, SVR, random forests,
SGB, XGB, and ANNs; a minimal comparison sketch is shown
below.
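As an illustration only, a minimal comparison sketch in Python,
assuming scikit-learn and placeholder data standing in for an
effort dataset (XGB would come from the separate xgboost
package; GradientBoostingRegressor stands in for SGB here):

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((81, 8))    # placeholder feature matrix (Desharnais-sized)
y = rng.random(81) * 1000  # placeholder effort values (person-hours)

models = {
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SGB": GradientBoostingRegressor(random_state=0),
    "ANN (MLP)": make_pipeline(StandardScaler(),
                               MLPRegressor(hidden_layer_sizes=(16,),
                                            max_iter=2000, random_state=0)),
}
for name, model in models.items():
    # 5-fold cross-validated mean absolute error for each candidate
    mae = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {mae.mean():.2f}")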
Related work
Optimizing Software Effort
Estimation Models Using Firefly
Algorithm (Ghatasheh et al. 2019)
Comments:
• In this work, the Firefly
Algorithm is proposed as a
meta-heuristic optimization
method for tuning the
parameters of three
COCOMO-based models.
• These include the basic COCOMO
model and two other models
proposed in the literature as
extensions of it.
• The developed estimation
models are evaluated using
different evaluation metrics.
An effective approach for
software project effort and
duration estimation with
machine learning algorithms
(Pospieszny et al. 2018)
Comments:
• This paper introduces an
ensemble averaging model
using three machine learning
algorithms (Support Vector
Machines, Neural Networks
and Generalized Linear
Models)
Related work
On the value of parameter tuning in heterogeneous
ensembles effort estimation (Hosni et al., 2018)
Comments:
➢ In this paper, heterogeneous ensembles based on four well-
known machine learning techniques (K-nearest neighbor,
support vector regression, multilayer perceptron and decision
trees) were developed and evaluated by investigating the impact
of parameter values of the ensemble members on estimation
accuracy.
➢ In particular, the paper evaluates whether setting ensemble
parameters using two optimization techniques (grid search
and particle swarm optimization) permits more accurate
SDEE estimates.
➢ The heterogeneous ensembles of this study were built using
three combination rules (mean, median and inverse ranked
weighted mean) over seven software datasets.
Related work (cont..)
• Rankovic et al. [1] proposed a new approach to software
effort estimation using different artificial neural network
architectures and Taguchi orthogonal arrays.
• Minku, L. L. [2] proposed a novel online supervised
hyperparameter tuning procedure applied to cross-company
software effort estimation.
• Ezghari, S., & Zahi, A. [3] proposed software effort
estimation using a consistent fuzzy analogy-based method.
• Ghatasheh, N., Faris, H., et al. [4] proposed optimizing
software effort estimation models using the firefly
algorithm.
Related work (cont..)
• Fávero, E. M. D. B., et al. [5] proposed a model for software
effort estimation using pre-trained embedding models.
• Sehra, S. K., et al. [6] proposed software effort estimation
using FAHP and a weighted kernel LSSVM machine.
• Xia, T., Krishna, R., et al. [7] proposed hyper-parameter
optimization for effort estimation.
• Kaushik, A., & Singal, N. [8] proposed a hybrid model of
wavelet neural network and metaheuristic algorithm for
software development effort estimation.
Major Findings From The Literature Survey Based On The
Evolutionary Algorithms (EA) and Swarm Optimization
Algorithms:
• Hyperparameter tuning is crucial because hyperparameters
control the overall behavior of a machine learning model;
every machine learning model has its own hyperparameters
that can be set.
• A hyperparameter is a parameter whose value is set before
the learning process begins.
• However, all EAs and PSO-based algorithms depend on
parameter tuning.
• For this reason, evolutionary algorithms are best employed
on problems where it is difficult or impossible to test for
optimality.
• This also means that an evolutionary algorithm never knows
for certain when to stop, aside from the length of time, the
number of iterations, or the number of candidate solutions
that one wishes to allow it to explore [10-15].
Major Findings From The Literature Survey Based On The
Evolutionary Algorithms And Swarm Optimization
Algorithms [16-21]:
• All evolutionary and swarm-intelligence-based algorithms are
probabilistic and require common control parameters such as
population size, number of generations, and elite size.
• Besides the common control parameters, different algorithms
require their own algorithm-specific control parameters.
• Improper tuning of algorithm-specific parameters either
increases the computational effort or yields a locally
optimal solution; a small tuning sketch follows.
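To make the tuning point concrete, a small grid-search sketch,
assuming scikit-learn; the estimator, grid values, and
placeholder data are illustrative assumptions, not the models
evaluated in the cited studies:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((80, 6))    # placeholder features
y = rng.random(80) * 500   # placeholder effort values

param_grid = {                    # hyperparameters fixed before training
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3, 4],
}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=5,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_, -search.best_score_)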
Key control parameters of popular evolutionary algorithms:
Proposed work
Software Effort Estimation based on the Extreme Gradient
Boosting algorithm and a Modified Jaya optimization
algorithm
• Context: Software development effort estimation is regarded
as a crucial activity for managing project cost, time, and
quality, as well as for the software development life cycle.
Accurate estimation is therefore crucial for project success
and risk reduction.
• Method: In this work, we propose a Modified Jaya
optimization algorithm (JOA) that selects an optimal subset
of features from a large feature collection to improve the
effectiveness of the estimation model. The machine
learning-based Extreme Gradient Boosting algorithm is then
used to estimate the software effort.
Proposed Architecture
Jaya Algorithm
• Can be used for maximization or minimization of a function
• A population-based method that repeatedly modifies a
population of individual solutions
• Reduces the search space and yields optimal solutions
• Formulated for solving constrained and unconstrained
optimization problems
• Based on the concept that the solution obtained for a given
problem should move towards the best solution and away from
the worst solution [22]; the standard update rule is shown
below
Common control parameters:
• Population size
• Number of generations (iterations)
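For reference, the standard Jaya update rule from Rao's
original formulation, with notation matching the pseudocode
that follows:

X'(i,k,l) = X(i,k,l) + r1 × (X(i,best,l) − |X(i,k,l)|)
                     − r2 × (X(i,worst,l) − |X(i,k,l)|)

where i indexes the design variable, k the candidate solution,
l the iteration, and r1, r2 are random numbers drawn uniformly
from [0, 1] for each variable.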
JAYA algorithm flowchart
Pseudo code: Modified JOA
Input:
• n - population size (number of candidate solutions)
• m - number of design variables
• Z - maximum number of iterations
• X - population
Output:
• Best solution of the cost or fitness function (J)
• Worst solution of the cost or fitness function (J)
• Best candidate for the fitness function
• Worst candidate for the fitness function
Pseudo code: Modified JOA
Start
For k = 1 to n do
For i = 1 to m do
Initialize population (X(i,k))
End
End
Compute x_best, x_worst
Set l = 1
While l <= maximum number of iterations (Z)
For k = 1 to n do
For i = 1 to m do
Compute the updated solution X'(i,k,l) using the
weighted Jaya update rule
End
End
Pseudo code: Modified JOA
If the solution X'(i,k,l) is better than X(i,k,l)
Set X(i,k,l+1) = X'(i,k,l)
Else
Set X(i,k,l+1) = X(i,k,l)
End
Set l = l + 1
Update the weight parameter (w)
Update x_best and x_worst
End while
End
• To improve the effectiveness of the estimation model, the
modified JOA selects an optimal subset of features from a
large feature collection.
• It highlights only the most crucial features during the
selection process; to reduce the computational cost of the
problem, it eliminates redundant or irrelevant features to
produce a good subset of features.
• Any optimization procedure should focus on accelerating
convergence and identifying the optimal value of the cost
(or goal) function (J).
• To be effective, an evolutionary algorithm must continuously
strike a balance between global and local search. To speed
up convergence towards the optimal solution, a weight
parameter (w) is introduced into the original Jaya
algorithm's update equation.
• Here Z is the total number of iterations, l is the current
iteration, and w is the weight parameter, which declines
linearly over the course of the algorithm from a relatively
high value to a low one; w_f and w_0 denote the upper and
lower bounds of w, respectively.
• Objective function:
Error = α × error + β × (selected features / total number of
features) (1)
• The major reason for introducing this weight component is to
accelerate the search so that the error can be reduced more
quickly.
• As a result, the enhanced Jaya optimization reduces the
number of search agents, iterations, and processing load
while also improving search performance.
• The updated Jaya algorithm's search ability is constantly
adjusted to balance global and local search by modifying
the weight; a runnable sketch of the whole procedure is
given below.
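A minimal runnable sketch of this procedure, under the
following assumptions: candidate positions lie in [0, 1] and
are thresholded at 0.5 to obtain a feature mask; the weight w
decays linearly and multiplies the pull-toward-best term (the
slides do not give the exact modified equation, so this
placement is an assumption); and scikit-learn's
GradientBoostingRegressor stands in for XGBoost:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for XGBoost
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.9, beta=0.1):
    # Eq. (1): alpha * error + beta * (selected features / total features)
    if not mask.any():
        return np.inf
    err = -cross_val_score(GradientBoostingRegressor(random_state=0),
                           X[:, mask], y, cv=3,
                           scoring="neg_mean_absolute_error").mean()
    return alpha * err + beta * mask.sum() / mask.size

def modified_joa(X, y, n_pop=10, n_iter=20, w_max=0.9, w_min=0.1, seed=0):
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    pop = rng.random((n_pop, m))                   # positions in [0, 1]
    fit = np.array([fitness(p > 0.5, X, y) for p in pop])
    for l in range(n_iter):
        w = w_max - (w_max - w_min) * l / n_iter   # assumed linear decay
        best = pop[fit.argmin()].copy()
        worst = pop[fit.argmax()].copy()
        for k in range(n_pop):
            r1, r2 = rng.random(m), rng.random(m)
            new = (pop[k]
                   + w * r1 * (best - np.abs(pop[k]))   # weighted pull to best
                   - r2 * (worst - np.abs(pop[k])))     # push away from worst
            new = np.clip(new, 0.0, 1.0)
            f_new = fitness(new > 0.5, X, y)
            if f_new < fit[k]:                          # keep only improvements
                pop[k], fit[k] = new, f_new
    return pop[fit.argmin()] > 0.5                      # best feature mask

rng = np.random.default_rng(1)
X = rng.random((60, 10))
y = X[:, :3].sum(axis=1) * 100 + rng.normal(0, 5, 60)   # 3 informative features
print("Selected features:", np.flatnonzero(modified_joa(X, y)))

In the full method, the selected mask would then feed an
XGBoost estimator trained on the chosen feature subset.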
Feature selection using ensemble ML
techniques
• Insignificant variables may impair the performance of
machine learning systems.
• To eliminate variables with little influence on the target,
the least absolute shrinkage and selection operator (LASSO)
and stepwise regression were used in addition to Pearson
correlation.
• The findings showed that each of these methods can affect
the predictions of effort and duration.
• This work presents an efficient method for developing effort
and duration models using machine learning algorithms, based
on current practical research findings and best practices.
• Additionally, the generated input dataset is used for
modeling with cross-validation, ensemble averaging, and
three ML algorithms (SVM, RF, and XGB), as sketched below.
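A minimal sketch of the ensemble-averaging step, assuming
scikit-learn (VotingRegressor averages the base models'
predictions); the placeholder data stands in for the
LASSO/stepwise-filtered dataset, and GradientBoostingRegressor
stands in for XGBoost:

import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import (RandomForestRegressor,
                              GradientBoostingRegressor, VotingRegressor)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((100, 6))   # placeholder: features kept after LASSO/stepwise
y = rng.random(100) * 500  # placeholder effort values

ensemble = VotingRegressor([
    ("svm", make_pipeline(StandardScaler(), SVR())),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("xgb", GradientBoostingRegressor(random_state=0)),  # XGBoost stand-in
])
mae = -cross_val_score(ensemble, X, y, cv=10,
                       scoring="neg_mean_absolute_error")
print(f"Ensemble-averaging MAE: {mae.mean():.2f}")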
Stacking of deep learning (DL) algorithms
• Deep learning, a subfield of machine learning, was pioneered
by Hinton, LeCun, Bengio, and Andrew Ng, and gained
popularity from around 2010, its potential being established
by the availability of large-scale data and high-speed
computing devices [23]
• This work presents an efficient method for developing effort
and duration models using deep learning algorithms, based on
current practical research findings and best practices.
• The DL algorithms include CNN and LSTM models, and the
stacking of DL algorithms based on the optimal features
selected by the optimization algorithm.
• The stacking of these models is theoretically expected to
give better performance than the individual regressors.
These models are being experimented upon and their
parameters adjusted according to performance; a minimal
stacking sketch follows.
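A minimal stacking sketch, assuming TensorFlow/Keras, with the
selected feature vector treated as a length-8 sequence for the
Conv1D/LSTM inputs (an assumption for illustration) and a
linear meta-learner on top:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((100, 8)).astype("float32")    # placeholder selected features
y = (rng.random(100) * 500).astype("float32")
X_seq = X[..., None]                # (samples, features, 1) for Conv1D/LSTM

def cnn():
    m = keras.Sequential([
        layers.Input(shape=(8, 1)),
        layers.Conv1D(16, 3, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(1),
    ])
    m.compile(optimizer="adam", loss="mae")
    return m

def lstm():
    m = keras.Sequential([
        layers.Input(shape=(8, 1)),
        layers.LSTM(16),
        layers.Dense(1),
    ])
    m.compile(optimizer="adam", loss="mae")
    return m

base = [cnn(), lstm()]              # level-0 learners
for model in base:
    model.fit(X_seq, y, epochs=50, verbose=0)

# Level-1 (meta) learner: linear regression on base-model predictions
meta_X = np.hstack([m.predict(X_seq, verbose=0) for m in base])
meta = LinearRegression().fit(meta_X, y)
print("Stacked prediction for first project:", meta.predict(meta_X[:1])[0])

In practice the meta-learner should be fitted on out-of-fold
base-model predictions to avoid leakage; fitting it on
in-sample predictions, as in this sketch, is only for brevity.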
Datasets used
Performance evaluation Metrics
• Mean Absolute Error (MAE): The MAE measures the magnitude of
the error as the average of the absolute differences between
predicted and observed values. Its range is 0 to infinity,
and 0 is the best value.

MAE = (1/n) × Σ |e_i|, i = 1, ..., n (2)

Here e_i is the error of the i-th observation out of the n
observations being examined
Performance evaluation Metrics
• Root Mean Square Error (RMSE): The RMSE is the square root
of the average squared difference between a model's
predicted values and the actual values. The measure is
always non-negative, and zero is the best possible value.

RMSE = sqrt( (1/n) × Σ (p_i − a_i)² ) (3)

Here p_i stands for the predicted value and a_i for the actual
value of the i-th of n observations
Performance evaluation Metrics
• Prediction Accuracy (PRED): The metric PRED is based on the
percentile notion: it is the percentage of predictions that
fall within t% of the actual value. PRED(25), the typical
choice (t = 25), is the proportion of estimates within a 25%
tolerance.

PRED(t) = (k/n) × 100 (4)

where k is the number of estimates whose relative error is at
most t% and n is the total number of estimates
Performance evaluation Metrics
• Accuracy: Accuracy is the number of correct predictions
divided by the total number of predictions.
• For classification, accuracy is calculated in terms of
positives and negatives as shown below:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (6)

A minimal sketch computing these metrics follows.
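A minimal Python sketch of Eqs. (2), (3), (4), and (6); the
toy actual/predicted values are illustrative only:

import numpy as np

def mae(actual, pred):                       # Eq. (2)
    return np.mean(np.abs(actual - pred))

def rmse(actual, pred):                      # Eq. (3)
    return np.sqrt(np.mean((actual - pred) ** 2))

def pred_t(actual, pred, t=25):              # Eq. (4), e.g. PRED(25)
    mre = np.abs(actual - pred) / actual
    return np.mean(mre <= t / 100) * 100     # % of estimates within t%

def accuracy(tp, tn, fp, fn):                # Eq. (6), classification
    return (tp + tn) / (tp + tn + fp + fn)

actual = np.array([100.0, 250.0, 400.0])
pred = np.array([110.0, 240.0, 520.0])
print(mae(actual, pred), rmse(actual, pred), pred_t(actual, pred))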
Comparison of the proposed algorithm with GA, PSO, and ACO in
terms of accuracy:

Dataset      ML Technique   Accuracy
Kemerer      FSGA           0.90
Kemerer      FSPSO          0.89
Kemerer      ACO            0.90
Kemerer      Proposed       0.99
Desharnais   FSGA           0.89
Desharnais   FSPSO          0.86
Desharnais   ACO            0.91
Desharnais   Proposed       0.96
Maxwell      FSGA           0.91
Maxwell      FSPSO          0.89
Maxwell      ACO            0.88
Maxwell      Proposed       0.92
China        FSGA           0.93
China        FSPSO          0.88
China        ACO            0.91
China        Proposed       0.93
Proposed thesis workflow
Task 1: Develop a novel feature selection method using an
optimization algorithm
Task 2: Propose a feature selection ensemble model for the
accurate prediction of software effort estimation
Task 3: Improve the predictive ability of software effort
estimation using stacking of deep learning algorithms on
software datasets
Plan of Action
Work completed
• Course work (Research Methodology, DMDW, High Performance
Computing (HPC), Algorithms and Complexity)
• Literature survey
• Development of a novel feature selection method using an
optimization algorithm
Work to be done
• Propose a feature selection ensemble model for the accurate
prediction of software effort estimation
• Improve the predictive ability of software effort estimation
using stacking of deep learning algorithms on software
datasets
• Thesis writing
• Synopsis seminar
Conclusion
• Finding a reliable and accurate estimate of software
development effort has always been difficult in software
engineering.
• The area would benefit immensely from an accurate estimate
of a project's initial effort and expense.
• A Modified JOA is proposed to select the optimal features.
• It eliminates redundant or irrelevant features to produce a
good subset of features, reducing the computational cost.
• This work aims to show that ensemble models provide more
accurate results than single base-learner models.
• This work also aims to explore the applicability of stacking
modern deep learning architectures for the purpose of SDEE
References
[1] Rankovic, N., Rankovic, D., Ivanovic, M., & Lazic, L. (2021). A new
approach to software effort estimation using different artificial neural network
architectures and Taguchi orthogonal arrays. IEEE Access, 9, 26926-26936.
[2] Minku, L. L. (2019). A novel online supervised hyperparameter tuning
procedure applied to cross-company software effort estimation. Empirical
Software Engineering, 24(5), 3153-3204.
[3] Ezghari, S., & Zahi, A. (2018). Uncertainty management in software effort
estimation using a consistent fuzzy analogy-based method. Applied Soft
Computing, 67, 540-557.
[4] Ghatasheh, N., Faris, H., Aljarah, I., & Al-Sayyed, R. M. (2019). Optimizing
software effort estimation models using the firefly algorithm. arXiv preprint
arXiv:1903.02079.
[5] Fávero, E. M. D. B., Casanova, D., & Pimentel, A. R. (2022). SE3M: A
model for software effort estimation using pre-trained embedding models.
Information and Software Technology, 147, 106886.
[6] Sehra, S. K., Brar, Y. S., Kaur, N., & Sehra, S. S. (2019). Software effort
estimation using FAHP and weighted kernel LSSVM machine. Soft Computing,
23(21), 10881-10900.
References
[7] Xia, T., Krishna, R., Chen, J., Mathew, G., Shen, X., & Menzies, T.
(2018). Hyperparameter optimization for effort estimation. arXiv preprint
arXiv:1805.00336.
[8] Kaushik, A., & Singal, N. (2019). A hybrid model of wavelet neural
network and metaheuristic algorithm for software development effort
estimation. International Journal of Information Technology, 1-10.
[9] Phan, H., & Jannesari, A. (2022). Heterogeneous Graph Neural
Networks for Software Effort Estimation. arXiv preprint arXiv:2206.11023.
[10] Rhmann, W. (2021). An ensemble of Hybrid Search-Based algorithms
for Software effort prediction. International Journal of Software Science and
Computational Intelligence (IJSSCI), 13(3), 28-37.
[11] Alhammad, N. K. L., Alzaghoul, E., Alzaghoul, F. A., & Akour, M.
(2020). Evolutionary neural network classifiers for software effort estimation.
International Journal of Computer Aided Engineering and Technology, 12(4),
495-512.
References
[12] De Carvalho, H. D. P., Fagundes, R., & Santos, W. (2021). Extreme
Learning Machine Applied to Software Development Effort Estimation. IEEE
Access, 9, 92676-92687.
[13] Padmaja, M., & Haritha, D. (2018). Software effort estimation using grey
relational analysis with K-Means clustering. In Information Systems Design
and Intelligent Applications (pp. 924-933). Springer, Singapore.
[14] Predescu, E. F., Ştefan, A., & Zaharia, A. V. (2019). Software effort
estimation using multilayer perceptron and long short term memory.
Informatica Economica, 23(2), 76-87.
[15] Puspaningrum, A., Muhammad, F. P. B., & Mulyani, E. (2021). Flower
Pollination Algorithm for Software Effort Coefficients Optimization to Improve
Effort Estimation Accuracy. JUITA: Jurnal Informatika, 9(2), 139-144.
References
[16] Idri, A., Abnane, I., & Abran, A. (2018). Evaluating Pred (p) and
standardized accuracy criteria in software development effort
estimation. Journal of Software: Evolution and Process, 30(4), e1925.
[17] Shahpar, Z., Khatibi, V., & Khatibi Bardsiri, A. (2021). Hybrid PSO-
SA approach for feature weighting in analogy-based software project
effort estimation. Journal of AI and Data Mining, 9(3), 329-340.
[18] Benala, T. R., & Mall, R. (2018). DABE: differential evolution in
analogy-based software development effort estimation. Swarm and
Evolutionary Computation, 38, 158-172.
[19] Zakrani, A., Hain, M., & Idri, A. (2019). Improving software
development effort estimating using support vector regression and
feature selection. IAES International Journal of Artificial Intelligence,
8(4), 399.
[20] Jørgensen, M., & Halkjelsvik, T. (2020). Sequence effects in the
estimation of software development effort. Journal of Systems and
Software, 159, 110448.
References
[21] Shah, M. A., Jawawi, D. N. A., Isa, M. A., Younas, M., Abdelmaboud, A.,
& Sholichin, F. (2020). Ensembling artificial bee colony with analogy-based
estimation to improve software development effort prediction. IEEE Access, 8,
58402-58415.
[22] Rao, R. V., & More, K. C. (2017). Design optimization and analysis of
selected thermal devices using self-adaptive Jaya algorithm. Energy
Conversion and Management, 140, 24-35.
[23] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature,
521(7553), 436-444.
Publications
• K. K. Beesetti, S. Bilgaiyan and B. S. P. Mishra, "A hybrid
feature selection method using multi-objective Jaya
algorithm," 2022 International Conference on Computing,
Communication and Power Technology (IC3P), 2022, pp.
232-237, doi: 10.1109/IC3P52835.2022.00056.
• K. K. Beesetti, S. Bilgaiyan and B. S. P. Mishra, "Software
Effort Estimation based on Extreme Gradient Boosting
Algorithm and Modified Jaya Optimization Algorithm,"
Information and Software Technology (under review).