The Application of Machine Learning
Algorithms for Accurate Estimation of
Software Development Effort (SDE)
Ph.D. Registration Seminar
Under the Supervision
of
Dr. Saurabh Bilgaiyan
Assistant Professor
B Kiran Kumar (1981037)
School of Computer Engineering, KIIT
University, Bhubaneswar
December 14, 2022
Seminar Outline
1. Introduction
2. Basic Concepts
3. Optimization Algorithms
4. Machine Learning
5. Related Work
6. Proposed Work
7. Plan of Action
8. Conclusion
9. References
10. Publications
Introduction
• Software effort estimation deals with predicting the effort
needed to build a software project [1]
• It covers estimates of schedule, probable cost, and the
manpower required to build a software project
• A software effort estimation approach predicts the practical
amount of effort required to develop and maintain software
from uncertain, insufficient, noisy, and inconsistent input
data [2]
Motivation
• For decades, a key question has been how accurately the cost
of a project can be measured or forecast at its start and
during the construction phase.
• If the project is not estimated at an early stage, the
probability of project failure is higher.
• According to recent surveys, a key reason for project failure
is inaccurate cost estimation in the early phases; both
underestimation and overestimation can cause severe problems
in software development [3].
• This motivates improved planning to estimate the most
accurate values for a project's performance at optimized
cost and effort.
Objectives
Based on the motivations outlined in the previous section, we
set the following objectives for our research work:
 To develop a novel feature selection method using an
optimization algorithm
 To propose a feature selection ensemble model for the
accurate prediction of software effort
 To improve the predictive ability of software effort
estimation by stacking deep learning algorithms on software
datasets
Optimization Algorithms
• In artificial intelligence, searching for the optimal
solution in a search space is a common issue in almost all
sub-fields, such as machine learning and deep learning [4].
• Traditional search techniques depend on problem-dependent
heuristics that look for approximate solutions, often with
little focus on solution quality [5].
• Metaheuristic algorithms have attracted the attention of the
artificial intelligence research community due to their
impressive capability in exploring the search space of
optimization problems [6]
Optimization Algorithms (cont..)
• Metaheuristic algorithms are conventionally categorized in
several ways [7]:
– nature-inspired vs. non-nature-inspired,
– population-based vs. single-point search,
– dynamic vs. static objective function,
– single vs. various neighborhood structures,
– memory usage vs. memory-less.
• More recently, the research community has grouped these
algorithms into categories based on their natural
inspiration, such as evolutionary algorithms (EAs), local
search (LS), swarm intelligence, physics-based,
chemistry-based, and human-based algorithms [8]
Machine Learning Techniques
• The following machine learning techniques are applied over
the various datasets considered to estimate the effort of a
software product.
• The choice of machine learning techniques for the proposed
research is based on past studies examined in the literature
survey [9].
• The techniques include decision trees, SVR, random forests,
SGB, XGB, and ANNs; a minimal comparison sketch is shown
below.
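As an illustration only, a minimal comparison sketch in Python,
assuming scikit-learn and placeholder data standing in for an
effort dataset (XGB would come from the separate xgboost
package; GradientBoostingRegressor stands in for SGB here):

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((81, 8))    # placeholder feature matrix (Desharnais-sized)
y = rng.random(81) * 1000  # placeholder effort values (person-hours)

models = {
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SGB": GradientBoostingRegressor(random_state=0),
    "ANN (MLP)": make_pipeline(StandardScaler(),
                               MLPRegressor(hidden_layer_sizes=(16,),
                                            max_iter=2000, random_state=0)),
}
for name, model in models.items():
    # 5-fold cross-validated mean absolute error for each candidate
    mae = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {mae.mean():.2f}")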
Related work
Optimizing Software Effort
Estimation Models Using Firefly
Algorithm (Ghatasheh et al. 2019)
Comments:
• In this work, the Firefly
Algorithm is proposed as a
meta-heuristic optimization
method for tuning the
parameters of three
COCOMO-based models.
• These include the basic COCOMO
model and two other models
proposed in the literature as
extensions of it.
• The developed estimation
models are evaluated using
different evaluation metrics.
An effective approach for
software project effort and
duration estimation with
machine learning algorithms
(Pospieszny et al. 2018)
Comments:
• This paper introduces an
ensemble averaging model
using three machine learning
algorithms (Support Vector
Machines, Neural Networks
and Generalized Linear
Models)
Related work
On the value of parameter tuning in heterogeneous
ensembles effort estimation (Hosni et al., 2018)
Comments:
➢ In this paper, heterogeneous ensembles based on four well-
known machine learning techniques (K-nearest neighbor,
support vector regression, multilayer perceptron and decision
trees) were developed and evaluated by investigating the impact
of parameter values of the ensemble members on estimation
accuracy.
➢ In particular, the paper evaluates whether setting ensemble
parameters using two optimization techniques (grid search
and particle swarm optimization) permits more accurate
SDEE estimates.
➢ The heterogeneous ensembles of this study were built using
three combination rules (mean, median and inverse ranked
weighted mean) over seven software datasets.
Related work (cont..)
• Rankovic et al. [1] proposed a new approach to software
effort estimation using different artificial neural network
architectures and Taguchi orthogonal arrays.
• Minku, L. L. [2] proposed a novel online supervised
hyperparameter tuning procedure applied to cross-company
software effort estimation.
• Ezghari, S., & Zahi, A. [3] proposed software effort
estimation using a consistent fuzzy analogy-based method.
• Ghatasheh, N., Faris, H., et al. [4] proposed optimizing
software effort estimation models using the firefly
algorithm.
Related work (cont..)
• Fávero, E. M. D. B., et al. [5] proposed a model for software
effort estimation using pre-trained embedding models.
• Sehra, S. K., et al. [6] proposed software effort estimation
using FAHP and a weighted kernel LSSVM machine.
• Xia, T., Krishna, R., et al. [7] proposed hyper-parameter
optimization for effort estimation.
• Kaushik, A., & Singal, N. [8] proposed a hybrid model of
wavelet neural network and metaheuristic algorithm for
software development effort estimation.
Major Findings From The Literature Survey Based On The
Evolutionary Algorithms (EA) and Swarm Optimization
Algorithms:
• Hyperparameter tuning is crucial because hyperparameters
control the overall behavior of a machine learning model;
every machine learning model has its own hyperparameters
that can be set.
• A hyperparameter is a parameter whose value is set before
the learning process begins.
• However, all EAs and PSO-based algorithms depend on
parameter tuning.
• For this reason, evolutionary algorithms are best employed
on problems where it is difficult or impossible to test for
optimality.
• This also means that an evolutionary algorithm never knows
for certain when to stop, aside from the length of time, the
number of iterations, or the number of candidate solutions
that one wishes to allow it to explore [10-15].
Major Findings From The Literature Survey Based On The
Evolutionary Algorithms And Swarm Optimization
Algorithms [16-21]:
• All evolutionary and swarm-intelligence-based algorithms are
probabilistic and require common control parameters such as
population size, number of generations, and elite size.
• Besides the common control parameters, different algorithms
require their own algorithm-specific control parameters.
• Improper tuning of algorithm-specific parameters either
increases the computational effort or yields a locally
optimal solution; a small tuning sketch follows.
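To make the tuning point concrete, a small grid-search sketch,
assuming scikit-learn; the estimator, grid values, and
placeholder data are illustrative assumptions, not the models
evaluated in the cited studies:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((80, 6))    # placeholder features
y = rng.random(80) * 500   # placeholder effort values

param_grid = {                    # hyperparameters fixed before training
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3, 4],
}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=5,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_, -search.best_score_)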
Key control parameters of popular evolutionary algorithms:
Proposed work
Software Effort Estimation based on the Extreme Gradient
Boosting algorithm and a Modified Jaya optimization
algorithm
• Context: Software development effort estimation is regarded
as a crucial activity for managing project cost, time, and
quality, as well as for the software development life cycle.
Accurate estimation is therefore crucial for project success
and risk reduction.
• Method: In this work, we propose a Modified Jaya
optimization algorithm (JOA) that selects an optimal subset
of features from a large feature collection to improve the
effectiveness of the estimation model. The machine
learning-based Extreme Gradient Boosting algorithm is then
used to estimate the software effort.
Proposed Architecture
Jaya Algorithm
• Can be used for maximization or minimization of a function
• A population-based method that repeatedly modifies a
population of individual solutions
• Reduces the search space and yields optimal solutions
• Formulated for solving constrained and unconstrained
optimization problems
• Based on the concept that the solution obtained for a given
problem should move towards the best solution and away from
the worst solution [22]; the standard update rule is shown
below
Common control parameters:
• Population size
• Number of generations (iterations)
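For reference, the standard Jaya update rule from Rao's
original formulation, with notation matching the pseudocode
that follows:

X'(i,k,l) = X(i,k,l) + r1 × (X(i,best,l) − |X(i,k,l)|)
                     − r2 × (X(i,worst,l) − |X(i,k,l)|)

where i indexes the design variable, k the candidate solution,
l the iteration, and r1, r2 are random numbers drawn uniformly
from [0, 1] for each variable.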
JAYA algorithm flowchart
Pseudo code: Modified JOA
Input:
• n - population size (number of candidate solutions)
• m - number of design variables
• Z - maximum number of iterations
• X - population
Output:
• Best solution of the cost or fitness function (J)
• Worst solution of the cost or fitness function (J)
• Best candidate for the fitness function
• Worst candidate for the fitness function
Pseudo code: Modified JOA
Start
For k = 1 to n do
For i = 1 to m do
Initialize population (X(i,k))
End
End
Compute x_best, x_worst
Set l = 1
While l <= maximum number of iterations (Z)
For k = 1 to n do
For i = 1 to m do
Compute the updated solution X'(i,k,l) using the
weighted Jaya update rule
End
End
Pseudo code: Modified JOA
If the solution X'(i,k,l) is better than X(i,k,l)
Set X(i,k,l+1) = X'(i,k,l)
Else
Set X(i,k,l+1) = X(i,k,l)
End
Set l = l + 1
Update the weight parameter (w)
Update x_best and x_worst
End while
End
• To improve the effectiveness of the estimation model, the
modified JOA selects an optimal subset of features from a
large feature collection.
• It highlights only the most crucial features during the
selection process; to reduce the computational cost of the
problem, it eliminates redundant or irrelevant features to
produce a good subset of features.
• Any optimization procedure should focus on accelerating
convergence and identifying the optimal value of the cost
(or goal) function (J).
• To be effective, an evolutionary algorithm must continuously
strike a balance between global and local search. To speed
up convergence towards the optimal solution, a weight
parameter (w) is introduced into the original Jaya
algorithm's update equation.
• Here Z is the total number of iterations, l is the current
iteration, and w is the weight parameter, which declines
linearly over the course of the algorithm from a relatively
high value to a low one; w_f and w_0 denote the upper and
lower bounds of w, respectively.
• Objective function:
Error = α × error + β × (selected features / total number of
features) (1)
• The major reason for introducing this weight component is to
accelerate the search so that the error can be reduced more
quickly.
• As a result, the enhanced Jaya optimization reduces the
number of search agents, iterations, and processing load
while also improving search performance.
• The updated Jaya algorithm's search ability is constantly
adjusted to balance global and local search by modifying
the weight; a runnable sketch of the whole procedure is
given below.
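A minimal runnable sketch of this procedure, under the
following assumptions: candidate positions lie in [0, 1] and
are thresholded at 0.5 to obtain a feature mask; the weight w
decays linearly and multiplies the pull-toward-best term (the
slides do not give the exact modified equation, so this
placement is an assumption); and scikit-learn's
GradientBoostingRegressor stands in for XGBoost:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for XGBoost
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.9, beta=0.1):
    # Eq. (1): alpha * error + beta * (selected features / total features)
    if not mask.any():
        return np.inf
    err = -cross_val_score(GradientBoostingRegressor(random_state=0),
                           X[:, mask], y, cv=3,
                           scoring="neg_mean_absolute_error").mean()
    return alpha * err + beta * mask.sum() / mask.size

def modified_joa(X, y, n_pop=10, n_iter=20, w_max=0.9, w_min=0.1, seed=0):
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    pop = rng.random((n_pop, m))                   # positions in [0, 1]
    fit = np.array([fitness(p > 0.5, X, y) for p in pop])
    for l in range(n_iter):
        w = w_max - (w_max - w_min) * l / n_iter   # assumed linear decay
        best = pop[fit.argmin()].copy()
        worst = pop[fit.argmax()].copy()
        for k in range(n_pop):
            r1, r2 = rng.random(m), rng.random(m)
            new = (pop[k]
                   + w * r1 * (best - np.abs(pop[k]))   # weighted pull to best
                   - r2 * (worst - np.abs(pop[k])))     # push away from worst
            new = np.clip(new, 0.0, 1.0)
            f_new = fitness(new > 0.5, X, y)
            if f_new < fit[k]:                          # keep only improvements
                pop[k], fit[k] = new, f_new
    return pop[fit.argmin()] > 0.5                      # best feature mask

rng = np.random.default_rng(1)
X = rng.random((60, 10))
y = X[:, :3].sum(axis=1) * 100 + rng.normal(0, 5, 60)   # 3 informative features
print("Selected features:", np.flatnonzero(modified_joa(X, y)))

In the full method, the selected mask would then feed an
XGBoost estimator trained on the chosen feature subset.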
Feature selection using ensemble ML
techniques
• Insignificant variables may impair the performance of
machine learning systems.
• To eliminate variables with little influence on the target,
the least absolute shrinkage and selection operator (LASSO)
and stepwise regression were used in addition to Pearson
correlation.
• The findings showed that each of these methods can affect
the predictions of effort and duration.
• This work presents an efficient method for developing effort
and duration models using machine learning algorithms, based
on current practical research findings and best practices.
• Additionally, the generated input dataset is used for
modeling with cross-validation, ensemble averaging, and
three ML algorithms (SVM, RF, and XGB), as sketched below.
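A minimal sketch of the ensemble-averaging step, assuming
scikit-learn (VotingRegressor averages the base models'
predictions); the placeholder data stands in for the
LASSO/stepwise-filtered dataset, and GradientBoostingRegressor
stands in for XGBoost:

import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import (RandomForestRegressor,
                              GradientBoostingRegressor, VotingRegressor)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((100, 6))   # placeholder: features kept after LASSO/stepwise
y = rng.random(100) * 500  # placeholder effort values

ensemble = VotingRegressor([
    ("svm", make_pipeline(StandardScaler(), SVR())),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("xgb", GradientBoostingRegressor(random_state=0)),  # XGBoost stand-in
])
mae = -cross_val_score(ensemble, X, y, cv=10,
                       scoring="neg_mean_absolute_error")
print(f"Ensemble-averaging MAE: {mae.mean():.2f}")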
Stacking of deep learning (DL) algorithms
• Deep learning, a subfield of machine learning, was pioneered
by Hinton, LeCun, Bengio, and Andrew Ng, and gained
popularity from around 2010, its potential being established
by the availability of large-scale data and high-speed
computing devices [23]
• This work presents an efficient method for developing effort
and duration models using deep learning algorithms, based on
current practical research findings and best practices.
• The DL algorithms include CNN and LSTM models, and the
stacking of DL algorithms based on the optimal features
selected by the optimization algorithm.
• The stacking of these models is theoretically expected to
give better performance than the individual regressors.
These models are being experimented upon and their
parameters adjusted according to performance; a minimal
stacking sketch follows.
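A minimal stacking sketch, assuming TensorFlow/Keras, with the
selected feature vector treated as a length-8 sequence for the
Conv1D/LSTM inputs (an assumption for illustration) and a
linear meta-learner on top:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((100, 8)).astype("float32")    # placeholder selected features
y = (rng.random(100) * 500).astype("float32")
X_seq = X[..., None]                # (samples, features, 1) for Conv1D/LSTM

def cnn():
    m = keras.Sequential([
        layers.Input(shape=(8, 1)),
        layers.Conv1D(16, 3, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(1),
    ])
    m.compile(optimizer="adam", loss="mae")
    return m

def lstm():
    m = keras.Sequential([
        layers.Input(shape=(8, 1)),
        layers.LSTM(16),
        layers.Dense(1),
    ])
    m.compile(optimizer="adam", loss="mae")
    return m

base = [cnn(), lstm()]              # level-0 learners
for model in base:
    model.fit(X_seq, y, epochs=50, verbose=0)

# Level-1 (meta) learner: linear regression on base-model predictions
meta_X = np.hstack([m.predict(X_seq, verbose=0) for m in base])
meta = LinearRegression().fit(meta_X, y)
print("Stacked prediction for first project:", meta.predict(meta_X[:1])[0])

In practice the meta-learner should be fitted on out-of-fold
base-model predictions to avoid leakage; fitting it on
in-sample predictions, as in this sketch, is only for brevity.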
Datasets used
Performance evaluation Metrics
• Mean Absolute Error (MAE): The MAE measures the magnitude of
the error as the average of the absolute differences between
predicted and observed values. Its range is 0 to infinity,
and 0 is the best value.

MAE = (1/n) × Σ |e_i|, i = 1, ..., n (2)

Here e_i is the error of the i-th observation out of the n
observations being examined
Performance evaluation Metrics
• Root Mean Square Error (RMSE): The RMSE is the square root
of the average squared difference between a model's
predicted values and the actual values. The measure is
always non-negative, and zero is the best possible value.

RMSE = sqrt( (1/n) × Σ (p_i − a_i)² ) (3)

Here p_i stands for the predicted value and a_i for the actual
value of the i-th of n observations
Performance evaluation Metrics
• Prediction Accuracy (PRED): The metric PRED is based on the
percentile notion: it is the percentage of predictions that
fall within t% of the actual value. PRED(25), the typical
choice (t = 25), is the proportion of estimates within a 25%
tolerance.

PRED(t) = (k/n) × 100 (4)

where k is the number of estimates whose relative error is at
most t% and n is the total number of estimates
Performance evaluation Metrics
• Accuracy: Accuracy is the number of correct predictions
divided by the total number of predictions.
• For classification, accuracy is calculated in terms of
positives and negatives as shown below:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (6)

A minimal sketch computing these metrics follows.
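A minimal Python sketch of Eqs. (2), (3), (4), and (6); the
toy actual/predicted values are illustrative only:

import numpy as np

def mae(actual, pred):                       # Eq. (2)
    return np.mean(np.abs(actual - pred))

def rmse(actual, pred):                      # Eq. (3)
    return np.sqrt(np.mean((actual - pred) ** 2))

def pred_t(actual, pred, t=25):              # Eq. (4), e.g. PRED(25)
    mre = np.abs(actual - pred) / actual
    return np.mean(mre <= t / 100) * 100     # % of estimates within t%

def accuracy(tp, tn, fp, fn):                # Eq. (6), classification
    return (tp + tn) / (tp + tn + fp + fn)

actual = np.array([100.0, 250.0, 400.0])
pred = np.array([110.0, 240.0, 520.0])
print(mae(actual, pred), rmse(actual, pred), pred_t(actual, pred))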
Comparison of the proposed algorithm with GA, PSO, and ACO in
terms of accuracy:

Dataset      ML Technique   Accuracy
Kemerer      FSGA           0.90
Kemerer      FSPSO          0.89
Kemerer      ACO            0.90
Kemerer      Proposed       0.99
Desharnais   FSGA           0.89
Desharnais   FSPSO          0.86
Desharnais   ACO            0.91
Desharnais   Proposed       0.96
Maxwell      FSGA           0.91
Maxwell      FSPSO          0.89
Maxwell      ACO            0.88
Maxwell      Proposed       0.92
China        FSGA           0.93
China        FSPSO          0.88
China        ACO            0.91
China        Proposed       0.93
Proposed thesis workflow
Task 1: Develop a novel feature selection method using an
optimization algorithm
Task 2: Propose a feature selection ensemble model for the
accurate prediction of software effort estimation
Task 3: Improve the predictive ability of software effort
estimation using stacking of deep learning algorithms on
software datasets
Plan of Action
Work completed
• Course work (Research Methodology, DMDW, High Performance
Computing (HPC), Algorithms and Complexity)
• Literature survey
• Development of a novel feature selection method using an
optimization algorithm
Work to be done
• Propose a feature selection ensemble model for the accurate
prediction of software effort estimation
• Improve the predictive ability of software effort estimation
using stacking of deep learning algorithms on software
datasets
• Thesis writing
• Synopsis seminar
Conclusion
• Finding a reliable and accurate estimate of software
development effort has always been difficult in software
engineering.
• The area would benefit immensely from an accurate estimate
of a project's initial effort and expense.
• A Modified JOA is proposed to select the optimal features.
• It eliminates redundant or irrelevant features to produce a
good subset of features, reducing the computational cost.
• This work aims to show that ensemble models provide more
accurate results than single base-learner models.
• This work also aims to explore the applicability of stacking
modern deep learning architectures for the purpose of SDEE
References
[1] Rankovic, N., Rankovic, D., Ivanovic, M., & Lazic, L. (2021). A new
approach to software effort estimation using different artificial neural network
architectures and Taguchi orthogonal arrays. IEEE Access, 9, 26926-26936.
[2] Minku, L. L. (2019). A novel online supervised hyperparameter tuning
procedure applied to cross-company software effort estimation. Empirical
Software Engineering, 24(5), 3153-3204.
[3] Ezghari, S., & Zahi, A. (2018). Uncertainty management in software effort
estimation using a consistent fuzzy analogy-based method. Applied Soft
Computing, 67, 540-557.
[4] Ghatasheh, N., Faris, H., Aljarah, I., & Al-Sayyed, R. M. (2019). Optimizing
software effort estimation models using the firefly algorithm. arXiv preprint
arXiv:1903.02079.
[5] Fávero, E. M. D. B., Casanova, D., & Pimentel, A. R. (2022). SE3M: A
model for software effort estimation using pre-trained embedding models.
Information and Software Technology, 147, 106886.
[6] Sehra, S. K., Brar, Y. S., Kaur, N., & Sehra, S. S. (2019). Software effort
estimation using FAHP and weighted kernel LSSVM machine. Soft Computing,
23(21), 10881-10900.
References
[7] Xia, T., Krishna, R., Chen, J., Mathew, G., Shen, X., & Menzies, T.
(2018). Hyperparameter optimization for effort estimation. arXiv preprint
arXiv:1805.00336.
[8] Kaushik, A., & Singal, N. (2019). A hybrid model of wavelet neural
network and metaheuristic algorithm for software development effort
estimation. International Journal of Information Technology, 1-10.
[9] Phan, H., & Jannesari, A. (2022). Heterogeneous Graph Neural
Networks for Software Effort Estimation. arXiv preprint arXiv:2206.11023.
[10] Rhmann, W. (2021). An ensemble of Hybrid Search-Based algorithms
for Software effort prediction. International Journal of Software Science and
Computational Intelligence (IJSSCI), 13(3), 28-37.
[11] Alhammad, N. K. L., Alzaghoul, E., Alzaghoul, F. A., & Akour, M.
(2020). Evolutionary neural network classifiers for software effort estimation.
International Journal of Computer Aided Engineering and Technology, 12(4),
495-512.
References
[12] De Carvalho, H. D. P., Fagundes, R., & Santos, W. (2021). Extreme
Learning Machine Applied to Software Development Effort Estimation. IEEE
Access, 9, 92676-92687.
[13] Padmaja, M., & Haritha, D. (2018). Software effort estimation using grey
relational analysis with K-Means clustering. In Information Systems Design
and Intelligent Applications (pp. 924-933). Springer, Singapore.
[14] Predescu, E. F., Ştefan, A., & Zaharia, A. V. (2019). Software effort
estimation using multilayer perceptron and long short term memory.
Informatica Economica, 23(2), 76-87.
[15] Puspaningrum, A., Muhammad, F. P. B., & Mulyani, E. (2021). Flower
Pollination Algorithm for Software Effort Coefficients Optimization to Improve
Effort Estimation Accuracy. JUITA: Jurnal Informatika, 9(2), 139-144.
References
[16] Idri, A., Abnane, I., & Abran, A. (2018). Evaluating Pred (p) and
standardized accuracy criteria in software development effort
estimation. Journal of Software: Evolution and Process, 30(4), e1925.
[17] Shahpar, Z., Khatibi, V., & Khatibi Bardsiri, A. (2021). Hybrid PSO-
SA approach for feature weighting in analogy-based software project
effort estimation. Journal of AI and Data Mining, 9(3), 329-340.
[18] Benala, T. R., & Mall, R. (2018). DABE: differential evolution in
analogy-based software development effort estimation. Swarm and
Evolutionary Computation, 38, 158-172.
[19] Zakrani, A., Hain, M., & Idri, A. (2019). Improving software
development effort estimating using support vector regression and
feature selection. IAES International Journal of Artificial Intelligence,
8(4), 399.
[20] Jørgensen, M., & Halkjelsvik, T. (2020). Sequence effects in the
estimation of software development effort. Journal of Systems and
Software, 159, 110448.
References
[21] Shah, M. A., Jawawi, D. N. A., Isa, M. A., Younas, M., Abdelmaboud, A.,
& Sholichin, F. (2020). Ensembling artificial bee colony with analogy-based
estimation to improve software development effort prediction. IEEE Access, 8,
58402-58415.
[22] Rao, R. V., & More, K. C. (2017). Design optimization and analysis of
selected thermal devices using self-adaptive Jaya algorithm. Energy
Conversion and Management, 140, 24-35.
[23] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature,
521(7553), 436-444.
Publications
• K. K. Beesetti, S. Bilgaiyan and B. S. P. Mishra, "A hybrid
feature selection method using multi-objective Jaya
algorithm," 2022 International Conference on Computing,
Communication and Power Technology (IC3P), 2022, pp.
232-237, doi: 10.1109/IC3P52835.2022.00056.
• K. K. Beesetti, S. Bilgaiyan and B. S. P. Mishra, "Software
Effort Estimation based on Extreme Gradient Boosting
Algorithm and Modified Jaya Optimization Algorithm,"
Information and Software Technology (under review).