This document summarizes a research paper that examines the use of data mining techniques to predict software aging-related bugs from imbalanced datasets. The paper compares the performance of general data mining techniques against techniques developed for imbalanced datasets on a real-world dataset of aging bugs found in the MySQL software. The results show that techniques designed for imbalanced datasets, such as SMOTEBagging and MSMOTEBoost, outperformed general techniques at correctly predicting the minority class of data points related to aging bugs. The paper concludes that imbalanced-dataset techniques are more useful for predicting rare aging bugs from imbalanced software bug datasets.
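The core idea behind oversampling techniques like SMOTEBagging is synthetic minority oversampling: new minority points are interpolated between an existing minority point and one of its nearest minority neighbours. The sketch below is a minimal, illustrative version of that interpolation step only (the function name and the toy points are hypothetical, not taken from the paper):

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Generate synthetic minority samples by interpolating between a
    minority point and one of its k nearest minority neighbours (SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neigh = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neigh)))
    return synthetic

# toy minority class of three 2-D points
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]
new_points = smote_sample(minority)
print(len(new_points))
```

Ensemble variants such as SMOTEBagging apply this oversampling inside each bagging round, so every base learner trains on a rebalanced bootstrap sample.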
Comparative Performance Analysis of Machine Learning Techniques for Software ...csandit
Machine learning techniques can be used to analyse data from different perspectives and enable developers to retrieve useful information. Machine learning techniques have proven useful for software bug prediction. In this paper, a comparative performance analysis of different machine learning techniques for software bug prediction is explored on publicly available data sets. Results showed that most of the machine learning methods performed well on software bug datasets.
Ijartes v2-i1-001Evaluation of Changeability Indicator in Component Based Sof...IJARTES
Maintaining a software system is a major cost concern, and the cost of maintaining a software system depends on how changes are made to it. The maintainability of a system depends on the flow of the software, its design pattern, and CBSS. The maintainability phase of a software system has four parts: analysis, testing, stability, and the changes made to the system. In some application areas these systems have emerged very rapidly. Many companies purchase software instead of developing it; these companies have no interest in testing the system, but want smoothness in the flow of the system during changes.
Changeability is one of the characteristics of maintainability. Software changeability is associated with refactoring, which makes code simpler and easier to maintain (it enables all programmers to improve their code). Factors that affect changeability include coupling between modules, lack of code comments, and the naming of functions and variables. Basically, "changeability" is the ability of a product or software to change the structure of the program: the rate at which the product allows modification of its components.
In this paper, changeability-based cost estimation is performed. Initially four components are taken; these components are evaluated on coupling, cohesion, and interface metrics. Next, some changes are made to the existing components, and the components are evaluated again. On the basis of these two evaluations, a conclusion about changeability cost is drawn.
Machine learning approaches are good at solving problems for which little information is available. In most cases, software-domain problems are characterized as a learning process that depends on various circumstances and changes accordingly. A predictive model is constructed using machine learning approaches to classify modules into defective and non-defective. Machine learning techniques help developers retrieve useful information after classification and enable them to analyse data from different perspectives. Machine learning techniques have proven useful for software bug prediction. This study used publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction. Results showed that most of the machine learning methods performed well on software bug datasets.
Software Defect Prediction Using Radial Basis and Probabilistic Neural NetworksEditor IJCATR
Defects in modules of software systems are a major problem in software development. A variety of data mining techniques are used to predict software defects, such as regression, association rules, clustering, and classification. This paper is concerned with classification-based software defect prediction and investigates the effectiveness of a radial basis function neural network and a probabilistic neural network with respect to prediction accuracy and defect prediction. The conclusion to be drawn from this work is that the neural networks used here provide an acceptable level of accuracy but poor defect prediction ability. Probabilistic neural networks performed consistently better with respect to the two performance measures across all datasets. It may be advisable to use a range of software defect prediction models to complement each other rather than relying on a single technique.
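A probabilistic neural network is, at its core, a Parzen-window classifier: each class is scored by the summed Gaussian kernel density of its training points at the query, and the highest-scoring class wins. The sketch below is an illustrative minimal version of that idea, not the paper's implementation; the toy feature vectors, labels, and smoothing parameter are hypothetical:

```python
import math

def pnn_predict(train, x, sigma=0.5):
    """Probabilistic neural network: score each class by the summed Gaussian
    kernel of its training points at x, then predict the argmax class."""
    scores = {}
    for features, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(features, x))
        scores[label] = scores.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
    return max(scores, key=scores.get)

# toy module metrics labelled defective / non-defective
train = [((0.1, 0.2), "non-defective"), ((0.2, 0.1), "non-defective"),
         ((0.9, 1.0), "defective"), ((1.0, 0.8), "defective")]
print(pnn_predict(train, (0.95, 0.9)))  # prints "defective"
```

The smoothing parameter sigma plays the same role as the spread of the radial basis functions in an RBF network; too small a sigma memorizes the training set, too large a sigma blurs the classes together.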
Software Defect Prediction Using Local and Global AnalysisEditor IJMTER
Software defect factors are used to measure the quality of the software, and software effort estimation is used to measure the effort required for the software development process. The defect factor has an impact on software development effort, and software development and cost factors are also decided with reference to the defect and effort factors. The software defects are predicted with reference to module information, and module link information is used in the effort estimation process.
Data mining techniques are used in the software analysis process. Clustering techniques are used
in the property grouping process. Rule mining methods are used to learn rules from clustered data
values. The “WHERE” clustering scheme and “WHICH” rule mining scheme are used in the defect
prediction and effort estimation process. The system uses the module information for the defect
prediction and effort estimation process.
The proposed system is designed to improve the defect prediction and effort estimation process. The Single Objective Genetic Algorithm (SOGA) is used in the clustering process, and the rule learning operations are carried out using the Apriori algorithm. The system improves cluster accuracy levels, and defect prediction and effort estimation accuracy are also improved. The system is developed using the Java language and an Oracle relational database environment.
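The Apriori algorithm mentioned above works level-wise: only itemsets that meet a minimum support count survive a level, and candidates for the next level are built solely from the survivors. A minimal illustrative sketch of the frequent-itemset phase (the toy transactions and threshold are hypothetical, and rule generation is omitted):

```python
from itertools import combinations

def apriori(transactions, min_support=2):
    """Level-wise Apriori: keep itemsets whose support count meets
    min_support, and build (k+1)-candidates only from frequent k-itemsets."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(kept)
        keys = list(kept)
        # candidate (k+1)-itemsets: unions of two frequent k-itemsets
        level = list({a | b for a, b in combinations(keys, 2)
                      if len(a | b) == len(a) + 1})
    return frequent

tx = [frozenset("abc"), frozenset("ac"), frozenset("abd"), frozenset("bc")]
freq = apriori(tx)
print(sorted("".join(sorted(s)) for s in freq))  # ['a', 'ab', 'ac', 'b', 'bc', 'c']
```

In the paper's setting the "items" would be properties of clustered module data, with the learned frequent sets feeding the defect prediction and effort estimation rules.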
REGULARIZED FUZZY NEURAL NETWORKS TO AID EFFORT FORECASTING IN THE CONSTRUCTI...ijaia
Predicting the time to build software is a very complex task for software engineering managers. Complex factors can directly interfere with the productivity of the development team, and factors related to the complexity of the system to be developed drastically change the time necessary to complete work at software factories. This work proposes a hybrid system based on artificial neural networks and fuzzy systems to assist in the construction of a rule-based expert system that supports the prediction of the hours needed for software development, according to the complexity of the elements present in it. The set of fuzzy rules obtained by the system helps the management and control of software development by providing a base of interpretable, fuzzy-rule-based estimates. The model was tested on a real database, and its results were promising as an aid to predicting software construction effort.
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...IJCNCJournal
The main issues with Intrusion Detection Systems (IDS) are their sensitivity to errors and the inconsistent and inequitable ways in which the evaluation of these systems is often performed. Most previous efforts were concerned with improving the overall accuracy of these models by increasing the detection rate and decreasing false alarms, which is an important issue. Machine learning (ML) algorithms can classify all or most of the records of the minor classes into one of the main classes with negligible impact on overall performance. The riskiness of the threats posed by the small classes, the shortcomings of previous efforts, and the need to improve IDS performance were the motivations for this work. In this paper, a stratified sampling method and different cost-function schemes were combined with the Extreme Learning Machine (ELM) method, with kernels and activation functions, to build competitive intrusion detection solutions that improve the performance of these systems and reduce the occurrence of the accuracy-paradox problem. The main experiments were performed using the UNB ISCX2012 dataset, and the results showed that ELM models with a polynomial function outperform other models in overall accuracy, recall, and F-score. They also competed with the traditional model in the Normal, DoS, and SSH classes.
An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...ijctcm
This paper reports on the empirical evaluation of five machine learning algorithms, J48, BayesNet, OneR, NB, and ZeroR, using ten performance criteria: accuracy, precision, recall, F-measure, incorrectly classified instances, kappa statistic, mean absolute error, root mean squared error, relative absolute error, and root relative squared error. The aim of this paper is to find out which classifier performs best for an intrusion detection system; machine learning is one of the methods used in intrusion detection systems (IDS). Based on this study, it can be concluded that the J48 decision tree is the most suitable of the five algorithms. We also compared the performance of IDS classifiers using seven feature reduction techniques.
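Several of the criteria above derive directly from the confusion counts: true positives, false positives, and false negatives. A minimal illustrative sketch of computing accuracy, precision, recall, and F-measure from paired label lists (the function name and toy labels are hypothetical, not from the paper):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure from paired label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, precision, recall, f1

# 1 = intrusion, 0 = normal traffic (toy labels)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(classification_metrics(y_true, y_pred))  # (0.75, 0.666..., 0.666..., 0.666...)
```

Comparing classifiers on several such measures at once, as the paper does, guards against the case where a model's high accuracy hides poor recall on the rare attack class.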
Genetic fuzzy process metric measurement system for an operating systemijcseit
The operating system (OS) is the most essential software of a computer system; deprived of it, the computer system is totally useless. It is the frontier for accessing relevant computer resources, and its performance greatly advances the user's overall objectives across the system. The related literature has tried different methods and techniques to measure the process metric performance of the operating system, but none has incorporated genetic algorithms and fuzzy logic, which is indeed a novel approach. Extending the work of Michalis, this research focuses on measuring the process metric performance of an operating system using a set of operating system criteria, fusing fuzzy logic to handle imprecision and a genetic algorithm for process optimization.
Knowledge-based or rule-based expert systems are widely used in engineering applications and in problem solving. Rapid development today has brought with it environmental problems that cause the loss or destruction of natural resources. Environmental impact assessment (EIA) has been acknowledged as a powerful planning and decision-making tool for assessing new development projects; it requires qualified personnel with special expertise and responsibility in their domain. Rule-based EIA systems incorporate expert knowledge and act as an advice-giving system. Such a system has advantages over human experts and can significantly reduce the complexity of a planning task like EIA.
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...ijcsa
This work uses a Simulated Annealing algorithm to optimize effort estimation model parameters, which can reduce the difference between the actual and estimated effort used in model development. The model has been tested on an object-oriented dataset obtained from NASA for research purposes. The dataset-based model equation parameters consist of two independent variables, viz. Lines of Code (LOC) along with one more attribute, and a dependent variable related to software development effort (DE). The results have been compared with the author's earlier work on Artificial Neural Networks (ANN) and the Adaptive Neuro-Fuzzy Inference System (ANFIS), and it has been observed that the developed SA-based model provides better estimates of software development effort than ANN and ANFIS.
MACHINE LEARNING ALGORITHMS FOR HETEROGENEOUS DATA: A COMPARATIVE STUDYIAEME Publication
In the present digital era, massive amounts of data are being continuously generated at exceptional and increasing scales. This data has become an important and indispensable part of every economy, industry, organization, business, and individual. Handling these large datasets, given the heterogeneity of their formats, is one of the major challenges; efficient data processing techniques are needed to handle heterogeneous data and to meet the computational requirements of processing this huge volume of data. The objective of this paper is to review, describe, and reflect on heterogeneous data and the complexity of processing it, and on the use of machine learning algorithms, which play a major role in data analytics.
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...ijsc
Artificial intelligence and machine learning have been around for a long time, and in recent years there has been a surge in popularity of applications integrating AI and ML technology. As with traditional development, software testing is a critical component of a successful AI/ML application, yet the development methodology used in AI/ML contrasts significantly with traditional development, and various software testing challenges arise from these distinctions. The emphasis of this paper is on the challenge of effectively splitting the data into training and testing data sets. By applying a k-Means clustering strategy to the data set, followed by a decision tree, we can significantly increase the likelihood that the training data set represents the domain of the full dataset, and thus avoid training a model that is likely to fail because it has only learned a subset of the full data domain.
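The clustering-guided split described above can be illustrated in a few lines: cluster the data first, then draw the train/test split proportionally from every cluster so no region of the data domain is missing from the training set. The sketch below is a minimal, hypothetical version of that idea (plain Lloyd's k-means, toy points, no decision-tree stage):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def cluster_split(points, k=2, train_frac=0.75, seed=0):
    """Split so that every k-means cluster contributes proportionally to
    the training set, instead of relying on a purely random split."""
    rng = random.Random(seed)
    labels = kmeans(points, k, seed=seed)
    train, test = [], []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        rng.shuffle(members)
        cut = max(1, int(len(members) * train_frac))
        train += members[:cut]
        test += members[cut:]
    return train, test

pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.0),
       (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9)]
train, test = cluster_split(pts, k=2)
print(len(train), len(test))  # 6 2
```

A purely random split could, by chance, leave one of the two point groups entirely out of the training data; sampling per cluster rules that failure mode out.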
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Contributors to Reduce Maintainability Cost at the Software Implementation PhaseWaqas Tariq
Software maintenance is important and difficult to measure, and maintenance is the most costly of the phases of software development. One of the most critical processes in software development is reducing software maintainability cost based on the quality of the source code at the design step; however, there is a lack of quality models and measures that can help assess the quality attributes of the software maintainability process. Software maintainability suffers from a number of challenges, such as a lack of source code understanding, the quality of the software code, and adherence to programming standards during maintenance. This work describes model-based factors to assess software maintenance and explains the steps followed to obtain and validate them. Such a method can be used to reduce software maintenance cost. The research results will enhance the quality of the source code, increase software understandability, reduce maintenance time and cost, and give confidence in software reusability.
On applications of Soft Computing Assisted Analysis for Software ReliabilityAM Publications
Developing high-quality, reliable software is one of the main challenges in the software industry, and software reliability is a key concern of many users and developers of software. The demand for software reliability requires robust modeling techniques for software quality prediction; software reliability models are very useful for estimating the probability of software failure over time. In this study we review the available literature on software reliability and elicit the current trends, existing problems, specific difficulties, future directions, and open areas for research.
A Defect Prediction Model for Software Product based on ANFISIJSRD
Artificial intelligence techniques are becoming involved, day by day, in all classification- and prediction-based processes, such as environmental monitoring, stock exchange analysis, biomedical diagnosis, and software engineering. However, the challenge of selecting training criteria for the design of artificial intelligence models used for prediction has yet to be simplified. This work focuses on developing a defect prediction mechanism using the software metric data of KC1. We have taken a subtractive clustering approach to generate a fuzzy inference system (FIS). The FIS rules are generated at different radii of influence of the input attribute vectors, and the developed rules are further refined by the ANFIS technique to predict the number of defects in a software project using a fuzzy logic system.
Implementation of Vertical Handoff Algorithm between IEEE 802.11 WLAN & CDMA ...IOSR Journals
Wireless communications is the fastest growing segment of the communications industry; everyone wants quality of service anytime and anywhere. Wireless networks can integrate various heterogeneous radio access technologies such as GSM, WLAN, and WiMAX. WiMAX, an acronym for "Worldwide Interoperability for Microwave Access", is an IP-based wireless broadband access technology that provides performance similar to 802.11/Wi-Fi networks with the coverage and QoS (quality of service) of cellular networks. The main promise of interconnecting these heterogeneous networks is high performance: achieving a high data rate and supporting real-time applications. These services require various networks (such as CDMA2000 and wireless LAN) to be integrated into IP-based networks, which further requires seamless vertical handoff to 4th-generation wireless networks. When a mobile host (MH) changes its point of attachment, its IP address changes; the MH should be able to maintain all existing connections using the new IP address. This process of changing a connection from one IP address to another in an IP network is called handoff, and vertical handoff is switching from one network to another while maintaining the session. Vertical handoff (VHO) is a major concern for heterogeneous networks; it can be user-requested or based on criteria designed by the author of a particular algorithm. The main objective is to implement an efficient and effective handoff scheme between two heterogeneous networks, i.e. 802.11 WLAN and CDMA.
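A common family of handoff criteria is signal-strength thresholds with a hysteresis margin to avoid "ping-pong" switching between networks. The sketch below is a hypothetical illustration of such a decision rule, not the paper's algorithm; the network names, threshold, and margin are assumed values for the example:

```python
def handoff_decision(current_net, rssi, threshold=-85, hysteresis=5):
    """Hand off only if the serving signal (dBm) drops below the threshold
    and the best candidate beats it by at least the hysteresis margin."""
    serving = rssi[current_net]
    best = max(rssi, key=rssi.get)
    if serving < threshold and rssi[best] >= serving + hysteresis:
        return best  # switch to the stronger network
    return current_net  # stay put, avoiding ping-pong handoffs

# serving WLAN signal has degraded; CDMA is clearly stronger
rssi = {"WLAN": -90, "CDMA": -70}
print(handoff_decision("WLAN", rssi))  # prints "CDMA"
```

Real vertical handoff schemes typically weigh additional criteria (cost, bandwidth, velocity of the mobile host) on top of signal strength, but the threshold-plus-hysteresis core is the same shape.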
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...IJCNCJournal
The main issues of the Intrusion Detection Systems (IDS) are in the sensitivity of these systems toward the errors, the inconsistent and inequitable ways in which the evaluation processes of these systems were often performed. Most of the previous efforts concerned with improving the overall accuracy of these models via increasing the detection rate and decreasing the false alarm which is an important issue. Machine Learning (ML) algorithms can classify all or most of the records of the minor classes to one of the main classes with negligible impact on performance. The riskiness of the threats caused by the small classes and the shortcoming of the previous efforts were used to address this issue, in addition to the need for improving the performance of the IDSs were the motivations for this work. In this paper, stratified sampling method and different cost-function schemes were consolidated with Extreme Learning Machine (ELM) method with Kernels, Activation Functions to build competitive ID solutions that improved the performance of these systems and reduced the occurrence of the accuracy paradox problem. The main experiments were performed using the UNB ISCX2012 dataset. The experimental results of the UNB ISCX2012 dataset showed that ELM models with polynomial function outperform other models in overall accuracy, recall, and F-score. Also, it competed with traditional model in Normal, DoS and SSH classes.
An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...ijctcm
This paper reports on the empirical evaluation of five machine learning algorithm such as J48, BayesNet, OneR, NB and ZeroR using ten performance criteria: accuracy, precision, recall, F-Measure, incorrectly classified instances, kappa statistic, mean absolute error, root mean squared error, relative absolute error, root relative squared error. The aim of this paper is to find out which classifier is better in its performance for intrusion detection system. Machine Learning is one of the methods used in the intrusion detection system (IDS).Based on this study, it can be concluded that J48 decision tree is the most suitable associated algorithm than the other four algorithms. In this paper we compared the performance of Intrusion Detection System (IDS) Classifiers using seven feature reduction techniques.
Genetic fuzzy process metric measurement system for an operating systemijcseit
Operating system (Os) is the most essential software of the computer system,deprived ofit, the computer
system is totally useless. It is the frontier for assessing relevant computer resources. It performance greatly
enhances user overall objective across the system. Related literatures have try in different methods and
techniques to measure the process matric performance of the operating system but none has incorporated
the use of genetic algorithm and fuzzy logic in their varied techniques which indeed is a novel approach.
Extending the work of Michalis, this research focuses on measuring the process matrix performance of an
operating system utilizing set of operating system criteria’s while fusing fuzzy logic to handle
impreciseness and genetic for process optimization.
Knowledge or Rule based Expert systems systems are widely used in engineering applications and in problem-solving. Rapid development today has brought with it environmental problems that cause loss or destruction of natural resources. Environmental impact assessment (EIA) has been acknowledged as a powerful planning and decisionmaking tool to assess new development projects. It requires qualified personnel with special expertise and responsibility in their domain. Rule-based EIA systems incorporate expert’s knowledge and act as a device-giving system. The system has an advantage over human experts and can significantly reduce the complexity of a planning task like EIA.
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...ijcsa
The work is about using Simulated Annealing Algorithm for the effort estimation model parameter
optimization which can lead to the reduction in the difference in actual and estimated effort used in model
development.
The model has been tested using OOP’s dataset, obtained from NASA for research purpose.The data set
based model equation parameters have been found that consists of two independent variables, viz. Lines of
Code (LOC) along with one more attribute as a dependent variable related to software development effort
(DE). The results have been compared with the earlier work done by the author on Artificial Neural
Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS) and it has been observed that the
developed SA based model is more capable to provide better estimation of software development effort than
ANN and ANFIS
MACHINE LEARNING ALGORITHMS FOR HETEROGENEOUS DATA: A COMPARATIVE STUDYIAEME Publication
In the present digital era massive amount of data is being continuously generated
at exceptional and increasing scales. This data has become an important and
indispensable part of every economy, industry, organization, business and individual.
Further handling of these large datasets due to the heterogeneity in their formats is
one of the major challenge. There is a need for efficient data processing techniques to
handle the heterogeneous data and also to meet the computational requirements to
process this huge volume of data. The objective of this paper is to review, describe
and reflect on heterogeneous data with its complexity in processing, and also the use
of machine learning algorithms which plays a major role in data analytics
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...ijsc
Artificial Intelligence and Machine Learning have been around for a long time. In recent years, there has been a surge in popularity for applications integrating AI and ML technology. As with traditional development, software testing is a critical component of a successful AI/ML application. The development methodology used in AI/ML contrasts significantly from traditional development. In light of these distinctions, various software testing challenges arise. The emphasis of this paper is on the challenge of effectively splitting the data into training and testing data sets. By applying a k-Means clustering strategy to the data set followed by a decision tree, we can significantly increase the likelihood of the training data set to represent the domain of the full dataset and thus avoid training a model that is likely to fail because it has only learned a subset of the full data domain.
A Review on Software Fault Detection and Prevention Mechanism in Software Dev...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Contributors to Reduce Maintainability Cost at the Software Implementation PhaseWaqas Tariq
Software maintenance is important and difficult to measure. The cost of maintenance is the highest among the phases of software development. One of the most critical processes in software development is the reduction of software maintainability cost based on the quality of source code at the design step; however, there is a lack of quality models and measures that can help assess the quality attributes of the software maintainability process. Software maintainability suffers from a number of challenges, such as lack of source code understanding, poor quality of software code, and weak adherence to programming standards in maintenance. This work describes model-based factors to assess software maintenance and explains the steps followed to obtain and validate them. Such a method can be used to reduce software maintenance cost. The research results will enhance the quality of the source code, increase software understandability, reduce maintenance time and cost, and give confidence for software reusability.
On applications of Soft Computing Assisted Analysis for Software ReliabilityAM Publications
Developing high quality, reliable software is one of the main challenges in the software
industry. Software reliability is a key concern of many users and developers of software. The demand for software reliability requires robust
modeling techniques for software quality prediction. Software reliability models are very useful to estimate the
probability of software failure over time. In this study we review the available literature on software
reliability. We also elicit the current trends, existing problems, specific difficulties, future directions and open
areas for research.
A Defect Prediction Model for Software Product based on ANFISIJSRD
Artificial intelligence techniques are increasingly involved in classification and prediction based processes such as environmental monitoring, stock exchange analysis, biomedical diagnosis, and software engineering. However, challenges remain in selecting the training criteria for the design of artificial intelligence models used for prediction. This work focuses on developing a defect prediction mechanism using the software metric data of KC1. We have taken the subtractive clustering approach for generation of a fuzzy inference system (FIS). The FIS rules are generated at different radii of influence of the input attribute vectors, and the developed rules are further refined by the ANFIS technique to predict the number of defects in a software project using a fuzzy logic system.
Implementation of Vertical Handoff Algorithm between IEEE 802.11 WLAN & CDMA ...IOSR Journals
Wireless communications is the fastest growing segment of the communications industry. Everyone
wants quality of service anytime and anywhere. Wireless networks can integrate various heterogeneous radio
access technologies such as GSM, WLAN, and WiMAX. WiMAX is an IP-based wireless broadband access technology
that provides performance similar to 802.11/Wi-Fi networks with the coverage and QoS (quality of service) of
cellular networks. WiMAX is an acronym for "Worldwide Interoperability for Microwave Access". The main promise of interconnecting these heterogeneous networks is to provide high performance,
achieving a high data rate and supporting real-time applications. These services require various networks (such as
CDMA2000 and Wireless LAN) to be integrated into IP-based networks, which further require a seamless
vertical handoff to 4th generation wireless networks. When a mobile host (MH) changes its point of attachment,
its IP address gets changed. The MH should be able to maintain all existing connections using the new IP
address. This process of changing a connection from one IP address to another in an IP network is called
handoff. Vertical handoff is switching from one network to another while maintaining the session. Vertical
Handoff (VHO) is a major concern for different heterogeneous networks. VHO can be user requested or based
on criteria designed by the author of a particular algorithm. The main objective is to
implement an efficient and effective handoff scheme between two heterogeneous networks, i.e. 802.11 WLAN and
CDMA.
Field Programmable Gate Array for Data Processing in Medical SystemsIOSR Journals
Abstract: Two-dimensional and three-dimensional (3-D) image segmentation is one of the most demanding tasks in image processing. It has been proven that only the 14-neighborship of a rhombic dodecahedron can satisfy the aforementioned requirements. We propose the perfectly parallelizable 3-D Gray-Value Structure Code (3-D-GSC) for image segmentation on a new FPGA custom machine. The 3-D-GSC process is executed in three phases: coding, linking, and splitting. An FPGA-based digital signal processing board optimized for applications needing large memory with high bandwidth has been developed and successfully used for the parallelization of a modern image segmentation algorithm for medical and industrial real-time applications. This 128-bit FPGA coprocessing board features an up-to-date Virtex-II Pro architecture, two large independent DDR-SDRAM channels, two fast independent ZBT-SRAM channels, and PCI-X bus and CameraLink interfaces. The use of this 128-bit coprocessor board is not limited to image segmentation. Key words: Field Programmable Gate Array, Segmentation, Vogelbruch, Gray-value structure code, Homogeneous, Linking phase, Coding phase, Splitting phase, SDRAM.
Reduction of Side Lobes by Using Complementary Codes for Radar ApplicationIOSR Journals
This work analyses new types of complementary direct-sequence complex signals synthesized
with well-known code sequences such as Barker, Walsh, Golay, and complementary codes. Based on the
autocorrelation function (ACF) and ambiguity function (AF) of the signals, a numerical method estimates the
volume of side lobes separately for each signal. The results obtained show that, by using complementary codes,
the signals have a low volume of side lobes, approximately zero, compared with the other
codes.
Development of software defect prediction system using artificial neural networkIJAAS Team
Software testing is an activity to ensure a system is bug-free during execution. Software bug prediction is one of the most promising activities of the testing phase of the software development life cycle. In this paper, a framework was created to predict defect-prone modules so that software quality assurance effort can be better prioritized. A Genetic Algorithm was used to extract relevant features from the acquired datasets, to eliminate the possibility of overfitting, and the relevant features were classified into defective or non-defective modules using an Artificial Neural Network. The system was executed in the MATLAB (R2018a) runtime environment using a statistical toolkit, and its performance was assessed based on accuracy, precision, recall, and F-score to check the effectiveness of the system. The experiments showed that ECLIPSE JDT CORE, ECLIPSE PDE UI, EQUINOX FRAMEWORK and LUCENE achieved accuracy, precision, recall and F-score of 86.93, 53.49, 79.31 and 63.89%; 83.28, 31.91, 45.45 and 37.50%; 83.43, 57.69, 45.45 and 50.84%; and 91.30, 33.33, 50.00 and 40.00%, respectively. This paper presents an improved predictive system for software defect detection.
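The genetic-algorithm-plus-classifier pipeline can be sketched as below. This is an illustrative toy, not the paper's MATLAB system: leave-one-out 1-nearest-neighbour accuracy stands in for the Artificial Neural Network as the fitness function, and the dataset is synthetic (feature 0 carries the class signal, the other features are noise):

```python
import random

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only the features where mask[j] == 1."""
    feats = [j for j, m in enumerate(mask) if m]
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in feats))
        correct += y[nearest] == y[i]
    return correct / len(X)

def ga_select(X, y, pop_size=16, gens=25, seed=1):
    """Evolve a bit-mask over features; elitism keeps the best masks alive."""
    rng = random.Random(seed)
    n = len(X[0])
    # seed the population with the all-features mask plus random masks
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(gens):
        scored = sorted(pop, key=lambda m: loo_1nn_accuracy(X, y, m), reverse=True)
        next_pop = scored[:2]                    # elitism: carry the two best over
        while len(next_pop) < pop_size:
            a, b = rng.sample(scored[:8], 2)     # mate among the fitter half
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]            # one-point crossover
            if rng.random() < 0.2:               # occasional bit-flip mutation
                k = rng.randrange(n)
                child[k] = 1 - child[k]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda m: loo_1nn_accuracy(X, y, m))

# synthetic data: feature 0 separates the classes, features 1-4 are noise
rng = random.Random(0)
X = [[i % 2 * 4.0 + rng.random()] + [rng.random() for _ in range(4)]
     for i in range(20)]
y = [i % 2 for i in range(20)]
best = ga_select(X, y)
print(best, loo_1nn_accuracy(X, y, best))
```

Because the all-features mask is seeded into the population and elitism never discards the best individual, the selected subset can only match or beat using every feature, which is the overfitting-reduction argument the abstract makes.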
Machine learning techniques can be used to analyse data from different perspectives and enable developers to retrieve useful information. Machine learning techniques are proven to be useful
in terms of software bug prediction. In this paper, a comparative performance analysis of
different machine learning techniques is explored for software bug prediction on publicly
available data sets. Results showed most of the machine learning methods performed well on
software bug datasets.
An analysis of software aging in cloud environment IJECEIAES
Cloud computing is the environment in which several virtual machines (VM) run concurrently on physical machines. The cloud computing infrastructure hosts multiple cloud services that communicate with each other using interfaces. During operation, the software systems accumulate errors or garbage, leading to system failure and other hazardous consequences; this status is called software aging. Software aging happens because of memory fragmentation, large-scale resource consumption, and accumulation of numerical errors. Software aging degrades performance and may result in system failure, owing to premature resource exhaustion. The errors that cause software aging are of special types and affect the response time and the environment. This issue can be resolved only during run time, as it occurs because of the dynamic nature of the problem. To alleviate the impact of software aging, the software rejuvenation technique is used. The rejuvenation process reboots the system or restarts the software. Software rejuvenation removes accumulated error conditions, frees up deadlocks, and defragments operating system resources such as memory. Software aging and rejuvenation have generated a lot of research interest recently. This work reviews some of the research related to the detection of software aging and identifies research gaps.
Software Bug Detection Algorithm using Data mining TechniquesAM Publications
The main aim of software development is to develop high quality software, and high quality software is
developed using an enormous amount of software engineering data. The software engineering data can be used to gain an
empirically based understanding of software development, and meaningful information can be extracted using
various data mining techniques. As data mining for secure software engineering improves software productivity and
quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks.
However, mining software engineering data poses several challenges, requiring various algorithms to effectively mine
sequences, graphs and text from such data. Software engineering data includes code bases, execution traces,
historical code changes, mailing lists and bug databases. They contain a wealth of information about a project's status,
progress and evolution. Using well established data mining techniques, practitioners and researchers can
explore the potential of this valuable data in order to better manage their projects and produce higher-quality
software systems that are delivered on time and within budget.
Improvement of Software Maintenance and Reliability using Data Mining Techniquesijdmtaiir
Software is ubiquitous in our daily life. It brings us
great convenience and a big headache about software reliability
as well: Software is never bug-free, and software bugs keep
incurring monetary loss or even catastrophes. In the pursuit of
better reliability, software engineering researchers found that
huge amount of data in various forms can be collected from
software systems, and these data, when properly analyzed, can
help improve software reliability. Unfortunately, the huge
volume of complex data renders the analysis of simple
techniques incompetent; consequently, studies have been
resorting to data mining for more effective analysis. In the past
few years, we have witnessed many studies on mining for
software reliability reported in data mining as well as software
engineering forums. These studies either develop new or apply
existing data mining techniques to tackle reliability problems
from different angles. In order to keep data mining researchers
abreast of the latest development in this growing research area,
we propose this paper on data mining for software reliability.
In this paper, we will present a comprehensive overview of this
area, examine representative studies, and lay out challenges to
data mining researchers.
In the present paper, the applicability and
capability of A.I. techniques for effort estimation prediction have
been investigated. It is seen that neuro-fuzzy models are very
robust, characterized by fast computation, and capable of handling
distorted data. Due to the presence of data non-linearity, this is
an efficient quantitative tool for predicting effort estimation. A one-hidden-layer
network, named OHLANFIS, has been developed
using the MATLAB simulation environment.
The initial parameters of the OHLANFIS are
identified using the subtractive clustering method. Parameters of
the Gaussian membership function are optimally determined
using the hybrid learning algorithm. From the analysis it is seen
that the effort estimation prediction model developed using the
OHLANFIS technique performs better than the standard
ANFIS model.
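The subtractive-clustering initialization mentioned above can be sketched as follows (Chiu's method, each recovered centre seeding one Gaussian membership function). The radius `ra`, the stopping threshold `eps`, and the 1-D toy data are assumed values for illustration, not the paper's setup:

```python
import math

def subtractive_clustering(points, ra=0.5, eps=0.15):
    """Chiu-style subtractive clustering: repeatedly pick the highest-potential
    point as a cluster centre, then suppress the potential around it."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2
    d2 = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    # potential of a point = density of data around it
    pot = [sum(math.exp(-alpha * d2(p, q)) for q in points) for p in points]
    first = max(pot)
    centres = []
    while True:
        i = max(range(len(points)), key=lambda k: pot[k])
        if pot[i] < eps * first:
            break
        centres.append(points[i])
        # points near the new centre lose potential
        pc = pot[i]
        pot = [p - pc * math.exp(-beta * d2(q, points[i])) for p, q in zip(pot, points)]
    return centres

def gaussian_mf(x, c, sigma):
    """Gaussian membership function seeded by a cluster centre."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

# two tight 1-D groups of samples -> expect two cluster centres
data = [(0.09,), (0.1,), (0.11,), (0.12,), (1.98,), (2.0,), (2.01,), (2.02,)]
centres = subtractive_clustering(data, ra=0.5)
print(centres)
```

In an ANFIS-style model, the number of centres found this way fixes the number of fuzzy rules, and each centre supplies the initial mean of a Gaussian membership function that hybrid learning then refines.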
FROM THE ART OF SOFTWARE TESTING TO TEST-AS-A-SERVICE IN CLOUD COMPUTINGijseajournal
Researchers consider that the first edition of the book "The Art of Software Testing" by Myers (1979)
initiated research in Software Testing. Since then, software testing has gone through evolutions that have
driven standards and tools. This evolution has accompanied the complexity and variety of software
deployment platforms. The migration to the cloud allowed benefits such as scalability, agility, and better
return on investment. Cloud computing requires more significant involvement in software testing to ensure
that services work as expected. In addition to testing cloud applications, cloud computing has paved the
way for testing in the Test-as-a-Service model. This review aims to understand software testing in the
context of cloud computing. Based on the knowledge explained here, we sought to linearize the evolution of
software testing, characterizing fundamental points and allowing us to compose a synthesis of the body of
knowledge in software testing, expanded by the cloud computing paradigm.
DETERMINING THE RISKY SOFTWARE PROJECTS USING ARTIFICIAL NEURAL NETWORKSijseajournal
Determining risky software projects early is a very important factor for project success. This study aims to choose the modelling method that most correctly predicts risky software projects early, to help companies avoid losing time and money on unsuccessful projects and avoid facing legal consequences from being unable to fulfill their responsibilities to their customers. While researching this subject, it was seen that previous research usually preferred traditional modelling techniques, but these methods mostly resulted in high misclassification ratios. To overcome this problem, this study proposes a three-layered neural network (NN) architecture with a backpropagation algorithm. The NN architecture was trained using two different data sets, the OMRON data set (collected by OMRON) and the 2016-2020 ES.LV data set (collected by the authors), separately. First, the most relevant classification method (Gaussian Naive Bayes algorithm) and the most relevant neural network method (scaled conjugate gradient backpropagation algorithm) were chosen, and both data sets were trained using each method separately to observe which type of modelling architecture would give better results. Experimental results showed that the neural network approach is useful for predicting whether a project is risky or not.
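For reference, the Gaussian Naive Bayes baseline used in such comparisons can be sketched in a few lines of pure Python; the two-feature "project risk" data below is invented for illustration and is not the OMRON or ES.LV data set:

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate per-class prior, mean and variance for each feature."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for cls, rows in groups.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
                     for col, m in zip(zip(*rows), means)]
        model[cls] = (len(rows) / len(X), means, variances)
    return model

def predict_gnb(model, x):
    """Pick the class with the highest log-posterior under per-feature Gaussians."""
    def log_post(cls):
        prior, means, variances = model[cls]
        ll = math.log(prior)
        for v, m, var in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return ll
    return max(model, key=log_post)

# invented two-feature "project risk" data: risky projects (class 1) cluster high
X = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [3.0, 3.2], [2.9, 3.1], [3.1, 2.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gnb(X, y)
print(predict_gnb(model, [1.0, 1.0]), predict_gnb(model, [3.0, 3.0]))
```

The appeal of this baseline is that it has no training hyperparameters at all, which makes any accuracy gap against a tuned neural network easy to attribute.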
Job Failure Analysis in Mainframes Production Supportinventionjournals
A major part of batch processing on mainframe computers consists of several thousand batch jobs which run every day. This network of jobs runs every day to update day-to-day transactions. There are frequent failures which can cause long delays in the batch and also degrade the performance and efficiency of the application. Permanent solutions can be applied to frequently occurring job failures to avoid delays in the batch and to improve the performance and efficiency of the application. In this paper, we have analyzed the frequently occurring batch job failures recorded in the Known Error Database (KEDB) for the past year, based on different categories. Frequently failed jobs are categorized based on application, failure type, job runs and resolution. Different results are obtained in the Weka tool based on the different categories. From the various results obtained it can be concluded that the frequent failures occur in the MSD application. Further analysis of these frequently failed jobs showed that data and network issues cause the major job failures, and most of the affected jobs were daily processing jobs. To fix a failure, the job was resolved by restarting it from the overrides or by restarting it from the top.
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUESijaia
The purpose of software defect prediction is to improve the quality of a software project by building a
predictive model to decide whether a software module is or is not fault prone. In recent years, much
research on using machine learning techniques for this topic has been performed. Our aim was to evaluate
the performance of clustering techniques with feature selection schemes to address the problem of software
defect prediction. We analysed the National Aeronautics and Space Administration (NASA)
dataset benchmarks using three clustering algorithms: (1) Farthest First, (2) X-Means, and (3) self-organizing map (SOM). In order to evaluate different feature selection algorithms, this article presents a
comparative analysis of software defect prediction based on Bat, Cuckoo, Grey Wolf Optimizer
(GWO), and particle swarm optimizer (PSO). The results obtained with the proposed clustering models
enabled us to build an efficient predictive model with a satisfactory detection rate and an acceptable number
of features.
Evolutionary Computing Driven Extreme Learning Machine for Objected Oriented ...ShahanawajAhamad1
To fulfill user expectations, the rapid evolution of software
techniques and approaches has necessitated reliable and flawless
software operations. Aging prediction in software under
operation is becoming a basic and unavoidable requirement for
ensuring systems' availability, reliability, and operation. In
this paper, an improved evolutionary computing-driven extreme
learning scheme (ECD-ELM) is suggested for object-oriented software aging prediction. To perform aging prediction,
we employed a variety of metrics, including program size,
McCabe complexity metrics, Halstead metrics, runtime failure
event metrics, and some unique aging-related metrics (ARM). In
our suggested paradigm, OOP software metrics are extracted
after pre-processing, which includes outlier detection and
normalization. This technique improved our proposed system's
ability to deal with instances with unbalanced biases and metrics.
Further, different dimensionality reduction and feature selection
algorithms, such as principal component analysis (PCA), linear
discriminant analysis (LDA), and t-test analysis, have been
applied. We have suggested a single-hidden-layer multi-feed-forward neural network (SL-MFNN) based ELM, where an
adaptive genetic algorithm (AGA) is applied to estimate the
weight and bias parameters for ELM learning. Unlike the
traditional neural network model, the implementation of GA-based ELM with LDA feature selection has outperformed other
aging prediction approaches in terms of prediction accuracy,
precision, recall, and F-measure. The results affirm that the
combination of outlier detection, normalization of imbalanced
metrics, LDA-based feature selection, and GA-based ELM can be
a reliable solution for object-oriented software aging prediction.
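Stripped of the genetic-algorithm tuning described in the abstract, the core extreme-learning-machine idea (random, frozen input-to-hidden weights; hidden-to-output weights solved in closed form) can be sketched as follows. The toy regression target, hidden-layer size, and ridge term are illustrative assumptions, not the paper's configuration:

```python
import math
import random

def elm_fit(X, y, hidden=12, lam=1e-3, seed=0):
    """ELM sketch: random, frozen input->hidden weights; hidden->output weights
    solved via the ridge-regularized normal equations (the paper instead tunes
    the ELM with an adaptive genetic algorithm)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]
    # hidden-layer activations for every training sample
    H = [[math.tanh(sum(w * v for w, v in zip(W[j], xi)) + b[j])
          for j in range(hidden)] for xi in X]
    # build (H^T H + lam*I) beta = H^T y and solve by Gaussian elimination
    A = [[sum(row[i] * row[j] for row in H) + (lam if i == j else 0.0)
          for j in range(hidden)] for i in range(hidden)]
    rhs = [sum(row[i] * t for row, t in zip(H, y)) for i in range(hidden)]
    for col in range(hidden):                    # forward elimination with pivoting
        piv = max(range(col, hidden), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, hidden):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[col])]
            rhs[r] -= f * rhs[col]
    beta = [0.0] * hidden
    for i in reversed(range(hidden)):            # back substitution
        beta[i] = (rhs[i] - sum(A[i][j] * beta[j]
                                for j in range(i + 1, hidden))) / A[i][i]
    return W, b, beta

def elm_predict(model, x):
    W, b, beta = model
    h = [math.tanh(sum(w * v for w, v in zip(Wj, x)) + bj) for Wj, bj in zip(W, b)]
    return sum(hv * bv for hv, bv in zip(h, beta))

# toy regression: learn y = x0 * x1 on a 5x5 grid over [0, 1]^2
X = [[a / 4.0, c / 4.0] for a in range(5) for c in range(5)]
y = [xi[0] * xi[1] for xi in X]
model = elm_fit(X, y)
err = max(abs(elm_predict(model, xi) - yi) for xi, yi in zip(X, y))
print(round(err, 3))
```

Because only the output weights are learned, and in one linear solve, training is essentially instantaneous; the paper's AGA step replaces the random draw of `W` and `b` with an evolutionary search over them.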
A LOG-BASED TRACE AND REPLAY TOOL INTEGRATING SOFTWARE AND INFRASTRUCTUREijseajournal
We propose a log-based analysis tool for evaluating web application computer systems. A feature of the tool is the integration of software logs with infrastructure logs. With the tool, software engineers alone can resolve system faults, even when the faults involve both software problems and infrastructure problems. The tool consists of 5 steps: software preparation, infrastructure preparation, collecting logs, replaying the log data, and tracing the log data. The tool was applied to a simple web application system in a small-scale
local area network. We confirmed the usefulness of the tool when a software engineer detects faults behind system failures such as "404" and "no response" errors. In addition, the tool was partially applied to a real large-scale computer system with many web applications and a large network environment. Using the replaying and tracing features of the tool, we found the causes of a real authentication error; the cause combined an infrastructure problem with a software problem. Even when a failure is caused by both a software problem and an infrastructure problem, we confirmed that software engineers can distinguish between the two using the tool.
With the emergence of virtualization and cloud computing technologies, several services are hosted on virtualization platforms. Virtualization is the technology that many cloud service providers rely on for efficient management and coordination of the resource pool. As essential services are also hosted on the cloud platform, it is necessary to ensure continuous availability by implementing all necessary measures. Windows Active Directory is one such service, developed by Microsoft for Windows domain networks. It is included in Windows Server operating systems as a set of processes and services for authentication and authorization of users and computers in a Windows domain network. The service is required to run continuously without downtime. As a result, there are chances of accumulation of errors or garbage, leading to software aging, which in turn may lead to system failure and associated consequences. In this work, the software aging patterns of the Windows Active Directory service are studied. Software aging of Active Directory needs to be predicted properly so that rejuvenation can be triggered to ensure continuous service delivery. In order to predict this time accurately, a model that uses a time series forecasting technique is built.
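The forecasting idea can be sketched minimally with a linear trend fitted to resource-usage samples, estimating when capacity would be exhausted so rejuvenation can be scheduled beforehand; both the trend model and the monitor samples below are illustrative assumptions, not the paper's model or data:

```python
def fit_trend(times, usage):
    """Ordinary least-squares line through (time, resource-usage) samples."""
    n = len(times)
    mt = sum(times) / n
    mu = sum(usage) / n
    slope = (sum((t - mt) * (u - mu) for t, u in zip(times, usage))
             / sum((t - mt) ** 2 for t in times))
    return slope, mu - slope * mt

def time_to_exhaustion(times, usage, capacity):
    """Predict when the fitted trend crosses capacity; rejuvenation (a restart)
    would be scheduled some safety margin before this point."""
    slope, intercept = fit_trend(times, usage)
    if slope <= 0:
        return None  # no upward aging trend detected
    return (capacity - intercept) / slope

# made-up monitor samples: memory grows ~2 MB per hour from a 100 MB baseline
hours = list(range(10))
mem = [100 + 2 * h for h in hours]
print(time_to_exhaustion(hours, mem, capacity=300))  # prints 100.0
```

A production model would use a proper time series method with seasonality and confidence bounds, but the shape of the decision is the same: trigger rejuvenation before the predicted crossing time.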
E018132735
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. III (Jan - Feb. 2016), PP 27-35
www.iosrjournals.org
DOI: 10.9790/0661-18132735
Predicting software aging related bugs from imbalanced datasets
by using data mining techniques
Amir Ahmad
Faculty of Computing and Information Technology, King Abdulaziz University, Rabigh, Saudi Arabia
Abstract: Software aging bugs are associated with the life-span of software. Rebooting is one solution to this problem; however, it is time consuming and wastes resources. These bugs are difficult to detect during the time-limited software testing process. Data mining techniques can be useful for predicting whether a piece of software has aging-related bugs. The available datasets of software aging bugs present a challenge because they are imbalanced: the number of data points with bugs is very small compared to the number of data points with no bugs, yet it is the rare class (bugs) that is important to predict. In this paper we carried out experiments with a dataset of aging-related bugs found in the open-source MySQL DBMS. Data mining techniques developed for imbalanced datasets were compared with general data mining techniques, using various performance measures. The results suggest that the techniques developed for imbalanced datasets are more useful for correctly predicting data points related to aging-related bugs; in particular, they outperformed general techniques on the G-mean, an important performance measure for imbalanced datasets.
Keywords: Aging-related Bugs, Data mining, Imbalanced datasets, Performance measures.
I. Introduction
Software engineering deals with the design, development, and maintenance of software [1, 2]. Software bugs are defects that occur due to errors in the software design or development process [3]; because of them, software produces incorrect or unexpected results. As software now plays a very important role, software bugs can lead to losses of money and effort. For example, in 1996 the $1.0 billion Ariane 5 rocket was destroyed because of a software bug [4]. Bug prevention is the first step in software quality assurance; tasks like training, requirements and code reviews, and other activities that promote quality software fall under this step. Software testing is an important part of the software development process [1, 2]: it finds bugs in a given piece of software by running it against given test cases. A large number of bugs are generally detected during testing, and the software design and code are corrected to remove them. However, software testing is a time-consuming process and not all possible test cases can be tried, so some bugs may still go undetected. Some bugs are detected by customers, and the cost of removing these bugs is higher; it is commonly believed that the earlier a defect is found, the cheaper it is to fix. Efforts are being made to make software testing faster, cheaper, and more reliable. Test automation is an important step in this direction, in which dedicated software is used for testing instead of a human tester [5].
Software aging is related to the life-span of software. As software runs for longer, it shows degraded performance, an increased failure rate, and system hangs or crashes [6, 7]. Aging effects can only be detected while the system is running. They occur because of software faults called Aging-Related Bugs (ARBs) [6, 7]. However, because these errors manifest only over time, it is difficult to fix the underlying faults for good during the time-limited testing process. Rebooting is a short-term solution; the aging effect cannot be reversed without such external intervention. ARBs can produce resource leakage, unterminated threads and processes, unreleased files and locks, numerical errors, and disk fragmentation [6, 7].
Data mining techniques analyse huge datasets [8, 9]. Supervised learning and unsupervised learning are two important data mining problems. In supervised learning, the output is given: the task is to create a model from training data (data points with outputs) such that when a new data point arrives without an output, the model predicts it. When the output is a class label, the problem is called classification; when it is a real number, the problem is called regression. Decision trees, Naïve Bayes classifiers, neural networks, etc. are popular classifiers [8, 9]. In clustering, the outputs of data points are not provided; the task is to group data points into clusters such that points within a cluster are more similar to each other than to points in other clusters. K-means and hierarchical clustering are popular clustering methods [8, 9].
The problem discussed in this paper is a classification problem, as we want to predict whether given software has bugs or not.
Classifier ensembles are methods to improve the accuracy of classification algorithms. They are combinations of accurate and diverse classifiers, and the final result of an ensemble depends on the outputs of the individual classifiers. Classifier ensembles generally perform better than single classifiers [10]. Bagging [11] and Boosting [12] are two popular ensemble methods.
Some datasets have the class-imbalance problem [13]: in a two-class setting, most of the data points belong to one class. This is also called the imbalanced dataset problem. The aim is to detect a rare but important case, and classification algorithms may not perform well on such datasets. Different approaches have been suggested to overcome this problem. Undersampling the majority class and oversampling the minority class are two popular preprocessing approaches [13]. Training a classifier with a cost matrix is another: the cost matrix gives more weight to classification errors on minority-class points, shifting the decision threshold towards the minority class; in other words, the trained model is biased towards the minority class [13].
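As a minimal sketch of the undersampling idea described above (a toy helper written for illustration, not a specific library implementation):

```python
import random

def undersample_majority(X, y, seed=0):
    # Keep every minority (1) point; keep an equal-sized random
    # subset of the majority (0) points and drop the rest.
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(y) if lab == 1]
    majority = [i for i, lab in enumerate(y) if lab == 0]
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]

# Toy data with a 10:1 imbalance, similar in spirit to the ARB setting.
X = [[float(i)] for i in range(22)]
y = [0] * 20 + [1] * 2
Xb, yb = undersample_majority(X, y)
print(len(yb), sum(yb))  # 4 2 -> two points per class
```

Oversampling is the mirror image (minority points are duplicated or synthesized), and the cost-matrix approach achieves a similar bias without changing the data at all.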
Software repositories contain datasets about software projects. Mining these datasets can uncover interesting and actionable information [14, 15] that software practitioners can use to improve the quality of software projects, and mining software engineering data has emerged as a research direction in test automation. Data mining over software metrics has been widely used for predictive and explanatory purposes. These techniques try to find relationships between software metrics and the bugs in a program, so they can predict which software programs are more likely to have bugs without any manual testing [14, 15].
Czibula et al. [16] apply association rules to predict software defects. Moeyersoms et al. [17] use decision tree ensembles with a rule extraction algorithm for comprehensible software fault and effort prediction. El-Sebakhy [18] applies a functional-networks forecasting framework to forecast software development effort. Dejaeger et al. [19] present a comparative study of various data mining techniques for software effort estimation; their study suggests that least squares regression combined with a logarithmic transformation performs best.
Cotroneo et al. [20] use various data mining techniques to predict which source code files are more prone to Aging-Related Bugs; despite the fact that the datasets are imbalanced, they use normal data mining techniques. In this paper, a detailed study of general classification algorithms and algorithms for imbalanced data is carried out on an Aging-Related Bugs dataset.
This paper is organized as follows. In Section II, we discuss the data mining techniques and the dataset used in the study. Section III describes the experimental setup and performance measures. The results are discussed in Section IV, and conclusions and future work are presented in Section V.
II. Material & Methods
We used various general data mining techniques and data mining techniques for imbalanced datasets in
our study. In this section, we will discuss these techniques. We will also discuss about the dataset used in the
study.
2.1 General Data Mining Techniques
i. Decision Trees - Decision trees are very popular classifiers [8, 9]. They create rules which humans can understand easily. The trees are built by recursive partitioning: starting from the root node, the data points available at each node are split into different branches according to a chosen split criterion, e.g. information gain ratio or the Gini index. Splitting terminates when a stopping criterion is reached. C4.5 [21] is a very popular decision tree; it uses the information gain ratio criterion.
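As an illustration of such split criteria, the sketch below computes the information gain of a candidate split in plain Python. C4.5 itself normalizes this quantity by the split information to obtain the gain ratio; that step is omitted here for brevity.

```python
import math

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(parent, left, right):
    # Reduction in entropy from splitting `parent` into `left` and `right`.
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

parent = [0, 0, 1, 1]
print(information_gain(parent, [0, 0], [1, 1]))  # 1.0, a perfect split
```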
ii. Multilayer Perceptron (MLP) - Neural network classifiers are inspired by biological neural networks [8, 9]. They consist of interconnected nodes, with weights representing the connection strengths between neurons, and can approximate all kinds of decision boundaries [8, 9]. They are very popular because of their high accuracy; however, they are slow, and parameter selection is an important step, as improper parameters can reduce their accuracy.
iii. Bagging - Bagging is a method to create classifier ensembles [11]. It is a general method and can be used with any classifier. As discussed in Section I, classifier ensembles need accurate and diverse classifiers; bagging creates diverse classifiers by resampling. In each round, a training dataset is created by sampling from the training data uniformly and with replacement, so some data points are not selected while others are selected more than once. This process creates diverse datasets; classifiers trained on them are diverse and hence form accurate ensembles.
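The bootstrap resampling at the heart of bagging can be sketched as follows, with a trivial majority-vote combiner; real bagging trains a full classifier (e.g. a C4.5 tree) on each resampled set.

```python
import random

def bootstrap(X, y, rng):
    # One bagging round: draw len(X) indices uniformly WITH replacement.
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def majority_vote(predictions):
    # Combine the ensemble members' class votes.
    return max(set(predictions), key=predictions.count)

rng = random.Random(1)
X = list(range(8))
y = [0, 0, 0, 0, 1, 1, 1, 1]
Xs, ys = bootstrap(X, y, rng)
assert len(Xs) == len(X)         # same size as the original training set
assert set(Xs) <= set(X)         # only original points, some repeated
print(majority_vote([0, 1, 1]))  # 1
```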
iv. AdaBoost.M1 - AdaBoost.M1 is a popular classifier ensemble method [12]. In each round of training, weights are assigned to the data points such that hard-to-classify points get more weight; in other words, in subsequent rounds the classifiers concentrate more on points that are difficult to predict. It is a general method that works with all kinds of classifiers, though it is most successful with decision trees and neural networks. AdaBoost.M1 has problems with datasets containing class noise.
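The reweighting step can be sketched as below; `adaboost_reweight` is a hypothetical helper that follows the AdaBoost.M1 scheme of scaling the weights of correctly classified points by beta = error / (1 - error) and renormalizing, so misclassified points end up with relatively more weight.

```python
def adaboost_reweight(weights, correct, error):
    # One AdaBoost.M1 round: shrink the weights of correctly classified
    # points (beta < 1 when the learner beats chance), then renormalize.
    beta = error / (1.0 - error)
    new_w = [w * (beta if ok else 1.0) for w, ok in zip(weights, correct)]
    total = sum(new_w)
    return [w / total for w in new_w]

w = [0.25] * 4
w = adaboost_reweight(w, [True, True, True, False], error=0.25)
print(round(w[3], 3))  # 0.5: the misclassified point now carries the most weight
```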
2.2 Data mining techniques for imbalanced datasets
i. C4.5 for imbalanced datasets – Cost-sensitive classifiers are developed for imbalanced datasets [13]. For normal datasets, all data points have the same weight when calculating accuracy, whereas cost-sensitive classifiers give more weight to errors on minority-class points; in other words, a classification error on a majority-class point is preferable to one on a minority-class point. The classifier tries to minimize the error function under the cost matrix. Ting [22] proposes cost-sensitive trees in which a weight is assigned to each data point, and the initial data point weights determine the type of tree. The selection of the cost matrix is an important step in the algorithm.
ii. Multilayer Perceptron with Back-Propagation Training, Cost-Sensitive (NNCS) - Cost-sensitive learning is also useful for neural networks. Zhou et al. [23] show that the threshold-moving approach works well for neural networks on imbalanced datasets. For normal datasets, the classification threshold lies in the middle of the output range; threshold-moving shifts it toward the inexpensive classes, so that data points with higher misclassification costs (minority-class points) become harder to misclassify.
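A minimal sketch of the threshold-moving idea, assuming calibrated class probabilities and the standard two-class cost-derived cutoff (an illustration of the principle, not Zhou et al.'s exact implementation):

```python
def classify(prob_positive, cost_fn=1.0, cost_fp=1.0):
    # With equal costs the cutoff is 0.5. Raising the cost of missing a
    # minority (positive) point lowers the cutoff, so borderline points
    # are pushed into the positive class.
    threshold = cost_fp / (cost_fn + cost_fp)
    return int(prob_positive >= threshold)

print(classify(0.30))                # 0: below the default 0.5 cutoff
print(classify(0.30, cost_fn=9.0))   # 1: the cutoff drops to 0.1
```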
iii. SMOTEbagging – Undersampling the majority class and oversampling the minority class are used to reduce class imbalance. Chawla et al. [24] propose an oversampling approach, the Synthetic Minority Over-sampling Technique (SMOTE), in which the minority class is oversampled by creating "synthetic" examples rather than by oversampling with replacement: each minority-class sample introduces synthetic examples along the line segments joining it to any or all of its k nearest minority-class neighbours. They show that combining this oversampling technique with undersampling of the majority class produces good results. In SMOTEbagging [25], in each round a training dataset is created by the bagging process and the class imbalance is then removed by the SMOTE process; a classifier is trained on this dataset, and the classifiers from different rounds are combined into an ensemble.
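The core SMOTE step, generating a synthetic point on the segment between a minority sample and one of its minority neighbours, can be sketched as (the k-nearest-neighbour search itself is omitted):

```python
import random

def smote_point(x, neighbor, rng):
    # Place a synthetic point at a random position on the segment
    # between a minority sample and one of its minority neighbours.
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]

rng = random.Random(0)
s = smote_point([0.0, 0.0], [1.0, 2.0], rng)
# s lies on the segment, so its coordinates keep the 1:2 proportion.
assert abs(s[1] - 2 * s[0]) < 1e-12 and 0.0 <= s[0] <= 1.0
```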
iv. SMOTEboost – SMOTEboost combines the SMOTE sampling technique with boosting [26]. Unlike standard boosting, where all misclassified examples are given equal weight, in SMOTEBoost the weight update is accompanied by oversampling the minority class with synthetic examples, so that the minority class is better represented in each round.
v. MSMOTEbagging – The Modified Synthetic Minority Over-sampling Technique (MSMOTE) [27] is a modified version of SMOTE. It divides the minority-class data points into three groups (safe, border, and latent noise instances) by calculating the distances among all data points. For safe points, the algorithm randomly selects one of the k nearest neighbours; for border points, it selects only the nearest neighbour; for latent noise instances, it does nothing. MSMOTE is combined with bagging (as SMOTE is combined with bagging to create SMOTEbagging) to create MSMOTEbagging.
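The grouping rule can be sketched as follows, assuming the labels of a minority point's k nearest neighbours have already been computed (a hypothetical helper illustrating the rule described above, not the full MSMOTE algorithm):

```python
def msmote_category(neighbor_labels, minority=1):
    # MSMOTE-style grouping of a minority point by its k nearest neighbours:
    # all minority -> "safe"; none minority -> "latent noise"; mixed -> "border".
    m = sum(1 for lab in neighbor_labels if lab == minority)
    if m == len(neighbor_labels):
        return "safe"
    if m == 0:
        return "latent noise"
    return "border"

print(msmote_category([1, 1, 1]))  # safe
print(msmote_category([0, 1, 0]))  # border
print(msmote_category([0, 0, 0]))  # latent noise
```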
vi. MSMOTEboosting – MSMOTEboosting [28] combines the MSMOTE sampling method with AdaBoost, so that the minority class is better represented in each round of AdaBoost. On many datasets, MSMOTEboosting has shown better results than SMOTEboost.
vii. Underbagging – Undersampling reduces the number of majority-class data points so that the class imbalance is reduced. In Underbagging, each training dataset is created by randomly undersampling the majority class, and a classifier is built on it; the diverse classifiers created this way are combined into an Underbagging ensemble [29].
viii. Overbagging – Conversely, oversampling increases the number of minority-class data points to reduce the imbalance. In Overbagging, each training dataset is created by randomly oversampling the minority class, and a classifier is built on it; the resulting diverse classifiers are combined into an Overbagging ensemble [25].
2.3 Dataset Used In The Study
Cotroneo et al. [20] present a dataset of ARBs found in the MySQL DBMS (dataset_mysql_innodb: http://wpage.unina.it/roberto.natella/datasets/aging_related_bugs_metrics/dataset_mysql_innodb.arff). The dataset has 402 files: 370 files have no bugs (class 0), whereas 32 files have aging-related bugs (class 1), so it can be treated as a two-class problem. The data is imbalanced, with a minority-to-majority class ratio of 1 to 11.6. Each file is represented by 83 attributes. The first attribute is the name of the file. 49 attributes come from "Program size" metrics, related to the number of lines of code, declarations, statements, and files. 18 attributes come from "McCabe's cyclomatic complexity" metrics, which describe the control flow graphs of functions and methods. "Halstead" metrics, which count the operands and operators in the program, are represented by 9 attributes. "Aging-related" metrics, related to memory usage, are represented by 6 attributes. For detailed information about the attributes, readers may refer to [20].
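A quick check of the class imbalance described above; loading the ARFF file itself is omitted, and the label counts are taken directly from the dataset description:

```python
def imbalance_ratio(labels, minority=1):
    # Ratio of majority to minority points. The MySQL ARB dataset has
    # 370 non-buggy vs 32 buggy files, i.e. roughly 11.6 : 1.
    n_min = sum(1 for y in labels if y == minority)
    return (len(labels) - n_min) / n_min

labels = [0] * 370 + [1] * 32
print(round(imbalance_ratio(labels), 1))  # 11.6
```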
III. Experiments
Experiments were carried out using the KEEL software [30]. Table 1 lists the KEEL implementations of the different data mining algorithms. Default parameters were used in the experiments.
Table 1 - Implementation of different data mining algorithms in the KEEL software.

Name of the algorithm   KEEL implementation
C4.5 [21]               Decision trees
MLP-normal [8, 9]       Multilayer Perceptron with Back-propagation Training
Bagging [11]            Bootstrap Aggregating with C4.5 Decision Tree as Base Classifier
Adaboost.M1 [12]        Adaptive Boosting First Multi-Class Extension with C4.5 Decision Tree as Base Classifier
C4.5-Imbalanced [22]    C4.5 Cost-Sensitive
NNCS [23]               Multilayer Perceptron with Back-propagation Training Cost-Sensitive
SMOTEbagging [25]       Synthetic Minority Over-sampling Technique Bagging with C4.5 Decision Tree as Base Classifier
SMOTEadaboost [26]      Synthetic Minority Over-sampling Technique Boosting with C4.5 Decision Tree as Base Classifier
MSMOTEbagging [27]      Modified Synthetic Minority Over-sampling Technique Bagging with C4.5 Decision Tree as Base Classifier
MSMOTEboosting [28]     Modified Synthetic Minority Over-sampling Technique Boosting with C4.5 Decision Tree as Base Classifier
Overbagging [25]        Over-sampling Minority Class Bagging with C4.5 Decision Tree as Base Classifier
Underbagging [29]       Under-sampling Majority Class Bagging with C4.5 Decision Tree as Base Classifier
3.1 Performance Measures
There are many performance measures for classifiers; however, not all of them are good for imbalanced datasets. First, we discuss various performance measures; then we explain which ones are suitable for imbalanced datasets [13]. The present problem is a two-class problem: the minority class (Bug) is taken as the positive class, and the majority class (Non-Bug) as the negative class. Table 2 presents the confusion matrix for this problem.

Table 2 - Confusion Matrix

                               True Class
                     Positive             Negative
Predicted  Positive  True Positive (TP)   False Positive (FP)
Class      Negative  False Negative (FN)  True Negative (TN)
3.1.1 Definitions of various performance measures
1. Accuracy – Accuracy measures the fraction of data points correctly predicted. It is calculated by the following formula:
Accuracy = (TP + TN)/(TP + FN + FP + TN)
2. Recall (True positive rate (TPR)) – Recall is the fraction of positive-class data points predicted as positive:
Recall (TPR) = TP/(TP + FN)
3. True negative rate (TNR) – TNR is the fraction of negative-class data points predicted as negative:
TNR = TN/(TN + FP)
4. Average of TNR and TPR – The average of TNR and TPR is a performance measure that depends on both:
Average = (TNR + TPR)/2
5. G-mean – The geometric mean of TNR and TPR also depends on both measures and is a very popular performance measure for imbalanced datasets:
G-mean = (TPR × TNR)^(1/2)
6. Precision – Precision is the fraction of predicted positives that are actually positive:
Precision = TP/(TP + FP)
7. F-measure – The F-measure depends on recall and precision and is another important performance measure:
F-measure = 2 × Precision × Recall/(Precision + Recall)
Not all performance measures are good for imbalanced datasets. Accuracy is not a good measure, because a classifier that assigns all points to the majority class will appear very accurate, which is not desirable. The most important requirement is that the classifier predict minority-class data points well (high TPR) while at the same time maintaining a high TNR. The G-mean captures this and is a good measure for imbalanced datasets, as is the F-measure, which depends on recall and precision [13]. The user should select the performance measure appropriate to the application.
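The measures above can be computed directly from a confusion matrix. As a sanity check, the sketch below reproduces the C4.5 row of Table 3 from the confusion matrix reported in Table 4 (TP = 4, FP = 9, FN = 28, TN = 361):

```python
import math

def measures(tp, fp, fn, tn):
    # All performance measures defined above, from one confusion matrix.
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "tpr": tpr,
        "tnr": tnr,
        "average": (tpr + tnr) / 2,
        "g_mean": math.sqrt(tpr * tnr),
        "precision": precision,
        "f_measure": 2 * precision * tpr / (precision + tpr),
    }

m = measures(tp=4, fp=9, fn=28, tn=361)  # C4.5 confusion matrix (Table 4)
print(round(m["tpr"], 4), round(m["g_mean"], 4))  # 0.125 0.3492
```

The high accuracy (about 0.908) next to the very low TPR (0.125) illustrates exactly why accuracy is misleading on imbalanced data.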
IV. Results And Discussion
The experiments were carried out using 5-fold cross-validation. The results are presented in Table 3 and Figs. 1-5. They suggest that the accuracy of the general machine learning algorithms is high (> 0.9) and their TNR is also quite high; however, their TPR is generally low. This is because these algorithms predict most of the test data points as Non-Bug: since the number of Bug data points is very small (32 out of 402), the classification accuracy remains high even though the algorithms fail to capture the Bug data points correctly, which is the reason for the low TPR. The G-mean, the most important performance measure for imbalanced datasets [13], is low for these algorithms. The key requirement for such datasets is to predict the minority class correctly (high TPR) without much affecting the prediction accuracy on the majority class (high TNR); the G-mean captures this, and a low G-mean means the algorithms fail to do it. These algorithms generally also have a low F-measure.
Table 3 - A comparative study of various data mining techniques with different performance measures. The best result for each performance measure is marked with *.

Method            Accuracy  TPR(Recall)  TNR      (TPR+TNR)/2  G-mean   Precision  F-measure
General machine learning methods:
C4.5              0.9080    0.1250       0.9757   0.5503       0.3492   0.3077     0.1778
MLP-normal        0.6045    0.5938       0.6054   0.5996       0.5995   0.1152     0.1929
Bagging           0.9080    0.0625       0.9811*  0.5218       0.2476   0.2222     0.0976
Adaboost.M1       0.9154*   0.2813       0.9703   0.6258       0.5224   0.4500*    0.3462*
Machine learning methods for imbalanced datasets:
C4.5-Imbalanced   0.6841    0.9375*      0.6622   0.7998*      0.7879*  0.1935     0.3209
NNCS              0.5821    0.6875       0.5730   0.6302       0.6276   0.1222     0.2075
SMOTEbagging      0.6741    0.8750       0.6568   0.7659       0.7581   0.1806     0.2995
SMOTEadaboost     0.7488    0.6875       0.7541   0.7208       0.7200   0.1947     0.3034
MSMOTEbagging     0.7164    0.7813       0.7108   0.7460       0.7452   0.1894     0.3049
MSMOTEboosting    0.7463    0.6250       0.7568   0.6909       0.6877   0.1818     0.2817
Overbagging       0.8234    0.3750       0.8622   0.6186       0.5686   0.1905     0.2526
Underbagging      0.6990    0.8750       0.6838   0.7794       0.7735   0.1931     0.3164
Figure 1 - Accuracy of different data mining algorithms. General data mining algorithms (white columns) show better accuracy than the data mining techniques for imbalanced datasets (black columns).
Figure 2 - G-mean of different data mining algorithms. Data mining techniques for imbalanced datasets (black columns) show a better G-mean than the general data mining algorithms (white columns).
Figure 3 - TPR of different data mining algorithms. Data mining techniques for imbalanced datasets (black columns) show a better TPR than the general data mining algorithms (white columns).
Figure 4 - TNR of different data mining algorithms. General data mining algorithms (white columns) show a better TNR than the data mining techniques for imbalanced datasets (black columns).
Figure 5 - F-measure of different data mining algorithms. Data mining techniques for imbalanced datasets (black columns) show a better F-measure than the general data mining algorithms (white columns).
Table 4 - Confusion matrix for the C4.5 decision tree.

                           True Class
                     Bug     No-Bug
Predicted  Bug       4       9
Class      No-Bug    28      361

Table 5 - Confusion matrix for AdaBoost.M1.

                           True Class
                     Bug     No-Bug
Predicted  Bug       9       11
Class      No-Bug    23      359

Table 6 - Confusion matrix for the C4.5-Imbalanced decision tree.

                           True Class
                     Bug     No-Bug
Predicted  Bug       30      125
Class      No-Bug    2       245
To understand this better, the confusion matrix of the C4.5 classifier is presented in Table 4. The classifier predicted only 4 of the 32 Bug data points correctly, hence the low TPR. Only 13 data points were classified as Bug; in other words, almost all points (389 out of 402) were predicted as Non-Bug.
Adaboost.M1 gives the best F-measure. Its confusion matrix is presented in Table 5: the algorithm predicted 9 of the 32 Bug data points, but its precision is relatively high (0.4500), and this high precision leads to a high F-measure. Its G-mean, however, is relatively low (0.5224).
The accuracy of the data mining algorithms for imbalanced datasets is relatively low; however, their G-mean is generally high (in most cases above 0.7), with C4.5 for imbalanced datasets the best (0.7998). The TPR of these algorithms is also relatively high; C4.5 for imbalanced datasets gave the best TPR (0.9375). This suggests that these algorithms can predict a large number of Bug data points without affecting the TNR much. To analyse this, the confusion matrix of the C4.5 decision tree for imbalanced datasets is presented in Table 6. It shows that the algorithm predicted 30 of the 32 Bug data points correctly, but it also predicted a large number of Non-Bug data points as Bug, which leads to a lower TNR compared with the general data mining algorithms. The F-measure of these algorithms is also relatively high.
It is important to note that selecting performance measures for a given problem is an important step. The G-mean and F-measure are the best performance measures for imbalanced datasets, and the data mining algorithms for imbalanced datasets performed well on both. The results suggest that for the Aging-Related Bug problem, in which the number of Bug data points is small, data mining algorithms for imbalanced datasets should be used.
V. Conclusion
In this paper, we applied data mining techniques for imbalanced datasets to predict ARBs in a dataset from an open-source software project, the MySQL DBMS [20]. The results suggest that these algorithms are more useful than general data mining algorithms for predicting ARBs: they achieve better prediction (TPR) of Bug data points (the minority class) and a better G-mean. In the future, more datasets will be used, and feature selection techniques will be applied to remove insignificant features. One-class classification is an approach for analysing datasets in which points of only one class are available [31]; we will also investigate the application of one-class classifiers for predicting ARBs.
References
[1]. S. L. Pfleeger and J. M. Atlee, Software Engineering: Theory and Practice, Prentice Hall; 4 edition, 2009.
[2]. Ian Sommerville, Software Engineering, Pearson; 9 edition, 2010.
[3]. M. McDonald, R. Musson and R. Smith, The Practical Guide to Defect Prevention, Microsoft Press, 2007.
[4]. M. Dowson, The Ariane 5 Software Failure, Software Engineering Notes 22 (2): 84. 1997.
[5]. D. Huizinga, Dorota and A. Kolawa, Automated Defect Prevention: Best Practices in Software Management. Wiley-IEEE
Computer Society Press. 2007.
[6]. M. Grottke, R. Matias, R., and K. S. Trivedi, The fundamentals of software aging, In IEEE Proceedings of Workshop on Software
Aging and Rejuvenation, in conjunction with ISSRE. Seattle, WA. 2008.
[7]. D. Controneo, R. Natella, R. Pietrantuono, and S. Russo, A survey on software aging and rejuvenation studies, IACM Journal on
Emerging Technologies in Computing Systems (JETC), 2014.
[8]. J. Han, M. Kamber and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann; 2011.
I. H. Witten, E. Frank, M. A. Hall (Author) Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann;
2011.
[9]. L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.
[10]. L. Breiman. Bagging predictors. Machine Learning 24, 1996, pp. 123-140.
[11]. R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning 37, 1999, pp.
297-336.
[12]. N. Chawla, Data mining for imbalanced datasets: An overview. In. Maimon O., Rokach L. (eds.) The Data Mining and Knowledge
Discovery Handbook, Springer 2005, pp. 853-867.
[13]. A. E. Hassan and T. Xie. Mining software engineering data. In Proceedings of the 32nd ACM/IEEE International Conference on
Software Engineering, 2010, pp. 503–504.
[14]. H. H. Kagdi, M. L. Collard, and J. I. Maletic. A survey and taxonomy of approaches for mining software repositories in the context
of software evolution. Journal of Software Maintenance, 19(2), 2007, pp. 77–131.
[15]. G. Czibula, , Z. Marian and I. G. Czibula, Software defect prediction using relational association rule mining. Information
Sciences, 264, 2014, pp. 260–278.
[16]. J. Moeyersoms, E. Junqué de Fortuny, K. Dejaeger, B. Baesens and D. Martens, Comprehensible software fault and effort
prediction: A data mining approach, Journal of Systems and Software, Volume 100, Feb. 2015, pp. 80–90.
[17]. E. A. El-Sebakhy, Functional networks as a novel data mining paradigm in forecasting software development efforts, Expert
Systems with Applications, 38(3), March 2011, pp. 2187–2194.
[18]. K. Dejaeger, W. Verbeke, D. Martens, and B. Baesens. Data mining techniques for software effort estimation: A comparative study.
IEEE TSE, 38(2), 2012, pp. 375–397.
[19]. D. Cotroneo, R. Natella, and R. Pietrantuono, Predicting aging-related bugs using software complexity metrics. Performance
Evaluation, 70(3), 2013, pp. 163–178.
[20]. J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kauffman, 1993.
[21]. K.M. Ting. An instance-weighting method to induce cost-sensitive trees. IEEE Transactions on Knowledge and Data Engineering
14:3, 2002, pp. 659-665.
[22]. Z.-H. Zhou, X.-Y. Liu. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE
Transactions on Knowledge and Data Engineering 18:1, 2006, pp. 63-77.
[23]. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. Journal
of Artificial Intelligence Research, 16, 2002, pp. 321–357.
[24]. S. Wang, X. Yao. Diversity analysis on imbalanced data sets by using ensemble models. IEEE Symposium Series on Computational
Intelligence and Data Mining (IEEE CIDM 2009), Nashville, TN, USA, 2009, pp. 324-331.
[25]. N.V. Chawla, A. Lazarevic, L.O. Hall, K.W. Bowyer. SMOTEBoost: Improving prediction of the minority class in boosting. 7th
European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003), Cavtat-Dubrovnik,
Croatia, 2003, pp. 107-119.
[26]. S. Wang, X. Yao. Diversity analysis on imbalanced data sets by using ensemble models. IEEE Symposium Series on Computational
Intelligence and Data Mining, Nashville TN, USA, 2009, pp. 324-331.
[27]. S. Hu, Y. Liang, L. Ma, Y. He. MSMOTE: Improving classification performance when training data is imbalanced. 2nd
International Workshop on Computer Science and Engineering, Qingdao, China, 2009, pp. 13-17.
[28]. R. Barandela, R.M. Valdovinos, J.S. Sánchez. New applications of ensembles of classifiers. Pattern Analysis and Applications 6,
2003, pp. 245-256.
[29]. J. Alcalá-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. García, L. Sánchez, F. Herrera. KEEL Data-Mining Software Tool: Data Set
Repository, Integration of Algorithms and Experimental Analysis Framework. Journal of Multiple-Valued Logic and Soft
Computing 17:2-3, 2011, pp. 255-287.
[30]. S. S. Khan and M. G. Madden. A survey of recent trends in one class classification. Lecture Notes in Computer Science, 6206,
2010, pp. 188–197.