We propose a set of methods to classify vendors based on estimated CPU performance and to predict CPU performance from hardware components. For vendor classification, we use the highest and lowest estimated performance and the frequency of occurrence of each vendor to create classification zones. These zones can be used to identify vendors that manufacture hardware satisfying a given performance requirement. For performance prediction, we use multi-layered neural networks, which account for nonlinearity in performance data. Various neural network architectures are analysed in comparison with linear, quadratic, and cubic regression. Experiments show that neural networks obtain low error and high correlation between predicted and published performance values, while cubic regression produces higher correlation than neural networks when more data is used for training than for testing. An analysis of how the neural network architecture affects prediction is also performed. The proposed methods can be used to identify suitable hardware replacements.
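The classification zones described above can be illustrated with a minimal Python sketch. It assumes a zone is simply the interval between a vendor's lowest and highest estimated performance, with the vendor's record count as its frequency; the abstract does not give the exact construction, so the function names, the zone layout, and the sample records are all hypothetical.

```python
from collections import defaultdict


def build_zones(records):
    """Group (vendor, estimated_performance) pairs into per-vendor zones.

    Assumption: a zone is the [low, high] performance interval of a vendor,
    plus the number of records (frequency of occurrence) for that vendor.
    """
    perf_by_vendor = defaultdict(list)
    for vendor, perf in records:
        perf_by_vendor[vendor].append(perf)
    return {
        vendor: {"low": min(vals), "high": max(vals), "count": len(vals)}
        for vendor, vals in perf_by_vendor.items()
    }


def vendors_for(zones, required_perf):
    """Return vendors whose zone contains required_perf, most frequent first."""
    hits = [v for v, z in zones.items() if z["low"] <= required_perf <= z["high"]]
    # Ties on frequency are broken alphabetically to keep the order stable.
    return sorted(hits, key=lambda v: (-zones[v]["count"], v))


# Hypothetical records: (vendor, estimated performance).
sample = [("A", 20), ("A", 120), ("A", 60), ("B", 100), ("B", 400), ("C", 10), ("C", 30)]
zones = build_zones(sample)
candidates = vendors_for(zones, 110)
```

Here `candidates` lists the vendors whose observed performance range covers a requirement of 110, which is one simple way such zones could support a search for replacement hardware.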
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM... IJNSA Journal
Building practical and efficient intrusion detection systems for computer networks is important in industry today, and machine learning provides a set of effective algorithms for detecting network intrusions. To find appropriate algorithms for building such systems, it is necessary to evaluate various types of machine learning algorithms against specific criteria. In this paper, we propose a novel evaluation formula that incorporates six indexes into a comprehensive measurement: precision, recall, root mean square error, training time, sample complexity, and practicability. The aim is to find algorithms that have a high detection rate and low training time, need fewer training samples, and are easy to use, that is, whose models are easy to construct, understand, and analyze. A detailed evaluation process is designed to obtain all necessary assessment indicators, and six machine learning algorithms are evaluated. Experimental results show that Logistic Regression has the best overall performance.
Function Point Software Cost Estimates using Neuro-Fuzzy technique ijceronline
Software estimation accuracy is among the greatest challenges for software developers. Because a neuro-fuzzy system can approximate non-linear functions with greater precision, it is used as a soft computing approach that builds a model by learning the relationship from its training data. The approach presented in this paper is independent of the nature and type of estimation. Function points are used as the algorithmic model, and an attempt is made to validate the soundness of the neuro-fuzzy technique using ISBSG and NASA project data.
A Defect Prediction Model for Software Product based on ANFIS IJSRD
Artificial intelligence techniques are increasingly used in classification and prediction tasks such as environmental monitoring, stock exchange conditions, biomedical diagnosis, and software engineering. However, selecting training criteria when designing artificial intelligence models for prediction remains a challenge. This work focuses on developing a defect prediction mechanism using the software metric data of KC1. We use a subtractive clustering approach to generate the fuzzy inference system (FIS). The FIS rules are generated at different radii of influence of the input attribute vectors, and the resulting rules are further tuned with the ANFIS technique to predict the number of defects in a software project using a fuzzy logic system.
A time efficient and accurate retrieval of range aggregate queries using fuzz... IJECEIAES
The massive growth of big data makes it difficult to analyse and retrieve useful information from the available data. Existing approaches cannot guarantee efficient retrieval of data from the database. In existing work, stratified sampling is used to partition tables in terms of stratification variables. However, the k-means clustering algorithm cannot guarantee efficient retrieval, because choosing centroids in a large volume of data is difficult, and limited knowledge of the stratification variable may lead to less efficient partitioning of tables. The proposed methodology overcomes this problem by replacing k-means clustering with FCM clustering, which can cluster large volumes of data that are similar in nature. The stratification problem is overcome by introducing a post-stratification approach, which leads to efficient selection of the stratification variable. The methodology yields an efficient retrieval process for user queries, with less time and more accuracy.
High performance intrusion detection using modified k mean & naïve bayes eSAT Journals
Abstract
Internet technology is growing at an exponential rate, making data security of computer systems more complex and critical. Multiple methodologies have been implemented for this purpose in recent times, as detailed in [1], [3]. The availability of larger bandwidth has connected multiple large computer server networks worldwide, increasing the need to secure data, and an intrusion detection system (IDS) is one of the most efficient techniques for maintaining the security of a computer system. The proposed system is designed to help identify malicious behavior and improper use of a computer system. In this report we propose a hybrid technique for intrusion detection using data mining algorithms. Our main objective is to perform a complete analysis of an intrusion detection dataset to test the implemented system. We propose a new methodology in which Modified k-mean is used for clustering and Naïve Bayes for classification. These two data mining techniques are used for intrusion detection in a large horizontally distributed database.
Keywords: Intrusion Detection, Modified K-Mean, Naïve Bayes
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Comparative Performance Analysis of Machine Learning Techniques for Software ... csandit
Machine learning techniques can be used to analyse data from different perspectives and enable developers to retrieve useful information. They have proven useful for software bug prediction. In this paper, a comparative performance analysis of different machine learning techniques is carried out for software bug prediction on publicly available data sets. Results show that most of the machine learning methods performed well on software bug datasets.
Intrusion Detection System Based on K-Star Classifier and Feature Set Reduction IOSR Journals
Abstract: Network security and intrusion detection systems (IDSs) are an important security-related research area. This paper applies the K-star algorithm with filtering analysis to build a network intrusion detection system. For our experimental analysis, and as a case study, we use the NSL-KDD dataset, a modified version of the KDDCup 1999 intrusion detection benchmark dataset. A two-class classification is implemented, with 66.0% of the data used for the training set and the remainder for the testing set. WEKA, a Java-based open source software package comprising a collection of machine learning algorithms for data mining tasks, is used in the testing process. The experimental results show that the proposed approach is very accurate, with a low false positive rate and a high true positive rate, and that it takes less learning time than other existing approaches to network intrusion detection.
Keywords: Information Gain, Intrusion Detection System, Instance-based classifier, K-Star, Weka.
Optimization of network traffic anomaly detection using machine learning IJECEIAES
In this paper, to optimize the process of detecting cyber-attacks, we propose two main optimizations: optimizing the detection method and optimizing the features. Both aim to increase accuracy and reduce the time needed for analysis and detection. For the detection method, we recommend the Random Forest supervised classification algorithm. The experimental results in section 4.1 support this choice for abnormal behavior detection: Random Forest performs much better than some other detection algorithms on all measures. For feature optimization, we propose data dimensionality reduction techniques such as information gain, principal component analysis, and the correlation coefficient method. Our results indicate that optimizing the cyber-attack detection process does not require advanced algorithms with complex and cumbersome computational requirements; instead, the choice of feature extraction and optimization algorithm, and of attack classification and detection algorithm, should depend on the monitoring data.
Minkowski Distance based Feature Selection Algorithm for Effective Intrusion ... IJMER
An intrusion detection system (IDS) plays a major role in providing effective security to various types of networks. An IDS for networks needs an appropriate rule set for classifying network benchmark data into normal or attack patterns. Generally, each dataset is characterized by a large set of features. However, not all of these features are relevant or contribute fully to identifying an attack, since different attacks need different feature subsets for better detection accuracy. In this paper, an improved feature selection algorithm is proposed to identify the most appropriate subset of features for detecting certain attacks. The proposed method is based on Minkowski distance feature ranking and an improved exhaustive search that selects a better combination of features. The system has been evaluated using the KDD CUP 1999 dataset and the EMSVM [1] classifier. The experimental results show that the proposed system provides high classification accuracy and a low false alarm rate when applied to the reduced feature subsets.
Robust Fault-Tolerant Training Strategy Using Neural Network to Perform Funct... Eswar Publications
This paper introduces an efficient and robust training mechanism for a neural network that can be used to test the functionality of software. The traditional neural network setup is used, consisting of two phases: a training phase and an evaluation phase. The input test cases are used for training in the first phase; the trained network then predicts the outputs of untrained test cases. The test oracle measures the deviation between the outputs of the untrained and trained test cases and makes the final decision. Our framework can be applied to systems where the number of test cases outnumbers the functionalities or where the system under test is too complex. It can also be applied to test case development when the modules of a system become tedious after modification.
SAMPLING BASED APPROACHES TO HANDLE IMBALANCES IN NETWORK TRAFFIC DATASET FOR... cscpconf
Network traffic data is huge, varying, and imbalanced because the classes are not equally distributed. Machine learning (ML) algorithms for traffic analysis use samples from this data for training and to recommend actions to network administrators. Because of imbalances in the dataset, it is difficult to train ML algorithms for traffic analysis, and they may give biased or false results, seriously degrading their performance. Various techniques can be applied during sampling to minimize the effect of imbalanced instances. In this paper, various sampling techniques are analysed in order to compare the decrease in variation in imbalances of the network traffic datasets sampled for these algorithms. Parameters such as missing classes in samples and the probability of sampling the different instances are considered for comparison.
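The "missing classes in samples" problem mentioned above can be made concrete with a small sketch. It shows stratified sampling, one common technique for imbalanced data; the abstract does not say which sampling techniques the paper compares, so this choice, the function name, and the toy label distribution are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return sorted indices of a sample drawn separately from each class.

    Assumption: every class contributes at least one instance, so a rare
    class can never go missing from the sample.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for idxs in by_class.values():
        # Proportional allocation, with a floor of one instance per class.
        k = max(1, round(fraction * len(idxs)))
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


# Hypothetical, heavily imbalanced traffic labels.
labels = ["normal"] * 90 + ["dos"] * 8 + ["probe"] * 2
picked = stratified_sample(labels, 0.1)
classes_in_sample = {labels[i] for i in picked}
```

A plain random 10% sample of the same data could easily contain no `probe` instance at all, which is the kind of missing-class effect a comparison of sampling techniques would measure.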
An Application of Genetic Programming for Power System Planning and Operation IDES Editor
This work identifies a model in functional form, using curve fitting and genetic programming techniques, that can forecast present and future load requirements. Approximating an unknown function from sample data is an important practical problem. To forecast an unknown function using a finite set of sample data, a function is constructed to fit the sample data points. This process is called curve fitting. There are several methods of curve fitting. Interpolation is a special case of curve fitting in which an exact fit of the existing data points is expected. Once a model is generated, its acceptability must be tested. There are several measures of the goodness of a model: sum of absolute differences, mean absolute error, mean absolute percentage error, sum of squares due to error (SSE), mean squared error, and root mean squared error can all be used to evaluate models. Minimizing the sum of squared vertical distances of the points from the curve (SSE) is one of the most widely used methods. Two methods are presented, curve fitting and genetic programming, and they are compared on the basis of SSE.
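As a minimal illustration of curve fitting and the goodness-of-fit measures listed above, the following Python sketch fits a straight line by ordinary least squares and computes SSE, mean squared error, root mean squared error, and mean absolute error. The straight-line model and the sample points are assumptions chosen for brevity; they are not the load data or the models used in the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Vertical distances between the observed points and the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical sample points (for example, time step versus load).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)

sse_value = sum(r ** 2 for r in res)         # sum of squares due to error
mse_value = sse_value / len(res)             # mean squared error
rmse_value = math.sqrt(mse_value)            # root mean squared error
mae_value = sum(abs(r) for r in res) / len(res)  # mean absolute error
```

The same `residuals` list could be computed for any other candidate model, such as an expression evolved by genetic programming, so that the candidates are compared on an identical SSE basis.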
COMPUTER INTRUSION DETECTION BY TWO-OBJECTIVE FUZZY GENETIC ALGORITHM cscpconf
The purpose of this paper is to describe a two-objective fuzzy genetics-based learning algorithm and discuss its use in detecting intrusions in a computer network. Experiments were performed with the KDD-cup data set, which contains information on computer networks during normal and intrusive behavior. The performance of the final fuzzy classification system has been investigated using the intrusion detection problem as a high-dimensional classification problem. The task is formulated as an optimization problem with two objectives: to minimize the number of fuzzy rules and to maximize the classification rate. We present a two-objective genetic algorithm for finding non-dominated solutions of the fuzzy rule selection problem.
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks Editor IJCATR
Defects in the modules of software systems are a major problem in software development. A variety of data mining techniques are used to predict software defects, such as regression, association rules, clustering, and classification. This paper is concerned with classification-based software defect prediction. It investigates the effectiveness of a radial basis function neural network and a probabilistic neural network in terms of prediction accuracy and defect prediction. The conclusion drawn from this work is that the neural networks used here provide an acceptable level of accuracy but poor defect prediction ability. Probabilistic neural networks perform consistently better with respect to the two performance measures across all datasets. It may be advisable to use a range of software defect prediction models to complement each other rather than relying on a single technique.
EFFICIENT USE OF HYBRID ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM COMBINED WITH N... csandit
This research study proposes a novel method for automatic fault prediction from foundry data, introducing the so-called Meta Prediction Function (MPF). Kernel Principal Component Analysis (KPCA) is used for dimension reduction. Different algorithms are used to build the MPF, such as Multiple Linear Regression (MLR), Adaptive Neuro Fuzzy Inference System (ANFIS), Support Vector Machine (SVM), and Neural Network (NN). We use classical machine learning methods such as ANFIS, SVM, and NN for comparison with the proposed MPF. Our empirical results show that the MPF consistently outperforms the classical methods.
This research study proposes a novel method for automatic fault prediction from foundry data
introducing the so-called Meta Prediction Function (MPF). Kernel Principal Component
Analysis (KPCA) is used for dimension reduction. Different algorithms are used for building the
MPF such as Multiple Linear Regression (MLR), Adaptive Neuro Fuzzy Inference System
(ANFIS), Support Vector Machine (SVM) and Neural Network (NN). We used classical
machine learning methods such as ANFIS, SVM and NN for comparison with our proposed
MPF. Our empirical results show that the MPF consistently outperform the classical methods.
Performance Evaluation of a Network Using Simulation Tools or Packet TracerIOSRjournaljce
Today, the importance of information and accessing information is increasing rapidly. With the advancement of technology, one of the greatest means of achieving knowledge are, computers have entered in many areas of our lives. But the most important of them are the communication fields. This study will be a practical guide for understanding how to assemble and analyze various parameters in network performance evaluation and when designing a network what is necessary to looking for to remove the consequences of degrading performance. Therefore, what can you do in a network performance evaluation using simulation tools such as Network Simulation or Packet tracer and how various parameters can be brought together successfully? CCNA, CCNP, HCNA and HCNP educational level has been used and important setting has been simulated one by one. At the result this is a good guide for a local or wide area network. Finally, the performance issues precautions described. Considering the necessary parameters, imaginary networks were designed and evaluated both in CISCO Packet Tracer and Huawei's eNSP simulation program. But it should not be left unsaid that the networks have been designed and evaluated in free virtual environments, not in a real laboratory. Therefore, it is impossible to make actual performance appraisal and output as there is no actual data available.
Analyzing the solutions of DEA through information visualization and data min...Gurdal Ertek
Data envelopment analysis (DEA) has proven to be a useful tool for assessing efficiency or productivity of organizations, which is of vital practical importance in managerial decision making. DEA provides a significant amount of information from which analysts and managers derive insights and guidelines to promote their existing performances. Regarding to this fact, effective and methodologic analysis and interpretation of DEA solutions are very critical. The main objective of this study is then to develop a general decision support system (DSS) framework to analyze the solutions of basic DEA models. The paper formally shows how the solutions of DEA models should be structured so that these solutions can be examined and interpreted by analysts through information visualization and data mining techniques effectively. An innovative and convenient DEA solver, Smart DEA, is designed and developed in accordance with the proposed analysis framework. The developed software provides a DEA solution which is consistent with the framework and is ready-to-analyze with data mining tools, through a table-based structure. The developed framework is tested and applied in a real world project for bench marking the vendors of a leading Turkish automotive company. The results show the effectiveness and the efficacy of the proposed framework.
http://research.sabanciuniv.edu.
An intrusion detection algorithm for amiIJCI JOURNAL
Nowadays, using the smart metering devices for energy users to manage a wide variety of subscribers,
reading devices for measuring, billing, disconnection and connection of subscribers’ connection
management is an important issue. The performance of these intelligent systems is based on information
transfer in the context of information technology, so reported data from network should be managed to
avoid the malicious activities that including the issues that could affect the quality of service the system. In
this paper for control of the reported data and to ensure the veracity of the obtained information, using
intrusion detection system is proposed based on the support vector machine and principle component
analysis (PCA) to recognize and identify the intrusions and attacks in the smart grid. Here, the operation of
intrusion detection systems for different kernel of SVM when using support vector machine (SVM) and PCA
simultaneously is studied. To evaluate the algorithm, based on data KDD99, numerical simulation is done
on five different kernels for an intrusion detection system using support vector machine with PCA
simultaneously. Also comparison analysis is investigated for presented intrusion detection algorithm in
terms of time - response, rate of increase network efficiency and increase system error and differences in
the use or lack of use PCA. The results indicate that correct detection rate and the rate of attack error
detection have best value when PCA is used, and when the core of algorithm is radial type, in SVM
algorithm reduces the time for data analysis and enhances performance of intrusion detection.
A Defect Prediction Model for Software Product based on ANFISIJSRD
Artificial intelligence techniques are day by day getting involvement in all the classification and prediction based process like environmental monitoring, stock exchange conditions, biomedical diagnosis, software engineering etc. However still there are yet to be simplify the challenges of selecting training criteria for design of artificial intelligence models used for prediction of results. This work focus on the defect prediction mechanism development using software metric data of KC1.We have taken subtractive clustering approach for generation of fuzzy inference system (FIS).The FIS rules are generated at different radius of influence of input attribute vectors and the developed rules are further modified by ANFIS technique to obtain the prediction of number of defects in software project using fuzzy logic system.
Artificial intelligence based pattern recognition is
one of the most important tools in process control to identify
process problems. The objective of this study was to
evaluate the relative performance of a feature-based
Recognizer compared with the raw data-based recognizer.
The study focused on recognition of seven commonly
researched patterns plotted on the quality chart. The
artificial intelligence based pattern recognizer trained using
the three selected statistical features resulted in significantly
better performance compared with the raw data-based
recognizer.
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
Abstract
Internet Technology is growing at exponential rate day by day, making data security of computer systems more complex and critical. There has been multiple methodology implemented for the same in recent time as detailed in [1], [3]. Availability of larger bandwidth has made the multiple large computer server network connected worldwide and thus increasing the load on the necessity to secure data and Intrusion detection system (IDS) is one of the most efficient technique to maintain security of computer system. The proposed system is designed in such a way that are helpful in identifying malicious behavior and improper use of computer system. In this report we proposed a hybrid technique for intrusion detection using data mining algorithms. Our main objective is to do complete analysis of intrusion detection Dataset to test the implemented system.In This report we will propose a new methodology in which Modified k-mean is used for clustering whereas Naïve Bayes for the classification. These two data mining techniques will be used for Intrusion detection in large horizontally distributed database.
Keywords: Intrusion Detection, Modified K-Mean, Naïve Bays
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...IAEME Publication
This paper presents an approach based on applying an aggregated predictor formed by multiple versions of a multilayer neural network with a back-propagation optimization algorithm for helping the engineer to get a list of the most appropriate well-test interpretation models for a given set of pressure/ production data. The proposed method consists of three stages: (1) data decorrelation through principal component analysis to reduce the covariance between the variables and the dimension of the input layer in the artificial neural network, (2) bootstrap replicates of the learning set where the data is repeatedly sampled with a random split of the data into train sets and using these as new learning sets, and (3) automatic reservoir model identification through aggregated predictor formed by a plurality vote when predicting a new class. This method is described in detail to ensure successful replication of results. The required training and test dataset were generated by using analytical solution models. In our case, there were used 600 samples: 300 for training, 100 for cross-validation, and 200 for testing. Different network structures were tested during this study to arrive at optimum network design. We notice that the single net methodology always brings about confusion in selecting the correct model even though the training results for the constructed networks are close to 1. We notice also that the principal component analysis is an effective strategy in reducing the number of input features, simplifying the network structure, and lowering the training time of the ANN. The results obtained show that the proposed model provides better performance when predicting new data with a coefficient of correlation approximately equal to 95% Compared to a previous approach 80%, the combination of the PCA and ANN is more stable and determine the more accurate results with lesser computational complexity than was feasible previously. 
Clearly, the aggregated predictor is more stable and shows less bad classes compared to the previous approach.
MACHINE LEARNING FOR QOE PREDICTION AND ANOMALY DETECTION IN SELF-ORGANIZING ...ijwmn
Existing mobile networking systems lack the level of intelligence, scalability, and autonomous adaptability
required to optimally enable next-generation networks like 5G and beyond, which are expected to be Self -
Organizing Networks (SONs). It is anticipated that machine learning (ML) will be instrumental in designing
future “x”G SON networks with their demanding Quality of Experience (QoE) requirements. This paper
evaluates a methodology that uses supervised machine learning to predict the QoE level of the end user
experiences and uses this information to detect anomalous behavior of dysfunctional network nodes
(eNodeBs/base stations) in self-organizing mobile networks. An end-to-end network scenario is created using
the network simulator ns-3, where end users interact with a remote host that is accessed over the Internet to
run the most commonly used applications like file downloads and uploads and the resulting output is used as
a dataset to implement ML algorithms for QoE prediction and eNodeB (eNB) anomaly detection. Three ML
algorithms were implemented and compared to study their effectiveness and the scalability of the
methodology. In the test network, an accuracy score greater than 99% is achieved using the ML algorithms.
As suggested by the ns-3 simulation the use of ML for QoE prediction will help network operators understand
end-user needs and identify network elements that are failing and need attention and recovery.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
CPU HARDWARE CLASSIFICATION AND PERFORMANCE PREDICTION USING NEURAL NETWORKS AND STATISTICAL LEARNING
International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.4, July 2020
DOI: 10.5121/ijaia.2020.11401
Courtney Foots¹, Palash Pal², Rituparna Datta¹ and Aviv Segev¹
¹ Department of Computer Science, University of South Alabama, Mobile, United States
² University Institute of Technology, Burdwan University, India
ABSTRACT
We propose a set of methods to classify vendors based on estimated CPU performance and predict CPU
performance based on hardware components. For vendor classification, we use the highest and lowest
estimated performance and frequency of occurrences of each vendor to create classification zones. These
zones can be used to identify vendors who manufacture hardware that satisfy a given performance
requirement. We use multi-layered neural networks for performance prediction, which account for
nonlinearity in performance data. Various neural network architectures are analysed in comparison to
linear, quadratic, and cubic regression. Experiments show that neural networks obtain low error and high
correlation between predicted and published performance values, while cubic regression produces higher
correlation than neural networks when more data is used for training than testing. An analysis of how the
neural network architecture affects prediction is also performed. The proposed methods can be used to
identify suitable hardware replacements.
KEYWORDS
Computer Hardware, Performance Prediction and Classification, Neural Networks, Statistical Learning,
Regression.
1. INTRODUCTION
Computer performance is measured in relation to computational time and valuable work
produced and is partly determined by hardware components such as the amount of memory and
processor speed [1]. For this study, we are specifically interested in central processing unit
(CPU) performance, which directly affects a computer’s performance. Hardware performance
prediction can be useful from several perspectives. If an accurate prediction is obtained, it can
assist in detecting counterfeit hardware as well as viruses, spyware, Trojans, and other types of
malware. Malware and counterfeit components can decrease performance or cause performance
instability. There is a plethora of security measures that can be adopted to prevent malicious
programs from being downloaded and remove them when they have been downloaded [2, 3].
There are also many novel ways of detecting counterfeit hardware [4, 5]. Thus, methods of
detecting malware and counterfeits based on hardware performance are useful and highly
relevant to the technology field today.
Computer vendors can also be classified by the quality of their hardware. Each vendor
produces hardware that operates at different standards. These differences could be due to
differences in the intellectual property used, as well as the cost point of the hardware. Classifying
the hardware based on performance can assist in determining which vendor sells hardware
components that will maximize the average performance of the computer. It can also assist in
identifying potential hardware replacements that will match the original performance standards.
In the present paper, we propose a set of methods to (1) classify vendors based on estimated CPU
performance and (2) predict CPU performance based on hardware components.
The outline of the rest of this paper is as follows. Section 2 provides a brief overview of related
work in the domain of hardware performance prediction and classification. Section 3 explains the
proposed set of methods. Section 4 details our experiments and results. Section 5 presents our
conclusions.
2. RELATED WORK
On the topic of classification, Kar et al. [6] proposed a pattern classification model that uses
quantitative and qualitative measurements to guide decision making in relation to vendor
selection. This tool would assist its user by providing a robust analysis of the supplied collection
of vendors so that they may choose the best vendor. In our method, we will classify vendors
based on quantitative estimated CPU performance data only.
In the discussion of data analysis, Alexander et al. [7] presented a new methodology for
analyzing computer performance data using nonlinear time series analysis techniques. The
motivation was the concept that computers are deterministic nonlinear dynamic systems. Thus,
the previous performance analyses in which computers were considered to be linear and time
invariant are not representative of the nature of the actual testing conditions. In our method, we
address the same issue of nonlinearity in our performance data and use neural networks
to accommodate it.
Hardware performance prediction is a well-studied topic. Lopez et al. [8] explored a way to
predict computer performance based on hardware component data without needing simulation.
They used a deep learning model to generate a benchmark score for a given hardware
configuration, then used multiple neural networks and principal component analysis to predict
performance in comparison to the corresponding benchmarks. Neural network and linear
regression techniques have been used to predict performance in multiprocessor systems [9].
Similarly, machine learning has been used to predict the performance of multi-threaded
applications with various underlying hardware designs [10]. Girard et al. [11] designed a tool to
predict the performance of avionic graphic hardware, which is used by engineers to determine
the optimal hardware architecture design before manufacturing. Adjacent to the topic of
predicting performance, Kang [12] used hardware performance to analyze the microeconomics of
buying and leasing computers.
The dataset used in this study has previously been used for detecting scientific anomalies using
probability density estimators [13] and fitting linear models in high dimensional spaces [14].
3. PROPOSED SET OF METHODS
3.1. CPU Performance Dataset Description
We aim to classify and predict the performance of CPUs based on a set of ten parameters from an
open-source dataset [15]. This dataset contains 209 entries, representing a variety of vendors and
models of CPUs. Though the data was donated in 1987, the attributes provided still fit
the scope of our study and are used as test data for the proposed methods. The ten
parameters of the dataset are listed below:
1. Vendor name
2. Model name
3. Machine cycle time in nanoseconds
4. Minimum main memory in kilobytes
5. Maximum main memory in kilobytes
6. Cache memory in kilobytes
7. Minimum channels in units
8. Maximum channels in units
9. Published relative performance
10. Estimated relative performance from original article [16]
Parameters 1 and 10 are used for vendor classification. Parameters 3 through 9 are used for
performance prediction. Parameter 2 is not used in this study. Parameter 10 was calculated using
linear regression by Ein-Dor and Feldmesser [16].
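As a rough illustration of how the ten parameters might be read into a program, the following sketch parses comma-separated records and extracts the six prediction inputs (parameters 3 through 8). The column labels, the comma-separated layout, and the two sample rows are assumptions made for this example; they are not taken from the paper.

```python
import csv
import io

# Hypothetical column labels for the ten parameters listed above.
COLUMNS = [
    "vendor", "model", "cycle_time_ns", "min_memory_kb", "max_memory_kb",
    "cache_kb", "min_channels", "max_channels", "published_perf", "estimated_perf",
]

# Two illustrative rows in the assumed layout (not guaranteed to match the dataset).
SAMPLE = """adviser,32/60,125,256,6000,256,16,128,198,199
amdahl,470v/7,29,8000,32000,32,8,32,269,253
"""

def load_records(text):
    """Parse comma-separated text into a list of dicts with numeric fields as int."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, row))
        for name in COLUMNS[2:]:
            record[name] = int(record[name])
        records.append(record)
    return records

def prediction_inputs(record):
    """Return parameters 3 through 8, the six inputs used for prediction."""
    return [record[name] for name in COLUMNS[2:8]]

records = load_records(SAMPLE)
print(prediction_inputs(records[0]))
```

Parameter 9 (`published_perf` here) would serve as the prediction target, and parameters 1 and 10 would feed the vendor classification.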
3.2. Proposed Classification Method for Hardware Vendors
The highest and lowest estimated performance values are recorded for each vendor, along with
the frequency of occurrences of each vendor in the dataset. This information is used to create
classification zones. Each zone is labeled with a range of relative performance. The goal is to
produce a guide such that given a performance requirement, a list of vendors that manufacture
hardware that meet the requirement can be produced.
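The tabulation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name `summarize_vendors` and the sample pairs are hypothetical.

```python
def summarize_vendors(entries):
    """Record highest, lowest, and frequency of performance values per vendor.

    `entries` is an iterable of (vendor, estimated_performance) pairs.
    """
    summary = {}
    for vendor, perf in entries:
        stats = summary.setdefault(
            vendor, {"highest": perf, "lowest": perf, "frequency": 0}
        )
        stats["highest"] = max(stats["highest"], perf)
        stats["lowest"] = min(stats["lowest"], perf)
        stats["frequency"] += 1
    return summary

# Hypothetical (vendor, estimated performance) pairs, for illustration only.
entries = [("A", 120), ("A", 40), ("B", 75), ("A", 300), ("B", 75)]
summary = summarize_vendors(entries)
print(summary)
```

The resulting per-vendor ranges are the raw material from which classification zones can then be drawn.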
3.3. Proposed Prediction Method for Hardware Performance
Input parameters 3 through 8 are used to predict the performance of the CPUs. Then, parameter 9
is used with our predicted performance value to calculate the Mean Squared Error (MSE) of the
prediction and the correlation between predicted and published performance values. The MSE of
the prediction indicates how accurate the prediction is relative to the
published performance value. The correlation reflects the strength of the linear relationship between the
predicted and published performance values.
The following is the standard formula used to calculate the MSE for a dataset of n CPUs, where
p_i is the predicted performance and p_i' is the published performance:
\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( p_i - p_i' \right)^2 \]
To determine the correlation of the predicted and published performance values, we used the
Pearson correlation formula to find the correlation coefficient and the significance level of the
correlation. The correlation coefficient r is calculated as follows, with m_p and m_{p'} representing
the means of the predicted and published values, respectively:
\[ r = \frac{\sum_{i=1}^{n} (p_i - m_p)(p_i' - m_{p'})}{\sqrt{\sum_{i=1}^{n} (p_i - m_p)^2 \, \sum_{i=1}^{n} (p_i' - m_{p'})^2}} \]
The significance level of the correlation is ascertained by first calculating the t value as follows:
\[ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \]
Then, the corresponding significance level is determined using the t distribution table with a
degree of freedom of n - 2. If the significance level is less than 5%, then the correlation between
the predicted and published performance values is considered to be significant.
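The three quantities used in this section (MSE, Pearson correlation coefficient, and t value) can be computed with the standard textbook formulas, as in the sketch below. The function names and the sample values are hypothetical, and the final lookup of the significance level in a t distribution table with n - 2 degrees of freedom is left out.

```python
import math

def mean_squared_error(predicted, published):
    """Average of squared differences between predicted and published values."""
    n = len(predicted)
    return sum((p - q) ** 2 for p, q in zip(predicted, published)) / n

def pearson_r(predicted, published):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    m_p = sum(predicted) / n
    m_q = sum(published) / n
    cov = sum((p - m_p) * (q - m_q) for p, q in zip(predicted, published))
    var_p = sum((p - m_p) ** 2 for p in predicted)
    var_q = sum((q - m_q) ** 2 for q in published)
    return cov / math.sqrt(var_p * var_q)

def t_statistic(r, n):
    """t value for a correlation r over n samples (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical scaled performance values, for illustration only.
predicted = [0.10, 0.25, 0.40, 0.80]
published = [0.12, 0.20, 0.45, 0.75]

mse = mean_squared_error(predicted, published)
r = pearson_r(predicted, published)
t = t_statistic(r, len(predicted))
print(mse, r, t)
```

Note that `t_statistic` is undefined for a perfect correlation (r equal to 1 or -1), since the denominator becomes zero.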
Since the performance of a CPU is affected by many hardware components, there is no perfect or
absolute formula to predict its performance. The scatterplot matrix between the inputs and output
of our dataset is shown in Figure 1. From this figure, we can see that the relationship between the
input variables V3 – V8 and the output variable V9 is random and nonlinear. As a result, we will
use multilayered neural networks, which are suited for random nonlinear input and output
relationships.
Specifically, the performance predictions will be acquired using various architectures of a
multilayered feed forward network with six inputs and one output. When selecting architectures
for our tests, we aim for a variety of hidden layers to determine the level of versatility of the
neural network in producing quality results. The inputs to the neural network are the previously
discussed parameters. The output is the predicted performance, which is used with parameter 9 to
calculate the MSE and correlation values.
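To make the architecture notation concrete, the sketch below builds a feed-forward network with six inputs and one output and runs a single forward pass. The sigmoid activation, the random weight initialization, and the absence of training are assumptions of this example; the text does not specify them.

```python
import math
import random

def init_network(layer_sizes, seed=0):
    """Create random weights and zero biases for consecutive layer pairs."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(layers, inputs):
    """Propagate one input vector through every layer and return the single output."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]

# A 6 - 4 - 2 - 1 architecture: six inputs, hidden layers of 4 and 2 neurons, one output.
layers = init_network([6, 4, 2, 1])
output = forward(layers, [0.5] * 6)  # hypothetical scaled inputs
print(output)
```

With untrained weights the output is meaningless as a prediction; the sketch only shows how the layer sizes in an architecture sequence map onto weight matrices.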
We will also use regression analysis for prediction since CPU performance is a continuous
measurement. Specifically, we will use linear, quadratic, and cubic regression to model the input
and output relationship of this dataset and predict performance. The input and output values for
each of these are the same used for the neural network. The predicting capabilities of the neural
networks will be compared to that of the regression analysis.
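One possible way to fit linear, quadratic, and cubic regression on several inputs is ordinary least squares over polynomial features, sketched below with NumPy. The choice to use per-variable powers without interaction terms is an assumption of this example, as are the function names and the synthetic data; the text does not state how the polynomial terms were formed.

```python
import numpy as np

def polynomial_features(X, degree):
    """Stack an intercept column with each input raised to powers 1..degree."""
    X = np.asarray(X, dtype=float)
    columns = [np.ones((X.shape[0], 1))]
    for power in range(1, degree + 1):
        columns.append(X ** power)
    return np.hstack(columns)

def fit_polynomial(X, y, degree):
    """Least-squares coefficients for the polynomial feature matrix."""
    A = polynomial_features(X, degree)
    coefficients, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coefficients

def predict_polynomial(X, coefficients, degree):
    return polynomial_features(X, degree) @ coefficients

# Synthetic data with a known quadratic relationship, for illustration only.
rng = np.random.default_rng(0)
X = rng.random((40, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] ** 2

mse_by_degree = {}
for degree in (1, 2, 3):
    coefficients = fit_polynomial(X, y, degree)
    residual = predict_polynomial(X, coefficients, degree) - y
    mse_by_degree[degree] = float(np.mean(residual ** 2))
print(mse_by_degree)
```

On this synthetic target the linear fit leaves a residual while the quadratic and cubic fits reproduce the data almost exactly, which is the kind of comparison the regression baselines are meant to provide.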
Figure 1. Scatterplot matrix of input and output variables for CPU performance data.
4. EXPERIMENTS AND RESULTS
4.1. Vendor Classification Based on Estimated CPU Performance
The classification task is performed with the vendor names and estimated relative performances
of each CPU in the dataset. The dataset contains 209 entries with 30 different vendors, out of
which the highest and lowest performance values for each vendor as well as the frequency of
occurrences of each vendor are tabulated in Table 1.
Table 1: Highest and lowest relative performance values along with frequency of occurrence of each
vendor from the CPU performance dataset.
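| Vendor | Highest Performance | Lowest Performance | Frequency |
|---|---|---|---|
| Amdahl | 1238 | 132 | 9 |
| Sperry | 978 | 24 | 13 |
| NAS | 603 | 29 | 19 |
| Siemens | 382 | 19 | 12 |
| IBM | 361 | 15 | 32 |
| NCR | 281 | 19 | 13 |
| Adviser | 199 | 199 | 1 |
| Honeywell | 181 | 20 | 13 |
| Gould | 157 | 75 | 3 |
| CDC | 138 | 23 | 9 |
| IPL | 128 | 30 | 6 |
| Burroughs | 124 | 22 | 8 |
| BASF | 117 | 70 | 2 |
| Magnuson | 88 | 37 | 6 |
| Cambex | 74 | 30 | 5 |
| DG | 72 | 19 | 7 |
| Nixdorf | 67 | 21 | 3 |
| Perkin-Elmer | 64 | 24 | 3 |
| BTI | 64 | 15 | 2 |
| HP | 54 | 18 | 7 |
| DEC | 54 | 18 | 6 |
| Prime | 53 | 20 | 5 |
| Harris | 53 | 18 | 7 |
| Wang | 47 | 25 | 2 |
| Stratus | 41 | 41 | 1 |
| Formation | 34 | 34 | 5 |
| Microdata | 33 | 33 | 1 |
| C.R.D | 28 | 21 | 6 |
| Apollo | 24 | 23 | 2 |
| Four-Phase | 19 | 19 | 1 |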
The classification result is shown in Figure 2. According to the results, the vendors can be
classified into five zones. Each zone represents a performance standard, with Zone I being the
lowest relative performance of 200 or less, and Zone V being the highest relative performance of
1000 or more. If the desired relative performance is less than 200, any vendor can be chosen. If
the performance requirement is from 200 – 400, any one of NCR, IBM, Siemens, NAS, Sperry,
or Amdahl can be chosen. The vendors NAS, Sperry, or Amdahl can be chosen for a performance
requirement from 400 – 600. The vendors Sperry or Amdahl can be chosen for a performance
requirement from 600 – 1000. Last, only Amdahl can be chosen for performance requirements
more than 1000.
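A simple way to turn such a guide into a lookup is to filter vendors by their highest estimated performance. The sketch below uses a plain threshold on a few values copied from Table 1; it is an approximation for illustration and not the zone construction itself, so results near a zone boundary may differ from the zones described above. The function name `vendors_meeting` is hypothetical.

```python
# Highest estimated relative performance for a few vendors, copied from Table 1.
HIGHEST = {
    "Amdahl": 1238,
    "Sperry": 978,
    "NAS": 603,
    "Siemens": 382,
    "IBM": 361,
    "NCR": 281,
    "Honeywell": 181,
}

def vendors_meeting(requirement, highest=HIGHEST):
    """Return vendors whose highest estimated performance reaches the requirement."""
    return sorted(v for v, top in highest.items() if top >= requirement)

print(vendors_meeting(400))
print(vendors_meeting(1000))
```

For a requirement of 400 this returns Amdahl, NAS, and Sperry, and for 1000 it returns only Amdahl, in line with the vendors named for those ranges.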
Figure 2. Classification of vendors based on estimated relative CPU performance.
4.2. CPU Performance Prediction Based on Hardware Components
After scaling the data using Min-Max scaling, we construct several multilayered neural network
architectures with various numbers of hidden layers. In Tables 2 through 5, the training-testing
ratios represent the proportion of the dataset that is used for training and testing, respectively.
The sequences of architecture values represent the number of neurons in each layer of the neural
network. The scaled MSE and correlation of the predictions are calculated for each neural
network architecture at each training-testing ratio. We also calculate the MSE and correlation of
the predictions found by linear, quadratic, and cubic regression analysis. All correlation
coefficients have a significance value less than 5%, except for quadratic regression at 62.5% -
37.5% training-testing ratio. Therefore, the correlation coefficients between predicted and
published performances for this study do have significance.
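The preprocessing described at the start of this section can be sketched as follows: Min-Max scaling maps each column to the range [0, 1], and a training-testing ratio determines how many of the 209 entries go to each set. The shuffling, the rounding rule, and the function names are assumptions of this example; the text does not state how the split was drawn.

```python
import random

def min_max_scale(column):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in column]

def train_test_split(rows, train_fraction, seed=0):
    """Shuffle a copy of the rows and cut it at the requested training fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

scaled = min_max_scale([15, 199, 1238])  # sample values in the style of Table 1
train, test = train_test_split(list(range(209)), 0.70)
print(scaled, len(train), len(test))
```

Under this rounding rule a 70% - 30% ratio on 209 entries gives 146 training rows and 63 testing rows.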
For all training-testing ratios, the lowest MSE values are produced by a neural network. For
training-testing ratios 62.5% - 37.5% and 65% - 35%, the highest correlation values are produced
by a neural network. For training-testing ratio 67.25% - 32.75%, the highest correlation value is
produced by cubic regression, with a neural network outperforming linear and quadratic
regression. For training-testing ratio 70% - 30%, the highest correlation values are produced by
cubic and quadratic regression, with a neural network outperforming only linear regression.
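The comparisons in this paragraph can be checked mechanically against the reported results. The sketch below copies the scaled MSE and correlation values from Tables 2 through 5 and picks the method with the lowest MSE and the highest correlation at each training-testing ratio. The names `RESULTS` and `best_per_ratio` are illustrative only.

```python
# (training-testing ratio, method, scaled MSE, correlation), from Tables 2 through 5.
RESULTS = [
    ("62.5-37.5", "NN 6-3-1", 0.00357, 0.913),
    ("62.5-37.5", "NN 6-4-2-1", 0.00387, 0.909),
    ("62.5-37.5", "NN 6-4-3-2-1", 0.00307, 0.924),
    ("62.5-37.5", "NN 6-5-4-3-2-1", 0.00369, 0.914),
    ("62.5-37.5", "Linear", 0.00629, 0.848),
    ("62.5-37.5", "Quadratic", 0.02555, 0.136),
    ("62.5-37.5", "Cubic", 0.01549, 0.898),
    ("65-35", "NN 6-3-1", 0.00190, 0.956),
    ("65-35", "NN 6-4-2-1", 0.00209, 0.951),
    ("65-35", "NN 6-4-3-2-1", 0.00359, 0.915),
    ("65-35", "NN 6-5-4-3-2-1", 0.00197, 0.958),
    ("65-35", "Linear", 0.00470, 0.884),
    ("65-35", "Quadratic", 0.00260, 0.940),
    ("65-35", "Cubic", 0.00242, 0.944),
    ("67.25-32.75", "NN 6-3-1", 0.00284, 0.934),
    ("67.25-32.75", "NN 6-4-2-1", 0.00223, 0.954),
    ("67.25-32.75", "NN 6-4-3-2-1", 0.00342, 0.920),
    ("67.25-32.75", "NN 6-5-4-3-2-1", 0.00220, 0.952),
    ("67.25-32.75", "Linear", 0.00500, 0.884),
    ("67.25-32.75", "Quadratic", 0.00757, 0.910),
    ("67.25-32.75", "Cubic", 0.00685, 0.961),
    ("70-30", "NN 6-3-1", 0.00351, 0.898),
    ("70-30", "NN 6-4-2-1", 0.00332, 0.886),
    ("70-30", "NN 6-4-3-2-1", 0.00414, 0.880),
    ("70-30", "NN 6-5-4-3-2-1", 0.00348, 0.867),
    ("70-30", "Linear", 0.00355, 0.850),
    ("70-30", "Quadratic", 0.00386, 0.922),
    ("70-30", "Cubic", 0.01415, 0.936),
]


def best_per_ratio(results):
    """Map each ratio to (method with lowest MSE, method with highest correlation)."""
    summary = {}
    for ratio in dict.fromkeys(row[0] for row in results):  # keeps first-seen order
        rows = [row for row in results if row[0] == ratio]
        lowest_mse = min(rows, key=lambda row: row[2])
        highest_corr = max(rows, key=lambda row: row[3])
        summary[ratio] = (lowest_mse[1], highest_corr[1])
    return summary


for ratio, (by_mse, by_corr) in best_per_ratio(RESULTS).items():
    print(ratio, "| lowest MSE:", by_mse, "| highest correlation:", by_corr)
```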
Table 2. The MSE and predicted-published performance correlation for CPU performance prediction with
62.5% - 37.5% training-testing ratio.
Training-Testing   Method           Architecture     Scaled MSE   Correlation   Significance
62.5% - 37.5%      Neural Network   6-3-1            0.00357      0.913         2.2e-16
62.5% - 37.5%      Neural Network   6-4-2-1          0.00387      0.909         2.2e-16
62.5% - 37.5%      Neural Network   6-4-3-2-1        0.00307      0.924         2.2e-16
62.5% - 37.5%      Neural Network   6-5-4-3-2-1      0.00369      0.914         2.2e-16
62.5% - 37.5%      Regression       Linear           0.00629      0.848         2.2e-16
62.5% - 37.5%      Regression       Quadratic        0.02555      0.136         0.2326
62.5% - 37.5%      Regression       Cubic            0.01549      0.898         2.2e-16
Table 3. The MSE and predicted-published performance correlation for CPU performance prediction with
65% - 35% training-testing ratio.
Training-Testing   Method           Architecture     Scaled MSE   Correlation   Significance
65% - 35%          Neural Network   6-3-1            0.00190      0.956         2.2e-16
65% - 35%          Neural Network   6-4-2-1          0.00209      0.951         2.2e-16
65% - 35%          Neural Network   6-4-3-2-1        0.00359      0.915         2.2e-16
65% - 35%          Neural Network   6-5-4-3-2-1      0.00197      0.958         2.2e-16
65% - 35%          Regression       Linear           0.00470      0.884         2.2e-16
65% - 35%          Regression       Quadratic        0.00260      0.940         2.2e-16
65% - 35%          Regression       Cubic            0.00242      0.944         2.2e-16
Table 4. The MSE and predicted-published performance correlation for CPU performance prediction with
67.25% - 32.75% training-testing ratio.
Training-Testing   Method           Architecture     Scaled MSE   Correlation   Significance
67.25% - 32.75%    Neural Network   6-3-1            0.00284      0.934         2.2e-16
67.25% - 32.75%    Neural Network   6-4-2-1          0.00223      0.954         2.2e-16
67.25% - 32.75%    Neural Network   6-4-3-2-1        0.00342      0.920         2.2e-16
67.25% - 32.75%    Neural Network   6-5-4-3-2-1      0.00220      0.952         2.2e-16
67.25% - 32.75%    Regression       Linear           0.00500      0.884         2.2e-16
67.25% - 32.75%    Regression       Quadratic        0.00757      0.910         2.2e-16
67.25% - 32.75%    Regression       Cubic            0.00685      0.961         2.2e-16
Table 5. The MSE and predicted-published performance correlation for CPU performance prediction with
70% - 30% training-testing ratio.
Training-Testing   Method           Architecture     Scaled MSE   Correlation   Significance
70% - 30%          Neural Network   6-3-1            0.00351      0.898         2.2e-16
70% - 30%          Neural Network   6-4-2-1          0.00332      0.886         2.2e-16
70% - 30%          Neural Network   6-4-3-2-1        0.00414      0.880         2.2e-16
70% - 30%          Neural Network   6-5-4-3-2-1      0.00348      0.867         2.2e-16
70% - 30%          Regression       Linear           0.00355      0.850         2.2e-16
70% - 30%          Regression       Quadratic        0.00386      0.922         2.2e-16
70% - 30%          Regression       Cubic            0.01415      0.936         2.2e-16
The best performing architectures and training-testing ratios are compared in Figure 3 with
respect to lowest MSE and in Figure 4 with respect to highest correlation. It is clear from Figure
3 that the lowest MSE overall is obtained using architecture 6-3-1 with a training-testing ratio of
65% - 35%. Figure 4 shows that the highest correlation overall is obtained using cubic regression
with a 67.25% - 32.75% training-testing ratio. A plot of the published vs. predicted CPU
performance by the neural network with 65% - 35% training-testing ratio and 6-3-1 architecture
is shown in Figure 5, and the best performing neural network architecture is shown in Figure 6.
Figure 3. Comparison of MSE results for multiple architectures and training-testing ratios for CPU
performance prediction.
Figure 4. Comparison of correlation results for multiple architectures and training-testing ratios for CPU
performance prediction.
Figure 5. Plot of the published vs. predicted CPU performance for neural network with 65% - 35%
training-testing ratio and 6-3-1 architecture.
Figure 6. Graphical representation of the 6-3-1 neural network architecture
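To make the architecture notation concrete, the following sketch builds a fully connected feedforward network from a string such as "6-3-1" and runs one forward pass. It is an illustration under stated assumptions, not the authors' implementation: the sigmoid activation on every layer, the random weight initialisation, the bias term per neuron, and all function names are assumptions, and no training is performed.

```python
import math
import random


def parse_architecture(text):
    """Turn '6-3-1' into [6, 3, 1]: neurons per layer, from input to output."""
    return [int(part) for part in text.split("-")]


def init_weights(layers, seed=0):
    """One weight matrix per pair of consecutive layers.

    Each row holds the incoming weights of one neuron plus a trailing bias.
    Random initial values are an assumption made for illustration.
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(layers, layers[1:])
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, inputs):
    """Propagate one Min-Max scaled input vector through the network."""
    activations = list(inputs)
    for layer in weights:
        # zip stops at the shorter sequence, so the trailing bias is added separately.
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + row[-1])
            for row in layer
        ]
    return activations


layers = parse_architecture("6-3-1")
weights = init_weights(layers)
n_parameters = sum(len(row) for layer in weights for row in layer)

# Hypothetical scaled hardware attributes for a single machine.
sample = [0.10, 0.25, 0.50, 0.05, 0.30, 0.20]
print(layers, n_parameters, forward(weights, sample))
```

Under these assumptions a 6-3-1 network has (6 + 1) x 3 + (3 + 1) x 1 = 25 trainable parameters.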
4.3. Analysis of Neural Network Architecture in CPU Performance Prediction
From the results of the CPU performance prediction study, there appears to be no correlation
between the number of hidden layers in the architecture of a neural network and the relative
performance of the neural network. To explore this result further, we apply our prediction
method to six-input, one-output neural networks with a variety of hidden layer architectures
at a constant 65% training - 35% testing ratio.
We select the neural network architectures based on three variables: (1) the number
of neurons in a hidden layer, (2) the number of hidden layers, and (3) the order of the hidden
layers. Three categories of tests are run, in each of which one of the three variables is changed
while the other two remain constant.
In the first test, we construct ten neural networks where each neural network has one hidden
layer, and the number of neurons in the hidden layer ranges from one to twenty. Specifically, the
first neural network in this test set has architecture 6-1-1, and the last neural network has
architecture 6-20-1. The results of the performance predictions of this test set are shown in
Figures 7 and 8. All correlation coefficients have a significance value less than 5%, so the
correlations between published and predicted performances are considered significant. From
these figures, there appears to be no pattern in the change of MSE or correlation results due to
the change in number of neurons within a hidden layer.
Figure 7. Plot of the number of neurons in the neural network’s hidden layer vs. the MSE of the
performance prediction produced by the neural network
Figure 8. Plot of the number of neurons in the neural network’s hidden layer vs. the correlation of the
performance prediction produced by the neural network.
For the second test, we construct nine neural networks where each neural network has the same
number of neurons in each hidden layer, and the number of hidden layers ranges from one to
nine. Specifically, the first neural network in this test set has architecture 6-4-1, the second neural
network has architecture 6-4-4-1, and so forth. The results of the performance predictions of this
test set are shown in Figures 9 and 10. All correlation coefficients have a significance value less
than 5%, so the correlations between published and predicted performances are considered
significant. From these figures, there appears to be no pattern in the change of MSE or
correlation results due to the change in the number of hidden layers.
Figure 9. Plot of the number of hidden layers in the neural network vs. the scaled MSE of the performance
prediction produced by the neural network.
Figure 10. Plot of the number of hidden layers in the neural network vs. the correlation of the performance
prediction produced by the neural network.
Finally, for the last test we construct two groups of neural networks. In both groups, each neural
network architecture contains three hidden layers. The first group has a combination of
hidden layers with 3, 4, and 5 neurons, and the second group has a combination of hidden layers
with 1, 6, and 12 neurons. Each neural network has a unique ordering of the hidden layers. The
architectures and performance prediction results are shown in Table 6. All correlation coefficients
have a significance value less than 5%, so the correlations between published and predicted
performances are considered significant. The 6-4-5-3-1 architecture has the lowest MSE and
highest correlation in the first group, and the 6-12-6-1-1 architecture has the lowest MSE and
highest correlation in the second group. There does not appear to be a pattern in the change of
MSE or correlation results due to the change in the order of the hidden layers.
Table 6. The MSE and predicted-published performance correlation for CPU performance prediction with
65% - 35% training-testing ratio and varying hidden layer ordering.
Training-Testing   Group   Architecture     Scaled MSE   Correlation   Significance
65% - 35%          1       6-5-4-3-1        0.00253      0.941         2.2e-16
65% - 35%          1       6-5-3-4-1        0.00227      0.945         2.2e-16
65% - 35%          1       6-4-5-3-1        0.00214      0.949         2.2e-16
65% - 35%          1       6-4-3-5-1        0.00232      0.945         2.2e-16
65% - 35%          1       6-3-5-4-1        0.00302      0.927         2.2e-16
65% - 35%          1       6-3-4-5-1        0.00229      0.945         2.2e-16
65% - 35%          2       6-12-6-1-1       0.00206      0.951         2.2e-16
65% - 35%          2       6-12-1-6-1       0.00212      0.949         2.2e-16
65% - 35%          2       6-6-12-1-1       0.00289      0.929         2.2e-16
65% - 35%          2       6-6-1-12-1       0.00282      0.931         2.2e-16
65% - 35%          2       6-1-12-6-1       0.00262      0.941         2.2e-16
65% - 35%          2       6-1-6-12-1       0.00275      0.938         2.2e-16
In all three tests, there is no clear pattern in how the number of neurons, the number of hidden
layers, or the order of hidden layers affects the predicting capabilities of the neural networks.
Therefore, we are unable to conclude from these tests which multilayered, feedforward neural
network architectures predict CPU performance with the highest accuracy.
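The three families of architectures tested in this section can be enumerated programmatically. The sketch below is illustrative only and all function names are hypothetical. For the first test, the exact hidden layer sizes are not listed in the text, so they are passed in as a parameter rather than assumed.

```python
from itertools import permutations


def single_layer_archs(hidden_sizes):
    """Test 1: one hidden layer, varying its number of neurons."""
    return [(6, size, 1) for size in hidden_sizes]


def depth_archs(width=4, max_depth=9):
    """Test 2: equal-width hidden layers, varying their count (6-4-1, 6-4-4-1, ...)."""
    return [(6,) + (width,) * depth + (1,) for depth in range(1, max_depth + 1)]


def ordering_archs(hidden_layers):
    """Test 3: every ordering of a fixed set of three hidden layers."""
    return [(6,) + order + (1,) for order in permutations(hidden_layers)]


def label(arch):
    """Format a tuple such as (6, 4, 5, 3, 1) as '6-4-5-3-1'."""
    return "-".join(str(n) for n in arch)


# Hypothetical choice of sizes for test 1; the tested sizes are not enumerated in the text.
test_1 = single_layer_archs(range(1, 21))
test_2 = depth_archs()
group_1 = ordering_archs((3, 4, 5))
group_2 = ordering_archs((1, 6, 12))

print([label(a) for a in test_2[:3]])
print([label(a) for a in group_1])
```

Each group in the third test contains 3! = 6 orderings, which matches the six rows per group in Table 6.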
5. CONCLUSION
In this paper, both classification and prediction tasks are performed to analyze the performance
of CPUs documented in the test dataset. The classification study shows that 30 vendors can be
successfully classified into 5 performance zones. Each zone provides information about the
relative performance capabilities of the vendors’ hardware. Performance predictions are then
generated using neural networks and regression analysis, both of which accommodate the
random, nonlinear relationship between the input and output variables. Among all prediction
results, the neural network with a 65% training - 35% testing ratio and 6-3-1 architecture performs
best in terms of lowest scaled MSE, while cubic regression with a 67.25% training - 32.75%
testing ratio performs best in terms of highest correlation.
The experiments with varying architectures and training-testing ratios show that the
obtained results are robust. The results from our Pearson correlation analysis show that the
correlations between the predicted and published performance values are significant, with the
single exception of quadratic regression at the 62.5% - 37.5% training-testing ratio.
The results from our performance prediction study show that neural networks obtain the
lowest prediction error and often the highest correlation between predicted and
published values. However, cubic regression may have better predicting capabilities than a neural
network when a higher percentage of the data is used for training rather than testing. This
reiterates our initial observation that there is no perfect or absolute method of predicting CPU
performance.
The performance prediction study is briefly expanded to analyze how hidden layers in the
architecture of the tested neural networks affect their predicting capabilities. The prediction
method is used in three tests to determine how the number of neurons in a hidden layer, the
number of hidden layers, and the order of the hidden layers of a neural network affect the MSE
and correlation of the performance predictions. The results show that none of these three
variables seem to have a pattern of effect on the neural network’s performance prediction results.
Further analysis of how the architecture of the neural network affects the performance prediction
can be done, specifically by testing a wider range of architectures and performance data and by
changing the training-testing ratio. The prediction method can also be used on current hardware
performance data to determine how neural networks perform in comparison to regression
analysis for a more robust range of experimental structures.
Our classification result shows that, other than in Zone V, a given required performance can be
met by more than one vendor. While this result shows that suitable replacement
hardware can be found using this method, it also implies that a hardware configuration can be
copied or tampered with while still having nearly the same performance as the original
configuration. To address this drawback, the work will be extended to detect counterfeit
hardware through a more thorough analysis and comparison of computer hardware performance.
ACKNOWLEDGEMENTS
The work is supported by the Industry Advisory Board (IAB) of an NSF Industry–University Cooperative
Research Center (IUCRC), United States under Grant DFII-1912-USA.
REFERENCES
[1] Lilja, D. (2005) Measuring Computer Performance: A Practitioner's Guide, University of
Minnesota, Cambridge University Press.
[2] Aslan, O. & R. Samet (2020) “A Comprehensive Review on Malware Detection Approaches” IEEE
Access, Vol. 8, pp 6249 - 6271.
[3] Bakhshinejad, N. & A. Hamzeh (2020) “Parallel-CNN Network for Malware Detection,” IEEE
Information Security, Vol. 14, Issue 2, pp 210 - 219.
[4] Wang, X., Y. Han, & M. Tehranipoor (2019) “System-Level Counterfeit Detection Using On-Chip
Ring Oscillator Array,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 27,
Issue 12, pp 2884 – 2896.
13. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.11, No.4, July 2020
13
[5] Chattopadhyay, S., P. Kumari, B. Ray, R. Chakraborty (2019) “Machine Learning Assisted Accurate
Estimation of Usage Duration and Manufacturer for Recycled and Counterfeit Flash Memory
Detection,” 2019 IEEE 28th Asian Test Symposium (ATS), Kolkata, India, pp. 49-495.
[6] Kar, A., A. Pani, B. Mangaraj, & S. De (2011) “A Soft Classification Model for Vendor Selection,”
International Journal for Information and Education Technology, Vol. 1, No. 4, pp 268 - 272.
[7] Alexander, Z., T. Mytkowicz, A. Diwan, & E. Bradley (2010) “Measurement and Dynamical
Analysis of Computer Performance Data” Springer, Berlin, Heidelberg, Lecture Notes in Computer
Science, Vol. 6065, pp 18-29.
[8] Lopez, L., M. Guynn, & M. Lu (2018) “Predicting Computer Performance Based on Hardware
Configuration Using Multiple Neural Networks,” IEEE, 17th IEEE International Conference on
Machine Learning and Applications (ICMLA).
[9] Ozisikyilmaz, B., G. Memik, & A. Choudhary (2008) "Machine Learning Models to Predict
Performance of Computer System Design Alternatives," IEEE, 37th IEEE International Conference
on Parallel Processing.
[10] Agarwal, N., T. Jain, & M. Zahran (2019) "Performance Prediction for Multi-threaded
Applications," International Workshop on AI-assisted Design for Architecture.
[11] Girard, S., V. Legault, G. Bois, & J. Boland (2019) "Avionics Graphics Hardware Performance
Prediction with Machine Learning," Scientific Programming.
[12] Kang, Y. (1989) "Computer hardware performance: production and cost function analyses,"
Communications of the ACM, Vol. 32, No. 5, pp 586 –593.
[13] Pelleg, D. (2004) “Scalable and Practical Probability Density Estimators for Scientific Anomaly
Detection,” School of Computer Science, Carnegie Mellon University.
[14] Wang, Y. (2000) “A New Approach to Fitting Linear Models in High Dimensional Spaces,”
Department of Statistics, University of Auckland.
[15] Ein-Dor, P. & J. Feldmesser, Donor: W. Aha (1987) Computer Hardware Data Set, UCI Machine
Learning Repository, https://archive.ics.uci.edu/ml/datasets/Computer+Hardware.
[16] Ein-Dor, P. & J. Feldmesser (1987) “Attributes of the Performance of Central Processing Units: A
Relative Performance Prediction Model,” Communications of ACM, Vol. 30, pp 308-317.
AUTHORS
Courtney Foots is researching as an undergraduate student at the University of South
Alabama, studying computer science and mathematics. Her research interests include
counterfeit hardware detection, applications of artificial intelligence and data science.
Palash Pal received a bachelor’s degree in Technology from University Institute of Technology,
Burdwan University, West Bengal, India.
Rituparna Datta is working as a Computer Research Associate-II in the Department of
Computer Science, University of South Alabama. Prior to that, he was an Operations
Research Scientist in Boeing Research & Technology (BR&T), BOEING, Bangalore.
Aviv Segev is working as an Associate Professor at the Department of Computer Science,
University of South Alabama. His research interest is looking for the DNA of knowledge,
an underlying structure common to all knowledge, through analysis of knowledge models
in natural sciences, knowledge processing in natural and artificial neural networks, and
knowledge mapping between different knowledge domains.