In this paper, to optimize the process of detecting cyber-attacks, we propose two main optimization solutions: optimizing the detection method and optimizing the features. Both solutions aim to increase accuracy and to reduce analysis and detection time. For the detection method, we recommend the Random Forest supervised classification algorithm. The experimental results in section 4.1 show that our proposal to use Random Forest for abnormal-behavior detection is well founded, since this algorithm outperforms several other detection algorithms on all measures. For feature optimization, we propose dimensionality-reduction techniques such as information gain, principal component analysis, and the correlation coefficient method. Our results demonstrate that optimizing the cyberattack detection process does not require advanced algorithms with complex and cumbersome computational requirements; rather, the monitoring data should guide the choice of a reasonable feature extraction and optimization algorithm as well as appropriate attack classification and detection algorithms.
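The information-gain ranking mentioned among the feature-optimization techniques can be illustrated with a minimal, pure-Python sketch; the traffic features, values, and labels below are invented for illustration and are not taken from the paper's dataset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    base = entropy(labels)
    split = 0.0
    for value in set(feature_values):
        subset = [l for v, l in zip(feature_values, labels) if v == value]
        split += (len(subset) / total) * entropy(subset)
    return base - split

# Toy connection records (hypothetical feature names and values).
protocol = ["tcp", "tcp", "udp", "udp", "tcp", "udp"]
flag     = ["S0",  "SF",  "SF",  "SF",  "S0",  "SF"]
labels   = ["attack", "normal", "normal", "normal", "attack", "normal"]

ranked = sorted(
    {"protocol": protocol, "flag": flag}.items(),
    key=lambda kv: information_gain(kv[1], labels),
    reverse=True,
)
print([name for name, _ in ranked])  # highest-gain feature first: ['flag', 'protocol']
```

Features whose gain falls below a chosen threshold would then be dropped before training the Random Forest classifier, which is the spirit of the feature-optimization step.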
Improving the performance of Intrusion detection systems - yasmen essam
Intrusion detection systems (IDS) are widely studied by researchers nowadays due to the dramatic growth in network-based technologies. Policy violations and unauthorized access are in turn increasing, which makes intrusion detection systems of great importance. Existing approaches to improving intrusion detection systems focus on feature selection or reduction, since some features are irrelevant or redundant, and removing them improves both accuracy and learning time.
Minkowski Distance based Feature Selection Algorithm for Effective Intrusion ... - IJMER
Intrusion Detection Systems (IDS) play a major role in providing effective security to various types of networks. Network IDS need an appropriate rule set for classifying network benchmark data into normal or attack patterns. Generally, each dataset is characterized by a large set of features, but not all of these features are relevant or contribute fully to identifying an attack, and different attacks need different feature subsets for better detection accuracy. In this paper an improved feature selection algorithm is proposed to identify the most appropriate subset of features for detecting certain attacks. The proposed method is based on Minkowski distance feature ranking and an improved exhaustive search that selects a better combination of features. The system has been evaluated using the KDD CUP 1999 dataset with the EMSVM [1] classifier. The experimental results show that the proposed system provides high classification accuracy and a low false alarm rate when applied to the reduced feature subsets.
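One simple reading of Minkowski-distance feature ranking can be sketched in plain Python: score each feature by the Minkowski distance between its per-class mean values. This is a hedged simplification of the ranking step, not the paper's full method, and the feature names and values below are illustrative, not drawn from KDD CUP 1999.

```python
def minkowski(u, v, p=3):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def rank_features(records, labels, p=3):
    """Score each feature by the Minkowski distance between the per-class
    mean values of that feature (two-class case); return feature indices,
    most discriminative first."""
    classes = sorted(set(labels))
    means = {}
    for c in classes:
        rows = [r for r, l in zip(records, labels) if l == c]
        means[c] = [sum(col) / len(col) for col in zip(*rows)]
    a, b = means[classes[0]], means[classes[1]]
    # For a single value per feature this reduces to |mean_a - mean_b|.
    scores = [minkowski([x], [y], p) for x, y in zip(a, b)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy records: [duration, src_bytes, error_rate] (hypothetical features).
records = [[1.0, 500.0, 0.0], [2.0, 480.0, 0.1],   # normal traffic
           [1.5, 20.0, 0.9],  [2.5, 10.0, 1.0]]    # attack traffic
labels  = ["normal", "normal", "attack", "attack"]
print(rank_features(records, labels))  # [1, 2, 0]: src_bytes separates the classes most
```

The paper then pairs such a ranking with an improved exhaustive search over top-ranked subsets; the sketch covers only the ranking half.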
Comparison of Data Mining Techniques used in Anomaly Based IDS - IRJET Journal
This document discusses anomaly-based intrusion detection systems and compares various data mining techniques used in these systems. It begins by defining intrusion detection systems and the two main categories of misuse detection and anomaly detection. Anomaly detection involves learning normal patterns from data and detecting deviations from these patterns as potential anomalies or intrusions.
The document then examines several data mining techniques used for anomaly detection, including statistical-based approaches like chi-square statistics, and clustering algorithms like k-means, k-medoids, and EM clustering. It notes that these techniques can be applied to intrusion detection to analyze data and detect anomalies representing potential malicious activity. The methodology of anomaly detection is also summarized as involving parameterization of data,
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach... - IRJET Journal
This document presents a study that uses machine learning techniques to predict crime rates. Specifically, it aims to analyze crime data using supervised machine learning classification algorithms like decision trees, support vector machines, logistic regression, k-nearest neighbors, and random forests. The document outlines collecting and preprocessing crime data, selecting relevant features, training models on a portion of the data and testing them on the remaining data. It finds that random forest achieved the best prediction accuracy compared to other algorithms tested. The goal is to help law enforcement agencies better predict and reduce crime rates by analyzing historical crime data patterns.
A web application detecting dos attack using mca and tam - eSAT Journals
Abstract
Interconnected systems, such as all kinds of servers including web servers, are always under threat from network attackers. There are many popular attacks such as man-in-the-middle, cross-site scripting, and spamming, but the denial-of-service attack is considered one of the most dangerous attacks on networked applications, causing many serious issues on these computing systems. A denial-of-service (DoS) attack is an attempt to make a machine or network resource unavailable to its intended users. Because a DoS attack reduces server performance, detecting the attack is necessary to keep the server efficient. Hence Multivariate Correlation Analysis is used; this approach employs triangle areas for extracting the correlation information between network traffic features. Our implemented system is evaluated using the KDD Cup 99 data set, and the effect of both non-normalized and normalized data on the performance of the proposed detection system is examined. The implemented system is capable of learning new patterns of legitimate network traffic, so it detects both known and unknown types of DoS attacks; it works on the principle of anomaly-based attack detection. A triangle-area-based technique is used to speed up the process. The stored legitimate profiles have to be kept secure, so a detection mechanism for SQL injection is also implemented in the system. The system designed to carry out attack detection is a question-answer portal, i.e. a web application, and hence it uses the HTTP protocol, unlike previous systems which used TCP. Keywords: Denial-of-Service attack, Features Normalization, Triangle Area Map (TAM), Multivariate Correlation Analysis (MCA), anomaly-based detection, SQL injection, HTTP, TCP.
Evaluation of Integrated Digital Forensics Investigation Framework for the In... - IJECEIAES
The handling of digital evidence can establish that crimes have been committed, or provide links between a crime and its victims or between a crime and the culprit. Soft System Methodology (SSM) is an evaluation method that compares a conceptual model with a process in the real world, so that deficiencies of the conceptual model can be revealed and corrective action taken against it, until there is no difference between the conceptual model and the real activity. Evaluation of the IDFIF stages is done only on the reactive and proactive process stages, so that the IDFIF model can be more flexible and can be applied to the investigation process of a smartphone.
IRJET - Neural Network based Leaf Disease Detection and Remedy Recommenda... - IRJET Journal
This document describes a neural network-based system for detecting leaf diseases and recommending remedies. It uses a convolutional neural network (CNN) and deep learning techniques to classify images of plant leaves with different diseases. The system is trained on a dataset of 5000 leaf images across 4 disease classes. It aims to help farmers more easily identify leaf diseases and receive treatment recommendations without needing to directly contact experts. The document outlines the existing problems, proposed solution, literature review on related techniques like boosting and support vector machines, software and algorithms used including Python, Anaconda and Spyder. It also describes the implementation process involving modules for data loading, preprocessing, feature extraction using CNN, disease prediction, and recommending remedies.
IRJET- GDPS - General Disease Prediction System - IRJET Journal
The document describes a General Disease Prediction System (GDPS) that uses machine learning and data mining techniques to predict diseases based on patient symptoms.
The GDPS first collects patient data, preprocesses it, and extracts relevant features. It then implements the ID3 decision tree algorithm to generate a predictive model and classify diseases. As an admin, one can train the model using sample data. As a user, one can enter symptoms and the trained model will predict the likely disease and recommend precautions.
The GDPS was tested on a dataset of 120 patients and achieved 86.67% accuracy in disease prediction. The system currently covers common diseases, but future work involves expanding it to predict more serious or fatal diseases such as various cancers.
The efficient and effective monitoring of mobile networks is vital given the number of users who rely on such networks and their importance. The purpose of this paper is to present a monitoring scheme for mobile networks based on the use of rules and decision-tree data mining classifiers to upgrade fault detection and handling. The goal is to have optimisation rules that improve anomaly detection. In addition, a monitoring scheme that relies on Bayesian classifiers was also implemented for the purpose of fault isolation and localisation. The data mining techniques described in this paper are intended to allow a system to be trained to actually learn network fault rules. The results of the tests that were conducted support the conclusion that the rules were highly effective in improving network troubleshooting.
AUTOMATED BUG TRIAGE USING ADVANCED DATA REDUCTION TECHNIQUES - Journal For Research
Bug triage is an important step in the process of bug fixing. The goal of bug triage is to correctly assign a developer to a newly reported bug in the system. To perform automated bug triage, text classification techniques are applied, which helps reduce the time cost of manual work. To reduce the scale and improve the quality of bug data, the proposed system applies two data reduction techniques, instance selection and feature selection, for bug triage. Instance selection is used to identify the relevant bugs that can match a newly reported bug, and feature selection is used to select the relevant data from each bug in the training set. A predictive model is proposed to identify the order in which the data reduction techniques are applied for each newly reported bug, which improves the performance of the classification process. An experimental study using Eclipse and Firefox bug data shows that the proposed system achieves an accuracy of 73%.
This document discusses using data mining techniques to help with crime investigation by analyzing large amounts of crime data. It compares the performance of three data mining algorithms (J48, Naive Bayes, JRip) on a sample criminal database to identify the best performing algorithm. The best algorithm would then be used on the criminal database to help identify possible suspects for a crime based on evidence and attributes. The document provides details on each of the three algorithms and evaluates them based on classification accuracy and other metrics to select the best technique for the criminal investigation application.
A PROPOSED MODEL FOR DIMENSIONALITY REDUCTION TO IMPROVE THE CLASSIFICATION C... - IJNSA Journal
Over the past few years, intrusion protection systems have grown into a mature research area in the field of computer networks. The problem of excessive features has a significant impact on intrusion detection performance. Machine learning algorithms have been used in much previous research to identify network traffic as harmful or normal. Therefore, to obtain good accuracy, we must reduce the dimensionality of the data used. A new model based on a combination of feature selection and machine learning algorithms is proposed in this paper. The model relies on genes selected from every feature to increase the accuracy of intrusion detection systems; from the feature content, we selected only those features that have an impact on attack detection. The performance has been evaluated through a comparison with several well-known algorithms. The NSL-KDD dataset is used for the classification experiments. The proposed model outperformed the other learning approaches with an accuracy of 98.8%.
Automated exam question set generator using utility based agent and learning ... - Journal Papers
This document proposes an Automated Exam Question Set Generator (AEQSG) that uses two intelligent agents - a Utility Based Agent (UBA) and a Learning Agent (LA). The UBA chooses exam questions based on user preferences or utilities, while the LA learns from past exam results to improve future question set generation. The AEQSG also applies Bloom's Taxonomy and Genetic Algorithms to generate question sets that meet guidelines while distributing questions by difficulty level. This approach aims to reduce educators' time spent creating exam question sets and improve their quality.
Application of Data Mining Technique in Invasion Recognition - IOSR Journals
This document discusses applying data mining techniques to network intrusion detection. It begins by introducing traditional intrusion detection methods and their limitations. It then establishes a data mining-based model for intrusion detection that is designed to address these limitations. The model collects network data, preprocesses it, applies data mining algorithms to extract patterns, and uses the patterns to detect intrusions. This allows detection of both known and unknown intrusion types while reducing false alarms and missed detections compared to traditional methods. Key aspects of the proposed system include data acquisition, preprocessing, pattern extraction using various data mining algorithms, and intrusion detection based on the extracted patterns.
COMPUTER INTRUSION DETECTION BY TWO-OBJECTIVE FUZZY GENETIC ALGORITHM - cscpconf
This document describes a computer intrusion detection system that uses a two-objective fuzzy genetic algorithm. The system aims to (1) maximize detection rate and (2) minimize the number of rules while maintaining a low false rate. It generates fuzzy rules from training data to classify known attack patterns. A genetic algorithm is then applied to find non-dominated sets of rules that optimize the two objectives. The best performing rule set is used as signatures to detect known intrusions in test data.
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks - Editor IJCATR
This document discusses using neural networks for software defect prediction. It examines the effectiveness of using a radial basis function neural network and a probabilistic neural network on prediction accuracy and defect prediction compared to other techniques. The key findings are that neural networks provide an acceptable level of accuracy for defect prediction but perform poorly at actual defect prediction. Probabilistic neural networks performed consistently better than other techniques across different datasets in terms of prediction accuracy and defect prediction ability. The document recommends using an ensemble of different software defect prediction models rather than relying on a single technique.
an error in that computer program. In order to improve software quality, prediction of faulty modules is necessary. Various metric suites and techniques are available to predict the modules which are critical and likely to be fault prone. The Genetic Algorithm is a problem-solving search technique that uses genetics as its model and finds approximate solutions to optimization and search problems. A genetic algorithm is applied here to the problem of faulty module prediction, as well as to finding the most important attribute for fault occurrence. To perform the analysis, performance validation of the Genetic Algorithm is done using the open source software jEdit. The results are measured in terms of accuracy and prediction error by calculating the probability of detection and the probability of false alarms.
IRJET - Survey on Malware Detection using Deep Learning Methods - IRJET Journal
This document discusses various machine learning methods for malware detection, including support vector machines (SVM), random forests, and decision trees. It provides an overview of each method and related works that have applied these techniques. Specifically, it examines analyses that used linear SVM, random forests on Android apps, and an improved decision tree algorithm to classify malware families. The document concludes that machine learning methods have become important for malware detection as signatures alone cannot keep up with new malware variants.
Test case prioritization using firefly algorithm for software testing - Journal Papers
Firefly Algorithm is applied to optimize the ordering of test cases for software testing. Test cases are represented as fireflies, with their similarity distance calculated using string metrics determining the firefly brightness. The Firefly Algorithm prioritizes test cases by moving brighter fireflies, representing more dissimilar test cases, to the front of the test sequence. Experiments on benchmark programs show the Firefly Algorithm approach achieves better or equal average percentage of faults detected and time performance compared to existing works.
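The core idea above, moving mutually dissimilar test cases to the front of the sequence, can be sketched with a greedy stand-in for the Firefly movement step. This is a simplification under stated assumptions: the string metric (difflib's `ratio`) and the test-case names are illustrative, and the full Firefly Algorithm from the paper is not reproduced.

```python
from difflib import SequenceMatcher

def dissimilarity(a, b):
    """1 - similarity ratio: 0.0 for identical strings, near 1.0 for unrelated ones."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def prioritize(test_cases):
    """Greedy ordering: seed with the case most unlike the rest on average,
    then repeatedly append the case farthest from everything already chosen.
    A hedged stand-in for Firefly-style brightness-driven movement."""
    remaining = list(test_cases)
    seed = max(remaining, key=lambda t: sum(dissimilarity(t, o) for o in remaining))
    ordered = [seed]
    remaining.remove(seed)
    while remaining:
        nxt = max(remaining, key=lambda t: min(dissimilarity(t, o) for o in ordered))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered

# Hypothetical test-case identifiers.
suite = ["login_ok", "login_bad_password", "upload_large_file", "login_ok_retry"]
print(prioritize(suite))
```

Running dissimilar cases early tends to cover distinct faults sooner, which is the intuition behind the average-percentage-of-faults-detected gains reported above.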
Fault Detection in Mobile Communication Networks Using Data Mining Techniques... - ijcisjournal
This document discusses using data mining techniques and big data analytics to detect faults in mobile communication networks. It first provides background on data mining, mobile communication networks, and fault detection techniques. It then discusses using self-organizing maps, discrete wavelet transforms, and cluster analysis as data mining techniques to analyze network data and detect faults between source and destination nodes. The goal is to identify outlier nodes experiencing faults by analyzing patterns in large datasets from mobile networks.
Evaluation of network intrusion detection using markov chain - IJCI JOURNAL
Internet threats in day-to-day life have increased significantly, so there is a need to develop models that maintain system security. The most effective technique is the Intrusion Detection System (IDS), whose purpose is to detect intrusions through security devices and deal with them. In this paper, a mathematical approach is used to predict and detect intrusion in the network effectively. We discuss a 'K-Means + Apriori' method, which classifies normal and abnormal activities in a computer network. The K-Means step partitions the training set into K clusters using Euclidean distance and introduces an outlier factor; the Apriori step then prunes the data by removing infrequent items from the database. Based on the defined states, the degree of incoming data is evaluated experimentally using a sample of the DARPA2000 dataset, achieving high detection performance for attacks carried out in stages.
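The K-Means-with-outlier-factor step described above can be sketched in plain Python; the 2-D points, the choice of k, and the first-k-points initialisation are illustrative assumptions, and the paper's Apriori pruning step is omitted.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iters=20):
    """Plain k-means with Euclidean distance; the first k points seed the
    centroids for reproducibility. Returns the final centroids."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

def outlier_factor(point, centroids):
    """Distance to the nearest centroid; large values flag abnormal traffic."""
    return min(dist(point, c) for c in centroids)

# Toy 2-D connection records: two dense 'normal' groups (illustrative data).
normal = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]
centroids = kmeans(normal, k=2)
# A stray point sits far from every centroid, so its outlier factor is larger.
print(outlier_factor((9.0, 9.0), centroids) > outlier_factor((1.0, 1.0), centroids))
```

In the method above, records whose outlier factor exceeds a threshold would be treated as candidate intrusions before the Apriori-based pruning.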
Comparative Performance Analysis of Machine Learning Techniques for Software ... - csandit
Machine learning techniques can be used to analyse data from different perspectives and enable developers to retrieve useful information, and they have proven useful for software bug prediction. In this paper, a comparative performance analysis of different machine learning techniques for software bug prediction is explored on publicly available data sets. Results show that most of the machine learning methods perform well on software bug datasets.
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i... - IRJET Journal
1. The document presents a comparative analysis of machine learning algorithms for predicting customer churn in the telecom industry.
2. Logistic regression, random forest, and balanced random forest classifiers were evaluated on a dataset of 25,000 customers described by 111 variables.
3. The balanced logistic regression model that used SMOTE to address class imbalance achieved the best performance with an area under the ROC curve of 0.861, accurately predicting churn with an accuracy of 77% and recall of 76% on the test set.
Machine learning approaches are good at solving problems for which little information is available. In most cases, software domain problems are characterized as a learning process that depends on various circumstances and changes accordingly. A predictive model is constructed using machine learning approaches to classify modules as defective or non-defective. Machine learning techniques help developers retrieve useful information after classification and enable them to analyse data from different perspectives, and they have proven useful for software bug prediction. This study used publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction. Results showed most of the machine learning methods performed well on software bug datasets.
With the increasing importance of cyberspace security, research on and application of network security situational awareness is receiving more attention. This research is of great significance for improving network monitoring ability and emergency response capability, and for predicting the development trend of network security.
Data Mining Techniques for Providing Network Security through Intrusion Detec... - IJAAS Team
Intrusion Detection Systems play a major role in network security in today's internet world. Many researchers have introduced a number of intrusion detection systems in the past; even so, no system detects all kinds of attacks with good detection accuracy. Most intrusion detection systems use data mining techniques such as clustering, outlier detection, and classification, including classification through learning techniques. Most researchers have applied soft computing techniques to make effective decisions over network datasets and enhance the detection accuracy of Intrusion Detection Systems, and a few have also applied artificial intelligence techniques along with data mining algorithms for dynamic decision making. This paper discusses a number of intrusion detection systems proposed for providing network security. Finally, a comparative analysis is made between the existing systems, and some new ideas are suggested for enhancing their performance.
ANALYSIS OF MACHINE LEARNING ALGORITHMS WITH FEATURE SELECTION FOR INTRUSION ... - IJNSA Journal
This document summarizes a research paper that analyzes machine learning algorithms for intrusion detection using the UNSW-NB15 dataset. It compares the performance of KNN, SGD, Random Forest, Logistic Regression, and Naive Bayes classifiers both with and without feature selection. Chi-Square feature selection is applied to reduce irrelevant features before training and evaluating the classifiers on metrics like accuracy, precision, recall, F1-score and detection/false positive rates. The Random Forest classifier generally performed best on earlier datasets but the paper aims to evaluate these algorithms on the latest UNSW-NB15 dataset containing novel attacks.
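The Chi-Square feature-selection step mentioned above can be sketched in plain Python as the classic contingency-table statistic for one categorical feature against the class label; the flows and feature names below are invented for illustration, not taken from UNSW-NB15.

```python
def chi_square(feature, labels):
    """Chi-square statistic for one categorical feature against the class
    label: sum of (observed - expected)^2 / expected over all cells of the
    feature-by-label contingency table. Higher values suggest a feature
    more strongly associated with the label."""
    n = len(labels)
    stat = 0.0
    for fv in set(feature):
        for lv in set(labels):
            observed = sum(1 for f, l in zip(feature, labels) if f == fv and l == lv)
            expected = (feature.count(fv) * labels.count(lv)) / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Illustrative flows: 'flag' tracks the label perfectly, 'proto' does not.
flag   = ["S0", "S0", "SF", "SF"]
proto  = ["tcp", "udp", "tcp", "udp"]
labels = ["attack", "attack", "normal", "normal"]
print(chi_square(flag, labels), chi_square(proto, labels))  # 4.0 0.0
```

Features scoring near zero, like `proto` here, are the kind the paper's Chi-Square step would discard as irrelevant before training the classifiers.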
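As a concrete illustration of the pipeline this entry describes, the following sketch applies Chi-Square feature selection before a Random Forest classifier. It uses synthetic data in place of UNSW-NB15, and keeping 10 of 40 features is an arbitrary choice made for the example:

```python
# Illustrative sketch (not the paper's exact pipeline): Chi-Square feature
# selection followed by a Random Forest classifier, on synthetic data standing
# in for a flow-feature dataset such as UNSW-NB15.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=1000, n_features=40, n_informative=8,
                           random_state=0)
# chi2 requires non-negative inputs, so scale features into [0, 1] first.
X = MinMaxScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

selector = SelectKBest(chi2, k=10).fit(X_tr, y_tr)  # keep the 10 best features
clf = RandomForestClassifier(random_state=0).fit(selector.transform(X_tr), y_tr)
acc = accuracy_score(y_te, clf.predict(selector.transform(X_te)))
print(f"accuracy with 10/40 features: {acc:.3f}")
```

Dropping uninformative features this way typically reduces training time while keeping accuracy close to the full-feature model.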
IRJET- Anomaly Detection System in CCTV Derived Videos (IRJET Journal)
This document describes a proposed system for anomaly detection in CCTV videos using deep learning techniques. The system has two main components: 1) feature extraction using convolutional neural networks to learn representations of normal behavior from training videos, and 2) an anomaly detection classifier to identify abnormal events in new videos based on the learned features. Several related works incorporating techniques like k-means clustering, decision trees, and neural networks for video-based anomaly detection are also reviewed. The methodology section outlines the overall framework, including preprocessing steps and separate training and testing phases to extract normal features and then detect anomalies.
The efficient and effective monitoring of mobile networks is vital given the number of users who rely on
such networks and the importance of those networks. The purpose of this paper is to present a monitoring
scheme for mobile networks based on the use of rules and decision tree data mining classifiers to upgrade
fault detection and handling. The goal is to have optimisation rules that improve anomaly detection. In
addition, a monitoring scheme that relies on Bayesian classifiers was also implemented for the purpose of
fault isolation and localisation. The data mining techniques described in this paper are intended to allow a
system to be trained to actually learn network fault rules. The results of the tests that were conducted
allowed for the conclusion that the rules were highly effective in improving network troubleshooting.
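The rule-learning idea summarized above can be sketched with a decision tree whose learned branches are exported as readable rules; the feature names and toy measurements below are invented for illustration:

```python
# A minimal sketch of learning network fault rules with a decision tree; the
# feature names and toy data here are hypothetical, not from the paper.
from sklearn.tree import DecisionTreeClassifier, export_text

# columns: [dropped_call_rate, signal_strength_dbm, handover_failures]
X = [[0.01, -70, 0], [0.02, -75, 1], [0.30, -110, 9],
     [0.25, -105, 7], [0.03, -80, 1], [0.28, -108, 8]]
y = ["normal", "normal", "fault", "fault", "normal", "fault"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=[
    "dropped_call_rate", "signal_strength_dbm", "handover_failures"])
print(rules)  # human-readable if/else rules learned from the data
```

The exported rule text is what a monitoring scheme can then apply directly to incoming measurements.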
AUTOMATED BUG TRIAGE USING ADVANCED DATA REDUCTION TECHNIQUES (Journal For Research)
Bug triage is an important step in the process of bug fixing. The goal of bug triage is to correctly assign a developer to a newly reported bug in the system. To perform automated bug triage, text classification techniques are applied, which helps reduce the time cost of manual work. To reduce the scale and improve the quality of bug data, the proposed system applies two data reduction techniques, instance selection and feature selection, to bug triage. The instance selection technique is used to identify the relevant bugs that can match the newly reported bug, and the feature selection technique is used to select the relevant data from each bug in the training set. A predictive model is proposed to identify the order in which the data reduction techniques are applied for each newly reported bug, which improves the performance of the classification process. An experimental study using Eclipse and Firefox bug data was carried out, in which the proposed system shows an accuracy of 73%.
This document discusses using data mining techniques to help with crime investigation by analyzing large amounts of crime data. It compares the performance of three data mining algorithms (J48, Naive Bayes, JRip) on a sample criminal database to identify the best performing algorithm. The best algorithm would then be used on the criminal database to help identify possible suspects for a crime based on evidence and attributes. The document provides details on each of the three algorithms and evaluates them based on classification accuracy and other metrics to select the best technique for the criminal investigation application.
A PROPOSED MODEL FOR DIMENSIONALITY REDUCTION TO IMPROVE THE CLASSIFICATION C... (IJNSA Journal)
Over the past few years, intrusion protection systems have grown into a mature research area in the field of computer networks. The problem of excessive features has a significant impact on intrusion detection performance. Machine learning algorithms have been used in much previous research to classify network traffic as harmful or normal. Therefore, to obtain high accuracy, the dimensionality of the data used must be reduced. A new model based on a combination of feature selection and machine learning algorithms is proposed in this paper. The model depends on genes selected from every feature to increase the accuracy of intrusion detection systems. From the feature content, only those elements that impact attack detection were selected. The performance has been evaluated through a comparison with several known algorithms. The NSL-KDD dataset is used for examining classification. The proposed model outperformed the other learning approaches with an accuracy of 98.8%.
Automated exam question set generator using utility based agent and learning ... (Journal Papers)
This document proposes an Automated Exam Question Set Generator (AEQSG) that uses two intelligent agents - a Utility Based Agent (UBA) and a Learning Agent (LA). The UBA chooses exam questions based on user preferences or utilities, while the LA learns from past exam results to improve future question set generation. The AEQSG also applies Bloom's Taxonomy and Genetic Algorithms to generate question sets that meet guidelines while distributing questions by difficulty level. This approach aims to reduce educators' time spent creating exam question sets and improve their quality.
Application of Data Mining Technique in Invasion Recognition (IOSR Journals)
This document discusses applying data mining techniques to network intrusion detection. It begins by introducing traditional intrusion detection methods and their limitations. It then establishes a data mining-based model for intrusion detection that is designed to address these limitations. The model collects network data, preprocesses it, applies data mining algorithms to extract patterns, and uses the patterns to detect intrusions. This allows detection of both known and unknown intrusion types while reducing false alarms and missed detections compared to traditional methods. Key aspects of the proposed system include data acquisition, preprocessing, pattern extraction using various data mining algorithms, and intrusion detection based on the extracted patterns.
COMPUTER INTRUSION DETECTION BY TWO-OBJECTIVE FUZZY GENETIC ALGORITHM (cscpconf)
This document describes a computer intrusion detection system that uses a two-objective fuzzy genetic algorithm. The system aims to (1) maximize detection rate and (2) minimize the number of rules while maintaining a low false rate. It generates fuzzy rules from training data to classify known attack patterns. A genetic algorithm is then applied to find non-dominated sets of rules that optimize the two objectives. The best performing rule set is used as signatures to detect known intrusions in test data.
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks (Editor IJCATR)
This document discusses using neural networks for software defect prediction. It examines the effectiveness of using a radial basis function neural network and a probabilistic neural network on prediction accuracy and defect prediction compared to other techniques. The key findings are that neural networks provide an acceptable level of accuracy for defect prediction but perform poorly at actual defect prediction. Probabilistic neural networks performed consistently better than other techniques across different datasets in terms of prediction accuracy and defect prediction ability. The document recommends using an ensemble of different software defect prediction models rather than relying on a single technique.
an error in that computer program. In order to improve the software quality, prediction of faulty modules is
necessary. Various Metric suites and techniques are available to predict the modules which are critical and
likely to be fault prone. A Genetic Algorithm is a problem-solving algorithm that uses genetics as its model of problem solving: a search technique for finding approximate solutions to optimization and search problems. The genetic algorithm is applied to the problem of faulty-module prediction as well as to finding the attribute most important for fault occurrence. To perform the analysis, performance validation of the Genetic Algorithm is done using the open source software jEdit. The results are measured in terms of accuracy and error in prediction, by calculating the probability of detection and the probability of false alarms.
IRJET - Survey on Malware Detection using Deep Learning Methods (IRJET Journal)
This document discusses various machine learning methods for malware detection, including support vector machines (SVM), random forests, and decision trees. It provides an overview of each method and related works that have applied these techniques. Specifically, it examines analyses that used linear SVM, random forests on Android apps, and an improved decision tree algorithm to classify malware families. The document concludes that machine learning methods have become important for malware detection as signatures alone cannot keep up with new malware variants.
Test case prioritization using firefly algorithm for software testing (Journal Papers)
Firefly Algorithm is applied to optimize the ordering of test cases for software testing. Test cases are represented as fireflies, with their similarity distance calculated using string metrics determining the firefly brightness. The Firefly Algorithm prioritizes test cases by moving brighter fireflies, representing more dissimilar test cases, to the front of the test sequence. Experiments on benchmark programs show the Firefly Algorithm approach achieves better or equal average percentage of faults detected and time performance compared to existing works.
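The prioritization idea can be sketched without the full Firefly Algorithm (which also involves attraction and random movement between fireflies): order test cases greedily so that the most mutually dissimilar ones, measured by a string metric, come first. The test names below are hypothetical:

```python
# Simplified sketch of the core idea only: rank test cases so dissimilar ones
# (by a string metric, standing in for firefly "brightness") run first.
from difflib import SequenceMatcher

def dissimilarity(a: str, b: str) -> float:
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def prioritize(tests):
    ordered = [tests[0]]
    remaining = list(tests[1:])
    while remaining:
        # pick the test farthest (most dissimilar) from everything chosen so far
        nxt = max(remaining,
                  key=lambda t: min(dissimilarity(t, s) for s in ordered))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered

tests = ["login_ok", "login_fail", "upload_large_file", "login_timeout"]
print(prioritize(tests))
```

Running dissimilar tests first tends to raise the average percentage of faults detected early in the sequence, which is the metric the entry above reports.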
Fault Detection in Mobile Communication Networks Using Data Mining Techniques... (ijcisjournal)
This document discusses using data mining techniques and big data analytics to detect faults in mobile communication networks. It first provides background on data mining, mobile communication networks, and fault detection techniques. It then discusses using self-organizing maps, discrete wavelet transforms, and cluster analysis as data mining techniques to analyze network data and detect faults between source and destination nodes. The goal is to identify outlier nodes experiencing faults by analyzing patterns in large datasets from mobile networks.
Evaluation of network intrusion detection using markov chain (IJCI JOURNAL)
In day-to-day life, internet threats have increased significantly, so there is a need to develop models to maintain system security. Among the most effective techniques are Intrusion Detection Systems (IDS), whose purpose is to detect intrusions through security devices and deal with them. In this paper, a mathematical approach is used to predict and detect intrusion in the network. We discuss the combined 'K-Means + Apriori' method, which classifies normal and abnormal activities in a computer network. The K-Means step partitions the training set into K clusters using Euclidean distance and introduces an outlier factor; the Apriori algorithm then prunes the data by removing infrequent items from the database. Based on the defined states, the degree of incoming data is evaluated experimentally using a sample of the DARPA2000 dataset, achieving high detection performance across attack stages.
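The K-Means stage described above can be sketched as follows; the exact outlier factor used by the paper is not given here, so a distance-relative-to-cluster-average factor is assumed for illustration, and the Apriori pruning step is omitted:

```python
# Sketch of K-Means partitioning with an assumed outlier factor: a point's
# distance to its centroid divided by the average distance within its cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
blobs = [rng.normal(c, 1.0, size=(100, 2)) for c in ((0, 0), (10, 0), (5, 9))]
planted = np.array([[0.0, 6.5]])   # unusually far from every cluster centre
X = np.vstack(blobs + [planted])   # planted point has index 300

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

avg = np.array([d[km.labels_ == k].mean() for k in range(3)])
factor = d / avg[km.labels_]       # assumed form of the outlier factor
outliers = np.where(factor > 3.0)[0]
print("flagged indices:", outliers)
```

The flagged points would then be candidates for the abnormal class before the Apriori-based pruning.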
Comparative Performance Analysis of Machine Learning Techniques for Software ... (csandit)
Machine learning techniques can be used to analyse data from different perspectives and enable
developers to retrieve useful information. Machine learning techniques are proven to be useful
in terms of software bug prediction. In this paper, a comparative performance analysis of
different machine learning techniques is explored for software bug prediction on publicly
available data sets. Results showed most of the machine learning methods performed well on
software bug datasets.
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i... (IRJET Journal)
1. The document presents a comparative analysis of machine learning algorithms for predicting customer churn in the telecom industry.
2. Logistic regression, random forest, and balanced random forest classifiers were evaluated on a dataset of 25,000 customers described by 111 variables.
3. The balanced logistic regression model that used SMOTE to address class imbalance achieved the best performance with an area under the ROC curve of 0.861, accurately predicting churn with an accuracy of 77% and recall of 76% on the test set.
Machine learning approaches are good at solving problems for which little information is available. In most cases, software-domain problems are characterized as a learning process that depends on various circumstances and changes accordingly. A predictive model is constructed using machine learning approaches to classify modules into defective and non-defective. Machine learning techniques help developers retrieve useful information after classification and enable them to analyse data from different perspectives. Machine learning techniques have proven useful for software bug prediction. This study used publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction. Results showed that most of the machine learning methods performed well on software bug datasets.
An intrusion detection algorithm for AMI (IJCI JOURNAL)
Nowadays, the use of smart metering devices by energy utilities to manage a wide variety of subscribers, covering meter reading, billing, and the connection and disconnection of subscribers, makes connection management an important issue. The performance of these intelligent systems is based on information transfer in the context of information technology, so the data reported from the network should be managed to avoid malicious activities, including issues that could affect the system's quality of service. In this paper, to control the reported data and ensure the veracity of the obtained information, an intrusion detection system based on the support vector machine (SVM) and principal component analysis (PCA) is proposed to recognize and identify intrusions and attacks in the smart grid. The operation of intrusion detection systems with different SVM kernels, when SVM and PCA are used simultaneously, is studied. To evaluate the algorithm, numerical simulation based on the KDD99 data is done on five different kernels for an intrusion detection system using SVM together with PCA. A comparative analysis of the presented intrusion detection algorithm is also carried out in terms of response time, the rate of increase in network efficiency, the increase in system error, and the difference between using and not using PCA. The results indicate that the correct detection rate and the attack error detection rate have their best values when PCA is used; moreover, when the kernel is of radial type, the SVM algorithm reduces the time for data analysis and enhances the performance of intrusion detection.
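The compared setup (PCA-reduced features feeding an SVM under different kernels) can be sketched as below; synthetic data stands in for KDD99, and only four of SVC's standard kernels are looped over:

```python
# Illustrative sketch of a PCA + SVM pipeline compared across kernels,
# on synthetic data standing in for KDD99.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=30, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = make_pipeline(StandardScaler(), PCA(n_components=10),
                          SVC(kernel=kernel))
    scores[kernel] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(scores)  # per-kernel accuracy on the held-out split
```

Reducing to 10 principal components before the SVM shrinks the kernel computation, which is the analysis-time saving the entry above attributes to PCA.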
Intrusion Detection System (IDS): Anomaly Detection using Outlier Detection A... (Drjabez)
This document describes a proposed approach for anomaly detection in intrusion detection systems using outlier detection. It begins with background on intrusion detection systems and issues with existing approaches. It then presents the proposed two-stage approach using outlier detection: 1) Training with large normal datasets in a distributed storage environment, and 2) Testing intrusion datasets to compute an error value compared to the trained model. If the error value exceeds a threshold, the test data is flagged as anomalous. Experimental results on network packet datasets demonstrate the approach can effectively identify anomalies.
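The two-stage scheme (train on normal data only, then flag test records whose error against the trained model exceeds a threshold) can be sketched numerically; a single-centroid model and a 99th-percentile threshold are assumptions made for the example:

```python
# Minimal numeric sketch of the two-stage idea: (1) fit a model of "normal"
# traffic only, (2) flag test records whose error against that model exceeds
# a threshold. A single-centroid model is assumed purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
train_normal = rng.normal(0, 1, size=(500, 4))

center = train_normal.mean(axis=0)
train_err = np.linalg.norm(train_normal - center, axis=1)
threshold = np.percentile(train_err, 99)   # tolerate 1% of normal as noise

def is_anomalous(records):
    return np.linalg.norm(records - center, axis=1) > threshold

attack = rng.normal(6, 1, size=(10, 4))    # records far from normal behavior
print(is_anomalous(attack).mean())         # fraction of attack records flagged
```

The threshold choice controls the trade-off between missed detections and false alarms that the entry above discusses.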
Predictive Modeling for Topographical Analysis of Crime Rate (IRJET Journal)
This document describes a proposed system to use machine learning methods to predict crime rates and types of crimes in specific areas based on historical crime data. The system would analyze crime data collected from websites including date, location, and crime type to identify patterns. Machine learning algorithms would be trained on the data to build predictive models. The goal is to help law enforcement agencies more quickly detect, resolve, and prevent crimes by predicting where and what types of crimes may occur based on the characteristics of past crimes.
High performance intrusion detection using modified k-means & naïve Bayes (eSAT Journals)
This document summarizes a research paper that proposes a hybrid intrusion detection system using modified k-means clustering and naive Bayes classification. The system aims to improve detection rates and reduce false alarms. It first uses modified k-means clustering to filter outliers and identify cluster centers in the data. It then uses naive Bayes classification to classify data points as normal or intrusions. The researchers tested their system on the KDD99 intrusion detection dataset and found it achieved high accuracy rates of over 99% with low false positive rates, outperforming other methods.
High performance intrusion detection using modified k-means & naïve Bayes (eSAT Journals)
Abstract
Internet technology is growing at an exponential rate day by day, making the data security of computer systems more complex and critical. Multiple methodologies have been implemented for this in recent times, as detailed in [1], [3]. The availability of larger bandwidth has connected many large computer server networks worldwide, increasing the need to secure data, and an intrusion detection system (IDS) is one of the most efficient techniques for maintaining the security of a computer system. The proposed system is designed to help identify malicious behavior and improper use of a computer system. In this report we propose a hybrid technique for intrusion detection using data mining algorithms; our main objective is a complete analysis of the intrusion detection dataset to test the implemented system. We propose a new methodology in which modified k-means is used for clustering and naïve Bayes for classification. These two data mining techniques are used for intrusion detection in a large, horizontally distributed database.
Keywords: Intrusion Detection, Modified K-Means, Naïve Bayes
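The hybrid shape described in this abstract, clustering to filter the training data and then naive Bayes for classification, can be sketched as follows; the paper's specific k-means modification is not reproduced, and the 95th-percentile outlier cut is an assumption:

```python
# Sketch of the hybrid shape only: cluster the training data, drop points far
# from their centroid as outliers, then train naive Bayes on the filtered set.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_tr)
d = np.linalg.norm(X_tr - km.cluster_centers_[km.labels_], axis=1)
keep = d < np.percentile(d, 95)   # drop the 5% farthest points as outliers

nb = GaussianNB().fit(X_tr[keep], y_tr[keep])
print(f"test accuracy: {nb.score(X_te, y_te):.3f}")
```

Filtering noisy training points before the naive Bayes step is what such hybrids rely on to reduce false alarms.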
A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I... (ijaia)
In the era of the fourth industrial revolution, measuring and ensuring the reliability, efficiency and safety of industrial systems and components are among the uppermost key concerns. In addition, predicting the performance degradation or remaining useful life (RUL) of a piece of equipment over time based on its historical sensor data enables companies to greatly reduce their maintenance cost. In this way, companies can prevent costly unexpected breakdowns and become more profitable and competitive in the marketplace. This paper introduces a deep learning-based method combining CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) neural networks to predict RUL for industrial equipment. The proposed method does not depend upon any degradation trend assumptions, and it can learn complex temporal, representative and distinguishing patterns in the sensor data. To evaluate its efficiency and effectiveness, we evaluated the method on two different experiments: RUL estimation and predicting the status of IoT devices over a 2-week period. Experiments are conducted on a publicly available NASA turbofan-engine dataset. Based on the experimental results, the deep learning-based approach achieved high prediction accuracy. Moreover, the results show that the method outperforms standard, well-accepted machine learning algorithms and accomplishes competitive performance when compared to state-of-the-art methods.
Threshold benchmarking for feature ranking techniques (journalBEEI)
In prediction modeling, the choice of features from the original feature set is crucial for accuracy and model interpretability. Feature ranking techniques rank the features by their importance, but there is no consensus on the number of features to cut off. Thus, it becomes important to identify a threshold value or range for removing the redundant features. In this work, an empirical study is conducted to identify a threshold benchmark for feature ranking algorithms. Experiments are conducted on the Apache Click dataset with six popularly used ranker techniques and six machine learning techniques, to deduce a relationship between the total number of input features (N) and the threshold range. The area-under-the-curve analysis shows that ≃ 33-50% of the features are necessary and sufficient to yield a reasonable performance measure, with a variance of 2%, in defect prediction models. Further, we also find that log2(N) as the ranker threshold value represents the lower limit of the range.
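The reported cut-off rule can be turned into a small helper: keep at least log2(N) features (the lower limit) and roughly 33-50% of the N ranked features.

```python
# The reported cut-off rule as a helper: lower limit log2(N), plus the
# 33% and 50% marks of the N ranked features.
import math

def ranker_threshold_range(n_features: int):
    lower = max(1, round(math.log2(n_features)))
    return lower, round(0.33 * n_features), round(0.50 * n_features)

print(ranker_threshold_range(64))   # -> (6, 21, 32)
```

So for a 64-feature dataset, the study's rule suggests keeping somewhere between 6 and about 21-32 of the top-ranked features.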
Empowering anomaly detection algorithm: a review (IAESIJAI)
Detecting anomalies in a data stream, relevant to domains like intrusion detection, fraud detection, security in sensor networks, and event detection in internet of things (IoT) environments, is a growing field of research. For instance, surveillance cameras installed everywhere are usually monitored by human experts; when many cameras are involved, more human expertise is needed, making this approach expensive. Hence, researchers worldwide are trying to devise the best automated algorithms for detecting abnormal behavior from real-time data. Algorithms designed for this purpose may contain gaps, and their qualities differ across specific domains. Therefore, this study presents a review of anomaly detection algorithms, identifying that gap along with the advantages and disadvantages of these algorithms. Since many works of literature were reviewed, this review is expected to aid researchers in closing the gap in the future.
A Lightweight Method for Detecting Cyber Attacks in High-traffic Large Networ... (IJCNCJournal)
Protecting information systems is a difficult and long-term task. The size and traffic intensity of computer networks are diverse and no one protection solution is universal for all cases. A certain solution protects well in the campus network, but it is unlikely to protect well in the service provider's network. A key component of a cyber defence system is a network attack detector. This component needs to be designed to have a good way to scale detection capabilities with network size and traffic intensity beyond the size and intensity of a campus network. From this point of view, this paper aims to build a network attack detection method suitable for the scale of large and high-traffic networks based on machine learning models using clustering techniques and our proposed detection technique. The detection technique is different from outlier detection commonly used in clustering-based anomaly detection applications. The method was evaluated in cases using different feature extraction methods and different clustering algorithms. Experimental results on the NSL-KDD data set are positive with a detection accuracy of over 97%.
A LIGHTWEIGHT METHOD FOR DETECTING CYBER ATTACKS IN HIGH-TRAFFIC LARGE NETWOR... (IJCNCJournal)
The document proposes a lightweight method for detecting cyberattacks in high-traffic large networks based on clustering techniques. The method clusters network traffic data using algorithms like K-means and DBSCAN. It extracts features from the data using methods like Risk-based Acquisition and Attribute Ratio before clustering. The clusters are then analyzed to detect anomalous clusters that may indicate attacks. This detection technique can scale to large, high-traffic networks by processing data in batches and distributing the workload of analyzing clusters across multiple parallel processes. The method was evaluated on the NSL-KDD dataset and achieved over 97% detection accuracy.
The document discusses using data mining techniques to analyze crime data and predict crime trends. It describes collecting crime reports from various sources to create a database. Machine learning algorithms would then be applied to the crime data to discover patterns and relationships between different crimes. This analysis could help police identify crime hotspots and determine if a crime was committed in a known location. The proposed system aims to forecast crimes and trends based on past crime data, date and location to help prevent crimes. It discusses implementing the system using Python and testing it with sample input data.
This document discusses a novel method for intrusion awareness using Distributed Situational Awareness (D-SA). It proposes using D-SA and support vector machines (SVM) for network intrusion detection and classification. The method is evaluated using the KDD Cup 1999 intrusion detection dataset. Experimental results show the proposed D-SA method achieves higher detection rates compared to rule-based classification techniques.
Analysis on Fraud Detection Mechanisms Using Machine Learning Techniques (IRJET Journal)
1) The document discusses using machine learning techniques like Random Forest Classifier and AdaBoost to detect fraud in blockchain transactions through an ensemble model.
2) It analyzes the individual accuracy of Random Forest and AdaBoost classifiers, finding accuracies over 99.99%, then ensembles their predictions using a stacking method.
3) The stacking ensemble model combines the predictions of the Random Forest and AdaBoost models into a new training set to potentially provide even more accurate fraud detection compared to the individual models.
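The stacking data-flow described in point 3 can be illustrated with a minimal sketch. The two threshold rules below are hypothetical stand-ins for the trained Random Forest and AdaBoost models, and the feature names and thresholds are assumptions for illustration only:

```python
# Toy sketch of stacking over (amount, n_prev_tx) transaction features.

def base_a(x):
    """Stand-in for model 1: flags large transaction amounts."""
    return 1 if x[0] > 5000 else 0

def base_b(x):
    """Stand-in for model 2: flags accounts with few prior transactions."""
    return 1 if x[1] < 3 else 0

def build_meta_features(X):
    """Stacking step: base-model predictions become the meta training set."""
    return [(base_a(x), base_b(x)) for x in X]

def meta(preds):
    """Meta-learner stand-in: predict fraud only when both base models agree."""
    a, b = preds
    return 1 if a + b == 2 else 0

X = [(9000, 1), (100, 10), (6000, 8), (50, 1)]
stacked = [meta(p) for p in build_meta_features(X)]
# stacked -> [1, 0, 0, 0]
```

In a real pipeline the base predictions would come from fitted classifiers (out-of-fold, to avoid leakage) and the meta-learner would itself be trained, e.g. with scikit-learn's `StackingClassifier`.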
ENVIRONMENTAL QUALITY PREDICTION AND ITS DEPLOYMENT - IRJET Journal
This document presents research on using machine learning models to predict environmental quality by analyzing data from environmental sensors. The researchers implemented classification algorithms such as decision trees, logistic regression, naive Bayes, random forest, and support vector machines to predict air quality, evaluated with metrics like accuracy. They found that the decision tree algorithm achieved the highest accuracy, 99.8%, among the models tested. The proposed system aims to develop a more effective machine learning model for air quality prediction than existing image-based techniques, in order to help monitor pollution levels and protect public health.
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM... - IJNSA Journal
Building practical and efficient intrusion detection systems in computer networks is important in industrial settings today, and machine learning provides a set of effective algorithms for detecting network intrusions. To find appropriate algorithms for building such systems, it is necessary to evaluate various types of machine learning algorithms against specific criteria. In this paper, we propose a novel evaluation formula that incorporates six indexes into a comprehensive measurement: precision, recall, root mean square error, training time, sample complexity, and practicability. The aim is to find algorithms that have a high detection rate and low training time, need fewer training samples, and are easy to use in terms of constructing, understanding, and analyzing models. A detailed evaluation process is designed to obtain all necessary assessment indicators, and six kinds of machine learning algorithms are evaluated. Experimental results illustrate that Logistic Regression shows the best overall performance.
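A weighted combination of several evaluation indexes, as this abstract describes, can be sketched as follows. The weights and the example numbers are illustrative assumptions, not the paper's actual formula (which is built on the F1 score and penalty values):

```python
HIGHER_BETTER = {"precision", "recall"}

def composite_score(metrics, weights):
    """Combine normalized indexes (all in [0, 1]) into a single score.
    Lower-is-better indexes (rmse, training time, sample complexity,
    practicability penalty) contribute as (1 - value)."""
    return sum(w * (metrics[n] if n in HIGHER_BETTER else 1.0 - metrics[n])
               for n, w in weights.items())

# Hypothetical weights and pre-normalized measurements for two algorithms.
weights = {"precision": 0.25, "recall": 0.25, "rmse": 0.15,
           "train_time": 0.15, "sample_complexity": 0.10,
           "practicability": 0.10}

logreg = {"precision": 0.92, "recall": 0.90, "rmse": 0.20,
          "train_time": 0.10, "sample_complexity": 0.20,
          "practicability": 0.10}
svm = {"precision": 0.94, "recall": 0.91, "rmse": 0.20,
       "train_time": 0.80, "sample_complexity": 0.60,
       "practicability": 0.40}

scores = {"logreg": composite_score(logreg, weights),
          "svm": composite_score(svm, weights)}
best = max(scores, key=scores.get)
```

With these made-up numbers, the lightweight logistic regression wins despite slightly lower precision, because it is cheap to train and easy to use, which mirrors the trade-off the evaluation formula is designed to capture.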
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM... - IJNSA Journal
This document proposes a novel evaluation approach to find lightweight machine learning algorithms for intrusion detection. It incorporates six evaluation indexes: precision, recall, root mean square error, training time, sample complexity, and practicability. The evaluation formula calculates a score for each algorithm based on the F1 score and penalty values. The document defines practicability penalty values for six machine learning algorithms (decision tree, naive Bayes, multilayer perceptron, radial basis function network, logistic regression, and support vector machine). The algorithms are then evaluated on intrusion detection datasets using the proposed approach.
Similar to Optimization of network traffic anomaly detection using machine learning (20)
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... - IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to precisely delineate tumor boundaries from magnetic resonance imaging (MRI) scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The model is rigorously trained and evaluated, exhibiting remarkable performance metrics, including a global accuracy of 99.286%, a class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of the proposed model. These findings underscore the model's competence in precise brain tumor localization and its potential to advance medical image analysis and healthcare outcomes. This research paves the way for future exploration and optimization of advanced CNN models in medical imaging, with emphasis on addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoring - IJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us... - IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par... - IJECEIAES
Wide application of the proportional-integral-differential (PID) regulator in industry requires constant improvement of methods for tuning its parameters. The paper deals with optimization of PID regulator parameters using neural network methods. A methodology for choosing the architecture (structure) of the neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, and the form and type of activation function. Training algorithms based on minimizing the mismatch between the regulated value and the target value are developed. The method of backpropagation of gradients is used to select the optimal learning rate for the neurons of the neural network. The neural network optimizer, which is a superstructure over the linear PID controller, reduces the regulation error from 0.23 to 0.09, thus reducing power consumption from 65% to 53%. The results of the conducted experiments suggest that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning PID controller parameters.
An improved modulation technique suitable for a three level flying capacitor ... - IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed simplified modulation technique paves the way for more straightforward and efficient control of multilevel inverters, enabling their widespread adoption and integration into modern power electronic systems. Through the combination of sinusoidal pulse width modulation (SPWM) with a high-frequency square wave pulse, this controlling technique attains energy equilibrium across the coupling capacitor. The modulation scheme incorporates a simplified switching pattern and a decreased count of voltage references, thereby simplifying the control algorithm.
A review on features and methods of potential fishing zone - IJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It examines features such as sea surface temperature (SST) and sea surface height (SSH), along with the classification methods used to classify the data. The study underscores the importance of examining potential fishing zones using advanced analytical techniques, and it thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms for identifying potential fishing zones, whose prediction relies significantly on the effectiveness of those algorithms. Previous research has assessed the performance of models such as support vector machines (SVM), naïve Bayes, and artificial neural networks (ANN); in one reported result, SVM classified fisheries test data with 97.6% accuracy versus 94.2% for naïve Bayes. Based on recent work in this area, several recommendations for future work are presented to further improve the performance of potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f... - IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law continues to be followed. The industry has worked hard to make small devices that boost productivity, with the goal of optimizing device density. Scientists are reducing interconnect delays to improve circuit performance. This led to three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and shorten interconnects. Electrical interference is a major concern with 3D integrated circuits. Researchers have developed and tested through-silicon vias (TSVs) and substrates to decrease electrical noise coupling. This study illustrates a novel noise coupling reduction method using several coupling models. The new approach achieves a 22% drop in coupling from the wave-carrying TSV to the victim TSV and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p... - IJECEIAES
Climate change's impact on the planet has forced the United Nations and governments to promote green energy and electric transportation. The deployment of photovoltaic (PV) and electric vehicle (EV) systems has gained strong momentum due to their numerous advantages over fossil fuel alternatives, which go beyond sustainability to include financial support and stability. This paper introduces a hybrid PV-EV system to support industrial and commercial plants. It covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present, and presents the design diagram that sets the priorities and requirements of the system. The proposed approach allows plants to improve their power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The results of a case study representing a dairy milk farmer support the theoretical work and highlight the benefits for existing plants. The short return on investment underscores the novelty of the proposed approach for a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Bibliometric analysis highlighting the role of women in addressing climate ch... - IJECEIAES
Fossil fuel consumption has increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have played a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results consider contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. The study's results highlight how women have influenced policies and actions related to climate change, point out research gaps, and offer recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter... - IJECEIAES
The active and reactive load changes have a significant impact on voltage and frequency. In this paper, in order to stabilize the microgrid (MG) against load variations in islanding mode, the active and reactive power of all distributed generators (DGs), including energy storage (battery), diesel generator, and micro-turbine, are controlled. The micro-turbine generator is connected to the MG through a three-phase to three-phase matrix converter, and the droop control method is applied for controlling the voltage and frequency of the MG. In addition, a method is introduced for voltage and frequency control of micro-turbines in the transition from grid-connected mode to islanding mode. A novel switching strategy of the matrix converter is used for converting the high-frequency output voltage of the micro-turbine to the grid-side frequency of the utility system. Moreover, using the switching strategy, low-order harmonics in the output current and voltage are not produced, and consequently the size of the output filter would be reduced. In fact, the suggested control strategy is load-independent and has no frequency conversion restrictions. The proposed approach for voltage and frequency regulation demonstrates exceptional performance and favorable response across various load alteration scenarios. The suggested strategy is examined in several scenarios in the MG test systems, and the simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo... - IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their performance, enhancing safety, and prolonging their lifespan across various applications, such as electric vehicles and renewable energy systems. This article introduces an innovative nonlinear methodology for system identification of a Li-ion battery, employing a nonlinear autoregressive with exogenous inputs (NARX) model. The proposed approach integrates the benefits of nonlinear modeling with the adaptability of the NARX structure, facilitating a more comprehensive representation of the intricate electrochemical processes within the battery. Experimental data collected from a Li-ion battery operating under diverse scenarios are employed to validate the effectiveness of the proposed methodology. The identified NARX model exhibits superior accuracy in predicting the battery's behavior compared to traditional linear models. This study underscores the importance of accounting for nonlinearities in battery modeling, providing insights into the intricate relationships between state-of-charge, voltage, and current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a survey - IJECEIAES
Smart grids are one of the last decades' innovations in electrical energy. They bring relevant advantages compared to the traditional grid and attract significant interest from the research community. Assessing the field's evolution is essential to propose guidelines for facing new and future smart grid challenges. In addition, knowing the main technologies involved in the deployment of smart grids (SGs) is important to highlight possible shortcomings that can be mitigated by developing new tools. This paper contributes to the research trends mentioned above by focusing on two objectives. First, a bibliometric analysis is presented to give an overview of the current research level on smart grid deployment. Second, a survey of the main technological approaches used for smart grid implementation and their contributions is presented. To that effect, we searched the Web of Science (WoS) and Scopus databases. We obtained 5,663 documents from WoS and 7,215 from Scopus on smart grid implementation or deployment. Given extraction limitations in the Scopus database, 5,872 of the 7,215 documents were extracted using a multi-step process. These two datasets have been analyzed using a bibliometric tool called bibliometrix. The main outputs are presented with some recommendations for future research.
Use of analytical hierarchy process for selecting and prioritizing islanding ... - IJECEIAES
One of the problems associated with power systems is the islanding condition, which must be rapidly and properly detected to prevent any negative consequences for the system's protection, stability, and security. This paper offers a thorough overview of several islanding detection strategies, which are divided into two categories: classic approaches, including local and remote approaches, and modern techniques, including techniques based on signal processing and computational intelligence. Additionally, each approach is compared and assessed based on several factors, including implementation costs, non-detected zones, declining power quality, and response times, using the analytical hierarchy process (AHP). When all criteria are compared together, the multi-criteria decision-making analysis yields overall weights of 24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods, 14.5% for remote methods, 26.6% for signal processing-based methods, and 20.8% for computational intelligence-based methods. Thus, it can be seen from the total weights that hybrid approaches are the least suitable choice, while signal processing-based methods are the most appropriate islanding detection methods to select and implement in power systems with respect to the aforementioned factors. The proposed hierarchy model is studied and examined using Expert Choice software.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi... - IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by environmental factors. This variability hampers the control and utilization of solar cells' peak output. In this study, a single-stage grid-connected PV system is designed to enhance power quality. Our approach employs fuzzy logic in the direct power control (DPC) of a three-phase voltage source inverter (VSI), enabling seamless integration of the PV connected to the grid. Additionally, a fuzzy logic-based maximum power point tracking (MPPT) controller is adopted, which outperforms traditional methods like incremental conductance (INC) in enhancing solar cell efficiency and minimizing the response time. Moreover, the inverter's real-time active and reactive power is directly managed to achieve a unity power factor (UPF). The system's performance is assessed through MATLAB/Simulink implementation, showing marked improvement over conventional methods, particularly in steady-state and varying weather conditions. For solar irradiances of 500 and 1,000 W/m2, the results show that the proposed method reduces the total harmonic distortion (THD) of the injected current to the grid by approximately 46% and 38%, respectively, compared to conventional methods. Furthermore, we compare the simulation results with IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b... - IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that caters to the future needs of society, owing to their renewable, inexhaustible, and cost-free nature. The power output of these systems relies on solar cell radiation and temperature. In order to mitigate the dependence on atmospheric conditions and enhance power tracking, a conventional approach has been improved by integrating various methods. To optimize the generation of electricity from solar systems, the maximum power point tracking (MPPT) technique is employed. To overcome limitations such as steady-state voltage oscillations and improve transient response, two traditional MPPT methods, namely the fuzzy logic controller (FLC) and perturb and observe (P&O), have been modified. This research paper aims to simulate and validate the step size of the proposed modified P&O and FLC techniques within the MPPT algorithm using MATLAB/Simulink for efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ... - IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for robot hands is always an attractive topic in the research community. This is a challenging problem because robot manipulators are complex nonlinear systems and are often subject to fluctuations in loads and external disturbances. This article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller ensures that the joint positions track the desired trajectory, synchronizes the errors, and significantly reduces chattering. First, the synchronous tracking errors and synchronous sliding surfaces are presented. Second, the synchronous tracking error dynamics are determined. Third, a robust adaptive control law is designed; the unknown components of the model are estimated online by a neural network, and the parameters of the switching elements are selected by fuzzy logic. The built algorithm ensures that the tracking and approximation errors are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results, which show that the proposed controller is effective, with small synchronous tracking errors and significantly reduced chattering.
Remote field-programmable gate array laboratory for signal acquisition and de... - IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students' learning experience in embedded system design, anywhere and anytime. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embedded-system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback of existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA designs. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples during experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
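The compress-before-transmission idea this abstract describes can be illustrated with a simple delta-encode-then-deflate scheme. This is a generic sketch, not the paper's actual lab module; the sawtooth test signal is an assumption chosen to compress well:

```python
import struct
import zlib

def compress_signal(samples):
    """Delta-encode 32-bit samples, then deflate; slowly varying
    signals produce long runs of small deltas that compress well."""
    deltas = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
    return zlib.compress(struct.pack(f"<{len(deltas)}i", *deltas))

def decompress_signal(blob, n):
    """Inverse: inflate, then integrate the deltas back into samples."""
    deltas = struct.unpack(f"<{n}i", zlib.decompress(blob))
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

signal = [100 * (i % 32) // 32 for i in range(1024)]  # sawtooth test signal
blob = compress_signal(signal)
ratio = (len(signal) * 4) / len(blob)  # raw bytes / compressed bytes
```

On an FPGA the compression stage would be implemented in logic rather than software, but the payoff is the same: more captured samples fit through the same transmission budget.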
Detecting and resolving feature envy through automated machine learning and m... - IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba... - IJECEIAES
Rapidly and remotely monitoring the status parameters of solar cell systems (solar irradiance, temperature, and humidity) is critical to enhancing their efficiency. Hence, in the present article an improved smart prototype of an internet of things (IoT) monitoring system based on an embedded NodeMCU ESP8266 (ESP-12E) was realized experimentally. Three different regions in Egypt (the cities of Luxor, Cairo, and El-Beheira) were chosen to study their solar irradiance profile, temperature, and humidity with the proposed IoT system. The monitored solar irradiance, temperature, and humidity data were visualized live on Ubidots via the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m2, respectively, during the solar day. The accuracy and speed of the monitoring results obtained with the proposed IoT system make it a strong candidate for monitoring solar cell systems. Furthermore, the solar power radiation results for the three regions favor Luxor and Cairo over El-Beheira as suitable sites for building a solar cell station.
An efficient security framework for intrusion detection and prevention in int... - IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies and malicious intrusions pose several security loopholes, leading to performance degradation and threats to data security in IoT operations. IoT security systems must therefore monitor and restrict unwanted events in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived for identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to misclassification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention rely on supervised learning, which requires a large amount of labeled data for training; such datasets are impractical to source in a network as large as the IoT. To address this problem, this study introduces an efficient learning mechanism to strengthen IoT security. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with related works, the experimental outcome shows that the model performs well on a benchmark dataset, accomplishing an improved detection accuracy of approximately 99.21%.
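The hybrid idea this abstract describes, an unsupervised stage producing labels for a supervised stage, can be sketched as follows. The z-score filter and nearest-centroid classifier are simple illustrative stand-ins for the paper's actual models, and the connection-rate numbers are made up:

```python
from statistics import mean, stdev

def zscore_flags(values, thresh=2.0):
    """Unsupervised stage: flag points far from the mean (no labels needed)."""
    m, s = mean(values), stdev(values)
    return [1 if abs(v - m) / s > thresh else 0 for v in values]

def fit_centroids(X, y):
    """Supervised stage: a minimal nearest-centroid classifier trained on
    the pseudo-labels produced by the unsupervised stage."""
    cents = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(c) / len(pts) for c in zip(*pts)]
    return cents

def predict(cents, x):
    """Assign the label of the nearest class centroid."""
    return min(cents, key=lambda l: sum((a - b) ** 2
                                        for a, b in zip(x, cents[l])))

rates = list(range(1, 11)) + [50]  # hypothetical connection rates; 50 is a burst
pseudo = zscore_flags(rates)       # unsupervised stage labels the burst
cents = fit_centroids([[r] for r in rates], pseudo)
```

The design point mirrors the abstract's argument: the unsupervised stage supplies labels that would be too costly to collect by hand at IoT scale, and the supervised stage then generalizes them to new traffic.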
Comparative analysis between traditional aquaponics and reconstructed aquapon... - bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions - Victor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I... - amsjournal
The Fourth Industrial Revolution is transforming industries, including healthcare, by integrating digital, physical, and biological technologies. This study examines the integration of 4.0 technologies into healthcare, identifying success factors and challenges through interviews with 70 stakeholders from 33 countries. Healthcare is evolving significantly, with varied objectives across nations aiming to improve population health. The study explores stakeholders' perceptions of critical success factors, identifying challenges such as insufficiently trained personnel, organizational silos, and structural barriers to data exchange. Facilitators for integration include cost reduction initiatives and interoperability policies. Technologies like IoT, Big Data, AI, machine learning, and robotics enhance diagnostics, treatment precision, and real-time monitoring, reducing errors and optimizing resource utilization. Automation improves employee satisfaction and patient care, while blockchain and telemedicine drive cost reductions. Successful integration requires skilled professionals and supportive policies, promising efficient resource use, lower error rates, and accelerated processes, leading to optimized global healthcare outcomes.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Optimization of network traffic anomaly detection using machine learning
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 11, No. 3, June 2021, pp. 2360~2370
ISSN: 2088-8708, DOI: 10.11591/ijece.v11i3.pp2360-2370
Journal homepage: http://ijece.iaescore.com
Cho Do Xuan¹, Hoang Thanh², Nguyen Tung Lam³
¹ Information Security Department, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam
² Digital Marketing, FPT University, Hanoi, Vietnam
³ Information Assurance Department, FPT University, Hanoi, Vietnam
Article history:
Received Sep 6, 2020
Revised Dec 9, 2020
Accepted Dec 15, 2020

ABSTRACT
In this paper, to optimize the process of detecting cyber-attacks, we propose two main optimization solutions: optimizing the detection method and optimizing the features. Both solutions aim to increase accuracy and reduce the time needed for analysis and detection. For the detection method, we recommend the Random Forest supervised classification algorithm. The experimental results in section 4.1 show that our proposal to use the Random Forest algorithm for abnormal behavior detection is well founded, because this algorithm outperforms several other detection algorithms on all measures. For feature optimization, we propose several dimensionality reduction techniques: information gain, principal component analysis, and the correlation coefficient method. The results of our research demonstrate that optimizing the cyber-attack detection process does not require advanced algorithms with complex and cumbersome computational requirements; instead, it depends on the monitoring data, which should guide the selection of suitable feature extraction and optimization algorithms as well as appropriate attack classification and detection algorithms.
Keywords:
Feature optimization
Machine learning
Network traffic
Network traffic anomaly detection
Optimization
This is an open access article under the CC BY-SA license.
Corresponding Author:
Cho Do Xuan
Department of Information Security
Posts and Telecommunications Institute of Technology
122 Hoang Quoc Viet, Cau Giay District, Hanoi, Vietnam
Email: chodx@ptit.edu.vn
1. INTRODUCTION
Cyber-attacks are a dangerous form of attack that has increased rapidly in both the number of recorded incidents and the extent of the damage caused to organizations and businesses. The research in [1-3] classifies cyber-attack techniques into two main categories: passive attacks and active attacks. According to the report [4], in 2019 cyber-attack techniques were ranked among the most dangerous attack techniques. Statistics on security vulnerabilities [5] that attackers commonly exploit show how serious current cyber-attacks are for organizations, governments, and businesses. Therefore, detecting and providing early warning of cyber-attack campaigns is an urgent problem today.
The studies [2, 3] describe the differences between cyber-attacks and other attack techniques, which make detecting and warning of such attacks difficult. Currently, there are two main methods for detecting cyber-attacks: the signature-based method, which uses rule sets, and the anomaly-based method, which relies on data analysis and statistics to find abnormal characteristics in the network [1-3, 6]. The
signature-based method can detect quickly and accurately but cannot detect new attack techniques [1]. The anomaly-based method can detect not only attacks but also abnormal behaviors; however, it requires complex calculation and processing and has lower accuracy. The anomaly-based method usually relies on two main techniques, machine learning and deep learning, to classify abnormal and normal behavior [1, 2]. In this paper, we propose a cyber-attack detection method using the random forest (RF) machine learning algorithm. The RF algorithm has been shown to be among the best current algorithms for classification [1, 3, 6-8]. The studies [1, 2] list and analyze several datasets commonly used for cyber-attack detection, such as DARPA/KDD Cup99, CAIDA, NSL-KDD, ISCX 2012, and UNSW-NB15. Among these, the UNSW-NB15 dataset was built to correspond relatively well to real network systems [1, 9]. Therefore, in this paper we use the UNSW-NB15 dataset to experiment with cyber-attack detection methods.
As presented above, in order to optimize the process of detecting and alerting on cyber-attacks based on machine learning techniques, recent studies often attempt to find new detection methods and techniques. However, we observe that new approaches are usually suitable only for existing datasets; when applied in practice, they often do not deliver high efficiency because the datasets used to build the models are incompatible with the monitoring data. Therefore, in our view, instead of trying to develop new detection methods, we look for ways to analyze and build experimental datasets so that they best match real network monitoring systems. In this paper, to optimize the anomaly detection process on the UNSW-NB15 dataset, we propose methods for evaluating and selecting features. The methods we use include information gain, principal component analysis, and the correlation coefficient method.
Our research is presented as follows: the urgency of the research problem is presented in section 1.
In section 2, we present the process of researching, surveying, and evaluating related works. The algorithms
related to the problem of classifying attack and reducing feature dimensions are presented in section 3.
Section 4 presents the results of the experimental process. Accordingly, section 4.1 is the experimental
process of detecting cyber-attacks, in which we evaluate and compare our proposed method with some other
studies. The results of the process of evaluating and comparing the efficiency of the feature dimension
reduction method are presented in section 4.2. Conclusions and evaluations are presented in section 5. The practical and scientific contributions of our paper include:
- Applying the RF machine learning algorithm to the UNSW-NB15 dataset to detect abnormal behavior in the network. In the studies we surveyed (see section 2.1), the authors used different machine learning methods and compared the effectiveness of each algorithm. However, no prior research has applied the RF algorithm to detect anomalies on the UNSW-NB15 dataset, although several studies have indicated that this algorithm is currently among the best for classification. Our experimental results in section 4.1 demonstrate the effectiveness of the RF algorithm for anomaly detection and show that building an abnormal-behavior detection system does not require overly cumbersome and complicated algorithms. In addition, based on the results of our experimental scenarios, we show how to select the dataset and the parameters of the algorithm so that they fit the detection model.
- Proposing methods for evaluating and selecting features. In this paper, we use several methods and techniques to evaluate and select the best features. In addition, we reassess the detection model based on the selected features using two criteria: accuracy and processing time. The results of the research and evaluation in section 4.2 extend and address the shortcomings of the studies presented in section 2.2.
2. RELATED WORK
2.1. Cyber-attacks detection based on UNSW-NB15 dataset
In the study [10], Kumar et al. proposed a method to classify cyber-attack techniques on UNSW-NB15 using different rule sets. However, building and applying rule sets is limited in this study because the coverage and the number of rules are not large enough. Moustafa et al. [11] proposed a geometric area analysis technique to detect cyber-attacks using trapezoidal area estimation. To evaluate the effectiveness of the proposed method, the authors conducted experiments on the UNSW-NB15 and NSL-KDD datasets; the experimental results showed the superiority of the UNSW-NB15 dataset
over the NSL-KDD dataset. In addition, research [12] presents a technique for building an effective anomaly detection system based on two datasets, NSL-KDD and UNSW-NB15. The technique comprises three modules: a capturing and logging module, a pre-processing module, and a decision engine built on a Dirichlet mixture model, a novel statistical approach to anomaly detection. The first module scans and gathers network data. The second module then analyzes and filters these data to improve the efficiency of
the decision engine. Finally, the decision engine is built on the Dirichlet mixture model. Bagui et al. [13] proposed a cyber-attack detection method based on the Naïve Bayes and decision tree (J48) algorithms. In their experiments, the research team [13] used these algorithms in turn to classify the different cyber-attack components in the UNSW-NB15 dataset. In the study [14], the authors proposed a model to detect cyber-attacks using stacking techniques: during training, their model combines machine learning algorithms consisting of K-Nearest Neighbors, Decision Tree, and Logistic Regression to build a model on the UNSW-NB15 and UGR'16 datasets. The study [15] evaluated the effectiveness of 8 machine learning algorithms (consisting of 2-layer and 3-layer algorithms) for network intrusion detection. This is a good idea, but it requires the Microsoft Azure Machine Learning Studio system for practical deployment. In our research, we distinguish between attack and normal traffic using pure machine learning algorithms together with Apache Spark technology. Our results are similar to those of the method proposed in [15], but our performance and experimental configuration are much simpler.
In addition, other studies have presented methods to detect attack components in the network using machine learning algorithms. The study [16] presented a method of detecting DDoS attacks using a technique that comprehensively simulates DDoS attacks. In their study [17], Narender et al. proposed a method to detect DDoS attacks using machine learning algorithms such as Logistic Regression, Decision Tree, and K-Nearest Neighbors. This is a relatively classic approach; nowadays, these classification algorithms are often not as effective as the RF algorithm [7]. Jafar et al. [18] proposed a method to classify DoS, Probe, U2R, and R2L attack techniques using several algorithms, including Neural Networks, Genetic algorithms, and Decision Trees. However, using classification algorithms on the KDD 99 dataset is an outdated approach, because current cyber-attack data are much more abundant and diverse.
2.2. The problem of optimizing the anomaly detection feature on the network based on the UNSW-NB15 dataset
In the study [19], the authors proposed using Pearson's correlation coefficient and the gain ratio technique to evaluate features. However, the limitation of this study is that the authors did not conduct experiments to evaluate the accuracy of each feature dimension reduction method. In this paper, we not only evaluate and select important features but also evaluate the anomaly detection model based on the feature evaluation process. The study [20] proposed the information gain method to reduce the feature dimension when training a botnet detection model; however, the authors did not specify which redundant features were removed. The study [10] described the information gain algorithm for reducing the feature dimension, but in the experimental part the authors did not compare the effectiveness of the detection method with and without the feature dimension reduction technique. Bagui et al. [13] proposed feature selection methods using K-means clustering and correlation-based feature selection algorithms. In the study [21], the authors proposed a deep learning model combining a Convolutional Neural Network and a long short-term memory network (LSTM) to extract features and classify cyber-attacks on the CICIDS2017 dataset. Experimental results show that the classification system achieves an overall accuracy of 98.67%, with per-attack-type accuracy above 99.50%. However, this approach requires considerable time and a cumbersome computation system; thus, it is suitable only for research and is difficult to apply in practice.
3. ANOMALY CLASSIFICATION AND ITS OPTIMIZATION USING MACHINE LEARNING
3.1. Experimental data
The dataset used for our experiments is UNSW-NB15. This dataset was built by using the IXIA PerfectStorm tool to generate a mixture of normal and attack operations in the network. More than 100 GB of raw network traffic were captured with the Tcpdump tool and processed by Argus, Bro-IDS, and twelve algorithms written in C# to extract 43 features, which are saved in CSV format [9, 10, 12, 13]. The selected features are divided
into six groups:
- Flow features: features used to identify a network flow, such as IP address, port number, and protocol.
- Basic features: features describing the connection.
- Content features: features of the TCP/IP protocols and of the HTTP application-layer protocol.
- Time features: time-related features such as packet arrival time, start/end time, and round-trip time of the TCP protocol.
- Additional generated features: features in this group can be divided into two smaller groups, general-purpose features and connection features.
- Labeled features: the labels of the records.
3.2. Anomaly classification using random forest machine learning algorithm
The study [7] surveyed and evaluated several supervised learning algorithms for the cyber-attack detection problem and indicated that the RF algorithm is currently the best classification technique. Therefore, in this paper we use the RF algorithm to detect anomalies in the network based on the UNSW-NB15 dataset. RF is an ensemble classification method [22]: it combines an ensemble of classifiers, normally decision trees, to make the final prediction [23]. The theoretical foundation of this algorithm is Jensen's inequality [23]; applied to classification problems, it shows that combining many models can produce a lower error rate than any individual model.
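The ensemble intuition can be illustrated with a small simulation (a toy sketch, not the paper's experiment): assume each base classifier is independently correct with probability 0.7; a majority vote then errs far less often than any single model. Real decision trees in an RF are only partially decorrelated, via bootstrap sampling and random feature subsets, so the gain in practice is smaller.

```python
import random

random.seed(0)

def vote_error(n_models, p_correct=0.7, trials=10_000):
    """Estimated error rate of a majority vote over n_models base
    classifiers, each independently correct with probability p_correct."""
    errors = 0
    for _ in range(trials):
        n_right = sum(random.random() < p_correct for _ in range(n_models))
        if n_right <= n_models // 2:   # majority voted for the wrong class
            errors += 1
    return errors / trials

print(vote_error(1))    # a single model errs about 30% of the time
print(vote_error(25))   # a 25-model vote errs only a few percent of the time
```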
3.3. Feature evaluation and selection
In fact, not all of the available features are useful for building a training model that makes the necessary predictions. Some features can even reduce prediction accuracy and add to model-building time. Therefore, feature selection plays a very important role in building abnormal-behavior detection systems: selecting good features not only improves the accuracy of attack prediction but also reduces feature extraction time. In this paper, we evaluate and select features with several different methods in order to assess the effectiveness of each method on the UNSW-NB15 dataset.
3.3.1. Feature optimization using correlation coefficient method
The correlation coefficient is a statistical index that measures the strength of the relationship
between two variables. There are many different kinds of correlation coefficients. In this paper, we used the
Pearson correlation coefficient. Pearson correlation coefficient between two variables X and Y is calculated
by the formula [24].
( )
where:
Cov (X, Y) is the covariance of X and Y
is the standard deviation of X
is the standard deviation of Y
The correlation coefficient takes values between -1 and 1. A negative coefficient indicates a negative (inverse) correlation, with -1 being a perfect negative correlation; a positive coefficient indicates a positive correlation, with 1 being a perfect positive correlation. The correlation coefficient is zero if the two variables are independent of each other. Two features with a large correlation coefficient are nearly linearly dependent and thus have almost the same effect on the dependent feature, so one of the two can be removed.
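A minimal sketch of this pruning rule (synthetic data; the ±0.9 cutoff matches the one used later in section 4.2.1, and the feature names here are made up):

```python
import numpy as np

def prune_correlated(X, names, threshold=0.9):
    """Keep the first of every pair of features whose |Pearson r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)   # feature-by-feature correlation matrix
    kept, dropped = [], set()
    for i, name in enumerate(names):
        if name in dropped:
            continue
        kept.append(name)
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                dropped.add(names[j])     # nearly linearly dependent twin
    return kept

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2 * a + rng.normal(scale=0.01, size=200)  # almost a duplicate of a
c = rng.normal(size=200)
X = np.column_stack([a, b, c])
print(prune_correlated(X, ["a", "b", "c"]))   # ['a', 'c']
```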
3.3.2. Feature optimization using information gain method
Information gain (IG) is a feature evaluation method based on the entropy function and is widely used in machine learning [25]. Information gain is defined as a quantity that measures the amount of information about a class gained from a feature, and it is calculated from the entropy [23]. The entropy function is defined as follows [23]: given a discrete variable x that can take one of n values {x_1, x_2, ..., x_n}, suppose the probability of x taking value x_i is p_i = p(x = x_i), with 0 ≤ p_i ≤ 1 and ∑_{i=1}^{n} p_i = 1. Denote this distribution by p = (p_1, p_2, ..., p_n). The entropy of this distribution is defined by formula (1):

H(p) = -∑_{i=1}^{n} p_i log(p_i)     (1)
From the formula of entropy, the calculation principle of information gain is as follows:
Step 1: Consider a problem with C different classes. Suppose we work with a non-leaf node whose data points form a set S with N elements, and that N_c of these points belong to class c (with ∑_{c=1}^{C} N_c = N). The probability that a data point belongs to class c is approximated by N_c/N (maximum likelihood estimation). Thus, the entropy at this node is calculated as:

H(S) = -∑_{c=1}^{C} (N_c/N) log(N_c/N)     (2)
Step 2: Assume that, based on a feature x, the data points in S are divided into K child nodes S_1, S_2, ..., S_K containing m_1, m_2, ..., m_K points, respectively. Formula (3) defines the weighted sum of the entropies of the child nodes; the weighting is important because child nodes often contain different numbers of points.

H(x, S) = ∑_{k=1}^{K} (m_k/N) H(S_k)     (3)

Step 3: Calculate the information gain of feature x:

G(x, S) = H(S) - H(x, S)     (4)
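Formulas (1)-(4) can be sketched directly in code (a minimal illustration for a discrete feature; log base 2 is an assumption on our part, so the gain is measured in bits):

```python
import numpy as np

def entropy(labels):
    """H(S) = -sum_c (N_c/N) log2(N_c/N), as in formula (2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels):
    """G(x, S) = H(S) - sum_k (m_k/N) H(S_k), formulas (3) and (4),
    where a discrete feature x splits S into one child node per value."""
    n = len(labels)
    weighted = sum(
        len(labels[feature == v]) / n * entropy(labels[feature == v])
        for v in np.unique(feature)
    )
    return entropy(labels) - weighted

y = np.array([0, 0, 1, 1])
x_good = np.array(["a", "a", "b", "b"])  # splits the classes cleanly
x_bad = np.array(["a", "b", "a", "b"])   # carries no class information
print(information_gain(x_good, y))  # 1.0 bit: the full entropy of y
print(information_gain(x_bad, y))   # 0.0: nothing gained
```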
3.3.3. Feature optimization using principal component analysis method
Principal component analysis (PCA) is a method of finding a new basis such that the information in the data is concentrated mainly in a few coordinates, while the remaining coordinates carry only a small amount of information. To simplify the computation, PCA looks for an orthonormal basis in which the most important components appear in the first few coordinates [26]. PCA can be implemented by the following steps [26, 27]:
Step 1: Calculate the mean vector of all data points:

x̄ = (1/N) ∑_{n=1}^{N} x_n     (5)

Step 2: Subtract the mean vector from each data point:

x̂_n = x_n - x̄     (6)

Step 3: Calculate the covariance matrix:

S = (1/N) X̂ X̂^T     (7)

Step 4: Calculate the eigenvalues and the unit-norm eigenvectors of this matrix, and arrange them in descending order of eigenvalue.
Step 5: Select the K eigenvectors with the K largest eigenvalues to build the matrix U_K whose columns form an orthogonal system. These K vectors, also called the principal components, span a subspace close to the distribution of the normalized original data.
Step 6: Project the normalized original data X̂ onto the found subspace.
Step 7: Calculate the coordinates of the new data, i.e., the coordinates of the data points in the new space:

Z = U_K^T X̂     (8)

The original data can be approximated from the new data as in formula (9):

x ≈ U_K Z + x̄     (9)
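The seven steps above can be condensed into a short NumPy sketch (an illustration with synthetic data, not the paper's pipeline); rows of X are data points:

```python
import numpy as np

def pca(X, K):
    x_bar = X.mean(axis=0)                 # step 1: mean vector, formula (5)
    X_hat = X - x_bar                      # step 2: center the data, formula (6)
    S = X_hat.T @ X_hat / len(X)           # step 3: covariance matrix, formula (7)
    vals, vecs = np.linalg.eigh(S)         # step 4: eigenvalues/eigenvectors
    order = np.argsort(vals)[::-1]         # descending eigenvalues
    U_K = vecs[:, order[:K]]               # step 5: top-K principal components
    Z = X_hat @ U_K                        # steps 6-7: project, formula (8)
    X_approx = Z @ U_K.T + x_bar           # reconstruction, formula (9)
    return Z, X_approx

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))                 # data that truly lives in 2-D
X = latent @ rng.normal(size=(2, 3)) + rng.normal(scale=0.01, size=(100, 3))
Z, X_approx = pca(X, K=2)
print(Z.shape, float(np.abs(X - X_approx).max()))  # (100, 2) and a tiny error
```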
4. EXPERIMENTS AND EVALUATIONS
4.1. Experiment and evaluation of abnormal detection method
4.1.1. Experimental scenarios
The experimental dataset in our paper includes 2,540,047 records, consisting of 2,218,764 normal records and 321,283 attack records. We divide this dataset into the following experimental datasets:
- Dataset A: consists of 322,106 normal records and 321,283 abnormal records.
- Dataset B: consists of 964,971 normal records and 321,283 abnormal records.
- Dataset C: consists of 2,218,764 normal records and 321,283 abnormal records.
Each of these datasets is divided into two parts in a ratio of 7:3 for training and testing.
For the classification algorithm, to evaluate the effectiveness of the RF algorithm on each of the datasets A, B, and C, we vary the parameter representing the number of decision trees in the RF algorithm. The model is
tested with the numbers of decision trees {10, 40, 60, 80, 100}. In addition, we conduct experiments to compare the RF algorithm with algorithms from other studies, namely the decision tree (J48) [9, 21] and LSTM [21, 28] algorithms. In the study [15], the authors proved that the KNN and logistic regression algorithms are both less effective than the decision tree algorithm, so to assess the effectiveness of the RF algorithm we compare it only with the decision tree and LSTM algorithms.
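A sketch of this protocol on synthetic data (not UNSW-NB15; scikit-learn is assumed available): a 7:3 split, a sweep over the tree counts used in the paper, and a single decision tree as a J48-style baseline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced synthetic "traffic" records standing in for the real dataset.
X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           weights=[0.75, 0.25], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("single tree:", round(tree.score(X_te, y_te), 3))

for n in (10, 40, 60, 80, 100):
    rf = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
    print(f"RF, {n} trees:", round(rf.score(X_te, y_te), 3))
```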
4.1.2. Evaluation criteria
In this paper, we specify that the abnormal record is labeled as positive, and normal records are
labeled as negative. The metrics used to evaluate the effectiveness of the abnormal detection method in our
paper include:
Accuracy: the ratio between the number of correctly predicted points and the total number of points in the test dataset.

Accuracy = (TP + TN) / (TP + TN + FP + FN)     (10)

Precision: the ratio of true positive points among all points classified as positive (TP + FP). A high precision value means that the points identified as positive are mostly correct.

Precision = TP / (TP + FP)     (11)

Recall: the ratio of true positive points among all points that are actually positive (TP + FN). A high recall value means a high true positive rate (TPR), i.e., a low rate of missed actual positives.

Recall = TP / (TP + FN)     (12)
In which, True positive (TP) is the number of abnormal records that are correctly predicted; False
positive (FP) is the number of normal records that are incorrectly predicted; True negative (TN) is the
number of normal records correctly predicted; False negative (FN) is the number of abnormal records that are
incorrectly predicted.
Confusion matrix: This matrix will show how many data points actually belong to which class and how
many data points are predicted to belong to which class. In addition, the TPR, FNR, FPR, TNR
(R-Rate) criteria are calculated based on the normalized confusion matrix. Table 1 describes the
calculation formulas of the above parameters.
Table 1. Confusion matrix
                  Predicted as abnormal    Predicted as normal
Actual abnormal   TPR = TP/(TP+FN)         FNR = FN/(TP+FN)
Actual normal     FPR = FP/(FP+TN)         TNR = TN/(FP+TN)
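As a small worked example (the counts here are made up, not the paper's), the metrics of formulas (10)-(12) and the R-rates of Table 1 follow directly from the four confusion-matrix counts:

```python
def metrics(tp, fp, tn, fn):
    """Evaluation measures computed from raw confusion-matrix counts."""
    return {
        "accuracy":  (tp + tn) / (tp + tn + fp + fn),  # formula (10)
        "precision": tp / (tp + fp),                   # formula (11)
        "recall":    tp / (tp + fn),                   # formula (12), = TPR
        "fpr": fp / (fp + tn),
        "tnr": tn / (fp + tn),
        "fnr": fn / (tp + fn),
    }

# 90 abnormal records caught, 20 missed; 10 normal records wrongly flagged.
m = metrics(tp=90, fp=10, tn=880, fn=20)
print(m["accuracy"], m["precision"], round(m["recall"], 3))  # 0.97 0.9 0.818
```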
4.1.3. Experimental results
a. Experimental results with dataset A
From Table 2, we can see that when the number of decision trees is 40, the algorithm achieves its highest accuracy and precision, 99.299% and 98.619% respectively. Moreover, when the number of decision trees varies from 10 to 100, the accuracy of the algorithm changes little. This shows that on a dataset with a balanced ratio of normal to abnormal records, the RF algorithm detects well and stably. However, as the number of decision trees increases, the training and testing time also increases. Table 3 shows the confusion matrix for 40 decision trees; the prediction model achieves very high accuracy for both normal and anomaly predictions.
b. Experimental results with dataset B
From Table 2 and Table 4, we can see that the accuracy and precision on dataset B are lower than on dataset A, while the recall values change little. In addition, the training time on dataset B is 1.3 to 1.5 times that on dataset A. For the RF algorithm on dataset B, the highest accuracy (98.944%) is achieved with 60 and 80 decision trees, whereas the highest precision (95.965%) is
achieved with 40 decision trees. Table 5 shows the confusion matrix for 40 decision trees.
Table 2. Experimental results with dataset A using the RF algorithm
Algorithm        Accuracy %  Precision %  Recall %  FPR %   TNR %   FNR %   Training time (s)
RF, 10 trees     99.298      98.617       99.996    1.399   98.601  0.004   111.051
RF, 40 trees     99.299      98.619       99.997    1.397   98.603  0.003   117.402
RF, 60 trees     99.296      98.614       99.996    1.402   98.598  0.004   125.954
RF, 80 trees     99.299      98.618       99.997    1.398   98.602  0.003   131.755
RF, 100 trees    99.296      98.615       99.994    1.401   98.599  0.006   138.924
J48 [9, 21]      98.473      89.242       99.981    1.745   98.255  0.019   98.209
LSTM [21, 28]    97.682      96.888       98.522    3.156   96.844  1.478   165.453
Table 3. Confusion matrix result with the number of decision trees of 40 on dataset A
Predicted as abnormal Predicted as normal
Actual abnormal 95303 1350
Actual normal 3 96431
Table 4. Experimental results with dataset B
Algorithm        Accuracy %  Precision %  Recall %  FPR %   TNR %   FNR %   Training time (s)
RF, 10 trees     98.942      95.957       99.977    1.404   98.596  0.023   111.397
RF, 40 trees     98.943      95.965       99.973    1.401   98.599  0.027   127.143
RF, 60 trees     98.944      95.954       99.990    1.405   98.595  0.010   142.84
RF, 80 trees     98.944      95.958       99.987    1.403   98.597  0.013   156.939
RF, 100 trees    98.943      95.956       99.984    1.404   98.596  0.016   172.444
J48 [9, 21]      98.012      93.037       99.489    2.479   97.521  0.511   103.514
LSTM [21, 28]    96.579      94.801       91.313    1.667   98.333  8.687   201.2
Table 5. Confusion matrix result with the number of decision trees of 40 on dataset B
Predicted as abnormal Predicted as normal
Actual abnormal 96422 12
Actual normal 4067 285329
c. Experimental results with dataset C
When the difference between the numbers of abnormal and normal records is largest, all experimental values are poorer than in the other scenarios. This is reasonable given the nature of the classification process: if the disparity in the dataset is too large, the classification model will overfit. From Table 6 it can be seen that, with 10 decision trees, the highest accuracy and precision are 99.016% and 94.825%, respectively. The confusion matrix values are shown in Table 7.
Table 6. Experimental results with dataset C
Algorithm        Accuracy %  Precision %  Recall %  FPR %   TNR %   FNR %   Training time (s)
RF, 10 trees     99.016      94.825       97.547    0.771   99.229  2.453   136.065
RF, 40 trees     98.877      92.254       99.473    1.209   98.791  0.527   168.027
RF, 60 trees     98.869      92.114       99.583    1.234   98.766  0.417   205.869
RF, 80 trees     98.786      91.262       99.971    1.386   98.614  0.029   243.109
RF, 100 trees    98.869      92.090       99.612    1.239   98.761  0.388   273.612
J48 [9, 21]      97.681      85.416       98.482    2.435   97.565  1.518   128.03
LSTM [21, 28]    94.752      88.939       90.209    3.735   96.265  9.791   400.642
Table 7. Confusion matrix result with the number of decision trees of 10 on dataset C
Predicted as abnormal Predicted as normal
Actual abnormal 94068 2366
Actual normal 5134 660839
d. Discussion
From the experimental results in Tables 2, 4 and 6, we can see that the RF algorithm gave good and stable classification results even though there are very large differences among the datasets. The lower the imbalance of the dataset, the higher the correct detection rate on all measures. Besides, for the J48 [9, 21] and LSTM [21, 28] algorithms, when the dataset changes, the detection results and detection time also change. The J48 algorithm has the advantage of the lowest detection and classification time because it uses only one tree for evaluation. However, its accuracy on all measures is lower than that of the RF algorithm. With the LSTM algorithm [21, 28], detection efficiency is improved, but the processing time is much slower than that of the other algorithms. Hence, it can be seen that the LSTM algorithm is not really suitable for datasets without time parameters. Based on these results, we have provided criteria and a basis for cyber-attack detection systems to balance detection performance against time cost.
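As an illustration of the experimental pattern behind Table 6, the tree-count sweep could be scripted with scikit-learn roughly as follows; a synthetic imbalanced dataset stands in for dataset C, so the numbers it prints are not the paper's results:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for dataset C: 42 features, ~13% abnormal traffic.
X, y = make_classification(n_samples=5000, n_features=42,
                           weights=[0.87, 0.13], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

# Sweep the number of decision trees as in Table 6.
for n_trees in (10, 40, 60, 80, 100):
    start = time.time()
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=42)
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    print(f"trees={n_trees:3d}  acc={accuracy_score(y_te, y_pred):.4f}  "
          f"prec={precision_score(y_te, y_pred):.4f}  "
          f"time={time.time() - start:.2f}s")
```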
4.2. Evaluation of feature optimization methods
From Table 6, we select 80 decision trees to conduct the experiments evaluating the feature optimization methods. We chose this scenario because dataset C with 80 decision trees gives the lowest accuracy and precision.
4.2.1. Feature selection using correlation coefficient method
a. Experimental results of feature dimension reduction
According to the selection rule of the correlation coefficient method, if two features have a large correlation coefficient, one of them should be removed, because keeping both adds little value. Accordingly, from Figure 1, we specify that if two features have a correlation coefficient greater than or equal to 0.9, or less than or equal to -0.9, one of the two is removed. By doing this, we removed 12 features: sloss, ct_state_ttl, synack, ct_dst_src_ltm, Dpkts, dwin, ackdat, ct_srv_dst, Ltime, dloss, and ct_src_ltm. Thus, from Figure 1, 31 features remain.
Figure 1. Correlation coefficient matrix among features in a dataset
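The pairwise filter behind Figure 1 can be sketched as follows; the toy frame and its column names (`sbytes`, `sloss`, `dur`) are illustrative stand-ins, not the actual dataset C columns:

```python
import numpy as np
import pandas as pd

# Toy data: 'sloss' is a near-copy of 'sbytes', 'dur' is independent.
rng = np.random.default_rng(0)
df = pd.DataFrame({"sbytes": rng.normal(size=200)})
df["sloss"] = 0.99 * df["sbytes"] + rng.normal(scale=0.01, size=200)
df["dur"] = rng.normal(size=200)

# Drop one member of every pair whose |correlation| >= 0.9.
corr = df.corr().abs()
cols = list(corr.columns)
to_drop = set()
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        if corr.iloc[i, j] >= 0.9 and cols[i] not in to_drop:
            to_drop.add(cols[j])

reduced = df.drop(columns=to_drop)
print(sorted(to_drop))          # the near-duplicate 'sloss' is removed
```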
b. Result of classification using correlation coefficient method
Experimental results on dataset C with the 31 selected features are presented in Table 8. Comparing Table 8 with Table 6 (80 trees), the important metrics all improved: Accuracy increased by 0.015%, Precision increased by 0.146%, and training time decreased by 52.954 seconds.
Table 8. Experimental results of dataset C with 31 features (correlation coefficient method)

| Accuracy % | Precision % | Recall % | FPR % | TNR % | FNR % | Training time (s) |
|---|---|---|---|---|---|---|
| 98.801 | 91.408 | 99.945 | 1.365 | 98.635 | 0.055 | 190.155 |
ISSN: 2088-8708. Int J Elec & Comp Eng, Vol. 11, No. 3, June 2021: 2360-2370
4.2.2. Feature selection using IG method
a. Experimental results of feature dimension reduction
Figure 2 shows the importance of each feature under the IG evaluation method. Features with low importance scores (less than 0.01) are removed to reduce the number of features. By doing this, we removed 15 features: dloss, dwin, stcpb, dtcpb, trans_depth, res_bdy_len, Sjit, Djit, Stime, Ltime, is_sm_ips_ports, ct_flw_http_mthd, is_ftp_login, ct_ftp_cmd, and ct_src_ltm.
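The same thresholding idea can be sketched with scikit-learn's mutual-information scorer (mutual information is the standard way to estimate information gain for continuous features); the synthetic columns here are stand-ins, not the actual dataset C features:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Toy data: column 0 depends on the label, column 1 is pure noise.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)
informative = y + rng.normal(scale=0.3, size=1000)
noise = rng.normal(size=1000)
X = np.column_stack([informative, noise])

# Score each feature against the label and keep those at or above
# the 0.01 threshold used in the text.
scores = mutual_info_classif(X, y, random_state=1)
kept = [i for i, s in enumerate(scores) if s >= 0.01]
print(scores.round(3))          # the informative column scores far higher
```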
b. Result of classification using IG method
Experimental results on dataset C with the 28 selected features are presented in Table 9. Comparing Table 9 with Tables 6 and 8, the important metrics are better still: relative to Table 6 (80 trees), Accuracy increased by 0.193%, Precision increased by 2.333%, and training time decreased by 29.699 seconds.
Table 9. Experimental results of dataset C with 28 features (IG method)

| Accuracy % | Precision % | Recall % | FPR % | TNR % | FNR % | Training time (s) |
|---|---|---|---|---|---|---|
| 98.979 | 93.595 | 98.709 | 0.982 | 99.018 | 1.291 | 213.410 |
Figure 2. Graph of feature values by IG method
4.2.3. Feature selection using PCA method
a. Experimental results of feature dimension reduction
We choose to keep 31 features of dataset C. After the experimental process, the PCA method removed 12 features: Dpkts, dwin, ackdat, ct_srv_dst, Ltime, dloss, trans_depth, res_bdy_len, Sjit, Djit, Stime, is_sm_ips_ports, and ct_flw_http_mthd.
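A minimal sketch of the PCA reduction with scikit-learn, assuming dataset C starts from 43 features (31 remaining after 12 are removed, per the text); note that PCA, strictly speaking, projects onto 31 new component axes rather than keeping 31 of the original named columns:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in for dataset C's 43-feature matrix.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 43))

# PCA is scale-sensitive, so standardize before projecting.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=31)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                                   # (500, 31)
print(f"variance kept: {pca.explained_variance_ratio_.sum():.2f}")
```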
b. Result of classification using PCA method
Experimental results on dataset C with the 31 selected features are presented in Table 10. Compared with the initial feature set, this scenario also has better accuracy, precision, and training time. Furthermore, reducing the feature dimension with PCA gives higher accuracy and precision than feature selection by the correlation coefficient method, but the training time is 34.377 seconds longer. Comparing the experimental results in Table 10 with Table 9, the PCA method is not as effective as the IG method. The reason is that PCA compresses the data, which can lose important features, whereas the IG method weighs and selects the original features. Nevertheless, for larger datasets the PCA method becomes more effective.
Table 10. Experimental results of dataset C with 31 features (PCA method)

| Accuracy % | Precision % | Recall % | FPR % | TNR % | FNR % | Training time (s) |
|---|---|---|---|---|---|---|
| 98.804 | 91.926 | 99.293 | 1.267 | 98.733 | 0.707 | 224.532 |
4.2.4. Discussion
The experimental results in Tables 8-10 show that the feature dimension reduction algorithms improved both the efficiency of the detection process and the time for detection and warning. However, based on the differing efficiency of the feature dimension reduction methods, we note that cyber-attack monitoring and detection systems need a trade-off between detection efficiency and detection time. The IG and correlation coefficient algorithms can give better results in terms of detection time and efficiency if thresholds are chosen to reduce the dimension further. However, if too many features are removed, the data's characteristics will be lost. Besides, these algorithms are only suitable for small and medium datasets; for large datasets, the PCA method is necessary. Therefore, we think that monitoring systems need to constantly update and re-evaluate the training model to adjust the values and roles of features, ensuring that all useful features are used.
5. CONCLUSION
Cyber-attack techniques have always been, and will always be, major challenges for intrusion monitoring and detection systems. With the goal of optimizing the cyber-attack detection process, our research proposed two main solutions: optimizing the attack detection method by using the RF supervised learning algorithm, and optimizing features based on feature dimension reduction techniques. The experimental results show that the RF algorithm is effective not only at accurately detecting attacks but also at limiting false detections, even when the experimental dataset has a large imbalance between normal data and cyber-attack data. For the feature optimization process, the feature dimension reduction methods removed many features: the correlation coefficient method reduced the number of features by 26%, IG by 32%, and PCA by 43%. Although the number of features is reduced, the detection method still maintains its accuracy as well as its detection time. This shows that the dimension reduction methods accurately selected and eliminated redundant features. With these results, our paper has not only provided network attack monitoring and detection systems with criteria for balancing the time and efficiency of the detection process, but also proved that to optimize the detection of cyber-attacks it is not necessary to use advanced algorithms with complex and cumbersome computational requirements; rather, the choice of reasonable feature extraction and optimization algorithms, as well as appropriate attack detection algorithms, must depend on the monitoring data. In the future, we will continue this research and apply our approach to other experimental datasets of cyber-attacks such as IDS 2018, CTU-13, etc. Besides, we will improve data dimension reduction solutions based on information representation methods of features or on graph theory.
REFERENCES
[1] R. Markus, et al., “A survey of network-based intrusion detection data sets,” Computers & security, vol. 8, no. 6,
pp. 147-167, 2019.
[2] Kh. Ansam, et al., “Survey of intrusion detection systems: techniques, datasets and challenges,” Cybersecurity,
vol. 20, pp. 2-20, 2019.
[3] A. T. Admassu, and S.N. Pramod., “A review on software defined network security risks and challenges,”
TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 17, no. 6, pp. 3168-3174, 2019.
[4] Cyber Edge Group, “2019 Cyberthreat Defense Report,” Imperva, 2019. [Online]. Available:
https://www.imperva.com/resources/reports/CyberEdge-2019-CDR-Report-v1.1.pdf.
[5] Joe Levy, “Sophos 2020 Threat Report,” Sophos, 2019, [Online]. Available: https://www.sophos.com/en-
us/medialibrary/PDFs/technical-papers/sophoslabs-uncut-2020-threat-report.pdf.
[6] A. Mohiuddin, M. Abdun, H. Jiankun, “A Survey of Network Anomaly Detection Techniques,” Journal of Network
and Computer Applications, vol. 60. pp. 19-31, 2015.
[7] J. J. Arthur, et al., "Review of the machine learning methods in the classification of phishing attack," Bulletin of
Electrical Engineering and Informatics (BEEI), vol. 8, no. 4, pp. 1545-1555, 2019.
[8] D. X. Cho, et al. “An adaptive anomaly request detection framework based on dynamic web application profiles,”
International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 5, pp. 5335-5346, 2020.
[9] "The UNSW-NB15 Dataset Description," University of New South Wales Canberra, 2020. [Online]. Available:
https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/.
[10] K. Vikash., et al., “An integrated rule based intrusion detection system: analysis on UNSW-NB15 data set and the
real time online dataset,” Cluster Computing, vol. 23, pp. 1397-1418, 2019.
[11] N. Moustafa., et al., “Novel Geometric Area Analysis Technique for Anomaly Detection using Trapezoidal Area
Estimation on Large-scale Networks,” IEEE Transactions on Big Data, vol. 5, no. 4, pp. 481-494, 2019.
[12] N. Moustafa, et al., "Big Data Analytics for Intrusion Detection System: Statistical Decision-Making Using Finite
Dirichlet Mixture Models," Chapter 3, Springer, 2017.
[13] S. Bagui, et al., “Using machine learning techniques to identify rare cyber-attacks on the UNSW-NB15 dataset,”
Security and Privacy, vol. 2, no. 3, pp. 1-13, 2019.
[14] S. Rajagopal., et al., “A predictive model for network intrusion detection using stacking approach,” International
Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 3, pp. 2734-2741, 2020.
[15] S. Rajagopal, et al., “Performance analysis of binary and multiclass models using azure machine learning,”
International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 1, pp. 978-986, 2020.
[16] H. H. Ibrahim, et al., “A comprehensive study of distributed Denial-of-Service attack with the detection
techniques,” International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 4, pp. 3685-3694,
2020.
[17] M. Narender., and B.N. Yuvaraju, “Preemptive modelling towards classifying vulnerability of DDoS attack in SDN
environment,” International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 2,
pp. 1599-1611, 2020.
[18] J. Majidpour, and H. Hasan Zadeh, “Application of deep learning to enhance the accuracy of intrusion detection in
modern computer networks,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 3, pp. 1137-
1148, 2020.
[19] N. Moustafa., and J. Slay., “The evaluation of Network Anomaly Detection Systems: Statistical analysis of the
UNSW-NB15 data set and the comparison with the KDD99 data set,” Information Security Journal: A Global
Perspective, vol. 25, no. 1-3, pp. 18-31, 2016.
[20] Z. M. Algelal, et al., “Botnet detection using ensemble classifiers of network flow,” International Journal of
Electrical and Computer Engineering (IJECE), vol. 10, no. 3, pp. 2543-2550, 2020.
[21] P. Sun, et al., “DL-IDS: Extracting Features Using CNN-LSTM Hybrid Network for Intrusion Detection System,”
Security and Communication Networks, vol. 2020, pp. 1-11, 2020.
[22] L. Breiman, "Understanding Random Forests: From Theory to Practice," Machine Learning, vol. 80, no. 1, pp. 5-32, 2017.
[23] S. S. Shai., “Understanding Machine Learning: From Theory to Algorithms,” Cambridge University Press, 2018.
[24] B. Andrzej and J. Andrzej, “SPSS tutorial: Pearson Correlation,” Wydawnictwo Niezależne. vol. 1, pp. 5-21, 2014.
[25] T. A. Alhaj, et al., “Feature Selection Using Information Gain for Improved Structural-Based Alert Correlation,”
PLOS ONE, vol. 11, no. 11, 2016.
[26] J. Shlens, "A Tutorial on Principal Component Analysis," arXiv preprint arXiv:1404.1100, 2014.
[27] J. Lever, Krzywinski, et al., “Principal component analysis,” Nat Methods, vol. 14, pp. 631-642, 2017.
[28] A. Boukhalfa, et al., “LSTM deep learning method for network intrusion detection system,” International Journal
of Electrical and Computer Engineering (IJECE), vol. 10, no. 3, pp. 3315-3322, 2020.