BotDetectorFW: an optimized botnet detection framework based on five features-distance measures supported by comparisons of four machine learning classifiers using CICIDS2017 dataset
A Botnet is one of many attacks that can execute malicious tasks and develop continuously. Therefore, current research introduces a comparison framework, called BotDetectorFW, with classification and complexity improvements for the detection of Botnet attack using CICIDS2017 dataset. It is a free online dataset consist of several attacks with high-dimensions features. The process of feature selection is a significant step to obtain the least features by eliminating irrelated features and consequently reduces the detection time. This process implemented inside BotDetectorFW using two steps; data clustering and five distance measure formulas (cosine, dice, driver & kroeber, overlap, and pearson correlation) using C#, followed by selecting the best N features used as input into four classifier algorithms evaluated using machine learning (WEKA); multilayerperceptron, JRip, IBK, and random forest. In BotDetectorFW, the thoughtful and diligent cleaning of the dataset within the preprocessing stage beside the normalization, binary clustering of its features, followed by the adapting of feature selection based on suitable feature distance techniques, and finalized by testing of selected classification algorithms. All together contributed in satisfying the highperformance metrics using fewer features number (8 features as a minimum) compared to and outperforms other methods found in the literature that adopted (10 features or higher) using the same dataset. Furthermore, the results and performance evaluation of BotDetectorFM shows a competitive impact in terms of classification accuracy (ACC), precision (Pr), recall (Rc), and f-measure (F1) metrics.
Botnet detection using Wgans for securityssuser3f5a831
This document proposes a GAN-based technique to improve botnet detection by addressing class imbalance issues in datasets. It involves using a WGAN to generate synthetic samples for the underrepresented botnet class, which are then combined with the original dataset and used to train and evaluate several machine learning classifiers. The proposed "Bot-WGANs" method achieves over 99.9% accuracy, 100% precision, 99.8% recall, and 99.9% F1 score on the CICIDS2017 dataset, outperforming existing methods that use oversampling techniques like SMOTE. The approach shows potential for enhancing cybersecurity by simplifying and strengthening botnet detection.
A Hierarchical Feature Set optimization for effective code change based Defec...IOSR Journals
This document summarizes research on using support vector machines (SVMs) for software defect prediction. It analyzes 11 datasets from NASA projects containing code metrics and defect information for modules. The researchers preprocessed the data by removing duplicate/inconsistent instances, constant attributes, and balancing the datasets. They used SVMs with 5-fold cross validation to classify modules as defective or non-defective, achieving an average accuracy of 70% across the datasets. The researchers conclude SVMs can effectively predict defects but note earlier studies using the NASA data may have overstated capabilities due to insufficient data preprocessing.
The Use of K-NN and Bees Algorithm for Big Data Intrusion Detection SystemIOSRjournaljce
Big data problem in intrusion detection system is mainly due to the large volume of the data. The dimension of the original data is 41. Some of the feature of original data are unnecessary. In this process, the volume of data has expanded into hundreds and thousands of gigabytes(GB) of information. The dimension span of data and volume can be reduced and the system is enhanced by using K-NN and BA. The reduction ratio of KDD datasets and processing speed is very slow so the data has been reduced for extracting features by Bees Algorithm (AB) and use K-nearest neighbors as classification (KNN). So, the KDD99 datasets applied in the experiments with significant features. The results have gave higher detection and accuracy rate as well as reduced false positive rate. Keywords: Big Data; Intru
This document discusses feature selection techniques for intrusion detection systems. It begins with background on intrusion detection and challenges related to large datasets. It then describes three common feature selection algorithms: Correlation-based Feature Selection (CFS), Information Gain (IG), and Gain Ratio (GR). The document proposes a fusion model that applies these three algorithms to select features, then uses genetic algorithm and naive Bayes classification to evaluate performance. It conducted experiments on the KDD Cup 99 intrusion detection dataset to compare the proposed fusion method to the individual feature selection algorithms.
This document discusses feature selection techniques for intrusion detection systems. It begins with background on intrusion detection and challenges related to large datasets. It then describes three common feature selection algorithms: Correlation-based Feature Selection (CFS), Information Gain (IG), and Gain Ratio (GR). The document proposes a fusion model that applies these three algorithms to select features, then uses genetic algorithm and naive Bayes classification to evaluate performance. It conducted experiments on the KDD Cup 99 intrusion detection dataset to compare the proposed fusion method to the individual feature selection algorithms.
Statistical performance assessment of supervised machine learning algorithms ...IAESIJAI
Several studies have shown that an ensemble classifier's effectiveness is directly correlated with the diversity of its members. However, the algorithms used to build the base learners are one of the issues encountered when using a stacking ensemble. Given the number of options, choosing the best ones might be challenging. In this study, we selected some of the most extensively applied supervised machine learning algorithms and performed a performance evaluation in terms of well-known metrics and validation methods using two internet of things (IoT) intrusion detection datasets, namely network-based anomaly internet of things (N-BaIoT) and internet of things intrusion detection dataset (IoTID20). Friedman and Dunn's tests are used to statistically examine the significant differences between the classifier groups. The goal of this study is to encourage security researchers to develop an intrusion detection system (IDS) using ensemble learning and to propose an appropriate method for selecting diverse base classifiers for a stacking-type ensemble. The performance results indicate that adaptive boosting, and gradient boosting (GB), gradient boosting machines (GBM), light gradient boosting machines (LGBM), extreme gradient boosting (XGB) and deep neural network (DNN) classifiers exhibit better trade-off between the performance parameters and classification time making them ideal choices for developing anomaly-based IDSs.
A novel ensemble modeling for intrusion detection system IJECEIAES
Vast increase in data through internet services has made computer systems more vulnerable and difficult to protect from malicious attacks. Intrusion detection systems (IDSs) must be more potent in monitoring intrusions. Therefore an effectual Intrusion Detection system architecture is built which employs a facile classification model and generates low false alarm rates and high accuracy. Noticeably, IDS endure enormous amounts of data traffic that contain redundant and irrelevant features, which affect the performance of the IDS negatively. Despite good feature selection approaches leads to a reduction of unrelated and redundant features and attain better classification accuracy in IDS. This paper proposes a novel ensemble model for IDS based on two algorithms Fuzzy Ensemble Feature selection (FEFS) and Fusion of Multiple Classifier (FMC). FEFS is a unification of five feature scores. These scores are obtained by using feature-class distance functions. Aggregation is done using fuzzy union operation. On the other hand, the FMC is the fusion of three classifiers. It works based on Ensemble decisive function. Experiments were made on KDD cup 99 data set have shown that our proposed system works superior to well-known methods such as Support Vector Machines (SVMs), K-Nearest Neighbor (KNN) and Artificial Neural Networks (ANNs). Our examinations ensured clearly the prominence of using ensemble methodology for modeling IDSs, and hence our system is robust and efficient.
IRJET- Intrusion Detection using IP Binding in Real NetworkIRJET Journal
This document summarizes a research paper that proposes using genetic algorithms and support vector machines to improve network intrusion detection. It discusses how genetic algorithms can be used to select optimal features for support vector machine classifiers, in order to speed up training time and improve classification accuracy. The genetic algorithm optimizes the crossover and mutation probabilities during evolution to find the best feature subset for identifying network intrusions using support vector machines. Evaluation of this approach suggests it could enhance the effectiveness of intrusion detection systems.
Botnet detection using Wgans for securityssuser3f5a831
This document proposes a GAN-based technique to improve botnet detection by addressing class imbalance issues in datasets. It involves using a WGAN to generate synthetic samples for the underrepresented botnet class, which are then combined with the original dataset and used to train and evaluate several machine learning classifiers. The proposed "Bot-WGANs" method achieves over 99.9% accuracy, 100% precision, 99.8% recall, and 99.9% F1 score on the CICIDS2017 dataset, outperforming existing methods that use oversampling techniques like SMOTE. The approach shows potential for enhancing cybersecurity by simplifying and strengthening botnet detection.
A Hierarchical Feature Set optimization for effective code change based Defec...IOSR Journals
This document summarizes research on using support vector machines (SVMs) for software defect prediction. It analyzes 11 datasets from NASA projects containing code metrics and defect information for modules. The researchers preprocessed the data by removing duplicate/inconsistent instances, constant attributes, and balancing the datasets. They used SVMs with 5-fold cross validation to classify modules as defective or non-defective, achieving an average accuracy of 70% across the datasets. The researchers conclude SVMs can effectively predict defects but note earlier studies using the NASA data may have overstated capabilities due to insufficient data preprocessing.
The Use of K-NN and Bees Algorithm for Big Data Intrusion Detection SystemIOSRjournaljce
Big data problem in intrusion detection system is mainly due to the large volume of the data. The dimension of the original data is 41. Some of the feature of original data are unnecessary. In this process, the volume of data has expanded into hundreds and thousands of gigabytes(GB) of information. The dimension span of data and volume can be reduced and the system is enhanced by using K-NN and BA. The reduction ratio of KDD datasets and processing speed is very slow so the data has been reduced for extracting features by Bees Algorithm (AB) and use K-nearest neighbors as classification (KNN). So, the KDD99 datasets applied in the experiments with significant features. The results have gave higher detection and accuracy rate as well as reduced false positive rate. Keywords: Big Data; Intru
This document discusses feature selection techniques for intrusion detection systems. It begins with background on intrusion detection and challenges related to large datasets. It then describes three common feature selection algorithms: Correlation-based Feature Selection (CFS), Information Gain (IG), and Gain Ratio (GR). The document proposes a fusion model that applies these three algorithms to select features, then uses genetic algorithm and naive Bayes classification to evaluate performance. It conducted experiments on the KDD Cup 99 intrusion detection dataset to compare the proposed fusion method to the individual feature selection algorithms.
This document discusses feature selection techniques for intrusion detection systems. It begins with background on intrusion detection and challenges related to large datasets. It then describes three common feature selection algorithms: Correlation-based Feature Selection (CFS), Information Gain (IG), and Gain Ratio (GR). The document proposes a fusion model that applies these three algorithms to select features, then uses genetic algorithm and naive Bayes classification to evaluate performance. It conducted experiments on the KDD Cup 99 intrusion detection dataset to compare the proposed fusion method to the individual feature selection algorithms.
Statistical performance assessment of supervised machine learning algorithms ...IAESIJAI
Several studies have shown that an ensemble classifier's effectiveness is directly correlated with the diversity of its members. However, the algorithms used to build the base learners are one of the issues encountered when using a stacking ensemble. Given the number of options, choosing the best ones might be challenging. In this study, we selected some of the most extensively applied supervised machine learning algorithms and performed a performance evaluation in terms of well-known metrics and validation methods using two internet of things (IoT) intrusion detection datasets, namely network-based anomaly internet of things (N-BaIoT) and internet of things intrusion detection dataset (IoTID20). Friedman and Dunn's tests are used to statistically examine the significant differences between the classifier groups. The goal of this study is to encourage security researchers to develop an intrusion detection system (IDS) using ensemble learning and to propose an appropriate method for selecting diverse base classifiers for a stacking-type ensemble. The performance results indicate that adaptive boosting, and gradient boosting (GB), gradient boosting machines (GBM), light gradient boosting machines (LGBM), extreme gradient boosting (XGB) and deep neural network (DNN) classifiers exhibit better trade-off between the performance parameters and classification time making them ideal choices for developing anomaly-based IDSs.
A novel ensemble modeling for intrusion detection system IJECEIAES
Vast increase in data through internet services has made computer systems more vulnerable and difficult to protect from malicious attacks. Intrusion detection systems (IDSs) must be more potent in monitoring intrusions. Therefore an effectual Intrusion Detection system architecture is built which employs a facile classification model and generates low false alarm rates and high accuracy. Noticeably, IDS endure enormous amounts of data traffic that contain redundant and irrelevant features, which affect the performance of the IDS negatively. Despite good feature selection approaches leads to a reduction of unrelated and redundant features and attain better classification accuracy in IDS. This paper proposes a novel ensemble model for IDS based on two algorithms Fuzzy Ensemble Feature selection (FEFS) and Fusion of Multiple Classifier (FMC). FEFS is a unification of five feature scores. These scores are obtained by using feature-class distance functions. Aggregation is done using fuzzy union operation. On the other hand, the FMC is the fusion of three classifiers. It works based on Ensemble decisive function. Experiments were made on KDD cup 99 data set have shown that our proposed system works superior to well-known methods such as Support Vector Machines (SVMs), K-Nearest Neighbor (KNN) and Artificial Neural Networks (ANNs). Our examinations ensured clearly the prominence of using ensemble methodology for modeling IDSs, and hence our system is robust and efficient.
IRJET- Intrusion Detection using IP Binding in Real NetworkIRJET Journal
This document summarizes a research paper that proposes using genetic algorithms and support vector machines to improve network intrusion detection. It discusses how genetic algorithms can be used to select optimal features for support vector machine classifiers, in order to speed up training time and improve classification accuracy. The genetic algorithm optimizes the crossover and mutation probabilities during evolution to find the best feature subset for identifying network intrusions using support vector machines. Evaluation of this approach suggests it could enhance the effectiveness of intrusion detection systems.
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
This document summarizes a research paper that proposes a hybrid intrusion detection system using modified k-means clustering and naive Bayes classification. The system aims to improve detection rates and reduce false alarms. It first uses modified k-means clustering to filter outliers and identify cluster centers in the data. It then uses naive Bayes classification to classify data points as normal or intrusions. The researchers tested their system on the KDD99 intrusion detection dataset and found it achieved high accuracy rates of over 99% with low false positive rates, outperforming other methods.
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
Abstract
Internet Technology is growing at exponential rate day by day, making data security of computer systems more complex and critical. There has been multiple methodology implemented for the same in recent time as detailed in [1], [3]. Availability of larger bandwidth has made the multiple large computer server network connected worldwide and thus increasing the load on the necessity to secure data and Intrusion detection system (IDS) is one of the most efficient technique to maintain security of computer system. The proposed system is designed in such a way that are helpful in identifying malicious behavior and improper use of computer system. In this report we proposed a hybrid technique for intrusion detection using data mining algorithms. Our main objective is to do complete analysis of intrusion detection Dataset to test the implemented system.In This report we will propose a new methodology in which Modified k-mean is used for clustering whereas Naïve Bayes for the classification. These two data mining techniques will be used for Intrusion detection in large horizontally distributed database.
Keywords: Intrusion Detection, Modified K-Mean, Naïve Bays
Reliability is concerned with decreasing faults and their impact. The earlier the faults are detected the better. That's why this presentation talks about automated techniques using machine learning to detect faults as early as possible.
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)ieijjournal1
This document discusses different classifier selection models for intrusion detection systems. It begins by introducing intrusion detection systems and their importance for network security. It then describes reducing the features of the KDD Cup 99 dataset to improve computational efficiency. Fifteen different classifier algorithms are described, including K-Means, Naive Bayes, Decision Trees, Support Vector Machines, and ensemble methods. Two models are proposed for combining classifier results. Simulation results on the KDD Cup 99 dataset show the true positive rates, false positive rates, correctly classified instances, and training times for each attack category and classifier. The best performing classifiers are identified for different intrusion types.
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUESijaia
The purpose of software defect prediction is to improve the quality of a software project by building a
predictive model to decide whether a software module is or is not fault prone. In recent years, much
research in using machine learning techniques in this topic has been performed. Our aim was to evaluate
the performance of clustering techniques with feature selection schemes to address the problem of software
defect prediction problem. We analysed the National Aeronautics and Space Administration (NASA)
dataset benchmarks using three clustering algorithms: (1) Farthest First, (2) X-Means, and (3) selforganizing map (SOM). In order to evaluate different feature selection algorithms, this article presents a
comparative analysis involving software defects prediction based on Bat, Cuckoo, Grey Wolf Optimizer
(GWO), and particle swarm optimizer (PSO). The results obtained with the proposed clustering models
enabled us to build an efficient predictive model with a satisfactory detection rate and acceptable number
of features.
A feature selection method based on auto-encoder for internet of things intru...IJECEIAES
The evolution in gadgets where various devices have become connected to the internet such as sensors, cameras, smartphones, and others, has led to the emergence of internet of things (IoT). As any network, security is the main issue facing IoT. Several studies addressed the intrusion detection task in IoT. The majority of these studies utilized different statistical and bio-inspired feature selection techniques. Deep learning is a family of techniques that demonstrated remarkable performance in the field of classification. The emergence of deep learning techniques has led to configure new neural network architectures that is designed for the feature selection task. This study proposes a deep learning architecture known as auto-encoder (AE) for the task of feature selection in IoT intrusion detection. A benchmark dataset for IoT intrusions has been considered in the experiments. The proposed AE has been carried out for the feature selection task along with a simple neural network (NN) architecture for the classification task. Experimental results showed that the proposed AE showed an accuracy of 99.97% with a false alarm rate (FAR) of 1.0. The comparison against the state of the art proves the efficacy of AE.
Performance analysis of binary and multiclass models using azure machine lear...IJECEIAES
Network data is expanding and that too at an alarming rate. Besides, the sophisticated attack tools used by hackers lead to capricious cyber threat landscape. Traditional models proposed in the field of network intrusion detection using machine learning algorithms emphasize more on improving attack detection rate and reducing false alarms but time efficiency is often overlooked. Therefore, in order to address this limitation, a modern solution has been presented using Machine Learning-as-a-Service platform. The proposed work analyses the performance of eight two-class and three multiclass algorithms using UNSW NB-15, a modern intrusion detection dataset. 82,332 testing samples were considered to evaluate the performance of algorithms. The proposed two class decision forest model exhibited 99.2% accuracy and took 6 seconds to learn 1,75,341 network instances. Multiclass classification task was also undertaken wherein attack types like generic, exploits, shellcode and worms were classified with a recall percentage of 99%, 94.49%, 91.79% and 90.9% respectively by the multiclass decision forest model that also leapfrogged others in terms of training and execution time.
Intrusion Detection System Based on K-Star Classifier and Feature Set ReductionIOSR Journals
Abstract: Network security and Intrusion Detection Systems (IDS’s) is an important security related research
area. This paper applies K-star algorithm with filtering analysis in order to build a network intrusion detection
system. For our experimental analysis and as a case study, we have used the new NSL-KDD dataset, which is a
modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 66.0% for the
training set and the remainder for the testing set a 2 class classifications has been implemented. WEKA which is
a java based open source software consists of a collection of machine learning algorithms for Data mining tasks
has been used in the testing process. The experimental results show that the proposed approach is very accurate
with low false positive rate and high true positive rate and it takes less learning time in comparison with other
existing approaches used for efficient network intrusion detection.
Keywords: Information Gain, Intrusion Detection System, Instance-based classifier, K-Star, Weka.
Exploring and comparing various machine and deep learning technique algorithm...CSITiaesprime
Domain generation algorithm (DGA) is used as the main source of script in different groups of malwares, which generates the domain names of points and will further be used for command-and-control servers. The security measures usually identify the malware but the domain name algorithms will be updating themselves in order to avoid the less efficient older security detection methods. The reason being the older detection methods does not use either the machine learning or deep learning algorithms to detect the DGAs. Thus, the impact of incorporating the machine learning and deep learning techniques to detect the DGA is well discussed. As a result, they can create a huge number of domains to avoid debar and henceforth, block the hackers and zombie systems with the older methods itself. The main purpose of this research work is to compare and analyse by implementing various machine learning algorithms that suits the respective dataset yielding better results. In this research paper, the obtained dataset is pre-processed and the respective data is processed by different machine learning algorithms such as random forest (RF), support vector machine (SVM), Naive Bayes classifier, H20 AutoML, convolutional neural network (CNN), long short-term memory neural network (LSTM) for the classification. It is observed and understood that the LSTM provides a better classification efficiency of 98% and the H20 AutoML method giving the least efficiency of 75%.
2 14-1346479656-1- a study of feature selection methods in intrusion detectio...Dr. Amrita .
This document provides a survey of feature selection methods for intrusion detection systems. It discusses three categories of feature selection approaches: filter, wrapper, and hybrid. The KDD Cup 99 dataset is commonly used to evaluate feature selection methods, which contains 41 features divided into four categories. Performance is typically evaluated based on metrics like detection rate, false alarm rate, and accuracy rates derived from a confusion matrix. Feature selection aims to select a minimal subset of relevant features to improve detection performance while reducing computational requirements.
Data Preparation and Reduction Technique in Intrusion Detection Systems: ANOV...CSCJournals
Intrusion detection system plays a main role in detecting anomaly and suspected behaviors in many organization environments. The detection process involves collecting and analyzing real traffic data which in heavy-loaded networks represents the most challenging aspect in designing efficient IDS.
Collected data should be prepared and reduced to enhance the classification accuracy and computation performance.
In this research, a proposed technique called, ANOVA-PCA, is applied on NSL-KDD dataset of 41 features which are reduced to 10. It is tested and evaluated with three types of supervised classifiers: k-nearest neighbor, decision tree, and random forest. Results are obtained using various performance measures, and they are compared with other feature selection algorithms such as neighbor component analysis (NCA) and ReliefF. Results showed that the proposed method was simple, faster in computation compared with others, and good classification accuracy of 98.9% was achieved.
The paper proposes a new machine learning approach for cyber security in big data. It combines multiple classifiers into an ensemble "outfit" approach. The outfit approach achieves 99.8% accuracy in distinguishing benign from malicious web pages, outperforming individual classifiers. The methodology collects and prepares a big dataset to train and evaluate KNN, SVM, MLP classifiers. Results show the outfit approach has higher true positive rate, F-measure, and recall while lowering the false positive rate compared to individual classifiers. The research aims to better detect cyber threats and improve security of big data.
PERFORMANCE EVALUATION OF DIFFERENT KERNELS FOR SUPPORT VECTOR MACHINE USED I...IJCNCJournal
The success of any Intrusion Detection System (IDS) is a complicated problem due to its nonlinearity and the quantitative or qualitative network traffic data stream with numerous features. As a result, in order to
get rid of this problem, several types of intrusion detection methods with different levels of accuracy have been proposed which leads the choice of an effective and robust method for IDS as a very important topic in information security. In this regard, the support vector machine (SVM) has been playing an important role to provide potential solutions for the IDS problem. However, the practicability of introducing SVM is
affected by the difficulties in selecting appropriate kernel and its parameters. From this viewpoint, this paper presents the work to apply different kernels for SVM in ID Son the KDD’99 Dataset and NSL-KDD dataset as well as to find out which kernel is the best for SVM. The important deficiency in the KDD’99 data set is the huge number of redundant records as observed earlier. Therefore, we have derived a data set RRE-KDD by eliminating redundant record from KDD’99train and test dataset prior to apply different kernel for SVM. This RRE-KDD consists of both KDD99Train+ and KDD99 Test+ dataset for training and testing purposes, respectively. The way to derive RRE-KDD data set is different from that of NSL-KDD
data set. The experimental results indicate that Laplace kernel can achieve higher detection rate and lower false positive rate with higher precision than other kernel son both RRE-KDD and NSL-KDD datasets. It is also found that the performances of other kernels are dependent on datasets.
Permission based Android Malware Detection using Random ForestIRJET Journal
This document presents a machine learning approach for detecting unknown Android malware using permissions as features. The study collected 15,000 malware and 15,000 benign Android applications and extracted Android Open Source Project (AOSP) permissions and third-party permissions as features. A random forest classifier was trained on these features with 10-fold cross-validation. The random forest model achieved 91.1% accuracy for AOSP permissions and 72.3% accuracy for third-party permissions in classifying malware applications. The methodology demonstrates the potential for discovering new anomalous Android applications through a machine learning technique focused on permission features.
This document evaluates the performance of three machine learning algorithms - Naive Bayes, Random Forest, and Stochastic Gradient Boosting - for detecting DDoS attacks. The algorithms were tested on a dataset from the Canadian Institute for Cybersecurity containing normal and attack network traffic. Stochastic Gradient Boosting achieved 100% accuracy, precision, recall and F1 score, outperforming the other algorithms. However, it had the longest execution time of 5.87 seconds compared to 1.55 seconds for Random Forest and 0.037 seconds for Naive Bayes. In conclusion, Stochastic Gradient Boosting was the most accurate for classifying DDoS attacks, but took significantly longer to run than the other models.
Network Intrusion Detection System using Machine LearningIRJET Journal
This document discusses using machine learning algorithms to develop a network intrusion detection system (IDS). It analyzes different machine learning algorithms like support vector machines (SVM) and naive bayes and evaluates their performance on detecting intrusions using the NSL-KDD dataset. The paper reviews related work applying machine learning to IDS and discusses algorithms like SVM and naive bayes in more detail. It proposes developing a hybrid multi-level model to improve accuracy and handling large volumes of data. The system architecture and conclusions are also summarized.
Three layer hybrid learning to improve intrusion detection system performanceIJECEIAES
In imbalanced network traffic, malicious cyberattacks can be hidden in a large amount of normal traffic, making it difficult for intrusion detection systems (IDS) to detect them. Therefore, anomaly-based IDS with machine learning is the solution. However, a single machine learning cannot accurately detect all types of attacks. Therefore, a hybrid model that combines long short-term memory (LSTM) and random forest (RF) in three layers is proposed. Building the hybrid model starts with Nearmiss-2 class balancing, which reduces normal samples without increasing minority samples. Then, feature selection is performed using chi-square and RF. Next, hyperparameter tuning is performed to obtain the optimal model. In the first and second layers, LSTM and RF are used for binary classification to detect normal data and attack data. While the third layer model uses RF for multiclass classification. The hybrid model verified using the CSE-CIC-IDS2018 dataset, showed better performance compared to the single algorithm. For multiclass classification, the hybrid model achieved 99.76% accuracy, 99.76% precision, 99.76% recall, and 99.75% F1-score.
Performance Comparison of Dimensionality Reduction Methods using MCDRAM Publications
The recent blast of dataset size, in number of records and in addition of attributes, has set off the improvement of various big data platforms and in addition parallel data analytic algorithms. In the meantime however, it has pushed for the utilization of data dimensionality reduction systems. Mobile Telecom Industry competition has become more and more fierce. In order to improve their services and business in the competitive world, they are ready to analyse the stored data by several data mining technologies to retain customers and maintain their relationship with them. Mobile Call Detail Record (MCDR) comprises diversity and complexity information containing information like Voice Call, Text Message, Video Calls, and other Data Services usages. It is proposed to evaluate and compare the performance of different dimensionality reduction methods such as Chi-Square (Chi2) Method, Principal Component Analysis (PCA), Information Gain Attribute Evaluator, Gain-Ratio Attribute Evaluator (GRAE), Attribute Selected Classifier (ASC) and Quantile Regression (QR) Methods.
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...IJCNCJournal
The paper proposes a two-phase classification method for detecting anomalies in network traffic, aiming to tackle the challenges of imbalance and feature selection. The study uses Information Gain to select relevant features and evaluates its performance on the CICIDS-2018 dataset with various classifiers. Results indicate that the ensemble classifier achieved the highest accuracy, precision, and recall. The proposed method addresses challenges in intrusion detection and highlights the effectiveness of ensemble classifiers in improving anomaly detection accuracy. Also, the quantity of pertinent characteristics chosen by Information Gain has a considerable impact on the F1-score and detection accuracy. Specifically, the Ensemble Learning achieved the highest accuracy of 98.36% and F1-score of 97.98% using the relevant selected features.
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...IJCNCJournal
The paper proposes a two-phase classification method for detecting anomalies in network traffic, aiming to tackle the challenges of imbalance and feature selection. The study uses Information Gain to select relevant features and evaluates its performance on the CICIDS-2018 dataset with various classifiers. Results indicate that the ensemble classifier achieved the highest accuracy, precision, and recall. The proposed method addresses challenges in intrusion detection and highlights the effectiveness of ensemble classifiers in improving anomaly detection accuracy. Also, the quantity of pertinent characteristics chosen by Information Gain has a considerable impact on the F1-score and detection accuracy. Specifically, the Ensemble Learning achieved the highest accuracy of 98.36% and F1-score of 97.98% using the relevant selected features.
Development of depth map from stereo images using sum of absolute differences...nooriasukmaningtyas
This article proposes a framework for the depth map reconstruction using stereo images. Fundamentally, this map provides an important information which commonly used in essential applications such as autonomous vehicle navigation, drone’s navigation and 3D surface reconstruction. To develop an accurate depth map, the framework must be robust against the challenging regions of low texture, plain color and repetitive pattern on the input stereo image. The development of this map requires several stages which starts with matching cost calculation, cost aggregation, optimization and refinement stage. Hence, this work develops a framework with sum of absolute difference (SAD) and the combination of two edge preserving filters to increase the robustness against the challenging regions. The SAD convolves using block matching technique to increase the efficiency of matching process on the low texture and plain color regions. Moreover, two edge preserving filters will increase the accuracy on the repetitive pattern region. The results show that the proposed method is accurate and capable to work with the challenging regions. The results are provided by the Middlebury standard dataset. The framework is also efficiently and can be applied on the 3D surface reconstruction. Moreover, this work is greatly competitive with previously available methods.
Model predictive controller for a retrofitted heat exchanger temperature cont...nooriasukmaningtyas
This paper aims to demonstrate the practical aspects of process control theory for undergraduate students at the Department of Chemical Engineering at the University of Bahrain. Both, the ubiquitous proportional integral derivative (PID) as well as model predictive control (MPC) and their auxiliaries were designed and implemented in a real-time framework. The latter was realized through retrofitting an existing plate-and-frame heat exchanger unit that has been operated using an analog PID temperature controller. The upgraded control system consists of a personal computer (PC), low-cost signal conditioning circuit, national instruments USB 6008 data acquisition card, and LabVIEW software. LabVIEW control design and simulation modules were used to design and implement the PID and MPC controllers. The performance of the designed controllers was evaluated while controlling the outlet temperature of the retrofitted plate-and-frame heat exchanger. The distinguished feature of the MPC controller in handling input and output constraints was perceived in real-time. From a pedagogical point of view, realizing the theory of process control through practical implementation was substantial in enhancing the student’s learning and the instructor’s teaching experience.
More Related Content
Similar to BotDetectorFW: an optimized botnet detection framework based on five features-distance measures supported by comparisons of four machine learning classifiers using CICIDS2017 dataset
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
This document summarizes a research paper that proposes a hybrid intrusion detection system using modified k-means clustering and naive Bayes classification. The system aims to improve detection rates and reduce false alarms. It first uses modified k-means clustering to filter outliers and identify cluster centers in the data. It then uses naive Bayes classification to classify data points as normal or intrusions. The researchers tested their system on the KDD99 intrusion detection dataset and found it achieved high accuracy rates of over 99% with low false positive rates, outperforming other methods.
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
Abstract
Internet Technology is growing at exponential rate day by day, making data security of computer systems more complex and critical. There has been multiple methodology implemented for the same in recent time as detailed in [1], [3]. Availability of larger bandwidth has made the multiple large computer server network connected worldwide and thus increasing the load on the necessity to secure data and Intrusion detection system (IDS) is one of the most efficient technique to maintain security of computer system. The proposed system is designed in such a way that are helpful in identifying malicious behavior and improper use of computer system. In this report we proposed a hybrid technique for intrusion detection using data mining algorithms. Our main objective is to do complete analysis of intrusion detection Dataset to test the implemented system.In This report we will propose a new methodology in which Modified k-mean is used for clustering whereas Naïve Bayes for the classification. These two data mining techniques will be used for Intrusion detection in large horizontally distributed database.
Keywords: Intrusion Detection, Modified K-Mean, Naïve Bays
Reliability is concerned with decreasing faults and their impact. The earlier the faults are detected the better. That's why this presentation talks about automated techniques using machine learning to detect faults as early as possible.
CLASSIFIER SELECTION MODELS FOR INTRUSION DETECTION SYSTEM (IDS)ieijjournal1
This document discusses different classifier selection models for intrusion detection systems. It begins by introducing intrusion detection systems and their importance for network security. It then describes reducing the features of the KDD Cup 99 dataset to improve computational efficiency. Fifteen different classifier algorithms are described, including K-Means, Naive Bayes, Decision Trees, Support Vector Machines, and ensemble methods. Two models are proposed for combining classifier results. Simulation results on the KDD Cup 99 dataset show the true positive rates, false positive rates, correctly classified instances, and training times for each attack category and classifier. The best performing classifiers are identified for different intrusion types.
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUESijaia
The purpose of software defect prediction is to improve the quality of a software project by building a
predictive model to decide whether a software module is or is not fault prone. In recent years, much
research in using machine learning techniques in this topic has been performed. Our aim was to evaluate
the performance of clustering techniques with feature selection schemes to address the problem of software
defect prediction problem. We analysed the National Aeronautics and Space Administration (NASA)
dataset benchmarks using three clustering algorithms: (1) Farthest First, (2) X-Means, and (3) selforganizing map (SOM). In order to evaluate different feature selection algorithms, this article presents a
comparative analysis involving software defects prediction based on Bat, Cuckoo, Grey Wolf Optimizer
(GWO), and particle swarm optimizer (PSO). The results obtained with the proposed clustering models
enabled us to build an efficient predictive model with a satisfactory detection rate and acceptable number
of features.
A feature selection method based on auto-encoder for internet of things intru...IJECEIAES
The evolution in gadgets where various devices have become connected to the internet such as sensors, cameras, smartphones, and others, has led to the emergence of internet of things (IoT). As any network, security is the main issue facing IoT. Several studies addressed the intrusion detection task in IoT. The majority of these studies utilized different statistical and bio-inspired feature selection techniques. Deep learning is a family of techniques that demonstrated remarkable performance in the field of classification. The emergence of deep learning techniques has led to configure new neural network architectures that is designed for the feature selection task. This study proposes a deep learning architecture known as auto-encoder (AE) for the task of feature selection in IoT intrusion detection. A benchmark dataset for IoT intrusions has been considered in the experiments. The proposed AE has been carried out for the feature selection task along with a simple neural network (NN) architecture for the classification task. Experimental results showed that the proposed AE showed an accuracy of 99.97% with a false alarm rate (FAR) of 1.0. The comparison against the state of the art proves the efficacy of AE.
Performance analysis of binary and multiclass models using azure machine lear...IJECEIAES
Network data is expanding and that too at an alarming rate. Besides, the sophisticated attack tools used by hackers lead to capricious cyber threat landscape. Traditional models proposed in the field of network intrusion detection using machine learning algorithms emphasize more on improving attack detection rate and reducing false alarms but time efficiency is often overlooked. Therefore, in order to address this limitation, a modern solution has been presented using Machine Learning-as-a-Service platform. The proposed work analyses the performance of eight two-class and three multiclass algorithms using UNSW NB-15, a modern intrusion detection dataset. 82,332 testing samples were considered to evaluate the performance of algorithms. The proposed two class decision forest model exhibited 99.2% accuracy and took 6 seconds to learn 1,75,341 network instances. Multiclass classification task was also undertaken wherein attack types like generic, exploits, shellcode and worms were classified with a recall percentage of 99%, 94.49%, 91.79% and 90.9% respectively by the multiclass decision forest model that also leapfrogged others in terms of training and execution time.
Intrusion Detection System Based on K-Star Classifier and Feature Set ReductionIOSR Journals
Abstract: Network security and Intrusion Detection Systems (IDS’s) is an important security related research
area. This paper applies K-star algorithm with filtering analysis in order to build a network intrusion detection
system. For our experimental analysis and as a case study, we have used the new NSL-KDD dataset, which is a
modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 66.0% for the
training set and the remainder for the testing set a 2 class classifications has been implemented. WEKA which is
a java based open source software consists of a collection of machine learning algorithms for Data mining tasks
has been used in the testing process. The experimental results show that the proposed approach is very accurate
with low false positive rate and high true positive rate and it takes less learning time in comparison with other
existing approaches used for efficient network intrusion detection.
Keywords: Information Gain, Intrusion Detection System, Instance-based classifier, K-Star, Weka.
Exploring and comparing various machine and deep learning technique algorithm...CSITiaesprime
Domain generation algorithm (DGA) is used as the main source of script in different groups of malwares, which generates the domain names of points and will further be used for command-and-control servers. The security measures usually identify the malware but the domain name algorithms will be updating themselves in order to avoid the less efficient older security detection methods. The reason being the older detection methods does not use either the machine learning or deep learning algorithms to detect the DGAs. Thus, the impact of incorporating the machine learning and deep learning techniques to detect the DGA is well discussed. As a result, they can create a huge number of domains to avoid debar and henceforth, block the hackers and zombie systems with the older methods itself. The main purpose of this research work is to compare and analyse by implementing various machine learning algorithms that suits the respective dataset yielding better results. In this research paper, the obtained dataset is pre-processed and the respective data is processed by different machine learning algorithms such as random forest (RF), support vector machine (SVM), Naive Bayes classifier, H20 AutoML, convolutional neural network (CNN), long short-term memory neural network (LSTM) for the classification. It is observed and understood that the LSTM provides a better classification efficiency of 98% and the H20 AutoML method giving the least efficiency of 75%.
2 14-1346479656-1- a study of feature selection methods in intrusion detectio...Dr. Amrita .
This document provides a survey of feature selection methods for intrusion detection systems. It discusses three categories of feature selection approaches: filter, wrapper, and hybrid. The KDD Cup 99 dataset is commonly used to evaluate feature selection methods, which contains 41 features divided into four categories. Performance is typically evaluated based on metrics like detection rate, false alarm rate, and accuracy rates derived from a confusion matrix. Feature selection aims to select a minimal subset of relevant features to improve detection performance while reducing computational requirements.
Data Preparation and Reduction Technique in Intrusion Detection Systems: ANOV...CSCJournals
Intrusion detection system plays a main role in detecting anomaly and suspected behaviors in many organization environments. The detection process involves collecting and analyzing real traffic data which in heavy-loaded networks represents the most challenging aspect in designing efficient IDS.
Collected data should be prepared and reduced to enhance the classification accuracy and computation performance.
In this research, a proposed technique called, ANOVA-PCA, is applied on NSL-KDD dataset of 41 features which are reduced to 10. It is tested and evaluated with three types of supervised classifiers: k-nearest neighbor, decision tree, and random forest. Results are obtained using various performance measures, and they are compared with other feature selection algorithms such as neighbor component analysis (NCA) and ReliefF. Results showed that the proposed method was simple, faster in computation compared with others, and good classification accuracy of 98.9% was achieved.
The paper proposes a new machine learning approach for cyber security in big data. It combines multiple classifiers into an ensemble "outfit" approach. The outfit approach achieves 99.8% accuracy in distinguishing benign from malicious web pages, outperforming individual classifiers. The methodology collects and prepares a big dataset to train and evaluate KNN, SVM, MLP classifiers. Results show the outfit approach has higher true positive rate, F-measure, and recall while lowering the false positive rate compared to individual classifiers. The research aims to better detect cyber threats and improve security of big data.
PERFORMANCE EVALUATION OF DIFFERENT KERNELS FOR SUPPORT VECTOR MACHINE USED I...IJCNCJournal
The success of any Intrusion Detection System (IDS) is a complicated problem due to its nonlinearity and the quantitative or qualitative network traffic data stream with numerous features. As a result, in order to
get rid of this problem, several types of intrusion detection methods with different levels of accuracy have been proposed which leads the choice of an effective and robust method for IDS as a very important topic in information security. In this regard, the support vector machine (SVM) has been playing an important role to provide potential solutions for the IDS problem. However, the practicability of introducing SVM is
affected by the difficulties in selecting appropriate kernel and its parameters. From this viewpoint, this paper presents the work to apply different kernels for SVM in ID Son the KDD’99 Dataset and NSL-KDD dataset as well as to find out which kernel is the best for SVM. The important deficiency in the KDD’99 data set is the huge number of redundant records as observed earlier. Therefore, we have derived a data set RRE-KDD by eliminating redundant record from KDD’99train and test dataset prior to apply different kernel for SVM. This RRE-KDD consists of both KDD99Train+ and KDD99 Test+ dataset for training and testing purposes, respectively. The way to derive RRE-KDD data set is different from that of NSL-KDD
data set. The experimental results indicate that Laplace kernel can achieve higher detection rate and lower false positive rate with higher precision than other kernel son both RRE-KDD and NSL-KDD datasets. It is also found that the performances of other kernels are dependent on datasets.
Permission based Android Malware Detection using Random ForestIRJET Journal
This document presents a machine learning approach for detecting unknown Android malware using permissions as features. The study collected 15,000 malware and 15,000 benign Android applications and extracted Android Open Source Project (AOSP) permissions and third-party permissions as features. A random forest classifier was trained on these features with 10-fold cross-validation. The random forest model achieved 91.1% accuracy for AOSP permissions and 72.3% accuracy for third-party permissions in classifying malware applications. The methodology demonstrates the potential for discovering new anomalous Android applications through a machine learning technique focused on permission features.
This document evaluates the performance of three machine learning algorithms - Naive Bayes, Random Forest, and Stochastic Gradient Boosting - for detecting DDoS attacks. The algorithms were tested on a dataset from the Canadian Institute for Cybersecurity containing normal and attack network traffic. Stochastic Gradient Boosting achieved 100% accuracy, precision, recall and F1 score, outperforming the other algorithms. However, it had the longest execution time of 5.87 seconds compared to 1.55 seconds for Random Forest and 0.037 seconds for Naive Bayes. In conclusion, Stochastic Gradient Boosting was the most accurate for classifying DDoS attacks, but took significantly longer to run than the other models.
Network Intrusion Detection System using Machine LearningIRJET Journal
This document discusses using machine learning algorithms to develop a network intrusion detection system (IDS). It analyzes different machine learning algorithms like support vector machines (SVM) and naive bayes and evaluates their performance on detecting intrusions using the NSL-KDD dataset. The paper reviews related work applying machine learning to IDS and discusses algorithms like SVM and naive bayes in more detail. It proposes developing a hybrid multi-level model to improve accuracy and handling large volumes of data. The system architecture and conclusions are also summarized.
Three layer hybrid learning to improve intrusion detection system performanceIJECEIAES
In imbalanced network traffic, malicious cyberattacks can be hidden in a large amount of normal traffic, making it difficult for intrusion detection systems (IDS) to detect them. Therefore, anomaly-based IDS with machine learning is the solution. However, a single machine learning cannot accurately detect all types of attacks. Therefore, a hybrid model that combines long short-term memory (LSTM) and random forest (RF) in three layers is proposed. Building the hybrid model starts with Nearmiss-2 class balancing, which reduces normal samples without increasing minority samples. Then, feature selection is performed using chi-square and RF. Next, hyperparameter tuning is performed to obtain the optimal model. In the first and second layers, LSTM and RF are used for binary classification to detect normal data and attack data. While the third layer model uses RF for multiclass classification. The hybrid model verified using the CSE-CIC-IDS2018 dataset, showed better performance compared to the single algorithm. For multiclass classification, the hybrid model achieved 99.76% accuracy, 99.76% precision, 99.76% recall, and 99.75% F1-score.
Performance Comparison of Dimensionality Reduction Methods using MCDRAM Publications
The recent blast of dataset size, in number of records and in addition of attributes, has set off the improvement of various big data platforms and in addition parallel data analytic algorithms. In the meantime however, it has pushed for the utilization of data dimensionality reduction systems. Mobile Telecom Industry competition has become more and more fierce. In order to improve their services and business in the competitive world, they are ready to analyse the stored data by several data mining technologies to retain customers and maintain their relationship with them. Mobile Call Detail Record (MCDR) comprises diversity and complexity information containing information like Voice Call, Text Message, Video Calls, and other Data Services usages. It is proposed to evaluate and compare the performance of different dimensionality reduction methods such as Chi-Square (Chi2) Method, Principal Component Analysis (PCA), Information Gain Attribute Evaluator, Gain-Ratio Attribute Evaluator (GRAE), Attribute Selected Classifier (ASC) and Quantile Regression (QR) Methods.
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...IJCNCJournal
The paper proposes a two-phase classification method for detecting anomalies in network traffic, aiming to tackle the challenges of imbalance and feature selection. The study uses Information Gain to select relevant features and evaluates its performance on the CICIDS-2018 dataset with various classifiers. Results indicate that the ensemble classifier achieved the highest accuracy, precision, and recall. The proposed method addresses challenges in intrusion detection and highlights the effectiveness of ensemble classifiers in improving anomaly detection accuracy. Also, the quantity of pertinent characteristics chosen by Information Gain has a considerable impact on the F1-score and detection accuracy. Specifically, the Ensemble Learning achieved the highest accuracy of 98.36% and F1-score of 97.98% using the relevant selected features.
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...IJCNCJournal
The paper proposes a two-phase classification method for detecting anomalies in network traffic, aiming to tackle the challenges of imbalance and feature selection. The study uses Information Gain to select relevant features and evaluates its performance on the CICIDS-2018 dataset with various classifiers. Results indicate that the ensemble classifier achieved the highest accuracy, precision, and recall. The proposed method addresses challenges in intrusion detection and highlights the effectiveness of ensemble classifiers in improving anomaly detection accuracy. Also, the quantity of pertinent characteristics chosen by Information Gain has a considerable impact on the F1-score and detection accuracy. Specifically, the Ensemble Learning achieved the highest accuracy of 98.36% and F1-score of 97.98% using the relevant selected features.
Similar to BotDetectorFW: an optimized botnet detection framework based on five features-distance measures supported by comparisons of four machine learning classifiers using CICIDS2017 dataset (20)
Development of depth map from stereo images using sum of absolute differences...nooriasukmaningtyas
This article proposes a framework for the depth map reconstruction using stereo images. Fundamentally, this map provides an important information which commonly used in essential applications such as autonomous vehicle navigation, drone’s navigation and 3D surface reconstruction. To develop an accurate depth map, the framework must be robust against the challenging regions of low texture, plain color and repetitive pattern on the input stereo image. The development of this map requires several stages which starts with matching cost calculation, cost aggregation, optimization and refinement stage. Hence, this work develops a framework with sum of absolute difference (SAD) and the combination of two edge preserving filters to increase the robustness against the challenging regions. The SAD convolves using block matching technique to increase the efficiency of matching process on the low texture and plain color regions. Moreover, two edge preserving filters will increase the accuracy on the repetitive pattern region. The results show that the proposed method is accurate and capable to work with the challenging regions. The results are provided by the Middlebury standard dataset. The framework is also efficiently and can be applied on the 3D surface reconstruction. Moreover, this work is greatly competitive with previously available methods.
Model predictive controller for a retrofitted heat exchanger temperature cont...nooriasukmaningtyas
This paper aims to demonstrate the practical aspects of process control theory for undergraduate students at the Department of Chemical Engineering at the University of Bahrain. Both, the ubiquitous proportional integral derivative (PID) as well as model predictive control (MPC) and their auxiliaries were designed and implemented in a real-time framework. The latter was realized through retrofitting an existing plate-and-frame heat exchanger unit that has been operated using an analog PID temperature controller. The upgraded control system consists of a personal computer (PC), low-cost signal conditioning circuit, national instruments USB 6008 data acquisition card, and LabVIEW software. LabVIEW control design and simulation modules were used to design and implement the PID and MPC controllers. The performance of the designed controllers was evaluated while controlling the outlet temperature of the retrofitted plate-and-frame heat exchanger. The distinguished feature of the MPC controller in handling input and output constraints was perceived in real-time. From a pedagogical point of view, realizing the theory of process control through practical implementation was substantial in enhancing the student’s learning and the instructor’s teaching experience.
Control of a servo-hydraulic system utilizing an extended wavelet functional ...nooriasukmaningtyas
Servo-hydraulic systems have been extensively employed in various industrial applications. However, these systems are characterized by their highly complex and nonlinear dynamics, which complicates the control design stage of such systems. In this paper, an extended wavelet functional link neural network (EWFLNN) is proposed to control the displacement response of the servo-hydraulic system. To optimize the controller's parameters, a recently developed optimization technique, which is called the modified sine cosine algorithm (M-SCA), is exploited as the training method. The proposed controller has achieved remarkable results in terms of tracking two different displacement signals and handling external disturbances. From a comparative study, the proposed EWFLNN controller has attained the best control precision compared with those of other controllers, namely, a proportional-integralderivative (PID) controller, an artificial neural network (ANN) controller, a wavelet neural network (WNN) controller, and the original wavelet functional link neural network (WFLNN) controller. Moreover, compared to the genetic algorithm (GA) and the original sine cosine algorithm (SCA), the M-SCA has shown better optimization results in finding the optimal values of the controller's parameters.
Decentralised optimal deployment of mobile underwater sensors for covering la...nooriasukmaningtyas
This paper presents the problem of sensing coverage of layers of the ocean in three dimensional underwater environments. We propose distributed control laws to drive mobile underwater sensors to optimally cover a given confined layer of the ocean. By applying this algorithm at first the mobile underwater sensors adjust their depth to the specified depth. Then, they make a triangular grid across a given area. Afterwards, they randomly move to spread across the given grid. These control laws only rely on local information also they are easily implemented and computationally effective as they use some easy consensus rules. The feature of exchanging information just among neighbouring mobile sensors keeps the information exchange minimum in the whole networks and makes this algorithm practicable option for undersea. The efficiency of the presented control laws is confirmed via mathematical proof and numerical simulations.
Evaluation quality of service for internet of things based on fuzzy logic: a ...nooriasukmaningtyas
The development of the internet of thing (IoT) technology has become a major concern in sustainability of quality of service (SQoS) in terms of efficiency, measurement, and evaluation of services, such as our smart home case study. Based on several ambiguous linguistic and standard criteria, this article deals with quality of service (QoS). We used fuzzy logic to select the most appropriate and efficient services. For this reason, we have introduced a new paradigmatic approach to assess QoS. In this regard, to measure SQoS, linguistic terms were collected for identification of ambiguous criteria. This paper collects the results of other work to compare the traditional assessment methods and techniques in IoT. It has been proven that the comparison that traditional valuation methods and techniques could not effectively deal with these metrics. Therefore, fuzzy logic is a worthy method to provide a good measure of QoS with ambiguous linguistic and criteria. The proposed model addresses with constantly being improved, all the main axes of the QoS for a smart home. The results obtained also indicate that the model with its fuzzy performance importance index (FPII) has efficiently evaluate the multiple services of SQoS.
Low power architecture of logic gates using adiabatic techniquesnooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density, has recently led to rapid and inventive progresses in low-power design. The most effective technique is adiabatic logic circuit design in energy-efficient hardware. This paper presents two adiabatic approaches for the design of low power circuits, modified positive feedback adiabatic logic (modified PFAL) and the other is direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improvise the whole system performance. In this paper proposed circuit design of the low power architecture of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches and their results are analyzed for powerdissipation, delay, power-delay-product and rise time and compared with the other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with DC-DB PFAL technique outperform with the percentage improvement of 65% for NOR gate and 7% for NAND gate and 34% for XNOR gate over the modified PFAL techniques at 10 MHz respectively.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Smart monitoring system using NodeMCU for maintenance of production machinesnooriasukmaningtyas
Maintenance is an activity that helps to reduce risk, increase productivity, improve quality, and minimize production costs. The necessity for maintenance actions will increase efficiency and enhance the safety and quality of products and processes. On getting these conditions, it is necessary to implement a monitoring system used to observe machines' conditions from time to time, especially the machine parts that often experience problems. This paper presents a low-cost intelligent monitoring system using NodeMCU to continuously monitor machine conditions and provide warnings in the case of machine failure. Not only does it provide alerts, but this monitoring system also generates historical data on machine conditions to the Google Cloud (Google Sheet), includes which machines were down, downtime, issues occurred, repairs made, and technician handling. The results obtained are machine operators do not need to lose a relatively long time to call the technician. Likewise, the technicians assisted in carrying out machine maintenance activities and online reports so that errors that often occur due to human error do not happen again. The system succeeded in reducing the technician-calling time and maintenance workreporting time up to 50%. The availability of online and real-time maintenance historical data will support further maintenance strategy.
Design and simulation of a software defined networkingenabled smart switch, f...nooriasukmaningtyas
Using sustainable energy is the future of our planet earth, this became not only economically efficient but also a necessity for the preservation of life on earth. Because of such necessity, smart grids became a very important issue to be researched. Many literatures discussed this topic and with the development of internet of things (IoT) and smart sensors, smart grids are developed even further. On the other hand, software defined networking is a technology that separates the control plane from the data plan of the network. It centralizes the management and the orchestration of the network tasks by using a network controller. The network controller is the heart of the SDN-enabled network, and it can control other networking devices using software defined networking (SDN) protocols such as OpenFlow. A smart switching mechanism called (SDN-smgrid-sw) for the smart grid will be modeled and controlled using SDN. We modeled the environment that interact with the sensors, for the sun and the wind elements. The Algorithm is modeled and programmed for smart efficient power sharing that is managed centrally and monitored using SDN controller. Also, all if the smart grid elements (power sources) are connected to the IP network using IoT protocols.
Efficient wireless power transmission to remote the sensor in restenosis coro...nooriasukmaningtyas
In this study, the researchers have proposed an alternative technique for designing an asymmetric 4 coil-resonance coupling module based on the series-to-parallel topology at 27 MHz industrial scientific medical (ISM) band to avoid the tissue damage, for the constant monitoring of the in-stent restenosis coronary artery. This design consisted of 2 components, i.e., the external part that included 3 planar coils that were placed outside the body and an internal helical coil (stent) that was implanted into the coronary artery in the human tissue. This technique considered the output power and the transfer efficiency of the overall system, coil geometry like the number of coils per turn, and coil size. The results indicated that this design showed an 82% efficiency in the air if the transmission distance was maintained as 20 mm, which allowed the wireless power supply system to monitor the pressure within the coronary artery when the implanted load resistance was 400 Ω.
Grid reactive voltage regulation and cost optimization for electric vehicle p...nooriasukmaningtyas
Expecting large electric vehicle (EV) usage in the future due to environmental issues, state subsidies, and incentives, the impact of EV charging on the power grid is required to be closely analyzed and studied for power quality, stability, and planning of infrastructure. When a large number of energy storage batteries are connected to the grid as a capacitive load the power factor of the power grid is inevitably reduced, causing power losses and voltage instability. In this work large-scale 18K EV charging model is implemented on IEEE 33 network. Optimization methods are described to search for the location of nodes that are affected most due to EV charging in terms of power losses and voltage instability of the network. Followed by optimized reactive power injection magnitude and time duration of reactive power at the identified nodes. It is shown that power losses are reduced and voltage stability is improved in the grid, which also complements the reduction in EV charging cost. The result will be useful for EV charging stations infrastructure planning, grid stabilization, and reducing EV charging costs.
Topology network effects for double synchronized switch harvesting circuit on...nooriasukmaningtyas
Energy extraction takes place using several different technologies, depending on the type of energy and how it is used. The objective of this paper is to study topology influence for a smart network based on piezoelectric materials using the double synchronized switch harvesting (DSSH). In this work, has been presented network topology for circuit DSSH (DSSH Standard, Independent DSSH, DSSH in parallel, mono DSSH, and DSSH in series). Using simulation-based on a structure with embedded piezoelectric system harvesters, then compare different topology of circuit DSSH for knowledge is how to connect the circuit DSSH together and how to implement accurately this circuit strategy for maximizing the total output power. The network topology DSSH extracted power a technique allows again up to in terms of maximal power output compared with network topology standard extracted at the resonant frequency. The simulation results show that by using the same input parameters the maximum efficiency for topology DSSH in parallel produces 120% more energy than topology DSSH-series. In addition, the energy harvesting by mono-DSSH is more than DSSH-series by 650% and it has exceeded DSSHind by 240%.
Improving the design of super-lift Luo converter using hybrid switching capac...nooriasukmaningtyas
In this article, an improvement to the positive output super-lift Luo converter (POSLC) has been proposed to get high gain at a low duty cycle. Also, reduce the stress on the switch and diodes, reduce the current through the inductors to reduce loss, and increase efficiency. Using a hybrid switch unit composed of four inductors and two capacitors it is replaced by the main inductor in the elementary circuit. It’s charged in parallel with the same input voltage and discharged in series. The output voltage is increased according to the number of components. The gain equation is modeled. The boundary condition between continuous conduction mode (CCM) and discontinuous conduction mode (DCM) has been derived. Passive components are designed to get high output voltage (8 times at D=0.5) and low ripple about (0.004). The circuit is simulated and analyzed using MATLAB/Simulink. Maximum power point tracker (MPPT) controls the converter to provide the most interest from solar energy.
Third harmonic current minimization using third harmonic blocking transformernooriasukmaningtyas
Zero sequence blocking transformers (ZSBTs) are used to suppress third harmonic currents in 3-phase systems. Three-phase systems where singlephase loading is present, there is every chance that the load is not balanced. If there is zero-sequence current due to unequal load current, then the ZSBT will impose high impedance and the supply voltage at the load end will be varied which is not desired. This paper presents Third harmonic blocking transformer (THBT) which suppresses only higher harmonic zero sequences. The constructional features using all windings in single-core and construction using three single-phase transformers explained. The paper discusses the constructional features, full details of circuit usage, design considerations, and simulation results for different supply and load conditions. A comparison of THBT with ZSBT is made with simulation results by considering four different cases
Power quality improvement of distribution systems asymmetry caused by power d...nooriasukmaningtyas
With an increase of non-linear load in today’s electrical power systems, the rate of power quality drops and the voltage source and frequency deteriorate if not properly compensated with an appropriate device. Filters are most common techniques that employed to overcome this problem and improving power quality. In this paper an improved optimization technique of filter applies to the power system is based on a particle swarm optimization with using artificial neural network technique applied to the unified power flow quality conditioner (PSO-ANN UPQC). Design particle swarm optimization and artificial neural network together result in a very high performance of flexible AC transmission lines (FACTs) controller and it implements to the system to compensate all types of power quality disturbances. This technique is very powerful for minimization of total harmonic distortion of source voltages and currents as a limit permitted by IEEE-519. The work creates a power system model in MATLAB/Simulink program to investigate our proposed optimization technique for improving control circuit of filters. The work also has measured all power quality disturbances of the electrical arc furnace of steel factory and suggests this technique of filter to improve the power quality.
Studies enhancement of transient stability by single machine infinite bus sys...nooriasukmaningtyas
Maintaining network synchronization is important to customer service. Low fluctuations cause voltage instability, non-synchronization in the power system or the problems in the electrical system disturbances, harmonics current and voltages inflation and contraction voltage. Proper tunning of the parameters of stabilizer is prime for validation of stabilizer. To overcome instability issues and get reinforcement found a lot of the techniques are developed to overcome instability problems and improve performance of power system. Genetic algorithm was applied to optimize parameters and suppress oscillation. The simulation of the robust composite capacitance system of an infinite single-machine bus was studied using MATLAB was used for optimization purpose. The critical time is an indication of the maximum possible time during which the error can pass in the system to obtain stability through the simulation. The effectiveness improvement has been shown in the system
Renewable energy based dynamic tariff system for domestic load managementnooriasukmaningtyas
To deal with the present power-scenario, this paper proposes a model of an advanced energy management system, which tries to achieve peak clipping, peak to average ratio reduction and cost reduction based on effective utilization of distributed generations. This helps to manage conventional loads based on flexible tariff system. The main contribution of this work is the development of three-part dynamic tariff system on the basis of time of utilizing power, available renewable energy sources (RES) and consumers’ load profile. This incorporates consumers’ choice to suitably select for either consuming power from conventional energy sources and/or renewable energy sources during peak or off-peak hours. To validate the efficiency of the proposed model we have comparatively evaluated the model performance with existing optimization techniques using genetic algorithm and particle swarm optimization. A new optimization technique, hybrid greedy particle swarm optimization has been proposed which is based on the two aforementioned techniques. It is found that the proposed model is superior with the improved tariff scheme when subjected to load management and consumers’ financial benefit. This work leads to maintain a healthy relationship between the utility sectors and the consumers, thereby making the existing grid more reliable, robust, flexible yet cost effective.
Energy harvesting maximization by integration of distributed generation based...nooriasukmaningtyas
The purpose of distributed generation systems (DGS) is to enhance the distribution system (DS) performance to be better known with its benefits in the power sector as installing distributed generation (DG) units into the DS can introduce economic, environmental and technical benefits. Those benefits can be obtained if the DG units' site and size is properly determined. The aim of this paper is studying and reviewing the effect of connecting DG units in the DS on transmission efficiency, reactive power loss and voltage deviation in addition to the economical point of view and considering the interest and inflation rate. Whale optimization algorithm (WOA) is introduced to find the best solution to the distributed generation penetration problem in the DS. The result of WOA is compared with the genetic algorithm (GA), particle swarm optimization (PSO), and grey wolf optimizer (GWO). The proposed solutions methodologies have been tested using MATLAB software on IEEE 33 standard bus system
Intelligent fault diagnosis for power distribution systemcomparative studiesnooriasukmaningtyas
Short circuit is one of the most popular types of permanent fault in power distribution system. Thus, fast and accuracy diagnosis of short circuit failure is very important so that the power system works more effectively. In this paper, a newly enhanced support vector machine (SVM) classifier has been investigated to identify ten short-circuit fault types, including single line-toground faults (XG, YG, ZG), line-to-line faults (XY, XZ, YZ), double lineto-ground faults (XYG, XZG, YZG) and three-line faults (XYZ). The performance of this enhanced SVM model has been improved by using three different versions of particle swarm optimization (PSO), namely: classical PSO (C-PSO), time varying acceleration coefficients PSO (T-PSO) and constriction factor PSO (K-PSO). Further, utilizing pseudo-random binary sequence (PRBS)-based time domain reflectometry (TDR) method allows to obtain a reliable dataset for SVM classifier. The experimental results performed on a two-branch distribution line show the most optimal variant of PSO for short fault diagnosis.
A deep learning approach based on stochastic gradient descent and least absol...nooriasukmaningtyas
More than eighty-five to ninety percentage of the diabetic patients are affected with diabetic retinopathy (DR) which is an eye disorder that leads to blindness. The computational techniques can support to detect the DR by using the retinal images. However, it is hard to measure the DR with the raw retinal image. This paper proposes an effective method for identification of DR from the retinal images. In this research work, initially the Weiner filter is used for preprocessing the raw retinal image. Then the preprocessed image is segmented using fuzzy c-mean technique. Then from the segmented image, the features are extracted using grey level co-occurrence matrix (GLCM). After extracting the fundus image, the feature selection is performed stochastic gradient descent, and least absolute shrinkage and selection operator (LASSO) for accurate identification during the classification process. Then the inception v3-convolutional neural network (IV3-CNN) model is used in the classification process to classify the image as DR image or non-DR image. By applying the proposed method, the classification performance of IV3-CNN model in identifying DR is studied. Using the proposed method, the DR is identified with the accuracy of about 95%, and the processed retinal image is identified as mild DR.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Manufacturing Process of molasses based distillery ppt.pptx
BotDetectorFW: an optimized botnet detection framework based on five features-distance measures supported by comparisons of four machine learning classifiers using CICIDS2017 dataset
1. Indonesian Journal of Electrical Engineering and Computer Science
Vol. 21, No. 1, January 2021, pp. 377~390
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v21.i1.pp377-390 377
Journal homepage: http://ijeecs.iaescore.com
BotDetectorFW: an optimized botnet detection framework
based on five features-distance measures supported by
comparisons of four machine learning classifiers using
CICIDS2017 dataset
Aaya F. Jabbar, Imad J. Mohammed
Department of Computer Science, University of Baghdad, Baghdad, Iraq
Article Info ABSTRACT
Article history:
Received Feb 29, 2020
Revised Jun 14, 2020
Accepted Aug 8, 2020
A Botnet is one of many attacks that can execute malicious tasks and develop
continuously. Therefore, current research introduces a comparison
framework, called BotDetectorFW, with classification and complexity
improvements for the detection of Botnet attack using CICIDS2017 dataset.
It is a free online dataset consist of several attacks with high-dimensions
features. The process of feature selection is a significant step to obtain the
least features by eliminating irrelated features and consequently reduces the
detection time. This process implemented inside BotDetectorFW using two
steps; data clustering and five distance measure formulas (cosine, dice, driver
& kroeber, overlap, and pearson correlation) using C#, followed by selecting
the best N features used as input into four classifier algorithms evaluated
using machine learning (WEKA); multilayerperceptron, JRip, IBK, and
random forest. In BotDetectorFW, the thoughtful and diligent cleaning of the
dataset within the preprocessing stage beside the normalization, binary
clustering of its features, followed by the adapting of feature selection based
on suitable feature distance techniques, and finalized by testing of selected
classification algorithms. All together contributed in satisfying the high-
performance metrics using fewer features number (8 features as a minimum)
compared to and outperforms other methods found in the literature that
adopted (10 features or higher) using the same dataset. Furthermore, the
results and performance evaluation of BotDetectorFM shows a competitive
impact in terms of classification accuracy (ACC), precision (Pr), recall (Rc),
and f-measure (F1) metrics.
Keywords:
BotDetectorFW
Botnet detection
CICIDS2017
Distance measures
Feature selection
Machine learning
WEKA
This is an open access article under the CC BY-SA license.
Corresponding Author:
Aaya F. Jabbar
Department of Computer Science
University of Baghdad, Baghdad, Iraq
Email: aayafadhil@gmail.com
1. INTRODUCTION
A botnet is the Internet-connected devices controlled by a Botnet owner to perform malware tasks
such as to send spam, steal data, and launch DDoS attacks. There are different tools for botnet attacks like
grum, windigo, storm, and ares [1]. A scenario of botnet could compose of some infected “bots” and
controllers (one or more, but often just a single controller). A bot is a host that has infected that controlled
remotely by a bot herder (botmaster). Meanwhile, a botmaster uses a command & control server to collect
information about the bots and issue commands to one, some, or all of it simultaneously [2]. Figure 1
displays a simplified view of a botnet. In the used Botnet dataset [3], they exercise ares, which is a Python-
2. ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 21, No. 1, January 2021 : 377 - 390
378
based Botnet were attacker uses a Kali Linux and the victims are five different Windows OS, namely Vista,
7, 8.1 and 10. To secure devices from botnet attacks, it must develop several detection models based on
previously recorded botnet data.
Figure 1. Botnet topology
As a paper organization, the next sections appear in a sequence to discuss; the investigation of
related work, Dataset under testing, data preprocessing with cleaning and normalization processes, the
proposed feature selection method (clustering and features-distance measures, classifiers and machine
learning, results of comparisons with discussion, and finally conducted by a conclusion and references
sections).
The authors in [4] proposed a CFS-BA algorithm to discard irrelevant features and the ensemble
classifier that combined between ForestPA, RF, and C4.5 with the AOP rule used to construct the
classification model. The proposed IDS evaluated using three datasets: NSL-KDD, KDDCup’99, and
CICIDS2017. The Acc, Precision, AD, F-Measure, FAR, and ADR metrics used to compare the ensemble
classifier and individual classifier performances. The accuracies of the proposed system reached 97% in
KDDCup’99, 99% in NSL-KDD, and 96% in CICIDS2017 and the number of features using this system is
reducing from 41 to 10 in NSL-KDD, from 41 to 12 in KDDCup’99, and 84 to 13 in CICIDS2017.
The authors in [5], performed their experiment based on different attacks using CICIDS2017 dataset
separately. Since it contains high number of features, the dataset was cleaned up from unwanted features
using a PCA method targeting the accurate selection of features. The PCA method evaluated using three
well-known classifiers (KNN, C4.5, and NaiveBayes) to measure the true detection and false alarm rates.
Specially, the number of best features in the botnet attack had been reduced into 23 features with the
detection rate up to 98.8%. The aim from [6] was to examine incorporating auto-encoder AE and PCA
(namely UBD) for dimensionality reduction and the use of classifiers (RF, NB, LDA, and QDA) towards
designing an efficient intrusion detection system within CICIDS2017 dataset. The experimental analysis
confirmed the significant results using UBD method and RF classifier to select the best10 features, where the
accuracy rate reached 99% for multi-class dataset while the terms of true positive rate (TPR), false positive
rate (FPR), Recall and Precision for the binary-class dataset (special botnet) reached 1.000.
In the paper [7], the author contributed in the improvement of AdaBoost classifier performance
using CICIDS2017. It evaluated the SMOTE technique, PCA and ensemble feature selection (EFS) for
features reduction. Furthermore, it improved the AdaBoost classifier, where the accuracy of 81.83%, a
precision of 0.81, recall of 1, and F1 score of 0.901 with 25 features. In the study [8], the learning model
developed by using deep learning-DMLP. The model applied using DDOS dataset found in CICIDS2017.
The Recursive Feature elimination method and Random Forest used to get the best fewer features of the
dataset. The most important features (10) tested using DMLP model and achieved accuracy up to 89%. In [9]
the authors evaluated their experiment using machine learning classifiers to detect botnet traffic of
CICIDS2017 dataset. The Botnet dataset divided into three training and two testing sets. In this work,
five classifiers: IBk, J48, NaïveBayes, OneR, and Random Forest of WEKA tested. J48 decision tree was
evaluated as the overall best when encompassing all 78 features without dimensionality reduction for both
test sets. Their higher accuracy detection rate reached 98%.
In the paper [10] proposed a two-phase hybrid method based on two different feature selection
techniques with recurrent neural network (RNN) and support vector machine (SVM) to get a few sets of
3. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752
BotDetectorFW: An optimized Botnet Detection Framework based on… (Aaya F. Jabbar)
379
features that improve the detection performance and reduce the computational time. The first phase combined
joint mutual information maximization (JMIM) with RNN and the second phase combined correlation with
SVM. The proposed system carried using two datasets: NSL-KDD dataset (consists of two training sets:
KDDTrain+and KDDTrain_20%, and two testing sets: KDDTest+and KDDTest-21) and Kyoto2006+dataset.
The system performance evaluated using metrics such as false alarm rate (FAR), recall, precision, detection
rate (DR), F-Score and Accuracy. The results of testing the proposed system for KDDTest+, FAR of
0.0085%, recall of 97.7557%, precision of 97.2655%, f-score of 96.5025% and accuracy of 98.9256%,
compared to KDDTest-21 results which are; FAR of 0.0076%, recall of 96.1749%, precision of 97.3321%, f-
score of 97.0041% and accuracy of 98.9749%, and for Kyoto2006+; FAR of 0.0068%, recall of 99.2199%,
precision of 95.5998%, f-score of 96.97.7879% and Accuracy of 97.9443%.
The study in [11] proposed a Restricted Growing SOM method with clustering reference vector
(RGSOM-CRV) and Parallel RGSOM-CRV to improve the efficiency of attack classification in KDD Cup
1999 dataset and evaluated using metrics: accuracy (ACC), false alarm rate (FAR), detection rate (DTR) or
recall, and precision. Parallel RGSOM-CRV method outperformed the regular GSOM, as it reached up to
91.86% ACC, 20.58% FAR, 95.32% DTR or Recall, and Precision up to 94.35%. The authors of paper [12]
proposed a genetic algorithm (GA) as a tool that able to identify harmful types of connections in a
computer network, as it analyzed features of connection data and types of connection in the network to
generate a set of classification rules. The proposed method uses the combination of genetic operators which
are cloning, crossover, and mutation processes to generate new chromosomes. Fitness value indicated the
quality of a chromosome (candidate solution) that can detect a set of predetermined attack connection of data
during the training process. The proposed method applied using KDD Cup 99 dataset. From the results
obtained, it indicated that the average of the success rate and the probability value were directly
proportionate, as the high rate for an average of the success was 99.9825% for 0.5 probability value.
Paper [13] proposed a method to analyze and identify the characteristic of a botnet traffic behavior
in P2P environment based on the UDP protocol using network traffic analysis tools such as the botnet
detection strategy based on the signature, DNS anomaly approach. In signature approach, it showed one of
the well-known botnet name conficker that classified in the network-based as NetBIOS attack. DNS anomaly
approach used to analyze the behavior of the DNS to define the characteristic of the network botnet. The
identified anomalies are DNS packet request, anomalous DNS MX query, the NetBIOS attack, UDP flood
attack and DNS amplification attack. It used DNS packets in anomalous network traffic as an indicator for
the presence of the botnet, then compared the normal and anomalous traffic to analyze the DNS protocol for
identification of the DNS amplification attack, UDP flood attack and spambot activities (which was defined
when there is an anomalous DNS MX query in the network).
2. THE PROPOSED FEATURE SELECTION METHODS OF BOTDERTECTORFM
Extra features can cause an increase in the computation time; also can add negative impact to an
accuracy of the detection system. Therefore; it is advisable to reduce the number of the features in a dataset
using a suitable feature selection technique that searches for the best set of features that could optimize a
classification of data [14]. In literature, feature selection techniques are classed into filter [15], wrapper [16],
and embedded techniques [17]. Filter methods evaluate the effectiveness of selected features, separately from
learning methods, while wrapper methods require learning methods to evaluate the quality of detection
system.
Comparatively, embedded methods execute feature selection as one of the components during the
process of model construction [4, 18]. The feature selection technique of our proposed framework,
BotDetectorFM, uses the embedded type. CICIDS2017 contains features that have been recorded while
acquiring data flow, those features are related to a specific network and don’t have any impact on model
results. One of the dataset preprocessing approaches is data dimension reduction adopted inside
BotDetectorFM by removing all those meaningless features. Among useless features, we validate four
nominal ones: Flow ID, source IP, destination IP, and timestamp by removing them as done in [5]. Figure 2
depicts the main components and inner flow relationships of BotDetectorFM.
4. ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 21, No. 1, January 2021 : 377 - 390
380
Figure 2. The proposed Botnet detection framework (BotDetectorFM using CICIDS2017 Dataset)
3. RESEARCH METHOD
3.1. Dataset
CICIDS2017 dataset, the current paper focus, includes benign and attacks traffic [19]. It contains the
analysis results of network traffic using CICFlowMeter and labelled flow based on timestamp, IP with port
for source, IP with Port for destination, protocols, and attack. The data capturing period began from Monday,
July_3_2017 to Friday July_7_2017, for a total of five days. The attacks in this dataset include Heartbleed,
DoS, infiltration, brute force SSH, brute force FTP, DDoS, web attack, and Botnet. Other datasets of IDS
separated the training from the testing dataset, but CICIDS2017 gathered all records of each attack to CSV
file [3]. This dataset contains 85 network flow features. The definition of extracted features is available in
Table 2. In current work, the used dataset from CICIDS2017 which is related to the botnet dataset traffic that
describes in Table 1.
Table 1. The details of botnet dataset
Dataset Name CICIDS2017
CSV File Used Friday-WorkingHours-Morning.pcap_ISCX
Year Of Release 2017
Total Number Of Instances 191033
Number Of Attributes Used in This Paper 85
Number Of Class 2 (BENIGN And BOT )
Table 2. The explanation names of features in CICIDS2017 Dataset
No. Feature Name No. Feature Name No. Feature Name
1 Flow ID 29 Fwd IAT Std 57 ECE Flag Count
2 Source IP 30 Fwd IAT Max 58 Down/Up Ratio
3 Source Port 31 Fwd IAT Min 59 Average Packet Size
4 Destination IP 32 Bwd IAT Total 60 AvgFwd Segment Size
5 Destination Port 33 Bwd IAT Mean 61 AvgBwd Segment Size
6 Protocol 34 Bwd IAT Std 62 FwdAvg Bytes/Bulk
7 Time stamp 35 Bwd IAT Max 63 FwdAvg Packets/Bulk
8 Flow Duration 36 Bwd IAT Min 64 FwdAvg Bulk Rate
9 Total Fwd Packets 37 Fwd PSH Flags 65 BwdAvg Bytes/Bulk
10 Total Backward Packets 38 Bwd PSH Flags 66 BwdAvg Packets/Bulk
11 Total Length of FwdPck 39 Fwd URG Flags 67 BwdAvg Bulk Rate
5. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752
BotDetectorFW: An optimized Botnet Detection Framework based on… (Aaya F. Jabbar)
381
No. Feature Name No. Feature Name No. Feature Name
12 Total Length of BwdPck 40 Bwd URG Flags 68 SubflowFwd Packets
13 Fwd Packet Length Max 41 Fwd Header Length 69 SubflowFwd Bytes
14 Fwd Packet Length Min 42 Bwd Header Length 70 SubflowBwd Packets
15 FwdPck Length Mean 43 Fwd Packets/s 71 SubflowBwd Bytes
16 Fwd Packet Length Std 44 Bwd Packets/s 72 Init_Win_bytes_fwd
17 Bwd Packet Length Max 45 Min Packet Length 73 Act_data_pkt_fwd
18 Bwd Packet Length Min 46 Max Packet Length 74 Min_seg_size_fwd
19 Bwd Packet Length Mean 47 Packet Length Mean 75 Active Mean
20 Bwd Packet Length Std 48 Packet Length Std 76 Active Std
21 Flow Bytes/s 49 Packet Len. Variance 77 Active Max
22 Flow Packets/s 50 FIN Flag Count 78 Active Min
23 Flow IAT Mean 51 SYN Flag Count 79 Idle Mean
24 Flow IAT Std 52 RST Flag Count 80 Idle Packet
25 Flow IAT Max 53 PSH Flag Count 81 Idle Std
26 Flow IAT Min 54 ACK Flag Count 82 Idle Max
27 Fwd IAT Total 55 URG Flag Count 83 Idle Min
28 Fwd IAT Mean 56 CWE Flag Count 84 Label
3.2. Data preprocessing
Realistic data typically takes from heterogeneous platforms and may be redundant, incomplete, and
inconsistent [20]. Thus, it requires a preprocessing step that converts data into a suitable format for analysis
and discovery [21]. In this work, the preprocessing step includes cleaning data from outliers, redundant, and
data transforming [4]. Also, before the using of a dataset for detection model evaluation, it has been
necessary to clean up the dataset from errors that could occur while flow data are acquiring [5]. In general,
preprocessing consumes necessary time and an essential step for the detection system.
3.2.1. Removing redundant attributes
In the first of preprocessing, the CICIDS2017dataset contains 85 attributes but must check if there
are redundant attributes [22]. Therefore any redundant attribute must be removed to as a requirement of the
accurate model analysis (ex. ‘Fwd Header Length’ that appeared twice in the list of attributes in the number
of an attribute (41) and (62), removing one of them). The number of features after that become 84 [5].
3.2.2. Transforming of missing and infinity values
This raw data of CICIDS2017 contains anomalous instances, which may influence the performance
of the detection system taking into consideration that some detection methods do not accept these types of
values [23]. Thus, replacing them by other values could be a solution such as missing values that can be
replaced by minimum and infinite values by a maximum of their attribute values. For example, the feature
‘Flow Packets/s’ in the Botnet dataset includes abnormal values as ‘Infinity’ and ‘NaN’ [4].
3.2.3. Transforming data
CICIDS2017 contains nominal attributes. As many classifiers do not accept nominal values, the
transforming process is vital and has an impact on detection system accuracy [24]. Thus, it is necessary to
replace every single value in a nominal attribute with an integer to handle the symbolic values. For example,
transforming the IP source and destination, flow ID, and timestamp into an integer representation is adopted.
Besides, the label attribute contains two nominal values; bot and benign that can transforme into binary
numeric values such as 0 instead of benign value and 1 instead of Bot value. Also, normalized methods used
to transform all attributes value into the same range [25]. Minmax is one of the normalization strategies
which transform dataset values from range to another in each attribute [6].
Y=
X−Xmin
Xmax−Xmin
(1)
Where X is the set of dataset values of x, Xmin and Xmax are minimum value and maximum value
of x attribute values. The new range of data is 0–1 range.
3.3. Clustering data
The clustering data process converts or normalizes the values of the features into 0 to 1 range based
on means (M) and standard deviation (STD) values per feature. Algorithm 1 outlines the steps of the
clustering process.
6. ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 21, No. 1, January 2021 : 377 - 390
382
Clustering data algorithm of BotDetectorFM:
Input: DS (Dataset matrix), R (No. of instances), F (Index of feature)
Output: DS_Clustering (Binary Dataset matrix that consist of 0 and 1 values)
-------------------------------------------------------------------------------------------
--------------------
1: // Mean and STD calculation
2: Calculate Mean (M) for each feature (F)
3: Calculate Standard Deviation (STD) for each feature (F)
4:
5: Calculate cluster1=M+STD//addition of mean and standard deviation
6: Calculate cluster2=M-STD//difference between mean and standard deviation
7:
8: //Clustering per Feature values Loop
9: j=0 //j is a counter for No. of instances (R) in each feature (F)
10: While j<R (stopping condition)
11: Calculate value1=DS [F, j]-cluster1
12: Calculate value2=DS [F, j]-cluster2
13: if value1<value2
14: DS_Clustering [F, j]=1
15: else
16: DS_Clustering [F, j]=0
17: j++
18: Repeat step (10) until the stopping condition is reached
3.4. Distance measure
It takes the results of binary clustering, the normalized data, as input and evaluates the distance
between the feature and label class. In general, the domain of distance measures depicts different formulas
for measuring the distance between numerical vectors of a similar length. BotDetectorFM tries to open or
head the investigation of five formulas on CICIDS2017, experiences them on the produced normalized
CICIDS2017 that may optimize the performances of the classifiers consequently the overall botnet detection
process. Let us assume there are two real-valued X and Y vectors such as X=[x1, x2… xn] and Y=[y1, y2…
yn], where X is any feature values, while Y is a label class values. Also, the distance measure as a real
number D is assumed. Therefore, it is common for D values to be within 1<=D<=0. The value of distance
increases as D approaches 1, and decreases as D approaches 0 [26, 27]. The proposed distance measures for
CICIDS2017 evaluated according to the following formulae:
Cosine Measure [28]:
DCosine (X, Y)=
∑ Xi∗Yi
n
i=0
√∑ (Xi)2 ∗ √∑ (Yi)2
n
i=0
n
i=0
(2)
Dice Measure [29]:
DDice (X, Y)=
2∗ ∑ Xi∗Yj
n
i=0
∑ (Xi)2 + ∑ (Yj)2
n
i=0
n
i=0
(3)
Driver & Kroeber measurer [11]:
DDRIVER & KROEBER (X, Y)=
∑ Xi∗Yi
n
i=0
2∗√∑ (Xi)2
n
i=0
+
∑ Xi∗Yi
n
i=0
2∗√∑ (Yi)2
n
i=0
(4)
Overlap measure [30]:
DOverlap (X, Y)=
∑ Xi∗Yi
n
i=0
min(∑ (Xi)2 ,∑ (Yi)2
n
i=0
n
i=0 )
(5)
Pearson correlation measure [31]:
DPearson_correlastion (X, Y)=
(N∗∑ Xi∗Yi
n
i=0 )−(∑ Xi,j∗∑ Yi)
n
i=0
n
i=0
√[(N∗∑ (Xi)2
n
j=0 )–(∑ Xi)
n
j=0
2
]∗[(N∗∑ (Yi)2)−(∑ Yi)
n
j=0
2
]
n
i=0
(6)
Where i-item is a counter for the number of features (N). After calculating the distance for each feature, the
features rearranged relative to the highest distance value. Thus, the important features advance at the highest-
7. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752
BotDetectorFW: An optimized Botnet Detection Framework based on… (Aaya F. Jabbar)
383
ranking and the unrelated features at the least-ranking. Then, retrieving the original values for features (the
original values are the values before applying feature selection steps on it). The best distance measure is
equivalent to the highest performance measures characterized by the least selected features.
3.5. Machine learning
It is techniques or tools that are gaining popularity not only in terms of unknown and known
malware detection but also learning from the environment, which may detect attacks [32]. WEKA is one of
these tools written in Java. It is software freely accessible. It contains tools for several tasks such as
clustering, classification, association rules ...etc, in addition; tools for analyzing the learning results [33]. The
current paper uses WEKA 3.9 version as the testing environment of classifiers. Test option used in all
techniques such that the percentage split equal to 70%, this option divides the dataset into a train set 70% and
test set 30%. The classification module of BotDetectorFM investigates five classifiers known as random
forest within the decision trees algorithms [34, 35], IBK within lazy algorithms [35], JRIP within rules
algorithms [34, 36], and MultilayerPerceptron within functions algorithms [33, 37].
3.6. Performance evaluation metrics
Different metrics designed to measure the efficiency of the Detection system. Typically, these
metrics measured from a confusion matrix perspective. In BotDetectorFM, a confusion matrix used to
represent the results of the detection model. It handled as an analysis tool to measure whether the classifier is
good in recognizing the instances of different classes [38]. Table 3 identifies the confusion matrix.
Table 3. Confusion matrix
Actual
Perdicted
Attack Normal
Attack TP FP
Normal FN TN
The following are basic terms to classify events that depicted in Table 3 [39]:
TP (True Positive): The value obtained from intersecting positive actual and positive predictive values.
FP (False Positive): The value obtained from intersecting negative actual and positive predictive values.
FN (False Negative): The value obtained from intersecting actual positive and negative predictive values.
TN (True Negative): value obtained from intersecting negative actual and predictive negative values.
The values of TP and TN provide information when the classifier of the data is true, while FP and
FN provide information when the classifier is wrong in classifying the data [40].
3.7. Metrics from the confusion matrix
Many performance metrics defined by confusion matrix variables. Thus, these metrics produce
numeric values that make simply comparable. This study uses some performance metrics to appraise the
performance of the detection system, including precision (PR), recall (Rc), f-measure (F1), and accuracy
(ACC). The definitions of these metrics are provided below [39]:
Accuracy (ACC): It defined as the ratios measure of the correctly classified an object as either normal or
attack. Accuracy calculates using (7).
ACC=
𝑇𝑃+𝑇𝑁
𝑇𝑃+𝐹𝑃+𝑇𝑁+𝐹𝑁
(7)
Precision (PR): It is the fragment of data instances predicted as positive that are positive precision
represents the ratio between positive predictions and all number of positive values. Precision can be
defined using (8)
Pr=
𝑇𝑃
𝑇𝑃+𝐹𝑃
(8)
Recall (Rc) or sensitivity: is the system’s ability to detect all existing attacks. Recall can calculate from the
number of detected intrusions using the system on all of the actual intrusions. This metric is equivalent to
the detection rate. Recall can be defined using (9).
Rc=
𝑇𝑃
𝑇𝑃+𝐹𝑁
(9)
8. ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 21, No. 1, January 2021 : 377 - 390
384
F-Measure (F1): It is a harmonic combination of precision (Pr) and recall (Rc) into a single measure. F-
Measure calculated using (10).
F1=
2
1
𝑃𝑟
+
1
𝑅𝑐
(10)
4. RESULTS AND DISCUSSION
As mentioned earlier, the objective is to reduce the data dimension of CICIDS2017 that expected to
impact the botnet detection process positively. The selected features are the highest ten features determined
from each feature selection method considered as a set of optimal features Table 4.
Table 4. Show higher ten features order from each distance measure
Distance Measure Highest 10 Selected Features
Cosine
Destination Port, PSH Flag Count, Init_Win_bytes_forward, URG Flag
Count, Fwd Packet, Length Std, Fwd Packet Length Max, Down/Up Ratio,
ACK Flag Count, Bwd Packets/s, Flow Packets/s
Dice
Destination Port, URG Flag Count, PSH Flag Count,
Init_Win_bytes_forward, Flow Packets/s, Fwd Packet Length Std, Fwd
Packet Length Max, ACK Flag Count, Fwd Packets/s, Bwd Packets/s
DRIVER&KROEBER
Destination Port, Down/Up Ratio, Init_Win_bytes_forward, PSH Flag
Count, URG Flag Count, Source Port, Fwd Packet Length Std, Fwd
Packet Length Max, ACK Flag Count, Bwd Packets/s
Overlap
Destination Port, Down/Up Ratio, Init_Win_bytes_forward, PSH Flag
Count, Source Port, ACK Flag Count, Fwd Packet Length Std, Fwd
Packet Length Max, Bwd Packets/s, URG Flag Count
Pearson Correlation
Destination Port, PSH Flag Count, URG Flag Count,
Init_Win_bytes_forward, Fwd Packet Length Std, Flow Packets/s, Fwd
Packet Length Max, ACK Flag Count, Bwd Packets/s, Fwd Packets/s
As noted in the table above, most of the ten selected features are the same in all distance measures,
but they differ in a few. In addition, the order of features is different in all measures. This also may affect the
results when applying some classification algorithms. Then testing the optimal features displayed in Table 1
using classification algorithms. The implementation results of classification algorithms are displayed in
Tables 5, 6, and 7, which consider a total of 10, 9, and 8 features respectively. The results are obtained using
a test set. Bold values exhibit the highest result values obtained by applying some classifier models.
Figures 3, 4, and 5 exhibit performance comparisons of classifiers. The displayed bars in blue, red, green and
purple colours represent the accuracy of MLP, IBk, JRip, and random forest classifiers respectively.
Table 5. The results of classification algorithms for test set based on the highest ten (10) selected features
Distance Measure Classification Algorithm ACC Pr Rc F1
Cosine
MultilayerPerceptron 99.5899 % 0.978 0.614 0.754
JRip 99.9564 % 0.972 0.986 0.979
IBK 99.8238 % 0.893 0.940 0.916
Random Forest 99.9651 % 0.988 0.978 0.983
Dice
MultilayerPerceptron 99.5934 % 0.971 0.622 0.759
JRip 99.9634 % 0.981 0.983 0.982
IBK 99.8168 % 0.888 0.940 0.913
Random Forest 99.9651 % 0.990 0.976 0.983
DRIVER &
KROEBER
MultilayerPerceptron 99.8377 % 0.996 0.845 0.917
JRip 100 % 1.000 1.000 1.000
IBK 99.9948 % 0.998 0.997 0.997
Random Forest 100 % 1.000 1.000 1.000
Overlap
MultilayerPerceptron 99.8447 % 0.994 0.854 0.919
JRip 100 % 1.000 1.000 1.000
IBK 99.9948 % 0.998 0.997 0.997
Random Forest 100 % 1.000 1.000 1.000
Pearson Correlation
MultilayerPerceptron 99.6039 % 0.997 0.616 0.761
JRip 99.9616 % 0.978 0.985 0.81
IBK 99.8168 % 0.888 0.940 0.913
Random Forest 99.9668 % 0.990 0.978 0.984
9. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752
BotDetectorFW: An optimized Botnet Detection Framework based on… (Aaya F. Jabbar)
385
Table 6. The results of classification algorithms for test set based on the highest nine (9) selected features
Distance Measure
Classification
Algorithm
ACC Pr Rc F1
Cosine
MultilayerPerceptron 99.6039 % 0.997 0.616 0.761
JRip 99.9651 % 0.980 0.986 0.983
IBK 99.8342 % 0.903 0.939 0.921
Random Forest 99.9791 % 0.991 0.988 0.990
Dice
MultilayerPerceptron 99.6022 % 0.995 0.616 0.761
JRip 99.9581 % 0.975 0.985 0.980
IBK 99.829 % 0.896 0.942 0.919
Random Forest 99.9721 % 0.988 0.985 0.986
DRIVER &
KROEBER
MultilayerPerceptron 99.7313 % 0.989 0.747 0.851
JRip 100% 1.000 1.000 1.000
IBK 99.9965 % 0.998 0.998 0.998
Random Forest 100% 1.000 1.000 1.000
Overlap
MultilayerPerceptron 99.836 % 0.996 0.844 0.913
JRip 100% 1.000 1.000 1.000
IBK 99.9916 % 0.995 0.997 0.996
Random Forest 100% 1.000 1.000 1.000
Pearson Correlation
MultilayerPerceptron 99.6022 % 0.995 0.616 0.761
JRip 99.9511 % 0.971 0.981 0.976
IBK 99.8098 % 0.883 0.939 0.910
Random Forest 99.9721 % 0.990 0.983 0.986
Table 7. The results of classification algorithms for test set based on the highest eight (8) selected features
Distance Measure
Classification
Algorithm
ACC Pr Rc F1
Cosine
MultilayerPerceptron 99.6039 % 0.997 0.616 0.761
JRip 99.9686 % 0.975 0.995 0.985
IBK 99.9424 % 0.966 0.978 0.972
Random Forest 99.9686 % 0.985 0.985 0.985
Dice
MultilayerPerceptron 99.6022 % 0.995 0.616 0.761
JRip 99.9651 % 0.981 0.985 0.983
IBK 99.815 % 0.891 0.934 0.912
Random Forest 99.9756 % 0.988 0.988 0.988
DRIVER & KROEBER
MultilayerPerceptron 99.8447 % 0.994 0.854 0.919
JRip 100% 1.000 1.000 1.000
IBK 99.9965 % 0.998 0.998 0.998
Random Forest 100% 1.000 1.000 1.000
Overlap
MultilayerPerceptron 99.5376 % 0.903 0.616 0.732
JRip 100% 1.000 1.000 1.000
IBK 99.9965 % 0.998 0.998 0.998
Random Forest 100% 1.000 1.000 1.000
Pearson Correlation
MultilayerPerceptron 99.6039 % 0.997 0.616 0.761
JRip 99.9564 % 0.973 0.985 0.979
IBK 99.815 % 0.891 0.934 0.912
Random Forest 99.9756 % 0.990 0.986 0.988
It noticed that the best distance measures (results) gained from BotDetectorFM when overlap and
Driver&Kroeber measures implemented and proved by the performance of classifiers. Both give significant
results based on the performance metrics in all classification methods. Especially, random forest and JRip
algorithms. As a result and distinction, BotDetectorFM succeeded in reducing the number of best features of
CICIDS2017 from ten to eight (8) providing fewer complexities in terms of time and space processing as a
botnet detection system compared to the explored previous works. Also, it is critical to point out that the
number of significant features cannot be reduced to less than 8 because it may negatively affect the detection
performance and its accuracy. Moving further in the discussion of classification results using the
performance metrics may show more deep perspectives or indications. First, the accuracy can see in the label
(a) on Figure 3, Figure 4, and Figure 5, the highest accuracy (ACC) for all distance measurements is in the
RF algorithm followed by the JRip, IBK, and MLP algorithms respectively.
10. ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 21, No. 1, January 2021 : 377 - 390
386
(a) accuracy
(b) precision
(c) recall (d) f-measure
(e) 3D line chart
Figure 3. Comparison of performance metrics with feature selection methods based on the highest
Ten selected features
The accuracy rate in overlap and Driver&Kroeber measures reached to 100%. second, precision (Pr)
measure can notice in the label (b) on Figure 3, Figure 4, and Figure 5), the highest precision value is in the
RF and JRip algorithms for the overlap and Driver&Kroeber distance methods reached to 1 (highest value),
but the highest precision value for other distance methods is in the MLP algorithm. This difference in the
values of Pr due to its dependency on the TP for botnet instances and FN for benign instances, and since the
benign instances exceed the instances of botnet very much, it is the reason for this difference as well as this is
one of the problems of this dataset. Thirdly, recall (Rc) metric is shown in the label (c) on Figure 3, Figure 4,
11. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752
BotDetectorFW: An optimized Botnet Detection Framework based on… (Aaya F. Jabbar)
387
and Figure 5. It is the most important measure for botnet detection because it depends only on the instances
of a botnet. The highest values for this metric found between the RF and JRip algorithms. The high recall
value is in JRip and RF classifiers for overlap and Driver&Kroeber measures is the same value reached to 1
(highest value), but the high recall value for cosine measure is in JRip classifier while for other used distance
measures is in RF classifiers. Finally, the f-measure (F1) metric shown in the label (d) of Figure 3, Figure 4,
and Figure 5, which is related to Pr and Rc metrics. The highest value of this metric result from two
algorithms JRip and RF reached to 1 using overlap and Driver&Kroeber measures. While the highest f-
measure value for other distance methods got in the RF algorithm.
(a) accuracy (b) precision
(c) recall (d) f-measure
(e) 3D line chart for all metrics
Figure 4. Comparison of performance metrics with feature selection methods based on the highest
Nine selected features
12. ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 21, No. 1, January 2021 : 377 - 390
388
In general, the RF classifier algorithm produces high results for all performance measures, and the
best distance measures that rearranged the features to select the best higher N features are overlap and
Driver&Kroeber measures. The reason behind obtaining of the best features selection results by applying of
overlap and Driver-Kroeber distance measures is that they work (their formulas) or behave proportionally
directly with the intersection between feature and the label class. Therefore, if the intersection value becomes
large this means the distance ratio is great for this feature. Thus, these methods give large values for features
that have a high similarity rate compared to the label class. In terms of classifiers, the overall analysis of
BotDetectorFM found that RF and JRip give the highest results because the random forest classifier works as
a merge of multiple decision trees (algorithms) participate together to get more flexible, accurate and stable
classification results. Also, JRip works well with a large dataset by building models interpretable easily.
(a) accuracy (b) precision
(c) recall (d) f-measure
(e) 3D Line chart for all metrics
Figure 5. Comparison of performance metrics with feature selection methods based on
highest eight selected features
13. Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752
BotDetectorFW: An optimized Botnet Detection Framework based on… (Aaya F. Jabbar)
389
5. CONCLUSION
In summary, by comparison with the tested methods in previous studies on the same dataset
CICIDS2017, the proposed framework BotDetectorFM introduces an advance prediction with the fewer
number of features (8 significant features only) that diminish the detection complexity of Botnet traffic with
high detection accuracy via applying an advanced mixture of perfect features selection and suitable
classification algorithms with careful preprocessing stage and advance reduction process of the dataset
dimension.
REFERENCES
[1] M. Eslahi, R. Salleh, and N. B. Anuar, “Bots and botnets: An overview of characteristics, detection and
challenges,” in 2012 IEEE International Conference on Control System, Computing and Engineering, pp. 349-354,
2012.
[2] G. Vormayr, T. Zseby, and J. Fabini, “Botnet communication patterns,” IEEE Communications Surveys &
Tutorials, vol. 19, pp. 2768-2796, 2017.
[3] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Intrusion Detection Evaluation Dataset (CIC-IDS2017),”
2018. Available: https://www.unb.ca/cic/datasets/ids-2017.html
[4] Y.-Y. Zhou and G. Cheng, “An Efficient Network Intrusion Detection System Based on Feature Selection and
Ensemble Classifier,” arXiv preprint arXiv:1904.01352, 2019.
[5] A. Boukhamla and J. C. Gaviro, “CICIDS2017 dataset: performance improvements and validation as a robust
intrusion detection system testbed,” International Journal of Information and Computer Security, Sept. 2018.
[6] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, “Features Dimensionality Reduction
Approaches for Machine Learning Based Network Intrusion Detection,” Electronics, vol. 8, no. 3, pp. 322, 2019.
[7] A. Yulianto, P. Sukarno, and N. A. Suwastika, “Improving AdaBoost-based Intrusion Detection System (IDS)
Performance on CIC IDS 2017 Dataset,” in Journal of Physics: Conference Series, vol. 1192, pp. 012018, 2019.
[8] S. Ustebay, Z. Turgut, and M. A. Aydin, “Intrusion Detection System with Recursive Feature Elimination by Using
Random Forest and Deep Learning Classifier,” in 2018 International Congress on Big Data, Deep Learning and
Fighting Cyber Terrorism (IBIGDELFT), pp. 71-76, 2018.
[9] R. McKay, B. Pendleton, J. Britt, and B. Nakhavanit, “Machine Learning Algorithms on Botnet Traffic: Ensemble
and Simple Algorithms,” in Proceedings of the 2019 3rd International Conference on Compute and Data Analysis,
pp. 31-35, 2019.
[10] B. N. Kumar, M. S. B. Raju, and B. V. Vardhan, “A novel approach for selective feature mechanism for two-phase
intrusion detection system,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 14, pp. 101-
112, 2019.
[11] T. Y. Christyawan, A. A. Supianto, and W. F. Mahmudy, “Anomaly-based intrusion detector system using
restricted growing self organizing map,” Indonesian Journal of Electrical Engineering and Computer Science, vol.
13, pp. 919-926, 2019.
[12] H. Suhaimi, S. I. Suliman, I. Musirin, A. F. Harun, and R. Mohamad, “Network intrusion detection system by
using genetic algorithm,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 16, pp. 1593-
1599, 2019.
[13] N. Z. M. Safar, N. Abdullah, H. Kamaludin, S. Abd Ishak, and M. R. M. Isa, “Characterising and detection of
botnet in P2P network for UDP protocol,” Indonesian Journal of Electrical Engineering and Computer Science,
vol. 18, pp. 1584-1595, 2020.
[14] N. Alias, C. F. M. Foozy, N. R. Ramli, and N. Zainuddin, “Video spam comment features selection using machine
learning techniques,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 15, pp. 1046-1053,
2019.
[15] O. Osanaiye, H. Cai, K.-K. R. Choo, A. Dehghantanha, Z. Xu, and M. Dlodlo, “Ensemble-based multi-filter feature
selection method for DDoS detection in cloud computing,” EURASIP Journal on Wireless Communications and
Networking, vol. 2016, no. 130, pp. 1-10, 2016.
[16] C. Khammassi and S. Krichen, “A GA-LR wrapper approach for feature selection in network intrusion detection,”
computers & security, vol. 70, pp. 255-277, 2017.
[17] Y.-D. Lan, “A Hybrid Feature Selection based on Mutual Information and Genetic Algorithm,” Indonesian Journal
of Electrical Engineering and Computer Science, vol. 7, pp. 214-225, 2017.
[18] A. F. Alharan, H. K. Fatlawi, and N. S. Ali, “A cluster-based feature selection method for image texture
classification,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 14, pp. 1433-1442, 2019.
[19] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “A Detailed Analysis of the CICIDS2017 Data Set,” in
International Conference on Information Systems Security and Privacy, pp. 172-188, 2018.
[20] M. Jupri and R. Sarno, “Data mining, fuzzy AHP and TOPSIS for optimizing taxpayer supervision,” Indonesian
Journal of Electrical Engineering and Computer Science, vol. 18, pp. 75-87, 2020.
[21] T. A. Runkler, “Data Pre-processing,” in Secondary Analysis of Electronic Health Records, ed: Springer, pp. 115-
141, 2016.
[22] R. Elankavi, R. Kalaiprasath, and D. R. Udayakumar, “A fast clustering algorithm for high-dimensional data,”
International Journal of Civil Engineering and Technology (Ijciet), vol. 8, pp. 1220-1227, 2017.
[23] A. Fatima, N. Nazir, and M. G. Khan, “Data cleaning in data warehouse: a survey of data pre-processing
techniques and tools,” IJ Information Technology and Computer Science, vol. 3, pp. 50-61, 2017.
14. ISSN: 2502-4752
Indonesian J Elec Eng & Comp Sci, Vol. 21, No. 1, January 2021 : 377 - 390
390
[24] S. Samdani and S. Shukla, “A novel technique for converting nominal attributes to numeric attributes for intrusion
detection,” in 2017 8th International Conference on Computing, Communication and Networking Technologies
(ICCCNT), pp. 1-5, 2017.
[25] S. Jain, S. Shukla, and R. Wadhvani, “Dynamic selection of normalization techniques using data complexity
measures,” Expert Systems with Applications, vol. 106, pp. 252-262, 2018.
[26] M. De Marsico and D. Riccio, “Weighty LBP: a new selection strategy of LBP codes depending on their
information content,” in International Conference on Image Analysis and Processing, pp. 424-434, 2017.
[27] R. Todeschini, V. Consonni, H. Xiang, J. Holliday, M. Buscema, and P. Willett, “Similarity coefficients for binary
chemoinformatics data: overview and extended comparison using simulated and real data sets,” Journal of
chemical information and modeling, vol. 52, pp. 2884-2901, 2012.
[28] S. Han, C. Zhao, W. Meng, and C. Li, “Cosine similarity based fingerprinting algorithm in WLAN indoor
positioning against device diversity,” in 2015 IEEE International Conference on Communications (ICC), pp. 2710-
2714, 2015.
[29] R. R. Shamir, Y. Duchin, J. Kim, G. Sapiro, and N. Harel, “Continuous dice coefficient: a method for evaluating
probabilistic segmentations,” arXiv preprint arXiv:1906.11031, 2019.
[30] F. B. Allyson, M. L. Danilo, S. M. José and B. C. Giovanni, “Sherlock N-overlap: Invasive Normalization and
Overlap Coefficient for the Similarity Analysis Between Source Code,” in IEEE Transactions on Computers, vol.
68, no. 5, pp. 740-751, 1 May 2019. doi: 10.1109/TC.2018.2881449.
[31] D. A. Walker, “JMASM 48: The Pearson Product-Moment Correlation Coefficient and Adjustment Indices: The
Fisher Approximate Unbiased Estimator and the Olkin-Pratt Adjustment (SPSS),” Journal of Modern Applied
Statistical Methods, vol. 16, pp. 29, 2017.
[32] H. R. Esmaeel, “Analysis of classification learning algorithms,” Indonesian Journal of Electrical Engineering and
Computer Science, vol. 17, pp. 1029-1039, 2020.
[33] J. Jabez, S. Gowri, S. Vigneshwari, J. A. Mayan, and S. Srinivasulu, “Anomaly Detection by Using CFS Subset
and Neural Network with WEKA Tools,” in Information and Communication Technology for Intelligent Systems,
ed: Springer, pp. 675-682, 2019.
[34] E. Jedari, Z. Wu, R. Rashidzadeh, and M. Saif, “Wi-Fi based indoor location positioning employing random forest
classifier,” in 2015 international conference on indoor positioning and indoor navigation (IPIN), pp. 1-5, 2015.
[35] S. Bharati, M. A. Rahman, and P. Podder, “Breast cancer prediction applying different classification algorithm with
comparative analysis using WEKA,” in 2018 4th International Conference on Electrical Engineering and
Information & Communication Technology (iCEEiCT), pp. 581-584, 2018.
[36] F. MA, W. Yassin, and N. H. MS, “Scrutinized System Calls Information Using J48 And Jrip For Malware
Behaviour Detection,” Journal of Engineering Science and Technology, vol. 14, pp. 291-304, 2019.
[37] P. Tiwari, H. Dao, and G. N. Nguyen, “Performance evaluation of lazy, decision tree classifier and multilayer
perceptron on traffic accident analysis,” Informatica, vol. 41, 2017.
[38] I. Düntsch and G. Gediga, “Confusion matrices and rough set data analysis,” in Journal of Physics: Conference
Series 1229, pp. 1-6, 2019.
[39] A. Rácz, D. Bajusz, and K. Héberger, “Multi-Level Comparison of Machine Learning Classifiers and Their
Performance Metrics,” Molecules, vol. 24, p. 2811, 2019.
[40] Y. Jiao and P. Du, “Performance measures in evaluating machine learning based bioinformatics predictors for
classifications,” Quantitative Biology, vol. 4, no. 4, pp. 320-330, 2016.