Botnet Detection in Software Defined Networks by Deep Learning Techniques
Authors:
Ivan Letteri
Giuseppe Della Penna
Giovanni De Gasperis
University of L’Aquila
Road Map
Botnet and Cyber Crime
● Components
○ Botmaster
○ Bots
○ Command & Control
● Architecture
○ Client-Server
○ Peer 2 Peer
○ Hybrid
- Botnet: a set of internet-connected devices (bots) controlled by an attacker, the botmaster, through a Command & Control channel; difficult to detect and eradicate due to its large size and distribution
- Architecture: determines the size and distribution of the botnet; it can be Client-Server, P2P, or a hybrid of both, in order to be more resilient and avoid detection
Software Defined Networking
● Architecture
○ Application plane
○ Control plane
○ Data plane
● OpenFlow protocol
○ Messages
○ Flow tables
● Application plane
○ Routing for traffic monitoring
○ Botnet behavioral analysis
- SDN: an emerging networking approach that centralizes the network intelligence in a single component, the Controller. The forwarding (Data Plane) is separated from the routing (Control Plane); the Controller programs the flow tables of data-plane devices through OpenFlow messages
- Idea: monitor traffic for malware behavior analysis in order to detect botnet attacks (see the polling sketch below)
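A minimal sketch of the monitoring idea: polling flow statistics from the SDN controller's northbound REST API. The endpoint path and response shape below are assumptions modeled on Floodlight-style controllers, not the setup used in the paper.

```python
import requests

# Hypothetical Floodlight-style endpoint; adjust to your controller's REST API.
CONTROLLER = "http://127.0.0.1:8080"
FLOW_STATS_URL = f"{CONTROLLER}/wm/core/switch/all/flow/json"

def poll_flow_stats():
    """Fetch the per-switch flow statistics exposed by the controller."""
    resp = requests.get(FLOW_STATS_URL, timeout=5)
    resp.raise_for_status()
    return resp.json()  # assumed shape: {switch_dpid: [flow_entry, ...]}

if __name__ == "__main__":
    stats = poll_flow_stats()
    for dpid, flows in stats.items():
        print(f"switch {dpid}: {len(flows)} active flows")
```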
State of the Art
● Tang et al.
○ NSL-KDD
○ Self Taught Learning
○ 6 SDN features
● Kalavaini et al.
○ CTU 13
○ SVM, NB, NN and Decision Trees
● Wang W. et al.
○ Convolutional NN
○ USTC-TFC2016
- Tang et al.: propose a deep learning IDS for SDN using STL and the NSL-KDD dataset
- 6 SDN features: duration, protocol type, source & destination bytes, count, and service count
- Kalavaini et al.: compare ML models such as SVM, Naive Bayes, Decision Trees and NN on the CTU-13 dataset
- Wang W. et al.: encode traffic as images to train a CNN on the customized USTC-TFC2016 dataset
The Dataset
● HogZilla Dataset
○ CTU 13
○ ISCX-IDS
○ 990k samples
○ 192 features
● Fair dataset
○ 50% bot traffic
○ 50% normal traffic
● Features Selection
○ 22 SDN features
■ 8 direct
■ 14 calculated
- HogZilla Dataset: a public dataset built by merging the preprocessed and classified CTU-13 and ISCX IDS datasets
- Fair dataset: composed of 50% botnet traffic (180K samples) and 50% normal traffic (180K samples); a balancing sketch follows below
- Features selection: all statistical features obtainable from the Controller via OpenFlow; 8 directly extracted, the remaining 14 calculated from them
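A minimal sketch of how such a fair (balanced) dataset could be extracted with pandas. The file name, label column, and class values are hypothetical; only the 180K-per-class target comes from the slides.

```python
import pandas as pd

# Hypothetical file/column names; the real HogZilla export may differ.
df = pd.read_csv("hogzilla.csv")

normal = df[df["label"] == "normal"]
botnet = df[df["label"] == "botnet"]

# Undersample each class to the same size (180K per class in the paper).
n = min(len(normal), len(botnet), 180_000)
fair = pd.concat([
    normal.sample(n=n, random_state=42),
    botnet.sample(n=n, random_state=42),
]).sample(frac=1.0, random_state=42)  # shuffle the balanced set

fair.to_csv("fair_dataset.csv", index=False)
```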
The Neural Network (MLP)
● Network implementation
○ TensorFlow
○ Keras
○ SciKitLearn
● Multi Layer Perceptron
○ 22 input neurons
○ 7 hidden layers (diamond shape)
○ Softmax activation with 2 output neurons
● Dropout
○ Random cutting of 30% of links
- Keras + TF + SciKitLearn: Keras runs on top of the TensorFlow API; the SciKitLearn library handles data manipulation
- Architecture: input layer with 22 neurons, 7 hidden layers (44, 88, 176, 88, 44, 22, 11) and 2 neurons in the output layer; a Keras sketch follows below
- Avoid overfitting via Dropout: all layers are fully connected, but only 70% of the links are randomly kept at every epoch (30% dropout)
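A sketch of the diamond-shaped MLP in Keras, following the layer sizes (22 in, 44-88-176-88-44-22-11 hidden, 2 out with softmax) and the 30% dropout stated above. The ReLU hidden activations and the placement of dropout after every hidden layer are assumptions, since the slides only specify the output activation.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

HIDDEN = [44, 88, 176, 88, 44, 22, 11]  # diamond-shaped hidden layers

model = Sequential()
# Input layer: the 22 SDN features described above.
model.add(Dense(HIDDEN[0], activation="relu", input_shape=(22,)))
model.add(Dropout(0.3))  # 30% of links randomly dropped
for units in HIDDEN[1:]:
    model.add(Dense(units, activation="relu"))
    model.add(Dropout(0.3))
# Two softmax outputs: 10 = botnet traffic, 01 = normal traffic.
model.add(Dense(2, activation="softmax"))

model.summary()
```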
Experimentation
● Split in 5 Train & Test sets
○ Fair dataset
○ Shuffling
○ Partitioning
■ 50%-30%-20%
■ 50% & 20% Train & Test sets
■ 30% prediction
● Best result achieved
○ 5th dataset
○ 96.52% accuracy
- Splitting: 50% and 20% for the train and test sets, 30% for empirical prediction (see the sketch below)
- Split in 5 sets: "similar" to a 5-fold cross-validation, used during testing
- Shuffling and partitioning: reduce variance and make sure the model remains general
- 5th subset for training: the best result, with 96.52% accuracy
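One way to realize the shuffle-and-partition scheme with scikit-learn; the 50/20/30 proportions follow the slides, while the file name (reused from the earlier balancing sketch), seed, and variable names are arbitrary.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

fair = pd.read_csv("fair_dataset.csv")  # balanced set from the earlier sketch
X = fair.drop(columns=["label"]).values
y = (fair["label"] == "botnet").astype(int).values

# Carve off the 30% held back for the empirical prediction phase.
X_rest, X_pred, y_rest, y_pred = train_test_split(
    X, y, test_size=0.30, shuffle=True, random_state=42)

# Split the remaining 70% into 50% train / 20% test of the original data:
# 0.20 / 0.70 of the remainder becomes the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X_rest, y_rest, test_size=0.20 / 0.70, shuffle=True, random_state=42)
```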
Fine-Tuning Hyperparameters
● Learning Rate
○ Step size of gradient descent
○ Increasing/decreasing it affects accuracy
● Batch Size
○ 100 samples is the best size
● Epochs
○ 25 epochs is the optimal compromise
● Optimizer
○ Adam is the best in this experiment
- Learning rate 0.001: the optimal size for the updates applied to the network weights, especially for prediction
- Batch size: a small batch requires less memory and trains faster, but estimates the gradient less accurately; 100 samples is the best compromise
- 25 epochs: trains significantly faster while reaching a good accuracy/performance compromise (96.96% prediction accuracy)
- Adam optimizer: clearly not the best optimizer for every task, but the best in this experiment (the tuned setup is sketched below)
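Putting the tuned values together in Keras: Adam with learning rate 0.001, batch size 100, 25 epochs. This assumes the `model` and the splits from the earlier sketches; the categorical cross-entropy loss is an assumption consistent with the two-neuron softmax output, as the slides do not name the loss.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

model.compile(
    optimizer=Adam(learning_rate=0.001),  # tuned learning rate
    loss="categorical_crossentropy",      # assumed, fits the softmax output
    metrics=["accuracy"],
)

history = model.fit(
    X_train, to_categorical(y_train, 2),
    validation_data=(X_test, to_categorical(y_test, 2)),
    batch_size=100,  # tuned batch size
    epochs=25,       # tuned number of epochs
)
```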
Conclusions
● Summary
○ New, big & realistic dataset
○ Derived SDN-specific features
○ MLP with 7 hidden layers
○ Intensely fine-tuned hyperparameters
○ Resulting in ~97% prediction accuracy on unknown traffic
- Future work: a new, unbiased, SDN-driven dataset built through accurate data analysis (scatter matrix, feature importance, clustering, etc.)
- An in-depth traffic analysis with our SDN framework (SDNsecKit)


Editor's Notes

  • #3 Road map: an introduction to botnets, the most-used cybercriminal "tool"; an introduction to SDN, Software Defined Networking, the new networking paradigm; the dataset, i.e., the collection of data used and why we chose this particular one; the neural network, more precisely the deep learning model we built; and the experiments conducted on the hyperparameters.
  • #4 A botnet consists of a number of internet-connected devices, each of which runs one or more bots. You can think of a botnet as a system compromised by some kind of malware, controlled by an attacker called the botmaster. @Botnets can be very difficult to detect and eradicate, also because of their large size and distribution, and because of their architecture, which can be Client-Server, P2P, or hybrid (a mix of both), chosen to be more resilient and to avoid detection.
  • #5 SDN is an emerging networking approach that enables easy and efficient (re)configuration in order to improve network performance. SDN centralizes the network intelligence in a single component, called the Controller. The Controller communicates with the elements on the data plane (for example, switches or routers) through the OpenFlow protocol for routing. @For our research we focused on the Application Plane, and the central idea is to build an application for traffic monitoring in an SDN, performing malware behavioral analysis in order to detect botnet attacks.
  • #6 Tang et al. propose a deep learning based approach for general network intrusion detection in SDN using self-taught learning (STL) and a dataset different from ours, NSL-KDD. They trained a NN on six SDN features: duration, protocol type, source bytes, destination bytes, count, and service count. @Kalavaini et al. compare several machine learning algorithms, such as SVM, naive Bayes, decision trees and neural networks, in the context of network traffic classification. They exploit the CTU-13 dataset and, surprisingly, conclude that neural networks are the worst classifiers in this context. @Wang W. et al. encode traffic data as images and then exploit a convolutional neural network for traffic classification, reaching an accuracy rate of roughly 99% on the customized USTC-TFC2016 dataset.
  • #7 Nowadays, behavioral analysis is often supported by machine learning techniques, and the core of machine learning, especially supervised learning, is the employed dataset. After many evaluations we chose the HogZilla dataset, because it combines selected parts of the well-known CTU-13 dataset @with good normal traffic from the ISCX IDS datasets. Our work was to refine the HogZilla dataset by extracting from it a @fair dataset, i.e., a balanced dataset containing exactly the same number (180,000) of normal and botnet samples; the original ~990K-sample dataset is heavily unbalanced toward normal traffic. @Our dataset has only 22 features instead of the original 192, more precisely 22 features extractable via SDN. @How we selected the 22 features: 8 features from the HogZilla dataset can be obtained directly via REST calls (REST is an architectural style) to the SDN controller, more precisely from the flow and port statistics exposed by the OpenFlow protocol, and the other 14 features are calculated from those eight, as in the sketch below.
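The paper's exact 14 derived features are not listed here, so the sketch below only illustrates the pattern: computing secondary features from counters that OpenFlow flow statistics do expose (duration, byte and packet counts). The derived feature names are hypothetical.

```python
def derive_features(flow):
    """Illustrative only: secondary features computed from direct
    OpenFlow flow-statistics counters; output names are hypothetical."""
    duration = max(flow["duration_sec"], 1e-6)  # guard against division by zero
    packets = max(flow["packet_count"], 1)
    return {
        "bytes_per_packet": flow["byte_count"] / packets,
        "packets_per_second": flow["packet_count"] / duration,
        "bytes_per_second": flow["byte_count"] / duration,
    }

sample_flow = {"duration_sec": 12, "byte_count": 48_000, "packet_count": 60}
print(derive_features(sample_flow))
```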
  • #8 Neural networks have been widely used in classification tasks, because bots are continuously evolving and thus their behavior quickly changes. To run our neural network we used TensorFlow through the Keras library, plus the other important library, SciKitLearn, for the data manipulation phase. All experiments ran on a local machine with Ubuntu Server. We used a Multi Layer Perceptron with 22 neurons in the input layer and 7 hidden layers arranged to form a diamond shape, plus 2 output neurons with a softmax activation function, where 01 is normal traffic and 10 is bot traffic; with our fine-tuning of the hyperparameters we reduce or eliminate the uncertain outputs 00 and 11. @All the layers are fully connected but unlinked randomly via Dropout set to 30% in order to avoid overfitting.
  • #9 To test our approach, @we first extract 5 different train and test sets from the fair dataset. We shuffled the dataset with the purpose of reducing variance and making sure the model remains general and overfits less, then we partitioned it into 50%-30%-20% parts. The first and last parts are used for training and testing, whereas the remaining 30% is set aside and used later for a manual validation with our algorithm, in the prediction phase. @As shown in the table, the 5th dataset obtains the best result, with 96.52% accuracy, so it is the best candidate.
  • #10 One of the principal activities of this work: an intense fine-tuning of the hyperparameters, starting from the learning rate, which controls the size of the updates applied to the network weights. @From a performance perspective, choosing the best batch size is important; we found that grouping 100 samples is the best strategy, because it is small enough to train fast and large enough to estimate the gradient accurately. @Furthermore, we varied the number of epochs: with only 25 epochs we get the highest prediction accuracy (96.96%), and training is significantly faster, a good accuracy/performance compromise. @The optimizer: Adam is clearly not always the best optimizer, but it proved the best in this particular experiment.
  • #11 The conclusion: in this paper we performed botnet detection experiments on a new dataset containing a very large amount of normal and botnet traffic samples, from which we extracted a set of botnet-specific, meaningful features that can actually be derived in an SDN environment. @Our future work in this field will include a more in-depth analysis of the SDN traffic features, to further reduce the amount of data needed to reach the current accuracy levels.