1. Towards a Lightweight Intrusion Detection
System for the Internet of Things
Rashila Shrestha
M.Tech IT 2021
Nov 30, 2022
SANA ULLAH JAN, SAEED AHMED, VLADIMIR SHAKHOV, AND INSOO KOO
University of Ulsan
IEEE Access
2. Outline
● Introduction
● Related Works
● Motivations
● Analysis of IoT threats
● Proposed Intrusion Detection System
● Experiments (1-7)
● Discussion
● Research Questions
● Conclusion and Future Scope
4. Introduction
● The internet is increasingly being integrated into entities across many domains of human society.
● However, the wide reach of IoT networks makes them prone to cyber attacks.
● One of the main types of attack is denial of service (DoS), where the attacker floods the network with
a large volume of data to prevent nodes from using the services.
● An intrusion detection mechanism is considered a chief source of protection for information and
communications technology.
● However, conventional intrusion detection methods need to be modified and improved for
application to the Internet of Things
○ because of certain limitations, such as resource-constrained devices, the limited memory and battery
capacity of nodes, and specific protocol stacks.
6. Related Works
● Farahnakian and Heikkonen proposed a deep autoencoder (DAE)-based
classification model to detect and classify intrusions in the network.
○ The classifier is obtained by combining a set of autoencoders (AEs).
○ The output of the AE at the (n-1)th layer is the input to the AE at the nth layer.
○ Achieved better intrusion detection than a deep belief network (DBN) and a combination of AE and DBN.
● Shone et al. proposed a nonsymmetric deep autoencoder (NDAE) to learn features
in an unsupervised manner.
○ A set of NDAEs is stacked to perform the learning and classification tasks.
7. Related Work contd..
● Oh et al. proposed a malicious pattern detection scheme.
○ Auxiliary Shifting Method and early detection scheme
○ Reduce memory usage in pattern matching process
○ Efficient early detection of malicious pattern
○ Limitation:
■ Failed to detect and classify other attacks, such as denial of service (DoS), false data injection,
etc.
● Ali et al. proposed a fast learning network (FLN) with particle swarm optimization
(PSO) applied for convergence of classifier parameters.
○ Limitations:
■ Complexity of the system was too high to be applied to sensor nodes due to their low
computation and energy-storage capabilities.
8. Related Work contd..
● Moukhafi et al. combined a hybrid genetic algorithm (GA) and an SVM with PSO for
feature subset selection in their proposed intrusion detection system.
○ Succeeded in differentiating DoS attack from other attacks with 100% accuracy
○ Limitation:
■ Could not discriminate normal class signals from other types of attacks with reasonable
accuracy.
● Kabir et al. proposed an optimum allocation–based least square SVM (OA-LS-SVM)
for intrusion detection systems.
○ This technique first combines the training and testing datasets. Then, an optimal allocation (OA)
scheme determines the volume of training and testing sets.
○ Later, it selects representative samples directly from training and testing datasets for the classifier.
9. Motivation
● All of the methods above were able to detect intrusions in the network
efficiently.
● However, these techniques rely on resource-intensive computing and may be too heavy for
the low-capacity nodes of IoT networks.
● Thus, it was necessary to design an IDS requiring low computational costs, minimal
energy, and little memory in the network nodes.
11. IoT Traffic Modeling
● To analyze network behavior, a mathematical model of the traffic is used.
● A Poisson process is used to model the traffic in the IoT system.
● The model also generates the training and testing samples for SVM performance analysis.
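The traffic model above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' generator: it assumes each sample is the packet count in one time interval, draws counts with Knuth's multiplication method, and uses hypothetical names (`poisson_sample`, `generate_traffic`) and a hypothetical rate of 2.0.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    # Keep multiplying uniforms until the running product drops below e^-lam.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generate_traffic(lam, n, seed=0):
    """Return n per-interval packet counts drawn from Poisson(lam)."""
    rng = random.Random(seed)
    return [poisson_sample(lam, rng) for _ in range(n)]


# Hypothetical example: 5000 intervals of traffic with an average of 2 packets each.
normal_traffic = generate_traffic(lam=2.0, n=5000, seed=1)
print(sum(normal_traffic) / len(normal_traffic))  # sample mean, close to lam
```

Knuth's method is adequate for the small rates used in the experiments; for large rates a library sampler would be a better choice.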
12. Proposed Intrusion Detection System
● Two main phases of the system:
○ Training phase and evaluation phase
● In the training phase, labeled data samples are obtained.
● In the first stage, features are extracted from the samples to obtain a feature pool.
● The resulting feature pool along with a vector of labels is then used to train the
classifier.
● Once trained, the classifier is used to classify the unobserved
samples from the test dataset.
● To evaluate the performance of the classifier, the same features used in the training
phase are extracted from the test samples in the test dataset.
● These unlabeled test samples are then given as input to the classifier to obtain the
predicted output.
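The two phases can be sketched as follows. This is an illustrative stand-in rather than the authors' implementation: it assumes a linear SVM trained by plain sub-gradient descent on the hinge loss, two features (mean and max) per window, labels -1 (normal) and +1 (intrusion), and hand-written toy windows. All function names and hyperparameters are hypothetical.

```python
from statistics import mean, pstdev


def extract_features(window):
    """Feature vector for one window of packet counts: [mean, max]."""
    return [float(mean(window)), float(max(window))]


def fit_scaler(X):
    """Per-feature mean and standard deviation, learned on training data only."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def scale(X, scaler):
    mus, sigmas = scaler
    return [[(v - m) / s for v, m, s in zip(x, mus, sigmas)] for x in X]


def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss (labels -1/+1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1
            for j in range(len(w)):
                grad = lam * w[j] - (yi * xi[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w, b


def predict(X, model):
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


# Toy labeled windows (hypothetical data): four normal, four intruded.
train_windows = [
    [2, 1, 3, 2, 2], [1, 2, 2, 4, 1], [3, 2, 1, 1, 2], [2, 3, 2, 1, 3],
    [4, 5, 3, 4, 6], [5, 3, 4, 4, 4], [3, 6, 4, 5, 4], [4, 4, 5, 3, 5],
]
train_labels = [-1, -1, -1, -1, 1, 1, 1, 1]
test_windows = [[2, 2, 1, 3, 1], [5, 4, 4, 6, 3]]

# Training phase: feature pool plus label vector -> trained classifier.
X_train = [extract_features(w) for w in train_windows]
scaler = fit_scaler(X_train)
model = train_linear_svm(scale(X_train, scaler), train_labels)

# Evaluation phase: same features from unlabeled test samples -> predictions.
X_test = [extract_features(w) for w in test_windows]
predicted = predict(scale(X_test, scaler), model)
print(predicted)
```

The scaler is fitted on the training features and reused unchanged in the evaluation phase, so the test samples never influence training.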
14. Performance Evaluation Parameters
● A true positive is an outcome where the model correctly predicts the positive class.
● Similarly, a true negative is an outcome where the model correctly predicts the
negative class.
● A false positive is an outcome where the model incorrectly predicts the positive
class.
● A false negative is an outcome where the model incorrectly predicts the negative
class.
15. Performance Evaluation Parameters
● True Positive Rate (TPR): The ratio of true positives to the sum of true positives and
false negatives, i.e. to all actual positives. Also known as sensitivity, recall, or hit rate:
TPR = TP / (TP + FN)
● False Positive Rate (FPR): The ratio of false positives to the sum of false positives and true
negatives. Also known as fall-out:
FPR = FP / (FP + TN)
16. Performance Evaluation Parameters contd..
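● False Detection Rate (FDR): Defined as (1 - accuracy) and calculated as follows:
FDR = (FP + FN) / (TP + TN + FP + FN)
● Accuracy (ACC): The ratio of correctly classified observations to total observations, calculated as follows:
ACC = (TP + TN) / (TP + TN + FP + FN)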
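A small worked example of these parameters, as a sketch only. It assumes the standard definition of TPR (true positives over all actual positives), takes FDR as 1 - accuracy as stated on the slide, and uses made-up labels where 1 means "intrusion" and 0 means "normal".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def evaluation_parameters(tp, tn, fp, fn):
    """TPR, FPR, ACC, and FDR (taken here as 1 - ACC) from the four counts."""
    total = tp + tn + fp + fn
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "ACC": (tp + tn) / total,
        "FDR": (fp + fn) / total,
    }


# Hypothetical outcome: 10 intrusions (8 caught) and 10 normal samples (1 false alarm).
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [1] * 1 + [0] * 9
counts = confusion_counts(y_true, y_pred)
params = evaluation_parameters(*counts)
print(counts, params)
```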
17. Experiment 1
● An SVM-based classifier is trained with three kernels:
○ linear, polynomial, and radial-basis function (RBF)
● Four feature combinations are used:
○ mean and maximum (max) values
○ mean and median values
○ max and median values
○ mean, max, and median values.
● Datasets with 100 observations
○ 40 observations for training the classifier
○ 60 observations for testing
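The four feature combinations can be expressed compactly. A minimal sketch: the numbering 1 to 4 follows the order in which the combinations are listed above, which is an assumption, and `extract` and the example window are hypothetical.

```python
from statistics import mean, median

# Each feature is a function of one window of packet-arrival counts.
FEATURES = {"mean": mean, "max": max, "median": median}

# Assumed numbering: the order in which the combinations are listed on the slide.
FEATURE_SETS = {
    1: ("mean", "max"),
    2: ("mean", "median"),
    3: ("max", "median"),
    4: ("mean", "max", "median"),
}


def extract(window, feature_set):
    """Return the feature vector of `window` for the chosen combination."""
    return [float(FEATURES[name](window)) for name in FEATURE_SETS[feature_set]]


window = [1, 2, 2, 3, 7]  # hypothetical packet counts
for set_id in FEATURE_SETS:
    print(set_id, extract(window, set_id))
```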
20. Analysis
● The linear SVM outperformed its polynomial and RBF counterparts in accurately
classifying the input signal with all four combinations of features.
● Detection time was similar across all SVM types for each combination of
features.
● Mean and max took the longest time to classify the 120 observations in the test set.
“Mean, max, and median was the combination of features that required the shortest time
from among all combinations of features to obtain final results”
23. Experiment 2
● Test the performance of classifier:
○ Signal S, of length 1000 elements, obtained using the bigram technique (N-gram with N = 2)
○ S = {s1, s2, ..., s1000}; r is a random number between 1 and 1000
○ Snorm = {s1, s2, ..., sr} was obtained through a Poisson distribution with parameter λnorm.
○ Sintr = {sr+1, sr+2, ..., s1000} was generated from a Poisson distribution with parameter λintr.
○ Finally, the two vectors, Snorm and Sintr, were concatenated to obtain one signal, S = [Snorm, Sintr], of
1000 elements.
● λnorm = 2 and λintr =4 are used to obtain the training and testing signals.
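The signal construction can be sketched as below. This is a hedged illustration that covers only the Poisson segments and their concatenation; the bigram step is not reproduced. The helper names, the use of Knuth's sampler, and the seed are assumptions.

```python
import math
import random


def poisson_sample(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def make_signal(lam_norm, lam_intr, length=1000, seed=0):
    """Build S = [S_norm, S_intr] with a random change point r in 1..length."""
    rng = random.Random(seed)
    r = rng.randint(1, length)  # number of normal samples
    s_norm = [poisson_sample(lam_norm, rng) for _ in range(r)]
    s_intr = [poisson_sample(lam_intr, rng) for _ in range(length - r)]
    return s_norm + s_intr, r


signal, r = make_signal(lam_norm=2.0, lam_intr=4.0, seed=7)
labels = [0] * r + [1] * (len(signal) - r)  # 0 = normal, 1 = intruded
print(len(signal), r)
```

If r happens to equal 1000, the intruded segment is empty; a real experiment would probably redraw r in that case.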
24. Analysis
● Linear kernel-based classifier outperformed the other two classifiers in terms of
accuracy.
● The polynomial kernel-based classifier reported the worst accuracy for feature sets
1, 2, and 3.
● In feature set 4, the RBF kernel-based classifier showed the lowest accuracy.
● The RBF kernel-based classifier outperformed the linear and polynomial
kernel-based classifiers in terms of TPR.
● The worst results were reported by the polynomial kernel-based classifier.
26. Experiment 3
● Scenario:
○ λnorm as before, λintr = 6
● Performance of the polynomial kernel-based classifier improved relative to the linear and RBF
kernel-based classifiers.
● Lower window size for feature set 1:
○ increased the accuracy and TPR of the polynomial kernel-based classifier.
● Further decrease in window size w:
○ the classifier's performance degraded below that of the linear and RBF kernel-based classifiers.
● Using feature sets 2, 3, and 4:
○ the polynomial kernel-based classifier showed performance comparable to the remaining
classifiers.
○ improved performance in FPR, FDR, and τ.
● An increase in λintr increases the efficiency of all classifiers for all sets of features.
28. Experiment 4
● Scenario:
○ λintr = 1.5 and λnorm = 2
● Analysis
○ performance of the polynomial kernel-based classifier was
worse than that of the other two classifiers in many cases.
○ performance of the linear and RBF kernel-based classifiers
is comparable.
30. Experiment 5
● Scenario:
○ λintr was further decreased to 1 to analyze the effect on the
performance of the classifiers.
● Analysis:
○ performance improved for all classifiers in all cases
○ the linear kernel-based classifier achieved the best results among the
three types of classifiers
● Results strengthen the statement that an increase in the
difference between λnorm and λintr increases the performance
of the classifiers.
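One way to see why a larger gap between λnorm and λintr helps is a back-of-the-envelope separation score. This score is an illustration added here, not a quantity from the paper: for a window of w Poisson samples the window mean has variance λ/w, so the gap between the two class means can be expressed in units of the standard deviation of their difference. The window size w = 10 is hypothetical, and λnorm = 2 is assumed for all four scenarios.

```python
import math


def separation(lam_norm, lam_intr, w):
    """Gap between the class means of a w-sample window mean, in standard deviations.

    The mean of w Poisson(lam) samples has variance lam / w, so the difference
    of two independent window means has variance (lam_norm + lam_intr) / w.
    """
    return abs(lam_intr - lam_norm) / math.sqrt((lam_norm + lam_intr) / w)


# lam_intr values used in Experiments 2 to 5, with lam_norm = 2 assumed throughout.
scenarios = {"Experiment 2": 4.0, "Experiment 3": 6.0,
             "Experiment 4": 1.5, "Experiment 5": 1.0}
scores = {name: separation(2.0, lam_intr, w=10) for name, lam_intr in scenarios.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Under these assumptions the score is largest for λintr = 6 and smallest for λintr = 1.5, in line with the trend reported across the experiments.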
32. Experiment 6
● The proposed SVM algorithm was compared with neural network (NN), k-nearest neighbor (KNN), and decision tree (DT) algorithms.
● SVM was implemented using three kernels:
○ linear, polynomial, and radial-basis function: SVM(L), SVM(P), and SVM(R)
● Selected parameters and algorithms:
○ Neural network: a feed-forward network whose input, hidden, and output
layers comprise 2 or 3, 2, and 1 nodes, respectively.
○ Levenberg-Marquardt (LM) backpropagation was used in the training phase.
○ KNN: an input sample is classified using the
Euclidean distance to its 3 nearest neighbors.
○ Decision tree: a classification tree was fitted in the training phase and
used to classify the test samples.
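The KNN baseline described above (Euclidean distance, 3 nearest neighbors) can be sketched with the standard library alone. The training points and labels below are made up for illustration; majority voting among the neighbors is the usual KNN rule and is assumed here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    by_distance = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical [mean, max] features of packet-arrival windows.
train_X = [[2.0, 3.0], [1.8, 3.0], [2.2, 4.0], [4.4, 6.0], [4.0, 5.0], [4.2, 5.0]]
train_y = ["normal", "normal", "normal", "intrusion", "intrusion", "intrusion"]

print(knn_predict(train_X, train_y, [2.1, 3.0]))
print(knn_predict(train_X, train_y, [4.3, 6.0]))
```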
34. Experiment 7
● The efficiency of the proposed algorithm was compared, in terms of accuracy and CPU time, with
the GA-SVM, A-IDS, and WFS-IDS algorithms.
● Accuracy measures how well an algorithm classifies an input signal as
normal or intruded.
● To measure how lightweight an algorithm is, two parameters were used:
consumed energy and elapsed time.
35. Analysis
● Dataset: 10,000 samples, of which 60% are reserved for training and 40%
for testing
○ three features: the mean, maximum, and median of the packet arrival rate
● A dataset synchronized in parallel with CICIDS2017 was generated to demonstrate the
efficiency of the proposed algorithm.
● The DDoS attack was implemented in the network at random points in time.
● This subset contained a total of 225,745 samples described by 78 features.
● The GA-SVM, A-IDS, and WFS-IDS algorithms and the proposed algorithm were
compared on the dataset generated using the Poisson distribution.
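The 60/40 split can be sketched as follows. Whether the samples were shuffled before splitting is not stated in the slides, so the seeded shuffle here is an assumption, as are the function name and the placeholder samples.

```python
import random


def split_dataset(samples, train_fraction=0.6, seed=0):
    """Shuffle a copy of `samples` and split it into (training, testing) parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


samples = list(range(10_000))  # placeholder for 10,000 feature vectors
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))
```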
37. Analysis
● The accuracy of the proposed algorithm is the highest at 98.03%. The worst
performance is shown by the A-IDS algorithm with 89.76% accuracy.
● CPU execution time of the proposed scheme is the lowest, at 208.9063 seconds and 2.281
seconds for the training and testing phases, respectively.
● These results demonstrate the lightweight nature of the proposed algorithm on the
given dataset.
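CPU time for the two phases can be measured separately, for example with Python's `time.process_time`. The sketch below only shows the measurement pattern: `fit` and `classify` are trivial hypothetical stand-ins (a mean-rate threshold), not the proposed SVM, and the data is made up.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, CPU seconds used by this process)."""
    start = time.process_time()
    result = fn(*args)
    return result, time.process_time() - start


# Hypothetical stand-ins for a classifier's training and testing routines.
def fit(samples):
    return sum(samples) / len(samples)  # "model" = mean arrival rate


def classify(model, samples):
    return [1 if s > model else 0 for s in samples]


train_set = [2, 1, 3, 2, 6, 5, 7, 4]
test_set = [1, 2, 6, 5]

model, train_seconds = timed(fit, train_set)
predictions, test_seconds = timed(classify, model, test_set)
print(f"training: {train_seconds:.6f} s, testing: {test_seconds:.6f} s")
```

`process_time` counts CPU time of the current process and excludes time spent sleeping, which makes it a closer match to "CPU execution time" than wall-clock timers.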
38. Analysis contd..
● The results of the proposed algorithm, GA-SVM, and WFS-IDS were comparable.
● A-IDS shows comparatively poor performance.
● The proposed IDS achieves the highest number of TPs and TNs and the lowest number of FNs.
39. Discussion
● Feature selection and reduction: only three features are used.
○ system processing time is reduced because a single attribute is acquired
from the input data instead of multiple attributes.
○ extracting just 2 or 3 features from that single attribute takes less time than extracting
up to 40 features from multiple attributes.
○ the complexity of the SVM is also reduced because the input samples have far fewer dimensions
(features).
● The packet arrival rate, i.e. the traffic intensity at the node, was considered
to follow the Poisson distribution.
● The Poisson distribution is used only for performance evaluation.
40. Research Questions
A. Why use the packet arrival rate attribute?
● Intrusions affect the traffic intensity.
● The data rate is either increased (e.g. packet flooding) or
decreased (e.g. jamming, blackhole attack, wormhole attack).
● Thus, by monitoring the traffic intensity or packet arrival rate, intrusion in the
network can be detected.
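Monitoring the packet arrival rate amounts to counting packets per fixed time interval. A minimal sketch, assuming packet timestamps in seconds measured from the start of the capture; the function name, the 1-second interval, and the timestamps are hypothetical.

```python
def arrival_rates(timestamps, interval=1.0):
    """Count packets in consecutive intervals of `interval` seconds from t = 0."""
    if not timestamps:
        return []
    n_bins = int(max(timestamps) // interval) + 1
    counts = [0] * n_bins
    for t in timestamps:
        counts[int(t // interval)] += 1
    return counts


# Hypothetical capture: traffic picks up in the third second.
timestamps = [0.1, 0.4, 0.9, 1.2, 2.5, 2.6, 2.7, 2.9]
rates = arrival_rates(timestamps)
print(rates)  # packets per second
```

The resulting per-interval counts are the single attribute from which features such as the mean, max, and median are then extracted.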
41. B. Why SVM?
● SVM performs efficiently among ML algorithms.
● The performance comparison in the project confirms this claim.
C. How do we get shorter training and testing times?
● The main reason for the reduced computational time of the proposed algorithm
is the elimination of feature selection, which is part of the training phase only.
42. Conclusions
● Designed a lightweight IDS for anomaly detection in the IoT.
● The target is a common type of attack known as DDoS.
● The proposed IDS focuses on two major issues:
○ the attribute of the received data used to classify the signal
○ the machine learning-based classifier
● Comparative analysis of SVM-based classifier with other machine learning-based
classifiers
● The results show that an SVM-based IDS can perform satisfactorily in detection of
attacks.
43. Future Work
● Handle cases in which the effect of an intrusion on traffic intensity is not clearly
pronounced or is masked by intruders.
● Algorithms that consider the more complex set of 40 features may be able to detect more
sophisticated intrusions.