Application of machine learning and cognitive computing in intrusion detection systems

Application of Hardware-based Machine Learning for Intrusion Detection using Cognitive Processors
Mahdi Hosseini Moghaddam
Purdue University Calumet

Table of Contents
• Introduction
• Why new IDS
• Significance of Problem
• Definitions
• Literature review
• Architecture
• Methodology
• Analysis
• Conclusion
• Future work
• Questions
• Cost
• Timeline
• References

Introduction
• New technologies keep coming to market, and each brings new vulnerabilities to our
systems.
• Today many devices connect to the internet: not only computers but also
TVs, refrigerators, cell phones, doors and even small sensors.
• Today's markets are less tolerant of downtime caused by security issues or
attacks.
• Attacks like Denial of Service can cause serious problems by making a service
unavailable and increasing downtime.

Why New IDS
• Intrusion detection systems use two approaches to detect malicious
traffic:
• Signature-based detection, which relies on a previously created list of known attacks
• Anomaly detection
• The signature-based approach cannot detect novel attacks and zero-day attacks.
• Anomaly detection uses machine learning algorithms; however, most of them are
resource intensive.
• Performance and response time are crucial; fast detection is a MUST.

Significance of Problem
• Signature-based intrusion detection systems
need to check traffic against thousands or
even millions of patterns gathered from
previously executed attacks.
• Novel attacks, or previous attacks with even
minor changes, are almost impossible to
detect at run time.
• To add the signature of an attack to
the base system, the attack first needs to
be detected and analyzed, and then its pattern
has to be created.

Definitions
• Machine learning (ML): a system that can learn from data.
• Embedded system: a computer system, often with real-time computing
constraints.
• Cognitive processor: uses the idea of a neural network to build a processing unit
that works like the human brain. Like the brain, it consists of small units called neurons.
Each neuron in this computational model has its own memory and the logic for
operating on that memory.
• IDS: intrusion detection system.
• RCE: Restricted Coulomb Energy, a hyperspherical classifier.
• KNN: K-Nearest Neighbor, a non-parametric method for classification and
regression.

Definitions – KNN
• An object is classified by a majority
vote of its neighbors, with the object
being assigned to the class most
common among its k nearest
neighbors
• The neighbors are taken from a set of
objects for which the class (for k-NN
classification) is known.
• If k = 1, then the object is simply
assigned to the class of that single
nearest neighbor.
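
The majority vote described above can be sketched in a few lines of Python. This is only an illustration, not the CM1K implementation: the Manhattan distance, the helper names `manhattan` and `knn_classify`, and the toy training points are all assumptions made for the example.

```python
from collections import Counter

def manhattan(a, b):
    # Assumed distance measure for this sketch (sum of absolute differences).
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_classify(train, query, k=3):
    """train is a list of (feature_vector, label) pairs with known classes."""
    # Keep the k training objects closest to the query.
    nearest = sorted(train, key=lambda item: manhattan(item[0], query))[:k]
    # Majority vote among the labels of those k neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: class 1 clusters near (10, 10), class 2 near (200, 200).
train = [((10, 10), 1), ((12, 9), 1), ((11, 14), 1),
         ((200, 190), 2), ((210, 205), 2)]

print(knn_classify(train, (15, 12), k=3))    # all three nearest neighbors are class 1
print(knn_classify(train, (205, 200), k=1))  # k = 1: class of the single nearest neighbor
```

With k = 1 the function reduces to the nearest-neighbor rule from the last bullet.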

Definitions – RCE
• The architecture of the RCE network
contains two layers: A hidden layer
and an output layer.
• The hidden layer is fully
interconnected to all components of
an input pattern
• The output layer is sparsely connected
to the hidden layer; each hidden unit
projects its output to one and only one
output unit.
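
A rough software sketch of an RCE-style classifier is given here to make the two layers concrete. It rests on several assumptions that are not in the slides: each hidden unit is modeled as a prototype with a center, a radius (influence field) and one category, the distance is Manhattan, and the class name `RCE` and the parameters `max_radius` and `min_radius` are hypothetical. It is not the CM1K firmware.

```python
def l1(a, b):
    # Assumed distance measure (Manhattan) between two integer vectors.
    return sum(abs(x - y) for x, y in zip(a, b))

class RCE:
    def __init__(self, max_radius=64, min_radius=1):
        self.max_radius = max_radius
        self.min_radius = min_radius
        self.neurons = []  # hidden units: [center, radius, category]

    def learn(self, x, category):
        covered = False
        for n in self.neurons:
            d = l1(n[0], x)
            if d < n[1]:  # this hidden unit fires for x
                if n[2] == category:
                    covered = True
                else:
                    # Wrong category fired: shrink its field so x falls outside.
                    n[1] = max(self.min_radius, d)
        if not covered:
            # Commit a new hidden unit, limited by the nearest other-category unit.
            radius = self.max_radius
            for n in self.neurons:
                if n[2] != category:
                    radius = min(radius, l1(n[0], x))
            self.neurons.append([tuple(x), max(self.min_radius, radius), category])

    def classify(self, x):
        # Each firing hidden unit projects to exactly one output category.
        firing = [(l1(n[0], x), n[2]) for n in self.neurons if l1(n[0], x) < n[1]]
        if not firing:
            return None  # no unit fires: pattern is unknown
        return min(firing)[1]  # closest firing unit decides

rce = RCE(max_radius=50)
for sample, label in [((10, 10), 1), ((12, 12), 1), ((40, 40), 2), ((30, 30), 1)]:
    rce.learn(sample, label)

print(len(rce.neurons))
print(rce.classify((11, 11)), rce.classify((45, 45)), rce.classify((200, 200)))
```

In this toy run only two hidden units are committed for four training samples, because samples already covered by a unit of the right category do not create a new one.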

Literature Review
• A signature-based IDS watches network packets and compares that traffic to
a database of known attacks, called signatures. However, there is a time gap
between an attack and the time the system can detect it (Barman,
2012).
• In 2010, Stuxnet, a computer worm, affected a nuclear facility in one country. It was
designed to harm PLC systems (Falliere, 2011).
• In 2004, Baker and Prasanna proposed a methodology for building an efficient
IDS using FPGAs. They showed that it computes 8 times faster
than a shift-and-compare architecture. Although
they reached high throughput, the number of false positives increased.
• In 2013, Yoon et al. suggested a multicore-based IDS. Shared resources in
processors create many problems and add a lot of complexity to developing
systems on those processors. They tried to detect malicious behavior using
statistical analysis.

Architecture
• Data collector: Raspberry Pi board
• Interface board: Arduino Due
• Cognitive processor: CM1K (Cognimem)

Architecture (2) – CM1K
• It features 1024 neurons working in parallel and implements two non-linear
classifiers.
• Learns and recognizes patterns of up to 256 bytes (1 byte for each feature)
• Classifies patterns into up to 32,768 categories
• Choice of Restricted Coulomb Energy (RCE) or K-Nearest Neighbor (KNN)
classifiers
• Low cost, small footprint, low power consumption (0.5 W)
• Recognition time independent of the number of neurons

Methodology – Data Collection
• A small packet sniffer has been
developed, based on the
libpcap library.
• The sniffer is
installed on an embedded device,
a Raspberry Pi.
• Once it reads a packet header, it
stores it in CSV format.

Methodology – Data Collection (2)
• To obtain the required samples,
a small isolated LAN was set
up.
• Normal traffic such as ping,
traceroute and TCP streams
was generated in this network.
• Anomaly packets were gathered
by running some network attacks
with the Netwox toolset.
• The dataset has 10 features.

Methodology – Data Collection (3)
Features
• src_ip
• dst_ip
• Tos
• Len
• Id
• off
• ttl
• prt
• src_p
• dst_p

Methodology - Data Normalization
• Only 1 byte is available for each feature, and 1 byte cannot store numbers higher
than 255.
• The CM1K chip only accepts integer values, so the values were rounded.
• Collected data therefore has to be normalized to fit in this range. This was achieved
with the following formula:
$$x_{new} = r_{down} + \frac{x - x_{min}}{x_{max} - x_{min}} \times (r_{up} - r_{down})$$
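
A minimal Python sketch of the normalization formula above follows. It assumes that r_down = 0 and r_up = 255 (the range one byte can hold), that the minimum and maximum are taken per feature column, and that rounding is to the nearest integer; the function name `normalize_column` and the sample TTL values are made up for the example.

```python
def normalize_column(values, r_down=0, r_up=255):
    """Min-max scale one feature column into [r_down, r_up] and round to integers."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Constant column: avoid division by zero, map everything to r_down.
        return [r_down for _ in values]
    return [
        round(r_down + (x - x_min) / (x_max - x_min) * (r_up - r_down))
        for x in values
    ]

ttl = [32, 64, 128, 255]        # hypothetical raw values of one feature
print(normalize_column(ttl))    # [0, 37, 110, 255]
```

Every result fits in one byte, so it can be sent to the chip as a single component of the pattern.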

Methodology - Classification and Training
• A class column was added to the dataset. Normal data has class
‘1’ and data gathered from anomaly traffic has class ‘2’.
• 10 pairs of test/train files were prepared. Each file contained 512 samples of
normal traffic and 512 of anomaly traffic.
• The data must be sent from the Arduino board to the CM1K.
• After the CM1K was trained, the Arduino board loaded the test file into the chip.
• The chip sends back the distances between the test samples and the trained
model, starting from the shortest distance.
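
One way to prepare such balanced file pairs is sketched here in Python. The slides do not say how the samples were drawn, so random sampling, the function name `make_pairs`, the seed, and the synthetic input rows are all assumptions for illustration.

```python
import random

def make_pairs(normal, anomaly, n_pairs=10, per_class=512, seed=0):
    """Build n_pairs of (train, test) files, each with per_class rows per class."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        files = []
        for _ in range(2):  # first the train file, then the test file
            rows = [(x, 1) for x in rng.sample(normal, per_class)]     # class 1: normal
            rows += [(x, 2) for x in rng.sample(anomaly, per_class)]   # class 2: anomaly
            rng.shuffle(rows)
            files.append(rows)
        pairs.append(tuple(files))
    return pairs

# Hypothetical feature rows standing in for the normalized packet data.
normal = [(i % 256,) for i in range(2000)]
anomaly = [((i * 7) % 256,) for i in range(2000)]

pairs = make_pairs(normal, anomaly)
train, test = pairs[0]
print(len(pairs), len(train), len(test))
```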

Training Using CM1K
• The algorithm can be chosen
before the training phase. RCE or
KNN is selected by changing
a data register on the Arduino
board.

Training Using Software SDK
• The SDK simulates the hardware algorithms and provides some reporting and
testing functionality.

Training Using NSL-KDD Dataset
• The KDD Cup '99 dataset was created by processing the tcpdump portions of
the 1998 DARPA Intrusion Detection System (IDS) Evaluation dataset.
• NSL-KDD was proposed to solve some of the problems of the KDD’99 dataset.
• The NSL-KDD dataset has 41 features and provides thousands of data samples for
both training and testing.
• Using the same method as before, the CM1K was trained and then tested
with both the KNN and RCE algorithms.
• From the test and train samples, 10 file pairs with an identical structure were created.
Each sample file has 1024 samples.

Training Using NSL-KDD Dataset (2)
NSL-KDD Dataset features:
• protocol_type
• service
• flag
• src_bytes
• dst_bytes
• wrong_fragment
• num_failed_logins
• num_shells
• srv_count
• rerror_rate
• dst_host_count
• dst_host_same_srv_rate
• dst_host_diff_srv_rate
• su_attempted

Analysis- CM1K Result
• Simply by comparing the actual class with the determined class it is possible to
calculate accuracy
Sample #   RCE      RCE_N   RCE_TIME   KNN      KNN_N   KNN_TIME
1          76.76%   3       110249     71.68%   1024    110163
2          82.03%   3       110112     80.66%   1024    110261
3          83.59%   3       110207     85.35%   1024    110453
4          63.78%   5       110003     30.37%   1024    110446
5          85.44%   4       110075     87.21%   1024    110463
6          77.54%   3       110136     87.40%   1024    110240
7          61.82%   3       111890     87.79%   1024    110335
8          58.89%   3       111331     77.25%   1024    110322
9          66.31%   3       110177     32.91%   1024    110486
10         76.17%   3       110259     69.14%   1024    110256
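
The accuracy calculation amounts to counting matching labels. A short Python sketch follows; the function name `accuracy` and the toy label lists are assumptions. The last line shows that 786 correct answers out of 1024 would correspond to 76.76%.

```python
def accuracy(actual, predicted):
    """Fraction of samples whose determined class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical labels: 1 = normal, 2 = anomaly.
actual    = [1, 1, 2, 2, 1, 2, 1, 2]
predicted = [1, 2, 2, 2, 1, 1, 1, 2]

print(accuracy(actual, predicted))   # 0.75
print(f"{786 / 1024:.2%}")           # 76.76%
```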

Analysis- CM1K Result (2)
• Although the accuracies of RCE and KNN are fairly close, RCE
showed less variation and hence more consistent accuracy.
                     RCE          KNN
Average              73.23%       70.98%
Variance             0.00940922   0.04730602
Standard Deviation   0.09700113   0.21749947
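
These summary figures can be approximately reproduced from the ten per-sample accuracies in the CM1K result table with Python's `statistics` module. The sketch assumes the reported variance is the sample variance (divisor n - 1), which is what matches the values; small differences in the last digits come from the accuracies being rounded to two decimals.

```python
import statistics

# Per-sample accuracies from the CM1K result table, as fractions.
rce = [0.7676, 0.8203, 0.8359, 0.6378, 0.8544, 0.7754, 0.6182, 0.5889, 0.6631, 0.7617]
knn = [0.7168, 0.8066, 0.8535, 0.3037, 0.8721, 0.8740, 0.8779, 0.7725, 0.3291, 0.6914]

for name, acc in (("RCE", rce), ("KNN", knn)):
    mean = statistics.mean(acc)
    var = statistics.variance(acc)   # sample variance, divisor n - 1
    std = statistics.stdev(acc)      # square root of the sample variance
    print(f"{name}: mean={mean:.4f} variance={var:.6f} std={std:.6f}")
```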

Analysis- Software Result
• The RCE results obtained from hardware and software are gathered
in the table below. In terms of accuracy the two are essentially the
same; surprisingly, however, the software solution was much faster than the hardware
one.
           Hardware                            Software
Sample #   Accuracy   # of Neurons   TIME      Accuracy   # of Neurons   TIME
1          76.76%     3              110249    76.76%     3              1230
2          82.03%     3              110112    82.03%     3              910
3          83.59%     3              110207    83.59%     3              890
4          63.78%     5              110003    64.06%     5              1170
5          85.44%     4              110075    85.45%     4              880
6          77.54%     3              110136    77.54%     3              780
7          61.82%     3              111890    61.82%     3              800
8          58.89%     3              111331    58.89%     3              900
9          66.31%     3              110177    66.31%     3              820
10         76.17%     3              110259    76.17%     3              880

Analysis- CM1K Result (NSL-KDD Dataset)
• Because the same amount of data was used, the results have the same structure
as those for the dataset created as part of this project.
Sample #   RCE      RCE_N   RCE_TIME   KNN      KNN_N   KNN_TIME
1          79.39%   2       123728     87.01%   1024    123195
2          58.40%   3       123522     59.67%   1024    123500
3          79.59%   2       123853     87.40%   1024    123146
4          50.88%   7       123188     84.86%   1024    123430
5          57.91%   2       123662     86.72%   1024    123505
6          80.57%   2       123824     84.47%   1024    123338
7          79.88%   5       123362     88.77%   1024    123691
8          80.66%   3       123611     81.05%   1024    123448
9          58.30%   2       123678     83.40%   1024    123426
10         78.81%   2       123974     87.30%   1024    123286

Conclusion
• The CM1K provides parallelism at low cost and low energy consumption.
• The CM1K provides classification algorithms at the hardware level.
• Although KNN was more accurate, RCE used fewer neurons.
• Obtaining good data is a big challenge.
• This project can be used for any classification problem.
• I²C is not a good communication bus, as it creates a bottleneck.

Future Work
• Adding more features about network packets
• Using a chain of chips
• Using USB instead of I²C
• Developing an alerting method
• Creating a general classifier

Cost
Cost of required equipment
Item                             Price
Arduino Due                      $40
Raspberry Pi Model B             $40
Cognimem CM1K Chip               $150
Bread Board                      $20
Memory SD 8 GB                   $12
Wire & resistor & oscillator     $5
AC Adapter 5.0 V Out             $20
USB Cable – A Male to B Male     $7
Soldering Kit                    $90

Timeline
120 days dedicated to completing the project
Gantt chart (horizontal axis: days, 0 to 140; series: Start, Days Completed) with the following tasks:
• Developing packet sniffer
• Getting the components
• Design of the system
• Installing packet sniffer on Raspberry Pi
• Soldering complete and approved by advisor
• Gathering samples from network
• Developing classifier code on Arduino
• Training the chip
• Testing the IDS with random data
• Post-testing modification

References
• Cheng (2006). On-Time and Scalable Intrusion Detection in Embedded
Systems. Albert Mo Kim Cheng, Real-Time Systems Laboratory Department of
Computer Science University of Houston.
• Axelsson (1999). Research in intrusion-detection systems: A survey.
TR 98-17, Department of Computer Engineering, Chalmers University of Technology,
Göteborg, Sweden, December 1998. Revised August 19, 1999.
• Kerschbaum (2001) Florian Kerschbaum, Eugene H. Spafford, Diego
Zamboni. Using internal sensors and embedded detectors for intrusion detection.
Center for Education and Research in Information Assurance and Security 1315
Recitation Building Purdue University.
• Tavallaee (2009) Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A.
Ghorbani. A Detailed Analysis of the KDD CUP 99 Data Set.
• Hripcsak, G., & Rothschild, A. (2005). Agreement, the F-Measure, and the Reliability in
Information Retrieval. Retrieved from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090460/pdf/296.pdf
