1. Network behavioral clustering engine
2. What do we do?
We use machine learning to empower
organizations to get a clearer view of their
networks
3. What do we offer?
› Network Behavioral Clustering Engine (NBCE)
› NBCE is a set of technologies whose main objective is to analyze
the structural differences between normal traffic and the malicious traffic
generated by cyber attacks and network intrusions.
› NBCE characterizes normal traffic using clustering and
proposes what normal traffic should look like at a specific
point in time.
4. What do we use?
Online machine learning. A paradigm shift
[Figure: "Online in-memory analytics". Off-line (disk, large analysis process) moves to online (memory; only analysis results are stored on disk).]
What’s the benefit?
› Not all generated data represents useful information.
› Historical information is not always available (e.g. electric car
adoption, new M2M traffic in the network).
› Need to react instantly to meaningful changes in data (e.g. trends on
the stock market).
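The online idea above can be sketched as follows. This is a minimal illustration, not the NBCE implementation: it assumes a single numeric stream, uses Welford's running mean and variance so that no history is stored, and flags a value as a meaningful change when it lies more than `threshold` standard deviations from the running mean. The names `RunningStats`, `detect_changes`, `threshold`, and `warmup` are hypothetical.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance in constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect_changes(stream, threshold=3.0, warmup=5):
    """Return the indices of values that deviate sharply from what came before."""
    stats = RunningStats()
    alerts = []
    for i, x in enumerate(stream):
        # Compare the new value against statistics of the values seen so far.
        if stats.n >= warmup and stats.std > 0:
            if abs(x - stats.mean) > threshold * stats.std:
                alerts.append(i)
        stats.update(x)  # only the summary is kept, never the raw value
    return alerts


# Illustrative values only; the spike at index 7 is the "meaningful change".
example_stream = [10, 11, 9, 10, 12, 10, 11, 50, 10]
example_alerts = detect_changes(example_stream)
```

Each value is inspected once and then discarded, which is the property the slide contrasts with a large off-line analysis over data on disk.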
5. Intrusion/Attack detection
› Intrusion detection is the process of monitoring the events occurring in
a computer system or network and analyzing them for signs of
possible incidents.
› Incidents are violations or imminent threats of violation of:
› computer security policies.
› acceptable use policies.
› standard security practices.
› An intrusion detection system (IDS) is software that automates the
intrusion detection process.
› IDSs primarily focus on identifying possible incidents and
detecting when an attacker has successfully compromised a system
by exploiting a vulnerability in the system.
6. Intrusion/Attack detection
Detection approaches
› Signature-Based Detection
› A signature is a pattern that corresponds to a known threat (e.g. a
DoS attack). Signature-based detection is the process of comparing
signatures against observed events to identify possible incidents.
› Anomaly-Based Detection
› The process of comparing definitions of what activity is considered
normal against observed events to identify significant deviations.
It is capable of detecting previously unknown threats and does not
require a previously labeled dataset.
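The difference between the two approaches can be sketched in a few lines. Everything here is an assumption made for illustration: the event fields (`packets_per_s`, `distinct_ports`), the single signature, and the hand-written baseline of (mean, standard deviation) pairs, which a real system would learn from observed traffic.

```python
def signature_match(event, signatures):
    """Signature-based: return the names of known-threat patterns the event matches."""
    return [name for name, pattern in signatures.items() if pattern(event)]


def anomaly_score(event, baseline):
    """Anomaly-based: largest deviation from normal, in standard deviations."""
    return max(abs(event[key] - mean) / std for key, (mean, std) in baseline.items())


# Hypothetical signature for a known threat (a flood of packets).
SIGNATURES = {"dos_flood": lambda e: e["packets_per_s"] > 10000}

# Hypothetical definition of normal activity: feature -> (mean, std).
BASELINE = {"packets_per_s": (200.0, 50.0), "distinct_ports": (5.0, 2.0)}

flood = {"packets_per_s": 50000, "distinct_ports": 1}
scan = {"packets_per_s": 250, "distinct_ports": 400}  # no signature covers this

flood_hits = signature_match(flood, SIGNATURES)
scan_hits = signature_match(scan, SIGNATURES)
scan_is_anomalous = anomaly_score(scan, BASELINE) > 3.0
```

The scan matches no signature, yet its deviation from the baseline is large, which is how an anomaly-based detector can flag a threat it has never seen before.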
7. Using Clustering for Intrusion Detection
[Diagram labels: A Set of Unlabeled Data; Unsupervised Anomaly Detection Algorithm; Detected Intrusion Clusters; Comparison with Detected Clusters; Detected malicious attacks]
› Assumptions for unsupervised anomaly detection algorithm:
› The intrusions are rare with respect to normal network traffic.
› The intrusions are different from normal network traffic.
› As a result:
› The intrusions will appear as outliers in the data.
8. Using Clustering for Intrusion Detection
› The unsupervised anomaly detection
algorithm groups the unlabeled data
instances into clusters using a
simple distance-based metric.
› Once data is clustered, all of the instances
that appear in small clusters are labeled as
anomalies because:
› The normal instances should form
large clusters compared to the
intrusions.
› Malicious intrusions and normal
instances are qualitatively different, so
they do not fall into the same cluster.
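The labeling rule above can be sketched as follows. This is not the Red-Means algorithm, which the slides do not describe; it swaps in a simple single-pass, fixed-width clustering as a stand-in. The names `cluster_fixed_width`, `label_anomalies`, `width`, and `min_fraction` are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def cluster_fixed_width(points, width):
    """Single-pass clustering: put each point in the nearest cluster whose
    centre is within `width`, otherwise start a new cluster at that point.
    A cluster's centre is the first point assigned to it."""
    centres, members = [], []
    for idx, point in enumerate(points):
        best, best_d = None, width
        for c, centre in enumerate(centres):
            d = math.dist(point, centre)  # Euclidean distance
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            centres.append(point)
            members.append([idx])
        else:
            members[best].append(idx)
    return members


def label_anomalies(points, width, min_fraction):
    """Label every instance in a small cluster as an anomaly."""
    cutoff = min_fraction * len(points)
    labels = ["normal"] * len(points)
    for cluster in cluster_fixed_width(points, width):
        if len(cluster) < cutoff:
            for idx in cluster:
                labels[idx] = "anomaly"
    return labels


# Illustrative data: nine similar instances and one that is far away.
example_points = [
    (0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (0.0, 0.2), (0.1, 0.0),
    (0.2, 0.2), (0.1, 0.2), (0.0, 0.1), (0.2, 0.1), (5.0, 5.0),
]
example_labels = label_anomalies(example_points, width=1.0, min_fraction=0.2)
```

The rule only works if both assumptions hold: an attack that makes up a large share of the traffic would form a large cluster and be labeled normal.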
9. Metric & Normalization
• Euclidean Metric
(for distance computation)
• Feature Normalization
(to eliminate the difference in the scale of features)
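A short sketch of why normalization matters for the Euclidean metric. The slide does not say which normalization is used, so z-score scaling is assumed here, and the two features (bytes transferred, duration in seconds) are hypothetical.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(rows):
    """Z-score each feature column: subtract its mean, divide by its
    (population) standard deviation. Constant columns are left centred."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, stds))
        for row in rows
    ]


# Hypothetical instances: (bytes transferred, duration in seconds).
rows = [(1000.0, 0.1), (1100.0, 0.1), (1000.0, 9.0)]

# On raw values the byte count dominates the distance purely because of its scale.
raw_near = euclidean(rows[0], rows[2])  # differs only in duration
raw_far = euclidean(rows[0], rows[1])   # differs only in bytes

# After normalization both features contribute on the same scale.
scaled = normalize(rows)
scaled_a = euclidean(scaled[0], scaled[2])
scaled_b = euclidean(scaled[0], scaled[1])
```

On the raw values a 100-byte difference outweighs a difference of almost nine seconds; after scaling, the two differences are comparable.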
10. Methodology
› Feed the traffic data into the system
› Distribute the traffic across the Turing Nodes
› Red-Means Algorithm:
› Feature selection
› Run the distributed clustering algorithm
› Dynamic cluster number discovery
› Major cluster metrics calculation
› Collect the results from the nodes to a central point
› Logistic Regression
› Compare the results with the predictive model
› Re-train the model if required
› Mark the current situation as normal or abnormal
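The final steps (compare cluster results with a predictive model, then mark the situation) might look like the sketch below. The slides do not say which cluster metrics feed the logistic regression or how it is trained, so the two features, the toy training set, and the names `train_logistic` and `mark_situation` are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def mark_situation(metrics, weights, bias, threshold=0.5):
    """Mark the current cluster metrics as normal or abnormal."""
    p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, metrics)))
    return "abnormal" if p >= threshold else "normal"


# Hypothetical cluster metrics per time window:
# (share of traffic in small clusters, cluster count scaled to [0, 1]).
# Toy values for illustration only; label 1 means abnormal.
train_x = [(0.01, 0.2), (0.02, 0.3), (0.03, 0.25),
           (0.30, 0.8), (0.25, 0.9), (0.40, 0.7)]
train_y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(train_x, train_y)
quiet_window = mark_situation((0.02, 0.25), weights, bias)
noisy_window = mark_situation((0.35, 0.85), weights, bias)
```

Re-training, in this sketch, would simply mean calling `train_logistic` again once new labeled windows are available.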
11. NBCE Architecture
[Architecture diagram labels: Streaming; Text/XML Files; DB Access; Kantor Nodes; Turing Nodes; Picasso Node; Internal Comm system; Communicator; Red-Means Algorithm; Feature distribution; Logistic regression; Configuration File; Comm External Sys; SNMP; Console; SOAP Notification]
› Kantor is the listener of the system. It reads or listens from
streaming sources, sockets, text files, or databases.
› Turing is the core. It distributes the clustering algorithm,
collects results, and creates a prediction model of how the
traffic should behave in the future.
› Picasso is the data visualization component. It shows in a
friendly manner what the traffic looks like and what it will
look like.
Editor's Notes
The intrusions are rare with respect to normal network traffic: the number of normal instances is much larger than the number of intrusion instances. The intrusions are different from normal network traffic, which means intrusions are qualitatively different from the normal instances.