Finding interesting patterns in data can lead to uncovering new knowledge. New patterns that haven’t occurred before can signify events of interest. Depending on context, these can be called novelties, anomalies, outliers or events. Whatever they are called, they are interesting because they tell a story different from the norm. In this talk, we will call them anomalies. Two diverse applications of anomaly detection are detecting fraudulent credit card transactions and identifying astronomical anomalies such as solar flares.
However, there are many challenges in anomaly detection including high false positive rates and low predictive accuracy. Ensemble learning is a way of combining many algorithms or models to obtain better predictive performance. Anomaly detection is generally an unsupervised task, that is, we do not train models using labelled data. Constructing an unsupervised anomaly detection ensemble is challenging because we do not know the labels. In this talk we discuss two topics in anomaly detection. First, we introduce an anomaly detection ensemble using Item Response Theory (IRT) – a class of models used in educational psychometrics. Using IRT we construct an ensemble that can downplay noisy, non-discriminatory methods and accentuate sharper methods.
Then we explore anomaly detection in computer network security. With cyber incidents and data breaches becoming increasingly common, we have seen a massive increase in computer network attacks over the years. Anomaly detection methods, even though used to detect suspicious behaviour, are criticized for high false positive rates. In addition, computer networks produce a large amount of complex data. We go through the end-to-end process of detecting anomalies in this scenario and show how we can minimize false positives and visualise anomalies developing over time.
Getting better at detecting anomalies by using ensembles - CSIRO
Ensemble learning combines many algorithms or models to obtain better predictive performance. Ensembles have produced the winning algorithm in competitions such as the Netflix Prize. They are used in climate modelling and relied upon to make daily forecasts.
In this talk we will explore an anomaly detection ensemble. Anomaly detection is used in many practical applications, including detecting intrusions in computer networks. Anomaly detection is generally an unsupervised task, that is, we do not train models using labelled data. Constructing an unsupervised anomaly detection ensemble is challenging because we do not know the labels.
We use Item Response Theory (IRT) – a class of models used in educational psychometrics – to construct an unsupervised anomaly detection ensemble. IRT’s latent trait computation lends itself to anomaly detection because the latent trait can be used to uncover the hidden ground truth (labels). Using a novel IRT mapping to the anomaly detection problem, we construct an ensemble that can downplay noisy, non-discriminatory methods and accentuate sharper methods.
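The idea of downplaying noisy detectors and accentuating sharp ones can be illustrated with a simple consensus-weighting sketch. This is an invented stand-in for exposition, not the talk's actual IRT estimation: here a detector's weight is its rank correlation with the weighted consensus, a crude proxy for IRT discrimination.

```python
import numpy as np

def weighted_ensemble(scores, n_iter=10):
    """Combine per-detector anomaly scores (rows: detectors, cols: points).

    Illustrative stand-in for an IRT-style ensemble: detectors whose
    scores agree with the consensus get higher weight ("discrimination");
    noisy, non-discriminatory detectors are downweighted.
    """
    m, n = scores.shape
    # Rank-normalise each detector's scores to [0, 1] so scales are comparable
    ranked = np.argsort(np.argsort(scores, axis=1), axis=1) / (n - 1)
    w = np.ones(m) / m
    for _ in range(n_iter):
        consensus = w @ ranked                      # weighted consensus score
        # Weight = correlation of each detector with the consensus
        corr = np.array([np.corrcoef(r, consensus)[0, 1] for r in ranked])
        w = np.clip(corr, 0, None)
        w = w / w.sum()
    return w @ ranked, w
```

With three detectors tracking a common signal and one pure-noise detector, the noise detector's weight collapses and the anomaly keeps the top combined score.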
Why are anomalies important? Because they tell us a different story from the norm. An anomaly or an event might signify a failing heart rate of a patient, a fraudulent credit card activity, or an early indication of a tsunami. As such, it is extremely important to detect anomalies or anomalous events.
In this talk, we will give an introduction to anomaly detection. Anomalies are rare events. As a result, standard accuracy measures do not apply. But then, how do we evaluate an Anomaly Detection (AD) method? If we want to compare two or more AD methods, what kind of simple tests can we do? What are the data repositories that are available for AD?
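The point about standard accuracy measures is easy to demonstrate: with a 1% anomaly rate, a detector that flags nothing at all still scores 99% accuracy. A toy sketch:

```python
import numpy as np

# 1000 points, 10 true anomalies (1% anomaly rate)
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1

# A useless detector that flags nothing is 99% "accurate"
y_pred = np.zeros(1000, dtype=int)
accuracy = (y_true == y_pred).mean()
print(accuracy)   # 0.99 -- looks great, yet it detects nothing

# Recall exposes the failure
tp = ((y_pred == 1) & (y_true == 1)).sum()
recall = tp / (y_true == 1).sum()
print(recall)     # 0.0
```

This is why AD evaluation leans on precision, recall and ranking-based measures rather than raw accuracy.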
We will also discuss an ensemble method for AD. Constructing an AD ensemble is challenging because the class labels are not known. We will look at an unusual ally from psychometrics – Item Response Theory – to help us in this construction.
Searching for Anomalies, by Thomas Dietterich, Distinguished Professor Emeritus in the School of EECS at Oregon State University and Chief Scientist of BigML.
*MLSEV 2020: Virtual Conference.
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur... - Alex Pinto
We could all have predicted this with our magical Big Data analytics platforms, but it seems that Machine Learning is the new hotness in Information Security. A great number of startups with ‘cy’ and ‘threat’ in their names claim that their product will defend or detect more effectively than their neighbour's product "because math". And it seems easy to convince people without a PhD or two that the math just works.
Indeed, math is powerful, and large-scale machine learning is an important cornerstone of many of the systems that we use today. However, not all algorithms and techniques are born equal. Machine Learning is a most powerful toolbox, but not every tool can be applied to every problem, and that’s where the pitfalls lie.
This presentation will describe the different techniques available for data analysis and machine learning for information security, and discuss their strengths and caveats. The Ghost of Marketing Past will also show how similar the unfulfilled promises of deterministic and exploratory analysis were, and how to avoid making the same mistakes again.
Finally, the presentation will describe the techniques and feature sets the presenter developed over the past year as part of his ongoing research project on the subject. In particular, it will present some interesting results obtained since the last presentation at DEF CON 21, and some ideas that could improve the application of machine learning in information security, especially as a helper for security analysts in incident detection and response.
DutchMLSchool 2022 - History and Developments in ML - BigML, Inc
History and Present Developments in Machine Learning, by Tom Dietterich, Emeritus Professor of computer science at Oregon State University and Chief Scientist at BigML.
*Machine Learning School in The Netherlands 2022.
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez
This is my presentation from LISA 2014 in Seattle on November 14, 2014.
Most IT Ops teams only keep an eye on a small fraction of the metrics they collect because analyzing this haystack of data and extracting signal from the noise is not easy and generates too many false positives.
In this talk I will show some of the types of anomalies commonly found in dynamic data center environments and discuss the top five things I learned while building algorithms to find them. You will see how various Gaussian-based techniques work (and why they don’t!), and we will go into some non-parametric methods that you can use to great advantage.
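One classic way Gaussian-based techniques fail is "masking": a single extreme spike inflates the mean and standard deviation so that smaller, real anomalies no longer cross the z-score threshold. A robust, non-parametric-style alternative based on the median and MAD avoids this. The data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, 1000)              # healthy metric samples
data = np.concatenate([normal, [200.0, 8.0]])    # one huge spike, one clear anomaly

# Gaussian (z-score) detection: the spike at 200 inflates mean/std,
# "masking" the smaller anomaly at 8
z = np.abs(data - data.mean()) / data.std()
print(z[-1] > 3)        # False -- the anomaly at 8 is missed

# Robust alternative: median and MAD are barely affected by the spike
med = np.median(data)
mad = np.median(np.abs(data - med))
robust_z = np.abs(data - med) / (1.4826 * mad)   # 1.4826: Gaussian consistency factor
print(robust_z[-1] > 3)  # True -- both anomalies are flagged
```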
Detecting and Improving Distorted Fingerprints using rectification techniques - sandipan paul
This presentation covers detecting and improving distorted fingerprints using rectification techniques combined with methods such as SVM and PCA.
In it, a distorted fingerprint is taken as input and rectified into a normal one.
Algorithm evaluation using item response theory - CSIRO
Item Response Theory (IRT) is a paradigm within the field of Educational Psychometrics that is used to assess student ability and test question difficulty and discrimination power. IRT has recently been applied to evaluate machine learning algorithm performance on a classification dataset. Here, we present a modified IRT-based framework for evaluating a portfolio of algorithms across a repository of datasets, while eliciting a suite of richer characteristics, such as stability, effectiveness and anomalousness, that describe different aspects of algorithm performance.
Smartphones as ubiquitous devices for behavior analysis and better lifestyle ... - University of Geneva
Final PhD defence, presented in March 2016 at the University of Padua, Italy: a three-year PhD under the supervision of Prof. Ombretta Gaggi. The work focused on how smartphones can be used to understand and analyse user behaviour, and how this information can be used to promote better lifestyles to individuals.
Machine Learning Essentials Demystified part1 | Big Data Demystified - Omid Vahdaty
Machine Learning Essentials Abstract:
Machine Learning (ML) is one of the hottest topics in the IT world today. But what is it really all about?
In this session we will talk about what ML actually is and in which cases it is useful.
We will talk about a few common algorithms for creating ML models and demonstrate their use with Python. We will also take a peek at Deep Learning (DL) and Artificial Neural Networks and explain how they work (without too much math) and demonstrate DL model with Python.
The target audience are developers, data engineers and DBAs that do not have prior experience with ML and want to know how it actually works.
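As a flavour of the kind of Python demo described, here is a minimal scikit-learn model fit on the bundled Iris toy dataset (this is an illustrative sketch, not material from the session itself):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small labelled dataset, split it, fit a model, and score it
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))   # typically > 0.9 on this toy dataset
```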
From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid... - Alex Pinto
This session will center on a market-centric and technological exploration of commercial and open-source threat intelligence feeds, which are increasingly offered as a way to improve the defense capabilities of organizations.
While not all Threat Intelligence can be represented as "indicator feeds", this space has enough market attention that it deserves a proper scientific, evidence-based investigation so that practitioners and decision makers can maximize the results they are able to get for the data they have available.
The presentation will consist of a data-driven analysis of a cross-section of threat intelligence feeds (both open-source and commercial) to measure their statistical bias, overlap, and representability of the unknown population of breaches worldwide. All the statistical code written and research data used (from the publicly available feeds) will be made available in the spirit of reproducible research. The tool itself will be able to be used by attendees to perform the same type of tests on their own data (called tiq-test).
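The kind of feed-overlap measurement described can be sketched in a few lines. The feeds below are tiny hypothetical sets of indicators (documentation-range IP addresses), not real feed data:

```python
# Hypothetical indicator feeds represented as sets of IP addresses
feed_a = {"203.0.113.5", "198.51.100.7", "192.0.2.1", "203.0.113.99"}
feed_b = {"203.0.113.5", "192.0.2.1", "198.51.100.200"}

# Overlap and Jaccard similarity between the two feeds
inter = feed_a & feed_b
union = feed_a | feed_b
jaccard = len(inter) / len(union)
print(f"overlap: {len(inter)} indicators, Jaccard = {jaccard:.2f}")
# -> overlap: 2 indicators, Jaccard = 0.40
```

Low pairwise Jaccard scores across many feeds are one signal that each feed covers only a biased slice of the underlying population of "bad stuff".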
Some of the important questions and answers that emerge in this presentation include:
"Are Threat Intelligence Feeds a statistical good measure of the population of 'bad stuff' happening out there? Is there even such a thing?"
"How tuned to YOUR specific threat surface are those feeds?"
"Can we actually make good use of them even if the threats they describe have no overlap with the actual incidents you have been seeing in your environment? (hint: probably not)"
We will provide an open-source tool for attendees to extract, normalize and export data from threat intelligence feeds to use in their internal projects and systems. It will be pre-configured with current publicly available network feeds and easily extensible for private or commercial feeds (called combine).
Anomaly Detection and Automatic Labeling with Deep Learning - Adam Gibson
Adam Gibson demonstrates how to use variational autoencoders to automatically label time series location data. You'll explore the challenge of imbalanced classes and anomaly detection, learn how to leverage deep learning for automatically labeling (and the pitfalls of this), and discover how you can deploy these techniques in your organization.
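The core mechanism behind autoencoder-based auto-labeling is reconstruction error: the model learns the dominant structure of the data, and points it reconstructs poorly get the anomaly label. A minimal sketch of that idea, using a linear PCA projection as a stand-in for the encoder/decoder and synthetic "location" data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic location traces: most points lie near a 1-D path in 2-D
t = rng.uniform(0, 1, 495)
X = np.column_stack([t, 2 * t]) + rng.normal(0, 0.05, (495, 2))
outliers = np.array([[3., -3.], [-3., 3.], [2., -2.], [-2., 2.], [3., 0.]])
X = np.vstack([outliers, X])          # 5 clearly off-path points first

# Linear "encoder/decoder": project onto the top principal component
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
recon = np.outer(Xc @ Vt[0], Vt[0])   # encode to 1-D, decode back to 2-D
err = np.linalg.norm(Xc - recon, axis=1)

# Auto-label: points the model reconstructs poorly get the anomaly label
labels = err > np.quantile(err, 0.98)
```

A variational autoencoder replaces the linear projection with a learned nonlinear latent space, but the labeling step (thresholding reconstruction error) is the same.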
Date: Monday, 3 January 2022
Lecture no. 143 of the #تواصل_تطوير initiative
Speaker: Eng. Mohamed El-Rafei Tarabay, Head of the Programmers' Syndicate in Dakahlia
Title: "IT INDUSTRY" - How To Get Into IT With Zero Experience
Monday, 3 January 2022, 7 pm Cairo time (8 pm Makkah time)
Attendance is via Zoom:
https://us02web.zoom.us/meeting/register/tZUpf-GsrD4jH9N9AxO39J013c1D4bqJNTcu
Note that the lecture will also be streamed live on the Egyptian Engineers Association channels. We hope to offer something of benefit to engineers and the engineering profession in the Arab world.
To contact the initiative's organisers, use the Telegram channel:
https://t.me/EEAKSA
Follow the initiative and the live stream on our various channels:
LinkedIn and e-library:
https://www.linkedin.com/company/eeaksa-egyptian-engineers-association/
Twitter:
https://twitter.com/eeaksa
Facebook:
https://www.facebook.com/EEAKSA
YouTube:
https://www.youtube.com/user/EEAchannal
General lecture registration:
https://forms.gle/vVmw7L187tiATRPw9
Note: free attendance certificates are available to those who fill in the evaluation form linked at the end of the lecture.
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation - Impetus Technologies
Detecting anomalous patterns in data can lead to significant actionable insights in a wide variety of application domains, such as fraud detection, network traffic management, predictive healthcare, energy monitoring and many more.
However, detecting anomalies accurately can be difficult. What qualifies as an anomaly is continuously changing and anomalous patterns are unexpected. An effective anomaly detection system needs to continuously self-learn without relying on pre-programmed thresholds.
Join our speakers Ravishankar Rao Vallabhajosyula, Senior Data Scientist, Impetus Technologies and Saurabh Dutta, Technical Product Manager - StreamAnalytix, in a discussion on:
Importance of anomaly detection in enterprise data, types of anomalies, and challenges
Prominent real-time application areas
Approaches, techniques and algorithms for anomaly detection
Sample use-case implementation on the StreamAnalytix platform
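The "continuously self-learn without pre-programmed thresholds" requirement can be sketched with an exponentially weighted baseline whose cutoff adapts as the metric's normal level drifts. This is an illustrative sketch only, not the StreamAnalytix implementation:

```python
import numpy as np

def adaptive_flags(series, alpha=0.05, k=4.0):
    """Flag points that deviate from an exponentially weighted baseline.

    The mean/variance estimates update with every sample, so the cutoff
    adapts as the metric's normal level drifts -- no fixed threshold.
    """
    mean, var = series[0], 1.0
    flags = []
    for x in series:
        dev = abs(x - mean)
        flags.append(dev > k * np.sqrt(var))
        # Self-learning step: update the baseline after scoring
        mean = (1 - alpha) * mean + alpha * x
        var = (1 - alpha) * var + alpha * dev**2
    return np.array(flags)
```

On a metric that drifts slowly from 0 to 5, no points are flagged; a sudden spike to 50 is, even though no absolute cutoff was ever configured.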
Seminar Presentation | Network Intrusion Detection using Supervised Machine L... - Jowin John Chemban
By:
Jowin John Chemban (jowinchemban@gmail.com)
HGW16CS022 (2016-2020 Batch)
S7 B.Tech Computer Science Engineering
Holy Grace Academy of Engineering, Mala
Date : September 2019
The IT industry is in the middle of one of its regular swings between centralisation and decentralisation, driven by increased automation of the network fabric itself, as well as new use cases such as IoT. With more and more processing and autonomy devolved to the edges, old assumptions about how to manage and operate a network have to change. It no longer makes sense to try to forward all the edge alerts to a central location for analysis, but central visibility on the health of the network is more important than ever.
New techniques in AI-enabled observability hold the promise of helping NOC teams deliver better experiences for users of their networks, without requiring excessive manual work or a heavy impact on Operations teams’ personal lives. Machine learning enables knowledge to be captured and made available in context when it is most useful, accelerating incident resolution and improving user experiences.
Malicious software is categorized into families based on static and dynamic characteristics, infection methods, and the nature of the threat. Visual exploration of malware instances and families in a low-dimensional space helps give a first overview of the dependencies and relationships among these instances, detecting their groups and isolating outliers. Furthermore, visual exploration of different sets of features is useful in assessing the quality of these sets as a valid abstract representation, which can later be used in classification and clustering algorithms to achieve high accuracy. We investigate one of the best dimensionality reduction techniques, known as t-SNE, to reduce the malware representation from a high-dimensional space consisting of thousands of features to a low-dimensional space. We experiment with different feature sets and depict malware clusters in 2-D.
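The t-SNE step described above can be sketched with scikit-learn. The feature matrix here is a synthetic stand-in for real malware features (three "families" with noisy binary indicators):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for a malware feature matrix: 3 "families", 50 samples each,
# 1000 binary features (e.g. API-call or n-gram indicators)
centers = rng.random((3, 1000)) < 0.1
X = np.vstack([
    np.logical_xor(c, rng.random((50, 1000)) < 0.02) for c in centers
]).astype(float)

# Reduce thousands of features to 2-D for visual exploration
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)   # (150, 2)
```

The 2-D embedding `emb` can then be scatter-plotted, with family labels as colours, to inspect clusters and outliers.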
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a... - Chris Hammerschmidt
Machine Learning for DFIR with Velociraptor: From Setting Expectations to a Case Study
By Christian Hammerschmidt, PhD - Head of Engineering/ML, APTA Technologies
Machine learning (ML) or artificial intelligence (AI) often comes with great promise and large marketing budgets for cybersecurity, especially in monitoring (such as EDR/XDR solutions). Post-breach, it often turns out that the actual performance falls short of its promises.
In this talk, we’ll briefly look at ML for DFIR: What tasks can ML solve, generally speaking? What requirements do we have for a useful ML system in cybersecurity/DFIR contexts, such as reliability, robustness to attackers, and explainability? What makes ML difficult to apply in cybersecurity, e.g. when thinking about false alerts or attackers attempting to circumvent automated systems?
After discussing the basics, we look at ML for Velociraptor:
How can we process forensic data collected with VQL using machine learning (with a typical Python/Jupyter/scikit-learn/PyTorch stack)?
And how can we build artifacts that run ML directly on each endpoint, avoiding central data collection?
The talk concludes with a case study, showing how we significantly reduced time to analyze EVTX files in incident response cases, saving thousands of USD in costs and reducing time to resolution.
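The VQL-to-Python workflow mentioned above might look like this in a Jupyter notebook. The field names and values below are hypothetical, not a fixed Velociraptor schema; in practice the DataFrame would come from exported VQL results, e.g. via `pd.read_json(..., lines=True)`:

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Illustrative stand-in for parsed VQL output: 100 events, one per row
df = pd.DataFrame({
    "EventID":    [4624] * 99 + [4672],
    "BytesSent":  [1_000 + 10 * i for i in range(99)] + [5_000_000],
    "DurationMs": [50] * 99 + [9_000],
})

# Unsupervised outlier scoring on the collected events
model = IsolationForest(contamination=0.01, random_state=0)
model.fit(df)
df["score"] = model.score_samples(df)   # lower = more anomalous
most_suspicious = df["score"].idxmin()
print(most_suspicious)   # 99 -- the one outlying event
```

Sorting by `score` gives an analyst a short, ranked triage list instead of the full event stream.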
Bio: Chris Hammerschmidt did his PhD research on machine learning methods for reverse engineering software systems. Now he heads APTA Technologies, a start-up building machine learning tools to understand software behavior.
Affiliation: APTA Technologies, https://apta.tech
How can we evaluate a portfolio of algorithms to extract meaningful interpretations about them? Suppose we have a set of algorithms. These can be classification, regression, clustering or any other type of algorithm. And suppose we have a set of problems that these algorithms can work on. We can evaluate these algorithms on the problems and get the results. From these results, can we explain the algorithms in a meaningful way? To find an answer to this question we turn to social sciences. Methodologies in social sciences focus on explanations as opposed to accurate predictions.
Item Response Theory (IRT) is a methodology in educational psychometrics that is used to design, analyse and score test questions and questionnaires. IRT can measure hidden qualities such as stress proneness, political inclinations, or verbal/mathematical ability. Participants take tests and IRT is used to determine the ability of participants and discrimination and difficulty of test questions. In this talk we use a novel mapping of the traditional IRT framework modified to the algorithm evaluation domain. Using this new mapping, we elicit a richer suite of characteristics including stability and anomalousness that describe important aspects of algorithm performance. We find the strengths and weaknesses of algorithms in the problem space. Using the algorithm strengths and weaknesses we construct a smaller portfolio of algorithms that gives good performance.
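The IRT quantities mentioned (difficulty, discrimination) come from the item characteristic curve. A minimal sketch of the standard 3-parameter logistic form (a generic textbook formula, not the talk's modified mapping):

```python
import numpy as np

def irt_prob(theta, a, b, c=0.0):
    """3-parameter logistic IRT item characteristic curve.

    P(correct | ability theta) for an item with discrimination a,
    difficulty b, and guessing parameter c.
    """
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# A discriminating item separates abilities sharply around its difficulty;
# a low-discrimination item barely does
sharp = irt_prob(np.array([-1.0, 1.0]), a=4.0, b=0.0)
noisy = irt_prob(np.array([-1.0, 1.0]), a=0.2, b=0.0)
print(sharp)   # approx [0.018, 0.982]
print(noisy)   # approx [0.450, 0.550]
```

In the algorithm-evaluation mapping, "items" become problems and "ability" becomes algorithm performance, so the same curve shapes describe which problems separate strong algorithms from weak ones.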
Explainable insights on algorithm performance - CSIRO
Machine Learning (ML) and Artificial Intelligence (AI) have made great strides in this decade. We have a plethora of ML algorithms that can be used to perform a given task, be it face recognition, image classification or natural language processing. However, explainability of ML/AI algorithms remains a big problem. Explainable AI (XAI) is a branch of ML devoted to unravelling the black-box nature of AI so that we understand the reasons behind its decisions/output. However, there are concerns that XAI sometimes produces “tools for computer scientists to explain things to other computer scientists”, which defeats its purpose. To this end, a growing number of researchers have called for integration with the social sciences to make truly explainable and trustworthy AI, because philosophy and the social sciences have debated the meaning and function of an explanation for millennia and have deeper insights [1]. In this talk, we present such an integration [2].
Our problem domain is algorithm evaluation, which considers a portfolio of algorithms and its performance on a set of problems. For example, it can be a portfolio of regression algorithms. The goal is to understand meaningful, explainable insights about the algorithms from the performance results. As the social science linkage, we use Item Response Theory (IRT), a methodology from educational psychometrics. IRT is traditionally used to evaluate the difficulty and discrimination of test questions and the ability of students and has causal interpretations. Using IRT we obtain explainable insights about algorithms relating to their stable/consistent nature, the difficulty level of problems they can handle and their behaviour. In addition, we visualise the problem spectrum and find regions on the spectrum where algorithms exhibit strengths. The causal interpretations of IRT transfer to the algorithm evaluation domain as we gain a deeper understanding of algorithms.
References
1. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267, 1–38 (2019).
2. Kandanaarachchi, S. & Smith-Miles, K. Comprehensive Algorithm Portfolio Evaluation using Item Response Theory. Journal of Machine Learning Research 24, 1–52 (2023).
Algorithm evaluation using item response theoryCSIRO
Item Response Theory (IRT) is a paradigm within the field of Educational Psychometrics, that is used to assess student ability and test question difficulty and discrimination power. IRT has recently been applied to evaluate
machine learning algorithm performance on a classification dataset. Here, we present a modified IRT-based framework for evaluating a portfolio of algorithms across a repository of datasets, while eliciting a suite of richer characteristics such as stability, effectiveness and anomalousness, that describe different aspects of algorithm performance.
Smartphones as ubiquitous devices for behavior analysis and better lifestyle ...University of Geneva
Final PhD Defence presented in March 2016 at the University of Padua, Italy. 3 years PhD under the supervision of Prof. Ombretta Gaggi. Work focused on how it is possible to use smartphone to understand and analyse user behaviour, and how it is possible to use this information to further promote better lifestyle to individuals.
Machine Learning Essentials Demystified part1 | Big Data DemystifiedOmid Vahdaty
Machine Learning Essentials Abstract:
Machine Learning (ML) is one of the hottest topics in the IT world today. But what is it really all about?
In this session we will talk about what ML actually is and in which cases it is useful.
We will talk about a few common algorithms for creating ML models and demonstrate their use with Python. We will also take a peek at Deep Learning (DL) and Artificial Neural Networks and explain how they work (without too much math) and demonstrate DL model with Python.
The target audience are developers, data engineers and DBAs that do not have prior experience with ML and want to know how it actually works.
From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid...Alex Pinto
This session will center on a market-centric and technological exploration of commercial and open-source threat intelligence feeds that are becoming common to be offered as a way to improve the defense capabilities of organizations.
While not all Threat Intelligence can be represented as "indicator feeds", this space has enough market attention that it deserves a proper scientific, evidence-based investigation so that practitioners and decision makers can maximize the results they are able to get for the data they have available.
The presentation will consist of a data-driven analysis of a cross-section of threat intelligence feeds (both open-source and commercial) to measure their statistical bias, overlap, and representability of the unknown population of breaches worldwide. All the statistical code written and research data used (from the publicly available feeds) will be made available in the spirit of reproducible research. The tool itself will be able to be used by attendees to perform the same type of tests on their own data (called tiq-test).
Some of the important questions and answers that emerge in this presentation include:
"Are Threat Intelligence Feeds a statistical good measure of the population of 'bad stuff' happening out there? Is there even such a thing?"
"How tuned to YOUR specific threat surface are those feeds?"
"Can we actually make good use of them even if the threats they describe have no overlap with the actual incidents you have been seeing in your environment? (hint: probably not)"
We will provide an open-source tool for attendees to extract, normalize and export data from threat intelligence feeds to use in their internal projects and systems. It will be pre-configured with current publicly available network feeds and easily extensible for private or commercial feeds (called combine).
Anomaly Detection and Automatic Labeling with Deep LearningAdam Gibson
Adam Gibson demonstrates how to use variational autoencoders to automatically label time series location data. You'll explore the challenge of imbalanced classes and anomaly detection, learn how to leverage deep learning for automatically labeling (and the pitfalls of this), and discover how you can deploy these techniques in your organization.
الموعد الإثنين 03 يناير 2022
143
مبادرة
#تواصل_تطوير
المحاضرة ال 143 من المبادرة
المهندس / محمد الرافعي طرباي
نقيب المبرمجين بالدقهلية
بعنوان
"IT INDUSTRY"
How To Getting Into IT With Zero Experience
وذلك يوم الإثنين 03 يناير2022
السابعة مساء توقيت القاهرة
الثامنة مساء توقيت مكة المكرمة
و الحضور من تطبيق زووم
https://us02web.zoom.us/meeting/register/tZUpf-GsrD4jH9N9AxO39J013c1D4bqJNTcu
علما ان هناك بث مباشر للمحاضرة على القنوات الخاصة بجمعية المهندسين المصريين
ونأمل أن نوفق في تقديم ما ينفع المهندس ومهمة الهندسة في عالمنا العربي
والله الموفق
للتواصل مع إدارة المبادرة عبر قناة التليجرام
https://t.me/EEAKSA
ومتابعة المبادرة والبث المباشر عبر نوافذنا المختلفة
رابط اللينكدان والمكتبة الالكترونية
https://www.linkedin.com/company/eeaksa-egyptian-engineers-association/
رابط قناة التويتر
https://twitter.com/eeaksa
رابط قناة الفيسبوك
https://www.facebook.com/EEAKSA
رابط قناة اليوتيوب
https://www.youtube.com/user/EEAchannal
رابط التسجيل العام للمحاضرات
https://forms.gle/vVmw7L187tiATRPw9
ملحوظة : توجد شهادات حضور مجانية لمن يسجل فى رابط التقيم اخر المحاضرة
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationImpetus Technologies
Detecting anomalous patterns in data can lead to significant actionable insights in a wide variety of application domains, such as fraud detection, network traffic management, predictive healthcare, energy monitoring and many more.
However, detecting anomalies accurately can be difficult. What qualifies as an anomaly is continuously changing and anomalous patterns are unexpected. An effective anomaly detection system needs to continuously self-learn without relying on pre-programmed thresholds.
Join our speakers Ravishankar Rao Vallabhajosyula, Senior Data Scientist, Impetus Technologies and Saurabh Dutta, Technical Product Manager - StreamAnalytix, in a discussion on:
Importance of anomaly detection in enterprise data, types of anomalies, and challenges
Prominent real-time application areas
Approaches, techniques and algorithms for anomaly detection
Sample use-case implementation on the StreamAnalytix platform
Seminar Presentation | Network Intrusion Detection using Supervised Machine L...Jowin John Chemban
By:
Jowin John Chemban (jowinchemban@gmail.com)
HGW16CS022 (2016-2020 Batch)
S7 B.Tech Computer Science Engineering
Holy Grace Academy of Engineering, Mala
Date : September 2019
The IT industry is in the middle of one of its regular swings between centralisation and decentralisation, driven by increased automation of the network fabric itself, as well as new use cases such as IoT. With more and more processing and autonomy devolved to the edges, old assumptions about how to manage and operate a network have to change. It no longer makes sense to try to forward all the edge alerts to a central location for analysis, but central visibility on the health of the network is more important than ever.
New techniques in AI-enabled observability hold the promise of helping NOC teams deliver better experiences for users of their networks, without requiring excessive manual work or heavy impact on Operations teams’ personal lives. Machine learning enables knowledge to be captured and made available in context when it is most useful, accelerating incident resolution and improving user experiences.
Malicious software is categorized into families based on its static and dynamic characteristics, infection methods, and nature of threat. Visual exploration of malware instances and families in a low-dimensional space helps give a first overview of dependencies and relationships among these instances, detecting their groups and isolating outliers. Furthermore, visual exploration of different sets of features is useful in assessing the quality of these sets to carry a valid abstract representation, which can later be used in classification and clustering algorithms to achieve high accuracy. We investigate one of the best dimensionality reduction techniques, known as t-SNE, to reduce the malware representation from a high-dimensional space consisting of thousands of features to a low-dimensional space. We experiment with different feature sets and depict malware clusters in 2-D.
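A minimal sketch of this reduction with scikit-learn's t-SNE, using a random stand-in matrix rather than the actual malware feature sets:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for a malware feature matrix: 300 samples x 1000 static/dynamic features.
X = rng.normal(size=(300, 1000))

# Reduce thousands of features to 2-D for visual exploration of clusters/outliers.
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)
print(emb.shape)  # (300, 2)
```

The 2-D embedding can then be scatter-plotted, coloured by malware family, to inspect groups and isolated outliers.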
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...Chris Hammerschmidt
Machine Learning for DFIR with Velociraptor: From Setting Expectations to a Case Study
By Christian Hammerschmidt, PhD - Head of Engineering/ML, APTA Technologies
Machine learning (ML) or artificial intelligence (AI) often comes with great promise and large marketing budgets for cybersecurity, especially in monitoring (such as EDR/XDR solutions). Post-breach, it often turns out that the actual performance falls short of its promises.
In this talk, we’ll briefly look at ML for DFIR: What tasks can ML solve, generally speaking? What requirements do we have for a useful ML system in cybersecurity/DFIR contexts, such as reliability, robustness to attackers, and explainability? What makes ML difficult to apply in cybersecurity, e.g. when thinking about false alerts or attackers attempting to circumvent automated systems?
After discussing the basics, we look at ML for velociraptor:
How can we process forensic data collected with VQL using machine learning (with a typical Python/Jupyter/scikit-learn/PyTorch stack)?
And how can we build artifacts that run ML directly on each endpoint, avoiding central data collection?
The talk concludes with a case study, showing how we significantly reduced time to analyze EVTX files in incident response cases, saving thousands of USD in costs and reducing time to resolution.
Bio: Chris Hammerschmidt did his PhD research on machine learning methods for reverse engineering software systems. Now, he’s heading APTA Technologies, a start-up building machine learning tools to understand software behavior.
Affiliation: APTA Technologies, https://apta.tech
How can we evaluate a portfolio of algorithms to extract meaningful interpretations about them? Suppose we have a set of algorithms. These can be classification, regression, clustering or any other type of algorithm. And suppose we have a set of problems that these algorithms can work on. We can evaluate these algorithms on the problems and get the results. From these results, can we explain the algorithms in a meaningful way? To find an answer to this question we turn to social sciences. Methodologies in social sciences focus on explanations as opposed to accurate predictions.
Item Response Theory (IRT) is a methodology in educational psychometrics that is used to design, analyse and score test questions and questionnaires. IRT can measure hidden qualities such as stress proneness, political inclinations, or verbal/mathematical ability. Participants take tests and IRT is used to determine the ability of participants and discrimination and difficulty of test questions. In this talk we use a novel mapping of the traditional IRT framework modified to the algorithm evaluation domain. Using this new mapping, we elicit a richer suite of characteristics including stability and anomalousness that describe important aspects of algorithm performance. We find the strengths and weaknesses of algorithms in the problem space. Using the algorithm strengths and weaknesses we construct a smaller portfolio of algorithms that gives good performance.
Explainable insights on algorithm performanceCSIRO
Machine Learning (ML) and Artificial Intelligence (AI) have made great strides in this decade. We have a plethora of ML algorithms that can be used to perform a given task, be it face recognition, image classification or natural language processing. However, explainability of ML/AI algorithms remains a big problem. Explainable AI (XAI) is a branch of ML that is devoted to unravelling the black-box nature of AI so that we understand the reasons behind the decisions/output. However, there are concerns that XAI sometimes produces “tools for computer scientists to explain things to other computer scientists”, which defeats its purpose. To this end, a growing number of researchers have called for integration with social sciences to make truly explainable and trustworthy AI, because philosophy and social sciences have debated the meaning and function of an explanation for millennia and have deeper insights [1]. In this talk, we present such an integration [2].
Our problem domain is algorithm evaluation, which considers a portfolio of algorithms and its performance on a set of problems. For example, it can be a portfolio of regression algorithms. The goal is to understand meaningful, explainable insights about the algorithms from the performance results. As the social science linkage, we use Item Response Theory (IRT), a methodology from educational psychometrics. IRT is traditionally used to evaluate the difficulty and discrimination of test questions and the ability of students and has causal interpretations. Using IRT we obtain explainable insights about algorithms relating to their stable/consistent nature, the difficulty level of problems they can handle and their behaviour. In addition, we visualise the problem spectrum and find regions on the spectrum where algorithms exhibit strengths. The causal interpretations of IRT transfer to the algorithm evaluation domain as we gain a deeper understanding of algorithms.
References
1. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif Intell 267, 1–38 (2019).
2. Kandanaarachchi, S. & Smith-Miles, K. Comprehensive Algorithm Portfolio Evaluation using Item Response Theory. Journal of Machine Learning Research 24, 1–52 (2023).
Sophisticated tools for spatio-temporal data explorationCSIRO
Abstract: Spatio-temporal data underpin many critical processes such as weather, crop production, wildfire spread and epidemiological and disease function. Models of these processes can reveal changing characteristics in both space and time and can help inform decision-makers. A recent example: during the pandemic years, spatio-temporal models were used to inform public policy. While there are many spatio-temporal modelling methods and packages, tools specifically designed for exploratory data analysis are somewhat lacking. Exploratory data analysis is a vital step in the end-to-end process of statistical and machine learning modelling. A lack of tools for exploratory spatio-temporal data analysis may lead to researchers starting the modelling process prematurely and making suboptimal modelling choices. We aim to fill this gap by contributing stxplore – an R package equipped with useful functionality designed for spatio-temporal data exploration.
Explainable algorithm evaluation from lessons in educationCSIRO
How can we evaluate a portfolio of algorithms to extract meaningful interpretations about them? Suppose we have a set of algorithms. These can be classification, regression, clustering or any other type of algorithm. And suppose we have a set of problems that these algorithms can work on. We can evaluate these algorithms on the problems and get the results. From these results, can we explain the algorithms in a meaningful way? The easy option is to find which algorithm performs best for each problem and find the algorithm that performs best on the greatest number of problems. But, there is a limitation with this approach. We are only looking at the overall best! Suppose a certain algorithm gives the best performance on hard problems, but not on easy problems. We would miss this algorithm by using the “overall best” approach. How do we obtain a salient set of algorithm features?
A time series of networks. Is everything OK? Are there anomalies?CSIRO
Consider how bills get voted in the Parliament/Congress. Members belonging to different parties may vote differently. As time passes the voting patterns can change. These bill voting patterns can be denoted as a network. At each time stamp, a different network emerges. The collection of networks indexed by time is a time series -- of networks. We study these network time series. What are the features of these networks? How do the features change over time? Are there anomalous networks? We investigate these questions using real world networks. We use graph theoretic features to transform the network to a feature space and model their evolution using time series methods. Then we find anomalous networks using time series residuals. Our results coincide with noteworthy, historical events.
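A rough sketch of this pipeline in Python with networkx, using an injected dense graph as the anomaly; a constant-mean baseline stands in for the proper time series models used in the talk:

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)

# A time series of networks: one random graph per time stamp,
# with an injected anomaly (a much denser graph) at t = 15.
graphs = [nx.gnp_random_graph(30, 0.1, seed=int(s)) for s in rng.integers(0, 10**6, 20)]
graphs[15] = nx.gnp_random_graph(30, 0.5, seed=7)

# Map each network into a feature space using graph-theoretic features.
feats = np.array([[nx.density(g), nx.average_clustering(g)] for g in graphs])

# Model the evolution (here crudely: a constant mean) and inspect residuals.
resid = feats - feats.mean(axis=0)
scores = np.abs(resid / feats.std(axis=0)).max(axis=1)
print(int(scores.argmax()))  # the anomalous time stamp
```

Replacing the constant mean with, say, an ARIMA fit per feature gives time series residuals in the spirit of the talk.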
Comparison of geostatistical methods for spatial dataCSIRO
With so many spatial/spatio-temporal modelling techniques to choose from, which model would you select to model the data you’re interested in? In this talk, we discuss a newcomer’s perspective of spatial modelling and compare four modelling techniques on Malaria prevalence in Kenya. The four models of interest are 1. Integrated Nested Laplace Approximations, commonly known as INLA, 2. Spatial Random Forests, an extension of random forests to the spatial domain, 3. GPBoost, a tree boosting technique with Gaussian Processes and 4. Fixed Rank Kriging. We will discuss the challenges associated with a comparison of such diverse models and share some results.
Algorithm evaluation using Item Response TheoryCSIRO
How do you evaluate a portfolio of algorithms? Suppose we have the results for a set of algorithms on a given set of problems. We can find which algorithm performs best for each problem and find the algorithm that performs best on the greatest number of problems. But, there is a limitation with this approach. We are only looking at the overall best! Suppose a certain algorithm gives the best performance on hard problems, but not on easy problems. We would miss this algorithm by using the “overall best” approach.
Item Response Theory (IRT) is used to design, analyse and score test questions/questionnaires that measure hidden qualities such as stress proneness, political inclinations, or verbal/mathematical ability. It is a methodology used in educational psychometrics. Participants take tests and IRT is used to determine the ability of participants and discrimination and difficulty of test questions. We use a novel mapping of the traditional IRT framework modified to the algorithm evaluation domain. Using this new mapping, we elicit a richer suite of characteristics including stability and anomalousness that describe important aspects of algorithm performance. We find the strengths and weaknesses of algorithms in the problem space. Using the algorithm strengths and weaknesses we construct a smaller portfolio of algorithms that gives good performance.
Evaluating algorithms using Item Response TheoryCSIRO
How do we evaluate a portfolio of algorithms? Suppose we have the results for a set of algorithms on a given set of problems. We can find which algorithm performs best for each problem and find the algorithm that performs best on the greatest number of problems. But, there is a limitation with this approach. We are only looking at the overall best! Suppose a certain algorithm gives the best performance on hard problems, but not on easy problems. We would miss this algorithm by using the “overall best” approach.
Item Response Theory (IRT) is used to design, analyse and score test questions and questionnaires that measure abilities and attitudes. It is a methodology used in psychometrics. Participants take tests and IRT is used to determine the ability of participants and discrimination and difficulty of test questions.
We use a novel mapping of the traditional IRT framework modified to the algorithm evaluation domain. Using this new mapping, we elicit a richer suite of characteristics including stability, anomalousness and effectiveness that describe important aspects of algorithm performance.
Why are anomalies important? Because they tell us a different story from the norm. An anomaly might signify a failing heart rate of a patient, a fraudulent credit card activity, or an early indication of a tsunami. As such, it is extremely important to detect anomalies.
There are many anomaly detection algorithms available. Most algorithms have parameters. Parameters are a tricky business because users need to set them. Sometimes it is not clear how to set these parameters. For example, there are anomaly detection algorithms that use kernel density estimates to detect anomalies. But they require the user to set the bandwidth. Setting the bandwidth for anomaly detection is different from setting the bandwidth for general kernel density estimation. Especially in high dimensions this is not an obvious task.
In this talk, we introduce lookout, a new approach that uses topological data analysis to select the bandwidth for anomaly detection. Using this bandwidth lookout uses leave-one-out kernel density estimates and extreme value theory to detect anomalies.
We also define the concept of anomaly persistence, which explores the birth and death of anomalies as the bandwidth changes. If a data point is identified as an anomaly for a large range of bandwidth values, then its significance as an anomaly increases.
The R package lookout implements this algorithm.
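A toy illustration of persistence, assuming a naive Gaussian leave-one-out KDE and an ad hoc bandwidth grid (lookout itself is an R package and chooses its bandwidth via topological data analysis instead):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X[0] = [6.0, 6.0]  # an injected anomaly far from the cloud

def loo_kde_scores(X, h):
    """Leave-one-out Gaussian KDE: low density => more anomalous."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * h * h))
    np.fill_diagonal(K, 0.0)            # leave self out
    return K.sum(1) / (len(X) - 1)

# Anomaly "persistence": how often a point is flagged across bandwidths.
bandwidths = np.linspace(0.2, 2.0, 10)
flagged = np.zeros(len(X))
for h in bandwidths:
    dens = loo_kde_scores(X, h)
    flagged += dens <= np.quantile(dens, 0.02)  # flag the lowest-density 2%
persistence = flagged / len(bandwidths)
print(persistence[0])  # the injected point is flagged at every bandwidth
```

A point flagged over a large range of bandwidths (persistence near 1) is a more significant anomaly than one flagged only at a single bandwidth.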
Why should we care about anomalies? They demand our attention because they are telling a different story from the norm. An anomaly might signify a failing heart rate of a patient, a fraudulent credit card activity, or an early indication of a tsunami. As such, it is extremely important to detect anomalies.
What are the challenges in anomaly detection? As with many machine/statistical learning tasks, high-dimensional data poses a problem. Another challenge is selecting appropriate parameters. Yet another challenge is high false positive rates.
In this talk we introduce two R packages – dobin and lookout – that address different challenges in anomaly detection. Dobin is a dimension reduction technique especially catered to anomaly detection. So, dobin is somewhat similar to PCA; but dobin puts anomalies in the forefront. We can use dobin as a pre-processing step and find anomalies using fewer dimensions.
On the other hand, lookout is an anomaly detection method that uses kernel density estimates and extreme value theory. But there is a difference. Generally, anomaly detection methods that use kernel density estimates require a user-defined bandwidth parameter. But does the user know how to specify this elusive bandwidth parameter? Lookout addresses this challenge by constructing an appropriate bandwidth for anomaly detection using topological data analysis, so the user doesn’t need to specify a bandwidth parameter. Furthermore, lookout has a low false positive rate because it uses extreme value theory.
We also introduce the concept of anomaly persistence, which explores the birth and death of anomalies as the bandwidth changes. If a data point is identified as an anomaly for a large range of bandwidth values, then its significance as an anomaly is high.
Anomalies are interesting because they tell a different story from the norm. Anomaly detection is used in many applications including detecting fraudulent credit card transactions and attacks in computer networks. But we do not want anomaly detection algorithms to be “alarm factories”, because if too many anomalies are detected on a regular basis, they tend to be ignored by the decision makers. Also, many anomaly detection methods have parameters that can only be set by experts, making them difficult to be used by lay people. Therefore, it is important to have “parameter-free” anomaly detection methods that minimize false positives.
In this talk, we introduce lookout, an anomaly detection method that uses extreme value theory and topological data analysis. Lookout is essentially parameter-free and has low false positive rates. We also delve into the world of computer networks and show how lookout can be used to detect suspicious nodes in computer network traffic.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
2. Overview
• Why is finding interesting patterns in data important?
• Methodology: Item Response Theory to construct an anomaly detection ensemble
• An application: computer networks
• Next challenges
4. Interesting patterns in data – Why?
• We live in a data-rich world
• Phones and personal smart devices
• Videos/CCTV
• Satellites roaming around the planet
• Social media and content generation
• Wearable technology (heart rate monitors)
5. What should we focus upon?
• Impossible to go through all the data in real time
• But we want to know when something “important” happens
• Important – context dependent
• A person who is monitored has had a fall (wearables)
• Deforestation (satellite data)
• A group harmful to society is gaining popularity (social media – national security)
• A bushfire starting off
6. Challenges
• Automated tools to extract these events of interest
• Early detection is super important
• High accuracy
• Low false positive rates
• Complex, noisy data
Goals
• To allocate resources effectively and efficiently
• Prevent disaster from happening
• Or minimize the loss
7. A critical piece – finding the interesting bit
• It can be called many names
• Events, anomalies, outliers, novelties, emerging threats
• Can’t always train a model to find the interesting bit
• Can’t lock in what is interesting
• Training a model on certain fraud/intrusions/cyber attacks is not optimal, because there are new types of fraud/attacks, always!
• Antivirus – known viruses only
• You want something more “intelligent” and accurate
• Alerts you when something weird happens with high accuracy
• Flexible (can evolve)
• A shift of focus over time
• Previously outliers were detected to be discarded – they make the model worse
• Now, we want to know about the anomalies – they are telling us something interesting
8. We looked at interesting patterns in data. Next, we look at some specific research.
9. An anomaly detection ensemble using Item Response Theory
Unsupervised Anomaly Detection Ensembles using Item Response Theory
Sevvandi Kandanaarachchi
Information Sciences (2022)
10. What are we trying to do?
Achieve higher accuracy:
• New methods with better accuracy
• Build an ensemble from existing methods
12. Specific challenges
• In regression we have (x, y) → (x, ŷ), so you can use e = y − ŷ in your ensemble
• The models can be weighted by their accuracy
But…
• Unsupervised anomaly detection does not have y
• We have x → each AD method gives y1, y2, y3, y4 → the ensemble gives y_ens
13. What is an anomaly detection ensemble?
The data x → unsupervised AD methods → anomaly scores y1, y2, y3, y4, y5, y6, y7 → AD ensemble → ensemble score
• The AD methods are heterogeneous methods
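The score matrix such an ensemble consumes can be sketched with a few heterogeneous detectors from scikit-learn; this is a hypothetical Python stand-in for the R methods in the talk, and a plain average replaces the IRT combination:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 3)), [[8.0, 8.0, 8.0]]])  # last row is anomalous

# Heterogeneous unsupervised AD methods -> one anomaly score per method.
# (score_samples is higher for normal points, so negate to get anomaly scores.)
scores = np.column_stack([
    -IsolationForest(random_state=0).fit(X).score_samples(X),
    -LocalOutlierFactor().fit(X).negative_outlier_factor_,
    -OneClassSVM(gamma="scale").fit(X).score_samples(X),
])

# Normalise each column to [0, 1]; this matrix Y (N x n) is what the
# IRT ensemble would take as input (here we just average as a placeholder).
Y = (scores - scores.min(0)) / (scores.max(0) - scores.min(0))
ens = Y.mean(axis=1)
print(int(ens.argmax()))  # index of the injected anomaly
```

The point of the IRT ensemble is precisely to replace this naive equal-weight average with discrimination-based weights.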
14. We use Item Response Theory to construct the ensemble
• Explain IRT
• How we use it to construct an AD ensemble
15. What is Item Response Theory (IRT)?
• A set of models used in educational psychometrics/social sciences
• Premise – an intrinsic “quality” that cannot be measured directly
• Racial prejudice or stress proneness
• Political inclinations
• Verbal or mathematical ability
• A test instrument
• A survey
• Exam
16. IRT
Survey responses / exam marks → IRT model → output:
• Discrimination of each test item
• Difficulty of each test item
• Participant ability (hidden quality)
17. IRT in education
• N students answer n questions
• Your input to the IRT model is a matrix of marks Y (N × n)
• Fit the IRT model
• You get as your output:
• Test item discrimination
• Test item difficulty
• Student ability (latent trait)
• Focus is on item discrimination and difficulty

        Q1    Q2    Q3    Q4
Stu 1   0.95  0.87  0.67  0.84
Stu 2   0.57  0.49  0.78  0.77
Stu n   0.75  0.86  0.57  0.45
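The roles of discrimination and difficulty can be illustrated with the standard two-parameter logistic (2PL) model; note the paper itself uses a continuous-response IRT variant, so this is only for intuition:

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response model: probability of a correct answer,
    given student ability theta, item discrimination a, item difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = 0.0  # an average-ability student
print(round(p_correct(theta, a=2.0, b=0.0), 3))  # 0.5 on an item of matched difficulty
print(round(p_correct(theta, a=2.0, b=1.0), 3))  # 0.119 on a harder item
```

Higher discrimination a makes the curve steeper around the difficulty b, so the item separates students near that ability level more sharply.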
18. IRT in psychometrics
• A survey
• Rosenberg's Self-Esteem Scale
• “I feel I am a person of worth” (Strongly Agree/Agree/Neutral/...)
• Use original responses (no marking as in education)
• Fit the IRT model
• Output:
• Participants’ self-esteem (hidden quality = latent trait)
• Question discrimination
• Question difficulty
• Focus is on the hidden ability
19. IRT in Data Science/Machine Learning
• Relatively new area of research
• From performance data find
• Ability of classifiers
• Discrimination/difficulty of datasets
• 2019 – “Item response theory in AI: Analysing machine learning classifiers at the instance level” – F. Martínez-Plumed et al.
20. IRT ensemble for anomaly detection
Matrix of anomaly scores Y (N × n) → IRT model
• Latent trait = the anomalousness of the observations = the ensemble score
• High values → high anomalousness, low values → low anomalousness
21. Example
Dataset → unsupervised AD methods → AD ensemble → ensemble score
• AD methods (R packages: DDoutlier, h2o, e1071)
• Nearest-neighbourhood and density/distance based methods: KNN_AGG, LOF, COF, INFLO, KDEOS, LDF, LDOF
• Autoencoders – deep learning
• OCSVM – one-class support vector machine
• Isolation Forest – tree-based method
25. Why does it work?
• Ensemble scores:
θ_i = Σ_j α_j² (β_j + γ_j z_ij) / Σ_j α_j²
where
• θ_i – ensemble score for the i-th observation
• α_j – discrimination of the j-th AD method
• β_j – difficulty
• γ_j – scaling parameter for the j-th AD method
• z_ij – anomaly score of the j-th AD method on the i-th observation
26. Why does it work?
θ_i = Σ_j α_j² (β_j + γ_j z_ij) / Σ_j α_j² = Σ_j (c_j + w_j z_ij)
• Ensemble scores are a weighted average of the original anomaly scores
• Each AD method has a weight; the weights w_j depend on the discrimination and scaling parameters of each anomaly detection method
• AD methods with higher discrimination get higher weights
• The ensemble accentuates better methods and downplays noisy methods
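The ensemble score above can be written directly in code; a small Python sketch with made-up parameter values (in practice α, β, γ come from fitting the IRT model to the score matrix):

```python
import numpy as np

def irt_ensemble_scores(Z, alpha, beta, gamma):
    """theta_i = sum_j alpha_j^2 (beta_j + gamma_j z_ij) / sum_j alpha_j^2:
    a weighted average of the anomaly scores z_ij, where methods with
    higher discrimination alpha_j get higher weight."""
    w = alpha ** 2 / np.sum(alpha ** 2)
    return (w * (beta + gamma * Z)).sum(axis=1)

# Toy example: 3 observations, 2 AD methods; method 0 is far more discriminative.
Z = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.3]])
alpha = np.array([2.0, 0.5])
beta = np.zeros(2)
gamma = np.ones(2)
theta = irt_ensemble_scores(Z, alpha, beta, gamma)
print(theta.argmax())  # observation 2, driven mostly by the sharp method 0
```

Here method 0 carries weight α₀²/(α₀²+α₁²) ≈ 0.94, so the ensemble follows the sharper method and downplays the noisier one, exactly as the slide describes.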
27. This work
• R package outlierensembles – on CRAN
• Extends the R package EstCRM for IRT
• Includes other anomaly detection ensembles as well
• More details in the paper: https://arxiv.org/abs/2106.06243
28. We looked at an AD ensemble. Next, we dive into an application.
29. An application in computer network security
Honeyboost: Boosting honeypot performance with data fusion and anomaly detection
Sevvandi Kandanaarachchi, Hideya Ochiai (UTokyo), Asha Rao (RMIT)
Expert Systems with Applications (2022)
30. LAN Security Monitoring Project
• Between 12 ASEAN and SAARC countries
• Boost cyber-resilience among partners
• Countries in low economic conditions
• Cost-effective methods
• Focus on Local Area Networks (LAN)
• Monitoring nodes: about 10 in Japan, 6 in Thailand, 4 in Vietnam, 4 in Indonesia, 3 in Malaysia, 2 in Myanmar, 2 in Cambodia, 2 in India, 2 in the Philippines, 1 in Laos
[Figure: Average Monthly Malware Encounter Rate, 2018 (Microsoft, Security Intelligence Report, 2019)]
31. Inside a Local Area Network (LAN)
• Devices communicating with each other – smartphones, printers, smart appliances, a data server, and the LAN-Security Monitoring Device (honeypot)
• Any suspicious behaviour?
• Detect malware in action
32. The Data
• Several protocol features
• Features derived by looking at packet headers
• Features specific to the protocol
• Each protocol has a different number of features

Protocol 1
Timestamp, From_Node, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11
1553585825, '172.16.1.107', 80, 2, 64, 0, 2, 0, 0, 0, 0, 1, 1
1553585890, '172.16.1.107', 80, 2, 64, 0, 2, 0, 0, 0, 0, 1, 1
1553660565, '172.16.1.107', 80, 1, 64, 0, 1, 0, 0, 0, 0, 1, 1
1553660570, '172.16.1.107', 80, 1, 64, 0, 1, 0, 0, 0, 0, 0, 0
1553667575, '172.16.1.107', 80, 3, 64, 0, 3, 0, 0, 0, 0, 2, 2
1553667580, '172.16.1.107', 80, 1, 64, 0, 1, 0, 0, 0, 0, 0, 0
1553751195, '172.16.1.208', 80, 1, 64, 0, 1, 0, 0, 0, 0, 0, 0

Protocol 2
Timestamp, From_Node, G1, G2, G3
1554351595, '172.16.1.86', 3702, 2, 652
1554351595, '172.16.1.86', 137, 2, 78
1554351595, '172.16.1.86', 1900, 4, 146
1554351595, '172.16.1.86', 7, 1, 28
33. Varying-dimensional time series
• Sort by time, then by node
• Different protocols have different features
• Finding anomalies from varying-dimensional time series
• 400 computers/nodes = 400 varying-dimensional time series
• Which ones are anomalous?
34. The methodology
Varying-dimensional time series for each node → multivariate time series → compute features → AD method (lookout)
• Using a window model
• We know the real anomalous nodes and the times (they access something they shouldn’t – the honeypot)
35. Varying-dimensional time series for each node → multivariate time series

Node A
Timestamp  Protocol  ARP count  ARP degree  TCP PC1  TCP PC2  UDP PC1  UDP PC2
30         ARP       10         12          0        0        0        0
55         TCP       0          0           -2.15    1.75     0        0
85         UDP       0          0           0        0        3.56     0.45
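The zero-filling that turns per-protocol records into one fixed-dimension multivariate series can be sketched with pandas, using the rows from the Node A table:

```python
import pandas as pd

# Per-protocol records with different feature sets (varying dimensions).
arp = pd.DataFrame({"Timestamp": [30], "ARP_count": [10], "ARP_degree": [12]})
tcp = pd.DataFrame({"Timestamp": [55], "TCP_PC1": [-2.15], "TCP_PC2": [1.75]})
udp = pd.DataFrame({"Timestamp": [85], "UDP_PC1": [3.56], "UDP_PC2": [0.45]})

# Concatenate and zero-fill the columns the other protocols lack,
# giving one fixed-dimension multivariate time series for the node.
ts = (pd.concat([arp, tcp, udp], ignore_index=True)
        .fillna(0.0)
        .sort_values("Timestamp"))
print(ts.shape)  # (3, 7)
```

Each row now lives in the same feature space regardless of which protocol generated it, so window-based features can be computed uniformly.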
37. Features
• The total length of line segments in ℝ⁶
• The maximum time difference
• Number of protocols used
• Number of TCP calls/UDP calls
• Total length of line segments in each protocol space
• Line of best fit in each protocol space
• Sum of squared errors for the line of best fit
(Figure: feature space projected onto TCP PC1 vs TCP PC2)
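A few of the listed features can be sketched directly from a window of observations; `window_features` below is a hypothetical helper, not the talk's implementation:

```python
import math

def window_features(points, timestamps, protocols):
    """Compute a handful of the window features listed above.

    points      - list of points in R^d, one per observation in the window
    timestamps  - corresponding observation times
    protocols   - protocol label per observation
    """
    # Total length of the line segments joining consecutive points
    seg_length = sum(
        math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)
    )
    # Maximum time difference between consecutive observations
    max_gap = max(
        (timestamps[i + 1] - timestamps[i] for i in range(len(timestamps) - 1)),
        default=0,
    )
    return {
        "segment_length": seg_length,
        "max_time_gap": max_gap,
        "n_protocols": len(set(protocols)),
        "n_tcp": protocols.count("TCP"),
        "n_udp": protocols.count("UDP"),
    }
```

Each window then collapses to a fixed-length feature vector, regardless of how many protocols the node used.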
38. AD method – lookout
• lookout – joint work with Rob Hyndman, published in JCGS (2021)
• Uses Extreme Value Theory (EVT) to find anomalies
• Applicability: computer network traffic has heavy tails, which EVT can handle
(Diagram: feature space → AD method (lookout))
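lookout itself is an R package; as a rough illustration of the EVT idea only (not the lookout algorithm), here is a peaks-over-threshold sketch that fits an exponential tail (a generalised Pareto with shape 0) to large anomaly scores and flags points whose tail probability is tiny:

```python
import math
import statistics

def pot_anomaly_pvalues(scores, tail_frac=0.1):
    """Peaks-over-threshold sketch: fit an exponential tail to the
    exceedances over a high threshold and return a tail probability
    for each exceeding point (smaller = more anomalous)."""
    s = sorted(scores)
    threshold = s[int(len(s) * (1 - tail_frac))]
    exceedances = [x - threshold for x in scores if x > threshold]
    scale = statistics.mean(exceedances)  # MLE for the exponential scale
    pvals = {}
    for i, x in enumerate(scores):
        if x > threshold:
            # P(exceedance > x - threshold) under the fitted tail
            pvals[i] = math.exp(-(x - threshold) / scale)
    return threshold, pvals
```

A heavy-tailed GPD fit (nonzero shape), as EVT proper would use, replaces the exponential here; the thresholding-and-tail-probability structure is the same.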
39. Results
• We identify real anomalies before they access the honeypot (which they shouldn’t do)
• The nodes behave anomalously before a “breach” is triggered
• We can predict a breach using this method
• Low false positives
• Visualise anomalies developing over time
• Discover patterns of suspicious behaviour
40. Thoughts ...
• This was a classic data science problem
• We were given the data and the problem context and asked to tackle it
• How do you formulate the problem?
• Many building blocks
• Identifying anomalies and visualising them aids decision making
• Bonus: opens up a research avenue
• The underlying research problem is general, not application-specific
41. Another way to think about this problem
• Model the network dynamics
• Find suspicious behaviour in a network
• Network dynamics not commonly used in cyber security
• Public datasets do not facilitate that
• Growth potential in this area
• Tom Bernardi’s MSc project
43. Next challenges for the field
• Networks
• Anomalies/events in networks (computer networks)
• Nuwan’s MSc Project on behavioural biometrics
• Visualization of networks at different granularities
• Dynamic networks – echo chambers – how they form
• Event detection in spatiotemporal data
• Applications in epidemiology
• Can you identify a hotspot before it happens?
• Ecology
• Algorithm bias
• Bias in data + bias in algorithm
44. Recap: ensembles to networks
• Broad applicability in detecting interesting patterns in data
• Applications in cyber security, wearable sensors, satellite data, social media
• The core research problem ties back to statistics/maths
• Need robust, highly accurate methodologies that can capture these patterns
• Exciting field. Thrilled to be part of it!
48. Continuous IRT model
• Samejima, 1969 – Continuous Response Model
• Wang and Zeng, 1998 – procedure to compute item parameters using expectation maximization for Samejima’s model
• Shojima, 2005 – non-iterative item parameter solution in each EM cycle
• Zopluoglu, 2015 – EstCRM R package implements Shojima’s 2005 model
• Update the log-likelihood to include negatively discriminating items
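For reference, one common parameterisation of Samejima’s Continuous Response Model (the form used in the EstCRM package; whether the talk uses exactly this form is an assumption) gives, for an observed score x on an item with maximum score k and respondent ability θ,

P(X ≥ x | θ) = Φ( a ( θ − b − (1/α) ln( x / (k − x) ) ) )

where a is the item discrimination, b the item difficulty, α a scaling parameter and Φ the standard normal CDF. Allowing a < 0 is what lets the ensemble downweight (or invert) noisy, non-discriminatory detection methods.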
50. Example with iterations
• Data in ℝ⁶ – first 2 dimensions shown, the others normally distributed
• Evaluation metric – area under the ROC curve (AUC)
(Figure: results at iteration 5 and iteration 10)
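The AUC used as the evaluation metric can be computed directly from anomaly scores and ground-truth labels via the rank (Mann-Whitney) formulation; a minimal sketch:

```python
def auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    anomaly (label 1) outscores a randomly chosen normal point (label 0),
    counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every anomaly is scored above every normal point; 0.5 is no better than random ranking.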
53. LAN Security Monitoring
• ‘LAN-Security Monitoring Device’ to capture suspicious/malicious activities that happen inside a LAN
• LAN: Local Area Network
• Honeypot – a trap for attackers
(Diagram: smartphones, printer, smart appliances, data server and the monitoring device on the LAN)
55. Findings
• Suspicious nodes that do not access the honeypot
(Figure: lookout applied to the feature space for all nodes; two flagged nodes do not access the honeypot)