In today's world, the majority of information is generated by self-sustaining systems such as bots, crawlers, servers, and other online services. This information flows along the axis of time and is generated by these actors under some complex logic: for example, a stream of buy/sell order requests from an order gateway in the financial world, a stream of web requests from a monitoring or crawling service, or a hacker's bot sitting on the internet and attacking various computers. We may not be able to know the motive or intention behind these data sources, but with unsupervised techniques we can try to infer patterns and correlate events based on their repeated occurrences over time. Associating a chain of events in time order helps in root-event analysis. In certain cases, time-ordered correlation and root-event identification are good enough to automatically identify the signatures of malicious actors and take corrective actions, such as stopping cyber attacks or malicious social campaigns.
Sessionisation is one such unsupervised technique: it tries to find the signal in a stream of timestamped events. In an ideal world this would reduce to finding the periods of a mixture of sinusoidal waves. In the real world it is a much more complex activity, as even systematic events generated by machines on the internet behave erratically. The notion of a signal's period changes accordingly: we can no longer describe it with a single number; it has to be treated as a random variable with an expected value and an associated variance. Hence we need to model "stochastic periods" and learn their probability distributions in an unsupervised manner.
The main focus of this talk is to showcase applied data science techniques for discovering stochastic periods. There are many ways to obtain periods from data, so the journey begins with a walkthrough of existing techniques such as the FFT (Fast Fourier Transform) and Gaussian Mixture Models. After highlighting the shortcomings of these techniques, we will succinctly explain one of the most general non-parametric Bayesian approaches to the problem. Without going too deep into the complex math, we will then return to applied data science and discuss a much simpler technique that can solve the same problem when certain assumptions are satisfied.
We will also demonstrate time-based patterns we discovered while working on a security analytics use case that relies on Sessionisation, based on an openly available malware attack dataset.
Key concepts explained in the talk: Sessionisation, Bayesian techniques of machine learning, Gaussian Mixture Models, kernel density estimation, FFT, stochastic periods, probabilistic modelling, Bayesian non-parametric methods
2. Thales Overview
From the Bottom of the Oceans… to the Depths of Space & Cyberspace
Key Digital Technologies
3. Thales: A Research and Development Powerhouse
6-time winner: 2012, 2013, 2015, 2016, 2017, 2018
Expertise in a uniquely broad range of technical domains, from science to systems, applied across businesses.
An extensive intellectual property portfolio of 20,500 patents.
Albert Fert: scientific director of the CNRS/Thales joint physics unit and winner of the 2007 Nobel Prize in Physics.
4. Agenda
• Motivation for studying events
• Concept & purpose of Sessionisation
• Traditional approaches
• Real world case studies
• Applied Data Science way of doing Sessionisation
5. Events
• Orders placed in a market
• Sequence of user tweets
• User’s clicks on a website
• Activity update by an IoT device
• Network events on a router
• Network alarms in a network
12. Sessions: Operations vs Data Science view
[Diagram: continuity in activity defines a session; the event stream splits into sessions wherever the gap between events ≫ the mean activity period]
13. Sessions: Operations vs Data Science view
A session is a chain of time-sequenced events: a time-based correlation.
[Diagram: same session split, with sessions separated wherever the gap ≫ the mean activity period]
20. Malicious Actors in the world of AI
• Orders placed in a market: Market manipulation
• Sequence of user tweets: Bot campaigns
• User’s clicks on a website: Fraudulent transactions
• Activity by an IoT device: Taking device control
• Network events on a router: Cyber attacks
22. Approaches for finding time based patterns
• Fourier transform
• Time period – stochastic periods
• GMM (Gaussian Mixture Models)
• Infinite GMM (Gaussian Mixture Models)
• Non-parametric Bayesian methods
• Applied data science techniques
[Chart: information vs. complexity of the approaches, with applied data science highlighted]
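Of the approaches above, the Fourier transform is the simplest to try first. A minimal sketch on synthetic data (the 30 s period, sampling rate, and noise level are made-up values, not from the talk's datasets):

```python
import numpy as np

# Recover the dominant period of a noisy periodic event-rate signal.
fs = 1.0                       # one sample per second
t = np.arange(0, 600, 1 / fs)  # 10 minutes of data
period = 30.0                  # true period in seconds
noise = 0.3 * np.random.default_rng(0).normal(size=t.size)
signal = np.sin(2 * np.pi * t / period) + noise

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
dominant = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
print(round(1 / dominant, 1))                  # → 30.0
```

This works well when the period is a fixed number; the rest of the talk is about what to do when it is not.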
30. Case studies via public datasets
• Sessionisation is an essential activity in detecting malicious bot activities like beaconing
• We will use the 6th dataset of the CTU-13 datasets for examples
• Provided by the Czech Technical University (CTU)
• Traces captured from a malware attack executed in the university network
• The 6th dataset simulates a bot named DonBot, which attacks SVC services on Windows
• Dataset: https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-47/bro/conn.log
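As a hypothetical sketch of getting from such a capture to time deltas: a Bro/Zeek conn.log can be read as tab-separated text. The rows and the column subset below are made up for illustration; the real file has many more fields.

```python
import io
import pandas as pd

# Made-up excerpt in the Bro/Zeek conn.log style (tab-separated,
# header lines prefixed with '#').
sample = """#fields\tts\tid.orig_h\tid.resp_h\tproto
1313370000.1\t147.32.84.165\t147.32.80.9\tudp
1313370005.2\t147.32.84.165\t147.32.80.9\tudp
1313370010.4\t147.32.84.165\t147.32.80.9\tudp
"""
df = pd.read_csv(io.StringIO(sample), sep="\t", comment="#",
                 names=["ts", "orig", "resp", "proto"])
deltas = df["ts"].diff().dropna()   # consecutive time deltas
print(deltas.tolist())
```

In practice one would group by source/destination pair before taking deltas.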
31. Case – 1: DonBot’s DNS queries to university DNS server
35. Stochastic periods: Introduction
• Analyze periodicity in time domain
• Compute consecutive time deltas
• Real world signals are noisy so time deltas will vary a lot
• If there is periodicity in the signal, time deltas will vary in a band
• The density plot of time deltas will show some high density regions
• We can learn a probability distribution for each high density region
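The steps above can be sketched as follows (the ~5 s period and noise level are illustrative, not from the CTU-13 capture):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Event timestamps with a noisy ~5 s period.
rng = np.random.default_rng(1)
timestamps = np.cumsum(rng.normal(loc=5.0, scale=0.4, size=500))

deltas = np.diff(timestamps)        # consecutive time deltas
kde = gaussian_kde(deltas)          # density estimate over the deltas
grid = np.linspace(deltas.min(), deltas.max(), 200)
mode = grid[np.argmax(kde(grid))]   # centre of the high-density region
print(round(mode, 1))               # close to the 5 s period
```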
58. Auto discovering multiple distributions
tation Maximization
w to estimate parameter ?✓
Expectation Maximization
If sources are known,easy:
How to estimate parameter ?✓GMM - Gaussian Mixture Models
59. Auto discovering multiple distributions
sticmodel of data Gaussian Mixture Model (GMM)
GMM - Gaussian Mixture Models
60. GMM – Gaussian Mixture Models
• Does soft clustering of data points instead of hard clustering
• In principle it is very similar to K-Means but works on probabilities
• K-Means: {P1 → C1, P2 → C2}; GMM: {P1 → [0.8, 0.1, 0.1], P2 → [0.05, 0.85, 0.1]}
• Problem with GMM & K-Means: we need to define “K”
• Techniques like the Elbow method, Silhouette, etc. are based on certain assumptions
• They cannot be applied in general for automated discovery of K
• Finding “K” automatically is a very hard problem to solve
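A minimal soft-clustering illustration with scikit-learn's GaussianMixture, on two synthetic 1-D clusters. Note that K = 2 is fixed by hand here, which is exactly the limitation the slide points out.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated synthetic clusters.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 200),
                       rng.normal(8, 1, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
probs = gmm.predict_proba([[4.0]])  # a point between the two clusters
print(probs.round(2))               # soft membership, not a hard label
```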
61. Bayesian way of building models

Posterior = (Likelihood × Prior) / Evidence:
P(θ|X) = P(X|θ) P(θ) / P(X)

P(θ) is conjugate to P(X|θ) when the posterior stays in the same family as the prior, with updated parameters: A(ν) → A(ν′)

For example:
P(θ) = 𝒩(θ|0, 1)   # standard normal prior
P(X|θ) = 𝒩(x|θ, 1)  # likelihood with 1 std. dev.
P(θ|X) ∝ e^(−½(x−θ)²) · e^(−½θ²)
P(θ|X) ∝ e^(−(θ − x/2)²)
P(θ|X) = 𝒩(θ | x/2, 1/2)
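The conjugate update above can be checked numerically: this sketch integrates the unnormalized posterior on a grid and recovers the mean x/2 and variance 1/2.

```python
import numpy as np

# Prior N(0, 1), likelihood N(x | theta, 1), one observation x = 3.
x = 3.0
theta = np.linspace(-10, 10, 100001)
d = theta[1] - theta[0]
unnorm = np.exp(-0.5 * (x - theta) ** 2) * np.exp(-0.5 * theta ** 2)
post = unnorm / (unnorm.sum() * d)           # normalized posterior density

mean = (theta * post).sum() * d
var = ((theta - mean) ** 2 * post).sum() * d
print(round(mean, 3), round(var, 3))         # → 1.5 0.5
```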
66. Sessionisation: Data Science at scale
• In a real world scenario, be it
• Web users over internet, Network hosts in an enterprise network, etc.
• One would need to apply Sessionisation on millions of entities
• So manual inspection based methods cannot be used
• We need a fully automated system to discover multiple “stochastic periods”
• We need to find the clusters automatically
72. Intuition behind Infinite GMM
Properties of the Dirichlet distribution
The Dirichlet distribution is conjugate to the Multinomial distribution.
If π = (π₁, π₂, …, π_k) ~ Dirichlet(α₁, α₂, …, α_k), the Dirichlet satisfies an expansion (or combination) rule:
(π₁θ, π₁(1−θ), π₂, …, π_k) ~ Dirichlet(α₁b, α₁(1−b), α₂, …, α_k)
where 0 < b < 1 and θ ~ Beta(α₁b, α₁(1−b)).
This allows us to increase the dimensionality of the Dirichlet.
73. Intuition behind Infinite GMM
Applying the expansion rule repeatedly, starting from symmetric parameters, leads to the Dirichlet process as a limit:
π⁽²⁾ = (π₁⁽²⁾, π₂⁽²⁾) ~ Dir(α/2, α/2) → Dir(α/4, α/4, α/4, α/4) → … → Dir(α/K, …, α/K), with K → ∞
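The expansion rule can be checked by simulation. This sketch uses made-up parameters α = (2, 3, 5) and b = 0.4 and compares the empirical means of the expanded vector with those of the target Dirichlet:

```python
import numpy as np

# Monte-Carlo check of the Dirichlet expansion rule.
rng = np.random.default_rng(0)
alpha, b, n = np.array([2.0, 3.0, 5.0]), 0.4, 200_000

pi = rng.dirichlet(alpha, size=n)
theta = rng.beta(alpha[0] * b, alpha[0] * (1 - b), size=n)
expanded = np.column_stack([pi[:, 0] * theta,        # pi_1 * theta
                            pi[:, 0] * (1 - theta),  # pi_1 * (1 - theta)
                            pi[:, 1], pi[:, 2]])

# Target: Dirichlet(a1*b, a1*(1-b), a2, a3); its mean is target/sum.
target = np.array([alpha[0] * b, alpha[0] * (1 - b), alpha[1], alpha[2]])
print(expanded.mean(axis=0).round(2))  # ≈ target / target.sum()
```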
74. Dirichlet process
Constructive metaphors used for non-parametric Bayesian models:
• Chinese restaurant process: the nth customer sits at table k with probability proportional to m_k (the number of customers already seated there), and opens a new table with probability proportional to α.
• Indian buffet process: the nth customer helps himself to each dish k with probability m_k / n (the fraction of previous customers who chose dish k), then tries Poisson(α/n) new dishes.
[Figures: IBP dish assignments and the resulting binary matrix; the Chinese restaurant process in action]
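A minimal Chinese restaurant process sketch (α = 1.0 is an arbitrary choice for illustration):

```python
import numpy as np

def crp(n_customers, alpha, rng):
    """Simulate table occupancies under a Chinese restaurant process."""
    tables = []  # occupancy count per table
    for _ in range(n_customers):
        # Existing tables weighted by occupancy; a new table weighted by alpha.
        weights = np.array(tables + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(tables):
            tables.append(1)   # open a new table
        else:
            tables[k] += 1
    return tables

rng = np.random.default_rng(0)
tables = crp(100, 1.0, rng)
print(tables)  # a random partition of 100 customers
```

The number of tables grows without a fixed K, which is the intuition behind the Infinite GMM.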
77. Probabilistic modeling
• Probabilistic models capture the uncertainty in real-world data better
• But they are very computationally intensive
• The sampling process takes time to stabilize before it generates meaningful results
• They certainly cannot work on large datasets
82. Obtaining stochastic periods recursively
• Get dense regions list: find dmin to cluster a region
• Recursively split regions: if a region is heavy tailed or multi-modal
• Stochastic periods: get probability distributions from the dense regions

Split criteria:
• Kurtosis of the normal distribution = 3; heavy tailed if excess kurtosis > 6
• Bimodality = (γ² + 1) / κ, where γ is skewness and κ is kurtosis
• Bimodality of the uniform distribution is 5/9; treat a region as not unimodal if bimodality > 0.8
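The split test above can be written as a small helper. The thresholds (excess kurtosis > 6, bimodality coefficient > 0.8) come from the slide; the test data are made up:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def needs_split(x):
    """True if a region looks heavy tailed or not unimodal."""
    g = skew(x)
    k = kurtosis(x, fisher=False)    # plain kurtosis (normal = 3)
    heavy_tailed = (k - 3) > 6       # excess kurtosis threshold
    bimodality = (g ** 2 + 1) / k    # = 5/9 for a uniform distribution
    return bool(heavy_tailed or bimodality > 0.8)

rng = np.random.default_rng(0)
unimodal = rng.normal(0, 1, 5000)
bimodal = np.concatenate([rng.normal(-5, 1, 2500), rng.normal(5, 1, 2500)])
print(needs_split(unimodal), needs_split(bimodal))  # → False True
```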
83. dmin via distance matrix
This topic was covered generically in detail during ODSC 2018 – “Topological space clustering”
84–85. dmin via distance matrix (continued)
If local dense regions exist alongside sparsity, we can obtain hierarchical clusters at each mode.
87. Method proposed: finding the optimal clustering epsilon
• The problem comes down to finding the most optimal curve for the Gaussian kernel
• One of the ways to solve it algorithmically:
Grid Search (band_width, grid_size) → rFFT → Silverman Transform → I-rFFT → Score (logLoss, stdDev) → Minima (band_width, grid_size)
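The rFFT-based pipeline above is specific to the talk. As a simplified stand-in, this sketch grid-searches the Gaussian-kernel bandwidth around Silverman's rule of thumb, scoring by held-out log-likelihood instead of the Silverman transform; data and grid are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# Synthetic time deltas: a tight ~5 s mode plus a wider ~60 s mode.
rng = np.random.default_rng(0)
deltas = np.concatenate([rng.normal(5, 0.3, 400), rng.normal(60, 2.0, 100)])
x = deltas.reshape(-1, 1)

# Silverman's rule-of-thumb bandwidth as the centre of the search grid.
silverman = 1.06 * deltas.std() * deltas.size ** (-1 / 5)
grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": silverman * np.logspace(-1, 1, 21)},
                    cv=5).fit(x)
print(round(grid.best_params_["bandwidth"], 2))
```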
88. Genuine systematic DNS queries to DNS server
[Figures: time delta analysis; time delta density plot]
92. Finite GMM: Bayesian setting
Algorithm: Collapsed Gibbs sampler for a finite Gaussian mixture model
Choose an initial z
For T iterations do                                # Gibbs sampling iterations
  For i = 1 to N do
    Remove xi’s statistics from component zi       # old cluster assignment for xi
    For k = 1 to K do                              # every possible component
      Calculate P(zi = k | z−i, α)
      Calculate p(xi | Xk,−i, β)
      Calculate P(zi = k | z−i, X, α, β) ∝ P(zi = k | z−i, α) · p(xi | Xk,−i, β)
    End for
    Sample knew from P(zi | z−i, X, α, β) after normalizing
    Add xi’s statistics to component zi = knew     # new assignment for xi
  End for
End for
Evaluation metric for Gibbs: ∏ₖ₌₁..K p(Xk | β) · p(z | α)
93. Infinite GMM: Bayesian setting
Choose an initial z
For T iterations do                                # Gibbs sampling iterations
  For i = 1 to N do
    Remove xi’s statistics from component zi       # old cluster assignment for xi
    For k = 1 to K do                              # every existing component
      Calculate P(zi = k | z−i, α)
      Calculate p(xi | Xk,−i, β)
      Calculate P(zi = k | z−i, X, α, β) ∝ P(zi = k | z−i, α) · p(xi | Xk,−i, β)
    End for
    Calculate P(zi = k* | z−i, α)                  # consider a new component
    Calculate p(xi | β)
    Calculate P(zi = k* | z−i, X, α, β) ∝ P(zi = k* | z−i, α) · p(xi | β)
    Sample knew from P(zi | z−i, X, α, β) after normalizing
    If any component is empty, remove it and decrease K
    Add xi’s statistics to component zi = knew     # new assignment for xi
  End for
End for
80,000 employees in 68 countries, a global company
Heavy investments in innovation every year to develop state-of-the-art technologies: €1bn invested in self-funded R&D