Sessionisation via Stochastic periods for root event identification
Kuldeep Jiwani
ODSC India 2019
Thales Overview
From the Bottom of the Oceans… to the Depths of Space & Cyberspace
Key Digital Technologies
Thales: A Research and Development Powerhouse
6 times winner: 2012, 2013, 2015, 2016, 2017, 2018
Expertise in a uniquely broad range of technical domains, from science to systems, applied across businesses.
An extensive intellectual property portfolio of 20,500 patents.
Albert Fert: Scientific director of the CNRS/Thales joint physics unit and winner of the 2007 Nobel prize in physics.
Agenda
• Motivation for studying events
• Concept & purpose of Sessionisation
• Traditional approaches
• Real world case studies
• Applied Data Science way of doing Sessionisation
Events
• Orders placed in a market
• Sequence of user tweets
• User’s clicks on a website
• Activity update by an IoT device
• Network events on a router
• Network alarms in a network
Entities: Information flow
Events stream
Time
Time sequenced events
Time
Entity - 1
Entity - 2
Entity - 3
Entity - 4
Sessions: Operations vs Data Science view
Continuity in
activity
Chain of time
sequenced events
Mean activity period
Gap >> Mean activity period
Sessions Sessions Sessions
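The operations view above (a session ends where the gap is much larger than the mean activity period) can be sketched as follows. This is a minimal illustration, not the method used in the talk: the function name `split_sessions` and the multiplier `gap_factor` are hypothetical, and the slide only states "Gap >> Mean activity period" without giving a threshold.

```python
import numpy as np

def split_sessions(timestamps, mean_period, gap_factor=5.0):
    """Split sorted event timestamps into sessions wherever the gap between
    consecutive events exceeds gap_factor * mean_period (illustrative threshold)."""
    ts = np.sort(np.asarray(timestamps, dtype=float))
    if ts.size == 0:
        return []
    gaps = np.diff(ts)
    # Index of the first event of each new session (the event right after a large gap).
    breaks = np.where(gaps > gap_factor * mean_period)[0] + 1
    return [s.tolist() for s in np.split(ts, breaks)]

events = [0.0, 1.0, 2.1, 3.0, 60.0, 61.0, 62.2, 200.0, 201.1]
sessions = split_sessions(events, mean_period=1.0)
print(len(sessions), [len(s) for s in sessions])
```

A fixed multiplier is the simplest possible choice; the rest of the talk is about replacing such hand-set thresholds with periods learned from the data.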
Time based correlation
Root event identification from session chain
Stochastic periods
Malicious Actors in the world of AI
• Orders placed in a market: Market manipulation
• Sequence of user tweets : Bot campaigns
• User’s clicks on a website: Fraudulent transactions
• Activity by an IoT device: Taking device control
• Network events on a router: Cyber attacks
Events stream as impulse train
(Figure: the event stream drawn as an impulse train, with value 1 at event times and 0 elsewhere, against time.)
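One way to turn a list of event timestamps into such an impulse train is sketched below. The helper name `impulse_train` and the `resolution` parameter (bin width) are assumptions made for illustration; the talk does not say how its signal was binned.

```python
import numpy as np

def impulse_train(timestamps, resolution=1.0):
    """Return a 0/1 array with a 1 in every time bin that contains an event.
    `resolution` is the bin width, in the same unit as the timestamps."""
    ts = np.asarray(timestamps, dtype=float)
    bins = np.floor((ts - ts.min()) / resolution).astype(int)
    signal = np.zeros(bins.max() + 1)
    signal[bins] = 1.0
    return signal

time_signal = impulse_train([10.0, 12.0, 14.0, 16.0], resolution=1.0)
print(time_signal)
```

The resulting array can be passed to an FFT routine as a regularly sampled signal.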
Approaches for finding time based patterns
• Fourier transform
• Time period – Stochastic periods
• GMM (Gaussian Mixture Models)
• Infinite GMM (Gaussian Mixture Models)
• Non-parametric Bayesian methods
• Applied data science techniques
(Figure: Information vs Complexity axes, with a label for applied data science.)
Fourier transform intuition
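Fourier series: Quick intro
$P(t) = \frac{1}{2} a_0 + a_1 \cos \omega t + a_2 \cos 2\omega t + \dots + b_1 \sin \omega t + b_2 \sin 2\omega t + \dots$
$P(t) = \frac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos n\omega t + \sum_{n=1}^{\infty} b_n \sin n\omega t$
$f(t) \rightarrow P(t)$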
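Fourier series: Quick intro
RMSE (Root Mean Square Error): $E^2 = \int_0^T \left(f(t) - P(t)\right)^2 dt$
Minimize RMSE loss (derivative = 0): $\frac{\partial E^2}{\partial a_n} = 0$, $\frac{\partial E^2}{\partial b_n} = 0$
$a_n = \frac{2}{T} \int_0^T f(t) \cos n\omega t \, dt$, $b_n = \frac{2}{T} \int_0^T f(t) \sin n\omega t \, dt$
$a_0 = \frac{2}{T} \int_0^T f(t) \, dt$, $b_0 = 0$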
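Fourier transform: Quick intro
Euler’s formula: $e^{j\omega t} = \cos \omega t + j \sin \omega t$
Fourier Series: $P(t) = \sum_{n=-\infty}^{\infty} c_n e^{jn\omega t}$, $c_n = \frac{1}{T} \int_0^T f(t) e^{-jn\omega t} \, dt$
Fourier Transform (CTFT): $F(j\omega) = \int_{-\infty}^{\infty} f(t) e^{-j\omega t} \, dt$
Fourier Transform (DTFT): $X(j\omega) = \sum_{n=-\infty}^{\infty} x(n) e^{-jn\omega}$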
Fourier Transform (DTFT): Impulse train
(Figure panels: "FT (Real): Magnitude" and "FT (Imaginary): Phase shift".)
Plotting Fourier Transform in Python
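import numpy as np
import matplotlib.pyplot as plt

# time_signal: 1-D array holding the impulse train (1 at event times, 0 elsewhere)
N = time_signal.shape[0]
signal_fft = np.fft.fft(time_signal)
frequency_bins = np.fft.fftfreq(N)
half = N // 2  # integer index: positive frequencies only, skipping the 0th bin
fig, ax = plt.subplots(1, 2, figsize=(28, 7))
ax[0].plot(frequency_bins[1:half], np.abs(signal_fft.real[1:half]), 'g')
ax[1].plot(frequency_bins[1:half], signal_fft.imag[1:half], 'c')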
Case studies via public datasets
• Sessionisation is an essential activity in detecting malicious bot activities like Beaconing
• We will use the 6th dataset of the CTU-13 datasets for examples
• Provided by Czech Technical University (CTU)
• Traces captured from a malware attack executed in the university network
• The 6th dataset simulates a bot named DonBot, which attacks SVC services on Windows
• Dataset: https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-47/bro/conn.log
Case – 1: DonBot’s DNS queries to university DNS server
DonBot: DNS traces
(Figure panels: Bot’s DNS event stream (zoomed); Fourier Transform.)
Stochastic periods: Introduction
• Analyze periodicity in the time domain
• Compute consecutive time deltas
• Real world signals are noisy, so time deltas will vary a lot
• If there is periodicity in the signal, time deltas will vary within a band
• The density plot of time deltas will show some high density regions
• We can learn a probability distribution for each high density region
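The first steps in the list above (consecutive time deltas, then a density plot of those deltas) can be sketched as below. The event stream is synthetic, generated only for illustration (one event roughly every 5 s with small jitter), and a plain histogram stands in for the density plot.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic events: roughly one every 5 s with small jitter (illustrative data).
timestamps = np.cumsum(5.0 + rng.normal(0.0, 0.1, size=500))

deltas = np.diff(timestamps)                      # consecutive time deltas
density, edges = np.histogram(deltas, bins=50, density=True)
peak = np.argmax(density)                         # densest histogram bin
print("densest band:", edges[peak], edges[peak + 1])
```

For a periodic source the deltas pile up in a narrow band around the period, which is what makes the band learnable as a probability distribution.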
DonBot: DNS traces
MeanPeriod = μ
Periodicity = μ / σ
(Figure panels: Bot’s DNS event stream, split into Session-1, Session-2 and Session-3; time delta analysis; time delta density plot.)
PDF: Stochastic period
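The two quantities defined on these slides, MeanPeriod = μ and Periodicity = μ / σ, can be computed for one band of time deltas as in this sketch. The function name `stochastic_period` and the sample values are made up for illustration.

```python
import numpy as np

def stochastic_period(deltas):
    """Return (mean period, periodicity) = (mu, mu / sigma) for a band of time deltas."""
    deltas = np.asarray(deltas, dtype=float)
    mu = deltas.mean()
    sigma = deltas.std()
    periodicity = mu / sigma if sigma > 0 else float("inf")
    return mu, periodicity

mu, periodicity = stochastic_period([0.81, 0.80, 0.82, 0.81, 0.80, 0.82])
print(mu, periodicity)
```

A high periodicity means the deltas are tightly concentrated around the mean period; a value near 1 means the spread is as large as the period itself.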
Case – 2: DonBot’s data transfer via backdoor port (5678) with a malicious IP (91.212.135.158)
DonBot: Data transfer via backdoor
(Figure panels: DonBot’s backdoor traffic; zoomed in FT without 0th frequency; time delta analysis; time delta density plot.)
Stochastic Period - 1
Stochastic Period - 2
Case – 3: Genuine systematic DNS queries to university’s DNS server
Genuine systematic DNS queries to DNS server
(Figure panels: normal DNS queries; FT without 0th frequency; time delta analysis; time delta density plot with regions A, B, C and D.)
Time delta analysis: [0.8, 1.6, 2.4, 3.2]
FT: Only able to highlight the higher time periods
Case – 4: Just another interesting DNS pattern
Normal DNS queries
(Figure panels: FT without 0th frequency; time delta analysis; time delta density plot.)
Auto discovering multiple distributions
• GMM - Gaussian Mixture Models: a probabilistic model of data
• Expectation Maximization: how to estimate the parameters? If the sources are known, it is easy
GMM – Gaussian Mixture Models
• Does soft clustering of data points instead of hard clustering
• In principle it is very similar to K-Means but works on probabilities
• K-Means: {P1 → C1, P2 → C2}, GMM: {P1 → [0.8, 0.1, 0.1], P2 → [0.05, 0.85, 0.1]}
• Problem with GMM & K-Means: We need to define “K”
• Techniques like the Elbow method, Silhouette, etc. are based on certain assumptions
• They cannot be applied in general for automated discovery of K
• Finding “K” automatically is a very hard problem to solve
(Figure labels: clusters C1, C2, C3 in each of two panels.)
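The difference between hard and soft clustering described above can be seen with scikit-learn's `KMeans` and `GaussianMixture`. This is a small sketch on synthetic one-dimensional data; note that K (2 here) still has to be supplied by hand, which is exactly the limitation the slide points out.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic groups of points (illustrative data).
X = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(5.0, 0.5, 200)]).reshape(-1, 1)

hard = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
soft = GaussianMixture(n_components=2, random_state=0).fit(X).predict_proba(X)

print(hard[:3])   # K-Means: one cluster label per point
print(soft[:3])   # GMM: one probability per component, each row sums to 1
```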
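Bayesian way of building models
$P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)}$
Posterior = Likelihood × Prior / Evidence
$P(\theta)$ is conjugate to $P(X \mid \theta)$: the prior belongs to a family $A(\nu)$ and the posterior stays in the same family, $A(\nu')$
For example:
$P(\theta) = \mathcal{N}(\theta \mid 0, 1)$  # Standard normal
$P(X \mid \theta) = \mathcal{N}(x \mid \theta, 1)$  # with 1 std. dev
$P(\theta \mid X) \propto e^{-\frac{1}{2}(x-\theta)^2} \, e^{-\frac{1}{2}\theta^2}$
$P(\theta \mid X) \propto e^{-(\theta - \frac{x}{2})^2}$
$P(\theta \mid X) = \mathcal{N}(\theta \mid \frac{x}{2}, \frac{1}{2})$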
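The conjugate example above can be checked numerically: multiply the likelihood by the prior on a grid of θ values, normalise, and compare the resulting mean and variance with x/2 and 1/2. The observation value `x = 1.4` is arbitrary and chosen only for illustration.

```python
import numpy as np

x = 1.4                                   # a single observation (illustrative)
theta = np.linspace(-10.0, 10.0, 200001)
dtheta = theta[1] - theta[0]

# Unnormalised posterior: likelihood N(x | theta, 1) times prior N(theta | 0, 1).
unnorm = np.exp(-0.5 * (x - theta) ** 2) * np.exp(-0.5 * theta ** 2)
post = unnorm / (unnorm.sum() * dtheta)   # normalise on the grid

mean = (theta * post).sum() * dtheta
var = ((theta - mean) ** 2 * post).sum() * dtheta
print(mean, var)                          # compare with x / 2 and 1 / 2
```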
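Constructing GMM
Gaussian Mixture Model (GMM)
$P(x \mid \theta) = \pi_1 \mathcal{N}(x \mid \mu_1, \Sigma_1) + \pi_2 \mathcal{N}(x \mid \mu_2, \Sigma_2) + \pi_3 \mathcal{N}(x \mid \mu_3, \Sigma_3)$
$\theta = \{\pi_1, \pi_2, \pi_3, \mu_1, \mu_2, \mu_3, \Sigma_1, \Sigma_2, \Sigma_3\}$ (parametric setting, K = 3)
Graphical model: t → x
$p(t = c \mid \theta) = \pi_c$, $p(x \mid t = c, \theta) = \mathcal{N}(x \mid \mu_c, \Sigma_c)$
t is a latent variable: [1, 2, 3], $\pi_1 + \pi_2 + \pi_3 = 1$
Likelihood: $p(x \mid \theta) = \sum_{c=1}^{K} p(x \mid t = c, \theta)\, p(t = c \mid \theta)$
Training GMM: $\max_{\theta} \prod_{i=1}^{N} p(x_i \mid \theta)$
Subject to: $\sum_k \pi_k = 1$; $\pi_k \ge 0$; $\Sigma_k \succ 0$
EM algorithm:
• E-step: Compute q(t), the distribution over t
• M-step: Update the Gaussian parameters to fit the points assigned to them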
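The E-step and M-step can be written out in a few lines for the one-dimensional case. This is a bare-bones sketch with an assumed quantile-based initialisation and a fixed number of iterations; the function name `em_gmm_1d` is hypothetical, and production code (for example scikit-learn's `GaussianMixture`) adds convergence checks and numerical safeguards.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=200):
    """Fit a 1-D Gaussian mixture with plain EM; returns (weights, means, stds)."""
    x = np.asarray(x, dtype=float)
    # Simple initialisation (an illustrative choice): spread the means over quantiles.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: q(t = c) for every point, i.e. the posterior responsibilities.
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        q = pi * dens
        q /= q.sum(axis=1, keepdims=True)
        # M-step: refit each Gaussian to the points softly assigned to it.
        nk = q.sum(axis=0)
        pi = nk / x.size
        mu = (q * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((q * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(1.0, 0.1, 300), rng.normal(4.0, 0.2, 700)])
weights, means, stds = em_gmm_1d(data, k=2)
print(weights, means, stds)
```

The weights stay on the simplex by construction, because each M-step sets them to the average responsibility per component.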
Sessionisation problem statement
Sessionisation: Data Science at scale
• In a real world scenario, be it
• Web users over internet, Network hosts in an enterprise network, etc.
• One would need to apply Sessionisation on millions of entities
• So manual inspection based methods cannot be used
• We need a fully automated system to discover multiple “Stochastic Periods”
• We need to find the clusters automatically
Infinite GMM (Gaussian Mixture Models)
Based on Bayesian non-parametric approaches
Probabilistic programming
Logistic regression: $p(y_i = 1 \mid \beta) = \mathrm{sigmoid}(\beta_1 x_1 + \beta_2 x_2 + \beta_0)$, $\beta \sim \mathcal{N}(\mu, \sigma)$
Sampling:
• MCMC
• Gibbs
Posterior distribution of β: $p(\exp(\beta_i) \mid X_{train}, y_{train})$
Credible interval [a, b]: $p(a \le \exp(\beta_i) \le b) = 0.95$
Dirichlet distribution
$\mathrm{Dir}(\theta \mid \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$, with $\sum_k \theta_k = 1$ and $\theta_k \ge 0$
Beta function: $B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}$
Gamma function: $\Gamma(n) = (n - 1)!$ for positive integer n
K=3 Simplex: vertices (1, 0, 0), (0, 1, 0), (0, 0, 1); example point (0.3, 0.2, 0.5)
Effects of varying α
(Figure panels: density of the Dirichlet distribution for α = (10, 10, 10) and for α = (0.1, 0.1, 0.1), with the point (1/3, 1/3, 1/3) marked.)
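The effect of α can be reproduced by sampling with NumPy's `Generator.dirichlet`. The sketch below measures, for each of the two α values shown on the slide, how often a sample lands close to a vertex of the simplex; the 0.9 cut-off for "close to a vertex" is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
dense = rng.dirichlet([10, 10, 10], size=5000)       # each row lies on the K=3 simplex
sparse = rng.dirichlet([0.1, 0.1, 0.1], size=5000)

# Share of samples whose largest coordinate exceeds 0.9, i.e. close to a vertex.
corner_dense = (dense.max(axis=1) > 0.9).mean()
corner_sparse = (sparse.max(axis=1) > 0.9).mean()
print(dense.mean(axis=0), corner_dense)
print(sparse.mean(axis=0), corner_sparse)
```

Both settings have the same mean, (1/3, 1/3, 1/3), but large α concentrates samples around it while small α pushes them towards the vertices, so that most of the mass goes to a single component.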
Intuition behind Infinite GMM
Properties of Dirichlet distribution
Dirichlet distribution is conjugate to Multinomial distribution
If $\pi = (\pi_1, \pi_2, \dots, \pi_k) \sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \dots, \alpha_k)$
Dirichlet satisfies the expansion or combination rule:
$(\pi_1 \theta, \pi_1 (1 - \theta), \pi_2, \dots, \pi_k) \sim \mathrm{Dirichlet}(\alpha_1 b, \alpha_1 (1 - b), \alpha_2, \dots, \alpha_k)$
where $0 < b < 1$ and $\theta \sim \mathrm{Beta}(\alpha_1 b, \alpha_1 (1 - b))$
This allows increasing the dimensionality of the Dirichlet
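Dirichlet Process
$\pi^{(2)} = (\pi_1^{(2)}, \pi_2^{(2)}) \sim \mathrm{Dir}(\frac{\alpha}{2}, \frac{\alpha}{2})$, then $\mathrm{Dir}(\frac{\alpha}{4}, \frac{\alpha}{4}, \frac{\alpha}{4}, \frac{\alpha}{4})$, and in general $\mathrm{Dir}(\frac{\alpha}{K}, \dots, \frac{\alpha}{K})$ with $K \to \infty$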
Dirichlet process
Chinese restaurant process
Indian buffet process
(Figure, excerpt from lecture notes "The Indian Buffet Process": on the left is an example of Indian Buffet Process dish assign…; on the right is an example binary matrix generated from IBP.)
3. The nth customer helps himself to each dish with probability m_k / … dish k was chosen.
4. He tries Poisson(α / n) new dishes.
Chinese restaurant process in action
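The Chinese restaurant process can be simulated directly from its standard seating rule: with n customers already seated, the next one joins an existing table with probability proportional to its size, or opens a new table with probability proportional to α. The function name and the parameter values below are illustrative.

```python
import numpy as np

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one by one; return the list of table sizes."""
    rng = np.random.default_rng(seed)
    tables = []                                  # tables[k] = customers at table k
    for n in range(n_customers):                 # n customers are already seated
        # Existing table k: tables[k] / (n + alpha); new table: alpha / (n + alpha).
        probs = np.array(tables + [alpha], dtype=float) / (n + alpha)
        choice = rng.choice(len(probs), p=probs)
        if choice == len(tables):
            tables.append(1)                     # open a new table
        else:
            tables[choice] += 1                  # join an existing table
    return tables

tables = chinese_restaurant_process(1000, alpha=2.0)
print(len(tables), sorted(tables, reverse=True)[:5])
```

The number of tables is not fixed in advance and grows slowly with the number of customers, which is the property that lets a Dirichlet process mixture choose its own number of components.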
Distribution of Distributions
Mixture of Gaussians
Infinite GMM
Probabilistic
modelling:
• PyMC3
• TensorFlow
Scikit-learn: sklearn.mixture.BayesianGaussianMixture
Dirichlet Prior:
• Dirichlet Distribution - Finite GMM
• Dirichlet Process - Infinite GMM
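A short sketch of the scikit-learn route mentioned above, using `BayesianGaussianMixture` with a Dirichlet process prior on the weights. The time deltas are synthetic, `n_components=10` is only an upper bound, and the 0.05 weight cut-off used to call a component "active" is an illustrative assumption, not a recommendation from the talk.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic time deltas around 0.8 s, 1.6 s and 2.4 s (illustrative data).
deltas = np.concatenate([
    rng.normal(0.8, 0.01, 400),
    rng.normal(1.6, 0.01, 400),
    rng.normal(2.4, 0.01, 200),
]).reshape(-1, 1)

model = BayesianGaussianMixture(
    n_components=10,                                    # upper bound, not the true K
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(deltas)

active = model.weights_ > 0.05                          # components that kept real mass
print(np.sort(model.means_[active].ravel()))
```

Unused components end up with weights close to zero, so the effective number of clusters is read off the fitted weights instead of being fixed up front.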
Probabilistic modeling
• Probabilistic models capture the uncertainty in real world data better
• But they are very computationally intensive
• The sampling process takes time to stabilize and then generate meaningful results
• This makes them impractical on large datasets
Applied Data Science for automated clustering
Finding dense regions
Obtaining stochastic periods recursively
• Get dense regions list: find dmin to cluster a region
• Recursively split regions: if a region is {Heavy tailed, Multi-modal}
• Stochastic periods: get probability distributions from dense regions
Kurtosis of normal distribution = 3
Heavy tailed: Excess Kurtosis > 6
$\mathrm{Bimodality} = \frac{\gamma^2 + 1}{\kappa}$, where γ: Skewness, κ: Kurtosis
Bimodality for uniform distribution: 5/9
Unimodal? Bimodality > 0.8: not unimodal; otherwise unimodal
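The split criteria above can be computed with `scipy.stats`. In this sketch the helper names `shape_stats` and `needs_split` are hypothetical; the thresholds (excess kurtosis > 6, bimodality > 0.8) are the ones quoted on the slide, and κ is taken as the Pearson kurtosis so that a normal distribution gives 3.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def shape_stats(x):
    """Return (excess kurtosis, bimodality coefficient) of a sample."""
    x = np.asarray(x, dtype=float)
    gamma = skew(x)
    kappa = kurtosis(x, fisher=False)        # Pearson kurtosis: normal = 3
    bimodality = (gamma ** 2 + 1.0) / kappa
    return kappa - 3.0, bimodality

def needs_split(x, kurt_limit=6.0, bimodal_limit=0.8):
    """True if the region is heavy tailed or not unimodal (slide thresholds)."""
    excess, bimodality = shape_stats(x)
    return bool(excess > kurt_limit or bimodality > bimodal_limit)

rng = np.random.default_rng(0)
uniform = rng.uniform(0.0, 1.0, 200_000)
two_modes = np.concatenate([rng.normal(0.8, 0.001, 5000), rng.normal(1.6, 0.001, 5000)])
print(shape_stats(uniform), needs_split(uniform))
print(shape_stats(two_modes), needs_split(two_modes))
```

The uniform sample reproduces the 5/9 reference value from the slide, while the two well-separated modes push the coefficient close to 1 and trigger a split.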
dmin via distance matrix
This topic was covered generically in detail during ODSC 2018 – “Topological space clustering”
If local dense regions exist along with sparsity, then we can obtain hierarchical clusters at each mode
(Figure: density plot of distance matrix.)
Method proposed: Finding optimal clustering-epsilon
• The problem comes down to finding the optimal curve for the Gaussian kernel
• One of the ways to solve it algorithmically:
Grid Search (band_width, grid_size) → rFFT → Silverman Transform → I-rFFT → Score (logLoss, stdDev) → Minima (band_width, grid_size)
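The sketch below illustrates the general idea of searching over kernel bandwidths and then cutting the density curve at its minima. It is not the rFFT / Silverman pipeline named on the slide: it swaps in `scipy.stats.gaussian_kde`, scores each candidate bandwidth by held-out log-likelihood, and splits at local minima of the fitted curve. The function names, the candidate bandwidths and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def best_bandwidth(train, test, candidates):
    """Pick the bandwidth factor with the highest held-out log-likelihood."""
    scores = [gaussian_kde(train, bw_method=bw).logpdf(test).sum() for bw in candidates]
    return candidates[int(np.argmax(scores))]

def dense_regions(x, bw, grid_size=2000):
    """Cut the sample range at local minima of the KDE curve; return (lo, hi) ranges."""
    grid = np.linspace(x.min(), x.max(), grid_size)
    dens = gaussian_kde(x, bw_method=bw)(grid)
    interior = np.arange(1, grid_size - 1)
    is_min = (dens[interior] < dens[interior - 1]) & (dens[interior] <= dens[interior + 1])
    cuts = grid[interior[is_min]]
    edges = np.concatenate([[x.min()], cuts, [x.max()]])
    return list(zip(edges[:-1], edges[1:]))

rng = np.random.default_rng(0)
# Synthetic time deltas with two stochastic periods (illustrative data).
deltas = np.concatenate([rng.normal(0.8, 0.01, 500), rng.normal(1.6, 0.01, 500)])

bw = best_bandwidth(deltas[::2], deltas[1::2], candidates=[0.05, 0.1, 0.3])
regions = dense_regions(deltas, bw)
print(bw, regions)
```

Each returned range can then be summarised by its own mean, standard deviation and periodicity, in the spirit of the per-region statistics the talk reports.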
Genuine systematic DNS queries to DNS server
(Figure panels: time delta analysis; time delta density plot.)
Stochastic periods: Systematic DNS queries
#  Mean (μ)  Std. (σ)  Periodicity (μ/σ)   Skewness   Kurtosis  Bimodal Coeff  Num Points      Density  Range_min  Range_max
0  0.809819  0.000499        1623.841793   0.252468  -0.628825       0.443460         330   330.000000   0.808578   0.810866
1  0.812474  0.000559        1454.556761  -0.879146   0.291782       0.536779         817   817.000000   0.810869   0.813244
2  0.813497  0.000141        5758.645345   0.130659  -1.036117       0.512333         426   426.000000   0.813245   0.813768
3  0.814162  0.000326        2495.343595   0.774715  -0.464280       0.623092         281   281.000000   0.813769   0.815018
4  1.622954  0.000631        2570.868659  -0.093343  -1.009738       0.504116         845   845.000000   1.621745   1.624109
5  1.625497  0.000630        2578.372267  -0.304701  -0.984417       0.540452        1386  1386.000000   1.624114   1.626489
6  1.627108  0.000496        3282.770498   0.992204   0.538372       0.558516         614   614.000000   1.626492   1.628858
7  2.436156  0.000627        3885.490319   0.059122  -1.007464       0.492753         208   208.000000   2.434985   2.437341
8  2.438674  0.000653        3733.232096  -0.230265  -0.988668       0.514873         269   269.000000   2.437374   2.439728
THANKS
E-mail: kuldeep.jiwani@gmail.com / kuldeep.Jiwani@thalesgroup.com
LinkedIn: https://www.linkedin.com/in/kuldeep-jiwani-988605/
Appendix
Finite GMM: Bayesian setting
Algorithm: Collapsed Gibbs sampler for a finite Gaussian mixture model
(Notation: z_{\i} is every assignment except z_i; X_{k\i} is the set of points in component k excluding x_i.)
Choose an initial z
For T iterations do  # Gibbs sampling iterations
    For i = 1 to N do
        Remove x_i’s statistics from component z_i  # Old cluster assignment for x_i
        For k = 1 to K do  # Every possible component
            Calculate P(z_i = k | z_{\i}, α)
            Calculate p(x_i | X_{k\i}, β)
            Calculate P(z_i = k | z_{\i}, X, α, β) ∝ P(z_i = k | z_{\i}, α) p(x_i | X_{k\i}, β)
        End for
        Sample k_new from P(z_i | z_{\i}, X, α, β) after normalizing
        Add x_i’s statistics to the component z_i = k_new  # New assignment for x_i
    End for
End for
Evaluation metric for Gibbs: $\left[\prod_{k=1}^{K} p(X_k \mid \beta)\right] p(z \mid \alpha)$
Infinite GMM: Bayesian setting
(Notation: z_{\i} is every assignment except z_i; X_{k\i} is the set of points in component k excluding x_i; k∗ denotes a new component.)
Choose an initial z
For T iterations do  # Gibbs sampling iterations
    For i = 1 to N do
        Remove x_i’s statistics from component z_i  # Old cluster assignment for x_i
        For k = 1 to K do  # Every possible component
            Calculate P(z_i = k | z_{\i}, α)
            Calculate p(x_i | X_{k\i}, β)
            Calculate P(z_i = k | z_{\i}, X, α, β) ∝ P(z_i = k | z_{\i}, α) p(x_i | X_{k\i}, β)
        End for
        Calculate P(z_i = k∗ | z_{\i}, α)  # Consider a new component
        Calculate p(x_i | β)
        Calculate P(z_i = k∗ | z_{\i}, X, α, β) ∝ P(z_i = k∗ | z_{\i}, α) p(x_i | β)
        Sample k_new from P(z_i | z_{\i}, X, α, β) after normalizing
        If any component is empty, remove it and decrease K
        Add x_i’s statistics to the component z_i = k_new  # New assignment for x_i
    End for
End for
Simplex-2D

More Related Content

What's hot

Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)
Sri Prasanna
 
BIRTE-13-Kawashima
BIRTE-13-KawashimaBIRTE-13-Kawashima
BIRTE-13-Kawashima
Hideyuki Kawashima
 
The Case for a Signal Oriented Data Stream Management System
The Case for a Signal Oriented Data Stream Management SystemThe Case for a Signal Oriented Data Stream Management System
The Case for a Signal Oriented Data Stream Management System
Reza Rahimi
 
Distributed systems scheduling
Distributed systems schedulingDistributed systems scheduling
Distributed systems scheduling
Pragati Startup Presentation Designer firm
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
MapR Technologies
 
ddpg seminar
ddpg seminarddpg seminar
ddpg seminar
민재 정
 
logical clocks
logical clockslogical clocks
Clocks
ClocksClocks
Clocks
guesta013ed8
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
MLconf
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
Albert Bifet
 
Ashfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, PepperdataAshfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, Pepperdata
MLconf
 
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural NetworksImproving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
Aditya N Deshmukh
 
A Brief History of Stream Processing
A Brief History of Stream ProcessingA Brief History of Stream Processing
A Brief History of Stream Processing
Aleksandr Kuboskin, CFA
 
Dueling network architectures for deep reinforcement learning
Dueling network architectures for deep reinforcement learningDueling network architectures for deep reinforcement learning
Dueling network architectures for deep reinforcement learning
Taehoon Kim
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Albert Bifet
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
Ryo Iwaki
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
MLconf
 
[232]mist 고성능 iot 스트림 처리 시스템
[232]mist 고성능 iot 스트림 처리 시스템[232]mist 고성능 iot 스트림 처리 시스템
[232]mist 고성능 iot 스트림 처리 시스템
NAVER D2
 
Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr - Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr -
PyData
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
mooopan
 

What's hot (20)

Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)Logical Clocks (Distributed computing)
Logical Clocks (Distributed computing)
 
BIRTE-13-Kawashima
BIRTE-13-KawashimaBIRTE-13-Kawashima
BIRTE-13-Kawashima
 
The Case for a Signal Oriented Data Stream Management System
The Case for a Signal Oriented Data Stream Management SystemThe Case for a Signal Oriented Data Stream Management System
The Case for a Signal Oriented Data Stream Management System
 
Distributed systems scheduling
Distributed systems schedulingDistributed systems scheduling
Distributed systems scheduling
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
 
ddpg seminar
ddpg seminarddpg seminar
ddpg seminar
 
logical clocks
logical clockslogical clocks
logical clocks
 
Clocks
ClocksClocks
Clocks
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
 
Ashfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, PepperdataAshfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, Pepperdata
 
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural NetworksImproving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
Improving Numerical Wave Forecasts by Data Assimilation Based on Neural Networks
 
A Brief History of Stream Processing
A Brief History of Stream ProcessingA Brief History of Stream Processing
A Brief History of Stream Processing
 
Dueling network architectures for deep reinforcement learning
Dueling network architectures for deep reinforcement learningDueling network architectures for deep reinforcement learning
Dueling network architectures for deep reinforcement learning
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
[232]mist 고성능 iot 스트림 처리 시스템
[232]mist 고성능 iot 스트림 처리 시스템[232]mist 고성능 iot 스트림 처리 시스템
[232]mist 고성능 iot 스트림 처리 시스템
 
Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr - Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr -
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
 

Similar to ODSC 2019: Sessionisation via stochastic periods for root event identification

CRYPTANALYSIS AGAINST SYMMETRICKEY SCHEMES WITH ONLINE CLASSICAL QUERIES AND ...
CRYPTANALYSIS AGAINST SYMMETRICKEY SCHEMES WITH ONLINE CLASSICAL QUERIES AND ...CRYPTANALYSIS AGAINST SYMMETRICKEY SCHEMES WITH ONLINE CLASSICAL QUERIES AND ...
CRYPTANALYSIS AGAINST SYMMETRICKEY SCHEMES WITH ONLINE CLASSICAL QUERIES AND ...
Priyanka Aash
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
Debasish Ghosh
 
Streaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same GameStreaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same Game
Numenta
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Ian Foster
 
Neural network
Neural networkNeural network
Neural network
Babu Priyavrat
 
A calculus of mobile Real-Time processes
A calculus of mobile Real-Time processesA calculus of mobile Real-Time processes
A calculus of mobile Real-Time processes
Polytechnique Montréal
 
Master Thesis Presentation
Master Thesis PresentationMaster Thesis Presentation
Master Thesis Presentation
Mohamed Sobh
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- ML and Ubicomp...
Bridging the Gap: Machine Learning for Ubiquitous Computing -- ML and Ubicomp...Bridging the Gap: Machine Learning for Ubiquitous Computing -- ML and Ubicomp...
Bridging the Gap: Machine Learning for Ubiquitous Computing -- ML and Ubicomp...
Thomas Ploetz
 
Introduction to computing Processing and performance.pdf
Introduction to computing Processing and performance.pdfIntroduction to computing Processing and performance.pdf
Introduction to computing Processing and performance.pdf
TulasiramKandula1
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
CastLabKAIST
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing System
C4Media
 
From Trill to Quill and Beyond
From Trill to Quill and BeyondFrom Trill to Quill and Beyond
From Trill to Quill and Beyond
Badrish Chandramouli
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...
Thanh Hieu
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent Monitoring
Intelie
 
Deep learning to the rescue - solving long standing problems of recommender ...
Deep learning to the rescue - solving long standing problems of recommender ...Deep learning to the rescue - solving long standing problems of recommender ...
Deep learning to the rescue - solving long standing problems of recommender ...
Balázs Hidasi
 
Real time intrusion detection in network traffic using adaptive and auto-scal...
Real time intrusion detection in network traffic using adaptive and auto-scal...Real time intrusion detection in network traffic using adaptive and auto-scal...
Real time intrusion detection in network traffic using adaptive and auto-scal...
Gobinath Loganathan
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
Paul Groth
 
Dealing with the need for Infrastructural Support in Ambient Intelligence
Dealing with the need for Infrastructural Support in Ambient IntelligenceDealing with the need for Infrastructural Support in Ambient Intelligence
Dealing with the need for Infrastructural Support in Ambient Intelligence
Diego López-de-Ipiña González-de-Artaza
 
Mastering AIOps with Deep Learning
Mastering AIOps with Deep LearningMastering AIOps with Deep Learning
Mastering AIOps with Deep Learning
Jorge Cardoso
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdf
Polytechnique Montréal
 

Similar to ODSC 2019: Sessionisation via stochastic periods for root event identification (20)

CRYPTANALYSIS AGAINST SYMMETRICKEY SCHEMES WITH ONLINE CLASSICAL QUERIES AND ...
CRYPTANALYSIS AGAINST SYMMETRICKEY SCHEMES WITH ONLINE CLASSICAL QUERIES AND ...CRYPTANALYSIS AGAINST SYMMETRICKEY SCHEMES WITH ONLINE CLASSICAL QUERIES AND ...
CRYPTANALYSIS AGAINST SYMMETRICKEY SCHEMES WITH ONLINE CLASSICAL QUERIES AND ...
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Streaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same GameStreaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same Game
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Neural network
Neural networkNeural network
Neural network
 
A calculus of mobile Real-Time processes
A calculus of mobile Real-Time processesA calculus of mobile Real-Time processes
A calculus of mobile Real-Time processes
 
Master Thesis Presentation
Master Thesis PresentationMaster Thesis Presentation
Master Thesis Presentation
 



ODSC 2019: Sessionisation via stochastic periods for root event identification

  • 1. Sessionisation via Stochastic periods for root event identification Kuldeep Jiwani ODSC India 2019
  • 2. Thales Overview From the Bottom of the Oceans… to the Depths of Space & Cyberspace Key Digital Technologies
  • 3. Thales: A Research and Development Powerhouse 6 times winner 2012, 2013, 2015, 2016, 2017, 2018 Expertise in a uniquely broad range of technical domains, from science to systems, applied across businesses. An extensive intellectual property portfolio of 20,500 patents. Albert Fert Scientific director of the CNRS/Thales joint physics unit and winner of the 2007 Nobel prize in physics.
  • 4. Agenda • Motivation of studying events • Concept & purpose of Sessionisation • Traditional approaches • Real world case studies • Applied Data Science way of doing Sessionisation
  • 5. Events • Orders placed in a market • Sequence of user tweets • User’s clicks on a website • Activity update by an IoT device • Network events on a router • Network alarms in a network
  • 9. Time sequenced events Time Entity - 1 Entity - 2 Entity - 3
  • 11. Time sequenced events Time Entity - 1 Entity - 2 Entity - 3 Entity - 4
  • 12. Sessions: Operations vs Data Science view. Continuity in activity; Mean activity period; Gap >> Mean activity period; Sessions
  • 13. Sessions: Operations vs Data Science view. Continuity in activity; Chain of time sequenced events; Mean activity period; Gap >> Mean activity period; Sessions; Time based correlation
  • 14. Root event identification from session chain
  • 19. Root event identification from session chain Stochastic periods
  • 20. Malicious Actors in the world of AI • Orders placed in a market: Market manipulation • Sequence of user tweets: Bot campaigns • User’s clicks on a website: Fraudulent transactions • Activity by an IoT device: Taking device control • Network events on a router: Cyber attacks
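  • 21. Events stream as impulse train [Figure: the event stream over time, and the same stream as an impulse train taking value 1 at event times and 0 otherwise]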
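The impulse-train view can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code: the function name `impulse_train`, the `resolution` parameter and the example timestamps are all assumptions made for the sketch.

```python
import numpy as np

def impulse_train(event_times, resolution=1.0):
    """Return a 0/1 array with a 1 in every time bin that contains an event."""
    event_times = np.asarray(event_times, dtype=float)
    # Shift so the first event falls in bin 0, then convert times to bin indices
    bins = np.floor((event_times - event_times.min()) / resolution).astype(int)
    signal = np.zeros(bins.max() + 1, dtype=int)
    signal[bins] = 1
    return signal

# Hypothetical event timestamps in seconds (not taken from the talk's dataset)
events = [0.0, 10.2, 19.9, 30.1, 40.0]
signal = impulse_train(events)
print(signal.sum(), len(signal))
```

The bin width (`resolution`) decides how much timing jitter is absorbed: two events closer together than one bin collapse into a single impulse.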
  • 22. Approaches for finding time based patterns • Fourier transform • Time period – Stochastic periods • GMM (Gaussian Mixture Models) • Infinite GMM (Gaussian Mixture Models) • Non-parametric Bayesian methods • Applied data science techniques [Figure axes: Information, Complexity; label: Applied data science]
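  • 24. Fourier series: Quick intro. $P(t) = \frac{1}{2}a_0 + a_1\cos\omega t + a_2\cos 2\omega t + \dots + b_1\sin\omega t + b_2\sin 2\omega t + \dots$, that is, $P(t) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty} a_n\cos n\omega t + \sum_{n=1}^{\infty} b_n\sin n\omega t$. Approximation: $f(t) \to P(t)$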
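  • 25. Fourier series: Quick intro. RMSE (Root Mean Square Error): $E^2 = \int_0^T \big(f(t) - P(t)\big)^2\,dt$. Minimize RMSE loss (derivative = 0): $\frac{\partial E^2}{\partial a_n} = 0$, $\frac{\partial E^2}{\partial b_n} = 0$. This gives $a_n = \frac{2}{T}\int_0^T f(t)\cos n\omega t\,dt$, $b_n = \frac{2}{T}\int_0^T f(t)\sin n\omega t\,dt$, $a_0 = \frac{2}{T}\int_0^T f(t)\,dt$, $b_0 = 0$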
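  • 26. Fourier transform: Quick intro. Euler’s formula: $e^{j\omega t} = \cos\omega t + j\sin\omega t$. Fourier Series: $P(t) = \sum_{n=-\infty}^{\infty} c_n e^{jn\omega t}$, with $c_n = \frac{1}{T}\int_0^T f(t)\,e^{-jn\omega t}\,dt$. Fourier Transform (CTFT): $F(j\omega) = \int_{-\infty}^{\infty} f(t)\,e^{-j\omega t}\,dt$. Fourier Transform (DTFT): $X(j\omega) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-jn\omega}$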
  • 27. Fourier Transform (DTFT): Impulse train FT (Real): Magnitude FT (Imaginary): Phase shift
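  • 29. Plotting Fourier Transform in Python
        import numpy as np
        import matplotlib.pyplot as plt

        # time_signal: 1-D array holding the impulse train (assumed to be defined already)
        N = time_signal.shape[0]
        signal_fft = np.fft.fft(time_signal)
        frequency_bins = np.fft.fftfreq(N)
        fig, ax = plt.subplots(1, 2, figsize=(28, 7))
        # Positive frequencies only, skipping the 0th bin; N // 2 keeps the slice index an integer
        ax[0].plot(frequency_bins[1:N // 2], np.abs(signal_fft.real[1:N // 2]), 'g')
        ax[1].plot(frequency_bins[1:N // 2], signal_fft.imag[1:N // 2], 'c')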
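To see what the transform reports for a perfectly regular stream, the sketch below builds a synthetic `time_signal` with one event every 10 samples and reads the fundamental frequency off the spectrum. The data is hypothetical, and the sketch uses the magnitude of the complex coefficient rather than plotting real and imaginary parts separately as the slide does.

```python
import numpy as np

# Synthetic impulse train: one event every 10 samples (hypothetical data)
N = 1000
time_signal = np.zeros(N)
time_signal[::10] = 1

signal_fft = np.fft.fft(time_signal)
frequency_bins = np.fft.fftfreq(N)

# Positive frequencies only, skipping the 0th bin
half = N // 2
magnitude = np.abs(signal_fft[1:half])
positive_freqs = frequency_bins[1:half]

# A pure impulse train has equally strong harmonics, so take the lowest
# frequency whose magnitude is close to the maximum as the fundamental
strong = np.flatnonzero(magnitude > 0.5 * magnitude.max())
fundamental = positive_freqs[strong[0]]
print(fundamental, 1 / fundamental)
```

The reciprocal of the fundamental frequency is the period in samples. On noisy real traffic the harmonics smear out, which is one reason the later slides switch to time-domain analysis.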
  • 30. Case studies via public datasets • Sessionisation is an essential step in detecting malicious bot activities such as beaconing • We will use the 6th dataset of the CTU-13 collection for examples • Provided by Czech Technical University (CTU) • Traces captured from a malware attack executed in the university network • The 6th dataset simulates a bot named DonBot, which attacks SVC services on Windows • Dataset: https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-47/bro/conn.log
  • 31. Case – 1: DonBot’s DNS queries to university DNS server
  • 35. Stochastic periods: Introduction • Analyze periodicity in the time domain • Compute consecutive time deltas • Real world signals are noisy, so time deltas will vary a lot • If there is periodicity in the signal, time deltas will vary within a band • The density plot of time deltas will show some high density regions • We can learn a probability distribution for each high density region
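The steps on this slide can be sketched as follows. The event stream is simulated (roughly one event every 5 seconds with jitter), and a plain histogram stands in for the density plot; both are assumptions for illustration rather than the method used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy periodic events: roughly one every 5 s with small jitter
event_times = np.cumsum(rng.normal(loc=5.0, scale=0.2, size=500))

# Consecutive time deltas
deltas = np.diff(event_times)

# Histogram as a simple density estimate of the deltas
density, edges = np.histogram(deltas, bins=50, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])

# The highest-density bin marks the dominant time delta
dominant = centres[np.argmax(density)]
print(f"dominant time delta ~ {dominant:.2f} s")
```

With several periods in the same stream the histogram shows several dense bands, and each band is a candidate for its own distribution.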
  • 36. DonBot: DNS traces Bot’s DNS event stream Time delta analysis
  • 37. DonBot: DNS traces MeanPeriod = μ Periodicity = μ / σ Bot’s DNS event stream Time delta analysis
  • 38. DonBot: DNS traces MeanPeriod = μ Periodicity = μ / σ Bot’s DNS event stream Time delta analysis Session-1, Session-2, Session-3
  • 39. DonBot: DNS traces MeanPeriod = μ Periodicity = μ / σ Time delta analysis Time delta Density plot
  • 40. DonBot: DNS traces MeanPeriod = μ Periodicity = μ / σ Time delta analysis Time delta Density plot PDF: Stochastic period
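The two quantities on these slides, MeanPeriod = μ and Periodicity = μ / σ, are straightforward to compute from a set of time deltas. A minimal sketch with simulated deltas follows; the function name `stochastic_period` and both samples are hypothetical.

```python
import numpy as np

def stochastic_period(deltas):
    """Return (mean period, periodicity score mu / sigma) for time deltas."""
    deltas = np.asarray(deltas, dtype=float)
    mu = deltas.mean()
    sigma = deltas.std()  # assumes the deltas are not all identical (sigma > 0)
    return mu, mu / sigma

rng = np.random.default_rng(1)
regular = rng.normal(5.0, 0.05, size=1000)   # tight band: strongly periodic
irregular = rng.exponential(5.0, size=1000)  # memoryless arrivals: no period

print(stochastic_period(regular))
print(stochastic_period(irregular))
```

A tight band gives a large μ / σ, while memoryless (exponential) inter-arrival times give a score close to 1, since their mean and standard deviation coincide.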
  • 41. Case – 2: DonBot’s data transfer via backdoor port (5678) with a malicious IP (91.212.135.158)
  • 42. DonBot: Data transfer via backdoor DonBot’s backdoor traffic Time delta analysis
  • 44. DonBot: Data transfer via backdoor Zoomed in FT without 0th frequency Time delta analysis
  • 46. DonBot: Data transfer via backdoor Time delta analysis Time delta Density plot
  • 48. DonBot: Data transfer via backdoor Stochastic Period - 1 Stochastic Period - 2 Time delta analysis Time delta Density plot
  • 49. Case – 3: Genuine systematic DNS queries to university’s DNS server
  • 50. Genuine systematic DNS queries to DNS server Normal DNS queries Time delta analysis
  • 51. Genuine systematic DNS queries to DNS server Normal DNS queries Time delta analysis [0.8, 1.6, 2.4, 3.2]
  • 52. Genuine systematic DNS queries to DNS server FT without 0th frequency Time delta analysis FT: Only able to highlight the higher time periods
  • 53. Genuine systematic DNS queries to DNS server Time delta analysis Time delta Density plot
  • 54. Genuine systematic DNS queries to DNS server Time delta analysis Time delta Density plot A B C D
  • 55. Case – 4: Just another interesting DNS pattern
  • 56. Normal DNS queries FT without 0th frequency Time delta analysis
  • 57. Normal DNS queries Time delta analysis Time delta Density plot
  • 58. Auto discovering multiple distributions: GMM - Gaussian Mixture Models. Expectation Maximization: if the sources are known, estimation is easy. How to estimate parameter θ?
  • 59. Auto discovering multiple distributions: GMM - Gaussian Mixture Models. Probabilistic model of data: Gaussian Mixture Model (GMM)
  • 60. GMM – Gaussian Mixture Models • Does soft clustering of data points instead of hard clustering • In principle it is very similar to K-Means but works on probabilities • K-Means: {P1 → C1, P2 → C2}, GMM: {P1 → [0.8, 0.1, 0.1], P2 → [0.05, 0.85, 0.1]} • Problem with GMM & K-Means: We need to define “K” • Techniques like the Elbow method, Silhouette, etc. are based on certain assumptions • They cannot be applied in general for automated discovery of K • Finding “K” automatically is a very hard problem to solve
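  • 61. Bayesian way of building models. $P(\theta \mid X) = \frac{P(X \mid \theta)\,P(\theta)}{P(X)}$, where $P(\theta \mid X)$ is the posterior, $P(X \mid \theta)$ the likelihood, $P(\theta)$ the prior and $P(X)$ the evidence. $P(\theta)$ is conjugate to $P(X \mid \theta)$ when the prior lies in a family $A(\nu)$ and the posterior lies in the same family, $A(\nu')$. For example: $P(\theta) = \mathcal{N}(\theta \mid 0, 1)$ (standard normal), $P(X \mid \theta) = \mathcal{N}(x \mid \theta, 1)$ (with 1 std. dev). Then $P(\theta \mid X) \propto e^{-\frac{1}{2}(x-\theta)^2}\, e^{-\frac{1}{2}\theta^2} \propto e^{-(\theta - \frac{x}{2})^2}$, so $P(\theta \mid X) = \mathcal{N}(\theta \mid \frac{x}{2}, \frac{1}{2})$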
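The conjugate example can be checked numerically: multiply prior and likelihood on a grid, normalise, and compare the resulting mean and variance with x/2 and 1/2. The observation value `x = 1.4` and the grid limits are arbitrary choices for this sketch.

```python
import numpy as np

x = 1.4  # a single hypothetical observation
theta = np.linspace(-6, 6, 20001)
d = theta[1] - theta[0]

# Unnormalised posterior: likelihood N(x | theta, 1) times prior N(theta | 0, 1)
unnorm = np.exp(-0.5 * (x - theta) ** 2) * np.exp(-0.5 * theta ** 2)
posterior = unnorm / (unnorm.sum() * d)

# Moments of the gridded posterior
post_mean = (theta * posterior).sum() * d
post_var = ((theta - post_mean) ** 2 * posterior).sum() * d
print(post_mean, post_var)
```

The second parameter of the posterior on the slide, 1/2, is therefore a variance, not a standard deviation.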
  • 62. Constructing GMM. Gaussian Mixture Model (GMM): $P(x \mid \theta) = \pi_1\,\mathcal{N}(x \mid \mu_1, \Sigma_1) + \pi_2\,\mathcal{N}(x \mid \mu_2, \Sigma_2) + \pi_3\,\mathcal{N}(x \mid \mu_3, \Sigma_3)$, with $\theta = \{\pi_1, \pi_2, \pi_3, \mu_1, \mu_2, \mu_3, \Sigma_1, \Sigma_2, \Sigma_3\}$. Parametric setting: K = 3
  • 63. Constructing GMM (continued). Graphical model t → x, where t is a latent variable taking values in [1, 2, 3] and $\pi_1 + \pi_2 + \pi_3 = 1$. $p(t = c \mid \theta) = \pi_c$, $p(x \mid t = c, \theta) = \mathcal{N}(x \mid \mu_c, \Sigma_c)$. Likelihood: $p(x \mid \theta) = \sum_{c=1}^{K} p(x \mid t = c, \theta)\,p(t = c \mid \theta)$
  • 64. Constructing GMM (continued). Training GMM: $\max_{\theta} \prod_{i=1}^{N} p(x_i \mid \theta)$, subject to $\sum_k \pi_k = 1$; $\pi_k \ge 0$; $\Sigma_k \succ 0$. EM algorithm: • E-step: Compute q(t), a distribution over t • M-step: Update the Gaussian parameters to fit the points assigned to them
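The E-step and M-step can be sketched for the one-dimensional case, where each Σ_c reduces to a scalar variance. This is a bare-bones illustration under assumptions: the function name `em_gmm_1d`, the quantile-based initialisation, the fixed iteration count and the simulated data are not from the talk.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=200):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances, q)."""
    n = x.shape[0]
    pi = np.full(k, 1.0 / k)
    # Deterministic start: spread the initial means over the data quantiles
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    for _ in range(n_iter):
        # E-step: q[i, c] = p(t = c | x_i, theta), the soft assignment of point i
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        q = pi * dens
        q /= q.sum(axis=1, keepdims=True)
        # M-step: re-estimate each Gaussian from the points softly assigned to it
        nk = q.sum(axis=0)
        pi = nk / n
        mu = (q * x[:, None]).sum(axis=0) / nk
        var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return pi, mu, var, q

rng = np.random.default_rng(42)
# Hypothetical time deltas drawn around two periods, 1.0 s and 3.0 s
x = np.concatenate([rng.normal(1.0, 0.1, 300), rng.normal(3.0, 0.2, 200)])
pi, mu, var, q = em_gmm_1d(x, k=2)
print(np.sort(mu), pi)
```

Each row of `q` is the soft assignment of one point, which is what distinguishes a GMM from the hard assignments of K-Means. Note that `k` still has to be supplied by hand here.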
  • 66. Sessionisation: Data Science at scale • In a real world scenario, be it • Web users over internet, Network hosts in an enterprise network, etc. • One would need to apply Sessionisation on millions of entities • So manual inspection based methods cannot be used • We need a fully automated system to discover multiple ”Stochastic Periods” • We need to find the clusters automatically
  • 67. Infinite GMM (Gaussian Mixture Models) Based on Bayesian non-parametric approaches
  • 68. Probabilistic programming. Logistic regression: $p(y_i = 1 \mid \beta) = \mathrm{sigmoid}(\beta_1 x_1 + \beta_2 x_2 + \beta_0)$, with prior $\beta \sim \mathcal{N}(\mu, \sigma)$. Sampling: • MCMC • Gibbs
  • 69. Probabilistic programming (continued). Posterior distribution of β: $p(\exp(\beta_i) \mid X_{train}, y_{train})$. Credible interval [a, b]: $p(a \le \exp(\beta_i) \le b) = 0.95$
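As a rough illustration of sampling a posterior and reading off a credible interval, the sketch below runs a random-walk Metropolis sampler (one simple MCMC variant) on a Bayesian logistic regression. Everything specific is an assumption: the simulated data, the standard normal prior on each coefficient, the step size and the burn-in length. A probabilistic programming library would normally handle the sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data generated from known coefficients [b0, b1, b2]
true_beta = np.array([-0.5, 1.5, -1.0])
X = np.column_stack([np.ones(400), rng.normal(size=(400, 2))])
y = rng.random(400) < 1 / (1 + np.exp(-X @ true_beta))

def log_posterior(beta):
    logits = X @ beta
    # Bernoulli log-likelihood plus a standard normal prior on each coefficient
    log_lik = np.sum(y * logits - np.logaddexp(0, logits))
    return log_lik - 0.5 * np.sum(beta ** 2)

# Random-walk Metropolis sampler
beta = np.zeros(3)
current = log_posterior(beta)
samples = []
for step in range(6000):
    proposal = beta + rng.normal(scale=0.15, size=3)
    candidate = log_posterior(proposal)
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.random()) < candidate - current:
        beta, current = proposal, candidate
    if step >= 1000:  # discard burn-in
        samples.append(beta)

samples = np.array(samples)
# 95% credible interval for exp(beta_1), the odds ratio of the first feature
a, b = np.percentile(np.exp(samples[:, 1]), [2.5, 97.5])
print(a, b)
```

The interval comes straight from the percentiles of the sampled values, which is the practical appeal of working with a full posterior instead of a point estimate.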
  • 70. Dirichlet distribution. $\mathrm{Dir}(\theta \mid \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$, with $\sum_k \theta_k = 1$ and $\theta_k \ge 0$. Multivariate Beta function: $B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}$. Gamma function for positive integer n: $\Gamma(n) = (n-1)!$
  • 71. Dirichlet distribution (continued). K = 3 simplex with corners (1, 0, 0), (0, 1, 0), (0, 0, 1), centre (1/3, 1/3, 1/3) and example point (0.3, 0.2, 0.5). Effects of varying α [Figure: density of the Dirichlet distribution for α = (10, 10, 10) and for α = (0.1, 0.1, 0.1)]
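The effect of α can be reproduced by sampling. Large, equal α values concentrate the mass near the centre of the simplex; values below 1 push it towards the corners. The sketch uses NumPy's Dirichlet sampler, and the summary statistic (mean of the largest coordinate) is just one convenient way to show the difference.

```python
import numpy as np

rng = np.random.default_rng(0)

largest = {}
for alpha in ([10, 10, 10], [0.1, 0.1, 0.1]):
    theta = rng.dirichlet(alpha, size=5000)
    # Every sample lies on the simplex: non-negative and summing to 1
    assert np.allclose(theta.sum(axis=1), 1.0)
    # Mean of the largest coordinate: near 1/3 means "close to the centre",
    # near 1 means "close to a corner"
    largest[tuple(alpha)] = theta.max(axis=1).mean()

print(largest)
```

In a mixture model the sampled vector plays the role of the weights π, so a small α expresses a prior belief that only a few components carry most of the weight.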
  • 72. Intuition behind Infinite GMM. Properties of the Dirichlet distribution: • The Dirichlet distribution is conjugate to the Multinomial distribution • The Dirichlet satisfies an expansion (or combination) rule: if $\pi = (\pi_1, \pi_2, \dots, \pi_k) \sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \dots, \alpha_k)$, then $(\pi_1\theta,\ \pi_1(1-\theta),\ \pi_2, \dots, \pi_k) \sim \mathrm{Dirichlet}(\alpha_1 b,\ \alpha_1(1-b),\ \alpha_2, \dots, \alpha_k)$, where $0 < b < 1$ and $\theta \sim \mathrm{Beta}(\alpha_1 b,\ \alpha_1(1-b))$ • This allows the dimensionality of the Dirichlet to be increased
  • 73. Intuition behind Infinite GMM (continued). Dirichlet Process: $\pi^{(2)} = (\pi^{(2)}_1, \pi^{(2)}_2) \sim \mathrm{Dir}(\frac{\alpha}{2}, \frac{\alpha}{2})$, then $\sim \mathrm{Dir}(\frac{\alpha}{4}, \frac{\alpha}{4}, \frac{\alpha}{4}, \frac{\alpha}{4})$, and in general $\sim \mathrm{Dir}(\frac{\alpha}{K}, \dots, \frac{\alpha}{K})$ with $K \to \infty$
  • 74. Dirichlet process. Chinese restaurant process and Indian buffet process [Figures: Chinese restaurant process in action; an Indian buffet process dish assignment and a binary matrix generated from the IBP]. In the Indian buffet process, the nth customer helps himself to each dish k with probability $m_k / n$, where $m_k$ is the number of times dish k was chosen, and then tries Poisson(α / n) new dishes.
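The Chinese restaurant process named on this slide can be simulated directly. The seating rule used here is the standard one (an existing table is chosen in proportion to its size, a new table in proportion to α); the function name and the parameter values are illustrative.

```python
import numpy as np

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one by one; return the list of table sizes."""
    rng = np.random.default_rng(seed)
    tables = []
    for n in range(n_customers):
        # With n customers already seated, existing table k is chosen with
        # probability m_k / (n + alpha), a new table with alpha / (n + alpha)
        weights = np.array(tables + [alpha], dtype=float)
        choice = rng.choice(len(weights), p=weights / weights.sum())
        if choice == len(tables):
            tables.append(1)
        else:
            tables[choice] += 1
    return tables

sizes = chinese_restaurant_process(1000, alpha=2.0)
print(len(sizes), sorted(sizes, reverse=True)[:5])
```

The number of occupied tables is not fixed in advance and grows slowly with the number of customers, which is the property an infinite mixture model relies on to choose the number of clusters.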
  • 76. Infinite GMM. Probabilistic modelling: • PyMC3 • TensorFlow. Scikit-learn: sklearn.mixture.BayesianGaussianMixture. Dirichlet Prior: • Dirichlet Distribution - Finite GMM • Dirichlet Process - Infinite GMM
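A minimal sketch of the scikit-learn route, assuming simulated time deltas around three periods. `n_components` acts only as an upper bound (a truncation level), and the 0.01 weight threshold used to call a component "active" is an arbitrary choice for this example.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Hypothetical time deltas around three periods
deltas = np.concatenate([
    rng.normal(0.8, 0.01, 400),
    rng.normal(1.6, 0.01, 400),
    rng.normal(2.4, 0.01, 200),
]).reshape(-1, 1)

# n_components is only an upper bound; the Dirichlet process prior can
# shrink the weights of components it does not need towards zero
model = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
)
model.fit(deltas)

active = model.weights_ > 0.01
print(active.sum(), np.sort(model.means_[active].ravel()))
```

The number of active components is not guaranteed to match the number of true periods; it depends on the priors and on the initialisation, so the result is worth inspecting rather than trusting blindly.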
  • 77. Probabilistic modeling • Probabilistic models capture the uncertainty in real world data better • But they are very computationally intensive • The sampling process takes time to stabilize before it generates meaningful results • They certainly cannot work on large datasets
  • 78. Applied Data Science for automated clustering
  • 82. Obtaining stochastic periods recursively. Flow: Get dense regions list (find dmin to cluster a region) → Unimodal? → if unimodal: Stochastic periods (get probability distributions from dense regions); if not unimodal: Recursively split regions (if region is {Heavy tailed, Multi-modal}). Kurtosis of normal distribution = 3. Heavy tailed: Excess Kurtosis > 6. $\mathrm{Bimodality} = \frac{\gamma^2 + 1}{\kappa}$, where γ is the skewness and κ the kurtosis. Bimodality for uniform distribution = 5/9. Not unimodal: Bimodality > 0.8
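The shape statistics used for the split decision can be sketched with plain NumPy. This version uses population moments and the non-excess kurtosis in the bimodality formula, which reproduces the 5/9 value for a uniform distribution; whether the talk applied a sample-size correction is not stated, so treat the exact formula as an assumption.

```python
import numpy as np

def shape_stats(x):
    """Return (skewness, kurtosis, bimodality coefficient) using population moments."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    skewness = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4)  # equals 3 for a normal distribution
    bimodality = (skewness ** 2 + 1) / kurtosis
    return skewness, kurtosis, bimodality

rng = np.random.default_rng(0)
samples = {
    "normal": rng.normal(size=100_000),
    "uniform": rng.uniform(size=100_000),
    "two modes": np.concatenate([rng.normal(-3, 0.5, 50_000), rng.normal(3, 0.5, 50_000)]),
}
for name, x in samples.items():
    skewness, kurtosis, bimodality = shape_stats(x)
    print(f"{name}: kurtosis={kurtosis:.2f} excess={kurtosis - 3:.2f} bimodality={bimodality:.3f}")
```

A region whose bimodality exceeds the 0.8 threshold, or whose excess kurtosis exceeds 6, would be a candidate for a further split under the rule on the slide.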
  • 83. dmin via distance matrix This topic was covered generically in detail during ODSC 2018 – “Topological space clustering”
  • 85. dmin via distance matrix. This topic was covered generically in detail during ODSC 2018 – “Topological space clustering”. If local dense regions exist along with sparsity, then we can obtain hierarchical clusters at each mode
  • 86. Density plot of distance matrix
  • 87. Method proposed: Finding optimal clustering-epsilon • The problem comes down to finding the optimal curve for the Gaussian kernel • One way to solve it algorithmically: Grid Search (band_width, grid_size) → rFFT → Silverman Transform → I-rFFT → Score (logLoss, stdDev) → Minima (band_width, grid_size)
  • 88. Genuine systematic DNS queries to DNS server Time delta analysis Time delta Density plot
  • 89. Stochastic periods: Systematic DNS queries

       Mean (μ)  Std. (σ)  Periodicity (μ/σ)  Skewness   Kurtosis   Bimodal Coeff  Num Points  Density      Range_min  Range_max
    0  0.809819  0.000499  1623.841793         0.252468  -0.628825  0.443460        330         330.000000  0.808578   0.810866
    1  0.812474  0.000559  1454.556761        -0.879146   0.291782  0.536779        817         817.000000  0.810869   0.813244
    2  0.813497  0.000141  5758.645345         0.130659  -1.036117  0.512333        426         426.000000  0.813245   0.813768
    3  0.814162  0.000326  2495.343595         0.774715  -0.464280  0.623092        281         281.000000  0.813769   0.815018
    4  1.622954  0.000631  2570.868659        -0.093343  -1.009738  0.504116        845         845.000000  1.621745   1.624109
    5  1.625497  0.000630  2578.372267        -0.304701  -0.984417  0.540452       1386        1386.000000  1.624114   1.626489
    6  1.627108  0.000496  3282.770498         0.992204   0.538372  0.558516        614         614.000000  1.626492   1.628858
    7  2.436156  0.000627  3885.490319         0.059122  -1.007464  0.492753        208         208.000000  2.434985   2.437341
    8  2.438674  0.000653  3733.232096        -0.230265  -0.988668  0.514873        269         269.000000  2.437374   2.439728
  • 90. THANKS E-mail: kuldeep.jiwani@gmail.com / kuldeep.Jiwani@thalesgroup.com LinkedIn: https://www.linkedin.com/in/kuldeep-jiwani-988605/
  • 92. Finite GMM: Bayesian setting. Algorithm: Collapsed Gibbs sampler for a finite Gaussian mixture model
        Choose an initial z
        For T iterations do                                      # Gibbs sampling iterations
          For i = 1 to N do
            Remove x_i's statistics from component z_i           # Old cluster assignment for x_i
            For k = 1 to K do                                    # Every possible component
              Calculate P(z_i = k | z_{\i}, α)
              Calculate p(x_i | X_{k\i}, β)
              Calculate P(z_i = k | z_{\i}, X, α, β) ∝ P(z_i = k | z_{\i}, α) p(x_i | X_{k\i}, β)
            End for
            Sample k_new from P(z_i | z_{\i}, X, α, β) after normalizing
            Add x_i's statistics to the component z_i = k_new    # New assignment for x_i
          End for
        End for
        Evaluation metric for Gibbs: [∏_{k=1}^{K} p(X_k | β)] p(z | α)
  • 93. Infinite GMM: Bayesian setting
        Choose an initial z
        For T iterations do                                      # Gibbs sampling iterations
          For i = 1 to N do
            Remove x_i's statistics from component z_i           # Old cluster assignment for x_i
            For k = 1 to K do                                    # Every possible component
              Calculate P(z_i = k | z_{\i}, α)
              Calculate p(x_i | X_{k\i}, β)
              Calculate P(z_i = k | z_{\i}, X, α, β) ∝ P(z_i = k | z_{\i}, α) p(x_i | X_{k\i}, β)
            End for
            Calculate P(z_i = k* | z_{\i}, α)                    # Consider a new component
            Calculate p(x_i | β)
            Calculate P(z_i = k* | z_{\i}, X, α, β) ∝ P(z_i = k* | z_{\i}, α) p(x_i | β)
            Sample k_new from P(z_i | z_{\i}, X, α, β) after normalizing
            If any component is empty, remove it and decrease K
            Add x_i's statistics to the component z_i = k_new    # New assignment for x_i
          End for
        End for

Editor's Notes

  1. 80 000 employees in 68 countries, a global company. Heavy investments in innovation every year to develop state-of-the-art technologies: 1Bn€ invested in self-funded R&D