Sessionisation via Stochastic periods for root event identification
Kuldeep Jiwani
ODSC India 2019
Thales Overview
From the Bottom of the Oceans… to the Depths of Space & Cyberspace
Key Digital Technologies
Thales: A Research and Development Powerhouse
6 times winner: 2012, 2013, 2015, 2016, 2017, 2018
Expertise in a uniquely broad range of technical domains, from science to systems, applied across businesses.
An extensive intellectual property portfolio of 20,500 patents.
Albert Fert: Scientific director of the CNRS/Thales joint physics unit and winner of the 2007 Nobel prize in physics.
Agenda
• Motivation for studying events
• Concept & purpose of Sessionisation
• Traditional approaches
• Real world case studies
• Applied Data Science way of doing Sessionisation
Events
• Orders placed in a market
• Sequence of user tweets
• User’s clicks on a website
• Activity update by an IoT device
• Network events on a router
• Network alarms in a network
Entities: Information flow
Events stream
Time
Time sequenced events
[Figure: events on a time axis, split into one lane per entity: Entity - 1, Entity - 2, Entity - 3, Entity - 4]
Sessions: Operations vs Data Science view
• Continuity in activity
• Chain of time sequenced events
[Figure: a timeline split into three Sessions; labels "Mean activity period" and "Gap >> Mean activity period"]
Time based correlation
Root event identification from session chain
Stochastic periods
Malicious Actors in the world of AI
• Orders placed in a market: Market manipulation
• Sequence of user tweets: Bot campaigns
• User’s clicks on a website: Fraudulent transactions
• Activity by an IoT device: Taking device control
• Network events on a router: Cyber attacks
Events stream as impulse train
[Figure: events stream on a time axis, redrawn as an impulse train with values 0 and 1 over time]
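A minimal sketch of how an event stream could be turned into such an impulse train. The function name `impulse_train`, the `bin_width` parameter and the sample timestamps are illustrative assumptions, not part of the talk.

```python
def impulse_train(event_times, bin_width=1.0):
    """Return a 0/1 list: 1 where at least one event falls in the time bin."""
    if not event_times:
        return []
    start = min(event_times)
    n_bins = int((max(event_times) - start) // bin_width) + 1
    train = [0] * n_bins
    for t in event_times:
        train[int((t - start) // bin_width)] = 1
    return train


# Hypothetical timestamps (in seconds) for one entity
events = [0.0, 2.1, 4.0, 6.2, 8.1]
train = impulse_train(events, bin_width=1.0)
print(train)
```

The bin width decides the time resolution: a bin wider than the true period would merge consecutive events into one impulse.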
Approaches for finding time based patterns
• Fourier transform
• Time period – Stochastic periods
• GMM (Gaussian Mixture Models)
• Infinite GMM (Gaussian Mixture Models)
• Non-parametric Bayesian methods
• Applied data science techniques
[Figure labels: Information, Complexity, Applied data science]
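Fourier transform intuition
Fourier series: Quick intro
$$P(t) = \frac{1}{2} a_0 + a_1 \cos \omega t + a_2 \cos 2\omega t + \dots + b_1 \sin \omega t + b_2 \sin 2\omega t + \dots$$
$$P(t) = \frac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos n\omega t + \sum_{n=1}^{\infty} b_n \sin n\omega t$$
$$f(t) \rightarrow P(t)$$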
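Fourier series: Quick intro
Squared error over one period (minimizing it also minimizes the RMSE, Root Mean Square Error):
$$E^2 = \int_0^T \left( f(t) - P(t) \right)^2 dt$$
Minimize the loss by setting the derivatives to zero:
$$\frac{\partial E^2}{\partial a_n} = 0, \qquad \frac{\partial E^2}{\partial b_n} = 0$$
$$a_n = \frac{2}{T} \int_0^T f(t) \cos n\omega t \, dt, \qquad b_n = \frac{2}{T} \int_0^T f(t) \sin n\omega t \, dt$$
$$a_0 = \frac{2}{T} \int_0^T f(t) \, dt, \qquad b_0 = 0$$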
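As a worked check of the coefficient formulas, the sketch below approximates a_n and b_n numerically for a square wave. The square wave, the period T = 2 and the midpoint-rule integration are illustrative choices, not taken from the talk.

```python
import math


def fourier_coefficients(f, T, n, steps=10000):
    """Approximate a_n and b_n over one period [0, T] with the midpoint rule."""
    omega = 2 * math.pi / T
    dt = T / steps
    a_n = b_n = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        a_n += f(t) * math.cos(n * omega * t) * dt
        b_n += f(t) * math.sin(n * omega * t) * dt
    return 2 / T * a_n, 2 / T * b_n


def square_wave(t, T=2.0):
    """1 during the first half of each period, 0 during the second half."""
    return 1.0 if (t % T) < T / 2 else 0.0


a0, _ = fourier_coefficients(square_wave, T=2.0, n=0)
a1, b1 = fourier_coefficients(square_wave, T=2.0, n=1)
print(a0, a1, b1)
```

For this wave the integrals can also be done by hand: a_0 = 1, a_1 = 0 and b_1 = 2/π, which the numerical values should match closely.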
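Fourier transform: Quick intro
Euler’s formula:
$$e^{j\omega t} = \cos \omega t + j \sin \omega t$$
Fourier Series:
$$P(t) = \sum_{n=-\infty}^{\infty} c_n e^{jn\omega t}, \qquad c_n = \frac{1}{T} \int_0^T f(t) \, e^{-jn\omega t} \, dt$$
Fourier Transform (CTFT):
$$F(j\omega) = \int_{-\infty}^{\infty} f(t) \, e^{-j\omega t} \, dt$$
Fourier Transform (DTFT):
$$X(j\omega) = \sum_{n=-\infty}^{\infty} x(n) \, e^{-jn\omega}$$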
Fourier Transform (DTFT): Impulse train
[Figure: two panels, “FT (Real): Magnitude” and “FT (Imaginary): Phase shift”]
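Plotting Fourier Transform in Python
import numpy as np
import matplotlib.pyplot as plt
# time_signal: 1-D NumPy array holding the impulse train (1 = event, 0 = no event)
N = time_signal.shape[0]
signal_fft = np.fft.fft(time_signal)
frequency_bins = np.fft.fftfreq(N)
half = N // 2  # integer division: N/2 is a float in Python 3 and cannot index a slice
fig, ax = plt.subplots(1, 2, figsize=(28, 7))
# Skip bin 0 (the 0th frequency) and keep only the positive frequencies
ax[0].plot(frequency_bins[1:half], np.abs(signal_fft.real[1:half]), 'g')
ax[1].plot(frequency_bins[1:half], signal_fft.imag[1:half], 'c')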
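Besides plotting, the spectrum can give a first guess for the period. The sketch below is an illustration under assumptions: it uses a naive pure-Python DFT so that it runs without dependencies (`numpy.fft.fft` computes the same quantity much faster), and the helper names and the 1% tolerance are hypothetical choices.

```python
import cmath


def dft(signal):
    """Naive O(N^2) discrete Fourier transform (same sign convention as numpy.fft.fft)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(signal))
            for k in range(n)]


def dominant_period(signal):
    """Period (in samples) of the lowest frequency that is about as strong as the peak."""
    n = len(signal)
    spectrum = dft(signal)
    # Skip bin 0 (the 0th frequency) and look at positive frequencies only.
    mags = [abs(spectrum[k]) for k in range(1, n // 2 + 1)]
    peak = max(mags)
    # A clean impulse train has equally strong harmonics, so take the lowest
    # frequency whose magnitude is within 1% of the peak.
    k_best = next(k for k, m in enumerate(mags, start=1) if m >= 0.99 * peak)
    return n / k_best


# Toy impulse train: one event every 5 samples, 60 samples long
toy_signal = [1 if i % 5 == 0 else 0 for i in range(60)]
period = dominant_period(toy_signal)
print(period)
```

On noisy, real traffic the peak is far less clean, which is the motivation for the time-domain analysis later in the talk.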
Case studies via public datasets
• Sessionisation is an essential activity in detecting malicious bot activities like Beaconing
• We will use the 6th dataset of the CTU-13 datasets for examples
• Provided by Czech Technical University (CTU)
• Traces captured from a malware attack executed in the university network
• The 6th dataset simulates a bot named DonBot; it attacks SVC services on Windows
• Dataset: https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-47/bro/conn.log
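A hedged sketch of how per-entity event streams could be pulled out of a Bro `conn.log`. It assumes the usual Bro layout (tab-separated rows, a `#fields` header line, columns named `ts`, `id.orig_h`, `id.resp_h`, `id.resp_p`); the function name and the sample rows are hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def events_by_entity(lines):
    """Group connection timestamps by (source IP, destination IP, destination port)."""
    fields = []
    grouped = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]          # column names follow the marker
        elif line and not line.startswith("#") and fields:
            row = dict(zip(fields, line.split("\t")))
            key = (row["id.orig_h"], row["id.resp_h"], row["id.resp_p"])
            grouped[key].append(float(row["ts"]))
    for stamps in grouped.values():
        stamps.sort()                              # time sequenced events per entity
    return dict(grouped)


# Tiny hand-written sample in the assumed layout (illustrative values only)
sample = [
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto",
    "10.5\tC1\t10.0.0.1\t1025\t10.0.0.53\t53\tudp",
    "2.0\tC2\t10.0.0.1\t1026\t10.0.0.53\t53\tudp",
    "3.0\tC3\t10.0.0.2\t1027\t10.0.0.9\t5678\ttcp",
]
streams = events_by_entity(sample)
print(streams)
```

With a real file, `events_by_entity(open(path))` would work the same way, since the function only iterates over lines.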
Case – 1: DonBot’s DNS queries to university DNS server
DonBot: DNS traces
[Figure: bot’s DNS event stream (zoomed) and its Fourier transform]
Stochastic periods: Introduction
• Analyze periodicity in time domain
• Compute consecutive time deltas
• Real world signals are noisy so time deltas will vary a lot
• If there is periodicity in the signal, time deltas will vary in a band
• The density plot of time deltas will show some high density regions
• We can learn a probability distribution for each high density region
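The steps above can be sketched as follows. This is a simplified illustration, not the method used in the talk: a fixed-width histogram stands in for the density plot, and `bin_width`, `min_share` and the timestamps are assumed values.

```python
from collections import Counter


def time_deltas(timestamps):
    """Consecutive time differences of a sorted event stream."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def dense_bins(deltas, bin_width, min_share=0.2):
    """Return {bin start: share of deltas} for bins holding at least min_share of the data."""
    counts = Counter(int(d // bin_width) for d in deltas)
    total = len(deltas)
    return {round(b * bin_width, 6): c / total
            for b, c in sorted(counts.items()) if c / total >= min_share}


# Hypothetical noisy beacon: roughly every 5 s, with one long pause
stamps = [0.0, 5.1, 9.9, 15.0, 20.2, 80.0, 85.1, 90.0]
deltas = time_deltas(stamps)
bands = dense_bins(deltas, bin_width=1.0)
print(bands)
```

The long pause of about 60 s lands in a sparse bin and is dropped, while the deltas around 5 s form the high density band.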
DonBot: DNS traces
MeanPeriod = μ
Periodicity = μ / σ
[Figure: bot’s DNS event stream with time delta analysis; Session-1, Session-2 and Session-3 marked]
[Figure: time delta density plot; the fitted PDF is labelled “Stochastic period”]
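A small sketch of the two summary numbers, MeanPeriod = μ and Periodicity = μ / σ, computed for one band of time deltas. The function name and both sample lists are illustrative assumptions.

```python
import statistics


def stochastic_period(deltas):
    """Summarise one dense region of time deltas as (mean period, periodicity)."""
    mu = statistics.mean(deltas)
    sigma = statistics.pstdev(deltas)
    periodicity = mu / sigma if sigma > 0 else float("inf")
    return mu, periodicity


regular = [5.0, 5.1, 4.9, 5.0, 5.0]      # tight band: strongly periodic
irregular = [1.0, 9.0, 3.0, 12.0, 0.5]   # wide spread: weakly periodic
mu_r, p_r = stochastic_period(regular)
mu_i, p_i = stochastic_period(irregular)
print(mu_r, p_r, mu_i, p_i)
```

A tight band gives a large μ / σ, so the ratio works as a score for how clock-like the entity behaves.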
Case – 2: DonBot’s data transfer via backdoor port (5678) with a malicious IP (91.212.135.158)
DonBot: Data transfer via backdoor
[Figure: DonBot’s backdoor traffic with time delta analysis]
[Figure: zoomed-in FT without the 0th frequency, with time delta analysis]
[Figure: time delta density plot; two regions labelled Stochastic Period - 1 and Stochastic Period - 2]
Case – 3: Genuine systematic DNS queries to university’s DNS server
Genuine systematic DNS queries to DNS server
[Figure: normal DNS queries with time delta analysis, annotated [0.8, 1.6, 2.4, 3.2]]
[Figure: FT without the 0th frequency, with time delta analysis]
FT: Only able to highlight the higher time periods
[Figure: time delta density plot; dense regions labelled A, B, C, D]
Case – 4: Just another interesting DNS pattern
Normal DNS queries
[Figure: FT without the 0th frequency, with time delta analysis]
[Figure: time delta density plot]
Auto discovering multiple distributions
• Gaussian Mixture Model (GMM): a probabilistic model of the data
• How to estimate the parameters? If the sources are known, it is easy
• Expectation Maximization
GMM – Gaussian Mixture Models
• Does soft clustering of data points instead of hard clustering
• In principle it is very similar to K-Means, but it works on probabilities
• K-Means: {P1 → C1, P2 → C2}, GMM: {P1 → [0.8, 0.1, 0.1], P2 → [0.05, 0.85, 0.1]}
• Problem with GMM & K-Means: We need to define “K”
• Techniques like the Elbow method, Silhouette, etc. are based on certain assumptions
• They cannot be applied in general for automated discovery of K
• Finding “K” automatically is a very hard problem to solve
[Figure: two panels, each showing clusters C1, C2, C3]
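Bayesian way of building models
$$P(\theta \mid X) = \frac{P(X \mid \theta) \, P(\theta)}{P(X)}$$
Posterior: $P(\theta \mid X)$, Likelihood: $P(X \mid \theta)$, Prior: $P(\theta)$, Evidence: $P(X)$
$P(\theta)$ is conjugate to $P(X \mid \theta)$: a prior from the family $A(\nu)$ gives a posterior from the same family, $A(\nu')$
For example:
$P(\theta) = \mathcal{N}(\theta \mid 0, 1)$ # Standard normal
$P(X \mid \theta) = \mathcal{N}(x \mid \theta, 1)$ # with 1 std. dev
$$P(\theta \mid X) \propto e^{-\frac{1}{2}(x - \theta)^2} \, e^{-\frac{1}{2}\theta^2}$$
$$P(\theta \mid X) \propto e^{-\left(\theta - \frac{x}{2}\right)^2}$$
$$P(\theta \mid X) = \mathcal{N}\left(\theta \,\middle|\, \frac{x}{2}, \frac{1}{2}\right)$$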
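The conjugate result can be checked numerically. The sketch below multiplies prior and likelihood on a grid and normalises; for an observation x the posterior mean should come out as x/2 and the variance as 1/2. The grid limits, the step count and x = 3 are arbitrary illustrative choices.

```python
import math


def grid_posterior(x, lo=-10.0, hi=10.0, steps=20001):
    """Normalise prior * likelihood on a grid; return posterior mean and variance."""
    d = (hi - lo) / (steps - 1)
    thetas = [lo + i * d for i in range(steps)]
    # Unnormalised posterior: N(x | theta, 1) * N(theta | 0, 1), constants dropped
    weights = [math.exp(-0.5 * (x - t) ** 2) * math.exp(-0.5 * t ** 2) for t in thetas]
    z = sum(weights)
    mean = sum(w * t for w, t in zip(weights, thetas)) / z
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, thetas)) / z
    return mean, var


post_mean, post_var = grid_posterior(x=3.0)
print(post_mean, post_var)
```

With a conjugate prior none of this numerical work is needed, which is why conjugacy matters for the samplers discussed later.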
Constructing GMM
Gaussian Mixture Model (GMM), K = 3
$$P(x \mid \theta) = \pi_1 \mathcal{N}(x \mid \mu_1, \Sigma_1) + \pi_2 \mathcal{N}(x \mid \mu_2, \Sigma_2) + \pi_3 \mathcal{N}(x \mid \mu_3, \Sigma_3)$$
Parametric setting:
$$\theta = \{\pi_1, \pi_2, \pi_3, \mu_1, \mu_2, \mu_3, \Sigma_1, \Sigma_2, \Sigma_3\}$$
Graphical model: t → x, where t is a latent variable taking values in [1, 2, 3] and $\pi_1 + \pi_2 + \pi_3 = 1$
$$p(t = c \mid \theta) = \pi_c, \qquad p(x \mid t = c, \theta) = \mathcal{N}(x \mid \mu_c, \Sigma_c)$$
Likelihood:
$$p(x \mid \theta) = \sum_{c=1}^{K} p(x \mid t = c, \theta) \, p(t = c \mid \theta)$$
Training GMM:
$$\max_{\theta} \prod_{i=1}^{N} p(x_i \mid \theta)$$
Subject to:
$$\sum_{k} \pi_k = 1; \quad \pi_k \ge 0; \quad \Sigma_k \succ 0$$
EM algorithm:
• E-step: Compute q(t), the distribution over t
• M-step: Update the Gaussian parameters to fit the points assigned to them
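A minimal one-dimensional sketch of the E-step and M-step, in plain Python. The starting means, the initial variance heuristic, the variance floor and the toy time deltas are all assumptions made for this illustration; a library implementation would be used in practice.

```python
import math
import statistics


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, mus, iterations=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(mus)
    mus = list(mus)
    # Assumed heuristic: start each component narrower than the whole data set
    variances = [statistics.pvariance(data) / k ** 2] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: q(t = c) for every point, i.e. the soft assignment
        resp = []
        for x in data:
            joint = [weights[c] * normal_pdf(x, mus[c], variances[c]) for c in range(k)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: refit each Gaussian to the points softly assigned to it
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / len(data)
            mus[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - mus[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, 1e-6)  # floor keeps the variance positive
    return weights, mus, variances, resp


# Hypothetical time deltas clustered around 0.8 s and 1.6 s
data = [0.8, 0.81, 0.79, 0.82, 0.8, 1.6, 1.62, 1.61, 1.59, 1.6]
weights, mus, variances, resp = em_gmm_1d(data, mus=[0.5, 2.0])
print(weights, mus)
```

Each row of `resp` is a soft assignment over the K components, in contrast to the single hard label that K-Means would return. K still has to be supplied by the caller.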
Sessionisation problem statement
Sessionisation: Data Science at scale
• In a real world scenario, be it web users over the internet, network hosts in an enterprise network, etc., one would need to apply Sessionisation on millions of entities
• So manual inspection based methods cannot be used
• We need a fully automated system to discover multiple “Stochastic Periods”
• We need to find the clusters automatically
Infinite GMM (Gaussian Mixture Models)
Based on Bayesian non-parametric approaches
Probabilistic programming
Logistic regression: $p(y_i = 1 \mid \beta) = \mathrm{sigmoid}(\beta_1 x_1 + \beta_2 x_2 + \beta_0)$, with $\beta \sim \mathcal{N}(\mu, \sigma)$
Sampling:
• MCMC
• Gibbs
Posterior distribution of β: $p(\exp(\beta_i) \mid X_{train}, y_{train})$
Credible interval [a, b]: $p(a \le \exp(\beta_i) \le b) = 0.95$
Dirichlet distribution
$$Dir(\theta \mid \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}, \qquad \sum_{k} \theta_k = 1, \quad \theta_k \ge 0$$
Multivariate Beta function:
$$B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}$$
Gamma function: $\Gamma(n) = (n - 1)!$ for positive integer n
K=3 Simplex: corners (1, 0, 0), (0, 1, 0), (0, 0, 1); example point (0.3, 0.2, 0.5)
Effects of varying α
[Figures: Dirichlet density on the simplex for α = (10, 10, 10) and for α = (0.1, 0.1, 0.1); the point (1/3, 1/3, 1/3) is marked]
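The effect of α can also be seen by sampling. The sketch below draws Dirichlet samples by normalising independent Gamma draws (a standard construction) and compares the average size of the largest component. The helper names, sample count and seed are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one point on the simplex by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_component(alpha, n=2000, seed=0):
    """Average of the largest coordinate over n samples."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, rng)) for _ in range(n)) / n


balanced = mean_largest_component((10, 10, 10))    # samples sit near (1/3, 1/3, 1/3)
sparse = mean_largest_component((0.1, 0.1, 0.1))   # samples sit near the corners
print(balanced, sparse)
```

Large equal α values pull samples towards the centre of the simplex, while values below 1 push almost all the mass onto a single component.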
Intuition behind Infinite GMM
Properties of Dirichlet distribution
• The Dirichlet distribution is conjugate to the Multinomial distribution
• The Dirichlet satisfies an expansion (or combination) rule. If
$$\pi = (\pi_1, \pi_2, \dots, \pi_k) \sim Dirichlet(\alpha_1, \alpha_2, \dots, \alpha_k)$$
then
$$(\pi_1 \theta, \; \pi_1 (1 - \theta), \; \pi_2, \dots, \pi_k) \sim Dirichlet(\alpha_1 b, \; \alpha_1 (1 - b), \; \alpha_2, \dots, \alpha_k)$$
where $0 < b < 1$ and $\theta \sim Beta(\alpha_1 b, \alpha_1 (1 - b))$
• This allows the dimensionality of the Dirichlet to be increased
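Dirichlet Process
$$\pi^{(2)} = (\pi_1^{(2)}, \pi_2^{(2)}) \sim Dir\left(\frac{\alpha}{2}, \frac{\alpha}{2}\right)$$
Expanding repeatedly:
$$Dir\left(\frac{\alpha}{4}, \frac{\alpha}{4}, \frac{\alpha}{4}, \frac{\alpha}{4}\right), \; \dots, \; Dir\left(\frac{\alpha}{K}, \dots, \frac{\alpha}{K}\right)$$
$K \to \infty$: Dirichlet process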
Indian buffet process (excerpt shown on the slide, partly cut off):
3. The nth customer helps himself to each dish with probability mk / … dish k was chosen.
4. He tries Poisson(α/n) new dishes.
[Figure: an example of Indian Buffet Process dish assignment and a binary matrix generated from IBP]
Chinese restaurant process
Chinese restaurant process in action
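A short simulation sketch of the Chinese restaurant process in its standard form: a new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to α. The function name, customer count and seed are illustrative assumptions.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one at a time and return the list of table sizes."""
    rng = random.Random(seed)
    tables = []
    for n in range(n_customers):           # n customers are already seated
        # Existing table k is chosen with probability size_k / (n + alpha),
        # a new table with probability alpha / (n + alpha).
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        for k, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[k] += 1
                break
        else:
            tables.append(1)               # no existing table picked: open a new one
    return tables


few = chinese_restaurant_process(500, alpha=0.5)
many = chinese_restaurant_process(500, alpha=20.0)
print(len(few), len(many))
```

The number of tables is not fixed in advance and grows with α, which is the property an infinite GMM relies on to avoid choosing K by hand.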
Distribution of Distributions
Mixture of Gaussians
Infinite GMM
Probabilistic
modelling:
• PyMC3
• TensorFlow
Scikit-learn:
sklearn.mixture.BayesianGaussianMixture
Dirichlet Prior:
• Dirichlet Distribution - Finite GMM
• Dirichlet Process - Infinite GMM
Probabilistic modeling
• Probabilistic models capture the uncertainty in real world data better
• But they are very computationally intensive
• The sampling process takes time to stabilize before it generates meaningful results
• This makes them impractical on large datasets
Applied Data Science for automated clustering
Finding dense regions
Obtaining stochastic periods recursively
[Cycle diagram with three stages]
Stochastic periods
• Get probability distributions from dense regions
Get dense regions list
• Find dmin to cluster a region
Recursively split regions
• If region is: {Heavy tailed, Multi-modal}
Heavy tailed test:
• Kurtosis of normal distribution = 3
• Heavy tailed: Excess Kurtosis > 6
Unimodal test:
$$Bimodality = \frac{\gamma^2 + 1}{\kappa}$$
• γ: Skewness, κ: Kurtosis
• Bimodality for uniform distribution = 5/9
• Decision “Unimodal?” (branches: Unimodal / Not unimodal), tested with Bimodality > 0.8
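A hedged sketch of the two split tests. It uses population moments and reads the 0.8 threshold as "not unimodal", which is an assumption about the decision box; the function names and the sample data are illustrative.

```python
def moments(values):
    """Return (skewness, kurtosis) from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2          # plain kurtosis: 3 for a normal distribution
    return skewness, kurtosis


def needs_split(values, tail_threshold=6.0, bimodal_threshold=0.8):
    """Return (split?, bimodality) for one region of time deltas."""
    skewness, kurtosis = moments(values)
    bimodality = (skewness ** 2 + 1) / kurtosis
    heavy_tailed = (kurtosis - 3) > tail_threshold   # excess kurtosis test
    return heavy_tailed or bimodality > bimodal_threshold, bimodality


uniform_grid = [i / 10000 for i in range(10001)]     # stands in for a uniform distribution
two_spikes = [0.8] * 50 + [1.6] * 50                 # clearly bimodal region
_, b_uniform = needs_split(uniform_grid)
split_spikes, b_spikes = needs_split(two_spikes)
print(b_uniform, b_spikes, split_spikes)
```

The uniform grid reproduces the 5/9 reference value, and the two-spike region scores 1.0 and is flagged for a further split.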
dmin via distance matrix
This topic was covered generically in detail during ODSC 2018 – “Topological space clustering”
If local dense regions exist along with sparsity, then we can obtain hierarchical clusters at each mode
[Figure: density plot of distance matrix]
Method proposed:
Finding optimal clustering-epsilon
• The problem comes down to finding the optimal curve for the Gaussian kernel
• One of the ways to solve it algorithmically:
Grid Search (band_width, grid_size) → rFFT → Silverman Transform → I-rFFT → Score (logLoss, stdDev) → Minima (band_width, grid_size)
Genuine systematic DNS queries to DNS server
[Figure: time delta analysis and time delta density plot]
Stochastic periods: Systematic DNS queries
     Mean (μ)  Std. (σ)  Periodicity (μ/σ)  Skewness   Kurtosis   Bimodal Coeff  Num Points  Density      Range_min  Range_max
0    0.809819  0.000499  1623.841793         0.252468  -0.628825  0.443460       330         330.000000   0.808578   0.810866
1    0.812474  0.000559  1454.556761        -0.879146   0.291782  0.536779       817         817.000000   0.810869   0.813244
2    0.813497  0.000141  5758.645345         0.130659  -1.036117  0.512333       426         426.000000   0.813245   0.813768
3    0.814162  0.000326  2495.343595         0.774715  -0.464280  0.623092       281         281.000000   0.813769   0.815018
4    1.622954  0.000631  2570.868659        -0.093343  -1.009738  0.504116       845         845.000000   1.621745   1.624109
5    1.625497  0.000630  2578.372267        -0.304701  -0.984417  0.540452       1386        1386.000000  1.624114   1.626489
6    1.627108  0.000496  3282.770498         0.992204   0.538372  0.558516       614         614.000000   1.626492   1.628858
7    2.436156  0.000627  3885.490319         0.059122  -1.007464  0.492753       208         208.000000   2.434985   2.437341
8    2.438674  0.000653  3733.232096        -0.230265  -0.988668  0.514873       269         269.000000   2.437374   2.439728
THANKS
E-mail: kuldeep.jiwani@gmail.com / kuldeep.Jiwani@thalesgroup.com
LinkedIn: https://www.linkedin.com/in/kuldeep-jiwani-988605/
Appendix
Finite GMM: Bayesian setting
Algorithm: Collapsed Gibbs sampler for a finite Gaussian mixture model
(z_{-i} and X_{k,-i} denote the assignments and the points of component k with x_i excluded)
Choose an initial z
For T iterations do                                        # Gibbs sampling iterations
    For i = 1 to N do
        Remove x_i’s statistics from component z_i         # Old cluster assignment for x_i
        For k = 1 to K do                                  # Every possible component
            Calculate P(z_i = k | z_{-i}, α)
            Calculate p(x_i | X_{k,-i}, β)
            Calculate P(z_i = k | z_{-i}, X, α, β) ∝ P(z_i = k | z_{-i}, α) p(x_i | X_{k,-i}, β)
        End for
        Sample k_new from P(z_i | z_{-i}, X, α, β) after normalizing
        Add x_i’s statistics to the component z_i = k_new  # New assignment for x_i
    End for
End for
Evaluation metric for Gibbs: [∏_{k=1}^{K} p(X_k | β)] p(z | α)
Infinite GMM: Bayesian setting
(z_{-i} and X_{k,-i} denote the assignments and the points of component k with x_i excluded; k* is a new component)
Choose an initial z
For T iterations do                                        # Gibbs sampling iterations
    For i = 1 to N do
        Remove x_i’s statistics from component z_i         # Old cluster assignment for x_i
        For k = 1 to K do                                  # Every possible component
            Calculate P(z_i = k | z_{-i}, α)
            Calculate p(x_i | X_{k,-i}, β)
            Calculate P(z_i = k | z_{-i}, X, α, β) ∝ P(z_i = k | z_{-i}, α) p(x_i | X_{k,-i}, β)
        End for
        Calculate P(z_i = k* | z_{-i}, α)                  # Consider a new component
        Calculate p(x_i | β)
        Calculate P(z_i = k* | z_{-i}, X, α, β) ∝ P(z_i = k* | z_{-i}, α) p(x_i | β)
        Sample k_new from P(z_i | z_{-i}, X, α, β) after normalizing
        If any component is empty, remove it and decrease K
        Add x_i’s statistics to the component z_i = k_new  # New assignment for x_i
    End for
End for
Simplex-2D

Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...gragchanchal546
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 

Recently uploaded (20)

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 

ODSC 2019: Sessionisation via stochastic periods for root event identification

  • 1. Sessionisation via Stochastic periods for root event identification Kuldeep Jiwani ODSC India 2019
  • 2. Thales Overview From the Bottom of the Oceans… to the Depths of Space & Cyberspace Key Digital Technologies
  • 3. Thales: A Research and Development Powerhouse 6 times winner 2012, 2013, 2015, 2016, 2017, 2018 Expertise in a uniquely broad range of technical domains, from science to systems, applied across businesses. An extensive intellectual property portfolio of 20,500 patents. Albert Fert Scientific director of the CNRS/Thales joint physics unit and winner of the 2007 Nobel prize in physics.
  • 4. Agenda • Motivation of studying events • Concept & purpose of Sessionisation • Traditional approaches • Real world case studies • Applied Data Science way of doing Sessionisation
  • 5. Events • Orders placed in a market • Sequence of user tweets • User’s clicks on a website • Activity update by an IoT device • Network events on a router • Network alarms in a network
  • 9. Time sequenced events Time Entity - 1 Entity - 2 Entity - 3
  • 10. Time sequenced events Time Entity - 1 Entity - 2 Entity - 3
  • 11. Time sequenced events Time Entity - 1 Entity - 2 Entity - 3 Entity - 4
  • 12. Sessions: Operations vs Data Science view Continuity in activity Mean activity period Gap >> Mean activity period Sessions Sessions Sessions
  • 13. Sessions: Operations vs Data Science view Continuity in activity Chain of time sequenced events Mean activity period Gap >> Mean activity period Sessions Sessions Sessions Time based correlation
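A minimal sketch of the "Gap >> Mean activity period" rule on the slide above. The function name `split_sessions`, the use of the median gap as a stand-in for the mean activity period, and the factor of 5 standing in for ">>" are illustrative assumptions, not values from the talk.

```python
import numpy as np

def split_sessions(timestamps, factor=5.0):
    """Split sorted event times into sessions at unusually large gaps."""
    ts = np.sort(np.asarray(timestamps, dtype=float))
    if ts.size < 2:
        return [ts]
    gaps = np.diff(ts)
    typical_gap = np.median(gaps)   # assumed proxy for the mean activity period
    # A session boundary sits after event i when gap i is far above the typical gap.
    boundaries = np.where(gaps > factor * typical_gap)[0] + 1
    return np.split(ts, boundaries)

events = [0.0, 1.0, 2.1, 3.0, 60.0, 61.0, 62.2, 200.0, 201.1]
sessions = split_sessions(events)
print([s.tolist() for s in sessions])
```

The median is used here only because a few very long gaps would otherwise inflate the threshold; a mean over within-session gaps would match the slide's wording more closely.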
  • 14. Root event identification from session chain
  • 15. Root event identification from session chain
  • 16. Root event identification from session chain
  • 17. Root event identification from session chain
  • 18. Root event identification from session chain
  • 19. Root event identification from session chain Stochastic periods
  • 20. Malicious Actors in the world of AI • Orders placed in a market: Market manipulation • Sequence of user tweets: Bot campaigns • User’s clicks on a website: Fraudulent transactions • Activity by an IoT device: Taking device control • Network events on a router: Cyber attacks
  • 21. Events stream as impulse train (figure: events over time, redrawn as a train of 0/1 impulses over time)
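One way to build such an impulse train from raw event timestamps, as a sketch. The helper name `to_impulse_train` and the `resolution` parameter (bin width in the same unit as the timestamps) are assumptions made for illustration.

```python
import numpy as np

def to_impulse_train(timestamps, resolution=1.0):
    """Return a 0/1 array with a 1 in every time bin that contains an event."""
    ts = np.asarray(timestamps, dtype=float)
    bins = np.floor((ts - ts.min()) / resolution).astype(int)
    train = np.zeros(bins.max() + 1, dtype=int)
    train[bins] = 1
    return train

train = to_impulse_train([10.0, 12.0, 14.0, 16.0], resolution=1.0)
print(train)
```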
  • 22. Approaches for finding time based patterns • Fourier transform • Time period – Stochastic periods • GMM (Gaussian Mixture Models) • Infinite GMM (Gaussian Mixture Models) • Non-parametric Bayesian methods • Applied data science techniques Information Complexity Applied data science
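  • 24. Fourier series: Quick intro Approximate $f(t) \to P(t)$, with $P(t) = \frac{1}{2}a_0 + a_1\cos\omega t + a_2\cos 2\omega t + \dots + b_1\sin\omega t + b_2\sin 2\omega t + \dots$, i.e. $P(t) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty} a_n\cos n\omega t + \sum_{n=1}^{\infty} b_n\sin n\omega t$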
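  • 25. Fourier series: Quick intro RMSE (Root Mean Square Error): $E^2 = \int_0^T (f(t) - P(t))^2\,dt$ Minimize RMSE loss, derivative = 0: $\frac{\partial E^2}{\partial a_n} = 0$, $\frac{\partial E^2}{\partial b_n} = 0$ This gives $a_n = \frac{2}{T}\int_0^T f(t)\cos n\omega t\,dt$, $b_n = \frac{2}{T}\int_0^T f(t)\sin n\omega t\,dt$, $a_0 = \frac{2}{T}\int_0^T f(t)\,dt$, $b_0 = 0$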
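The coefficient formulas above can be checked numerically. This sketch assumes a made-up test signal with known coefficients ($a_0 = 6$, $a_1 = 2$, $b_3 = 0.5$) and approximates each integral by a Riemann sum over one period; the helper names `a` and `b` are illustrative.

```python
import numpy as np

T = 2.0
omega = 2 * np.pi / T
t = np.linspace(0.0, T, 4096, endpoint=False)   # one full period, uniform samples
f = 3.0 + 2.0 * np.cos(omega * t) + 0.5 * np.sin(3 * omega * t)

def a(n):
    # (2/T) * integral over [0, T] of f(t) cos(n w t) dt, as a Riemann sum
    return 2.0 * np.mean(f * np.cos(n * omega * t))

def b(n):
    # (2/T) * integral over [0, T] of f(t) sin(n w t) dt, as a Riemann sum
    return 2.0 * np.mean(f * np.sin(n * omega * t))

print(round(a(0), 6), round(a(1), 6), round(b(3), 6), round(b(1), 6))
```

Note that the constant term of the signal is $a_0 / 2$, which is why a signal with offset 3 yields $a_0 = 6$.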
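  • 26. Fourier transform: Quick intro Euler’s formula: $e^{j\omega t} = \cos\omega t + j\sin\omega t$ Fourier Series: $P(t) = \sum_{n=-\infty}^{\infty} c_n e^{jn\omega t}$, $c_n = \frac{1}{T}\int_0^T f(t)\,e^{-jn\omega t}\,dt$ Fourier Transform (CTFT): $F(j\omega) = \int_{-\infty}^{\infty} f(t)\,e^{-j\omega t}\,dt$ Fourier Transform (DTFT): $X(j\omega) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-jn\omega}$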
  • 27. Fourier Transform (DTFT): Impulse train FT (Real): Magnitude FT (Imaginary): Phase shift
  • 28. Fourier Transform (DTFT): Impulse train FT (Real): Magnitude FT (Imaginary): Phase shift
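  • 29. Plotting Fourier Transform in Python
        import numpy as np
        import matplotlib.pyplot as plt
        # time_signal: 1-D array holding the impulse train
        N = time_signal.shape[0]
        signal_fft = np.fft.fft(time_signal)
        frequency_bins = np.fft.fftfreq(N)
        fig, ax = plt.subplots(1, 2, figsize=(28, 7))
        # Skip bin 0 and keep the positive frequencies; slice bounds must be integers, hence N // 2
        ax[0].plot(frequency_bins[1:N // 2], np.abs(signal_fft.real[1:N // 2]), 'g')
        ax[1].plot(frequency_bins[1:N // 2], signal_fft.imag[1:N // 2], 'c')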
  • 30. Case studies via public datasets • Sessionisation is an essential activity in detecting malicious bot activities like Beaconing • We will use the 6th dataset of the CTU-13 datasets for examples • Provided by Czech Technical University (CTU) • Traces captured from a malware attack executed in the university network • The 6th dataset simulates a bot named DonBot, which attacks SVC services on Windows • Dataset: https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-47/bro/conn.log
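A sketch of how a Bro `conn.log` could be read before any time analysis, assuming the usual Bro layout: tab-separated rows, with column names on the `#fields` header line. The helper `read_bro_log` and the sample rows are made up for illustration; the columns actually present in the CTU file should be checked against its own header.

```python
import io

def read_bro_log(handle):
    """Parse a tab-separated Bro log into a list of dicts keyed by the #fields header."""
    fields, rows = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#") and fields:
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows

# Tiny made-up sample in the Bro layout (values are illustrative, not from CTU-13).
sample = io.StringIO(
    "#separator \\x09\n"
    "#fields\tts\tid.orig_h\tid.resp_h\tid.resp_p\tproto\n"
    "1313743221.10\t10.0.0.5\t10.0.0.1\t53\tudp\n"
    "1313743222.72\t10.0.0.5\t10.0.0.1\t53\tudp\n"
)
rows = read_bro_log(sample)
# Keep the event times of one entity's DNS traffic (destination port 53).
dns_times = [float(r["ts"]) for r in rows if r["id.resp_p"] == "53"]
print(dns_times)
```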
  • 31. Case – 1: DonBot’s DNS queries to university DNS server
  • 35. Stochastic periods: Introduction • Analyze periodicity in the time domain • Compute consecutive time deltas • Real world signals are noisy, so time deltas will vary a lot • If there is periodicity in the signal, time deltas will vary within a band • The density plot of time deltas will show some high density regions • We can learn a probability distribution for each high density region
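The steps above can be sketched on synthetic data. Everything here is an assumption for illustration: the two noisy periods (0.8 s and 1.6 s), the histogram used as a simple stand-in for the density plot, and the "above average density" rule for picking dense regions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic event stream: a noisy 0.8 s period mixed with a noisy 1.6 s period.
true_deltas = np.concatenate([rng.normal(0.8, 0.01, 300), rng.normal(1.6, 0.01, 200)])
rng.shuffle(true_deltas)
timestamps = np.cumsum(true_deltas)

deltas = np.diff(timestamps)                       # consecutive time deltas
density, edges = np.histogram(deltas, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
# Bins whose density is above the average mark the high density regions.
dense_centers = centers[density > density.mean()]
print(dense_centers.round(2))
```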
  • 36. DonBot: DNS traces Bot’s DNS event stream Time delta analysis
  • 37. DonBot: DNS traces MeanPeriod = μ Periodicity = μ / σ Bot’s DNS event stream Time delta analysis
  • 38. DonBot: DNS traces MeanPeriod = μ Periodicity = μ / σ Bot’s DNS event stream Time delta analysis Session-1 Session-2 Session-3
  • 39. DonBot: DNS traces MeanPeriod = μ Periodicity = μ / σ Time delta analysis Time delta Density plot
  • 40. DonBot: DNS traces MeanPeriod = μ Periodicity = μ / σ Time delta analysis Time delta Density plot PDF: Stochastic period
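The two quantities on these slides, MeanPeriod = μ and Periodicity = μ / σ, computed for the time deltas of one dense region. The function name `stochastic_period` and both synthetic samples are assumptions; the point is only that a tight band of deltas gives a large μ / σ while irregular arrivals with the same mean do not.

```python
import numpy as np

def stochastic_period(deltas):
    """Return (mean period, periodicity = mean / std) for a set of time deltas."""
    deltas = np.asarray(deltas, dtype=float)
    mu = deltas.mean()
    sigma = deltas.std()
    return mu, mu / sigma

rng = np.random.default_rng(1)
regular = rng.normal(0.81, 0.0005, 500)     # tight band: strongly periodic
irregular = rng.exponential(0.81, 500)      # same mean, no periodicity
mu_r, p_r = stochastic_period(regular)
mu_i, p_i = stochastic_period(irregular)
print(round(mu_r, 3), round(p_r), round(mu_i, 3), round(p_i, 2))
```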
  • 41. Case – 2: DonBot’s data transfer via backdoor port (5678) with a malicious IP (91.212.135.158)
  • 42. DonBot: Data transfer via backdoor DonBot’s backdoor traffic Time delta analysis
  • 43. DonBot: Data transfer via backdoor DonBot’s backdoor traffic Time delta analysis
  • 44. DonBot: Data transfer via backdoor Zoomed in FT without 0th frequency Time delta analysis
  • 45. DonBot: Data transfer via backdoor Zoomed in FT without 0th frequency Time delta analysis
  • 46. DonBot: Data transfer via backdoor Time delta analysis Time delta Density plot
  • 47. DonBot: Data transfer via backdoor Time delta analysis Time delta Density plot
  • 48. DonBot: Data transfer via backdoor Stochastic Period - 1 Stochastic Period - 2 Time delta analysis Time delta Density plot
  • 49. Case – 3: Genuine systematic DNS queries to university’s DNS server
  • 50. Genuine systematic DNS queries to DNS server Normal DNS queries Time delta analysis
  • 51. Genuine systematic DNS queries to DNS server Normal DNS queries Time delta analysis [0.8, 1.6, 2.4, 3.2]
  • 52. Genuine systematic DNS queries to DNS server FT without 0th frequency Time delta analysis FT: Only able to highlight the higher time periods
  • 53. Genuine systematic DNS queries to DNS server Time delta analysis Time delta Density plot
  • 54. Genuine systematic DNS queries to DNS server Time delta analysis Time delta Density plot A B C D
  • 55. Case – 4: Just another interesting DNS pattern
  • 56. Normal DNS queries FT without 0th frequency Time delta analysis
  • 57. Normal DNS queries Time delta analysis Time delta Density plot
  • 58. Auto discovering multiple distributions GMM - Gaussian Mixture Models • Expectation Maximization • If sources are known, easy • How to estimate parameters?
  • 59. Auto discovering multiple distributions GMM - Gaussian Mixture Models • Gaussian Mixture Model (GMM): a probabilistic model of the data
  • 60. GMM – Gaussian Mixture Models • Does soft clustering of data points instead of hard clustering • In principle it is very similar to K-Means but works on probability • K-Means: {P1 → C1, P2 → C2}, GMM: {P1 → [0.8, 0.1, 0.1], P2 → [0.05, 0.85, 0.1]} over clusters C1, C2, C3 • Problem with GMM & K-Means: We need to define “K” • Techniques like Elbow method, Silhouette, etc. are based on certain assumptions • Cannot be applied in general for automated discovery of K • Finding “K” automatically is a very hard problem to solve
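  • 61. Bayesian way of building models $P(\theta|X) = \frac{P(X|\theta)\,P(\theta)}{P(X)}$, where $P(\theta|X)$ is the posterior, $P(X|\theta)$ the likelihood, $P(\theta)$ the prior and $P(X)$ the evidence. $P(\theta)$ is conjugate to $P(X|\theta)$ when the prior $A(\nu)$ and the posterior $A(\nu')$ lie in the same family $A$. For example: $P(\theta) = \mathcal{N}(\theta|0, 1)$ (standard normal) and $P(X|\theta) = \mathcal{N}(x|\theta, 1)$ (with 1 std. dev) give $P(\theta|X) \propto e^{-\frac{1}{2}(x-\theta)^2}\,e^{-\frac{1}{2}\theta^2}$, so $P(\theta|X) \propto e^{-(\theta - \frac{x}{2})^2}$ and $P(\theta|X) = \mathcal{N}(\theta|\frac{x}{2}, \frac{1}{2})$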
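The conjugate example above can be verified on a grid: multiply the prior by the likelihood, normalise numerically, and compare the resulting mean and variance with $\frac{x}{2}$ and $\frac{1}{2}$. The observation value `x = 1.4` and the grid limits are arbitrary choices for this sketch.

```python
import numpy as np

x = 1.4                                      # a single observation (arbitrary value)
theta = np.linspace(-6.0, 8.0, 20001)
dtheta = theta[1] - theta[0]

prior = np.exp(-0.5 * theta ** 2)            # N(theta | 0, 1), unnormalised
likelihood = np.exp(-0.5 * (x - theta) ** 2) # N(x | theta, 1), unnormalised
posterior = prior * likelihood
posterior /= posterior.sum() * dtheta        # numerical stand-in for dividing by P(X)

post_mean = np.sum(theta * posterior) * dtheta
post_var = np.sum((theta - post_mean) ** 2 * posterior) * dtheta
print(round(post_mean, 4), round(post_var, 4))
```

The second parameter of $\mathcal{N}(\theta|\frac{x}{2}, \frac{1}{2})$ is read here as a variance, which is what the exponent $-(\theta - \frac{x}{2})^2$ implies.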
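  • 62. Constructing GMM Gaussian Mixture Model (GMM): $P(x|\theta) = \pi_1\mathcal{N}(x|\mu_1, \Sigma_1) + \pi_2\mathcal{N}(x|\mu_2, \Sigma_2) + \pi_3\mathcal{N}(x|\mu_3, \Sigma_3)$, $\theta = \{\pi_1, \pi_2, \pi_3, \mu_1, \mu_2, \mu_3, \Sigma_1, \Sigma_2, \Sigma_3\}$ Parametric setting K = 3
  • 63. Constructing GMM Gaussian Mixture Model (GMM): $P(x|\theta) = \pi_1\mathcal{N}(x|\mu_1, \Sigma_1) + \pi_2\mathcal{N}(x|\mu_2, \Sigma_2) + \pi_3\mathcal{N}(x|\mu_3, \Sigma_3)$, $\theta = \{\pi_1, \pi_2, \pi_3, \mu_1, \mu_2, \mu_3, \Sigma_1, \Sigma_2, \Sigma_3\}$ Parametric setting K = 3 Graphical model t → x: $p(t = c|\theta) = \pi_c$, $p(x|t = c, \theta) = \mathcal{N}(x|\mu_c, \Sigma_c)$ t is a latent variable taking values in [1, 2, 3], with $\pi_1 + \pi_2 + \pi_3 = 1$ Likelihood: $p(x|\theta) = \sum_{c=1}^{K} p(x|t = c, \theta)\,p(t = c|\theta)$
  • 64. Constructing GMM Gaussian Mixture Model (GMM): $P(x|\theta) = \pi_1\mathcal{N}(x|\mu_1, \Sigma_1) + \pi_2\mathcal{N}(x|\mu_2, \Sigma_2) + \pi_3\mathcal{N}(x|\mu_3, \Sigma_3)$, $\theta = \{\pi_1, \pi_2, \pi_3, \mu_1, \mu_2, \mu_3, \Sigma_1, \Sigma_2, \Sigma_3\}$ Parametric setting K = 3 Graphical model t → x: $p(t = c|\theta) = \pi_c$, $p(x|t = c, \theta) = \mathcal{N}(x|\mu_c, \Sigma_c)$ t is a latent variable taking values in [1, 2, 3], with $\pi_1 + \pi_2 + \pi_3 = 1$ Likelihood: $p(x|\theta) = \sum_{c=1}^{K} p(x|t = c, \theta)\,p(t = c|\theta)$ Training GMM: $\max_\theta \prod_{i=1}^{N} p(x_i|\theta)$ subject to $\sum_k \pi_k = 1$; $\pi_k \ge 0$; $\Sigma_k \succ 0$ EM algorithm: • E-step: Compute q(t), the distribution over t • M-step: Update Gaussian params to fit the points assigned to them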
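A compact sketch of the EM loop described above, restricted to one dimension so that each $\Sigma_c$ is a scalar variance. The function name `fit_gmm_1d`, the quantile-based initialisation, the fixed iteration count and the synthetic data are assumptions for illustration, not the talk's implementation.

```python
import numpy as np

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def fit_gmm_1d(x, k, n_iter=100):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    x = np.asarray(x, dtype=float)
    pi = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means over the data
    var = np.full(k, x.var())
    for _ in range(n_iter):
        # E-step: q(t = c) for every point, i.e. its soft cluster assignment.
        joint = pi * normal_pdf(x[:, None], mu, var)        # shape (N, k)
        q = joint / joint.sum(axis=1, keepdims=True)
        # M-step: re-estimate each Gaussian from the points softly assigned to it.
        nk = q.sum(axis=0)
        pi = nk / x.size
        mu = (q * x[:, None]).sum(axis=0) / nk
        var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-12
    return pi, mu, var

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.8, 0.05, 400),
                       rng.normal(1.6, 0.05, 300),
                       rng.normal(2.4, 0.05, 300)])
weights, means, variances = fit_gmm_1d(data, k=3)
print(weights.round(2), means.round(2))
```

The rows of `q` are exactly the soft assignments of the form [0.8, 0.1, 0.1] mentioned earlier; K still has to be supplied by hand, which is the limitation the following slides address.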
  • 66. Sessionisation: Data Science at scale • In a real world scenario, be it web users over the internet, network hosts in an enterprise network, etc., one would need to apply Sessionisation on millions of entities • So manual inspection based methods cannot be used • We need a fully automated system to discover multiple “Stochastic Periods” • We need to find the clusters automatically
  • 67. Infinite GMM (Gaussian Mixture Models) Based on Bayesian non-parametric approaches
  • 68. Probabilistic programming Logistic regression: $p(y_i = 1 \mid \beta) = \mathrm{sigmoid}(\beta_1 x_1 + \beta_2 x_2 + \beta_0)$, $\beta \sim \mathcal{N}(\mu, \sigma)$ Sampling: • MCMC • Gibbs
  • 69. Probabilistic programming Logistic regression: $p(y_i = 1 \mid \beta) = \mathrm{sigmoid}(\beta_1 x_1 + \beta_2 x_2 + \beta_0)$, $\beta \sim \mathcal{N}(\mu, \sigma)$ Sampling: • MCMC • Gibbs Posterior distribution of $\beta$: $p(\exp(\beta_i) \mid X_{train}, y_{train})$ Credible interval $[a, b]$: $p(a \le \exp(\beta_i) \le b) = 0.95$
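  • 70. Dirichlet distribution $Dir(\theta|\alpha) = \frac{1}{B(\alpha)}\prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$, with $\sum_k \theta_k = 1$ and $\theta_k \ge 0$ $B(\alpha) = \frac{\prod_{i=1}^{K}\Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^{K}\alpha_i)}$ is the multivariate Beta function, and $\Gamma(n) = (n-1)!$ is the Gamma function for positive integer n
  • 71. Dirichlet distribution $Dir(\theta|\alpha) = \frac{1}{B(\alpha)}\prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$, with $\sum_k \theta_k = 1$ and $\theta_k \ge 0$ $B(\alpha) = \frac{\prod_{i=1}^{K}\Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^{K}\alpha_i)}$ is the multivariate Beta function, and $\Gamma(n) = (n-1)!$ is the Gamma function for positive integer n K=3 simplex with corners (1, 0, 0), (0, 1, 0), (0, 0, 1), centre $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ and example point (0.3, 0.2, 0.5) Effects of varying $\alpha$ (density plots): Dirichlet distribution with $\alpha = (10, 10, 10)$ and with $\alpha = (0.1, 0.1, 0.1)$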
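The effect of $\alpha$ can be seen by sampling, here with NumPy's `Generator.dirichlet`. The sample size and the summary statistic (the average largest coordinate) are choices made for this sketch: a value near 1/3 means samples sit near the centre of the simplex, a value near 1 means they sit near a corner.

```python
import numpy as np

rng = np.random.default_rng(0)
dense = rng.dirichlet([10, 10, 10], size=5000)       # large alpha
sparse = rng.dirichlet([0.1, 0.1, 0.1], size=5000)   # small alpha

# Every sample lies on the simplex: non-negative coordinates that sum to 1.
print(dense[0].round(3), dense[0].sum().round(6))
# Average largest coordinate for each setting of alpha.
dense_peak = dense.max(axis=1).mean()
sparse_peak = sparse.max(axis=1).mean()
print(round(dense_peak, 2), round(sparse_peak, 2))
```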
  • 72. Intuition behind Infinite GMM Properties of Dirichlet distribution • Dirichlet distribution is conjugate to Multinomial distribution • Dirichlet satisfies the expansion or combination rule: if $\pi = (\pi_1, \pi_2, \dots, \pi_k) \sim Dirichlet(\alpha_1, \alpha_2, \dots, \alpha_k)$, then $(\pi_1\theta, \pi_1(1-\theta), \pi_2, \dots, \pi_k) \sim Dirichlet(\alpha_1 b, \alpha_1(1-b), \alpha_2, \dots, \alpha_k)$, where $0 < b < 1$ and $\theta \sim Beta(\alpha_1 b, \alpha_1(1-b))$ • This allows increasing the dimensionality of the Dirichlet
  • 73. Intuition behind Infinite GMM Properties of Dirichlet distribution • Dirichlet distribution is conjugate to Multinomial distribution • Dirichlet satisfies the expansion or combination rule: if $\pi = (\pi_1, \pi_2, \dots, \pi_k) \sim Dirichlet(\alpha_1, \alpha_2, \dots, \alpha_k)$, then $(\pi_1\theta, \pi_1(1-\theta), \pi_2, \dots, \pi_k) \sim Dirichlet(\alpha_1 b, \alpha_1(1-b), \alpha_2, \dots, \alpha_k)$, where $0 < b < 1$ and $\theta \sim Beta(\alpha_1 b, \alpha_1(1-b))$ • This allows increasing the dimensionality of the Dirichlet Dirichlet Process: $\pi^{(2)} = (\pi_1^{(2)}, \pi_2^{(2)}) \sim Dir(\frac{\alpha}{2}, \frac{\alpha}{2})$, then $Dir(\frac{\alpha}{4}, \frac{\alpha}{4}, \frac{\alpha}{4}, \frac{\alpha}{4})$, up to $Dir(\frac{\alpha}{K}, \dots, \frac{\alpha}{K})$ as $K \to \infty$
  • 74. Dirichlet process • Chinese restaurant process (figure: Chinese restaurant process in action) • Indian buffet process (figure: example dish assignment and a binary matrix generated from the IBP): the nth customer helps himself to each dish k with probability $m_k / n$, where $m_k$ is the number of times dish k was chosen, and then tries Poisson($\alpha / n$) new dishes
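The slide only names the Chinese restaurant process, so the sketch below uses its standard textbook definition rather than anything from the talk: a new customer joins an occupied table with probability proportional to the number of people already there, or opens a new table with probability proportional to $\alpha$. The function name and parameter values are illustrative.

```python
import numpy as np

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one by one; return each customer's table and the table sizes."""
    rng = np.random.default_rng(seed)
    counts = []                        # customers per occupied table
    seats = []
    for n in range(n_customers):       # n customers are already seated
        # Existing table k: counts[k] / (n + alpha); new table: alpha / (n + alpha).
        probs = np.array(counts + [alpha], dtype=float) / (n + alpha)
        table = int(rng.choice(len(probs), p=probs))
        if table == len(counts):
            counts.append(1)           # open a new table
        else:
            counts[table] += 1
        seats.append(table)
    return seats, counts

seats, counts = chinese_restaurant_process(1000, alpha=1.0)
print(len(counts), sorted(counts, reverse=True)[:5])
```

The number of occupied tables is not fixed in advance and grows slowly with the number of customers, which is the behaviour an infinite mixture model relies on when it lets the data decide how many clusters to use.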
  • 76. Infinite GMM Probabilistic modelling: • PyMC3 • TensorFlow Scikit-learn: sklearn.mixture.BayesianGaussianMixture Dirichlet Prior: • Dirichlet Distribution - Finite GMM • Dirichlet Process - Infinite GMM
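A hedged sketch of the scikit-learn route named above, using `BayesianGaussianMixture` with a Dirichlet process prior on synthetic time deltas. The data, the upper bound of 10 components, the concentration value 0.01 and the 0.05 weight cut-off are all assumptions for illustration; note that scikit-learn fits this model by variational inference, not by sampling.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
deltas = np.concatenate([rng.normal(0.8, 0.02, 400),
                         rng.normal(1.6, 0.02, 300),
                         rng.normal(2.4, 0.02, 300)]).reshape(-1, 1)

model = BayesianGaussianMixture(
    n_components=10,                                    # upper bound, not the answer
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.01,                    # assumed value; favours few components
    max_iter=500,
    random_state=0,
).fit(deltas)

active = model.weights_ > 0.05                          # assumed cut-off for "in use"
print(int(active.sum()), np.sort(model.means_[active].ravel()).round(2))
```

Components whose weight collapses towards zero are effectively unused, so the count of active components serves as the discovered K.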
  • 77. Probabilistic modeling • Probabilistic models capture the uncertainty in real world data better • But they are very computationally intensive • The sampling process takes time to stabilize and then generate meaningful results • Certainly cannot work on large datasets
  • 78. Applied Data Science for automated clustering
  • 82. Obtaining stochastic periods recursively • Get dense regions list: find dmin to cluster a region • Recursively split regions: if a region is {Heavy tailed, Multi-modal} • Stochastic periods: get probability distributions from dense regions • Decision at each region: Unimodal? (Unimodal / Not unimodal) • Kurtosis of normal distribution = 3 • Heavy tailed: Excess Kurtosis > 6 • $Bimodality = \frac{\gamma^2 + 1}{\kappa}$, where $\gamma$ is the skewness and $\kappa$ the kurtosis • Bimodality for uniform distribution = 5/9 • Not unimodal: Bimodality > 0.8
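The split test on this slide can be sketched directly from its formulas. The function names `shape_stats` and `needs_split` are hypothetical; the thresholds (excess kurtosis > 6, bimodality > 0.8) are the ones quoted above, and $\kappa$ is taken on the scale where a normal distribution has kurtosis 3, which is the reading that reproduces 5/9 for a uniform distribution.

```python
import numpy as np

def shape_stats(x):
    """Return (skewness, kurtosis, bimodality); kurtosis of a normal is 3 on this scale."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4)
    return skew, kurt, (skew ** 2 + 1) / kurt

def needs_split(x, excess_kurtosis_max=6.0, bimodality_max=0.8):
    """Flag a region as heavy tailed or not unimodal, using the slide's thresholds."""
    _, kurt, bimodality = shape_stats(x)
    return bool((kurt - 3.0) > excess_kurtosis_max or bimodality > bimodality_max)

rng = np.random.default_rng(0)
uniform = rng.uniform(0, 1, 200_000)
two_peaks = np.concatenate([rng.normal(0.8, 0.01, 5000), rng.normal(1.6, 0.01, 5000)])

print(round(shape_stats(uniform)[2], 3), needs_split(uniform))
print(round(shape_stats(two_peaks)[2], 3), needs_split(two_peaks))
```

This is the plain population formula; small-sample implementations often add a finite-sample correction to the denominator, so values computed this way may differ slightly from a tabulated bimodality coefficient.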
  • 83. dmin via distance matrix This topic was covered generically in detail during ODSC 2018 – “Topological space clustering”
  • 84. dmin via distance matrix This topic was covered generically in detail during ODSC 2018 – “Topological space clustering”
  • 85. dmin via distance matrix This topic was covered generically in detail during ODSC 2018 – “Topological space clustering” If local dense regions exist along with sparsity, then we can obtain hierarchical clusters at each mode
  • 86. Density plot of distance matrix
  • 87. Method proposed: Finding optimal clustering-epsilon • The problem comes down to finding the optimal curve for the Gaussian kernel • One of the ways to solve it algorithmically: Grid Search (band_width, grid_size) → rFFT → Silverman Transform → I-rFFT → Score (logLoss, stdDev) → Minima (band_width, grid_size)
  • 88. Genuine systematic DNS queries to DNS server Time delta analysis Time delta Density plot
  • 89. Stochastic periods: Systematic DNS queries
        #  Mean (μ)  Std. (σ)  Periodicity (μ/σ)  Skewness   Kurtosis   Bimodal Coeff  Num Points  Density      Range_min  Range_max
        0  0.809819  0.000499  1623.841793         0.252468  -0.628825  0.443460       330         330.000000   0.808578   0.810866
        1  0.812474  0.000559  1454.556761        -0.879146   0.291782  0.536779       817         817.000000   0.810869   0.813244
        2  0.813497  0.000141  5758.645345         0.130659  -1.036117  0.512333       426         426.000000   0.813245   0.813768
        3  0.814162  0.000326  2495.343595         0.774715  -0.464280  0.623092       281         281.000000   0.813769   0.815018
        4  1.622954  0.000631  2570.868659        -0.093343  -1.009738  0.504116       845         845.000000   1.621745   1.624109
        5  1.625497  0.000630  2578.372267        -0.304701  -0.984417  0.540452       1386        1386.000000  1.624114   1.626489
        6  1.627108  0.000496  3282.770498         0.992204   0.538372  0.558516       614         614.000000   1.626492   1.628858
        7  2.436156  0.000627  3885.490319         0.059122  -1.007464  0.492753       208         208.000000   2.434985   2.437341
        8  2.438674  0.000653  3733.232096        -0.230265  -0.988668  0.514873       269         269.000000   2.437374   2.439728
  • 90. THANKS E-mail: kuldeep.jiwani@gmail.com / kuldeep.Jiwani@thalesgroup.com LinkedIn: https://www.linkedin.com/in/kuldeep-jiwani-988605/
  • 92. Finite GMM: Bayesian setting Algorithm: Collapsed Gibbs sampler for a finite Gaussian mixture model
        Choose an initial z
        For T iterations do                                  # Gibbs sampling iterations
          For i = 1 to N do
            Remove x_i’s statistics from component z_i       # Old cluster assignment for x_i
            For k = 1 to K do                                # Every possible component
              Calculate P(z_i = k | z_\i, α)
              Calculate p(x_i | X_k\i, β)
              Calculate P(z_i = k | z_\i, X, α, β) ∝ P(z_i = k | z_\i, α) p(x_i | X_k\i, β)
            End for
            Sample k_new from P(z_i | z_\i, X, α, β) after normalizing
            Add x_i’s statistics to the component z_i = k_new  # New assignment for x_i
          End for
        End for
        Evaluation metric for Gibbs: $\left[\prod_{k=1}^{K} p(X_k|\beta)\right] p(z|\alpha)$
  • 93. Infinite GMM: Bayesian setting
        Choose an initial z
        For T iterations do                                  # Gibbs sampling iterations
          For i = 1 to N do
            Remove x_i’s statistics from component z_i       # Old cluster assignment for x_i
            For k = 1 to K do                                # Every possible component
              Calculate P(z_i = k | z_\i, α)
              Calculate p(x_i | X_k\i, β)
              Calculate P(z_i = k | z_\i, X, α, β) ∝ P(z_i = k | z_\i, α) p(x_i | X_k\i, β)
            End for
            Calculate P(z_i = k* | z_\i, α)                  # Consider a new component
            Calculate p(x_i | β)
            Calculate P(z_i = k* | z_\i, X, α, β) ∝ P(z_i = k* | z_\i, α) p(x_i | β)
            Sample k_new from P(z_i | z_\i, X, α, β) after normalizing
            If any component is empty, remove it and decrease K
            Add x_i’s statistics to the component z_i = k_new  # New assignment for x_i
          End for
        End for

Editor's Notes

  1. 80 000 employees in 68 countries, a global company Heavy investments in innovation every year to develop state-of-the-art technologies: 1Bn€ invested in self-funded R&D