Dictyogram: a Statistical Approach for the
Definition and Visualization of Network Flow
Categories
David Muelas, Miguel Gordo, José Luis García-Dorado,
Jorge E. López de Vergara
Email: {dav.muelas, jl.garcia, jorge.lopez vergara}@uam.es,
miguel.gordo@estudiante.uam.es
Universidad Autónoma de Madrid
CNSM 2015 – November 2015
Network Health Check
Network managers must monitor the network's vital signs to make
sure it is healthy:
Figure: (a) ECG; (b) Dictyogram (normalized version), with categories Cat1 to Cat10 plotted on a 0 to 1 vertical axis versus time of day (00:00:00 to 23:20:00).
But... what exactly is a Dictyogram?
Dictyogram (from δίκτυο, "network" in Greek): a method to
graphically trace network flow behavior over time. The resulting
plot reads like an electrogram of the network, showing its
vital signs.
Introduction
Method definition
Experimental results
Conclusions
Outline
1 Introduction
Context
Our Goals
2 Method definition
Probability integral transform
Modeling CDFs
3 Experimental results
Model evaluation
Dictyogram visualization
4 Conclusions
D. Muelas, M. Gordo, J.L. García-Dorado, J.E. López de Vergara Dictyogram 4
Context
Flow-based network monitoring has proven useful for detecting
intrusions, malfunctions, and other types of anomalies.
Unfortunately, network managers have to deal with huge volumes of
measurement data, and interpreting them has become a challenge.
Data summaries: it is difficult to reach a good trade-off between
detail and simplification, since insufficient data can lead to
restricted or even erroneous conclusions.
Measurements alone are not enough for network management: applying
suitable techniques improves the quality and depth of the knowledge
that can be extracted from them.
Our Goals
Our proposal is intended to ease network managers' work with a
novel approach to studying the behavior of network flow
characteristics. Our main goal is to define comprehensive
summaries of network flow data:
Our approach is based on the study of the ECDFs (empirical
cumulative distribution functions) of different flow
characteristics, e.g., flow size or duration distributions.
Using those ECDFs, we define flow categories through the
probability integral transform, e.g., using decile-delimited
intervals.
As we will see, this approach improves the detection of network
anomalies and the visualization of network state.
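A minimal Python sketch of these two steps, assuming synthetic lognormal flow sizes (not the authors' data). The helper names `ecdf` and `decile_boundaries` are hypothetical, and the closed-on-the-right interval convention is our choice:

```python
import bisect
import random

def ecdf(sample):
    """Return F, where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n

def decile_boundaries(sample):
    """Return the 9 values splitting the sample into 10 equally likely intervals.

    Each boundary is the smallest observed x whose ECDF value reaches k/10.
    """
    ordered = sorted(sample)
    n = len(ordered)
    # (n * k + 9) // 10 is the integer ceiling of n * k / 10.
    return [ordered[(n * k + 9) // 10 - 1] for k in range(1, 10)]

# Synthetic flow sizes (assumption: lognormal, fixed seed for repeatability).
rng = random.Random(0)
sizes = [rng.lognormvariate(6.0, 2.0) for _ in range(1000)]

F = ecdf(sizes)
bounds = decile_boundaries(sizes)

# Category index 0..9: number of boundaries strictly below the flow size.
counts = [0] * 10
for s in sizes:
    counts[bisect.bisect_left(bounds, s)] += 1
print(counts)
```

By construction, each category holds roughly one tenth of the flows used to build the boundaries, so a later change in these proportions points to a change in the underlying distribution.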
Method description
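Probability integral transform:
Let $X$ be a continuous random variable with cumulative
distribution function $F_X$. Then $F_X(X)$ follows a uniform
distribution on [0, 1].
Figure: panels (a) and (b), relating each probability level $P_i$ (vertical axis, 0 to 1) to the category boundary $C_i = F_X^{-1}(P_i)$.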
Then, we define flow categories from a set of probability
levels, mapped through the CDF of a given flow characteristic.
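The reason equally spaced probability levels give equally likely categories can be written out in two lines. This is a short derivation, assuming $F_X$ is continuous and strictly increasing so that $F_X^{-1}$ exists, with levels $0 = P_0 < P_1 < \dots < P_k = 1$ (this indexing is ours, not the slides'):

```latex
\begin{align*}
P\bigl(F_X(X) \le u\bigr) &= P\bigl(X \le F_X^{-1}(u)\bigr)
  = F_X\bigl(F_X^{-1}(u)\bigr) = u, \qquad u \in (0, 1), \\
P\bigl(C_{i-1} < X \le C_i\bigr) &= F_X(C_i) - F_X(C_{i-1})
  = P_i - P_{i-1}, \qquad C_i = F_X^{-1}(P_i).
\end{align*}
```

With deciles, $P_i = i/10$, so each category is expected to hold 10% of the flows for as long as the traffic follows the modeled CDF.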
Keep an eye on the hypotheses!
Figure: panels (a) and (b) shown for (c) a Gaussian and (d) a Poisson distribution.
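The continuity hypothesis matters: for a discrete variable, $F_X(X)$ can only take a few values and is not uniform. A small sketch of this contrast, where the parameters (Gaussian with mean 30 and standard deviation 2, Poisson with rate 3) and the helper names are illustrative assumptions, not values taken from the slides:

```python
import math
import random
from statistics import NormalDist

def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), by direct summation."""
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total

def poisson_sample(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def decile_histogram(us):
    """Count how many values of u fall in each tenth of [0, 1]."""
    counts = [0] * 10
    for u in us:
        counts[min(int(u * 10), 9)] += 1
    return counts

rng = random.Random(1)
n = 20000

# Continuous case: F_X(X) should spread evenly over the ten bins.
gauss = NormalDist(mu=30.0, sigma=2.0)
u_gauss = [gauss.cdf(rng.gauss(30.0, 2.0)) for _ in range(n)]

# Discrete case: F_X(X) only takes the values F_X(0), F_X(1), ...
lam = 3.0
u_pois = [poisson_cdf(poisson_sample(rng, lam), lam) for _ in range(n)]

h_gauss = decile_histogram(u_gauss)
h_pois = decile_histogram(u_pois)
print(h_gauss)
print(h_pois)
```

In the Poisson case some decile bins stay empty while others are overloaded, so decile-based categories are only balanced when the characteristic is close to continuous.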
How can we model a CDF?
Glivenko-Cantelli theorem: the ECDF converges to the CDF
as the number of observations increases.
Nonetheless, computational cost increases when we
accumulate all the values of the characteristic under analysis.
Alternative approach: Functional Data Analysis:
Mean Function: $F_X^{\mathrm{mean}} = \frac{1}{n} \sum_{i=1}^{n} F_{X_i}$
Problem: not robust
Functional Depth:
Maximum depth observation.
Median Function (it is the function that maximizes the
functional depth we use).
Problem: more computationally expensive
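The robustness issue of the mean function can be illustrated with a short sketch. It assumes synthetic daily samples and ECDFs evaluated on a common grid. The pointwise median used here is only a simple robust stand-in chosen for illustration, not necessarily the depth-based median function of the slides, and all helper names are hypothetical:

```python
import bisect
import random
from statistics import mean, median

def ecdf_on_grid(sample, grid):
    """Evaluate the ECDF of sample at every point of grid."""
    ordered = sorted(sample)
    n = len(ordered)
    return [bisect.bisect_right(ordered, x) / n for x in grid]

def mean_function(curves):
    """Pointwise average of the curves: (1/n) * sum of F_{X_i}."""
    return [mean(col) for col in zip(*curves)]

def pointwise_median(curves):
    """Pointwise median of the curves (illustrative robust alternative)."""
    return [median(col) for col in zip(*curves)]

rng = random.Random(2)
grid = [10 ** (e / 4) for e in range(4, 25)]  # 10^1 .. 10^6, log-spaced

# Nine typical days plus one anomalous day with much larger flows.
days = [[rng.lognormvariate(5.0, 1.5) for _ in range(500)] for _ in range(9)]
days.append([rng.lognormvariate(9.0, 0.5) for _ in range(500)])

curves = [ecdf_on_grid(d, grid) for d in days]
f_mean = mean_function(curves)
f_median = pointwise_median(curves)

# grid[8] is 10^3 bytes: the anomalous day drags the mean down there.
print(round(f_mean[8], 3), round(f_median[8], 3))
```

A single atypical day shifts the mean curve noticeably, while the median stays close to the typical days, which is the trade-off between cost and robustness listed above.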
Dataset for the evaluation
To assess the advantages of our method, we have used a real
dataset:
Flow records from the Spanish Academic Network: more than one
million users, more than 7 years of data.
Exporters: 5 NetFlow exporters at different geographical
locations (all of them in Spain).
Packet-level sampling: one out of every 100 packets.
Period selected for the evaluation of the CDF estimation
methods: 30 days.
Analyzing ECDFs to get a model of the typical behavior
Figure axes: flow size in bytes (logarithmic scale, 10^1 to 10^8) versus P(X>x), from 0 to 1. Legend: Mean, Deepest, Median.
Data markers (flow size in bytes, P(X>x)): (40, 0.9), (44, 0.8), (53, 0.7), (80, 0.6), (149, 0.5), (501, 0.4), (1452, 0.3), (1500, 0.2), (3000, 0.1).
Figure: Comparison between observed CCDFs (orange line, no marker)
for Exporter A, and models obtained using the mean (blue line, circles),
deepest (black line, diamonds) and median (red line, triangles) functions.
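The data markers in the figure give one flow size per decile of the CCDF, which is enough to assign a category to any flow. A hedged sketch follows: the slide does not say which model curve the markers belong to, and the closed-on-the-right boundary convention and the name `flow_category` are our assumptions:

```python
import bisect

# Flow sizes in bytes read off the figure's data markers, ordered from
# P(X > x) = 0.9 down to P(X > x) = 0.1.
DECILE_BOUNDS = [40, 44, 53, 80, 149, 501, 1452, 1500, 3000]

def flow_category(size_bytes, bounds=DECILE_BOUNDS):
    """Return a category number in 1..10 for a flow size.

    Category i covers sizes in (bounds[i-2], bounds[i-1]]; category 1
    covers everything up to bounds[0] and category 10 everything above
    bounds[-1].
    """
    return bisect.bisect_left(bounds, size_bytes) + 1

examples = {size: flow_category(size) for size in (40, 41, 100, 1500, 5000)}
print(examples)
```

Note how narrow the first categories are (40 to 53 bytes covers three deciles): most flows are very small, so equal-probability categories are far from equal-width.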
Empirical comparison (I)
Figure panels: one per exporter (A, B, C, D, E), each with a horizontal axis from 0 to 30; vertical scales of 10^5 (A), 10^6 (B), 10^7 (C), 10^6 (D) and 10^6 (E). Legend: Mean, Deepest, Median.
Figure: Evolution of Pearson's test statistic for all exporters (lower is
better).
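For reference, the standard form of Pearson's statistic compares observed category counts with the counts expected under the model. The sketch below assumes that form, decile categories (expected probability 0.1 each), and made-up counts; the slides do not give the authors' exact computation:

```python
def pearson_statistic(observed, expected_probs):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over categories."""
    n = sum(observed)
    return sum(
        (o - n * p) ** 2 / (n * p)
        for o, p in zip(observed, expected_probs)
    )

uniform = [0.1] * 10  # decile categories under the modeled CDF

# Hypothetical per-category flow counts for two days (10,000 flows each).
typical_day = [1010, 990, 1005, 995, 1000, 1000, 998, 1002, 1003, 997]
shifted_day = [3000, 2500, 1500, 1000, 800, 500, 300, 200, 100, 100]

stat_typical = pearson_statistic(typical_day, uniform)
stat_shifted = pearson_statistic(shifted_day, uniform)
print(stat_typical, stat_shifted)
```

A day whose flows follow the model yields a small statistic, while a day dominated by small flows yields a much larger one, which is why lower values indicate a better CDF model.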
Empirical comparison (II)
Table: Summary of the evaluation of the different methods to estimate
the CDF.
Exporter  Method           # Best
A         Mean function    0
A         Deepest obs.     3
A         Median function  25
B         Mean function    0
B         Deepest obs.     6
B         Median function  22
C         Mean function    20
C         Deepest obs.     8
C         Median function  0
D         Mean function    0
D         Deepest obs.     23
D         Median function  5
E         Mean function    0
E         Deepest obs.     28
E         Median function  0
Final visualization of Dictyogram
Figure panels: (a) Mean, (b) Deepest Observation, (c) Median. Horizontal axis: time of day (03:00:00 to 21:00:00). Vertical axis: concurrent flows for each category (scale of 10^4). Markers 1 and 2 appear in each panel.
Figure: Dictyogram representation of $f_i(t)$, with the size
intervals delimited by the deciles given by (a) the mean, (b) the deepest
observed ECDF, and (c) the median.
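One way to obtain the series behind such a plot is to count, at each time instant, the flows that are active in each category. A minimal sketch under stated assumptions: the `(start, end, size)` record layout, the sampling step, and the function names are hypothetical, and the boundaries are the marker values from the flow size figure:

```python
import bisect

def dictyogram(flows, bounds, t_start, t_end, step):
    """Count concurrent flows per category at each sampling instant.

    flows: iterable of (start, end, size) tuples, times in seconds.
    Returns (times, rows), where rows[j][i] is the number of flows of
    category i that are active at times[j].
    """
    times = list(range(t_start, t_end, step))
    rows = [[0] * (len(bounds) + 1) for _ in times]
    for start, end, size in flows:
        cat = bisect.bisect_left(bounds, size)
        for idx, t in enumerate(times):
            if start <= t < end:
                rows[idx][cat] += 1
    return times, rows

def normalize(rows):
    """Turn counts into per-instant shares (the normalized version)."""
    out = []
    for row in rows:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out

bounds = [40, 44, 53, 80, 149, 501, 1452, 1500, 3000]
flows = [(0, 120, 40), (30, 90, 700), (60, 200, 5000), (100, 130, 60)]

times, rows = dictyogram(flows, bounds, 0, 180, 60)
shares = normalize(rows)
for t, row in zip(times, rows):
    print(t, row)
```

Plotting each column of `rows` against `times` gives one line per category; plotting `shares` instead gives a normalized view on a 0 to 1 scale. The nested loop is kept for clarity and would need an interval-based approach for real traffic volumes.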
Final visualization of Dictyogram
Figure axes: time of day (00:00:00 to 23:20:00) versus concurrent flows, from 0 to 4 x 10^4. Markers 1 and 2 are shown.
Figure: Zoom in on the median-based Dictyogram.
Key remarks
Our method:
Is manager friendly: it provides statistical summaries based
on certain probability levels, which eases the study of the
flows traversing the network.
Links statistical properties to time evolution: it eases the
detection of changes in the statistical properties of the
characteristics under analysis.
Improves network flow data visualization: it lets managers
control the resolution at which the distribution of network
flow characteristics is visualized.
Future work
We plan to:
Study how to summarize several different network behaviors in
a multivariate uniform distribution, and use other well-known
distributions (not only the uniform) for signatures.
Study the distribution of Pearson's test statistic to detect
anomalous events.
Test the stability of the CDF estimate (to define criteria
for recalibrating the model).
Explore other representations with higher dimensionality.
Thank you!
Questions?
Annex: Functional depth
We use the definition given by:
$MS_{n,H}(x) = \min\{SL_n(x), IL_n(x)\}$ (1)
where
$SL_n(x) = \frac{1}{n\lambda(I)} \sum_{i=1}^{n} \lambda\{t \in I : x(t) \le x_i(t)\}$
$IL_n(x) = \frac{1}{n\lambda(I)} \sum_{i=1}^{n} \lambda\{t \in I : x(t) \ge x_i(t)\}$ (2)
With it, we consider:
Maximum depth observation.
Median Function (it is the function that maximizes the
functional depth we use).
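On a discretized domain, the depth above can be approximated by replacing the measure ratio with the fraction of grid points where each inequality holds. A sketch assuming all curves are sampled on the same uniform grid; the function names and the three toy curves are illustrative:

```python
def half_region_depth(x, curves):
    """Approximate MS_{n,H}(x) = min(SL_n(x), IL_n(x)) on a uniform grid.

    x and every curve in curves are lists of values at the same grid
    points, so the measure ratio becomes a fraction of grid points.
    """
    n, m = len(curves), len(x)
    sl = sum(sum(1 for a, b in zip(x, c) if a <= b) for c in curves) / (n * m)
    il = sum(sum(1 for a, b in zip(x, c) if a >= b) for c in curves) / (n * m)
    return min(sl, il)

def deepest_observation(curves):
    """Return the index of the maximum depth curve and all depths."""
    depths = [half_region_depth(c, curves) for c in curves]
    best = max(range(len(curves)), key=depths.__getitem__)
    return best, depths

# Three toy ECDF-like curves on a 4-point grid.
low = [0.1, 0.2, 0.5, 1.0]
mid = [0.2, 0.4, 0.7, 1.0]
high = [0.4, 0.7, 0.9, 1.0]

best, depths = deepest_observation([low, mid, high])
print(best, depths)
```

The curve lying between the other two gets the highest depth, which matches the intuition of a "most central" observed ECDF. The cost grows with the square of the number of curves, in line with the computational concern raised earlier.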
D. Muelas, M. Gordo, J.L. Garc´ıa-Dorado, J.E. L´opez de Vergara Dictyogram 19

More Related Content

Similar to Dictyogram: a Statistical Approach for the Definition and Visualization of Network Flow Categories

Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...ISA Interchange
 
Multimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodMultimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution Method
IJERA Editor
 
Healthcare deserts: How accessible is US healthcare?
Healthcare deserts: How accessible is US healthcare?Healthcare deserts: How accessible is US healthcare?
Healthcare deserts: How accessible is US healthcare?
Data Con LA
 
Csmr10a.ppt
Csmr10a.pptCsmr10a.ppt
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
ssuser4b1f48
 
2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
Azad public school
 
High performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayesHigh performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayes
eSAT Journals
 
High performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayesHigh performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayes
eSAT Journals
 
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUESNEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
cscpconf
 
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUESNEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
csitconf
 
On Tracking Behavior of Streaming Data: An Unsupervised Approach
On Tracking Behavior of Streaming Data: An Unsupervised ApproachOn Tracking Behavior of Streaming Data: An Unsupervised Approach
On Tracking Behavior of Streaming Data: An Unsupervised Approach
Waqas Tariq
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classification
csandit
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
cscpconf
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
Ganesan Narayanasamy
 
Introduction to MARS (1999)
Introduction to MARS (1999)Introduction to MARS (1999)
Introduction to MARS (1999)Salford Systems
 
Credit Default Swap (CDS) Rate Construction by Machine Learning Techniques
Credit Default Swap (CDS) Rate Construction by Machine Learning TechniquesCredit Default Swap (CDS) Rate Construction by Machine Learning Techniques
Credit Default Swap (CDS) Rate Construction by Machine Learning Techniques
Zhongmin Luo
 
Beginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptxBeginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptx
Ishaq Khan
 
Detecting Discontinuties in Large Scale Systems
Detecting  Discontinuties in Large Scale SystemsDetecting  Discontinuties in Large Scale Systems
Detecting Discontinuties in Large Scale Systems
haroonmalik786
 

Similar to Dictyogram: a Statistical Approach for the Definition and Visualization of Network Flow Categories (20)

Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...
 
Multimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodMultimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution Method
 
Healthcare deserts: How accessible is US healthcare?
Healthcare deserts: How accessible is US healthcare?Healthcare deserts: How accessible is US healthcare?
Healthcare deserts: How accessible is US healthcare?
 
Csmr10a.ppt
Csmr10a.pptCsmr10a.ppt
Csmr10a.ppt
 
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
 
report
reportreport
report
 
2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
 
High performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayesHigh performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayes
 
High performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayesHigh performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayes
 
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUESNEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
 
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUESNEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
 
On Tracking Behavior of Streaming Data: An Unsupervised Approach
On Tracking Behavior of Streaming Data: An Unsupervised ApproachOn Tracking Behavior of Streaming Data: An Unsupervised Approach
On Tracking Behavior of Streaming Data: An Unsupervised Approach
 
Medical diagnosis classification
Medical diagnosis classificationMedical diagnosis classification
Medical diagnosis classification
 
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
MEDICAL DIAGNOSIS CLASSIFICATION USING MIGRATION BASED DIFFERENTIAL EVOLUTION...
 
CSMR10a.ppt
CSMR10a.pptCSMR10a.ppt
CSMR10a.ppt
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
 
Introduction to MARS (1999)
Introduction to MARS (1999)Introduction to MARS (1999)
Introduction to MARS (1999)
 
Credit Default Swap (CDS) Rate Construction by Machine Learning Techniques
Credit Default Swap (CDS) Rate Construction by Machine Learning TechniquesCredit Default Swap (CDS) Rate Construction by Machine Learning Techniques
Credit Default Swap (CDS) Rate Construction by Machine Learning Techniques
 
Beginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptxBeginner's Guide to Diffusion Models..pptx
Beginner's Guide to Diffusion Models..pptx
 
Detecting Discontinuties in Large Scale Systems
Detecting  Discontinuties in Large Scale SystemsDetecting  Discontinuties in Large Scale Systems
Detecting Discontinuties in Large Scale Systems
 

More from Jorge E. López de Vergara Méndez

On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...
Jorge E. López de Vergara Méndez
 
Evaluación de equipamiento de bajo coste para realizar medidas de red en ento...
Evaluación de equipamiento de bajo coste para realizar medidas de red en ento...Evaluación de equipamiento de bajo coste para realizar medidas de red en ento...
Evaluación de equipamiento de bajo coste para realizar medidas de red en ento...
Jorge E. López de Vergara Méndez
 
Análisis de Datos Funcionales para Gestión de Red: Téecnicas, Retos y Oportun...
Análisis de Datos Funcionales para Gestión de Red: Téecnicas, Retos y Oportun...Análisis de Datos Funcionales para Gestión de Red: Téecnicas, Retos y Oportun...
Análisis de Datos Funcionales para Gestión de Red: Téecnicas, Retos y Oportun...
Jorge E. López de Vergara Méndez
 
MONITORIZACIÓN Y ANÁLISIS DE TRÁFICO DE RED CON APACHE HADOOP
MONITORIZACIÓN Y ANÁLISIS DE TRÁFICO DE RED CON APACHE HADOOPMONITORIZACIÓN Y ANÁLISIS DE TRÁFICO DE RED CON APACHE HADOOP
MONITORIZACIÓN Y ANÁLISIS DE TRÁFICO DE RED CON APACHE HADOOP
Jorge E. López de Vergara Méndez
 
Merging heterogeneous network measurement data
Merging heterogeneous network measurement dataMerging heterogeneous network measurement data
Merging heterogeneous network measurement data
Jorge E. López de Vergara Méndez
 
Multimedia flow classification at 10 Gbps using acceleration techniques on co...
Multimedia flow classification at 10 Gbps using acceleration techniques on co...Multimedia flow classification at 10 Gbps using acceleration techniques on co...
Multimedia flow classification at 10 Gbps using acceleration techniques on co...
Jorge E. López de Vergara Méndez
 
Evaluating Quality of Experience in IPTV Services Using MPEG Frame Loss Rate
Evaluating Quality of Experience in IPTV Services Using MPEG Frame Loss RateEvaluating Quality of Experience in IPTV Services Using MPEG Frame Loss Rate
Evaluating Quality of Experience in IPTV Services Using MPEG Frame Loss Rate
Jorge E. López de Vergara Méndez
 
Defining ontologies for IP traffic measurements at MOI ISG
Defining ontologies for IP traffic measurements at MOI ISGDefining ontologies for IP traffic measurements at MOI ISG
Defining ontologies for IP traffic measurements at MOI ISG
Jorge E. López de Vergara Méndez
 
Integración semántica de información de distintos repositorios de medidas de red
Integración semántica de información de distintos repositorios de medidas de redIntegración semántica de información de distintos repositorios de medidas de red
Integración semántica de información de distintos repositorios de medidas de red
Jorge E. López de Vergara Méndez
 

More from Jorge E. López de Vergara Méndez (9)

On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...
 
Evaluación de equipamiento de bajo coste para realizar medidas de red en ento...
Evaluación de equipamiento de bajo coste para realizar medidas de red en ento...Evaluación de equipamiento de bajo coste para realizar medidas de red en ento...
Evaluación de equipamiento de bajo coste para realizar medidas de red en ento...
 
Análisis de Datos Funcionales para Gestión de Red: Téecnicas, Retos y Oportun...
Análisis de Datos Funcionales para Gestión de Red: Téecnicas, Retos y Oportun...Análisis de Datos Funcionales para Gestión de Red: Téecnicas, Retos y Oportun...
Análisis de Datos Funcionales para Gestión de Red: Téecnicas, Retos y Oportun...
 
MONITORIZACIÓN Y ANÁLISIS DE TRÁFICO DE RED CON APACHE HADOOP
MONITORIZACIÓN Y ANÁLISIS DE TRÁFICO DE RED CON APACHE HADOOPMONITORIZACIÓN Y ANÁLISIS DE TRÁFICO DE RED CON APACHE HADOOP
MONITORIZACIÓN Y ANÁLISIS DE TRÁFICO DE RED CON APACHE HADOOP
 
Merging heterogeneous network measurement data
Merging heterogeneous network measurement dataMerging heterogeneous network measurement data
Merging heterogeneous network measurement data
 
Multimedia flow classification at 10 Gbps using acceleration techniques on co...
Multimedia flow classification at 10 Gbps using acceleration techniques on co...Multimedia flow classification at 10 Gbps using acceleration techniques on co...
Multimedia flow classification at 10 Gbps using acceleration techniques on co...
 
Evaluating Quality of Experience in IPTV Services Using MPEG Frame Loss Rate
Evaluating Quality of Experience in IPTV Services Using MPEG Frame Loss RateEvaluating Quality of Experience in IPTV Services Using MPEG Frame Loss Rate
Evaluating Quality of Experience in IPTV Services Using MPEG Frame Loss Rate
 
Defining ontologies for IP traffic measurements at MOI ISG
Defining ontologies for IP traffic measurements at MOI ISGDefining ontologies for IP traffic measurements at MOI ISG
Defining ontologies for IP traffic measurements at MOI ISG
 
Integración semántica de información de distintos repositorios de medidas de red
Integración semántica de información de distintos repositorios de medidas de redIntegración semántica de información de distintos repositorios de medidas de red
Integración semántica de información de distintos repositorios de medidas de red
 

Recently uploaded

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 

Recently uploaded (20)

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 

Dictyogram: a Statistical Approach for the Definition and Visualization of Network Flow Categories

  • 1. Dictyogram: a Statistical Approach for the Definition and Visualization of Network Flow Categories David Muelas, Miguel Gordo, Jos´e Luis Garc´ıa-Dorado, Jorge E. L´opez de Vergara Email: {dav.muelas, jl.garcia, jorge.lopez vergara}@uam.es, miguel.gordo@estudiante.uam.es Universidad Aut´onoma de Madrid CNSM 2015 – November 2015
  • 2. Network Health Check. Network managers must monitor the network's vital signs to ensure that it is healthy. [Figure: (a) ECG; (b) Dictyogram (normalized version), with categories Cat1 to Cat10 plotted against time of day and a vertical axis from 0 to 1.] But... what exactly is Dictyogram?
  • 3. Dictyogram (from δίκτυο, "network" in Greek): a method to graphically trace network flow behavior over time. The resulting plot can be read like an electrogram of the network, showing its vital signs.
  • 4. Outline. 1 Introduction: Context; Our Goals. 2 Method definition: Probability integral transform; Modeling CDFs. 3 Experimental results: Model evaluation; Dictyogram visualization. 4 Conclusions. D. Muelas, M. Gordo, J.L. García-Dorado, J.E. López de Vergara. Dictyogram.
  • 5. Context. Network flow-based monitoring has proven useful for detecting network intrusions, malfunctions, and other types of anomalies. Unfortunately, network managers have to deal with huge volumes of measurement data, and interpreting them has become a challenge. With data summaries, it is difficult to reach a good trade-off between detail and simplification: insufficient data can lead to limited or even erroneous conclusions. Measurements are not the only thing that matters for network management: applying suitable techniques improves the quality and depth of the knowledge that can be extracted from them.
  • 6. Our Goals. Our proposal is intended to ease network managers' work through a novel approach to studying the behavior of network flow characteristics. Our main goal is to define comprehensive summaries of network flow data. Our approach is based on the study of the ECDFs of different flow characteristics (e.g., flow size or duration distributions). Using those ECDFs, we define flow categories by means of the probability integral transform (e.g., using decile-delimited intervals). As we will see, this approach improves the detection of network anomalies and the visualization of network state.
  • 7. Method description. Probability integral transform: let $X$ be a continuous random variable with cumulative distribution function $F_X$. Then $F_X(X)$ follows a uniform distribution on $[0, 1]$. [Figure, panels (a) and (b): each probability level $P_i$ on $[0, 1]$ is mapped to a category boundary $C_i = F_X^{-1}(P_i)$.] We then define flow categories from a set of probability levels, using the CDF of certain flow characteristics.
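The construction on this slide can be sketched in a few lines. This is a minimal illustration, assuming decile probability levels and a made-up sample of flow sizes; the function names `category_bounds` and `categorize` are hypothetical and do not come from the slides.

```python
from bisect import bisect_left

def category_bounds(sample, n_categories=10):
    """Return the boundaries C_i = F^{-1}(P_i) for P_i = i / n_categories.

    The empirical inverse CDF at level p is taken as the smallest
    observation x whose ECDF value is at least p.
    """
    ordered = sorted(sample)
    n = len(ordered)
    bounds = []
    for i in range(1, n_categories):
        # 1-based rank ceil(i * n / n_categories), using integer arithmetic
        rank = max(1, -(-i * n // n_categories))
        bounds.append(ordered[rank - 1])
    return bounds

def categorize(value, bounds):
    """Index of the interval (closed on the right) that contains `value`."""
    return bisect_left(bounds, value)

# Made-up sample: flow sizes 1..100 bytes (illustrative only).
flow_sizes = list(range(1, 101))
bounds = category_bounds(flow_sizes)

# Count how many flows fall into each of the ten categories.
counts = [0] * 10
for size in flow_sizes:
    counts[categorize(size, bounds)] += 1
```

Because the boundaries are deciles of the same sample, each category receives the same number of flows here, which is the uniformity that the probability integral transform predicts.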
  • 8. Keep an eye on the hypotheses! [Figures: the construction applied to (c) a Gaussian distribution and (d) a Poisson distribution, each shown with panels (a) and (b).]
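One way to read this slide, assuming its point is that the probability integral transform requires $X$ to be continuous: when a sample has many tied values, as a discrete variable does, the transformed values $F(X)$ take only a few distinct levels and cannot fill decile categories evenly. The sketch below uses invented samples, and the helper `ecdf_values` is hypothetical.

```python
from collections import Counter

def ecdf_values(sample):
    """Return F_n(x) for each observation x, with F_n(x) = #{obs <= x} / n."""
    n = len(sample)
    counts = Counter(sample)
    cumulative, running = {}, 0
    for value in sorted(counts):
        running += counts[value]
        cumulative[value] = running / n
    return [cumulative[x] for x in sample]

# Invented samples: one without ties, one discrete with heavy ties.
continuous_like = [0.3, 1.7, 2.2, 3.9, 4.1, 5.6, 6.0, 7.4, 8.8, 9.5]
discrete = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]

# Without ties the transformed values are evenly spread over (0, 1].
u_cont = sorted(ecdf_values(continuous_like))
# With ties only three distinct levels appear, so most deciles stay empty.
u_disc = sorted(set(ecdf_values(discrete)))
```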
  • 9. How can we model a CDF? Glivenko-Cantelli theorem: the ECDF converges to the CDF as the number of observations increases. Nonetheless, the computational cost grows when we accumulate all the values of the characteristic under analysis. Alternative approach: Functional Data Analysis. Mean function: $F_X^{\mathrm{mean}} = \frac{1}{n} \sum_{i=1}^{n} F_{X_i}$. Problem: not robust. Functional depth: maximum depth observation, or median function (the function that maximizes the functional depth we use). Problem: more computationally expensive.
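The mean function can be illustrated with a short sketch. It assumes that each $F_{X_i}$ is the ECDF of one observation period (for instance one day) and that all ECDFs are evaluated on a common grid; the samples, the grid, and the names `ecdf` and `mean_function` are invented for illustration.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the ECDF of `sample` as a function x -> #{obs <= x} / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def mean_function(samples, grid):
    """Pointwise average of the ECDFs of `samples`, evaluated on `grid`."""
    curves = [ecdf(s) for s in samples]
    return [sum(f(x) for f in curves) / len(curves) for x in grid]

# Invented data: three periods; the third contains one very large flow.
daily_samples = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 1, 2, 100]]
grid = [1, 2, 3, 4, 5]
mean_curve = mean_function(daily_samples, grid)
```

In this toy data the third period keeps the averaged curve below 1 at the right end of the grid, a small-scale picture of why a pointwise mean is sensitive to atypical periods.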
  • 10. Dataset for the evaluation. To assess the advantages of our method, we have used a real dataset. Flow records: Spanish Academic Network, with more than one million users and more than 7 years of data. Exporters: 5 NetFlow exporters at different geographical locations (all of them in Spain). Packet-level sampling: one out of every 100 packets. Period selected for the evaluation of the CDF estimation methods: 30 days.
  • 11. Analyzing ECDFs to get a model of the typical behavior. [Figure: CCDF $P(X > x)$ of flow size in bytes, on a logarithmic axis from $10^1$ to $10^8$, with series for Mean, Deepest, and Median. Marked points (x, P(X > x)): (40, 0.9), (44, 0.8), (53, 0.7), (80, 0.6), (149, 0.5), (501, 0.4), (1452, 0.3), (1500, 0.2), (3000, 0.1).] Figure: Comparison between observed CCDFs (orange line, no marker) for Exporter A and the models obtained using the mean (blue line, circles), deepest (black line, diamonds), and median (red line, triangles) functions.
  • 12. Empirical comparison (I). [Figure: one panel per exporter (A to E), horizontal axis from 0 to 30, series for Mean, Deepest, and Median; vertical scales of the order of $10^5$ (A), $10^6$ (B), $10^7$ (C), $10^6$ (D), and $10^6$ (E).] Figure: Evolution of Pearson's test statistic for all exporters (lower is better).
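The slides do not spell out how the statistic is computed. The following is a minimal sketch, assuming the standard Pearson chi-squared statistic $\sum_i (O_i - E_i)^2 / E_i$, where $O_i$ is the observed number of flows in category $i$ and, for decile categories, every expected count $E_i$ is one tenth of the total. The observed counts are invented.

```python
def pearson_statistic(observed, expected):
    """Pearson chi-squared statistic: sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented counts of flows per decile category for one period.
observed = [12, 9, 11, 8, 10, 10, 13, 7, 10, 10]
total = sum(observed)

# Assumption: under the model, each of the ten categories is equally likely.
expected = [total / len(observed)] * len(observed)

stat = pearson_statistic(observed, expected)
perfect = pearson_statistic(expected, expected)  # exact fit gives zero
```

A smaller value means that the category boundaries derived from the model split the observed flows more evenly, which matches the "lower is better" reading of the figure.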
  • 13. Empirical comparison (II). Table: Summary of the evaluation of the different methods to estimate the CDF.
      Exporter   Method            # Best
      A          Mean function     0
      A          Deepest obs.      3
      A          Median function   25
      B          Mean function     0
      B          Deepest obs.      6
      B          Median function   22
      C          Mean function     20
      C          Deepest obs.      8
      C          Median function   0
      D          Mean function     0
      D          Deepest obs.      23
      D          Median function   5
      E          Mean function     0
      E          Deepest obs.      28
      E          Median function   0
  • 14. Final visualization of Dictyogram. [Figure: three panels, (a) Mean, (b) Deepest Observation, (c) Median; horizontal axis: time of day, from 03:00:00 to 21:00:00; vertical axis: concurrent flows for each category, scale $\times 10^4$; points labeled 1 and 2 in each panel.] Figure: Dictyogram representation of $f_i(t)$, with the respective size intervals delimited by the deciles given by (a) the mean, (b) the deepest observed ECDF, and (c) the median.
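The series $f_i(t)$ plotted in the figure could be computed along the following lines. This is a sketch under stated assumptions: a flow counts as concurrent in a time bin if its lifetime overlaps that bin, categories are size intervals closed on the right, and the flow records, boundaries, and function names are all invented for illustration.

```python
from bisect import bisect_left

def dictyogram_counts(flows, bounds, bin_starts, bin_width):
    """Count active flows per category and time bin.

    flows: iterable of (start, end, size) tuples.
    Returns series[i][j] = number of flows of category i active in bin j.
    """
    n_categories = len(bounds) + 1
    series = [[0] * len(bin_starts) for _ in range(n_categories)]
    for start, end, size in flows:
        category = bisect_left(bounds, size)
        for j, t0 in enumerate(bin_starts):
            # The flow is active in the bin if it overlaps [t0, t0 + bin_width).
            if start < t0 + bin_width and end >= t0:
                series[category][j] += 1
    return series

def normalize(series):
    """Divide each bin by its total, giving per-category shares in [0, 1]."""
    totals = [sum(column) for column in zip(*series)]
    return [[(c / t if t else 0.0) for c, t in zip(row, totals)]
            for row in series]

# Invented flow records: (start second, end second, size in bytes).
flows = [(0, 50, 40), (10, 130, 1500), (70, 80, 40), (125, 170, 3000)]
bounds = [100, 2000]       # hypothetical boundaries, three categories
bin_starts = [0, 60, 120]  # three one-minute bins

series = dictyogram_counts(flows, bounds, bin_starts, 60)
shares = normalize(series)
```

Stacking the rows of `series` over time would give a plot of concurrent flows per category; `shares` corresponds to a normalized variant in which every bin sums to 1.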
  • 15. Final visualization of Dictyogram. [Figure: horizontal axis from 00:00:00 to 23:20:00; vertical axis from 0 to 4 $\times 10^4$; points labeled 1 and 2.] Figure: Zoom in on the median.
  • 16. Key remarks. Our method: Is manager-friendly: it provides statistical summaries based on certain probability levels, which eases the study of the flows traversing the network. Links statistical properties to time evolution: it eases the detection of changes in the statistical properties of the characteristics under analysis. Improves network flow data visualization: it allows the resolution at which the distribution of network flow characteristics is visualized to be controlled.
  • 17. Future work. We plan to: Study how to summarize several different network behaviors in a multivariate uniform distribution, and use other well-known distributions (not only the uniform) for signatures. Study the distribution of Pearson's test statistic to detect anomalous events. Test the stability of the CDF estimation (to define criteria for recalibrating the model). Explore other representations with higher dimensionality.
  • 18. Thank you! Questions?
  • 19. Annex: Functional depth. We use the definition given by $MS_{n,H}(x) = \min\{SL_n(x), IL_n(x)\}$ (1), where $SL_n(x) = \frac{1}{n\lambda(I)} \sum_{i=1}^{n} \lambda\{t \in I : x(t) \le x_i(t)\}$ and $IL_n(x) = \frac{1}{n\lambda(I)} \sum_{i=1}^{n} \lambda\{t \in I : x(t) \ge x_i(t)\}$ (2). With it, we consider: the maximum depth observation, and the median function (the function that maximizes the functional depth we use).
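Equations (1) and (2) can be approximated on a discrete grid. The sketch below assumes that every curve is sampled at the same equally spaced points of $I$, so that the fraction of grid points satisfying a condition stands in for $\lambda\{\cdot\} / \lambda(I)$. The curves and the names `modified_depth` and `deepest_observation` are invented for illustration.

```python
def modified_depth(curve, curves):
    """Grid approximation of MS_{n,H}(x) = min(SL_n(x), IL_n(x))."""
    n, m = len(curves), len(curve)
    # SL_n: average fraction of grid points where the curve lies at or below x_i.
    sl = sum(sum(1 for a, b in zip(curve, other) if a <= b)
             for other in curves) / (n * m)
    # IL_n: average fraction of grid points where the curve lies at or above x_i.
    il = sum(sum(1 for a, b in zip(curve, other) if a >= b)
             for other in curves) / (n * m)
    return min(sl, il)

def deepest_observation(curves):
    """Index of the observed curve with maximum depth."""
    depths = [modified_depth(c, curves) for c in curves]
    return max(range(len(curves)), key=depths.__getitem__)

# Invented ECDFs sampled on a common four-point grid.
curves = [
    [0.1, 0.3, 0.6, 1.0],
    [0.2, 0.5, 0.8, 1.0],
    [0.4, 0.7, 0.9, 1.0],
]
depths = [modified_depth(c, curves) for c in curves]
deepest = deepest_observation(curves)
```

In this toy example the middle curve has the largest depth, so it is the one selected as the maximum depth observation.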