The Tale of Heavy Tails in Computer Networking
Stenio Fernandes
CIn/UFPE, Recife, Brazil
Carleton University - ARS Lab – May 2016
Outline
• Essential Concepts and Terminology
  - The heavy-tail phenomenon
  - Outlier detection
  - Heavy-tailed distributions and their variations (subclasses)
• Evidence of Heavy-Tailedness in Computer Networks
  - Examples
Essential Concepts and Terminology
The heavy-tail phenomenon
• Heavy-tailedness in computer networking is like ninjas: it is everywhere (to borrow an Internet meme)
• Extreme observations must be handled carefully and taken very seriously
• A dataset that exhibits very large observation values makes descriptive and inferential statistical analysis much more difficult
• It might not make sense to use traditional statistical techniques and tools in these cases
• Some important initial questions:
  - Can we confidently discard single, scattered, or bursty observations that present extreme values due to uncontrolled factors?
  - Do the extreme values come from valid measurements?
Statistical black sheep
• This is what I call a Statistical Black Sheep
  - one that causes shame or embarrassment because of deviation from the accepted standards of his or her group (Black Sheep definition on M-W)
• You can either keep or discard such measurement values based on subjective analysis
  - e.g., when they are outside the scope of your interest
    - Ex.: the mean length of cat videos on YouTube
• Take-home lesson:
  - do not disgrace the black sheep without proper reasons
• The decision can also be based on rigorous statistical analysis
  - a quantitative analysis
  - Recall that an outlier might be influential in regression modeling (more on that later)
It starts with outliers
• Here is an outlier
• Here is another one!
Outliers
• An observation can be considered an outlier if it falls below or above certain limits
  - detection is only an indication that you might want to think carefully about such observations
• There are a number of formal tests and rules of thumb to detect outliers in an observed variable
  1. Grubbs' test
  2. Tietjen-Moore test
  3. Mahalanobis distance
  4. Extreme Value Theory (EVT)
  5. Generalized Extreme Studentized Deviate (ESD) test
• Try not to be too picky when choosing the method
  - outlier detection and handling is an art
  - a subjective approach plays an important role in accommodating outliers in your analysis
Outliers in Regression Models
Outliers
• Kurtosis: gives a concrete idea of the expected number of outliers
  - High (heavy tails, more outliers) or low (light tails, fewer outliers)
• A general approach for outlier detection
  - identify values far from the central values (in terms of $\sigma$)
  - A common and simple approach
    - define the fences as $\mu \pm 3\sigma$
    - $\mu$ is the sample mean
    - $\sigma$ is the sample standard deviation
    - (more conservative: use $4\sigma$)
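As a minimal sketch of the fence rule above, the following Python flags observations that fall outside the mean plus or minus a multiple of the sample standard deviation. The function names `sigma_fences` and `outside_fences` and the sample values are made up for illustration; they are not from the original talk.

```python
import statistics


def sigma_fences(values, width=3.0):
    """Return (lower, upper) fences at mean +/- width * sample standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mu - width * sigma, mu + width * sigma


def outside_fences(values, width=3.0):
    """Return the observations that fall below the lower or above the upper fence."""
    lower, upper = sigma_fences(values, width)
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements: 19 ordinary values and one extreme value.
data = [10, 11, 12, 13, 11, 12, 10, 12, 11, 13,
        12, 11, 10, 12, 13, 11, 12, 11, 12, 95]
outliers = outside_fences(data)           # default fences at 3 sigma
strict_outliers = outside_fences(data, 4.0)  # more conservative fences at 4 sigma
```

Keep in mind that an extreme value inflates both the mean and the standard deviation it is compared against, so in small samples this rule may flag nothing at all. That is one more reason to treat detection as an indication rather than a verdict.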
Outliers and Heavy-Tails
• Check whether many observations fall outside the fences
  - This might indicate that the underlying phenomenon generates heavy-tailed data
    - Your black sheep metamorphoses into a black swan
• If extreme values come from distributions with heavy tails (e.g., Weibull, Gamma, Pareto)
  - Such events are not so rare
  - They are likely to be part of the underlying phenomenon
• If you decide to keep the outliers
  - You recognize them as part of the underlying data generation process
  - You need to address them properly
• Why do we need to use other statistical measures when dealing with heavy-tailed distributions?
Moments of Heavy-Tailed Distributions
More on Heavy-Tails
• Classification
  - Light or thin tail
  - Fat or heavy tail
  - Long tail
• Light-tailed (thin-tailed) distributions are usually used as references
  - Normal and Exponential distributions
  - Definition: a probability distribution whose complementary CDF decays exponentially (or faster)
• Heavy-tailed distributions are the general ones
  - most formal analyses of heavy-tailed distributions deal with right heavy-tailed distributions with support $[0, \infty)$
  - the term "fat tail" is not well accepted by the traditional (and more formal) communities of statisticians and mathematicians, although it is widely used in finance
Some Formal Stuff
• Some intuition behind the concepts
  - A power law is a relation between two variables of the form $p(x) \propto c\,f(x)$, where $f(x)$ takes the general form $x^{-\alpha}$
  - There are dozens of power-law distributions
    - Zipf and Pareto are the best known in the computer networking field
  - They have interesting mathematical properties, e.g., their tails decay asymptotically according to the power parameter $\alpha$
• The Pareto distribution became famous for its ability to fit and model real-world problems well
  - The Pareto rule (or principle), aka the 80/20 rule, has been used to illustrate that phenomena of all sorts are far from following the Normal distribution.
    - It is clear that the normal is not being Normal!
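To make the 80/20 rule concrete: for a Pareto distribution with exponent $\alpha > 1$, the share of the total held by the top fraction $p$ of the population is $p^{1 - 1/\alpha}$. This is a standard property of the Pareto Lorenz curve and is not stated on the slides. The sketch below uses an illustrative helper, `top_share`, to show that $\alpha = \log_4 5 \approx 1.16$ reproduces the 80/20 split.

```python
import math


def top_share(p, alpha):
    """Share of the total held by the top fraction p under a Pareto law (alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1, so shares are undefined")
    return p ** (1.0 - 1.0 / alpha)


alpha_80_20 = math.log(5) / math.log(4)   # about 1.16
share = top_share(0.2, alpha_80_20)       # share held by the top 20%
```

Smaller values of $\alpha$ (closer to 1) concentrate even more of the total in the top few, which is the regime where averages stop being representative.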
Some Formalities
• A non-negative random variable $X$, either continuous or discrete, follows a power-law distribution if
  - $P[X \geq x] \sim c\,x^{-\alpha}$
  - where $c$ and $\alpha$ are the constant parameters that characterize the distribution.
    - $\alpha$ is known as the shape parameter (tail index). Both constants are positive.
• Heavy-tailedness
  - The tail of $F$ is denoted by $\bar{F}(x) = P(X > x)$, where $F$ is the distribution function of a random variable $X$.
    - $F$ is (right) heavy-tailed if $E[e^{\lambda X}] = \infty$ for all $\lambda > 0$.
    - The distribution is light-tailed when $E[e^{\lambda X}] < \infty$ for some $\lambda > 0$.
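As a worked illustration of this definition, compare an exponential and a Pareto random variable. The exponential rate $\theta$ and the Pareto parameters $k$ (minimum value) and $\alpha$ are introduced here only for the example.

```latex
% Exponential with rate \theta > 0:
% the expectation is finite for 0 < \lambda < \theta, so the tail is light.
\begin{align*}
E\left[e^{\lambda X}\right]
  &= \int_{0}^{\infty} e^{\lambda x}\,\theta e^{-\theta x}\,dx
   = \frac{\theta}{\theta - \lambda} < \infty
   \quad \text{for } 0 < \lambda < \theta .
\end{align*}
% Pareto with density \alpha k^{\alpha} x^{-\alpha-1} on x >= k:
% e^{\lambda x} outgrows any power of x, so the integrand diverges
% and the expectation is infinite for every \lambda > 0 (heavy tail).
\begin{align*}
E\left[e^{\lambda X}\right]
  &= \int_{k}^{\infty} e^{\lambda x}\,\alpha k^{\alpha} x^{-\alpha-1}\,dx
   = \infty
   \quad \text{for all } \lambda > 0 .
\end{align*}
```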
Some Formalities
• Long-tailedness
• $\bar{F}$ (the survival function) is long-tailed when
  - $\lim_{x \to \infty} \frac{\bar{F}(x+\lambda)}{\bar{F}(x)} = 1$, for all $\lambda > 0$
    - $\bar{F}$ is a non-increasing function, so the ratio never exceeds 1; for a long-tailed distribution it converges to 1.
  - If the tail of $F$ has a polynomial decay rate $-\alpha$ (where $\alpha$ is the tail index), its $k$-th moments are infinite for all $k > \alpha$.
• The Pareto case
  - One interesting property of a power-law distribution is that a log-log plot of its CCDF (a rank plot) should show a straight line
  - Its density function is given by:
    - $p(x) = \alpha k^{\alpha} x^{-\alpha-1}$, for $x \geq k$, where $k > 0$ is the minimum possible value of $X$
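The straight-line property can be checked numerically. The sketch below draws Pareto samples by inverse-transform sampling, builds the empirical rank plot, and fits a least-squares line in log-log space. All function names, the seed, and the parameter values ($\alpha = 1.5$, minimum value 1) are illustrative assumptions.

```python
import math
import random


def sample_pareto(n, alpha, k, rng):
    """Draw n Pareto(alpha, k) values by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], which avoids dividing by zero.
    return [k * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def loglog_ccdf_points(samples):
    """Return (ln x, ln P[X >= x]) pairs from the empirical rank plot."""
    ordered = sorted(samples, reverse=True)
    n = len(ordered)
    # The i-th largest value has empirical P[X >= x] = i / n.
    return [(math.log(x), math.log(i / n)) for i, x in enumerate(ordered, start=1)]


def least_squares_slope(points):
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = sample_pareto(5000, alpha=1.5, k=1.0, rng=rng)
slope = least_squares_slope(loglog_ccdf_points(samples))
# For Pareto data the fitted slope should be close to -alpha.
```

The few largest observations are the noisiest points of the rank plot, so a visual check of the line is a useful complement to the fitted slope.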
Some Formalities
• Pareto shows interesting features
  - If $\alpha \leq 1$, there is no finite first moment, i.e., its mean is infinite.
  - If $0 < \alpha \leq 2$, its variance is infinite (heavy tail).
  - A Pareto PDF is scale-free
    - In computer networking problems, it can capture self-similar (aka fractal) behavior in several layers of the protocol stack
• A log-log view of the Pareto PDF reveals, as expected, a straight line:
  - $\ln p(x) = (-\alpha - 1) \ln x + \alpha \ln k + \ln \alpha$
• The second and third terms of the equation are constants.
  - The relation between $\ln p(x)$ and $\ln x$ is linear, with slope $-\alpha - 1$.
    - A simple approach for estimating the shape parameter $\alpha$ is linear regression.
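The regression idea can be sketched as follows. To keep the example deterministic it evaluates the exact Pareto density on a grid instead of estimating it from data; with real measurements $p(x)$ would first have to be estimated (for instance from a histogram), which adds noise. The helper names and the parameter values are assumptions made for illustration.

```python
import math


def pareto_pdf(x, alpha, k):
    """Pareto density p(x) = alpha * k**alpha * x**(-alpha - 1), valid for x >= k."""
    return alpha * k ** alpha * x ** (-alpha - 1)


def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


true_alpha, k = 2.5, 1.0                   # illustrative parameter values
grid = [k * 1.5 ** i for i in range(20)]   # x values spread evenly in log space
log_x = [math.log(x) for x in grid]
log_p = [math.log(pareto_pdf(x, true_alpha, k)) for x in grid]

slope, intercept = fit_line(log_x, log_p)
alpha_hat = -slope - 1.0                   # the slope equals -(alpha + 1)
```

The intercept corresponds to the two constant terms, $\alpha \ln k + \ln \alpha$, so it can serve as a consistency check on the fitted exponent.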
Take-home lesson
• Some standard statistical practices and theories do not hold if the data follow a heavy-tailed distribution
  - The Law of Large Numbers (LLN) and the classical Central Limit Theorem (CLT) can fail when dealing with heavy-tailed distributions.
    - This happens when the first moment (for the LLN) or the second moment (for the CLT) is not finite, since finiteness of those moments is the fundamental assumption that supports them.
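A small simulation illustrates the point about the LLN. The sketch compares the sample mean of Pareto draws with a finite mean ($\alpha = 3$) against draws with an infinite mean ($\alpha = 0.8$). The function name, seed, sample size, and parameter values are illustrative assumptions.

```python
import random


def pareto_sample_mean(n, alpha, k, rng):
    """Sample mean of n Pareto(alpha, k) draws obtained by inverse-transform sampling."""
    total = 0.0
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        total += k * u ** (-1.0 / alpha)
    return total / n


rng = random.Random(7)  # fixed seed so the run is repeatable
n = 100_000

# alpha = 3: mean and variance are finite; the true mean is alpha * k / (alpha - 1) = 1.5.
finite_case = pareto_sample_mean(n, alpha=3.0, k=1.0, rng=rng)

# alpha = 0.8: the mean is infinite, so the sample mean has nothing to converge to.
infinite_case = pareto_sample_mean(n, alpha=0.8, k=1.0, rng=rng)
```

With other seeds or larger `n`, the $\alpha = 3$ mean should stay near 1.5, whereas the $\alpha = 0.8$ mean tends to keep drifting upward, dominated by a handful of huge observations.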
Evidence of Heavy-Tailedness in Computer Networks
Evidence of Heavy-tailedness
• Extreme events in nature occur at both micro and macro scales
• There are a number of case studies and much evidence of the occurrence of extreme events
  - nature (e.g., earthquakes, landslides, floods, droughts, storms)
  - human-induced catastrophes (e.g., spills, nuclear accidents, dam ruptures, power outages)
  - finance (e.g., wealth distribution, when the richest 0.1% hold 50% of the world's wealth)
  - geo- and socio-political areas (e.g., human fatalities in wars)
  - online social network phenomena (e.g., tweets like "the naked celebrity pics leak cracks down the Internet"), which are known to cause spikes in traffic from time to time
• Extreme events in computer networking have been studied (through measurement, modelling, and analysis) for decades
  - Unfortunately, a number of network engineers and researchers still do not treat such phenomena with care
Some Examples
• Power-law distributions in Internet measurements
  - web objects have a tight relation with long tails
    - Images, text, video, embedded code
  - modelling issues and implications for network planning and design (e.g., web caching architectures)
    - A question like "What is the average size of web objects in the Internet?" should not be answered by calculating the mean value!
• Recent studies in mobile environments
  - typical performance metrics follow heavy-tailed distributions
    - main object size
    - embedded object size
    - number of embedded objects in one request
    - embedded object inter-arrival time
    - session duration
    - interval between two consecutive requests (aka the reading time)
Some Examples
• Video systems
  - YouTube: the number of views can be modelled well by Zipf, Weibull, or Gamma distributions
    - Zipf-like distributions fit this popularity metric well in mobile environments
• Intriguing cases of heavy-tailedness in the Internet are in the network layer
  - strong evidence of heavy-tailedness for the sampled IP addresses
  - distributions of IP packets per aggregation all follow a power-law distribution
    - the number of packets per flow, unique address, or IP prefix
• Internet connectivity at several levels of aggregation can be modeled with heavy-tailed distributions
• P2P systems
  - video popularity
  - session duration
  - churn of peers
    - user arrival at and departure from the overlay network
  - Different studies have reported different distributions (just be careful with the choice)
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
 
Design and optimization of ion propulsion drone
Design and optimization of ion propulsion droneDesign and optimization of ion propulsion drone
Design and optimization of ion propulsion drone
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 
CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 

The tale of heavy tails in computer networking

  • 1. The Tale of Heavy Tails in Computer Networking Stenio Fernandes CIn/UFPE, Recife, Brazil Carleton University - ARS Lab – May 2016
  • 2. Outline
      - Essential Concepts and Terminology
          - The heavy-tail phenomenon
          - Outlier detection
          - Heavy-tailed distributions and their variations (subclasses)
      - Evidence of Heavy-Tailedness in Computer Networks
          - Examples
  • 3. Essential Concepts and Terminology
  • 4. The heavy-tail phenomenon
      - Heavy-tailedness in computer networking is like ninjas: they're everywhere (Internet meme)
      - Extreme observations must be handled carefully and taken very seriously
      - A dataset that exhibits very large observation values makes descriptive and inferential statistical analysis much more difficult
      - It might not make sense to use traditional statistical techniques and tools in these cases
      - Some important initial questions:
          - Can we confidently discard single, scattered, or bursts of observations that present extreme values due to uncontrolled factors?
          - Do the extreme values come from valid measurements?
  • 5. Statistical black sheep
      - This is what I call a Statistical Black Sheep
          - one that causes shame or embarrassment because of deviation from the accepted standards of his or her group (Black Sheep definition on M-W)
      - You can either keep or discard such measurement values based on subjective analysis
          - It is out of the scope of your interest
              - Ex.: mean length of cat videos on YouTube
      - Take-home lesson:
          - do not disgrace the black sheep without proper reasons
      - The decision can also be based on rigorous statistical analysis
          - a quantitative analysis
          - Recall that an outlier might be influential in regression modeling (more on that later)
  • 6. It starts with outliers
      - Here is an outlier
      - Here is another one!
  • 7. Outliers
      - An observation can be considered an outlier if it falls below or above certain limits
          - detection is only an indication that you might want to think carefully about such observations
      - There are a number of formal tests and rules of thumb to detect outliers in an observation variable
          1. Grubbs' test
          2. Tietjen-Moore test
          3. Mahalanobis distance
          4. Extreme Value Theory (EVT)
          5. Generalized Extreme Studentized Deviate (ESD)
      - Try not to be too picky when choosing the method
          - simply because outlier detection and handling is an art
          - a subjective approach plays an important role in accommodating outliers in your analysis
  • 9. Outliers
      - Kurtosis: gives a concrete idea of the expected number of outliers
          - High (heavy tails, more outliers) or low (light tails, fewer outliers)
      - A general approach for outlier detection
          - identify values far from the central values (in terms of $\sigma$)
          - A common and simple approach
              - define the fences as $\mu \pm 3\sigma$
              - $\mu$ is the sample mean
              - $\sigma$ is the sample standard deviation (more conservative: use $4\sigma$)
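The fence rule above can be sketched in a few lines of Python. This is only an illustration: the function names `sigma_fences` and `outliers`, the synthetic Gaussian sample, and the injected value 500.0 are assumptions made for the example, not part of the original slides.

```python
import random
import statistics


def sigma_fences(values, k=3.0):
    """Return (lower, upper) fences at the sample mean -/+ k sample standard deviations."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return mu - k * sigma, mu + k * sigma


def outliers(values, k=3.0):
    """Return the observations that fall outside the k-sigma fences."""
    lower, upper = sigma_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Synthetic data (assumption): 1000 Gaussian observations plus one extreme value.
rng = random.Random(42)
sample = [rng.gauss(100.0, 10.0) for _ in range(1000)] + [500.0]
flagged = outliers(sample)
print(f"fences: {sigma_fences(sample)}, flagged: {len(flagged)} observation(s)")
```

Passing `k=4.0` gives the more conservative fences. Note that the extreme value itself inflates the sample standard deviation, which is one reason the rule becomes unreliable once the data are genuinely heavy-tailed.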
  • 10. Outliers and Heavy-Tails
      - Verify whether there are many observations outside the fences
          - This might indicate that the underlying phenomenon generates heavy-tailed data
              - Your black sheep metamorphoses into a black swan
      - If extreme values come from distributions with heavy tails (e.g., Weibull, Gamma, Pareto)
          - Such events are not so rare
          - They are likely to be part of the underlying phenomenon
      - If you decide to keep the outliers
          - You recognize them as part of the underlying data generation process
          - You need to address them properly
      - Why do we need to use other statistical measures when dealing with heavy-tailed distributions?
  • 11. Moments of Heavy-Tailed Distributions
  • 12. More on Heavy-Tails
      - Classification
          - Light or thin tail
          - Fat or heavy tail
          - Long tail
      - Light-tailed (thin-tailed) distributions are commonly used as references
          - Normal and Exponential distributions
          - Definition: a probability distribution whose complementary CDF decays exponentially (or faster)
      - Heavy-tailed distributions are the general ones
          - most formal analyses of heavy-tailed distributions deal with right heavy-tailed distributions with support $[0, \infty)$
          - the term "fat tail" is not well accepted by the traditional (and more formal) communities of statisticians and mathematicians, although it is widely used in finance
  • 13. Some Formal Stuff
      - Some intuition behind the concepts
          - A power law is a relation between two variables of the form $p(x) \propto c\,f(x)$, where $f(x)$ takes the general form $x^{-\alpha}$
          - There are dozens of power-law distributions
              - Zipf and Pareto are the best known in the computer networking field
          - They have interesting mathematical properties, e.g., their tails decay asymptotically according to the power parameter $\alpha$
      - The Pareto distribution became famous for its ability to fit and model real-world problems well
          - The Pareto rule (or principle), aka the 80/20 rule, has been used to illustrate that phenomena of all sorts run far from the Normal distribution.
              - It is clear that the normal is not being Normal!
  • 14. Some Formalities
      - A non-negative random variable $X$, either continuous or discrete, follows a power-law distribution if
          - $P[X \ge x] \sim c\,x^{-\alpha}$
          - where $c$ and $\alpha$ are the constant parameters that characterize the distribution.
              - $\alpha$ is known as the scaling exponent (tail index). Both constants are positive.
      - Heavy-tailedness
          - The tail $F(x, \infty)$ of a distribution is denoted by $\bar{F}(x) = P(X > x)$, where $F$ is the distribution function of a random variable $X$.
              - $F$ is (right) heavy-tailed if $E[e^{\lambda X}] = \infty$ for all $\lambda > 0$.
              - The distribution is light-tailed when $E[e^{\lambda X}] < \infty$ for some $\lambda > 0$.
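As a worked check of the heavy-tail criterion above (a sketch added for illustration, not taken from the slides), compare an exponential distribution with rate $\theta$ (the symbol $\theta$ is an assumption of this example) against a generic power-law tail. The second line uses only the fact that $e^{\lambda X} \ge e^{\lambda x}$ on the event $\{X \ge x\}$.

```latex
% Exponential with rate \theta: the expectation is finite for small enough \lambda, hence light-tailed.
E[e^{\lambda X}] = \int_0^\infty e^{\lambda x}\,\theta e^{-\theta x}\,dx
                 = \frac{\theta}{\theta - \lambda} < \infty,
\qquad 0 < \lambda < \theta .

% Power-law tail P[X \ge x] \sim c\,x^{-\alpha}: for any \lambda > 0 and any x > 0,
E[e^{\lambda X}] \;\ge\; E\!\left[e^{\lambda X}\,\mathbf{1}\{X \ge x\}\right]
                 \;\ge\; e^{\lambda x}\,P[X \ge x]
                 \;\sim\; c\,e^{\lambda x}\,x^{-\alpha}
                 \;\longrightarrow\; \infty \quad (x \to \infty),
% so E[e^{\lambda X}] = \infty for every \lambda > 0, hence heavy-tailed.
```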
  • 15. Some Formalities
      - Long-tailedness
      - $\bar{F}$ (the survival function) is long-tailed when
          - $\lim_{x \to \infty} \frac{\bar{F}(x+\lambda)}{\bar{F}(x)} = 1$ for all $\lambda > 0$
              - $\bar{F}$ is a non-increasing function, so the ratio is at most 1 and converges to 1 from below.
          - Considering that the tail $\bar{F}$ has a polynomial decay rate $-\alpha$ (i.e., $\alpha$ is the tail index), its $k$-th moments are infinite for all $k > \alpha$.
      - The Pareto case
          - One interesting property of a power-law distribution is that a log-log plot of its CCDF (a rank plot) should show a straight line
          - Its density function is given by:
              - $p(x) = \alpha k^{\alpha} x^{-\alpha-1}$ for $x \ge k$
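The claim about infinite moments can be verified directly for the Pareto density $p(x) = \alpha k^{\alpha} x^{-\alpha-1}$ on $x \ge k$. In the sketch below, $n$ denotes the moment order (an assumption of this example, chosen to avoid a clash with the Pareto minimum value $k$).

```latex
E[X^n] = \int_k^\infty x^n \,\alpha k^{\alpha} x^{-\alpha-1}\,dx
       = \alpha k^{\alpha} \int_k^\infty x^{\,n-\alpha-1}\,dx
       = \begin{cases}
           \dfrac{\alpha k^{n}}{\alpha - n}, & n < \alpha \\[2ex]
           \infty, & n \ge \alpha
         \end{cases}

% n = 1: E[X]   = \alpha k   / (\alpha - 1), finite only for \alpha > 1
% n = 2: E[X^2] = \alpha k^2 / (\alpha - 2), finite only for \alpha > 2
```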
  • 16. Some Formalities
      - Pareto shows interesting features
          - If $\alpha \le 1$, there is no finite first moment, i.e., its mean is infinite.
          - If $0 < \alpha \le 2$, its variance is infinite (heavy tail).
          - A Pareto PDF is scale free
              - In computer networking problems, it can capture self-similar (aka fractal) behavior in several layers of the protocol stack
      - A log-log view of the Pareto PDF reveals, as expected, a straight line:
          - $\ln p(x) = -(\alpha + 1) \ln x + \alpha \ln k + \ln \alpha$
      - The second and third terms of the equation are constants.
          - The relation between $\ln p(x)$ and $\ln x$ is linear, with slope $-(\alpha + 1)$.
              - A simple approach for identifying the scaling exponent $\alpha$ is linear regression.
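A minimal sketch of the regression idea, under stated assumptions: instead of the PDF (which would need histogram binning), it fits a least-squares line to the log-log rank plot of the CCDF, where $\ln P[X \ge x] = -\alpha \ln x + \alpha \ln k$, so the slope is $-\alpha$ rather than $-(\alpha + 1)$. The function name `fit_tail_index`, the seed, and the synthetic Pareto sample are illustrative choices, not the author's method.

```python
import math
import random


def fit_tail_index(samples):
    """Estimate alpha as minus the least-squares slope of ln CCDF versus ln x."""
    ordered = sorted(samples, reverse=True)
    n = len(ordered)
    # Rank plot: the i-th largest value (1-based) has empirical P[X >= x] = i / n.
    xs = [math.log(x) for x in ordered]
    ys = [math.log((i + 1) / n) for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


rng = random.Random(7)
true_alpha = 1.5
# random.paretovariate draws from a Pareto law with minimum value k = 1.
samples = [rng.paretovariate(true_alpha) for _ in range(20000)]
alpha_hat = fit_tail_index(samples)
print(f"true alpha = {true_alpha}, estimated alpha = {alpha_hat:.2f}")
```

Regression on a log-log plot is simple but can be noisy, because the few largest observations sit far from the fitted line; treat the estimate as a first look rather than a final answer.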
  • 18. Take-home lesson
      - Some universal statistical practices and theories do not hold if the data follow a heavy-tailed distribution
          - The Law of Large Numbers (LLN) and the classical Central Limit Theorem (CLT) may not hold when dealing with heavy-tailed distributions.
              - This is because their first or second moments may not be finite: the LLN assumes a finite mean and the classical CLT assumes a finite variance.
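The take-home lesson can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the original talk: it compares the sample means of independent batches drawn from an exponential distribution (finite mean and variance) with those drawn from a Pareto distribution with $\alpha = 0.8$ (infinite mean). The helper `batch_means`, the seed, and the batch sizes are hypothetical choices.

```python
import random
import statistics


def batch_means(draw, batches=20, batch_size=10000):
    """Return the sample mean of each independent batch of draws."""
    return [
        statistics.fmean(draw() for _ in range(batch_size))
        for _ in range(batches)
    ]


rng = random.Random(2016)
# Light-tailed reference: exponential with mean 1.
exp_means = batch_means(lambda: rng.expovariate(1.0))
# Heavy-tailed case: Pareto with alpha = 0.8, whose mean is infinite.
pareto_means = batch_means(lambda: rng.paretovariate(0.8))

# Ratio between the largest and smallest batch mean: close to 1 means stable.
exp_spread = max(exp_means) / min(exp_means)
pareto_spread = max(pareto_means) / min(pareto_means)
print(f"exponential spread: {exp_spread:.3f}")
print(f"Pareto spread:      {pareto_spread:.3f}")
```

The exponential batch means agree closely with each other, while the Pareto batch means are dominated by a handful of huge observations and do not settle around any value.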
  • 19. Evidence of Heavy-Tailedness in Computer Networks
  • 20. Evidence of Heavy-tailedness
      - Extreme events in nature occur at both micro and macro scales
      - There are a number of case studies and evidence of the occurrence of extreme events
          - nature (e.g., earthquakes, landslides, floods, droughts, storms)
          - human-induced catastrophes (e.g., spills, nuclear accidents, dam ruptures, power outages)
          - finance (e.g., wealth distribution, when the richest 0.1% hold 50% of the world's wealth)
          - geo- and socio-political areas (e.g., human fatalities in wars)
          - online social network phenomena (e.g., tweets like "the naked celebrity pics leak cracks down the Internet"), which are known to cause spikes in traffic from time to time
      - Extreme events in computer networking have been studied (through measurement, modelling, and analysis) for decades
          - Unfortunately, a number of network engineers and researchers still do not take such phenomena seriously
  • 21. Some Examples
      - Power-law distributions in Internet measurements
          - web objects have a tight relation with long tails
              - Images, text, video, embedded code
          - modelling issues and implications for network planning and design (e.g., web caching architectures)
              - Questions like "What is the average size of web objects on the Internet?" should not be answered by calculating the mean value!
      - Recent studies in mobile environments
          - typical performance metrics follow heavy-tailed distributions
              - main object size
              - embedded object size
              - number of embedded objects in one request
              - embedded object inter-arrival time
              - session duration
              - interval between two consecutive requests (aka the reading time)
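To see why the mean is a poor answer to the "average object size" question, the sketch below summarizes a synthetic sample with the median and upper percentiles instead. The data are hypothetical (Pareto-distributed sizes with $\alpha = 1.2$ and a 1 KB minimum, chosen only for illustration) and do not come from any measurement cited in the talk.

```python
import random
import statistics

rng = random.Random(21)
# Hypothetical object sizes in KB: Pareto with alpha = 1.2 and a 1 KB minimum.
sizes_kb = [rng.paretovariate(1.2) for _ in range(50000)]

mean_kb = statistics.fmean(sizes_kb)
median_kb = statistics.median(sizes_kb)
# quantiles(n=100) returns the 99 percentile cut points.
percentiles = statistics.quantiles(sizes_kb, n=100)
p90_kb, p99_kb = percentiles[89], percentiles[98]
# Fraction of objects that are smaller than the sample mean.
share_below_mean = sum(s < mean_kb for s in sizes_kb) / len(sizes_kb)

print(f"mean = {mean_kb:.2f} KB, median = {median_kb:.2f} KB")
print(f"90th percentile = {p90_kb:.2f} KB, 99th percentile = {p99_kb:.2f} KB")
print(f"share of objects below the mean = {share_below_mean:.1%}")
```

In such a sample the large majority of objects are smaller than the mean, so the median together with a few high percentiles describes the "typical" object and the tail far better than a single mean value.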
  • 22. Some Examples
      - Video systems
          - YouTube: the number of views can be modelled well by Zipf, Weibull, or Gamma distributions
              - Zipf-like distributions fit this popularity metric well in mobile environments
      - Intriguing cases of heavy-tailedness in the Internet are found in the network layer
          - strong evidence of heavy-tailedness for the sampled IP addresses
          - distributions of IP packets per aggregation all follow a power-law distribution
              - the number of packets per flow, unique address, or IP prefix
      - Internet connectivity at several levels of aggregation can be modeled with heavy-tailed distributions
      - P2P systems
          - video popularity
          - session duration
          - churn of peers
              - user arrival at and departure from the overlay network
          - Different studies have reported different distributions (just be careful with the choice)