This document discusses exploratory data analysis (EDA) and its application to analyzing computer networking data. EDA involves graphically summarizing data to uncover patterns, relationships, and structure without formal hypothesis testing. The document outlines the EDA process, including identifying key metrics and factors to explore. It provides examples of EDA graphs that could be used to analyze simulated WiFi data, examining how various factors like vendor, user type, and distance affect network performance metrics. The goal of EDA is to gain insights, detect anomalies, and inform modeling before running extensive simulations or experiments.
4. Data Science Pipeline
• Analytic Data
• Analytic Code
• Documentation
• Distribution
• Elements of Reproducible Research
Report Writing for Data Science in R, Roger D. Peng, 2016
5. Epicycle of Analysis
1. Stating and refining the question
2. Exploring the data
3. Building formal statistical models
4. Interpreting the results
5. Communicating the results
The Art of Data Science, A Guide for Anyone Who Works with Data, Roger D. Peng and Elizabeth Matsui, 2016
6. • Descriptive: summarize the measurements in a single data set without further interpretation
• Exploratory: search for discoveries, trends, correlations, or relationships between multiple variables to generate ideas or hypotheses
• Inferential: quantify whether an observed pattern will likely hold beyond the data set in hand
• Predictive: use a subset of measurements (the features) to predict another measurement (the outcome)
• Causal: what happens to one measurement if you make another measurement change
• Deterministic: changing one measurement always and exclusively leads to a specific, deterministic behavior in another
The Elements of Data Analytic Style, A guide for people who want to analyze data, Jeff Leek, 2015
8. Why use EDA - Summary
• Maximize insight into a data set
• Uncover underlying structure
• Extract important variables
• Detect outliers and anomalies
• Test underlying assumptions
• Develop parsimonious models
• Determine optimal factor settings
(NIST)
• Show comparisons
• Show causality, mechanism, explanation
• Show multivariate data
• Integrate multiple modes of evidence
• Describe and document the evidence
• Content is king
(Johns Hopkins University)
9. Answers to initial questions
• What is a typical value for a certain feature?
• What is the uncertainty for a typical value of a feature?
• What is a good distributional fit for a feature?
• What is the percentile distribution?
• Does a modification of one variable have an effect on another variable?
• Does a factor have an effect on performance metrics?
• What are the most important factors?
• What is the best function for relating a response variable to other variables?
• What are the best settings for factors (i.e. levels)?
• Can we separate signal from noise?
• Can we extract any structure from multivariate data?
• Does the data have outliers?
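Several of these questions can be answered with a few lines of code before any plotting. The sketch below is a minimal, hypothetical example using only Python's standard library; the throughput values (in Mbps) are made up for illustration.

```python
# Answering a few of the EDA questions numerically on a small,
# hypothetical sample of throughput measurements (Mbps).
import statistics

rates = [5.1, 5.3, 4.9, 5.0, 5.2, 5.4, 4.8, 5.1, 9.7, 5.0]

# What is a typical value for this feature? (median is robust to outliers)
typical = statistics.median(rates)

# What is the uncertainty around a typical value? (sample standard deviation)
spread = statistics.stdev(rates)

# What is the percentile distribution? (quartiles)
q1, q2, q3 = statistics.quantiles(rates, n=4)

# Does the data have outliers? (classic 1.5 * IQR fence)
iqr = q3 - q1
outliers = [x for x in rates if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(typical, outliers)
```

Here the 9.7 Mbps observation falls outside the upper IQR fence, which is exactly the kind of anomaly EDA is meant to surface before modeling.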
12. Practical Steps
• Before performing any measurements or simulations, identify:
• Performance metrics
• Performance factors and levels
• Caution: sometimes you have to guess the ranges for the levels; use an educated guess
• Don't run tons of simulations / experiments (as previously discussed)
• Plot quick and dirty graphs
• No need for titles or labels
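A "quick and dirty graph" need not involve a plotting library at all. As a sketch of the idea, the snippet below draws a text histogram with only the standard library; the simulated rate values are hypothetical.

```python
# Quick-and-dirty text histogram: no titles, no labels, just a fast
# look at the distribution of some hypothetical simulated rates.
import collections

rates = [1.6, 1.8, 1.8, 2.0, 1.8, 1.6, 2.0, 2.0, 2.0, 1.6]

counts = collections.Counter(rates)
lines = []
for value in sorted(counts):
    lines.append(f"{value:4.1f} | " + "#" * counts[value])

histogram = "\n".join(lines)
print(histogram)
```

A few seconds of output like this is often enough to decide whether a factor level is worth a full simulation run.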
13. Some examples of EDA Graphs - WiFi Data (simulated)
• “Vendor” - factor / levels: LinkSys, …
• “Model“ – factor / Levels: GST200, …
• "Users_Max_Rate“ - factor (background traffic) /
levels: 1.6, 1.8,…,7.0 Mbps
• "Year“ – factor / Levels: 1999, 2008
• "BER“ – factor / Levels: 4, 5, 6, and 8
• "Type“ – factor (type of user) / Levels: 4, f, r
• Rate – performance metric (Mbps)
• Distance - factor (distance from the AP) / “Levels:
50,100m
Features
(Observation
Variables)
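Before graphing, a factor's effect on the performance metric can be checked with a per-level summary. The sketch below groups Rate by the Distance factor; the observations are hypothetical and stand in for the simulated WiFi data described above.

```python
# Per-level summary of a performance metric (Rate, Mbps) against one
# factor (Distance, m). The observations are hypothetical stand-ins
# for the simulated WiFi data.
import statistics
from collections import defaultdict

# (distance_m, rate_mbps) observations
observations = [(50, 5.4), (50, 5.1), (50, 5.6),
                (100, 3.2), (100, 2.9), (100, 3.4)]

by_level = defaultdict(list)
for distance, rate in observations:
    by_level[distance].append(rate)

# Mean rate at each factor level
summary = {level: statistics.mean(rates) for level, rates in by_level.items()}
print(summary)
```

If the level means differ clearly (here, rate drops with distance from the AP), the factor is a candidate for the formal graphs and models that follow.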
27. References
• NIST/SEMATECH Engineering Statistics Handbook (online)
• Report Writing for Data Science in R, Roger D. Peng, 2016
• The Art of Data Science, A Guide for Anyone Who Works with Data, Roger D. Peng and Elizabeth Matsui, 2016
• The Elements of Data Analytic Style, A guide for people who want to analyze data, Jeff Leek, 2015
Editor's Notes
Left figure: Report Writing for Data Science in R, Roger D. Peng, 2016
Left figure: The Art of Data Science, A Guide for Anyone Who Works with Data, Roger D. Peng and Elizabeth Matsui, 2016
Figure and Text: The Elements of Data Analytic Style, A guide for people who want to analyze data, Jeff Leek, 2015