Deep Learning your Broadband Network @ HOME
Hongjoo LEE
Who am I?
● Machine Learning Engineer
○ Fraud Detection System
○ Software Defect Prediction
● Software Engineer
○ Email Services (40+ mil. users)
○ High traffic server (IPC, network, concurrent programming)
● MPhil, HKUST
○ Major : Software Engineering based on ML tech
○ Research interests : ML, NLP, IR
● Data Collection
○ Logging SpeedTest
● Time series Analysis
○ Data preparation
○ Handling time series
○ Seasonal Trend Decomposition
● Forecast Modeling
○ Rolling Forecast
○ Autoregression, Moving Average
○ ARIMA
● Anomaly Detection
○ Naive approach
○ Basic approaches
○ Multivariate Gaussian
Home Network
Anomaly Detection (Naive approach in 2015)
Problem definition
● Detect abnormal states of Home Network
● Anomaly detection for time series
○ Finding outlier data points relative to some usual signal
Types of anomalies in time series
● Additive outliers
● Temporary changes
● Level shift
Logging Data
● speedtest-cli
● Every 5 minutes for 3 months ⇒ about 20k observations
$ speedtest-cli --simple
Ping: 35.811 ms
Download: 68.08 Mbit/s
Upload: 19.43 Mbit/s
$ crontab -l
*/5 * * * * echo '>>> '$(date) >> $LOGFILE; speedtest-cli --simple >> $LOGFILE
Logging Data
● Log output
$ more $LOGFILE
>>> Thu Apr 13 10:35:01 KST 2017
Ping: 42.978 ms
Download: 47.61 Mbit/s
Upload: 18.97 Mbit/s
>>> Thu Apr 13 10:40:01 KST 2017
Ping: 103.57 ms
Download: 33.11 Mbit/s
Upload: 18.95 Mbit/s
>>> Thu Apr 13 10:45:01 KST 2017
Ping: 47.668 ms
Download: 54.14 Mbit/s
Upload: 4.01 Mbit/s
Data preparation
● Parse data
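class SpeedTest(object):
    def __init__(self, string):
        self.__string = string
        self.__pos = 0
        # ping and download are reconstructed names; the slide text is garbled here
        self.datetime = None  # for DatetimeIndex
        self.ping = None      # ping test in ms
        self.download = None  # down speed in Mbit/sec
        self.upload = None    # up speed in Mbit/sec

    def __iter__(self):
        return self

    def next(self):  # Python 2 iterator protocol (__next__ in Python 3)
        raise NotImplementedError  # body is cut off on the slide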
Data preparation
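● Build pandas DataFrame
import pandas as pd

speedtests = [st for st in SpeedTests(logstring)]
# One row per 5-minute slot, starting at the first observation
dt_index = pd.date_range(
    speedtests[0].datetime.replace(second=0, microsecond=0),
    periods=len(speedtests), freq='5min')
# st.ping and st.download are reconstructed names; the slide text is garbled here
df = pd.DataFrame(index=dt_index,
                  data=[[st.ping, st.download, st.upload] for st in speedtests])
# (further arguments, e.g. column names, are cut off on the slide)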
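The parser class is only partly visible on the slides, so here is a minimal stand-alone sketch of the same step using only the standard library. The regular expression, the `parse_log` name and the tuple layout are assumptions for illustration, not the author's implementation. The time zone token (`KST`) is dropped before parsing, and runs where speedtest-cli printed no result are skipped.

```python
import re
from datetime import datetime

# Two records copied from the log output shown on the slides
LOG = """>>> Thu Apr 13 10:35:01 KST 2017
Ping: 42.978 ms
Download: 47.61 Mbit/s
Upload: 18.97 Mbit/s
>>> Thu Apr 13 10:40:01 KST 2017
Ping: 103.57 ms
Download: 33.11 Mbit/s
Upload: 18.95 Mbit/s
"""

# One record = a ">>> <date>" line followed by the three speedtest-cli lines
RECORD = re.compile(
    r">>> (?P<stamp>.+?)\n"
    r"Ping: (?P<ping>[\d.]+) ms\n"
    r"Download: (?P<down>[\d.]+) Mbit/s\n"
    r"Upload: (?P<up>[\d.]+) Mbit/s"
)


def parse_log(text):
    """Return a list of (datetime, ping_ms, down_mbit, up_mbit) tuples."""
    records = []
    for m in RECORD.finditer(text):
        parts = m.group("stamp").split()
        # Remove the time zone name (5th token) before calling strptime
        stamp = " ".join(parts[:4] + parts[5:])
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        records.append((when, float(m.group("ping")),
                        float(m.group("down")), float(m.group("up"))))
    return records


records = parse_log(LOG)
```

The resulting tuples can be passed to `pd.DataFrame` with the timestamps as the index.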
Data preparation
● Plot raw data
● Structural breaks
○ Accidental gaps in the data over a long period
● Handling missing data
○ Only a few occasional cases
Handling time series
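● By DatetimeIndex
○ df['2017-04':'2017-06']
○ df['2017-04':]
○ df['2017-04-01 00:00:00':]
○ df[df.index.day_name() == 'Monday'] (weekday_name is no longer available in current pandas)
○ df[df.index.minute == 0]
● By time grouper (pd.TimeGrouper is no longer available in current pandas; use pd.Grouper)
○ df.groupby(pd.Grouper(freq='D'))
○ df.groupby(pd.Grouper(freq='M')) (use 'ME' on pandas 2.2 and later)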
Patterns in time series
● Is there a pattern in 24 hours?
Patterns in time series
● Is there a daily pattern?
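Both questions can be checked with a group-by on the DatetimeIndex. The sketch below uses a synthetic series as a stand-in for the download column. Its shape (fastest around 04:00, slowest around 16:00) is made up for illustration and is not the measured data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: one week of 5-minute observations (288 per day)
idx = pd.date_range("2017-04-03", periods=7 * 288, freq="5min")
hour_of_day = idx.hour.to_numpy() + idx.minute.to_numpy() / 60.0
down = pd.Series(50 + 10 * np.cos(2 * np.pi * (hour_of_day - 4) / 24), index=idx)

# Pattern within 24 hours: average speed for each hour of the day
by_hour = down.groupby(down.index.hour).mean()

# Pattern across days: average speed for each calendar day
by_day = down.resample("D").mean()

print(by_hour.idxmax(), by_hour.idxmin())
```

Plotting `by_hour` and `by_day` (for example with `by_hour.plot()`) gives the kind of picture these slides ask about.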
Components of Time series data
● Trend : The increasing or decreasing direction in the series.
● Seasonality : The pattern that repeats over a fixed period in the series.
● Noise : The random variation in the series.
● A time series is a combination of these components.
○ $y_t = T_t + S_t + N_t$ (additive model)
○ $y_t = T_t \times S_t \times N_t$ (multiplicative model)
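To make the additive model concrete, the sketch below builds a series from a known trend, seasonal part and noise, then recovers the first two with a centered moving average and per-phase means. This is a simplified hand-rolled decomposition under made-up parameters (period 24, linear trend), not the statsmodels implementation.

```python
import numpy as np

period = 24
n = period * 10
t = np.arange(n)

# Known components (assumed values, for illustration only)
trend = 50 + 0.01 * t
seasonal = 5 * np.sin(2 * np.pi * t / period)
noise = np.random.default_rng(0).normal(0, 0.5, n)
y = trend + seasonal + noise            # additive model: y_t = T_t + S_t + N_t

# Trend estimate: centered moving average spanning one full period.
# For an even period the two end points get half weight.
kernel = np.r_[0.5, np.ones(period - 1), 0.5] / period
trend_hat = np.convolve(y, kernel, mode="valid")   # length n - period
offset = period // 2                                # index of first centered value

# Seasonal estimate: average the detrended values at each phase of the period
detrended = y[offset:offset + len(trend_hat)] - trend_hat
phase = t[offset:offset + len(trend_hat)] % period
seasonal_hat = np.array([detrended[phase == k].mean() for k in range(period)])
seasonal_hat -= seasonal_hat.mean()                 # seasonal part sums to zero

# What is left after removing both estimates is the noise estimate
noise_hat = detrended - seasonal_hat[phase]
```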
Seasonal Trend Decomposition
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(week_dn_ts)
plt.plot(week_dn_ts)  # Original
Rolling Forecast
from statsmodels.tsa.arima.model import ARIMA  # the older arima_model module was removed from statsmodels

forecasts = list()
history = [x for x in train_X]
for t in range(len(test_X)):              # for each new observation
    model = ARIMA(history, order=order)   # update the model
    # call reconstructed; the slide text is garbled here
    y_hat = model.fit().forecast()[0]     # forecast one step ahead
    forecasts.append(y_hat)               # store predictions
    history.append(test_X[t])             # keep history updated
Residuals ~ $N(\mu, \sigma^2)$
residuals = [test_X[t] - forecasts[t] for t in range(len(test_X))]
residuals = pd.DataFrame(residuals)
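The rolling forecast and the residual check can be tried end to end without statsmodels. The sketch below swaps the ARIMA model for a plain AR(1) fitted by least squares, so it shows the walk-forward loop rather than the author's model. The simulated series, the injected dip at index 350 and the 3-sigma rule are assumptions for illustration.

```python
import numpy as np


def fit_ar1(history):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    y = np.asarray(history, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    c, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    return c, phi


def rolling_forecast(train, test):
    """Refit on all data seen so far, forecast one step, then reveal the truth."""
    history = list(train)
    forecasts = []
    for obs in test:
        c, phi = fit_ar1(history)                 # update the model
        forecasts.append(c + phi * history[-1])   # forecast one step ahead
        history.append(obs)                       # keep history updated
    return np.array(forecasts)


# Simulated AR(1) series around 50 (made-up stand-in for download speed)
rng = np.random.default_rng(42)
n = 400
series = np.empty(n)
series[0] = 50.0
for t in range(1, n):
    series[t] = 10.0 + 0.8 * series[t - 1] + rng.normal(0, 1.0)
series[350] -= 25.0                               # injected additive outlier

train, test = series[:300], series[300:]
forecasts = rolling_forecast(train, test)
residuals = test - forecasts

# Treat residuals as roughly N(mu, sigma^2) and flag points beyond 3 sigma
mu, sigma = residuals.mean(), residuals.std()
anomalies = np.where(np.abs(residuals - mu) > 3 * sigma)[0] + len(train)
print(anomalies)
```

Because the outlier itself inflates `sigma`, a robust spread such as the MAD is often preferred for the threshold.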
Anomaly Detection (Basic approach)
● IQR (Inter Quartile Range)
● 2-5 Standard Deviation
● MAD (Median Absolute Deviation)
Anomaly Detection (Naive approach)
● Inter Quartile Range
○ NumPy
q1, q3 = np.percentile(col, [25, 75])
iqr = q3 - q1
np.where((col < q1 - 1.5*iqr) | (col > q3 + 1.5*iqr))
○ Pandas
q1 = df['col'].quantile(.25)
q3 = df['col'].quantile(.75)
iqr = q3 - q1
df.loc[~df['col'].between(q1 - 1.5*iqr, q3 + 1.5*iqr), 'col']
Anomaly Detection (Naive approach)
● 2-5 Standard Deviation
○ NumPy
std = np.std(col)
med = np.median(col)
np.where((col < med - 3*std) | (col > med + 3*std))
○ Pandas
std = df['col'].std()
med = df['col'].median()
df.loc[~df['col'].between(med - 3*std, med + 3*std), 'col']
Anomaly Detection (Naive approach)
● MAD (Median Absolute Deviation)
○ $\mathrm{MAD} = \mathrm{median}(|X_i - \mathrm{median}(X)|)$
○ "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median" - Christophe Leys et al. (2013)
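The slides give the MAD formula but no code for it, so here is a minimal NumPy sketch in the same spirit as the IQR and standard deviation snippets. The `mad_outliers` name, the threshold of 3 and the sample values are assumptions. The first three readings come from the log shown earlier and the rest are made up. The factor 1.4826 rescales the MAD so that it is comparable to a standard deviation for normally distributed data.

```python
import numpy as np


def mad_outliers(x, threshold=3.0):
    """Indices of points more than `threshold` scaled MADs from the median."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))          # MAD = median(|X_i - median(X)|)
    robust_z = (x - med) / (1.4826 * mad)     # undefined if mad == 0
    return np.where(np.abs(robust_z) > threshold)[0]


# Download speeds in Mbit/s (partly invented sample)
down = np.array([47.61, 33.11, 54.14, 50.2, 49.8, 51.3, 48.9, 5.0, 50.7, 52.1])
print(mad_outliers(down))
```

Here the median is 50.0 and the MAD is 1.7, so the readings 33.11 and 5.0 are flagged while 54.14 is not.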
Stationary Series Criterion
● The mean, variance and covariance of the series are time invariant.
[Figure: a stationary series next to a non-stationary series]
Test Stationarity
● A non-stationary series can often be made stationary by differencing.
● Instead of modelling the level, we model the change
● Instead of forecasting the level, we forecast the change
● I(1) : $y'_t = y_t - y_{t-1}$; I(d) applies this differencing d times
● AR + I + MA
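A small NumPy sketch of the differencing step, on a made-up series with a linear trend. The level has a drifting mean, while the differenced series (the change) has a stable one. `np.diff` with `n=d` applies the differencing d times.

```python
import numpy as np

t = np.arange(200)
rng = np.random.default_rng(1)
level = 0.5 * t + rng.normal(0, 1, 200)   # trending series: the mean is not constant

change = np.diff(level, n=1)              # y'_t = y_t - y_{t-1}
twice = np.diff(level, n=2)               # differencing applied two times (d = 2)

# The mean of the level drifts between the two halves; the mean of the change does not
print(level[:100].mean(), level[100:].mean())
print(change[:100].mean(), change[100:].mean())
```

Forecasts of the change are turned back into levels by cumulative summation from the last observed level.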
Autoregression (AR)
● Autoregression means developing a linear model that uses observations at previous time steps to predict the observation at the next time step.
● Because the regression model uses data from the same variable at previous time steps, it is referred to as an autoregression.
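As a sketch of the idea, an AR(p) model can be fitted with ordinary least squares by regressing each observation on its p predecessors. The `fit_ar` and `predict_next` helpers and the simulated AR(2) series are assumptions for illustration. A library such as statsmodels would normally be used instead.

```python
import numpy as np


def fit_ar(y, p):
    """Least-squares fit of y[t] = c + phi_1*y[t-1] + ... + phi_p*y[t-p]."""
    y = np.asarray(y, dtype=float)
    # Column k holds the series shifted by k steps, aligned with targets y[p:]
    lags = [y[p - k: len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p)] + lags)
    coef = np.linalg.lstsq(X, y[p:], rcond=None)[0]
    return coef[0], coef[1:]


def predict_next(y, c, phi):
    """One-step-ahead forecast from the last len(phi) observations."""
    recent = np.asarray(y, dtype=float)[-1:-len(phi) - 1:-1]   # y[-1], y[-2], ...
    return c + float(np.dot(phi, recent))


# Simulate an AR(2) process with known coefficients (0.6, -0.3)
rng = np.random.default_rng(0)
n = 3000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

c, phi = fit_ar(y, p=2)
print(c, phi, predict_next(y, c, phi))
```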
Moving Average (MA)
● An MA model looks similar to the AR component, but it works with past residuals rather than past observations.
● The model accounts for the possibility of a relationship between a variable and the residuals from previous periods.
ARIMA(p, d, q)
● Autoregressive Integrated Moving Average
○ AR : A model that uses the dependent relationship between an observation and some number of lagged observations.
○ I : The use of differencing of raw observations in order to make the time series stationary.
○ MA : A model that uses the dependency between an observation and the residual errors from a moving average model applied to lagged observations.
● Parameters of the ARIMA model
○ p : The number of lag observations included in the model.
○ d : The degree of differencing, that is, the number of times the raw observations are differenced.
○ q : The size of the moving average window.
Identification of ARIMA
● Autocorrelation Function (ACF) : measured by a simple correlation between the current observation $Y_t$ and the observation p lags earlier, $Y_{t-p}$.
● Partial Autocorrelation Function (PACF) : measured by the degree of association between $Y_t$ and $Y_{t-p}$ when the effects of the intermediate time lags between $Y_t$ and $Y_{t-p}$ are removed.
● Inference from ACF and PACF : theoretical ACFs and PACFs are available for various orders of the AR and MA components. Plotting the sample ACF and PACF against the lag and comparing them with the theoretical shapes therefore leads to the selection of appropriate parameters p and q for the ARIMA model.
Identification of ARIMA (easy case)
● General characteristics of theoretical ACFs and PACFs
model | ACF | PACF
AR(p) | Tails off; spikes decay towards zero | Spikes cut off to zero after lag p
MA(q) | Spikes cut off to zero after lag q | Tails off; spikes decay towards zero
ARMA(p,q) | Tails off; spikes decay towards zero | Tails off; spikes decay towards zero
● Reference :
○ Prof. Robert Nau
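The tail-off and cut-off behaviour for an AR process can be reproduced with a few lines of NumPy. This is a rough sketch: the `acf` and `pacf` helpers are hand-written for illustration (statsmodels has its own versions), and the simulated AR(1) coefficient of 0.7 is an arbitrary choice. The PACF at lag k is taken as the last coefficient of a least-squares AR(k) fit.

```python
import numpy as np


def acf(y, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.dot(y, y)
    return np.array([1.0] + [np.dot(y[k:], y[:-k]) / denom
                             for k in range(1, nlags + 1)])


def pacf(y, nlags):
    """Sample partial autocorrelation via successive AR(k) regressions."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    out = [1.0]
    for k in range(1, nlags + 1):
        X = np.column_stack([y[k - j: len(y) - j] for j in range(1, k + 1)])
        coef = np.linalg.lstsq(X, y[k:], rcond=None)[0]
        out.append(coef[-1])              # effect of lag k with lags 1..k-1 removed
    return np.array(out)


# Simulated AR(1) process: y[t] = 0.7 * y[t-1] + e[t]
rng = np.random.default_rng(3)
n = 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal()

a = acf(y, 3)    # tails off: roughly 0.7, 0.49, 0.34
p = pacf(y, 3)   # cuts off: roughly 0.7, then near zero
print(np.round(a, 2), np.round(p, 2))
```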
Identification of ARIMA (complicated)
Anomaly Detection (Parameter Estimation)
Anomaly Detection (Multivariate Gaussian Distribution)
Anomaly Detection (Multivariate Gaussian)
import numpy as np
from scipy.stats import multivariate_normal

def estimate_gaussian(dataset):
    # per-feature mean and full covariance matrix
    mu = np.mean(dataset, axis=0)
    sigma = np.cov(dataset.T)
    return mu, sigma

def multivariate_gaussian(dataset, mu, sigma):
    # density of every row under N(mu, sigma)
    p = multivariate_normal(mean=mu, cov=sigma)
    return p.pdf(dataset)

mu, sigma = estimate_gaussian(train_X)
p = multivariate_gaussian(train_X, mu, sigma)
anomalies = np.where(p < ep)  # ep : threshold
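The snippet above leaves `train_X` and the threshold `ep` undefined. The sketch below fills them in with assumed values to give a runnable picture: synthetic training data for three features (ping, download, upload), and `ep` set to the 1st percentile of the training densities. Neither choice comes from the slides. The density is computed in log form with NumPy only, which avoids underflow for far-out points.

```python
import numpy as np


def estimate_gaussian(X):
    """Mean vector and covariance matrix of the rows of X."""
    return X.mean(axis=0), np.cov(X.T)


def log_density(X, mu, sigma):
    """Log of the multivariate normal density for every row of X."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(sigma)
    maha = np.einsum("ij,jk,ik->i", diff, inv, diff)   # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))


# Hypothetical training data: [ping ms, download Mbit/s, upload Mbit/s]
rng = np.random.default_rng(7)
train_X = rng.multivariate_normal([40, 55, 19], np.diag([25, 36, 1]), size=500)

mu, sigma = estimate_gaussian(train_X)
ep = np.percentile(log_density(train_X, mu, sigma), 1)   # assumed threshold rule

new_X = np.array([[42.0, 54.0, 19.0],     # ordinary reading
                  [103.6, 33.1, 4.0]])    # high latency, slow in both directions
is_anomaly = log_density(new_X, mu, sigma) < ep
print(is_anomaly)
```

In practice the threshold is usually tuned on a labelled validation set when one is available.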
Long Short-Term Memory
LSTM layer
Long Short-Term Memory
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.metrics import mean_squared_error

model = Sequential()
model.add(LSTM(num_neurons, stateful=True, return_sequences=True,
               batch_input_shape=(batch_size, timesteps, input_dimension)))
model.add(LSTM(num_neurons, stateful=True,
               batch_input_shape=(batch_size, timesteps, input_dimension)))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(num_epoch):
    # call reconstructed; the slide text is garbled here and X is an assumed name
    model.fit(X, y, epochs=1, batch_size=batch_size, shuffle=False)
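The `batch_input_shape` above expects 3-D input of shape (batch, timesteps, features). The slides do not show how the series is reshaped, so the following is a minimal sketch of one common way to do it for a single-feature series. The `make_windows` helper is a hypothetical name, and each window is paired with the value that follows it as the target.

```python
import numpy as np


def make_windows(series, timesteps):
    """Turn a 1-D series into (samples, timesteps, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    n = len(series) - timesteps
    X = np.stack([series[i:i + timesteps] for i in range(n)])
    y = series[timesteps:]                 # the value right after each window
    return X[..., np.newaxis], y           # add the trailing feature axis


series = np.arange(10, dtype=float)        # toy series 0, 1, ..., 9
X, y = make_windows(series, timesteps=3)
print(X.shape, y.shape)
```

With `stateful=True`, the number of samples also has to be a multiple of `batch_size`, so the windows are usually trimmed to fit.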
Long Short-Term Memory
● Can model sophisticated and seasonal dependencies in time series
● Very helpful with multiple time series
● Ongoing research; building a model for time series still takes a lot of work
● Be prepared before calling engineers for service failures
● Pythonistas have all the powerful tools
○ pandas is great for handling time series
○ statsmodels for analyzing and modeling time series
○ sklearn is such a multi-tool in data science
○ keras is good to start deep learning
● Pythonistas need to understand a few concepts before using the tools
○ Stationarity in time series
○ Autoregressive and Moving Average
○ Means of forecasting, anomaly detection
● Deep Learning for forecasting time series
○ still on-going research
● Do try this at home

EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME

TMPA-2017: Vellvm - Verifying the LLVM
Time Series Analysis: Challenge Kaggle with TensorFlow
Time Series Analysis: Challenge Kaggle with TensorFlowTime Series Analysis: Challenge Kaggle with TensorFlow
Time Series Analysis: Challenge Kaggle with TensorFlow
ANN ARIMA Hybrid Models for Time Series Prediction
ANN ARIMA Hybrid Models for Time Series PredictionANN ARIMA Hybrid Models for Time Series Prediction
ANN ARIMA Hybrid Models for Time Series Prediction
20191107 breizh data_day
20191107 breizh data_day20191107 breizh data_day
20191107 breizh data_day
Statistical inference for (Python) Data Analysis. An introduction.
Statistical inference for (Python) Data Analysis. An introduction.Statistical inference for (Python) Data Analysis. An introduction.
Statistical inference for (Python) Data Analysis. An introduction.
MSc Thesis Defense Presentation
MSc Thesis Defense PresentationMSc Thesis Defense Presentation
MSc Thesis Defense Presentation
DSJ_Unit I & II.pdf
DSJ_Unit I & II.pdfDSJ_Unit I & II.pdf
DSJ_Unit I & II.pdf
Data structure and algorithm using java
Data structure and algorithm using javaData structure and algorithm using java
Data structure and algorithm using java
Deep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudDataDeep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudData
Tracking the tracker: Time Series Analysis in Python from First Principles
Tracking the tracker: Time Series Analysis in Python from First PrinciplesTracking the tracker: Time Series Analysis in Python from First Principles
Tracking the tracker: Time Series Analysis in Python from First Principles
Pranav Bahl & Jonathan Stacks - Robust Automated Forecasting in Python and R
Pranav Bahl & Jonathan Stacks - Robust Automated Forecasting in Python and RPranav Bahl & Jonathan Stacks - Robust Automated Forecasting in Python and R
Pranav Bahl & Jonathan Stacks - Robust Automated Forecasting in Python and R
Unsupervised program synthesis
Unsupervised program synthesisUnsupervised program synthesis
Unsupervised program synthesis

Recently uploaded

Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
enxupq Cheatsheet: automate your data workflows Cheatsheet: automate your data Cheatsheet: automate your data workflows Cheatsheet: automate your data workflows
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf

Recently uploaded (20)

Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
一比一原版(YU毕业证)约克大学毕业证成绩单 Cheatsheet: automate your data workflows Cheatsheet: automate your data Cheatsheet: automate your data workflows Cheatsheet: automate your data workflows
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf

EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME

  • 2. Who am I? ● Machine Learning Engineer ○ Fraud Detection System ○ Software Defect Prediction ● Software Engineer ○ Email Services (40+ mil. users) ○ High traffic server (IPC, network, concurrent programming) ● MPhil, HKUST ○ Major : Software Engineering based on ML tech ○ Research interests : ML, NLP, IR
  • 3. Outline Data Collection Time series Analysis Forecast Modeling Anomaly Detection Naive approach Logging SpeedTest Data preparation Handling time series Seasonal Trend Decomposition Rolling Forecast Basic approaches Stationarity Autoregression, Moving Average Autocorrelation ARIMA Multivariate Gaussian LSTM
  • 7. Anomaly Detection (Naive approach in 2015)
  • 8. Problem definition ● Detect abnormal states of Home Network ● Anomaly detection for time series ○ Finding outlier data points relative to some usual signal
  • 9. Types of anomalies in time series ● Additive outliers
  • 10. Types of anomalies in time series ● Temporal changes
  • 11. Types of anomalies in time series ● Level shift
  • 12. Outline Data Collection Time series Analysis Forecast Modeling Anomaly Detection Naive approach Logging SpeedTest Data preparation Handling time series Seasonal Trend Decomposition Rolling Forecast Basic approaches Stationarity Autoregression, Moving Average Autocorrelation ARIMA Multivariate Gaussian LSTM
  • 13. Logging Data ● speedtest-cli ● Every 5 minutes for 3 months ⇒ 20k observations.
      $ speedtest-cli --simple
      Ping: 35.811 ms
      Download: 68.08 Mbit/s
      Upload: 19.43 Mbit/s
      $ crontab -l
      */5 * * * * echo '>>> '$(date) >> $LOGFILE; speedtest-cli --simple >> $LOGFILE 2>&1
  • 14. Logging Data ● Log output
      $ more $LOGFILE
      >>> Thu Apr 13 10:35:01 KST 2017
      Ping: 42.978 ms
      Download: 47.61 Mbit/s
      Upload: 18.97 Mbit/s
      >>> Thu Apr 13 10:40:01 KST 2017
      Ping: 103.57 ms
      Download: 33.11 Mbit/s
      Upload: 18.95 Mbit/s
      >>> Thu Apr 13 10:45:01 KST 2017
      Ping: 47.668 ms
      Download: 54.14 Mbit/s
      Upload: 4.01 Mbit/s
  • 15. Data preparation ● Parse data
      class SpeedTest(object):
          def __init__(self, string):
              self.__string = string
              self.__pos = 0
              self.datetime = None   # for DatetimeIndex
              self.ping = None       # ping test in ms
              self.download = None   # down speed in Mbit/sec
              self.upload = None     # up speed in Mbit/sec

          def __iter__(self):
              return self

          def next(self):
              …
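The body of the parser is elided on the slide. As a minimal sketch of one way to read the log format shown on slide 14, the regular expression below pulls out each complete record. The names `RECORD` and `parse_log` are hypothetical and are not the speaker's `SpeedTest` class; the time-zone abbreviation is simply dropped, which assumes all entries share one zone.

```python
import re
from datetime import datetime

# Hypothetical helper: matches one complete ">>> date / Ping / Download / Upload" record.
RECORD = re.compile(
    r">>> (?P<stamp>.+?)\s*\n"
    r"Ping: (?P<ping>[\d.]+) ms\s*\n"
    r"Download: (?P<down>[\d.]+) Mbit/s\s*\n"
    r"Upload: (?P<up>[\d.]+) Mbit/s"
)

def parse_log(text):
    """Yield (datetime, ping_ms, down_mbit, up_mbit) for each complete record."""
    for m in RECORD.finditer(text):
        parts = m.group("stamp").split()
        # Drop the zone abbreviation (e.g. "KST") before parsing the timestamp.
        stamp = datetime.strptime(" ".join(parts[:4] + parts[5:]),
                                  "%a %b %d %H:%M:%S %Y")
        yield (stamp, float(m.group("ping")),
               float(m.group("down")), float(m.group("up")))

sample = (
    ">>> Thu Apr 13 10:35:01 KST 2017\n"
    "Ping: 42.978 ms\nDownload: 47.61 Mbit/s\nUpload: 18.97 Mbit/s\n"
    ">>> Thu Apr 13 10:40:01 KST 2017\n"
    "Ping: 103.57 ms\nDownload: 33.11 Mbit/s\nUpload: 18.95 Mbit/s\n"
)
rows = list(parse_log(sample))
```

Records where the speed test failed do not match the pattern and are skipped, so they surface later as missing timestamps rather than as parse errors.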
  • 16. Data preparation ● Build pandas DataFrame
      speedtests = [st for st in SpeedTests(logstring)]
      dt_index = pd.date_range(
          speedtests[0].datetime.replace(second=0, microsecond=0),
          periods=len(speedtests), freq='5min')
      df = pd.DataFrame(index=dt_index,
                        data=([st.ping, st.download, st.upload] for st in speedtests),
                        columns=['ping', 'down', 'up'])
  • 18. Data preparation ● Structural breaks ○ Data accidentally missing for a long period
  • 19. Data preparation ● Handling missing data ○ Only a few occasional cases
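The slide does not say how the occasional gaps were filled. One common option, shown here only as a sketch on made-up values, is time-based interpolation with a cap on how many consecutive samples may be filled, so that long structural breaks are not papered over. The `limit=2` value is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute series with two isolated missing samples
idx = pd.date_range("2017-04-13 10:35", periods=6, freq="5min")
df = pd.DataFrame({"down": [47.61, 33.11, np.nan, 54.14, np.nan, 50.0]}, index=idx)

# Interpolate along the time axis, filling at most 2 consecutive gaps
filled = df.interpolate(method="time", limit=2)
```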
  • 20. Handling time series ● By DatetimeIndex ○ df['2017-04':'2017-06'] ○ df['2017-04':] ○ df['2017-04-01 00:00:00':] ○ df[df.index.weekday_name == 'Monday'] ○ df[df.index.minute == 0] ● By TimeGrouper ○ df.groupby(pd.TimeGrouper('D')) ○ df.groupby(pd.TimeGrouper('M'))
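The calls above reflect pandas as of 2017. Later pandas releases dropped `weekday_name` and `pd.TimeGrouper`; the sketch below shows what are, to my understanding, the current equivalents (`day_name()`, `pd.Grouper`, `resample`), run on a synthetic three-day frame rather than the speaker's data.

```python
import numpy as np
import pandas as pd

# Three days of synthetic 5-minute samples starting Saturday 2017-04-01
idx = pd.date_range("2017-04-01", periods=3 * 288, freq="5min")
df = pd.DataFrame({"down": np.arange(len(idx), dtype=float)}, index=idx)

mondays = df[df.index.day_name() == "Monday"]          # instead of weekday_name
daily_mean = df.groupby(pd.Grouper(freq="D")).mean()   # instead of pd.TimeGrouper('D')
daily_mean_alt = df.resample("D").mean()               # equivalent shorthand
```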
  • 21. Patterns in time series ● Is there a pattern in 24 hours?
  • 22. Patterns in time series ● Is there a daily pattern?
  • 23. Components of Time series data ● Trend: the long-term increasing or decreasing direction of the series. ● Seasonality: the pattern that repeats over a fixed period in the series. ● Noise: the random variation in the series.
  • 24. Components of Time series data ● A time series is a combination of these components. ○ y_t = T_t + S_t + N_t (additive model) ○ y_t = T_t × S_t × N_t (multiplicative model)
  • 25. Seasonal Trend Decomposition
      from statsmodels.tsa.seasonal import seasonal_decompose
      decomposition = seasonal_decompose(week_dn_ts)
      plt.plot(week_dn_ts)               # Original
      plt.plot(decomposition.seasonal)
      plt.plot(decomposition.trend)
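To make the additive model concrete, here is a simplified NumPy sketch of a classical decomposition: a moving average over one full period estimates the trend, and the per-position mean of the detrended values estimates the seasonal pattern. `naive_decompose` is a hypothetical stand-in, not the statsmodels implementation, and it assumes an odd `period` so that the window is centred.

```python
import numpy as np

def naive_decompose(y, period):
    """Additive split y = trend + seasonal + residual (simplified sketch)."""
    y = np.asarray(y, dtype=float)
    # Trend: centred moving average over one full period (period assumed odd)
    kernel = np.ones(period) / period
    trend = np.convolve(y, kernel, mode="same")
    half = period // 2
    trend[:half] = np.nan            # edges lack a full window
    trend[-half:] = np.nan
    detrended = y - trend
    # Seasonal: mean of detrended values at each position in the cycle
    pattern = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    pattern -= pattern.mean()
    seasonal = np.tile(pattern, len(y) // period + 1)[: len(y)]
    return trend, seasonal, y - trend - seasonal

# Synthetic series: linear trend plus a zero-mean pattern of period 7
true_pattern = np.array([3.0, 1.0, -2.0, 0.0, -1.0, 2.0, -3.0])
t = np.arange(70)
y = 0.5 * t + np.tile(true_pattern, 10)
trend, seasonal, resid = naive_decompose(y, period=7)
```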
  • 27. Rolling Forecast
      from statsmodels.tsa.arima_model import ARIMA
      forecasts = list()
      history = [x for x in train_X]
      for t in range(len(test_X)):              # for each new observation
          model = ARIMA(history, order=order)   # update the model
          y_hat = model.fit().forecast()[0]     # forecast one step ahead
          forecasts.append(y_hat)               # store predictions
          history.append(test_X[t])             # keep history updated
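The rolling (walk-forward) loop itself does not depend on ARIMA. The sketch below runs the same refit, forecast one step, append the true observation cycle with a hypothetical least-squares AR(1) model standing in for ARIMA, so that it needs only NumPy. `fit_ar1` and `rolling_forecast` are illustrative names, not part of the talk.

```python
import numpy as np

def fit_ar1(history):
    """Least-squares fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    h = np.asarray(history, dtype=float)
    A = np.column_stack([np.ones(len(h) - 1), h[:-1]])
    (c, phi), *_ = np.linalg.lstsq(A, h[1:], rcond=None)
    return c, phi

def rolling_forecast(train, test):
    """Refit on all data seen so far, then predict the next point."""
    history = list(train)
    forecasts = []
    for obs in test:
        c, phi = fit_ar1(history)                  # update the model
        forecasts.append(c + phi * history[-1])    # forecast one step ahead
        history.append(obs)                        # keep history updated
    return np.array(forecasts)

# Noise-free AR(1) series: y_t = 1 + 0.5 * y_{t-1}
series = [10.0]
for _ in range(8):
    series.append(1.0 + 0.5 * series[-1])
train, test = series[:6], series[6:]
forecasts = rolling_forecast(train, test)
```

Because the series is noise-free, the forecasts reproduce the held-out values; on real measurements the differences are the residuals examined on the next slide.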
  • 28. Residuals ~ N(μ, σ²)
      residuals = [test_X[t] - forecasts[t] for t in range(len(test_X))]
      residuals = pd.DataFrame(residuals)
      residuals.plot(kind='kde')
  • 29. Anomaly Detection (Basic approach) ● IQR (Inter Quartile Range) ● 2-5 Standard Deviation ● MAD (Median Absolute Deviation)
  • 30. Anomaly Detection (Naive approach) ● Inter Quartile Range
  • 31. Anomaly Detection (Naive approach) ● Inter Quartile Range
      ○ NumPy
      q1, q3 = np.percentile(col, [25, 75])
      iqr = q3 - q1
      np.where((col < q1 - 1.5*iqr) | (col > q3 + 1.5*iqr))
      ○ Pandas
      q1 = df['col'].quantile(.25)
      q3 = df['col'].quantile(.75)
      iqr = q3 - q1
      df.loc[~df['col'].between(q1 - 1.5*iqr, q3 + 1.5*iqr), 'col']
  • 32. Anomaly Detection (Naive approach) ● 2-5 Standard Deviation
  • 33. Anomaly Detection (Naive approach) ● 2-5 Standard Deviation
      ○ NumPy
      std = np.std(col)
      med = np.median(col)
      np.where((col < med - 3*std) | (col > med + 3*std))
      ○ Pandas
      std = df['col'].std()
      med = df['col'].median()
      df.loc[~df['col'].between(med - 3*std, med + 3*std), 'col']
  • 34. Anomaly Detection (Naive approach) ● MAD (Median Absolute Deviation) ○ MAD = median(|X_i - median(X)|) ○ “Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median” - Christophe Leys et al. (2013)
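The MAD slide gives the formula but no code. A minimal NumPy sketch in the style of the IQR and standard-deviation snippets could look as follows. The function name `mad_outliers`, the cut-off of 3, and the 1.4826 consistency constant (which makes MAD comparable to a standard deviation for normally distributed data) are assumptions, not values taken from the talk.

```python
import numpy as np

def mad_outliers(x, threshold=3.0):
    """Return indices whose distance from the median exceeds threshold * scaled MAD."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))     # MAD = median(|X_i - median(X)|)
    robust_sigma = 1.4826 * mad          # assumed scaling for normal data
    return np.where(np.abs(x - med) > threshold * robust_sigma)[0]

# Download speeds in Mbit/s with one obvious drop
down = [47.6, 48.1, 47.9, 48.3, 47.8, 4.0, 48.0]
outliers = mad_outliers(down)
```

Unlike the standard deviation, the median and MAD are barely moved by the single 4.0 reading, so the threshold stays tight around the normal values. If more than half of the values are identical, MAD is zero and every other value is flagged, which may or may not be wanted.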
  • 35. Outline Data Collection Time series Analysis Forecast Modeling Anomaly Detection Naive approach Logging SpeedTest Data preparation Handling time series Seasonal Trend Decomposition Rolling Forecast Basic approaches Stationarity Autoregression, Moving Average Autocorrelation ARIMA Multivariate Gaussian LSTM
  • 36. Stationary Series Criterion ● The mean, variance and covariance of the series are time invariant. [Figure: stationary vs. non-stationary series]
  • 40. Differencing ● A non-stationary series can often be made stationary by differencing. ● Instead of modelling the level, we model the change. ● Instead of forecasting the level, we forecast the change. ● First difference: y'_t = y_t - y_{t-1}; I(d) applies this differencing d times. ● ARIMA = AR + I + MA
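A small illustration of the idea on synthetic data, using `np.diff`: one round of differencing turns a linear trend into a constant series, two rounds do the same for a quadratic trend, and a cumulative sum recovers the level from the forecast changes.

```python
import numpy as np

t = np.arange(10, dtype=float)

level = 5.0 + 2.0 * t              # linear trend: the mean drifts over time
change = np.diff(level, n=1)       # d = 1: constant series of 2.0

quad = t ** 2                      # quadratic trend
change2 = np.diff(quad, n=2)       # d = 2: constant series of 2.0

# Undo the differencing: the starting level plus the accumulated changes
restored = level[0] + np.cumsum(change)
```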
  • 41. Autoregression (AR) ● Autoregression means building a linear model that uses observations at previous time steps to predict the observation at the next time step. ● Because the regression uses values of the same variable at previous time steps as its inputs, it is called an autoregression.
  • 42. Moving Average (MA) ● MA models look similar to the AR component, but they work with different values. ● The model accounts for the possibility of a relationship between a variable and the residuals from previous periods.
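Written out, the two components described above take the following standard textbook form. The symbols are not from the slides and are introduced here for illustration: c and μ are constants, φ_i and θ_j are the coefficients to be fitted, and ε_t is the residual (forecast error) at time t.

```latex
\begin{aligned}
\text{AR}(p):\quad & y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t \\
\text{MA}(q):\quad & y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
\end{aligned}
```

The AR part regresses on past values of the series itself, while the MA part regresses on past residuals, which is the "different values" distinction made on the MA slide.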
  • 43. ARIMA(p, d, q) ● Autoregressive Integrated Moving Average ○ AR: a model that uses the dependent relationship between an observation and some number of lagged observations. ○ I: the use of differencing of raw observations in order to make the time series stationary. ○ MA: a model that uses the dependency between an observation and the residual errors of a moving average model applied to lagged observations. ● Parameters of the ARIMA model ○ p: the number of lag observations included in the model. ○ d: the degree of differencing, i.e. the number of times the raw observations are differenced. ○ q: the size of the moving average window.
  • 44. Identification of ARIMA ● Autocorrelation Function (ACF): the simple correlation between the current observation Y_t and the observation p lags earlier, Y_{t-p}. ● Partial Autocorrelation Function (PACF): the degree of association between Y_t and Y_{t-p} once the effects of the intermediate lags between them are removed. ● Inference from ACF and PACF: theoretical ACFs and PACFs are known for various orders of the AR and MA components, so plotting the sample ACF and PACF against lag and comparing them with the theoretical shapes guides the choice of p and q for the ARIMA model.
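As a sketch of what the ACF measures, the function below computes the sample autocorrelation directly with NumPy and applies it to a simulated AR(1) series. The helper name `acf`, the coefficient 0.8, and the seed are illustrative assumptions. In practice statsmodels provides `plot_acf` and `plot_pacf` in `statsmodels.graphics.tsaplots` for the plots referred to above.

```python
import numpy as np

def acf(y, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    denom = np.dot(y, y)
    lags = [np.dot(y[k:], y[:-k]) / denom for k in range(1, nlags + 1)]
    return np.array([1.0] + lags)

# Simulate AR(1): y_t = 0.8 * y_{t-1} + noise
rng = np.random.default_rng(0)
noise = rng.normal(size=5000)
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = 0.8 * y[t - 1] + noise[t]

r = acf(y, nlags=5)
```

For an AR(1) process the theoretical ACF at lag k is 0.8 to the power k, so the sample values should tail off gradually rather than cut off, which is the AR signature in the table on the next slide.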
  • 45. Identification of ARIMA (easy case) ● General characteristics of theoretical ACFs and PACFs
      model     | ACF                                   | PACF
      AR(p)     | Tails off; spikes decay towards zero  | Spikes cut off to zero after lag p
      MA(q)     | Spikes cut off to zero after lag q    | Tails off; spikes decay towards zero
      ARMA(p,q) | Tails off; spikes decay towards zero  | Tails off; spikes decay towards zero
    ● Reference: ○ Prof. Robert Nau
  • 46. Identification of ARIMA (easy case)
  • 47. Identification of ARIMA (complicated)
  • 48. Anomaly Detection (Parameter Estimation) [Figures: axes x_down and x_up]
  • 49. Anomaly Detection (Multivariate Gaussian Distribution)
  • 50. Anomaly Detection (Multivariate Gaussian)
      import numpy as np
      from scipy.stats import multivariate_normal

      def estimate_gaussian(dataset):
          mu = np.mean(dataset, axis=0)
          sigma = np.cov(dataset.T)
          return mu, sigma

      def multivariate_gaussian(dataset, mu, sigma):
          p = multivariate_normal(mean=mu, cov=sigma)
          return p.pdf(dataset)

      mu, sigma = estimate_gaussian(train_X)
      p = multivariate_gaussian(train_X, mu, sigma)
      anomalies = np.where(p < ep)   # ep : threshold
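The slides leave the choice of the threshold `ep` open. One simple possibility, sketched below on synthetic (download, upload) pairs, is to take a low quantile of the densities seen in training, so that roughly the least likely 1% of training points would be flagged. The 1% level, the synthetic data, and the helper `log_density` (a NumPy-only log of the same Gaussian density, used to avoid underflow) are all assumptions.

```python
import numpy as np

def log_density(X, mu, sigma):
    """Log of the multivariate normal pdf, evaluated row by row."""
    diff = X - mu
    inv = np.linalg.inv(sigma)
    maha = np.einsum("ij,jk,ik->i", diff, inv, diff)   # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(sigma)
    k = X.shape[1]
    return -0.5 * (maha + logdet + k * np.log(2 * np.pi))

# Synthetic training data: download around 50 Mbit/s, upload around 19 Mbit/s
rng = np.random.default_rng(0)
train_X = rng.normal([50.0, 19.0], [5.0, 1.0], size=(1000, 2))
mu, sigma = train_X.mean(axis=0), np.cov(train_X.T)

# Assumed rule: threshold at the 1% quantile of the training log-densities
log_ep = np.quantile(log_density(train_X, mu, sigma), 0.01)

new_X = np.array([[51.0, 19.2],    # ordinary measurement
                  [5.0, 4.0]])     # both directions collapsed
is_anomaly = log_density(new_X, mu, sigma) < log_ep
```

If labelled incidents are available, picking the threshold on a validation set instead would tie it to actual failures rather than to a fixed percentage.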
  • 53. Outline Data Collection Time series Analysis Forecast Modeling Anomaly Detection Naive approach Logging SpeedTest Data preparation Handling time series Seasonal Trend Decomposition Rolling Forecast Basic approaches Stationarity Autoregression, Moving Average Autocorrelation ARIMA Multivariate Gaussian LSTM
  • 56. Long Short-Term Memory
      from keras.models import Sequential
      from keras.layers import Dense
      from keras.layers import LSTM
      from sklearn.metrics import mean_squared_error

      model = Sequential()
      model.add(LSTM(num_neurons, stateful=True, return_sequences=True,
                     batch_input_shape=(batch_size, timesteps, input_dimension)))
      model.add(LSTM(num_neurons, stateful=True,
                     batch_input_shape=(batch_size, timesteps, input_dimension)))
      model.add(Dense(1))
      model.compile(loss='mean_squared_error', optimizer='adam')
      for i in range(num_epoch):
          model.fit(X, y, epochs=1, batch_size=batch_size, shuffle=False)
          model.reset_states()
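The model above expects inputs shaped `(batch_size, timesteps, input_dimension)`, but the slides do not show how the series is cut into that shape. A common approach, sketched here with a hypothetical `make_windows` helper for a single-feature series, is a sliding window in which each run of `timesteps` values predicts the value that follows it.

```python
import numpy as np

def make_windows(series, timesteps):
    """Turn a 1-D series into (samples, timesteps, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    n = len(series) - timesteps
    X = np.stack([series[i:i + timesteps] for i in range(n)])
    y = series[timesteps:]
    return X[..., np.newaxis], y     # trailing axis is input_dimension = 1

X, y = make_windows(np.arange(10), timesteps=3)
```

With `stateful=True`, Keras also needs the number of samples to be a multiple of `batch_size`, so trailing windows may have to be trimmed before fitting.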
  • 57. Long Short-Term Memory ● Can model sophisticated and seasonal dependencies in time series ● Very helpful with multiple time series ● Ongoing research; building a model for time series requires a lot of work
  • 58. Summary ● Be prepared before calling engineers about service failures ● Pythonistas have all the powerful tools ○ pandas is great for handling time series ○ statsmodels for analyzing and modeling time series ○ sklearn is a real multi-tool for data science ○ keras is a good place to start deep learning ● Pythonistas need to understand a few concepts before using the tools ○ Stationarity in time series ○ Autoregression and Moving Average ○ Means of forecasting, anomaly detection ● Deep Learning for forecasting time series ○ still ongoing research ● Do try this at home