M1 - Linear Regression and Time Series Analysis
Luis Moreira-Matias
luis.matias[at]neclab.eu
www.luis-matias.pt.vu
NEC Laboratories Europe,
Heidelberg, Germany
07/09/2015, Porto, Portugal
“Eureka!” - How to Build Accurate Predictors for Real-valued Outputs from Simple Methods
M1 - Linear Regression and Time-series Analysis
Outline
Regression Analysis
Basic concepts: Target, Objective and Learning/Induction Functions
Simple Linear Regression
Numerical Example with Least Squares
Multivariate Linear Regression, Bayesian Statistics and Kernel-Based
Approaches
Time Series Analysis - when the time becomes a feature
Basic concepts: Stationarity, ACF/PACF, Seasonality
AutoRegressive (AR) and Moving Average (MA) models
Box-Jenkins ARIMA forecasting for short-term predictions
Lessons Learned
M1 - Linear Regression and Time-series Analysis | Basics in Regression Analysis
An Introduction to Regression
Numerical prediction problems aim to generalize the behavior
of a target variable y given a predefined explanatory context (i.e.
explanatory variables), such that y = f(x);
Example: Energy consumption of a given family y along the time of
the day x;
Inductive learning method: estimate a behavioral function ˆf(x) given
a set of data samples, i.e. training set;
The range of all explanatory variables: feature space;
Example: [0, 24] hours defines the feature space of the time of the
day x;
M1 - Linear Regression and Time-series Analysis | Basics in Regression Analysis
Basic Concepts in Regression
Given a training set X with N = |X| samples, we want to estimate a
Target Function ˆf(x), x ∈ X such that
ˆf : X → ℝ, such that ˆf(x) = f(x), ∀x ∈ X
n denotes the number of features, each of which ranges in ℝ
ℝⁿ denotes the feature space onto which the training set X is mapped
Basic Concepts
Target Function: ˆf(x) ∼ f(x)
Induction Function/Learner/Method: the function used to construct ˆf(x)
from the input samples/training set
Objective/Loss Function: The function that we aim to minimize by
approximating ˆf(x) to f(x)
M1 - Linear Regression and Time-series Analysis | Basics in Regression Analysis
Overview on the Types of Learning Functions
Target Function vs. Learners
Can take either Linear or Non-Linear form, depending on the type of
relationship between y and X built by the Learner;
Parametric Learning methods assume a functional form for ˆf(x)
a priori
Non-parametric methods do not!
White-Box Learning methods can express ˆf(x) as an equation
Black-box methods cannot!
Examples on Learners
Linear Least Squares [Legendre, 1805]: White-box Parametric
(Linear) Learner;
k-Nearest Neighbors [Cover and Hart, 1967]: Black-Box
Non-Parametric Learner;
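To make the contrast concrete, the sketch below (toy data, assuming scikit-learn is available) fits both learners named above on the same single-feature problem: the linear model exposes its coefficients, while the kNN learner only exposes predictions.

```python
# Hedged sketch: a white-box parametric learner vs. a black-box non-parametric one
# on the same toy single-feature data; assumes scikit-learn is installed.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[8.0], [9.0], [10.0], [11.0], [12.0]])   # one feature, n = 1
y = np.array([30.0, 60.0, 75.0, 60.0, 75.0])

linear = LinearRegression().fit(X, y)                   # parametric: assumes ˆf(x) = a·x + b
knn = KNeighborsRegressor(n_neighbors=2).fit(X, y)      # non-parametric: no fixed functional form

print(linear.coef_, linear.intercept_)                  # white-box: the equation is readable
print(knn.predict([[10.5]]))                            # black-box: only predictions come out
```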
M1 - Linear Regression and Time-series Analysis | Basics in Regression Analysis
Objective Function in Regression
Typically, the regression task ends up being the following:
arg min_ˆY l(ˆY, Y), ∀x ∈ X, f(x) = y ∈ Y, ˆf(x) = ˆy ∈ ˆY
l is the so-called loss function to be minimized when defining ˆf(x)
if l(ˆY, Y) ∼ 0, we may be approximating ˆf(x) too closely to f(x). Possible
Overfitting!
Regularization can be performed over the loss function to avoid
overfitting (to be discussed further)
Typical Loss Functions in Regression
Absolute Deviation: Σ_{i=1}^{N} |yi − ˆyi|
Least Squares: Σ_{i=1}^{N} (yi − ˆyi)²
r = y − ˆy is often referred to as the prediction residuals
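A minimal numerical illustration of the two loss functions above (toy predictions, plain NumPy):

```python
# Hedged sketch: the two loss functions above on toy values (plain NumPy).
import numpy as np

y     = np.array([30.0, 60.0, 75.0, 60.0, 75.0])   # true targets
y_hat = np.array([24.0, 42.0, 60.0, 78.0, 96.0])   # hypothetical predictions

r = y - y_hat                                       # prediction residuals r = y − ˆy
absolute_deviation = np.sum(np.abs(r))              # Σ |yi − ˆyi|
least_squares      = np.sum(r ** 2)                 # Σ (yi − ˆyi)²
print(absolute_deviation, least_squares)
```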
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Simple Linear Regression
Simple Linear Regression is a special case of Linear Regression
which considers only one independent variable x, i.e. n = 1
It is parametric as it assumes that the target function must be a linear
combination of the feature values
It is a white-box method as the form of the target function is known before
it is learned
Target Function: ˆf(x) = a · x + b
To estimate ˆf(x), we need to compute the a, b which minimize
Σ_{i=1}^{N} (yi − ˆyi)² → Least Squares
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Mr. Burns loves money!
He lives in the Blue Street Mansion, where it is always very hot!!!
Blue Street has a Bus Stop which is always crowded!!!
He wants to set up an on-street lemonade stand to exploit those poor people!
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Problem 1: his freezer cannot hold more than 80 lemonades.
Problem 2: his freezer cannot keep each lemonade for more than 1-2 hours.
Question: he wants to know how many people will be waiting at the stop
throughout the day.
Let's help him :(
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
We got some direct observations X
x y
8 30
9 60
10 75
11 60
12 75
[Scatter plot: Bus Arrival Time (in hours, independent variable) vs. Number of Passengers Boarded in Blue Street (dependent variable)]
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Goal: estimate ¯f(x) ∼ f(x) (dashed line)
x y
8 30
9 60
10 75
11 60
12 75
[Scatter plot of the observations with the target regression line f(x) shown dashed]
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Any regression curve must pass through the (¯x, ¯y) mean values
x y
8 30
9 60
10 75
11 60
12 75
Mean Mean
10 60
[Scatter plot of the observations with the sample means marked]
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Compute sample differences to ¯x
x y x − ¯x
8 30 -2
9 60 -1
10 75 0
11 60 1
12 75 2
Mean Mean
10 60
[Scatter plot of the observations]
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Compute sample differences to ¯y
x y x − ¯x y − ¯y
8 30 -2 -30
9 60 -1 0
10 75 0 15
11 60 1 30
12 75 2 45
Mean Mean
10 60
[Scatter plot of the observations]
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
¯f(x) = a · x + b; a = Σ_{i=1}^{N} (xi − ¯x)(yi − ¯y) / Σ_{i=1}^{N} (xi − ¯x)² = 180 / 10 = 18, i.e. the slope
x y x − ¯x y − ¯y (x − ¯x)² (x − ¯x)(y − ¯y)
8 30 -2 -30 4 60
9 60 -1 0 1 0
10 75 0 15 0 0
11 60 1 30 1 30
12 75 2 45 4 90
Mean Mean Sum Sum
10 60 10 180
[Scatter plot of the observations]
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
b = ¯y − ¯x · a = 60 − 10 × 18 = −120, i.e. the intercept
x y x − ¯x y − ¯y (x − ¯x)² (x − ¯x)(y − ¯y)
8 30 -2 -30 4 60
9 60 -1 0 1 0
10 75 0 15 0 0
11 60 1 30 1 30
12 75 2 45 4 90
Mean Mean Sum Sum
10 60 10 180
[Scatter plot of the observations]
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Estimated Target Function: ¯f(x) = 18x − 120
[Scatter plot of the observations with the fitted line ¯f(x) = 18x − 120]
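The same arithmetic as a short script (plain NumPy), reproducing the slope and intercept derived above:

```python
# Hedged sketch: the least-squares arithmetic above, reproduced with NumPy.
import numpy as np

x = np.array([8.0, 9.0, 10.0, 11.0, 12.0])    # bus arrival time (in hours)
y = np.array([30.0, 60.0, 75.0, 60.0, 75.0])  # passengers boarded in Blue Street

a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope: 180/10 = 18
b = y.mean() - x.mean() * a                                                # intercept: 60 − 10·18 = −120
print(a, b)   # 18.0 -120.0, i.e. ˆf(x) = 18x − 120
```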
M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Multivariate Linear Regression
What if there are multiple features, i.e. n > 1?
X will be a matrix... XN,n = [ x1,1 x1,2 · · · x1,n ; x2,1 x2,2 · · · x2,n ; ... ; xN,1 xN,2 · · · xN,n ]
All the previous operations can be performed through algebraic
operators and transformations (to be seen further on)!
¯f(x) = w1 · x[1] + w2 · x[2] + ... + wn · x[n] + b = wᵀx + b
We do know that f(x) = ¯f(x) + ε. Assuming that ε ∼ N, the
Least Squares solution is equivalent to the Maximum Likelihood Estimator
(MLE)!
Consequently, we can obtain ¯f(x) through the maximum likelihood weights
w = (XᵀX)⁻¹(XᵀY)
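A compact sketch of the closed-form estimate above (synthetic data; in practice np.linalg.lstsq or a QR/SVD-based solver is preferred over explicitly inverting XᵀX):

```python
# Hedged sketch: multivariate OLS via the normal equations w = (XᵀX)⁻¹(XᵀY),
# on synthetic data; np.linalg.lstsq is the numerically safer alternative.
import numpy as np

rng = np.random.default_rng(0)
N, n = 100, 3
X = np.hstack([rng.normal(size=(N, n)), np.ones((N, 1))])   # last column of ones carries the bias b
true_w = np.array([2.0, -1.0, 0.5, 4.0])
y = X @ true_w + rng.normal(scale=0.1, size=N)               # f(x) = ¯f(x) + ε, ε ∼ N(0, σ²)

w = np.linalg.inv(X.T @ X) @ (X.T @ y)                       # normal equations, as on the slide
w_stable, *_ = np.linalg.lstsq(X, y, rcond=None)             # preferred in practice
print(w, w_stable)
```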
M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Result Analysis
Is Mr. Burns happy with our final result?
[Scatter plot with the fitted line extrapolated to later hours of the day]
M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Prediction Exercise
Let’s make an estimate for 14h... we get 130 pax!
However, he knows that the bus has a capacity of 100 pax!!!!
How certain can he be about our prediction?
[Scatter plot with the 14h prediction marked against the bus capacity line (100 pax)]
M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Bayesian Linear Regression
Why doesn’t our LS/MLE output a satisfactory model?
It overfitted the training data - which do not adequately cover the
feature space!!!
One solution: Go Bayesian [Box and Tiao, 2011]! Our prior
knowledge can be incorporated into the loss function!
M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Bayesian Linear Regression
A priori, we do know that the bus capacity is 100 pax.
Empirically, we could state that f(x) ∼ N(60, 10) → Predictive
Distribution
µ = 60 is easily verifiable on the data, while σ² = 10 comes from our
belief;
Result: p(20 < f(x) < 80) = 90%
[Density plot of the N(60, 10) predictive distribution]
M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Bayesian Linear Regression
Result: a line with a gentler slope. Mr. Burns is (almost...) a
happy man!
[Scatter plot comparing the original least-squares line with the flatter Bayesian fit]
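The slides do not spell out the posterior computation; one standard concrete reading of "going Bayesian" here is a Gaussian prior over the weights, whose MAP estimate is ridge (L2-regularized) least squares and produces exactly this kind of flatter line. A minimal sketch under that assumption (the prior strength alpha is a hypothetical choice):

```python
# Hedged sketch: a Gaussian prior on the weights makes the MAP estimate a ridge
# (L2-regularized) least squares, w = (XᵀX + αI)⁻¹ Xᵀy; α = 5 is an assumed value.
import numpy as np

x = np.array([8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([30.0, 60.0, 75.0, 60.0, 75.0])
X = np.column_stack([x, np.ones_like(x)])          # columns: [x, 1] -> w = (slope, intercept)

alpha = 5.0                                        # prior strength (assumed; both weights shrunk for simplicity)
w_ls  = np.linalg.solve(X.T @ X, X.T @ y)                      # plain least squares: steep slope
w_map = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ y)  # MAP with Gaussian prior: flatter slope
print("LS  (slope, intercept):", w_ls)
print("MAP (slope, intercept):", w_map)
```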
M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Other Regression Methods
What if we have more samples? Do you think that the target function
would still be linear?
Would overfitting be a bigger problem in the presence of outliers?
[Scatter plot with additional samples, including outliers]
M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Other Regression Methods
Kernel Regression [Nadaraya, 1964] can deal with outliers.
It is a white-box non-parametric regression method.
It returns a weighted average of all the samples, where each weight is
computed from a notion of neighborhood given by a bandwidth
parameter (i.e. λ).
¯f(x∗) = Σ_{i=1}^{N} K(x∗, xi, λ) yi / Σ_{i=1}^{N} K(x∗, xi, λ)
the K function is the kernel used to compute such weights
Distinct applications may require distinct kernels!
λ must be tuned before usage (e.g. cross validation)
Two common kernels: Nadaraya-Watson K(x∗, xi, λ) = (x∗ − xi) / λ,
Normal K(x∗, xi, λ) = e^(−(x∗ − xi)² / (2λ²))
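A minimal sketch of the weighted-average estimator above, using the Normal (Gaussian) kernel; the sample values and bandwidths are illustrative:

```python
# Hedged sketch: Nadaraya-Watson-style kernel regression with the Normal (Gaussian)
# kernel from the slide; sample values and bandwidths are illustrative.
import numpy as np

def normal_kernel(x_star, x_i, lam):
    return np.exp(-((x_star - x_i) ** 2) / (2.0 * lam ** 2))

def kernel_regression(x_star, x, y, lam):
    """Weighted average of all targets; weights come from the kernel neighborhood."""
    w = normal_kernel(x_star, x, lam)
    return np.sum(w * y) / np.sum(w)

x = np.array([8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0])
y = np.array([30.0, 60.0, 75.0, 60.0, 75.0, 90.0, 80.0, 40.0, 20.0, 10.0])

for lam in (1, 3, 5, 10):                 # λ must be tuned, e.g. by cross validation
    print(lam, kernel_regression(14.5, x, y, lam))
```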
M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Nadaraya-Watson Kernel Regression
Example using a Nadaraya-Watson Kernel...
[Four panels: kernel regression fits for bandwidths λ = 1, 3, 5 and 10; Bus Arrival Time (in hours) vs. Number of Passengers Boarded in Blue Street]
M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Other Relevant Methods worth exploring
Conjugate Gradients [Hestenes and Stiefel, 1952]
Weighted Linear Regression [Strutz, 2010]
Regression (Decision) Trees [Breiman et al., 1984]
Local Regression [Cleveland, 1981]
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
What is Time Series analysis?
Time Series Analysis is a subfield of signal processing that deals with
modeling the behavior of a timestamped series of numerical values.
It can be seen as a special case of the simple linear regression problem where
x is defined over time and, even more importantly, the samples' arrival
sequence is relevant for their future values!!!
Finally, the signal is expected to follow some trend, evolving
over seasons.
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
Time Series - An Example
[Time series plot: Number of Passengers Boarded in Blue Street per hour, Monday through Friday]
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
Time Series - Basic Concepts
Stationarity - Are the mean, the variance and the co-variance
constant along t?
Stationarity is a key property when dealing with time series. Even if the
series is not stationary, it may become so by differencing the signal with
some lag d
Trend - Is there any periodicity for which the signal repeats itself (in
some pattern) along t?
Seasonality - Is the signal stationary for subsets of t? Has this signal
a trend for subsets of t? Then it is said to be seasonal!
ACF - Autocorrelation Function. Measures the correlation of the
signal with itself lagged over a pool of possible lag values.
PACF - Partial Autocorrelation Function. Measures the ACF after
removing linear dependences
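In code, the stationarity check and the lag-d differencing described above typically look like the sketch below (assuming statsmodels is installed; the input series file is hypothetical):

```python
# Hedged sketch: stationarity check and differencing; assumes statsmodels is installed
# and that "boardings.txt" (hypothetical) holds the hourly boardings series.
import numpy as np
from statsmodels.tsa.stattools import adfuller

y = np.loadtxt("boardings.txt")

adf_stat, p_value, *_ = adfuller(y)          # Augmented Dickey-Fuller test
print("ADF p-value:", p_value)               # a small p-value supports stationarity

y_diff = np.diff(y, n=1)                     # first-order differencing (the "d" in ARIMA)
y_seasonal = y[24:] - y[:-24]                # differencing with lag 24 (one day), illustrative
print("ADF p-value after differencing:", adfuller(y_diff)[1])
```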
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
AutoRegressive and Moving Average Models
Autoregressive model of order p, i.e. AR(p)
yt = δ + φ1 yt−1 + φ2 yt−2 + ... + φp yt−p + εt
yt is a weighted average of its p previous values
Moving Average model of order q, i.e. MA(q)
yt = δ − θ1 εt−1 − θ2 εt−2 − ... − θq εt−q + εt
yt is a weighted average of its q previous error terms ε∗
Are these equations somehow familiar to you!?
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
ARIMA models
ARIMA stands for AutoRegressive Integrated Moving Average; these
models are used to estimate time series models [Box and Pierce, 1970]
ARIMA models are linear regression models that use lagged values
of the dependent variable and/or a random disturbance term as
explanatory variables
They rely heavily on the autocorrelation patterns of the data, i.e. they
assume that the signal repeats itself somehow
An ARIMA model is defined by three values (p, d, q), where p denotes
its AR component, q the MA component and d the order of differencing
needed to make the series stationary. Any of those values can be 0.
The random disturbances εt are assumed to be Gaussian, i.e.
εt ∼ N(0, σ²)
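For reference, fitting an ARIMA(p, d, q) of this kind usually looks like the sketch below (statsmodels assumed; the series file and the (3, 0, 1) order are illustrative, matching the example that follows):

```python
# Hedged sketch: fitting an ARIMA(p, d, q); assumes statsmodels is installed and that
# "boardings.txt" (hypothetical) holds the hourly boardings series.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.loadtxt("boardings.txt")

fitted = ARIMA(y, order=(3, 0, 1)).fit()   # (p, d, q) as identified in the example below
print(fitted.summary())                    # estimated φ∗ (AR) and θ∗ (MA) weights
print(fitted.forecast(steps=1))            # one-step-ahead forecast
```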
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
A Non-Seasonal ARIMA example in a nutshell
Stationarity: Yes (but hard to be sure by visual inspection alone)
Trend: the signal repeats itself every 24h, with two peaks, around 10h
and 24h
How do we estimate the ARIMA model in place, i.e. (p, d, q)?
[Time series plot of the Monday-Friday boardings, shown again for order identification]
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
A Non-Seasonal ARIMA example in a nutshell
Stationarity: if the series is stationary, the ACF should tail off abruptly.
Otherwise, it will tail off smoothly, going nowhere...
Series is Stationary → d = 0
ACF cuts off after p = 3 lags; PACF cuts off after q = 1 lags
Model (p, d, q)=(3, 0, 1)
[ACF and PACF plots of the series over lags 0-40]
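These ACF/PACF diagnostics are typically produced as in the sketch below (statsmodels and matplotlib assumed; the series file is hypothetical); the (p, d, q) choice then follows the cut-off reading described above:

```python
# Hedged sketch: producing the ACF/PACF diagnostics; assumes statsmodels and
# matplotlib are installed, with "boardings.txt" a hypothetical series file.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

y = np.loadtxt("boardings.txt")

fig, axes = plt.subplots(2, 1)
plot_acf(y, lags=40, ax=axes[0])    # abrupt tail-off supports stationarity (d = 0)
plot_pacf(y, lags=40, ax=axes[1])   # cut-off lags guide the choice of the remaining orders
plt.show()
```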
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
A Non-Seasonal ARIMA example in a nutshell
ARIMA models are very powerful for short-term forecasting
horizons. Looking ahead more than one term means re-using the
predictions as if they were true past values in order to estimate further outputs.
Our exercise is to predict the bus demand for the last day, i.e. Friday.
We did so using a rolling horizon of one hour (we predicted
one step ahead on each iteration, adding the last true output value
to the training series used to estimate the next weight set, i.e. φ∗, θ∗)
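A minimal sketch of that rolling one-step-ahead procedure (statsmodels assumed; the 24-hour test split and the series file are illustrative):

```python
# Hedged sketch: rolling one-step-ahead ARIMA forecasting of the last day, re-estimating
# the weights at every step; assumes statsmodels and a hypothetical "boardings.txt".
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.loadtxt("boardings.txt")          # hourly series, Monday through Friday
n_test = 24                              # forecast Friday, one hour at a time
history = list(y[:-n_test])              # Monday-Thursday as the initial training series
predictions = []

for t in range(n_test):
    fitted = ARIMA(history, order=(3, 0, 1)).fit()   # re-estimate φ∗, θ∗ on the current history
    predictions.append(fitted.forecast(steps=1)[0])  # one-step-ahead prediction
    history.append(y[len(y) - n_test + t])           # then include the true value and roll on

print(predictions)
```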
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
A Non-Seasonal ARIMA example in a nutshell
The forecasting result is very close to the real series. It finds the
peaks/valleys, but it under/overestimates their true values...
[Time series plot: real vs. forecasted boardings for Friday]
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
Other Relevant Methods worth exploring
Auto-ARIMA Learning Model [Hyndman and Khandakar, 2007]
Seasonal ARIMA [Box et al., 1976]
Holt-Winters Exponential Smoothing [Goodwin et al., 2010]
Inhomogeneous Poisson Models [Lee et al., 1991]
GARCH models [Engle, 1982]
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
Lessons Learned
White-box Regression methods are easier to understand as they
provide a direct relationship between input-output values
Linear Regression methods can be powerful inference tools
Bayesian Statistics can help by injecting prior knowledge about the target
variable into our learning model
The adequate choice of a loss function for a given problem can
enhance the predictive power of a method
Time Series Analysis methods can be used when the samples' order
is relevant and time-dependent
They are easy to understand and especially powerful for predicting over
short-term horizons
A brief study of the problem and the selection of the proper
combination of ML components can ease your daily life.
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
Box, G., Jenkins, G., and Reinsel, G. (1976).
Time series analysis.
Holden-Day, San Francisco.
Box, G. and Pierce, D. (1970).
Distribution of residual autocorrelations in autoregressive-integrated
moving average time series models.
Journal of the American Statistical Association, 65(332):1509–1526.
Box, G. E. and Tiao, G. C. (2011).
Bayesian inference in statistical analysis, volume 40.
John Wiley & Sons.
Breiman, L., Friedman, J., Stone, C., and Olshen, R. (1984).
Classification and regression trees.
CRC press.
Cleveland, W. S. (1981).
Lowess: A program for smoothing scatterplots by robust locally
weighted regression.
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
The American Statistician, 35(1):54–54.
Cover, T. and Hart, P. (1967).
Nearest neighbor pattern classification.
IEEE Transactions on Information Theory, 13(1):21–27.
Engle, R. F. (1982).
Autoregressive conditional heteroscedasticity with estimates of the
variance of United Kingdom inflation.
Econometrica: Journal of the Econometric Society, pages 987–1007.
Goodwin, P. et al. (2010).
The Holt-Winters approach to exponential smoothing: 50 years old and
going strong.
Foresight, 19:30–33.
Hestenes, M. R. and Stiefel, E. (1952).
Methods of conjugate gradients for solving linear systems.
Hyndman, R. and Khandakar, Y. (2007).
Automatic time series forecasting: the forecast package for R.
M1 - Linear Regression and Time-series Analysis | Time Series Analysis
Technical report, Monash University, Department of Econometrics and
Business Statistics.
Lee, S., Wilson, J. R., and Crawford, M. M. (1991).
Modeling and simulation of a nonhomogeneous Poisson process
having cyclic behavior.
Communications in Statistics-Simulation and Computation,
20(2-3):777–809.
Legendre, A. (1805).
Nouvelles méthodes pour la détermination des orbites des comètes.
Number 1. F. Didot.
Nadaraya, E. (1964).
On estimating regression.
Theory of Probability & Its Applications, 9(1):141–142.
Strutz, T. (2010).
Data fitting and uncertainty.
A practical introduction to weighted least squares and beyond.
Vieweg+ Teubner.