1. M1 - Linear Regression and Time Series Analysis
Luis Moreira-Matias
luis.matias[at]neclab.eu
www.luis-matias.pt.vu
NEC Laboratories Europe,
Heidelberg Germany
07/09/2015, Porto, Portugal
“Eureka!” - How to Build Accurate Predictors for Real-valued Outputs from Simple Methods
2. M1 - Linear Regression and Time-series Analysis
Outline
Regression Analysis
Basic concepts: Target, Objective and Learning/Induction Functions
Simple Linear Regression
Numerical Example with Least Squares
Multivariate Linear Regression, Bayesian Statistics and Kernel-Based Approaches
Time Series Analysis - when the time becomes a feature
Basic concepts: Stationarity, ACF/PACF, Seasonality
AutoRegressive (AR) and Moving Average (MA) models
Box-Jenkins ARIMA forecasting for short-term predictions
Lessons Learned
Luis Moreira-Matias | NEC Europe Ltd. | 07/09/2015, Porto, Portugal 2 / 41
3. M1 - Linear Regression and Time-series Analysis | Basics in Regression Analysis
An Introduction to Regression
Numerical prediction problems are about generalizing the behavior
of a target variable y given a predefined explanatory context (i.e.
explanatory variables), such that y = f(x);
Example: Energy consumption of a given family y along the time of
the day x;
Inductive learning method: estimate a behavioral function ˆf(x) given
a set of data samples, i.e. training set;
The range of all explanatory variables: feature space;
Example: [0, 24] hours defines the feature space of the time of the
day x;
4. M1 - Linear Regression and Time-series Analysis | Basics in Regression Analysis
Basic Concepts in Regression
Given a training set X with N = |X| samples, we want to estimate a
Target Function ˆf(x), x ∈ X such that
ˆf : X → ℝ, with ˆf(x) ≈ f(x), ∀x ∈ X
n denotes the number of features, each ranging in ℝ
ℝⁿ denotes the feature space onto which the training set X is mapped
Basic Concepts
Target Function: ˆf(x) ∼ f(x)
Induction Function/Learner/Method: the function used to construct ˆf(x)
from the input samples/training set
Objective/Loss Function: the function we aim to minimize when
approximating f(x) by ˆf(x)
5. M1 - Linear Regression and Time-series Analysis | Basics in Regression Analysis
Overview on the Types of Learning Functions
Target Function vs. Learners
Can take either Linear or Non-Linear form, depending on the type of
relationship between y and X built by the Learner;
Parametric Learning methods assume a functional form for ˆf(x)
a priori
Non-parametric do not!
White-Box Learning methods can express ˆf(x) as an equation
Black-box methods cannot!
Examples on Learners
Linear Least Squares [Legendre, 1805]: White-box Parametric
(Linear) Learner;
k-Nearest Neighbors [Cover and Hart, 1967]: Black-Box
Non-Parametric Learner;
6. M1 - Linear Regression and Time-series Analysis | Basics in Regression Analysis
Objective Function in Regression
Typically, the regression task ends up being

arg min_{ˆY} l(ˆY, Y), ∀x ∈ X, f(x) = y ∈ Y, ˆf(x) = ŷ ∈ ˆY

l is the so-called loss function, to be minimized by defining ˆf(x)
if l(ˆY, Y) ∼ 0, we may be approximating ˆf(x) too closely to f(x). Possible
Overfitting!
Regularization can be performed over the loss function to avoid
overfitting (to be discussed further)
Typical Loss Functions in Regression
Absolute Deviation: Σ_{i=1}^{N} |yi − ŷi|
Least Squares: Σ_{i=1}^{N} (yi − ŷi)²
r = y − ŷ is often referred to as the prediction residuals
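As a quick sketch, both loss functions can be computed directly; the y and ŷ arrays below are illustrative toy values, not results from any particular learner:

```python
import numpy as np

y = np.array([30.0, 60.0, 75.0, 60.0, 75.0])      # true target values
y_hat = np.array([24.0, 42.0, 60.0, 78.0, 96.0])  # some predictions

r = y - y_hat                   # r = y - y_hat: the prediction residuals
abs_dev = np.sum(np.abs(r))     # Absolute Deviation loss
least_sq = np.sum(r ** 2)       # Least Squares loss
```

Note how Least Squares penalizes the larger residuals disproportionately more than the Absolute Deviation does.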
7. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Simple Linear Regression
Simple Linear Regression is a special case of Linear Regression
which considers only one independent variable x, i.e. n = 1
It is parametric as it assumes that the target function must be a linear
combination of the feature values
It is a white-box method as the form of the target function is known
before learning
Target Function: ˆf(x) = a · x + b
To estimate ˆf(x), we need to compute the a, b which minimize
Σ_{i=1}^{N} (yi − ŷi)² → Least Squares
8. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Mr. Burns loves money!
He lives in Blue Street Mansion, where it is always very hot!!!
Blue Street has a Bus Stop which is always crowded!!!
He wants to open an on-street lemonade stand to exploit those poor
people!
12. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Problem 1: his freezer cannot hold more than 80 lemonades.
Problem 2: his freezer cannot keep each lemonade for more than 1-2
hours.
Question: he wants to know how many people will be waiting at the stop
along the day.
Let's help him :(
13. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
We got some direct observations X
x y
8 30
9 60
10 75
11 60
12 75
[Figure: scatter plot of the samples — Bus Arrival Time in hours (independent variable) vs. Number of Passengers Boarded in Blue Street (dependent variable)]
14. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Goal: estimate ˆf(x) ∼ f(x) (dashed line)
[Figure: the scatter plot with the target function f(x) shown as a dashed line]
15. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Any regression curve must pass through the mean point (x̄, ȳ)
x y
8 30
9 60
10 75
11 60
12 75
Mean Mean
10 60
16. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Compute sample differences to ¯x
x y x − ¯x
8 30 -2
9 60 -1
10 75 0
11 60 1
12 75 2
Mean Mean
10 60
17. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Compute sample differences to ¯y
x y x − ¯x y − ¯y
8 30 -2 -30
9 60 -1 0
10 75 0 15
11 60 1 30
12 75 2 45
Mean Mean
10 60
18. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
ˆf(x) = a · x + b;  a = Σ_{i=1}^{N} (xi − x̄)(yi − ȳ) / Σ_{i=1}^{N} (xi − x̄)² = 180/10 = 18, i.e. the slope
x    y    x − x̄   y − ȳ   (x − x̄)²   (x − x̄)(y − ȳ)
8    30   -2      -30     4          60
9    60   -1      0       1          0
10   75   0       15      0          0
11   60   1       30      1          30
12   75   2       45      4          90
Mean: 10, Mean: 60, Sum: 10, Sum: 180
19. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
b = ȳ − x̄ · a = 60 − 10 × 18 = −120, i.e. the intercept
20. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Numerical Example with Least Squares
Estimated Target Function: ˆf(x) = 18x − 120
[Figure: the scatter plot with the fitted line ˆf(x) = 18x − 120]
21. M1 - Linear Regression and Time-series Analysis | An Overview on Linear Regression
Multivariate Linear Regression
What if there are multiple features, i.e. n > 1?
X will be a matrix...

X_{N,n} =
  x_{1,1}  x_{1,2}  · · ·  x_{1,n}
  x_{2,1}  x_{2,2}  · · ·  x_{2,n}
    ...      ...    · · ·    ...
  x_{N,1}  x_{N,2}  · · ·  x_{N,n}

All the previous operations can be performed through algebraic
operators and transformations (to see further)!

ˆf(x) = w1 · x[1] + w2 · x[2] + ... + wn · x[n] + b = wᵀx + b
We do know that f(x) = ˆf(x) + ε. Assuming that ε ∼ N(0, σ²), the
Least Squares solution coincides with the Maximum Likelihood Estimator
(MLE)!
Consequently, we can obtain ˆf(x) through the maximum likelihood weights

w = (XᵀX)⁻¹(XᵀY)
22. M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Result Analysis
Is Mr. Burns happy with our final result?
[Figure: the fitted line extrapolated over 6–18h; the y-axis now ranges up to 200 passengers]
23. M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Prediction Exercise
Let's estimate for 14h... we get 130 pax!
However, he knows that the bus has a capacity of 100 pax!!!!
How certain can he be about our prediction?
[Figure: the extrapolated line with the 14h prediction marked '?', above the horizontal Bus Capacity line of 100 pax]
24. M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Bayesian Linear Regression
Why doesn’t our LS/MLE output a satisfactory model?
It overfitted the training data - which do not adequately cover the
feature space!!!
One solution: Go Bayesian [Box and Tiao, 2011]! Our prior
knowledge can optimize the loss function!
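One concrete way to "go Bayesian" is to place a zero-mean Gaussian prior on the weights; its MAP estimate is ridge regression, which shrinks the slope. A hedged sketch on the bus-stop data (the prior strength lam is an illustrative choice, not a value from the slides):

```python
import numpy as np

X = np.array([[8.0], [9.0], [10.0], [11.0], [12.0]])
Y = np.array([30.0, 60.0, 75.0, 60.0, 75.0])
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

lam = 10.0  # illustrative prior strength (larger = stronger shrinkage)
# MAP estimate under a zero-mean Gaussian prior on w:
# w = (X^T X + lam*I)^{-1} (X^T Y)
w_map = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ Y)
```

The slope w_map[0] comes out well below the unregularized 18 — a line with a smoother slope, as on the slide after next.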
25. M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Bayesian Linear Regression
A priori, we do know that the bus capacity is 100 pax.
Empirically, we could state that f(x) ∼ N(60, 10) → Predictive
Distribution
µ = 60 is easily verifiable on the data, while σ² = 10 comes from our
belief;
Result: p(20 < f(x) < 80) = 90%
[Figure: the N(60, 10) predictive density over the 20–100 range]
26. M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Bayesian Linear Regression
Result: a line with a smoother slope. Mr. Burns is (almost...) a
happy man!
[Figure: the Bayesian regression line with a smoother slope; the extrapolated predictions are marked '?']
27. M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Other Regression Methods
What if we have more samples? Do you think that the target function
would still be linear?
Would overfitting be a bigger problem in the presence of outliers?
[Figure: scatter plot with ten samples over 6–18h, including outliers]
28. M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Other Regression Methods
Kernel Regression [Nadaraya, 1964] can deal with outliers.
It is a white-box non-parametric regression method.
It returns a weighted average of all samples, where each sample's weight
is computed through a concept of neighborhood given by a bandwidth
parameter (i.e. λ).
ˆf(x∗) = Σ_{i=1}^{N} K(x∗, xi, λ) yi / Σ_{i=1}^{N} K(x∗, xi, λ)
the K function is the kernel used to compute such weights
Distinct applications may require distinct kernels!
λ must be tuned before usage (e.g. cross validation)
Two common kernels: Nadaraya-Watson K(x∗, xi, λ) = (x∗ − xi)/λ,
Normal K(x∗, xi, λ) = e^(−(x∗ − xi)²/(2λ²))
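A sketch of the estimator above using the Normal (Gaussian) kernel from the slide, on the earlier five-sample data:

```python
import numpy as np

def normal_kernel(x_star, x_i, lam):
    # K(x*, x_i, lambda) = exp(-(x* - x_i)^2 / (2 * lambda^2))
    return np.exp(-((x_star - x_i) ** 2) / (2.0 * lam ** 2))

def kernel_regression(x_star, x, y, lam):
    # Weighted average of all samples; the weights come from the kernel.
    w = normal_kernel(x_star, x, lam)
    return np.sum(w * y) / np.sum(w)

x = np.array([8.0, 9.0, 10.0, 11.0, 12.0])
y = np.array([30.0, 60.0, 75.0, 60.0, 75.0])
pred = kernel_regression(10.0, x, y, lam=1.0)
```

As λ grows, every sample gets a similar weight and the estimate flattens toward the global mean ȳ.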
29. M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Nadaraya-Watson Kernel Regression
Example using a Nadaraya-Watson Kernel...
[Figure: four panels of Nadaraya-Watson fits on the ten-sample data, for λ = 1, λ = 3, λ = 5 and λ = 10]
30. M1 - Linear Regression and Time-series Analysis | Other Useful Regression Methods
Other Relevant Methods worth exploring
Conjugate Gradients [Hestenes and Stiefel, 1952]
Weighted Linear Regression [Strutz, 2010]
Regression (Decision) Trees [Breiman et al., 1984]
Local Regression [Cleveland, 1981]
31. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
What is Time Series analysis?
Time Series Analysis is a subfield of signal processing that deals with
modeling the behavior of a timestamped series of numerical values.
It can be seen as a special case of the simple linear regression problem
where x is defined in time and, even more importantly, the samples'
arrival sequence is relevant for their future values!!!
Finally, it is expected that the signal follows some trend, evolving
over seasons.
33. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
Time Series - Basic Concepts
Stationarity - Are the mean, the variance and the co-variance
constant along t?
Stationarity is a key property to deal with time series. Even if the
series is not stationary, it may become so by differencing the signal
with some lag d
Trend - Is there any periodicity with which the signal repeats itself (in
some pattern) along t?
Seasonality - Is the signal stationary for subsets of t? Does the signal
have a trend for subsets of t? Then it is said to be seasonal!
ACF - Autocorrelation Function. Measures the correlation of the
signal with itself, lagged over a pool of possible lag values.
PACF - Partial Autocorrelation Function. Measures the ACF after
removing linear dependencies on shorter lags
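The ACF is simple to compute directly; a minimal sketch (on a perfectly periodic toy signal, the ACF peaks again at the period):

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelation of y for lags 0..max_lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(y ** 2)
    return np.array([np.sum(y[k:] * y[:len(y) - k]) / denom
                     for k in range(max_lag + 1)])

# A signal that repeats every 4 steps has strong autocorrelation at lag 4.
r = acf(np.tile([1.0, 2.0, 3.0, 4.0], 20), max_lag=8)
```

By construction r[0] = 1; for a seasonal signal the ACF spikes at multiples of the period, which is exactly the pattern Box-Jenkins identification looks for.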
34. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
AutoRegressive and Moving Average Models
Autoregressive model of order p, i.e. AR(p)

yt = δ + φ1·yt−1 + φ2·yt−2 + ... + φp·yt−p + εt

yt is a weighted average of its p previous values
Moving Average model of order q, i.e. MA(q)

yt = δ − θ1·εt−1 − θ2·εt−2 − ... − θq·εt−q + εt

yt is a weighted average of its q previous error terms ε∗
Are these equations somehow familiar to you!?
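Since the AR(p) equation above is linear in its coefficients, the φ's can be estimated by plain least squares on lagged copies of the series. A minimal sketch that recovers the parameters of a simulated AR(1) (the simulated δ = 5, φ = 0.5 are illustrative):

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares estimates of (delta, phi_1, ..., phi_p) for an AR(p)."""
    y = np.asarray(y, dtype=float)
    # Design matrix: a constant column plus the p lagged values of y_t.
    lags = [y[p - k:len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

# Simulate y_t = 5 + 0.5*y_{t-1} + eps_t, then recover the coefficients.
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 5.0 + 0.5 * y[t - 1] + rng.standard_normal()
delta_hat, phi_hat = fit_ar(y, p=1)
```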
35. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
ARIMA models
ARIMA stands for AutoRegressive Integrated Moving Average; such
models are used to estimate time series models [Box and Pierce, 1970]
ARIMA models are linear regression models that use lagged values
of the dependent variable and/or a random disturbance term as
explanatory variables
They rely heavily on the autocorrelation patterns of the data, i.e. they
assume that the signal repeats itself somehow
An ARIMA model is defined by three values (p, d, q) where p denotes
its AR component, q the MA component and d the differentiation
needed to make the series stationary. Any of those values can be 0.
The random disturbances εt are assumed to be Gaussian, i.e.
εt ∼ N(0, σ²)
36. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
A Non-Seasonal ARIMA example in a nutshell
Stationarity: Yes (though hard to be sure by visual inspection alone)
Trend: the signal repeats itself every 24h, with two peaks, around 10h
and 24h
How to estimate the ARIMA model in place, i.e. (p, d, q)?
[Figure: hourly passenger counts at Blue Street over five days (MON–FRI), 0–240 pax]
37. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
A Non-Seasonal ARIMA example in a nutshell
Stationarity: if stationary, the ACF should tail off abruptly. Otherwise,
it will do so smoothly, going nowhere...
The series is stationary → d = 0
The PACF cuts off after 3 lags → p = 3; the ACF cuts off after 1 lag → q = 1
Model: (p, d, q) = (3, 0, 1)
[Figure: ACF and PACF of the series for lags 0–40]
38. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
A Non-Seasonal ARIMA example in a nutshell
ARIMA models are very powerful methods for short-term forecasting
horizons. Looking ahead more than one step means re-using the
predictions as true past values to estimate further outputs.
Our exercise is to predict the bus demand for the last day, i.e. Friday.
We did so using a rolling horizon of one hour (we predicted
one step ahead on each iteration, including the last true output value
in the training series used to estimate the next weight set, i.e. φ∗, θ∗)
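The rolling-horizon loop can be sketched as below; for brevity, a plain AR(p) least-squares fit (defined inline) stands in for the full ARIMA estimation, and the simulated series is illustrative:

```python
import numpy as np

def fit_ar(y, p):
    # Least-squares estimates of (delta, phi_1, ..., phi_p).
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - k:len(y) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def rolling_forecast(series, n_test, p):
    """One-step-ahead rolling forecast over the last n_test points:
    after each prediction, the true value joins the training series
    and the coefficients are re-estimated."""
    preds = []
    for i in range(n_test):
        train = series[:len(series) - n_test + i]
        coef = fit_ar(train, p)
        recent = train[-p:][::-1]            # y_{t-1}, ..., y_{t-p}
        preds.append(coef[0] + coef[1:] @ recent)
    return np.array(preds)
```

Each iteration only ever predicts one step ahead, exactly as in the exercise: the model never has to feed its own predictions back in as inputs.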
39. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
A Non-Seasonal ARIMA example in a nutshell
The forecasting result is very close to the real series. It finds the
peaks/valleys but under/overestimates their true values...
[Figure: the real and forecast passenger series over the five days (MON–FRI)]
40. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
Other Relevant Methods worth exploring
Auto-ARIMA Learning Model [Hyndman and Khandakar, 2007]
Seasonal ARIMA [Box et al., 1976]
Holt-Winters Exponential Smoothing [Goodwin et al., 2010]
Inhomogeneous Poisson Models [Lee et al., 1991]
GARCH models [Engle, 1982]
41. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
Lessons Learned
White-box regression methods are easier to understand, as they
provide a direct relationship between input and output values
Linear Regression methods can be powerful inference tools
Bayesian Statistics can help to incorporate prior knowledge about the
target variable into our learning model
The adequate choice of a loss function for a given problem can
enhance the predictive power of a method
Time Series Analysis methods can be used when the samples' order
is relevant and time-dependent
They are easy to understand and especially powerful for predicting over
short-term horizons
A brief study of the problem and the selection of the proper
combination of ML components can ease your daily life.
42. M1 - Linear Regression and Time-series Analysis | Time Series Analysis
Box, G., Jenkins, G., and Reinsel, G. (1976).
Time series analysis.
Holden-day San Francisco.
Box, G. and Pierce, D. (1970).
Distribution of residual autocorrelations in autoregressive-integrated
moving average time series models.
Journal of the American Statistical Association, 65(332):1509–1526.
Box, G. E. and Tiao, G. C. (2011).
Bayesian inference in statistical analysis, volume 40.
John Wiley & Sons.
Breiman, L., Friedman, J., Stone, C., and Olshen, R. (1984).
Classification and regression trees.
CRC press.
Cleveland, W. S. (1981).
Lowess: A program for smoothing scatterplots by robust locally
weighted regression.
The American Statistician, 35(1):54–54.
Cover, T. and Hart, P. (1967).
Nearest neighbor pattern classification.
IEEE Transactions on Information Theory, 13(1):21–27.
Engle, R. F. (1982).
Autoregressive conditional heteroscedasticity with estimates of the
variance of United Kingdom inflation.
Econometrica: Journal of the Econometric Society, pages 987–1007.
Goodwin, P. et al. (2010).
The Holt-Winters approach to exponential smoothing: 50 years old and
going strong.
Foresight, 19:30–33.
Hestenes, M. R. and Stiefel, E. (1952).
Methods of conjugate gradients for solving linear systems.
Hyndman, R. and Khandakar, Y. (2007).
Automatic time series forecasting: the forecast package for R.
Technical report, Monash University, Department of Econometrics and
Business Statistics.
Lee, S., Wilson, J. R., and Crawford, M. M. (1991).
Modeling and simulation of a nonhomogeneous Poisson process
having cyclic behavior.
Communications in Statistics-Simulation and Computation,
20(2-3):777–809.
Legendre, A. (1805).
Nouvelles méthodes pour la détermination des orbites des comètes.
Number 1. F. Didot.
Nadaraya, E. (1964).
On estimating regression.
Theory of Probability & Its Applications, 9(1):141–142.
Strutz, T. (2010).
Data fitting and uncertainty.
A practical introduction to weighted least squares and beyond.
Vieweg+ Teubner.