Software Engineering Lecture 6 - time series data

Lecture 6 – Time series
(Source: http://scifun.chem.wisc.edu/WOP/RandomWalk.html )

Chapters
In short: what stuck from the previous lecture?

Why time series
What are time series
Decomposition of time series
Basic functions for time series
Stabilizing time series
Handling NA in time series
Analysis of the noise (AR/MA/white noise)

CREDITS TO
• Michelangelo Vargas for course 3.4 SEC
• NA in dataset http://publish.illinois.edu/spencer-guerrero/2014/12/11/2-dealing-with-missing-data-in-r-omit-approx-or-spline-part-1/
• Decomposition https://anomaly.io/seasonal-trend-decomposition-in-r/
• Basic functions https://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
• AR/MA modelling https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
• Forecast https://media.readthedocs.org/pdf/a-little-book-of-r-for-time-series/latest/a-little-book-of-r-for-time-series.pdf

The goal of this lecture is that you can:
• Convert data into a time series
• Understand time series formulas
• Transform time series by means of decomposition
• Work with the time series package

Why time series
An example
A distribution centre has products that are kept in stock and products that are not. Which is which depends on the number of orders per month. Several people in the company suspect that the number of orders for a particular product is rising.
How do we check this? What will the orders be in the future?

What is a time series?
A time series X(t) is a collection of observations, each made at a specific time t.
The set T of time points could be continuous, but we will assume a discrete set T.
In fact, we will assume time series whose observations are a fixed distance apart.

A time series is usually a combination of components
To describe a time series we use four components.
• The trend: gives the overall description of the rise or fall of a time series.
• The seasonal component: gives the cyclic behaviour of the time series. The period of this behaviour should be constant and known.
• The cyclical (business-cycle) component: gives the cyclic behaviour whose period is not known. This period will generally be longer than the seasonal period.
• The random component: the behaviour that we cannot describe with the other three components.

Basics of the stats, tseries, ast and lmtest packages
https://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
Installing the package
# it all starts with installing and loading the package
install.packages('tseries')
library(tseries)
Creating a time series object
cycle() # gives the positions in the cycle of each observation (stats)
deltat() # returns the time interval between observations (stats)
end() # extracts and encodes the time at which the last observation was taken (stats)
frequency() # returns the number of samples per unit time (stats)
read.ts() # reads a time series file (tseries)
start() # extracts and encodes the time at which the first observation was taken (stats)
time() # creates the vector of times at which a time series was sampled (stats)
ts() # creates time-series objects (stats)
window() # a generic function which extracts the subset of the object 'x' observed between the times 'start' and 'end'. If a frequency is specified, the series is then re-sampled at the new frequency (stats)
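As a minimal sketch of the helper functions listed above, the following builds a monthly series with ts() and inspects it. The order counts, the start date and the variable names are made up for illustration; only base R (stats) is used.

```r
# Illustrative data: 24 made-up monthly order counts (not from the lecture)
orders <- c(12, 15, 14, 18, 21, 25, 24, 22, 19, 16, 14, 13,
            14, 17, 16, 21, 24, 28, 27, 25, 21, 18, 16, 15)

# frequency = 12 marks the data as monthly; start = c(year, month)
orders_ts <- ts(orders, start = c(2015, 1), frequency = 12)

start(orders_ts)      # first time point: 2015, period 1
end(orders_ts)        # last time point: 2016, period 12
frequency(orders_ts)  # observations per unit of time (here: per year)
deltat(orders_ts)     # time between observations: 1/12 of a year
cycle(orders_ts)      # position of each observation within the year

# window() extracts a sub-period, here the second half of 2015
second_half <- window(orders_ts, start = c(2015, 7), end = c(2015, 12))
second_half
```

Once the data is a ts object, the decomposition and plotting functions can read the frequency from the object itself.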

Decomposition
decompose() # decomposes a time series into seasonal, trend and irregular components using moving averages. Deals with additive or multiplicative seasonal component (stats)
filter() # linear filtering on a time series (stats)
HoltWinters() # computes Holt-Winters filtering of a given time series (stats)
sfilter() # removes seasonal fluctuation using a simple moving average (ast)
spectrum() # estimates the spectral density of a time series (stats)
stl() # decomposes a time series into seasonal, trend and irregular components using 'loess' (stats)
tsr() # decomposes a time series into trend, seasonal and irregular. Deals with additive and multiplicative components (ast)
Tests on the time series (there are more tests, but those are out of scope)
adf.test() # computes the Augmented Dickey-Fuller test for the null that 'x' has a unit root (tseries)
Box.test() # computes the Box-Pierce or Ljung-Box test statistic for examining the null hypothesis of independence in a given time series (stats)

Modelling
ar() # fits an autoregressive time series model to the data, by default selecting the complexity by AIC (stats)
arima() # fits an ARIMA model to a univariate time series (stats)
arima.sim() # simulates from an ARIMA model (stats)
arma() # fits an ARMA model to a univariate time series by conditional least squares (tseries)
Graphs
lag.plot() # plots time series against lagged versions of themselves. Helps visualizing "auto-dependence" even when auto-correlations vanish (stats)
plot.ts() # plotting time-series objects (stats)
seqplot.ts() # plots two time series on the same plot frame (tseries)
tsdiag() # a generic function to plot time-series diagnostics (stats)
ts.plot() # plots several time series on a common plot. Unlike 'plot.ts' the series can have different time bases, but they should have the same frequency (stats)
acf() # computes (and by default plots) estimates of the autocovariance or autocorrelation function (stats)
pacf() # computes (and by default plots) the partial autocorrelations (stats)
lag() # computes a lagged version of a time series, shifting the time base back by a given number of observations (stats)

Structure of a time series
https://anomaly.io/seasonal-trend-decomposition-in-r/
Additive:
Time series = Seasonal + Trend + Random
Multiplicative:
Time series = Trend * Seasonal * Random
Look closely at the two figures and explain how you can see the difference.
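To make the difference between the two forms visible without the original figures, here is a small simulation sketch. The trend, the seasonal pattern and the noise levels are assumptions chosen for illustration only.

```r
set.seed(1)                              # arbitrary seed, for reproducibility only
idx <- 1:48                              # four years of monthly time steps
trend    <- 100 + 2 * idx                # assumed linear trend
seasonal <- 10 * sin(2 * pi * idx / 12)  # assumed yearly pattern
noise    <- rnorm(48, sd = 2)

# Additive: the seasonal swing keeps the same size while the trend rises
additive <- ts(trend + seasonal + noise, frequency = 12)

# Multiplicative: seasonal and random parts are factors around 1,
# so the swing grows in proportion to the trend
seasonal_factor <- 1 + 0.1 * sin(2 * pi * idx / 12)
noise_factor    <- 1 + rnorm(48, sd = 0.01)
multiplicative  <- ts(trend * seasonal_factor * noise_factor, frequency = 12)

plot(additive)
plot(multiplicative)

# Taking logs turns a multiplicative series into an additive one
log_multiplicative <- log(multiplicative)
```

In the additive plot the peaks stay equally high above the trend; in the multiplicative plot they fan out as the level increases.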

To analyse a time series, it must be stationary
https://drsifu.wordpress.com/2012/11/27/time-series-econometrics/
1. Is the mean constant?
The mean of the series should not be a function of time.
2. Is the variance constant?
The variance of the series should not be a function of time.
3. Is the covariance constant?
The covariance of the i-th term and the (i + m)-th term should not be a function of time.

Are the series below stationary?

https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
The mean of the series should not be a function of time; rather, it should be a constant. In the image below, the left-hand graph satisfies the condition, whereas the graph in red has a time-dependent mean.
The variance of the series should not be a function of time. This property is known as homoscedasticity. The following graph depicts what is and what is not a stationary series. (Notice the varying spread of the distribution in the right-hand graph.)

Examples of non-stationary time series: why?
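One rough way to answer the "why" is to compare the mean and variance of the first and second half of a series. The sketch below does this for three simulated series; the series, the seed and the helper name half_summary are illustrative assumptions, and this is an informal check rather than a formal test.

```r
set.seed(42)
n <- 400
white_noise <- rnorm(n)                  # stationary: constant mean and variance
random_walk <- cumsum(rnorm(n))          # non-stationary: variance grows with time
trending    <- 0.05 * (1:n) + rnorm(n)   # non-stationary: mean depends on time

# Compare mean and variance of the first and second half of a series
half_summary <- function(x) {
  half   <- length(x) %/% 2
  first  <- x[1:half]
  second <- x[(half + 1):length(x)]
  c(mean_first = mean(first), mean_second = mean(second),
    var_first  = var(first),  var_second  = var(second))
}

round(rbind(white_noise = half_summary(white_noise),
            random_walk = half_summary(random_walk),
            trending    = half_summary(trending)), 2)
```

For the white noise the two halves look alike; for the trending series the mean shifts clearly between the halves.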

Manual decomposition of an additive time series (6 steps)
Step 1: Import and convert the data
install.packages("fpp")
library(fpp)
data(ausbeer)
# keep a window of 64 quarterly observations (16 years)
timeserie_beer = tail(head(ausbeer, 17*4+2), 17*4-4)
plot(as.ts(timeserie_beer))
Step 2: Detect the trend
Apply a moving average with a window equal to the frequency (4 for quarterly data) and separate out the trend.
library(forecast)
trend_beer = ma(timeserie_beer, order = 4, centre = TRUE)
plot(as.ts(timeserie_beer))
lines(trend_beer)
plot(as.ts(trend_beer))
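To show what a centred moving average of order 4 does, the sketch below computes one by hand with stats::filter() on a made-up quarterly series. The data and variable names are assumptions for illustration; the half weights on the two outer points are what keeps a window of even order symmetric.

```r
# Illustrative quarterly series: linear trend plus a fixed quarterly pattern
quarters   <- 1:24
true_trend <- 50 + 2 * quarters
pattern    <- c(5, -3, -6, 4)            # assumed seasonal effects, sum to 0
x <- ts(true_trend + rep(pattern, 6), frequency = 4)

# Centred moving average of order 4: half weight on the two outer points
# so that the window is symmetric around each observation
ma_weights   <- c(0.5, 1, 1, 1, 0.5) / 4
trend_manual <- stats::filter(x, filter = ma_weights, sides = 2)

plot(x)
lines(trend_manual, col = "red")

# The first and last two values are NA because the window does not fit there
trend_manual
```

Because the seasonal pattern sums to zero over four quarters, the moving average removes it and leaves the linear trend.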

Step 3: Remove the trend from the time series
Note the difference between multiplicative and additive: for an additive series the trend is subtracted.
detrend_beer = timeserie_beer - trend_beer
plot(as.ts(detrend_beer))
Step 4: Extract the seasonal effects
The easiest way is to compute the average seasonal effect: build a matrix with one column per period of the frequency and take the mean of each column.
# build a matrix of the data with as many rows as the frequency,
# then transpose it so that each column holds one quarter
m_beer = t(matrix(data = detrend_beer, nrow = 4))
# compute the mean for each quarter
seasonal_beer = colMeans(m_beer, na.rm = TRUE)
plot(as.ts(rep(seasonal_beer, 16)))

Step 5: Examine the random effect
# additive: subtract the trend and the seasonal effect
random_beer = timeserie_beer - trend_beer - seasonal_beer
plot(as.ts(random_beer))
Step 6: Reconstruct the signal (as a check)
recomposed_beer = trend_beer + seasonal_beer + random_beer
plot(as.ts(recomposed_beer))

Automatic decomposition of an additive time series
# convert the data to a time series with a frequency
ts_beer <- ts(timeserie_beer, frequency = 4)
# decompose() is the function that does it all
decompose_beer <- decompose(ts_beer, "additive")
# the result is an object that holds the separate components
plot(as.ts(decompose_beer$seasonal))
plot(as.ts(decompose_beer$trend))
plot(as.ts(decompose_beer$random))
plot(decompose_beer)
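As a quick check on what decompose() returns, the sketch below runs it on nottem, a monthly temperature series that ships with base R. This dataset is a stand-in chosen here so that no extra package is needed; it is not the beer data from the slides.

```r
# nottem: monthly air temperatures at Nottingham, shipped with base R
dec <- decompose(nottem, type = "additive")

# decompose() returns a list with the input and its components
names(dec)
dec$figure        # the 12 average seasonal effects, one per month

# Additive model: the parts add up to the original series again
rebuilt <- dec$trend + dec$seasonal + dec$random
max(abs(rebuilt - nottem), na.rm = TRUE)   # practically zero
```

The NA values at both ends of the trend and random components come from the moving average window, as in the manual steps.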

But what if you do not know the frequency of the seasonal effect?
The Fourier transform – explained for dummies
The Fourier transform breaks a signal down into all the frequencies that it consists of.
Figures: a sine, sin(ωt), and a composite signal
The result is a graph in which the peaks are the frequencies of the seasonal effect. Keep in mind that these are frequencies, so the period is T = 1 / f.

The Fourier transform in R
https://anomaly.io/detect-seasonality-using-fourier-transform-r/
# install and load the TSA package
install.packages("TSA")
library(TSA)
# read in a dataset ("iets.csv" is a placeholder file name)
raw = read.csv("iets.csv")
# compute the Fourier transform
# (vectormetwaarden is a placeholder for a numeric vector of values)
p = periodogram(vectormetwaarden)
# note: p is an object that holds the data
# build a data frame with the frequency and spec = height of the peak
dd = data.frame(freq = p$freq, spec = p$spec)
# sort the data frame from the highest to the lowest peak
dd_sorted = dd[order(-dd$spec), ]
# take the 2 (or more) most important peaks
top2 = head(dd_sorted, 2)
# display the 2 highest "power" frequencies
top2
# convert frequency to time periods
periods = 1 / top2$freq
periods
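The same idea can be tried without the TSA package or a CSV file. The sketch below uses spec.pgram() from base R on a simulated signal; the signal, with an assumed period of 12 and a weaker period of 4, and all variable names are illustrative.

```r
# Illustrative signal: period 12 plus a weaker period 4, with a little noise
set.seed(7)
n   <- 240
idx <- 1:n
signal <- 3 * sin(2 * pi * idx / 12) + 1 * sin(2 * pi * idx / 4) +
  rnorm(n, sd = 0.3)

# Raw periodogram from base R; taper = 0 and detrend = FALSE keep it unsmoothed
p <- spec.pgram(signal, taper = 0, detrend = FALSE, plot = FALSE)

# Sort the frequencies by the height of their peak and keep the two highest
peaks <- data.frame(freq = p$freq, spec = p$spec)
peaks <- peaks[order(-peaks$spec), ]
top2  <- head(peaks, 2)

# Period T = 1 / f, expressed in number of observations
periods <- 1 / top2$freq
periods
```

The two recovered periods can then be used as the frequency argument when building the ts object for decomposition.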

Dealing with missing data in R:
how NAs arise
Consider the following time series.
What is wrong?
http://publish.illinois.edu/spencer-guerrero/2014/12/11/2-dealing-with-missing-data-in-r-omit-approx-or-spline-part-1/
tijdreeks <- data.frame(jaar = rep(2015, 10),
                        maand = c(1:5, 7:9, 11:12),
                        waarde = runif(10))
Although the series contains no NAs, the observations are not equally spaced: months 6 and 10 are missing.

Months present in the series: 1, 2, 3, 4, 5, 7, 8, 9, 11, 12
Months in a complete year: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

The solution
# time series with missing months
# but without NA
tijdreeks <- data.frame(jaar = rep(2015, 10),
                        maand = c(1:5, 7:9, 11:12),
                        waarde = runif(10))
# a complete year without values
TOTAALjaar <- data.frame(jaar = rep(2015, 12),
                         maand = c(1:12))
# a merged time series with all months
# and with NA at the moments where the time series has no data
Tijdreeks_correctie <- merge(tijdreeks,
                             TOTAALjaar,
                             by = c("maand", "jaar"),
                             all.y = TRUE)
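The merge can be checked and then turned into a regular monthly series. The sketch below repeats the data frames so that it runs on its own; the seed, the explicit sort and the name reeks_ts are additions made for illustration.

```r
set.seed(3)   # arbitrary seed so the random values are reproducible
tijdreeks <- data.frame(jaar = rep(2015, 10),
                        maand = c(1:5, 7:9, 11:12),
                        waarde = runif(10))
TOTAALjaar <- data.frame(jaar = rep(2015, 12), maand = 1:12)

Tijdreeks_correctie <- merge(tijdreeks, TOTAALjaar,
                             by = c("maand", "jaar"), all.y = TRUE)

# Make the month order explicit before building a time series
Tijdreeks_correctie <- Tijdreeks_correctie[order(Tijdreeks_correctie$maand), ]

# The months that had no observation now show up as NA
Tijdreeks_correctie$maand[is.na(Tijdreeks_correctie$waarde)]   # months 6 and 10

# With the gaps made explicit, the values form a regular monthly series
reeks_ts <- ts(Tijdreeks_correctie$waarde, start = c(2015, 1), frequency = 12)
reeks_ts
```

The series now has 12 equally spaced time points, two of which hold NA and still need to be handled.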

Omit? Or locf, approx, or spline
Earlier we defined:
"we will assume time series whose observations are a fixed distance apart."
But what if you have NAs? Then you have either
• a gap in your series (the time point is kept, with NA as its value), or
• no longer a fixed distance between consecutive points (if you remove the NAs)

The zoo package has 3 functions that help with NAs in time series
• na.locf() -> last observation carried forward
• na.approx() -> linear interpolation
• na.spline() -> cubic spline interpolation
install.packages('zoo')
library(zoo)
# create a small dataset
missingData <- c(4, 5, 3, NA, NA, 7, NA, 4)
plot(missingData)

Figure panels: na.locf(), na.approx(), na.spline()
For further reading, see
http://publish.illinois.edu/spencer-guerrero/2014/12/11/2-dealing-with-missing-data-in-r-omit-approx-or-spline-part-2/
# na.locf
plot(na.locf(missingData), type = 'l', col = 623)
points(missingData, col = 'blue')
# linear approximation
plot(na.approx(missingData), type = 'l', col = 459)
points(missingData, col = 'blue')
# spline approximation
plot(na.spline(missingData), type = 'l', col = 300)
points(missingData, col = 'blue')
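To see what the three strategies compute, the sketch below reproduces them with base R only. It illustrates the ideas and is not the zoo implementation; the helper variables are assumptions, and the carry-forward step assumes the first value is not NA.

```r
missingData <- c(4, 5, 3, NA, NA, 7, NA, 4)
known     <- !is.na(missingData)
positions <- seq_along(missingData)

# Last observation carried forward: each position takes the most recent
# observed value (this sketch assumes the first value is not NA)
last_known <- cummax(ifelse(known, positions, 0))
locf <- missingData[last_known]

# Linear interpolation between the neighbouring observed points
linear <- approx(positions[known], missingData[known], xout = positions)$y

# Cubic spline through the observed points
smooth <- spline(positions[known], missingData[known], xout = positions)$y

rbind(locf, linear, smooth)
```

Carrying forward gives flat steps, linear interpolation gives straight segments across each gap, and the spline gives a smooth curve that can overshoot the neighbouring values.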

Plotting the distribution of the noise
hist(decompose_beer$random)

The AR model
Take the following function:
X(t) = Rho * X(t-1) + Er(t)
• If Rho = 0, the result is pure white noise.
• If Rho = 0.5, what difference do you see?
• If Rho = 0.9, what difference do you see now?
• If Rho = 1.0, we have a random walk. It is not stationary: in E[X(t)] = Rho * E[X(t-1)] nothing pulls the series back to a fixed mean any more, and Var[X(t)] = Var[X(t-1)] + Var[Er(t)] keeps growing with t.
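The four cases can be simulated with a few lines of R. This is a sketch under stated assumptions: standard normal errors, an arbitrary seed, and a helper named simulate_ar1 that is introduced here for illustration.

```r
set.seed(123)
n <- 300
errors <- rnorm(n)                 # the same Er(t) is reused for every Rho

# X(t) = Rho * X(t-1) + Er(t), starting from X(1) = Er(1)
simulate_ar1 <- function(rho, errors) {
  x <- numeric(length(errors))
  x[1] <- errors[1]
  for (i in 2:length(errors)) {
    x[i] <- rho * x[i - 1] + errors[i]
  }
  x
}

rhos   <- c(0, 0.5, 0.9, 1.0)
series <- lapply(rhos, simulate_ar1, errors = errors)
names(series) <- paste("Rho =", rhos)

# One panel per value of Rho
op <- par(mfrow = c(2, 2))
for (name in names(series)) {
  plot(series[[name]], type = "l", main = name, ylab = "X(t)")
}
par(op)

# Lag-1 sample autocorrelation for each series
sapply(series, function(x) acf(x, plot = FALSE)$acf[2])
```

Reusing the same errors makes the panels directly comparable: the only thing that changes is how much of the previous value is carried over.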

A nice example of an AR(1) model
The gross national product (GNP) is the value of all goods and services produced by a given country in a given period (usually a year): the gross domestic product plus the primary income earned abroad by the country's citizens, minus the primary income earned in that country by foreigners.
The hypothesis is that
GNP(t) = alpha * GNP(t - 1) + error(t)

Second AR(1) example
Suppose you sell a product X with stable sales of X0. At t = 9 you advertise, which makes sales rise. A fraction alpha of the customers who buy the product buy it again. Your sales then develop as follows.
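Since the original figure is not available, the sketch below shows one way to read this example: the advertisement gives a one-off boost at t = 9, and a share alpha of that extra demand carries over to each next period. The values of x0, alpha and boost are assumptions chosen for illustration.

```r
x0    <- 100       # assumed stable sales level
alpha <- 0.5       # assumed share of extra customers who buy again
boost <- 50        # assumed extra sales caused by the advertisement
periods <- 1:20

shock <- ifelse(periods == 9, boost, 0)   # one-off impulse at t = 9
extra <- numeric(length(periods))
for (i in 2:length(periods)) {
  # AR(1) recursion on the extra sales: a share alpha carries over
  extra[i] <- alpha * extra[i - 1] + shock[i]
}
sales <- x0 + extra

plot(periods, sales, type = "b", xlab = "t", ylab = "sales")
```

Sales jump at t = 9 and then decay geometrically back towards X0; the closer alpha is to 1, the longer the effect lasts.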

The Moving Average model
x(t) = beta * error(t-1) + error(t)
Note that an MA(1) process is not the same as the moving average we use for smoothing.
• The moving average is used to determine the trend of a deterministic process.
• The MA(1) process is used as a building block for forecasting a stochastic process.
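A short simulation shows the typical signature of an MA(1) process: correlation at lag 1 and none beyond it. The value of beta, the seed and the series length are assumptions for illustration.

```r
set.seed(99)
n    <- 500
beta <- 0.8                        # assumed MA coefficient
errors <- rnorm(n + 1)

# x(t) = beta * error(t-1) + error(t)
x <- beta * errors[1:n] + errors[2:(n + 1)]

# Theoretical autocorrelations at lags 0, 1, 2, 3
theory <- ARMAacf(ma = beta, lag.max = 3)
theory

# Sample autocorrelations of the simulated series, for comparison
acf(x, lag.max = 3, plot = FALSE)
```

The theoretical lag-1 autocorrelation equals beta / (1 + beta^2) and the higher lags are zero; the sample values scatter around these numbers.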

Which model are we dealing with?
Once you have a stationary process (after decomposition), you have to ask which type of model you are dealing with: (i) noise, (ii) AR, (iii) MA, (iv) something else. We find this out by examining the correlation between X(t) and X(t-n).
We distinguish the ACF and the PACF
• ACF = autocorrelation function
• PACF = partial autocorrelation function (the ACF with the effect of the intermediate lags removed)

Time series 1 gives the plot below. Clearly an AR(2) model:
• Exponentially decaying ACF
• PACF is no longer significant after lag 2
Time series 2 gives the plot below. Clearly an MA(2) model:
• Exponentially decaying PACF
• ACF is no longer significant after lag 2
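The two patterns can be reproduced from theory with ARMAacf() from base R, which returns the exact ACF or PACF of a given model. The coefficients below are assumptions chosen for illustration; they are not the models behind the original plots.

```r
ar_coef <- c(0.5, 0.3)     # assumed AR(2) coefficients (stationary)
ma_coef <- c(0.6, 0.4)     # assumed MA(2) coefficients

# AR(2): the ACF decays gradually, the PACF is zero after lag 2
ar_acf  <- ARMAacf(ar = ar_coef, lag.max = 6)               # lags 0..6
ar_pacf <- ARMAacf(ar = ar_coef, lag.max = 6, pacf = TRUE)  # lags 1..6

# MA(2): the ACF is zero after lag 2, the PACF decays gradually
ma_acf  <- ARMAacf(ma = ma_coef, lag.max = 6)
ma_pacf <- ARMAacf(ma = ma_coef, lag.max = 6, pacf = TRUE)

# One row per function, one column per lag 1..6
round(rbind(ar_acf = ar_acf[-1], ar_pacf, ma_acf = ma_acf[-1], ma_pacf), 3)
```

On real data the estimated values are never exactly zero, which is why acf() and pacf() draw significance bands in their plots.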

http://stats.stackexchange.com/questions/45539/ar1-selection-using-sample-acf-pacf
What would white noise look like?

Forecasts using Exponential Smoothing
Exponential smoothing can be used to make short-term forecasts for time series data.
http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html
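A minimal sketch of simple exponential smoothing uses HoltWinters() from base R with the trend and seasonal parts switched off. The Nile dataset that ships with R is used here as an assumed example series, and the 5-year horizon is arbitrary.

```r
# Nile: annual flow of the river Nile, shipped with base R
# beta = FALSE and gamma = FALSE switch off the trend and seasonal parts
fit <- HoltWinters(Nile, beta = FALSE, gamma = FALSE)
fit$alpha                      # smoothing parameter chosen by the function

# Forecast the next 5 years; simple exponential smoothing gives a flat line
forecast_5 <- predict(fit, n.ahead = 5)
forecast_5

plot(fit)                      # observed series with the smoothed values
```

An alpha close to 0 means the forecast leans on many past observations; an alpha close to 1 means it mostly follows the latest value.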

DICKEY-FULLER TEST OF STATIONARITY
X(t) = Rho * X(t-1) + Er(t)
=> X(t) - X(t-1) = (Rho - 1) * X(t-1) + Er(t)
We have to test whether Rho - 1 is significantly different from zero. The null hypothesis is Rho - 1 = 0 (a unit root, so a non-stationary series); if the null hypothesis is rejected, we have a stationary time series.
Stationarity testing and converting a series into a stationary one are the most critical steps in time series modelling. Make sure you understand every detail of this concept before moving on to the next step of time series modelling.
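The regression behind the test can be sketched with lm() on two simulated series. This only estimates the slope (Rho - 1) and is not the full test; the helper name df_slope, the seed and the simulated series are assumptions for illustration.

```r
set.seed(2024)
n <- 500
stationary  <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
random_walk <- cumsum(rnorm(n))

# Regress X(t) - X(t-1) on X(t-1) without intercept;
# the slope estimates (Rho - 1)
df_slope <- function(x) {
  dx     <- diff(x)
  x_prev <- x[-length(x)]
  fit    <- lm(dx ~ 0 + x_prev)
  summary(fit)$coefficients["x_prev", c("Estimate", "t value")]
}

df_slope(stationary)    # slope clearly below 0: Rho is well under 1
df_slope(random_walk)   # slope close to 0: consistent with Rho = 1
```

The t value from this regression has to be compared with Dickey-Fuller critical values rather than the usual t tables, which is what adf.test() from the tseries package takes care of.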

Manual decomposition of a multiplicative time series
Step 1: Import and convert the data
install.packages("Ecdat")
library(Ecdat)
# note: AirPassengers ships with base R (datasets package)
data(AirPassengers)
timeserie_air = AirPassengers
plot(as.ts(timeserie_air))
Step 2: Detect the trend
Apply a moving average with a window equal to the frequency (here 12 months per year) and separate out the trend.
install.packages("forecast")
library(forecast)
trend_air = ma(timeserie_air, order = 12, centre = TRUE)
plot(as.ts(timeserie_air))
lines(trend_air)
plot(as.ts(trend_air))

Step 3: Remove the trend from the time series
# multiplicative, so now divide!
detrend_air = timeserie_air / trend_air
plot(as.ts(detrend_air))
You can see that the seasonal effect no longer grows, which makes sense: the multiplication has been taken out.
Step 4: Extract the seasonal effects
The easiest way is to compute the average seasonal effect: build a matrix with one column per period of the frequency and take the mean of each column.
m_air = t(matrix(data = detrend_air, nrow = 12))
seasonal_air = colMeans(m_air, na.rm = TRUE)
plot(as.ts(rep(seasonal_air, 12)))

Step 5: Examine the random effect
# multiplicative: divide by the trend and the seasonal effect
random_air = timeserie_air / (trend_air * seasonal_air)
plot(as.ts(random_air))
Step 6: Reconstruct the signal (as a check)
recomposed_air = trend_air * seasonal_air * random_air
plot(as.ts(recomposed_air))

Automatic decomposition of a multiplicative time series
ts_air = ts(timeserie_air, frequency = 12)
decompose_air = decompose(ts_air, "multiplicative")
plot(as.ts(decompose_air$seasonal))
plot(as.ts(decompose_air$trend))
plot(as.ts(decompose_air$random))
plot(decompose_air)
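The multiplicative result can be checked in the same way as the additive one. The sketch below also shows the log transform as an alternative route, which is an addition to the slides.

```r
# AirPassengers ships with base R (datasets package)
dec <- decompose(AirPassengers, type = "multiplicative")

# Multiplicative model: the parts multiply back to the original series
rebuilt <- dec$trend * dec$seasonal * dec$random
max(abs(rebuilt - AirPassengers), na.rm = TRUE)    # practically zero

# Alternative: logs turn the product into a sum, so the additive
# decomposition can be applied to log(AirPassengers)
dec_log <- decompose(log(AirPassengers), type = "additive")
plot(dec_log)
```

In the multiplicative case the seasonal figures are factors around 1 rather than offsets around 0.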
