2. TIME SERIES ANALYSIS:THEORY AND PRACTICE
SOME HOUSEKEEPING
▸ Call for presenters over the summer period
▸ Please don’t use the CodeNode bar after the meetup since
it’s booked for a private event - go to the pub across the
road
DEFINITION OF TIME SERIES DATA
▸ Sequence of measurements (data points)
▸ that follow a non-random order (i.e. are successive),
▸ taken at regular time intervals,
▸ usually with no more than one data point per interval (if
there’s more than one data point, we call it multiple time
series analysis and use slightly different approaches to
modelling).
HOW ARE TIME SERIES DIFFERENT FROM OTHER TYPES OF DATA?
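▸ Cross-sectional data: many subjects observed at a single point in time
▸ Panel data: many subjects, each observed over several time periods
▸ Time series: a single subject observed repeatedly, so that one
measurement is differentiated from another by time stamp only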
APPLICATIONS
▸ Financial markets
▸ Weather forecasting
▸ Sales forecasting
▸ Signal processing
▸ Natural language processing
TRENDING
▸ A trend exists when there is a long-term increase or decrease in the
data. It does not have to be linear. A trend can “change direction” and,
say, go from increasing to decreasing.
▸ Trends often become visible when a simple function (e.g. a straight
line) is fitted to the data.
Source: http://jcflowers1.iweb.bsu.edu/rlo/trends.htm
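Fitting a straight line to expose a trend can be sketched as follows. This is only an illustration: the synthetic series, the random seed, and the variable names are assumptions, not data from the talk.

```python
import numpy as np

# Synthetic series (an assumption for illustration): upward linear trend plus noise.
rng = np.random.default_rng(0)
t = np.arange(100)
y = 0.5 * t + 10 + rng.normal(scale=2.0, size=t.size)

# Fit a straight line y = slope * t + intercept by least squares.
slope, intercept = np.polyfit(t, y, deg=1)
trend = slope * t + intercept

# Subtracting the fitted line leaves the detrended series.
detrended = y - trend
```

A non-linear trend could be approximated the same way with a higher `deg`, or with a rolling average instead of a polynomial.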
SEASONALITY AND CYCLES
▸ A seasonal pattern exists when a series is influenced by
seasonal factors (e.g. the month of the year or day of the
week). Seasonality always has a fixed and known period.
▸ A cyclic pattern exists when data exhibit rises and falls that
are not of fixed period. The duration of these fluctuations is
usually at least 2 years (e.g. economic cycles).
▸ What may seem to be a trend over a short period of time
may be due to seasonality/cycle over a longer period of time.
Always zoom in/zoom out when plotting your data!
WHAT DOES IT ALL LOOK LIKE ON A CHART?
Source: http://jcflowers1.iweb.bsu.edu/rlo/trends.htm
TESTING FOR TRENDS AND SEASONALITY
▸ Checking for seasonality: autocorrelation. Peaks at a lag equal to
the seasonal period (and its multiples) point to seasonality.
▸ Checking for trends: fit a simple curve or a rolling average
and eyeball the chart; there is no universally reliable automatic
test. Strong autocorrelation with the time period immediately
preceding the measurement also suggests a trend
component.
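A minimal sketch of the autocorrelation check, assuming a synthetic daily series with a weekly pattern (the data, seed, and lag range are illustrative assumptions). It uses pandas' `Series.autocorr` and looks for the lag with the strongest correlation.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (assumption): a weekly pattern plus noise.
rng = np.random.default_rng(1)
n = 280
days = np.arange(n)
values = 10 * np.sin(2 * np.pi * days / 7) + rng.normal(scale=1.0, size=n)
series = pd.Series(values)

# Autocorrelation at lags 1..10; a seasonal series peaks at its period.
acf = {lag: series.autocorr(lag=lag) for lag in range(1, 11)}
best_lag = max(acf, key=acf.get)
```

Here `best_lag` comes out as the weekly period of 7, while lags of about half a period show negative correlation.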
ON THE IMPORTANCE OF ASKING THE RIGHT QUESTIONS
▸ What are you trying to predict?
▸ Do you know how the measurements were taken?
▸ Do you have any missing values in the dataset? If yes, what
do they represent?
▸ Do you need to adjust for seasonality or trend?
▸ What “shape” is your dataset?
▸ What are the assumptions being made?
NOW TO THE PRACTICE BIT
▸ You can’t use the same procedures to analyse snapshot
and time series data.
▸ For example, you can’t randomly pick the data points that
will be withheld for cross-validation and testing purposes.
Why?
▸ Make sure to understand as much as possible about the
underlying factors that affect the measurements.
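One answer to the "why" above is that a random hold-out lets the model train on observations that come after the ones it is tested on, so information leaks from the future. A sketch of a chronological alternative follows; the series, the 80/20 split, and the helper `expanding_folds` are hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series used only for illustration.
index = pd.date_range("2016-01-01", periods=100, freq="D")
series = pd.Series(np.arange(100, dtype=float), index=index)

# Hold out the most recent 20% instead of sampling at random.
split = int(len(series) * 0.8)
train, test = series.iloc[:split], series.iloc[split:]


def expanding_folds(n_obs, n_folds, min_train):
    """Yield (train_positions, validation_positions) in chronological order."""
    fold_size = (n_obs - min_train) // n_folds
    for k in range(n_folds):
        end_train = min_train + k * fold_size
        # Train on everything before end_train, validate on the next block.
        yield np.arange(end_train), np.arange(end_train, end_train + fold_size)


folds = list(expanding_folds(len(train), n_folds=3, min_train=50))
```

Every validation block lies strictly after its training block, which mimics how the model would be used for forecasting.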
PLOT, PLOT, THEN PLOT AGAIN
▸ Plotting your data will allow you to uncover the structure
of the dataset, spot irregularities in the data and figure out
which adjustments need to be made before proceeding
with the modelling.
▸ Useful libraries: pandas, numpy, json, matplotlib.pyplot,
pathlib, seaborn, scipy.stats, statsmodels.
TIPS AND TRICKS FOR PLOTTING
▸ Smoothing - linear and exponential
▸ To see the “bigger picture” you may want to look at a moving average of the
input values.
▸ This is what they call “smoothing”.
▸ Linear smoothing gives equal weight to all the points it’s averaging over,
exponential smoothing gives more weight to more recent points.
▸ Points taken as inputs by moving average can be either centred around the
original value or directly behind it.
▸ Use [ColumnName].rolling(window=[window size], center=True).mean().plot()
to plot a rolling average. You can also replace mean with median.
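The smoothing variants above could look like this in pandas. The data frame, the column name `value`, and the window of 7 are assumptions for illustration; `rolling` and `ewm` are the pandas calls used.

```python
import numpy as np
import pandas as pd

# Hypothetical noisy series: a slow rise plus random noise.
rng = np.random.default_rng(2)
index = pd.date_range("2016-01-01", periods=60, freq="D")
df = pd.DataFrame({"value": np.linspace(0, 10, 60) + rng.normal(size=60)}, index=index)

# Equal-weight moving averages: centred on each point, or trailing behind it.
centred = df["value"].rolling(window=7, center=True).mean()
trailing = df["value"].rolling(window=7).mean()

# Rolling median, which is less sensitive to outliers than the mean.
median = df["value"].rolling(window=7, center=True).median()

# Exponential smoothing: more recent points get more weight.
exponential = df["value"].ewm(span=7, adjust=False).mean()
```

Appending `.plot()` to any of these series draws it with matplotlib. The centred and trailing versions leave missing values at the edges where the window is incomplete.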
▸ For more plotting tools from pandas, visit
▸ http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-autocorrelation
▸ http://pandas.pydata.org/pandas-docs/stable/computation.html#rolling-windows
DATA LOADING AND PREPROCESSING
▸ The data often comes in the form of multiple large csv files that
need to be concatenated together for further processing or slicing.
▸ Here is a useful discussion on Stack Overflow covering this issue:
http://stackoverflow.com/questions/25210819/speeding-up-data-import-function-pandas-and-appending-to-dataframe/25210900#25210900
▸ A useful aside: to speed up processing, specify the columns to import
and their data types when reading a csv into a data frame. You can
specify different data types for different columns by using
a dictionary: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
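A sketch of loading several csv files with selected columns and explicit data types, then concatenating them. The two in-memory "files" and their column names are hypothetical stand-ins for large files on disk; `usecols`, `dtype`, and `parse_dates` are `pandas.read_csv` parameters.

```python
import io

import pandas as pd

# Two in-memory "files" stand in for large csv files on disk (hypothetical columns).
csv_a = io.StringIO(
    "timestamp,sensor,value,comment\n2016-01-01,a,1.5,ok\n2016-01-02,a,1.7,ok\n"
)
csv_b = io.StringIO(
    "timestamp,sensor,value,comment\n2016-01-03,a,1.6,ok\n2016-01-04,a,1.9,ok\n"
)

# Read only the needed columns and give each one an explicit data type.
frames = [
    pd.read_csv(
        source,
        usecols=["timestamp", "sensor", "value"],
        dtype={"sensor": "category", "value": "float32"},
        parse_dates=["timestamp"],
    )
    for source in (csv_a, csv_b)
]

# Concatenate once at the end rather than appending file by file.
data = pd.concat(frames, ignore_index=True).set_index("timestamp").sort_index()
```

Collecting the frames in a list and calling `pd.concat` once avoids repeatedly copying a growing data frame.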
MODELLING APPROACHES-ARMA
▸ ARMA: autoregressive moving average
▸ Example: http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/tsa_arma.html
▸ ARMA models combine autoregressive and moving-average terms
up to time t to predict the (t+1)-th term
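▸ Autoregressive model of order p:
$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$$
▸ c is a constant, φ are parameters, ε is the error term (white
noise).
▸ Moving average model of order q:
$$X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$$
▸ μ is the expectation of X_t, ε is again the error term, θ are
parameters.
▸ Combined:
$$X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$$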
▸ Why do we combine AR and MA models?
▸ AR model assumes steady change and is poor for
predicting sudden fluctuations.
▸ MA model takes error terms as an input which allows us to
take into account sudden changes in output faster than AR
model would have done on its own.
▸ Data doesn’t come with errors predefined: they are in fact
estimated (as residuals) by first fitting a model like AR. See any issues?
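The last point can be illustrated with a small sketch: fit an AR(1) model by ordinary least squares and treat its residuals as the estimated error terms. The simulated process, its parameters (`c`, `phi`), and the least-squares fit are assumptions chosen for illustration; this is not the estimation routine statsmodels uses.

```python
import numpy as np

# Simulate an AR(1) process X_t = c + phi * X_{t-1} + eps_t (parameters are assumptions).
rng = np.random.default_rng(3)
c, phi, n = 1.0, 0.6, 2000
x = np.empty(n)
x[0] = c / (1 - phi)  # start at the process mean
for t in range(1, n):
    x[t] = c + phi * x[t - 1] + rng.normal()

# Least-squares fit: regress X_t on a constant and X_{t-1}.
design = np.column_stack([np.ones(n - 1), x[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(design, x[1:], rcond=None)

# The residuals are the estimated error terms an MA component would take as input.
residuals = x[1:] - design @ np.array([c_hat, phi_hat])
```

Because the residuals depend on the fitted AR coefficients, any error in that first fit carries over into the moving-average inputs, which is the issue the slide hints at.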
OTHER MODELLING APPROACHES
▸ Spectrum/Fourier analysis
▸ Attempts to decompose the function into a sum of sinusoidal
waves.
▸ Main aim is to determine the length and amplitude of
underlying cycles in cases where they are not immediately
obvious.
▸ More useful for things like sun spot activity than sales
forecasting (in the latter case seasonal component is easily
guessed by just eyeballing the data).
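A minimal sketch of recovering the length and amplitude of a hidden cycle with a discrete Fourier transform. The synthetic signal (period 25, amplitude 3, noise level) is an assumption for illustration; `np.fft.rfft` and `np.fft.rfftfreq` are the NumPy calls used.

```python
import numpy as np

# Synthetic signal (assumption): a cycle of period 25 samples buried in noise.
rng = np.random.default_rng(4)
n = 500
t = np.arange(n)
signal = 3 * np.sin(2 * np.pi * t / 25) + rng.normal(scale=2.0, size=n)

# FFT of the mean-removed signal and the matching frequencies (cycles per sample).
spectrum = np.fft.rfft(signal - signal.mean())
freqs = np.fft.rfftfreq(n, d=1.0)

# Skip the zero-frequency bin, then pick the strongest component.
peak = np.argmax(np.abs(spectrum[1:])) + 1
period = 1 / freqs[peak]
amplitude = 2 * np.abs(spectrum[peak]) / n
```

In this example the strongest component corresponds to a period of 25 samples, and `amplitude` gives a rough estimate of the cycle's height.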
LIMITATIONS OF STANDARD APPROACHES
▸ Difficulty capturing high-level dependencies - additional
rules typically have to be hardcoded.
▸ Can’t handle all of the possible data structures effectively.
PREDICTION HORIZON
▸ Why can’t we see far into the future?
▸ An interlude on chaos theory
NEURAL NETWORKS - A POSSIBLE ALTERNATIVE
▸ Neural network architectures can be modified to capture
global dependencies (e.g. LSTM).
▸ Capable of both regression and classification, depending
on the choice of activation function.
▸ Next time we will discuss
USEFUL LINKS
▸ https://documents.software.dell.com/statistics/textbook/time-series-analysis
▸ https://en.wikipedia.org/wiki/Time_series
▸ http://www.fil.ion.ucl.ac.uk/~wpenny/course/array.pdf
▸ https://en.wikipedia.org/wiki/Weather_forecasting
▸ https://www.otexts.org/fpp/6/1
▸ http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-plotting
▸ http://pandas.pydata.org/pandas-docs/stable/visualization.html
▸ http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
▸ http://en.wikipedia.org/wiki/Autoregressive–moving-average_model
▸ http://jcflowers1.iweb.bsu.edu/rlo/trends.htm