# Time Series Analysis: Theory and Practice

London Machine Learning Practice Meetup: Lecture slides from "Time Series Analysis: Theory and Practice"

Published in: Data & Analytics

### Time Series Analysis: Theory and Practice

1. TIME SERIES ANALYSIS: THEORY AND PRACTICE - LMLP MEETUP
2. SOME HOUSEKEEPING ▸ Call for presenters over the summer period ▸ Please don’t use the CodeNode bar after the meetup since it’s booked for a private event - go to the pub across the road
3. DEFINITION OF TIME SERIES DATA ▸ A sequence of measurements (data points) ▸ that follow a non-random order (i.e. are successive) ▸ taken at regular time intervals ▸ usually with no more than one data point per interval (when there is more than one data point per interval, we call it multiple time series analysis and use slightly different modelling approaches).
4. HOW ARE TIME SERIES DIFFERENT FROM OTHER TYPES OF DATA? ▸ Panel data: many subjects, each observed over several time periods ▸ Cross-sectional data: many subjects observed at a single point in time ▸ Time series: a single subject observed repeatedly, so one measurement is differentiated from another by time stamp only
5. APPLICATIONS ▸ Financial markets ▸ Weather forecasting ▸ Sales forecasting ▸ Signal processing ▸ Natural language processing
6. PROPERTIES OF TIME SERIES ▸ Seasonality ▸ Trending ▸ Cycles
7. TRENDING ▸ A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. A trend can “change direction” and, say, go from increasing to decreasing. ▸ Trends usually become visible when a linear function is fitted to the data. Source: http://jcflowers1.iweb.bsu.edu/rlo/trends.htm
8. SEASONALITY AND CYCLES ▸ A seasonal pattern exists when a series is influenced by seasonal factors (e.g. the month of the year or day of the week). Seasonality always has a fixed and known period. ▸ A cyclic pattern exists when data exhibit rises and falls that are not of fixed period. These fluctuations usually last at least 2 years (e.g. economic cycles). ▸ What may seem to be a trend over a short period of time may be due to seasonality or a cycle over a longer period of time. Always zoom in/zoom out when plotting your data!
9. WHAT DOES IT ALL LOOK LIKE ON A CHART? Source: http://jcflowers1.iweb.bsu.edu/rlo/trends.htm
10. WHAT DOES IT ALL LOOK LIKE ON A CHART? Source: http://jcflowers1.iweb.bsu.edu/rlo/trends.htm
11. WHAT DOES IT ALL LOOK LIKE ON A CHART? Source: http://jcflowers1.iweb.bsu.edu/rlo/trends.htm
12. WHAT DOES IT ALL LOOK LIKE ON A CHART?
13. TESTING FOR TRENDS AND SEASONALITY ▸ Checking for seasonality: autocorrelation. ▸ Checking for trends: fit a simple curve or a rolling average and eyeball the chart. No proven automatic tests. Strong autocorrelation with the time period immediately preceding the measurement also suggests a trend component.
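
The two checks on this slide can be sketched with pandas. This is a minimal illustration, not the speaker's code: the synthetic monthly series, the 12-month window and the chosen lags are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (an assumption for illustration):
# linear trend + 12-month seasonal cycle + small noise.
rng = np.random.default_rng(0)
n = 120
t = np.arange(n)
index = pd.date_range("2010-01-01", periods=n, freq="MS")
values = 0.05 * t + np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.1, n)
series = pd.Series(values, index=index)

# Trend check: a 12-month centred rolling mean averages out the seasonal cycle,
# leaving a smooth curve that can be eyeballed on a chart.
trend = series.rolling(window=12, center=True).mean()

# Seasonality check: remove the trend first, then compare the autocorrelation
# at the suspected seasonal lag with the autocorrelation half a cycle away.
detrended = (series - trend).dropna()
acf_12 = detrended.autocorr(lag=12)
acf_6 = detrended.autocorr(lag=6)
print(f"lag 12: {acf_12:.2f}, lag 6: {acf_6:.2f}")
```

A strongly positive value at the seasonal lag together with a strongly negative value half a cycle away is what a fixed 12-month pattern would produce.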
14. ON THE IMPORTANCE OF ASKING THE RIGHT QUESTIONS ▸ What are you trying to predict? ▸ Do you know how the measurements were taken? ▸ Do you have any missing values in the dataset? If yes, what do they represent? ▸ Do you need to adjust for seasonality or trend? ▸ What “shape” is your dataset? ▸ What are the assumptions being made?
15. ON THE IMPORTANCE OF ASKING THE RIGHT QUESTIONS
16. NOW TO THE PRACTICE BIT ▸ You can’t use the same procedures to analyse snapshot and time series data. ▸ For example, you can’t randomly pick the data points that will be withheld for cross-validation and testing purposes. Why? ▸ Make sure to understand as much as possible about the underlying factors that affect the measurements.
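
One common answer to the "Why?" is that a random split lets the model train on points that come after the ones it is tested on, so information from the future leaks into training. A minimal sketch of a chronological split follows; the helper name `chronological_split` and its `test_fraction` parameter are hypothetical, introduced only for this illustration.

```python
import pandas as pd


def chronological_split(series, test_fraction=0.2):
    """Hold out the most recent observations instead of a random sample."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = series.sort_index()
    n_test = max(1, round(len(ordered) * test_fraction))
    split_at = len(ordered) - n_test
    # Everything before split_at is training data; everything after is test data.
    return ordered.iloc[:split_at], ordered.iloc[split_at:]


# Toy daily series (hypothetical data).
index = pd.date_range("2020-01-01", periods=10, freq="D")
series = pd.Series(range(10), index=index)

train, test = chronological_split(series, test_fraction=0.2)
print(train.index.max(), test.index.min())
```

Every training timestamp precedes every test timestamp, which mirrors how the model would be used in practice: fitted on the past, evaluated on the future.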
17. PLOT, PLOT, THEN PLOT AGAIN ▸ Plotting your data will allow you to uncover the structure of the dataset, spot irregularities in the data and figure out which adjustments need to be made before proceeding with the modelling. ▸ Useful libraries: pandas, numpy, json, matplotlib.pyplot, pathlib, seaborn, scipy.stats, statsmodels.
18. TIPS AND TRICKS FOR PLOTTING ▸ Basic function: plot
19. TIPS AND TRICKS FOR PLOTTING ▸ Plotting multiple lines
20. TIPS AND TRICKS FOR PLOTTING ▸ Autocorrelation ▸ Use autocorrelation_plot from pandas.plotting (pandas.tools.plotting in older pandas versions)
21. TIPS AND TRICKS FOR PLOTTING ▸ Autocorrelation
22. TIPS AND TRICKS FOR PLOTTING ▸ Smoothing - linear and exponential ▸ To see the “bigger picture” you may want to look at a moving average of the input values. ▸ This is what they call “smoothing”. ▸ Linear smoothing gives equal weight to all the points it’s averaging over; exponential smoothing gives more weight to more recent points. ▸ Points taken as inputs by the moving average can be either centred around the original value or directly behind it. ▸ Use [ColumnName].rolling(window=[window size], center=True).mean().plot() to plot the rolling average. You can also replace mean with median.
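
The smoothing variants described on this slide can be compared side by side. This is a small sketch on made-up numbers; the window size of 3 and the smoothing factor `alpha=0.5` are arbitrary choices for the example, and the `.plot()` call is left out so the snippet runs without a display.

```python
import pandas as pd

# Toy series with one spike (hypothetical data).
prices = pd.Series([1.0, 2.0, 3.0, 10.0, 3.0, 2.0, 1.0])

# Linear smoothing, window centred on each point (equal weights).
centred = prices.rolling(window=3, center=True).mean()

# Linear smoothing, window directly behind each point (the pandas default).
trailing = prices.rolling(window=3).mean()

# Same centred window, but using the median, which is less sensitive to spikes.
median = prices.rolling(window=3, center=True).median()

# Exponential smoothing: recent points get more weight than older ones.
exponential = prices.ewm(alpha=0.5, adjust=False).mean()

print(pd.DataFrame({
    "raw": prices,
    "centred": centred,
    "trailing": trailing,
    "median": median,
    "exponential": exponential,
}))
```

The trailing window lags the centred one by one step here, and both leave missing values at the edges where the window is incomplete.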
23. TIPS AND TRICKS FOR PLOTTING ▸ For more plotting tools from pandas, visit ▸ http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-autocorrelation ▸ http://pandas.pydata.org/pandas-docs/stable/computation.html#rolling-windows
24. DATA LOADING AND PREPROCESSING ▸ The data often comes in the form of multiple large csv files that need to be concatenated together for further processing or slicing. ▸ Here is a useful discussion on Stack Overflow covering this issue: http://stackoverflow.com/questions/25210819/speeding-up-data-import-function-pandas-and-appending-to-dataframe/25210900#25210900 ▸ A useful aside: to speed up processing, specify columns to import and their data type when you’re reading csv into a data frame - and you can specify different data types for different columns by using a dictionary: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
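
The loading advice above might look like the following sketch. The column names and the in-memory strings standing in for large csv files are hypothetical; with real data, `io.StringIO(part)` would be replaced by a file path.

```python
import io

import pandas as pd

# Two small in-memory "files" stand in for large csv files on disk (hypothetical data).
csv_parts = [
    "timestamp,store,sales,notes\n2016-01-01,1,10.5,a\n2016-01-02,1,11.0,b\n",
    "timestamp,store,sales,notes\n2016-01-03,1,9.5,c\n2016-01-04,1,12.0,d\n",
]

frames = [
    pd.read_csv(
        io.StringIO(part),
        usecols=["timestamp", "store", "sales"],        # import only the needed columns
        dtype={"store": "int32", "sales": "float32"},   # per-column types via a dictionary
        parse_dates=["timestamp"],
    )
    for part in csv_parts
]

# Concatenate once at the end rather than appending to a data frame inside the loop.
data = pd.concat(frames, ignore_index=True).set_index("timestamp").sort_index()
print(data)
```

Collecting the pieces in a list and calling `pd.concat` once avoids repeatedly copying a growing data frame.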
25. MODELLING APPROACHES - ARMA ▸ ARMA: autoregressive moving average ▸ Example: http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/tsa_arma.html ▸ ARMA models combine autoregressive and moving-average terms, built from the observations and errors up to time t, to predict the (t+1)-th term
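26. MODELLING APPROACHES - ARMA ▸ Autoregressive model of order p: $X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$ ▸ c is a constant, φ are parameters, ε is the error term (white noise). ▸ Moving average model of order q: $X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$ ▸ μ is the expectation of $X_t$, ε is again the error term, θ are parameters. ▸ Combined, ARMA(p, q): $X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$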
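
To make the roles of c, φ, θ and ε concrete, here is a minimal simulation sketch of the standard ARMA(p, q) recursion. The function name `simulate_arma`, its defaults, and the choice to treat values before the start of the series as zero are assumptions made for this illustration.

```python
import numpy as np


def simulate_arma(phi, theta, c=0.0, n=500, sigma=1.0, seed=0):
    """Simulate X_t = c + eps_t + sum_i phi_i * X_{t-i} + sum_j theta_j * eps_{t-j}.

    phi holds the AR parameters, theta the MA parameters. Values before the
    start of the series are treated as zero (a simplifying assumption).
    """
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    eps = rng.normal(0.0, sigma, size=n)  # white-noise error terms
    x = np.zeros(n)
    for t in range(n):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x[t] = c + eps[t] + ar + ma
    return x, eps


# ARMA(1, 1) with hypothetical parameter values.
x, eps = simulate_arma(phi=[0.5], theta=[0.3], n=50)
print(x[:5])
```

Passing an empty `theta` gives a pure AR(p) series, and an empty `phi` gives a pure MA(q) series around the constant c.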
27. MODELLING APPROACHES - ARMA ▸ Why do we combine AR and MA models? ▸ An AR model assumes steady change and is poor at predicting sudden fluctuations. ▸ An MA model takes error terms as an input, which lets it react to sudden changes in the output faster than an AR model would on its own. ▸ Data doesn’t come with errors predefined - these are in fact estimated by first fitting a model like AR. See any issues?
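
The last bullet can be illustrated with a short sketch: fit an AR(1) model by ordinary least squares and use its residuals as stand-ins for the unobserved error terms. The simulated data, the AR(1) order and the least-squares fit are assumptions chosen for the example, not the only way to do this.

```python
import numpy as np

# Simulate an AR(1) series with a known parameter (hypothetical data).
rng = np.random.default_rng(1)
n = 2000
true_phi = 0.7
eps = rng.normal(0.0, 1.0, size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = true_phi * x[t - 1] + eps[t]

# Fit X_t = c + phi * X_{t-1} by least squares.
design = np.column_stack([np.ones(n - 1), x[:-1]])
target = x[1:]
(c_hat, phi_hat), *_ = np.linalg.lstsq(design, target, rcond=None)

# The residuals are estimates of the error terms, not the true errors.
residuals = target - (c_hat + phi_hat * x[:-1])
print(f"phi_hat = {phi_hat:.3f}")
```

One issue the slide hints at is visible here: the estimated errors depend on the first-stage model, so any misspecification in that model carries over into the MA terms built from them.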
28. OTHER MODELLING APPROACHES ▸ Spectrum/Fourier analysis ▸ Attempts to decompose the function into a sum of sinusoidal waves. ▸ Main aim is to determine the length and amplitude of underlying cycles in cases where they are not immediately obvious. ▸ More useful for things like sun spot activity than sales forecasting (in the latter case seasonal component is easily guessed by just eyeballing the data).
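
A rough sketch of how a discrete Fourier transform can recover the length and amplitude of a hidden cycle, using NumPy's FFT. The synthetic signal (a 7-sample cycle, a weaker 91-sample cycle and noise) is an assumption made so that the answer is known in advance.

```python
import numpy as np

# Synthetic signal (hypothetical): strong 7-sample cycle, weaker 91-sample cycle, noise.
n = 728  # a multiple of both 7 and 91, so each cycle falls exactly on one frequency bin
t = np.arange(n)
rng = np.random.default_rng(2)
signal = (
    3.0 * np.sin(2 * np.pi * t / 7)
    + 1.5 * np.sin(2 * np.pi * t / 91)
    + rng.normal(0.0, 0.5, size=n)
)

# Magnitude spectrum of the mean-removed signal.
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per sample

# Skip the zero-frequency bin, then find the strongest component.
peak = np.argmax(spectrum[1:]) + 1
dominant_period = 1.0 / freqs[peak]
dominant_amplitude = 2.0 * spectrum[peak] / n

print(f"period = {dominant_period:.1f} samples, amplitude = {dominant_amplitude:.2f}")
```

With real data the cycle length rarely lines up with a frequency bin, so the peak is spread over neighbouring bins and the period estimate is only approximate.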
29. LIMITATIONS OF STANDARD APPROACHES ▸ Difficulty capturing high level dependencies - additional rules typically have to be hardcoded. ▸ Can’t handle all of the possible data structures effectively.
30. PREDICTION HORIZON ▸ Why can’t we see far into the future? ▸ An interlude on chaos theory
31. NEURAL NETWORKS - A POSSIBLE ALTERNATIVE ▸ Neural network architectures can be modified to capture global dependencies (e.g. LSTM). ▸ Capable of both regression and classification, depending on the choice of activation function. ▸ Next time we will discuss
32. USEFUL LINKS ▸ https://documents.software.dell.com/statistics/textbook/time-series-analysis ▸ https://en.wikipedia.org/wiki/Time_series ▸ http://www.fil.ion.ucl.ac.uk/~wpenny/course/array.pdf ▸ https://en.wikipedia.org/wiki/Weather_forecasting ▸ https://www.otexts.org/fpp/6/1 ▸ http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-plotting ▸ http://pandas.pydata.org/pandas-docs/stable/visualization.html ▸ http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html ▸ http://en.wikipedia.org/wiki/Autoregressive–moving-average_model ▸ http://jcflowers1.iweb.bsu.edu/rlo/trends.htm