
Time Series Analysis: Theory and Practice

London Machine Learning Practice Meetup: Lecture slides from "Time Series Analysis: Theory and Practice"

Published in: Data & Analytics

  1. TIME SERIES ANALYSIS: THEORY AND PRACTICE - LMLP MEETUP
  2. SOME HOUSEKEEPING ▸ Call for presenters over the summer period ▸ Please don’t use the CodeNode bar after the meetup since it’s booked for a private event. Go to the pub across the road instead.
  3. DEFINITION OF TIME SERIES DATA ▸ A sequence of measurements (data points) ▸ that follow a non-random order (i.e. are successive), ▸ taken at regular time intervals, ▸ usually with no more than one data point per interval. If there is more than one data point per interval, we call it multiple time series analysis and use slightly different modelling approaches.
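
The three conditions in this definition can be checked mechanically. The sketch below assumes the data sits in a pandas Series with a DatetimeIndex. The dates and values are made up for illustration.

```python
import pandas as pd

# Hypothetical daily measurements; the numbers are invented for illustration.
index = pd.date_range("2016-01-01", periods=7, freq="D")
series = pd.Series([12.0, 13.5, 13.1, 14.2, 15.0, 14.8, 15.3], index=index)

# Successive: timestamps are in increasing order.
is_ordered = series.index.is_monotonic_increasing

# Regular intervals: every gap between neighbouring timestamps is the same.
gaps = series.index.to_series().diff().dropna()
is_regular = gaps.nunique() == 1

# At most one data point per interval: no duplicated timestamps.
has_one_point_per_interval = not series.index.has_duplicates
```

If any of the three flags is False, the data probably needs resampling or deduplication before it is treated as a single time series.
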
  4. HOW ARE TIME SERIES DIFFERENT FROM OTHER TYPES OF DATA? ▸ Cross-sectional data: many subjects observed at a single point in time ▸ Panel data: many subjects observed over several time periods ▸ Time series: a single subject observed repeatedly, so one measurement is differentiated from another by its time stamp only
  5. APPLICATIONS ▸ Financial markets ▸ Weather forecasting ▸ Sales forecasting ▸ Signal processing ▸ Natural language processing
  6. PROPERTIES OF TIME SERIES ▸ Seasonality ▸ Trending ▸ Cycles
  7. TRENDING ▸ A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. A trend can “change direction” and, say, go from increasing to decreasing. ▸ Trends usually become visible when a linear function is fitted to the data. Source: http://jcflowers1.iweb.bsu.edu/rlo/trends.htm
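
Fitting a linear function to expose a trend can be sketched with `numpy.polyfit`. The series here is synthetic: a slope of 0.5 per step plus noise, both chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100)
# Synthetic series: upward trend of 0.5 per step plus noise (made-up numbers).
y = 10.0 + 0.5 * t + rng.normal(scale=2.0, size=t.size)

# A degree-1 polynomial fit returns the coefficients as (slope, intercept).
slope, intercept = np.polyfit(t, y, deg=1)
trend_line = intercept + slope * t

# Subtracting the fitted line leaves the detrended series.
detrended = y - trend_line
```

A clearly non-zero slope suggests a trend. Plotting `trend_line` over `y` shows whether a straight line is a reasonable description or whether the trend changes direction.
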
  8. SEASONALITY AND CYCLES ▸ A seasonal pattern exists when a series is influenced by seasonal factors (e.g. the month of the year or day of the week). Seasonality always has a fixed and known period. ▸ A cyclic pattern exists when the data exhibit rises and falls that are not of a fixed period. These fluctuations usually last at least 2 years (e.g. economic cycles). ▸ What may seem to be a trend over a short period of time may be due to seasonality or a cycle over a longer period. Always zoom in and out when plotting your data!
  9-12. WHAT DOES IT ALL LOOK LIKE ON A CHART? [Four chart slides] Source for slides 9-11: http://jcflowers1.iweb.bsu.edu/rlo/trends.htm
  13. TESTING FOR TRENDS AND SEASONALITY ▸ Checking for seasonality: autocorrelation. ▸ Checking for trends: fit a simple curve or a rolling average and eyeball the chart. There are no proven automatic tests. ▸ Strong autocorrelation at lag 1 (i.e. with the time period immediately preceding the measurement) also suggests a trend component.
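
A minimal sketch of the autocorrelation check for seasonality, using `Series.autocorr` from pandas. The series is synthetic, with a weekly pattern built in, so the expected seasonal lag is known in advance.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_days = 140
t = np.arange(n_days)
# Synthetic daily series with a 7-day pattern plus noise (made-up numbers).
values = 10.0 + 3.0 * np.sin(2 * np.pi * t / 7) + rng.normal(scale=0.5, size=n_days)
series = pd.Series(values)

# Correlation of the series with a copy of itself shifted by each lag.
acf = {lag: series.autocorr(lag=lag) for lag in range(1, 11)}

# The lag with the strongest autocorrelation is the candidate seasonal period.
best_lag = max(acf, key=acf.get)
```

On real data the peak is rarely this clean, so it is worth looking at the whole set of lags rather than only the maximum.
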
  14. ON THE IMPORTANCE OF ASKING THE RIGHT QUESTIONS ▸ What are you trying to predict? ▸ Do you know how the measurements were taken? ▸ Do you have any missing values in the dataset? If yes, what do they represent? ▸ Do you need to adjust for seasonality or trend? ▸ What “shape” is your dataset? ▸ What are the assumptions being made?
  16. NOW TO THE PRACTICE BIT ▸ You can’t use the same procedures to analyse snapshot and time series data. ▸ For example, you can’t randomly pick the data points that will be withheld for cross-validation and testing purposes. Why? ▸ Make sure to understand as much as possible about the underlying factors that affect the measurements.
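
One common answer to the hold-out question: successive points are correlated, so a random split lets the model see observations from after the ones it is tested on. The sketch below shows two order-preserving alternatives. The function names and fold sizes are hypothetical and are not taken from the talk.

```python
def chronological_split(values, test_fraction=0.2):
    """Hold out the most recent observations instead of a random sample."""
    split_at = len(values) - int(len(values) * test_fraction)
    return values[:split_at], values[split_at:]


def expanding_window_folds(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices) where each test block follows its train block."""
    for fold in range(n_folds):
        test_end = n_samples - (n_folds - 1 - fold) * test_size
        test_start = test_end - test_size
        yield list(range(0, test_start)), list(range(test_start, test_end))


observations = list(range(10))  # stand-in for ten time-ordered measurements
train, test = chronological_split(observations)
folds = list(expanding_window_folds(n_samples=10, n_folds=3, test_size=2))
```

In every fold the training indices end before the test indices begin, so no future information reaches the model during fitting.
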
  17. PLOT, PLOT, THEN PLOT AGAIN ▸ Plotting your data will allow you to uncover the structure of the dataset, spot irregularities in the data and figure out which adjustments need to be made before proceeding with the modelling. ▸ Useful libraries: pandas, numpy, json, matplotlib.pyplot, pathlib, seaborn, scipy.stats, statsmodels.
  18. TIPS AND TRICKS FOR PLOTTING ▸ Basic function: plot
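
A minimal sketch of the basic `plot` call on a pandas Series. The data, title and axis labels are hypothetical. The `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd

# Hypothetical daily series: a slow rise with a wobble on top.
index = pd.date_range("2016-01-01", periods=60, freq="D")
values = np.linspace(0.0, 6.0, 60) + np.sin(np.arange(60) / 3.0)
series = pd.Series(values, index=index, name="value")

# Series.plot draws the values against the index and returns a matplotlib Axes.
ax = series.plot(title="Hypothetical daily series")
ax.set_xlabel("date")
ax.set_ylabel("value")
fig = ax.get_figure()  # call fig.savefig("series.png") to write the chart to disk
```
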
  19. TIPS AND TRICKS FOR PLOTTING ▸ Plotting multiple lines
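
One way to plot multiple lines is to put each series in its own DataFrame column and call `DataFrame.plot`, which draws one line per column on shared axes. The column names and values below are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd

index = pd.date_range("2016-01-01", periods=30, freq="D")
steps = np.arange(30)
# Two made-up series sharing the same time index.
df = pd.DataFrame(
    {
        "store_a": 100.0 + 2.0 * steps,
        "store_b": 150.0 + 10.0 * np.sin(steps / 4.0),
    },
    index=index,
)

# One line per column, with a legend built from the column names.
ax = df.plot(title="Hypothetical sales by store")
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```
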
  20. TIPS AND TRICKS FOR PLOTTING ▸ Autocorrelation ▸ Use autocorrelation_plot from pandas.plotting (it lived in pandas.tools.plotting before pandas 0.20)
  21. TIPS AND TRICKS FOR PLOTTING ▸ Autocorrelation
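
A short sketch of `autocorrelation_plot`, imported from `pandas.plotting`, which is where it lives in current pandas releases. The input is a synthetic series with a 12-step period, chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot

rng = np.random.default_rng(4)
t = np.arange(120)
# Synthetic series with a 12-step seasonal pattern plus noise.
series = pd.Series(np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.3, size=t.size))

# Draws autocorrelation against lag, with horizontal confidence bands.
ax = autocorrelation_plot(series)
```

Peaks that rise above the confidence bands at regular lags point to a seasonal component.
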
  22. TIPS AND TRICKS FOR PLOTTING ▸ Smoothing - linear and exponential ▸ To see the “bigger picture” you may want to look at a moving average of the input values. ▸ This is what they call “smoothing”. ▸ Linear smoothing gives equal weight to all the points it’s averaging over, while exponential smoothing gives more weight to more recent points. ▸ The points a moving average takes as inputs can be either centred around the original value or directly behind it. ▸ Use [ColumnName].rolling(window=[window size], center=True).mean().plot() to plot a rolling average. You can also replace mean with median.
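
The smoothing variants described above can be sketched as follows. The column name `value`, the window of 3 and `alpha=0.5` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]})

# Linear smoothing, window centred on each point (equal weights).
centred = df["value"].rolling(window=3, center=True).mean()

# Linear smoothing, window directly behind each point (the pandas default).
trailing = df["value"].rolling(window=3).mean()

# Rolling median: same window, less sensitive to outliers.
median = df["value"].rolling(window=3, center=True).median()

# Exponential smoothing: recent points weigh more; alpha is the newest point's weight.
exponential = df["value"].ewm(alpha=0.5, adjust=False).mean()
```

Positions where the window is incomplete come back as NaN for the rolling variants. Append `.plot()` to any of these to draw the smoothed line.
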
  23. TIPS AND TRICKS FOR PLOTTING ▸ For more plotting tools from pandas, visit ▸ http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-autocorrelation ▸ http://pandas.pydata.org/pandas-docs/stable/computation.html#rolling-windows
  24. DATA LOADING AND PREPROCESSING ▸ The data often comes in the form of multiple large csv files that need to be concatenated for further processing or slicing. ▸ Here is a useful discussion on Stack Overflow covering this issue: http://stackoverflow.com/questions/25210819/speeding-up-data-import-function-pandas-and-appending-to-dataframe/25210900#25210900 ▸ A useful aside: to speed up processing, specify the columns to import and their data types when you’re reading a csv into a data frame. You can specify different data types for different columns by using a dictionary: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
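
A sketch of the loading pattern described above: read only the needed columns with explicit dtypes, collect the frames in a list, and concatenate once. The in-memory strings stand in for large csv files, and the column names are hypothetical.

```python
import io

import pandas as pd

# Two in-memory stand-ins for large csv files (hypothetical columns and values).
csv_parts = [
    "timestamp,store,sales,comment\n2016-01-01,1,10.5,a\n2016-01-02,1,11.0,b\n",
    "timestamp,store,sales,comment\n2016-01-03,1,9.5,c\n2016-01-04,1,12.0,d\n",
]

frames = [
    pd.read_csv(
        io.StringIO(part),
        usecols=["timestamp", "store", "sales"],       # skip columns you don't need
        dtype={"store": "int32", "sales": "float32"},  # per-column types via a dict
        parse_dates=["timestamp"],
    )
    for part in csv_parts
]

# Concatenate once at the end instead of appending to a frame inside the loop.
data = pd.concat(frames, ignore_index=True).set_index("timestamp").sort_index()
```

With real files, `io.StringIO(part)` would be replaced by each file path.
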
  25. MODELLING APPROACHES - ARMA ▸ ARMA: autoregressive moving average ▸ Example: http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/tsa_arma.html ▸ ARMA models combine autoregressive and moving-average terms from the observations up to time t to predict the (t+1)-th term
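  26. MODELLING APPROACHES - ARMA ▸ Autoregressive model of order p: $X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$ ▸ c is a constant, φ are parameters, ε is the error term (white noise). ▸ Moving average model of order q: $X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$ ▸ μ is the expectation of X_t, ε is again the error term, θ are parameters. ▸ Combined: $X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$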
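
As a rough illustration, the combined model can be written as a one-step prediction. The sketch assumes the standard ARMA(p, q) form X_t = c + ε_t + Σ φ_i X_{t-i} + Σ θ_i ε_{t-i}, in which the unknown future error has expectation zero and drops out. The function name and the numbers are hypothetical.

```python
def arma_one_step(history, past_errors, c, phi, theta):
    """Expected next value of an ARMA(p, q) process.

    history and past_errors are ordered oldest to newest;
    phi[0] and theta[0] apply to the most recent value and error.
    """
    ar_part = sum(p * x for p, x in zip(phi, reversed(history)))
    ma_part = sum(q * e for q, e in zip(theta, reversed(past_errors)))
    return c + ar_part + ma_part


# ARMA(2, 1) with made-up parameters:
# 0.1 + 0.5 * 2.0 + 0.2 * 1.0 + 0.4 * 0.5
prediction = arma_one_step(
    history=[1.0, 2.0],
    past_errors=[0.5],
    c=0.1,
    phi=[0.5, 0.2],
    theta=[0.4],
)
```

In practice the parameters would come from a fitting routine such as the statsmodels example linked in the slides.
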
  27. MODELLING APPROACHES - ARMA ▸ Why do we combine AR and MA models? ▸ An AR model assumes steady change and is poor at predicting sudden fluctuations. ▸ An MA model takes error terms as an input, which lets it react to sudden changes in the output faster than an AR model would on its own. ▸ Data doesn’t come with errors predefined: these are in fact estimated by first fitting a model such as AR. See any issues?
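
The last point can be made concrete: the error terms are not observed, so they have to be recovered as residuals from a fitted model. The sketch below simulates an AR(1) process with made-up parameters, fits it by ordinary least squares, and takes the residuals as the estimated errors. Least squares is used here for brevity and is not necessarily what a library such as statsmodels does.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
c_true, phi_true = 1.0, 0.7  # made-up parameters
noise = rng.normal(size=n)

# Simulate X_t = c + phi * X_{t-1} + eps_t, starting at the stationary mean.
x = np.empty(n)
x[0] = c_true / (1 - phi_true)
for t in range(1, n):
    x[t] = c_true + phi_true * x[t - 1] + noise[t]

# Least-squares fit of X_t on a constant and X_{t-1}.
design = np.column_stack([np.ones(n - 1), x[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(design, x[1:], rcond=None)

# Residuals are the estimated error terms an MA component would take as input.
residuals = x[1:] - (c_hat + phi_hat * x[:-1])
```

The residuals depend on the fitted parameters, so any error in the AR fit carries over into the inputs of the MA part.
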
  28. OTHER MODELLING APPROACHES ▸ Spectrum/Fourier analysis ▸ Attempts to decompose the function into a sum of sinusoidal waves. ▸ The main aim is to determine the length and amplitude of underlying cycles in cases where they are not immediately obvious. ▸ More useful for things like sunspot activity than sales forecasting (in the latter case the seasonal component is easily guessed by just eyeballing the data).
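
A minimal sketch of recovering the length and amplitude of a hidden cycle with `numpy.fft`. The signal is synthetic: a sine wave with period 12 and amplitude 2, buried in noise.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 240
t = np.arange(n)
# Synthetic signal: period 12, amplitude 2, plus noise (made-up numbers).
signal = 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=n)

# Real-input FFT of the mean-removed signal, with the matching frequency grid.
spectrum = np.fft.rfft(signal - signal.mean())
frequencies = np.fft.rfftfreq(n, d=1.0)  # cycles per time step
amplitudes = 2.0 * np.abs(spectrum) / n

# Strongest non-zero frequency gives the dominant cycle.
peak = np.argmax(amplitudes[1:]) + 1
dominant_period = 1.0 / frequencies[peak]
dominant_amplitude = amplitudes[peak]
```

The example is deliberately clean because the series length is a whole multiple of the period. On real data the energy of a cycle usually spreads over neighbouring frequencies.
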
  29. LIMITATIONS OF STANDARD APPROACHES ▸ Difficulty capturing high-level dependencies: additional rules typically have to be hardcoded. ▸ Can’t handle all of the possible data structures effectively.
  30. PREDICTION HORIZON ▸ Why can’t we see far into the future? ▸ An interlude on chaos theory
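
One way to see why the prediction horizon is limited is sensitivity to initial conditions. The logistic map used below is a standard textbook example of a chaotic system. It is chosen here for illustration and is not necessarily the example used in the talk.

```python
def logistic_map(x0, r=4.0, steps=100):
    """Iterate x -> r * x * (1 - x), which is chaotic for r = 4."""
    trajectory = [x0]
    for _ in range(steps):
        trajectory.append(r * trajectory[-1] * (1.0 - trajectory[-1]))
    return trajectory


# Two starting points that differ by one part in ten billion.
a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-10)

# Distance between the two trajectories at each step.
gaps = [abs(p - q) for p, q in zip(a, b)]

# First step at which the trajectories are visibly different.
first_large_gap = next(i for i, gap in enumerate(gaps) if gap > 0.1)
```

A measurement error far too small to detect ends up dominating the forecast after a few dozen steps, however good the model is.
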
  31. NEURAL NETWORKS - A POSSIBLE ALTERNATIVE ▸ Neural network architectures can be modified to capture global dependencies (e.g. LSTM). ▸ Capable of both regression and classification, depending on the choice of activation function. ▸ Next time we will discuss
  32. USEFUL LINKS ▸ https://documents.software.dell.com/statistics/textbook/time-series-analysis ▸ https://en.wikipedia.org/wiki/Time_series ▸ http://www.fil.ion.ucl.ac.uk/~wpenny/course/array.pdf ▸ https://en.wikipedia.org/wiki/Weather_forecasting ▸ https://www.otexts.org/fpp/6/1 ▸ http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook-plotting ▸ http://pandas.pydata.org/pandas-docs/stable/visualization.html ▸ http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html ▸ http://en.wikipedia.org/wiki/Autoregressive–moving-average_model ▸ http://jcflowers1.iweb.bsu.edu/rlo/trends.htm
