Traditionally, time series forecasting has been a popular discipline in finance but often neglected as an area of machine learning. The growing popularity of IoT, sensor networks, and other telemetry applications has led to the collection of vast amounts of time series data, and forecasting is now used for a multitude of use cases, from application performance optimization to workload anomaly detection. The challenge is to automate a process that was handcrafted for the analysis of a single series of just tens of data points so that it scales to processing thousands of time series and millions of data points. In this talk, we will show how InfluxDB can be used to tackle some of the issues of time series forecasting at scale, including continuous accuracy evaluation and algorithm hyperparameter optimization. As a real-world example, we will discuss some of the solutions implemented in Veritas Predictive Insight, which trains, evaluates, and forecasts over 70,000 time series daily.
So here is the agenda for today:
We will first go through some basic definitions for those who are not familiar with forecasting
Then we will discuss the challenges of automating forecasting and how some of them can be solved
Then we will look at how to validate our model performance
Finally, we will discuss how to approach the hyperparameter optimization of models deployed at scale
So what is forecasting?
Forecasting is trying to predict future values based on past data
Before we go further, let’s just make one thing clear…
…it all comes down to this statement from a famous statistician, George Box: "All models are wrong, but some are useful." Models are not perfect; they always have an error term. What we are trying to do is have the model perform better than chance
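To make "better than chance" concrete, a common baseline in forecasting is the naive forecast (the next value equals the last observed value); any model worth deploying should beat it. A minimal sketch, with a toy series and function names that are illustrative, not from the talk:

```python
def naive_forecast(history):
    """Naive baseline: the forecast is simply the last observed value."""
    return history[-1]

def mean_absolute_error(actuals, forecasts):
    """Average absolute deviation between forecasts and actual values."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

# Walk through a toy series, forecasting one step ahead each time.
series = [10, 12, 13, 12, 15, 16, 18]
forecasts = [naive_forecast(series[:i]) for i in range(1, len(series))]
actuals = series[1:]
baseline_mae = mean_absolute_error(actuals, forecasts)
print(baseline_mae)  # the error a real model has to beat
```

A candidate model whose error is no lower than this baseline adds nothing over "tomorrow looks like today."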
Sentence in red: this is the so-called stationarity assumption. That is, the mean, variance, and autocorrelation of the series do not change over time.
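A quick, informal way to eyeball the stationarity assumption is to split the series in half and compare the mean and variance of the two halves; in practice a formal test such as the augmented Dickey-Fuller test would be used instead. A rough sketch, with the tolerance chosen arbitrarily for illustration:

```python
def rough_stationarity_check(series, rel_tol=0.25):
    """Heuristic check: split the series in half and compare mean and
    variance. Large relative differences suggest non-stationarity."""
    half = len(series) // 2
    first, second = series[:half], series[half:]

    def mean(xs):
        return sum(xs) / len(xs)

    def variance(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    mean_shift = abs(mean(first) - mean(second)) / (abs(mean(first)) or 1.0)
    var_shift = abs(variance(first) - variance(second)) / (variance(first) or 1.0)
    return mean_shift < rel_tol and var_shift < rel_tol

flat = [4.0, 6.0] * 4                                # repeating, no trend
trending = [float(i) for i in range(1, 9)]           # clear upward trend
print(rough_stationarity_check(flat), rough_stationarity_check(trending))
```

The trending series fails because its mean drifts upward, which is exactly the violation the assumption rules out.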
An explanatory model is useful because it incorporates information about other variables, rather than only historical values of the variable to be forecast.
However, there are several reasons a forecaster might select a time series model rather than an explanatory or mixed model.
First, the system may not be understood, and even if it were understood it may be extremely difficult to measure the relationships that are assumed to govern its behavior.
Second, it is necessary to know or forecast the future values of the various predictors in order to be able to forecast the variable of interest, and this may be too difficult.
Third, the main concern may be only to predict what will happen, not to know why it happens.
Finally, the time series model may give more accurate forecasts than an explanatory or mixed model.
Left: additive
Right: multiplicative
Remainder: mean zero, no significant autocorrelation
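The additive vs. multiplicative distinction can be written directly: y_t = T_t + S_t + R_t versus y_t = T_t × S_t × R_t. A toy sketch composing each form and recovering the remainder (the trend and seasonal values are made up for illustration):

```python
# Additive decomposition:       y_t = trend_t + seasonal_t + remainder_t
# Multiplicative decomposition: y_t = trend_t * seasonal_t * remainder_t
trend = [10.0, 11.0, 12.0, 13.0]
seasonal_add = [1.0, -1.0, 1.0, -1.0]    # additive seasonal swings around 0
seasonal_mult = [1.1, 0.9, 1.1, 0.9]     # multiplicative swings around 1

additive = [t + s for t, s in zip(trend, seasonal_add)]
multiplicative = [t * s for t, s in zip(trend, seasonal_mult)]

# Recover the remainder: subtract for additive, divide for multiplicative.
remainder_add = [y - t - s for y, t, s in zip(additive, trend, seasonal_add)]
remainder_mult = [y / (t * s) for y, t, s in zip(multiplicative, trend, seasonal_mult)]
print(remainder_add)   # all zeros here: nothing left over
print(remainder_mult)  # all ones here: nothing left over
```

In a real decomposition the remainder is not exactly zero (or one), but it should have mean zero (additive case) and no significant autocorrelation, as the slide says.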
NGINX runs on the same machine as one of the InfluxDB nodes
DS = Data Science (the stats part)
Paper Link:
Tashman, Leonard J. "Out-of-sample tests of forecasting accuracy: an analysis and review." International Journal of Forecasting 16.4 (2000): 437-450.
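Tashman's case for out-of-sample testing is usually operationalized as rolling-origin (walk-forward) evaluation: fit on an initial window, forecast the next point, then roll the origin forward and repeat. A minimal sketch, using the naive forecast as a stand-in model (the model and window size here are illustrative):

```python
def rolling_origin_errors(series, min_train=3, model=lambda hist: hist[-1]):
    """Walk-forward evaluation: at each origin, train on everything seen
    so far and score a one-step-ahead forecast against the next actual."""
    errors = []
    for origin in range(min_train, len(series)):
        history = series[:origin]
        forecast = model(history)
        errors.append(abs(series[origin] - forecast))
    return errors

series = [10, 12, 11, 13, 14, 13, 15]
errs = rolling_origin_errors(series)
print(errs, sum(errs) / len(errs))
```

Because every forecast is scored on data the model never saw, this gives the continuous accuracy evaluation the talk needs at scale, and the per-origin errors can be stored back into InfluxDB as just another time series.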
Paper Links:
Hyndman, Rob J., and Anne B. Koehler. "Another look at measures of forecast accuracy." International Journal of Forecasting 22.4 (2006): 679-688.
Chen, Chao, Jamie Twycross, and Jonathan M. Garibaldi. "A new accuracy measure based on bounded relative error for time series forecasting." PLoS ONE 12.3 (2017): e0174202.
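The Hyndman & Koehler paper argues that percentage errors such as MAPE break down when actual values are near zero, and proposes MASE, which scales the forecast error by the in-sample error of the one-step naive method. Sketches of both measures, with made-up data:

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error; undefined when any actual is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actuals, forecasts)) / len(actuals)

def mase(actuals, forecasts, train):
    """Mean absolute scaled error: forecast MAE divided by the in-sample
    MAE of the one-step naive method (Hyndman & Koehler, 2006).
    Values below 1 mean the model beats the naive baseline."""
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    mae = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)
    return mae / naive_mae

train = [10, 12, 11, 13]     # in-sample history
actuals = [14, 13]           # hold-out values
forecasts = [13, 14]         # some model's forecasts
print(mape(actuals, forecasts), mase(actuals, forecasts, train))
```

Because MASE is scale-free, it can be compared across thousands of heterogeneous series, which is exactly what automated evaluation at scale requires.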
Objective score:
Computed from online accuracy data
The result of several forecast runs given an algorithm and a specific set of hyperparameters
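One way to turn the online accuracy data into an objective score is to average an error measure over the recent forecast runs of each (algorithm, hyperparameters) candidate and rank the candidates by it. A sketch with made-up run data; the aggregation rule and configuration names are assumptions for illustration, not necessarily what Predictive Insight does:

```python
def objective_score(run_errors):
    """Aggregate per-run accuracy into one score for an (algorithm,
    hyperparameters) candidate; lower is better. Here: the mean error."""
    return sum(run_errors) / len(run_errors)

# Hypothetical error per forecast run, keyed by candidate configuration.
candidates = {
    ("holt_winters", "alpha=0.2"): [4.1, 3.8, 4.4],
    ("holt_winters", "alpha=0.5"): [3.2, 3.5, 3.0],
    ("arima", "p=1,d=1,q=1"):      [3.9, 4.0, 3.6],
}

best = min(candidates, key=lambda c: objective_score(candidates[c]))
print(best)
```

Recomputing the score as new runs complete lets the hyperparameter search react to drift in the underlying series instead of optimizing once and forgetting.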
Uber #1: the winning M4 submission was a hybrid method from an Uber engineer
Paper Link:
Makridakis, Spyros, Evangelos Spiliotis, and Vassilios Assimakopoulos. "The M4 Competition: Results, findings, conclusion and way forward." International Journal of Forecasting 34.4 (2018): 802-808.