2. Agenda Part B
• Forecasting models: Direct vs Cross-validation procedure
• Moving Average Forecast with application
• Exponential Smoothing
– Simple Exponential Smoothing with application
– Linear Exponential smoothing
– JMP Download and Installation
– Excel Applications (simple and linear exponential smoothing)
– Winters exponential smoothing
– JMP Tutorial
– 2 JMP Group Exercises
• Launch of assignment 2
• Linear regression with applications
• Autoregressive models with applications
3. Teaching methodology
• Teaching how to use MS Excel is not part of the program
• Exercises and applications are usually solved by the teacher; make
sure you can follow along
• Individual exercises are provided (Individual exercises.xlsx) to test
your understanding after the class
• A basic understanding of how to use MS Excel is required, including the
following features:
– Basic operations (copy, paste, formatting,…)
– Basic formulas (sum, dragging formulas,...)
– Matrix formulas
• There is a nice tutorial provided by Microsoft on how to use MS
Excel, please take a look if you need it!
4. Forecasting models
• Forecast = f (D, a, b, c,…)
– f: function, depends on the type of model
– D: previous demand data
– a, b, c: parameters used to adapt the model
• Building a forecasting model means to:
– Identify a model (f)
– Select the data to be used (D)
– Find the best values of the parameters (a, b, c,...)
How do we select a model? How do we evaluate it?
5. Direct vs cross-validation
procedure
• Two procedures for making forecasts, applicable to every forecasting
model/method:
1. The direct procedure is simpler and faster
2. The cross-validation procedure is more complex but allows a
better evaluation of the model
6. Direct procedure (step 1)
• Build a model based on the data available (the data set)
• Change the parameters until you find a good fit (MAD, BIAS, RMSE, etc.); a small sketch of these error measures follows below
(Figure: data set up to "time now" with the fitted model)
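MAD, BIAS and RMSE are used as fit measures throughout the rest of these slides. As a quick reference, here is a minimal Python sketch of how they are usually computed; the function name and the toy numbers are illustrative, not part of the course files.

def forecast_errors(actual, forecast):
    """Return MAD, BIAS and RMSE for paired actual/forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n            # mean absolute deviation
    bias = sum(errors) / n                           # mean error (systematic over/under forecasting)
    rmse = (sum(e * e for e in errors) / n) ** 0.5   # root mean squared error
    return mad, bias, rmse

# Example: demand vs. a constant forecast of 110
print(forecast_errors([105, 125, 100, 105], [110, 110, 110, 110]))

A BIAS close to zero means the model does not systematically over- or under-forecast, while MAD and RMSE measure the average size of the errors.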
7. Direct procedure (step 1)
• Quantitative models can have many parameters to “play” with
• DO NOT OVER FIT: your target is to create a model able to interpret reality
and produce good forecasts. A model that perfectly fits history is usually not
good at forecasting.
(Figure: the same data set fitted by two models: a reasonable fit that generalizes to the forecast period vs. an over-fitted model)
9. Direct procedure (step 2)
• Produce the forecast based on the parameters identified
(Figure: model projected beyond "time now" to produce the forecast)
10. Forecasting using the moving
average
• It is possible to make forecasts using the right-centered moving
average
• The forecast is based on the average of the last k periods
Ft+1 = (1/k) * (Yt + Yt-1 + … + Yt-k+1)
Period Demand MM(3) MM(5) MM(6)
1 105
2 125
3 100
4 105
5 104
6 117
7 104
8 127
9 103
10 101
11 109
12 121
13 129
13. Forecasting using the moving
average: forecast
• The forecast for all periods after the first forecast period is equal to that
initial forecast (no new data is available yet)
Period Demand MM(3) MM(5) MM(6)
1 105
2 125
3 100
4 105 110.0
5 104 110.0
6 117 103.0 107.8
7 104 108.7 110.2 109.3
8 127 108.3 106.0 109.2
9 103 116.0 111.4 109.5
10 101 111.3 111.0 110.0
11 109 110.3 110.4 109.3
12 121 104.3 108.8 110.2
13 129 110.3 112.2 110.8
14 119.7 112.6 115.0
15 119.7 112.6 115.0
16 119.7 112.6 115.0
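As a rough illustration of how the MM(k) columns and the final forecast in the table above can be reproduced outside Excel, here is a small Python sketch; the demand values are the ones in the table and the function name is illustrative.

def moving_average_forecast(demand, k):
    """Forecast for period t+1 = average of the last k observed periods."""
    fitted = [None] * len(demand)
    for t in range(k, len(demand)):
        fitted[t] = sum(demand[t - k:t]) / k   # e.g. MM(3) for period 4 uses periods 1-3
    next_forecast = sum(demand[-k:]) / k       # forecast for period 14 (and 15, 16, ... until new data arrive)
    return fitted, next_forecast

demand = [105, 125, 100, 105, 104, 117, 104, 127, 103, 101, 109, 121, 129]
_, f14 = moving_average_forecast(demand, 3)
print(round(f14, 1))   # 119.7, matching the MM(3) forecast in the table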
14. Forecasting using the moving
average
• How to select the parameter k?
• Same considerations as for the centered moving average:
– Low values (you look only at recent values) → more reactive, less
smoothing
– High values (you also look at older values) → more smoothing, less
reactive
• If there is seasonality: k = frequency; if there is no seasonality, you
have to define the "best" value of k by testing it on your time series
• Let’s try…
15. Exercise B01 Part 1
• Download the Excel file: “01B_MovingAverage.xlsx”
• Open the tab: “Direct Procedure”
• Which one of the proposed models (MM4, MM8, MM12) fits best?
16. Direct procedure (step 3)
• Once you have the actual data, you can measure the actual errors of your model
• If necessary, you can adjust the parameters for future forecasts
(Figure: forecast compared against the actual data arriving after "time now")
17. Direct procedure (step 4)
• Re-run the procedure with new data to get a new forecast
(Figure: the procedure is re-run as new data arrive)
18. Direct Procedure: Limitations
• The limitation of the direct procedure is that it does not test the
validity of the model until actual data arrive
• The cross-validation procedure makes this testing possible through a
slightly more complex process
20. Cross-validation procedure
(step 1)
Divide the sample in two:
• Training set: used to build the model
• Test set: used to test the model
Rule of thumb: the test set should cover a full period (e.g., one year)
and usually no more than one third of the entire dataset.
(Figure: the data set up to "time now" split into a training set followed by a test set)
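A minimal sketch of the split described above, assuming a test set equal to the last 12 periods of the series; the function name and sizes are illustrative.

def split_train_test(series, test_periods=12):
    """Split a time series into a training set and a test set (test = last periods)."""
    return series[:-test_periods], series[-test_periods:]

# Example: 48 months of data -> 36 months for training, last 12 for testing
train, test = split_train_test(list(range(48)), test_periods=12)
print(len(train), len(test))   # 36 12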
21. Cross-validation procedure
(step 2)
• Build your model on the training set
• Change the parameters until you find a good fit (MAD, BIAS, RMSE, etc.)
• Remember not to over-fit on the training set (as in the direct procedure)
(Figure: model fitted on the training set only)
22. Cross-validation procedure
(step 3)
• Make a forecast and test it on the test set
• Check MAD, BIAS, RMSE, etc.
• Now you have two sets of errors:
1. Errors on the training set: use them to tune the model
2. Errors on the test set: use them to validate the model
(Figure: model built on the training set, forecast evaluated against the test set)
23. Cross-validation procedure
(step 3)
• If you are not satisfied, change the parameters looking at the
training set
• Do not change parameters to fit the TEST SET, otherwise you will
get test-set over-fitting
(Figure: tuning on the training set vs. tuning on the test set, which leads to test-set over-fitting)
24. Cross-validation procedure
(step 4)
• Go back to the full data set and run the model with all the available
data, keeping the parameters you identified before
• From here on the procedure is identical to the direct procedure
(Figure: model re-run on the whole data set to produce the forecast)
25. Cross-validation procedure
(step 5)
• Once you have the actual data, you can measure the real errors of your model
• If necessary, you can adjust the parameters
(Figure: forecast compared against the actual data arriving after "time now")
27. Exercise 01B Part 2
• Open the file “01B MovingAverage.xlsx”
• Open tab “Cross Validation”
• Compare the MAD and BIAS between the training and the test set and
check whether MM8 provides a reliable forecast
• Make the final forecast
30. Exponential smoothing
• Limitations:
– The moving-average gives the same weight to all the observations
(1/k)
– The number of observations considered by the weighted moving
average is constant and finite
• What we would like to have:
– Higher weights for newer observations → higher reactivity to
new trends
– At the same time, the ability to consider the entire history (less
recent observations) if it can provide useful information
• Exponential smoothing overcomes these issues:
– It is a weighted moving average where weights decrease
exponentially
– All the observations are considered, but newer observations
have higher weights
31. Simple exponential smoothing
• The basic idea:
– Higher weights to more recent observations
– Exponentially decreasing weight for the other observations
(Figure: α is the weight given to the most recent observation)
32. Simple exponential smoothing
• Building the model is very simple: just choose a weight between 0 and 1
for the most recent observation and the model computes all the other
weights automatically
(Figure: weights assigned to observations t-1 … t-6 for α = 0.2, α = 0.5 and α = 0.8)
The higher α, the higher the importance of the last observation
compared to the others → the model has little memory for data in the past
33. Simple exponential smoothing
Ft+1 = α * Yt + (1 - α) * Ft
• The implementation of the model is very simple and based on a
recursive formula
• The forecast is based on the last demand data (Yt) weighted by alpha and the
forecast made for the previous period (Ft) weighted by (1 - alpha)
• Remember: alpha is a weight bounded within a 0 to 1 range
– Close to “1”: gives importance only to the last demand
data (model very reactive to current demand)
– Close to “0”: gives importance mostly to the previous
forecast, i.e. the history of previous demand data (model little reactive, very history-based)
34. Simple exponential smoothing
• Before producing the final forecast, forecasts for all the previous
periods have to be generated recursively using the same formula
• There is always a first forecast that must be calculated without a previous
forecast
• Usually the first forecast is set equal to the first observation (naive
method)
– F1 = Y1
– After some periods, the first observation will lose its significance
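A minimal sketch of the recursion described above (F1 = Y1, then Ft+1 = alpha*Yt + (1 - alpha)*Ft), using the demand values of the worked example; the function name is illustrative.

def simple_exp_smoothing(demand, alpha):
    """One-step-ahead simple exponential smoothing: F1 = Y1, F(t+1) = alpha*Y(t) + (1-alpha)*F(t)."""
    forecasts = [demand[0]]                     # F1 = Y1 (naive initialization)
    for t in range(len(demand)):
        forecasts.append(alpha * demand[t] + (1 - alpha) * forecasts[t])
    return forecasts[:-1], forecasts[-1]        # fitted forecasts F1..Fn, plus the forecast for period n+1

demand = [105, 125, 100, 105, 104, 117, 104, 127, 103, 101, 109, 121, 129]
fitted, f14 = simple_exp_smoothing(demand, alpha=0.2)
print(round(f14, 2))   # 114.39, as in the worked example with alpha = 0.2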
35. Example
• We have a time series
• For simplicity's sake, we use
the direct procedure
Period Demand
1 105
2 125
3 100
4 105
5 104
6 117
7 104
8 127
9 103
10 101
11 109
12 121
13 129
14
15
16
38. Example
3. Calculate the forecast
for the second period using
Ft+1 = α * Yt + (1 - α) * Ft
F2 = 0.5 * 105 + (1 - 0.5) * 105 = 105
Period Demand
Simple Exp.
Smoothing
1 105 105
2 125 105.00
3 100
4 105
5 104
6 117
7 104
8 127
9 103
10 101
11 109
12 121
13 129
14
15
16
39. Example
4. Drag down the
formula to calculate the
other values
Period Demand
Simple Exp.
Smoothing
1 105 105
2 125 105.00
3 100 115.00
4 105 107.50
5 104 106.25
6 117 105.13
7 104 111.06
8 127 107.53
9 103 117.27
10 101 110.13
11 109 105.57
12 121 107.28
13 129 114.14
14
15
16
41. Example
6. Change alpha if needed (but do not overfit). Always use both
visual analysis of the data and error calculation
(Charts: forecasts with alpha = 0.2 vs alpha = 0.9)
Which one looks better?
42. Example
Alpha = 0.2 → MAD = 9.82; Alpha = 0.9 → MAD = 12.37
6. Change alpha if needed (but do not overfit). Always use both
visual analysis of the data and error calculation
43. Example
7. We select alpha = 0.2
and to make the forecast
just drag down the
formula one more time
Period Demand
Simple Exp.
Smoothing
1 105 105
2 125 105.00
3 100 109.00
4 105 107.20
5 104 106.76
6 117 106.21
7 104 108.37
8 127 107.49
9 103 111.39
10 101 109.72
11 109 107.97
12 121 108.18
13 129 110.74
14 114.39
15
16
44. Use of the model
• The simple exponential smoothing
is used to forecast only one
period ahead (if you need more
periods you can use the same
value)
Period Demand
Simple Exp.
Smoothing
1 105 105
2 125 105.00
3 100 109.00
4 105 107.20
5 104 106.76
6 117 106.21
7 104 108.37
8 127 107.49
9 103 111.39
10 101 109.72
11 109 107.97
12 121 108.18
13 129 110.74
14 114.39
15 114.39
16 114.39
45. Use of the model
• Stationary demand
• Purpose: identify the level (that is, the local mean)
• Use of alpha:
– If high variability → lower alpha (0.1-0.3)
– If there are “jumps” in the series → higher alpha (0.6-0.8)
• The simple exponential smoothing is not able to recognize trends and seasonality
(Charts: behaviour of the forecast with low alpha vs high alpha)
47. Exercise B02
• Forecast the demand for a double-knit
fabric for the next 12 months
• Use MS Excel
• Use a simple exponential smoothing model
• Use:
1. Direct procedure (first Excel sheet)
2. Cross validation procedure (test set
= 12 months) (second Excel sheet)
48. Simple Exponential Smoothing
Game
• Divide into groups of 3: 1 computer per group
• Open the file “Exp Smoothing Game Class.xlsx”
• For each period and for each of the 3 products:
1. Write the actual demand provided by the teacher
2. Check the error and the charts (Deck sheet)
3. Select the most appropriate value of alpha
4. Make your forecast for the next period (using direct procedure)
• Two rules
– You cannot change past decisions
– You can vary the value of alpha by at most +/- 0.1
(otherwise the cell becomes red)
49. Adaptive Exponential Smoothing
• Used to dynamically change the value of alpha
• If MAD is increasing (variability) → decrease alpha
• If BIAS is increasing (trend or jumps) → increase alpha
50. Adaptive exponential
smoothing
• Improvement of the simple exponential smoothing
• The model becomes more or less reactive according to the evolution
of the demand
• The alpha parameter is dynamically changed according to the error
• A beta parameter sets how fast the alpha
parameter will vary
• Initialization:
– β = 0.2
– F2 = Y1
– A1 = M1 = 0
• Robust algorithm, useful in automated systems
• Alpha is variable, but beta is fixed and it can affect the performance
of the overall model
Et = Yt - Ft
At = β * Et + (1 - β) * At-1
Mt = β * |Et| + (1 - β) * Mt-1
αt = |At / Mt|
Ft+1 = αt * Yt + (1 - αt) * Ft
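The equations above were reconstructed following the standard Trigg-Leach adaptive smoothing scheme, which matches the description in the bullets (alpha driven by the ratio of smoothed error to smoothed absolute error). The Python sketch below is based on that standard formulation rather than on the exact slide implementation; all names are illustrative.

def adaptive_exp_smoothing(demand, beta=0.2):
    """Adaptive smoothing sketch: alpha follows the tracking signal |A/M| (Trigg-Leach style)."""
    f = demand[0]                                # F2 = Y1
    a = m = 0.0                                  # A1 = M1 = 0
    forecasts = [None, f]                        # forecasts[i] is the forecast for period i+1
    for t in range(1, len(demand)):
        e = demand[t] - f                        # error E_t = Y_t - F_t
        a = beta * e + (1 - beta) * a            # smoothed error
        m = beta * abs(e) + (1 - beta) * m       # smoothed absolute error
        alpha = abs(a / m) if m > 0 else beta    # adaptive alpha = |A_t / M_t|
        f = alpha * demand[t] + (1 - alpha) * f  # next-period forecast
        forecasts.append(f)
    return forecasts

print(adaptive_exp_smoothing([105, 125, 100, 105, 104, 117])[-1])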
52. Linear exponential smoothing
(Holt)
• Evolution of the simple exponential smoothing to consider the trend
(Holt, 1957)
– Lt is the level of the series
– bt is the slope of the series
– Alpha: sets the reactiveness of the level (as in the simple
smoothing)
– Beta: sets the reactiveness of the underlying trend
Lt = α * Yt + (1 - α) * (Lt-1 + bt-1)
bt = β * (Lt - Lt-1) + (1 - β) * bt-1
Ft+m = Lt + bt * m
53. Linear exponential smoothing
(Holt)
• The model can adapt to changes in the trend according to beta:
higher beta → higher reactivity to the recent trend
• The factor m is used to project the trend
– It is always 1 while there are demand data, then it increases by 1
(2, 3, 4,…) for the forecasting periods
• Limitations: the trend is hypothesized to be linear
54. Building the model
• Initial level value
– L1 = Y1 (level equal to the demand)
• Initial trend value (different approaches)
– b1 = Y2 - Y1 (OK if the values are not too different)
– b1 = (Y4 – Y1)/3 (data every four months)
– b1 = (Y13 – Y1)/12 (data per month)
• Alpha and beta have to be set through the visual analysis of the
series and iteration (remember to not overfit) to get the optimal
model
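A minimal sketch of the Holt recursion with the initialization b1 = Y2 - Y1, applied to the data of the example on the next slides; the function name is illustrative.

def holt_forecast(demand, alpha, beta, horizon=3):
    """Linear (Holt) exponential smoothing: level L, trend b, forecast L + b*m."""
    level = demand[0]                    # L1 = Y1
    trend = demand[1] - demand[0]        # b1 = Y2 - Y1 (one of the possible initializations)
    for y in demand[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + trend * m for m in range(1, horizon + 1)]

demand = [105, 107, 110, 111, 112, 114, 113, 114, 116, 118, 121, 124]
print([round(f, 2) for f in holt_forecast(demand, alpha=0.1, beta=0.1)])
# roughly [126.26, 128.07, 129.87], as in the worked example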
55. Building the model
• The procedure is similar to the one followed for the simple
exponential smoothing (see example next slide)
56. Example (alpha = beta = 0.1)
Yt Ft Lt bt m
105 - 105.00 2.00 -
107 107.00 107.00 2.00 1
110 109.00 109.10 2.01 1
111 111.11 111.10 2.01 1
112 113.11 113.00 2.00 1
114 114.99 114.90 1.99 1
113 116.88 116.49 1.95 1
114 118.44 118.00 1.90 1
116 119.90 119.51 1.87 1
118 121.38 121.04 1.83 1
121 122.87 122.69 1.81 1
124 124.50 124.45 1.81 1
126.26 1
128.07 2
129.87 3
Forecast: 126.26 = 124.45 + 1.81 * 1; 128.07 = 124.45 + 1.81 * 2; 129.87 = 124.45 + 1.81 * 3
Steps: 1. Initialization; 2. Create the first row using formulas (in order: L, b, F); 3. Drag down; 4. Check errors and set alpha and beta; 5. Calculate the forecast using the last available values of L and b
Lt = α * Yt + (1 - α) * (Lt-1 + bt-1)
bt = β * (Lt - Lt-1) + (1 - β) * bt-1
Ft+m = Lt + bt * m
58. Use of the model
• Use it for series with no seasonality
• The use of alpha is the same as in the simple exponential smoothing
(variability vs jumps)
• Use of beta:
– If long-term trend → low beta (0.1-0.3)
– If short-term trend → high beta (0.6-0.8); be careful: with high beta
the model could mistake random oscillations for trends!
(Charts: behaviour of the trend estimate with low beta vs high beta)
59. Exercise B03
• Provide a forecast for the
following 12 months for the flight
traffic
• Use MS Excel
• Use a linear exponential
smoothing
1. Direct procedure (first Excel
sheet)
2. Cross validation procedure
(test set = 12 months) (second
Excel sheet)
Group exercise
60. Download and install
SAS JMP 10
• http://www.jmp.com/landing/jmp_trial.shtml?ref=hp_visual
• Register on the website and download the 30-day trial
61. Winters exponential
smoothing
• Evolution of the Holt model (Winters 1960) that also considers seasonality
– Lt is the level of the series
– bt is the slope of the series
– St is the seasonality of the series
– Alpha: sets the reactivity of the level (as in the simple smoothing)
– Beta: sets the reactivity of the underlying trend (as in the linear
exponential smoothing)
– Gamma: sets reactivity of the seasonality
Lt = α * (Yt / St-s) + (1 - α) * (Lt-1 + bt-1)
bt = β * (Lt - Lt-1) + (1 - β) * bt-1
St = γ * (Yt / Lt) + (1 - γ) * St-s
Ft+m = (Lt + bt * m) * St-s+m
62. Characteristics of the model
• “s” is the seasonality period and it is hypothesized to be known and
constant (e.g. 12 months)
• Limited time horizon (trend and seasonality are supposed to be
constant in the future)
• The trend is hypothesized to be linear, but the model can follow changes in the
trend (as in the linear exponential smoothing)
• Seasonality is multiplicative
– One coefficient for each period
– The coefficient can change over time
63. Building the model
• Initialization: initial values are needed for the level (L), the trend (b) and the seasonal coefficients (S)
• Two full seasons are needed to calculate coefficients
• The optimal model has to be found changing alpha, beta and gamma
Ls = (Y1 + Y2 + … + Ys) / s
bs = (1/s) * [ (Ys+1 - Y1)/s + (Ys+2 - Y2)/s + … + (Y2s - Ys)/s ]
S1 = Y1 / Ls ;  S2 = Y2 / Ls ;  … ;  Ss = Ys / Ls
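A minimal sketch of multiplicative Winters smoothing based on the equations and the initialization above; it assumes at least two full seasons of data and a forecast horizon of at most one season (s periods). The function name and the toy data are illustrative.

def winters_forecast(demand, s, alpha, beta, gamma, horizon):
    """Multiplicative Winters smoothing: level L, trend b and seasonal coefficients S."""
    level = sum(demand[:s]) / s                                         # Ls
    trend = sum((demand[s + i] - demand[i]) / s for i in range(s)) / s  # bs
    seasonal = [demand[i] / level for i in range(s)]                    # S1..Ss
    for t in range(s, len(demand)):
        prev_level = level
        level = alpha * demand[t] / seasonal[t - s] + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal.append(gamma * demand[t] / level + (1 - gamma) * seasonal[t - s])
    n = len(demand)
    return [(level + trend * m) * seasonal[n - s + m - 1] for m in range(1, horizon + 1)]

# Example with quarterly data (s = 4) and three full seasons of history
demand = [10, 20, 30, 40, 12, 24, 36, 48, 14, 28, 42, 56]
print([round(f, 1) for f in winters_forecast(demand, s=4, alpha=0.2, beta=0.1, gamma=0.1, horizon=4)])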
64. Use of the model
• The use of alpha is the same as in the simple exponential smoothing
(variability vs jumps)
• The use of beta is the same as in the linear exponential smoothing (long
term trend vs short term trend)
• Use of gamma
– If constant long-term seasonality → low gamma (0.1-0.3)
– If seasonality varying from year to year → high gamma (0.6-0.8)
(Charts: behaviour of the seasonal coefficients with low gamma vs high gamma)
65. In conclusion
Data characteristics        | Simple | Linear | Winters
Stationary (no seasonality) |   V    |        |
Trend                       |        |   V    |
Trend + Seasonality         |        |        |   V
Do not use a more complicated model than needed, otherwise you are
likely to over-fit.
Do not use a simpler model than needed, otherwise you will have a poor fit.
69. SAS JMP
• Repeat Exercise B02 using SAS JMP
• Using the direct procedure, perform simple, linear and Winters
exponential smoothing and make a forecast for the next year
• Finally, use the cross-validation procedure for the Winters model
Guided exercise
70. Exercise B04
• Monthly Australian sales of red wine: thousands of liters Jan 1980 -
Jul 1995
• Forecast: 12 months
• Make the best forecast you can!
• Use SAS JMP
Group exercise
71. Exercise B05
• World crude oil production
• Make the best forecast you can!
• Use SAS JMP
Group exercise
72. Assignment 2
• File: Assignment 2.xls
• 3 real time series
– Number of Google searches for a keyword (normalized to 100)
– Global production of an agricultural product
– Retail sales of a specific good category in a specific country
• Objective: produce the best forecasts you can for the required
periods (see excel file > ‘Forecast’ sheet)
• Your forecast will be compared to real data to find who got the best
forecast in terms of MAD and BIAS
73. Assignment 2
• Try different models (moving average or one of the exponential
smoothing) and decide which is best
• Use the cross-validation technique to set the parameters and
evaluate your forecast
• Remember to avoid over-fit
• You can use Excel or JMP
• Send the assignment by email to golini@mip.polimi.it and to
sciacovelli@mip.polimi.it with:
– 3 slides (1 slide per series) for the presentation of your solution. For
each series report the model that you used, the value of the parameters
and the motivation
– Excel file with main calculations and forecasts (fill the provided excel
file > ‘Forecast’ sheet)
– Report has to be submitted by email to golini@mip.polimi.it and to
sciacovelli@mip.polimi.it by May 18th
75. Introduction
• Time-based models (e.g. demand decomposition, moving average,
exponential smoothing) are based on the following hypothesis:
(Diagram: past demand → explains → future demand, over time t)
What are the assumptions?
76. Assumptions of time-based
models
• There are past data → might not be the case for new products
• There are regular patterns (trend, seasonality) → not always the case (e.g.,
stock prices)
• The demand is quite disconnected from the environment → might not be the
case for many products subject to shocks (e.g., events, promotions,…)
(Figure: Google searches for "The Great Gatsby")
77. Explanatory models
• When the assumptions are not met, we can try to introduce external
variables (drivers) to explain future demand (explanatory models)
• Examples:
Driver Predicted outcome
Price Sales
Early sales Total Sales
Gross Domestic Product of a country Total demand
Promotion Temporary increase of the sales
… …
Issue: we need information about the
drivers!
78. Regression analysis
• The regression analysis is a type of explanatory model
• The idea is to determine coefficients (a1, a2, a3,…) associated with
drivers (X1, X2, X3,…) able to explain the future demand (Y)
• We will hypothesize linear relationships between the drivers and
demand → linear regression
(Diagram: drivers X1, X2, X3 linked to demand Y through coefficients a1, a2, a3)
79. Simple vs Multiple Linear
Regression
• Simple Linear Regression: 1 driver
• Multiple Linear Regression: more than 1 driver
(Diagram: a single driver X1 → Y with coefficient a1 vs. several drivers X1, X2, X3 → Y with coefficients a1, a2, a3)
80. Example
Year
CM of
rain
Umbrella
sales
1990 58 59044
1991 58 62060
1992 80 85600
1993 70 65450
1994 83 77107
1995 91 90909
1996 69 66033
1997 63 65268
1998 63 64638
1999 56 55216
2000 86 85140
2001 86 83334
2002 92 84548
2003 58 52490
2004 91 85813
2005 78 78624
2006 93 83700
2007 61 55876
Driver (X): cm of rain → Demand (Y): umbrella sales (coefficient a1)
Only 1 driver → SIMPLE LINEAR REGRESSION
83. An example
Year
CM of
rain
Umbrella
sales
1990 58 59044
1991 58 62060
1992 80 85600
1993 70 65450
1994 83 77107
1995 91 90909
1996 69 66033
1997 63 65268
1998 63 64638
1999 56 55216
2000 86 85140
2001 86 83334
2002 92 84548
2003 58 52490
2004 91 85813
2005 78 78624
2006 93 83700
2007 61 55876
(Chart: umbrella sales vs cm of rain, with fitted line y = 883.89x + 6665.5)
Simple Linear Regression: Y = a1 * X1 + b
Interpretation: if there were 0 cm of rain, we would sell 6,665 umbrellas;
for each additional cm of rain we sell 883 more umbrellas
84. Simple Linear Regression
• A linear relationship between Y (dependent variable) and only one
driver (independent variable)
• The output is the equation of a straight line:
Y = a1 * X1 + b
– a1 is the coefficient associated with X1 → for a unit increase of X1
the demand increases by a1 (example: for each cm of rain we sell
883 more umbrellas)
– b is the intercept with the Y axis → the baseline demand in case
X1 is 0 (example: even if it does not rain, we sell 6,665 umbrellas)
85. Regression is the attempt to explain the variation in a dependent
variable using the variation in independent variables.
Regression is thus an explanation of causation.
If the independent variable(s) sufficiently explain the variation in the
dependent variable, the model can be used for prediction.
(Scatter plot: drivers (x) vs dependent variable (y), simple linear regression)
86. Simple Linear Regression
• The function will make a prediction for each observed data point
• The observation is denoted by y and the prediction is denoted by ŷ
• The difference between y and ŷ is the prediction error
(Scatter plot: independent variable (x) vs dependent variable (y), showing an observation y, its prediction ŷ and the prediction error)
87. Simple Linear Regression
• A least squares regression selects the line with the lowest total sum of
squared prediction errors
• This value is called the Sum of Squares of Error, or SSE (a minimal fitting sketch follows below)
(Scatter plot: independent variable (x) vs dependent variable (y) with the fitted least-squares line)
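A minimal sketch of the least-squares fit (minimization of SSE) described above, applied to the umbrella data from the earlier example; the function name is illustrative.

def least_squares_line(x, y):
    """Fit y = a1*x + b by minimizing the sum of squared prediction errors (SSE)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    a1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
         sum((xi - mean_x) ** 2 for xi in x)
    b = mean_y - a1 * mean_x
    return a1, b

rain = [58, 58, 80, 70, 83, 91, 69, 63, 63, 56, 86, 86, 92, 58, 91, 78, 93, 61]
umbrellas = [59044, 62060, 85600, 65450, 77107, 90909, 66033, 65268, 64638,
             55216, 85140, 83334, 84548, 52490, 85813, 78624, 83700, 55876]
a1, b = least_squares_line(rain, umbrellas)
print(round(a1, 2), round(b, 1))   # roughly 883.89 and 6665.5, as in the umbrella example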
88. Simple linear regression
The coefficient a1 can also be negative
Week Sales Price
1 10 € 1.30
2 6 € 2.00
3 5 € 1.70
4 12 € 1.50
5 10 € 1.60
6 15 € 1.20
7 5 € 1.60
8 12 € 1.40
9 17 € 1.00
10 20 € 1.10
Average 11.2 € 1.44
(Chart: sales vs price, with fitted line y = -14.539x + 32.136)
Interpretation: if the price were 0, we would sell 32 products;
for every additional euro of price we sell 14.5 fewer products
90. R2 indicator
• A regression line can have a good or a bad fit
• To evaluate this fit the R2 indicator is used
– R2 is the percentage of variability predicted by the model
– R2 near 100%: good fit
– R2 near 0%: poor fit (data are not related to each other)
(Charts: R2 = 45% → poor fit; R2 = 99% → good fit!)
91. Use the linear regression
to make forecasts
Year
CM of
rain
Umbrella
sales
1990 58 59044
1991 58 62060
1992 80 85600
1993 70 65450
1994 83 77107
1995 91 90909
1996 69 66033
1997 63 65268
1998 63 64638
1999 56 55216
2000 86 85140
2001 86 83334
2002 92 84548
2003 58 52490
2004 91 85813
2005 78 78624
2006 93 83700
2007 61 55876
(Chart: umbrella sales vs cm of rain, with fitted line y = 883.89x + 6665.5)
Back to the umbrella example. We want to
open a new branch in India where there
are 110 cm of rain per year. What is the
predicted demand?
92. We just apply the formula:
Y = a1*X1 + b where X1 is the cm of rain in India
Use the linear regression
to make forecasts
(Chart: the fitted line y = 883.89x + 6665.5 extended to 110 cm of rain)
Solution: 883.89 * 110 + 6665.5 = 103,893.4
93. Use the linear regression
to make forecasts
Week Sales Price
1 10 € 1.30
2 6 € 2.00
3 5 € 1.70
4 12 € 1.50
5 10 € 1.60
6 15 € 1.20
7 5 € 1.60
8 12 € 1.40
9 17 € 1.00
10 20 € 1.10
Average 11.2 € 1.44
(Chart: sales vs price, with fitted line y = -14.539x + 32.136)
How much do we expect to sell with a price of €1.30?
94. Use the linear regression
to make forecasts
Week Sales Price
1 10 € 1.30
2 6 € 2.00
3 5 € 1.70
4 12 € 1.50
5 10 € 1.60
6 15 € 1.20
7 5 € 1.60
8 12 € 1.40
9 17 € 1.00
10 20 € 1.10
Average 11.2 € 1.44
(Chart: sales vs price, with fitted line y = -14.539x + 32.136)
How much do we expect to sell with a price of €1.30?
Solution: -14.539 * 1.30 + 32.136 = 13.23
95. Use the simple regression as a
forecasting tool
1. Set up the model
– Decide which is your dependent variable Y (sales, demand,
etc.)
– Decide which is your driver X1 (price, market drivers, etc.)
2. Calculate coefficients (a1 and b) using a software and check R2 (if
not good change the driver)
3. Identify the future value of the selected driver (may require a
forecast!)
4. Perform the forecast
96. Simple Regression in Excel
• Display data in a scatter-diagram
• Right-click on the series → add trend line
97. Recommendations for simple
linear regression
• Simple linear regression can be used even if you do not have deep
statistical competences
• The counterpart is that you need to choose one driver
• It is always best to use the driver that is most theoretically correlated
to the demand (cm of rain → umbrellas)
• But the driver should also be independent from the demand (do not
use cm of rain to predict days of rain)
• If you are undecided among different drivers, you can run several
models and pick the one with the highest R2
• Be careful: regression is sensitive to outliers, always inspect your
data and remove outliers!
98. Exercise B06
• Open the file: “06B Regression Exercise 1.xlsx”
• How many cranes do we expect to sell in 2001, when a GDP growth of
7.5% is expected?
• What other model could you have used? What would have been the
outcome?
99. Exercise B07 (early sales)
• Fashion companies cannot rely on historical data because their
products change every season
• At the beginning of the season there is high uncertainty and the risk
of overproducing if the product is a flop or underproducing if the
product is a success
• One option is to look at the early sales (sales at the beginning of the
season) to forecast the total sales
100. Exercise B07 (early sales)
• Every six months (season) a fast-fashion company launches ten types
of new t-shirts while the old ones are removed from the market
• For each t-shirt two data points are recorded:
– Total sales at the end of the season
– Early sales after 2 months
• The company is trying to understand if it is possible to forecast the
total sales given a certain amount of early sales
Season Product Total Sales Early Sales
1 T-shirt - 39662 21 7
1 T-shirt - 73131 21 7
1 T-shirt - 97299 49 49
1 T-shirt - 81181 57 57
1 T-shirt - 63623 77 25
1 T-shirt - 14358 84 84
…… … …
101. Exercise B07 (early sales)
• Consider the file “07B Exercise Early Sales.xls”
• Using the software perform a simple regression to evaluate the
relationship between early sales and total sales
• What will the total sales be for a product that had 1,230 early sales?
• Is there any difference between Fashionable and Non Fashionable
products?
102. Exercise B08
• Regression analysis can also be used to predict the effect of a
promotion
• Simply indicate with “1” when a promotion took place or “0” if in
that period there was no promotion
• Perform a simple regression between the total sales and the new 0/1
variable
• The a1 coefficient (slope) will be the average increase of sales due to the
promotion
• The b coefficient (intercept) will be the average level of sales excluding the
promotions
• Reversing the formula (subtracting the value of a1 from the sales when
a promotion takes place) you can then clean your data from the effect of
the promotion; a small sketch of this approach follows below
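A minimal sketch of this approach, using a 0/1 promotion dummy on a toy sales series; all names and numbers are illustrative.

def promotion_effect(sales, promo_flags):
    """Regress sales on a 0/1 promotion dummy; return (baseline level, average promotion lift)."""
    n = len(sales)
    mean_x, mean_y = sum(promo_flags) / n, sum(sales) / n
    lift = sum((x - mean_x) * (y - mean_y) for x, y in zip(promo_flags, sales)) / \
           sum((x - mean_x) ** 2 for x in promo_flags)
    baseline = mean_y - lift * mean_x
    return baseline, lift

sales = [100, 98, 150, 102, 155, 99, 101, 148]
promo = [0, 0, 1, 0, 1, 0, 0, 1]
baseline, lift = promotion_effect(sales, promo)
cleaned = [s - lift * p for s, p in zip(sales, promo)]   # remove the promotion effect from the history
print(round(baseline), round(lift))                      # roughly 100 and 51 for this toy data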
103. Exercise B08
• Open File “Regression Exercise 3.xlsx”
• Set up the 0/1 promotion variable
• Calculate the average effect of a promotion
• Clean your data
• Perform a forecast of 3 periods ahead using simple exponential
smoothing
104. Multiple linear regression
• Simple: a relationship between Y (dependent variable) and X
(independent variable)
Y = a1 * X1 + b
• Multiple: a relationship between Y (dependent variable) and several
Xi (independent variables)
Y = a1 * X1 + a2 * X2 + … + b
106. Multiple linear regression
Both drivers seem to have a relation with sales
The results of the linear regression are:
Y = 0.034 * GDP + 0.033 * Marketing + 276.72
(Charts: sales of cars vs GDP per capita; sales of cars vs marketing actions)
Attention! The coefficients are not the result of two separate regressions;
they are estimated simultaneously
107. Use the linear regression
to make forecasts
• Back to the car sales example based on GDP and Marketing effort
• The regression equation was: Y = 0.034 * GDP + 0.033 * Marketing + 276.72
• How many cars would we sell if GDP next year is 30,000 per
capita and we plan to invest €25,000 in marketing?
• Solution:
0.034 * 30000 + 0.033 * 25000 + 276.72 = 2121.72
108. Recommendations for
multiple linear regression
• Multiple linear regression allows you to use several drivers
simultaneously
• However, there are some statistical traps, so it is suggested to ask for the
help of an expert
• Some suggestions:
• Start with a few theoretically and statistically uncorrelated
drivers (for instance: do not use days of rain AND days of sun to
predict umbrella sales, but days of rain AND people's income)
• Check if R2 is significant
• You can add other drivers, one at a time, but always check that
they are not correlated to the previous drivers
• Check if R2 has significantly increased, if not, drop the new
driver (it is better to have fewer drivers in the model)
109. Multiple Regression in Excel
• Select a range
• Function LINEST
• Inputs:
– Y as a vector;
– X as a matrix;
– True (otherwise b = 0);
– True (to have statistics)
• Ctrl+Shift+Enter to enter the function in each cell of the result array
• Outputs in matrix form
a_n          …   a_1          b
st.err a_n   …   st.err a_1   st.err b
R2               st.err y
F                df
SS regression    SS residual
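If you prefer to work outside Excel, the same calculation can be sketched with numpy's least-squares solver (numpy is assumed to be available; the function name and the toy data are illustrative).

import numpy as np

def multiple_regression(X, y):
    """Least-squares fit of y = a1*x1 + ... + an*xn + b, a rough analogue of Excel's LINEST."""
    A = np.column_stack([np.asarray(X, dtype=float), np.ones(len(y))])  # add the intercept column
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs[:-1], coeffs[-1]                                      # (a1..an, b)

# Toy example with two drivers
X = [[1, 10], [2, 9], [3, 12], [4, 15], [5, 14]]
y = [12, 13, 17, 21, 22]
a, b = multiple_regression(X, y)
print(a, b)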
110. Exercise B09
• Open the file “09B Exercise Multiple Regression.xlsx”
• The dataset represents the number of cold organic fruit juices sold
• The product is still quite new to the market, so the sales have high
fluctuations
• Also, the price set by the company can change significantly
according to the cost of the raw materials (fruits)
• Customers can order in larger or smaller batches
• Provide a forecast for the number of orders and products sold
considering that for the next week these values are expected
Price 9
Promotions (1 =Yes) 1
Holidays during the week(1=Yes) 0
Full working week (1=Yes) 1
External Temperature (Celsius degrees) 10
Economic outlook (10=very good; 1=very bad) 7
112. Notation
• Demand for period t: Yt
• Forecast horizon: m
• Forecast estimated at period t for period t+m: Ft+m
(Diagram: timeline around "time now" t, showing past demand Yt-1, Yt and forecasts Ft+1, Ft+2, …, Ft+m over the next m periods)
113. The idea behind AR Models
• Simple regression model
Yt = b0 + b1X1 + et
• Multivariate regression model
Yt = b0 + b1X1 + b2X2 + … + bkXk + et
• Multivariate regression based on previous observations
Yt = b0 + b1Yt-1 + b2Yt-2 + … + bkYt-k + et
114. The idea behind AR
• We will focus on its application to stationary, non-seasonal series,
but this limitation can easily be removed
• Context of use: mean-reverting time series
• Example: a drunk man trying to walk on a straight line
(Figure: demand = signal (mean) + disturbance)
116. What is the difference in
these two series?
(Charts: a mean-reverting series, suited to an AR model, vs white noise, i.e. random numbers in a range from +100 to +200)
117. Mean reverting time series
• It can be difficult to assess graphically
• It’s more likely that the next period will be closer to the mean than
the period before
• But the path that brings back to the mean can be complex
• We can assess this through the autocorrelogram (see Part A slides)
and partial-autocorrelogram
118. Autocorrelation function
(ACF)
• Provides a synthetic
view of autocorrelations
• Useful tool to identify non-stationarity factors
(Autocorrelogram, lags 0-14: 1.0000, 0.8166, 0.7181, 0.6395, 0.5479, 0.4301, 0.2512, 0.0793, -0.0383, -0.1487, -0.2118, -0.3170, -0.3793, -0.3957, -0.3464)
119. Partial Autocorrelation (PACF)
• If we have a regression relationship between Y and X1
and X2, it may be interesting to evaluate how much X1
explains of what X2 cannot explain
Yt = b0 + b1X1 + b2X2 + et
• Partial autocorrelation evaluates the correlation
between Yt and Yt-k when the effect of lags 1, 2, 3, …, k-1
has been eliminated: how much does Yt-k contribute to
explain Yt?
123. Revert-to-mean models
• A first order AR(1) model can be described as:
• Ŷt = μ + ϕ1Yt-1
• With |ϕ1|< 1
• If ϕ1 is positive the series tends to have runs of periods above and periods
below the average
• If ϕ1 is negative the series oscillates around the average: demand will tend to be
below the mean next period if it is above the mean this period
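A minimal sketch of how an AR(1) forecast is iterated forward. The mean-reverting form Ŷt = μ + ϕ1*(Yt-1 - μ) is used here, which is equivalent to a constant c = μ*(1 - ϕ1) plus ϕ1*Yt-1; the numbers are illustrative, not taken from the slides.

def ar1_forecast(last_value, mu, phi, horizon):
    """Iterate an AR(1) forecast: each step closes a fraction phi of the gap to the mean mu."""
    forecasts = []
    y = last_value
    for _ in range(horizon):
        y = mu + phi * (y - mu)      # equivalent to c + phi*y with c = mu*(1 - phi)
        forecasts.append(y)
    return forecasts

# Illustrative values: mean 330, phi = 0.65, last observation 370
print([round(f, 1) for f in ar1_forecast(370, mu=330, phi=0.65, horizon=5)])
# The forecast converges toward the mean, like the green line in the JMP example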
124. Time series 1 (Mean
reverting)
• We can therefore apply an AR(1) model, for instance in JMP
• We obtain a forecast (green line) that converges to the mean
(Chart: the series with the AR(1) forecast converging to the mean)
Ŷt = 332.506 + 0.646 * Yt-1
130. How to detect AR(x) models
• 2 conditions:
1. The ACF decreases
exponentially (or
oscillating)
2. The PACF is
significant until “x”
lag
In the example we have an AR(3) model, as the PACF is significant up to lag 3 (values above the blue line)
131. Exercise 10B
• Use JMP
• Open the file “10B Autoregressive.xlsx”, identify the model and
perform the forecast
132. Differentiation
• First and second order differentiation removes the trend component
• First and second seasonal differentiation remove the seasonality
133. Gaining stationarity
• A possible way is to differentiate
• The 1st difference series evaluates the change between
two subsequent observations in the original series
Y't = Yt - Yt-1
136. Differentiation
• Allows us to get rid of non-stationary elements
• We can have 2nd order differentiation
• We can also have seasonal differentiation
Y''t = Y't - Y't-1 = (Yt - Yt-1) - (Yt-1 - Yt-2) = Yt - 2*Yt-1 + Yt-2
137. Seasonal Differentiation
• Defined as:
Y't = Yt - Yt-s
(Chart: electricity production in Australia, 1980-1995)
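A minimal sketch of first, second and seasonal differentiation as defined above; the function names and toy series are illustrative.

def first_difference(y):
    """Y't = Yt - Yt-1: removes a linear trend component."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def seasonal_difference(y, s):
    """Y't = Yt - Yt-s: removes a seasonal component of period s."""
    return [y[t] - y[t - s] for t in range(s, len(y))]

series = [10, 13, 17, 20, 24, 27, 31, 34]                     # roughly linear trend
print(first_difference(series))                               # roughly constant -> trend removed
print(first_difference(first_difference(series)))             # second-order differentiation
print(seasonal_difference([5, 9, 4, 6, 7, 11, 6, 8], s=4))    # removes a period-4 seasonal pattern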
143. Conclusive remarks
• AR models should be used only when the conditions shown by ACF
and PACF are verified
• AR models can be applied to series with trend by applying a first or
second order differentiation
– For instance: AR(1) with first order differentiation: ARI(1,1)
• AR models can be applied to series with seasonality by applying a
seasonal differentiation
– For instance: AR(1,0)(0,1)12 (in case of seasonality every 12
periods)
• Both differentiations can be combined:
– ARI(1,1)(0,1)12 (in case of seasonality every 12 periods)
144. An example – number of users
of an internet service
(Chart: number of users of the internet service over time)
151. Regression and “Moving
Average”
• Similarly we can consider:
• Pay attention: this is the moving average of the
error not of the original series
Yt = b0 + b1 * et-1 + b2 * et-2 + … + bp * et-p + et
152. Moving average models
• Do not produce regular patterns as AR models do → ACF usually
significant only at lag 1
• The series theoretically always starts from the constant b0 (and not
from the previous values) and varies according to the previous
variations → that is why we need to rely on the PACF
153. Moving average models
• A first order MA(1) model can be described as:
– Ŷt = b0 + b1et-1
– b0 and b1 > 0
• The demand is supposed to be b0, but if in the previous period it was
higher than forecast (et-1 > 0) then in this period it is also going to be higher
– If b1 is positive the series tends to move in couples (high → high;
low → low)
– If b1 is negative the series moves in opposite couples (high →
low; low → high)
154. MA(1,1) = Simple exponential
smoothing
• Ŷt = b0 + b1*et-1 with et-1 = Yt-1 - Ŷt-1
• So we can re-write it as:
• Ŷt = b0 + b1 * (Yt-1 - Ŷt-1)
• In words: the forecast (Ŷt) depends on the previous demand (Yt-1)
and the previous forecast (Ŷt-1)
• It’s the same principle of the simple exponential smoothing
• In particular, a MA(1,1) with no constant (b0) is equivalent to the
simple exponential smoothing
• For the demonstration:
https://onlinecourses.science.psu.edu/stat510/?q=node/70
• In conclusion: MA models are a broader class of smoothing
models!
158. AutoRegression and Moving
Average
• By joining the two models:
AR + MA = ARMA
• The model is good for stationary series
• It can be adapted for non-stationary series by adding differentiation (I):
AR + I + MA = ARIMA
159. Notation
• AR: p = order of the autoregressive component
• I: d = order of the differentiation
• MA: q = order of the moving average component
• The values of p, d, q can also be higher than one in case of
additional autoregressive or moving average components; however, it is
better to be conservative and not set more than one parameter
higher than 2
• Examples:
– Autoregressive: ARIMA (1,0,0)
– Moving average: ARIMA (0,0,1)
– Autoregressive with trend: ARIMA (1,1,0)
160. Model identification
ARIMA model identification is highly complex, however as a general
guideline:
1. Check if the series is stationary → if not, differentiate until it is
stationary
2. Check if the series has seasonality → if so, apply a seasonal differentiation
(lag = period of the seasonality) until it is stationary
3. Check if the ACF has significant patterns → if yes, add an AR
component
4. Check if the PACF has significant patterns → if yes, add a MA
component
5. Repeat steps 3 and 4 until the ACF and PACF do not show
significant patterns
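In this course the identification loop above is run in JMP. If Python with the statsmodels package happens to be available, a rough sketch of fitting and forecasting an ARIMA model looks like this; the toy series and the parameter choices are illustrative.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.cumsum(np.random.normal(loc=1.0, scale=5.0, size=100))  # toy series with a trend

# ARIMA(1,1,0): autoregressive of order 1 with first-order differentiation
result = ARIMA(y, order=(1, 1, 0)).fit()
print(result.summary())                                        # coefficients, AIC/BIC, etc.
print(result.forecast(steps=12))                               # 12-period-ahead forecast

# A seasonal variant, e.g. with a seasonal differentiation of period 12:
# ARIMA(y, order=(1, 1, 0), seasonal_order=(0, 1, 0, 12)).fit()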
161. Quality evaluation
• A few suggestions for model design
– For the sake of simplicity start with an AR or an MA
model, then apply an ARMA model and analyse
differences
– Test different methods: pay attention
• Usually estimation is done by min(MSE)
• You can easily reduce MSE by complicating the model (i.e.,
overfit)
• Useful to consider measures that account for the complexity
of the model
– Residual analysis
162. Quality measures
• Mean Square Error
• Error variance
• R2
• Likelihood
– Akaike’s Information Criterion (AIC)
• m = p + q + P + Q
• L: Likelihood function
• AIC = -2logL + 2m
• The lower the better
– Schwarz Bayesian Information Criterion (BIC)