This presentation includes two explanatory models to attempt to predict recessions. The first one is a logistic regression. The second one is a deep neural network (DNN). Both use the same set of independent variables: the velocity of money, inflation, the yield curve, and the stock market. As usual, the DNN fits the historical data a bit better than the simpler logistic regression. But, when it comes to testing or predicting, both models are pretty much even.
1. Are we in a Recession?
Predicting recessions with Logit Regression and DNNs
Gaetan Lion, June 21, 2022
2. Are we already in a recession? Maybe …
2022 Q1 GDP growth was already negative. And 2022 Q2 may very well be negative too once the data is released.
The majority of the financial media believes we are already in a recession because of the stubbornly high inflation (due to supply chain bottlenecks) and the Federal Reserve’s aggressive monetary policy to fight inflation. The policy includes a rapid rise in short-term rates and a reversal of the Quantitative Easing bond purchase program (reducing the Fed’s balance sheet and taking liquidity & credit out of the financial system).
The bearish stock market also suggests we are currently in a recession.
On the other hand, Government authorities including the President, the Secretary of the Treasury (Janet
Yellen), and the Federal Reserve all believe that the US economy can achieve a “soft landing” with a declining
inflation rate, while maintaining positive economic growth.
I developed a couple of models to attempt to predict recessions using historical data.
3. The Federal Reserve’s most recent forecast
The Fed anticipates raising the Fed Funds rate to above 3% by the end of 2022. It also expects economic growth to remain positive through 2024, ranging from 2% to 2.5% on an annualized basis.
Notice that in 2022 Q1, the Fed forecast is already off. While the Fed forecasted real economic growth of +2% on an annualized basis, actual results came in at -1.5%.
4. Modeling project basics
Objective
Attempting to predict recessions. Here I am focused on recessionary periods. And, any quarterly period with negative
RGDP growth is considered part of a recessionary period.
Dependent variable modeled
It is a binomial variable (0, 1). If the quarter shows negative economic growth it equals 1, otherwise 0.
Independent variables tested
I tested numerous macroeconomic variables going back to the early 1960s to capture as many recessionary periods as possible while still having access to an adequate pool of explanatory macroeconomic variables.
Independent variable transformations
I considered several transformations: a) quarterly % change; b) quarterly first difference (for rate variables); c) yearly %
change; d) level variables. I used d) when a variable level was constrained (oscillating within a finite range).
Model structures
I developed two competing models: 1) a logistic regression; 2) a Deep Neural Network model. I used R software.
Data
I extracted economic variables from FRED going back to 1960. I extracted S&P 500 data from Robert Shiller at Yale.
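The dependent variable and the transformations listed above can be sketched in a few lines. The deck's models were built in R and its code is not shown, so this is only an illustrative Python sketch; the function names and the sample series are assumptions, not the author's data.

```python
def pct_change(levels, lag=1):
    """Percent change over `lag` periods; the first `lag` entries are None."""
    return [None] * lag + [
        100.0 * (levels[i] / levels[i - lag] - 1.0)
        for i in range(lag, len(levels))
    ]


def first_difference(levels):
    """Period-over-period difference, meant for rate-type variables."""
    return [None] + [levels[i] - levels[i - 1] for i in range(1, len(levels))]


def recession_flag(rgdp_growth):
    """1 if quarterly real GDP growth is negative, otherwise 0."""
    return [None if g is None else int(g < 0) for g in rgdp_growth]


# Made-up quarterly real GDP levels, for illustration only (not FRED data).
rgdp = [100.0, 101.0, 100.5, 100.0, 101.5, 103.0]

growth = pct_change(rgdp)          # a) quarterly % change
rec = recession_flag(growth)       # dependent variable (0, 1)
yearly = pct_change(rgdp, lag=4)   # c) yearly % change on quarterly data

print(rec)
```

Transformation d), a level variable, needs no code: the series is used as is.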
5. The Logistic Regression Model
rec. This is the binomial (0, 1) dependent variable. It is 1 if quarterly RGDP growth is negative, otherwise it is 0.
velo. It is the quarterly first difference in the Velocity of Money (GDP/M2).
cpi. Quarterly % change in the Consumer Price Index.
curveL. Level of the Yield Curve. The spread between the 10 Year Treasury and Fed Funds.
sp12. Yearly % change in S&P 500.
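With the variables defined above, a logistic regression takes the following general form. The coefficient symbols are generic placeholders; the estimated values appeared only in the original slide's output table and are not reproduced here.

```latex
P(\text{rec} = 1) = \frac{1}{1 + e^{-z}}, \qquad
z = \beta_0 + \beta_1\,\text{velo} + \beta_2\,\text{cpi}
  + \beta_3\,\text{curveL} + \beta_4\,\text{sp12}
```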
6. Logistic Regression Model rationale
A foundational equality: Price x Quantity = Money x Velocity of money
The logistic regression to predict recessions includes Price (cpi) and Velocity (velo).
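As a short worked step, the equality can be restated in growth-rate terms, which is the form the model's variables take (cpi as a % change, velo as a first difference). Taking logs and differencing is a standard manipulation; its use here is illustrative and not taken from the slides.

```latex
P \cdot Q = M \cdot V
\quad\Longrightarrow\quad
V = \frac{P \cdot Q}{M} = \frac{\text{nominal GDP}}{M2}

\Delta \ln P + \Delta \ln Q = \Delta \ln M + \Delta \ln V
```

Read this way, for a given change in the money stock, a drop in velocity has to show up as lower inflation, lower real growth, or both.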
This model also includes the yield curve, a well-established variable to predict recessions. Notice that this variable is not quite statistically significant (p-value 0.14). But the sign of the coefficient is correct. It does inform and improve the model, and it is well supported by economic theory.
The model includes the stock market that is by nature forward looking in terms of economic outlook. This
makes it a most relevant variable to include in a regression model to predict recessions.
7. The underlying directional relationships
Focusing on standardized coefficients, as Velocity goes up, the probability of a recession goes down.
As the CPI goes up, the probability of a recession goes up.
As the yield curve widens, the probability of a recession goes down. As it narrows, the probability of a
recession goes up.
As the stock market goes up, the probability of a recession goes down. As it goes down, the probability
of a recession goes up.
Based on the standardized coefficients, Velocity has by far the most influence on the probability of a recession, and the yield curve has less influence than the other variables.
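The slides do not say which standardization convention was used. One common convention multiplies each raw coefficient by the standard deviation of its variable, which makes the result independent of the variable's units. The sketch below assumes that convention; the coefficient and data values are hypothetical.

```python
import statistics


def standardized_coefficient(raw_coef, x_values):
    """Raw coefficient scaled by the sample standard deviation of x.

    Assumed convention: effect on the log-odds of a one-standard-deviation
    move in x. The deck may have used a different convention.
    """
    return raw_coef * statistics.stdev(x_values)


# Hypothetical values, for illustration only.
velo_changes = [-0.02, 0.01, 0.00, 0.03, -0.01]
raw_coef = -40.0

std_coef = standardized_coefficient(raw_coef, velo_changes)

# Same variable expressed in units 100 times larger: the raw coefficient
# shrinks by 100, but the standardized coefficient is unchanged.
rescaled = standardized_coefficient(raw_coef / 100.0,
                                    [100.0 * x for x in velo_changes])
print(round(std_coef, 4), round(rescaled, 4))
```

This unit invariance is what allows a ranking of influence across variables measured on very different scales.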
8. Is our model spurious by including variables from the P x Q = M x V equality?
I don’t think so. The actual correlations with probabilities of a recession or the actual binomial variable
(0, 1) are rather modest as shown below.
Also, none of the independent variables are multicollinear, as shown by the low Variance Inflation Factors
below. All VIFs are below 2.5.
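For reference, the Variance Inflation Factor of variable j is defined from the R-squared obtained by regressing that variable on the other independent variables. The worked threshold below is plain arithmetic on the 2.5 cutoff quoted above.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\qquad\Longrightarrow\qquad
\mathrm{VIF}_j < 2.5 \iff R_j^2 < 1 - \frac{1}{2.5} = 0.6
```

In other words, no independent variable has more than 60% of its variance explained by the other three.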
I also tested the residuals for stationarity and unit root issues, and the tests did not find the residuals to be non-stationary. In view of the above, the model specifications and variable selection seem fine.
Notice that the original correlations signs are
consistent with the regression coefficient signs.
9. The DNN model
The DNN model has two hidden layers with 3 neurons in the first one, and 2 neurons in the second one.
The number of neurons is nearly predetermined if one follows the common rule of thumb that hidden layers should have fewer neurons than the input layer and more neurons than the output layer. With 4 inputs and 1 output, that leaves little room to choose.
The activation function is Sigmoid, the same function a Logistic Regression uses. And the output function is also Sigmoid. This makes this DNN consistent with the Logistic Regression model.
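To make the architecture concrete, here is a minimal forward pass for a 4-3-2-1 network with Sigmoid activations throughout. It is a sketch only: the weights are random placeholders rather than the fitted ones, the input values are hypothetical, and the original model was fit in R with a package the slides do not name.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer followed by a Sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# 4 inputs (velo, cpi, curveL, sp12), hidden layers of 3 and 2, 1 output.
LAYER_SIZES = [4, 3, 2, 1]


def init_params(sizes, seed=0):
    """Random placeholder weights; a real model would learn these."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        params.append((weights, biases))
    return params


def forward(x, params):
    for weights, biases in params:
        x = dense(x, weights, biases)
    return x[0]  # single output neuron: probability of a recession


def count_params(params):
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)


params = init_params(LAYER_SIZES)
prob = forward([0.01, 0.8, 1.5, 10.0], params)  # hypothetical inputs
print(count_params(params), 0.0 < prob < 1.0)
```

Even this small network has 26 weights and biases, against 5 coefficients (intercept included) for the logistic regression on the same four variables.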
10. ROC Curves, using the entire data, do not differentiate much between models
Logistic Regression: AUROC = 0.954
DNN: AUROC = 0.947
When using the entire data, the difference between the two models is fractional. They both provide much
lift in information vs. a naïve model that would just use the overall proportion of recessionary periods as
the one constant probability of recession.
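The AUROC figures above can be read as the probability that a randomly chosen recessionary quarter receives a higher model probability than a randomly chosen non-recessionary quarter. A minimal sketch of that pairwise definition follows, using toy labels and scores rather than the deck's data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, for illustration only.
labels = [0, 0, 1, 0, 1, 1, 0, 0]
scores = [0.05, 0.10, 0.80, 0.40, 0.35, 0.90, 0.02, 0.20]

model_auc = auroc(labels, scores)

# A naive model that assigns the same constant probability to every quarter.
naive_auc = auroc(labels, [0.375] * len(labels))
print(round(model_auc, 3), naive_auc)
```

The naive constant-probability model always scores 0.5, which is the baseline against which the two models' lift is measured.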
11. Kolmogorov-Smirnov plots do not differentiate much between the two models
[Figure panels: Logistic Regression, DNN]
Both models depict a tremendous lift or added information vs. a naïve model that would just use the mean proportion as a constant probability of a recession. In both cases, the KS test returns a p-value of 0.000 for the hypothesis that the model’s fit could be due to randomness. As shown, the KS plots don’t facilitate a clear ranking between the two models.
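The statistic behind a KS plot is the largest vertical gap between the cumulative distribution of model probabilities for non-recessionary quarters and that for recessionary quarters. The sketch below computes only that gap on toy data; it does not compute the p-value, and it is not the author's code.

```python
def ks_statistic(labels, scores):
    """Max gap between the score CDFs of the two classes."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_neg - cdf_pos))
    return best


# Toy data, for illustration only.
labels = [0, 0, 1, 0, 1, 1, 0, 0]
scores = [0.05, 0.10, 0.80, 0.40, 0.35, 0.90, 0.02, 0.20]

ks = ks_statistic(labels, scores)
print(round(ks, 3))
```

A value of 1 means the two classes are perfectly separated by some probability threshold; a value near 0 means the model's probabilities do not distinguish them.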
12. This graph more clearly differentiates the two models
[Figure panels: Logistic Regression, DNN]
The graphs above show recessionary quarters in green and the other quarters in red.
On this count, the DNN model clearly differentiates itself by assigning probabilities very close to 1 for the vast majority of the recessionary quarters, and very close to 0 for the other quarters. The logistic regression model shows a more continuous range of probabilities between the 0 and 1 boundaries. Notice that both models do make a few errors.
14. Summary output
The main difference between the two models is that the DNN correctly captures 23 recessionary quarters out of 31. Meanwhile, the logistic regression captures 21 recessionary quarters out of 31. Most of the differences in accuracy measures shown above emanate from that difference.
15. A Bayesian representation using frequencies: Logistic Regression
Think of a recession as a disease, and
the model as a disease screening test
(like a COVID test). And, given a
disease prevalence (recession
prevalence) and a model sensitivity
and specificity, we can reconstruct the
entire data from the Confusion Matrix.
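The screening-test analogy can be sketched as follows: prevalence, sensitivity and specificity are enough to rebuild every cell of the Confusion Matrix, and from there the probability that a positive signal is a true recession. The 31 recessionary quarters and the 21 captured by the Logistic Regression come from the Summary output slide; the total number of quarters and the specificity below are hypothetical placeholders, since the original table did not survive.

```python
def confusion_from_rates(n_total, prevalence, sensitivity, specificity):
    """Rebuild confusion-matrix frequencies from the three rates."""
    positives = n_total * prevalence
    negatives = n_total - positives
    tp = positives * sensitivity
    fn = positives - tp
    tn = negatives * specificity
    fp = negatives - tn
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


def positive_predictive_value(cm):
    """P(recession | model signals recession), i.e. Bayes' rule in counts."""
    return cm["TP"] / (cm["TP"] + cm["FP"])


N_TOTAL = 248          # hypothetical number of quarters in the sample
SPECIFICITY = 0.95     # hypothetical; not taken from the slides

cm = confusion_from_rates(
    n_total=N_TOTAL,
    prevalence=31 / N_TOTAL,   # 31 recessionary quarters (from the deck)
    sensitivity=21 / 31,       # Logistic Regression captured 21 of 31
    specificity=SPECIFICITY,
)
ppv = positive_predictive_value(cm)
print({k: round(v, 1) for k, v in cm.items()}, round(ppv, 3))
```

Working in frequencies rather than probabilities is the point of the slide: each quantity is a count of quarters that can be checked against the Confusion Matrix.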
17. Summary comparison using the Bayesian framework
As shown, not much separates the models using these measures. However, we should remember that the DNN was more deterministic in its probability output, with the majority of its quarterly probability estimates being close to either 0 or 1. This differentiation is not captured in the shown measures of accuracy.
18. Testing Section
Here we are truly testing these models by truncating the data so that several recessionary periods are out-of-sample (a Hold Out sample). Thus, these recessionary periods are treated as new data.
Within such recessionary periods, we include two quarters before and after the recessionary quarters to capture enough economic turns in the data. We then check whether the models’ recession probability estimates accurately capture such economic turns.
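The hold-out window described above (recessionary quarters plus two quarters on each side) can be sketched as an index selection. The slides do not spell out exactly how the remaining data was truncated for fitting, so the complement used as the training set below is an assumption, and the series is made up.

```python
def holdout_window(rec, pad=2):
    """Indices of recessionary quarters plus `pad` quarters on each side."""
    keep = set()
    for i, flag in enumerate(rec):
        if flag == 1:
            for j in range(i - pad, i + pad + 1):
                if 0 <= j < len(rec):
                    keep.add(j)
    return sorted(keep)


# Made-up series with one two-quarter recession, for illustration only.
rec = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]

test_idx = holdout_window(rec)
# Assumption: everything outside the window is used to fit the models.
train_idx = [i for i in range(len(rec)) if i not in set(test_idx)]
print(test_idx, train_idx)
```

A two-quarter recession yields a six-quarter test window, which matches the six quarters reported for the COVID Recession.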
19. The 1980s Recessions
The DNN made smaller Average and
Median expected errors.
Both models missed 2 recessionary
quarters out of 6 (orange cells).
The Logistic Regression model generated
3 false positives (purple) vs. only 1 for the
DNN model.
20. The 1980s Recessions. Confusion Matrix
The DNN model was superior in detecting non-recessionary quarters during the 1980s Recessions.
21. Great Recession
Here, the Logistic Regression
performed a lot better than the
DNN as it captured 4 out of 5
recessionary quarters. Meanwhile,
the DNN captured only 2 out of 5.
22. Great Recession. Confusion Matrix
The Logistic Regression was twice as accurate as the DNN in predicting recessionary quarters during the Great Recession.
23. COVID Recession
Both models have a perfect record during the 6 quarters including the 2 quarters of the COVID
Recession.
25. Adding the 3 Recessionary Periods together. Confusion Matrix
When we add the three periods together, the Logistic Regression model is much better at capturing the recessionary
quarters (10 out of 13, vs. 8 out of 13 for the DNN). On the other hand, the DNN generates fewer false positives (1 vs.
3 for the Logistic Regression model).
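The counts quoted above are enough to compute recall and precision for each model on the combined hold-out periods. The counts come straight from the text; the choice of metrics, and F1 as a single summary, is an illustrative assumption.

```python
def recall(tp, fn):
    """Share of actual recessionary quarters the model captured."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of recession signals that were actual recessionary quarters."""
    return tp / (tp + fp)


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


# Counts from the text: 13 recessionary quarters across the three periods.
holdout = {
    "Logistic Regression": {"tp": 10, "fn": 3, "fp": 3},
    "DNN": {"tp": 8, "fn": 5, "fp": 1},
}

for name, c in holdout.items():
    print(
        name,
        round(recall(c["tp"], c["fn"]), 3),
        round(precision(c["tp"], c["fp"]), 3),
        round(f1(c["tp"], c["fp"], c["fn"]), 3),
    )
```

On these counts the Logistic Regression has the higher recall (10/13 vs. 8/13) and the DNN the higher precision (8/9 vs. 10/13), which is the trade-off the slide describes.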
28. Bayesian comparison
The DNN’s very marginal superiority when using the entire data set did not translate into any superiority when testing both models using Hold Out or out-of-sample recessionary periods.
Additionally, while the Logistic Regression model is easy to understand, including communicating the relative influence of the independent variables (standardized coefficients), the DNN is a rather opaque black box. Even if we published the relative weights of each layer’s submodels, the overall depiction would be very challenging to interpret.
29. Can these models predict the current prospective recession?
No, they can’t. That is for a couple of reasons:
First, both models have already missed 2022 Q1 as a recessionary quarter. Even using the historical data (not true testing), the Logistic Regression model assigned a probability of a recession of only 6% for 2022 Q1, and the DNN assigned a probability of 0%. Remember, the DNN is always far more deterministic in its probability assessments. So, when it is wrong, it is far more off than the Logistic Regression model.
Second, for the models to forecast accurately going forward, you would need a crystal ball to accurately forecast the 4 independent variables. That is a general shortcoming of all econometric models. Practitioners often believe that Vector Auto Regression (VAR) model structures can overcome this situation. But they can’t. Resolving this situation entails resolving a bunch of circular functions, which is not possible.
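The first point, that the DNN is further off when it is wrong, can be quantified with two standard scoring rules applied to the 2022 Q1 probabilities quoted above (6% and 0%). The choice of scoring rules is an illustrative assumption, not part of the original analysis. The DNN's 0% is presumably a rounded figure, so the log loss below depends on a hypothetical clipping value.

```python
import math


def brier_one(y, p):
    """Squared error between the outcome (0 or 1) and the probability."""
    return (y - p) ** 2


def log_loss_one(y, p, eps=1e-6):
    """Log loss for one observation; eps is a hypothetical clipping value
    that keeps the loss finite when p is reported as exactly 0 or 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


y = 1  # 2022 Q1 had negative real GDP growth, so rec = 1
probs = {"Logistic Regression": 0.06, "DNN": 0.00}

results = {
    name: (brier_one(y, p), log_loss_one(y, p)) for name, p in probs.items()
}
for name, (brier, ll) in results.items():
    print(name, round(brier, 4), round(ll, 3))
```

The Brier scores are close (0.8836 vs. 1.0) because both models were badly wrong, while the log loss penalizes the DNN's near-certain miss far more heavily.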