The Quant Foundry Labs division was approached to improve models for predicting low probability sovereign defaults. They developed a machine learning model that uses a large dataset of economic, financial, and governance indicators to predict sovereign credit ratings. The model was trained and tested on historical data, demonstrating improved accuracy over traditional statistical techniques. Explanatory tools also provide transparency into the model's predictions. The results represent an improvement in predicting low probability default events, which can help with regulatory requirements and risk management.
Low Probability Sovereign Default Prediction
Low Probability Default – Sovereign Defaults
Chris Cormack and David Kelly, Quant Foundry
Stuck in the Tail
Challenges posed by regulatory requirements such as IFRS 9, together with the desire of asset and treasury managers to
improve their portfolio investment decisions and risk indicators for low probability of default portfolios,
have driven demand for better forecast methods. This note highlights some of the research performed in
our Quant Labs that shows a demonstrable improvement on previous studies.
The problem with low probability defaults, such as those witnessed by sovereigns, is that by their very
definition they do not happen very often, and, unlike corporate and retail credit, there are no more than
two hundred issuers. Traditional statistical techniques that heroically link disparate probability
distributions to create some narrative around contagion are hampered by fragmented market data that is
concentrated in the highly rated issuers such as the G7.
The input data used in these traditional models is a summary of how the network of investors views a
particular credit. It is the edited highlights: an aggregate view of the consequences of individuals across a
number of institutions completing detailed due diligence. The market data also defines a relative statement,
or consensus, of the risk-return for where the marginal dollar of investment should be applied. This
marginal dollar of risk, for example, drives the over-buying and retrenchment of bond investors in external
emerging markets debt.
Governments fail for many reasons, but there is a pattern of behaviour that any analyst would look to for
history repeating itself. Examples include: material expansion of external debt, autocratic or corrupt
government, low GDP per capita, a palm tree on the national flag. OK, not the last one, although
it was a criterion used by a senior credit officer in the 1990s. The key point is that there are a number of
non-market but still measurable attributes that can be combined to automate the due diligence process a
traditional country credit officer would complete.
The advantage of using a model that leverages a wide set of data and calibrates well to historic market
and downgrade events is that it is much more consistent and avoids officers “going local” and losing sight
of how each country sits alongside its peers.
The Quant Foundry Labs division was approached by a Tier 1 client to introduce modelling techniques that
improve the predictability of low probability defaults while reducing the burden of traditional credit risk
management work. We used a combination of traditional subject matter knowledge and AI tools that
together provide a demonstrably more powerful means of assessing events and finding complex
probabilistic dependencies within the data.
The use of more powerful machine learning techniques comes with a challenge of model transparency. To
address this, we have developed a suite of tools to provide insight into, and explanation of, the model choices
and the influence of the model inputs. Together, the predictive machine learning algorithm and the diagnostic
tools give rise to a powerful model combination that enables both portfolio managers and regulatory risk
teams to assess the risks from low probability defaults both at a point in time and through the credit cycle.
Quant Foundry Approach
If there were a buzzword hit parade for 2018, AI and ML would certainly be in the top three. There is much
hype, even though artificial intelligence has been a topic of conversation for decades – Forbidden
Planet, C-3PO, HAL? Massively improved computation and a tsunami of available data have enabled AI and
ML to flourish, and powerful tools have been democratised so that anyone with some technical background
can use them.
Our Quant Foundry Labs division has embraced these techniques but also appreciates that they are still
just tools, which need to be treated with the same discipline with which we now manage pricing and risk models.
We apply the following approach to all of our AI projects:
Industry Knowledge
We start with industry knowledge to understand the problem we have
been asked to solve, articulate the desired outcome, assess the available
data, and design a solution using a blend of traditional and AI techniques
Implementation
A detailed understanding of how to combine traditional and AI techniques to
build out a solution: quality checks on the gap-filled input set,
appropriateness of the feature set, code testing and deployment
Quality Framework
Documentation of the solution design and implementation approach,
preparation of input data tests, performance results and explanatory
artefacts, and articulation of model limitations and points of instability
Model Scope
The goal of our model is to predict the rating transition probability as a point-in-time forecast and to
leverage this model to forecast rating transitions up to four quarters ahead and beyond, to address some of
the challenges of IFRS 9. The predictive power of our model lies in the fact that it uses advanced machine
learning techniques to “learn” the most probable rating given different economic and financial
indicators. The model is trained and calibrated using a large set of historical economic and financial data
collected across many different countries:
Classifier
We designed a classifier that, given several economic and financial
parameters at a given quarter, predicts the rating at that quarter. Several
machine learning algorithms and feature sets were tested in order to
maximise the classification accuracy.
Forecast Algorithm
We implemented an algorithm that forecasts the economic and financial
parameters needed by the classifier to predict the ratings. We explored
both traditional econometric models such as ARIMA (Auto Regressive
Integrated Moving Average), and more innovative AI techniques
such as standard Recurrent Neural Networks and LSTM (Long Short-Term
Memory) models.
Forward Evolution
Given the feature predictions and uncertainties, we simulate possible
evolution paths and run each path through the classifier. This returns
a set of rating probability distributions.
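The three steps above can be illustrated end to end with a deliberately simplified sketch. Everything here is an assumption for illustration: the classifier is a stub that thresholds a single hypothetical debt-ratio feature, and the feature forecast is a plain Gaussian random walk rather than the trained components described in this note.

```python
import random

RATINGS = ["AAA", "AA", "A", "BBB", "BB"]

def classify(debt_ratio):
    """Stub classifier: maps one feature to a rating bucket. The real
    model is a trained classifier over a much larger Feature Set."""
    thresholds = [0.3, 0.5, 0.7, 0.9]
    for rating, t in zip(RATINGS, thresholds):
        if debt_ratio < t:
            return rating
    return RATINGS[-1]

def forecast_paths(current, n_paths=1000, horizon=4, vol=0.05, seed=7):
    """Simulate feature evolution paths (here a Gaussian random walk)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        x, path = current, []
        for _ in range(horizon):
            x += rng.gauss(0.0, vol)
            path.append(x)
        paths.append(path)
    return paths

def rating_distribution(current_debt_ratio):
    """Run each simulated path through the classifier and aggregate the
    terminal ratings into a probability distribution."""
    paths = forecast_paths(current_debt_ratio)
    counts = {r: 0 for r in RATINGS}
    for path in paths:
        counts[classify(path[-1])] += 1
    return {r: c / len(paths) for r, c in counts.items()}

dist = rating_distribution(0.45)
```

Aggregating the terminal rating of every path yields the rating probability distribution; the production model replaces both stubs with the trained classifier and forecast algorithm.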
Classifier and the Secret Sauce
AI, as with all models, cannot operate as an island. It is not credible or possible for a model to dredge the
entire universe of data and come up with a Feature Set that drives the algorithm. The secret sauce here is
industry knowledge and an understanding of the dynamics and red flags of government failure. Getting
this step wrong will allow the AI model to adhere to the time-honoured principle of “garbage in – garbage
out”. The definition of the Feature Set that contributes to the performance of the Classifier follows
three steps:
Expert Selection
We included different kinds of indicators: economic data, market data and
governance indicators. We looked for historical time series with the
highest available frequencies and time coverage, and put each time series
of raw data onto a coherent and uniform time grid with monthly and
quarterly frequency
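As a sketch of the gridding step, the following aligns a monthly series (with a missing month) onto a quarterly grid by taking the last observation in each quarter and forward-filling gaps. The series values are made up for illustration.

```python
def to_quarterly(monthly):
    """Align a monthly time series onto a quarterly grid by taking the
    last observation in each quarter, forward-filling gaps.
    `monthly` maps (year, month) -> value; months may be missing."""
    quarterly = {}
    last = None
    for year in sorted({y for y, _ in monthly}):
        for month in range(1, 13):
            if (year, month) in monthly:
                last = monthly[(year, month)]
            if month % 3 == 0 and last is not None:
                quarterly[(year, month // 3)] = last
    return quarterly

# Monthly index-style series with a gap in May
series = {(2020, 1): 100.0, (2020, 2): 100.5, (2020, 3): 101.0,
          (2020, 4): 101.2, (2020, 6): 101.9}
grid = to_quarterly(series)
```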
Feature Engineering
We enhanced the predictive power of our model by building additional
features, combining or converting the available ones. These included
percentages, ratios, time decays, changes, spreads and bases. We also
subtracted globally co-ordinated trends such as the 2008 meltdown
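A minimal illustration of this derived-feature step, using made-up debt, GDP and global-index series: it builds a ratio, a quarterly change, and a series with the common global trend subtracted.

```python
def derived_features(debt, gdp, global_index):
    """Build derived features from raw quarterly series (equal-length
    lists): a ratio, a quarterly change, and a detrended series."""
    debt_to_gdp = [d / g for d, g in zip(debt, gdp)]
    gdp_change = [None] + [b - a for a, b in zip(gdp, gdp[1:])]
    # Subtract the common global trend so only the country-specific
    # component remains (e.g. strip out the 2008 meltdown)
    gdp_detrended = [g - w for g, w in zip(gdp, global_index)]
    return debt_to_gdp, gdp_change, gdp_detrended

debt = [50.0, 52.0, 55.0, 60.0]
gdp = [100.0, 101.0, 99.0, 98.0]
world = [100.0, 100.5, 98.5, 98.0]
ratio, change, detrended = derived_features(debt, gdp, world)
```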
Final Feature Set
We completed several iterative tests to select the best performing set of
features for the model using a set of custom-built explanatory tools to
highlight feature significance and explanatory power.
Feature Set selection is a very important process for enhancing the performance of a classifier. A
higher number of features doesn’t necessarily mean higher accuracy and better results, although it will
make the model slower. Not all countries have data on all the features: if we want to include more features in
the model, we have to keep in mind that this always comes at the cost of dropping some countries for
which not all the necessary features are available. We need to apply judgement to find a compromise.
The exclusion of critical features due to poor feature selection can introduce significant bias in the
classifier. We noticed that some features are critical for the classification of medium and lower rated
countries but not for highly rated countries, and vice versa. We need to be sure that we are including the
most important ones across the whole range of ratings. We looked at the feature importance scores reported
by the classifier after the training phase for feedback on our selection.
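One simple way to honour this across rating buckets, sketched below with hypothetical importance scores, is to take the union of the top-ranked features from each bucket rather than relying on a single global ranking.

```python
def select_features(importances_by_bucket, top_k=2):
    """Union of the top-k features from each rating bucket, so that
    features critical only for low-rated (or only high-rated)
    sovereigns are retained. In practice the scores come from the
    trained classifier; these are illustrative."""
    selected = set()
    for bucket, scores in importances_by_bucket.items():
        ranked = sorted(scores, key=scores.get, reverse=True)
        selected.update(ranked[:top_k])
    return selected

importances = {
    "high_rated": {"debt_to_gdp": 0.40, "governance": 0.35, "fx_reserves": 0.05},
    "low_rated":  {"debt_to_gdp": 0.20, "governance": 0.10, "fx_reserves": 0.45},
}
features = select_features(importances)
```

Here `fx_reserves` matters only for the low-rated bucket, yet survives selection.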
Gradient Learning Curve
At this point we have historical data from several countries based on the Feature Set, as well as each
issuer’s credit rating. We now introduce our enhanced gradient boosted algorithm (GB) to
enable us to forecast the future rating of each country. The objective of our chosen approach is to learn
from history to minimise the difference between the predicted rating based on historical data and the realised
rating, while overcoming considerable variances in the quality of the historical dataset.
GB is an ensemble technique that combines multiple weak learners, individually based
on single decision trees, to gain higher performance in terms of predictive power. The key advantage of
a gradient boosting algorithm lies in the fact that it learns from its errors.
A GB algorithm creates trees sequentially: at each iteration it adds new decision trees, focusing
attention on the data points misclassified by the previous set of trees. In this way the predictive loss is
iteratively decreased, and the process continues until no further improvement can be made. The name
“gradient boosting” refers to the fact that this technique uses a gradient descent algorithm to minimise
the loss when adding new trees.
So a trained GB model can be described as an aggregation of weak learners, where each added tree has
been optimised. The final node condition of default can be converted into a probability of that event
happening.
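The core idea can be written in a few lines: squared-loss gradient boosting over depth-one trees (stumps) on a one-dimensional toy dataset, where each new stump is fitted to the residuals left by the ensemble so far. This is a sketch of the generic technique, not our enhanced production algorithm.

```python
def fit_stump(x, residuals):
    """Best threshold split of a 1-D feature minimising squared error
    against the current residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi, t=t, l=lmean, r=rmean: l if xi <= t else r

def gradient_boost(x, y, n_trees=20, lr=0.3):
    """Sequentially add stumps, each fitted to the residuals (the
    negative gradient of squared loss) left by the ensemble so far."""
    pred = [0.0] * len(x)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [p + lr * stump(xi) for p, xi in zip(pred, x)]
        trees.append(stump)
    return lambda xi: sum(lr * t(xi) for t in trees)

# Toy data: a default indicator (1.0) that rises with the debt ratio
x = [0.2, 0.3, 0.4, 0.6, 0.8, 0.9]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = gradient_boost(x, y)
```

Each added stump corrects what the previous ensemble got wrong, so the predicted default score approaches 0 on the safe side of the data and 1 on the risky side.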
We can now provide, for each issuer, a probability estimate of landing on each rating from AAA to D during
each quarter of the coming year, and weight each member of the Feature Set by the significance given
to it.
Validate for Stability
At this stage we should be able to claim victory, and we can show results that showcase the model’s predictive
power. What we don’t know is how the model will perform going forward, under different conditions.
We need to make sure the model’s rating output does not flip-flop based on small changes in the Feature
Set.
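A simple stability diagnostic, sketched here with a stub one-feature classifier standing in for the trained model, is to perturb the input with small Gaussian noise and measure how often the predicted rating flips.

```python
import random

def rating(debt_ratio):
    """Stub classifier (a single threshold) standing in for the trained
    GB model."""
    return "IG" if debt_ratio < 0.6 else "HY"

def flip_rate(feature, n=500, eps=0.01, seed=1):
    """Fraction of small Gaussian perturbations of the input that change
    the predicted rating - a basic flip-flop diagnostic."""
    rng = random.Random(seed)
    base = rating(feature)
    flips = sum(rating(feature + rng.gauss(0.0, eps)) != base
                for _ in range(n))
    return flips / n

stable = flip_rate(0.40)    # far from the decision boundary
fragile = flip_rate(0.595)  # sits right at the boundary
```

A prediction far from any decision boundary should show a near-zero flip rate; a high flip rate flags an unstable point.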
Feature Set Forecast
The feature forecast algorithm consists of a typical principal component transformation of the data,
followed by an auto-regressive technique to create a four-quarter evolution of the feature set. We then
assume the uncertainties are Gaussian, draw 100 possible new levels of the Feature Set, and
reapply our trained GB model to see how the predicted ratings change.
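A compact sketch of that pipeline, assuming numpy and a toy random-walk feature history: principal components via SVD, an AR(1) fitted to each component score series, Gaussian residual draws, and 100 simulated four-quarter evolutions mapped back to feature space.

```python
import numpy as np

def forecast_feature_set(X, horizon=4, n_draws=100, seed=0):
    """Sketch of the feature forecast. X has shape
    (n_quarters, n_features)."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Principal component transformation
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T
    sims = []
    for _ in range(n_draws):
        s = scores[-1].copy()
        for _ in range(horizon):
            new = np.empty_like(s)
            for j in range(scores.shape[1]):
                z = scores[:, j]
                # AR(1) coefficient by least squares, Gaussian shock
                phi = (z[:-1] @ z[1:]) / (z[:-1] @ z[:-1])
                resid = z[1:] - phi * z[:-1]
                new[j] = phi * s[j] + rng.normal(0.0, resid.std())
            s = new
        sims.append(mean + s @ Vt)  # back to feature space
    return np.array(sims)

rng = np.random.default_rng(3)
history = np.cumsum(rng.normal(size=(40, 3)), axis=0)
draws = forecast_feature_set(history)
```

Each of the 100 draws is a possible Feature Set level four quarters out, ready to be run through the trained classifier.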
Visualising
In machine learning, one of the biggest challenges is model explanation and interpretability.
Some machine learning techniques are intrinsically more transparent (Naive Bayes, Decision Trees,
Random Forests). Others, such as Neural Networks, are trickier to decipher, in particular when assessing the
importance of each member of the Feature Set. Just because a feature is used deeper in the tree
doesn’t necessarily mean its importance is lower than that of features used at higher levels. We deploy
visualisation tools that give less importance to features near the root of the tree and higher
credence to those near the leaves, and a visualisation algorithm that looks at the average
difference in predictions over all orderings of the features, which gives a more consistent result.
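Averaging the change in prediction over all orderings of the features is the Shapley attribution scheme. For a small feature count it can be computed exactly; the toy model below (an assumption for illustration) has an interaction term, and the attributions still sum to the difference between the prediction and its baseline.

```python
from itertools import permutations

def shapley_importance(predict, x, baseline):
    """Exact Shapley attribution for one prediction: average, over all
    feature orderings, of the change in output when each feature is
    switched from its baseline to its actual value."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        z = list(baseline)
        prev = predict(z)
        for j in order:
            z[j] = x[j]
            cur = predict(z)
            phi[j] += cur - prev
            prev = cur
    return [p / len(orderings) for p in phi]

# Toy model with an interaction between features 0 and 1
predict = lambda z: 2.0 * z[0] + z[1] + z[0] * z[1]
phi = shapley_importance(predict, x=[1.0, 1.0, 1.0],
                         baseline=[0.0, 0.0, 0.0])
```

The interaction term is split evenly between features 0 and 1, and the inert feature 2 receives zero attribution, which is what makes the ordering-averaged view consistent.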
Now for the Results
We have described how we trained and calibrated each step of our model. During this process we withheld
from our sample all the data relating to the last available year. We test the predictive performance of the
classifier by un-blinding the test data set and comparing the predictions made using the true
values of the features with those made using the forecast values of the features. The resulting confusion
matrix shows, through the high concentration on the diagonal, the overall model accuracy and the
enhanced predictive power around transitions.
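The confusion matrix itself is straightforward to assemble from held-out predictions; a toy example with three rating labels and illustrative predictions:

```python
def confusion_matrix(true_ratings, predicted_ratings, labels):
    """Rows are true ratings, columns are predicted ratings; mass on
    the diagonal measures overall accuracy."""
    idx = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_ratings, predicted_ratings):
        m[idx[t]][idx[p]] += 1
    return m

labels = ["AAA", "AA", "A"]
true_r = ["AAA", "AAA", "AA", "AA", "A", "A"]
pred_r = ["AAA", "AA", "AA", "AA", "A", "A"]
cm = confusion_matrix(true_r, pred_r, labels)
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(true_r)
```

Off-diagonal cells adjacent to the diagonal correspond to one-notch misses, which is where the transition behaviour of the model shows up.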
The graph below shows the strength of the model’s predictability for highly rated, and thus low
probability default, countries, and shows impressive performance. For the next phase of this model’s
development we plan to capture the dynamics of cross-over movements, in particular for Southern European
members of the Euro that do not control their domestic currency.
Conclusion
The use of a blend of traditional modelling approaches and AI has improved the predictability of low
probability default models, where the prevailing data tends to be a challenge. AI demonstrably
provides an efficient way of conducting non-linear regression across historical data where normal
distribution mapping techniques fail.
The critical element of this exercise is the need for industry knowledge of the key drivers of a
typical government default, and an understanding of how these risk factors interplay in the choice of the
gradient AI model. Testing model stability is central to any development, and we certainly take that
approach. Finally, the methods used to visualise the results are integral to the methodology stack for this
type of model, and need to be included in any validation to make sure the recipients understand what is
going on under the hood when making risk decisions.
At the Quant Foundry, we are very excited by this approach, as the results are a step improvement on
what has gone before, and we look forward to discussing this with our collaborators in the data vendor
world as well as those that address this challenge in the banks.
The Quant Foundry
The Quant Foundry has been set up by Chris Cormack and David Kelly to provide professional consulting
services, quantitative team augmentation services, software design and development as well as mathematical,
social science and scientific research. Our research division – Quant Labs – includes the design and development
of deep-learning algorithms and Artificial Intelligence models across multiple industrial sectors.
Chris Cormack Delivery Partner chris.cormack@quantfoundry.com
David Kelly Quality Assurance Partner david.kelly@quantfoundry.com