1. Using SkLearn to Improve Existing Risk
Models
PJ Fitzpatrick
opensourcepj@gmail.com
2. Definitions and Assumptions
●Market Risk instead of Credit Risk
●Linear instrument on Stock Indices
●Assume we have 6 indices:
● Dow Jones Ind/Transport/Utilities
● S&P 500
● Nasdaq
● Russell 2000
3. Definitions and Assumptions
●Risk Measure = VaR
●Currency of Risk
●Defined at a confidence level and holding
period. Eg a 1 day VaR of 100 at a 95%
confidence level means that 95% of the
time the 1 day PnL will be above -100
●Used to consistently measure disparate products
in disparate markets
●Measure of risk frequently changes with changes
in volatility
4. Historical VaR
●Historical VaR
●Simulation based risk
●Simulation scenarios are taken as actual
historical returns
●Usually the most recent 1 year of data
●Tradeoff between relevance and number of
scenarios
5. Historical VaR
●Easy to explain
●Easy to implement
●Implementations tend to be a lot more consistent
●Deals very well with the problem of fat tailed
distributions and correlation behaviour under
extreme movements
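A minimal sketch of historical VaR for a single linear position, assuming the scenarios are simply the most recent daily returns. The function name `historical_var`, the 250 day window and the synthetic returns are illustrative assumptions, not part of the original deck.

```python
import numpy as np

def historical_var(returns, position, percentile=5, window=250):
    """VaR of a linear position from the most recent `window` returns.

    Returned as a positive number: the loss level that scenario PnL stays
    above (100 - percentile)% of the time.
    """
    scenarios = np.asarray(returns, dtype=float)[-window:]
    pnl = position * scenarios            # linear instrument: PnL scales with the return
    return -np.percentile(pnl, percentile)

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=1000)    # synthetic daily returns (assumption)
var_95 = historical_var(returns, position=10_000)
```

A longer `window` gives more scenarios but includes older, possibly less relevant, market conditions, which is the tradeoff noted above.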
6. Historical VaR – Pre Processing
import pandas as pd
from settings import data_dir, scenario_dir
import os

# Load the index levels and add a daily percentage change column per index
indices = pd.read_csv(os.path.join(data_dir, "prel_indices.csv"))
stock_names = ['DJI', 'DJT', 'DJU', 'GSPC', 'IXIC', 'RUT']
for stock_name in stock_names:
    chg_name = '{0}_Chge'.format(stock_name)
    indices[chg_name] = indices[stock_name].pct_change()
indices.to_csv(os.path.join(data_dir, "indices.csv"))
8. VaR Testing
● Hypothetical PnL as opposed to actual PnL
● Hypothetical PnL compared to VaR each day
and whether breach (1) or not (0) recorded
● The model is usually tested every year
● Kupiec test: checks that the number of
breaches is close to what the percentile
implies, taking into account the number of
observations
● Christoffersen test: tests for runs of
breaches
● Test portfolios used to improve test coverage
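The breach recording step can be sketched as below, assuming VaR is quoted as a positive number so that a breach is a hypothetical PnL below `-var`. The function name `record_breaches` and the sample figures are made up for illustration.

```python
import numpy as np

def record_breaches(hypothetical_pnl, var):
    """Return 1 where the loss exceeded VaR and 0 otherwise."""
    pnl = np.asarray(hypothetical_pnl, dtype=float)
    var = np.asarray(var, dtype=float)
    return (pnl < -var).astype(int)

pnl = [12.0, -150.0, -80.0, 5.0, -101.0]      # illustrative daily hypothetical PnL
var = [100.0, 100.0, 100.0, 100.0, 100.0]     # illustrative daily VaR
breaches = record_breaches(pnl, var)
```

The resulting 0/1 series is the input that the Kupiec and Christoffersen tests work on.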
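9. VaR Testing
import numpy as np

def kupiec(var_results, per):
    # var_results: daily breach flags (1 = breach, 0 = no breach)
    # per: expected breach probability, eg 0.05 for 95% VaR
    n = len(var_results)
    m = sum(var_results)
    return (2*np.log(pow(1-m/n, n-m)*pow(m/n, m)) -
            2*np.log(pow(1-per, n-m)*pow(per, m)))

def christoffersen_serial_ind(var_results):
    # nij counts days in state i that are followed by a day in state j.
    # Needs at least one breach and one non-breach before the last day,
    # otherwise the ratios below divide by zero.
    n00 = 0
    n01 = 0
    n10 = 0
    n11 = 0
    for idx, result in enumerate(var_results[:-1]):
        if result == 0:
            if var_results[idx+1] == 0:
                n00 += 1
            else:
                n01 += 1
        else:
            if var_results[idx+1] == 0:
                n10 += 1
            else:
                n11 += 1
    pi01 = n01 / (n00 + n01)
    pi11 = n11 / (n10 + n11)
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    return (2*np.log(pow(1-pi01, n00)*pow(pi01, n01)*pow(1-pi11, n10)*pow(pi11, n11)) -
            2*np.log(pow(1-pi, n00+n10)*pow(pi, n01+n11)))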
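One way to turn the Kupiec statistic into a pass/fail decision is sketched below. It rewrites the same likelihood ratio as a sum of logs, which avoids multiplying very small powers together, and compares it with 3.841, the 95% critical value of a chi-square distribution with 1 degree of freedom (the statistic's asymptotic distribution under the null). The names `kupiec_lr` and `_xlogy` and the sample breach series are assumptions made for this illustration.

```python
import math

CHI2_1DF_95 = 3.841  # 95% critical value, chi-square with 1 degree of freedom

def _xlogy(count, prob):
    """count * log(prob), using the convention 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * math.log(prob)

def kupiec_lr(breaches, per):
    """Kupiec likelihood ratio from a 0/1 breach series."""
    n = len(breaches)
    m = sum(breaches)
    observed = _xlogy(n - m, 1 - m / n) + _xlogy(m, m / n)
    expected = _xlogy(n - m, 1 - per) + _xlogy(m, per)
    return 2 * (observed - expected)

breaches = [1] * 13 + [0] * 237      # illustrative: 13 breaches in 250 days
lr = kupiec_lr(breaches, 0.05)
reject = lr > CHI2_1DF_95            # True would mean the breach count is inconsistent with 5%
```

The order of the breaches does not matter here; clustering of breaches in time is what the Christoffersen test looks at.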
10. Improving Historical VaR
●Historical VaR involves selecting scenarios from
a fixed window of most recent data
●An alternative is to use a larger window and
cluster within the window and only select from
the cluster that the current observation belongs
to
●Assumes that market alternates between
different states
11. Improving Historical VaR
●To cluster we need attributes. These can be:
●Derived from risk factors
●External/Calendar based
●Only going to derive attributes from risk factors
here
●External/Calendar are domain specific but those
derived from risk factors can be used for any
asset
12. Improving Historical VaR
●For each risk factor calculate:
●Ratio of index to 30, 50 and 200 day moving average. Eg
DJI_Avg_R_30
●Ratio of 2 averages to each other eg DJI_Avg_R2_30_50
●Ratio of 2 standard deviations to each other eg
DJI_Std_R_30_50
●Number of standard deviations from the average eg DJI_NumStd_30
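The remaining attributes in the list above could be computed along these lines. This is a sketch on synthetic data, not the original code: the column naming follows the examples on the slide, the standard deviation ratio uses the rolling standard deviation of daily returns, and "number of Std from average" is assumed to mean the distance of the index level from its moving average in units of the level's rolling standard deviation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic index levels standing in for one column of prel_indices.csv (assumption)
indices = pd.DataFrame({"DJI": 10_000 * np.exp(np.cumsum(rng.normal(0, 0.01, 300)))})

name = "DJI"
windows = (30, 50, 200)
chge = indices[name].pct_change()
avg = {d: indices[name].rolling(window=d).mean() for d in windows}
ret_std = {d: chge.rolling(window=d).std() for d in windows}            # std of returns
lvl_std = {d: indices[name].rolling(window=d).std() for d in windows}   # std of levels (assumption)

for short, long in [(30, 50), (30, 200), (50, 200)]:
    # Ratio of two moving averages, and of two return standard deviations
    indices[f"{name}_Avg_R2_{short}_{long}"] = avg[short] / avg[long]
    indices[f"{name}_Std_R_{short}_{long}"] = ret_std[short] / ret_std[long]

for d in windows:
    # Distance from the moving average measured in rolling standard deviations
    indices[f"{name}_NumStd_{d}"] = (indices[name] - avg[d]) / lvl_std[d]
```

Rows before the longest window fills up contain NaN and would need to be dropped before clustering.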
13. Adding Attributes
import numpy as np
import pandas as pd
import os
from settings import data_dir, scenario_dir

stock_names = ['DJI', 'DJT', 'DJU', 'GSPC', 'IXIC', 'RUT']
indices = pd.read_csv(os.path.join(data_dir, "prel_indices.csv"))
for stock_name in stock_names:
    chg_name = '{0}_Chge'.format(stock_name)
    indices[chg_name] = indices[stock_name].pct_change()
    for num_days in [30, 50, 200]:
        avg_name = '{0}_{1}_Avg'.format(stock_name, num_days)
        std_name = '{0}_{1}_Std'.format(stock_name, num_days)
        # Moving average of the index level, rolling std of the daily returns
        indices[avg_name] = indices[stock_name].rolling(window=num_days, center=False).mean()
        indices[std_name] = indices[chg_name].rolling(window=num_days, center=False).std()
        # Ratio of the index level to its moving average
        ratio_name = '{0}_Avg_R_{1}'.format(stock_name, num_days)
        indices[ratio_name] = indices[stock_name] / indices[avg_name]
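15. Improving Historical VaR
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# idx, kmeans_horizon, var_periods, cluster_attributes and indices are
# defined by the surrounding code, which is not shown on this slide
start = idx - kmeans_horizon
end = idx - 1
idxs = indices.loc[start:end].index
X_prel = indices.loc[start:end, cluster_attributes].values
# Standardise the attributes so that no single one dominates the distances
scaler = StandardScaler()
scaler.fit(X_prel)
X = scaler.transform(X_prel)
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)
# Cluster that the most recent observation in the window belongs to
selected_cluster_label = kmeans.labels_[-1]
# Most recent var_periods days that fall in the same cluster
hist_var_scen_from_cluster = [idxs[pos] for pos, item in
                              enumerate(kmeans.labels_) if item ==
                              selected_cluster_label][-var_periods:]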
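For readers who want to run the idea end to end, here is a self-contained sketch on synthetic data that goes from cluster selection to a VaR number. The data, the parameter values (`kmeans_horizon=1000`, `var_periods=250`) and the choice of a single return column are assumptions made for this illustration; only the clustering steps mirror the slide.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_days = 1200
indices = pd.DataFrame({
    "GSPC_Chge": rng.normal(0.0, 0.01, n_days),         # synthetic daily returns
    "GSPC_Avg_R_200": rng.normal(1.0, 0.05, n_days),    # synthetic attributes
    "IXIC_Std_R_50_200": rng.normal(1.0, 0.20, n_days),
})
cluster_attributes = ["GSPC_Avg_R_200", "IXIC_Std_R_50_200"]
kmeans_horizon, var_periods, percentile = 1000, 250, 5

idx = n_days                                             # compute VaR for the next day
window = indices.iloc[idx - kmeans_horizon:idx]          # rows idx-kmeans_horizon .. idx-1
X = StandardScaler().fit_transform(window[cluster_attributes].values)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X).labels_

# Keep the most recent var_periods days in the same cluster as the latest day
same_cluster = window[labels == labels[-1]].tail(var_periods)
var_95 = -np.percentile(same_cluster["GSPC_Chge"], percentile)
```

If the selected cluster holds fewer than `var_periods` days, the percentile is estimated from fewer scenarios, which is worth monitoring in practice.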
16. Comparing Results to HistVar
●Compare the absolute deviation of the breach
rate from the percentile against the same
number for historical VaR
●Perform this for a number of different portfolio
types that are based on trading styles
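A small sketch of this comparison, assuming the deviation is measured between the observed breach rate and the target percentile. The breach series below are made up purely to show the calculation and are not results from the deck.

```python
import numpy as np

def abs_deviation(breaches, per):
    """Absolute gap between the observed breach rate and the target percentile."""
    return abs(np.mean(breaches) - per)

per = 0.05
hist_breaches = [1] * 20 + [0] * 230       # made-up: 8.0% breach rate
cluster_breaches = [1] * 14 + [0] * 236    # made-up: 5.6% breach rate

# True when the clustered model's breach rate is closer to the percentile
cluster_wins = abs_deviation(cluster_breaches, per) < abs_deviation(hist_breaches, per)
```

Repeating this per year and per portfolio type gives one comparison per cell of a results grid.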
18. Results for GSPC_Avg_R_200-IXIC_Std_R_50_200
year   long   short   spread1   spread2   net (Y minus N)
2002   Y      Y       N         Y          2
2003   Y      Y       Y         Y          4
2004   N      N       Y         Y          0
2005   N      N       Y         N         -2
2006   Y      N       N         N         -2
2007   Y      Y       Y         Y          4
2008   Y      Y       Y         Y          4
2009   Y      Y       Y         Y          4
2010   N      N       N         N         -4
2011   Y      Y       Y         Y          4
2012   Y      Y       N         N          0
2013   N      N       Y         Y          0
2014   Y      Y       N         N          0
2015   N      Y       N         N         -2
2016   N      N       N         N         -4
net    3      3       1         1
19. Summary
●Results vary by portfolio type
●Allows for model diversification
●Less frequent but larger changes in risk measures
●Much more sophisticated selection criteria are required
in practice, including a comprehensive measure of
how well a VaR model performs
●Include static portfolios
●Overfitting
●Combining models
20. Using Decision Trees to Explain VaR
Breaches
●Pre-process dates for announcements
●Used to explain rather than predict
●Would not usually expect risk factors to be
significant. Transformed risk factors might
appear in the tree
●Useful results for non technical users
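A minimal sketch of the idea using scikit-learn's `DecisionTreeClassifier`, with the fitted rules printed as text for non technical readers. The two features (`is_announcement`, `vol_ratio`) and the rule that generates the synthetic breaches are hypothetical and chosen only so that the tree has something to find.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n_days = 500
# Hypothetical attributes: an announcement day flag and a transformed risk factor
is_announcement = rng.integers(0, 2, n_days)
vol_ratio = rng.uniform(0.5, 2.0, n_days)
# Synthetic breaches: only on announcement days when the volatility ratio is high
breach = ((is_announcement == 1) & (vol_ratio > 1.5)).astype(int)

X = np.column_stack([is_announcement, vol_ratio])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, breach)
rules = export_text(tree, feature_names=["is_announcement", "vol_ratio"])
print(rules)
```

Keeping `max_depth` small keeps the printed rules short enough to read, which matters more here than predictive accuracy since the tree is used to explain rather than predict.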
21. Using Regression to Identify Positions In
Portfolio
● Regress the current portfolio PnL by scenario
against instrument PnL by scenario
● Use Lasso/Elastic Net Regression to let the
model calculate the positions
● Very useful for reducing disparate portfolios
to a comprehensible number of positions
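The regression can be sketched with scikit-learn's `Lasso` as below. The scenario PnL matrix, the "true" positions and the penalty `alpha=0.1` are synthetic assumptions for illustration; in practice the inputs would be the portfolio and instrument PnL vectors from the VaR scenarios.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_scenarios, n_instruments = 250, 6
# PnL of one unit of each instrument under each scenario (synthetic)
instrument_pnl = rng.normal(0.0, 1.0, size=(n_scenarios, n_instruments))
# Hypothetical book: long 5 units of instrument 0, short 3 units of instrument 3
true_positions = np.array([5.0, 0.0, 0.0, -3.0, 0.0, 0.0])
portfolio_pnl = instrument_pnl @ true_positions + rng.normal(0.0, 0.1, n_scenarios)

# The L1 penalty pushes small coefficients to exactly zero, leaving few positions
model = Lasso(alpha=0.1, fit_intercept=False).fit(instrument_pnl, portfolio_pnl)
implied_positions = model.coef_
```

The penalty also shrinks the non-zero coefficients slightly towards zero, so the implied positions are approximate; a larger `alpha` gives fewer positions at the cost of a worse fit.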
22. Stress Testing
●Specification is usually on a small subset of risk
factors. Eg Indu changes by -5%
●Objective is to fill out the other risk factors in a
manner that is consistent and coherent
●Issues:
● Correlations are a lot different under extreme moves
● Applying changes can result in impossible risk
factor levels
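As a naive baseline for filling out the other risk factors, one could scale the specified shock by regression betas estimated from historical returns. This is a simple technique chosen for illustration, not the method from the deck, and it does not address either issue listed above: the betas come from ordinary market conditions, and nothing stops the implied levels from being impossible. The function name `fill_stress_scenario` and the synthetic returns are assumptions.

```python
import numpy as np

def fill_stress_scenario(returns, shocked_col, shock):
    """Expected move of every risk factor given a shock to one of them.

    Uses the beta cov(other, shocked) / var(shocked) estimated from the
    return history; the shocked factor itself has a beta of 1.
    """
    cov = np.cov(np.asarray(returns, dtype=float), rowvar=False)
    betas = cov[:, shocked_col] / cov[shocked_col, shocked_col]
    return betas * shock

rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.01, 1000)
# Synthetic returns for three indices that share a common driver
returns = np.column_stack([
    market,
    0.8 * market + rng.normal(0.0, 0.003, 1000),
    1.2 * market + rng.normal(0.0, 0.003, 1000),
])
scenario = fill_stress_scenario(returns, shocked_col=0, shock=-0.05)
```

Estimating the betas only from days with large moves, or from the cluster of stressed days, would be one way to reflect the different correlation behaviour under extreme moves.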