Using SkLearn to Improve Existing Risk Models

PJ Fitzpatrick
opensourcepj@gmail.com

Definitions and Assumptions
●Market Risk instead of Credit Risk
●Linear instrument on Stock Indices
●Assume we have 6 indices:
● Dow Jones Ind/Transport/Utilities
● S&P 500
● Nasdaq
● Russell 2000

Definitions and Assumptions
●Risk Measure = VaR (Value at Risk)
●The common currency of risk
●Defined at a confidence level and a holding
period. E.g. a VaR of 100 at a 1 day holding period
and 95% confidence level means that 95% of the
time the 1 day PnL will be above -100
●Used to consistently measure disparate products
in disparate markets
●The measure of risk changes frequently with changes
in volatility
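
As a worked illustration of the definition above, the sketch below computes a 95% 1 day VaR as the 5th percentile of a made-up PnL sample. The function name `value_at_risk` and the sample data are assumptions for illustration and are not part of the original deck. The function returns the PnL level itself (a negative number); the VaR quoted in the bullet would be its absolute value.

```python
import numpy as np

def value_at_risk(pnl, confidence=0.95):
    """Return the PnL level that is exceeded with probability `confidence`."""
    pnl = np.asarray(pnl, dtype=float)
    return np.percentile(pnl, 100 * (1 - confidence))

# Hypothetical 1 day PnL outcomes: 100 values from -100 to 98 in steps of 2
pnl = np.arange(-100, 100, 2)
var_95 = value_at_risk(pnl)

# Share of outcomes that are better than the VaR level
share_above = np.mean(pnl > var_95)
print(var_95, share_above)
```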

Historical VaR
●Historical VaR
●Simulation based risk
●Simulation scenarios are taken as actual
historical returns
●Usually the most recent year of data
●Tradeoff between relevance and number of
scenarios

Historical VaR
●Easy to explain
●Easy to implement
●Implementations are a lot more consistent
●Deals very well with the problem of fat tailed
distributions and correlation behaviour under
extreme movements

Historical VaR – Pre Processing
import os
import pandas as pd
from settings import data_dir, scenario_dir
# Load the raw index levels and add a daily percentage change column per index
indices = pd.read_csv(os.path.join(data_dir, "prel_indices.csv"))
stock_names = ['DJI', 'DJT', 'DJU', 'GSPC', 'IXIC', 'RUT']
for stock_name in stock_names:
    chg_name = '{0}_Chge'.format(stock_name)
    indices[chg_name] = indices[stock_name].pct_change()
indices.to_csv(os.path.join(data_dir, "indices.csv"))

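Historical VaR – Implementation
import os
import numpy as np
import pandas as pd
from settings import data_dir
indices = pd.read_csv(os.path.join(data_dir, 'indices.csv'))
position = {'DJI':1, 'DJT':1,'DJU':1,'GSPC':1,'IXIC':1,'RUT':1}
idx = 1000          # row of the day being tested
var_periods = 250   # number of historical scenarios
# Scenario PnL: position-weighted returns for the 250 rows ending at idx-2
window = indices.iloc[idx-var_periods-1:idx-1]
series_historical_var = sum(window['{0}_Chge'.format(x)]*position[x] for x in position)
# 95% VaR is the 5th percentile of the scenario PnL
historical_var = np.percentile(series_historical_var, 5)
# Hypothetical PnL: the same position applied to the return on day idx
hypothetical_pnl = sum(indices.iloc[idx]['{0}_Chge'.format(x)]*position[x] for x in position)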

VaR Testing
● Hypothetical PnL is used as opposed to actual PnL
● Hypothetical PnL is compared to VaR each day
and a breach (1) or no breach (0) is recorded
● The model is usually tested every year
● Kupiec test: checks that the number of breaches
is close to what the percentile implies, taking
into account the number of observations
● Christoffersen test: tests for runs of
breaches
● Test portfolios are used to improve test coverage
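
The daily breach recording and the Kupiec check described above can be sketched as follows. This is a minimal sketch under stated assumptions: the names `record_breaches` and `kupiec_lr` and the toy numbers are hypothetical. The likelihood ratio is computed from log-likelihoods rather than raw powers, which is algebraically the same statistic, and it is compared with 3.841, the 95% critical value of a chi-squared distribution with one degree of freedom.

```python
import numpy as np

CHI2_1DOF_95 = 3.841  # 95% critical value, chi-squared with 1 degree of freedom

def record_breaches(hypothetical_pnl, var_levels):
    """Return 1 where the day's PnL fell below that day's VaR level, else 0."""
    pnl = np.asarray(hypothetical_pnl, dtype=float)
    var = np.asarray(var_levels, dtype=float)
    return (pnl < var).astype(int)

def kupiec_lr(breaches, per):
    """Kupiec likelihood ratio for m breaches in n days against breach rate `per`."""
    n = len(breaches)
    m = int(np.sum(breaches))

    def loglik(p):
        # Binomial log-likelihood, skipping terms whose count is zero
        out = 0.0
        if m > 0:
            out += m * np.log(p)
        if n - m > 0:
            out += (n - m) * np.log(1 - p)
        return out

    return 2 * (loglik(m / n) - loglik(per))

# Toy example: VaR level of -2.0 every day, one breach on the first day
breaches = record_breaches([-3.0, 0.5, 1.2, -0.1], [-2.0, -2.0, -2.0, -2.0])

# 13 breaches in 250 days is close to the 12.5 expected at 5%
lr_ok = kupiec_lr([1] * 13 + [0] * 237, 0.05)
# 30 breaches in 250 days is far too many
lr_bad = kupiec_lr([1] * 30 + [0] * 220, 0.05)
print(lr_ok < CHI2_1DOF_95, lr_bad > CHI2_1DOF_95)
```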

VaR Testing
import numpy as np
def kupiec(var_results, per):
    # Likelihood ratio of the observed breach rate m/n against the expected rate per
    n = len(var_results)
    m = sum(var_results)
    return (2*np.log(pow(1-m/n,n-m)*pow(m/n,m))
            - 2*np.log(pow(1-per,n-m)*pow(per,m)))
def christoffersen_serial_ind(var_results):
    # Count transitions between consecutive days (0 = no breach, 1 = breach)
    n00 = 0
    n01 = 0
    n10 = 0
    n11 = 0
    for idx, result in enumerate(var_results[:-1]):
        if result == 0:
            if var_results[idx+1] == 0:
                n00 += 1
            else:
                n01 += 1
        else:
            if var_results[idx+1] == 0:
                n10 += 1
            else:
                n11 += 1
    # Assumes at least one breach and one non-breach before the last day,
    # otherwise the divisions below fail
    pi01 = n01 / (n00 +n01)
    pi11 = n11 / (n10 +n11)
    pi = (n01 + n11) / (n00 +n01 + n10 +n11)
    return (2*np.log(pow(1-pi01,n00)*pow(pi01,n01)*pow(1-pi11,n10)*pow(pi11,n11))
            - 2*np.log(pow(1-pi,n00+n10)*pow(pi,n01+n11)))

Improving Historical VaR
●Historical VaR involves selecting scenarios from
a fixed window of the most recent data
●An alternative is to use a larger window,
cluster the observations within it and select
scenarios only from the cluster that the current
observation belongs to
●Assumes that the market alternates between
different states

●To cluster we need attributes. These can be:
●Derived from risk factors
●External/Calendar based
●Only going to derive attributes from risk factors
here
●External/Calendar are domain specific but those
derived from risk factors can be used for any
asset

●For each risk factor calculate:
●Ratio of index to 30, 50 and 200 day moving average. Eg
DJI_Avg_R_30
●Ratio of 2 averages to each other eg DJI_Avg_R2_30_50
●Ratio of 2 standard deviations to each other eg
DJI_Std_R_30_50
●Number of Std from Average eg DJI_NumStd_30

Adding Attributes
import numpy as np
import pandas as pd
import os
from settings import data_dir, scenario_dir
indices = pd.read_csv(os.path.join(data_dir, "prel_indices.csv"))
stock_names = ['DJI', 'DJT', 'DJU', 'GSPC', 'IXIC', 'RUT']
for stock_name in stock_names:
    chg_name = '{0}_Chge'.format(stock_name)
    indices[chg_name] = indices[stock_name].pct_change()
    for num_days in [30,50,200]:
        avg_name = '{0}_{1}_Avg'.format(stock_name, num_days)
        std_name = '{0}_{1}_Std'.format(stock_name, num_days)
        # Moving average of the index level, rolling std of the daily changes
        indices[avg_name] = indices[stock_name].rolling(window=num_days, center=False).mean()
        indices[std_name] = indices[chg_name].rolling(window=num_days, center=False).std()
        # Ratio of the index level to its moving average
        ratio_name = '{0}_Avg_R_{1}'.format(stock_name, num_days)
        indices[ratio_name] = indices[stock_name] / indices[avg_name]

Adding Attributes
# Continuation: ratios between windows and distance from the moving average
for stock_name in stock_names:
    for short, long in [(30, 50), (50, 200)]:
        ratio_name = '{0}_Std_R_{1}_{2}'.format(stock_name, short, long)
        indices[ratio_name] = (indices['{0}_{1}_Std'.format(stock_name, short)] /
                               indices['{0}_{1}_Std'.format(stock_name, long)])
        avg_ratio_name = '{0}_Avg_R2_{1}_{2}'.format(stock_name, short, long)
        indices[avg_ratio_name] = (indices['{0}_{1}_Avg'.format(stock_name, short)] /
                                   indices['{0}_{1}_Avg'.format(stock_name, long)])
    for num_days in [30,50,200]:
        avg_name = '{0}_{1}_Avg'.format(stock_name, num_days)
        std_name = '{0}_{1}_Std'.format(stock_name, num_days)
        numstd_name = '{0}_NumStd_{1}'.format(stock_name, num_days)
        # Note: the std column is the std of daily changes, not of index levels
        indices[numstd_name] = (indices[stock_name]-indices[avg_name])/indices[std_name]

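from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# kmeans_horizon (number of rows to cluster) and cluster_attributes
# (list of attribute column names) are assumed to be set beforehand
start = idx-kmeans_horizon
end = idx - 1
idxs = indices.loc[start:end].index
X_prel = indices.loc[start:end, cluster_attributes].values
# Standardise the attributes so that each has equal weight in the clustering
scaler = StandardScaler()
scaler.fit(X_prel)
X = scaler.transform(X_prel)
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)
# Cluster of the most recent observation
selected_cluster_label = kmeans.labels_[-1]
# Keep the latest var_periods rows that fall in the same cluster
hist_var_scen_from_cluster = [idxs[i] for i, item in
                              enumerate(kmeans.labels_) if item ==
                              selected_cluster_label][-var_periods:]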

Comparing Results to HistVar
●Compare the absolute deviation of the breach
rate from the percentile against the same number
for historical VaR
●Perform this for a number of different portfolio
types that are based on trading styles
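
The comparison above can be sketched as follows, assuming that "deviation from the percentile" means the gap between the observed breach rate and the target rate (5% for a 95% VaR). The function names and the breach counts are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def abs_deviation_from_percentile(breaches, per=0.05):
    """Absolute gap between the observed breach rate and the target rate."""
    return abs(np.mean(breaches) - per)

def clustered_beats_historical(breaches_cluster, breaches_hist, per=0.05):
    """True when the clustered model's breach rate is closer to the target."""
    return (abs_deviation_from_percentile(breaches_cluster, per)
            < abs_deviation_from_percentile(breaches_hist, per))

# Made-up year of 250 days: 20 breaches for historical VaR, 14 for clustered VaR
breaches_hist = np.array([1] * 20 + [0] * 230)
breaches_cluster = np.array([1] * 14 + [0] * 236)

dev_hist = abs_deviation_from_percentile(breaches_hist)        # |0.080 - 0.05|
dev_cluster = abs_deviation_from_percentile(breaches_cluster)  # |0.056 - 0.05|
print(clustered_beats_historical(breaches_cluster, breaches_hist))
```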

Calculating Test Portfolios
indices = pd.read_csv(os.path.join(data_dir, "indices.csv"))
positions = {'long':{}, 'short':{}, 'spread1':{}, 'spread2':{}}
# start and end (first and last row to test) are assumed to be set beforehand
for idx in range(start, end):
    last = idx - 1
    prel_list = []
    for stock_name in stock_names:
        # Latest level relative to the preceding 90-row range (0 = at the low, 1 = at the high)
        window = indices.iloc[last-90:last][stock_name]
        calc_ratio = ((indices.iloc[last][stock_name] - window.min())/
                      (window.max() - window.min()))
        prel_list.append((calc_ratio,stock_name))
    # Indices sorted from lowest to highest ratio
    s_list = sorted(prel_list)
    positions['spread1'][idx] = {
        s_list[0][1]: 1, s_list[1][1]: .5, s_list[2][1]: 0,
        s_list[3][1]: 0, s_list[4][1]: -.5, s_list[5][1]: -1 }
    positions['spread2'][idx] = {
        s_list[0][1]: -1, s_list[1][1]: -.5, s_list[2][1]: 0,
        s_list[3][1]: 0, s_list[4][1]: .5, s_list[5][1]: 1 }
    positions['short'][idx] = {
        s_list[0][1]: -1, s_list[1][1]: -.5, s_list[2][1]: 0,
        s_list[3][1]: 0, s_list[4][1]: 0, s_list[5][1]: 0 }
    positions['long'][idx] = {
        s_list[0][1]: 1, s_list[1][1]: .5, s_list[2][1]: 0,
        s_list[3][1]: 0, s_list[4][1]: 0, s_list[5][1]: 0 }

Results for GSPC_Avg_R_200-IXIC_Std_R_50_200
year       long  short  spread1  spread2  net (Y-N)
2002       Y     Y      N        Y         2
2003       Y     Y      Y        Y         4
2004       N     N      Y        Y         0
2005       N     N      Y        N        -2
2006       Y     N      N        N        -2
2007       Y     Y      Y        Y         4
2008       Y     Y      Y        Y         4
2009       Y     Y      Y        Y         4
2010       N     N      N        N        -4
2011       Y     Y      Y        Y         4
2012       Y     Y      N        N         0
2013       N     N      Y        Y         0
2014       Y     Y      N        N         0
2015       N     Y      N        N        -2
2016       N     N      N        N        -4
net (Y-N)  3     3      1        1

Summary
●Results vary by portfolio type
●Allows for model diversification
●Less frequent but larger changes in risk measures
●Much more sophisticated selection criteria are required
in practice, including a comprehensive measure of
VaR model performance
●Include static portfolios
●Overfitting
●Combining models

Using Decision Trees to Explain VaR
Breaches
●Pre-process dates for announcements
●Used to explain rather than predict
●Raw risk factors would not usually be expected to
be significant. Transformed risk factors might be
present
●Useful results for non-technical users
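
A minimal sketch of the idea, using entirely synthetic data: a shallow `DecisionTreeClassifier` is fitted to breach flags and its rules are printed as text that can be read by non-technical users. The feature names `announcement` and `vol_ratio`, and the rule used to generate the breach labels, are assumptions made up for this illustration and are not taken from the deck.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n_days = 500

# Hypothetical attributes: an announcement-day flag (pre-processed from dates)
# and a transformed risk factor such as a ratio of two volatilities
announcement = rng.integers(0, 2, size=n_days)
vol_ratio = rng.uniform(0.5, 1.5, size=n_days)

# Synthetic breach labels: breaches happen on announcement days with high vol_ratio
breach = ((announcement == 1) & (vol_ratio > 1.2)).astype(int)

X = np.column_stack([announcement, vol_ratio])
# A shallow tree keeps the explanation short enough to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, breach)

rules = export_text(tree, feature_names=["announcement", "vol_ratio"])
print(rules)
```

The tree is kept at depth 2 because the goal here is explanation rather than prediction; a deeper tree would fit better but be harder to read.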

Using Regression to Identify Positions In
Portfolio
● Regress the current portfolio PnL by scenario
against instrument PnL by scenario
● Use Lasso/Elastic Net Regression to let the
model calculate the positions
● Very useful information to effectively reduce
disparate portfolios to a comprehensible
number of positions
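
The regression above can be sketched with scikit-learn's `Lasso` on synthetic scenario data. Everything in the example is hypothetical: the scenario PnLs are random, the "true" positions are made up, and `alpha=0.05` is an arbitrary choice that would need tuning in practice. The L1 penalty drives the coefficients of unused instruments to zero, which is what reduces the portfolio to a small number of positions, at the cost of shrinking the non-zero positions slightly.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_scenarios, n_instruments = 250, 6

# Hypothetical PnL of one unit of each instrument under each scenario
instrument_pnl = rng.normal(0.0, 1.0, size=(n_scenarios, n_instruments))

# Made-up portfolio: long 2 units of instrument 0, short 1 unit of instrument 3
true_positions = np.array([2.0, 0.0, 0.0, -1.0, 0.0, 0.0])
portfolio_pnl = instrument_pnl @ true_positions + rng.normal(0.0, 0.01, n_scenarios)

# No intercept: zero instrument PnL should give zero portfolio PnL
model = Lasso(alpha=0.05, fit_intercept=False).fit(instrument_pnl, portfolio_pnl)
estimated_positions = model.coef_
print(np.round(estimated_positions, 2))
```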

Stress Testing
● Specification is usually on a small subset of risk
factors. E.g. Indu changes by -5%
● Objective is to fill out the other risk factors in a
manner that is consistent and coherent
Issues:
● Correlation a lot different under extreme moves
● Applying changes can result in impossible risk
factor levels
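
One simple way to fill out the other risk factors, shown purely as an assumed illustration and not as the deck's method, is to scale the specified shock by each factor's least-squares beta to the shocked factor. The function name `fill_stress_scenario` and the toy returns are hypothetical. This linear fill is subject to the first issue listed above, since betas estimated on ordinary days may not hold under extreme moves; the final check addresses the second issue by flagging any move below -100%, which would imply a negative index level.

```python
import numpy as np

def fill_stress_scenario(returns, shocked_col, shock):
    """Scale a shock on one factor to all factors using least-squares betas.

    returns: 2D array of historical returns, one column per risk factor.
    shocked_col: column index of the factor whose move is specified.
    shock: specified return of that factor, e.g. -0.05 for -5%.
    """
    returns = np.asarray(returns, dtype=float)
    x = returns[:, shocked_col]
    # Regression through the origin of every factor on the shocked factor;
    # the shocked factor's beta to itself is exactly 1
    betas = returns.T @ x / (x @ x)
    return betas * shock

# Toy history: the second factor always moves half as much as the first
first = np.array([0.01, -0.02, 0.015, -0.005])
history = np.column_stack([first, 0.5 * first])

moves = fill_stress_scenario(history, shocked_col=0, shock=-0.05)

# A return below -100% would give an impossible (negative) factor level
is_feasible = bool(np.all(moves > -1.0))
print(moves, is_feasible)
```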
