OpLossModels_A2015
- 2. Copyright © 2015 SAS Institute Inc.
Types of Events Contributing to Operational Losses
• Seven categories defined by Basel-II
Basel II Event Types and Business Cases
Internal Fraud: misappropriation of assets, tax evasion, intentional mismarking of positions, bribery
External Fraud: theft of information, hacking damage, third-party theft and forgery
Employment Practices and Workplace Safety: discrimination, workers compensation, employee health and safety
Clients, Products, and Business Practice: market manipulation, antitrust, improper trade, product defects, fiduciary breaches, account churning
Damage to Physical Assets: natural disasters, terrorism, vandalism
Business Disruption and Systems Failures: utility disruptions, software failures, hardware failures
Execution, Delivery, and Process Management: data entry errors, accounting errors, failed mandatory reporting, negligent loss of client assets
Source: en.wikipedia.org/wiki/Operational_risk
- 3. Operational Risk Requirements
• Based on Basel supervisory guidelines, a full internal operational risk management and measurement program
should include:
Internal Loss Data
– Operational loss events that occurred within the BHC, with a minimum of 5 years of data required
External Loss Data
– Operational loss events at peer banks relevant to the BHC’s risk profile, such as ABA Consortium data or SAS
OpRisk Global Data
Scenario Analysis
– Assessment of exposure to plausible, low-likelihood / high-impact operational risk events, e.g. litigation
Business Environment and Internal Control Factors (BEICF)
– Risk identification, assessment, and monitoring results that serve as the basis for a forward-looking qualitative
adjustment to modeled capital requirements
- 4. Approaches to Calculating Operational Risk Capital
• Three broad approaches based on Basel-II guidance
Basic indicator approach
– A single indicator as a fixed percentage of BHC’s annual gross income
– Applicable to all banks regardless of complexity
Standardized approach
– More granular, capturing the diversity of business activities with one factor for each LOB
– Applicable to banks meeting certain criteria and standards
Advanced measurement approaches
– Most sophisticated methods based on internal loss data (ILD)
– Applicable to banks meeting higher standards subject to regulatory approvals
– Each BHC can pick a specific modeling approach, including loss distribution approach (LDA), scenario
approach (SA), or structural modeling approach, appropriate for its own risk profile
- 5. Regression-Based Method for Stress Testing
• Developed specifically for stress testing, e.g. CCAR, this method is conceptually derived from the loss
distribution approach and is designed to link directly to macro-economic scenarios through the regression
specification, via the following steps:
Segment loss events into categories defined by the Unit of Measure (UoM)
– UoM is defined to represent unique business and risk combinations with similar risk drivers, e.g. the
combination of Basel-II event types and LOBs
– The segmentation scheme is tested statistically to ensure the event homogeneity within each UoM
Aggregate the “Total Frequency”, denoted F, and the “Average Severity”, denoted S, of all loss events within each
UoM by time stamp, e.g. quarter or month
Develop regression models of F and S separately for each UoM such that
F_t = G(X_t) and S_t = H(X_t), where X_t is the economic driver and G and H are the frequency and severity regression functions.
Calculate the loss of each UoM by combining both F and S such that
Loss_t = F_t × S_t = G(X_t) × H(X_t)
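The final combination step can be sketched numerically; the arrays below are hypothetical fitted predictions for a single UoM over four quarters, not real model output:

```python
import numpy as np

# Hypothetical fitted predictions for one UoM over four quarters
freq_hat = np.array([12.0, 15.0, 18.0, 22.0])   # predicted event counts F_t
sev_hat = np.array([50e3, 52e3, 60e3, 75e3])    # predicted average severity S_t

# Final step: projected loss per quarter, Loss_t = F_t * S_t
loss_hat = freq_hat * sev_hat
total_loss = loss_hat.sum()
```

The per-quarter product keeps the frequency and severity models cleanly separated until the very last step, which is what makes their regressions independently testable.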
- 6. UoM Definitions
• The definition of UoM should be granular enough to capture the diversity of business activities and to ensure
within-UoM homogeneity of loss events:
Start the UoM design with Basel event types; consider a further split by LOB if the segmentation is supported by
business requirements, data maturity, measurement stability, and statistical evidence
Since each event type represents unique risk characteristics, avoid grouping losses from different event types into
the same UoM
An illustrative UoM grid (Basel II event types by line of business):

Basel II Event Types                        | Consumer | Commercial | Investment | Others
Internal Fraud                              | UoM 1    | …          | …          | …
External Fraud                              | …        | …          | …          | …
Employment Practices and Workplace Safety   | …        | …          | …          | …
Clients, Products, and Business Practice    | …        | …          | …          | …
Damage to Physical Assets                   | …        | …          | …          | …
Business Disruption and Systems Failures    | …        | …          | …          | …
Execution, Delivery, and Process Management | …        | …          | …          | UoM n
- 7. Testing the Validity of UoM
• How do we evaluate the distributional homogeneity / heterogeneity of losses among various UoMs?
This calls for statistical tests more general than those that compare only location, e.g. means, or dispersion,
e.g. variances
Pairwise (K = 2) Testing
– Kolmogorov-Smirnov (K-S) test
proc npar1way data = your_data edf; /* EDF option requests the K-S test */
class UOM;
var LOSS;
run;
Groupwise (K > 2) Testing (currently no off-the-shelf solution among SAS procedures)
– Empirical Coverages test (Mielke and Yao, 1988, 1990; Mielke and Berry, 2001)
library(Blossom)
ctest <- coverage(df$LOSS, df$UOM)
summary(ctest)
– Anderson-Darling (AD) test (Scholz and Stephens, 1987) implemented by library(kSamples)
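As an illustrative alternative outside SAS and R, both flavors of test are available in Python's scipy: `ks_2samp` for the pairwise case and `anderson_ksamp` for K > 2 groups. The simulated UoM samples below are made up for demonstration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical loss samples: UoM A and B share a distribution, C differs
uom_a = rng.lognormal(mean=10.0, sigma=1.0, size=300)
uom_b = rng.lognormal(mean=10.0, sigma=1.0, size=300)
uom_c = rng.lognormal(mean=11.0, sigma=1.0, size=300)

# Pairwise (K = 2): two-sample Kolmogorov-Smirnov test
ks = stats.ks_2samp(uom_a, uom_b)

# Groupwise (K > 2): k-sample Anderson-Darling test
ad = stats.anderson_ksamp([uom_a, uom_b, uom_c])
```

With the shifted third sample, the Anderson-Darling statistic far exceeds its largest critical value, flagging the heterogeneity that would argue against merging these segments into one UoM.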
- 8. Assessment for Modeling Approaches
Principles
Realistic: distributional assumptions must reflect the nature of specific operational risk measures
Well-Specified: the model specification must be conceptually sound and supported by industry best practices
Flexible: the framework must be generalizable to accommodate the complexity in the data
Simple: the development must be well supported by off-the-shelf computing solutions
Practical Considerations
The maturity of frequency and severity measures in operational risk
A broad scope, e.g. 1,000+, of macro-economic indicators as potential predictors
- 9. Modeling Frequency
• The frequency of operational losses measures how many times the operational loss events of a specific UoM
occurred within a certain period, e.g. quarterly.
• Various modeling approaches are employed in the industry based on different distributional assumptions
Most popular approach in the industry – Lognormal regression formulated as below:
Log(Max(Freq_t, 1)) = β * X_t + ε_t, where ε_t ~ Normal(0, σ²)
– Straightforward development with prevailing techniques adopted from simple OLS linear regression
– However, it rests on a problematic assumption: discrete count data modeled with a continuous distribution
Regressions specifically designed to model frequency outcomes
– Poisson regression is the prevailing industry practice
– Negative Binomial or Quasi-Poisson regression, with more liberal assumptions, is applicable to real-world data
– More advanced regression approach accounting for serial correlations characterized in the time series of
frequency outcomes
- 10. Frequency Modeling – Standard Approach
• Poisson regression is a standard way to model frequency measures of the operational risk and is also in line with the
frequency distributional assumption in the well-accepted LDA.
• The model is formulated as
μ_t = Exp(β * X_t), where X_t refers to macroeconomic factor(s)
• Poisson regression can be easily implemented with various SAS procedures
Convenient computing interface with COUNTREG / GENMOD / GLIMMIX procedures
– GENMOD / GLIMMIX are generic routines for Generalized Linear Models (GLM) for the exponential family of
distributions, including Poisson and many others.
– COUNTREG is specifically designed to model frequency outcomes with advanced specification supports
Estimation by specifying the likelihood function with NLMIXED / NLIN / MODEL procedures
– Full user control of the model specification is provided through the log-likelihood function:
proc nlmixed data = your_data;
mu = exp(b0 + b1 * X);
ll = -mu + Y * log(mu) - log(fact(Y));
model Y ~ general(ll);
run;
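Outside SAS, the same likelihood can be maximized by hand; below is a minimal sketch of Poisson regression via iteratively reweighted least squares (IRLS) in plain NumPy, on simulated data with made-up coefficients:

```python
import numpy as np

def fit_poisson(x, y, n_iter=25):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares."""
    X = np.column_stack([np.ones(len(y)), x])   # design matrix with intercept
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                   # fitted means
        z = X @ beta + (y - mu) / mu            # working response
        W = mu                                  # Poisson working weights
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

rng = np.random.default_rng(0)
x = rng.normal(size=500)                        # macro driver stand-in
y = rng.poisson(np.exp(0.5 + 0.8 * x))          # simulated counts
b0, b1 = fit_poisson(x, y)                      # estimates near (0.5, 0.8)
```

IRLS is exactly what GENMOD performs internally for a Poisson GLM, so this sketch recovers the same coefficients a SAS fit would.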
- 11. Frequency Modeling – Flexible Extensions
• While Poisson regression works well in most situations, Negative Binomial (NB) or Quasi-Poisson regression may
be a good alternative with more liberal assumptions.
• Negative Binomial, e.g. NB2, regression assumes VAR(Y_t|X_t) = μ_t + K * μ_t² and is applicable to over-
dispersion, i.e. the variance greater than the mean.
• Quasi-Poisson regression assumes VAR(Y_t|X_t) = θ * μ_t and is flexible in both over-dispersion (θ > 1) and
under-dispersion (θ < 1) cases.
Can be implemented in GLIMMIX procedure by specifying the relation between the variance and the mean.
proc glimmix data = your_data;
model Y = X / link = log;
_variance_ = _mu_;
random _residual_;
run;
R provides an intuitive interface with glm() function
glm(Y ~ X, data = DF, family = quasipoisson())
However, in both cases, likelihood-based variable selection is unavailable due to the lack of a full likelihood function.
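The dispersion parameter θ itself is straightforward to estimate as the Pearson chi-square statistic over its degrees of freedom; a quick simulated check (the mean and dispersion values are made up, and μ is treated as known, so we divide by n):

```python
import numpy as np

rng = np.random.default_rng(1)
# Over-dispersed counts: Negative Binomial with mean 5 and variance 10,
# so the true quasi-Poisson dispersion is theta = 10 / 5 = 2
mu, size, n = 5.0, 5.0, 2000
y = rng.negative_binomial(size, size / (size + mu), n)

# Pearson chi-square over degrees of freedom (mu known here, so divide by n)
theta_hat = np.sum((y - mu) ** 2 / mu) / n
```

A θ̂ well above 1, as here, is the empirical signal that plain Poisson standard errors would be too optimistic.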
- 12. Frequency Modeling – Advanced Approach
• For time series of frequency outcomes with strong dependence among observations, the serial correlation can be
handled by a more advanced approach incorporating past information through a dynamic feedback mechanism.
• Assuming Y_t ~ Poisson(λ_t), the general form of this frequency time series model can be expressed as
g(λ_t) = Σ_i α_i × g’(Y_t-i) + Σ_j β_j × g(λ_t-j) + η × X_t,
where g’(.) is a transformation, g(.) is a link function, and X_t is the macro-economic driver.
• In the context of a log-linear model with a low-order setting, the above formulation can be further specified as
ν_t = α × Log(Y_t-1 + 1) + β × ν_t-1 + η × X_t, where ν_t = Log(λ_t)
• While there is no SAS procedure readily implementing the above model, it can be estimated by specifying the
likelihood function with MODEL procedure.
In R, tsglm() in library(tscount) provides a very simple implementation routine
tsglm(Y, xreg = X, distr = "pois", link = "log", model = list(past_obs = 1, past_mean = 1))
In our prototype, this class of time series models significantly improves GOF and model diagnostics, at the cost
of a more complex prediction calculation due to the feedback mechanism.
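A short simulation makes the feedback mechanism concrete; the coefficients below are arbitrary illustrative values, with α + β < 1 to keep the process stable:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, eta = 0.3, 0.4, 0.5      # illustrative coefficients (alpha + beta < 1)
T = 200
x = rng.normal(size=T)                # macro-economic driver stand-in

nu = np.zeros(T)                      # nu_t = log(lambda_t)
y = np.zeros(T, dtype=int)            # observed counts
y[0] = rng.poisson(np.exp(nu[0]))
for t in range(1, T):
    # Feedback: last observation and last log-intensity both enter
    nu[t] = alpha * np.log(y[t - 1] + 1) + beta * nu[t - 1] + eta * x[t]
    y[t] = rng.poisson(np.exp(nu[t]))
```

This recursion is why prediction is more involved than in a static regression: every forecast step needs the previous count (or its predictive distribution) to update the intensity.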
- 13. Frequency Modeling – Other Considerations
• Real-world data is often messy and violates model assumptions, e.g. equidispersion. However, we should start with
something simple, e.g. Poisson regression.
Even when the Poisson model fails due to a violation of equidispersion, we can fall back to the Quasi-Poisson model,
which gives identical coefficient estimates, without having to abandon post-model analyses such as back-testing or
sensitivity analysis.
• In some special cases, we might consider composite models reflecting additional data complexity
For UoMs with rare loss events, two additional modeling approaches can capture point mass at ZERO
– Zero-Inflated Poisson (ZIP) model assumes a mixture distribution between a point mass at ZERO and a
standard Poisson process
– Hurdle model assumes the combination of two separate distributions: a point mass covering all ZEROs and a
Poisson truncated at ZERO
For UoMs with regime switches due to changes in business practices, low-order finite mixture models can be
considered, given proper data support.
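The ZIP point mass at zero can be verified with a quick simulation: with mixing weight π, the overall zero probability is π + (1 − π)e^(−λ). The parameter values here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
pi, lam, n = 0.3, 2.0, 100_000        # illustrative mixture parameters

structural_zero = rng.random(n) < pi  # point-mass component
y = np.where(structural_zero, 0, rng.poisson(lam, n))

observed_zero = (y == 0).mean()
expected_zero = pi + (1 - pi) * np.exp(-lam)   # ZIP zero probability
```

Comparing the observed zero fraction against a plain Poisson's e^(−λ) is also a fast diagnostic for whether zero inflation is present at all.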
- 14. Severity Modeling – Choosing the Right Distribution
• Unlike frequency modeling, severity modeling practices are rather diversified, primarily due to the lack of
consensus on severity distributions.
• While the most popular method is still OLS regression based on the Gaussian distribution, more advanced
approaches under alternative distributional assumptions have also become technically feasible with the availability
of cutting-edge computation routines in SAS.
• SEVERITY procedure in SAS/ETS provides the most comprehensive and powerful computing interface for the
severity modeling.
Providing multiple likelihood-based, e.g. AIC, or EDF-based, e.g. the K-S statistic, measures to test the Goodness-of-
Fit of multiple distributions for the severity data, including but not limited to Pareto, Lognormal, Inverse-
Gaussian, Weibull, Burr, and so on.
– R library(fitdistrplus) provides similar functions.
Estimating multiple Severity models under various distributional assumptions simultaneously with a simple output
for the model comparison
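As a rough analogue of this workflow outside SAS, scipy can fit several candidate distributions and compare them by AIC (the candidate set and simulated data below are illustrative; each fit fixes the location at zero so all candidates carry the same parameter count):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
data = rng.lognormal(mean=10.0, sigma=1.0, size=2000)    # simulated severities

candidates = {"lognormal": stats.lognorm,
              "gamma": stats.gamma,
              "weibull": stats.weibull_min}
aic = {}
for name, dist in candidates.items():
    params = dist.fit(data, floc=0)                      # location fixed at 0
    loglik = np.sum(dist.logpdf(data, *params))
    aic[name] = 2 * len(params) - 2 * loglik             # smaller is better

best = min(aic, key=aic.get)
```

Since the simulated generator is lognormal, the lognormal candidate should win the AIC comparison, mirroring how PROC SEVERITY ranks its fitted distributions.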
- 15. Severity Modeling – Gamma or Lognormal
• In addition to the Normal distribution, Gamma and Lognormal are the most widely-used distributions to model
Severity measures in operational risk within the GLM framework:
Non-negative and positively skewed with a heavy tail, e.g. the variance proportional to the mean squared
– While Lognormal ensures non-negativity through the Log transformation, Gamma achieves the same through the
Log link function.
[Figure: histogram of severity data with fitted Lognormal and Gamma densities (left); empirical vs. theoretical CDFs for the two fits (right)]
- 16. Severity Modeling – Lognormal
• Under the assumption that the severity follows a Lognormal distribution, Lognormal regression can be formulated as
Log(Severity_t) = β * X_t + ε_t, where X_t refers to macroeconomic factor(s)
E(Severity_t) = Exp(β * X_t + σ² / 2), where σ² / 2 is the “Volatility Adjustment”
• Advantages of Lognormal regression:
Fits long-tailed data well by smoothing out the volatility and stabilizing the variance
Easy model development and post-model diagnostics, with techniques directly ported from OLS regression
Well supported by various statistical software, with off-the-shelf solutions for variable selection
• Disadvantages of Lognormal regression:
Under-prediction without the Volatility Adjustment
For model back-testing, predictions need to be converted from the log scale back to the original scale
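The under-prediction can be verified numerically: for ε ~ Normal(0, σ²), E[Exp(ε)] = Exp(σ²/2) rather than 1, so naively exponentiating log-scale predictions biases the mean downward. A quick check with an arbitrary σ:

```python
import numpy as np

rng = np.random.default_rng(5)
sigma = 0.8                             # arbitrary residual volatility
eps = rng.normal(0.0, sigma, 1_000_000)

naive_mean = np.exp(eps).mean()         # ~ exp(sigma**2 / 2), not 1
adjustment = np.exp(sigma ** 2 / 2)     # the Volatility Adjustment factor
```

Multiplying back-transformed predictions by this factor restores an unbiased mean on the original scale.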
- 17. Severity Modeling – Gamma
• With Gamma regression, severity measures can be modeled directly on the original scale, without the Log
transformation, such that
E(Severity_t) = Exp(β * X_t), where X_t refers to macroeconomic factor(s)
• Advantages of Gamma regression:
When the data is not extremely volatile, Gamma regression usually performs better than Lognormal regression
No adjustment factor is needed to correct for estimation bias.
The original scale of predictions is preserved without any transformation.
• Disadvantages of Gamma regression:
Model estimation was difficult until fairly recently
Variable selection for Gamma regression is not available in any SAS procedure
– In SAS, variable selection for Gamma regression can be facilitated by Lognormal regression
– In R, multiple off-the-shelf routines, e.g. stepAIC(), are readily available for variable selection.
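For illustration, Gamma regression with a log link can also be fitted in a few lines of NumPy via IRLS; under the log link the Gamma working weights are constant, so each step reduces to an ordinary least-squares solve. The data and coefficients below are simulated:

```python
import numpy as np

def fit_gamma_log(x, y, n_iter=25):
    """Fit E(y) = exp(b0 + b1 * x) for Gamma outcomes by IRLS."""
    X = np.column_stack([np.ones(len(y)), x])
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                  # intercept-only warm start
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu            # working response (weights = 1)
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

rng = np.random.default_rng(6)
x = rng.normal(size=1000)
mu = np.exp(1.0 + 0.5 * x)                      # true severity mean
y = rng.gamma(shape=4.0, scale=mu / 4.0)        # Gamma with mean mu
b0, b1 = fit_gamma_log(x, y)
```

Note that the estimates come out on the original scale directly, with no volatility adjustment needed, matching the advantage listed above.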
- 18. How to Justify a Model?
• Model Quality
Likelihood-based measures such as AIC / BIC / Deviance
GOF-type tests such as Vuong Test or Clarke Test
• Model Validity
Post-model diagnostics for multicollinearity, independence, heteroscedasticity, or normality (if applicable)
– VIF, Ljung-Box, Breusch-Pagan, Jarque-Bera, etc.
• Model Predictability
Scale-dependent: MAE / RMSE, appropriate for comparing predictions across models on the same scale
Scale-independent: MAPE / AMAPE, appropriate for comparing predictions across models on different scales
• Model Stability
Re-estimate model coefficients after dropping certain data points, e.g. the last n or first n
– Coefficients should remain significant and stay within the confidence intervals of the original coefficients
– The refitted model should remain valid without sacrificing predictability.
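For reference, the accuracy measures above take one line each; the sample actual/predicted values are made up:

```python
import numpy as np

def mae(actual, pred):
    return np.mean(np.abs(actual - pred))           # scale-dependent

def rmse(actual, pred):
    return np.sqrt(np.mean((actual - pred) ** 2))   # scale-dependent

def mape(actual, pred):
    # Scale-independent; undefined when actual contains zeros
    return np.mean(np.abs((actual - pred) / actual))

actual = np.array([100.0, 200.0, 400.0])            # made-up back-test data
pred = np.array([110.0, 190.0, 380.0])
```

MAPE's division by the actuals is what makes it usable across UoMs with very different loss scales, at the cost of being unstable near zero.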