Bayesian Methods for Modern Statistical Analysis: A Bayesian Approach

Bayesian Methods for
Modern Statistical Analysis
Milovan Krnjajić
School of Mathematics, Statistics & Applied Mathematics
National University of Ireland, Galway
Whitaker Institute for Innovation and Societal Change
26-March-2013

Contents
1 Classical approach to statistical analysis
2 Features of Bayesian approach
3 Bayesian MCMC computation engine
4 An application of Bayesian modelling
5 Bayesian hierarchical models
6 A model for analysis of ﬁnancial risk

Statistics and uncertainty
Statistics studies uncertainty (quantify, interpret and present, use)
Uncertainty is unavoidable in everyday life, science, economy → importance
of statistics
Information is uncertain or incomplete, causes are unknown, events are random: growth or fall
of a company stock or an index; financial risk in a company merger; portfolio risk;
size of an insurance premium; dynamics of electric power consumption; direction
and intensity of the spread of an epidemic; demand for products and how
new product lines would change it.
Statistical analysis: quantification of uncertainty in order to learn (gain
insight in) a problem (phenomenon) of interest; this learning contributes to
decreasing of the uncertainty
Milovan Krnjajić (NUIG) 1 / 60

Statistics and uncertainty
Probability Theory as a formal apparatus for systematic uncertainty
quantiﬁcation
Statistical analysis comprises:
Gathering data samples and describing data properties
Specifying a model (a formal specification of unknown parameters) and
combining it with the data in order to derive inferences about the
parameters
Inference about the nature of the data generating process (answers
questions about the causes)
Generalizing sample properties to statements about the population
Making predictions (with estimates of uncertainty bands)
Decision making based on the inference and prediction(s) regarding the
available actions, with the goal of choosing the optimal ones.

Classical statistical analysis
Also called "frequentist" since it adopts the frequentist interpretation
of probability (as a relative frequency of event occurrences in a long
sequence of repeated observations).
Model parameters are unknown ﬁxed quantities (constants).
Probabilistic statements about unknown parameters make no sense
(parameters are not repeatable in any way).
Randomness in the stats model is assumed for the data set generated
by the sampling process, which in turn may be (imagined to be)
repeated indeﬁnitely.
Data has a sampling probability distribution p(y|θ). Data exhibit
random variability, which refers to aleatory uncertainty.
Parameters are not random, but are unknown (uncertain) as a result
of a lack of information (knowledge), which is epistemic uncertainty
(not expressible by frequentist probability)

Classical statisticians: K. Pearson, R. Fisher, J. Neyman
Karl Pearson (1857 – 1936 ) Ronald Fisher (1890 – 1961) Jerzy Neyman (1894 – 1981 )

Classical statistics: Inference
Neyman-Pearson hypothesis testing (HT)
HT error probabilities, power of test, most powerful tests, unbiased
and eﬃcient tests.
Fisher’s maximum likelihood based point estimation (ML)
P-values, Likelihood ratio tests
Inferential results depend on the Central Limit Theorem and large
sample sizes
Frequentist interpretation of probability makes repeatability a central
concept in model development and statistical inference. In (many)
cases where there is no repeatability at all it needs to be imagined in
order to interpret results probabilistically.

Problem: Interpretation of confidence interval (CI)
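Assume $X_i \sim N(\mu, \sigma^2)$, with sample mean $\bar{X} = (X_1 + X_2 + \dots + X_n)/n$.
Then $\bar{X} \sim N(\mu, \sigma^2/n)$ so that
$P(\mu - 1.96\,\sigma/\sqrt{n} < \bar{X} < \mu + 1.96\,\sigma/\sqrt{n}) = 0.95$
Therefore
$P(\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}) = 0.95$
95% confidence interval (CI) is $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$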
The above probability statements hold only before the sample is taken.
Once we know that $\bar{X} = \bar{x}$, it makes no sense to compare probabilistically
two constants, the unknown (yet constant) µ and $\bar{x}$.
Is P(µ ∈ 95% CI) = 0.95? The answer is NO!
As an unknown but fixed quantity, the true µ either is or is not in the CI. If we
imagined repeated sampling, then µ would be in about 95% of the CIs.

Problem: Interpretation of conﬁdence interval (CI)
Figure: Is P(µ ∈ 95%CI) = 0.95? The answer is NO! Being an unknown but ﬁxed quantity, the true mean either is or is
not in the CI. However, if we imagine repeated sampling, then µ would be in the CI about 95% of time.
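The repeated-sampling reading of the CI can be checked by simulation. The sketch below is only an illustration: the values of mu, sigma, n and the number of repetitions are assumptions chosen for the example, and sigma is treated as known, as in the formula above.

```python
import math
import random


def ci_coverage(mu=10.0, sigma=2.0, n=25, n_repeats=10_000, seed=1):
    """Fraction of repeated samples whose 95% CI contains the true mean mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)  # sigma is treated as known
    hits = 0
    for _ in range(n_repeats):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / n_repeats


coverage = ci_coverage()
print(f"empirical coverage over repeated samples: {coverage:.3f}")
```

The 95% figure describes the long-run behaviour of the procedure across samples; any single computed interval either contains µ or it does not.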

Problem: Hypothesis testing, p-values
Figure: H0 vs. H1, α ﬁxed in advance, β depends on the sample size

Problem: Hypothesis testing, p-values
Hypothesis testing rests on a decision procedure not on a probability
statement.
Testing procedure indirect and convoluted.
Rejecting H0 at level α = 5% does not mean that there is only 5%
probability that H0 is correct.
Rejecting H0 at level α = 5% does not mean that only 5% of data
(collected repeatedly) would come from H0
P-value is not the probability that the H0 is correct.
P-value = the probability of observing data points such as the one
observed or more extreme, assuming that the H0 is correct. What extreme
means depends on the formulation of the null and alternative hypotheses

Violation of the Likelihood Principle
Likelihood Principle (LP): Inference about θ should depend on the
sample but not on the many samples which might have been obtained.
Freq. based analysis violates the LP in hypothesis testing and in
interpretation of the confidence intervals.
Example: Sample of size n with k successes, where P( success ) = θ
is unknown.
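(a) fixed n, sample exactly n points; the number of successes K is random:
$P(K = k) = \binom{n}{k}\, \theta^k (1 - \theta)^{n-k}$, hence the unbiased $\hat{\theta} = k/n$.
(b) fixed k, sample as many points as necessary to obtain k successes; the sample size N is random:
$P(N = n) = \binom{n-1}{k-1}\, \theta^k (1 - \theta)^{n-k}$, but here the unbiased $\hat{\theta} = (k - 1)/(n - 1)$.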
The frequentist analysis distinguishes two different sampling procedures (hence two
different distributions and estimators).
Yet the type of sampling procedure does not convey any
information about θ.
Problem: what if neither n nor k were fixed?
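A small numerical check of this example, as a sketch: the values n = 12 and k = 3 are hypothetical. The two sampling schemes give likelihoods that differ only by a factor that does not depend on θ, so they carry the same information about θ, while the frequentist unbiased estimators differ.

```python
from math import comb


def binomial_lik(theta, n, k):
    """Scheme (a): n fixed, number of successes random."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def neg_binomial_lik(theta, n, k):
    """Scheme (b): k fixed, number of trials random."""
    return comb(n - 1, k - 1) * theta**k * (1 - theta) ** (n - k)


n, k = 12, 3  # hypothetical data: 3 successes in 12 trials
thetas = [0.1, 0.25, 0.5, 0.75]
ratios = [binomial_lik(t, n, k) / neg_binomial_lik(t, n, k) for t in thetas]
print(ratios)  # the same constant for every theta

est_binomial = k / n                    # unbiased under scheme (a)
est_neg_binomial = (k - 1) / (n - 1)    # unbiased under scheme (b)
print(est_binomial, est_neg_binomial)
```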

Classical analysis: a summary of problems
Freq. interp. of probability imposes an assumption of repeated
estimation procedures to derive prob. statements about the estimates.
These are not direct prob. statements about the probable values of
just obtained estimate (e.g. a CI for mean or variance), but ”proxy”
statements based on imagined sampling which is never done.
Hypothesis testing based on a convoluted decision procedure which
considers the scenario of indefinite repetitions of sampling
CI-s, p-values and results of hypotheses testing often misinterpreted
by non-specialists.
Likelihood principle often violated or not observed.
Conceptual difficulty in dealing with inherently non-repeatable events
and data sets which are not random samples: cross-national data in
economics or political science using national data banks of the OECD
global repository; analysis of the main factors leading to civil wars in
the 20th century (say) based on a comprehensive data account. What is
the meaning of "statistical significance" here?

Classical analysis: Hallmarks
Interpretation of probability as long term relative frequency
Model parameters unknown ﬁxed quantities whereas a data sample is
considered random
Inference based on the assumption that the sampling procedure is
repeated indeﬁnitely
Distributional properties of estimators based on asymptotics (large n)
and an appeal to CLT (assumptions of normality)
Taught to undergraduates at an introductory level
Wide variety of powerful (industrial grade) software packages such as
SAS, SPSS, STATA, SPLUS, Minitab

Bayesians: T. Bayes, P.S. Laplace, B. de Finetti
Thomas Bayes (1701 – 1761 ) P.S. Laplace (1749 – 1827) B. de Finetti (1906 – 1985 )

Thomas Bayes (c. 1701–1761)
T. Bayes (1763). An essay towards solving a problem in the doctrine of
chances. Phil. Trans. Roy. Soc. 53, 370-418.
Interested in causal relationships; Observing a phenomenon we ask
about the cause(s), for example observing symptoms we want to
diagnose the disease, that is, find the causes.
Bayes was the first to consider causal connections in terms of
conditional (inverse) probability
Conditional probability: P(A | B) = P(A and B)/P(B), where A and B
are true or false statements
P(A = effect | B = cause) is easier to consider than
P(B = cause | A = effect) (for example, A = symptom, B = illness)
Inversion of conditioning:
Bayes theorem: $P(B \mid A) = \dfrac{P(B)\, P(A \mid B)}{P(A)}$

Conditional probability: An example
Problem: A random sample of 1,000 persons selected for a medical
test to identify those having illness XYZ
Test characteristics:
Pr( Posit. | Ill ) = 90%
Pr( Negat. | Healthy ) = 90%
Pr( Illness ) = 1%
Question: Pr( Ill | Posit. ) = ?
Pr( Illness ) = 1%, so of the 1,000 persons 10 are ill and 990 are healthy:
          Ill   Healthy   Total
Posit.      9        99     108
Negat.      1       891     892
Total      10       990   1,000
Answer: Pr(Ill | Posit.) = 9/(9 + 99) = 8.3%

Conditional probability: Bayes theorem
The medical test example:
With B = ill, Z = healthy and + = positive test:
$P(B \mid +) = \dfrac{P(+ \mid B)P(B)}{P(+ \mid B)P(B) + P(+ \mid Z)P(Z)} = \dfrac{(0.9)(0.01)}{(0.9)(0.01) + (0.1)(0.99)} = 8.3\%$
Meaning of the theorem:
$P(\text{unknown} \mid \text{data}) = \dfrac{P(\text{unknown})\, P(\text{data} \mid \text{unknown})}{P(\text{data})}$
Theorem holds for real-valued quantities, where unknown = θ, and known
data are in the sample y = (y1, y2, ..., yn):
$p(\theta \mid y) = \dfrac{p(\theta)\, p(y \mid \theta)}{p(y)}$,
where p(θ | y) and p(y | θ) are conditional probability densities and p(θ) and
p(y) are marginal densities for θ and y
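The medical test calculation can be reproduced in a few lines. This is a minimal sketch; the function name and argument names are illustrative choices, and the numbers are the ones given in the example (prevalence 1%, 90% of ill persons test positive, 90% of healthy persons test negative).

```python
def posterior_prob(prior, sensitivity, specificity):
    """P(ill | positive test) by Bayes theorem."""
    p_pos_given_ill = sensitivity
    p_pos_given_healthy = 1.0 - specificity
    # P(positive), by the law of total probability
    evidence = p_pos_given_ill * prior + p_pos_given_healthy * (1.0 - prior)
    return p_pos_given_ill * prior / evidence


p_ill_given_pos = posterior_prob(prior=0.01, sensitivity=0.90, specificity=0.90)
print(f"P(ill | positive) = {p_ill_given_pos:.3f}")
```

Changing the prior (the prevalence) while keeping the test characteristics fixed shows how strongly the answer depends on information outside the test result.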

Knowledge synthesis
Bayes theorem modiﬁes p(θ), the uncertainty about θ, using the info
in the sample, p(y | θ) = L(θ | y):
p(θ | y) ∝ p(θ) × L(θ | y),
p(θ) = prior; info. about θ outside the sample (external)
L(θ | y) = likelihood; info about θ within the sample (internal)
p(θ | y) = posterior; updated info about θ
On the log scale, ln p(θ | y) = ln p(θ) + ln L(θ | y) + const, so the Bayes
theorem formalizes synthesis of knowledge:
posterior (total info on θ) = prior (info outside sample) + likelihood (info within sample)
The theorem emphasizes the sequential nature of knowledge
acquisition, before and after obtaining the sample:

Bayesian analysis
Classical stats interprets the probability as a limiting frequency in a
series of experiments repeated many times (impossible to formulate
probability of inherently unrepeatable events).
In Bayesian statistics the probability is a degree of the subjective
belief (of a rational person) in the truth of a (true/false) statement.
This is subjective probability.
Science objective? It aspires to be so, and to avoid subjective
judgments, yet in every area of science there is a controversy and
diﬀerences of opinion over topics of current interest.
Every statistical model (Bayesian or otherwise) ever developed is
based on a series of subjective judgment calls and assumptions. In
Bayesian stats we make these explicit (as priors).
Hallmark of Bayesian approach: Uncertainty is directly related to
randomness such that any unknown or uncertain quantity is treated
as a random variable with the corresponding prior distribution.
In function L(θ) = p(y | θ) the parameter vector θ is unknown and is
consequently treated as a random variable.

Additional information, prior distribution
One of the hallmarks of the Bayesian approach, treating unknowns as
random, allows us to incorporate in the model the information that is
not present in the data but can be expressed probabilistically.
This is the prior distribution, which encodes information obtained
from sources other than the sample (can be specified before or after
the data sample is obtained).
This gives a potentially great advantage to Bayesian methods in
terms of the ability to integrate information in the data set along with
all other available information.
The subjective interpretation of probability plays a significant role in this.
This has been a source of controversy, disagreements and
misunderstandings.
There are some problems in this approach, but the benefits of being
able to use additional information in a consistent and systematic
manner by far outweigh any difficulties

Experts, impostors, and priors
Three experiments involving tea, music sheets, and cards, and three
experts, a tea taster, a music sheet reader, a drunk (based on an
example from L. Savage).
Experts claim extreme ability to recognize properties of objects
Assume 10 correct answers out of 10 attempts for each expert
Hypothesis H0: the expert correct no more than expected by chance
Hypothesis H1: the expert performs better than chance (based on a
special ability)
There is equally strong evidence against H0 in each case (the
same data), thus the conclusions are the same.
But should they be? A Bayesian analysis can include prior knowledge,
different in each case, affecting the final conclusions in a reasonable
way.

Prior distribution, tea taster
Figure: Tea taster: prior, likelihood and posterior densities (horizontal axis 0.3 to 1.0; vertical axis 0 to 12).

Prior distribution, music expert
Figure: Music expert: prior, likelihood and posterior densities (horizontal axis 0.3 to 1.0; vertical axis 0 to 12).

Prior distribution, drunk
Figure: Drunk: prior, likelihood and posterior densities (horizontal axis 0.3 to 1.0; vertical axis 0 to 12).

Expert posterior distributions
Figure: Posteriors for the tea taster, the music expert and the drunk (horizontal axis 0.3 to 1.0; vertical axis 0 to 12).
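The effect of the three priors can be sketched numerically. The slides do not state which priors were used, so the Beta parameters below are purely hypothetical: a mildly open prior for the tea taster, a prior favouring high ability for the music expert, and a prior concentrated near chance (0.5) for the drunk. The data are the same in each case: 10 correct answers out of 10.

```python
def beta_posterior(a, b, successes, failures):
    """Conjugate update of a Beta(a, b) prior with binary outcomes."""
    return a + successes, b + failures


# Hypothetical priors on theta = P(correct answer); not taken from the slides.
priors = {
    "tea taster": (2, 2),
    "music expert": (9, 1),
    "drunk": (50, 50),
}

post_means = {}
for name, (a, b) in priors.items():
    a_post, b_post = beta_posterior(a, b, successes=10, failures=0)
    post_means[name] = a_post / (a_post + b_post)
    print(f"{name:12s} posterior Beta({a_post}, {b_post}), mean {post_means[name]:.3f}")
```

With identical data the posterior for the drunk stays close to 0.5, while the other two move towards 1, which is the qualitative point of the example.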

Additional information, prior distribution
This example shows the value of incorporating additional information,
even if it may be subjective (in the sense of one’s making a judgment
call, but then, as rational beings, we do it often)
In Bayesian modelling approach we express the additional information
in terms of the probability distribution
The second message from this example is that different priors may
have quite different impact on the posterior
During the subjectivity controversy, Bayesians developed a class of
priors with minimal information on which anyone would agree, attempting to
formulate "objective ignorance" (weak, diffuse, non-informative,
objective priors).
Informative priors extremely useful; Elicitation, the process of
obtaining substantive info from the (application) experts and
encoding it in terms of prob. dist.
Studying the impact of priors on the posterior is a regular exercise
during development of a Bayesian model. This is sensitivity analysis
and it also includes comparison of alternative models

A conjugate Bayesian model
Sample y = (y1, y2, ..., yn), yi ∈ {0, 1}, where n = 300, nd = 50 loans
are delinquent (1) and (n − nd ) = 250 are not (0).
Goal: what is the probability of loan delinquency in the population
and how uncertain are we about the estimate.
Loan, yi: Bernoulli (binary) random variable (r.v.):
$y_i = \begin{cases} 1, & \text{with probability } \theta, \\ 0, & \text{with probability } 1 - \theta \end{cases}$ (1)
$P(y_i = b) = \theta^{b} (1 - \theta)^{1-b}$, b = 1 or b = 0
All r.v.-s are independent and identically distributed (IID)
The r.v. θ is the unknown parameter of the Bernoulli dist.
Goal: obtain the posterior information about θ

Beta-Bernoulli Conjugate Bayesian model
Bayesian model:
$(y_i \mid \theta) \overset{\text{IID}}{\sim} \text{Bernoulli}(\theta)$, i = 1, ..., n
$\theta \sim p(\theta)$
Prior prob. $p(\theta) = \text{Beta}(a, b) \propto \theta^{a-1}(1 - \theta)^{b-1}$
Likelihood = $p(\text{sample} \mid \theta) \propto \theta^{n_d}(1 - \theta)^{n - n_d}$
Posterior prob. p(θ | sample) ∝ prior prob. × likelihood
Post. dist. $p(\theta \mid \text{sample}) = \text{Beta}(a + n_d,\; b + n - n_d) = \text{Beta}(53, 270)$, for a = 3, b = 20
Posterior and prior are from the same family of dist.: the prior dist. is conjugate
with the likelihood
Post. mean of θ, E(θ | sample) = 0.164; SD(θ | sample) ≈ 0.021;
95% CI ≈ (0.164 ± 2 × 0.021) = (0.12, 0.21)
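The conjugate update for the loan data can be verified directly from the standard Beta mean and variance formulas. A minimal sketch using the values from the slides (a = 3, b = 20, n = 300, nd = 50); the variable names are illustrative.

```python
import math

a, b = 3, 20        # prior Beta(a, b)
n, n_d = 300, 50    # sample size and number of delinquent loans

# Conjugate update: add successes to a, failures to b.
a_post, b_post = a + n_d, b + (n - n_d)

post_mean = a_post / (a_post + b_post)
post_var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
post_sd = math.sqrt(post_var)

# Normal-approximation interval: mean plus or minus 2 SD
interval = (post_mean - 2 * post_sd, post_mean + 2 * post_sd)

print(f"posterior Beta({a_post}, {b_post})")
print(f"mean = {post_mean:.3f}, sd = {post_sd:.3f}")
print(f"approx. 95% interval = ({interval[0]:.2f}, {interval[1]:.2f})")
```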

Beta-Bernoulli Conjugate Bayesian model
Figure: a priori, likelihood and a posteriori probability densities of θ (θ from 0.0 to 0.4; probability density from 0 to 20).

Bayesian highest density interval
Figure: Beta(3, 9) density of θ with the highest density interval marked. 95% credible interval = (.060, .518); 95% highest density interval (HDI) = (.04, .48); P(θ ∈ 95% CI) = .95 = P(θ ∈ 95% HDI).

Bayesian Computation
Figure: Computing power increases exponentially; from about 1 MIPS for Motorola 68000 (1980) to 17 petaFLOPS (2012,
LLNL, Sequoia, IBM BlueGene PowerPC processors, 98,000 nodes). 1 peta = 10^15, so 1 petaFLOPS = 1 million billion floating point operations
per second. 1 Sequoia approx. 1 billion MC68000.

Monte Carlo (MC) sampling
In practice, p(θ | y) ∝ p(θ) × L(θ | y) is complex and dim(θ) is large
Problem: no closed form for integrals; numerical integration not feasible;
A solution: Can learn anything about a probability
distribution from a large sample.
If $x_i \sim p(x)$ then $\frac{1}{n}\sum_{i=1}^{n} g(x_i) \to E[g(X)] = \int_{\mathcal{X}} g(x)\, p(x)\, dx$
Instead of maths – simulation: generate independent samples from
the joint posterior distribution
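As an illustration of replacing integrals by sample averages, the sketch below draws from the Beta(53, 270) posterior of the loan example, where the exact mean is known, and estimates both the posterior mean and a tail probability. The threshold 0.20, the number of draws and the seed are arbitrary choices for the example.

```python
import random

rng = random.Random(42)
a_post, b_post = 53, 270     # posterior Beta(53, 270) from the loan example
n_draws = 50_000
draws = [rng.betavariate(a_post, b_post) for _ in range(n_draws)]

# g(theta) = theta estimates E[theta | y]
mc_mean = sum(draws) / n_draws
# g(theta) = indicator(theta > 0.20) estimates P(theta > 0.20 | y)
mc_tail = sum(d > 0.20 for d in draws) / n_draws

exact_mean = a_post / (a_post + b_post)
print(f"MC mean {mc_mean:.4f} vs exact {exact_mean:.4f}")
print(f"MC estimate of P(theta > 0.20 | y): {mc_tail:.3f}")
```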

Rejection method for direct MC sampling
Figure: Panels: samples of θ from the accept-reject procedure; histogram of θ samples from the accept-reject procedure.
Goal: generate samples from a p.d. with density f(θ) = −4θ log(θ), where θ ∈ (0, 1).
Rejection method: generate a θi from Unif(0,1) and accept it w.p. f(θi)/1.5. The accepted
samples are from the p.d. with density f(θ) = −4θ log(θ).
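The accept-reject procedure in the caption can be sketched as follows. The bound 1.5 works because the maximum of f is 4/e ≈ 1.47, attained at θ = 1/e. The sample size and seed are arbitrary; the checks use the analytical mean of this density, 4/9, and the expected acceptance rate, 1/1.5.

```python
import math
import random


def f(theta):
    """Target density on (0, 1): f(theta) = -4 * theta * log(theta)."""
    return -4.0 * theta * math.log(theta)


def rejection_sample(n_samples, bound=1.5, seed=0):
    """Draw from f using Unif(0, 1) proposals and the envelope constant `bound`."""
    rng = random.Random(seed)
    samples, n_proposed = [], 0
    while len(samples) < n_samples:
        theta = rng.random()
        n_proposed += 1
        if theta == 0.0:          # log(0) is undefined; f tends to 0 there
            continue
        if rng.random() < f(theta) / bound:   # accept w.p. f(theta) / bound
            samples.append(theta)
    return samples, n_proposed


samples, n_proposed = rejection_sample(20_000)
sample_mean = sum(samples) / len(samples)
accept_rate = len(samples) / n_proposed
print(f"sample mean {sample_mean:.3f}, acceptance rate {accept_rate:.3f}")
```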

Markov Chain Monte Carlo(MCMC)
IID sampling from a multidim. dist. is very difficult and inefficient.
Markov Chain Monte Carlo (MCMC): a class of algorithms that
generate samples in a way that depends on the previous sample (not
IID)
Ergodic theorem: MCMC sequence (chain) converges to the desired
distribution.
Difficult to prove convergence in practice, instead can show evidence

Inventors of MCMC
A. A. Markov (1856 – 1922), John von Neumann, Stanisław Ulam, Nicholas Metropolis

MCMC: Metropolis algorithm
Metropolis algorithm: generate samples from any prob. dist. p(θ | y)
Let g(θ* | θt) be a symmetric proposal p.d., g(θ* | θt) = g(θt | θ*), with the same support as p(θ | y)
Metropolis algorithm (step t + 1):
Generate θ* from g(θ* | θt)
Accept θ* as a new sample (that is, θt+1 = θ*) with probability
min(α, 1), where
$\alpha = \dfrac{p(\theta^* \mid y)}{p(\theta_t \mid y)}$
If θ* is not accepted then θt+1 = θt
The set of sample points (θ1, θ2, ..., θN) has p.d. p(θ | y)
Metropolis algorithm works for any distribution, however, it can be
very ineﬃcient
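A minimal random-walk Metropolis sketch, applied to the unnormalised Beta(53, 270) posterior of the loan example so the result can be compared with the known mean. The Gaussian proposal, its step size, the starting value and the burn-in length are assumptions made for this illustration. The Gaussian proposal can step outside (0, 1); such proposals get zero posterior density and are always rejected.

```python
import math
import random


def log_post(theta, a=53, b=270):
    """Log of the unnormalised posterior theta^(a-1) * (1-theta)^(b-1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta)


def metropolis(n_iter, start=0.5, step=0.03, seed=0):
    rng = random.Random(seed)
    theta = start
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)        # symmetric proposal
        log_alpha = log_post(proposal) - log_post(theta)
        # accept with probability min(alpha, 1)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            theta = proposal
        chain.append(theta)                            # repeat theta if rejected
    return chain


chain = metropolis(20_000)
kept = chain[2_000:]                                   # discard burn-in
chain_mean = sum(kept) / len(kept)
print(f"Metropolis estimate of E[theta | y]: {chain_mean:.3f}")
```

Only the ratio of posterior densities enters the acceptance step, so the normalising constant p(y) is never needed.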

MCMC: Gibbs algorithm
Goal: generate samples from the joint p.d. p(θ1, θ2, θ3 | y)
The joint p.d. p(θ1, θ2, θ3 | y) is uniquely determined by
p(θ1 | θ2, θ3, y), p(θ2 | θ1, θ3, y) and p(θ3 | θ1, θ2, y) (the so-called full
conditionals)
Gibbs algorithm generates samples from full conditionals. A set of
such points has the joint p.d. p(θ1, θ2, θ3 | y)
Starting values: $\theta_1^{0}, \theta_2^{0}, \theta_3^{0}$
Step t + 1 of the Gibbs:
$\theta_1^{t+1} \sim p(\theta_1 \mid \theta_2^{t}, \theta_3^{t}, y)$
$\theta_2^{t+1} \sim p(\theta_2 \mid \theta_1^{t+1}, \theta_3^{t}, y)$
$\theta_3^{t+1} \sim p(\theta_3 \mid \theta_1^{t+1}, \theta_2^{t+1}, y)$
Sampling from full cond. can be complex; high auto-correlation
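A toy Gibbs sampler, as a sketch only. The target here is not a posterior from the slides but a standard bivariate normal with correlation rho (an assumed example with two components instead of three), chosen because its full conditionals are known: θ1 | θ2 ~ N(rho·θ2, 1 − rho²), and symmetrically for θ2.

```python
import math
import random


def gibbs_bivariate_normal(n_iter, rho, seed=0):
    """Alternate draws from the two full conditionals of a bivariate normal."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho**2)
    theta1, theta2 = 0.0, 0.0            # starting values
    chain = []
    for _ in range(n_iter):
        theta1 = rng.gauss(rho * theta2, cond_sd)   # uses the current theta2
        theta2 = rng.gauss(rho * theta1, cond_sd)   # uses the updated theta1
        chain.append((theta1, theta2))
    return chain


def correlation(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs) / n)
    return cov / (sx * sy)


rho = 0.8
chain = gibbs_bivariate_normal(20_000, rho)
est_rho = correlation(chain[1_000:])     # discard a short burn-in
print(f"sample correlation {est_rho:.3f} (target {rho})")
```

The stronger the correlation between components, the more slowly the chain moves, which is the auto-correlation problem noted above.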

Bayesian decision theory
Bayesian decision theory: rational and coherent decisions are made
based on maximization of the expected utility function.
Let A be a set of possible actions and U(a, θ) the utility function of action
a ∈ A, where the unknown info is θ;
The optimal action a* maximizes the posterior expectation of U:
$a^* = \arg\max_{a \in A} E_{\theta \mid y}[U(a, \theta)]$
This approach has been successfully used in many areas, such as
business management, econometrics, engineering, health care,
medicine, etc.
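Expected-utility maximization can be sketched with posterior draws. Everything about the decision problem below is hypothetical: two actions ("lend", "decline"), a gain of 1 for a performing loan and a loss of 4 for a delinquent one. Only the posterior Beta(53, 270) for the delinquency probability θ comes from the loan example.

```python
import random

rng = random.Random(7)
theta_draws = [rng.betavariate(53, 270) for _ in range(20_000)]


def utility(action, theta, gain=1.0, loss=4.0):
    """Hypothetical utility of an action given delinquency probability theta."""
    if action == "lend":
        return gain * (1.0 - theta) - loss * theta
    return 0.0  # "decline": nothing gained, nothing lost


actions = ["lend", "decline"]
# Monte Carlo estimate of the posterior expected utility of each action
expected_utility = {
    a: sum(utility(a, t) for t in theta_draws) / len(theta_draws) for a in actions
}
best_action = max(expected_utility, key=expected_utility.get)
print(expected_utility, "->", best_action)
```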

Bayesian methodology (+/-)
Natural way of combining a priori knowledge with observations;
Sequential learning, in analogy to the acquisition of scientific
knowledge.
Basis for the theory of optimal decision making
Inference based on posterior distributions, does not depend on
asymptotics and approximations of CLT type.
Interpretation of CI-s for estimators direct and intuitive.
Hypothesis testing without p-values
Enables systematic mapping of complex problems and structured data
into corresponding model speciﬁcations based on hierarchies of
conditional prob. distributions.
Uses a single probabilistic framework for model development.
Requires careful choice and specification of the prior distributions.
Requires sensitivity (robustness, stability) analysis of inference with
respect to the a priori assumptions.
Development of the model spec and MCMC implementation
non-trivial; requires support of substantial computing power.

Future directions for modern statistical analysis
Both paradigms have their strengths and weaknesses. In practical
work, statisticians use any tools which solve the problem.
In the future research and application of modelling methodology,
focus on the advantages of both paradigms; bring them together,
maximizing the gain and also minimizing the impact of weak features.
Use the Bayesian approach when constructing the model, bringing in all
the relevant information and taking advantage of (1) the full probabilistic
framework for inference and (2) the flexibility of hierarchical specifications
to model structured data.
Interestingly, the frequentist ideas and statistical results based on
the frequentist interpretation of probability remain valid for analyzing the
sampling output of the MCMC, which is extremely important to
examine in order to establish evidence of convergence.
When checking and using the model it is important to assess its
performance in a frequentist way: how many times does the model
predict well when used in practice?
With the available computing power, which is continuing to rise, the
Bayesian approach to modelling becomes more efficient and …

Adaptive clinical trials
Clinical trials are prospective studies with the goal of evaluating the effect of a
medical treatment (a drug or a procedure) administered to human patients under
controlled conditions.
Phase I: considers safety from toxicity and includes dose finding; Phase II: looks for
drug efficacy, having to protect against both toxicity and futility (continuing a trial
unlikely to produce positive results, i.e. the drug has no effect) even if all available
patients are enrolled in the trial. In such case it is prudent to stop the trial early;
Phase III: randomized controlled multi-center confirmatory trials on large groups of
patients; The basis for making definitive assessment of drug efficacy.
A standard approach to design and analysis of the clinical trials uses methods of
classical or frequentist statistics.
Inference closely follows the structure of the trial and the protocol must be strictly
maintained throughout the trial.
Suboptimal in terms of finishing early (either confirming the success or declaring
non-efficacy of a drug or a procedure).
Patients must be randomized into two groups; the number of patients in either
group is sometimes larger than necessary.
Any future adaptations or interventions in the trial must be spelled out in advance
in order for the inference to work.

Adaptive clinical trials (2)
A modern alternative approach uses Bayesian modelling and design methodologies.
Bayesian approach results in flexible trial design and analysis where experiments
and corresponding interventions can be changed during the course of trials.
Furthermore, various sources of information can be included in the analysis as the
new information becomes available and expert opinion can be incorporated in final
inferences and conclusions. Also, the methods of decision theory can be seamlessly
combined with the results of Bayesian analysis when making final decisions.
A leading criterion in frequentist design of trials is the control of the rate of Type I
errors (false positives). Any change in the protocol of the trial which affects
stopping boundaries is deemed adaptive if it also keeps Type I error constant.
Bayesian inference does not depend on the particular features of the design of the
trial such as selection of the sample sizes in advance, for example. In Bayesian
approach, besides choosing the stopping rules during trial, one is free to make
assumptions in the form of prior probability distributions. Trial sample sizes can be
determined while the trials are in progress.
A goal of Bayesian approach to design and analysis of trials is to maximize the
usage of information (which may become sequentially available during the trial),
minimize number of patients involved along with the trial duration. Historical data
can be useful in this approach, especially for testing medical devices.

Adaptive clinical trials (3)
For Immediate Release: Feb. 5, 2010
FDA Issues Guidance to Help Streamline Medical Device Clinical Trials. Agency says Bayesian
statistical methods could trim costs, boost efficiency.
The U.S. Food and Drug Administration today issued guidance on Bayesian statistical methods
in the design and analysis of medical device clinical trials that could result in less costly and
more efficient patient studies.
The Bayesian statistical method applies an algorithm that makes it possible for companies to
combine data collected in previous studies with data collected in a current trial. The combined
data may provide sufficient justification for smaller or shorter clinical studies.
The final guidance describes use of Bayesian methods, design and analysis of medical device
clinical trials, the benefits and difficulties with the Bayesian approach, and comparisons with
standard statistical methods. The guidance also presents ideas for using Bayesian methods in
post-market studies.
Health care payers are also contemplating the role Bayesian methods could play in making
coverage decisions. In a June 2009 public meeting, a Medicare Advisory Committee encouraged
Medicare policymakers to consider Bayesian approaches when reviewing trials or technology
assessments during the national coverage analysis process.

Modelling of Count Data
Example: Count data — Bayesian parametric Poisson based model vs.
Bayesian nonparametric (BNP) with a Dirichlet process prior.
Fixed-effects Poisson model (for i = 1, ..., n):
$(y_i \mid \theta) \overset{\text{ind}}{\sim} \text{Poisson}[\exp(\theta)]$
$(\theta \mid \mu, \sigma^2) \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$
$(\mu, \sigma^2) \sim p(\mu, \sigma^2)$. (2)
This uses a Lognormal prior for $\lambda = e^{\theta}$ rather than the conjugate Gamma choice;
the two families are similar, and the Lognormal generalizes more readily.
Data often exhibit heterogeneity resulting in extra-Poisson variability, that is, a
variance-to-mean ratio VTMR > 1

Parametric Random-Effects Poisson (PREP) Model
Random-effects Poisson model (PREP):
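$(y_i \mid \theta_i) \overset{\text{ind}}{\sim} \text{Poisson}[\exp(\theta_i)]$
$(\theta_i \mid G) \overset{\text{iid}}{\sim} G$
$G \equiv N(\mu, \sigma^2)$
$(\mu, \sigma^2) \sim p(\mu, \sigma^2)$, (3)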
assuming a parametric CDF G (the Gaussian) for the latent variables
or random effects θi .
Distribution, G, in the population to which it’s appropriate to
generalize may be multimodal or skewed, which a single Gaussian
can’t capture; if so, this PREP model can fail to be valid.
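The difference between a common θ and observation-level random effects θi shows up in the variance-to-mean ratio. The sketch below simulates both cases with µ and σ held fixed at hypothetical values (µ = 1, σ = 0.5) instead of drawing them from a hyperprior. The Poisson sampler is Knuth's multiplication method, used here only because the Python standard library has no Poisson generator.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def vtmr(counts):
    """Sample variance-to-mean ratio."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


rng = random.Random(3)
mu, sigma, n = 1.0, 0.5, 20_000   # hypothetical values

# Common theta for all observations: plain Poisson, VTMR close to 1
fixed_counts = [poisson_draw(rng, math.exp(mu)) for _ in range(n)]
# Random effects theta_i ~ N(mu, sigma^2): extra-Poisson variability
random_counts = [poisson_draw(rng, math.exp(rng.gauss(mu, sigma))) for _ in range(n)]

vtmr_fixed, vtmr_random = vtmr(fixed_counts), vtmr(random_counts)
print(f"VTMR, common theta: {vtmr_fixed:.2f}; random effects: {vtmr_random:.2f}")
```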

Dirichlet Process Mixture Model
Remove the assumption of a specific parametric family (normal) for
the mixing dist. G, allowing G to be random and specifying a prior
model on the space of {G} that may be centered on N(µ, σ²) but
permits adaptation/learning.
We use the Dirichlet process (DP), G ∼ DP(α, G0).
Poisson DP mixture model (PDPM):
$y_i \mid \theta_i \overset{\text{ind}}{\sim} \text{Poisson}(e^{\theta_i})$
$\theta_i \mid G \overset{\text{iid}}{\sim} G$
$G \sim DP(\alpha, G_0)$, (4)
where $G_0 \equiv N(\cdot\,; \mu, \sigma^2)$ and i = 1, ..., n. Equivalently, integrating out θi,
$y_i \mid G \overset{\text{ind}}{\sim} \int \text{Poisson}(y_i; e^{\theta})\, dG(\theta)$, (5)
with random mixing d.f. G ∼ DP(α, G0), and G0 = N(µ, σ²).
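One way to get a feel for a draw G ∼ DP(α, G0) is the truncated stick-breaking construction, which is not used in the slides and is shown here only as an illustration: G is approximated by a discrete distribution with atoms drawn from G0 and weights obtained by repeatedly breaking off Beta(1, α) fractions of a unit stick. The values of α, µ, σ and the truncation level are assumptions.

```python
import random


def stick_breaking_draw(alpha, mu, sigma, n_atoms=200, seed=0):
    """Truncated stick-breaking approximation to one draw G ~ DP(alpha, N(mu, sigma^2))."""
    rng = random.Random(seed)
    weights, atoms = [], []
    remaining = 1.0                       # length of stick not yet assigned
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)   # fraction of the remaining stick
        weights.append(remaining * v)
        atoms.append(rng.gauss(mu, sigma))  # atom location drawn from G0
        remaining *= 1.0 - v
    return weights, atoms


weights, atoms = stick_breaking_draw(alpha=2.0, mu=1.0, sigma=0.5)
total_weight = sum(weights)
largest = sorted(weights, reverse=True)[:5]
print(f"total weight {total_weight:.6f}; five largest weights {largest}")
```

A few atoms typically carry most of the weight, which is why a DP mixing distribution can represent multimodal or skewed random-effects distributions that a single Gaussian cannot.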

Financial crisis, liquidity risk, credit risk
Financial crisis, from 2008 until now, the worst in the last 80 years.
Former giant financial powerhouses went bankrupt or were liquidated (Lehman
Brothers), some were sold (Bear Stearns, Merrill Lynch), and some investment
banks became commercial (Morgan Stanley, Goldman Sachs). Big companies also
went bankrupt (Gen. Motors) and the governments of several countries had to make
unpopular decisions to salvage the banking systems from total collapse.
The root causes and the dynamics of the crisis will be studied for a long time.
A few decisive factors: (1) unprincipled credit rating of the companies heavily
invested in complicated financial products including large positions in collateralized
loan obligations. (2) absence of proper regulation, the CDS markets grew huge,
contracts made in private, no market transparency; No proper regulation in the
securitization of loans and other financial instruments (3) massive financing of
residential loans made available to population with no ability to service the loans.
(4) extremely complex and innovative financial instruments and their derivatives
(5) turning the financial market into a giant casino with the betting system using
derivatives and synthetic CDS creating a pyramid of bets several hundred times
larger than the total value of the underlying real financial structure.
Scandalous corruption at the highest levels of core financial institutions (NASDAQ)
In a situation like this, serious errors in risk assessment are unavoidable

Analysis of liquidity risk and credit risk (1)
Goal: Main modelling ideas and results of an analysis of the roles of liquidity
risk and credit risk in the financial crisis 2007-2009. (Gary Koop,
Department of Economics, Univ. of Strathclyde, UK)
Joint analysis of the CDS rates and the spread between LIBOR and OIS. LIBOR = London
Interbank Offered Rate; OIS = Overnight Index Swap rates; CDS = Credit default swap.
A CDS is a swap transferring the credit exposure of fixed income products between parties. Payments
are made to the seller of the swap. In return, the seller agrees to pay off a third party debt
if this party defaults on the loan. A CDS is considered insurance against non-payment.
Separated dynamics of credit risk and liquidity risk.
Liquidity Risk (LR): stems from inability to buy or sell a security quickly
enough to prevent or minimize a loss.
Credit Risk (CR): possibility that debtor stops servicing his obligations, for
example, paying off the loan.

LIBOR: A daily interest rate at which banks can borrow funds from each
other in the London interbank market.
OIS (overnight indexed swap) is a way for two financial institutions to exchange interest
rates which they pay (to others); a fixed rate is exchanged for a variable one. OIS is a measure of
the investor expectation of the effective federal funds rate and should NOT reflect credit
or liquidity risk. High OIS means that the banks are unwilling to borrow, whereas a low
OIS means that the liquidity is good.
The "LIBOR-OIS spread" (the difference between LIBOR and OIS) is an indication of the
condition of credit markets. Higher and increasing spread means that the banks consider
that credit risk is high, implying a possibility of economic decline. Lower and decreasing
spread means that the banks consider lending the money less risky, indicating a prospect of
economic growth.

LIBOR-OIS spread has two components, due to liquidity risk and credit risk
(typically about 10bp; compare with 364bp in 10/2008, about 100bp in 01/2009).
A typical analysis of the LIBOR-OIS spread is based on a single aggregate time
series.
This analysis of the LIBOR-OIS spread is based on cross-bank, cross-currency and
cross-term time series from a number of different banks, different currencies,
and terms (long and short term spreads).
The goal is to disentangle credit and liquidity risks from each other and
analyze the evolution of good/bad combinations of each risk during the
crisis.

Gary Koop (et al.), Department of Economics, University of Strathclyde, Understanding
liquidity and credit risks in the financial crisis (Oct. 2010). Analyzes LIBOR-OIS time
series in three dimensions: {banks, currency, terms} in order to understand the dynamics
and impact of two risks (liquidity and credit).
Data: LIBOR-OIS from banks such as Barclays, JP Morgan, Citibank,
Deutsche Bank, RBS, HSBC, Rabobank, Lloyds, UBS; terms of (1, 3, 12) months;
currencies (EUR, USD, GBP)
Basic elements of the model:
$S_{ijkt}$ = LIBOR-OIS spread;
$D_{jt}$ = CDS rate for bank j;
$L_{kt}$ = liquidity risk (unknown);
$K_t$ = aggregate credit risk for all banks (unknown);
Meaning of the indexes:
term: i ∈ {1, 2, 3}
bank: j ∈ {1, 2, ..., 12}
currency: k ∈ {1, 2, 3}
time (days): t ∈ {1, 2, ..., 727}

Bayesian structural model of dynamic factors (1)
A simple form of state space model, or dynamic linear model (DLM)
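$Y_t = F_t \theta_t + v_t, \quad v_t \sim N_m(0, V_t)$
$\theta_t = G_t \theta_{t-1} + w_t, \quad w_t \sim N_p(0, W_t)$
where $F_t$ is an $m \times p$ matrix, $G_t$ is a $p \times p$ matrix, $\theta_t$ is a $p \times 1$ vector of states,
$v_t$ and $w_t$ are sequences of independent Gaussian random vectors, $V_t$ and $W_t$ are
covariance (dispersion) matrices, and $Y_t$ are the observations.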
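To make the two DLM equations concrete, here is a minimal sketch for the simplest scalar case (m = p = 1 with F_t = G_t = 1, often called a local level model). It simulates the states and observations and then runs the standard Kalman filter recursions. The variances V = 1.0 and W = 0.1, the diffuse prior (m0, C0) and the function names are illustrative assumptions, not values from the analysis presented here.

```python
import random

def simulate_local_level(n, V, W, theta0=0.0, seed=0):
    """Simulate theta_t = theta_{t-1} + w_t and y_t = theta_t + v_t."""
    rng = random.Random(seed)
    thetas, ys = [], []
    theta = theta0
    for _ in range(n):
        theta = theta + rng.gauss(0.0, W ** 0.5)  # state equation (G = 1)
        y = theta + rng.gauss(0.0, V ** 0.5)      # observation equation (F = 1)
        thetas.append(theta)
        ys.append(y)
    return thetas, ys

def kalman_filter_local_level(ys, V, W, m0=0.0, C0=1e6):
    """Filtered mean and variance of theta_t given y_1, ..., y_t."""
    means, variances = [], []
    m, C = m0, C0
    for y in ys:
        a, R = m, C + W         # prior for theta_t (predict step)
        f, Q = a, R + V         # one-step-ahead forecast of y_t
        gain = R / Q            # Kalman gain
        m = a + gain * (y - f)  # posterior mean (update step)
        C = R - gain * R        # posterior variance
        means.append(m)
        variances.append(C)
    return means, variances

# Hypothetical variances, chosen only for illustration
thetas, ys = simulate_local_level(n=200, V=1.0, W=0.1)
means, variances = kalman_filter_local_level(ys, V=1.0, W=0.1)
```

With constant V and W, the filtered variance settles quickly to a fixed value, so the filter behaves like an exponentially weighted average of past observations.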
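Latent variables, $\theta_t$, change according to a Markov chain:
$P(\theta_t \mid \theta_1, \ldots, \theta_{t-1}) = P(\theta_t \mid \theta_{t-1})$
If the states take values in a discrete set $(\theta^*_1, \ldots, \theta^*_k)$, we have the so-called switching-states model, or
hidden Markov model, where $P(i \mid j) = P(\theta_t = \theta^*_i \mid \theta_{t-1} = \theta^*_j)$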
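The transition probabilities P(i | j) fully determine how the distribution over discrete states evolves. The sketch below propagates a state distribution through a two-state chain and shows that it converges to the stationary distribution. The state labels "G" and "B" and the transition probabilities are hypothetical, picked only for illustration.

```python
# Hypothetical transition probabilities: P[j][i] = P(state_t = i | state_{t-1} = j)
P = {
    "G": {"G": 0.95, "B": 0.05},
    "B": {"G": 0.10, "B": 0.90},
}

def step(dist, P):
    """One step of the chain: new[i] = sum over j of P(i | j) * dist[j]."""
    states = list(P)
    return {i: sum(P[j][i] * dist[j] for j in states) for i in states}

# Start in state "G" with certainty and iterate the chain forward
dist = {"G": 1.0, "B": 0.0}
for _ in range(500):
    dist = step(dist, P)

# Closed form for a two-state chain: pi_G = P(G | B) / (P(B | G) + P(G | B))
pi_G = P["B"]["G"] / (P["G"]["B"] + P["B"]["G"])
```

For these values the chain spends about two thirds of the time in state "G" in the long run, regardless of the starting state.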
Figure: example time series (respiration plotted against time in seconds, 0 to 500 s).

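Joint model for S (LIBOR-OIS spread) and D (CDS, credit default swap rate)
across different banks, terms and currencies:
$S_{ijkt} = \lambda^S_{ijk} L_{kt} + \psi^S_{ij} K_t + \beta_{ik} X_t + \varepsilon^S_{ijkt}$
$D_{jt} = \psi^K_j K_t + \gamma Z_t + \varepsilon^D_{jt}$
where $\varepsilon^S_{ijkt} \overset{IID}{\sim} N(0, \sigma^2_{ijkS})$ and $\varepsilon^D_{jt} \overset{IID}{\sim} N(0, \sigma^2_{jD})$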
Assumptions:
coefficients (of L and K) vary over banks (j) and terms (i)
D, the bank CDS rate, does not depend directly on liquidity risk
$\lambda^S > 0$, $\psi^S > 0$, $\psi^K > 0$: growth of L and K increases S (LIBOR-OIS), and
growth of K increases D (CDS rate)
Credit risk impacts S equally for all terms i and currencies k.
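The structure of the joint model and the sign assumptions can be sketched for a single (term, bank, currency) combination as follows. All loadings and factor values below are hypothetical, the covariate terms X_t and Z_t are set to zero, and the noise terms are switched off by default so that the effect of each factor is visible directly.

```python
import random

def spread(L_kt, K_t, X_t, lam, psi_S, beta, sigma_S=0.0, rng=None):
    """S_ijkt = lam * L_kt + psi_S * K_t + beta * X_t + eps, eps ~ N(0, sigma_S^2)."""
    eps = rng.gauss(0.0, sigma_S) if rng is not None else 0.0
    return lam * L_kt + psi_S * K_t + beta * X_t + eps

def cds_rate(K_t, Z_t, psi_K, gamma, sigma_D=0.0, rng=None):
    """D_jt = psi_K * K_t + gamma * Z_t + eps, eps ~ N(0, sigma_D^2); no liquidity term."""
    eps = rng.gauss(0.0, sigma_D) if rng is not None else 0.0
    return psi_K * K_t + gamma * Z_t + eps

# Hypothetical positive loadings (lam > 0, psi_S > 0, psi_K > 0)
s_low = spread(L_kt=1.0, K_t=1.0, X_t=0.0, lam=0.8, psi_S=0.3, beta=0.0)
s_high = spread(L_kt=2.0, K_t=1.0, X_t=0.0, lam=0.8, psi_S=0.3, beta=0.0)
d = cds_rate(K_t=1.0, Z_t=0.0, psi_K=0.5, gamma=0.0)

# A noisy draw of the spread, using a seeded generator for reproducibility
s_noisy = spread(1.0, 1.0, 0.0, lam=0.8, psi_S=0.3, beta=0.0,
                 sigma_S=0.05, rng=random.Random(1))
```

Raising the liquidity factor raises the spread but leaves the CDS rate unchanged, which is what lets the CDS series help separate credit risk from liquidity risk.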

Model for liquidity risk $L_{kt}$ and credit risk $K_t$:
$L_{kt} = \phi_{k0} + \phi_{k1} L_{k,t-1} + \sigma_{kL} v_{kt}$
$K_t = \eta_0 + \eta_1 K_{t-1} + \sigma_K w_t$
Markov switching state space model (states $s_t$):
$L_{kt} = \phi^L_{k0}(s^L_t) + \phi^L_{k1}(s^L_t) L_{k,t-1} + \sigma_{kL}(s^L_t) v^L_{kt}$
$K_t = \phi^K_0(s^K_t) + \phi^K_1(s^K_t) K_{t-1} + \sigma_K(s^K_t) v^K_t$
$s^K_t \in \{G_K, B_K\}$ and $s^L_t \in \{G_L, B_L\}$, where $v^L_{kt}$ and $v^K_t$ are from the N(0,1) distribution.
Result: transition probability matrices for all states (probability of going
from state i to state j)
The liquidity state $s^L_t$ is the same for all currencies, so all liquidity factors are connected through $s^L_t$.
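A regime-switching AR(1) process of this form can be simulated in a few lines. The sketch below does so for a single factor with two regimes, labelled "G" and "B". The regime parameters, the transition probabilities and the function name are hypothetical and serve only to illustrate the mechanism; they are not estimates from the model.

```python
import random

# Hypothetical regime parameters: state -> (phi0, phi1, sigma)
PARAMS = {"G": (0.1, 0.5, 0.1), "B": (1.0, 0.5, 0.5)}
# Hypothetical transition probabilities: TRANS[i][j] = P(s_t = j | s_{t-1} = i)
TRANS = {"G": {"G": 0.98, "B": 0.02}, "B": {"G": 0.05, "B": 0.95}}

def simulate_switching_ar1(n, seed=0, s0="G", x0=0.0):
    """Simulate x_t = phi0(s_t) + phi1(s_t) * x_{t-1} + sigma(s_t) * v_t, v_t ~ N(0, 1)."""
    rng = random.Random(seed)
    s, x = s0, x0
    states, values = [], []
    for _ in range(n):
        # Draw the new regime given the previous one
        s = "G" if rng.random() < TRANS[s]["G"] else "B"
        phi0, phi1, sigma = PARAMS[s]
        # AR(1) update with regime-specific coefficients
        x = phi0 + phi1 * x + sigma * rng.gauss(0.0, 1.0)
        states.append(s)
        values.append(x)
    return states, values

states, values = simulate_switching_ar1(n=5000)

def regime_mean(label):
    """Average level of the factor over the periods spent in one regime."""
    xs = [x for s, x in zip(states, values) if s == label]
    return sum(xs) / len(xs)

mean_G, mean_B = regime_mean("G"), regime_mean("B")
```

In this example the "B" regime has both a higher level and a higher volatility, so the simulated path shows the kind of abrupt upsurges that a single-regime AR(1) model would struggle to reproduce.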

Posterior mean of latent credit and liquidity risks
Figure: Credit risk (dark blue); liquidity risk (EUR, USD, GBP). CR: mainly grows until 12/2008, with two abrupt dips,
a larger one from 12/2007 to 01/2008 (a result of forced re-evaluation and writedowns) and a smaller one in the summer of 2008;
otherwise the changes are fairly smooth. LR: changes much more abruptly, with three major upsurges in 08/2007, 12/2007
and 10/2008 (the time of the worst crisis); LR is similar across currencies, yet LR in USD is the largest of the three (10/2008, peak
of the crisis). CR changes more slowly than LR, whereas LR changes abruptly, and these changes are easier to relate to regulatory
interventions and other events during the crisis.

Probabilities of states (GL, GC )

Probabilities of states (BL, GC )

Probabilities of states (GL, BC )

Probabilities of states (BL, BC )

Variability components of LIBOR-OIS (Euro)
Bank          1 Month          3 Months         12 Months
              Var L   Var C    Var L   Var C    Var L   Var C
barclays      0.8953  0.0000   0.6521  0.1986   0.2183  0.5940
btmufj        0.8509  0.0154   0.6115  0.2391   0.2308  0.5804
citibank      0.8573  0.0045   0.6327  0.2208   0.2288  0.5784
deutschebank  0.8600  0.0000   0.5914  0.2377   0.2786  0.5370
hbos          0.8276  0.0000   0.7510  0.0077   0.5575  0.0610
hsbc          0.8616  0.0003   0.6458  0.2079   0.2908  0.5170
jpmc          0.9018  0.0000   0.6562  0.2013   0.2787  0.5280
lloyds        0.8918  0.0000   0.6309  0.2220   0.2037  0.6015
rabobank      0.8816  0.0015   0.6043  0.2301   0.2736  0.5323
rboscotland   0.8773  0.0003   0.6107  0.2350   0.1847  0.6213
ubs           0.8353  0.0005   0.5832  0.2578   0.2277  0.5860
westlb        0.8539  0.0043   0.5929  0.2518   0.2337  0.5835
Table: Variability components for the predictive distribution. For 1-month terms, over 80% of the
variability belongs to LR (the CR share in the table is close to zero); for 3-month terms, LR accounts
for about 60-70%; for 12-month terms, credit risk is more significant.
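The reading of the table can be checked with a short calculation. The sketch below copies the rows for two banks from the Euro table, identifies the dominant component for each term, and computes the share attributed to neither factor. It assumes that Var L and Var C are fractions of the total predictive variance, so that the remainder is 1 minus their sum; the helper name `summarize` is hypothetical.

```python
# Variance shares (Var L, Var C) copied from the Euro table for two banks
SHARES = {
    "barclays": {"1M": (0.8953, 0.0000), "3M": (0.6521, 0.1986), "12M": (0.2183, 0.5940)},
    "hbos":     {"1M": (0.8276, 0.0000), "3M": (0.7510, 0.0077), "12M": (0.5575, 0.0610)},
}

def summarize(var_l, var_c):
    """Return the dominant factor and the share not attributed to L or C."""
    dominant = "L" if var_l >= var_c else "C"
    residual = 1.0 - var_l - var_c  # assumption: shares are fractions of total variance
    return dominant, residual

summary = {
    bank: {term: summarize(*pair) for term, pair in terms.items()}
    for bank, terms in SHARES.items()
}
```

For Barclays the dominant component switches from liquidity to credit risk at the 12-month term, whereas for HBOS liquidity remains the larger of the two at every term.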

Thank you for your attention!
