This document outlines an approach for quantitative risk assessment in road transport infrastructure projects using stochastic analysis with triangular distributions. It discusses determining the combined influence of parameters like project cost and traffic on economic indicators. Traditionally, risks from cost and traffic changing from base cases are analyzed separately using triangular distributions defined by minimum, most likely and maximum limits. The document proposes a method to analyze the combined influence of both parameters varying simultaneously using bivariate distributions and conditional probabilities.
Quantitative Risk Assessment - Road Development Perspective
QUANTITATIVE RISK ASSESSMENT – ROAD
TRANSPORT INFRASTRUCTURE DEVELOPMENT
PROJECT PREPARATION PERSPECTIVE
Subir Kumar Podder1
ABSTRACT
The paper outlines an approach for Quantitative Risk
Assessment employing stochastic analysis (with a triangular distribution)
so as to determine the combined influence of the dictating parameters on
probabilities. In the perspective of Road Transport
Development Projects, and more specifically the Economic Analysis
carried out in Feasibility Studies for Road Improvement and
Rehabilitation Projects, such dictating parameters are Traffic and Project
Cost. Accordingly, the paper presents an approach for determining the
impact (on traditional Economic Analysis instruments like EIRR and NPV)
under the combined influence of these two parameters. Such
analysis requires going a step beyond what is required when
probabilities are ascertained separately, and is the focus of this paper.
1. INTRODUCTION
Quantitative Risk Analysis – This provides a means of
estimating the probability that the project NPV will fall
below zero, or that the project EIRR will fall below the
opportunity cost of capital.
More explicitly, quantitative risk analysis involves
randomly selecting values for the variables from the
probability distribution determined; combining these
values with all base case values to give an EIRR (or NPV)
result; and repeating such a calculation a large number of
times to provide a large number of EIRR (or NPV)
estimates. These estimates are then summarized in a
distribution, key features of which are the proportions of
EIRR values that fall below (i) the opportunity cost of
capital (say 12 per cent), and (ii) the most likely forecast value
(i.e. the value which the analysis actually yielded). The
objective is to estimate the probability that the project
might turn out to be unacceptable.
Traditionally, the principle of risk analysis is based on
random simulations carried out assuming a triangular
probability distribution with predetermined minimum,
most likely and maximum limits. The most likely values are
those assumed for the base case situation, whilst the lower
and upper limits that are to be applied for risk analysis in
road feasibility studies are expressed in terms of percentage
(%) of the base-case2
values.3
1
Consultant (Highways), LEA Associates South Asia Pvt.
Ltd, New Delhi, INDIA
2
The base-case refers to Estimated Project Cost and
Estimated Traffic. Risks triggered by changes from these
Triangular Probability Distribution - The triangular
distribution is typically used as a subjective description of a
population for which there is only limited sample data, and
especially in cases where the relationship between variables
is known but data is scarce. It is based on knowledge of the
maximum and minimum and an “Inspired guess” as to the
modal value. [For this reason the triangular distribution has
been called a “lack of knowledge” distribution]. The
triangular distribution is often used in business decision
making4
, particularly in simulations. Generally, when not
much data is known about the distribution of an outcome,
(say, only its smallest and largest values are known), it is
possible to use the uniform distribution. But if the most
likely outcome is also known, then the outcome can be
simulated by a triangular distribution.5
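In probability
theory and statistics, the triangular distribution is a
continuous probability distribution with lower limit a, upper
limit b and mode c, where a < b and a <= c <= b. The
probability density function (pdf) is given by:
$$
f(x) =
\begin{cases}
0, & x < a \\
\dfrac{2(x-a)}{(b-a)(c-a)}, & a \le x < c \\
\dfrac{2}{b-a}, & x = c \\
\dfrac{2(b-x)}{(b-a)(b-c)}, & c < x \le b \\
0, & x > b
\end{cases}
\tag{1}
$$
The cumulative distribution function (cdf) is given by:
$$
F(x) =
\begin{cases}
0, & x \le a \\
\dfrac{(x-a)^2}{(b-a)(c-a)}, & a < x \le c \\
1 - \dfrac{(b-x)^2}{(b-a)(b-c)}, & c < x < b \\
1, & x \ge b
\end{cases}
\tag{1A}
$$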
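As a minimal sketch, the pdf and cdf referred to as Eqn. (1) and Eqn. (1A) can be written as two small Python functions, using the standard piecewise form of the triangular distribution. The function names `tri_pdf` and `tri_cdf` and the limits in the usage lines (75, 100 and 110 per cent of a base-case value) are illustrative assumptions, not values taken from this paper.

```python
def tri_pdf(x, a, c, b):
    """Density of a triangular distribution with lower limit a, mode c, upper limit b."""
    if x < a or x > b:
        return 0.0
    if x < c:
        return 2.0 * (x - a) / ((b - a) * (c - a))   # rising branch
    if x > c:
        return 2.0 * (b - x) / ((b - a) * (b - c))   # falling branch
    return 2.0 / (b - a)                             # peak value at the mode


def tri_cdf(x, a, c, b):
    """Probability P[X <= x] for the same distribution."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - c))


# Illustrative limits only (assumed): 75 %, 100 % and 110 % of a base-case value.
a, c, b = 75.0, 100.0, 110.0
print("pdf at the mode:", tri_pdf(c, a, c, b))
print("cdf at the mode:", tri_cdf(c, a, c, b))
```

At the mode the cdf equals (c - a) / (b - a), which is a convenient check when the same formulas are typed into a spreadsheet.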
There are generally no fixed criteria for using such a result
in ascertaining the acceptability of the Project Viability.
estimates, measured in terms of impacts on the economic
parameters, are essentially what the quantitative risk
assessment is aimed at.
3
The adopted values (i.e. % of base-case values) are
generally those that the Terms of Reference for a particular
project specify. However, generally they are the same as those
used for the Sensitivity Analysis performed under an
Economic Analysis.
4
The triangular distribution, along with the Beta
distribution, is also widely used in project management (as
an input into PERT and hence critical path method (CPM))
to model events which take place within an interval defined
by a minimum and maximum value.
5
Source: Wikipedia, the free encyclopedia
However, high risk probabilities may be associated with
projects that have a high expected NPV (or EIRR).6
Pertinence to Road Development Projects - Risk
analysis, carried out under Feasibility Studies, for road
transport development projects is concerned with the
probability or likelihood of the following:
Construction costs increasing/decreasing by a
certain percentage relative to the base case;
&
Traffic growth occurring at a lower/higher rate
than that assumed in the base case situation.
Such probabilities are determined employing the principles
of random simulations assuming a triangular probability
distribution.
Continuing from above, intrinsic to the triangular
probability distribution are the boundary conditions
(upper and lower limits7, as well as the mode value, i.e. the
most likely value). Very often the prescribed values for the
upper and lower limits are as follows:
UPPER LIMIT:
1) Construction Cost – 90 % of the Base Case
(i.e. the Favoured Alternative)
2) Traffic Growth – 110% of the Base Case
(i.e. the Favoured Alternative)
LOWER LIMIT:
1) Construction Cost – 120 % of the Base Case
(i.e. the Favoured Alternative)
2) Traffic Growth – 75% of the Base Case (i.e.
the Favoured Alternative)
6
Source: Page 157, Appendix 21, Guidelines for the
Economic Analysis of Projects, Economic and Development
Resource Center, the Asian Development Bank, February
1997.
7
It is imperative that (i) the Upper Limits correspond to the
scenario where the %-change to the variables yields a higher
EIRR (or NPV); and (ii) the Lower Limits correspond to the
scenario where the %-change to the variables yields a lower
EIRR (or NPV), when compared to the EIRR (or NPV) of
the Favoured Alternative (the most likely value, i.e. that which the
analysis yields with the Base Case values).
2. THE CONVENTIONAL PRACTICE
With the boundary conditions in place, as stated
immediately above, the stochastic analysis, as often
practiced, essentially involves the following steps:
Establishing the probability distribution function
(pdf), assuming a triangular probability distribution,
for changes in EIRR triggered by COST
Making Random Simulations to arrive at a large
number of estimates (of EIRR)
Establishing the cumulative distribution function
(cdf) [it essentially summarises the estimates]
Interpreting the results which essentially is
determining the proportion of EIRR values that fall
below (i) the opportunity cost of capital (say 12 per
cent), (ii) the most likely forecast values (i.e. the
value which the analysis actually yielded), (iii) the
central estimates (say the mean value).
The analysis is then repeated for changes in EIRR as
triggered by changes to TRAFFIC.
The aforesaid steps can be performed using an MS Excel
spreadsheet analysis8 that employs built-in functions to
generate random numbers and then a One-Way Data
Table to trigger the simulations required.
3. BASIC STOCHASTIC PRINCIPLES FOR
BIVARIATE RANDOM VARIABLES – AN
INTRODUCTION
3.1 BASIC PRINCIPLES
The basic principles on which this paper relies are
presented briefly next.
pdf » Probability Density Function f(x)
[pdf is not a probability, it can have value >1.0]
Any function can be a pdf if the following are satisfied:
$$
f(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} f(x)\,dx = 1
$$
8
The spreadsheet analysis generally uses the uniform
random number function [RAND()] in Excel to simulate
various discrete or continuous outcomes, without the use of
add-ins such as @RISK or Crystal Ball, both of which are
programs that are often recommended for performing
stochastic analysis [For instance, the “Procedural Guide to
Economic Road Feasibility Studies, MOWHC, Government
of Uganda, (March 2006)”]. By the use of lookup tables,
the simulation repeats itself.
Also,
$$
F(x) = P[X \le x] = \int_{-\infty}^{x} f(t)\,dt
$$
Here, cdf » Cumulative Distribution Function F(x).
Extracts9 are used here for pertinent explanations.
Further,
P[X ≥ a] = 1 - P[X ≤ a]
This is illustrated below.
3.2 BI-VARIATE DISTRIBUTIONS
Continuing from the above, for a two-dimensional random
variable (X,Y), where both X and Y are continuous random
variables, the Bi-variate Distributions are as follows:
Joint pdf of (X,Y):
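For a continuous r.v. (X, Y), the joint probability density
function f(x,y) is defined by
1) f(x,y) >= 0
2) $$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$$
The joint pdf, f(x,y), is not a probability.
Joint cdf of (X,Y):
$$
F(x,y) = P[X \le x,\; Y \le y] = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u,v)\,dv\,du
$$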
9
Source: Extracts from NPTEL, Lecture No. # 02,
Bivariate Distributions, Stochastic Hydrology, Prof. P. P.
Mujumdar, Department of Civil Engineering, Indian
Institute of Science, Bangalore.
3.3 MARGINAL DENSITY FUNCTIONS
The marginal density functions of bi-variate continuous
random variables relate essentially to the probability of a
particular variable irrespective of the value that the other
random variable takes. The marginal density functions, g(x)
and h(y), of X & Y respectively are defined as follows:
$$
g(x) = \int_{-\infty}^{\infty} f(x,y)\,dy, \qquad
h(y) = \int_{-\infty}^{\infty} f(x,y)\,dx
$$
These are in fact derived from the joint pdf f(x,y), as
follows:
$$
P[c \le X \le d] = P[c \le X \le d,\; -\infty \le Y \le \infty]
= \int_{c}^{d}\int_{-\infty}^{\infty} f(x,y)\,dy\,dx
= \int_{c}^{d} g(x)\,dx
$$
From the definitions of pdf’s it is thus seen that g(x) is in
fact the original pdf of the r.v. (random variable) X. Thus,
,
,
Similarly for the r.v. (random variable) Y
3.4 CONDITIONAL DISTRIBUTION
The conditional distribution of X given Y=y is defined as:
g (x / y) = f (x, y) / h (y), h (y) > 0
The conditional distribution of Y given X=x is defined as:
h (y / x) = f (x, y) / g (x), g (x) > 0
While the conditional pdfs satisfy all conditions for a pdf,
the Cumulative Conditional Distributions are:
$$
F(x / y) = \int_{-\infty}^{x} g(t / y)\,dt, \qquad
F(y / x) = \int_{-\infty}^{y} h(t / x)\,dt
$$
With the aforesaid explanations of the fundamentals of a
bivariate distribution (for continuous random variables),
the following requirement for stochastic independence
can be stated.
3.5 INDEPENDENT RANDOM VARIABLES
When the two random variables are independent, we have
g(x/y) = g(x), and h(y/x) = h(y)
i.e. the conditional pdf is equal to the marginal pdf.
Therefore,
g (x / y) = f (x, y) / h (y)
g (x) = f (x, y) / h (y)
So, f (x, y) = g (x) . h (y)
So it is concluded that for X and Y to be stochastically
independent,
f(x,y) = g(x) h(y)
The following example illustrates this (taken from Ref. 2).
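The example from Ref. 2 is not reproduced in this text. As a separate and purely hypothetical illustration (not taken from Ref. 2), the sketch below checks the condition f(x,y) = g(x) h(y) numerically for two joint pdfs on the unit square: f(x,y) = 4xy, which factorises, and f(x,y) = x + y, which does not. The function names and the midpoint-rule integration are assumptions made for this sketch.

```python
def marginal_x(f, x, n=200):
    """g(x): integrate f(x, y) over y in [0, 1] with the midpoint rule."""
    h = 1.0 / n
    return h * sum(f(x, (j + 0.5) * h) for j in range(n))


def marginal_y(f, y, n=200):
    """h(y): integrate f(x, y) over x in [0, 1] with the midpoint rule."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h, y) for i in range(n))


def looks_independent(f, grid=5, tol=1e-6):
    """True if f(x, y) equals g(x) * h(y) at every point of a small test grid."""
    points = [(k + 0.5) / grid for k in range(grid)]
    return all(abs(f(x, y) - marginal_x(f, x) * marginal_y(f, y)) < tol
               for x in points for y in points)


def joint_a(x, y):
    return 4.0 * x * y      # marginals 2x and 2y, product equals the joint pdf


def joint_b(x, y):
    return x + y            # marginals x + 0.5 and y + 0.5, product differs


print("4xy   independent:", looks_independent(joint_a))
print("x + y independent:", looks_independent(joint_b))
```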
4. TRIANGULAR DISTRIBUTION - BASICS
4.1 PERTINENT PROPERTIES
Continuing with the discussions on triangular distribution,
in Section-1 above, relevant properties are as follows.
Fundamental properties10,11
:
4.2 GENERATING RANDOM VARIATES
Since carrying out random simulations is a basic requirement,
as mentioned in Section-2 above, the following expressions
10
http://en.wikipedia.org/wiki/Triangular_distribution
11
These properties find specific relevance to parameter
estimation using Method of Moments
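are used in generating12 the random variates for a
triangular distribution. With U a uniform random number on (0, 1),
inverting the cdf of eqn. (1A) gives:
$$
X =
\begin{cases}
a + \sqrt{U\,(b-a)(c-a)}, & 0 < U < \dfrac{c-a}{b-a} \\
b - \sqrt{(1-U)\,(b-a)(b-c)}, & \dfrac{c-a}{b-a} \le U < 1
\end{cases}
\tag{2}
$$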
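A minimal sketch of such a generator is given below, assuming the usual inverse-transform approach: a uniform random number u (the equivalent of RAND() in MS Excel) is mapped through the inverse of the cdf in Eqn. (1A). The function name `tri_variate` and the limits used in the usage lines are illustrative assumptions.

```python
import math
import random


def tri_variate(u, a, c, b):
    """Map a uniform number u in [0, 1) to a triangular variate (limits a, b; mode c)."""
    split = (c - a) / (b - a)                 # value of the cdf at the mode, F(c)
    if u < split:
        return a + math.sqrt(u * (b - a) * (c - a))
    return b - math.sqrt((1.0 - u) * (b - a) * (b - c))


# Illustrative limits only (assumed): -10 %, 0 % and +20 % change from the base case.
rng = random.Random(42)
a, c, b = -10.0, 0.0, 20.0
sample = [tri_variate(rng.random(), a, c, b) for _ in range(20000)]
sample_mean = sum(sample) / len(sample)

print("smallest variate:", round(min(sample), 3))
print("largest variate :", round(max(sample), 3))
print("sample mean     :", round(sample_mean, 3), "(theory: (a + b + c) / 3)")
```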
4.3 PARAMETER ESTIMATION
While the aforesaid equation allows generating random
variates which follow a triangular distribution, the next step
is estimating the parameters defining the distribution, which
essentially are the values for a, b and c in eqn. (1).
The methods used for Parameter Estimation are:
(i) Method of Matching Points – A simple but
approximate method, hence it may be used for first
approximations.
(ii) Method of Moments13 (MoM) – In this method,
equating the first ‘m’ moments of the population to the
sample estimates of the first ‘m’ moments results in ‘m’
equations to solve for ‘m’ unknown parameters.
(iii) Method of Maximum Likelihood (ML) – Based on
maximising a likelihood function, and preferred over the
Method of Moments. A brief14 follows:
12
http://www.asianscientist.com/books/wp-content/uploads/2013/06/5720_chap1.pdf
13
First Moment – Mean, Second Moment – Variance, Third
Moment – Skewness, Fourth Moment – Kurtosis etc. It is to
be appreciated that the properties mentioned in Section
4.1 are relevant because they are required for
Parameter Estimation using the Method of Moments.
14
For details references can be made to: NPTEL, Lecture
No. # 07, Parameter Estimation, Stochastic Hydrology,
Prof. P. P. Mujumdar, Department of Civil Engineering,
Indian Institute of Science, Bangalore
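As an illustration of item (ii) of Section 4.3, the sketch below applies the Method of Moments to a triangular distribution whose mode c is taken as known, so that only a and b remain to be estimated. It relies on two standard properties of the triangular distribution, mean = (a + b + c) / 3 and variance = (a² + b² + c² - ab - ac - bc) / 18. The function name `mom_supports` and the numbers in the usage lines are assumptions made for this sketch, not data from this paper.

```python
import math


def mom_supports(mean, variance, c):
    """Estimate the limits a and b of a triangular distribution with known mode c.

    Uses  mean     = (a + b + c) / 3
          variance = (a^2 + b^2 + c^2 - a*b - a*c - b*c) / 18
    which give the sum s = a + b and the product a*b; a and b are then the
    two roots of t^2 - s*t + a*b = 0.
    """
    s = 3.0 * mean - c                                     # a + b
    prod = (s * s - c * s + c * c - 18.0 * variance) / 3.0  # a * b
    disc = s * s - 4.0 * prod
    if disc < 0.0:
        raise ValueError("moments are not consistent with a triangular distribution")
    root = math.sqrt(disc)
    return (s - root) / 2.0, (s + root) / 2.0


# Check on a known case (assumed values): a = 0, c = 3, b = 10.
true_mean = (0.0 + 10.0 + 3.0) / 3.0
true_var = (0.0 + 100.0 + 9.0 - 0.0 - 0.0 - 30.0) / 18.0
a_hat, b_hat = mom_supports(true_mean, true_var, 3.0)
print("estimated a, b:", round(a_hat, 6), round(b_hat, 6))
```

With sample data, `mean` and `variance` would be replaced by the sample mean and sample variance; the sketch does not check that the estimated limits bracket the mode.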
However, the triangular distribution poses specific problems in
the use of ML for parameter estimation. While various
literature15,16,17 is available on this and on the aspects
that can be adopted for addressing such difficulties, the approach
that this paper has used is that given in Reference-12 (see footnote 18).
Extending that mentioned in (17), Reference 12 presents a
simplified version that requires solving numerically a single
equation given next:
............... (3)
While details are presented in Reference-12, the unique
solution to Eqn. (3) is the intersection of the function g(q)
with the positive diagonal of the unit square [Shown
subsequently in Section 8.1 of this paper].
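Eqn. (3) itself is not reproduced in this text. As a hedged sketch of the kind of single-equation solution described, assume that a lower quantile a_p (level p), the mode m and an upper quantile b_r (level r) are given, and let q = F(m) = (m - a) / (b - a). Setting F(a_p) = p and F(b_r) = r in Eqn. (1A) gives a and b as functions of q, and q must then satisfy q = g(q) with g(q) = (m - a(q)) / (b(q) - a(q)), i.e. the point where g(q) crosses the diagonal. Whether this matches Eqn. (3) term for term is an assumption; the function name, the bisection scheme and the test values are likewise illustrative.

```python
import math


def supports_from_quantiles(a_p, m, b_r, p, r, tol=1e-12):
    """Solve q = g(q) by bisection and return (a, b, q).

    Requires a_p < m < b_r and 0 < p < r < 1.
    """
    def supports(q):
        s_low = math.sqrt(p / q)                    # from F(a_p) = p
        s_up = math.sqrt((1.0 - r) / (1.0 - q))     # from 1 - F(b_r) = 1 - r
        a = (a_p - m * s_low) / (1.0 - s_low)
        b = (b_r - m * s_up) / (1.0 - s_up)
        return a, b

    def gap(q):
        a, b = supports(q)
        return (m - a) / (b - a) - q                # g(q) - q

    lo, hi = p + 1e-9, r - 1e-9                     # q must lie between p and r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    q = 0.5 * (lo + hi)
    a, b = supports(q)
    return a, b, q


# Check on a known case (assumed values): a = 0, m = 3, b = 10, p = 0.05, r = 0.95.
a_p = math.sqrt(0.05 * 10.0 * 3.0)                  # 5 % quantile of that distribution
b_r = 10.0 - math.sqrt(0.05 * 10.0 * 7.0)           # 95 % quantile
a_est, b_est, q_est = supports_from_quantiles(a_p, 3.0, b_r, 0.05, 0.95)
print("a, b, q:", round(a_est, 4), round(b_est, 4), round(q_est, 4))
```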
15
Joo, Y. and Casella, G. (2001), Predictive distributions in
risk analysis and estimation for the triangular distribution,
Environmetrics, 12: 647-658, doi: 10.1002/env.489: This
mentions that estimation using quantile least squares is
preferable to ML for the ‘triangular distribution’.
16
Source: Page 28, Reference 12: The package @RISK
allows definition of a triangular distribution by specifying a
lower quantile ap, a most likely value m and an upper
quantile br, such that a < ap <= m <= br < b. This avoids
having to specify the lower and upper extremes a and b that
by definition have a zero likelihood of occurrence. The
software @RISK does not provide details, however,
regarding how the bounds a and b are calculated given
values for ap, m and br. [Note: m is the same as c in eqns. (1) and (2)]
17
Source: Page 28, Reference 12: Keefer and Bodily (1983)
formulated this problem in terms of two quadratic equations
from which the unknowns a and b had to be solved
numerically for the values p=0.05 and r=0.95.
18
Can be accessed from site:
http://en.wikipedia.org/wiki/Triangular_distribution.
5. FRAMING OF THE PROBLEM
Continuing with the bivariate probability discussed above
(in Section-3), in the perspective of the Quantitative Risk
Analysis pertinent to Road Infrastructure Development
Projects described at the outset (in Section 1), let us consider
two variables C and T, the influence of which is manifested in
the variable E19.
The following (ΔC and ΔT) are differences w.r.t. base
values for C and T respectively, and the corresponding ΔE
values (ΔEC and ΔET) resulting from changes to C/T.

 ΔC     ΔEC      ΔT     ΔET
-12     2.2     -13    -1.4
 -8     1.4     -10    -0.8
 -4     0.5      -8    -0.4
  0     0         0     0
  4    -1.1       3     0.6
  8    -1.9       4     1
 12    -2.2      18     2
The following are graphical representations of the aforesaid
monotonically decreasing / increasing functions.
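For readers without the graphs, the tabulated relations can be held as simple lists and checked for monotonicity. The sketch below uses the values tabulated above; the use of piecewise-linear interpolation between tabulated points, and the function names, are assumptions made for illustration.

```python
# Tabulated differences from the base case (values as given in the table above).
DELTA_C = [-12, -8, -4, 0, 4, 8, 12]
DELTA_EC = [2.2, 1.4, 0.5, 0.0, -1.1, -1.9, -2.2]
DELTA_T = [-13, -10, -8, 0, 3, 4, 18]
DELTA_ET = [-1.4, -0.8, -0.4, 0.0, 0.6, 1.0, 2.0]


def is_monotonic(ys):
    """True if the sequence is strictly decreasing or strictly increasing."""
    diffs = [later - earlier for earlier, later in zip(ys, ys[1:])]
    return all(d < 0 for d in diffs) or all(d > 0 for d in diffs)


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation (ASSUMED) between tabulated points."""
    if x < xs[0] or x > xs[-1]:
        raise ValueError("x lies outside the tabulated range")
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])


print("EC relation monotonic:", is_monotonic(DELTA_EC))
print("ET relation monotonic:", is_monotonic(DELTA_ET))
print("change in E for a change of 6 in C:", interpolate(6, DELTA_C, DELTA_EC))
```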
In line with the conventional practice, mentioned in
Section-2 earlier, the probability distribution functions (pdf)
for changes in E (ΔE) are determined separately next, as
triggered by changes in C (ΔC) and changes in T (ΔT)
respectively. A triangular distribution is assumed in
deriving the pdfs, given next, using eqn. (1).
19
While C and T are typically COST and TRAFFIC, E is
EIRR (or NPV) as described in Section-1 above.
.................... (4)
The following describes the ‘parameter estimation’ in
establishing the pdf for ET20 and Ec, in line with eqn. (3)
described in Section 4.3 earlier.
Table 1-1 Parameter Estimates
Pursuant to the traditional practice, simulations (to arrive
at a large number of estimates for either of the cases, EC
and ET) can be performed using eqn (2). Using such
estimates, the proportions of EC and ET values that fall below a
threshold21 or other central estimates (say the mean) can be
derived (separately for the impacts of C and T). However, this
paper is aimed at determining the combined influence of
the variables C and T on E.
20
Goal Seek Function in MS-Excel has been used to
facilitate the trial and error involved [in the assumption of
‘q’].
21
For road infrastructure development projects this is 12
per cent, the cut-off EIRR which is generally considered as
the opportunity cost of capital.
6. THE PROPOSED APPROACH - BASICS
Continuing from discussions in Section-3.3 earlier,
marginal density functions of bi-variate continuous random
variables relate essentially to probability of a particular
variable irrespective of the value that the other random
variable takes.
Now in regard to estimating the Economic Internal Rate of
Return (EIRR) for Road Infrastructure Development
Projects, Project Cost and Traffic Growth are independent
entities. As described at the outset, in Section-1, prescribed
boundary conditions are used in the economic analysis
(traditionally such analysis entails Sensitivity Analysis22
and Quantitative Risk Analysis). Accordingly, the
following analysis assumes that the impacts of C and T on
E are independent of each other, so that the distributions23
for C and T can be construed as marginal distributions. Using the
values for supports derived through a triangular pdf, the
marginal density functions g(EC) and g(ET), which are
essentially those given under eqn. (4), are24:
.................... (5)
Now from the conditionality for stochastic independence,
discussed in Section-3.5 earlier, we have
f(E) = g(EC) x g(ET) ............. (6)
Therefore the pdf for E
.................... (7)
22
It is appreciated that unlike traditionally performed
Quantitative Risk Analysis, traditionally Sensitivity
Analysis considers a worst scenario of increased Project
Cost together with reduced Traffic. Hence the relevance of
this paper finds credence.
23
Given in Section-5
24
Nomenclatures used are in accordance with those given
in Section-3 of this paper
So the cumulative distribution function (cdf) is25
:
.................... (8)
This, together with conventional simulations using
triangular distribution, forms the basis for the proposed
approach. The details are presented in the subsequent
section.
7. THE PROPOSED APPROACH –
FORMULATION
The proposed approach is based on mathematical
computation for the RHS of eqn.-8, equated to probability –
computations for the LHS of eqn.-8.
7.1 MATHEMATICAL COMPUTATIONS
The RHS of eqn.-8 is essentially
.................... (9)
Considering x to represent Ec, then the limits are either
-2.75 & Ec OR 0 & Ec, depending on value of Ec.
Considering y to represent ET, then the limits are either
-1.79 & ET OR 0 & ET, depending on value of ET.
For example,
Say for certain Ec and ET, the calculations using eqn (9)
yield P [E] = 0.699 ~ 0.7.
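The structure of this computation can be written out as a short derivation. It is a sketch under stated assumptions: g_C and g_T denote the two marginal pdfs, G_C and G_T the corresponding cdfs of the form of Eqn. (1A), and the values -2.75 and -1.79 are read here as the lower limits of integration quoted above; this notation is introduced for illustration and is not the paper's own.

```latex
% Under stochastic independence the double integral factorises into
% the product of the two marginal cdfs.
P[E_C \le x,\; E_T \le y]
  = \int_{-2.75}^{x} \int_{-1.79}^{y} g_C(u)\, g_T(v)\, dv\, du
  = \left( \int_{-2.75}^{x} g_C(u)\, du \right)
    \left( \int_{-1.79}^{y} g_T(v)\, dv \right)
  = G_C(x)\, G_T(y)

% Each marginal pdf changes branch at the mode (0), so for x > 0
% the integral is evaluated in two parts:
\int_{-2.75}^{x} g_C(u)\, du
  = \int_{-2.75}^{0} g_C(u)\, du + \int_{0}^{x} g_C(u)\, du ,
  \qquad x > 0
```

The split at 0 corresponds to the two sets of limits mentioned above, one for each branch of the triangular pdf.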
7.2 PROBABILITY COMPUTATIONS
As mentioned above, the LHS of eqn.-8 is addressed using
the principles of probability, as detailed next.
25
“Joint cdf” in Section 3.2 may be referred
Step 1: Using the estimated parameters a,b,c as given in
Table 1-1 , random simulations were carried out for Ec.
For such simulations Eqn.-2 and the uniform random
number function RAND( ) in MS Excel get used.
Step 2: Similar simulations are then carried out for ET.
Step 3: Pursuant to stochastic independence, on which the
approach is based, multiplying the above, as shown below,
yields simulations for E.
Table 1-2 Simulated Data for E
Step 4: A triangular distribution, with mode (= c)26 = 0, is
applied to the simulated data in Table 1-2.
Table 1-3 Parameter Estimates for ΔE
26
Note: c is same as m that Eqn (3) mentions
As may be seen, the supports derived from
‘parameter estimation’ using27 the Method of Maximum
Likelihood (ML) [Eqn.-3] are:
a = -0.250, c = 0, b = 0.258
Step 5: With the aforesaid supports, and using Eqn.-1 and
Eqn.-1A, the pdf and cdf respectively were determined for E.
The following graphs show this.
The following table gives the probabilities for E.
Equation-1A provides P values for corresponding X (=E).
Table 1-4 Probabilities for E with Triangular
Distribution
For example,
As may be seen from the shaded row in Table 1-4, the
probability P [E < = 0.06] = 0.702 ~ 0.7.
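The probability quoted for 0.06 can be cross-checked directly from the cdf of a triangular distribution, using the supports a = -0.250, c = 0 and b = 0.258 reported in Step 4. The sketch below is illustrative only; the function name `cdf_triangular` and the other evaluation points are assumptions, and small differences from Table 1-4 can arise from rounding of the supports.

```python
def cdf_triangular(x, a, c, b):
    """P[X <= x] for a triangular distribution with limits a, b and mode c."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - c))


# Supports as reported in Step 4 of this section.
A, C, B = -0.250, 0.0, 0.258

# Evaluation points chosen for illustration; 0.06 is the value in the shaded row.
for x in (-0.20, -0.10, 0.00, 0.06, 0.10, 0.20):
    print(f"P[E <= {x:+.2f}] = {cdf_triangular(x, A, C, B):.3f}")

p_at_006 = cdf_triangular(0.06, A, C, B)
```

With these supports the value at 0.06 comes to about 0.70, in line with the shaded row.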
27
The initial estimates ap and br [say a = ap = -0.2 and b =
br = 0.21, shown in Table 1-3] can be derived using either
the “Method of Matching Points” or the “Method of
Moments”, described in Section 4.3 earlier.
7.3 INFERENCE
(a) Inferring from Section 7.1 and Section 7.2, equating
the separate computations for RHS and LHS of Eqn-
(8), does yield useful information on the combined
influence of Ec and ET on E.
(b) Now that there exist definite relations,
C with Ec and T with ET,
as shown in Section 528
earlier, inference (a) above can also
be interpreted as “the combined influence of C and T on
E can be arrived at”.
8. RELEVANT DISCUSSIONS
8.1 UNIQUE SOLUTIONS FOR ML METHOD
This section is aimed at the unique solution of Eqn. (3)
mentioned under Section 4.3 earlier. In this regard it is
recalled, as mentioned under Footnote-17, that for solutions to
the estimates of parameters for the ML method “Keefer and
Bodily (1983) formulated this problem in terms of two
quadratic equations from which the unknowns a and b had
to be solved numerically for the values p=0.05 (i.e. r = 1-p
=0.95).” It is also recalled that Section 4.3 concluded by stating
that it has been established analytically that “the unique
solution to Eqn. (3) is the intersection of the function g(q)
with the positive diagonal of the unit square.” While the
following graph presents this, the subsequent table shows
the derivations for p = 0.02 (i.e. r = 0.98).
28
The monotonically decreasing / increasing functions
given in the graphs in Section-5
Given that for each value of p there is a unique q, it
follows that different sets of parameters29 a and b exist.
The following figure shows the different sets (of a and b)
for the particular sample considered in this paper.
8.2 GOODNESS OF FIT
As may be seen from Step-4, Section 7.2, a triangular
distribution is applied to the simulated E data.
It is appreciated that:
1. The sample size, obtained through simulations,
considered herein is small (just 20). A larger sample should
however be considered in practice, to yield more
reliable estimates.
2. Also, other distributions might be a better fit. While the
aspect of goodness of fit30,31
is not attended to in this
paper, the following is a frequency analysis to present
some idea on the level of appropriateness in adopting
the triangular distribution for the set of data considered.
29
The parameter c (or m, see note-26) however remains the
same. For the case under consideration, a and b being
departures from the most likely estimate (the mode), c is ‘0’.
30
For details references can be made to: NPTEL, Lecture
No. # 28, Goodness of Fit, Stochastic Hydrology, Prof. P.
P. Mujumdar, Department of Civil Engineering, Indian
Institute of Science, Bangalore
31
For details the following may also be referred to:
Statistical Methods in Hydrology, Charles T. Haan, The
Iowa State University Press, 1977.
Table 1-4, presented in Section 7.2, provides
probabilities corresponding to the sample population
shown in the Frequency Table above.
3. Continuing from above, probabilities with a Normal
Distribution corresponding to the same set follow. The
differences are obvious when seen against those for the
triangular distribution given in Table 1-4.
4. The following is a graphical representation.
5. Implications of “Goodness of Fit”, using the aforesaid
results, can be perceived from the following :
Triangular distribution:
P [E <= 0.060]= 0.7(i.e. 70%)
Normal distribution:
P [E <= 0.104]= 0.7 (i.e. 70%)
P [E < = 0.060] = 0.324 (i.e. 32%)
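A comparison of this kind can be sketched as follows. Since the simulated data of Table 1-2 are not reproduced in this text, the sketch draws its own synthetic sample of 20 values from a triangular distribution with the supports reported in Section 7.2; the sample, the seed and the variable names are therefore assumptions, and the printed numbers will not match the values quoted above.

```python
import random
from statistics import NormalDist

# Synthetic stand-in for the simulated data (ASSUMED): 20 draws from a triangular
# distribution with the reported supports a = -0.250, c = 0, b = 0.258.
rng = random.Random(7)
sample = [rng.triangular(-0.250, 0.258, 0.0) for _ in range(20)]  # (low, high, mode)

# Frequency analysis: share of the sample at or below the value of interest.
threshold = 0.06
empirical = sum(1 for v in sample if v <= threshold) / len(sample)

# Normal distribution fitted to the same sample by its mean and standard deviation.
normal_fit = NormalDist.from_samples(sample)
p_normal = normal_fit.cdf(threshold)      # P[E <= 0.06] under the normal fit
x70 = normal_fit.inv_cdf(0.7)             # value with 70 % probability of non-exceedance

print("empirical share <= 0.06     :", round(empirical, 3))
print("normal fit  P[E <= 0.06]    :", round(p_normal, 3))
print("normal fit  70 % value of E :", round(x70, 3))
```

Comparing the empirical share with the probabilities from each fitted distribution gives a first, informal indication of fit; a formal goodness-of-fit test would be needed for a firm conclusion.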
9. SUMMARY
While it is a practice for Sensitivity Analysis32
to ascertain
the worst scenario involving the negative effects of all the
deciding parameters, often the same is not practiced for
Quantitative Risk Analysis in Road Infrastructure
Development Projects. This paper attempts to present a
means for ascertaining probabilities of such combined
impacts. Further, this allows for opportunities to reallocate
resources at a future date (subsequent to an Economic
Analysis) in the event that changes occur frequently in a
particular parameter (say33, a decrease in Traffic Growth
rate, which thus necessitates a reduction in the Project Cost
so as to keep the EIRR within an acceptable limit).
However the paper restricts itself to bi-variate continuous
random variables. Given that Traffic and Project-Cost are
the primary elements which get considered in an economic
analysis, a bi-variate distribution however finds relevance.
Applicable fundamentals of probability / stochastic analysis
are revisited for working engineers to have a ready
appreciation. It is appreciated that the same approach may
even be extended to other domains of engineering by
highway engineers.
REFERENCES:
1. Guidelines for the Economic Analysis of Projects,
Economic and Development Resource Center, the
Asian Development Bank, February 1997
2. NPTEL, Lectures on Stochastic Hydrology, Prof. P. P.
Mujumdar, Department of Civil Engineering, Indian
Institute of Science, Bangalore
3. Procedural Guide to Economic Road Feasibility
Studies, MOWHC, Government of Uganda, (March
2006)
4. http://en.wikipedia.org/wiki/Triangular_distribution
5. http://www.asianscientist.com/books/wp-content/uploads/2013/06/5720_chap1.pdf
32
Footnote 22 may be referred to
33
Section 7.3 elaborates this with an example