Prob-Dist-Toll-Forecast-Uncertainty

December 16, 2015
In [2]: # special IPython command to prepare the notebook for matplotlib
%matplotlib inline
import numpy as np
import pandas as pd
import scipy as sp
import seaborn as sns
import matplotlib.pyplot as plt
Estimating the probability distribution of a travel demand forecast
Authors: John L. Bowman, Dinesh Gopinath, and Moshe Ben-Akiva
0.0.1 Algorithm
1. Identify variables that induce error in Toll Revenue prediction: $x = (x_1, x_2, ..., x_k, ..., x_K)$
• Simple Toll Revenue Model - Variables that induce error in the Toll Revenue prediction $r^{(p)}$ are: (1) Value of Time, (2) Population, (3) Households, (4) Employment
• Let: $x_1$ = Value of Time, $x_2$ = Population, $x_3$ = Households, $x_4$ = Employment. Thus $K = 4$, i.e. a 4-dimensional space of possible outcomes $x_k$
• $x = (x_1, x_2, x_3, x_4)$
2. Obtain the probability distribution of $x_k$ for $k = 1, 2, ..., K$. The distribution can be based on: (a) direct input or (b) an assumption, e.g. Triangular, Normal, etc. For each dimension $k$, discretize the assumed probability distribution and identify a small set of discrete outcomes $x_k^{n_k}$, where $n_k = 1, 2, 3, ..., N_k$. Assign probabilities $p(x_k^{n_k})$ to these discrete outcomes, based on reasoning and empirical evidence, to approximate the true distribution of $x_k$:
• Let $N_1 = 4$, $N_2 = 3$, $N_3 = 5$, $N_4 = 4$ and $x_k = \{x_k^1, x_k^2, x_k^3, ..., x_k^{N_k}\}$
• $x_1$ discrete outcomes = $\{x_1^1, x_1^2, x_1^3, x_1^4\}$, with $p(x_1^1) + p(x_1^2) + p(x_1^3) + p(x_1^4) = 1$
• $x_2$ discrete outcomes = $\{x_2^1, x_2^2, x_2^3\}$, with $p(x_2^1) + p(x_2^2) + p(x_2^3) = 1$
• $x_3$ discrete outcomes = $\{x_3^1, x_3^2, x_3^3, x_3^4, x_3^5\}$, with $p(x_3^1) + p(x_3^2) + p(x_3^3) + p(x_3^4) + p(x_3^5) = 1$
• $x_4$ discrete outcomes = $\{x_4^1, x_4^2, x_4^3, x_4^4\}$, with $p(x_4^1) + p(x_4^2) + p(x_4^3) + p(x_4^4) = 1$
3. Develop the Toll Revenue Model for the Baseline Scenario: get the predicted $r^{(p)}_{base}$ for the Baseline Scenario $x = (x_1^{base}, x_2^{base}, x_3^{base}, x_4^{base})$ from the output of the model
4. Run the Toll Revenue Model once for each variable that induces error in the prediction:
• Get predicted $r^{(p)}_{k=1}$ based on $x = (x_1^{extreme}, x_2^{base}, x_3^{base}, x_4^{base})$
• Get predicted $r^{(p)}_{k=2}$ based on $x = (x_1^{base}, x_2^{extreme}, x_3^{base}, x_4^{base})$
• Get predicted $r^{(p)}_{k=3}$ based on $x = (x_1^{base}, x_2^{base}, x_3^{extreme}, x_4^{base})$
• Get predicted $r^{(p)}_{k=4}$ based on $x = (x_1^{base}, x_2^{base}, x_3^{base}, x_4^{extreme})$
5. Calculate the relative change in Predicted Toll Revenue and in each variable that induces error in the prediction:
• $r_k^{change} = \frac{r^{(p)}_{base} - r^{(p)}_k}{r^{(p)}_k}$, where $k = 1, 2, 3, 4$
• $x_k^{change} = \frac{x_k^{base} - x_k^{extreme}}{x_k^{extreme}}$, where $k = 1, 2, 3, 4$
6. Calculate the Elasticity of Toll Revenue with respect to each variable that induces error in the prediction:
• $e_k^r = \frac{r_k^{change}}{x_k^{change}}$, where $k = 1, 2, 3, 4$
7. Define a set of scenarios: $S = \{(x_1^{n_1}, ..., x_k^{n_k}, ..., x_K^{n_K});\ n_k = 1, 2, 3, ..., N_k;\ k = 1, 2, ..., K\}$, covering all combinations of the discrete outcomes in all $K = 4$ dimensions
• For the simple example, $S = \{(x_1^1, x_2^1, x_3^1, x_4^1), (x_1^1, x_2^1, x_3^1, x_4^2), ..., (x_1^1, x_2^3, x_3^5, x_4^4), ..., (x_1^4, x_2^3, x_3^5, x_4^4)\}$
• Number of scenarios in $S$ = $\prod_{k=1}^{K} N_k$. Thus the number of scenarios in the simple example is $4 \times 3 \times 5 \times 4 = 240$, and $s = 1, 2, 3, ..., 240$
• Using $s$ as a 1-dimensional index of the members of $S$, refer to a single member of $S$ as $x^{(s)} = (x_1^{(s)}, x_2^{(s)}, ..., x_k^{(s)}, ..., x_K^{(s)})$. For the simple example: $x^{(s=1)} = (x_1^1, x_2^1, x_3^1, x_4^1)$; $x^{(s=2)} = (x_1^1, x_2^1, x_3^1, x_4^2)$; $x^{(s=240)} = (x_1^4, x_2^3, x_3^5, x_4^4)$
8. Calculate the probability of each scenario: the error variables are mutually independent, thus the probability of each scenario is given by $p(s) = \prod_{k=1}^{K} p(x_k^{(s)}),\ s \in S$. For the simple example:
• $p(s = 1) = \prod_{k=1}^{4} p(x_k^{(s=1)}) = p(x_1^1)\,p(x_2^1)\,p(x_3^1)\,p(x_4^1)$
• $p(s = 2) = \prod_{k=1}^{4} p(x_k^{(s=2)}) = p(x_1^1)\,p(x_2^1)\,p(x_3^1)\,p(x_4^2)$
• $p(s = 240) = \prod_{k=1}^{4} p(x_k^{(s=240)}) = p(x_1^4)\,p(x_2^3)\,p(x_3^5)\,p(x_4^4)$
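9. Calculate the Toll Revenue for scenario $s$: $r^{(s)} = r^{(p)}_{base} \prod_{k=1}^{K} \left(\frac{x_k^{(s)}}{x_k^{base}}\right)^{e_k^r},\ s \in S$. For the simple example:
• $r^{(s=1)} = r^{(p)}_{base} \prod_{k=1}^{4} \left(\frac{x_k^{(s=1)}}{x_k^{base}}\right)^{e_k^r} = r^{(p)}_{base} \left(\frac{x_1^1}{x_1^{base}}\right)^{e_1^r} \left(\frac{x_2^1}{x_2^{base}}\right)^{e_2^r} \left(\frac{x_3^1}{x_3^{base}}\right)^{e_3^r} \left(\frac{x_4^1}{x_4^{base}}\right)^{e_4^r}$
• $r^{(s=2)} = r^{(p)}_{base} \prod_{k=1}^{4} \left(\frac{x_k^{(s=2)}}{x_k^{base}}\right)^{e_k^r} = r^{(p)}_{base} \left(\frac{x_1^1}{x_1^{base}}\right)^{e_1^r} \left(\frac{x_2^1}{x_2^{base}}\right)^{e_2^r} \left(\frac{x_3^1}{x_3^{base}}\right)^{e_3^r} \left(\frac{x_4^2}{x_4^{base}}\right)^{e_4^r}$
• $r^{(s=240)} = r^{(p)}_{base} \prod_{k=1}^{4} \left(\frac{x_k^{(s=240)}}{x_k^{base}}\right)^{e_k^r} = r^{(p)}_{base} \left(\frac{x_1^4}{x_1^{base}}\right)^{e_1^r} \left(\frac{x_2^3}{x_2^{base}}\right)^{e_2^r} \left(\frac{x_3^5}{x_3^{base}}\right)^{e_3^r} \left(\frac{x_4^4}{x_4^{base}}\right)^{e_4^r}$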
10. Using the pairs $r^{(s)}$ and $p(s)$, plot the Revenue CDF
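The ten steps can be sketched end to end on a toy problem. The block below is a minimal illustration, assuming K = 2 variables and made-up numbers (none of the values come from the project data); it computes the elasticities from one extreme run per variable, enumerates the scenarios, and accumulates probability over revenue-sorted scenarios to obtain the CDF points.

```python
import itertools
import math

# Hypothetical toy inputs (not project data), with K = 2 variables.
r_base = 100.0             # baseline revenue from the model
x_base = [10.0, 50.0]      # baseline values of x_1, x_2
x_extreme = [8.0, 40.0]    # one extreme value per variable (step 4)
r_extreme = [85.0, 70.0]   # revenue from the model run at each extreme

# Steps 5-6: relative changes and elasticity of revenue for each variable.
elasticity = []
for k in range(len(x_base)):
    r_change = (r_base - r_extreme[k]) / r_extreme[k]
    x_change = (x_base[k] - x_extreme[k]) / x_extreme[k]
    elasticity.append(r_change / x_change)

# Step 2: discrete outcomes and their probabilities (each list sums to 1).
outcomes = [[8.0, 10.0, 12.0], [40.0, 50.0]]
probs = [[0.25, 0.50, 0.25], [0.4, 0.6]]

# Steps 7-9: enumerate all scenarios; multiply probabilities (independence)
# and scale baseline revenue by (x_k / x_k_base) ** e_k for each variable.
scenarios = []
for idx in itertools.product(*(range(len(o)) for o in outcomes)):
    p_s = math.prod(probs[k][n] for k, n in enumerate(idx))
    r_s = r_base * math.prod(
        (outcomes[k][n] / x_base[k]) ** elasticity[k] for k, n in enumerate(idx)
    )
    scenarios.append((r_s, p_s))

# Step 10: sort by revenue and accumulate probability to get the CDF points.
scenarios.sort()
cdf = list(itertools.accumulate(p for _, p in scenarios))
for (r_s, _), c in zip(scenarios, cdf):
    print('%8.2f  %.3f' % (r_s, c))
```

Plotting the revenue column against the cumulative column gives the step-10 curve; the scenario in which every variable sits at its baseline value reproduces the baseline revenue exactly.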
0.0.2 Sources of Uncertainty - Toll (Kockelman)
• Estimates of trip generation
• Estimates of land development
• Models: Trip Generation, Trip Distribution, Mode Choice
• Toll-technology adoption rates
• Heterogeneity in Value of Time (VOT) savings
• Network attributes - Traffic congestion (low-volume corridors have greater uncertainty in their forecasts)
• Uncertainty in land development patterns
• Demographic and employment projections
• Tolling design - shadow tolls (govt. pays the concessionaire an amount based on toll road use - similar to toll free) or user-paid tolls (drivers' willingness to pay is more complex and difficult to understand - more forecasting risk)
• Tolling culture of a region, i.e. the degree to which tolls have been used in the past
• Travel demand model imperfections (heterogeneity of VOT is ignored; variable tolls or HOT lanes that are free at certain hours)
• Competitive advantage - Toll on the only bridge vs toll on freeway - more options to route
• User attributes - toll facilities serving a small market segment of travelers allow more reliable forecasts vs heterogeneous populations
• Road location, configurations
• Demand variations over times of days and days of the year also affect forecast reliability
• Brian and Wilkins (2002) - poorly estimated VOTT’s, economic downturns, mis-prediction of future
land use patterns, lower than predicted time savings, added competition, lower than anticipated truck
usage, high variability in traffic volumes
• Economic growth and related changes in income and employment
• Total Demand Model errors
• Model error in elasticity of demand
• Value of time
• Errors in measurement of network times and costs
• Operating speed
• Roadway improvements
1 Texas North Tarrant Express Segment 3A
• Revenue and Transaction Forecast Year = 2018
2018 Revenue and Transactions
• Forecasted 2018 Annual Project Revenue (000’s 2011 Dollars) = 27612
• Forecasted 2018 Daily Transactions = 40086
Truck VOT Calculations
• SOV VOT - Lognormal distribution with mean = $18.59 and standard deviation = $7.4 (µ = 2.849
and σ = 0.383)
• Coefficient of variation, $C_v^{sov}$ = 7.4 / 18.59 = 0.398
• HHM Truck VOT: Mean = $36.48 and Standard deviation = $30.24
• AECOM Truck VOT: Mean = $60.76 and Standard deviation = $51.08
• Average Truck VOT = (HHM Truck VOT + AECOM Truck VOT) / 2 = $48.62
• Standard deviation of Average Truck VOT = $C_v^{sov}$ × Average Truck VOT = $19.35 (µ = 3.811 and σ = 0.383, calculations below)
• With mean $m$ and standard deviation $s$: $\mu = \ln\left(\frac{m}{\sqrt{1 + s^2/m^2}}\right)$ and $\sigma = \sqrt{\ln\left(1 + s^2/m^2\right)}$
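In [3]: # Parameters for Truck Lognormal Distribution
m = 48.62  # mean of Average Truck VOT
s = 19.35  # standard deviation of Average Truck VOT
# Moment matching: mu and sigma of log(VOT) from the mean and standard deviation of VOT
truck_ln_mu = np.log(m/np.sqrt(1+((s**2)/(m**2))))
truck_ln_sigma = np.sqrt(np.log(1+((s**2)/(m**2))))
print('truck_ln_mu = %1.3f' % truck_ln_mu)
print('truck_ln_sigma = %1.3f' % truck_ln_sigma)
truck_ln_mu = 3.811
truck_ln_sigma = 0.383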
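As a sanity check on the moment matching above, the mean and standard deviation can be recovered from (µ, σ) using the standard lognormal identities mean = exp(µ + σ²/2) and sd = mean · sqrt(exp(σ²) − 1). The sketch below is illustrative only; the function names are hypothetical and not part of the notebook.

```python
import math

def lognormal_params(mean, sd):
    """Convert the mean and sd of a lognormal variable to (mu, sigma) of its log."""
    ratio = 1.0 + (sd ** 2) / (mean ** 2)
    return math.log(mean / math.sqrt(ratio)), math.sqrt(math.log(ratio))

def lognormal_moments(mu, sigma):
    """Inverse mapping: mean and sd implied by (mu, sigma)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, sd

mu, sigma = lognormal_params(48.62, 19.35)
mean, sd = lognormal_moments(mu, sigma)
print('mu = %.3f, sigma = %.3f' % (mu, sigma))
print('mean = %.2f, sd = %.2f' % (mean, sd))
```

If the round trip returns the original mean and standard deviation, the parameters passed to the lognormal density are consistent with the stated Truck VOT moments.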
$r^{(p)}_{base}$ = 74754 (000's 2011 Dollars)
Variables: Sources of Uncertainty
• Truck VOT: $x_1$
– Elasticity of Revenue to Truck VOT = 0.994
– $x_1^{base}$ = $60.76
– Probability distribution: Lognormal with mean = $48.62 and std. dev = $19.35 (µ = 3.811 and σ = 0.383)
• Travel Demand: $x_2$
– Elasticity of Revenue to Demand (Transactions as proxy) = 2.57
– $x_2^{base}$ = 61056
– Probability distribution: Normal with µ = 58871.5 and σ = 2184.5
• Car VOT Growth: $x_3$
– Elasticity of Revenue to Car VOT Growth = 0.19
– $x_3^{base}$ = 2.1%
– Probability distribution: Triangular with Min = 0.5%, Mean = 1.05%, and Max = 2.1%
• Truck VOT Growth: $x_4$
– Elasticity of Revenue to Truck VOT Growth = 0.19
– $x_4^{base}$ = 2.5%
– Probability distribution: Triangular with Min = 0.5%, Mean = 1.25%, and Max = 2.5%
Truck VOT Probability Distribution: Lognormal
In [4]: mu = 3.811
sigma = 0.383
low = 1
high = 120
dx_1 = 2 # Length of interval
# Comb points along x axis
x_1 = np.arange(low, high, dx_1)
# Compute y values: pdf at each value of x
vot_y = (1/(sigma * x_1 * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((np.log(x_1) - mu)/sigma) ** 2)
# Plot the function
plt.figure(figsize = (16, 8))
plt.stem(x_1, vot_y, markerfmt = ' ') # This draws the intervals
plt.xlabel('$x_1$')
plt.ylabel('$p(x_1)$')
plt.title('Discretized Log-Normal Probability Density')
# Approximate each outcome's probability as density times interval width
area = np.sum(dx_1 * vot_y)
print('Probability Sum = %1.4f' % area)
print('N_1 = %d' % len(x_1))
temp1 = np.array([x_1, vot_y * dx_1])
Probability Sum = 0.9946
N_1 = 60
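The cell above approximates each outcome's probability as density times interval width, so the probabilities fall slightly short of 1 because the grid stops at 120. One possible alternative, sketched here as an assumption rather than the notebook's method, takes exact interval masses from the lognormal CDF and renormalizes them; `lognormal_cdf`, `mass`, and `probs` are illustrative names.

```python
import math

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal variable; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 3.811, 0.383
low, high, dx = 1, 120, 2
points = list(range(low, high, dx))   # same grid as x_1: 1, 3, ..., 119

# Probability mass of the interval [x - dx/2, x + dx/2) around each grid point.
mass = [lognormal_cdf(x + dx / 2.0, mu, sigma) - lognormal_cdf(x - dx / 2.0, mu, sigma)
        for x in points]
covered = sum(mass)                   # mass inside [0, 120]; the rest is the upper tail
probs = [m / covered for m in mass]   # renormalize so the outcomes sum to 1

print('N_1 = %d' % len(points))
print('Covered mass = %.4f' % covered)
print('Probability sum = %.4f' % sum(probs))
```

Renormalizing spreads the truncated upper-tail mass proportionally over the retained outcomes, which keeps the later scenario probabilities summing to 1.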
Travel Demand Probability Distribution: Normal
In [5]: # Mean Transactions = (2035 Transactions + 2018 Transactions)/2
print('Mean Transactions = %s' % ((63635+40086)/2.0))
# Std Dev Transactions = (2035 Transactions - 2018 Transactions)/2
print('Std. Dev Transactions = %s' % ((63635-40086)/2.0))
Mean Transactions = 51860.5
Std. Dev Transactions = 11774.5
In [6]: demand_mean = 51860.5
demand_sd = 11774.5
demand_low = demand_mean - 3 * demand_sd # low end of x axis
demand_high = demand_mean + 3 * demand_sd # high end of x axis
dx_2 = 2000 # Length of interval
# Comb points along x axis
x_2 = np.arange(demand_low, demand_high, dx_2)
# Compute y values: pdf at each value of x
demand_y = (1/(demand_sd * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x_2 - demand_mean)/demand_sd) ** 2)
# Plot the function
plt.stem(x_2, demand_y, markerfmt = ' ') # This draws the intervals
plt.xlabel('$Demand$')
plt.ylabel('$p(Demand)$')
plt.title('Discretized Normal Probability Density')
# Approximate each outcome's probability as density times interval width
area = np.sum(dx_2 * demand_y)
print('Probability Sum = %1.4f' % area)
print('N_2 = %d' % len(x_2))
N_2 = 36
Car VOT Growth Probability Distribution: Triangular
In [7]: min_growth_car = 0.004
mean_growth_car = 0.0105
max_growth_car = 0.022
# np.random.triangular takes (left, mode, right); mean_growth_car is passed as the mode
car_array = np.random.triangular(min_growth_car, mean_growth_car, max_growth_car, size = 100000)
#plt.hist(car_array, bins = 10)
car_val = np.histogram(car_array, bins = 20)
# Relative frequency of each bin
car_y = [float(i)/np.sum(car_val[0]) for i in car_val[0]]
# Use the bin midpoints as the discrete outcomes
x_car = car_val[1]
x_3 = []
for i in range(len(x_car) - 1):
    temp = (x_car[i] + x_car[i+1])/2
    x_3.append(temp)
# Plot triangular distribution
plt.stem(x_3, car_y, markerfmt = ' ') # This draws the intervals
plt.xlabel('$Car Growth$')
plt.ylabel('$p(Car Growth)$')
plt.title('Discretized Triangular Probability Density')
print(len(x_3))
print(np.sum(car_y))
20
1.0
Truck VOT Growth Probability Distribution: Triangular
In [8]: min_growth_truck = 0.004
mean_growth_truck = 0.0125
max_growth_truck = 0.026
# np.random.triangular takes (left, mode, right); mean_growth_truck is passed as the mode
# The size argument is cut off in the source; 100000 is assumed, matching the car VOT growth cell
truck_array = np.random.triangular(min_growth_truck, mean_growth_truck, max_growth_truck, size = 100000)
#plt.hist(truck_array, bins = 10)
truck_val = np.histogram(truck_array, bins = 20)
# Relative frequency of each bin
truck_y = [float(i)/np.sum(truck_val[0]) for i in truck_val[0]]
# Use the bin midpoints as the discrete outcomes
x_truck = truck_val[1]
x_4 = []
for i in range(len(x_truck) - 1):
    temp = (x_truck[i] + x_truck[i+1])/2
    x_4.append(temp)
# Plot triangular distribution
plt.stem(x_4, truck_y, markerfmt = ' ') # This draws the intervals
plt.xlabel('$Truck Growth$')
plt.ylabel('$p(Truck Growth)$')
plt.title('Discretized Triangular Probability Density')
print(len(x_4))
print(np.sum(truck_y))
20
1.0
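The two triangular cells estimate bin probabilities by Monte Carlo sampling, so the bin edges and probabilities change slightly from run to run. A deterministic alternative, offered here only as a sketch, computes each bin's probability from the closed-form triangular CDF over equal-width bins spanning the full [min, max] range. The helper names `triangular_cdf` and `discretize_triangular` are hypothetical.

```python
def triangular_cdf(x, left, mode, right):
    """CDF of a triangular distribution on [left, right] with the given mode."""
    if x <= left:
        return 0.0
    if x >= right:
        return 1.0
    if x <= mode:
        return (x - left) ** 2 / ((right - left) * (mode - left))
    return 1.0 - (right - x) ** 2 / ((right - left) * (right - mode))

def discretize_triangular(left, mode, right, bins=20):
    """Return bin midpoints and exact bin probabilities over equal-width bins."""
    width = (right - left) / float(bins)
    edges = [left + i * width for i in range(bins + 1)]
    mids = [(edges[i] + edges[i + 1]) / 2.0 for i in range(bins)]
    probs = [triangular_cdf(edges[i + 1], left, mode, right)
             - triangular_cdf(edges[i], left, mode, right) for i in range(bins)]
    return mids, probs

# Same parameters as the truck VOT growth cell above.
x_4_exact, truck_y_exact = discretize_triangular(0.004, 0.0125, 0.026)
print(len(x_4_exact))
print(sum(truck_y_exact))
```

Because the sampled histogram only spans the smallest and largest draws, its midpoints will differ slightly from these exact ones.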
Scenarios
In [9]: # All combinations of the discrete outcomes of the four variables
S = [[i, j, k, l] for i in x_1 for j in x_2 for k in x_3 for l in x_4]
print(S[0])
print('\n')
print('Number of Scenarios = ' + str(len(S)))
[1, 16537.0, 0.0044911329557659595, 0.0045996835827973384]
Number of Scenarios = 864000
Probability and Revenue Calculations for Scenarios
In [10]: # Constants: Base Revenue
rp_base = 27612
# Constants: Base values of variables
x_1b = 60.76
x_2b = 40086
x_3b = 0.021
x_4b = 0.025
# Constants: Elasticities of variables
e_x1 = 0.994
e_x2 = 2.57
e_x3 = 0.19
e_x4 = 0.19
revenue_S = []
prob_S = []
for i in range(len(S)):
    # R(s): base revenue scaled by (x_k / x_k base) ** elasticity for each variable
    temp_rev = rp_base * (S[i][0]/x_1b)**(e_x1) * (S[i][1]/x_2b)**(e_x2) * (S[i][2]/x_3b)**(e_x3) * (S[i][3]/x_4b)**(e_x4)
    revenue_S.append(temp_rev)
    # Probability calculation:
    # The ends of the next two formulas are cut off in the source; the completion
    # (density times interval width, as in temp1 above) is assumed
    # Truck VOT
    p_x1 = (1/(sigma * S[i][0] * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((np.log(S[i][0]) - mu)/sigma) ** 2) * dx_1
    # Demand
    p_x2 = (1/(demand_sd * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((S[i][1] - demand_mean)/demand_sd) ** 2) * dx_2
    # Car VOT Growth
    if S[i][2] in x_3:
        cp = x_3.index(S[i][2])
        p_x3 = car_y[cp]
    # Truck VOT Growth
    if S[i][3] in x_4:
        tp = x_4.index(S[i][3])
        p_x4 = truck_y[tp]
    # Independence: scenario probability is the product of the four probabilities
    prob_S.append(p_x1 * p_x2 * p_x3 * p_x4)
print('Probability Sum = %0.4f' % np.sum(prob_S))
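The loop above visits 864000 scenarios one at a time. As a hedged alternative, the same products can be formed with NumPy broadcasting, adding one array axis per variable. The function name `scenario_revenue_and_prob` and the toy grids are assumptions for illustration; the notebook's `x_1` to `x_4` and their probability arrays could be passed in the same way.

```python
import numpy as np

def scenario_revenue_and_prob(values, probs, base_values, elasticities, base_revenue):
    """Evaluate every combination of discrete outcomes at once.

    values, probs: lists of 1-D arrays, one pair per uncertain variable.
    Returns flat arrays of scenario revenue and scenario probability, ordered
    with the last variable varying fastest (same order as the nested list S).
    """
    revenue = np.array(base_revenue, dtype=float)
    prob = np.array(1.0)
    for x, p, xb, e in zip(values, probs, base_values, elasticities):
        x = np.asarray(x, dtype=float)
        p = np.asarray(p, dtype=float)
        # Append a trailing axis for this variable and broadcast against it.
        revenue = revenue[..., np.newaxis] * (x / xb) ** e
        prob = prob[..., np.newaxis] * p
    return revenue.ravel(), prob.ravel()

# Hypothetical toy grids, not the notebook's discretized distributions.
values = [np.array([40.0, 60.0, 80.0]), np.array([30000.0, 40000.0])]
probs = [np.array([0.3, 0.5, 0.2]), np.array([0.6, 0.4])]
rev, p = scenario_revenue_and_prob(values, probs, [60.0, 40000.0], [0.994, 2.57], 27612.0)
print(rev.shape, p.sum())
```

The result holds one revenue and one probability per scenario, so it can feed the sorting and cumulative-sum steps unchanged.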
In [11]: # Sorting Result based on Revenue
output = (np.array([revenue_S, prob_S])).T
output = output[output[:, 0].argsort()]
In [12]: # Plotting Cumulative Probability Distribution
# output[:, 0] holds scenario revenue and output[:, 1] scenario probability
plt.plot(output[:,0], np.cumsum(output[:,1]), linewidth = 2)
# Plotting Predicted Revenue
plt.axvline(x = rp_base, color = 'r')
# Place the label next to the predicted-revenue line
plt.text(rp_base + 500, 0.1, 'Predicted Revenue for 2018', fontsize = 16)
# Remove Scientific Notation
ax = plt.gca()
ax.get_xaxis().get_major_formatter().set_scientific(False)
plt.xlabel('$Revenue$', fontsize = 16)
plt.ylabel('$p(Revenue)$', fontsize = 16)
plt.title("Cumulative Probability Distribution - Revenue (000's 2011 Dollars)", fontsize = 16)
# Set tick label size
plt.tick_params(axis = 'both', which = 'major', labelsize = 14)
print('Probability Sum = %0.4f' % np.sum(prob_S))
print('Demand std = %d' % demand_sd)
Demand std = 11774
Percentile Calculation
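In [14]: year = 2018
cum_prob = pd.DataFrame({'Revenue': output[:,0], 'Cumulative Probability': np.cumsum(output[:,1])})
# P(Revenue < r) = percentile -> Find r
def revenue_at(p):
    # Revenue of the first scenario whose cumulative probability exceeds p
    return cum_prob['Revenue'][cum_prob[cum_prob['Cumulative Probability'] <= p].shape[0]]
# Only the 75th percentile line appears in the source; the others are assumed to follow the same pattern
# 75+
percentile_95 = revenue_at(0.95)
percentile_85 = revenue_at(0.85)
percentile_75 = revenue_at(0.75)
# 25-
percentile_25 = revenue_at(0.25)
percentile_15 = revenue_at(0.15)
percentile_05 = revenue_at(0.05)
# Print values
print('75th Percentile of ' + str(year) + ' Revenue = %0.2f' % percentile_75)
print(percentile_95)
print(percentile_85)
print(percentile_75)
print(percentile_25)
print(percentile_15)
print(percentile_05)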
75th Percentile of 2018 Revenue = 49146.23
91921.6153431
62173.6864964
49146.2293139
18857.5804073
14082.965342
8193.3714675
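The percentile lookup can also be written directly on the revenue and probability arrays, without a DataFrame. The sketch below assumes a hypothetical helper `revenue_percentile` and toy inputs; it follows the same rule as above (the first scenario whose cumulative probability exceeds the target) and additionally normalizes the probabilities in case they do not sum exactly to 1.

```python
import numpy as np

def revenue_percentile(revenue, prob, q):
    """Smallest revenue whose cumulative probability exceeds q (0 < q < 1)."""
    revenue = np.asarray(revenue, dtype=float)
    prob = np.asarray(prob, dtype=float)
    order = np.argsort(revenue)
    cum = np.cumsum(prob[order]) / prob.sum()   # normalize in case mass is slightly off 1
    idx = np.searchsorted(cum, q, side='right') # number of scenarios with cum <= q
    idx = min(idx, len(cum) - 1)                # guard against rounding at the top end
    return revenue[order][idx]

# Hypothetical toy scenarios, not notebook output.
rev = [30.0, 10.0, 20.0, 40.0]
p = [0.2, 0.1, 0.3, 0.4]
for q in (0.25, 0.75):
    print('%d%% -> %.1f' % (q * 100, revenue_percentile(rev, p, q)))
```

With the notebook's data, `revenue_S` and `prob_S` would take the place of the toy lists.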