CLIM: Transition Workshop - Modeling Weather-related House Insurance Claims with Machine Learning Approach - Asim Dey, May 15, 2018

Modeling Weather-induced Home Insurance Risks: A Machine Learning Approach
Asim Dey
The University of Texas at Dallas
joint with
Yulia R. Gel, The University of Texas at Dallas
Slava Lyubchich, University of Maryland Center for Environmental Science

Outline
1 Introduction
2 Data
3 Methods
Support Vector Regression (SVR)
Neural Network (NN)
4 Prediction
5 Uncertainty due to Climate Models
6 Future Work
Asim Dey 2/24

US trend in heavy precipitation
Figure: 1. The relative occurrence of 2-day precipitation totals that are exceeded on average only once in a five-year period; changes are relative to the 1901–1960 average (Source: GlobalChange.gov).
Since 2008, the United States has seen six floods costing at least $1 billion each (SOA).

Flood disasters in Canada
[Bar chart: number of flood disasters in Canada (y-axis, 0–60) by year, 1900–1990 (x-axis).]
Figure: 2. Frequency of flood disasters in Canada (figure adapted from Cheng et al. 2012).
In Canada, from 2009 to 2014, total insured losses from catastrophic
events were close to or above $1 billion each year (IBC).

The Intergovernmental Panel on Climate Change (IPCC) has projected
that the severity and frequency of extreme rainfalls will further increase
(IPCC, 2014).

Literature Review
Statistical approaches for claim frequency: GLM (Haug et al., 2011), GARMA (Soliman et al., 2015), Bayesian hierarchical model (Scheel et al., 2013), data-driven nonparametric procedure (Lyubchich and Gel, 2017).
Machine learning techniques for claim frequency: Neural Network (NN)
(Caldeira et al., 2015), SVR (Wu and Akbarov, 2013).
Goals:
1 Model and forecast the joint dynamics of weather-related home insurance claims (frequencies) and losses (severities).
2 Assess the utility of Support Vector Regression (SVR) and Neural Network (NN) in forecasting future claim dynamics.

Data
Table: 1. Overview of the data sets
Period | Data type / climate model | Variables
Control (2002–2011) | Observations | Precipitation, number of claims, aggregate loss
Scenario (2021–2080) | Projections from CanESM2 4.5, CanESM2 8.5, GFDL ESM2M R 8.5, GFDL ESM2M W 8.5, MPI ESM 8.5, HadGEM2 ES 8.5 | Precipitation
The Representative Concentration Pathway 4.5 (RCP 4.5) scenario assumes that emissions peak around 2040 and then decline. Under the RCP 8.5 scenario, emissions continue to rise throughout the period 2000–2100.

Normalized Data
[Figure 3: three weekly time-series panels for 2001–2011: precipitation (mm/week), number of claims per week, and loss (axis in thousand dollars per week).]
Summary statistics shown in the panels:
Precipitation (mm/week): x̄ = 7.698, x̃ = 3.80, s = 10.611
Number of claims/week: x̄ = 3.608, x̃ = 3.0, s = 8.01
Loss/week: x̄ = 46620, x̃ = 22300, s = 215961
Figure: 3. Weekly observed precipitation, number of claims and total loss (2001–2011) in Canadian city A; x̄ is the mean, x̃ the median, and s the standard deviation.

Modeling Precipitation Related Risk
The number of claims at week $t$, $N_t$, and the aggregate loss at week $t$, $L_t$, can be modeled as
$$N_t = f(R_t, R_{t-1}), \tag{1}$$
$$L_t = f(R_t, R_{t-1}, N_t), \tag{2}$$
where $R_t$ is the total precipitation in week $t$ and $R_{t-1}$ is the total precipitation in week $t-1$.
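As a minimal sketch, assuming the weekly series are plain Python lists of equal length, the predictors of Eqs. (1) and (2) can be assembled by pairing each week with the previous week's precipitation. The function name `build_design` and the toy numbers are hypothetical; the first week is dropped because its lagged precipitation is unknown.

```python
from typing import List, Tuple

def build_design(precip: List[float], claims: List[float]) -> Tuple[List[List[float]], List[List[float]]]:
    """Return predictor rows for the claims model (Eq. 1) and the loss model (Eq. 2).

    Row for week t >= 1 of the claims design is [R_t, R_{t-1}];
    row for week t of the loss design is [R_t, R_{t-1}, N_t].
    Week 0 is skipped because R_{t-1} is unavailable for it.
    """
    if len(precip) != len(claims):
        raise ValueError("precipitation and claims series must have equal length")
    claims_X = [[precip[t], precip[t - 1]] for t in range(1, len(precip))]
    loss_X = [[precip[t], precip[t - 1], claims[t]] for t in range(1, len(precip))]
    return claims_X, loss_X

# Toy weekly series (illustrative numbers, not the study data).
R = [2.0, 15.5, 0.0, 40.2]
N = [1, 4, 0, 9]
claims_X, loss_X = build_design(R, N)
print(claims_X)  # [[15.5, 2.0], [0.0, 15.5], [40.2, 0.0]]
print(loss_X)    # [[15.5, 2.0, 4], [0.0, 15.5, 0], [40.2, 0.0, 9]]
```

The matching responses are the series shifted the same way, e.g. `N[1:]` for the claims model and the weekly losses from week 1 onward for the loss model.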

Support Vector Regression (SVR)
Least squares regression minimizes the in-sample residual sum of squares.
SVR instead attempts to minimize a bound on the generalization error (Eq. 4).
The SVR solution depends only on a subset of the training data: the observations whose residuals lie on or outside the ε-tube (|residual| ≥ ε).
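[Plot of V_ε(r) against the residual r, with ticks at −ε, 0 and ε: the function is zero for |r| ≤ ε and increases linearly outside this interval.]
Figure: 4. SVR ε-insensitive error function.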
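$$f(x) = \langle \beta, x \rangle + \beta_0, \tag{3}$$
$$H(\beta, \beta_0) = \sum_{i=1}^{n} V_\varepsilon\big(y_i - f(x_i)\big) + \frac{\lambda}{2}\,\lVert \beta \rVert^2, \tag{4}$$
$$k(x_i, x) = \exp\!\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right). \tag{5}$$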
Minimizing the loss function (4) leads to a quadratic programming problem.
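To make Eqs. (3) to (5) concrete, here is a minimal pure-Python sketch that evaluates the ε-insensitive error, the Gaussian kernel and the objective H for a linear f. It only evaluates the objective; it does not solve the quadratic program, and the function names are hypothetical. In practice a QP-based solver would be used; for example, scikit-learn's `sklearn.svm.SVR(kernel="rbf", C=..., epsilon=..., gamma=...)` minimizes C·ΣV_ε + ½‖β‖², which corresponds to Eq. (4) with C = 1/λ and gamma = 1/(2σ²). Which software the authors used is not stated here.

```python
import math
from typing import List, Sequence

def eps_insensitive(r: float, eps: float) -> float:
    """V_eps(r): zero when |r| <= eps, |r| - eps otherwise."""
    return max(0.0, abs(r) - eps)

def gaussian_kernel(xi: Sequence[float], x: Sequence[float], sigma2: float) -> float:
    """Eq. (5): k(x_i, x) = exp(-||x - x_i||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2.0 * sigma2))

def svr_objective(beta: Sequence[float], beta0: float, X: List[Sequence[float]],
                  y: Sequence[float], lam: float, eps: float) -> float:
    """Eq. (4) for the linear f of Eq. (3): sum_i V_eps(y_i - f(x_i)) + (lam / 2) * ||beta||^2."""
    def f(x: Sequence[float]) -> float:
        return sum(b * v for b, v in zip(beta, x)) + beta0   # <beta, x> + beta0

    data_term = sum(eps_insensitive(yi - f(xi), eps) for xi, yi in zip(X, y))
    penalty = 0.5 * lam * sum(b * b for b in beta)
    return data_term + penalty

# Toy data: only the second residual (0.2) leaves the eps = 0.1 tube.
X = [[0.0], [1.0], [2.0]]
y = [0.0, 1.2, 1.9]
H = svr_objective(beta=[1.0], beta0=0.0, X=X, y=y, lam=0.2, eps=0.1)
print(round(H, 6))  # 0.2
```

Residuals inside the tube contribute nothing to H, which is why only the support vectors shape the fitted function.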

Genetic Algorithm (GA)
The results of SVR modeling depend strongly on three user-defined parameters (hyper-parameters):
1 the regularization parameter (λ),
2 the tube size of the ε-insensitive loss function (ε),
3 the bandwidth of the kernel function (σ²).
An inappropriate choice of hyper-parameters leads to over-fitting or under-fitting.
A Genetic Algorithm (GA) is applied to optimize all three SVR hyper-parameters simultaneously (Goldberg, 1989).

Genetic Algorithm (GA), continued
Each generation is like an iteration in a numerical optimization problem: at each iteration, the objective function improves progressively.
1 Define the parameters (C, σ, ε), the population size, the fitness function and the stopping criteria.
2 Generate an initial random population and set i = 1.
3 Train the SVR model and calculate the fitness.
4 If i = n, go to step 5; otherwise, create a new population by reproduction, crossover and mutation, set i = i + 1, and return to step 3.
5 Select the optimal (C, σ, ε).
6 Train the SVR model using the obtained hyper-parameters.
Figure: 5. Flow-chart of the GA-SVR.

Genetic Algorithm (GA)
Table: 2. GA parameter settings
Parameter | Value
Number of generations (stopping criterion) | 1000
Population size | 50
Fitness function | RMSE
Search domain (Hsu et al., 2003):
λ | (10⁻³, 10³)
σ² | (10⁻³, 2⁴)
ε | (10⁻², 2³)
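The GA loop of Figure 5 can be sketched in pure Python as follows, using the population size and search domain of Table 2 (reading the upper bounds of σ² and ε as 2⁴ and 2³). Everything else is an assumption for illustration: the genome is encoded on a log10 scale, selection is by 3-way tournament, crossover is a random convex combination, mutation is Gaussian, the best candidate is always kept (elitism), and `toy_rmse` is a hypothetical stand-in for the cross-validated RMSE of an SVR fit. The demo also runs far fewer than the 1000 generations of Table 2.

```python
import math
import random
from typing import Callable, List, Tuple

# Search domain of Table 2 on a log10 scale: (lambda, sigma^2, epsilon).
BOUNDS = [(-3.0, 3.0),                 # lambda in (1e-3, 1e3)
          (-3.0, math.log10(2 ** 4)),  # sigma^2 in (1e-3, 2^4)
          (-2.0, math.log10(2 ** 3))]  # epsilon in (1e-2, 2^3)

Genome = List[float]

def clip(g: Genome) -> Genome:
    """Keep every gene inside its search interval."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(g, BOUNDS)]

def ga_minimize(fitness: Callable[[Genome], float], pop_size: int = 50,
                generations: int = 100, mutation_sd: float = 0.1,
                seed: int = 0) -> Tuple[Genome, float]:
    rng = random.Random(seed)
    # Initial random population, uniform over the (log-scale) search domain.
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness)          # lower RMSE = fitter
        new_pop = [scored[0]]                      # elitism: keep the best candidate
        while len(new_pop) < pop_size:
            # Reproduction: tournament selection of two parents.
            p1 = min(rng.sample(scored, 3), key=fitness)
            p2 = min(rng.sample(scored, 3), key=fitness)
            # Crossover: random convex combination of the parents.
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            # Mutation: small Gaussian perturbation, clipped to the bounds.
            child = clip([v + rng.gauss(0.0, mutation_sd) for v in child])
            new_pop.append(child)
        pop = new_pop
    best = min(pop, key=fitness)
    return best, fitness(best)

def toy_rmse(g: Genome) -> float:
    """Hypothetical stand-in for the cross-validated RMSE of an SVR fit."""
    target = [0.0, math.log10(0.5), -1.0]   # pretend optimum: lambda=1, sigma^2=0.5, eps=0.1
    return 1.0 + sum((a - b) ** 2 for a, b in zip(g, target))

best, score = ga_minimize(toy_rmse)
lam, sigma2, eps = (10 ** v for v in best)
print(f"lambda={lam:.3g}, sigma^2={sigma2:.3g}, eps={eps:.3g}, fitness={score:.4f}")
```

Searching on a log scale lets the GA explore several orders of magnitude of λ, σ² and ε evenly, which suits domains spanning 10⁻³ to 10³.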

Claims model: Neural Network (NN)
NN models the response as a nonlinear function of linear combinations of the
predictors.
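[Diagram: input layer X1, X2, X3, X4 connected directly to output layer Y.]
Figure 6(a): Linear Regression Model.
$$Y = g\Big(\sum_{j=1}^{4} \beta_j X_j\Big) + \varepsilon, \qquad g(v) = v,$$
[Diagram: input layer X1, X2, X3, X4, hidden layer Z1, Z2, and output layer Y.]
Figure 6(b): Neural Network Model.
$$Z_j = f\Big(\alpha_{0j} + \sum_{i=1}^{4} \alpha_{ij} X_i\Big), \qquad Y = g\Big(\beta_0 + \sum_{j=1}^{2} \beta_j Z_j\Big) + \varepsilon,$$
where $f(v) = 1/(1 + e^{-v})$ is the sigmoid activation function. The unknown parameters $\alpha_{ij}$, $\alpha_{0j}$, $\beta_j$ and $\beta_0$ are estimated by back-propagation (Bishop, 2006).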
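A minimal pure-Python sketch of the 4-2-1 network in Figure 6(b), with sigmoid hidden nodes, an identity output g, and a full-batch back-propagation step on the mean squared error. The class name `TinyNN`, the random initialization, the learning rate, the squared-error loss and the toy data are assumptions for illustration; the training settings actually used in the study are not given here.

```python
import math
import random
from typing import List, Sequence

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

class TinyNN:
    """Network of Figure 6(b): 4 inputs, 2 sigmoid hidden nodes, identity output g(v) = v."""

    def __init__(self, n_in: int = 4, n_hidden: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.alpha = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]  # alpha[i][j]
        self.alpha0 = [0.0] * n_hidden
        self.beta = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.beta0 = 0.0

    def hidden(self, x: Sequence[float]) -> List[float]:
        # Z_j = f(alpha_0j + sum_i alpha_ij * X_i)
        return [sigmoid(self.alpha0[j] + sum(self.alpha[i][j] * x[i] for i in range(len(x))))
                for j in range(len(self.beta))]

    def predict(self, x: Sequence[float]) -> float:
        # Y_hat = g(beta_0 + sum_j beta_j * Z_j), with g the identity
        return self.beta0 + sum(b * z for b, z in zip(self.beta, self.hidden(x)))

    def mse(self, X: List[Sequence[float]], y: Sequence[float]) -> float:
        return sum((self.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)

    def backprop_step(self, X: List[Sequence[float]], y: Sequence[float], lr: float = 0.1) -> None:
        """One full-batch gradient-descent step on the mean squared error."""
        n, h = len(y), len(self.beta)
        g_alpha = [[0.0] * h for _ in self.alpha]
        g_alpha0, g_beta, g_beta0 = [0.0] * h, [0.0] * h, 0.0
        for x, t in zip(X, y):
            z = self.hidden(x)
            d = 2.0 * (self.beta0 + sum(b * zj for b, zj in zip(self.beta, z)) - t) / n  # dMSE/dY_hat
            g_beta0 += d
            for j, zj in enumerate(z):
                g_beta[j] += d * zj
                delta = d * self.beta[j] * zj * (1.0 - zj)   # chain rule through the sigmoid
                g_alpha0[j] += delta
                for i, xi in enumerate(x):
                    g_alpha[i][j] += delta * xi
        # Apply all updates only after the gradients have been accumulated.
        self.beta0 -= lr * g_beta0
        for j in range(h):
            self.beta[j] -= lr * g_beta[j]
            self.alpha0[j] -= lr * g_alpha0[j]
            for i in range(len(self.alpha)):
                self.alpha[i][j] -= lr * g_alpha[i][j]

# Toy inputs and targets (illustrative only).
X = [[0.0, 0.1, 0.2, 0.3], [0.5, 0.4, 0.3, 0.2], [0.9, 0.8, 0.7, 0.6], [0.2, 0.2, 0.2, 0.2]]
y = [0.6, 1.4, 3.0, 0.8]
net = TinyNN()
before = net.mse(X, y)
for _ in range(200):
    net.backprop_step(X, y)
after = net.mse(X, y)
print(f"MSE before: {before:.3f}, after 200 steps: {after:.3f}")
```

With g the identity and no hidden layer, the same model collapses to the linear regression of Figure 6(a); the hidden sigmoid layer is what makes the response nonlinear in the predictors.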

Number of hidden layers and nodes in a hidden layer
Rule of thumb: a network with one hidden layer can approximate any function that contains a continuous mapping from one finite space to another (Heaton, 2008).
[Figure 7: two panels, "Number of claim model" and "Loss Model", each plotting the test error (rRMSE) against the number of hidden neurons.]
Figure 7: Selection of the number of nodes in the hidden layer; the number of resamples K is 100.

Model Selection
[Figure 8: two time-series panels, 2008–2012: weekly number of claims, and weekly total loss (thousands of 2002 CAD); each shows the observed series and the fitted values from GA-SVR, GLM and NN.]
Figure 8: Observed values versus fitted values from three different models.
GA-SVR captures the variability of the observed data better than the other two models, particularly the sudden high spikes in the number of claims and total losses.

Cross-validation
Table: Number-of-claims model: cross-validated normalized average RMSE (E_m), City A
Model | 5-year RMSE, E_m (%)
NN | 29
SVR | 27
GA-SVR | 24
$$E_m = \frac{\mathrm{aveRMSE}_m}{m \times \frac{1}{N}\sum_{t=1}^{N} y_t} \times 100\%, \tag{6}$$
where $\mathrm{aveRMSE}_m = \frac{1}{K}\sum_{i=1}^{K} \mathrm{RMSE}_{mi}$, the denominator of Eq. (6) is the average total number of claims (or average total loss) in a period of $m$ years, $N = 365 \times 10$, and $K = 100$ is the number of resamples.
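Eq. (6) can be computed as in the short sketch below, assuming the per-resample RMSE values have already been obtained from the K train/test splits. The helper names `rmse` and `normalized_ave_rmse` and the numbers in the example are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence

def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error of one resample's test predictions."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))

def normalized_ave_rmse(rmse_per_resample: Sequence[float], y: Sequence[float], m: int) -> float:
    """Eq. (6): E_m = aveRMSE_m / (m * (1/N) * sum_t y_t) * 100 (percent)."""
    ave_rmse = mean(rmse_per_resample)   # (1/K) * sum_{i=1}^{K} RMSE_{m i}
    denominator = m * mean(y)            # m * (1/N) * sum_{t=1}^{N} y_t
    return 100.0 * ave_rmse / denominator

# Illustrative numbers only: K = 3 resamples, m = 5 years.
print(normalized_ave_rmse([2.0, 3.0, 4.0], y=[1.0, 2.0, 3.0], m=5))  # 30.0
```

Dividing by the average total makes E_m unit-free, so the claims model and the loss model can be compared on the same percentage scale.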

Prediction
The change from the control period (2002–2011) to each of the six 10-year sub-periods of the projection (scenario) period (2021–2080) is
$$\Delta = \frac{\sum_{t \in \mathrm{scn}} \hat{Y}_t / 10}{\sum_{t \in \mathrm{ctr}} Y_t / 10} - 1, \tag{7}$$
where 'scn' and 'ctr' refer to the scenario period and the control period, respectively.
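A minimal sketch of Eq. (7), assuming the predicted scenario values and the observed control values are weekly series covering 10 years each. The function name `relative_change` and the constant weekly counts are hypothetical.

```python
from typing import Sequence

def relative_change(scn: Sequence[float], ctr: Sequence[float],
                    scn_years: int = 10, ctr_years: int = 10) -> float:
    """Eq. (7): (sum of predicted Y_hat_t over scenario / 10) / (sum of observed Y_t over control / 10) - 1."""
    return (sum(scn) / scn_years) / (sum(ctr) / ctr_years) - 1.0

# Hypothetical weekly claim counts over 520 weeks (about 10 years):
# 1040 claims in the control decade, 1300 projected for a scenario decade.
control = [2.0] * 520
projected = [2.5] * 520
print(f"{100 * relative_change(projected, control):.1f}%")  # 25.0%
```

Dividing each sum by its number of years turns it into an average annual total, so Δ is the relative change in the annual number of claims (or annual loss) reported in Figures 9 to 11.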

Risk Prediction using GA-SVR
[Figure 9: GA-SVR model; percentage change in the annual number of claims and in the annual aggregate loss (y-axis, 0–50%) for each sub-period from 2021–2030 to 2071–2080.]
Figure 9: Projected percentage changes relative to the control period of
2002–2011. Climate model is CanESM2 4.5.
The annual number of home insurance claims and the annual aggregate loss are projected to increase noticeably in the scenario periods.

GA-SVR claims
[GA-SVR model; percentage change in the annual number of claims (y-axis, 0–60%) for each sub-period from 2021–2030 to 2071–2080, under the climate scenarios CanESM2 4.5, CanESM2 8.5, MPI ESM 8.5, GFDL ESM2M R 8.5, GFDL ESM2M W 8.5 and HadGEM2 ES 8.5.]
Figure 10: Predicted claims from different climate models.

GA-SVR losses
[GA-SVR model; percentage change in the annual aggregate loss (y-axis, 0–50%) for each sub-period from 2021–2030 to 2071–2080, under the climate scenarios CanESM2 4.5, CanESM2 8.5, MPI ESM 8.5, GFDL ESM2M R 8.5, GFDL ESM2M W 8.5 and HadGEM2 ES 8.5.]
Figure 11: Predicted losses from different climate models.

Quantile Regression (QR)
Number of claims at week $t$, $N_t$:
$$N_t = f(R_t, R_{t-1}),$$
where $R_t$ is the total precipitation in week $t$ and $R_{t-1}$ is the total precipitation in week $t-1$.
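[Figure 12: two panels, pcp0 (precipitation in week t) and pcp0l1 (precipitation lagged by one week), each plotting the regression coefficient (y-axis) against the quantile level from 0 to 1 (x-axis).]
Figure 12: Regression coefficients (y-axis) for different quantiles of the number of claims (NCL).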
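Assuming standard linear quantile regression, which the per-quantile coefficients in Figure 12 suggest, the τ-th conditional quantile is modeled as Q_τ(N_t | R_t, R_{t−1}) = β₀(τ) + β₁(τ)R_t + β₂(τ)R_{t−1}, and the coefficients minimize the sum of check (pinball) losses ρ_τ(r) = r(τ − 1{r < 0}) of the residuals. The sketch below shows only the check loss on the intercept-only case, where the minimizer is the sample τ-quantile; the helper names are hypothetical. For the full regression, a library routine such as statsmodels' `QuantReg` could be used, though the software used in the study is not stated here.

```python
from typing import Sequence

def pinball_loss(residual: float, tau: float) -> float:
    """Check loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def total_check_loss(y: Sequence[float], c: float, tau: float) -> float:
    """Sum of check losses when every observation is predicted by the constant c."""
    return sum(pinball_loss(yi - c, tau) for yi in y)

def best_constant_quantile(y: Sequence[float], tau: float) -> float:
    """Minimize sum_t rho_tau(y_t - c) over candidate constants c taken from the data."""
    return min(y, key=lambda c: total_check_loss(y, c, tau))

weekly_claims = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(best_constant_quantile(weekly_claims, 0.5))  # 5.0
print(best_constant_quantile(weekly_claims, 0.9))  # 9.0
```

Because under-predictions are weighted by τ and over-predictions by 1 − τ, high τ pushes the fit toward the upper tail, which is how quantile regression captures the heavy weeks of claims that a mean model smooths over.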

Observed vs predicted quantiles
Figure 13: Observed and fitted quantiles for the weekly number of claims for probabilities from 1% to 99%.

Predictions from different climate models
[Quantiles of the weekly number of claims (y-axis, 0–25) for the periods 2002–2012, 2021–2030 and 2061–2070: observed, and predicted under CanESM2 rcp45 CanRCM4, CanESM2 rcp85 CanRCM4, GFDL ESM2M rcp85 RegCM4, GFDL ESM2M rcp85 WRF, HadGEM2 ES rcp85 RegCM4 and MPI ESM LR rcp85 RegCM4.]
Figure 14: Observed and predicted quantiles for the weekly number of claims for probabilities from 1% to 99%.

Future Work
Combine predictions from different climate models into a single distribution (ensembles of climate models):
1 Bayesian Models (Smith et al., 2009).
2 Bayesian Hierarchical Models (Sansom et al., 2017).
Expand the spatial domain of our analysis to other cities.
Incorporate other information, e.g., seasonal component, location and value
of assets.
