Time Series Analysis of Monthly Paid Life Insurance Claims:
A Nonstationary Seasonal Model Approach
Jonathan A. Poon
Abstract
The intent of this study is to identify a model that best fits monthly paid life insurance claims at
company ABC and to forecast how much the company can expect to pay out in the near future. Using
combined first differencing and maximum likelihood estimation, an ARIMA(4,1,0) x ARIMA(1,1,0)12 model
was found to fit the data most adequately. The suggested model is questionable on one point: a potential
outlier hints at possible inadequacy. In April 2013, company ABC paid out an unusually high amount in
life insurance claims (the highest in the thirteen-year period investigated). This may have resulted from
improvements in the company's claims process or from measures introduced to help customers submit claims
correctly and more quickly. To protect the company's privacy, this possible outlier was not investigated
further. Additional models were also considered. However, thorough model diagnostics showed that these
candidate models violated the normality assumption, even though their estimated parameters were
significant. Applying the suggested model to forecasting, the company should have expected
$7,450,880.00, $6,357,658.00 and $4,863,679.00 in paid life insurance claims during September, October
and November 2013, respectively. The actual figures for these months were already known but were withheld
from model fitting. Forecasts through December 2015 were also reviewed to indicate how much the company
can expect to pay out in life insurance claims in the near future.
I. Introduction
The term life insurance largely speaks for itself. It is an agreement between an insured (policy holder)
and an insurer (insurance provider) that promises to pay money to the insured's beneficiary (husband,
wife, children, etc.) should the insured die. It stems from an old principle founded in Paris during the
17th century by Italian investor Lorenzo Tonti. Tonti's Tontine Annuity System attempted to apply the
law of averages and the principle of life expectancy in establishing annuities. A group of individuals,
regardless of age, made equal contributions towards a single sum of money. This sum was then invested,
and at the end of each year the interest was distributed among the contributors. In Tonti's system
contributors were considered as survivors (Steuer, 3-4).
Under Tonti's system, the last surviving contributor received both that year's interest and the entire
principal. However, as more contributors demanded to be insured for larger sums, the risk of violence
among contributors also increased. It was then that the Law of Large Numbers was partly discovered.
Applied to life insurance, the Law of Large Numbers captures the intuition that the more people who
contribute (and who are exposed to the risk of death), the more money comes in, so that the death of any
one person does not affect the remaining group of contributors. Today, life insurance functions as a
safeguard against misfortune by having the losses of the unfortunate few paid for by the contributions of
the many who are exposed to the same peril. In essence, it is the sharing of losses and, in the process,
the substitution of a certain small "loss" (the premium payment) for an uncertain large loss (Steuer, 4).
Along with purchasing life insurance comes the possibility of making a claim. A claim is a request
that a beneficiary must file upon the policy holder's death. When a person purchases one or more life
insurance plans, they should decide which type of payment plan will apply in case of a future claim on
their policy. If no such decision was made, various payment plans can still be offered. In his book,
Steuer describes these options: 1) Lump sum - the life insurance company provides the entire death
benefit to the insured's beneficiary in one lump sum. This allows the beneficiary to do whatever they want
with it. 2) Specific income provision - on a set schedule, the life insurance company pays the claimant
both the principal and the interest of the life insurance benefit. 3) Life income option - a guaranteed
income for life. The amount of income is based on the death benefit amount and on the claimant's age and
gender at the time of the insured's death. 4) Interest income option - the life insurance company holds
all proceeds and pays out only the interest to the primary beneficiary. The death benefit is paid out only
when the primary beneficiary dies, specifically to a second beneficiary. These plans are geared towards
situations in which the beneficiary is unable to manage money at the time of filing the claim (Steuer, 259).
Looking simply at what a life insurance claim is, one might suppose that a company able to collect a
large amount of premiums for the life insurance it sells should be able to pay out death benefits when the
time comes. However, it is not so simple. Such complexities in the insurance industry are handled by
professionals known as actuaries: mathematicians in the insurance industry who "calculate premiums,
reserves, dividends and insurance, pension and annuity rates, using risk factors obtained from experience
tables" (Steuer, 4). These experience tables are based on, but not limited to, an insurance company's
history of insurance claims. Therefore, the aim of this study is to 1) identify a possible candidate model
that describes monthly paid life insurance claims from January 2001 to August 2013 at insurance company
ABC and 2) forecast future monthly claims paid out to the company's life insurance buyers, assuming
monthly amounts remain consistent with the observed data. By studying this topic, the insurance company
can gain a perspective on how much it can expect to pay out in life insurance claims in the near future.
Forecast analysis could allow the following questions to be considered: 1) "Can these total monthly
amounts tell us how well we're processing incoming claim requests?", 2) "Based on the forecast, should we
look into improving our claims process?" and 3) "Are life claim settlement forecasts much higher than what
we expect?". Forecasting monthly paid life insurance claims for company ABC could also aid in calculating
future premiums and reserves for their life insurance products.
II. Data
I gathered the claims data from the insurance company at which I am currently employed, referred to
anonymously in this study as company ABC. The data are monthly paid life insurance claims spanning
January 2001 to November 2013, a total of 155 data points. Each data point represents the total amount of
life insurance claims (in dollars) company ABC paid out in that month. The months of September, October
and November 2013 were left out of the analysis so that they could be compared against the proposed
model's forecasts, leaving 152 observations for model fitting. During the study a noticeable potential
outlier was detected, but because of omitted information and the need to protect company privacy, its
cause could not be established. The study proceeds as follows: 1) Model Specification, 2) Parameter
Estimation (Model Fitting), 3) Model Diagnostics, 4) Forecasting Analysis and 5) Discussion.
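As a quick sanity check on the sample sizes, the months in each date range can be counted directly. The sketch below is illustrative only; the helper name `months_inclusive` is an assumption introduced here, and the date ranges are the ones stated in the text. The fitting-sample count it produces matches the n = 152 used later in the diagnostics.

```python
from datetime import date


def months_inclusive(start: date, end: date) -> int:
    """Number of calendar months from start to end, counting both endpoints."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# Full dataset: January 2001 through November 2013.
total = months_inclusive(date(2001, 1, 1), date(2013, 11, 1))
# Fitting sample: January 2001 through August 2013.
training = months_inclusive(date(2001, 1, 1), date(2013, 8, 1))
# Held-out months: September, October and November 2013.
holdout = total - training

print(total, training, holdout)
```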
III. Model Specification
The first step in model specification is to look at the observed data in its original form. The plot in
Figure 1 on page 5 is a time series plot of the series {Yt} = monthly paid life insurance claims Y at time t.
The plot suggests paid claims in the life insurance sector have a steadily increasing trend over time and a
within-year seasonal pattern. Notice that from 2001 to 2004 the monthly amounts of life insurance claims
paid out stayed close together from month to month. Thereafter, however, the month-to-month variation
increases. In some periods, mainly from 2008 onward, extremely low and extremely high monthly amounts
were paid out. Looking at mean levels year to year shows that the mean amount of monthly paid life
insurance claims is not constant. For instance, the mean monthly claims for the years 2005 to 2007 were
$4,972,819.74, $5,140,671.21 and $5,341,678.09, respectively. Figure 3 in the Appendix is a plot of the
autocorrelations of the time series across lag k, for k = 1, 2, 3, ..., 36, more commonly known as an
Autocorrelation Function (ACF) plot. Notice the spikes in autocorrelation at the seasonal lags k = 12, 24
and 36. The plot also shows slow decay in the autocorrelations as lag k increases, which in theory
suggests the series in its original form is not a stationary process.
Figure 1: Time Series Plot of Monthly Paid Life Insurance Claims at Company ABC
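The sample autocorrelations behind an ACF plot are straightforward to compute. The following is a minimal sketch, not the code used in the study: the function `sample_acf` is a hypothetical helper, and the series is a synthetic stand-in (a mild trend plus a 12-month cycle) because the claims data are not public. It only illustrates why seasonal lags stand out.

```python
import math


def sample_acf(y, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the sequence y."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Synthetic stand-in for the claims series (assumption, not the real data):
# a mild upward trend plus a 12-month cycle, 152 monthly values.
series = [0.01 * t + math.sin(2 * math.pi * t / 12) for t in range(152)]

acf = sample_acf(series, 36)
for k in (6, 12, 24, 36):
    print(f"lag {k:2d}: r = {acf[k]:+.3f}")
```

On a series like this, the autocorrelation at lag 12 is strongly positive while the half-period lag 6 is negative, which is the kind of seasonal signature described above.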
Based on this evidence, {Yt} is judged to be a nonstationary process. In order for the data to be
properly modeled, any trend or seasonal pattern must be removed from the series. This approach is known
as "detrending"; here it is carried out by taking the combined (nonseasonal and seasonal) first
differences of the time series. This method falls under a class of models defined as Multiplicative
Nonstationary Seasonal Autoregressive Integrated Moving Average (SARIMA) models with seasonal period s,
denoted by ARIMA(p,d,q) x ARIMA(P,D,Q)s (Cryer & Chan, 234). The following discussion gives a brief
description of the SARIMA family of models and the method of combined first differencing. It then
presents possible multiplicative SARIMA processes that adequately represent the stochastic progression
of the monthly paid life insurance claims data over time.
In an ARIMA(p,d,q) x ARIMA(P,D,Q)s model, the character "p" represents the order of the
autoregressive component, "d" represents the number of nonseasonal differences needed on the series to
achieve stationarity, and "q" represents the order of the moving average component. In this study, since
the data has seasonal patterns, a multiplicative model consisting of both nonseasonal and seasonal
components must be applied. In the seasonal part of the model, the character "P" represents the order of
the seasonal autoregressive component, "D" represents the number of seasonal differences needed on the
series to achieve stationarity, and "Q" represents the order of the seasonal moving average component.
The character "s" represents the seasonal period. Typically "s" is equal to either 12 (for monthly data) or 4
(for quarterly data). In this study, we know immediately that s = 12. The general form of the
ARIMA(p,d,q) x ARIMA(P,D,Q)s process in backshift notation is
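$$\phi(B)\,\Phi(B)\,\nabla^{d}\nabla_{s}^{D}Y_t = \theta(B)\,\Theta(B)\,e_t \tag{1}$$
where {et} is zero mean white noise with variance var(et) = $\sigma_e^2$; the error terms et are
additionally assumed to be normally distributed. In equation (1), we are defining the monthly paid life
insurance claims process {Yt} as a multiplicative SARIMA model with d nonseasonal differences and D
seasonal differences. "The backshift operator, denoted by B, operates on the time index of the series and
shifts time back one time unit to form a new series" (Cryer & Chan, 106). In particular,
$$BY_t = Y_{t-1} \tag{2}$$
(Cryer & Chan, 106). The nonseasonal autoregressive (AR) and moving average (MA) characteristic
operators, respectively, are
$$\phi(B) = 1 - \phi_1 B - \phi_2 B^{2} - \cdots - \phi_p B^{p} \tag{3}$$
$$\theta(B) = 1 - \theta_1 B - \theta_2 B^{2} - \cdots - \theta_q B^{q} \tag{4}$$
and the seasonal autoregressive (AR) and moving average (MA) characteristic operators, respectively,
are
$$\Phi(B) = 1 - \Phi_1 B^{s} - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps} \tag{5}$$
$$\Theta(B) = 1 - \Theta_1 B^{s} - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs} \tag{6}$$
and
$$\nabla^{d}\nabla_{s}^{D}Y_t = (1 - B)^{d}(1 - B^{s})^{D}Y_t \tag{7}$$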
In most SARIMA class models, characters "d" and "D" are usually equal to either 1 or 2. In this
study, we take into account only the first differences.
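As a worked step (added for illustration, using the standard definition of the backshift operator), taking d = D = 1 and s = 12 and expanding the combined differencing operator shows exactly which past values enter each differenced observation:

```latex
\nabla\nabla_{12} Y_t
  = (1 - B)(1 - B^{12})\,Y_t
  = (1 - B - B^{12} + B^{13})\,Y_t
  = Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13}.
```

Each differenced value therefore needs 13 earlier observations, so a fitting sample of 152 months yields 152 - 13 = 139 combined first differences.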
Figure 2 on page 8 depicts how the series {Yt} is modified by differencing and why combined first
differences should be used. Notice the plot in the upper right corner. This plot shows the series {Yt}
after only nonseasonal differences, that is d = 1, were taken. Seasonal patterns are still present across
this new series. Therefore, taking only nonseasonal differences is not sufficient. The plot in the bottom
left corner shows the series {Yt} after only seasonal differences, that is D = 1 and s = 12, were taken.
In this plot the seasonal patterns are practically removed. However, there are stretches where
neighboring data points tend to "hang together" rather than oscillate about the mean. A slight sinusoidal
movement in the level is also visible, suggesting that some trend is still present. The goal, as
mentioned earlier, is to remove any trend in order to achieve stationarity. Therefore, taking only seasonal
differences is not sufficient either. Finally, applying both the seasonal and nonseasonal differences to
the original time series (bottom right corner), that is d = D = 1 and s = 12, gives a plot with all the
characteristics of a stationary time series: no noticeable trend (whether positive or negative) and no
seasonal pattern remain. Second differences are therefore not pursued, and first differences will be our
stopping point.
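The effect of the three differencing schemes can be reproduced on a toy series. This is a sketch under stated assumptions: the helper `difference` and the synthetic series (a linear trend plus a fixed 12-month pattern) are invented for illustration and are much cleaner than the real claims data, where seasonal differencing alone left some trend-like movement rather than a constant.

```python
def difference(y, lag=1):
    """Return y[t] - y[t - lag] for t = lag, ..., len(y) - 1."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


# Synthetic series (assumption): linear trend plus a fixed 12-month pattern.
pattern = [3, 1, 0, -2, -1, 2, 4, 0, -3, -1, 1, 2]
y = [100 + 5 * t + pattern[t % 12] for t in range(152)]

d1 = difference(y, lag=1)        # nonseasonal only: the seasonal pattern remains
d12 = difference(y, lag=12)      # seasonal only: a constant level (12 * slope) remains
d1_d12 = difference(d12, lag=1)  # combined: trend and seasonal pattern both removed

print(len(d1), len(d12), len(d1_d12))
print(sorted(set(d12)), sorted(set(d1_d12)))
```

For this idealized series the combined differences are identically zero; with real data they would instead look like a stationary series fluctuating around zero.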
Figure 2: Differencing Methods Applied to Time Series Data. Upper Left: Original Time
Series {Yt}. Upper Right: Nonseasonal First Differences of Original Time Series, that is $\nabla Y_t$.
Bottom Left: Seasonal First Differences of Original Time Series, that is $\nabla_{12} Y_t$. Bottom Right:
Combined First Differences (Seasonal & Nonseasonal) of Original Time Series, that is
$\nabla\nabla_{12} Y_t$.
Second differences were not considered because of the result of the augmented Dickey-Fuller unit root
test. This procedure tests the following:
H0 : α = 1 (nonstationarity).
vs.
H1 : α < 1 (stationarity).
where H0 is the null hypothesis, which is assumed to be true unless the data provide evidence against it,
and H1 is the alternative hypothesis, for which evidence is sought. In this study, the test was applied to
the combined first differences of {Yt}. According to the results (Output-1 in the Appendix), the p-value
is less than 0.01. Therefore, we have significant evidence to reject the null hypothesis. That is, we have
evidence that the combined first differences of {Yt} form a stationary process.
In order to obtain appropriate values for p, q, P, and Q, the sample ACF and PACF (Partial
Autocorrelation Function) of the combined first differences are plotted. In the sample ACF plot (left) in
Figure 4 in the Appendix, the only noticeable spikes are at nonseasonal lag k = 1 and seasonal lag k = 12,
with autocorrelation negligible elsewhere. In the sample PACF plot (right) in Figure 4, there are
noticeable spikes at nonseasonal lags k = 1, 2, 4 and 11, and at seasonal lag k = 12. Also, beyond
seasonal lag k = 12 of the PACF, notice how the correlation decays exponentially as lag k increases.
Seasonal lags 12, 24 and 36 also show a sinusoidal pattern. That is, across successive seasonal lags the
correlation alternates from negative to positive to negative. Therefore, the sample ACF and PACF plots
suggest the following model,
ARIMA(4,1,0) x ARIMA(1,1,0)12
From the proposed model, we will proceed to parameter estimation and assess whether the model fits the
monthly paid life insurance claims data adequately. Two additional models will also be considered because
their parameter estimates are significant.
IV. Parameter Estimation (Model Fitting)
An ARIMA(p,d,q) x ARIMA(P,D,Q)s process with nonseasonal characteristics p = 4, d = 1, and
q = 0; and seasonal characteristics P = 1, D = 1 and Q = 0 can be expressed as an ARIMA(4,1,0) x
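ARIMA(1,1,0)12. By applying equation (1), which is built from equations (2)-(7), this process can be
expressed as,
$$(1 - \phi_1 B - \phi_2 B^{2} - \phi_3 B^{3} - \phi_4 B^{4})(1 - \Phi_1 B^{12})\,\nabla\nabla_{12}Y_t = e_t \tag{8}$$
or, equivalently,
$$(1 - \phi_1 B - \phi_2 B^{2} - \phi_3 B^{3} - \phi_4 B^{4})(1 - \Phi_1 B^{12})(1 - B)(1 - B^{12})Y_t = e_t \tag{9}$$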
In order to estimate the parameters in either equation (8) or (9), the method of maximum likelihood
estimation (MLE) is applied. Cryer and Chan describe the method as follows:
"for any set of observations, Y1, Y2,..., Yn, time series or not, the likelihood function L is defined
to be the joint probability density of obtaining the data actually observed. However, it is
considered as a function of the unknown parameters in the model with observed data held fixed.
For ARIMA models, L will be a function of the ϕ's, θ's, Θ's, Φ's, μ and $\sigma_e^2$ given the observations
Y1, Y2,..., Yn. The maximum likelihood estimators are then defined as those values of the
parameters for which the data actually observed are most likely, that is, the values that maximize
the likelihood function" (Cryer & Chan, 158).
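According to Output-2 in the Appendix, the parameter estimates are $\hat{\phi}_1$ = -0.8075,
$\hat{\phi}_2$ = -0.8026, $\hat{\phi}_3$ = -0.482, $\hat{\phi}_4$ = -0.3837 and $\hat{\Phi}_1$ = -0.5154.
Substituting these into our proposed SARIMA model gives the following equation (10):
$$(1 + 0.8075B + 0.8026B^{2} + 0.482B^{3} + 0.3837B^{4})(1 + 0.5154B^{12})(1 - B)(1 - B^{12})Y_t = e_t \tag{10}$$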
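As an illustrative aside, the operator product in the fitted model can be multiplied out numerically to see which lags of Yt enter the equivalent difference equation. The sketch below is not from the original analysis: `poly_mul` is a hypothetical helper, and the fourth AR estimate is taken as negative, which is an inference from its reported confidence interval.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# Estimates quoted in the text; the negative sign of phi_4 is an assumption
# inferred from its reported confidence interval.
phi = [-0.8075, -0.8026, -0.482, -0.3837]
Phi1 = -0.5154

ar = [1.0] + [-p for p in phi]        # 1 - phi_1 B - ... - phi_4 B^4
sar = [1.0] + [0.0] * 11 + [-Phi1]    # 1 - Phi_1 B^12
d = [1.0, -1.0]                       # 1 - B
d12 = [1.0] + [0.0] * 11 + [-1.0]     # 1 - B^12

# Coefficients of the full operator applied to Y_t on the left-hand side.
full = poly_mul(poly_mul(ar, sar), poly_mul(d, d12))
for power, coef in enumerate(full):
    if abs(coef) > 1e-12:
        print(f"B^{power}: {coef:+.4f}")
```

The expanded operator has degree 29, so each observation is related to values up to 29 months back. Its coefficients sum to zero because of the (1 - B) factor, which reflects the unit root removed by differencing.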
By methods of MLE, we can construct large-sample confidence intervals for each parameter estimate with
the following equation,
$$\text{parameter estimate} \pm z_{\alpha/2}\,\mathrm{SE}(\text{parameter estimate}) \tag{11}$$
where, for α = 0.05, $z_{\alpha/2}$ = 1.96 (the 0.975 quantile of the standard normal distribution), assuming that
$e_t \sim N(0, \sigma_e^2)$. SE(parameter estimate) is the approximate standard error of the estimated
parameter. These values can be seen in Output-2 in the Appendix (labeled as s.e.). The confidence interval
shows whether or not a parameter estimate for the proposed model is significantly different from zero: if
the estimate is not significant, its confidence interval will contain zero. Applying equation (11) to the
parameter estimates and their respective standard errors, the 95% confidence intervals (significance
level α = 0.05) were calculated to be (-1.01235, -0.60265), (-1.04899, -0.55621), (-0.73226, -0.23174),
(-0.59603, -0.17137) and (-0.71561, -0.31519) for parameters ϕ1, ϕ2, ϕ3, ϕ4 and Φ1, respectively. None of
these confidence intervals contains zero. We conclude that the parameters of our proposed model, estimated
by MLE, are significantly different from zero. We can now proceed to check the assumptions of normality
and independence of the residuals, et, in equation (10), and assess whether or not the proposed model is
appropriate in light of the observed monthly paid life insurance claims data. The method of overfitting
will also be applied to confirm that more complex models built from the proposed model offer no
meaningful improvement.
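The reported intervals can be checked against the point estimates by working backwards: the midpoint of each interval is the estimate, and the half-width divided by 1.96 is the implied standard error. This is a small sketch for illustration; the parameter labels are assumed from the order in which the intervals are listed, and the standard errors it prints are derived values, not figures quoted from Output-2.

```python
Z_975 = 1.96  # standard normal quantile used in the text

# 95% intervals as reported in the text; labels assumed from their listed order.
intervals = {
    "phi_1": (-1.01235, -0.60265),
    "phi_2": (-1.04899, -0.55621),
    "phi_3": (-0.73226, -0.23174),
    "phi_4": (-0.59603, -0.17137),
    "Phi_1": (-0.71561, -0.31519),
}

results = {}
for name, (lower, upper) in intervals.items():
    results[name] = {
        "estimate": (lower + upper) / 2,           # interval midpoint
        "std_err": (upper - lower) / (2 * Z_975),  # implied standard error
        "significant": not (lower <= 0.0 <= upper),
    }

for name, r in results.items():
    print(f"{name}: estimate={r['estimate']:.4f}, "
          f"s.e.={r['std_err']:.4f}, significant={r['significant']}")
```

The midpoint of the fourth interval is negative, which is why the estimate of ϕ4 is treated as -0.3837 here.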
V. Model Diagnostics
To assess the assumption of normality in the proposed model (equation (10) on page 10), a
histogram and QQ plot of the standardized residuals, et, are plotted. In Figure 5 in the Appendix, the
histogram (left) shows that a normal distribution curve is an appropriate fit. In the QQ plot (right) in
Figure 5, notice that all data points (residuals) fall close to the fitted normal line. This further
suggests no gross departure from normality. To test normality of the residuals formally, the Shapiro-Wilk
procedure is applied. In Output-3 in the Appendix, the Shapiro-Wilk procedure tests the following:
H0 : Standardized residuals are normally distributed.
vs.
H1 : Standardized residuals are not normally distributed.
Based on this test, conducted in R, our observed p-value = 0.2361. Therefore, at any reasonable
significance level α, we fail to reject H0. That is, we do not have enough evidence to show that the
standardized residuals are not normally distributed.
To assess the assumption of independence in the proposed model, a time series plot of the
standardized residuals, et, is used. In Figure 6 in the Appendix, the movement of the residuals over time
shows no alarming erratic patterns, and the residuals appear to oscillate about zero. To test independence
of the residuals formally, a hypothesis test known as the Runs Procedure can be applied. It tests the
following:
H0 : Standardized residuals are independent.
vs.
H1 : Standardized residuals are not independent.
Based on this test, conducted in R (Output-4 in the Appendix), our observed p-value = 0.289. Therefore,
at any reasonable significance level α, we fail to reject H0. That is, we do not have enough evidence to
show that the standardized residuals are not independent.
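The idea behind a runs test can be sketched in a few lines. The version below uses the large-sample normal approximation and counts runs above and below a threshold of zero; both choices are assumptions made for illustration, so its p-values may differ from those of the R routine used in the study. The function name `runs_test` and the toy inputs are hypothetical.

```python
import math
from statistics import NormalDist


def runs_test(values, threshold=0.0):
    """Two-sided runs test (normal approximation) about the given threshold."""
    signs = [v > threshold for v in values if v != threshold]
    n1 = sum(signs)          # observations above the threshold
    n2 = len(signs) - n1     # observations below the threshold
    n = n1 + n2
    # A new run starts whenever consecutive observations change side.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, expected, p_value


# Toy inputs (assumptions): strict alternation gives far too many runs,
# two long blocks give far too few; both indicate dependence.
alternating = [1.0, -1.0] * 20
blocked = [1.0] * 20 + [-1.0] * 20

print(runs_test(alternating))
print(runs_test(blocked))
```

Independent residuals should produce a number of runs close to the expected value, giving a large p-value such as the one reported above.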
The residuals of the proposed model for the monthly paid life insurance claims data are thus consistent
with the assumptions of normality and independence. To further assess whether the ARIMA(4,1,0) x
ARIMA(1,1,0)12 is an appropriate fit to the observed data, the Ljung-Box (1978) procedure can be applied.
This procedure, in reference to this study, tests the following:
H0 : The ARIMA(4,1,0) x ARIMA(1,1,0)12 is an appropriate fit.
vs.
H1 : The ARIMA(4,1,0) x ARIMA(1,1,0)12 is not an appropriate fit.
According to this test (Output-5 in the Appendix), at a maximum lag K = 36, the test statistic has a
corresponding p-value = 0.4841. At any reasonable significance level, we fail to reject H0. That is, we do
not have enough evidence to reject that the proposed model is appropriate. Figure 7 in the Appendix
provides a visualization of the model diagnostics, including the Ljung-Box procedure. The top plot is a
time series plot of the standardized residuals. Notice that all but one residual fall between -3 and 3.
In the accompanying R diagnostic output, residuals in this plot are screened for outliers using the
"Bonferroni" criterion. That is, any residual that falls outside the bounds ±z0.025/n ≈ ±3.591368, for
n = number of observed data points used in fitting = 152, is considered an "outlier". The residual in
question is the 148th residual in the plot. Unsurprisingly, this is the month (April 2013; mentioned on
page 1) in which company ABC experienced its highest payout in life insurance claims. This residual,
however, was calculated to be approximately 3.079969, which falls within the Bonferroni bounds. Therefore,
the criterion does not flag this residual as an outlier. Figure 7's middle plot depicts the sample ACF
of the standardized residuals. In this plot, correlations that fall outside the red dotted lines are
statistically significant at the corresponding lag k. What is desired is a residual ACF plot free of
significant correlations, that is, a plot that resembles a white noise process. The only statistically
significant correlation is at lag k = 24, which is a seasonal lag. However, this correlation is
approximately -0.1889 (Output-6 in the Appendix), which is small in magnitude. To further suggest this
autocorrelation is of no concern, refer back to Figure 4 in the Appendix, the sample ACF and PACF of the
combined first differences of the original data. Nothing in these plots suggests that seasonal lag k = 24
is significant in the data. Therefore, there is no reason to suspect dependence at lag k = 24. Further
support is given in the discussion of overfitting the proposed model on page 15.
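The Bonferroni bound quoted above can be reproduced with the standard normal quantile function. This is a minimal sketch using Python's standard library rather than the R code from the study; the variable names are illustrative.

```python
from statistics import NormalDist

n = 152              # observations used for fitting
alpha = 0.05         # overall two-sided level
residual = 3.079969  # standardized residual for April 2013 (from the text)

# Bonferroni: share alpha/2 across n residuals, then take the upper normal quantile.
critical = NormalDist().inv_cdf(1 - (alpha / 2) / n)
# Unadjusted bound for a single residual, for comparison.
ordinary = NormalDist().inv_cdf(1 - alpha / 2)

print(f"Bonferroni bound: +/-{critical:.6f}")
print(f"Unadjusted bound: +/-{ordinary:.6f}")
print(f"Flagged as outlier: {abs(residual) > critical}")
```

The April 2013 residual exceeds the unadjusted bound but not the Bonferroni bound, which is why it is described as a potential outlier rather than a confirmed one.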
As mentioned on page 9, two additional models pass as possible candidate models: 1) ARIMA(0,1,1) x
ARIMA(0,1,1)12 and 2) ARIMA(4,1,0) x ARIMA(0,1,1)12. After thorough model diagnostic comparisons were run,
choosing the ARIMA(4,1,0) x ARIMA(1,1,0)12 model as the suggested fit was a difficult decision. For
instance, notice in Table 1 on page 14 that among all three candidates, the proposed model has the highest
AIC (Akaike's Information Criterion) and white noise variance. Ideally, the chosen model minimizes both
the AIC and the white noise variance. The ARIMA(0,1,1) x ARIMA(0,1,1)12 has the lowest value of both. The
ARIMA(4,1,0) x ARIMA(0,1,1)12 comes close: its AIC and white noise variance are only slightly larger.
However, both of these candidate models fail an important diagnostic, the test for normality of the
residuals. For the ARIMA(0,1,1) x ARIMA(0,1,1)12, the Shapiro-Wilk p-value is 0.00574 (< 0.01).
Therefore, we conclude its residuals do not follow a normal distribution. Normality matters because, in
theory, we expect a stationary SARIMA model to have $e_t \sim N(0, \sigma_e^2)$. As for the
ARIMA(4,1,0) x ARIMA(0,1,1)12, the Shapiro-Wilk test again argues against choosing this model. The
p-value under this test was 0.01442. Although normality would not be rejected at α = .01 (had that level
been declared before the study), it is rejected at α = .05, and I would be hesitant to select this SARIMA
process as my model of choice. In conclusion, the proposed model does not provide the best AIC value or
white noise variance; however, it is the only candidate that meets all the assumptions of an appropriate
stationary SARIMA model.
Table 1: Model Diagnostic Comparisons Amongst Candidate Models
| Diagnostic | ARIMA(4,1,0) x ARIMA(1,1,0)12 | ARIMA(4,1,0) x ARIMA(0,1,1)12 | ARIMA(0,1,1) x ARIMA(0,1,1)12 |
| --- | --- | --- | --- |
| Confidence Intervals of Parameter Estimates | Significant | Significant | Significant |
| AIC | 401.89 | 383.55 | 376.44 |
| White Noise Variance | 0.9446 | 0.7625 | 0.6672 |
| Assumption of Normality (Shapiro-Wilk) | p = 0.2361 | p = 0.01442 | p = 0.00574 |
| Assumption of Independence (Runs) | p = 0.289 | p = 0.289 | p = 0.627 |
| Ljung-Box Test | p = 0.4841 | p = 0.6695 | p = 0.2577 |
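The selection logic described here, screen candidates on the residual diagnostics first and only then compare AIC, can be written down explicitly. The sketch below transcribes the values from Table 1; the screening level of 0.05 and the shortened model labels are assumptions made for illustration.

```python
ALPHA = 0.05  # screening level, an assumption for this sketch

# Values transcribed from Table 1 (model labels shortened).
candidates = {
    "ARIMA(4,1,0)x(1,1,0)12": {"aic": 401.89, "shapiro": 0.2361, "runs": 0.289, "ljung_box": 0.4841},
    "ARIMA(4,1,0)x(0,1,1)12": {"aic": 383.55, "shapiro": 0.01442, "runs": 0.289, "ljung_box": 0.6695},
    "ARIMA(0,1,1)x(0,1,1)12": {"aic": 376.44, "shapiro": 0.00574, "runs": 0.627, "ljung_box": 0.2577},
}


def passes_diagnostics(diag):
    """True when no residual diagnostic rejects its null hypothesis at ALPHA."""
    return all(diag[test] > ALPHA for test in ("shapiro", "runs", "ljung_box"))


admissible = [name for name, diag in candidates.items() if passes_diagnostics(diag)]
best_aic_overall = min(candidates, key=lambda name: candidates[name]["aic"])
chosen = min(admissible, key=lambda name: candidates[name]["aic"])

print("Lowest AIC overall:", best_aic_overall)
print("Passing all diagnostics:", admissible)
print("Chosen model:", chosen)
```

Under this rule the model with the lowest AIC is screened out by the normality test, and the proposed model is the only candidate left.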
With model diagnostics completed, the ARIMA(4,1,0) x ARIMA(1,1,0)12 is suggested as the model of choice
for fitting the monthly paid life insurance claims data. The remainder of this section uses the method of
overfitting to confirm that higher-order, more complex models offer no significant improvement over (or
only add uncertainty to) our proposed model.
To further assess whether the proposed model accurately depicts the observed life insurance claims data,
the following more complex models and their estimated parameters are analyzed: 1) ARIMA(5,1,0) x
ARIMA(1,1,0)12, 2) ARIMA(4,1,1) x ARIMA(1,1,0)12, 3) ARIMA(4,1,0) x ARIMA(1,1,1)12 and 4)
ARIMA(4,1,0) x ARIMA(2,1,0)12. To assess whether the additional parameters are significant, 95%
confidence intervals are computed for each overfitted model. The following conclusions arise from
overfitting (based on Output-7 in the Appendix):
ARIMA(5,1,0) x ARIMA(1,1,0)12: ϕ5 is not significant.
ARIMA(4,1,1) x ARIMA(1,1,0)12: ϕ1, ϕ2, ϕ3 and ϕ4 become insignificant with the addition of θ1.
ARIMA(4,1,0) x ARIMA(1,1,1)12: Φ1 becomes insignificant with the addition of Θ1.
ARIMA(4,1,0) x ARIMA(2,1,0)12: All parameter estimates are significant.
What is peculiar is that overfitting finds the parameter estimates of the ARIMA(4,1,0) x
ARIMA(2,1,0)12 model to be significantly different from zero. However, in the ACF and PACF plots of the
combined first differences (Figure 4 in the Appendix) there is no evidence in the PACF plot of a notable
spike at seasonal lag k = 24. Therefore, this model is not adopted despite its significant parameters.
After thorough model specification, fitting and diagnostics, the ARIMA(4,1,0) x ARIMA(1,1,0)12 model
was determined to be the best candidate model over the ARIMA(4,1,0) x ARIMA(0,1,1)12 and the
ARIMA(0,1,1) x ARIMA(0,1,1)12 models. Although the additional candidate models had promising attributes,
model diagnostics showed otherwise. In the next section, we carry our proposed model into forecasting.
That is, we apply the ARIMA(4,1,0) x ARIMA(1,1,0)12 model to the observed monthly paid life insurance
claims data and predict how much company ABC should expect to pay out per month in claim settlements for
all of their life insurance products combined.
VI. Forecasting Analysis
The R statistical software was used to compute monthly paid life insurance claims forecasts and
95% prediction limits for lead times l = 1, 2, ..., 24 (24 months ahead) based on the proposed
ARIMA(4,1,0) x ARIMA(1,1,0)12 model. The estimated minimum mean squared error (MMSE) forecasts and
their respective 95% prediction limits can be seen in Output-8 in the Appendix. Plotting these forecasts
and limits on the original time series (Figure 1 on page 5) gives a visual indication of how well the
proposed model's forecasts mimic the stochastic periodicity in the monthly paid life insurance claims data.
Figure 8: Model Forecasts Plotted on Monthly Paid Life Insurance Claims Time Series
In Figure 8, solid black dots represent MMSE predictions of the monthly paid life insurance claim
settlements the company could anticipate in the near future. Blue dashed lines represent the 95%
prediction limits of each MMSE prediction. Notice that as lead time l increases, the model's 95%
prediction interval also increases in width. This is due to nonstationarity in the original time series,
which was discovered during the model specification phase. According to Output-8 in the Appendix, we
are 95% confident, based on the fitted ARIMA(4,1,0) x ARIMA(1,1,0)12 model, that company ABC's total life
insurance claims paid out in December 2013 will fall between $5,045,729.00 and $9,126,798.00. Forecasts
for September, October and November 2013 are compared to the actual amounts already observed by the
company. These observations are displayed as solid green dots in Figure 8 on page 16. As mentioned on
page 1, these months were in the life insurance claims dataset but were not used in fitting the model.
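The widening of the prediction limits can be made explicit. In standard ARIMA forecasting theory (as presented, for example, by Cryer & Chan), approximate 95% prediction limits at lead time l take the form below. The symbols $e_t(\ell)$ for the forecast error and $\psi_j$ for the weights of the model's general linear process representation are notation introduced here for illustration and do not appear in the original text.

```latex
\hat{Y}_t(\ell) \;\pm\; z_{0.975}\,\sqrt{\operatorname{Var}\!\big(e_t(\ell)\big)},
\qquad
\operatorname{Var}\!\big(e_t(\ell)\big) \;=\; \sigma_e^{2}\sum_{j=0}^{\ell-1}\psi_j^{2}.
```

For a nonstationary model the $\psi_j$ weights do not die out, so the sum keeps growing with the lead time and the limits continue to widen.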
Table 2: Comparison of Observed (not included in study) vs. Fitted Model Prediction
| Month | Observed | 95% Lower Limit | Prediction | 95% Upper Limit |
| --- | --- | --- | --- | --- |
| September '13 | $7,356,677.98 | $5,545,976.00 | $7,450,880.00 | $9,355,784.00 |
| October '13 | $7,689,063.72 | $4,417,795.00 | $6,357,658.00 | $8,297,520.00 |
| November '13 | $5,498,092.91 | $2,922,172.00 | $4,863,679.00 | $6,805,187.00 |
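The comparison in Table 2 can be summarized with a few simple accuracy measures. The sketch below transcribes the table and computes interval coverage, forecast errors and the mean absolute percentage error; these summary measures are an addition for illustration and were not reported in the study.

```python
# Rows transcribed from Table 2: (month, observed, lower, prediction, upper).
rows = [
    ("Sep 2013", 7356677.98, 5545976.00, 7450880.00, 9355784.00),
    ("Oct 2013", 7689063.72, 4417795.00, 6357658.00, 8297520.00),
    ("Nov 2013", 5498092.91, 2922172.00, 4863679.00, 6805187.00),
]

# Does each observed value fall inside its 95% prediction limits?
covered = [lower <= obs <= upper for _, obs, lower, _, upper in rows]
# Forecast error: observed minus predicted (positive means under-prediction).
errors = [obs - pred for _, obs, _, pred, _ in rows]
# Mean absolute percentage error across the three held-out months.
mape = sum(abs(obs - pred) / obs for _, obs, _, pred, _ in rows) / len(rows)

for (month, *_), inside, err in zip(rows, covered, errors):
    print(f"{month}: inside limits={inside}, error={err:,.2f}")
print(f"MAPE: {mape:.1%}")
```

All three observed amounts fall inside their prediction limits, with September slightly over-predicted and October and November under-predicted.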
The proposed model appears to predict adequately what company ABC should anticipate in total monthly
paid life insurance claims: all three observed amounts fall within their 95% prediction limits.
Therefore, projections can be made further into the future regarding how much the company can expect to
pay out in life insurance claims. According to the predictions, company ABC should expect to pay out
approximately $189,392,938.00 in life insurance claims from December 2013 to December 2015. If this is
considered grossly higher than what company ABC expects, it could be recommended that processes in the
claims department be closely monitored. Regarding claims settlement, Greene explains that the only direct
contact an insurance buyer has in filing a claim is usually with the claims department. If communication
between the insured and the claims department leaves a bad impression, liabilities such as regulatory
censure, lawsuits or even suspension of the right to carry on business in the jurisdiction involved could
follow. On the other hand, if the company notices that "overly liberal life insurance settlements are
occurring, this could result in higher life insurance rate levels and even loss of business through lower
premiums charged by competitors" (Greene, 173).
VII. Discussion
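By applying the four strategies in model fitting, the present study identified that the
ARIMA(4,1,0) x ARIMA(1,1,0)12 process described the stochastic periodicity of recorded monthly paid
life insurance claims at company ABC most effectively. By the method of maximum likelihood, the
following fitted model was ascertained (coefficients as reported in Output-2 of the Appendix):

$$(1 + 0.8075B + 0.8026B^{2} + 0.482B^{3} + 0.3837B^{4})(1 + 0.5154B^{12})(1 - B)(1 - B^{12})Y_t = e_t, \qquad \hat{\sigma}_e^{2} = 0.9446.$$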
From thorough model diagnostics, two additional models nearly outperformed the proposed
ARIMA(4,1,0) x ARIMA(1,1,0)12. However, these two candidate models did not satisfy the assumption of
normally distributed residuals {et} (one of them departed from it markedly). Therefore, they were no longer
considered viable candidate models. By methods of model overfitting, an intriguing discovery
was that the overfitted ARIMA(4,1,0) x ARIMA(2,1,0)12 model showed promise. That is, all of its
estimated parameters were significantly different from zero and all assumptions were met: 1)
stationarity, 2) normality, 3) independence and 4) model appropriateness. However, this model suggested
that an additional seasonal autoregressive (AR) order was needed. This contradicted what was determined in
the model specification phase by analysis of the ACF and PACF plots of the combined first differences of
{Yt}. Therefore, the overfitted models were regarded as inconclusive.
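One way to make "significantly different from zero" concrete is to divide each estimate by its standard error and compare the ratio with the normal critical value of about 1.96. The sketch below does this for the ARIMA(4,1,0) x ARIMA(2,1,0)12 estimates, copied by hand from Output-7 in the Appendix. Treating the ratios as approximately standard normal is a large-sample assumption, and the object names are illustrative rather than taken from the original analysis.

```r
# Estimates and standard errors for the overfitted model, copied from Output-7.
est <- c(ar1 = -0.8197, ar2 = -0.7676, ar3 = -0.4217, ar4 = -0.3655,
         sar1 = -0.6431, sar2 = -0.2514)
se  <- c(ar1 = 0.0801, ar2 = 0.0998, ar3 = 0.1029, ar4 = 0.0832,
         sar1 = 0.0912, sar2 = 0.0957)

z <- est / se                          # approximate z-ratios
p <- 2 * pnorm(-abs(z))                # two-sided p-values under normality
signif_5pct <- abs(z) > qnorm(0.975)   # TRUE if significant at the 5% level

print(round(cbind(estimate = est, se = se, z = z, p.value = p), 4))
```

Every ratio exceeds 1.96 in absolute value, including the added sar2 term (about -2.63). By the same calculation, the extra ma1 term in the ARIMA(4,1,1) x ARIMA(1,1,0)12 overfit is negligible relative to its standard error (0.0022 against 0.2674).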
Some drawbacks of the proposed model emerged when its calculated AIC
and white noise variance were compared with those of the other two candidate models. Of the three, the chosen
ARIMA(4,1,0) x ARIMA(1,1,0)12 had the largest calculated AIC and white noise variance, whereas the aim
during the model specification phase is to choose the model that minimizes both. I believe
one aspect that limited the usefulness of the other candidate models was the fact that the dataset had a
potential outlier. During the model diagnostic phase, the histograms and QQ plots for both the
ARIMA(4,1,0) x ARIMA(0,1,1)12 and the ARIMA(0,1,1) x ARIMA(0,1,1)12 models showed clear signs that the claims
figure recorded for April 2013 was a probable outlier (Figure 9 in the
Appendix). I believe this situation kept me from pursuing one of these two additional candidates as the
suggested model fit.
Through statistical programming in R, MMSE forecasts and 95% prediction intervals were
determined by extending the proposed model to predict what the company should
expect to see in monthly claim settlement figures in its life insurance sector. According to the
forecast, company ABC should expect to pay out approximately $189,392,938.00 in life insurance claims
from December 2013 through December 2015. Keep in mind that these projections do not take environmental changes in the
economy into consideration. According to the LIMRA 2013 Barometer Study, about two thirds of
American consumers are concerned about financial stability when it comes to their retirement plan. A
surprising finding of the study was that one third of consumers in America experienced the loss of a
loved one or close friend within the past two years, and are now more inclined to purchase life insurance
out of concern about leaving their dependent(s) financially burdened. However, 10
percent of consumers who experienced the loss of a loved one purchased life insurance in response to the
predicament. By the following year, they are no more likely to purchase again than the general
consumer (Denley, 5). Therefore, forecasting and time series analysis play an
important role in predicting future outcomes. However, one must also take into consideration the
circumstances that lie beyond these analytics.
References
Cryer, Jonathan D. and Kung-Sik Chan. Time Series Analysis With Applications in R. New York:
Springer Science+Business Media, LLC, 2010. Print.
Denley, Norah. 2013 Insurance Barometer Study. Research. Hartford, CT: LL Global, Inc., 2013. Web.
Greene, Mark R. Risk and Insurance. Cincinnati: South-Western Publishing Co., 1968. Print.
Steuer, Tony. Questions and Answers on Life Insurance: The Life Insurance Toolbook. Alameda: Life
Insurance Sage Press, 2010. Print.
Appendix
Figure 1: Time Series Plot of Monthly Paid Life Insurance Claims at Company ABC
Figure 2: Differencing Methods Applied to Time Series Data. Upper Left: Original Time Series
{Yt}. Upper Right: Nonseasonal First Differences of Original Time Series, that is $\nabla Y_t$. Bottom Left:
Seasonal First Differences of Original Time Series, that is $\nabla_{12} Y_t$. Bottom Right: Combined First
Differences (Seasonal & Nonseasonal) of Original Time Series, that is $\nabla\nabla_{12} Y_t$.
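As a hedged illustration of the three differencing operators named in the Figure 2 caption, the sketch below applies them to a simulated monthly series. The claims data are not public, so `y` is synthetic and every object name here is an assumption. Only `diff()` and its `lag` argument are standard R. The last lines check that the combined first difference equals Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13}.

```r
set.seed(1)
n  <- 152                      # same length as the modeled series (Jan 2001 - Aug 2013)
tt <- seq_len(n)

# Synthetic monthly series with a linear trend and a 12-month seasonal pattern.
y <- ts(5 + 0.02 * tt + sin(2 * pi * tt / 12) + rnorm(n, sd = 0.3),
        start = c(2001, 1), frequency = 12)

d1    <- diff(y)                   # nonseasonal first differences
d12   <- diff(y, lag = 12)         # seasonal first differences
d1d12 <- diff(diff(y), lag = 12)   # combined first differences

# Combined differencing uses up 1 + 12 = 13 observations.
lost <- length(y) - length(d1d12)

# Manual form of the combined difference: Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13}.
yv     <- as.numeric(y)
idx    <- 14:n
manual <- yv[idx] - yv[idx - 1] - yv[idx - 12] + yv[idx - 13]
all.equal(as.numeric(d1d12), manual)
```

Plotting `y`, `d1`, `d12` and `d1d12` in a 2 x 2 layout would give a panel arranged like Figure 2, though the synthetic series will not resemble the claims data in detail.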
Figure 3: Sample ACF of Original Time Series Data
Figure 4: Sample ACF and PACF of Combined First Differences of Original Time Series Data
Figure 5: Histogram and QQ plot of Standardized Residuals
Figure 6: Time Series Plot of Standardized Residuals
Figure 7: Residual Graphics and Modified Ljung-Box p-values for an ARIMA(4,1,0) x
ARIMA(1,1,0)12 Fit
Figure 8: Seasonal Model Forecasting Estimates Applied to Original Time Series
Figure 9: Side-by-Side Histogram and QQ Plots of Additional Candidate Models
Output-1: Augmented Dickey-Fuller Unit Root Test with the Combined First Differences Data
> ar(diff(diff.claims,lag=12))
Call:
ar(x = diff(diff.claims, lag = 12))
Coefficients:
1 2 3 4 5 6 7 8
-0.7543 -0.7441 -0.4034 -0.3184 -0.0427 -0.1071 -0.1325 -0.0710
9 10 11 12 13 14
-0.0482 -0.1379 -0.0439 -0.4215 -0.2309 -0.1812
Order selected 14 sigma^2 estimated as 1.176
> adf.test(diff(diff.claims,lag=12),k=14)
Augmented Dickey-Fuller Test
data: diff(diff.claims, lag = 12)
Dickey-Fuller = -4.8077, Lag order = 14, p-value = 0.01
alternative hypothesis: stationary
Warning message:
In adf.test(diff(diff.claims, lag = 12), k = 14) :
p-value smaller than printed p-value
Output-2: SARIMA Parameter Estimation using Methods of Maximum Likelihood
> #ARIMA(4,1,0) x ARIMA(1,1,0)_{12}**
> ar.claims.mdl2 <-
arima(life.claims,order=c(4,1,0),method='ML',seasonal=list(order=c(1,1,0),period=12))
> ar.claims.mdl2
Call:
arima(x = life.claims, order = c(4, 1, 0), seasonal = list(order = c(1, 1, 0),
period = 12), method = "ML")
Coefficients:
ar1 ar2 ar3 ar4 sar1
-0.8075 -0.8026 -0.482 -0.3837 -0.5154
s.e. 0.0794 0.0955 0.097 0.0823 0.0776
sigma^2 estimated as 0.9446: log likelihood = -195.95, aic = 401.89
Output-3: Shapiro-Wilk Procedure to Test Normal Distributed Residuals
> shapiro.test(rstandard(ar.claims.mdl2))
Shapiro-Wilk normality test
data: rstandard(ar.claims.mdl2)
W = 0.9883, p-value = 0.2361
Output-4: Runs Procedure to Test for Independence
> runs(rstandard(ar.claims.mdl2))
$pvalue
[1] 0.289
$observed.runs
[1] 84
$expected.runs
[1] 76.98684
$n1
[1] 77
$n2
[1] 75
$k
[1] 0
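The expected number of runs reported above follows directly from the counts n1 and n2. As a small worked check, not part of the original output, the expected number of runs in a two-category sequence under independence is 2*n1*n2/(n1 + n2) + 1:

```r
# Counts of residuals above and below the cutoff k = 0, from Output-4.
n1 <- 77
n2 <- 75

# Expected number of runs under independence.
expected_runs <- 2 * n1 * n2 / (n1 + n2) + 1
expected_runs

# n1 + n2 equals the number of months modeled (Jan 2001 - Aug 2013).
n1 + n2
```

This reproduces the reported value of 76.98684. The 84 observed runs are somewhat more than expected, but with a p-value of 0.289 the test gives no evidence against independence.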
Output-5: Determining Ljung-Box Procedure Testing for Model Appropriateness
> Box.test(rstandard(ar.claims.mdl2), lag = 36, type = "Ljung-Box", fitdf = 4)
Box-Ljung test
data: rstandard(ar.claims.mdl2)
X-squared = 31.6524, df = 32, p-value = 0.4841
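The p-value above can be recovered from the test statistic and its degrees of freedom, which also shows how sensitive the conclusion is to the choice of `fitdf`. The call in Output-5 uses `fitdf = 4`, while the fitted model has five estimated coefficients (four AR and one seasonal AR), so a reader might prefer `fitdf = 5`. The sketch below is an illustration rather than part of the original analysis. It computes the upper-tail chi-square probability under both choices, with illustrative variable names.

```r
x_squared <- 31.6524   # Ljung-Box statistic from Output-5
lag_max   <- 36        # number of lags used in the test

# Upper-tail chi-square probabilities for the two degrees-of-freedom choices.
p_fitdf4 <- pchisq(x_squared, df = lag_max - 4, lower.tail = FALSE)
p_fitdf5 <- pchisq(x_squared, df = lag_max - 5, lower.tail = FALSE)

round(c(fitdf4 = p_fitdf4, fitdf5 = p_fitdf5), 4)
```

With `fitdf = 4` this reproduces the reported p-value of 0.4841. With `fitdf = 5` the p-value is somewhat smaller but still well above 0.05, so the conclusion about model appropriateness would not change.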
Output-6: Determining Max Autocorrelation in Time Series Plot of Model Residuals (Figure 7)
> acf.ar.claims.mdl2 <- acf(as.vector(rstandard(ar.claims.mdl2)),lag=36,plot=F)
> max.lag <- which.max(abs(acf.ar.claims.mdl2$acf))
> acf.ar.claims.mdl2$lag[max.lag]
[1] 24
> acf.ar.claims.mdl2$acf[max.lag]
[1] -0.1889239
Output-7: Methods of Model Overfitting
> ovf.ar.claims.mdl1 <-
arima(life.claims,order=c(5,1,0),method='ML',seasonal=list(order=c(1,1,0),period=12))
> ovf.ar.claims.mdl2 <-
arima(life.claims,order=c(4,1,1),method='ML',seasonal=list(order=c(1,1,0),period=12))
> ovf.ar.claims.mdl3 <-
arima(life.claims,order=c(4,1,0),method='ML',seasonal=list(order=c(1,1,1),period=12))
> ovf.ar.claims.mdl4 <-
arima(life.claims,order=c(4,1,0),method='ML',seasonal=list(order=c(2,1,0),period=12))
> ovf.ar.claims.mdl1
Call:
arima(x = life.claims, order = c(5, 1, 0), seasonal = list(order = c(1, 1, 0),
period = 12), method = "ML")
Coefficients:
ar1 ar2 ar3 ar4 ar5 sar1
-0.8081 -0.8019 -0.4818 -0.3797 0.0003 -0.5165
s.e. NaN NaN NaN NaN 0.0180 NaN
sigma^2 estimated as 0.9446: log likelihood = -195.94, aic = 403.89
Warning message:
In sqrt(diag(x$var.coef)) : NaNs produced
> ovf.ar.claims.mdl2
Call:
arima(x = life.claims, order = c(4, 1, 1), seasonal = list(order = c(1, 1, 0),
period = 12), method = "ML")
Coefficients:
ar1 ar2 ar3 ar4 ma1 sar1
-0.8094 -0.8040 -0.4831 -0.3840 0.0022 -0.5154
s.e. 0.2442 0.1934 0.1695 0.0936 0.2674 0.0777
sigma^2 estimated as 0.9446: log likelihood = -195.95, aic = 403.89
> ovf.ar.claims.mdl3
Call:
arima(x = life.claims, order = c(4, 1, 0), seasonal = list(order = c(1, 1, 1),
period = 12), method = "ML")
Coefficients:
ar1 ar2 ar3 ar4 sar1 sma1
-0.8221 -0.7300 -0.409 -0.3525 0.0021 -0.8517
s.e. 0.0804 0.1017 0.101 0.0843 0.1248 0.1454
sigma^2 estimated as 0.7622: log likelihood = -186.77, aic = 385.55
> ovf.ar.claims.mdl4
Call:
arima(x = life.claims, order = c(4, 1, 0), seasonal = list(order = c(2, 1, 0),
period = 12), method = "ML")
Coefficients:
ar1 ar2 ar3 ar4 sar1 sar2
-0.8197 -0.7676 -0.4217 -0.3655 -0.6431 -0.2514
s.e. 0.0801 0.0998 0.1029 0.0832 0.0912 0.0957
sigma^2 estimated as 0.8914: log likelihood = -192.68, aic = 397.36
Output-8: MMSE Forecasts and 95% Prediction Limits
> year.claims =
c(2013.666,2013.750,2013.833,2013.916,2014,2014.083,2014.166,2014.250,2014.333,
2014.416,2014.500,2014.583,2014.666,2014.750,2014.833,2014.916,2015,2015.083,
2015.166,2015.250,2015.333,2015.416,2015.500,2015.583,2015.666,2015.750,2015.833,
2015.916)
>
> ar.claims.mdl2.predict <- predict(ar.claims.mdl2,n.ahead=length(year.claims))
> ar.claims.mdl2.predict
$pred
           Jan       Feb       Mar       Apr       May       Jun       Jul
2013
2014  6.679601  8.134403  7.490016  9.272639  7.784994  7.401839  7.720315
2015  7.082731  8.435012  8.296457 10.359447  7.204893  7.328288  8.469284
           Aug       Sep       Oct       Nov       Dec
2013            7.450880  6.357658  4.863679  7.086264
2014  8.431392  7.406520  7.014338  5.394386  6.434451
2015  8.270852  7.856935  7.096430  5.545508  7.195943
$se
           Jan       Feb       Mar       Apr       May       Jun       Jul
2013
2014 1.0635637 1.1539799 1.1841844 1.1963517 1.2389101 1.2691593 1.3057830
2015 1.6952197 1.7634241 1.8112046 1.8477683 1.8978117 1.9424720 1.9886551
           Aug       Sep       Oct       Nov       Dec
2013           0.9719077 0.9897440 0.9905832 1.0411081
2014 1.3359653 1.5138960 1.5627015 1.5920884 1.6502904
2015 2.0319834 2.3237252 2.3908261 2.4325104 2.5191578
>
> lower.pi<-ar.claims.mdl2.predict$pred-qnorm(0.975,0,1)*ar.claims.mdl2.predict$se
> upper.pi<-ar.claims.mdl2.predict$pred+qnorm(0.975,0,1)*ar.claims.mdl2.predict$se
> data.frame(Month=year.claims,lower.pi,upper.pi)
Month lower.pi upper.pi
1 2013.666 5.5459762 9.355784
2 2013.750 4.4177953 8.297520
3 2013.833 2.9221717 6.805187
4 2013.916 5.0457294 9.126798
5 2014.000 4.5950547 8.764148
6 2014.083 5.8726436 10.396162
7 2014.166 5.1690574 9.810975
8 2014.250 6.9278325 11.617445
9 2014.333 5.3567750 10.213214
10 2014.416 4.9143321 9.889345
11 2014.500 5.1610276 10.279603
12 2014.583 5.8129479 11.049836
13 2014.666 4.4393384 10.373702
14 2014.750 3.9514997 10.077177
15 2014.833 2.2739502 8.514822
16 2014.916 3.1999414 9.668961
17 2015.000 3.7601613 10.405300
18 2015.083 4.9787643 11.891260
19 2015.166 4.7465612 11.846353
20 2015.250 6.7378882 13.981007
21 2015.333 3.4852503 10.924535
22 2015.416 3.5211132 11.135464
23 2015.500 4.5715917 12.366976
24 2015.583 4.2882379 12.253467
25 2015.666 3.3025176 12.411353
26 2015.750 2.4104964 11.782363
27 2015.833 0.7778751 10.313140
28 2015.916 2.2584848 12.133402

More Related Content

Viewers also liked

Jungheinrich growingwith passion2016
Jungheinrich growingwith passion2016Jungheinrich growingwith passion2016
Jungheinrich growingwith passion2016
Company Spotlight
 
Jungheinrich callsept15
Jungheinrich callsept15Jungheinrich callsept15
Jungheinrich callsept15
Company Spotlight
 
Jungheinrich calljune2015
Jungheinrich calljune2015Jungheinrich calljune2015
Jungheinrich calljune2015
Company Spotlight
 
Jungheinrich callmarch2015
Jungheinrich callmarch2015Jungheinrich callmarch2015
Jungheinrich callmarch2015
Company Spotlight
 
Cypress Development Corp. Corporate Presentation
Cypress Development Corp. Corporate PresentationCypress Development Corp. Corporate Presentation
Cypress Development Corp. Corporate Presentation
Company Spotlight
 
Corporate Presentation
Corporate PresentationCorporate Presentation
Corporate Presentation
Company Spotlight
 
Accounting for non-life insurances
Accounting for non-life insurancesAccounting for non-life insurances
Accounting for non-life insurancesscef0002
 
Chase Corp - Investor presentation
Chase Corp - Investor presentation Chase Corp - Investor presentation
Chase Corp - Investor presentation
Company Spotlight
 
Introduction to Business Process Analysis and Redesign
Introduction to Business Process Analysis and RedesignIntroduction to Business Process Analysis and Redesign
Introduction to Business Process Analysis and Redesign
Marlon Dumas
 

Viewers also liked (10)

Beazley interims2014
Beazley interims2014Beazley interims2014
Beazley interims2014
 
Jungheinrich growingwith passion2016
Jungheinrich growingwith passion2016Jungheinrich growingwith passion2016
Jungheinrich growingwith passion2016
 
Jungheinrich callsept15
Jungheinrich callsept15Jungheinrich callsept15
Jungheinrich callsept15
 
Jungheinrich calljune2015
Jungheinrich calljune2015Jungheinrich calljune2015
Jungheinrich calljune2015
 
Jungheinrich callmarch2015
Jungheinrich callmarch2015Jungheinrich callmarch2015
Jungheinrich callmarch2015
 
Cypress Development Corp. Corporate Presentation
Cypress Development Corp. Corporate PresentationCypress Development Corp. Corporate Presentation
Cypress Development Corp. Corporate Presentation
 
Corporate Presentation
Corporate PresentationCorporate Presentation
Corporate Presentation
 
Accounting for non-life insurances
Accounting for non-life insurancesAccounting for non-life insurances
Accounting for non-life insurances
 
Chase Corp - Investor presentation
Chase Corp - Investor presentation Chase Corp - Investor presentation
Chase Corp - Investor presentation
 
Introduction to Business Process Analysis and Redesign
Introduction to Business Process Analysis and RedesignIntroduction to Business Process Analysis and Redesign
Introduction to Business Process Analysis and Redesign
 

Similar to Life Insurance Analysis

Analysis the performance of life insurance in private insurance
Analysis the performance of life insurance in private insuranceAnalysis the performance of life insurance in private insurance
Analysis the performance of life insurance in private insuranceAlexander Decker
 
Predictive Analytics and Modeling in Life Insurance
Predictive Analytics and Modeling in Life InsurancePredictive Analytics and Modeling in Life Insurance
Predictive Analytics and Modeling in Life Insurance
Experfy
 
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
Experfy
 
Chapter 22_Insurance Companies and Pension Funds
Chapter 22_Insurance Companies and Pension FundsChapter 22_Insurance Companies and Pension Funds
Chapter 22_Insurance Companies and Pension FundsRusman Mukhlis
 
Actuarial sc ans
Actuarial sc ansActuarial sc ans
Actuarial sc ans
Dr. Firdaus Khan
 
ACS2011PaperAndrewHoultram
ACS2011PaperAndrewHoultramACS2011PaperAndrewHoultram
ACS2011PaperAndrewHoultramAndrew Houltram
 
Palm Under The Skin
Palm Under The SkinPalm Under The Skin
Palm Under The Skin
mackieman
 
IAQF report_Cornell Team
IAQF report_Cornell TeamIAQF report_Cornell Team
IAQF report_Cornell TeamJin Li
 
Εργασία περί Μαθηματικού Αποθέματος
Εργασία περί Μαθηματικού ΑποθέματοςΕργασία περί Μαθηματικού Αποθέματος
Εργασία περί Μαθηματικού ΑποθέματοςLeonidas Souliotis
 
Insurance Concepts
Insurance ConceptsInsurance Concepts
Insurance Concepts
marx christian sorino
 
INNOVATIVE STRATEGIES TO HELP MAXIMIZE SOCIAL SECURITY BENEFITS
INNOVATIVE STRATEGIES TO HELP MAXIMIZE SOCIAL SECURITY BENEFITSINNOVATIVE STRATEGIES TO HELP MAXIMIZE SOCIAL SECURITY BENEFITS
INNOVATIVE STRATEGIES TO HELP MAXIMIZE SOCIAL SECURITY BENEFITS
Robert Hitchcock CLU, ChFC
 
Soc sec max benes pi3609 08 01_2014
Soc sec max benes pi3609 08 01_2014 Soc sec max benes pi3609 08 01_2014
Soc sec max benes pi3609 08 01_2014
John Baratta
 
Three insurance claims scenarios every agent should share with their insured
Three insurance claims scenarios every agent should share with their insuredThree insurance claims scenarios every agent should share with their insured
Three insurance claims scenarios every agent should share with their insured
AMT Warranty
 
Allocating Assets And Discounting Cash Flows Pension Plan Finance
Allocating Assets And Discounting Cash Flows  Pension Plan FinanceAllocating Assets And Discounting Cash Flows  Pension Plan Finance
Allocating Assets And Discounting Cash Flows Pension Plan Finance
Carrie Cox
 
A study on customer satisfaction of life insurance policies
A study on customer satisfaction of life insurance policiesA study on customer satisfaction of life insurance policies
A study on customer satisfaction of life insurance policiesAnnamumumu
 
Finance assignment help
Finance assignment helpFinance assignment help
Finance assignment help
ozpaperhelp2
 

Similar to Life Insurance Analysis (20)

Analysis the performance of life insurance in private insurance
Analysis the performance of life insurance in private insuranceAnalysis the performance of life insurance in private insurance
Analysis the performance of life insurance in private insurance
 
Predictive Analytics and Modeling in Life Insurance
Predictive Analytics and Modeling in Life InsurancePredictive Analytics and Modeling in Life Insurance
Predictive Analytics and Modeling in Life Insurance
 
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
Predictive Analytics and Modeling in Product Pricing (Personal and Commercial...
 
Valley view university
Valley view universityValley view university
Valley view university
 
Valley view university
Valley view universityValley view university
Valley view university
 
Chapter 22_Insurance Companies and Pension Funds
Chapter 22_Insurance Companies and Pension FundsChapter 22_Insurance Companies and Pension Funds
Chapter 22_Insurance Companies and Pension Funds
 
Actuarial sc ans
Actuarial sc ansActuarial sc ans
Actuarial sc ans
 
ACS2011PaperAndrewHoultram
ACS2011PaperAndrewHoultramACS2011PaperAndrewHoultram
ACS2011PaperAndrewHoultram
 
Final Report
Final ReportFinal Report
Final Report
 
Palm Under The Skin
Palm Under The SkinPalm Under The Skin
Palm Under The Skin
 
IAQF report_Cornell Team
IAQF report_Cornell TeamIAQF report_Cornell Team
IAQF report_Cornell Team
 
Εργασία περί Μαθηματικού Αποθέματος
Εργασία περί Μαθηματικού ΑποθέματοςΕργασία περί Μαθηματικού Αποθέματος
Εργασία περί Μαθηματικού Αποθέματος
 
summer project repor3
summer project repor3summer project repor3
summer project repor3
 
Insurance Concepts
Insurance ConceptsInsurance Concepts
Insurance Concepts
 
INNOVATIVE STRATEGIES TO HELP MAXIMIZE SOCIAL SECURITY BENEFITS
INNOVATIVE STRATEGIES TO HELP MAXIMIZE SOCIAL SECURITY BENEFITSINNOVATIVE STRATEGIES TO HELP MAXIMIZE SOCIAL SECURITY BENEFITS
INNOVATIVE STRATEGIES TO HELP MAXIMIZE SOCIAL SECURITY BENEFITS
 
Soc sec max benes pi3609 08 01_2014
Soc sec max benes pi3609 08 01_2014 Soc sec max benes pi3609 08 01_2014
Soc sec max benes pi3609 08 01_2014
 
Three insurance claims scenarios every agent should share with their insured
Three insurance claims scenarios every agent should share with their insuredThree insurance claims scenarios every agent should share with their insured
Three insurance claims scenarios every agent should share with their insured
 
Allocating Assets And Discounting Cash Flows Pension Plan Finance
Allocating Assets And Discounting Cash Flows  Pension Plan FinanceAllocating Assets And Discounting Cash Flows  Pension Plan Finance
Allocating Assets And Discounting Cash Flows Pension Plan Finance
 
A study on customer satisfaction of life insurance policies
A study on customer satisfaction of life insurance policiesA study on customer satisfaction of life insurance policies
A study on customer satisfaction of life insurance policies
 
Finance assignment help
Finance assignment helpFinance assignment help
Finance assignment help
 

Life Insurance Analysis

  • 1. 1 Time Series Analysis of Monthly Paid Life Insurance Claims: A Nonstationary Seasonal Model Approach Jonathan A. Poon Abstract The intent of this study is to properly identify a model that best fits monthly paid life insurance claims at company ABC and to forecast future projections on how much they expect to pay out in the near future . It was determined through methods of combined first differencing and maximum likelihood that an ARIMA(4,1,0) x ARIMA(1,1,0)12 model fit the data most sufficiently. This suggested model is questionable at one point being that a potential outlier proposes inadequacy. In April 2013, company ABC experienced a high amount of claims paid out in the life insurance area (the highest that occurred throughout the thirteen year period investigated). Speculations of this occurrence could have been from possible improvements in the claims process at the company or methods that were implemented to aide their customers in understanding how to submit claims correctly and faster. In order to ensure privacy to the company, further explanation of this possible outlier was not investigated. Additional models were also taken into consideration. However, thorough model diagnostics showed these candidate models had drawbacks in that they did not abide by the laws of normality, even though their estimated parameters were significant. By applying the suggested model into forecasting, the company should have expected $7,450,880.00, $6,357,658.00 and $4,863,679.00 in paid life insurance claims during the months of September, October and November, respectively. These figures are already known, but were not taken into consideration during the study. Forecasting up to December 2015 was also reviewed to advise how much in life insurance claims the company is expected to pay out in the near future.
  • 2. 2 I. Introduction The name speaks for itself when you hear the term life insurance. It's an agreement, between an insured (policy holder) and the insurer (insurance provider), which promises to pay money to the insured's beneficiary (husband, wife, children, etc.) in the case of the unexpected against the insured, namely death. It stems from an old principle founded in Paris during the 17th Century by Italian investor Lorenzo Tonti. According to Lorenzo's Tontine Annuity System, its practice was attempting to apply the laws of averages and the principle of life expectancies in establishing annuities. Basically a group of individuals, no matter what their age was, could make equal contributions of money towards a single sum of money. This sum of money was then invested and at the end of each year the interest was dispersed amongst each contributor. In Tonti's system contributors were considered as survivors (Steuer, 3-4). The conclusion of Tonti's system is that whoever the last contributor was to survive received both the interest of that year and the entire amount of the principle. However, as more people (contributors) demanded to be insured by more money risk in violence amongst contributors also increased. It was then that the Law of Large of Numbers was partly discovered. In essence to life insurance, the Law of Large Numbers applies the intuitive method that the more people who contribute (that are at risk to death), the more money that will be coming in. This way, when one person does die, this would not affect the remaining contributed group. As of today, life insurance functions as a safeguard against misfortunes by having the losses of the unfortunate few paid for by the contributions of the many that are exposed to the same peril. In essence, the sharing of losses and, in the process, the substitution of a certain small "loss" (the premium payment) for an uncertain large loss (Steuer, 4). 
Along with purchasing life insurance comes the possibility of making a claim. A claim is a procedure that a beneficiary must request in case of the policy holder's unexpected death. When a person purchases one or more life insurance plans they should make a decision as to which type of payment plan will need to be applied in case of a future claim on their policy. However if this goal was not met, various types of payment plans can be offered. In his book, Steuer describes these options: 1) Lump sum - the life insurance company provides the entire death benefit to the insured's beneficiary in one lump sum.
  • 3. 3 This allows the beneficiary to do whatever they want with it. 2) Specific income provision - on a set schedule, the life insurance company provides the claimant both the principal and the interest of the life insurance benefit. 3) Life income option - a guaranteed income for life. This amount of income is based around the death benefit amount, and the claimant's age and gender at the time of the insured's death. 4) Interest income option - the life insurance company holds all proceeds and pays out only the interest to the primary beneficiary. Only when the primary beneficiary dies is when the death benefit is paid out, specifically to a second beneficiary. These plans are specifically geared towards situations in which the beneficiary has no way to manage money at the time of filing the claim (Steuer, 259). When you simply look at what a claim is, specifically in regards to life insurance, one can speculate that if a company is able to collect a large amount of premiums for the life insurance they sell, they should be able to pay out death benefits when the time comes. However it is not so simple. Such complexities in the insurance industry are handled by professionals known as actuaries: mathematicians in the insurance industry that "calculate premiums, reserves, dividends and insurance, pension and annuity rates, using risk factors obtained from experience tables" (Steuer, 4). Surprisingly, these experience tables are based on, but not limited to, an insurance company's history of insurance claims! Therefore, the aim of this study is to 1) identify a possible candidate model that describes monthly paid life insurance claims during January 2001 to August 2013 at insurance company ABC and 2) forecast future monthly claims paid out to the company's life insurance buyers, assuming monthly amounts are persistent with observed data. 
By studying this topic, the insurance company can gain a perspective of how much they expect to pay out in life insurance claims in the near future. Forecast analysis could allow the following questions to be considered: 1) "Can these total monthly amounts tell us how well we're processing incoming claim requests?" and 2) "Based on the forecast, should we look into improving our claims process?", and 3) "Are life claim settlement forecasts much higher than what we expect?". Forecasting monthly paid life insurance claims for company ABC could also aide in calculating future premiums and reserves for their life insurance products.
  • 4. 4 II. Data I was able to gather the claims data from the insurance company I am currently employed at that is discretely mentioned in the study. The data is a collection of monthly paid life insurance claims that spans from January 2001 to November 2013, which consists of a total of 154 data points. That is, each data point represents the total amount of life insurance claims (in dollars) company ABC paid out each month. The months of September, October and November 2013 were left out of the analysis to be compared to the proposed model's forecasting accuracy. During the study a noticeable outlier is detected, but due to omitted information and ensuring company privacy, assumptions could not be made to prove this. The following study on this data is as follows: 1) Model Specification, 2) Parameter Estimation (Model Fitting), 3) Model Diagnostics, 4) Forecasting Analysis and 5) Discussion. III. Model Specification The most important part of model specification is to first look at the observed data in its original movement. The plot in Figure 1 on page 5 is a time series plot of the series {Yt} = monthly paid life insurance claims Y at time t. The plot suggests paid claims in the life insurance sector has a steadily increasing trend over time and a within-year seasonal pattern. Notice that from 2001 to 2004 the amount of life insurance claims paid out followed very closely together, month-to-month. However, notice thereafter the increasing variation between month-to-month paid claims. In some occurrences, mainly between months during 2008 to present day had extremely low and high amounts of life insurance claims paid out. Taking a look at mean (or average) levels year-to-year show that the mean amount of monthly paid life insurance claims is not constant. For instance, the following mean amount of monthly claims for year's 2005 to 2007 were exactly $4,972,819.74, $5,140,671.21 and $5,341,678.09, respectively. 
Figure 3 in the Appendix represents a plot of the autocorrelations of the time series across lag k, for k =1,2,3,....,36. The plot is more commonly known as an Autocorrelation Function (ACF) plot. Notice that as lag k increases, noticeable spikes in seasonal autocorrelations are present. More specifically as lags k = 12,24,
  • 5. 5 and 36. The plot also shows slow decay in autocorrelations as lag k increases, which in theory, suggests the series in its original form is not that of stationary process. Figure 1: Time Series Plot of Monthly Paid Life Insurance Claims at Company ABC Based on inference of this series, {Yt} is pronounced as being that of a nonstationary process. In order for the data to be properly modeled the series must be removed of any trend or seasonal patterns. This approach is known as "detrending", more specifically taking the combined first differences of the time series will be implemented. This type of method falls under a class of models that are defined as Multiplicative Nonstationary Seasonal Autoregressive Integrated Moving Average (SARIMA) with seasonal period s, denoted by ARIMA(p,d,q) x ARIMA(P,D,Q)s (Cryer & Chan, 234). The following discussions will walk you through a brief description of the SARIMA family of models and the method of combined first differencing. This will present possible multiplicative stationary SARIMA processes that sufficiently represent the stochastic progression of the monthly paid life insurance claims data over time.
  • 6. 6 In an ARIMA(p,d,q) x ARIMA(P,D,Q)s , the character "p" represents the order of the autoregressive component, "d" represents the number of nonseasonal differences needed on the series to achieve stationarity, and "q" represents the order of the moving average component. In this study, since the data has seasonal patterns, a multiplicative model consisting of both nonseasonal and seasonal components must be applied. In the seasonal part of the model, the character "P" represents the order of the seasonal autoregressive component, "D" represents the number of seasonal differences needed on the series to achieve stationarity, and "Q" represents the order of the seasonal moving average component. The character "s" represents the seasonal period. Typically "s" is equal to either 12 (for monthly data) or 4 (for quarterly data). In this study, we know immediately that s = 12. The general form of the ARIMA(p,d,q) x ARIMA(P,D,Q)s process in backshift notation is (1) where {et} is zero mean white noise with variance var(et) = , which means the error terms et are assumed normally distributed. In equation (1), we are defining the monthly paid life insurance claims process {Yt} as a multiplicative SARIMA model with dth nonseasonal differences and Dth seasonal differences. "The backshift operator, denoted by B, operates on the time index of the series and shifts time back one time unit to form a new series" (Cryer & Chan, 106). In particular, (2) (Cryer & Chan, 106). The nonseasonal autoregressive (AR) and moving average (MA) characteristics operators; respectively, are (3) (4)
  • 7. 7 and the seasonal autoregressive (AR) and moving average (MA) characteristic operators, respectively, are $\Phi(B) = 1 - \Phi_1 B^{s} - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps}$ (5) $\Theta(B) = 1 - \Theta_1 B^{s} - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs}$ (6) and $\nabla^d \nabla_s^D Y_t = (1 - B)^d (1 - B^s)^D Y_t$ (7) In most SARIMA class models, "d" and "D" are usually equal to either 1 or 2. In this study, we take into account only the first differences. Figure 2 on page 8 shows how the series {Yt} changes under each differencing method and why combined first differences is the method that should be used. Notice the plot in the upper right corner. This plot represents the series {Yt} when only nonseasonal differences, that is d = 1, are taken. Seasonal patterns are still present throughout this new series. Therefore, taking only nonseasonal differences does not bring us in the right direction. The plot in the bottom left corner represents the series {Yt} when only seasonal differences, that is D = 1 and s = 12, are taken. Notice that in this plot the seasonal patterns are practically removed. However, there are some stretches of this series where the data points tend to "hang together" rather than oscillate about one another. A slight sinusoidal movement is also present, suggesting that some trend remains. The goal, as mentioned earlier, is to remove any trend in order to achieve stationarity. Therefore, taking only seasonal differences does not move us in the right direction either. Finally, applying both the seasonal and nonseasonal differences to the original time series (bottom right corner), that is d = D = 1 and s = 12, yields a plot with all the characteristics of a stationary time series. Notice that in this series no noticeable trend (whether positive or negative) or seasonal pattern remains. Second differences are therefore not pursued, and first differences will be our stopping point.
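The three differencing schemes compared in Figure 2 can be sketched in base R as follows. This is only an illustration: the series `y` is simulated (a trend plus a 12-month cycle plus noise) and is not the company's claims data, and the object names are hypothetical. The last lines check that the combined first difference equals Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13}.

```r
# Simulated stand-in for the claims series (NOT the company's data):
# linear trend + 12-month seasonal cycle + noise, 152 monthly points.
set.seed(1)
n <- 152
t_idx <- seq_len(n)
y <- ts(5 + 0.02 * t_idx + sin(2 * pi * t_idx / 12) + rnorm(n, sd = 0.5),
        start = c(2001, 1), frequency = 12)

d1     <- diff(y)                  # nonseasonal first difference (d = 1)
d12    <- diff(y, lag = 12)        # seasonal first difference (D = 1, s = 12)
d1_d12 <- diff(diff(y), lag = 12)  # combined first differences (d = D = 1)

# Expanding (1 - B)(1 - B^12) Y_t gives Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13}
yv <- as.numeric(y)
manual <- yv[14:n] - yv[13:(n - 1)] - yv[2:(n - 12)] + yv[1:(n - 13)]

print(length(d1_d12))                          # n - 13 observations remain
print(all.equal(as.numeric(d1_d12), manual))
```

Combined differencing costs 13 observations, which is why a 152-point series leaves 139 points for model specification.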
  • 8. 8 Figure 2: Differencing Methods Applied to Time Series Data. Upper Left: Original Time Series {Yt}. Upper Right: Nonseasonal First Differences of Original Time Series, that is ∇Yt. Bottom Left: Seasonal First Differences of Original Time Series, that is ∇12 Yt. Bottom Right: Combined First Differences (Seasonal & Nonseasonal) of Original Time Series, that is ∇∇12 Yt.
  • 9. 9 Second differences were not considered because of the result of the augmented Dickey-Fuller unit root procedure. This procedure tests the following: H0 : α = 1 (nonstationarity) vs. H1 : α < 1 (stationarity), where H0 is the null hypothesis, or what is assumed to be true until proven otherwise, and H1 is the alternative hypothesis, or what we are trying to establish against the null hypothesis. In this study, the test was applied to the combined first differences of {Yt}. According to the results (Output-1 in the Appendix), our p-value < 0.01. Therefore, we have significant evidence to reject the null hypothesis. That is, we have evidence that the combined first differences of {Yt} form a stationary process. To obtain appropriate values for p, q, P and Q, the sample ACF and PACF (Partial Autocorrelation Function) of the combined first differences are plotted. In the sample ACF plot (left) in Figure 4 in the Appendix, the only noticeable spikes are at nonseasonal lag k = 1 and seasonal lag k = 12, with autocorrelation negligible elsewhere. In the sample PACF plot (right) in Figure 4, there are noticeable spikes at nonseasonal lags k = 1, 2, 4 and 11, and at seasonal lag k = 12. Also, from seasonal lag k = 12 of the PACF onward, notice how the autocorrelation decays exponentially as lag k increases. Seasonal lags 12, 24 and 36 also show a sinusoidal pattern. That is, across successive seasonal lags the autocorrelation alternates in sign: negative, positive, negative. Therefore, the sample ACF and PACF plots suggest the following model: ARIMA(4,1,0) x ARIMA(1,1,0)12. With the proposed model, we will follow the process of parameter estimation and assess whether the model fits the monthly paid life insurance claims data adequately. 
Two additional models will also be taken into consideration due to their parameter estimates showing significance.
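As a rough illustration of how the spikes read off Figure 4 can be flagged numerically, the sketch below computes a sample ACF and PACF and lists the lags that exceed the usual approximate bound of 1.96/sqrt(n). The series `w` is a simulated AR(1) process standing in for the differenced claims (an assumption, not the study's data), so the flagged lags will not match those reported above.

```r
# Simulated stationary series standing in for the combined first
# differences (hypothetical; 139 = 152 - 13 observations).
set.seed(2)
w <- arima.sim(model = list(ar = -0.5), n = 139)

a <- acf(w, lag.max = 36, plot = FALSE)    # includes lag 0
p <- pacf(w, lag.max = 36, plot = FALSE)   # starts at lag 1

# Approximate 95% bound for a white noise series of this length
bound <- qnorm(0.975) / sqrt(length(w))

acf_lags  <- which(abs(a$acf[-1]) > bound)  # drop lag 0 before comparing
pacf_lags <- which(abs(p$acf) > bound)

print(bound)
print(acf_lags)
print(pacf_lags)
```

With 36 lags inspected, one or two lags can cross the bound by chance alone, which is one reason isolated spikes are usually weighed against the overall pattern.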
  • 10. 10 IV. Parameter Estimation (Model Fitting) An ARIMA(p,d,q) x ARIMA(P,D,Q)s process with nonseasonal orders p = 4, d = 1 and q = 0, and seasonal orders P = 1, D = 1 and Q = 0, is written ARIMA(4,1,0) x ARIMA(1,1,0)12. By applying equation (1), which is built from equations (2)-(7), this process can be expressed as $(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3 - \phi_4 B^4)(1 - \Phi_1 B^{12})(1 - B)(1 - B^{12})\,Y_t = e_t$ (8) or, equivalently, in terms of the combined first differences, $(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3 - \phi_4 B^4)(1 - \Phi_1 B^{12})\,\nabla \nabla_{12} Y_t = e_t$ (9) To estimate the parameters in either equation (8) or (9), the method of maximum likelihood (MLE) is applied. According to the method of MLE: "for any set of observations, Y1, Y2,..., Yn, time series or not, the likelihood function L is defined to be the joint probability density of obtaining the data actually observed. However, it is considered as a function of the unknown parameters in the model with observed data held fixed. For ARIMA models, L will be a function of the ϕ's, θ's, Θ's, Φ's, μ and $\sigma_e^2$ given the observations Y1, Y2,..., Yn. The maximum likelihood estimators are then defined as those values of the parameters for which the data actually observed are most likely, that is, the values that maximize the likelihood function" (Cryer & Chan, 158). According to Output-2 in the Appendix, the estimates are $\hat{\phi}_1$ = -0.8075, $\hat{\phi}_2$ = -0.8026, $\hat{\phi}_3$ = -0.482, $\hat{\phi}_4$ = -0.3837 and $\hat{\Phi}_1$ = -0.5154. Substituting these into the proposed SARIMA model gives equation (10): $(1 + 0.8075B + 0.8026B^2 + 0.482B^3 + 0.3837B^4)(1 + 0.5154B^{12})(1 - B)(1 - B^{12})\,Y_t = e_t$. By methods of MLE, we can construct large-sample confidence intervals for each parameter estimate, with the following equation,
  • 11. 11 $\text{parameter estimate} \pm z_{\alpha/2}\,\mathrm{SE}(\text{parameter estimate})$ (11) where $z_{\alpha/2} = z_{0.975} = 1.96$ for α = 0.05, and SE(parameter estimate) is the approximate standard error of the estimated parameter. These values can be seen in Output-2 in the Appendix (labeled as s.e.). This confidence interval shows whether or not a parameter estimate for the proposed model is significantly different from zero. If an estimate is insignificant, its confidence interval will contain zero. By applying equation (11) to the parameter estimates and their respective standard errors, the following confidence intervals were calculated: (-1.01235, -0.60265), (-1.04899, -0.55621), (-0.73226, -0.23174), (-0.59603, -0.17137) and (-0.71561, -0.31519) for parameters ϕ1, ϕ2, ϕ3, ϕ4 and Φ1, respectively. These endpoints correspond to a critical value of 2.58 (a 99% interval) rather than 1.96, so the narrower 95% intervals (significance level α = 0.05) lie inside them. Notice that none of the intervals contains zero. We conclude that the parameters of the proposed model, estimated by MLE, are significantly different from zero. We can now proceed to check the assumptions of normality and independence in the residuals, et, in equation (10), and assess whether or not the proposed model is appropriate in light of the observed monthly paid life insurance claims data. The method of overfitting will also be applied to ensure that more complex models, built from the proposed model, are unnecessary. V. Model Diagnostics To assess the assumption of normality in the proposed model (equation (10) on page 10), a histogram and QQ plot of the standardized residuals, et, are plotted. In Figure 5 in the Appendix, the histogram (left) shows that a normal distribution curve is an appropriate fit. In the QQ plot (right) in Figure 5, notice that all data points (residuals) fall close to the fitted normal line. This further suggests no gross departure from normality. To further support that the residuals are normally distributed, the Shapiro-Wilk procedure is applied. 
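The interval in equation (11) can be recomputed directly from the estimates and standard errors printed in Output-2. The sketch below is a minimal check rather than the author's code: the helper `wald_ci` and the object names are hypothetical. It evaluates the interval with the 95% critical value and with 2.58, which shows which critical value reproduces the endpoints quoted in the text.

```r
# Estimates and standard errors copied from Output-2
est <- c(ar1 = -0.8075, ar2 = -0.8026, ar3 = -0.482, ar4 = -0.3837, sar1 = -0.5154)
se  <- c(ar1 = 0.0794,  ar2 = 0.0955,  ar3 = 0.097,  ar4 = 0.0823,  sar1 = 0.0776)

# Hypothetical helper: estimate +/- z * SE, one row per parameter
wald_ci <- function(est, se, z) {
  cbind(lower = est - z * se, upper = est + z * se)
}

ci_95  <- wald_ci(est, se, qnorm(0.975))  # z = 1.96, 95% interval
ci_258 <- wald_ci(est, se, 2.58)          # z = 2.58, roughly a 99% interval

# A parameter is significant when its interval excludes zero
excludes_zero <- ci_95[, "upper"] < 0 | ci_95[, "lower"] > 0

print(round(ci_95, 5))
print(round(ci_258, 5))
print(excludes_zero)
```

Both sets of intervals exclude zero, so the significance conclusion does not depend on which of the two critical values is used.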
In Output-3 in the Appendix, the Shapiro-Wilk procedure tests the following:
  • 12. 12 H0 : Standardized residuals are normally distributed. vs. H1 : Standardized residuals are not normally distributed. Based on this test, conducted in R, our observed p-value = 0.2361. Therefore, at any reasonable significance level α, we fail to reject H0. That is, we do not have enough evidence to show that the standardized residuals are not normally distributed. To assess the assumption of independence in the proposed model, a time series plot of the standardized residuals, et, is used. In Figure 6 in the Appendix, the movement of the residuals over time shows no alarming erratic patterns, and the residuals appear to oscillate about zero. To further support independence in the residuals, a formal hypothesis test can be applied. This test is known as the Runs Procedure. It tests the following: H0 : Standardized residuals are independent. vs. H1 : Standardized residuals are not independent. Based on this test, conducted in R (Output-4 in the Appendix), our observed p-value = 0.289. Therefore, at any reasonable significance level α, we fail to reject H0. That is, we do not have enough evidence to show that the standardized residuals are not independent. The proposed model for the monthly paid life insurance claims data is therefore consistent with the assumptions of normality and independence. To further support that the ARIMA(4,1,0) x ARIMA(1,1,0)12 is an appropriate fit to the observed data, the Ljung-Box (1978) procedure can be applied. This procedure, in reference to this study, tests the following: H0 : The ARIMA(4,1,0) x ARIMA(1,1,0)12 is an appropriate fit. vs. H1 : The ARIMA(4,1,0) x ARIMA(1,1,0)12 is not an appropriate fit. According to this test (Output-5 in the Appendix), at a maximum lag K = 36, we observe our test statistic to be 31.6524 on 32 degrees of freedom, with a corresponding p-value = 0.4841. At any reasonable significance level,
  • 13. 13 we fail to reject H0. That is, we do not have enough evidence to reject that the proposed model is appropriate. Figure 7 in the Appendix provides a visualization of the Ljung-Box procedure at work. The top plot is a time series plot of the standardized residuals. Notice that all but one residual fall between -3 and 3. In the Ljung-Box procedure in R, residuals in the time series plot are tested as outliers using the "Bonferroni" criterion. That is, any residual that falls outside the bounds $\pm z_{0.025/n} \approx \pm 3.591368$, where n = 152 is the number of original observed data points, is considered an "outlier". The residual in question is the 148th residual in the plot. Not surprisingly, this is the month (April 2013; mentioned on page 1) in which company ABC experienced its highest payout in life insurance claims. This residual, however, was calculated to be approximately 3.079969, which falls within the Bonferroni bounds. Therefore, the Ljung-Box procedure does not flag this residual as an outlier. Looking further, Figure 7's middle plot depicts the sample ACF of the standardized residuals. In this plot, correlations that fall outside the red dotted lines are statistically significant. What is desired is a residual ACF plot free of significant correlations, that is, a plot that resembles a white noise process. The only "statistically significant" correlation is at lag k = 24, which is a seasonal lag. However, this correlation is approximately -0.1889 (Output-6 in the Appendix), which is very small. To further support that this autocorrelation is of no concern, refer back to Figure 4 in the Appendix, the sample ACF and PACF of the combined first differences of the original data. Nothing in these plots suggests that seasonal lag k = 24 has any significance in the data. 
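The Bonferroni bound quoted above is a single normal quantile and can be checked in one line. This is a minimal sketch with hypothetical object names; the residual value is the one reported in the text, not recomputed from the data.

```r
n     <- 152        # number of observed data points
alpha <- 0.05       # overall two-sided significance level

# Bonferroni-adjusted critical value: upper alpha/(2n) normal quantile
bound <- qnorm(1 - (alpha / 2) / n)

resid_148 <- 3.079969            # standardized residual for April 2013 (from the text)
flagged   <- abs(resid_148) > bound

print(bound)
print(flagged)
```

Dividing the tail probability by n is what moves the cutoff from about 1.96 to about 3.59: with 152 residuals examined at once, a few values beyond 2 or even 3 are expected by chance.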
Therefore, there is no reason to suspect dependence at lag k = 24. Further support is given in the discussion of overfitting the proposed model on page 15. As mentioned on page 9, two additional models remain as possible candidates: 1) ARIMA(0,1,1) x ARIMA(0,1,1)12 and 2) ARIMA(4,1,0) x ARIMA(0,1,1)12. After thorough model diagnostic comparisons were run, choosing the ARIMA(4,1,0) x ARIMA(1,1,0)12 model as the suggested fit was a tough decision. For instance, notice in Table 1 on page 14 that among all three candidates, the proposed model has the highest AIC (Akaike's Information Criterion) and white noise variance. What is sought in a proposed model is a fit that minimizes the AIC and white
  • 14. 14 noise variance. The ARIMA(0,1,1) x ARIMA(0,1,1)12 has both of these qualities. The same applies to the ARIMA(4,1,0) x ARIMA(0,1,1)12: its AIC and white noise variance are low and not much larger than those of the ARIMA(0,1,1) x ARIMA(0,1,1)12. However, these candidate models fall short on a very important diagnostic, the test for normality in the residuals. For the ARIMA(0,1,1) x ARIMA(0,1,1)12, the Shapiro-Wilk test gives a p-value = 0.00574. Therefore, we conclude its residuals do not follow a normal distribution. Normality is required because, in theory, we expect a stationary SARIMA model to have $e_t \sim N(0, \sigma_e^2)$. As for the ARIMA(4,1,0) x ARIMA(0,1,1)12, the Shapiro-Wilk test again argues against choosing this model. The p-value under this test for the ARIMA(4,1,0) x ARIMA(0,1,1)12 was 0.01442. Although it barely passes as insignificant (if, say, α = .01 was declared before the study), I would be hesitant to select this SARIMA process as my model of choice. In conclusion, the proposed model does not provide the best AIC value or white noise variance; however, it still meets all the assumptions of an appropriate stationary SARIMA model.
Table 1: Model Diagnostic Comparisons Amongst Candidate Models
Diagnostic | ARIMA(4,1,0) x ARIMA(1,1,0)12 | ARIMA(4,1,0) x ARIMA(0,1,1)12 | ARIMA(0,1,1) x ARIMA(0,1,1)12
Confidence Intervals of Parameter Estimates | Significant | Significant | Significant
AIC | 401.89 | 383.55 | 376.44
White Noise Variance | 0.9446 | 0.7625 | 0.6672
Assumption of Normality (Shapiro-Wilk) | p = 0.2361 | p = 0.01442 | p = 0.00574
Assumption of Independence (Runs) | p = 0.289 | p = 0.289 | p = 0.627
Ljung-Box Test | p = 0.4841 | p = 0.6695 | p = 0.2577
With model diagnostics completed, the ARIMA(4,1,0) x ARIMA(1,1,0)12 is suggested as the model of choice for fitting the monthly paid life insurance claims data. The methods that follow
  • 15. 15 will look into confirming that higher-order, more complex models are insignificant (or introduce uncertainty) relative to our proposed model. This section discusses the method of overfitting. To further assess whether the proposed model accurately depicts the observed life insurance claims data, the following more complex models and their estimated parameters are analyzed: 1) ARIMA(5,1,0) x ARIMA(1,1,0)12, 2) ARIMA(4,1,1) x ARIMA(1,1,0)12, 3) ARIMA(4,1,0) x ARIMA(1,1,1)12 and 4) ARIMA(4,1,0) x ARIMA(2,1,0)12. To assess whether the additional parameters are significant, 95% confidence intervals are applied in each overfitted model. The following conclusions arise from the method of overfitting (with the assistance of Output-7 in the Appendix):
ARIMA(5,1,0) x ARIMA(1,1,0)12: ϕ5 is not significant.
ARIMA(4,1,1) x ARIMA(1,1,0)12: the added θ1 is not significant, and the standard errors of ϕ1, ϕ2, ϕ3 and ϕ4 increase with its addition (in Output-7 these estimates remain more than two standard errors from zero).
ARIMA(4,1,0) x ARIMA(1,1,1)12: Φ1 becomes insignificant with the addition of Θ1.
ARIMA(4,1,0) x ARIMA(2,1,0)12: all parameter estimates are significant.
What is peculiar about this overfitting exercise is that the ARIMA(4,1,0) x ARIMA(2,1,0)12 model's parameter estimates are all significantly different from zero. However, in the ACF and PACF plots of the combined first differences (Figure 4 in the Appendix), nothing in the PACF plot suggests a notable spike at seasonal lag k = 24. Therefore, this model should not be adopted despite its significant parameters. After thorough model specification, fitting and diagnostics, the ARIMA(4,1,0) x ARIMA(1,1,0)12 model was determined to be the best candidate over the ARIMA(4,1,0) x ARIMA(0,1,1)12 and the ARIMA(0,1,1) x ARIMA(0,1,1)12 models. Although the additional candidate models had promising attributes, model diagnostics proved otherwise. 
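The overfitting conclusions can be cross-checked against Output-7 by dividing each estimate of interest by its standard error and comparing the ratio with 1.96. The sketch below is an illustration with hypothetical object names; the estimates and standard errors are copied from Output-7. Note that in the ARIMA(5,1,0) fit all other standard errors were reported as NaN, so the ratio for ar5 there should be read with caution.

```r
# Parameters of interest in each overfitted model (values from Output-7)
extra <- data.frame(
  model = c("(5,1,0)x(1,1,0)12", "(4,1,1)x(1,1,0)12",
            "(4,1,0)x(1,1,1)12", "(4,1,0)x(1,1,1)12", "(4,1,0)x(2,1,0)12"),
  param = c("ar5", "ma1", "sar1", "sma1", "sar2"),
  est   = c(0.0003, 0.0022, 0.0021, -0.8517, -0.2514),
  se    = c(0.0180, 0.2674, 0.1248,  0.1454,  0.0957)
)

# Approximate z-ratio; |z| > 1.96 means the 95% interval excludes zero
extra$z <- extra$est / extra$se
extra$significant <- abs(extra$z) > qnorm(0.975)

print(extra)
```

Only sma1 in the third model and sar2 in the fourth come out significant, which matches the pattern described above: the extra nonseasonal terms add nothing, while the seasonal part is where the alternatives compete.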
In the following section, we carry the proposed model into forecasting. That is, we will apply the ARIMA(4,1,0) x ARIMA(1,1,0)12 model to the observed monthly paid life
  • 16. 16 insurance claims data and predict how much company ABC should expect to pay out per month in claim settlements for all of their life insurance products combined. VI. Forecasting Analysis The programming package R was used to compute monthly paid life insurance claims forecasts and 95% prediction limits for lead times l = 1, 2, ..., 28 (28 months ahead, September 2013 through December 2015, as listed in Output-8) based on the proposed ARIMA(4,1,0) x ARIMA(1,1,0)12 model. Estimated minimum mean squared error (MMSE) forecasts and their respective 95% prediction limits can be seen in Output-8 in the Appendix. Plotting these predictions on the original time series from Figure 1 on page 5 gives a visual sense of how well the proposed model's forecasts mimic the stochastic periodicity in the monthly paid life insurance claims data. Figure 8: Model Forecasts Plotted on Monthly Paid Life Insurance Claims Time Series
  • 17. 17 In Figure 8, solid black dots represent MMSE predictions of the monthly paid life insurance claim settlements the company could anticipate in the near future. Blue dashed lines in the plot represent the 95% prediction intervals of each MMSE prediction. Notice that as lead time l increases, the model's 95% prediction interval also increases in width. This is due to nonstationarity in the original time series, which was discovered during the model specification phase. According to Output-8 in the Appendix we are 95% confident, based on the fitted ARIMA(4,1,0) x ARIMA(1,1,0)12 model, that company ABC's total life insurance claims paid out in December 2013 will fall somewhere between $5,045,729.00 and $9,126,798.00. Forecasts for the months of September, October and November 2013 are compared to actual amounts already observed by the company. These observations are displayed as solid green dots in Figure 8 on page 16. As mentioned on page 1, these months were in the life insurance claims dataset, but were not taken into consideration in this study.
Table 2. Comparison of Observed (not included in study) vs. Fitted Model Prediction
Month | Observed | 95% Lower Limit | Prediction | 95% Upper Limit
September '13 | $7,356,677.98 | $5,545,976.00 | $7,450,880.00 | $9,355,784.00
October '13 | $7,689,063.72 | $4,417,795.00 | $6,357,658.00 | $8,297,520.00
November '13 | $5,498,092.91 | $2,922,172.00 | $4,863,679.00 | $6,805,187.00
The proposed model seems to predict adequately what company ABC should anticipate in total monthly paid life insurance claims: each observed amount falls within its 95% prediction interval. Therefore, projections can be made further into the future for the company regarding how much they expect to pay out in life insurance claims. According to the predictions, company ABC should expect to pay out approximately $189,392,938.00 in life insurance claims from December 2013 to December 2015. 
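The prediction limits in Table 2 and the cumulative figure above can be reproduced from the forecasts and standard errors printed in Output-8 (in millions of dollars). The following is a minimal sketch with hypothetical object names, not the author's script; all numbers are copied from Output-8 and Table 2.

```r
# Forecasts and standard errors for Sep-Nov 2013 (Output-8, $ millions)
pred     <- c(Sep = 7.450880,   Oct = 6.357658,   Nov = 4.863679)
se       <- c(Sep = 0.9719077,  Oct = 0.9897440,  Nov = 0.9905832)
observed <- c(Sep = 7.35667798, Oct = 7.68906372, Nov = 5.49809291)  # Table 2

# 95% prediction limits: forecast +/- z_{0.975} * standard error
z     <- qnorm(0.975)
lower <- pred - z * se
upper <- pred + z * se
covered <- observed >= lower & observed <= upper

# Monthly forecasts from December 2013 through December 2015 (25 months)
dec13_to_dec15 <- c(
  7.086264,
  6.679601, 8.134403, 7.490016, 9.272639, 7.784994, 7.401839,
  7.720315, 8.431392, 7.406520, 7.014338, 5.394386, 6.434451,
  7.082731, 8.435012, 8.296457, 10.359447, 7.204893, 7.328288,
  8.469284, 8.270852, 7.856935, 7.096430, 5.545508, 7.195943
)
total_millions <- sum(dec13_to_dec15)

print(round(cbind(lower, pred, upper, observed), 4))
print(covered)
print(total_millions)
```

The cumulative total is a sum of point forecasts over 25 months; it carries no interval of its own here, since the monthly forecast errors are correlated and their standard errors cannot simply be added.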
If this is considered grossly higher than what company ABC expects, a recommendation could be made to monitor production processes in the claims department closely. Regarding the simplicity of a claims settlement, Greene explains that the only direct contact an insurance buyer has in filing a claim is usually with the claims department alone. If communication between the insured and the claims department leaves bad
  • 18. 18 impressions then possible liabilities such as regulatory censure, lawsuits or even suspension of the right to carry on business in the jurisdiction involved could occur. On the other hand, if the company notices that "overly liberal life insurance settlements are occurring, this could result in higher life insurance rate levels and even loss of business through lower premiums charged by competitors" (Greene, 173). VII. Discussion By applying the four strategies in model fitting, the present study identified that the ARIMA(4,1,0) x ARIMA(1,1,0)12 process described the stochastic periodicity of recorded monthly paid life insurance claims at company ABC most effectively. By the method of maximum likelihood, the following fitted model was obtained: $(1 + 0.8075B + 0.8026B^2 + 0.482B^3 + 0.3837B^4)(1 + 0.5154B^{12})(1 - B)(1 - B^{12})\,Y_t = e_t$. In the model diagnostics, two additional models nearly outweighed the proposed ARIMA(4,1,0) x ARIMA(1,1,0)12. However, these two candidate models did not agree with the assumption of normally distributed residuals, et (one of them markedly so). Therefore, they were no longer considered viable candidate models. In the overfitting exercise, an intriguing discovery was that the ARIMA(4,1,0) x ARIMA(2,1,0)12 overfitted model showed promise. That is, all of its estimated parameters were significantly different from zero and all assumptions were met: 1) stationarity, 2) normality, 3) independence and 4) model appropriateness. However, this model suggested that an additional seasonal autoregressive (AR) order was needed. This contradicted what was determined in the model specification phase by analysis of the ACF and PACF plots of the combined first differences of {Yt}. Therefore, all overfitted models were inconclusive. One weakness of the proposed model in this study was how its calculated AIC and white noise variance compared with those of the other two candidate models. 
Out of all three, the chosen ARIMA(4,1,0) x ARIMA(1,1,0)12 had the largest calculated AIC and white noise variance. Ideally, the model chosen is the one that minimizes both quantities. I believe one aspect that held back the other candidate models was the fact that the dataset had a
  • 19. 19 potential outlier. During the model diagnostic phase, both the ARIMA(4,1,0) x ARIMA(0,1,1)12 and the ARIMA(0,1,1) x ARIMA(0,1,1)12 models showed clear signs in their histogram and QQ plots that the claims figure recorded for April 2013 was a likely outlier (Figure 9 in the Appendix). I believe this situation kept me from pursuing one of these two additional candidates as the suggested model fit. Through statistical programming in R, MMSE forecasts and 95% prediction intervals were determined by extending the proposed model to predict what the company should expect to see in monthly claim settlement figures in their life insurance sector. According to the forecast, company ABC should expect to pay out approximately $189,392,938.00 in life insurance claims from December 2013 to December 2015. Keep in mind that these projections do not take environmental changes in the economy into consideration. According to the LIMRA 2013 Barometer Study, about two thirds of American consumers are concerned about financial stability when it comes to their retirement plan. A surprising finding in the study was that one third of consumers in America experienced the loss of a loved one or close friend within the past two years, and are now more inclined to purchase life insurance because they are more concerned about leaving their dependent(s) financially burdened. However, 10 percent of consumers that experienced the loss of a loved one purchased life insurance in response. By the next year, they are no more likely to purchase again than the general consumer (Denley, 5). Therefore, forecasting and time series analysis play an important role in predicting future outcomes. However, one must also take into consideration circumstances that lie beyond these analytics.
  • 20. 20 References
Cryer, Jonathan D. and Kung-Sik Chan. Time Series Analysis With Applications in R. New York: Springer Science+Business Media, LLC, 2010. Print.
Denley, Norah. 2013 Insurance Barometer Study. Research. Hartford, CT: LL Global, Inc., 2013. Web.
Greene, Mark R. Risk and Insurance. Cincinnati: South-Western Publishing Co., 1968. Print.
Steuer, Tony. Questions and Answers on Life Insurance: The Life Insurance Toolbook. Alameda: Life Insurance Sage Press, 2010. Print.
  • 21. 21 Appendix Figure 1: Time Series Plot of Monthly Paid Life Insurance Claims at Company ABC Figure 2: Differencing Methods Applied to Time Series Data. Upper Left: Original Time Series {Yt}. Upper Right: Nonseasonal First Differences of Original Time Series, that is ∇Yt. Bottom Left: Seasonal First Differences of Original Time Series, that is ∇12 Yt. Bottom Right: Combined First Differences (Seasonal & Nonseasonal) of Original Time Series, that is ∇∇12 Yt.
  • 22. 22 Figure 3: Sample ACF of Original Time Series Data Figure 4: Sample ACF and PACF of Combined First Differences of Original Time Series Data
  • 23. 23 Figure 5: Histogram and QQ plot of Standardized Residuals Figure 6: Time Series Plot of Standardized Residuals
  • 24. 24 Figure 7: Residual Graphics and Modified Ljung-Box p-values for an ARIMA(4,1,0) x ARIMA(1,1,0)12 Fit Figure 8: Seasonal Model Forecasting Estimates Applied to Original Time Series
  • 25. 25 Figure 9: Side-by-Side Histogram and QQ Plots of Additional Candidate Models
Output-1: Augmented Dickey-Fuller Unit Root Test with the Combined First Differences Data
> ar(diff(diff.claims,lag=12))
Call:
ar(x = diff(diff.claims, lag = 12))
Coefficients:
      1        2        3        4        5        6        7        8
-0.7543  -0.7441  -0.4034  -0.3184  -0.0427  -0.1071  -0.1325  -0.0710
      9       10       11       12       13       14
-0.0482  -0.1379  -0.0439  -0.4215  -0.2309  -0.1812
Order selected 14  sigma^2 estimated as  1.176
> adf.test(diff(diff.claims,lag=12),k=14)
        Augmented Dickey-Fuller Test
data:  diff(diff.claims, lag = 12)
Dickey-Fuller = -4.8077, Lag order = 14, p-value = 0.01
alternative hypothesis: stationary
Warning message:
In adf.test(diff(diff.claims, lag = 12), k = 14) :
  p-value smaller than printed p-value
  • 26. 26 Output-2: SARIMA Parameter Estimation using Methods of Maximum Likelihood
> # ARIMA(4,1,0) x ARIMA(1,1,0)_{12}
> ar.claims.mdl2 <- arima(life.claims,order=c(4,1,0),method='ML',seasonal=list(order=c(1,1,0),period=12))
> ar.claims.mdl2
Call:
arima(x = life.claims, order = c(4, 1, 0), seasonal = list(order = c(1, 1, 0), period = 12), method = "ML")
Coefficients:
          ar1      ar2     ar3      ar4     sar1
      -0.8075  -0.8026  -0.482  -0.3837  -0.5154
s.e.   0.0794   0.0955   0.097   0.0823   0.0776
sigma^2 estimated as 0.9446:  log likelihood = -195.95,  aic = 401.89
Output-3: Shapiro-Wilk Procedure to Test Normal Distributed Residuals
> shapiro.test(rstandard(ar.claims.mdl2))
        Shapiro-Wilk normality test
data:  rstandard(ar.claims.mdl2)
W = 0.9883, p-value = 0.2361
Output-4: Runs Procedure to Test for Independence
> runs(rstandard(ar.claims.mdl2))
$pvalue
[1] 0.289
$observed.runs
[1] 84
$expected.runs
[1] 76.98684
$n1
[1] 77
$n2
[1] 75
$k
[1] 0
Output-5: Determining Ljung-Box Procedure Testing for Model Appropriateness
> Box.test(rstandard(ar.claims.mdl2), lag = 36, type = "Ljung-Box", fitdf = 4)
        Box-Ljung test
data:  rstandard(ar.claims.mdl2)
X-squared = 31.6524, df = 32, p-value = 0.4841
Output-6: Determining Max Autocorrelation in Time Series Plot of Model Residuals (Figure 7)
> acf.ar.claims.mdl2 <- acf(as.vector(rstandard(ar.claims.mdl2)),lag=36,plot=F)
> max.lag <- which.max(abs(acf.ar.claims.mdl2$acf))
> acf.ar.claims.mdl2$lag[max.lag]
[1] 24
> acf.ar.claims.mdl2$acf[max.lag]
[1] -0.1889239
  • 27. 27 Output-7: Methods of Model Overfitting
> ovf.ar.claims.mdl1 <- arima(life.claims,order=c(5,1,0),method='ML',seasonal=list(order=c(1,1,0),period=12))
> ovf.ar.claims.mdl2 <- arima(life.claims,order=c(4,1,1),method='ML',seasonal=list(order=c(1,1,0),period=12))
> ovf.ar.claims.mdl3 <- arima(life.claims,order=c(4,1,0),method='ML',seasonal=list(order=c(1,1,1),period=12))
> ovf.ar.claims.mdl4 <- arima(life.claims,order=c(4,1,0),method='ML',seasonal=list(order=c(2,1,0),period=12))
> ovf.ar.claims.mdl1
Call:
arima(x = life.claims, order = c(5, 1, 0), seasonal = list(order = c(1, 1, 0), period = 12), method = "ML")
Coefficients:
          ar1      ar2      ar3      ar4     ar5     sar1
      -0.8081  -0.8019  -0.4818  -0.3797  0.0003  -0.5165
s.e.      NaN      NaN      NaN      NaN  0.0180      NaN
sigma^2 estimated as 0.9446:  log likelihood = -195.94,  aic = 403.89
Warning message:
In sqrt(diag(x$var.coef)) : NaNs produced
> ovf.ar.claims.mdl2
Call:
arima(x = life.claims, order = c(4, 1, 1), seasonal = list(order = c(1, 1, 0), period = 12), method = "ML")
Coefficients:
          ar1      ar2      ar3      ar4     ma1     sar1
      -0.8094  -0.8040  -0.4831  -0.3840  0.0022  -0.5154
s.e.   0.2442   0.1934   0.1695   0.0936  0.2674   0.0777
sigma^2 estimated as 0.9446:  log likelihood = -195.95,  aic = 403.89
> ovf.ar.claims.mdl3
Call:
arima(x = life.claims, order = c(4, 1, 0), seasonal = list(order = c(1, 1, 1), period = 12), method = "ML")
Coefficients:
          ar1      ar2     ar3      ar4    sar1     sma1
      -0.8221  -0.7300  -0.409  -0.3525  0.0021  -0.8517
s.e.   0.0804   0.1017   0.101   0.0843  0.1248   0.1454
sigma^2 estimated as 0.7622:  log likelihood = -186.77,  aic = 385.55
> ovf.ar.claims.mdl4
Call:
arima(x = life.claims, order = c(4, 1, 0), seasonal = list(order = c(2, 1, 0), period = 12), method = "ML")
Coefficients:
          ar1      ar2      ar3      ar4     sar1     sar2
      -0.8197  -0.7676  -0.4217  -0.3655  -0.6431  -0.2514
s.e.   0.0801   0.0998   0.1029   0.0832   0.0912   0.0957
sigma^2 estimated as 0.8914:  log likelihood = -192.68,  aic = 397.36
  • 28. 28 Output-8: MMSE Forecasts and 95% Prediction Limits
> year.claims = c(2013.666,2013.750,2013.833,2013.916,2014,2014.083,2014.166,2014.250,2014.333,2014.416,2014.500,2014.583,2014.666,2014.750,2014.833,2014.916,2015,2015.083,2015.166,2015.250,2015.333,2015.416,2015.500,2015.583,2015.666,2015.750,2015.833,2015.916)
>
> ar.claims.mdl2.predict <- predict(ar.claims.mdl2,n.ahead=length(year.claims))
> ar.claims.mdl2.predict
$pred
           Jan       Feb       Mar       Apr       May       Jun       Jul
2013
2014  6.679601  8.134403  7.490016  9.272639  7.784994  7.401839  7.720315
2015  7.082731  8.435012  8.296457 10.359447  7.204893  7.328288  8.469284
           Aug       Sep       Oct       Nov       Dec
2013            7.450880  6.357658  4.863679  7.086264
2014  8.431392  7.406520  7.014338  5.394386  6.434451
2015  8.270852  7.856935  7.096430  5.545508  7.195943
$se
           Jan       Feb       Mar       Apr       May       Jun       Jul
2013
2014 1.0635637 1.1539799 1.1841844 1.1963517 1.2389101 1.2691593 1.3057830
2015 1.6952197 1.7634241 1.8112046 1.8477683 1.8978117 1.9424720 1.9886551
           Aug       Sep       Oct       Nov       Dec
2013           0.9719077 0.9897440 0.9905832 1.0411081
2014 1.3359653 1.5138960 1.5627015 1.5920884 1.6502904
2015 2.0319834 2.3237252 2.3908261 2.4325104 2.5191578
>
> lower.pi<-ar.claims.mdl2.predict$pred-qnorm(0.975,0,1)*ar.claims.mdl2.predict$se
> upper.pi<-ar.claims.mdl2.predict$pred+qnorm(0.975,0,1)*ar.claims.mdl2.predict$se
> data.frame(Month=year.claims,lower.pi,upper.pi)
      Month  lower.pi  upper.pi
1  2013.666 5.5459762  9.355784
2  2013.750 4.4177953  8.297520
3  2013.833 2.9221717  6.805187
4  2013.916 5.0457294  9.126798
5  2014.000 4.5950547  8.764148
6  2014.083 5.8726436 10.396162
7  2014.166 5.1690574  9.810975
8  2014.250 6.9278325 11.617445
9  2014.333 5.3567750 10.213214
10 2014.416 4.9143321  9.889345
11 2014.500 5.1610276 10.279603
12 2014.583 5.8129479 11.049836
13 2014.666 4.4393384 10.373702
14 2014.750 3.9514997 10.077177
15 2014.833 2.2739502  8.514822
16 2014.916 3.1999414  9.668961
17 2015.000 3.7601613 10.405300
18 2015.083 4.9787643 11.891260
19 2015.166 4.7465612 11.846353
20 2015.250 6.7378882 13.981007
21 2015.333 3.4852503 10.924535
22 2015.416 3.5211132 11.135464
23 2015.500 4.5715917 12.366976
24 2015.583 4.2882379 12.253467
25 2015.666 3.3025176 12.411353
26 2015.750 2.4104964 11.782363
27 2015.833 0.7778751 10.313140
28 2015.916 2.2584848 12.133402