The document discusses using the programming language R for actuarial science applications. It presents R as a vector-based language suitable for working with life tables and performing actuarial calculations. Examples are given of how to model life contingencies like life expectancies, annuities, and insurance values using vectors and matrices in R. The document also discusses using R to fit prospective mortality models like the Lee-Carter model to data matrices.
1. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
R in Actuarial Science
a brief overview
Arthur Charpentier
charpentier.arthur@uqam.ca
http://freakonometrics.hypotheses.org/
January 2013, Universiteit van Amsterdam
2. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
Agenda
• Introduction to R
• Why R in actuarial science?
◦ Actuarial science?
◦ A vector-based language
◦ A large number of packages and libraries for predictive models
◦ Working with (large) databases in R
◦ A language to plot graphs
• Reproducibility issues
• Comparing R with other statistical software
◦ R in the insurance industry and amongst statistical researchers
◦ R versus MS Excel, Matlab, SAS, SPSS, etc.
◦ The R community
• Conclusion (?)
3. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
R
“R (and S) is the ‘lingua franca’ of data analysis and
statistical computing, used in academia, climate research,
computer science, bioinformatics, pharmaceutical industry,
customer analytics, data mining, finance and by some
insurers. Apart from being stable, fast, always up-to-date
and very versatile, the chief advantage of R is that it is
available to everyone free of charge. It has extensive and
powerful graphics abilities, and is developing rapidly, being
the statistical tool of choice in many academic environments.”
4. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
A brief history of R
R is based on the S statistical programming language developed by John
Chambers at Bell Labs in the 1980s.
R is an open-source implementation of the S language, developed by Robert
Gentleman and Ross Ihaka.
5. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
actuarial science?
• students in actuarial programs
• researchers in actuarial science
• actuaries in insurance companies
(or consulting firms, or financial institutions, etc)
It is difficult to find a single language/software of interest
to all three groups...
6. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
Using a vector-based language for life contingencies
A life table is a vector
> TD[39:52,] > TV[39:52,]
Age Lx Age Lx
39 38 95237 38 97753
40 39 94997 39 97648
41 40 94746 40 97534
42 41 94476 41 97413
43 42 94182 42 97282
44 43 93868 43 97138
45 44 93515 44 96981
46 45 93133 45 96810
47 46 92727 46 96622
48 47 92295 47 96424
49 48 91833 48 96218
50 49 91332 49 95995
51 50 90778 50 95752
52 51 90171 51 95488
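From such a vector of survivors, survival probabilities are one-line computations. A minimal sketch (assuming, as above, a data frame TD with columns Age and Lx):
> Lx <- TD$Lx                                   # survivors at each age
> px <- Lx[-1] / Lx[-length(Lx)]                # one-year survival probabilities px
> kp40 <- Lx[TD$Age >= 40] / Lx[TD$Age == 40]   # k-year survival probabilities from age 40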
7. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
Using a vector-based language for life contingencies
If age $x \in \mathbb{N}^*$, define the matrix $\boldsymbol{P} = [{}_k p_x]$, where the entry p[k,x] corresponds to ${}_k p_x$.
The (curtate) expectation of life is defined as
$$e_x = \mathbb{E}(K_x) = \sum_{k=1}^{\infty} k \cdot {}_{k|1}q_x = \sum_{k=1}^{\infty} {}_k p_x,$$
and we can compute $\boldsymbol{e} = [e_x]$ using
> life.exp <- function(x){ sum(p[1:nrow(p), x]) }   # sum the kpx column for age x
> e <- Vectorize(life.exp)(1:m)                     # apply over all ages
The expected present value (or actuarial value) of a temporary life annuity-due is
$$\ddot{a}_{x:\overline{n}|} = \sum_{k=0}^{n-1} \nu^k \cdot {}_k p_x = \frac{1 - A_{x:\overline{n}|}}{1 - \nu}$$
8. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
Using a vector-based language for life contingencies
and we can define $\boldsymbol{A} = [\ddot{a}_{x:\overline{n}|}]$ as
> for(j in 1:(m-1)){ adot[,j] <- cumsum(1/(1+i)^(0:(m-1)) * c(1, p[1:(m-1), j])) }
Define similarly the expected present value of a term insurance,
$$A^1_{x:\overline{n}|} = \sum_{k=0}^{n-1} \nu^{k+1} \cdot {}_{k|}q_x,$$
and the associated matrix $\boldsymbol{A} = [A^1_{x:\overline{n}|}]$ as
> for(j in 1:(m-1)){ A[,j] <- cumsum(1/(1+i)^(1:m) * d[,j]) }   # d[k,j]: probability a life aged j dies in year k
Remark: see also Giorgio Alfredo Spedicato's lifecontingencies package, and its
functions pxt, Axn, Exn, etc.
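A minimal sketch with that package (assuming the SOA illustrative table soa08Act shipped with lifecontingencies; function signatures as documented there):
> library(lifecontingencies)
> data(soa08Act)                   # SOA illustrative actuarial table, i = 6%
> pxt(soa08Act, x = 40, t = 10)    # 10-year survival probability at age 40
> Exn(soa08Act, x = 40, n = 10)    # pure endowment (10-year, age 40)
> axn(soa08Act, x = 40, n = 10)    # temporary life annuity-due
> Axn(soa08Act, x = 40, n = 10)    # term insurance EPV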
9. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
Using a matrix-based language for prospective life models
The life table $\boldsymbol{L} = [L_x]$ is no longer a vector (a function of the age $x$
only) but a matrix $\boldsymbol{L} = [L_{x,t}]$, a function of both age $x$ and date $t$.
> t(DTF)[1:10,1:10]
1899 1900 1901 1902 1903 1904 1905 1906 1907 1908
0 64039 61635 56421 53321 52573 54947 50720 53734 47255 46997
1 12119 11293 10293 10616 10251 10514 9340 10262 10104 9517
2 6983 6091 5853 5734 5673 5494 5028 5232 4477 4094
3 4329 3953 3748 3654 3382 3283 3294 3262 2912 2721
4 3220 3063 2936 2710 2500 2360 2381 2505 2213 2078
5 2284 2149 2172 2020 1932 1770 1788 1782 1789 1751
6 1834 1836 1761 1651 1664 1433 1448 1517 1428 1328
7 1475 1534 1493 1420 1353 1228 1259 1250 1204 1108
8 1353 1358 1255 1229 1251 1169 1132 1134 1083 961
9 1175 1225 1154 1008 1089 981 1027 1025 957 885
Similarly, define the force of mortality matrix $\boldsymbol{\mu} = [\mu_{x,t}]$.
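A crude estimator, as a sketch (assuming DEATH and EXPOSURE are matrices of death counts and central exposures with matching age × year dimensions):
> MU <- DEATH / EXPOSURE   # empirical force of mortality, mu[x,t] = D[x,t] / E[x,t]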
10. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
[figure-only slide]
11. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
Using a matrix-based language for prospective life models
Assume, as in the Lee & Carter (1992) model, that
$$\log \mu_{x,t} = \alpha_x + \beta_x \cdot \kappa_t + \varepsilon_{x,t},$$
with some i.i.d. noise $\varepsilon_{x,t}$.
The demography package can be used to fit a Lee-Carter model,
> library(demography)
> MUH <- matrix(DEATH$Male/EXPOSURE$Male, nL, nC)   # death rates, ages x years
> POPH <- matrix(EXPOSURE$Male, nL, nC)             # exposures
> BASEH <- demogdata(data=MUH, pop=POPH, ages=AGE, years=YEAR, type="mortality",
+ label="France", name="Hommes", lambda=1)
> LCH <- lca(BASEH)                # the fitting call, left implicit on the slide
> RES <- residuals(LCH, "pearson")
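A natural next step, sketched here assuming LCH has been fitted as above: demography provides a forecast method for lca objects (by default $\kappa_t$ is extrapolated by a random walk with drift),
> LCHf <- forecast(LCH, h = 50)   # project mortality 50 years ahead
> plot(LCHf)                      # plot the projected (log) mortality rates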
14. Arthur CHARPENTIER - R in Actuarial Science - UvA actuarial seminar, January 2013
Using a matrix-based language for prospective life models
One can consider more advanced functions to study mortality, e.g. bagplots, since
$\mu_{x,t}$ is a functional time series,
> library(rainbow)
> MUHF <- fts(x = AGE[1:90], y = log(MUH), xname = "Age", yname = "Log Mortality Rate")
> fboxplot(data = MUHF, plot.type = "functional", type = "bag")
> fboxplot(data = MUHF, plot.type = "bivariate", type = "bag")
Source: http://robjhyndman.com/