This document discusses non-life insurance mathematics. It begins by showing a graph of claim frequency for motor insurance over time, noting seasonal variations. It then discusses Poisson regression models for attributing claim variation to observable variables. Finally, it discusses fitting parametric distributions like log-normal and Gamma distributions to historical claims data, including situations with censoring where only a lower claim bound is known. The goal is to model claims data to accurately price insurance policies.
2. Key ratios – claim frequency
[Figure: Claim frequency, all covers, motor. Monthly values from January 2009 to December 2012; vertical axis from 0 to 35.]
• The graph shows claim frequency for all covers for motor insurance
• Notice the seasonal variations, due to changing weather conditions throughout the year
3. The model (Section 8.4)
• The idea is to attribute variation in the claim intensity $\mu$ to variations in a set of observable variables $x_1,\dots,x_v$. Poisson regression makes use of relationships of the form

$$\log(\mu) = b_0 + b_1 x_1 + \dots + b_v x_v \qquad (1.12)$$

• Why $\log(\mu)$ and not $\mu$ itself?
• The expected number of claims is non-negative, whereas the predictor on the right of (1.12) can be anything on the real line
• It makes more sense to transform $\mu$ so that the left and right sides of (1.12) are more in line with each other.
• Historical data are of the following form

Claims   Exposure   Covariates
n1       T1         x11 ... x1v
n2       T2         x21 ... x2v
...
nn       Tn         xn1 ... xnv

• The coefficients $b_0,\dots,b_v$ are usually determined by likelihood estimation
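The likelihood estimation mentioned above can be sketched in a few lines. This is a minimal illustration rather than the course's own implementation: it assumes that the claim count n_j is Poisson distributed with mean T_j·μ_j, where log(μ_j) follows (1.12), and it maximizes the log-likelihood with Newton's method. The function names `solve` and `fit_poisson_regression` and the data in the usage note are hypothetical.

```python
import math


def solve(A, y):
    """Solve the linear system A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_poisson_regression(counts, exposures, covariates, tol=1e-10, max_iter=50):
    """Return [b0, b1, ..., bv] maximizing the Poisson log-likelihood.

    Assumption: counts[j] ~ Poisson(exposures[j] * mu_j) with
    log(mu_j) = b0 + b1 * x_j1 + ... + bv * x_jv.
    """
    rows = [[1.0] + list(x) for x in covariates]  # prepend the intercept column
    p = len(rows[0])
    # Start from the overall claim frequency and zero covariate effects.
    b = [math.log(sum(counts) / sum(exposures))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        fitted = [T * math.exp(sum(bk * xk for bk, xk in zip(b, row)))
                  for T, row in zip(exposures, rows)]
        # Gradient of the log-likelihood: sum_j (n_j - T_j mu_j) x_jk
        grad = [sum((n - m) * row[k] for n, m, row in zip(counts, fitted, rows))
                for k in range(p)]
        # Negative Hessian: sum_j T_j mu_j x_jk x_jl
        hess = [[sum(m * row[k] * row[l] for m, row in zip(fitted, rows))
                 for l in range(p)] for k in range(p)]
        step = solve(hess, grad)
        b = [bk + sk for bk, sk in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b
```

For example, `fit_poisson_regression([12, 40, 55], [100.0, 200.0, 150.0], [[0.0], [1.0], [2.0]])` returns an intercept and one slope for a single covariate. In practice a statistics package with a Poisson GLM and an exposure offset would normally be used instead of hand-written Newton steps.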
4. Introduction to reserving
Non-life insurance from a financial perspective: for a premium, an insurance company commits itself to pay a sum if an event has occurred.
[Figure: timeline of a policy. The policy holder signs up for an insurance and pays the premium; the insurance company starts to earn premium over the contract period. A claim then passes through accident date, reporting date, claims payments, claims close, claims reopening, further claims payments and claims close. The premium reserve is prospective; the claims reserve is retrospective.]
During the duration of the policy, claims might or might not occur:
• How do we measure the number and size of unknown claims?
• How do we know if the reserves on known claims are sufficient?
During the duration of the policy, some of the premium is earned, some is unearned:
• How much premium is earned?
• How much premium is unearned?
• Is the unearned premium sufficient?
5. Imagine you want to build a reserve risk model
There are three effects that influence the best estimate and the uncertainty:
• Payment pattern
• RBNS movements
• Reporting pattern
Until recently, the industry has based its models on payment triangles:

Year   Period + 0     Period + 1     Period + 2     Period + 3    Period + 4
2008     7 008 148     25 877 313     31 723 256    32 718 766    33 019 648
2009    30 105 220     65 758 082     76 744 305    79 560 296
2010    89 181 138    171 787 015    201 380 709
2011   109 818 684    198 015 728
2012    97 250 541

The empty cells are the unknown part of the triangle: what will the future payments amount to?
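One common way to fill in the unknown part of such a triangle is the chain ladder method, which the course overview lists among the reserving models. The sketch below is only an illustration under stated assumptions: it treats the figures above as cumulative payments and uses volume-weighted development factors. The function name `chain_ladder` is hypothetical.

```python
def chain_ladder(triangle):
    """Complete a cumulative run-off triangle with volume-weighted development factors.

    triangle: list of rows, one per accident year, each row holding the
    cumulative payments observed so far (earlier years have longer rows).
    """
    n_periods = len(triangle[0])
    factors = []
    for k in range(n_periods - 1):
        # Use only the years for which both period k and period k + 1 are known.
        rows = [r for r in triangle if len(r) > k + 1]
        factors.append(sum(r[k + 1] for r in rows) / sum(r[k] for r in rows))
    completed = []
    for row in triangle:
        full = list(row)
        for k in range(len(row) - 1, n_periods - 1):
            full.append(full[-1] * factors[k])  # roll the last value forward
        completed.append(full)
    return factors, completed


# Payment triangle from the slide, assumed here to be cumulative.
triangle = [
    [7008148, 25877313, 31723256, 32718766, 33019648],
    [30105220, 65758082, 76744305, 79560296],
    [89181138, 171787015, 201380709],
    [109818684, 198015728],
    [97250541],
]
factors, completed = chain_ladder(triangle)
# Estimated outstanding payments: projected ultimate minus latest known value.
reserve = sum(full[-1] - row[-1] for full, row in zip(completed, triangle))
```

The last development factor rests on a single observation (2008), which is one reason the stochastic reserving models in the curriculum look at the uncertainty and not only at the point estimate.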
6. Overview
Important issues | Models treated | Curriculum | Duration (in lectures)
What is driving the result of a non-life insurance company? | Insurance economics models | Lecture notes | 0.5
How is claim frequency modelled? | Poisson, Compound Poisson and Poisson regression | Section 8.2-4 EB | 1.5
How can claims reserving be modelled? | Chain ladder, Bornhuetter-Ferguson, Cape Cod | Note by Patrick Dahl | 2
How can claim size be modelled? | Gamma distribution, log-normal distribution | Chapter 9 EB | 2
How are insurance policies priced? | Generalized linear models, estimation, testing and modelling. CRM models. | Chapter 10 EB | 2
Credibility theory | Bühlmann-Straub | Chapter 10 EB | 1
Reinsurance | | Chapter 10 EB | 1
Solvency | | Chapter 10 EB | 1
Repetition | | | 1
7. The ultimate goal for calculating the pure premium is pricing

$$\text{Claim severity} = \frac{\text{total claim amount}}{\text{number of claims}}$$

$$\text{Claim frequency} = \frac{\text{number of claims}}{\text{number of policy years}}$$

Pure premium = Claim frequency x Claim severity
Parametric and non parametric modelling (section 9.2 EB)
The log-normal and Gamma families (section 9.3 EB)
The Pareto families (section 9.4 EB)
Extreme value methods (section 9.5 EB)
Searching for the model (section 9.6 EB)
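The decomposition of the pure premium can be illustrated with a short calculation. The function name and the figures in the usage note are hypothetical and serve only to show how the two key ratios combine.

```python
def pure_premium(total_claim_amount, number_of_claims, policy_years):
    """Return (claim frequency, claim severity, pure premium)."""
    frequency = number_of_claims / policy_years        # claims per policy year
    severity = total_claim_amount / number_of_claims   # average cost per claim
    return frequency, severity, frequency * severity
```

With a hypothetical portfolio of 1 200 policy years, 60 claims and a total claim amount of 3 000 000, `pure_premium(3_000_000.0, 60, 1200.0)` gives a frequency of 0.05, a severity of 50 000 and a pure premium of 2 500 per policy year. The product equals total claim amount divided by policy years; splitting it into frequency and severity lets each factor be modelled separately.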
8. Claim severity modelling is about describing the variation in claim size
[Figure: Claim size, fire. Histogram with claim size bins from 0 to 150 000 on the horizontal axis and frequency from 0 to 700 on the vertical axis.]
• The graph shows how claim size varies for fire claims for houses
• The graph shows data up to the 88th percentile
• How do we handle «typical claims»? (claims that occur regularly)
• How do we handle large claims? (claims that occur rarely)
9. Claim severity modelling is about describing the variation in claim size
[Figure: Claim size, water. Histogram with claim size bins from 10 000 to 150 000 on the horizontal axis and frequency from 0 to 6 000 on the vertical axis.]
• The graph shows how claim size varies for water claims for houses
• The graph shows data up to the 97th percentile
• The shapes of the fire and water claim size distributions seem to be quite different
• What does this suggest about the drivers of fire claims and water claims?
• Any implications for pricing?
10. The ultimate goal for calculating the pure premium is pricing
• Claim size modelling can be parametric through families of distributions such as the Gamma, log-normal or Pareto with parameters tuned to historical data
• Claim size modelling can also be non-parametric where each claim $z_i$ of the past is assigned a probability $1/n$ of re-appearing in the future
• A new claim is then envisaged as a random variable $\hat{Z}$ for which

$$\Pr(\hat{Z} = z_i) = \frac{1}{n}, \qquad i = 1,\dots,n$$

• This is an entirely proper probability distribution
• It is known as the empirical distribution and will be useful in Section 9.5.
[Figure: axis "Size of claim" with the question "Where do we draw the line?" separating the range where we sample from the empirical distribution from the range where we use special techniques (Section 9.5).]
Client behaviour can affect the outcome
• Burglar alarm
• Tidy ship (maintenance etc.)
• Garage for the car
Bad luck
• Electric failure
• Catastrophes
• House fires
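Sampling from the empirical distribution amounts to drawing historical claims with replacement, each with probability 1/n. A minimal sketch, in which the function name `sample_empirical` and the claim figures in the usage note are hypothetical:

```python
import random


def sample_empirical(claims, size, rng):
    """Draw `size` new claims, each historical claim having probability 1/n."""
    return [rng.choice(claims) for _ in range(size)]
```

For instance, `sample_empirical([1000.0, 2500.0, 4000.0, 90000.0], 10, random.Random(0))` returns ten simulated claims, all of them values that have already been observed. This is also the main limitation of the approach: it can never produce a claim larger than the largest one in the historical data.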
11. Example
[Figure: Empirical distribution. Histogram of claim sizes, horizontal axis from -1 000 000 to 5 000 000, vertical axis from 0 to 120.]

Percentile   Claim size (NOK)
80              45 000
81              45 301
82              48 260
83              50 000
84              52 580
85              56 126
86              60 000
87              64 219
88              69 571
89              74 604
90              80 000
91              85 998
92              95 258
93             100 000
94             112 767
95             134 994
96             159 646
97             200 329
98             286 373
99             500 000
99.1           602 717
99.2           662 378
99.3           810 787
99.4           940 886
99.5         1 386 840
99.6         2 133 580
99.7         2 999 062
99.8         3 612 031
99.9         4 600 301
100          8 876 390

• The threshold may be set for example at the 99th percentile, i.e., 500 000 NOK for this product
• The threshold is sometimes called the large claims threshold
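Setting a large claims threshold at a chosen percentile can be sketched as follows. This is an illustration only: it assumes the nearest-rank definition of a percentile (other conventions interpolate between observations and give slightly different values), and both function names are hypothetical.

```python
import math


def nearest_rank_percentile(values, q):
    """Smallest observed value with at least q percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def split_at_threshold(values, q):
    """Split claims into ordinary claims (<= threshold) and large claims (> threshold)."""
    threshold = nearest_rank_percentile(values, q)
    ordinary = [v for v in values if v <= threshold]
    large = [v for v in values if v > threshold]
    return threshold, ordinary, large
```

Calling `split_at_threshold(claims, 99)` on a list of historical claims returns the threshold together with the two groups, which can then be modelled separately: the ordinary claims for instance with the empirical distribution, the large claims with the methods of Section 9.5.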
12. Scale families of distributions
• All sensible parametric models for claim size are of the form

$$Z = \beta Z_0, \qquad \text{where } \beta > 0 \text{ is a parameter}$$

• and $Z_0$ is a standardized random variable corresponding to $\beta = 1$.
• This proportionality is inherited by expectations, standard deviations and percentiles; i.e. if $\xi_0$, $\sigma_0$ and $q_{0\epsilon}$ are expectation, standard deviation and $\epsilon$-percentile for $Z_0$, then the same quantities for $Z$ are

$$\xi = \beta \xi_0, \qquad \sigma = \beta \sigma_0 \qquad \text{and} \qquad q_\epsilon = \beta q_{0\epsilon}$$

• The parameter $\beta$ can represent for example the exchange rate.
• The effect of passing from one currency to another does not change the shape of the density function (if the condition above is satisfied)
• In statistics $\beta$ is known as a parameter of scale
• Assume the log-normal model $Z = \exp(\theta + \sigma\varepsilon)$, where $\theta$ and $\sigma$ are parameters and $\varepsilon \sim N(0,1)$. Then $E(Z) = \exp(\theta + \tfrac{1}{2}\sigma^2)$. Assume we rephrase the model as

$$Z = \xi Z_0, \qquad \text{where } Z_0 = \exp(-\tfrac{1}{2}\sigma^2 + \sigma\varepsilon) \text{ and } \xi = \exp(\theta + \tfrac{1}{2}\sigma^2)$$

• Then

$$E Z_0 = E\{\exp(-\tfrac{1}{2}\sigma^2 + \sigma\varepsilon)\} = \exp(-\tfrac{1}{2}\sigma^2)\, E\{\exp(\sigma\varepsilon)\} = \exp(-\tfrac{1}{2}\sigma^2)\exp(\tfrac{1}{2}\sigma^2) = 1$$
Non parametric
Log-normal, Gamma
The Pareto
Extreme value
Searching
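The scaling property $Z = \beta Z_0$ can be checked numerically. The sketch below is illustrative only: it simulates a standardized log-normal $Z_0 = \exp(-\sigma^2/2 + \sigma\varepsilon)$ and multiplies it by a hypothetical exchange rate `beta`; the values of `sigma` and `beta` are arbitrary assumptions.

```python
import math
import random
import statistics

random.seed(1)
sigma = 0.8      # illustrative shape parameter (assumption)
beta = 9.5       # hypothetical exchange rate (assumption)
n = 20000

# Standardized log-normal variable with mean one.
z0 = [math.exp(-0.5 * sigma ** 2 + sigma * random.gauss(0.0, 1.0)) for _ in range(n)]
# The same claims expressed in another currency.
z = [beta * x for x in z0]

ratio_mean = statistics.fmean(z) / statistics.fmean(z0)
ratio_sd = statistics.stdev(z) / statistics.stdev(z0)
ratio_q95 = sorted(z)[int(0.95 * n)] / sorted(z0)[int(0.95 * n)]

# Mean, standard deviation and percentile are all multiplied by beta.
print(ratio_mean, ratio_sd, ratio_q95)
```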
13. Fitting a scale family
• Models for scale families satisfy
$$\Pr(Z \le z) = \Pr(Z_0 \le z/\beta) \qquad \text{or} \qquad F(z \mid \beta) = F_0(z/\beta),$$
where $F(z \mid \beta)$ and $F_0(z/\beta)$ are the distribution functions of $Z$ and $Z_0$.
• Differentiating with respect to $z$ yields the family of density functions
$$f(z \mid \beta) = \frac{1}{\beta} f_0(z/\beta), \quad z > 0, \qquad \text{where } f_0(z) = \frac{dF_0(z)}{dz}.$$
• The standard way of fitting such models is through likelihood estimation. If $z_1,\dots,z_n$ are the historical claims, the criterion becomes
$$L(\beta, f_0) = -n\log(\beta) + \sum_{i=1}^{n} \log\{f_0(z_i/\beta)\},$$
which is to be maximized with respect to $\beta$ and other parameters.
• A useful extension covers situations with censoring.
• Perhaps the most frequent situation is that the actual loss is only given as some lower bound $b$.
• Example: travel insurance. Expenses from loss of tickets (travel documents) and passport are covered up to 10 000 NOK if the loss is not covered by any of the other clauses.
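As a worked instance of the likelihood criterion for a scale family $Z = \beta Z_0$, assume purely for illustration that $Z_0$ is standard exponential, i.e. $f_0(z) = e^{-z}$. This choice is an assumption made for the example and is not one of the claim size models discussed in the slides. The criterion $L(\beta, f_0) = -n\log(\beta) + \sum_i \log\{f_0(z_i/\beta)\}$ can then be maximized in closed form:

```latex
\begin{align*}
L(\beta, f_0) &= -n\log(\beta) + \sum_{i=1}^{n}\log\{e^{-z_i/\beta}\}
               = -n\log(\beta) - \frac{1}{\beta}\sum_{i=1}^{n} z_i, \\
\frac{\partial L}{\partial \beta} &= -\frac{n}{\beta} + \frac{1}{\beta^{2}}\sum_{i=1}^{n} z_i = 0
\quad\Longrightarrow\quad
\hat{\beta} = \frac{1}{n}\sum_{i=1}^{n} z_i = \bar{z}.
\end{align*}
```

For most other choices of $f_0$ no closed form exists and the criterion has to be maximized numerically.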
14. Fitting a scale family
• The chance of a claim $Z$ exceeding $b$ is $1 - F_0(b/\beta)$, and for $n_b$ such events with lower bounds $b_1,\dots,b_{n_b}$ the analogous joint probability becomes
$$\{1 - F_0(b_1/\beta)\} \times \dots \times \{1 - F_0(b_{n_b}/\beta)\}.$$
Take the logarithm of this product and add it to the log likelihood of the fully observed claims $z_1,\dots,z_n$. The criterion then becomes
$$L(\beta, f_0) = \underbrace{-n\log(\beta) + \sum_{i=1}^{n}\log\{f_0(z_i/\beta)\}}_{\text{complete information}} + \underbrace{\sum_{i=1}^{n_b}\log\{1 - F_0(b_i/\beta)\}}_{\text{censoring to the right}}.$$
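The censored criterion, i.e. the log likelihood of the fully observed claims plus $\sum_i \log\{1 - F_0(b_i/\beta)\}$ for the censored ones, can be sketched numerically. The example below assumes a standard exponential $Z_0$ (so $f_0(z) = e^{-z}$ and $1 - F_0(z) = e^{-z}$), which is an illustrative choice only. The data, the helper names and the golden-section search are likewise assumptions, not the method used in the slides. For this particular $f_0$ the maximizer has the closed form $(\sum z_i + \sum b_i)/n$, which gives a check on the numerical search.

```python
import math

def censored_loglik(beta, claims, bounds):
    """Scale-family criterion with right censoring, standard exponential f0 assumed."""
    full = -len(claims) * math.log(beta) + sum(-z / beta for z in claims)
    cens = sum(-b / beta for b in bounds)   # log{1 - F0(b/beta)} = -b/beta
    return full + cens

def maximize(f, lo, hi, tol=1e-3):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d          # maximum lies in [a, d]
        else:
            a = c          # maximum lies in [c, b]
    return (a + b) / 2.0

claims = [1200.0, 3400.0, 800.0, 5600.0, 2100.0]   # fully observed losses (made up)
bounds = [10000.0, 10000.0]                         # losses only known to exceed 10 000

beta_hat = maximize(lambda b: censored_loglik(b, claims, bounds), 1.0, 100000.0)
closed_form = (sum(claims) + sum(bounds)) / len(claims)
print(beta_hat, closed_form)
```

Ignoring the two censored losses would give the plain mean of `claims`, which is much smaller; the censoring term pulls the scale estimate upwards.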
15. Shifted distributions
• The distribution of a claim may start at some threshold $b$ instead of the origin.
• Obvious examples are deductibles and re-insurance contracts.
• Models can be constructed by adding $b$ to variables starting at the origin; i.e. $Z = b + \beta Z_0$, where $Z_0$ is a standardized variable as before. Now
$$\Pr(Z \le z) = \Pr(b + \beta Z_0 \le z) = \Pr\!\left(Z_0 \le \frac{z-b}{\beta}\right),$$
and differentiation with respect to $z$ yields
$$f(z \mid \beta) = \frac{1}{\beta}\, f_0\!\left(\frac{z-b}{\beta}\right), \qquad z > b,$$
which is the density function of random variables with $b$ as a lower limit.
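A shifted density of the form $f_0((z-b)/\beta)/\beta$ for $z > b$ is easy to code and to sanity-check. In this sketch the standard exponential $f_0$, the threshold `b` and the scale `beta` are all illustrative assumptions; the midpoint rule merely confirms that the shifted density still integrates to one.

```python
import math

def shifted_density(z, b, beta, f0):
    """Density of Z = b + beta * Z0; it is zero at or below the lower limit b."""
    if z <= b:
        return 0.0
    return f0((z - b) / beta) / beta

def f0(x):
    """Standard exponential density, an illustrative choice for Z0."""
    return math.exp(-x)

b, beta = 4000.0, 2500.0     # hypothetical deductible and scale

# Midpoint-rule integration from b to b + 100 000.
step = 5.0
total = sum(shifted_density(b + (k + 0.5) * step, b, beta, f0) * step
            for k in range(20000))
print(total)
```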
16. Skewness as simple description of shape
• A major issue with claim size modelling is asymmetry and the right tail of the distribution. A simple summary is the coefficient of skewness
$$\mathrm{skew}(Z) = \frac{\nu_3}{\sigma^3}, \qquad \text{where } \nu_3 = E(Z-\xi)^3.$$
• The numerator is the third order moment. Skewness should not depend on currency and doesn't, since
$$\mathrm{skew}(Z) = \frac{E(Z-\xi)^3}{\sigma^3} = \frac{E(\beta Z_0 - \beta\xi_0)^3}{(\beta\sigma_0)^3} = \frac{E(Z_0-\xi_0)^3}{\sigma_0^3} = \mathrm{skew}(Z_0).$$
• Skewness is often used as a simplified measure of shape.
• The standard estimate of the skewness coefficient from observations $z_1,\dots,z_n$ is
$$\widehat{\mathrm{skew}} = \frac{\hat\nu_3}{s^3}, \qquad \text{where } \hat\nu_3 = \frac{1}{n-3+2/n}\sum_{i=1}^{n}(z_i-\bar z)^3.$$
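The skewness estimate, with third-moment divisor $n - 3 + 2/n$ and the usual sample standard deviation $s$, can be sketched as follows. The function name and the toy claims are assumptions made for illustration; the second list rescales the claims to show that the estimate does not depend on currency.

```python
import statistics

def skewness_estimate(zs):
    """Estimate of skew(Z): nu3_hat / s**3 with divisor n - 3 + 2/n in nu3_hat."""
    n = len(zs)
    z_bar = statistics.fmean(zs)
    s = statistics.stdev(zs)                      # sample sd, divisor n - 1
    nu3_hat = sum((z - z_bar) ** 3 for z in zs) / (n - 3 + 2 / n)
    return nu3_hat / s ** 3

claims = [45000.0, 50000.0, 60000.0, 80000.0, 100000.0, 200000.0, 500000.0]  # toy data
in_other_currency = [0.1 * z for z in claims]     # hypothetical exchange rate 0.1

skew_a = skewness_estimate(claims)
skew_b = skewness_estimate(in_other_currency)
print(skew_a, skew_b)                             # identical up to rounding
```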
17. Non-parametric estimation
• The random variable $\hat Z$ that attaches probabilities $1/n$ to all claims $z_i$ of the past is a possible model for future claims.
• Expectation, standard deviation, skewness and percentiles are all closely related to the ordinary sample versions. For example
$$E(\hat Z) = \sum_{i=1}^{n} \Pr(\hat Z = z_i)\, z_i = \frac{1}{n}\sum_{i=1}^{n} z_i = \bar z.$$
• Furthermore,
$$\mathrm{var}(\hat Z) = E(\hat Z - E(\hat Z))^2 = \sum_{i=1}^{n} \Pr(\hat Z = z_i)(z_i - \bar z)^2 = \frac{1}{n}\sum_{i=1}^{n}(z_i - \bar z)^2,$$
$$\mathrm{sd}(\hat Z) = \sqrt{\frac{n-1}{n}}\; s, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(z_i - \bar z)^2.$$
• Third order moment and skewness become
$$\hat\nu_3(\hat Z) = \frac{1}{n}\sum_{i=1}^{n}(z_i - \bar z)^3 \qquad \text{and} \qquad \mathrm{skew}(\hat Z) = \frac{\hat\nu_3(\hat Z)}{\{\mathrm{sd}(\hat Z)\}^3}.$$
• Skewness tends to be small.
• No simulated claim can be larger than what has been observed in the past.
• These drawbacks imply underestimation of risk.
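The moments of the empirical distribution follow directly from the sample versions. The sketch below uses made-up toy claims and checks that $\sqrt{(n-1)/n}\,s$ coincides with the population-style standard deviation from the standard library (`statistics.pstdev`, divisor $n$); the variable names are assumptions.

```python
import math
import statistics

claims = [45000.0, 50000.0, 60000.0, 80000.0, 100000.0, 200000.0, 500000.0]  # toy data
n = len(claims)

z_bar = statistics.fmean(claims)              # E(Z_hat)
s = statistics.stdev(claims)                  # ordinary sample sd, divisor n - 1
sd_emp = math.sqrt((n - 1) / n) * s           # sd(Z_hat)

nu3_emp = sum((z - z_bar) ** 3 for z in claims) / n
skew_emp = nu3_emp / sd_emp ** 3              # skew(Z_hat)

print(z_bar, sd_emp, statistics.pstdev(claims), skew_emp)
```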
20. The log-normal family
• A convenient definition of the log-normal model in the present context is $Z = \xi Z_0$, where
$$Z_0 = e^{-\sigma^2/2 + \sigma\varepsilon} \qquad \text{for } \varepsilon \sim N(0,1).$$
• Mean, standard deviation and skewness are
$$E(Z) = \xi, \qquad \mathrm{sd}(Z) = \xi\sqrt{e^{\sigma^2}-1}, \qquad \mathrm{skew}(Z) = (e^{\sigma^2}+2)\sqrt{e^{\sigma^2}-1};$$
see Section 2.4.
• Parameter estimation is usually carried out by noting that logarithms are Gaussian. Thus
$$Y = \log(Z) = \log(\xi) - \tfrac{1}{2}\sigma^2 + \sigma\varepsilon,$$
and when the original log-normal observations $z_1,\dots,z_n$ are transformed to Gaussian ones through $y_1=\log(z_1),\dots,y_n=\log(z_n)$ with sample mean and variance $\bar y$ and $s_y^2$, the estimates of $\xi$ and $\sigma$ become
$$\log(\hat\xi) - \tfrac{1}{2}\hat\sigma^2 = \bar y, \quad \hat\sigma = s_y \qquad \text{or} \qquad \hat\xi = e^{s_y^2/2 + \bar y}, \quad \hat\sigma = s_y.$$
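The log-normal estimates $\hat\xi = e^{s_y^2/2 + \bar y}$ and $\hat\sigma = s_y$ can be sketched as below. The function name `fit_lognormal`, the "true" parameters and the simulated data are assumptions made only to show that the estimates recover known values.

```python
import math
import random
import statistics

def fit_lognormal(claims):
    """Return (xi_hat, sigma_hat) for the model Z = xi * exp(-sigma^2/2 + sigma*eps)."""
    ys = [math.log(z) for z in claims]        # Gaussian after the log transform
    y_bar = statistics.fmean(ys)
    s_y = statistics.stdev(ys)
    return math.exp(y_bar + 0.5 * s_y ** 2), s_y

random.seed(3)
xi, sigma = 20000.0, 0.7                      # hypothetical true parameters
claims = [xi * math.exp(-0.5 * sigma ** 2 + sigma * random.gauss(0.0, 1.0))
          for _ in range(50000)]

xi_hat, sigma_hat = fit_lognormal(claims)
print(xi_hat, sigma_hat)                      # close to 20000 and 0.7
```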
21. Log-normal sampling (Algorithm 2.5)
[Figure: histogram "Lognormal ksi = -0.05 and sigma = 1"; horizontal axis: Bin (0 to 3), vertical axis: Frequency (0 to 70)]
1. Input: $\xi$, $\sigma$
2. Draw $U^* \sim \text{uniform}$ and $\varepsilon^* \leftarrow \Phi^{-1}(U^*)$
3. Return $Z \leftarrow e^{\xi + \sigma\varepsilon^*}$
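The sampling steps (draw a uniform, invert the Gaussian distribution function, exponentiate) can be sketched as follows. This assumes that `xi` is the mean on the log scale, so that a claim is returned as $e^{\xi + \sigma\varepsilon^*}$; that reading fits the negative value ksi = -0.05 in the figure title. `NormalDist().inv_cdf` from the standard library plays the role of $\Phi^{-1}$, and the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

STANDARD_NORMAL = NormalDist()                # mean 0, standard deviation 1

def sample_lognormal(xi, sigma, rng):
    """One log-normal draw by inversion; xi is taken to be the log-scale mean."""
    u = rng.random()                          # U* ~ uniform on [0, 1)
    while u == 0.0:                           # inv_cdf requires 0 < u < 1
        u = rng.random()
    eps = STANDARD_NORMAL.inv_cdf(u)          # eps* = Phi^{-1}(U*)
    return math.exp(xi + sigma * eps)

rng = random.Random(11)
draws = [sample_lognormal(-0.05, 1.0, rng) for _ in range(20000)]

log_mean = fmean(math.log(z) for z in draws)  # should be near xi = -0.05
print(log_mean, min(draws) > 0.0)
```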
22. The lognormal family
[Figure: histogram "Lognormal ksi = 0.005 and sigma = 0.05"; horizontal axis: Bin (0.9 to 1.15), vertical axis: Frequency (0 to 80)]
• Different choice of ksi and sigma
• The shape depends heavily on sigma and is highly skewed when sigma is not
too close to zero
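23. The Gamma family
• The Gamma family is an important family for which the density function is
$$f(x) = \frac{(\alpha/\xi)^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\alpha x/\xi}, \quad x > 0, \qquad \text{where } \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx.$$
• It was defined in Section 2.5 as $Z = \xi G$, where $G \sim \mathrm{Gamma}(\alpha)$ is the standard Gamma with mean one and shape $\alpha$. The density of the standard Gamma simplifies to
$$f(x) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\alpha x}, \quad x > 0.$$
Mean, standard deviation and skewness are
$$E(Z) = \xi, \qquad \mathrm{sd}(Z) = \xi/\sqrt{\alpha}, \qquad \mathrm{skew}(Z) = 2/\sqrt{\alpha},$$
and there is a convolution property. Suppose $G_1,\dots,G_n$ are independent with $G_i \sim \mathrm{Gamma}(\alpha_i)$. Then
$$\bar G \sim \mathrm{Gamma}(\alpha_1 + \dots + \alpha_n) \qquad \text{if} \qquad \bar G = \frac{\alpha_1 G_1 + \dots + \alpha_n G_n}{\alpha_1 + \dots + \alpha_n}.$$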
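The representation of a Gamma claim as $\xi$ times a standard Gamma with mean one and shape $\alpha$ can be checked by simulation. In the sketch below `random.gammavariate(alpha, 1/alpha)` is used for the standard Gamma, since the standard-library function takes shape and scale and a scale of $1/\alpha$ gives mean one. The parameter values are hypothetical.

```python
import math
import random
import statistics

random.seed(5)
xi, alpha = 15000.0, 2.5                      # hypothetical mean and shape

# Z = xi * G, with G a standard Gamma: shape alpha, scale 1/alpha, mean one.
claims = [xi * random.gammavariate(alpha, 1.0 / alpha) for _ in range(50000)]

mean_hat = statistics.fmean(claims)
sd_hat = statistics.stdev(claims)

print(mean_hat, xi)                           # E(Z) = xi
print(sd_hat, xi / math.sqrt(alpha))          # sd(Z) = xi / sqrt(alpha)
```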
25. Example: car insurance
• Hull coverage (i.e., damages on own vehicle in
a collision or other sudden and unforeseen
damage)
• Time period for parameter estimation: 2 years
• Covariates:
– Driving length
– Car age
– Region of car owner
– Tariff class
– Bonus of insured vehicle
• 2 models are tested and compared – Gamma
and lognormal
26. Comparisons of Gamma and lognormal
• The models are compared with respect to fit,
results, validation of model, type 3 analysis and QQ
plots
• Fit: ordinary fit measures are compared
• Results: parameter estimates of the models are
compared
• Validation of model: the data material is split into two independent groups. The model is calibrated (i.e., estimated) on one half and validated on the other half
• Type 3 analysis of effects: Does the fit of the model
improve significantly by including the specific
variable?
27. Comparison of Gamma and lognormal - fit
Gamma fit
| Criterion | Deg. fr. | Value | Value/DF |
|---|---|---|---|
| Deviance | 546 | 12 926.1628 | 23.6743 |
| Scaled Deviance | 546 | 669.2070 | 1.2257 |
| Pearson Chi-Square | 546 | 7 390.8283 | 13.5363 |
| Scaled Pearson X2 | 546 | 382.6344 | 0.7008 |
| Log Likelihood | _ | -5 278.7043 | _ |
| Full Log Likelihood | _ | -5 278.7043 | _ |
| AIC (smaller is better) | _ | 10 595.4086 | _ |
| AICC (smaller is better) | _ | 10 596.8057 | _ |
| BIC (smaller is better) | _ | 10 677.7747 | _ |
Lognormal fit
| Criterion | Deg. fr. | Value | Value/DF |
|---|---|---|---|
| Deviance | 2 814 | 119 523.2128 | 42.4745 |
| Scaled Deviance | 2 814 | 2 838.0000 | 1.0085 |
| Pearson Chi-Square | 2 814 | 119 523.2128 | 42.4745 |
| Scaled Pearson X2 | 2 814 | 2 838.0000 | 1.0085 |
| Log Likelihood | _ | -7 145.8679 | _ |
| Full Log Likelihood | _ | -7 145.8679 | _ |
| AIC (smaller is better) | _ | 14 341.7357 | _ |
| AICC (smaller is better) | _ | 14 342.1980 | _ |
| BIC (smaller is better) | _ | 14 490.5071 | _ |
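The information criteria in the fit tables can be related to the log likelihood through the standard definitions AIC $= -2\log L + 2k$, AICC $=$ AIC $+ 2k(k+1)/(n-k-1)$ and BIC $= -2\log L + k\log n$. The slides do not state the number of estimated parameters $k$ or the number of observations $n$; the values used below ($k = 19$, $n = 564$ for the Gamma fit and $k = 25$, $n = 2838$ for the lognormal fit) are back-solved from the reported figures and should be read as assumptions.

```python
import math

def aic(loglik, k):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * k

def aicc(loglik, k, n):
    """Small-sample corrected AIC."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    """Bayesian information criterion."""
    return -2.0 * loglik + k * math.log(n)

# Log likelihoods from the tables; k and n are inferred, not given in the slides.
gamma_ll, gamma_k, gamma_n = -5278.7043, 19, 564
logn_ll, logn_k, logn_n = -7145.8679, 25, 2838

print(aic(gamma_ll, gamma_k), aicc(gamma_ll, gamma_k, gamma_n), bic(gamma_ll, gamma_k, gamma_n))
print(aic(logn_ll, logn_k), aicc(logn_ll, logn_k, logn_n), bic(logn_ll, logn_k, logn_n))
```

With these inferred values the three criteria agree with the reported ones to within rounding.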
28. Comparison of Gamma and lognormal - type 3
Gamma fit
| Source | Deg. fr. | Chi-square | Pr>Chi-sq | Method |
|---|---|---|---|---|
| Tariff class | 5 | 70.75 | <.0001 | LR |
| Bonus | 2 | 19.32 | <.0001 | LR |
| Region | 7 | 20.15 | 0.0053 | LR |
| Car age | 3 | 342.49 | <.0001 | LR |
Lognormal fit
| Source | Deg. fr. | Chi-square | Pr>Chi-sq | Method |
|---|---|---|---|---|
| Tariff class | 5 | 51.75 | <.0001 | LR |
| Bonus | 2 | 177.74 | <.0001 | LR |
| Region | 7 | 48.14 | <.0001 | LR |
| Driving length | 6 | 70.18 | <.0001 | LR |
| Car age | 3 | 939.46 | <.0001 | LR |
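The Pr>Chi-sq column of a type 3 table is the upper tail probability of a chi-square distribution with the listed degrees of freedom, evaluated at the likelihood ratio (LR) statistic. A minimal sketch follows, using closed-form tail formulas valid for integer degrees of freedom; the helper name `chi2_sf` is hypothetical and not part of any library. The two inputs are the Region and Bonus rows of the Gamma fit.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability Pr(X > x) for a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of half-integer power terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

p_region = chi2_sf(20.15, 7)     # Region, Gamma fit
p_bonus = chi2_sf(19.32, 2)      # Bonus, Gamma fit
print(round(p_region, 4), p_bonus < 0.0001)
```

The Region value rounds to the 0.0053 shown in the table, and the Bonus value falls below the 0.0001 reporting limit.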
29. QQ plot Gamma model
30. QQ plot log normal model
31. Results tariff class
[Figure: risk years (axis 0 to 70 000) and difference from reference (axis 0 % to 250 %) for the gamma model and the lognormal model, by tariff class 1 to 6]
32. Results bonus
[Figure: risk years (axis 0 to 160 000) and difference from reference (axis 0 % to 120 %) for the gamma model and the lognormal model, by bonus level: 70 %, 75 %, under 70 %]
33. Results region
[Figure: risk years (axis 0 to 60 000) and difference from reference (axis 0 % to 140 %) for the gamma model and the lognormal model, by region]
34. Results car age
[Figure: risk years (axis 0 to 120 000) and difference from reference (axis 0 % to 120 %) for the gamma model and the lognormal model, by car age: <= 5 years, 5-10 years, 10-15 years, >15 years]
36. Validation
[Figure, two panels showing Difference Gamma and Difference lognormal. Validation car age: axis 0 to 100; categories Total, <= 5 years, 5-10 years, 10-15 years, >15 years. Validation tariff class: axis 0 to 70; categories Total, 1 to 6]
37. Conclusions so far
• None of the models seems to be perfect
• Lognormal behaves worst and can be discarded
• Can we do better?
• We try Gamma once more, now excluding the 0 claims (about 17% of the claims)
– Claims where the policy holder is not at fault (other party is to blame)
38. Comparison of Gamma with and without zero claims - fit
Gamma fit
| Criterion | Deg. fr. | Value | Value/DF |
|---|---|---|---|
| Deviance | 546 | 12 926.1628 | 23.6743 |
| Scaled Deviance | 546 | 669.2070 | 1.2257 |
| Pearson Chi-Square | 546 | 7 390.8283 | 13.5363 |
| Scaled Pearson X2 | 546 | 382.6344 | 0.7008 |
| Log Likelihood | _ | -5 278.7043 | _ |
| Full Log Likelihood | _ | -5 278.7043 | _ |
| AIC (smaller is better) | _ | 10 595.4086 | _ |
| AICC (smaller is better) | _ | 10 596.8057 | _ |
| BIC (smaller is better) | _ | 10 677.7747 | _ |
Gamma without zero claims fit
| Criterion | Deg. fr. | Value | Value/DF |
|---|---|---|---|
| Deviance | 494 | 968.9122 | 1.9614 |
| Scaled Deviance | 494 | 546.4377 | 1.1061 |
| Pearson Chi-Square | 494 | 949.1305 | 1.9213 |
| Scaled Pearson X2 | 494 | 535.2814 | 1.0836 |
| Log Likelihood | _ | -5 399.8298 | _ |
| Full Log Likelihood | _ | -5 399.8298 | _ |
| AIC (smaller is better) | _ | 10 837.6596 | _ |
| AICC (smaller is better) | _ | 10 839.2043 | _ |
| BIC (smaller is better) | _ | 10 918.1877 | _ |
39. Comparison of Gamma with and without zero claims - type 3
Gamma fit
| Source | Deg. fr. | Chi-square | Pr>Chi-sq | Method |
|---|---|---|---|---|
| Tariff class | 5 | 70.75 | <.0001 | LR |
| Bonus | 2 | 19.32 | <.0001 | LR |
| Region | 7 | 20.15 | 0.0053 | LR |
| Car age | 3 | 342.49 | <.0001 | LR |
Gamma without zero claims fit
| Source | Deg. fr. | Chi-square | Pr>Chi-sq | Method |
|---|---|---|---|---|
| BandCode1 | 5 | 101.22 | <.0001 | LR |
| CurrNCD_Cd | 2 | 43.04 | <.0001 | LR |
| KundeFylkeNavn | 7 | 48.08 | <.0001 | LR |
| Side1Verdi6 | 3 | 70.76 | <.0001 | LR |
40. QQ plot Gamma
41. QQ plot Gamma model
without zero claims
42. Results tariff class
[Figure: risk years (axis 0 to 70 000) and difference from reference (axis 0 % to 250 %) for the gamma model and the Gamma model without zero claims, by tariff class 1 to 6]
43. Results bonus
[Figure: risk years (axis 0 to 160 000) and difference from reference (axis 0 % to 140 %) for the gamma model and the Gamma model without zero claims, by bonus level: 70 %, 75 %, under 70 %]
44. Results region
[Figure: risk years (axis 0 to 60 000) and difference from reference (axis 0 % to 140 %) for the gamma model and the Gamma model without zero claims, by region]
45. Results car age
[Figure: risk years (axis 0 to 120 000) and difference from reference (axis 0 % to 120 %) for the gamma model and the Gamma model without zero claims, by car age: <= 5 years, 5-10 years, 10-15 years, >15 years]