Projecting mortality based on population data is easy… or is it? Are there artifacts in the data which distort our analysis and how significant are they? By using international case studies, we look into the impact of a series of “one-off” events including wars, mass migrations, seasonality and political upheaval. How certain are we that the composition of the past population is representative of the future population and what does this mean for our projections?
2. Disclaimer
15 April 2015 2
The following presentation is for general
information, education and discussion purposes
only.
Views or opinions expressed, whether oral or in
writing do not necessarily reflect those of
PartnerRe nor do they constitute legal or
professional advice.
19. 15 April 2015 19
Russia Smoothed
Source of Data: Human Mortality Database. University of California, Berkeley (USA), and Max Planck Institute for
Demographic Research (Germany). Available at www.mortality.org or www.humanmortality.de.
PartnerRe Own Calculations
20. 15 April 2015 20
Russia Smoothed
Source of Data: Human Mortality Database. University of California, Berkeley (USA), and Max Planck Institute for
Demographic Research (Germany). Available at www.mortality.org or www.humanmortality.de.
PartnerRe Own Calculations
21. 15 April 2015 21
Russia – No Smoothing
Source: HMD (own calculations)
23. UK – No Smoothing
15 April 2015 23
Source: HMD (own calculations)
24. UK – No Smoothing
15 April 2015 24
See Andrew Cairns’ talk and
paper for more on this
Source: HMD (own calculations)
25. Longevity Portfolio
• Over 1 million lives
• Gender differentiated
• Full date of birth available
How are dates of birth distributed?
15 April 2015 25
36. U.S. Example
• Consider data from the Centers for Disease Control and
Prevention (CDC)
• Gender differentiated
• Individual Age
• Calendar Years 1999 – 2011
• … also includes ethnic origin
15 April 2015 40
48. Example 1
• Residents of Dog Land migrate to Cat Land
• 1% population growth
– Per annum
– Over 4 years
– Over the age range 30 – 50
• Dogs don’t trend to local mortality experience
15 April 2015 52
55. Example 2
• Residents of Dog Land migrate to Cat Land
• 1% population growth
– Per annum
– Over 4 years
– Over the age range 30 – 50
• Dogs take on cat characteristics over 10 years
15 April 2015 59
63. Low Immigrant Mortality or Data Artifact?
• Low mortality for most immigrant groups compared to
natives in the host country1
• Often attributed to beneficial health selection processes
• Could it be data artifacts?
• Explored in recent paper by Wallace and Kulu2
15 April 2015 67
1NZ - Hajat et al., 2010, U.S. - Abraido-Lanza et al., 1999; Palloni and Arias, 2004
2Wallace and Kulu, 2014
64. Potential Data Artifacts
15 April 2015 68
Misreporting
of Age
Misclassifying
Nationality
Emigrations
under-
registered
Deaths
undercounted
Mobility Bias
65. Potential Data Artifacts
15 April 2015 69
Misreporting
of Age
Misclassifying
Nationality
Emigrations
under-
registered
Deaths
undercounted
Mobility Bias
“This study supports low mortality
among immigrants … results are not a
data artefact.” (Wallace and Kulu)
90. Seasonality – conventional approach
15 April 2015 94
• Seasonality normally ignored
• Deaths by calendar year
• Mid-year population estimates
Interval Average
improvement in
mortality
Standard
deviation in
annual
improvement
12 months ending
December
2.3% 2.3%
Ten year time series
91. Seasonality – choice of interval
15 April 2015 95
Source: ONS data & own calculations
Conventional method
Vol = 2.3%
12 months ending 30 September
Vol = 1.5%
92. Seasonality – choice of interval
15 April 2015 96
Interval Average
improvement
Standard deviation in
annual improvement
12 months ending
December
2.3% 2.3%
12 months ending
September
2.3% 1.5%
• Calendar year interval overstates underlying volatility
• Stochastic models - Lee Carter, CBD, RH etc
Source: ONS data & own calculations
96. 15 April 2015 100
Expressions of individual views by members of the Institute and
Faculty of Actuaries and its staff are encouraged.
The views expressed in this presentation are those of the
presenter.
Questions Comments
97. Artifacts in Population Data
Nick Owen nicholas.owen@partnerre.com
Chris Reynolds christopher.reynolds@partnerre.com
15 April 2015
Editor's Notes
Standard company disclaimer.
Own view not necessarily reflect those of PartnerRe.
Act upon anything we say today at your own risk.
What’s this? The Hawthorne Tower in Illinois.
The Hawthorne Works was a large factory complex built by Western Electric starting in 1905.
Height of its operations it had 45,000 employees. 1980s replaced with shopping center.
From 1924-32 a series of experiments. Not sinister – productivity.
2 groups of workers used as guinea pigs.
Lighting in work area for one group was improved, other unchanged.
Productivity of more highly illuminated workers increased more than control group.
Other changes made to working conditions (working hours, rest breaks, etc.), - in all cases their productivity improved when a change was made.
Productivity even improved when the lights were dimmed again.
When everything returned to how it was productivity at the factory was at its highest level. Absenteeism had plummeted.
It was suggested that productivity gain occurred as a result of the motivational effect on the workers of the interest being shown in them. Hawthorne Effect - “people behave differently when they are being watched”.
At least some part of what is observed is an artifact of the process of observation
Chris/Nick. Today we‘re going to speak about artifacts in population data.
So what is an artifact?
Not the kind of thing this Indiana Jones lookalike might seek - object of cultural or historical interest.
Something observed in a scientific investigation or experiment that is not naturally present but occurs as a result of the preparative or investigative procedure.
Do we need to worry about artifacts in population data?
Census – gold standard for the population – can be affected by participation bias.
Late 80s early 90s – controversial ‘poll tax’ introduced in the UK.
Large numbers of young men attempted to become invisible to the poll tax register – avoid paying tax.
They dropped off electoral rolls.
Population estimates produced by the ONS – when results of 1991 Census were known there were about a million fewer men of younger working age counted than had been estimated.
So what will we speak about today:
We’ll start with a very brief introduction on how one calculates improvement rates and projects them forward.
Benefits and dangers of smoothing
Basis risk
The impact of inflows or outflows from a population
Quick international comparison – can we learn anything from the experience seen in other countries
Impact of external factors on mortality. How these factors and choices we make can impact the volatility of our experience.
… or the beginner’s guide to mortality projections.
The crude death rate is simply the number of deaths divided by the exposure to risk.
Choose your preferred smoothing approach and iron out the bumps.
Perhaps p-spline, Gaussian smoothing, or in my case I‘m using the Phillips EasySpeed to smooth the rates.
Once these smooth rates have been determined, annualised rates of mortality imporvement can be calculated
Now comes the actuarial judgement ... otherwise known as crystal ball gazing.
Variety of projection techniques exist for forecasting future mortality rates.
These could be the CMI, Lee Carter or CBD models.
One thing in common ... they will all be wrong, but some less wrong then others.
Often illustrate mortality improvements on a heat map.
On the vertical axis I have age.
On the horizontal axis I have calendar year.
Hotter colours represent a higher level of mortality improvement.
Cooler colours represent a lower level of mortality improvement (or even deterioration)
Move along the diagonal we can follow a particular cohort of lives.
Effects common to a cohort would be referred to as cohort effects.
May have effects which are time or period dependent - not related to a particular cohort
We call these period effects
Already we start to see potential complications
Denominator in our crude mortality rates is exposure to risk.
Don‘t typically have this for populations.
We approximate this by the mid year population.
In come countries this isn‘t available so we approximate this by taking the average of the start and end year populations.
What assumptions are we making here?
Births uniformly distributed across the year – is this valid?
Discuss more in the next section on smoothing.
Why smooth?
On the one hand smoothing removes noise and enables data features to be more easily identified.
On the other hand, smoothing may smooth over real features.
Consider this heat map for Russia.
This uses data from the human mortality database and the calculations are performed using our own proprietary tools.
Here we see some period effects.
1992
Fall of Soviet Union in 1991.
1992 - Russia experienced a serious economic crises
Drop in personal income and rapid impoverishment.
Adverse economic changes were followed by mortality increase.
Looks like prolonged period of mortality deterioration ~ 15% per annum.
August 1998 - Russia experienced another economic crisis resulting in mass impoverishment. This can’t be as clearly seen on the heat map.
Turn off smoothing – now we see 2 things:
1992 - Impact of 3 gradual shift @10% p.a. vs instant shift @ 33%.
Maybe not significant? Either way, impact is non-zero.
1998 - impact of the 1998 banking crisis can be more clearly seen.
Smoothing had smoothed over the effect.
UK – see some strong “cohort” features.
No apparent issues or concerns here
Remove the smoothing.
Now it seems we may have some data “problems”
In particular we see this 1919 cohort
Exhibits poorer experience
Long thought to be the impact of Spanish Flu – development of fetus in the womb.
Turns out it’s probably a distortion in the distribution of births leading to an underestimate of the exposure and subsequently higher mortality rates.
See Andrew Cairns’ paper and talk for far more on this point.
Points to note
When you’re smoothing you’re fitting a model
Before you fit a model you want to understand the underlying data
After fitting a model you need to check the fit of your model, typically by performing an analysis of the residuals.
Considered a subset of our longevity portfolio.
Taken 1 million lives.
Gender differentiated, but now the full data of birth is available.
We ask ourselves “How are the dates of birth distributed?”
On this chart we have the year of birth on the horizontal axis
On the vertical axis we show the average day of birth from the start of the year.
So if births are uniformly distributed we’d expected to see the average day of birth sit on this yellow line (182.5 days)
This is what we see.
We’ve only plotted points where we have at least 500 lives.
Which is the case for all years of birth in this range
Number of features
Average day of birth for 1919 is much higher than the years either side. This coincides with the end of the first world war (Nov 1918).
Not just the first world war.
See a similar impact with the 2nd world war which ended in Sep 1945.
Mid year population in 1919 would be the sum of births over the last 12 months. This would underestimate the true exposure for 1919.
But there’s another observation ….
See a downwards trend.
Expected a downwards trend due to survivor bias.
Namely lives born earlier in the year will be older and so it’s less likely they will survive to the start of the contract than those born later in the year.
This effect will be more pronounced the further back we go … hence the downwards trend.
Doesn’t explain why this side of the chart is consistently below the yellow line.
Hypothesised that it may be influenced by the availability of cheap summer holidays leading to a spike in summer conceptions.
Not sure as these are pre 1960 results.
As we get closer to the portfolio date there will be fewer and fewer pensioners.
This increasing downwards trend leads to a fall in the average day of birth.
But it’s not a real effect, it’s a consequence of the way we’ve collected data.
Once again you need to understand the nature and source of your underlying data.
The data you’re using won't have been collected with your purpose in mind. Be consious that this could introduce artifacts into the data.
Here we consider Australia from 1950s through to early 21st century.
See a cohort effect
Possible period effect in the 1960s?
Remove smoothing
Something strange in this left hand corner
Crude data looks to be smoothed
Go further back in time
Notice that the feature persists.
Answer is simple if you see how data in Australia is collected.
Data for years before 1964 are available only in 5 year age groups. Hence the smoother nature of the crude rates
But other “artifacts”
Until mid 1960s the indigenous population was excluded from the federal censuses and population estimates
Aboriginal births and deaths have only been included in the country’s vital statistics since 1966
Our next section relates to basis risk.
Why population mortality?
Insurance portfolios often considered too small to discern credible trends.
Population is often considered only credible sized dataset on which to derive trends.
So we use population trends for insurance/pension blocks.
But what implicit assumptions are we making?
1) Assuming the insured portfolio is a representative subset of the population
2) Insurance portfolio is often a “fixed” portfolio of lives. Assume the population is stable or that the insurance portfolio is subject to the same cohort effects (e.g. migration)?
Illustrate basis risk with an example from the US.
Take population data from the CDC.
Gender differentiated – individual age – calendar years 1999 to 2011
But … also includes ethnic origin.
Based on this data the US population breaks down into the following ethnic types:
Just over 80% are classified as white.
Next we have Black or African American
Followed by Asian or Pacific Islander
And then American Indian or Alaska Native.
Calculated age standardised rate
Split by Calendar Year and Gender
Using the WHO standard population to standardise the mortality rate
Very crudely we see improvement rates of
2.2% per annum for males
1.9% per annum for females.
We’d then take these rates as our starting rate of improvement and project into the future.
Here we’ve broken down mortality by ethnic type
At this point we’ve not standardised
… watch how sensitive the results are to standardisation
Standardised rates have moved considerably.
For male Black or African Americans we see an improvement rate of 2.5% per annum
For females it’s 2.2% p.a.
Considering those classified as white we see 1.5% and 1.1% p.a. respectively.
These trends are different from the total population level.
Intended to illustrate that we could materially misestimate trend if the composition of your portfolio is different from that of the population.
My part of the talk has a common thread which seems to be related to temperature…
First, migrations – that’s a politically hot topic.
Also comparison of heat maps by country
Finally a piece on seasonality and temperature
So why migration?
1)basis risk. May be features which aren’t relevant to your needs
2)artifacts caused by presence of migrations in the past.
Projection of evolving population structure
1st population – cat land. [NEXT]
2nd population – dog land
Projection of evolving population structure
1st population – cat land. [NEXT]
2nd population – dog land
Projection of evolving population structure
1st population – cat land. [NEXT]
2nd population – dog land
Projection of evolving population structure
-some of the dogs move to Cat Land
-leading to 1% growth
-over four years, and it’s limited to certain ages.
Project the two groups, cats and dogs, separately, counting their exposures and deaths and measuring aggregated impact of Cat Land’s population statistics.
We show the change in mortality using a heat map…
The migration creates a signature pattern:
Initial age period effect: for 4 years mortality gets worse by ca. 10% per annum.
The migration creates a signature pattern:
Initial age period effect: for 4 years mortality gets worse by ca. 10% per annum.
The migration creates a signature pattern:
Initial age period effect: for 4 years mortality gets worse by ca. 10% per annum.
Leading cohort effect: As dogs age, the combined population suffers from worse mortality than the cat-only population that went before
Leading cohort effect: As dogs age, the combined population suffers from worse mortality than the cat-only population that went before
The migration creates a signature pattern:
Initial age period effect: for 4 years mortality gets worse by ca. 10% per annum.
Leading edge cohort effect: As dogs age, the combined population suffers from worse mortality than the cat-only population that went before
Trailing edge cohort effect: Population returns to cat-only leading to heavy improvements.
Cross section by calendar year for various ages
-oscillation
Cross section by age for years
-initially see a hump (during the period effect)
-then see an oscillation (after migration has finished)
This is quite an extreme example…
…mortality difference nine times.
…4% population growth
….and mortality improvement artefact is around +/-10%
In practice far less extreme, and swamped by other features in the data.
Struggled to find real examples of immigration signature
-Too much noise
-Not enough difference in mortality
Poland
Some interesting features here – first we see the signature migration signature.
Poland – emmigration, not immigration.
Some interesting features here – first we see the signature migration signature.
1975: Following some changes in the law regarding German citizenship, “Aussiedlers” left the country for Germany
1975: Following some changes in the law regarding German citizenship, “Aussiedlers” left the country for Germany
1980’s: Because of “Martial law”, communist repressions and poverty, people moved mainly to the US, Canada and Germany
2004: The labor market opened in the UK, followed by other EU countries
Low mortality for most immigrant groups compared to natives in the host country1 (NZ - Hajat et al., 2010, U.S. - Abraido-Lanza et al., 1999; Palloni and Arias, 2004)
Often attributed to beneficial health selection processes
Could it be data artifacts?
1. Misreporting of Age – especially at advanced ages. For example in the US.
Possibly due for access to state benefits, easy when records can’t be checked.
Depress mortality rates at older ages and affect the age distribution of deaths.
2. Misclassifying nationality
The misclassification of ethnicity on death certificates may also occur.
Causes a biased numerator.
3. Emigrations under-registered.
Could refer to immigrants not registering an exit . Could be innocent.
Could be incentive to stay on a certain country’s register.
-become PHANTOMS in the data, statistically immortal, age with country’s official statistics
4. Deaths undercounted
Return migration for workers in deteriorating health
Causes a selection bias in the deaths that are counted.
missing deaths when for the most part these lives counted towards exposure
5. In Europe, ‘mobility bias’
Migrants may frequently return to their origin country for short or long periods (independent of health status) given the geographic proximity of many host and origin countries.
If the host country is unaware of these frequent departures, individuals will continue to contribute to denominators even though they are not permanently resident in the host country.
Wallace & Kulu - controlled for known/likely artefacts, found that low immigrant was probably a real feature of the data.
FOR THE UK.
-Implications for mortality projections . In particular , migration features are not relevant for projection a closed portfolio of annuitants/pensioners and should be removed.
-Migration-related artifacts (phantom populations) might affect your statistics in general.
UK actuary setting assumptions...
1) Assured lives experience => Interim cohort projections SC, MC, LC
2010 / 2020 / 2040
2) Population data showing the same thing.
3) same question: how long will it last
4) in 2008, looks like it is finishing?
Often say there is only one history. Not true – there are lots of countries we can learn from.
Question … is the cohort effect ending???
France
Has a cohort effect
Maybe it’s finished already
France
Has a cohort effect
Maybe it’s finished already – hard to say, swamped by effect of smoking ban.
France
Has a cohort effect
Looks like it’s finished already
Italy
Very similar
Netherlands
Very similar, same problem as Ireland with smoking ban
Netherlands
Also has a cohort effect
Probably diminishing
Portugal
Same again?
Sweden
Weak cohort effect
Often say there is only one history. Not true – there are lots of countries we can learn from.
Question … is the cohort effect ending???
Often say there is only one history. Not true – there are lots of countries we can learn from.
Question … is the cohort effect ending???
Finish trip around europe, some countries with no visible cohort effect….
Spain
No cohort effect?
Often say there is only one history. Not true – there are lots of countries we can learn from.
Question … is the cohort effect ending???
Often say there is only one history. Not true – there are lots of countries we can learn from.
Question … is the cohort effect ending???
Often say there is only one history. Not true – there are lots of countries we can learn from.
Question … is the cohort effect ending???
1975: Following some changes in the law regarding German citizenship, “Aussiedlers” left the country for Germany
1980’s: Because of “Martial law”, communist repressions and poverty, people moved mainly to the US, Canada and Germany
2004: The labor market opened in the UK, followed by other EU countries
Canada
Same cohort effect again?
Probably finished
Japan
Very strong cohort effect, probably finished
N.B. May not be the same reasons – but similarity is striking
Selection of countries with cohort effect
UK Ireland France
Belgium Italy Portugal
Netherlands Denmark Austria
Cohort effects all start in the mid sixties, around the same age group.
-over-dispersion, impact on stochastic models (volatility), impact on a projection (last data point)
-summer-summer vs winter-winter timing (Samoplig frequency/Nyquist therorem?)
-correlation to temperature records? => climate change impact
-obvious seasonality effect
-peak is very close to the year end – so timing of peaks will lead to volatility of YoY figures
-typical peak-to-trough is around 20%.
-much more seasonality amongst the elderly
-typical peak to trough ca. 50%
-can also identify some features….
1. Christmas
2. August Bank Holidays
3. 3 Day week in 2012 - record low number of deaths registered. Office was closed.
4. Royal Wedding
- N.B. bank holiday effect.
- Some of this might be real – e.g. elderly hanging on for Christmas.
…..most likely to be the office hours of the registrar office.
- i.e Date of REGISTRATION vs Date of DEATH
-CMI data by age only availible for short time series. Return to ONS data (not split by age).
-obvious seasonality effect
-2011, 2012 => late winter
- where gradient is steep, then deaths in year may be sensitive to timing of the winter.
- late winter & early winter => heavy calendar year
- early winter & late winter => light calendar year
-Sept – Sept model has 60% volatility
- September also has most cosistent temperature
-Over-dispersion models and/or autoregressive models go some way to fixing this,
-But that seems to be a round-about way of fixing an underlying problem in your data treatment
Be careful so that you don’t slip up.
The data you’re using won't have been collected with your purpose in mind – any polishing / manipulation you’re not aware of? What artifacts are in the data.
How clear are you about what’s in your numerator & denominator.
If you’re smoothing your data – you’re fitting a model. Understand the raw data first and then check the fit of the fitted model.
Basis risk. Is the population representative of your portfolio? How is the pool of lives changing over time and is it material?
… and going right back to the Hawthorne effect could the observation process … such as the definition of investigation year … create artifacts in your results.
Today we’ve spoken about the known unknowns … unfortunately we’ll need hindsight to consider the unknown unknowns.
So let me leave you with one final thought on correlation and causation.
We saw earlier that if you could predict the weather you may be able to predict how mortality will change.
But what about this example? Any thoughts on what these might be?
Number of pirates … may be correlated but unlikely to be causation.
Thanks