BIG DATA: An actuarial perspective

Information Paper

November 2015
Table of Contents

1 INTRODUCTION
2 INTRODUCTION TO BIG DATA
2.1 INTRODUCTION AND CHARACTERISTICS
2.2 BIG DATA TECHNIQUES AND TOOLS
2.3 BIG DATA APPLICATIONS
2.4 DATA DRIVEN BUSINESS
3 BIG DATA IN INSURANCE VALUE CHAIN
3.1 INSURANCE UNDERWRITING
3.2 INSURANCE PRICING
3.3 INSURANCE RESERVING
3.4 CLAIMS MANAGEMENT
4 LEGAL ASPECTS OF BIG DATA
4.1 INTRODUCTION
4.2 DATA PROCESSING
4.3 DISCRIMINATION
5 NEW FRONTIERS
5.1 RISK POOLING VS. PERSONALIZATION
5.2 PERSONALISED PREMIUM
5.3 FROM INSURANCE TO PREVENTION
5.4 THE ALL-SEEING INSURER
5.5 CHANGE IN INSURANCE BUSINESS
6 ACTUARIAL SCIENCES AND THE ROLE OF ACTUARIES
6.1 WHAT IS BIG DATA BRINGING FOR THE ACTUARY?
6.2 WHAT IS THE ACTUARY BRINGING TO BIG DATA?
7 CONCLUSIONS
8 REFERENCES
1 Introduction

The Internet started in 1984, linking 1,000 university and corporate labs. By 1998 it had grown to 50 million users, and in 2015 it reached 3.2 billion people (44% of the global population). This enormous user growth was combined with an explosion of the data that we all produce. Every day we create around 2.5 quintillion bytes of data, with information coming from various sources including social media sites, gadgets, smartphones, intelligent homes and cars, and industrial sensors, to name a few. Any company that can combine various datasets and apply effective data analytics will be able to become more profitable and successful. According to a recent report1, 400 large companies that adopted Big Data analytics "have gained a significant lead over the rest of the corporate world." Big Data offers big business gains, but it also has hidden costs and complexity that companies will have to struggle with. Semi-structured and unstructured big data require new skills, and there is a shortage of people who have mastered data science and can handle mathematics, statistics and programming while also possessing substantive domain knowledge.

What will be the impact on the insurance sector and the actuarial profession? The concepts of Big Data and predictive modelling are not new to insurers, who have already been storing and analysing large quantities of data to achieve deeper insights into customers' behaviour or to set insurance premiums. Moreover, actuaries are the data scientists of insurance: they have the statistical training and analytical thinking to understand the complexity of data, combined with business insight. We look closely at the insurance value chain and assess the impact of Big Data on underwriting, pricing and claims reserving. We examine the ethics of Big Data, including data privacy, customer identification, data ownership and the legal aspects. We also discuss new frontiers for insurance and their impact on the actuarial profession. Will actuaries be able to leverage Big Data, create sophisticated risk models and more personalized insurance offers, and bring a new wave of innovation to the market?
2 Introduction to Big Data

2.1 Introduction and characteristics

Big Data broadly refers to data sets so large and complex that they cannot be handled by traditional data processing software. It can be defined by the following attributes:

a. Volume: in 2012 it was estimated that 2.5 x 10^18 bytes of data were created worldwide every day; this is equivalent to a stack of books from the Sun to Pluto and back again. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, software logs and GPS signals from mobile devices, among others.

b. Variety and Variability: the challenges of Big Data do not only arise from the sheer volume of data but also from the fact that data is generated in multiple forms, as a mix of unstructured and structured data, and as a mix of data at rest and data in motion (i.e. static and real-time data). Furthermore, the meaning of data can change over time or depend on the context. Structured data is organized in a way that both computers and humans can read, for example information stored in traditional databases. Unstructured data refers to data types such as images, audio, video, social media and other information that are not organized or easily interpreted by traditional databases. It includes data generated by machines such as sensors, web feeds, networks or service platforms.

c. Visualization: the insights gained by a company from analysing data must be shared in a way that is efficient and understandable to the company's stakeholders.

d. Velocity: data is created, saved, analysed and visualized at an increasing speed, making it possible to analyse and visualize high volumes of data in real time.

e. Veracity: it is essential that the data is accurate in order to generate value.

f. Value: the insights gleaned from Big Data can help organizations deepen customer engagement, optimize operations, prevent threats and fraud, and capitalize on new sources of revenue.
1 http://www.bain.com/publications/articles/big_data_the_organizational_challenge.aspx
2.2 Big Data techniques and tools

The Big Data industry has been supported by the following technologies:

a. The Apache Hadoop software library was initially released in December 2011 and is an open source framework that allows for the distributed processing of large data sets across clusters of computers using simple algorithms. It is designed to scale up from one to thousands of machines, each one being a computational and storage unit. The software library is designed under the fundamental assumption that hardware failures are common: the library itself automatically detects and handles hardware failures in order to guarantee that the services provided by a computer cluster will stay available even when the cluster is affected by hardware failures. A wide variety of companies and organizations use Hadoop for both research and production: web-based companies that own some of the world's biggest data warehouses (Amazon, Facebook, Google, Twitter, Yahoo!, ...), media groups and universities, among others. A list of Hadoop users and systems is available at http://wiki.apache.org/hadoop/PoweredBy.

b. Non-relational databases have existed since the late 1960s but resurfaced in 2009 (under the moniker of Not Only SQL - NoSQL) as it became clear that they are especially well suited to handle the Big Data challenges of volume and variety, and that they fit neatly within the Apache Hadoop framework.

c. Cloud Computing is a kind of internet-based computing, where shared resources and information are provided to computers and other devices on demand (Wikipedia). A service provider offers computing resources for a fixed price, available online and in general with a high degree of flexibility and reliability. These technologies were created by major online actors (Amazon, Google), followed by other technology providers (IBM, Microsoft, RedHat). There is a wide variety of architectures (Public, Private and Hybrid Cloud), all with the objective of making computing infrastructure a commodity asset with the best quality/total cost of ownership ratio. Having a nearly infinite amount of computing power at hand with high flexibility is a key factor for the success of Big Data initiatives.

d. Mining Massive Datasets is a set of methods, algorithms and techniques that can be used to deal with Big Data problems, and in particular with volume, variety and velocity issues. PageRank can be seen as a major step (see http://infolab.stanford.edu/pub/papers/google.pdf) and its evolution towards a Map-Reduce (https://en.wikipedia.org/wiki/MapReduce) approach is definitively a breakthrough (a minimal sketch of the map-reduce idea follows this list). Social Network Analysis is becoming an area of research in itself that aims to extract useful information from the massive amount of data the Social Networks are providing. These methods are very well suited to run on software such as Hadoop in a Cloud Computing environment.

e. Social Networks are a source of Big Data that provides a stream of data with huge value for almost all economic (and even non-economic) actors. For most companies, it is the very first time in history that they are capable of interacting directly with their customers. Many applications of Big Data make use of these data to provide enhanced services and products and to increase customer satisfaction.
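To make the map-reduce approach referred to under d. concrete, the toy word count below splits the computation into a map step that emits key-value pairs and a reduce step that aggregates them per key, which is what allows the work to be spread over a Hadoop-style cluster. This is only an illustrative sketch in Python: the function names and the two example documents are invented here and are not part of the paper, and a production system would run on a framework such as Hadoop rather than in a single process.

from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Emit (key, value) pairs; for a word count the key is the word itself.
    for word in document.lower().split():
        yield word, 1

def reduce_phase(pairs):
    # Group the emitted pairs by key and aggregate the values per key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

if __name__ == "__main__":
    documents = [
        "hail damage on roof reported",        # hypothetical claim notes
        "windshield damage after hail storm",
    ]
    # On a Hadoop cluster the map tasks would run in parallel on many machines;
    # here they are simply chained together in one process.
    counts = reduce_phase(chain.from_iterable(map_phase(d) for d in documents))
    print(counts)   # e.g. {'hail': 2, 'damage': 2, ...}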
  
2.3 Big Data Applications

Big Data has the potential to change the way academic institutions, corporations and organizations conduct business, and to change our daily life. Great examples of Big Data applications include:

a. Healthcare: Big Data technologies will have a major impact in healthcare. IBM estimates that 80% of medical data is unstructured and is clinically relevant. Furthermore, medical data resides in multiple places such as individual medical files, lab and imaging systems, physician notes, medical correspondence, etc. Big Data technologies allow healthcare organizations to bring all the information about an individual together to gain insights on how to manage care coordination, outcomes-based reimbursement models, patient engagement and outreach programs.

b. Retail: retailers can gain insights for personalizing marketing and improving the effectiveness of marketing campaigns, for optimizing assortment and merchandising decisions, and for removing inefficiencies in distribution and operations. For instance, several retailers now incorporate Twitter streams into their analysis of loyalty-program data. The gained insights make it possible to plan for surges in demand for certain items and to create mobile marketing campaigns targeting specific customers with offers at the times of day they would be most receptive to them.2

c. Politics: Big Data technologies will improve efficiency and effectiveness across the broad range of government responsibilities. A great example of Big Data use in politics was Barack Obama's analytics- and metrics-driven 2012 presidential campaign [1]. Other examples include:

i. Threat and crime prediction and prevention. For instance, the Detroit Crime Commission has turned to Big Data in its effort to assist the government and citizens of southeast Michigan in the prevention, investigation and prosecution of neighbourhood crime;3

ii. Detection of fraud, waste and errors in social programs;

iii. Detection of tax fraud and abuse.

d. Cyber risk prevention: companies can analyse data traffic in their computer networks in real time to detect anomalies that may indicate the early stages of a cyber attack. Research firm Gartner estimates that by 2016 more than 25% of global firms will adopt big data analytics for at least one security and fraud detection use case, up from 8% in 2014.4

e. Insurance fraud detection: insurance companies can determine a score for each claim in order to target for fraud investigation the claims with the highest scores, i.e. the ones that are most likely to be fraudulent. Fraud detection is treated in paragraph 3.4.

f. Usage-Based Insurance is an insurance scheme where car insurance premiums are calculated based on dynamic causal data, including actual usage and driving behaviour. Telematics data transmitted from a vehicle, combined with Big Data analytics, enables insurers to distinguish cautious drivers from aggressive drivers and to match the insurance rate to the actual risk incurred.
2.4 Data driven business

The quantity of data in the world is increasing steeply month after month. Some argue it is time to organize and use this information: data must now be viewed as a corporate asset. In order to respond to this emerging transformation of business culture, two specific C-level roles have appeared in the past years, one in the banking and the other in the insurance industry.
2.4.1 The Chief Data Officer

The Chief Data Officer (abbreviated to CDO) is the first architect of this "data-driven business". Thanks to this coordinating role, the CDO will be in charge of the data that drive the company, by:

• defining and setting up a strategy to guarantee their quality, their reliability and their coherency;
• organizing and classifying them;
• making them accessible to the right person at the right moment, for the pertinent need and in the right format.

Thus, the Chief Data Officer needs a strong business background to understand how the business runs. The following question will then emerge: to whom should the CDO report? In some firms, the CDO is considered part of IT and reports to the CTO (Chief Technology Officer); in others, he holds more of a business role, reporting to the CEO. It is therefore up to the company to decide, as no two companies are exactly alike from a structural point of view.
Which companies already have a CDO? Generali Group appointed someone to this newly created position in June 2015. Other companies such as HSBC, Wells Fargo and QBE had already appointed a person to this position in 2013 or 2014. Even Barack Obama appointed a Chief Data Officer/Scientist during his 2012 campaign, and the metrics-driven decision-making campaign played a big role in Obama's re-election. In the beginning, most of the professionals holding the actual job title "Chief Data Officer" were located in the United States. After a while, Europe followed the move. Also, lots of people did the job in their day-to-day work but didn't necessarily hold the title. Many analysts in the financial sector believe that yet more insurance and banking companies will have to make the move in the coming years if they want to stay attractive.

2 http://asmarterplanet.com/blog/2015/03/surprising-insights-ibmtwitter-alliance.html#more-33140
3 http://www.datameer.com/company/news/press-releases/detroit-crime-commission-combats-crime-with-datameer-big-data-analytics.html
4 http://www.gartner.com/newsroom/id/2663015
2.4.2 The Chief Analytics Officer

Another C-level position has arisen in the past months: the Chief Analytics Officer (abbreviated to CAO). Are there differences between a CAO and a CDO? Theoretically, a CDO focuses on tactical data management, while the CAO concentrates on the strategic deployment of analytics. The latter's focus is on data analysis to find hidden but valuable patterns. These will result in operational decisions that make the company more competitive, more efficient and more attractive to its potential and current clients. The CAO is therefore a natural extension of the data-driven business: the more analytics are embedded in the organization, the more you need an executive-level person to manage that function and communicate the results in an understandable way. The CAO usually reports to the CEO.

In practice, some companies fold the CAO responsibilities into the CDO's tasks, while others keep the two positions distinct. Currently it is quite rare to find an explicit "Chief Analytics Officer" position in the banking and insurance sector because of this overlap, but in other fields the distinction is often made.
3 Big Data in insurance value chain

Big Data provides new insights from social networks, telematics sensors and other new information channels. As a result, it allows insurers to understand customer preferences better, enables new business approaches and products, and enhances existing internal models, processes and services. With the rise of Big Data, the insurance world could fundamentally change and the entire insurance value chain could be impacted, from underwriting to claims management.
3.1 Insurance underwriting

3.1.1 Introduction

In traditional insurance underwriting and actuarial analyses, we have for years been observing a never-ending search for more meaningful insight into individual policyholder risk characteristics, in order to distinguish good risks from bad and to price each risk accurately. The analytics performed by actuaries, based on advanced mathematical and financial theories, have always been critically important to an insurer's profitability. Over the last decade, however, revolutionary advances in computing technology and the explosion of new digital data sources have expanded and reinvented the core disciplines of insurers. Today's advanced analytics in insurance go much further than traditional underwriting and actuarial science. Data mining and predictive modelling are today the way forward for insurers to improve pricing and segmentation and to increase profitability.
3.1.2 What is predictive modelling?

Predictive modelling can be defined as the analysis of large historical data sets to identify correlations and interactions, and the use of this knowledge to predict future events. For actuaries, the concepts of predictive modelling are not new to the profession. The use of mortality tables to price life insurance products is an example of predictive modelling. The Belgian MK, FK and MR, FR tables showed the relationship between death probability and the explanatory variables age, sex and product type (in this case life insurance or annuity).

Predictive models have been around for a long time in sales and marketing environments, for example to predict the probability that a customer will buy a new product. Bringing together expertise from both the actuarial profession and marketing analytics can lead to new innovative initiatives where predictive models guide expert decisions in areas such as claims management, fraud detection and underwriting.
3.1.3 From small over medium to Big Data

Insurers collect a wealth of information on their customers. In the first place during the underwriting process, for example by asking about the claims history of a customer for car and home insurance. Another source is the history of the relationship the customer has with the insurance company. While in the past the data was kept in silos by product, the key challenge now lies in gathering all this information into one place where the customer dimension is central. This transversal approach to the database also reflects the recent evolution in marketing: going from the 4P's (product, price, place, promotion) to the 4C's5 (customer, costs, convenience, communication).

On top of unleashing the value of internal data, new data sources are becoming available, such as wearables and social networks, to name a few. Because Big Data can be overwhelming to start with, medium data should be considered first. In Belgium, the strong bancassurance tradition offers interesting opportunities to combine insurance and bank data to create powerful predictive models.
3.1.4 Examples of predictive modelling for underwriting

1° Use the 360° view of the customer and predictive models to maximize profitability and gain more business.

By thoroughly analysing data from different sources and applying analytics to gain insight, insurance companies should strive to develop a comprehensive 360-degree customer view. The gains of this complete and accurate view of the customer are twofold:

• Maximizing the profitability of the current customer portfolio through:
o detecting cross-sell and up-sell opportunities;
o customer satisfaction and loyalty actions;
o effective targeting of products and services (e.g. customers that are most likely to be in good health, or those customers that are less likely to have a car accident).
• Acquiring more profitable new customers at a reduced marketing cost: modelling the existing customers will lead to useful information to focus marketing campaigns on the most interesting prospects.

By combining data mining and analytics, insurance companies can better understand which customers are most likely to buy, discover who their most profitable customers are, and learn how to attract or retain more of them. Another use case can be the evaluation of the underwriting process to improve the customer experience during this on-boarding process.
2° Predictive underwriting for life insurance6

Using predictive models, it is in theory possible to predict the death probability of a customer. However, the low frequency of life insurance claims presents a challenge to modellers. While for car insurance the probability of a customer having a claim can be around 10%, for life insurance it is around 0.1% for the first year. Not only does this mean that a significant in-force book is needed to have confidence in the results, but also that sufficient history should be present to be able to show mortality experience over time. For this reason, using the underwriting decision as the variable to predict is a more common choice.

All life insurance companies hold historical data on medical underwriting decisions that can be leveraged to build predictive models that predict underwriting decisions. Depending on how the model is used, the outcome can be a reduction of costs for medical examinations, more customer-friendly processes that avoid asking numerous invasive personal questions, or a reduction in the time needed to assess the risks by automatically approving good risks and focusing underwriting efforts on more complex cases. For example, if the predictive model tells you that a new customer has a high degree of similarity to customers that passed the medical examination, the medical examination could be waived for this customer.

If this sounds scary to risk professionals, a softer approach can be tested first, for instance by improving marketing actions through targeting only those individuals that have a high likelihood of being in good health. This not only decreases the cost of the campaign, but also avoids the disappointment of a potential customer who is refused during the medical screening process.
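As an illustration of the approach described under 2°, the sketch below trains a simple classifier on synthetic historical applications to predict the medical underwriting decision, and then flags applicants whose predicted acceptance probability exceeds a chosen threshold as candidates for a waived examination. The feature names, the simulated data and the 0.9 threshold are all assumptions made for the example; they are not taken from the paper, and any real model would be built on the insurer's own in-force data after the legal review discussed in section 3.1.5.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 5_000
# Hypothetical application features: age, BMI, smoker flag, sum assured (EUR).
X = np.column_stack([
    rng.integers(20, 65, n),
    rng.normal(25, 4, n),
    rng.integers(0, 2, n),
    rng.lognormal(11, 0.5, n),
])
# Hypothetical target: 1 = accepted at standard rates, 0 = referred or loaded.
accept_prob = 1 / (1 + np.exp(0.08 * (X[:, 0] - 40) + 1.2 * X[:, 2] - 2.0))
y = (rng.random(n) < accept_prob).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Applicants with a very high predicted acceptance probability could be
# fast-tracked (e.g. medical examination waived); the 0.9 cut-off is a
# business choice, not a statistical one.
fast_track = model.predict_proba(X_test)[:, 1] > 0.9
print(f"share of applicants fast-tracked: {fast_track.mean():.1%}")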
5 http://www.customfitonline.com/news/2012/10/19/4-cs-versus-the-4-ps-of-marketing/
6 Predictive modeling for life insurance, April 2010, Deloitte
3.1.5 Challenges of predictive modelling in underwriting7

Predictive models can only be as good as the input used to calibrate the model. The first challenge in every predictive modelling project is to collect relevant, high quality data for which a history is present. As many insurers are currently replacing legacy systems to reduce maintenance costs, this can come at the expense of the history. Actuaries are uniquely placed to prevent the history being lost, as a portfolio's history should be kept for adequate risk management. The trend of moving all policies from several legacy systems into one modern, single policy administration system is an opportunity that must be seized so that in the future data collection will be easier.

Once the necessary data are collected, some legal or compliance concerns need to be addressed, as there might be boundaries to using certain variables in the underwriting process. In Europe, if the model will influence the price of the insurance, gender is no longer allowed as an explanatory variable. And this is only one example. It is important that the purpose of the model and the possible inputs are discussed with the legal department prior to starting the modelling.

Once the model is built, it is important that the users realize that no model is perfect. This means that residual risks will be present, and these should be weighed against the gains that the use of the model can bring.

And finally, once a predictive model has been set up, a continuous reviewing cycle must be put in place that collects feedback from the underwriting and sales teams and collects data to improve and refine the model. Building a predictive model is a continuous improvement process, not a one-off project.
3.2 Insurance pricing

3.2.1 Overview of existing pricing techniques

The first rate-making techniques were based on rudimentary methods such as univariate analysis and, later, iterative standardized univariate methods such as the minimum bias procedure. They look at how changes in one characteristic result in differences in loss frequency or severity.

Later on, insurance companies moved to multivariate methods. This was associated with a further development of computing power and data capabilities. These techniques are now being adopted by more and more insurers and are becoming part of everyday business practices. Multivariate analytical techniques focus on individual-level data and take into account the effects (interactions) that many different characteristics of a risk have on one another. As explained in the previous section, many companies use predictive modelling (a form of multivariate analysis) to create measures of the likelihood that a customer will purchase a particular product. Banks use these tools to create measures (e.g. credit scores) of whether a client will be able to meet lending obligations for a loan or mortgage. Similarly, P&C insurers can use predictive models to predict claim behaviour. Multivariate methods provide valuable diagnostics that aid in understanding the certainty and reasonableness of results.

Generalized Linear Models (GLMs) are essentially a generalized form of linear models. This family encompasses normal-error linear regression models and the nonlinear exponential, logistic and Poisson regression models, as well as many other models, such as log-linear models for categorical data. Generalized linear models have become the standard for classification rate-making in most developed insurance markets, particularly because of the benefit of transparency. Understanding the mathematical underpinnings is an important responsibility of the rate-making actuary who intends to use such a method. Linear models are a good place to start, as GLMs are essentially a generalized form of such a model. As with many techniques, visualizing the GLM results is an intuitive way to connect the theory with the practical use. GLMs do not stand alone as the only multivariate classification method: other methods such as CART, factor analysis and neural networks are often used to augment GLM analysis.
  
In general the data mining techniques listed above can enhance a rate-making exercise by:

• whittling down a long list of potential explanatory variables to a more manageable list for use within a GLM;
• providing guidance on how to categorize discrete variables;
• reducing the dimension of multi-level discrete variables (i.e., condensing 100 levels, many of which have few or no claims, into 20 homogeneous levels);
• identifying candidates for interaction variables within GLMs by detecting patterns of interdependency between variables.

7 Predictive modelling in insurance: key issues to consider throughout the lifecycle of a model
3.2.2 Old versus new modelling techniques

The adoption of GLMs resulted in many companies seeking external data sources to augment what had already been collected and analysed about their own policies. This includes, but is not limited to, information about geo-demographics, sensor data, social media information, weather, property characteristics, and information about insured individuals or businesses. This additional data helps actuaries further improve the granularity and accuracy of classification rate-making. Unfortunately, this new data is very often unstructured and massive, and hence the traditional generalized linear model (GLM) techniques become useless.

With so many unique new variables in play, it can become a very difficult task to identify and take advantage of the most meaningful correlations. In many cases, GLM techniques are simply unable to penetrate deeply into these giant data stores. Even in the cases when they can, the time required to uncover the critical correlations tends to be onerous, requiring days, weeks or even months of analysis. Only with advanced techniques, and specifically machine learning, can companies generate predictive models that take advantage of all the data they are capturing.

Machine learning is the modern science of finding patterns and making predictions from data, based on work in multivariate statistics, data mining, pattern recognition and advanced/predictive analytics. Machine learning methods are particularly effective in situations where deep and predictive insights need to be uncovered from data sets that are large, diverse and fast changing, in other words Big Data. Across these types of data, machine learning easily outperforms traditional methods on accuracy, scale and speed.
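By way of contrast with the GLM sketch in section 3.2.1, the fragment below fits a gradient-boosted model to a synthetic claim-count data set; such models can pick up non-linear effects and interactions among many raw variables without the analyst specifying them in advance. The choice of scikit-learn's HistGradientBoostingRegressor with a Poisson loss, the feature list and the simulated data are illustrative assumptions only, one of several possible machine-learning approaches rather than the method endorsed by the paper.

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(1)
n = 20_000
# Invented raw features: driver age, vehicle age, annual mileage, prior claims.
X = np.column_stack([
    rng.integers(18, 80, n),
    rng.integers(0, 30, n),
    rng.normal(12_000, 4_000, n),
    rng.integers(0, 5, n),
])
# Synthetic Poisson claim counts with a non-linear effect of driver age.
rate = 0.05 + 0.002 * X[:, 3] + 0.03 * np.exp(-((X[:, 0] - 22) / 10) ** 2)
y = rng.poisson(rate)

# A Poisson loss keeps the count nature of the target.
gbm = HistGradientBoostingRegressor(loss="poisson", max_iter=200).fit(X, y)
print("expected claims, 21-year-old with a new car, 25,000 km, 1 prior claim:",
      round(float(gbm.predict([[21, 0, 25_000, 1]])[0]), 3))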
  
3.2.3 Personalized and Real-time pricing – Motor Insurance

In order to price risk more accurately, insurance companies are now combining analytical applications – e.g. behavioural models based on customer profile data – with a continuous stream of real-time data – e.g. satellite data, weather reports, vehicle sensors – to create a detailed and personalized assessment of risk.

Usage-based insurance (UBI) has been around for a while – it began with Pay-As-You-Drive programs that gave drivers discounts on their insurance premiums for driving under a set number of miles. These soon developed into Pay-How-You-Drive programs, which track your driving habits and give you discounts for 'safe' driving.

UBI allows a firm to snap a picture of an individual's specific risk profile, based on that individual's actual driving habits. UBI condenses the period of time under inspection to a few months, guaranteeing a much more relevant pool of information. With all this data available, the pricing scheme for UBI deviates greatly from that of traditional auto insurance. Traditional auto insurance relies on actuarial studies of aggregated historical data to produce rating factors that include driving record, credit-based insurance score, personal characteristics (age, gender and marital status), vehicle type, living location, vehicle use, previous claims, liability limits and deductibles.

Policyholders tend to think of traditional auto insurance as a fixed cost, assessed annually and usually paid for in lump sums on an annual, semi-annual or quarterly basis. However, studies show that there is a strong correlation between claim and loss costs and mileage driven, particularly within existing price rating factors (such as class and territory). For this reason, many UBI programs seek to convert the fixed costs associated with mileage driven into variable costs that can be used in conjunction with other rating factors in the premium calculation. UBI has the advantage of utilizing individual and current driving behaviours, rather than relying on aggregated statistics and driving records that are based on past trends and events, making premium pricing more individualized and precise.
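A hypothetical sketch of how a Pay-How-You-Drive tariff could be assembled is given below: the premium is the product of an exposure measure (kilometres driven), a base rate and a behavioural multiplier derived from telematics indicators. Every number in it (the base rate, the loadings for night driving, harsh braking and speeding, and the caps) is invented for the example; real UBI rating factors would come from the insurer's own analysis of telematics and claims data.

from dataclasses import dataclass

@dataclass
class TelematicsSummary:
    km_driven: float              # kilometres over the observation period
    night_share: float            # share of km driven at night (0-1)
    harsh_brakes_per_100km: float
    speeding_share: float         # share of km driven above the speed limit (0-1)

def behaviour_multiplier(t: TelematicsSummary) -> float:
    # Turn driving-style indicators into a multiplicative loading or discount.
    m = 1.0
    m *= 1.0 + 0.30 * t.night_share
    m *= 1.0 + 0.02 * t.harsh_brakes_per_100km
    m *= 1.0 + 0.50 * t.speeding_share
    return max(0.7, min(m, 1.5))   # cap both the discount and the loading

def ubi_premium(t: TelematicsSummary, rate_per_1000km: float = 40.0) -> float:
    # Usage-based premium: exposure (km) times a base rate times the behaviour factor.
    return t.km_driven / 1000 * rate_per_1000km * behaviour_multiplier(t)

careful = TelematicsSummary(km_driven=8_000, night_share=0.05,
                            harsh_brakes_per_100km=0.5, speeding_share=0.02)
aggressive = TelematicsSummary(km_driven=8_000, night_share=0.25,
                               harsh_brakes_per_100km=6.0, speeding_share=0.15)
print(round(ubi_premium(careful), 2), round(ubi_premium(aggressive), 2))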
  
3.2.4 Advantages

UBI programs offer many advantages to insurers, consumers and society. Linking insurance premiums more closely to actual individual vehicle or fleet performance allows insurers to price premiums more accurately. This increases affordability for lower-risk drivers, many of whom are also lower-income drivers. It also gives consumers the ability to control their premium costs by encouraging them to reduce miles driven and adopt safer driving habits. The use of telematics helps insurers to estimate accident damages more accurately and to reduce fraud, by enabling them to analyse the driving data (such as hard braking, speed and time) during an accident. This additional data can also be used by insurers to refine or differentiate UBI products.
3.2.5 Shortcomings/challenges

3.2.5.1 Organization and resources

Taking advantage of the potential of Big Data requires some different approaches to organization, resources and technology. As with many new technologies that offer promise, there are challenges to successful implementation and to the production of meaningful business results. The number one organizational challenge is determining the business value, with financing as a close second. Talent is the other big issue: identifying the business and technology experts inside the enterprise, recruiting new employees, training and mentoring individuals, and partnering with outside resources is clearly a critical success factor for Big Data. Implementing the new technology and organizing the data are listed as lesser challenges by insurers, although there are still areas that require attention.

3.2.5.2 Technology challenges

The biggest technology challenge in the Big Data world is framed in the context of the different Big Data "V" characteristics. These include the standard three V's of volume, velocity and variety, plus two more: veracity and value. The variety and veracity of the data present the biggest challenges. As insurers venture beyond analysis of structured transaction data to incorporate external data and unstructured data of all sorts, the ability to combine the data and feed it into an analysis may be complicated. On the one hand, variety expresses the promise of Big Data, but on the other hand, the technical challenges are significant. The veracity of the data is also deemed a challenge. It is true that some Big Data analyses do not require the data to be as clean and organized as in traditional approaches. However, the data must still reflect the underlying truth/reality of the domain.

3.2.5.3 Technology Approaches

Technology should not be the first focus area for evaluating the potential of Big Data in an organization. However, choosing the best technology platform for your organization and business problems does become an important consideration for success. Cloud computing will play a very important role in Big Data. Although there are challenges and new approaches required for Big Data, there is a growing body of experience, expertise and best practices to assist in successful Big Data implementations.
3.3 Insurance Reserving

Loss reserving is a classic actuarial problem, encountered extensively in motor, property and casualty as well as in health insurance. It is a consequence of the fact that insurers need to set reserves to cover future liabilities related to their book of contracts. In other words, the insurer has to hold funds aside to meet future liabilities attached to incurred claims.

In non-life insurance, most policies run for a period of 12 months. However, the claims payment process can take years or even decades. In particular, losses arising from casualty insurance can take a long time to settle, and even when the claims are acknowledged, it may take time to establish the extent of the claims settlement costs. A well-known and costly example is provided by the claims from asbestos liabilities. Thus it is no surprise that the biggest item on the liabilities side of an insurer's balance sheet is often the provision of reserves for future claims payments. It is the job of the reserving actuary to predict, with maximum accuracy, the total amount necessary to pay the claims that the insurer has legally committed to cover.

Historically, reserving was based on deterministic calculations with pen and paper, combined with expert judgement. Since the 1980s, the arrival of personal computers and spreadsheet software packages has induced a real change for reserving actuaries. The use of spreadsheets does not only result in a gain of calculation time but also allows testing different scenarios and the sensitivity of the forecasts. The first simple models used by actuaries started to evolve towards more developed ideas thanks to the evolution of IT resources. Moreover, recent changes in regulatory requirements, such as Solvency II in Europe, have shown the need for stochastic models and more precise statistical techniques.
3.3.1 Classical methods

There are many different frameworks and models used by reserving actuaries to compute the technical provisions, and it is not the goal of this paper to review them exhaustively, but rather to show that they share the central notion of the triangle. A triangle is a way of presenting data in the form of a triangular structure showing the development of claims over time for each origin period. An origin period can be the year the policy was written or earned, or the loss occurrence period.
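To make the triangle concrete, the sketch below builds a small cumulative run-off triangle and applies the basic deterministic chain-ladder method: volume-weighted development factors are estimated column by column and used to project each origin year to its ultimate cost, the difference with the latest observed amount being the reserve. The figures are invented, and chain-ladder is shown only as one familiar triangle-based method among the many frameworks mentioned above.

import numpy as np

# Cumulative paid claims; rows = origin years, columns = development years.
# NaN marks the future (not yet observed) part of the triangle.
triangle = np.array([
    [1000.0, 1500.0, 1700.0, 1750.0],
    [1100.0, 1650.0, 1850.0, np.nan],
    [1200.0, 1800.0, np.nan, np.nan],
    [1300.0, np.nan, np.nan, np.nan],
])

n = triangle.shape[1]
# Volume-weighted development factors f_j = sum C_{i,j+1} / sum C_{i,j}
factors = []
for j in range(n - 1):
    observed = ~np.isnan(triangle[:, j + 1])
    factors.append(triangle[observed, j + 1].sum() / triangle[observed, j].sum())

# Project each origin year to ultimate by applying the remaining factors.
latest = np.array([row[~np.isnan(row)][-1] for row in triangle])
ultimates = latest.copy()
for i, row in enumerate(triangle):
    for j in range(np.sum(~np.isnan(row)) - 1, n - 1):
        ultimates[i] *= factors[j]

reserve = ultimates - latest
print("development factors:", np.round(factors, 3))
print("estimated outstanding reserve:", round(reserve.sum(), 1))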
  	
  
	
  
After having used deterministic models, reserving generally switches to stochastic models. These models allow reserve risk to be quantified.

The use of models based on aggregated data used to be convenient in the past, when IT resources were limited, but it is more and more questionable nowadays, when we have huge computational power at hand at an affordable price. Therefore there is a need to move to models that fully use the data available in the insurers' data warehouses.
3.3.2 Micro-level reserving methods
Unlike aggregate models (or macro-level models), micro-level reserving methods (also called individual claim level models) use individual claims data as inputs and estimate outstanding liabilities for each individual claim. Unlike the models detailed in the previous section, they model very precisely the lifetime development process of each individual claim, including events such as claim occurrence, reporting, payments and settlement. Moreover they can include micro-level covariates such as information about the policy, the policyholder, the claim, the claimant and the transactions.

When well specified, such models are expected to generate reliable reserve estimates. Indeed, the ability to model the claims development at the individual level and to incorporate micro-level covariate information allows micro-level models to handle heterogeneities in claims data efficiently. Moreover, the large amount of data used in modelling can help to avoid issues of over-parameterization and lack of robustness. As a consequence, micro-level models are especially valuable under changing environments, as these changes can be indicated by appropriate covariates.
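As a purely illustrative sketch of what individual-claim inputs might look like, the snippet below builds a toy claim-level dataset with a few micro-level covariates and fits a very simple ordinary-least-squares severity model on the settled claims to put a reserve on the open ones. The field names, covariates and model form are assumptions made for the example and are far simpler than the micro-level models discussed in the literature.

```python
import pandas as pd
import numpy as np

# Hypothetical individual-claim records: one row per claim, with micro-level covariates.
claims = pd.DataFrame({
    "claim_id":     [1, 2, 3, 4, 5, 6],
    "occurrence":   pd.to_datetime(["2013-02-01", "2013-06-15", "2014-01-10",
                                    "2014-07-20", "2015-03-05", "2015-05-30"]),
    "reported":     pd.to_datetime(["2013-02-20", "2013-08-01", "2014-02-15",
                                    "2014-08-10", "2015-04-01", "2015-07-01"]),
    "paid_to_date": [9000., 15000., 7000., 12000., 3000., 1000.],
    "settled":      [True, True, True, False, False, False],
    "injury":       [0, 1, 0, 1, 0, 1],       # illustrative covariate
    "vehicle_age":  [3, 10, 5, 8, 2, 12],     # illustrative covariate
})

# For settled claims the ultimate cost is known; use them to fit a crude severity
# model on the covariates (ordinary least squares as a stand-in for richer models).
closed = claims[claims["settled"]]
X = np.column_stack([np.ones(len(closed)), closed[["injury", "vehicle_age"]]])
beta, *_ = np.linalg.lstsq(X, closed["paid_to_date"], rcond=None)

# Predict the ultimate cost of each open claim and hence its case reserve.
open_claims = claims[~claims["settled"]].copy()
Xo = np.column_stack([np.ones(len(open_claims)), open_claims[["injury", "vehicle_age"]]])
open_claims["predicted_ultimate"] = Xo @ beta
open_claims["reserve"] = open_claims["predicted_ultimate"] - open_claims["paid_to_date"]
print(open_claims[["claim_id", "predicted_ultimate", "reserve"]])
```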
  
	
  
3.4 Claims Management
Big Data can play a tremendous role in the improvement of claims management. It provides access to data that was not available before, and makes claims processing faster. Therefore it enables improved risk management, reduces loss adjustment expenses and enhances the quality of service, resulting in increased customer retention. Below we present details of how Big Data analytics improves the fraud detection process.

3.4.1 Fraud detection
It is estimated that a typical organization loses 5% of its revenues to fraud each year8. The total cost of insurance fraud (non-health insurance) in the US is estimated to be more than $40 billion per year9. The advent of Big Data & Analytics has provided new and powerful tools to fight fraud.
  	
  	
  
3.4.2 What are the current challenges in fraud detection?
The first challenge is finding the right data. Analytical models need data, and in a fraud detection setting this is not always that evident. Collected fraud data are often very skewed, with typically less than 1% fraudsters, which seriously complicates the detection task. Also the asymmetric costs of missing fraud versus harassing non-fraudulent customers represent important model difficulties. Furthermore, fraudsters constantly try to outperform the analytical models, so these models should be permanently monitored and re-configured on an ongoing basis.
  	
  	
  
3.4.3 What	
  analytical	
  approaches	
  are	
  being	
  used	
  to	
  tackle	
  fraud?	
  
Most	
   of	
   the	
   fraud	
   detection	
   models	
   in	
   use	
   nowadays	
   are	
   expert	
   based	
   models.	
   	
   When	
   data	
   becomes	
  
available,	
  one	
  can	
  start	
  doing	
  analytics.	
  	
  A	
  first	
  approach	
  is	
  supervised	
  learning	
  which	
  analyses	
  a	
  labelled	
  
data	
   set	
   of	
   historically	
   observed	
   fraud	
   behaviour.	
   	
   It	
   can	
   be	
   used	
   to	
   both	
   predict	
   fraud	
   as	
   well	
   as	
   the	
  
amount	
   thereof.	
   	
   Unsupervised	
   learning	
   starts	
   from	
   an	
   unlabelled	
   data	
   set	
   and	
   performs	
   anomaly	
  
detection.	
   	
   Finally,	
   Social	
   network	
   learning	
   analyses	
   fraud	
   behaviour	
   in	
   networks	
   of	
   linked	
   entities.	
  	
  
Throughout	
  our	
  research,	
  it	
  has	
  been	
  found	
  that	
  this	
  approach	
  is	
  superior	
  to	
  all	
  others!	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  
8 www.acfe.com
9 www.fbi.gov
  
 
	
  
3.4.4 What are the key characteristics of successful analytical models for fraud detection?
Successful fraud analytical models should satisfy various requirements. First, they should achieve good statistical performance in terms of recall or hit rate, which is the percentage of actual fraudsters that the analytical model labels as suspicious, and precision, which is the percentage of actual fraudsters amongst the ones labelled as suspicious. Next, the analytical models should not be based on complex mathematical formulas (such as neural networks, support vector machines, ...) but should provide clear insight into the fraud mechanisms adopted. This is particularly important since the insights gained will be used to develop new fraud prevention strategies. Also the operational efficiency of the fraud analytical model needs to be evaluated. This refers to the amount of resources needed to calculate the fraud score and adequately act upon it. For example, in a credit card fraud environment, a decision needs to be made within a few seconds after the transaction is initiated.
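As a quick numerical aside on these two measures, the snippet below computes recall and precision from hypothetical labels and model flags; it is not part of the paper's methodology.

```python
# Hypothetical example: 1 = fraudulent, 0 = legitimate.
actual     = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
suspicious = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]   # cases flagged by the model

true_pos  = sum(a == 1 and s == 1 for a, s in zip(actual, suspicious))
recall    = true_pos / sum(actual)            # share of fraudsters that are flagged
precision = true_pos / sum(suspicious)        # share of flagged cases that are fraudsters

print(f"recall = {recall:.2f}, precision = {precision:.2f}")  # recall = 0.75, precision = 0.75
```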
  	
  	
  
3.4.5 Use of social network analytics to detect fraud10
Research has shown that network models significantly outperform non-network models in terms of accuracy, precision and recall, so network analytics can help improve fraud detection techniques. Fraud is present in many critical human processes such as credit card transactions, insurance claim fraud, opinion fraud and social security fraud. Fraud can be characterised by the following five properties: it is an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime, which appears in many types and forms. Before applying fraud detection techniques, these five issues should be resolved or counterbalanced.
  	
  
	
  
Fraud is an uncommon crime, which means that the class distribution is extremely skewed. Rebalancing techniques such as SMOTE could be used to counterbalance this effect. SMOTE consists in undersampling the majority class of data (reducing the number of legitimate cases) and oversampling the minority class of data (duplicating fraud cases or creating artificial fraud cases).
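A minimal sketch of the rebalancing idea described above: random undersampling of the majority class combined with a naive, SMOTE-like interpolation to create artificial fraud cases. The data, class sizes and sample counts are invented for the example.

```python
import random

random.seed(0)

# Hypothetical two-feature cases: (feature_1, feature_2, label), label 1 = fraud.
legit = [(random.gauss(0, 1), random.gauss(0, 1), 0) for _ in range(990)]
fraud = [(random.gauss(3, 1), random.gauss(3, 1), 1) for _ in range(10)]

# 1. Undersample the majority class (legitimate cases).
legit_down = random.sample(legit, 200)

# 2. Oversample the minority class by interpolating between pairs of fraud cases
#    (a rough, SMOTE-like way of creating artificial fraud cases).
synthetic = []
for _ in range(190):
    a, b = random.sample(fraud, 2)
    t = random.random()
    synthetic.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]), 1))

balanced = legit_down + fraud + synthetic
print(len(balanced), "cases,", sum(lbl for _, _, lbl in balanced), "of which are (real or synthetic) fraud")
```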
  	
  	
  
Complex fraud structures are well-considered; this implies that there will be changes in behaviour over time, so not every time period will have the same importance. A temporal weighting adjustment should put an emphasis on the more important periods (more recent data periods) that could be explanatory of the fraudulent behaviour.
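One simple way to implement such a temporal weighting, shown purely as an illustration, is an exponential decay on the age of each observation; the half-life below is an arbitrary assumption.

```python
import math

def recency_weight(age_in_months: float, half_life_months: float = 12.0) -> float:
    """Weight that halves every `half_life_months`, so recent periods dominate."""
    return 0.5 ** (age_in_months / half_life_months)

# Hypothetical observation ages (months before the modelling date).
for age in [0, 6, 12, 24, 36]:
    print(age, round(recency_weight(age), 3))
# 0 -> 1.0, 12 -> 0.5, 24 -> 0.25, ...
```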
  
Fraud is imperceptibly concealed, meaning that it is difficult to identify. One could leverage expert knowledge to create features that help identify fraud.
  	
  
Fraud is time-evolving. The period of study should be selected carefully, taking into consideration that fraud evolves over time. How much of the previous time periods could explain or affect the present? The model should incorporate these changes over time. Another question to raise is in what time window the model should be able to detect fraud: short, medium or long term.
  
The last characteristic of fraud is that it is most of the time carefully organized. Fraud is often not an individual phenomenon; in fact there are many interactions between fraudsters, and often fraud sub-networks develop within a bigger network. Social network analysis could be used to detect these networks.

Social network analysis helps derive useful patterns and insights by exploiting the relational structure between objects.
  
A network consists of two sets of elements: the objects of the network, which are called nodes, and the relationships between nodes, which are called links. The links connect two or more nodes. A weight could be assigned to the nodes and links to measure the magnitude of the crime or the intensity of the relationship. When constructing such networks, the focus will be put on the neighbourhood of a node, which is a subgraph of the network around the node of interest (the fraudster).
  	
  
Once a network has been constructed, how could it be used as an indicator of fraudulent activities? Fraud could be detected by answering the following question: does the network contain statistically significant patterns of homophily? Detection of fraud relies on a concept often used in sociology, called homophily. Homophily in networks means that people have a strong tendency to associate with others whom they perceive as being similar to themselves in some way. This concept can be translated to fraud networks: fraudulent people are more likely to be connected to other fraudulent people. Clustering techniques could be used to detect significant patterns of homophily and thus spot fraudsters.

10 Based on the research of Véronique Van Vlasselaer (KULeuven)
  	
  
Given a homophilic network with evidence of fraud clusters, it is possible to extract features from the network around the node(s) of interest (fraud activity), which is also called the neighbourhood of the node. This process is called the featurization process: extracting features for each network object based on its neighbourhood. The focus will be put on the first-order neighbourhood (first-degree links), also known as the “egonet” (the ego is the node of interest, surrounded by its direct associates known as alters). Feature extraction happens at two levels: egonet generic features (how many fraudulent resources are associated with that company, are there relationships between resources, ...) and alter-specific features (how similar is the alter to the ego, is the alter involved in many fraud cases or not).
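The sketch below illustrates the egonet idea on a toy graph with made-up nodes and fraud labels: for each node of interest it looks at the first-degree neighbours and derives a few simple neighbourhood features. It is an illustration of the concept, not the cited research code.

```python
# Toy undirected network: node -> set of direct associates (the "alters").
links = {
    "company_A":  {"broker_1", "garage_1", "claimant_1"},
    "company_B":  {"broker_1", "claimant_2"},
    "broker_1":   {"company_A", "company_B"},
    "garage_1":   {"company_A"},
    "claimant_1": {"company_A"},
    "claimant_2": {"company_B"},
}
known_fraudulent = {"broker_1", "claimant_1"}   # hypothetical labels

def egonet_features(ego: str) -> dict:
    """Simple first-order neighbourhood (egonet) features for one node."""
    alters = links[ego]
    fraud_alters = alters & known_fraudulent
    return {
        "degree": len(alters),
        "n_fraud_alters": len(fraud_alters),
        "fraud_alter_ratio": len(fraud_alters) / len(alters) if alters else 0.0,
    }

for company in ["company_A", "company_B"]:
    print(company, egonet_features(company))
```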
  	
  
Once these first-order neighbourhood features have been extracted for each subject of interest (companies), such as the number of fraudulent resources and the weight of the fraudulent resources, it is then easy to derive the propagation effect of these fraudulent influences through the network.
To conclude, network models consistently outperform non-network models, as they are able to better distinguish fraudsters from non-fraudsters. They are also more precise, generating shorter lists of high-risk companies while detecting more fraudulent corporates.
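As a rough sketch of what deriving such a propagation effect could look like, the snippet below spreads a fraud-exposure score from a confirmed fraudulent node to its neighbours over a toy network; the damping factor and the network itself are invented, and the cited research uses far more sophisticated propagation schemes.

```python
# Toy network: node -> list of neighbours; node "A" is a confirmed fraudster.
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
}
seeds = {"A"}                                   # confirmed fraudulent nodes keep score 1.0
score = {node: (1.0 if node in seeds else 0.0) for node in links}
alpha = 0.5                                     # illustrative damping factor

# Iteratively let each non-seed node absorb part of its neighbours' average exposure.
for _ in range(20):
    new_score = {}
    for node, neighbours in links.items():
        if node in seeds:
            new_score[node] = 1.0
        else:
            new_score[node] = alpha * sum(score[n] for n in neighbours) / len(neighbours)
    score = new_score

print({node: round(s, 3) for node, s in score.items()})
# Nodes closer to the fraudulent seed end up with a higher exposure score.
```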
  
3.4.6 Fraud	
  detection	
  in	
  motor	
  insurance	
  –	
  Usage-­‐Based	
  Insurance	
  example	
  
In	
   2014,	
   Coalition	
   Against	
   Insurance	
   Fraud11,	
   with	
   assistance	
   of	
   business	
   analytics	
   company	
   SAS,	
   has	
  
published	
  a	
  report	
  in	
  which	
  it	
  stresses	
  that	
  technology	
  plays	
  a	
  growing	
  role	
  in	
  fighting	
  fraud.	
  “Insurers	
  are	
  
investing	
  in	
  different	
  technologies	
  to	
  combat	
  fraud,	
  but	
  a	
  common	
  component	
  to	
  all	
  these	
  solutions	
  is	
  data,”	
  
said	
   Stuart	
   Rose,	
   Global	
   Insurance	
   Marketing	
   Principal	
   at	
   SAS.	
   “The	
   ability	
   to	
   aggregate	
   and	
   easily	
  
visualize	
   data	
   is	
   essential	
   to	
   identify	
   specific	
   fraud	
   patterns.”	
   “Technology	
   is	
   playing	
   a	
   larger	
   and	
   more	
  
trusted	
  role	
  with	
  insurers	
  in	
  countering	
  growing	
  fraud	
  threats.	
  Software	
  tools	
  provide	
  the	
  efficiency	
  insurers	
  
need	
  to	
  thwart	
  more	
  scams	
  and	
  impose	
  downward	
  pressure	
  on	
  premiums	
  for	
  policyholders,”	
  said	
  Dennis	
  Jay,	
  
the	
  Coalition’s	
  executive	
  director.	
  
In motor insurance, a good example is Usage-Based Insurance (UBI), where insurers can benefit from the superior fraud detection that telematics can provide. It equips an insurer with driving behaviour and driving exposure patterns, including information about speeding, driving dynamics, driving trips, day and night driving patterns, garaging address or mileage. In some sense UBI can become a “lie detector” and can help companies to detect falsification of the garaging address, annual mileage or driving behaviour. Thanks to recording the vehicle’s geographical location and detecting sharp braking and harsh acceleration during an accident, an insurer can analyse accident details and estimate accident damages. The telematics devices used in UBI can contain first notice of loss (FNOL) services, providing very valuable information for insurers. Analytics performed on this data provides additional evidence to consider when investigating a claim, and can help to reduce fraud and claims disputes.
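As a purely hypothetical illustration of UBI acting as a “lie detector”, the sketch below compares a claim as declared with the telematics record around the claimed accident time and raises flags on large discrepancies; the field names and thresholds are invented for the example.

```python
from math import hypot

# Hypothetical telematics snapshot at the claimed accident time, and the claim as declared.
telematics = {"lat": 50.85, "lon": 4.35, "odometer_km": 48200, "harsh_braking": True}
claim      = {"lat": 51.21, "lon": 4.40, "declared_annual_km": 10000, "years_insured": 4}

flags = []

# Rough location check (degrees of lat/lon used only as a crude distance proxy here).
if hypot(telematics["lat"] - claim["lat"], telematics["lon"] - claim["lon"]) > 0.1:
    flags.append("accident location far from vehicle position recorded by telematics")

# Mileage plausibility check against the declared annual mileage.
if telematics["odometer_km"] > 1.2 * claim["declared_annual_km"] * claim["years_insured"]:
    flags.append("recorded mileage well above declared annual mileage")

# Driving-dynamics check around the claimed accident.
if not telematics["harsh_braking"]:
    flags.append("no harsh braking recorded around the claimed accident time")

print("claim flags:", flags or "none")
```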
  
4 Legal aspects of Big Data
4.1 Introduction
Data processing lies at the very heart of insurance activities. Insurers and intermediaries collect and process vast amounts of personal data about their customers. At the same time they are dealing with a particular type of ‘discrimination’ among their insureds. Like all businesses operating in Europe, insurers are subject to European and national data protection laws and anti-discrimination rules. The fast technological evolution and globalization have triggered a comprehensive reform of the current data protection laws. The EU hopes to complete a new General Data Protection Regulation at the end of this year. Insurers are concerned that this new Regulation could introduce unintended consequences for the insurance industry.
11 http://www.insurancefraud.org/about-us.htm
  
 
	
  
4.2 Data processing
4.2.1 Legislation: an overview
Insurers collect and process data to analyse the risks that individuals wish to cover, to tailor products accordingly, to evaluate and pay claims and benefits, and to detect and prevent insurance fraud. The rise of Big Data presents opportunities to offer more creative, competitive pricing and, importantly, to predict customers’ behavioural activity. As insurers continue to explore this relatively untapped resource, evolutions in data processing legislation need to be followed very closely.
  	
  	
  
	
  
The protection of personal data was, as a separate right granted to an individual, guaranteed for the first time in the Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (Convention 108), which was adopted by the Council of Europe in 1981.
The current, principal EU legal instrument establishing rules for fair personal data processing is the Data Protection Directive (95/46/EC) of 1995, which regulates the protection of individuals with regard to the processing of personal data and the free movement of such data. As a framework law, the Directive had to be implemented in EU Member States through national laws. This Directive has set a standard for the legal definition of personal data and the regulatory responses to the use of personal data. Its provisions include principles related to data quality, criteria for making data processing legitimate and the essential right not to be subject to automated individual decisions.
  
The Data Protection Directive was complemented by other legal instruments, such as the E-Privacy Directive (2002/58/EC), part of a package of five new Directives that aim to reform the legal and regulatory framework of electronic communications services in the EU. Personal data and individuals’ fundamental right to privacy need to be protected, but at the same time the legislator must take into account the legitimate interests of governments and businesses. One of the innovative provisions of this Directive was the introduction of a legal framework for the use of devices for storing or retrieving information, such as cookies. Companies must also inform customers of the data processing to which their data will be subject and obtain subscriber consent before using traffic data for marketing or before offering added-value services with traffic or location data. The EU Cookie Directive (2009/136/EC), an amendment of the E-Privacy Directive, aims to increase consumer protection and requires websites to obtain informed consent from visitors before they store information on a computer or any web-connected device.
In 2006 the EU Data Retention Directive (2006/24/EC) was adopted as an anti-terrorism measure after the terrorist attacks in Madrid and London. However, on 8 April 2014 the European Court of Justice declared this Directive invalid. The Court took the view that the Directive did not meet the principle of proportionality and should have provided more safeguards to protect the fundamental rights to respect for private life and to the protection of personal data.
  
Belgium established a Privacy Act (or Data Protection Act) in 1992. Since the introduction of the EU Data Protection Directive (1995), the principles of that Directive have been transposed into Belgian law. The Privacy Act consequently underwent significant changes introduced by the Act of 11 December 1998. Further modifications have been made in the meantime, including those of the Act of 26 February 2006. The Belgian Privacy Commission is part of a European task force, which includes data protection authorities from the Netherlands, Belgium, Germany, France and Spain. In October 2014, a new Privacy Bill was introduced in the Belgian Federal Parliament. The Bill mainly aims at providing the Belgian Data Protection Authority (DPA) with stronger enforcement capabilities and at ensuring that Belgian citizens regain control over their personal data. To achieve this, certain new measures are being proposed for inclusion in the existing legislation, adopted already in 1992, as inspired by the proposed European Data Protection Regulation.
  
At this moment the current data processing legislation needs an urgent update. Rapid technological developments, the increasingly globalized nature of data flows and the arrival of cloud computing pose new challenges for data protection authorities. In order to ensure continuity of a high level of data protection, the rules need to be brought in line with technological developments. The Directive of 1995 has also not prevented fragmentation in the way data protection is implemented across the Union.
  
In 2012 the European Commission proposed a comprehensive, pan-European reform of the data protection rules to strengthen online privacy rights and boost Europe’s digital economy. On 15 June 2015, the Council reached a ‘general approach’ on a General Data Protection Regulation (GDPR) that establishes rules adapted to the digital era. The European Commission is pushing for a complete agreement between the Council and the European Parliament before the end of this year. The twofold aim of the Regulation is to enhance the data protection rights of individuals and to improve business opportunities by facilitating the free flow of personal data in the digital single market. The Regulation must be appropriately balanced in order to guarantee a high level of protection of individuals and to allow companies to preserve innovation and competitiveness. In parallel with the proposal for a GDPR, the Commission adopted a Directive on data processing for law enforcement purposes (5833/12).
  	
  
4.2.2 Some concerns of the insurance industry
The European insurance and reinsurance federation, Insurance Europe, is concerned that the proposed Regulation could introduce unintended consequences for the insurance industry and its policyholders. The new legislation must correctly balance an individual’s right to privacy against the needs of businesses. The way insurers process data must be taken into account appropriately so that they can perform their contractual obligations, assess consumers’ needs and risks, innovate, and also combat fraud. There is also a clear tension between Big Data, the privacy of the insured’s personal data and its availability to business and the State.
An important concern is that the proposed rules concerning profiling do not take into consideration the way that insurance works. The Directive of 1995 contains rules on ‘automated processing’ but there is not a single mention of ‘profiling’ in the text. The new GDPR aims to provide more legal certainty and more protection for individuals with respect to data processing in the context of profiling. Insurers need to profile potential policyholders to measure risk; any restrictions on profiling could, therefore, translate not only into higher insurance prices and less insurance coverage, but also into an inability to provide consumers with appropriate insurance. Insurance Europe recommends that the new EU Regulation should allow insurance-related profiling at the pre-contractual stage and during the performance of the contract. There is also still some confusion in defining profiling: in the Council approach profiling means solely automated processing, while Article 20(5) as proposed by the European Parliament could, according to Insurance Europe, be interpreted as prohibiting fully automated processing, requiring human intervention for every single insurance contract offered to consumers.
  	
  
The proposal of the EU Council (June 2015) stipulates that the controller should use adequate mathematical or statistical procedures for the profiling. The controller must secure personal data in a way that takes account of the potential risks involved for the interests and rights of the data subject and that prevents, inter alia, discriminatory effects against individuals on the basis of race or ethnic origin, political opinions, religion or beliefs, trade union membership, genetic or health status or sexual orientation, or measures having such an effect. Automated decision-making and profiling based on special categories of personal data should only be allowed under specific conditions.
  	
  
According to the Article 29 Working Party12, the Council’s proposals on profiling are still unclear and do not foresee sufficient safeguards. In June 2015 it renewed its call for provisions giving the data subject a maximum of control and autonomy when personal data are processed for profiling. The provisions should clearly define the purposes for which profiles may be created and used, including specific obligations on controllers to inform the data subject, in particular of his or her right to object to the creation and the use of profiles. The academic research group IRISS remarks that the GDPR does not clarify whether or not there is an obligation on data controllers to disclose information about the algorithm involved in profiling practices, and suggests clarification on this point.
  
Insurance Europe also requests that the GDPR should explicitly recognise insurers’ need to process and share data for fraud prevention and detection. According to the Council and the Article 29 Working Party, fraud prevention may fall under the non-exhaustive list of ‘legitimate interests’ in Article 6(1)(f), which would provide the necessary legal basis to allow processing for combatting insurance fraud.
  
The new Regulation also proposes a new right to data portability, enabling easier transmission of personal data from one service provider to another. This would allow policyholders to obtain a copy of any of their data being processed by an insurer, and insurers could be forced to disclose confidential and commercially sensitive information. Insurance Europe believes that the scope of the right to data portability should be narrowed down, to make sure that insurers would not be forced to disclose actuarial information to competitors.

12 The Article 29 Working Party is an independent advisory body on data protection and privacy, set up under the Data Protection Directive of 1995. It is composed of representatives from the national data protection authorities of the EU Member States, the European Data Protection Supervisor and the European Commission.
  	
  	
  
Insurers also need to retain policyholder information. The Regulation should clearly state that the right to be forgotten should not apply where there is a contractual relationship between an organisation and an individual, where a data controller is required to comply with regulatory obligations to retain data, or where the data is processed to detect and prevent fraudulent activities.
  	
  	
  
The implementation of more stringent, complex rules will require insurance firms to review their compliance programmes. They will have to take account of increased data handling formalities, profiling, consent and processing requirements, and the responsibilities and obligations of controllers and processors.
  
4.3 Discrimination
4.3.1 Legislation: an overview
In 2000 two important EU directives have provided a comprehensive framework for European anti-discrimination law. The Employment Equality Directive (2000/78/EC) prohibits discrimination on the basis of sexual orientation, religion or belief, age and disability in the area of employment, while the Racial Equality Directive (2000/43/EC) combats discrimination on the grounds of race or ethnicity in the context of employment, the welfare system, social security, and goods and services.
  
	
  
The Gender Goods and Services Directive (2004/113/EC) has expanded the scope of sex discrimination rules and requires that differences in treatment may be accepted only if they are justified by a legitimate aim. Any limitation should nevertheless be appropriate and necessary in accordance with the criteria derived from the case law of the ECJ. As regards the insurance sector, the Directive, in principle, imposes ‘unisex’ premiums and benefits for contracts concluded after 21 December 2007. However, it provides for an exception to this principle in Article 5(2), with the possibility to permit differences in treatment between women and men after this date, based on actuarial data and reliable statistics. In its Test-Achats judgment, the ECJ invalidated this exception because it was incompatible with Articles 21 and 23 of the EU’s Charter of Fundamental Rights.
  
	
  
A proposal for a Council Directive (COM 2008 426-(15)) stipulates that actuarial and risk factors related to disability and to age can be used in the provision of insurance. These should not be regarded as constituting discrimination where the factors are shown to be key factors for the assessment of risk.
  
	
  
The recent proposal of the Council on the new General Data Protection Regulation (June 2015) states that the processing of special categories of personal (sensitive) data, revealing racial or ethnic origin, political opinions, religious or philosophical beliefs or trade-union membership, and the processing of genetic data or data concerning health or sex life, shall be prohibited. Derogations from this general prohibition should be explicitly provided, inter alia, where the data subject gives explicit consent or in respect of specific needs, in particular where the processing is carried out in the course of legitimate activities by certain associations or foundations the purpose of which is to permit the exercise of fundamental freedoms.
  	
  
	
  
In Belgium the EU Directive 2000/78/EC is transposed into national legislation through the anti-discrimination Law of 10 May 2007 (BS 30.V.2007). This law has been amended by the law of 30 December 2009 (BS 31.XII.2009) and by the law of 17 August 2013 (BS 5.III.2014). Due to the federal organization of Belgium, laws prohibiting discrimination are complex and fragmented because they are made and implemented by six different legislative bodies, each within its own sphere of competence.
  	
  
	
  
4.3.2 Tension between insurance and anti-discrimination law
Insurance companies are dealing with a particular type of ‘discrimination’ among their insureds. They attempt to segregate insureds into separate risk pools based on their differences in risk profiles, first, so that they can charge different premiums to the different groups based on their risk and, second, to incentivize risk reduction by insureds. They openly ‘discriminate’ among individuals based on observable characteristics. Accurate risk classification and incentivizing risk reduction provide the primary justifications for why we let insurers ‘discriminate’. [30]
  
	
  
 
	
  
Regulatory restrictions on insurers’ risk classifications can produce moral hazard and generate adverse selection. Davey [31] remarks that insurance and anti-discrimination law defend fundamentally different perspectives on risk assessment. Insurers have often defended their practices as ‘fair discrimination’. They assert that they are not discriminating in the legal sense by treating similar cases differently; rather, they are treating different cases differently. This clash between the principles of insurance and anti-discrimination law is fundamental: whether differential treatment based on actuarial experience is ‘discrimination’ in law or justified differential treatment. This tension is felt at both the national and supranational levels as governments and the EU seek to regulate underwriting practices. A good, illustrative example is the already mentioned Test-Achats case.
  	
  
	
  
Tension between insurance and the Charter of Fundamental Rights is also clearly felt in the debate on genetic discrimination in the context of life insurance. Insurers might wish to use genetic test results for underwriting, just as other medical or family history data. The disclosure of genetic data for insurance risk analysis will present complex issues that overlap with those related to sensitive data in general. Canada, the US, Russia and Japan have chosen not to adopt laws specifically prohibiting access to genetic data for underwriting by life insurers. In these countries, insurers treat genetic data like other types of medical or lifestyle data [32]. Belgium, France and Norway have chosen to adopt laws to prevent or limit insurers’ access to genetic data for life insurance underwriting. The Belgian Parliament has incorporated in the Law of 25 June 1992 legislative dispositions that prohibit the use of genetic testing to predict the future health status of applicants for (life) insurance.
  	
  
	
  
Since EU member states have adopted different approaches to the use of genetic data, a pan-European regulation is needed. The recent proposal of the Council on a new General Data Protection Regulation (June 2015) does not solve this problem. It prohibits the processing of genetic data but recognises explicit consent as a valid legal basis for the processing of genetic data, and leaves to Member States (Article 9(2)(a)) the decision on not admitting consent for legitimising the processing of genetic data.
  
	
  
5 New Frontiers

5.1 Risk pooling vs. personalization
With the introduction of Big Data in insurance, the insurance sector is opening up to new possibilities, new innovative offers and personalized services for its customers. As a result we might see the end of risk pooling and the rise of individual risk assessment. It is said that these personalized services will provide new premiums that will be “fairer” for the policyholder. Is it indeed true that the imprudence of others will have less impact on your own insurance premium? This way of thinking holds for as long as the policyholder does not have any claim. In a world of totally individualised premiums, the event of a claim would increase the premium of that policyholder enormously. And that seems in contradiction with the way we think about insurance, i.e. that in the event of a claim, your claim is paid by the excess premium of the other policyholders. It seems that with the introduction of Big Data, the social aspect of insurance is gone.
  
However, which customer would like to subscribe to such an insurance offer? One could then argue that it is better to save the insurance premium on your own and put it aside for the possibility of a future claim. So in order to talk about insurance, risk pooling will always be necessary. Big Data is just changing the way we pool the risks.
For example, until recently, the premium for car insurance was only dependent on a handful of indicators (personal, demographic and car data). Therefore, an insurance portfolio needed to be big enough to have risk pools with enough diversification on the other indicators that could not be measured.
In recent years more and more indicators can be measured and used as data. This means that risk pools don’t have to be as big as before, because the behaviour of each individual in the risk pool is becoming more and more predictable. Somebody who speeds all the time is more likely to have an accident. Previously this was assumed to be the case for people with high-horsepower cars. Nowadays, this behaviour can be measured exactly, removing the need for assumptions.
  
 
	
  
However, as long as there is a future event that is uncertain, risk pooling still makes sense. The risk pools are just becoming smaller and more predictable. In the example given, even a driver who does not speed can still be involved in an accident.
5.2 Personalised Premium
Personalisation of risk pricing relies upon an insurer having the capacity to handle a vast amount of data. A big challenge is linked with data collection: making sure it is reliable and that it can in fact be used for insurance pricing. Insurers will have to be careful not to be overwhelmed by Big Data.
  
We stated above that the use of Big Data will make insurance pricing fairer. In this case fair is defined as taking into account all members of society. However, this does not mean that everyone in society should be treated in exactly the same way. Every individual should have an equal opportunity to access what is on offer. However, it can happen that the offer does not meet the requirements of the customer, or vice versa. In that case, an insurance cover will not be possible.
  	
  
5.3 From Insurance to Prevention
One of the big advantages of the gathering of Big Data by insurance companies or other companies is that this data can in a certain way be shared with their customers. In that way, a constant interaction can arise between the insurer and the policyholder. When consumers understand better how their behaviour can impact their insurance premium, they can make changes in their lives that can benefit both parties.
A typical example of this is the use of telematics in car insurance. A box in the insured car automatically saves and transmits all driving information of the vehicle. The insurance company uses this data to analyse the risk the policyholder is facing while driving. When, for example, the driver is constantly speeding and braking heavily, the insurance company can take this as an indication to increase the premium. On the other hand, someone who drives calmly, outside the busy hours and only outside the city, will be rewarded with a lower premium.
In this way insurers will have an impact on the driving behaviour of people. Once this communication between policyholder and insurer is transparent, the policyholder will act in a way that decreases his premium. The insurer has thus played the role of a prevention officer.
  
Another example is “e-Health”. As health costs are rising rapidly, insurers are trying to lower the claim costs. It has been found that the everyday living habits of people, for example eating behaviour, the amount of sleep you get, or the number of hours you do sport, have a large influence on health claims.
The Internet of Things will have an impact on the way the pricing is done for each individual. Thanks to modern sensors, insurers will be able to acquire data at the individual/personal level. Each policyholder will in that way be encouraged to sleep enough, do enough sport and eat healthily. All in all, it is the consumer that benefits from fewer car accidents, a healthy lifestyle and … lower premiums.
  	
  
5.4 The all-seeing Insurer
Insurance companies have always been interested in gathering as much information as possible on the risks being assured and the people insuring them. With the possibilities of Big Data, this interest in people’s everyday life increases enormously. Therefore insurance is becoming more and more an embedded part of the everyday life of people and businesses. Previously, consumers just needed to fill in a form at the beginning of an insurance contract, and the impact of that insurance was more or less stable and predictable during the whole duration of the contract, whatever the future behaviour of the consumer. With the introduction of Big Data, insurers have influence on every aspect of everyday life. The way you drive, what you buy, what you don’t buy, the way you sleep, etc., can have a big impact on your financial situation. Indeed, insurers are moving into the position of a central tower, observing our everyday life through smartphones and all other devices and sensors.
  
The future will tell us how far the general public will allow this influence of insurance companies. Sharing your driving behaviour with insurers will probably not be a problem for most of us, but sharing what we eat and how we sleep is a bigger step. Every person will have to make a trade-off between privacy and a better insurance offer. Currently, for instance in the case of car insurance telematics, drivers have an opt-in option and can decide whether they are interested in the telematics-based offer. However, in the future data collection might be the default and you may have to pay extra to be unlisted and keep your life private.
  
 
	
  
5.5 Change in Insurance business
From an actuarial point of view we tend to focus on the opportunities big data holds for managing and pricing risk. But the digital transformation that is at the basis of big data (cf. the increased data flow: the V’s from section 2.1, and the increased computational power: section 2.2) has also led to a change in customers’ expectations and behaviour. The ease with which the end customer can access information and interact with companies, and the way the digital enterprises have developed their services to enhance this ease of use, have set a new standard in customer experience. Customers are used to getting quick and online reactions from the companies they buy goods and services from. Industries that do not adapt to this new standard can quickly get an image of being old-fashioned, traditional and simply not interesting. We have already seen new distribution models changing the insurance market in surrounding countries, e.g. aggregator websites in the UK, that are a result of (or play into) this trend. It is in this new customer experience that big data plays an important role and can be a real element of competitive advantage, as it gives access to a new level of personalization. Getting this personalization right can give a company the buy-in into future customer interactions and therefore the opportunity to expand the customer wallet or relation. This has led to the evolution where some big digital retailers have continuously expanded their offer to a wide and loyal customer base, even into the insurance business (e.g. Alibaba Insurance). If these players get it right they can change the insurance distribution landscape, monopolizing the customer relation and leaving traditional insurers the role of pure risk carriers. For now this evolution is less noticeable in Belgium, where the traditional insurance distribution model (brokers and banks) still firmly holds its ground, giving the Belgian insurance industry an opportunity to modernize (read: digitalize) and personalize the customer experience before newcomers do so.
6 Actuarial sciences and the role of actuaries
Big Data opens a new world for insurance and any other activity based on data. The access to the data, the scope of the data, the frequency of the data and the size of the data samples are important elements that determine to what extent the final decision is inspired by the statistical evidence. As Big Data changes those properties drastically, it also drastically changes the environment of those who use these data. The activity of the actuary is particularly influenced by the underlying data, and therefore it is appropriate to conclude that the development of the Big Data world has a major impact on the education and training of the actuary, the tools used by the actuary and the role of the actuary in the process. Data science, which aims to optimise the analytics as a function of the volume and diversity of the data, is an emerging and fast-developing field. The combination of actuarial skills and this research allows for an optimal implementation of the insights and tools offered by the data science world.
6.1 What is Big Data bringing for the actuary?
6.1.1 Knowledge gives power
Big data gives access to more information than before: this gives the actuary a richer basis for actuarial mathematical analysis. When data are more granular and readily available, actuaries can extend their analysis and better identify the risk factors and the underlying dependencies. Best estimate approaches are upgraded to stochastic evidence. Christophe Geissler13 states that big data will progressively stimulate the actuary to abandon purely explicative models in favour of more complex models that aim to identify heterogeneous subgroups. The explicative models are based on the assumption that there exists a single formula that explains the behaviour of all persons. Big data and the available computational power allow the development of innovative algorithms that detect visible and verifiable indicators of a different risk profile.
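To make this contrast concrete, the short Python sketch below (an illustration added here, not taken from the paper; it assumes numpy and scikit-learn are available) fits one global Poisson GLM formula and a tree-based gradient boosting model on synthetic claim counts that contain a hidden subgroup effect; the tree-based learner can pick up the subgroup that the purely additive formula misses.

# Illustrative sketch only: one global "explicative" formula (a Poisson GLM)
# versus a tree-based learner that can isolate a heterogeneous subgroup
# (here, young drivers with high mileage) in synthetic claim-count data.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 20_000
age = rng.uniform(18, 80, n)
mileage = rng.uniform(2_000, 40_000, n)

# The true claim frequency contains an interaction that an additive formula misses.
lam = 0.05 + 0.002 * (mileage / 10_000) + 0.15 * ((age < 25) & (mileage > 25_000))
claims = rng.poisson(lam)

X = np.column_stack([age / 100.0, mileage / 10_000.0])   # lightly scaled features
X_tr, X_te, y_tr, y_te = train_test_split(X, claims, random_state=0)

glm = PoissonRegressor().fit(X_tr, y_tr)                          # one global formula
gbm = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)   # can detect the subgroup

print("GLM test MSE:", np.mean((glm.predict(X_te) - y_te) ** 2))
print("GBM test MSE:", np.mean((gbm.predict(X_te) - y_te) ** 2))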
  	
  
6.1.2 Dynamic risk management
“Even if an actuary uses data to develop an informed judgement, that type of estimate does not seem sufficient in today's era of Big Data”, a statement that can be read on a discussion forum of actuaries. Instead, dynamic risk management is considered to be an advanced form of actuarial science. Actuarial science is about collecting all pertinent data, using models and expertise to factor in the risks, and then making a decision. Dynamic risk management entails real-time decision-making based on a stream of data.
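A minimal sketch of what such real-time decision-making could look like is given below (an illustration with assumed function names and numbers, not a method described in the paper): a running risk score is refreshed every time a new telematics trip summary arrives, instead of being estimated once.

# Minimal illustrative sketch: refresh a running risk score from a stream of
# telematics trip summaries instead of producing a one-off estimate.
def update_risk_score(current_score: float, observed_rate: float, weight: float = 0.1) -> float:
    """Exponentially weighted blend of the latest observed event rate into the score."""
    return (1 - weight) * current_score + weight * observed_rate

score = 0.05  # starting point, e.g. the portfolio-average harsh-braking rate per km
for observed_rate in (0.04, 0.09, 0.12, 0.03):  # simulated stream of trip summaries
    score = update_risk_score(score, observed_rate)
    print(f"updated risk score: {score:.4f}")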
  	
  
6.1.3 Scope and resources
Big data opens the horizon of the actuary. The applications of Big Data go far beyond the insurance activity and relate to various domains where statistical analysis and the economic/financial implications are essential. Jobert Koomans14, board member of the Actuarieel Genootschap, refers to estimates that Big Data will create a large number of jobs (“1.5 million new data analysts will be required in the US in 2018”). Actuaries' very strong analytical skills, combined with the business knowledge they gain from being involved in everything from pricing to financial reporting, give them a lot of new opportunities across different industries.

13 Christophe Geissler, Le nouveau big bang de l’actuariat, L’Argus de l’Assurance, November 2013
  
6.2 What is the actuary bringing to Big Data?
6.2.1 The Subject Matter Expert
Data are a tool to quantify the implications of events and behaviour. It is nevertheless the initial modelling and analysis that define the framework and the ultimate outcome. Both deductive and inductive approaches can be used in this context.
	
  
Kevin Pledge15 refers to the role of the Subject Matter Expert: “Understanding the business is a critical factor for analytics, understanding does not come from a system, but from training and experience. … Not only do actuaries have the quantitative skills to be data scientists of insurance, but our involvement in everything from pricing to financial reporting gives us the business knowledge to make sense of this. This business knowledge is as important as the statistical and quant skills typically thought of when you think data scientist”.
  
	
  
Actuaries are well placed to combine data analytics and business knowledge. The specific education of the actuary, as well as real-life experience in the insurance industry and other domains with actuarial roots, is essential for a successful implementation of the Big Data approach.
  
	
  
6.2.2 Streamlining the process
The actuary formulates the objectives and the framework for the quantitative research and by doing so initiates the Big Data process. Big data requires the appropriate technology and the use of advanced data science. Actuaries can help to optimise this computer-science-driven analysis with their in-depth understanding of the full cycle. Streamlining the full process, from detecting the needs and defining the models, through using the appropriate data, to monitoring the outcome while taking into account the general interest and specific stakeholder interests, is the key to the success of data science in the hands of the actuary.
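Purely as a hypothetical illustration of such a streamlined cycle (the function names, data and threshold below are assumptions, not part of the paper), the main hand-over points can be made explicit in code: the actuary defines the objective, a model is fitted on the appropriate data, and the outcome is monitored so that drift triggers a review.

# Hypothetical sketch of the cycle: define the objective, fit a model on the
# appropriate data, then monitor the outcome and flag drift for actuarial review.
import numpy as np
from sklearn.linear_model import PoissonRegressor

def define_objective():
    # Formulated by the actuary: what should be predicted, with which features.
    return {"target": "claim frequency", "features": ["age", "annual_mileage"]}

def fit_model(X, y):
    return PoissonRegressor().fit(X, y)

def monitor(model, X, y, tolerance=0.10):
    # Compare predicted with observed frequency; large drift triggers a review.
    drift = abs(model.predict(X).mean() - y.mean()) / max(y.mean(), 1e-9)
    return {"relative_drift": drift, "needs_review": drift > tolerance}

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(5_000, 2))
y = rng.poisson(0.08, size=5_000)
spec = define_objective()
model = fit_model(X, y)
print(spec, monitor(model, X, y))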
  
	
  
6.2.3 Simple models with predictive power
Esko Kivisaari16: “The real challenge of Big Data for actuaries is to create valid models with good predictive power with the use of lots of data. The value of a good model is not that it is just adapted to the data at hand but it should have predictive power outside experience. There will be the temptation to create complicated models with lots of parameters that closely replicate what is in the data. The real challenge is to have the insight to still produce simple models that have real predictive power.”
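A simple way to illustrate this temptation (the example below is an added sketch, not taken from the paper; it assumes numpy and scikit-learn) is to compare in-sample and out-of-sample error for a simple model and a heavily parameterised one: the complex model replicates the training data more closely, but typically predicts worse on data it has not seen.

# Illustrative sketch: a degree-1 trend versus a degree-15 polynomial on the
# same data; out-of-sample error typically exposes the over-parameterised model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 60).reshape(-1, 1)
y = 2.0 + 0.5 * x.ravel() + rng.normal(0.0, 1.0, 60)  # simple underlying reality

x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=2)

for degree in (1, 15):  # simple versus heavily parameterised model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(x_tr, y_tr)
    print(f"degree {degree:2d} | train MSE {mean_squared_error(y_tr, model.predict(x_tr)):.3f}"
          f" | test MSE {mean_squared_error(y_te, model.predict(x_te)):.3f}")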
  
	
  
The added value of the actuary can be found in the modelling skills and the ability to apply professional judgement. The organisation of the profession and the interaction with peers create the framework that allows this judgement to be exercised. Actuaries also focus on appropriate communication of the results, so that the contribution to value creation can be optimized.
  	
  
	
  
6.2.4 Information to the individual customer
Big Data can help to find answers to the needs of consumers and society. Customers will be informed about their behaviour so that they will be able to correct, influence and change their risk behaviour. Actuaries will be in the perfect position to bring the data back to the customer, be it through the pricing of insurance products or through helping to establish awareness campaigns.
14 Jobert Koomans, Big Data – Kennis maakt macht, De Actuaris (Actuarieel Genootschap), May 2014
15 Kevin Pledge, Newsletters of the Society of Actuaries, October 2012
16 Esko Kivisaari, Big Data and actuarial mathematics, working paper, Insurance Committee of the Actuarial Association of Europe, March 2015
7 Conclusions
The rise of technology megatrends like ubiquitous mobile phones and social media, customer personalization, cloud computing and Big Data has an enormous impact on our daily lives but also on business operations. There are plenty of very successful businesses across different industries that regard Big Data as very important and central to their strategy.
  
	
  
In this information paper we wanted to understand what the impact of Big Data would be on the insurance industry and the actuarial profession. We asked ourselves whether insurers are immune to these recent changes. Will they be able to leverage the huge volumes of newly available data coming from various sources (mobile phones, social media, telematics sensors, wearables) and the power of Big Data?
  
	
  
We think that Big Data will have various effects. It will require companies to adopt a new business culture and become data-driven businesses. It will have an impact on the entire insurance value chain, ranging from underwriting to claims management.
  
	
  
Today's advanced analytics in insurance go much further than traditional underwriting and actuarial science. Machine learning and predictive modelling are the way forward for insurers to improve pricing and segmentation and to increase profitability. For instance, direct measurement of driving behaviour provides new rating factors and transforms auto insurance underwriting and pricing processes.
  
	
  
Big Data can also play a tremendous role in the improvement of claims management, for instance by providing very efficient fraud detection models.
  
	
  
We would note that there are a few inhibitors that could block these changes, with legislation being one of the main concerns. The EU is currently working on the General Data Protection Regulation (GDPR), which updates the rules on data processing and privacy protection and establishes legislation adapted to the digital era. It is still unclear what the final agreement will be, but the Regulation must be appropriately balanced in order to guarantee a high level of protection of individuals and allow companies to preserve innovation and competitiveness.
	
  
Finally, we discussed the new frontiers of insurance. Big Data gives us a huge amount of information and allows the creation of “fairer”, more personalized insurance premiums, which can be at odds with the solidarity aspect of insurance. However, we think that Big Data will not revolutionize insurance: risk pooling will remain at its core, it will just become better.
	
  
Big Data opens a lot of new possibilities for actuaries. Data science and actuarial science mutually reinforce each other. More data allows for a richer basis for actuarial mathematical analysis; big data leads to a dynamic risk management approach; and the applications of Big Data go far beyond the insurance activity and therefore offer a lot of new opportunities. The implementation of Big Data in insurance and the financial services industry requires the input of the actuary as the subject matter expert who also understands the complex methodology. For Big Data to be successful, understandable models with predictive power are required, for which the professional judgement of the actuary is essential.
	
  
We hope that this paper will be a good starting point for the discussion about the interplay between Big Data, insurance and the actuarial profession. The Institute for Actuaries in Belgium will further develop the subject and prepare Belgian actuaries for it.
8 References
8.1 Section 3.1
[1] Predictive modeling for life insurance, Mike Batty, 2010, Deloitte (https://www.soa.org/files/pdf/research-pred-mod-life-batty.pdf)
[2] Predictive modeling in insurance: key issues to consider throughout the lifecycle of a model, Chris Homewood, 2012, Swiss Re (http://www.swissre.com/library/archive/?searchByType=1010965&searchByType=1010965&sort=descending&sort=descending&search=yes&search=yes&searchByLanguage=851547&searchByLanguage=851547&m=m&m=m&searchByCategory=1023505&searchByCategory=1023505&searchByYear=872532&searchByYear=872532#inline)
[3] Data analytics in life insurance: lessons from predictive underwriting, Willam Trump, 2014, Swiss Re (http://cgd.swissre.com/risk_dialogue_magazine/Healthcare_revolution/Data_Analytics_in_life_insurance.html)
[4] Advanced analytics and the art of underwriting, transforming the insurance industry, 2007, Deloitte (https://www.risknet.de/fileadmin/.../Deloitte-Underwriting-2007.pdf)
[5] Data Management: Foundation for a 360-degree Customer View - White Paper, 2012?, Pitney Bowes Software (http://www.retailsolutionsonline.com/doc/data-management-foundation-for-a-degree-customer-view-0001)
[6] Unleashing the value of advanced analytics in insurance, Richard Clarke and Ari Libarikian, 2014, McKinsey (http://www.mckinsey.com/insights/financial_services/unleashing_the_value_of_advanced_analytics_in_insurance)
  
8.2 Section 3.2
[7] Ptolemus, Usage-Based Insurance Global Study, 2013
[8] Usage-based insurance: Big data, machine learning, and putting telematics to work, Marcus Looft and Scott C. Kurban, Milliman
[9] Capitalizing on Big Data Analytics for the Insurance Industry
[10] Driving profitability and lowering costs in the Insurance Industry using Machine Learning on Hadoop, Amit Rawlani, August 6, 2015
[11] HSBC - Big data opens new horizons for insurers
  
8.3 Section 3.3
[12] Antonio, K., & Plat, R. (2014). Micro-level stochastic loss reserving in general insurance. Scandinavian Actuarial Journal, 649-669.
[13] Arjas, E. (1989). The Claims Reserving Problem in Non-Life Insurance: Some Structural Ideas. ASTIN Bulletin 19 (2), 140-152.
[14] England, P., & Verrall, R. (2002). Stochastic Claims Reserving in General Insurance. British Actuarial Journal (8), 443-544.
[15] Gremillet, M., Miehe, P., & Trufin, J. (n.d.). Implementing the Individual Claims Reserving Method, A New Approach in Non-Life Reserving. Working Paper.
[16] Haastrup, S., & Arjas, E. (1996). Claims Reserving in Continuous Time - A Nonparametric Bayesian Approach. ASTIN Bulletin (26), 139-164.
[17] Jewell, W. (1989). Predicting IBNYR Events and Delays, Part I Continuous Time. ASTIN Bulletin (19), 25-56.
[18] Jin, X., & Frees, E. W. (n.d.). Comparing Micro- and Macro-Level Loss Reserving Models. Working Paper.
[19] Larsen, C. R. (2007). An Individual Claims Reserving Model. ASTIN Bulletin (37), 113-132.
[20] Mack, T. (1993). Distribution-free calculation of the standard error of Chain Ladder reserve estimates. ASTIN Bulletin (23), 213-225.
[21] Mack, T. (1999). The standard error of Chain Ladder reserve estimate: recursive calculation and inclusion of a tail factor. ASTIN Bulletin (29), 361-366.
[22] Norberg, R. (1999). Prediction of Outstanding Liabilities II: Model Variations and Extensions. ASTIN Bulletin (29), 5-25.
[23] Norberg, R. (1993). Prediction of Outstanding Liabilities in Non-Life Insurance. ASTIN Bulletin (23), 95-115.
[24] Pigeon, M., Antonio, K., & Denuit, M. (2014). Individual loss reserving using paid-incurred data. Insurance: Mathematics & Economics (58), 121-131.
[25] Pigeon, M., Antonio, K., & Denuit, M. (2013). Individual Loss Reserving with the Multivariate Skew Normal Model. ASTIN Bulletin (43), 399-428.
[26] Wüthrich, M., & Merz, M. (2008). Modelling the claims development result for Solvency purposes. ASTIN colloquium.
[27] Wüthrich, M., & Merz, M. (2008). Stochastic Claims Reserving Methods in Insurance. New York: Wiley.
[28] Zhao, X., & Zhou, X. (2010). Applying Copula Models to Individual Claim Loss Reserving Methods. Insurance: Mathematics and Economics (46), 290-299.
[29] Zhao, X., Zhou, X., & Wang, J. (2009). Semiparametric Model for Prediction of Individual Claim Loss Reserving. Insurance: Mathematics and Economics (45), 1-8.
   	
  
8.4 Section 4
[30] Avraham, R., Logue, K. D., and Schwarcz, D. B., Understanding Insurance Anti-Discrimination Laws, Law & Economics Working Papers 52, University of Michigan, 2013.
[31] Davey, James, Genetic discrimination in insurance: lessons from Test-Achats, in De Paor, A., Quinn, G. and Blanck, P. (eds.), Genetic Discrimination - Transatlantic Perspectives on the Case for a European Level Legal Response, Abingdon, 2014.
[32] Yann Joly et al., Life insurance: genomic stratification and risk classification, in European Journal of Human Genetics, May 2014, 22(5), 575-579, p. 575

IABE Big Data information paper - An actuarial perspective

  • 1.
              BIG  DATA:  An  actuarial  perspective       Information  Paper   November  2015    
  • 2.
        2 Table  of  Contents   1  INTRODUCTION   3   2  INTRODUCTION  TO  BIG  DATA   3   2.1  INTRODUCTION  AND  CHARACTERISTICS   3   2.2  BIG  DATA  TECHNIQUES  AND  TOOLS   4   2.3  BIG  DATA  APPLICATIONS   4   2.4  DATA  DRIVEN  BUSINESS   5   3  BIG  DATA  IN  INSURANCE  VALUE  CHAIN   6   3.1  INSURANCE  UNDERWRITING   6   3.2  INSURANCE  PRICING   8   3.3  INSURANCE  RESERVING   10   3.4  CLAIMS  MANAGEMENT   11   4  LEGAL  ASPECTS  OF  BIG  DATA   13   4.1  INTRODUCTION   13   4.2  DATA  PROCESSING   14   4.3  DISCRIMINATION   16   5  NEW  FRONTIERS   17   5.1  RISK  POOLING  VS.  PERSONALIZATION   17   5.2  PERSONALISED  PREMIUM   18   5.3  FROM  INSURANCE  TO  PREVENTION   18   5.4  THE  ALL-­‐SEEING  INSURER   18   5.5  CHANGE  IN  INSURANCE  BUSINESS   19   6  ACTUARIAL  SCIENCES  AND  THE  ROLE  OF  ACTUARIES   19   6.1  WHAT  IS  BIG  DATA  BRINGING  FOR  THE  ACTUARY?   19   6.2  WHAT  IS  THE  ACTUARY  BRINGING  TO  BIG  DATA?   20   7  CONCLUSIONS   21   8  REFERENCES   22  
  • 3.
        3 1 Introduction   The  Internet  has  started  in  1984  linking  1,000  university  and  corporate  labs.  In  1998  it  grew  to  50  million   users,  while  in  2015  it  reached  3.2  billion  people  (44%  of  the  global  population).  This  enormous  user   growth  was  combined  with  an  explosion  of  data  that  we  all  produce.  Every  day  we  create  around  2.5   quintillion  bytes  of  data,  information  coming  from  various  sources  including  social  media  sites,  gadgets,   smartphones,   intelligent   homes   and   cars   or   industrial   sensors   to   name   few.   Any   company   that   can   combine  various  datasets  and  can  entail  effective  data  analytics  will  be  able  to  become  more  profitable   and  successful.  According  to  a  recent  report1  400  large  companies  who  adopted  Big  Data  analytics  "have   gained  a  significant  lead  over  the  rest  of  the  corporate  world."  Big  data  offers  big  business  gains,  but  also   has   hidden   costs   and   complexity   that   companies   will   have   to   struggle   with.   Semi-­‐structured   and   unstructured  big  data  requires  new  skills  and  there  is  shortage  of  people  who  mastered  data  science  and   can  handle  mathematics  and  statistics,  programming  and  possess  substantive,  domain  knowledge.     What  will  be  the  impact  on  the  insurance  sector  and  the  actuarial  profession?  The  concepts  of  Big  Data   and   predictive   modelling   are   not   new   to   insurers   who   have   already   been   storing   and   analysing   large   quantities  of  data  to  achieve  deeper  insights  into  customers’  behaviour  or  setting  up  insurance  premiums.   Moreover   actuaries   are   data   scientists   for   insurance   and   they   have   all   the   statistical   training   and   analytical  thinking  to  understand  complexity  of  data  combined  with  the  business  insights.  We  look  closely   on   the   insurance   value   chain   and   assess   the   impact   of   Big   Data   on   underwriting,   pricing   and   claims   reserving.   We   examine   the   ethics   of   Big   Data   including   data   privacy,   customer   identification,   data   ownership   and   the   legal   aspects.   We   also   discuss   new   frontiers   for   insurance   and   its   impact   on   the   actuarial  profession.  Will  actuaries  will  be  able  to  leverage  Big  Data,  create  sophisticated  risk  models  and   more  personalized  insurance  offers,  and  bring  new  wave  of  innovation  to  the  market?       2 Introduction  to  Big  Data     2.1 Introduction  and  characteristics   Big  Data  broadly  refers  to  data  sets  so  large  and  complex  that  they  cannot  be  handled  by  traditional  data   processing  software  and  it  can  be  defined  by  the  following  attributes:   a. Volume:  in  2012  it  was  estimated  that  2.5  x  1018  bytes  of  data  was  created  worldwide  every  day  -­‐   this  is  equivalent  to  a  stack  of  books  from  the  Sun  to  Pluto  and  back  again.  This  data  comes  from   everywhere:   sensors   used   to   gather   climate   information,   posts   to   social   media   sites,   digital   pictures  and  videos,  purchase  transaction  records,  software  logs,  GPS  signals  from  mobile  devices,   among  others.   b. 
Variety  and  Variability:  the  challenges  of  Big  Data  do  not  only  arise  from  the  sheer  volume  of   data  but  also  from  the  fact  that  data  is  generated  in  multiple  forms  as  a  mix  of  unstructured  and   structured  data,  and  as  a  mix  of  data  at  rest  and  data  in  motion  (i.e.  static  and  real  time  data).   Furthermore   the   meaning   of   data   can   change   over   time   or   depend   on   the   context.   Structured   data  is  organized  in  a  way  that  both  computers  and  humans  can  read,  for  example  information   stored   in   traditional   databases.   Unstructured   data   refers   to   data   types   such   as   images,   audio,   video,   social   media   and   other   information   that   are   not   organized   or   easily   interpreted   by   traditional   databases.   It   includes   data   generated   by   machines   such   as   sensors,   web   feeds,   networks  or  service  platforms.   c. Visualization:  the  insights  gained  by  a  company  from  analysing  data  must  be  shared  in  a  way  that   is  efficient  and  understandable  to  the  company’s  stakeholders.   d. Velocity:  data  is  created,  saved,  analysed  and  visualized  at  an  increasing  speed,  making  it  possible   to  analyse  and  visualize  high  volumes  of  data  in  real  time.     e. Veracity:  it  is  essential  that  the  data  is  accurate  in  order  to  generate  value.   f. Value:  the  insights  gleaned  from  Big  Data  can  help  organizations  deepen  customer  engagement,   optimize  operations,  prevent  threats  and  fraud,  and  capitalize  on  new  sources  of  revenue.                                                                                                                             1  http://www.bain.com/publications/articles/big_data_the_organizational_challenge.aspx  
  • 4.
        4 2.2 Big  Data  techniques  and  tools   The  Big  Data  industry  has  been  supported  by  the  following  technologies:   a. The  Apache  Hadoop  software  library  was  initially  released  in  December  2011  and  is  an  open   source  framework  that  allows  for  the  distributed  processing  of  large  data  sets  across  clusters  of   computers  using  simple  algorithms.  It  is  designed  to  scale  up  from  one  to  thousands  of  machines,   each   one   being   a   computational   and   storage   unit.   The   software   library   is   designed   under   the   fundamental   assumption   that   hardware   failures   are   common:   the   library   itself   automatically   detects   and   handles   hardware   failures   in   order   to   guarantee   that   the   services   provided   by   a   computer  cluster  will  stay  available  even  when  the  cluster  is  affected  by  hardware  failures.  A  wide   variety  of  companies  and  organizations  use  Hadoop  for  both  research  and  production:  web-­‐based   companies   that   own   some   of   the   world’s   biggest   data   warehouses   (Amazon,   Facebook,   Google,   Twitter,  Yahoo!,  ...),  media  groups,  universities  among  others.  A  list  of  Hadoop  users  and  systems   is  available  at  http://wiki.apache.org/hadoop/PoweredBy.   b. Non-­‐relational  databases  have  existed  since  the  late  1960s  but  resurfaced  in  2009  (under  the   moniker  of  Not  Only  SQL  -­‐  NOSQL))  as  it  became  clear  they  are  especially  well  suited  to  handle  the   Big   Data   challenges   of   volume   and   variety   and   as   they   neatly   fit   within   the   Apache   Hadoop   framework.   c. Cloud   Computing   is   a   kind   of   internet-­‐based   computing,   where   shared   resources   and   information   are   provided   to   computers   and   other   devices   on-­‐demand   (Wikipedia).   A   service   provider  offers  computing  resources  for  a  fixed  price,  available  online  and  in  general  with  a  high   degree  of  flexibility  and  reliability.  These  technologies  have  been  created  by  major  online  actors   (Amazon,  Google)  followed  by  other  technology  providers  (IBM,  Microsoft,  RedHat).  There  is  a   wide  variety  of  architecture  Public,  Private  and  Hybride  Cloud  with  all  the  objective  of  making   computing  infrastructure  a  commodity  asset  with  the  best  quality/total  cost  of  ownership  ratio.   Having  a  nearly  infinite  amount  of  computing  power  at  hand  with  a  high  flexibility  is  a  key  factor   for  the  success  of  Big  Data  initiatives.   d. Mining  Massive  Datasets  is  a  set  of  methods,  algorithms  and  techniques  that  can  be  used  to  deal   with  Big  Data  problems  and  in  particular  with  volume,  variety  and  velocity  issues.  PageRank  can   be   seen   as   a   major   step   (see   http://infolab.stanford.edu/pub/papers/google.pdf)   and   its   evolution  to  a  Map-­‐Reduce  (https://en.wikipedia.org/wiki/MapReduce)  approach  is  definitively  a   breakthrough.  Social  Netword  Analysis  is  becoming  an  area  of  research  in  itself  that  aim  to  extract   useful   information   from   the   massive   amount   of   data   the   Social   Networks   are   providing.   These   methods   are   very   well   suited   to   run   on   software   such   as   Hadoop   in   a   Cloud   Computing   environment.   e. 
Social  Networks  is  one  source  of  Bid  Data  that  provides  a  stream  of  data  with  a  huge  value  for   almost  all  economic  (and  even  non-­‐economic)  actors.  For  most  companies,  it  is  the  very  first  time   in  history  they  are  capable  of  interacting  directly  with  their  customers.  Many  applications  of  Big   Data   make   use   of   these   data   to   provide   enhanced   services,   products   and   to   increase   customer   satisfaction.   2.3 Big  Data  Applications   Big  Data  has  the  potential  to  change  the  way  academic  institutions,  corporate  and  organizations  conduct   business  and  change  our  daily  life.  Great  examples  of  Big  Data  applications  include:   a. Healthcare:   Big   Data   technologies   will   have   a   major   impact   in   healthcare.   IBM   estimates   that   80%  of  medical  data  is  unstructured  and  is  clinically  relevant.  Furthermore  medical  data  resides   in  multiple  places  like  individual  medical  files,  lab  and  imaging  systems,  physician  notes,  medical   correspondence,   etc.   Big   Data   technologies   allow   healthcare   organizations   to   bring   all   the   information   about   an   individual   together   to   get   insights   on   how   to   manage   care   coordination,   outcomes-­‐based  reimbursement  models,  patient  engagement  and  outreach  programs.   b. Retail:  Retailers  can  get  insights  for  personalizing  marketing  and  improving  the  effectiveness  of   marketing  campaigns,  for  optimizing  assortment  and  merchandising  decisions,  and  for  removing   inefficiencies  in  distribution  and  operations.  For  instance  several  retailers  now  incorporate  
  • 5.
        5 Twitter  streams  into  their  analysis  of  loyalty-­‐program  data.  The  gained  insights  make  it  possible   to  plan  for  surges  in  demand  for  certain  items  and  to  create  mobile  marketing  campaigns   targeting  specific  customers  with  offers  at  the  times  of  day  they  would  be  most  receptive  to  them.2   c. Politics:  Big  Data  technologies  will  improve  the  efficiency  and  effectiveness  across  the  broad   range  of  government  responsibilities.  Great  example  of  Big  Data  use  in  politics  was  2012  analytics   and  metrics  driven  Barack  Obama’s  presidential  campaign  [1].  Other  examples  include:   i. Threat  and  crime  prediction  and  prevention.  For  instance  the  Detroit  Crime  Commission   has  turned  to  Big  Data  in  its  effort  to  assist  the  government  and  citizens  of  southeast   Michigan  in  the  prevention,  investigation  and  prosecution  of  neighbourhood  crime;3   ii. Detection  of  fraud,  waste  and  errors  in  social  programs;   iii. Detection  of  tax  fraud  and  abuse.   d. Cyber  risk  prevention:  companies  can  analyse  data  traffic  in  their  computer  networks  in  real   time  to  detect  anomalies  that  may  indicate  the  early  stages  of  a  cyber  attack.  Research  firm   Gartner  estimates  that  by  2016,  more  than  25%  of  global  firms  will  adopt  big  data  analytics  for  at   least  one  security  and  fraud  detection  use  case,  up  from  8%  as  at  2014.4   e. Insurance  fraud  detection:  Insurance  companies  can  determine  a  score  for  each  claim  in  order   to  target  for  fraud  investigation  the  claims  with  the  highest  scores  i.e.  the  ones  that  are  most  likely   to  be  fraudulent.  Fraud  detection  is  treated  in  paragraph  3.4.   f. Usage-­‐Based  Insurance:  is  an  insurance  scheme,  where  car  insurance  premiums  are  calculated   based  on  dynamic  causal  data,  including  actual  usage  and  driving  behaviour.  Telematics  data   transmitted  from  a  vehicle  combined  with  Big  Data  analytics  enables  insurers  to  distinguish   cautious  drivers  from  aggressive  drivers  and  match  insurance  rate  with  the  actual  risk  incurred.   2.4 Data  driven  business   The   quantity   of   data   is   steeply   increasing   month   after   month   in   the   world.   Some   argue   it   is   time   to   organize  and  use  this  information:  data  must  now  be  viewed  as  a  corporate  asset.    In  order  to  respond  to   this  arising  transformation  of  business  culture,  two  specific  C-­‐level  roles  have  thus  appeared  in  the  past   years,  one  in  the  banking  and  the  other  in  the  insurance  industry.   2.4.1 The  Chief  Data  Officer   The  Chief  Data  Officer  (abbreviated  to  CDO)  is  the  first  architect  of  this  “data-­‐driven  business”.  Thanks   to  his  role  of  coordinator,  the  CDO  will  be  in  charge  of  the  data  that  drive  the  company,  by:     • defining  and  setting  up  a  strategy  to  guarantee  their  quality,  their  reliability  and  their   coherency;   • organizing  and  classifying  them;   • making  them  accessible  to  the  right  person  at  the  right  moment,  for  the  pertinent  need  and  in   the  right  format.   Thus,  the  Chief  Data  Officer  needs  a  strong  business  background  to  understand  how  business  runs.  The   following   question   will   then   emerge:   to   whom   should   the   CDO   report?   
In   some   firms,   the   CDO   is   considered  part  of  the  IT,  and  reports  to  the  CTO  (Chief  Technology  Officer);  in  others,  he  holds  more  of  a   business  role,  reporting  to  the  CEO.  It’s  therefore  up  to  the  company  to  decide,  as  not  two  companies  are   exactly  similar  from  a  structural  point  of  view.     Which   companies   have   already   a   CDO?   Generali   Group   has   appointed   someone   to   this   newly   created   position   in   June   2015.   Other   companies   such   as   HSBC,   Wells   Fargo   and   QBE   had   already   appointed   a   person   to   this   position   in   2013   or   2014.   Even   Barack   Obama   appointed   a   Chief   Data   Officer/Scientist   during  his  2012  campaign  and  the  metrics-­‐driven  decision-­‐making  campaign  played  a  big  role  in  Obama’s                                                                                                                             2  http://asmarterplanet.com/blog/2015/03/surprising-­‐insights-­‐ibmtwitter-­‐alliance.html#more-­‐33140   3  http://www.datameer.com/company/news/press-­‐releases/detroit-­‐crime-­‐commission-­‐combats-­‐crime-­‐with-­‐ datameer-­‐big-­‐data-­‐analytics.html   4  http://www.gartner.com/newsroom/id/2663015  
  • 6.
        6 re-­‐election.  In   the   beginning,   most   of   the   professionals   holding   the   actual   job   title   “Chief   Data   Officer”   were  located  in  the  United  States.  After  a  while,  Europe  followed  the  move.  Also,  lots  of  people  did  the  job   in  their  day-­‐to-­‐day  work,  but  didn’t  necessarily  hold  the  title.  Many  analysts  in  the  financial  sector  believe   that  yet  more  insurance  and  banking  companies  will  have  to  do  the  move  in  the  following  years  if  they   want  to  stay  attractive.   2.4.2 The  Chief  Analytics  Officer   Another  C-­‐level  position  aroused  in  the  past  months:  the  Chief  Analytics  Officer  (abbreviated  to  CAO).  Are   there  differences  between  a  CAO  and  a  CDO?    Theoretically  a  CDO  focuses  on  tactical  data  management,   while  the  CAO  concentrates  on  the  strategic  deployment  of  analytics.  The  latter’s  focus  is  on  data  analysis   to   find   hidden,   but   valuable,   patterns.   These   will   result   in   operational   decisions   that   will   make   the   company   more   competitive,   more   efficient   and   more   attractive   to   their   potential   and   current   clients.   Therefore,   the   CAO   is   a   normal   prolongation   of   the   data-­‐driven   business:   the   more   analytics   are   embedded  in  the  organization,  the  more  you  need  an  executive-­‐level  person  to  manage  that  position  and   communicate  the  results  in  an  understandable  way.  The  CAO  usually  reports  to  the  CEO.   In   practice,   some   companies   put   the   CAO   responsibilities   into   the   CDO   tasks,   while   others   distinguish   both  positions.  Currently,  it’s  quite  rare  to  find  an  explicit  “Chief  Analytics  Officer”  position  in  the  banking   and  insurance  sector,  because  of  this  overlap.  But  in  other  fields,  the  distinction  is  often  made.   3 Big  Data  in  insurance  value  chain   Big   Data   provides   new   insights   from   social   networks,   telematics   sensors,   and   other   new   information   channels   and   as   a   result   it   allows   understanding   customer   preferences   better,   enabling   new   business   approaches  and  products,  and  enhancing  existing  internal  models,  processes  and  services.  With  the  rise   of  Big  Data  the  insurance  world  could  fundamentally  change  and  the  entire  insurance  value  chain  could   be  impacted  starting  from  underwriting  to  claims  management.       3.1 Insurance  underwriting   3.1.1 Introduction   In  traditional  insurance  underwriting  and  actuarial  analyses,  for  years  we  have  been  observing  a  never-­‐ ending  search  for  more  meaningful  insight  into  individual  policyholder  risk  characteristics  to  distinguish   good   risks   from   the   bad   and   to   accurately   price   each   risk   accordingly.   The   analytics   performed   by   actuaries,  based  on  advanced  mathematical  and  financial  theories,  have  always  been  critically  important   to   an   insurer’s   profitability.   Over   the   last   decade,   however,   revolutionary   advances   in   computing   technology   and   the   explosion   of   new   digital   data   sources   have   expanded   and   reinvented   the   core   disciplines   of   insurers.   Today’s   advanced   analytics   in   insurance   go   much   further   than   traditional   underwriting  and  actuarial  science.  
Data  mining  and  predictive  modelling  is  today  the  way  forward  for   insurers  for  improving  pricing,  segmentation  and  increasing  profitability.   3.1.2 What  is  predictive  modelling?   Predictive  modelling  can  be  defined  as  the  analysis  of  large  historical  data  sets  to  identify  correlations   and  interactions  and  the  use  of  this  knowledge  to  predict  future  events.  For  actuaries,  the  concepts  of   predictive  modelling  are  not  new  to  the  profession.  The  use  of  mortality  tables  to  price  life  insurance   products   is   an   example   of   predictive   modelling.   The   Belgian   MK,   FK   and   MR,   FR   tables   showed   the   relationship  between  death  probability  and  the  explaining  variables  of  age,  sex  and  product  type  (in  this   case  life  insurance  or  annuity).   Predictive   models   have   been   around   a   long   time   in   sales   and   marketing   environments   for   example   to   predict  the  probability  of  a  customer  to  buy  a  new  product.  Bringing  together  expertise  from  both  the   actuarial   profession   and   marketing   analytics   can   lead   to   new   innovative   initiatives   where   predictive   models  guide  expert  decisions  in  areas  such  as  claims  management,  fraud  detection  and  underwriting.   3.1.3 From  small  over  medium  to  Big  Data   Insurers  collect  a  wealth  of  information  on  their  customers.  In  the  first  place  during  the  underwriting   process:   by   asking   about   the   claims   history   of   a   customer   for   car   and   home   insurance   for   example.   Another  source  is  the  history  of  the  relationship  the  customer  has  with  the  insurance  company.  While  in   the  past  the  data  was  kept  in  silos  by  product,  the  key  challenge  now  lies  in  gathering  all  this  information   into  one  place  where  the  customer  dimension  is  central.  The  transversal  approach  to  the  database  also  
  • 7.
        7 reflects  the  recent  evolution  in  marketing:  going  from  the  4P’s  (product,  price,  place,  promotion)  to  the   4C’s5  (customer,  costs,  convenience,  communication).   On  top  of  unleashing  the  value  of  internal  data,  new  data  sources  are  becoming  available  like  for  instance   wearables,  social  networks  to  name  few.  Because  Big  Data  can  be  overwhelming  to  start  with,  medium   data   should   be   considered   at   first.   In   Belgium,   the   strong   bancassurance   tradition   offers   interesting   opportunities  of  combining  the  insurance  and  bank  data  to  create  powerful  predictive  models.   3.1.4 Examples  of  predictive  modelling  for  underwriting   1°  Use  the  360  view  on  the  customer  and  predictive  models  to  maximize  profitability  and  gain  more   business.   By   thoroughly   analysing   data   from   different   sources   and   applying   analytics   to   gain   insight,   insurance   companies   should   strive   to   develop   a   comprehensive   360-­‐degree   customer   view.   The   gains   of   this   complete  and  accurate  view  of  the  customer  are  twofold:   • Maximizing  the  profitability  of  the  current  customer  portfolio  through:   o detecting  cross-­‐sell  and  up-­‐sell  opportunities;   o customer  satisfaction  and  loyalty  actions,   o effective  targeting  of  products  and  services  (e.g.    customers  that  are  most  likely  to  be  in   good  health  or  those  customers  that  are  less  likely  to  have  a  car  accident).   • Acquiring   more   profitable   new   customers   at   a   reduced   marketing   cost:   modelling   the   existing   customers  will  lead  to  useful  information  to  focus  marketing  campaigns  on  the  most  interesting   prospects.   By  combining  data  mining  and  analytics,  insurance  companies  can  better  understand  which  customers   are  most  likely  to  buy,  discover  who  are  their  most  profitable  customers  and  how  to  attract  or  retain   more   of   them.   Another   use   case   can   be   the   evaluation   of   the   underwriting   process   to   improve   the   customer  experience  during  this  on-­‐boarding  process.   2°  Predictive  underwriting  for  life  insurance6   Using  predictive  models,  in  theory  it  is  possible  to  predict  the  death  probability  of  a  customer.  However,   the  low  frequency  of  life  insurance  claims  presents  a  challenge  to  modellers.  While  for  car  insurance,  the   probability  of  a  customer  having  a  claim  can  be  around  10%,  for  life  insurance  it  is  around  0,1%  for  the   first  year.  Not  only  does  this  mean  that  a  significant  in  force  book  is  needed  to  have  confidence  in  the   results,  but  also  that  sufficient  history  should  be  present  to  be  able  to  show  mortality  experience  over   time.  For  this  reason,  using  the  underwriting  decision  as  the  variable  to  predict  is  a  more  common  choice.   All  life  insurance  companies  hold  historical  data  on  medical  underwriting  decisions  that  can  be  leveraged   to  build  predictive  models  that  predict  underwriting  decisions.  
Depending  on  how  the  model  is  used,  the   outcome  can  be  a  reduction  of  costs  for  medical  examinations,  to  have  more  customer  friendly  processes   by  avoiding  asking  numerous  invasive  personal  questions  or  a  reduction  in  time  needed  to  assess  the   risks  by  automatically  approving  good  risks  and  focusing  underwriting  efforts  on  more  complex  cases.   For   example,   if   the   predictive   model   tells   you   that   a   new   customer   has   a   high   degree   of   similarity   to   customers   that   passed   the   medical   examination,   the   medical   examination   could   be   waved   for   this   customer.   If  this  sounds  scary  for  risk  professionals,  first  a  softer  approach  can  be  tested,  for  instance  by  improving   marketing  actions  by  targeting  only  those  individuals  that  have  a  high  likelihood  to  be  in  good  health.   This   not   only   decreases   the   cost   of   the   campaign,   but   also   avoids   the   disappointment   of   a   potential   customer  who  is  refused  during  the  medical  screening  process.                                                                                                                                 5  http://www.customfitonline.com/news/2012/10/19/4-­‐cs-­‐versus-­‐the-­‐4-­‐ps-­‐of-­‐marketing/   6  Predictive  modeling  for  life  insurance,  April  2010,  Deloitte  
  • 8.
        8 3.1.5 Challenges  of  predictive  modelling  in  underwriting7   Predictive  models  can  only  be  as  good  as  the  input  used  to  calibrate  the  model.  The  first  challenge  in   every  predictive  modelling  project  is  to  collect  relevant,  high  quality  data  of  which  a  history  is  present.  As   many   insurers   are   currently   replacing   legacy   systems   to   reduce   maintenance   costs,   this   can   be   at   the   expense  of  the  history.  Actuaries  are  uniquely  placed  to  prevent  the  history  being  lost,  as  for  adequate   risk   management;   a   portfolio’s   history   should   be   kept.   The   trend   of   moving   all   policies   from   several   legacy  systems  into  one  modern  single  policy  administration  system  is  an  opportunity  that  must  be  seized   so  in  the  future  data  collection  will  be  easier.   Once  the  necessary  data  are  collected,  some  legal  or  compliance  concerns  need  to  be  addressed  as  there   might  be  boundaries  to  using  certain  variables  in  the  underwriting  process.  In  Europe,  if  the  model  will   influence  the  price  of  the  insurance,  gender  is  no  longer  allowed  as  an  explanatory  variable.  And  this  is   only  one  example.  It  is  important  that  the  purpose  of  the  model  and  the  possible  inputs  are  discussed   with  the  legal  department  prior  to  starting  the  modelling.   Once  the  model  is  built,  it  is  important  that  the  users  realize  that  no  model  is  perfect.  This  means  that   residual  risks  will  be  present  and  this  should  be  put  in  the  balance  against  the  gains  that  the  use  of  the   model  can  bring.   And  finally,  once  a  predictive  model  has  been  set  up,  a  continuous  reviewing  cycle  must  be  put  in  place   that  collects  feedback  from  the  underwriting  and  sales  teams  and  collects  data  to  improve  and  refine  the   model.  Building  a  predictive  model  is  a  continuous  improvement  process,  not  a  one-­‐off  project.   3.2 Insurance  pricing   3.2.1 Overview  of  existing  pricing  techniques   The  first  rate-­‐making  techniques  were  based  on  rudimentary  methods  such  as  univariate  analysis  and   later  iterative  standardized  univariate  methods  such  as  the  minimum  bias  procedure.  They  look  at  how   changes  in  one  characteristic  result  in  differences  in  loss  frequency  or  severity.     Later   on   insurance   companies   moved   to   multivariate   methods.   However,   this   was   associated   with   a   further   development   of   the   computing   power   and   data   capabilities.   These   techniques   are   now   being   adopted  by  more  and  more  insurers  and  are  becoming  part  of  everyday  business  practices.  Multivariate   analytical  techniques  focus  on  individual  level  data  and  take  into  account  the  effects  (interactions)  that   many  different  characteristics  of  a  risk  have  on  one  another.  As  it  was  explained  in  the  previous  section,   many  companies  use  predictive  modelling  (a  form  of  multivariate  analysis)  to  create  measures  of  the   likelihood  that  a  customer  will  purchase  a  particular  product.  Banks  use  these  tools  to  create  measures   (e.g.  credit  scores)  of  whether  a  client  will  be  able  to  meet  lending  obligations  for  a  loan  or  mortgage.   
Similarly,   P&C   insurers   can   use   predictive   models   to   predict   claim   behaviour.   Multivariate   methods   provide  valuable  diagnostics  that  aid  in  understanding  the  certainty  and  reasonableness  of  results.     Generalized  Linear  Models  are  essentially  a  generalized  form  of  linear  models.  This  family  encompasses   normal   error   linear   regression   models   and   the   nonlinear   exponential,   logistic   and   Poisson   regression   models,  as  well  as  many  other  models,  such  as  log-­‐linear  models  for  categorical  data.  Generalized  linear   models  have  become  the  standard  for  classification  rate-­‐making  in  most  developed  insurance  markets— particularly  because  of  the  benefit  of  transparency.  Understanding  the  mathematical  underpinnings  is  an   important  responsibility  of  the  rate-­‐making  actuary  who  intends  to  use  such  a  method.  Linear  models  are   a   good   place   to   start   as   GLMs   are   essentially   a   generalized   form   of   such   a   model.   As   with   many   techniques,  visualizing  the  GLM  results  is  an  intuitive  way  to  connect  the  theory  with  the  practical  use.   GLMs  do  not  stand  alone  as  the  only  multivariate  classification  method.  Other  methods  such  as  CART,   factor  analysis,  and  neural  networks  are  often  used  to  augment  GLM  analysis.     In  general  the  data  mining  techniques  listed  above  can  enhance  a  rate-­‐making  exercise  by:   • whittling  down  a  long  list  of  potential  explanatory  variables  to  a  more  manageable  list  for  use   within  a  GLM;   • providing  guidance  in  how  to  categorize  discrete  variables;                                                                                                                             7  Predictive  modelling  in  insurance:  key  issues  to  consider  throughout  the  lifecycle  of  a  model  
  • 9.
        9 • reducing   the   dimension   of   multi-­‐level   discrete   variables   (i.e.,   condensing   100   levels,   many   of   which  have  few  or  no  claims,  into  20  homogenous  levels);   • identifying   candidates   for   interaction   variables   within   GLMs   by   detecting   patterns   of   interdependency  between  variables.     3.2.2 Old  versus  new  modelling  techniques   The  adoption  of  GLMs  resulted  in  many  companies  seeking  external  data  sources  to  augment  what  had   already   been   collected   and   analysed   about   their   own   policies.   This   includes   but   is   not   limited   to   information   about   geo-­‐demographics,   sensor   data,   social   media   information,   weather,   and   property   characteristics,  information  about  insured  individuals  or  business.  This  additional  data  helps  actuaries   further  improve  the  granularity  and  accuracy  of  classification  rate-­‐making.  Unfortunately  this  new  data  is   very   often   unstructured   and   massive,   and   hence   the   traditional   generalized   linear   model   (GLM)   techniques  become  useless.   With   so   many   unique   new   variables   in   play,   it   can   become   a   very   difficult   task   to   identify   and   take   advantage   of   the   most   meaningful   correlations.   In   many   cases,   GLM   techniques   are   simply   unable   to   penetrate  deeply  into  these  giant  stores.  Even  in  the  cases  when  they  can,  the  time  constraints  required  to   uncover  the  critical  correlations  tend  to  be  onerous,  requiring  days,  weeks,  and  even  months  of  analysis.   Only   with   advanced   techniques,   and   specifically   machine   learning,   can   companies   generate   predictive   models  to  take  advantage  of  all  the  data  they  are  capturing.     Machine  learning  is  the  modern  science  of  finding  patterns  and  making  predictions  from  data  based  on   work   in   multivariate   statistics,   data   mining,   pattern   recognition,   and   advanced/predictive   analytics.   Machine  learning  methods  are  particularly  effective  in  situations  where  deep  and  predictive  insights  need   to  be  uncovered  from  data  sets  that  are  large,  diverse  and  fast  changing  —  Big  Data.  Across  these  types  of   data,  machine  learning  easily  outperforms  traditional  methods  on  accuracy,  scale,  and  speed.   3.2.3 Personalized  and  Real-­‐time  pricing  –  Motor  Insurance   In  order  to  price  risk  more  accurately,  insurance  companies  are  now  combining  analytical  applications  –   e.g.  behavioural  models  based  on  customer  profile  data  –  with  a  continuous  stream  of  real  time  data  –  e.g.   satellite  data,  weather  reports,  vehicle  sensors  –  to  create  detailed  and  personalized  assessment  of  risk.   Usage-­‐based  insurance  (UBI)  has  been  around  for  a  while  –  it  began  with  Pay-­‐As-­‐You-­‐Drive  programs   that  gave  drivers  discounts  on  their  insurance  premiums  for  driving  under  a  set  number  of  miles.  These   soon   developed   into   Pay-­‐How-­‐You-­‐Drive   programs,   which   track   your   driving   habits   and   give   you   discounts  for  'safe'  driving.   UBI  allows  a  firm  to  snap  a  picture  of  an  individual's  specific  risk  profile,  based  on  that  individual's  actual   driving  habits.  
3.2.3 Personalized and Real-time pricing – Motor Insurance

In order to price risk more accurately, insurance companies are now combining analytical applications – e.g. behavioural models based on customer profile data – with a continuous stream of real-time data – e.g. satellite data, weather reports, vehicle sensors – to create a detailed and personalized assessment of risk. Usage-based insurance (UBI) has been around for a while – it began with Pay-As-You-Drive programs that gave drivers discounts on their insurance premiums for driving under a set number of miles. These soon developed into Pay-How-You-Drive programs, which track driving habits and give discounts for 'safe' driving.

UBI allows a firm to snap a picture of an individual's specific risk profile, based on that individual's actual driving habits. UBI condenses the period of time under inspection to a few months, guaranteeing a much more relevant pool of information. With all this data available, the pricing scheme for UBI deviates greatly from that of traditional auto insurance. Traditional auto insurance relies on actuarial studies of aggregated historical data to produce rating factors that include driving record, credit-based insurance score, personal characteristics (age, gender, and marital status), vehicle type, living location, vehicle use, previous claims, liability limits, and deductibles.

Policyholders tend to think of traditional auto insurance as a fixed cost, assessed annually and usually paid in lump sums on an annual, semi-annual, or quarterly basis. However, studies show that there is a strong correlation between claim and loss costs and mileage driven, particularly within existing rating factors (such as class and territory). For this reason, many UBI programs seek to convert the fixed costs associated with mileage driven into variable costs that can be used in conjunction with other rating factors in the premium calculation. UBI has the advantage of utilizing individual and current driving behaviours, rather than relying on aggregated statistics and driving records that are based on past trends and events, making premium pricing more individualized and precise.
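A minimal sketch of the idea of converting mileage into a variable cost, alongside a telematics behaviour relativity, is shown below. The base rates, factor values and the linear mapping of the behaviour score are hypothetical assumptions chosen purely to illustrate the mechanics; they do not represent any actual UBI rating formula.

```python
# Illustrative Pay-How-You-Drive premium sketch (all rates and factors are hypothetical)

def ubi_premium(miles_driven: float,
                behaviour_score: float,
                base_fixed: float = 120.0,      # hypothetical fixed annual cost (admin, theft, ...)
                rate_per_mile: float = 0.04,    # hypothetical expected claim cost per mile at neutral behaviour
                traditional_relativity: float = 1.0) -> float:
    """Annual premium = fixed part + mileage-based variable part, scaled by a
    telematics behaviour relativity and the traditional rating relativity.
    behaviour_score: 0.0 (very risky) .. 1.0 (very safe), e.g. derived from
    speeding, harsh-braking and night-driving indicators."""
    # Map the behaviour score to a relativity between 1.3 (risky) and 0.8 (safe)
    behaviour_relativity = 1.3 - 0.5 * max(0.0, min(1.0, behaviour_score))
    variable_part = miles_driven * rate_per_mile * behaviour_relativity
    return (base_fixed + variable_part) * traditional_relativity

# A low-mileage, safe driver versus a high-mileage, risky driver
print(round(ubi_premium(4_000, 0.9), 2))
print(round(ubi_premium(18_000, 0.3), 2))
```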
3.2.4 Advantages

UBI programs offer many advantages to insurers, consumers and society. Linking insurance premiums more closely to actual individual vehicle or fleet performance allows insurers to price premiums more accurately. This increases affordability for lower-risk drivers, many of whom are also lower-income drivers. It also gives consumers the ability to control their premium costs by encouraging them to reduce miles driven and adopt safer driving habits. The use of telematics helps insurers to more accurately estimate accident damages and reduce fraud by enabling them to analyse the driving data (such as hard braking, speed, and time) during an accident. This additional data can also be used by insurers to refine or differentiate UBI products.

3.2.5 Shortcomings/challenges

3.2.5.1 Organization and resources

Taking advantage of the potential of Big Data requires some different approaches to organization, resources, and technology. As with many new technologies that offer promise, there are challenges to successful implementation and the production of meaningful business results. The number one organizational challenge is determining the business value, with financing as a close second. Talent is the other big issue – identifying the business and technology experts inside the enterprise, recruiting new employees, training and mentoring individuals, and partnering with outside resources is clearly a critical success factor for Big Data. Implementing the new technology and organizing the data are listed as lesser challenges by insurers, although there are still areas that require attention.

3.2.5.2 Technology challenges

The biggest technology challenge in the Big Data world is framed in the context of the different Big Data "V" characteristics. These include the standard three V's of volume, velocity, and variety, plus two more – veracity and value. The variety and veracity of the data present the biggest challenges. As insurers venture beyond analysis of structured transaction data to incorporate external data and unstructured data of all sorts, the ability to combine the data and feed it into an analysis may be complicated. On the one hand, the variety expresses the promise of Big Data; on the other hand, the technical challenges are significant. The veracity of the data is also seen as a challenge. It is true that some Big Data analyses do not require the data to be as clean and organized as in traditional approaches. However, the data must still reflect the underlying truth and reality of the domain.

3.2.5.3 Technology Approaches

Technology should not be the first focus area for evaluating the potential of Big Data in an organization. However, choosing the best technology platform for the organization and its business problems does become an important consideration for success. Cloud computing will play a very important role in Big Data. Although there are challenges and new approaches required for Big Data, there is a growing body of experience, expertise, and best practices to assist in successful Big Data implementations.

3.3 Insurance Reserving

Loss reserving is a classic actuarial problem encountered extensively in motor, property and casualty as well as in health insurance.
It is a consequence of the fact that insurers need to set reserves to cover future liabilities related to the book of contracts. In other words, the insurer has to hold funds aside to meet future liabilities attached to incurred claims.

In non-life insurance, most policies run for a period of 12 months. However, the claims payment process can take years or even decades. In particular, losses arising from casualty insurance can take a long time to settle, and even when the claims are acknowledged, it may take time to establish the extent of the claims settlement costs. A well-known and costly example is provided by the claims from asbestos liabilities. Thus it is not a surprise that the biggest item on the liabilities side of an insurer's balance sheet is often the provision of reserves for future claims payments. It is the job of the reserving actuary to predict, with maximum accuracy, the total amount necessary to pay the claims that the insurer has legally committed to cover.

Historically, reserving was based on deterministic calculations with pen and paper, combined with expert judgement. Since the 1980s, the arrival of personal computers and spreadsheet software packages induced a real change for reserving actuaries. The use of spreadsheets not only saves calculation time but also allows testing different scenarios and the sensitivity of the forecasts. The first simple models used by actuaries started to evolve towards more developed ideas as IT resources evolved. Moreover, recent changes in regulatory requirements, such as Solvency II in Europe, have shown the need for stochastic models and more precise statistical techniques.
3.3.1 Classical methods

There are many different frameworks and models used by reserving actuaries to compute the technical provisions, and it is not the goal of this paper to review them exhaustively, but rather to show that they share the central notion of the triangle. A triangle is a way of presenting data in a triangular structure showing the development of claims over time for each origin period. An origin period can be the year the policy was written or earned, or the loss occurrence period.

After having used deterministic models, reserving generally switches to stochastic models. These models allow reserve risk to be quantified.

The use of models based on aggregated data used to be convenient in the past when IT resources were limited, but it is more and more questionable nowadays when huge computational power is at hand at an affordable price. Therefore there is a need to move to models that fully use the data available in insurers' data warehouses.

3.3.2 Micro-level reserving methods

Unlike aggregate models (or macro-level models), micro-level reserving methods (also called individual claim level models) use individual claims data as inputs and estimate outstanding liabilities for each individual claim. Unlike the models described in the previous section, they model very precisely the lifetime development process of each individual claim, including events such as claim occurrence, reporting, payments and settlement. Moreover, they can include micro-level covariates such as information about the policy, the policyholder, the claim, the claimant and the transactions.

When well specified, such models are expected to generate reliable reserve estimates. Indeed, the ability to model the claims development at the individual level and to incorporate micro-level covariate information allows micro-level models to handle heterogeneity in claims data efficiently. Moreover, the large amount of data used in modelling can help to avoid issues of over-parameterization and lack of robustness. As a consequence, micro-level models are especially valuable under changing environments, as these changes can be indicated by appropriate covariates.
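To make the triangle-based approach of section 3.3.1 concrete, the sketch below applies a basic chain-ladder calculation to a small, invented cumulative paid-claims triangle. The figures and the absence of a tail factor are assumptions for illustration only; an actual reserving exercise would add diagnostics, tail estimation and, increasingly, a stochastic counterpart.

```python
# Basic chain-ladder on a hypothetical cumulative claims triangle (illustrative only)
import numpy as np

# Rows = origin years, columns = development years; NaN = not yet observed
triangle = np.array([
    [1000., 1800., 2100., 2200.],
    [1100., 2000., 2350., np.nan],
    [1250., 2200., np.nan, np.nan],
    [1400., np.nan, np.nan, np.nan],
])

n = triangle.shape[1]
# Volume-weighted development factors from the observed part of the triangle
factors = []
for j in range(n - 1):
    observed = ~np.isnan(triangle[:, j + 1])
    factors.append(triangle[observed, j + 1].sum() / triangle[observed, j].sum())

# Project the unobserved cells to ultimate using the fitted factors
projected = triangle.copy()
for j in range(n - 1):
    missing = np.isnan(projected[:, j + 1])
    projected[missing, j + 1] = projected[missing, j] * factors[j]

ultimates = projected[:, -1]
latest = np.array([row[~np.isnan(row)][-1] for row in triangle])  # latest diagonal
print("development factors:", np.round(factors, 3))
print("estimated outstanding reserve:", round(float((ultimates - latest).sum()), 1))
```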
3.4 Claims Management

Big Data can play a tremendous role in the improvement of claims management. It provides access to data that was not available before, and it makes claims processing faster. It thereby enables improved risk management, reduces loss adjustment expenses and enhances the quality of service, resulting in increased customer retention. Below we present details of how Big Data analytics improves the fraud detection process.

3.4.1 Fraud detection

It is estimated that a typical organization loses 5% of its revenues to fraud each year8. The total cost of insurance fraud (non-health insurance) in the US is estimated to be more than $40 billion per year9. The advent of Big Data & Analytics has provided new and powerful tools to fight fraud.

8 www.acfe.com
9 www.fbi.gov

3.4.2 What are the current challenges in fraud detection?

The first challenge is finding the right data. Analytical models need data, and in a fraud detection setting this is not always evident. Collected fraud data are often very skewed, with typically less than 1% fraudsters, which seriously complicates the detection task. The asymmetric costs of missing fraud versus harassing non-fraudulent customers also represent important modelling difficulties. Furthermore, fraudsters constantly try to outsmart the analytical models, so these models should be permanently monitored and re-configured on an ongoing basis.

3.4.3 What analytical approaches are being used to tackle fraud?

Most of the fraud detection models in use nowadays are expert-based models. When data becomes available, one can start doing analytics. A first approach is supervised learning, which analyses a labelled data set of historically observed fraud behaviour. It can be used to predict both the occurrence of fraud and the amount involved. Unsupervised learning starts from an unlabelled data set and performs anomaly detection. Finally, social network learning analyses fraud behaviour in networks of linked entities. Research has found this last approach to be superior to the others.
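As a minimal, hypothetical illustration of the supervised approach under the heavy class imbalance described in 3.4.2, the sketch below trains a class-weighted logistic regression on simulated claims and reports precision and recall, the two performance measures discussed in the next subsection. The simulated data and the choice of class weighting are assumptions for illustration, not a prescription for a production fraud engine.

```python
# Supervised fraud scoring on a heavily imbalanced, simulated data set (illustrative only)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 20_000
X = rng.normal(size=(n, 6))                        # six simulated claim features
fraud_logit = 2.0 * X[:, 0] + 1.5 * X[:, 1] - 8.5  # rare positives: roughly 1% fraud
y = rng.binomial(1, 1 / (1 + np.exp(-fraud_logit)))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# class_weight="balanced" re-weights the rare fraud cases instead of resampling them
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
flagged = clf.predict(X_test)

print("fraud rate in data:", round(float(y.mean()), 4))
print("precision (fraudsters among those flagged):", round(precision_score(y_test, flagged), 3))
print("recall / hit rate (fraudsters that were flagged):", round(recall_score(y_test, flagged), 3))
```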
3.4.4 What are the key characteristics of successful analytical models for fraud detection?

Successful fraud analytical models should satisfy various requirements. First, they should achieve good statistical performance in terms of recall or hit rate, which is the percentage of fraudsters labelled by the analytical model as suspicious, and precision, which is the percentage of fraudsters amongst those labelled as suspicious. Next, the analytical models should not be based on complex mathematical formulas (such as neural networks, support vector machines, ...) but should provide clear insight into the fraud mechanisms adopted. This is particularly important since the insights gained will be used to develop new fraud prevention strategies. The operational efficiency of the fraud analytical model also needs to be evaluated. This refers to the amount of resources needed to calculate the fraud score and act upon it adequately. In a credit card fraud environment, for example, a decision needs to be made within a few seconds after the transaction is initiated.

3.4.5 Use of social network analytics to detect fraud10

Research has proven that network models significantly outperform non-network models in terms of accuracy, precision and recall. Network analytics can help improve fraud detection techniques. Fraud is present in many critical human processes: credit card fraud, insurance claim fraud, opinion fraud, social security fraud, and so on. Fraud can be defined by the following five characteristics: it is an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime, which appears in many types and forms. Before applying fraud detection techniques, these five issues should be resolved or counterbalanced.

Fraud is an uncommon crime, which means that the class distribution is extremely skewed. Rebalancing techniques such as SMOTE can be used to counterbalance this effect, combining undersampling of the majority class (reducing the number of legitimate cases) with oversampling of the minority class (duplicating fraud cases or creating artificial, synthetic fraud cases).

Fraud is well-considered: complex fraud structures imply changes in behaviour over time, so not every time period carries the same importance. A temporal weighting adjustment should put the emphasis on the more important periods (the more recent data periods) that could be explanatory of the fraudulent behaviour.

Fraud is imperceptibly concealed, meaning that it is difficult to identify. One can leverage expert knowledge to create features and help identify fraud.

Fraud is time-evolving. The period of study should be selected carefully, taking into consideration that fraud evolves over time. How much of previous time periods could explain or affect the present? The model should incorporate these changes over time.
Another question that arises is in what time window the model should be able to detect fraud: short, medium or long term.

The last characteristic of fraud is that it is most of the time carefully organized. Fraud is often not an individual phenomenon; in fact there are many interactions between fraudsters. Often fraud sub-networks develop within a bigger network. Social network analysis can be used to detect these networks.

Social network analysis helps derive useful patterns and insights by exploiting the relational structure between objects. A network consists of two sets of elements: the objects of the network, which are called nodes, and the relationships between nodes, which are called links. The links connect two or more nodes. A weight can be assigned to the nodes and links to measure the magnitude of the crime or the intensity of the relationship. When constructing such networks, the focus is put on the neighbourhood of a node, which is a subgraph of the network around the node of interest (the fraudster).

Once a network has been constructed, how can it be used as an indicator of fraudulent activities? Fraud can be detected by answering the following question: does the network contain statistically significant patterns of homophily? Detection of fraud relies on a concept often used in sociology called homophily. Homophily in networks refers to the strong tendency of people to associate with others whom they perceive as being similar to themselves in some way.

10 based on the research of Véronique Van Vlasselaer (KULeuven)
This concept can be translated to fraud networks: fraudulent people are more likely to be connected to other fraudulent people. Clustering techniques can be used to detect significant patterns of homophily and thus spot fraudsters.

Given a homophilic network with evidence of fraud clusters, it is possible to extract features from the network around the node(s) of interest (the fraud activity), which is also called the neighbourhood of the node. This is called the featurization process: extracting features for each network object based on its neighbourhood. The focus is put on the first-order neighbourhood (first-degree links), also known as the "egonet" (the ego is the node of interest, surrounded by its direct associates, known as alters). Featurization happens at two levels: egonet generic features (how many fraudulent resources are associated with that company, are there relationships between the resources, ...) and alter-specific features (how similar is the alter to the ego, is the alter involved in many fraud cases or not).

Once these first-order neighbourhood features have been extracted for each subject of interest (e.g. companies) – such as the degree of fraudulent resources and the weight of the fraudulent resources – it is then easy to derive the propagation effect of these fraudulent influences through the network.

To conclude, network models outperform non-network models as they are better able to distinguish fraudsters from non-fraudsters. They are also more precise, generating shorter lists of high-risk companies and detecting more fraudulent corporates.
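A minimal sketch of the egonet featurization idea is given below, using a small invented claims network. The graph, the "known fraudster" labels and the two features computed are purely illustrative assumptions; real featurization as described above would extract many more egonet and alter-specific features.

```python
# Illustrative egonet featurization on a tiny, invented fraud network
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("A", "B"), ("A", "C"), ("B", "C"),   # a densely connected cluster
    ("C", "D"), ("D", "E"), ("E", "F"),   # a sparser chain
])
known_fraud = {"B", "C"}                  # hypothetical labels from past investigations

def egonet_features(graph, node, fraud_labels):
    """Two simple first-order neighbourhood features for one node."""
    ego = nx.ego_graph(graph, node, radius=1)       # the ego plus its direct alters
    alters = [n for n in ego.nodes if n != node]
    fraud_alters = sum(1 for n in alters if n in fraud_labels)
    return {
        "degree": len(alters),
        "fraud_alter_ratio": fraud_alters / len(alters) if alters else 0.0,
    }

for node in G.nodes:
    print(node, egonet_features(G, node, known_fraud))
```

Such neighbourhood features can then be fed, together with intrinsic claim features, into the supervised models discussed earlier.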
3.4.6 Fraud detection in motor insurance – Usage-Based Insurance example

In 2014, the Coalition Against Insurance Fraud11, with the assistance of business analytics company SAS, published a report in which it stresses that technology plays a growing role in fighting fraud. "Insurers are investing in different technologies to combat fraud, but a common component to all these solutions is data," said Stuart Rose, Global Insurance Marketing Principal at SAS. "The ability to aggregate and easily visualize data is essential to identify specific fraud patterns." "Technology is playing a larger and more trusted role with insurers in countering growing fraud threats. Software tools provide the efficiency insurers need to thwart more scams and impose downward pressure on premiums for policyholders," said Dennis Jay, the Coalition's executive director.

In motor insurance, a good example is Usage-Based Insurance (UBI), where insurers can benefit from the superior fraud detection that telematics can provide. It equips an insurer with driving behaviour and driving exposure patterns, including information about speeding, driving dynamics, driving trips, day and night driving patterns, garaging address and mileage. In some sense UBI can become a "lie detector" and can help companies to detect falsification of the garaging address, annual mileage or driving behaviour. By recording the vehicle's geographical location and detecting sharp braking and harsh acceleration during an accident, an insurer can analyse accident details and estimate accident damages. The telematics devices used in UBI can also include first notice of loss (FNOL) services, providing very valuable information for insurers. Analytics performed on this data provide additional evidence to consider when investigating a claim, and can help to reduce fraud and claims disputes.

11 http://www.insurancefraud.org/about-us.htm

4 Legal aspects of Big Data

4.1 Introduction

Data processing lies at the very heart of insurance activities. Insurers and intermediaries collect and process vast amounts of personal data about their customers. At the same time they are dealing with a particular type of 'discrimination' among their insureds. Like all businesses operating in Europe, insurers are subject to European and national data protection laws and anti-discrimination rules. The fast technological evolution and globalization have triggered a comprehensive reform of the current data protection laws. The EU hopes to complete a new General Data Protection Regulation at the end of this year. Insurers are concerned that this new Regulation could introduce unintended consequences for the insurance industry.
4.2 Data processing

4.2.1 Legislation: an overview

Insurers collect and process data to analyse the risks that individuals wish to cover, to tailor products accordingly, to evaluate and pay claims and benefits, and to detect and prevent insurance fraud. The rise of Big Data presents opportunities to offer more creative, competitive pricing and, importantly, to predict customers' behaviour. As insurers continue to explore this relatively untapped resource, evolutions in data processing legislation need to be followed very closely.

The protection of personal data was – as a separate right granted to an individual – guaranteed for the first time in the Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (Convention 108). It was adopted by the Council of Europe in 1981.

The current, principal EU legal instrument establishing rules for fair personal data processing is the Data Protection Directive (95/46/EC) of 1995, which regulates the protection of individuals with regard to the processing of personal data and the free movement of such data. As a framework law, the Directive had to be implemented in EU Member States through national laws. This Directive has set a standard for the legal definition of personal data and for regulatory responses to the use of personal data. Its provisions include principles related to data quality, criteria for making data processing legitimate and the essential right not to be subject to automated individual decisions.

The Data Protection Directive was complemented by other legal instruments, such as the E-Privacy Directive (2002/58/EC), part of a package of five new Directives that aim to reform the legal and regulatory framework of electronic communications services in the EU. Personal data and individuals' fundamental right to privacy need to be protected, but at the same time the legislator must take into account the legitimate interests of governments and businesses. One of the innovative provisions of this Directive was the introduction of a legal framework for the use of devices for storing or retrieving information, such as cookies. Companies must also inform customers of the data processing to which their data will be subject and obtain subscriber consent before using traffic data for marketing or before offering added-value services based on traffic or location data. The EU Cookie Directive (2009/136/EC), an amendment of the E-Privacy Directive, aims to increase consumer protection and requires websites to obtain informed consent from visitors before they store information on a computer or any web-connected device.

In 2006 the EU Data Retention Directive (2006/24/EC) was adopted as an anti-terrorism measure after the terrorist attacks in Madrid and London. However, on 8 April 2014 the European Court of Justice declared this Directive invalid.
The Court took the view that the Directive did not meet the principle of proportionality and should have provided more safeguards to protect the fundamental rights to respect for private life and to the protection of personal data.

Belgium established a Privacy Act (Data Protection Act) in 1992. Since the introduction of the EU Data Protection Directive (1995), the principles of that Directive have been transposed into Belgian law. The Privacy Act consequently underwent significant changes introduced by the Act of 11 December 1998. Further modifications have been made since then, including those of the Act of 26 February 2006. The Belgian Privacy Commission is part of a European task force which includes data protection authorities from the Netherlands, Belgium, Germany, France and Spain. In October 2014, a new Privacy Bill was introduced in the Belgian Federal Parliament. The Bill mainly aims at providing the Belgian Data Protection Authority (DPA) with stronger enforcement capabilities and ensuring that Belgian citizens regain control over their personal data. To achieve this, certain new measures are proposed for inclusion in the existing legislation, adopted back in 1992, inspired by the proposed European data protection Regulation.

The current data processing legislation is in urgent need of an update. Rapid technological developments, the increasingly globalized nature of data flows and the arrival of cloud computing pose new challenges for data protection authorities. In order to ensure continuity of a high level of data protection, the rules need to be brought in line with technological developments. The Directive of 1995 has also not prevented fragmentation in the way data protection is implemented across the Union.

In 2012 the European Commission proposed a comprehensive, pan-European reform of the data protection rules to strengthen online privacy rights and boost Europe's digital economy. On 15 June 2015, the Council reached a 'general approach' on a General Data Protection Regulation (GDPR) that establishes rules adapted to the digital era.
The European Commission is pushing for a complete agreement between the Council and the European Parliament before the end of this year. The twofold aim of the Regulation is to enhance the data protection rights of individuals and to improve business opportunities by facilitating the free flow of personal data in the digital single market. The Regulation must be appropriately balanced in order to guarantee a high level of protection of individuals and to allow companies to preserve innovation and competitiveness. In parallel with the proposal for a GDPR, the Commission adopted a Directive on data processing for law enforcement purposes (5833/12).

4.2.2 Some concerns of the insurance industry

The European insurance and reinsurance federation, Insurance Europe, is concerned that the proposed Regulation could introduce unintended consequences for the insurance industry and its policyholders. The new legislation must correctly balance an individual's right to privacy against the needs of businesses. The way insurers process data must be taken into account appropriately, so that they can perform their contractual obligations, assess consumers' needs and risks, innovate, and also combat fraud. There is also a clear tension between Big Data, the privacy of the insured's personal data and its availability to business and the State.

An important concern is that the proposed rules concerning profiling do not take into consideration the way that insurance works. The Directive of 1995 contains rules on 'automated processing' but there is not a single mention of 'profiling' in the text. The new GDPR aims to provide more legal certainty and more protection for individuals with respect to data processing in the context of profiling. Insurers need to profile potential policyholders to measure risk; any restrictions on profiling could, therefore, translate not only into higher insurance prices and less insurance coverage, but also into an inability to provide consumers with appropriate insurance. Insurance Europe recommends that the new EU Regulation should allow insurance-related profiling at the pre-contractual stage and during the performance of the contract. There is also still some confusion in defining profiling: in the Council approach profiling means solely automated processing, while Article 20(5) as proposed by the European Parliament could, according to Insurance Europe, be interpreted as prohibiting fully automated processing, requiring human intervention for every single insurance contract offered to consumers.

The proposal of the EU Council (June 2015) stipulates that the controller should use adequate mathematical or statistical procedures for the profiling.
The controller must secure personal data in a way which takes account of the potential risks involved for the interests and rights of the data subject and which prevents, inter alia, discriminatory effects against individuals on the basis of race or ethnic origin, political opinions, religion or beliefs, trade union membership, genetic or health status or sexual orientation, or that result in measures having such an effect. Automated decision-making and profiling based on special categories of personal data should only be allowed under specific conditions.

According to the Article 29 Working Party12, the proposals of the Council concerning profiling are still unclear and do not foresee sufficient safeguards. In June 2015 it renewed its call for provisions giving the data subject a maximum of control and autonomy when personal data are processed for profiling. The provisions should clearly define the purposes for which profiles may be created and used, including specific obligations on controllers to inform the data subject, in particular on his or her right to object to the creation and the use of profiles. The academic research group IRISS remarks that the GDPR does not clarify whether or not there is an obligation on data controllers to disclose information about the algorithm involved in profiling practices, and suggests clarification on this point.

Insurance Europe also requests that the GDPR should explicitly recognise insurers' need to process and share data for fraud prevention and detection. According to the Council and the Article 29 Working Party, fraud prevention may fall under the non-exhaustive list of 'legitimate interests' in Article 6(1)(f), which would provide the necessary legal basis to allow processing for combatting insurance fraud.

The new Regulation also proposes a new right to data portability, enabling easier transmission of personal data from one service provider to another. This would allow policyholders to obtain a copy of any of their data being processed by an insurer, and insurers could be forced to disclose confidential and commercially sensitive information.

12 The Article 29 Working Party is an independent advisory body on data protection and privacy, set up under the Data Protection Directive of 1995. It is composed of representatives from the national data protection authorities of the EU Member States, the European Data Protection Supervisor and the European Commission.
Insurance Europe believes that the scope of the right to data portability should be narrowed down, to make sure that insurers would not be forced to disclose actuarial information to competitors.

Insurers also need to retain policyholder information. The Regulation should clearly state that the right to be forgotten should not apply where there is a contractual relationship between an organisation and an individual, where a data controller is required to comply with regulatory obligations to retain data, or where the data is processed to detect and prevent fraudulent activities.

The implementation of more stringent, complex rules will require insurance firms to review their compliance programmes. They will have to take account of increased data handling formalities, profiling, consent and processing requirements, and the responsibilities and obligations of controllers and processors.

4.3 Discrimination

4.3.1 Legislation: an overview

In 2000 two important EU directives provided a comprehensive framework for European anti-discrimination law. The Employment Equality Directive (2000/78/EC) prohibits discrimination on the basis of sexual orientation, religion or belief, age and disability in the area of employment, while the Racial Equality Directive (2000/43/EC) combats discrimination on the grounds of race or ethnicity in the context of employment, the welfare system, social security, and goods and services.

The Gender Goods and Services Directive (2004/113/EC) has expanded the scope of sex discrimination law and requires that differences in treatment may be accepted only if they are justified by a legitimate aim. Any limitation should nevertheless be appropriate and necessary in accordance with the criteria derived from the case law of the ECJ. As regards the insurance sector, the Directive, in principle, imposes 'unisex' premiums and benefits for contracts concluded after 21 December 2007. However, it provided for an exception to this principle in Article 5(2), with the possibility to permit differences in treatment between women and men after this date, based on actuarial data and reliable statistics. In its Test-Achats judgment, the ECJ invalidated this exception because it was incompatible with Articles 21 and 23 of the EU's Charter of Fundamental Rights.

A proposal for a Council Directive (COM 2008 426-(15)) stipulates that actuarial and risk factors related to disability and to age can be used in the provision of insurance. These should not be regarded as constituting discrimination where the factors are shown to be key factors for the assessment of risk.
The recent proposal of the Council on the new General Data Protection Regulation (June 2015) states that the processing of special categories of personal (sensitive) data – revealing racial or ethnic origin, political opinions, religious or philosophical beliefs or trade-union membership, and the processing of genetic data or data concerning health or sex life – shall be prohibited. Derogations from this general prohibition should be explicitly provided, inter alia, where the data subject gives explicit consent or in respect of specific needs, in particular where the processing is carried out in the course of legitimate activities by certain associations or foundations whose purpose is to permit the exercise of fundamental freedoms.

In Belgium the EU Directive 2000/78/EC was transposed into national legislation by the anti-discrimination Law of 10 May 2007 (BS 30.V.2007). This law has been amended by the law of 30 December 2009 (BS 31.XII.2009) and by the law of 17 August 2013 (BS 5.III.2014). Due to the federal organization of Belgium, laws prohibiting discrimination are complex and fragmented because they are made and implemented by six different legislative bodies, each within its own sphere of competence.

4.3.2 Tension between insurance and anti-discrimination law

Insurance companies are dealing with a particular type of 'discrimination' among their insureds. They attempt to segregate insureds into separate risk pools based on differences in their risk profiles, first, so that they can charge different premiums to the different groups based on their risk and, second, to incentivize risk reduction by insureds. They openly 'discriminate' among individuals based on observable characteristics. Accurate risk classification and incentivizing risk reduction provide the primary justifications for why we let insurers 'discriminate'. [30]
Regulatory restrictions on insurers' risk classifications can produce moral hazard and generate adverse selection. Davey [31] remarks that insurance and anti-discrimination law defend fundamentally different perspectives on risk assessment. Insurance has often defended its practices as 'fair discrimination': insurers assert that they are not discriminating in the legal sense, i.e. treating similar cases differently; rather, they are treating different cases differently. This clash between the principle of insurance and anti-discrimination law is fundamental: whether differential treatment based on actuarial experience is 'discrimination' in law or justified differential treatment. This tension is felt at both the national and supranational levels as governments and the EU seek to regulate underwriting practices. A good, illustrative example is the already mentioned Test-Achats case.

Tension between insurance and the Charter of Fundamental Rights is also clearly felt in the debate on genetic discrimination in the context of life insurance. Insurers might wish to use genetic test results for underwriting, just as they use other medical or family history data. The disclosure of genetic data for insurance risk analysis presents complex issues that overlap with those related to sensitive data in general. Canada, the US, Russia, and Japan have chosen not to adopt laws specifically prohibiting access to genetic data for underwriting by life insurers. In these countries, insurers treat genetic data like other types of medical or lifestyle data [32]. Belgium, France, and Norway have chosen to adopt laws to prevent or limit insurers' access to genetic data for life insurance underwriting. The Belgian Parliament has incorporated in the Law of 25 June 1992 provisions that prohibit the use of genetic testing to predict the future health status of applicants for (life) insurance.

Since EU member states have adopted different approaches to the use of genetic data, a pan-European regulation is needed. The recent proposal of the Council on a new General Data Protection Regulation (June 2015) does not solve this problem. It prohibits the processing of genetic data but recognises explicit consent as a valid legal basis for such processing, and leaves to Member States (Article 9(2)(a)) the decision on not admitting consent as a basis for legitimising the processing of genetic data.

5 New Frontiers

5.1 Risk pooling vs. personalization

With the introduction of Big Data, the insurance sector is opening up to new possibilities, new innovative offers and personalized services for its customers. As a result we might see the end of risk pooling and the rise of individual risk assessment. It is said that these personalized services will provide new premiums that will be "fairer" for the policyholder. Is it indeed true that the imprudence of others will have less impact on your own insurance premium?
This way of thinking holds only for as long as the policyholder does not have any claims. In a world of totally individualised premiums, the event of a claim would increase the premium of that policyholder enormously. And that seems to contradict the way we think about insurance, i.e. that in the event of a claim, your claim is paid by the excess premium of the other policyholders. It might seem that with the introduction of Big Data, the social aspect of insurance is gone.

However, which customer would like to subscribe to such an insurance offer? One could then argue that it is better to save the insurance premium on your own and put it aside for the possibility of a future claim. So in order to talk about insurance, risk pooling will always be necessary. Big Data is just changing the way we pool the risks.

For example, until recently, the premium for car insurance depended only on a handful of indicators (personal, demographic and car data). Therefore, an insurance portfolio needed to be big enough to have risk pools with enough diversification on the other indicators that could not be measured.

In recent years more and more indicators can be measured and used as data. This means that risk pools don't have to be as big as before, because the behaviour of each individual in the risk pool is becoming more and more predictable. Somebody who speeds all the time is more likely to have an accident. Previously this behaviour was assumed of, for example, people driving cars with high horsepower. Nowadays, this behaviour can be measured exactly, removing the need for such assumptions.
However, as long as there is a future event that is uncertain, risk pooling still makes sense. The risk pools are just becoming smaller and more predictable. In the example given, even a driver who does not speed can still be involved in an accident.

5.2 Personalised Premium

Personalisation of risk pricing relies upon an insurer having the capacity to handle a vast amount of data. A big challenge is linked with data collection: making sure it is reliable and that it can in fact be used for insurance pricing. Insurers will have to be careful not to be overwhelmed by Big Data.

We stated above that the use of Big Data will make insurance pricing fairer. In this case fair is defined as taking into account all members of society. However, this does not mean that everyone in society should be treated in exactly the same way. Every individual should have an equal opportunity to access what is on offer. It can still turn out that the offer does not meet the requirements of the customer, or vice versa. In that case, insurance cover will not be possible.

5.3 From Insurance to Prevention

One of the big advantages of the gathering of Big Data by insurance companies or other companies is that this data can in a certain way be shared with their customers. In that way, a constant interaction can arise between the insurer and the policyholder. When consumers understand better how their behaviour can impact their insurance premium, they can make changes in their lives that can benefit both parties.

A typical example of this is the use of telematics in car insurance. A box in the insured car automatically saves and transmits all driving information of the vehicle. The insurance company uses this data to analyse the risk the policyholder is facing while driving. When, for example, the driver is constantly speeding and braking heavily, the insurance company can take this as an indication to increase the premium. On the other hand, someone who drives calmly, outside the busy hours and only outside the city will be rewarded with a lower premium.

In this way insurers will have an impact on the driving behaviour of people. Once this communication between policyholder and insurer is transparent, the policyholder will act in a way that decreases his premium. The insurer then plays the role of a prevention officer.

Another example is "e-Health". As health costs are rising rapidly, insurers are trying to lower the claim costs. It has been found that people's everyday living habits – for example eating behaviour, the amount of sleep you get, or the number of hours you do sport – have a large influence on health claims.

The Internet of Things will have an impact on the way pricing is done for each individual. Thanks to modern sensors, insurers will be able to acquire data at the individual, personal level. Each policyholder will in that way be encouraged to sleep enough, exercise enough and eat healthily.
All in all, it is the consumer who benefits from fewer car accidents, a healthy lifestyle and … lower premiums.

5.4 The all-seeing Insurer

Insurance companies have always been interested in gathering as much information as possible on the risks being insured and on the people they insure. With the possibilities of Big Data, this interest in people's everyday life increases enormously. Insurance is therefore becoming more and more an embedded part of the everyday life of people and businesses. Previously, consumers just needed to fill in a form at the beginning of an insurance contract, and the impact of that insurance was more or less stable and predictable during the whole duration of the contract, whatever the future behaviour of the consumer. With the introduction of Big Data, insurance touches every aspect of everyday life. The way you drive, what you buy, what you don't buy, the way you sleep, etc., can have a big impact on your financial situation. Indeed, insurers are moving into the position of a central tower, observing our everyday life through smartphones and all other devices and sensors.

The future will tell us how far the general public will allow this influence of insurance companies. Sharing your driving behaviour with insurers will probably not be a problem for most of us, but sharing what we eat and how we sleep is a bigger step. Every person will have to make a trade-off between privacy and a better insurance offer. Currently, for instance in the case of car insurance telematics, drivers have an opt-in option and they can decide whether they are interested in the telematics-based offer. In the future, however, data collection might be the default and you might have to pay extra to be unlisted and keep your life private.
5.5 Change in Insurance business

From an actuarial point of view we tend to focus on the opportunities big data holds for managing and pricing risk. But the digital transformation that is at the basis of big data (cf. the increased data flow – the V's from section 2.1 – and the increased computational power – section 2.2) has also led to a change in customers' expectations and behaviour. The ease with which the end customer can access information and interact with companies, and the way digital enterprises have developed their services to enhance this ease of use, have set a new standard in customer experience. Customers are used to getting quick, online reactions from the companies they buy goods and services from. Industries that do not adapt to this new standard can quickly acquire an image of being old-fashioned, traditional and simply not interesting. We have already seen new distribution models changing the insurance market in surrounding countries – e.g. aggregator websites in the UK – that are a result of (or play into) this trend. It is in this new customer experience that big data plays an important role and can be a real element of competitive advantage, as it gives access to a new level of personalization. Getting this personalization right can give a company the buy-in for future customer interactions and therefore the opportunity to expand the customer wallet or relation. This has led to the evolution where some big digital retailers have continuously expanded their offer to a wide and loyal customer base, even into the insurance business (e.g. Alibaba Insurance). If these players get it right they can change the insurance distribution landscape, monopolizing the customer relation and leaving traditional insurers the role of pure risk carriers. For now this evolution is less noticeable in Belgium, where the traditional insurance distribution model (brokers and banks) still firmly holds its ground, giving the Belgian insurance industry an opportunity to modernize (read: digitalize) and personalize the customer experience before newcomers do so.

6 Actuarial sciences and the role of actuaries

Big Data opens a new world for insurance and any other activity based on data. The access to the data, the scope of the data, the frequency of the data and the extension of the samples of the data are important elements that determine to what extent the final decision is inspired by the statistical evidence. As Big Data changes those properties drastically, it also drastically changes the environment of those who use these data. The activity of the actuary is particularly influenced by the underlying data, and therefore it is appropriate to conclude that the development of the Big Data world has a major impact on the education and training of the actuary, the tools used by the actuary and the role of the actuary in the process.
Data science, which aims to optimise the analytics as a function of the volume and diversity of the data, is an emerging and fast-developing field. The combination of actuarial skills and research allows for an optimal implementation of the insights and tools offered by the data science world.

6.1 What is Big Data bringing for the actuary?

6.1.1 Knowledge gives power

Big data gives access to more information than before: this gives the actuary a richer basis for actuarial mathematical analysis. When data are more granular and readily available, actuaries can extend their analysis and better identify the risk factors and the underlying dependencies. Best estimate approaches are upgraded to stochastic evidence. Christophe Geissler13 states that big data will progressively stimulate the actuary to abandon purely explicative models in favour of more complex models aiming to identify heterogeneous subgroups. The explicative models are based on the assumption that there exists a formula that explains the behaviour of all persons. Big data and the available calculation power allow the development of innovative algorithms to detect visible and verifiable indicators of a different risk profile.

6.1.2 Dynamic risk management

"Even if an actuary uses data to develop an informed judgement, that type of estimate does not seem sufficient in today's era of Big Data", a statement that can be read on a discussion forum of actuaries. Instead, dynamic risk management is considered to be an advanced form of actuarial science. Actuarial science is about collecting all pertinent data, using models and expertise to factor risks, and then making a decision. Dynamic risk management entails real-time decision-making based on a stream of data.

6.1.3 Scope and resources

Big data opens the horizon of the actuary. The applications of Big Data go far beyond the insurance activity and relate to various domains where statistical analysis and the economic/financial implications are essential.

13 Christophe Geissler, Le nouveau big bang de l'actuariat, L'Argus de l'Assurance, November 2013
6.1.3 Scope and resources

Big data opens the horizon of the actuary. The applications of Big Data go far beyond the insurance activity and relate to various domains where statistical analysis and its economic and financial implications are essential. Jobert Koomans14, board member of the Actuarieel Genootschap, refers to estimates that Big Data will create a large number of jobs ("1.5 million new data analysts will be required in the US in 2018"). Actuaries have very strong analytical skills combined with business knowledge, thanks to their involvement in everything from pricing to financial reporting, which gives them many new opportunities across different industries.

6.2 What is the actuary bringing to Big Data?

6.2.1 The Subject Matter Expert

Data are a tool to quantify the implications of events and behaviour. The initial modelling and analysis nevertheless define the framework and the ultimate outcome. Deductive and inductive approaches can be used in this context.

Kevin Pledge15 refers to the role of the Subject Matter Expert: "Understanding the business is a critical factor for analytics; understanding does not come from a system, but from training and experience. … Not only do actuaries have the quantitative skills to be data scientists of insurance, but our involvement in everything from pricing to financial reporting gives us the business knowledge to make sense of this. This business knowledge is as important as the statistical and quant skills typically thought of when you think data scientist."

Actuaries are well placed to combine data analytics and business knowledge. The specific education of the actuary, as well as real-life experience in the insurance industry and other domains with actuarial roots, is essential for a successful implementation of the Big Data approach.

6.2.2 Streamlining the process

The actuary formulates the objectives and framework for the quantitative research and thereby initiates the Big Data process. Big data requires the appropriate technology and the use of advanced data science. Actuaries can help to optimise this computer-science-driven analysis with their in-depth understanding of the full cycle. Streamlining the full process, from detecting the needs and defining the models, over using the appropriate data, to monitoring the outcome while taking into account the general interest and specific stakeholder interests, is the key to the success of data science in the hands of the actuary.

6.2.3 Simple models with predictive power

Esko Kivisaari16: "The real challenge of Big Data for actuaries is to create valid models with good predictive power with the use of lots of data. The value of a good model is not that it is just adapted to the data at hand; it should have predictive power outside the observed experience. There will be the temptation to create complicated models with lots of parameters that closely replicate what is in the data.
The real challenge is to have the insight to still produce simple models that have real predictive power."

The added value of the actuary lies in the modelling skills and the ability to exercise professional judgement. The organisation of the profession and the interaction with peers create the framework that allows this judgement to be exercised. Actuaries also focus on an appropriate communication of the results, so that the contribution to value creation can be optimized. A minimal numerical illustration of the point on parsimonious models is sketched at the end of this section.

6.2.4 Information to the individual customer

Big Data can help to find answers to the needs of consumers and society. Customers will be informed about their behaviour so that they are able to correct, influence and change their risk behaviour. Actuaries will be in the perfect position to bring the data back to the customer, be it through the pricing of insurance products or through helping to establish awareness campaigns.

14 Jobert Koomans, Big Data – Kennis maakt macht, De Actuaris (Actuarieel Genootschap), May 2014
15 Kevin Pledge, Newsletters of the Society of Actuaries, October 2012
16 Esko Kivisaari, Big Data and actuarial mathematics, working paper Insurance Committee of the Actuarial Association of Europe, March 2015
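To close the discussion of section 6.2.3, the sketch below illustrates, on simulated data only, why a model that closely replicates the data at hand can still predict poorly out of sample, while a simpler model generalises better. It assumes only Python with numpy; the polynomial degrees, sample sizes and noise level are arbitrary choices made for the illustration and are not prescribed anywhere in this paper.

# Illustrative only: compare a simple fit with an over-parameterised fit
# on simulated data whose true relation is linear. On most random seeds the
# high-degree fit has the lower training error but the higher test error.

import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 + 3.0 * x + rng.normal(0.0, 0.3, n)   # true relation is linear plus noise
    return x, y

x_train, y_train = simulate(15)    # small sample, as is typical in practice
x_test, y_test = simulate(200)     # fresh data to test predictive power

for degree in (1, 10):             # simple model vs. over-parameterised model
    coefs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {mse_train:.3f}, test MSE {mse_test:.3f}")

Comparing in-sample and out-of-sample errors in this way is one simple, widely used check that a model has predictive power beyond the observed experience, which is exactly the professional judgement the section argues the actuary must bring.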
7 Conclusions

The rise of technology megatrends such as ubiquitous mobile phones and social media, customer personalization, cloud computing and Big Data has an enormous impact on our daily lives, but also on business operations. There are plenty of very successful businesses, across different industries, that regard Big Data as very important and central to their strategy.

In this information paper we wanted to understand what the impact of Big Data on the insurance industry and the actuarial profession would be. We asked ourselves whether insurers are immune to these recent changes. Will they be able to leverage the huge volumes of newly available data coming from various sources (mobile phones, social media, telematics sensors, wearables) and the power of Big Data?

We think that Big Data will have various effects. It will demand that companies adopt a new business culture and become data-driven businesses. It will have an impact on the entire insurance value chain, ranging from underwriting to claims management.

Today's advanced analytics in insurance go much further than traditional underwriting and actuarial science. Machine learning and predictive modelling are the way forward for insurers to improve pricing and segmentation and to increase profitability. For instance, direct measurement of driving behaviour provides new rating factors and transforms auto insurance underwriting and pricing processes.

Big Data can also play a tremendous role in the improvement of claims management, for instance by providing very efficient fraud detection models.

We would note that there are a few inhibitors that could block these changes, with legislation being one of the main concerns. The EU is currently working on the General Data Protection Regulation (GDPR), which updates data processing and privacy protection and establishes legislation adapted to the digital era. It is still unclear what the final agreement will be, but the Regulation must be appropriately balanced in order to guarantee a high level of protection of individuals while allowing companies to preserve innovation and competitiveness.

Finally, we discussed the new frontiers of insurance. Big Data gives us a huge amount of information and allows the creation of "fairer", more personalized insurance premiums, which is at odds with the solidarity aspect of insurance. However, we think that Big Data will not revolutionize this: risk pooling will remain core; it will just become better.

Big Data opens a lot of new possibilities for actuaries. Data science and actuarial science mutually reinforce each other. More data allow for a richer basis for actuarial mathematical analysis; big data leads to a dynamic risk management approach; and the applications of Big Data go far beyond the insurance activity and therefore offer a lot of new opportunities.
The implementation of Big Data in insurance and the financial services industry requires the input of the actuary as the subject matter expert who also understands the complex methodology. For Big Data to be successful, understandable models with predictive power are required, for which the professional judgement of the actuary is essential.

We hope that this paper will be a good starting point for the discussion about the interplay between Big Data, insurance and the actuarial profession. The Institute for Actuaries in Belgium will further develop the subject and prepare Belgian actuaries for it.