Improve Geocoding Accuracy For Insurance Risk Assessment and Pricing

This document discusses the importance of location data and geocoding in the insurance industry. It notes that pricing insurance policies depends heavily on assessing the risks associated with a property's location. It then outlines three key areas to improve geocoding accuracy: input addresses must be clean and structured, the geocoding engine needs to be optimized, and the reference database G-NAF requires better completeness and timeliness in adding new addresses. The overall goal is to achieve a geocoding match rate of over 95% to properly assess location-based risks.

GIS Development Manager
IAG Direct Insurance - Geospatial Information
Hugh Saalmans
Linking millions of people, policies and places

insurance & IAG
pricing & location based risk
geocoding & addressing
G-NAF
agenda

insurance & IAG

“...the business of manufacturing promises”
~ Warren Buffett, 2012
what is insurance?

insurance is a sizable industry
$19,689,000,000 claims paid p.a.
1.3% GDP
Source: APRA June 2012 Quarterly GI Performance Statistics

pricing & location based risk

pricing is affected by…
reinsurance costs
industry competition
government fees & charges

but most affected by…
how often a customer will need to make a claim; and
how much it will cost each time

to predict claims and claims cost - we need to determine the risk of something happening
to do that, we start with each property’s location and then assess its risk
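The two drivers above ("how often" and "how much each time") combine in the usual way: expected yearly claims cost is claim frequency multiplied by average claim cost. A minimal sketch, assuming hypothetical figures and a hypothetical function name that are not from the presentation:

```python
def expected_annual_claims_cost(claims_per_year: float,
                                average_cost_per_claim: float) -> float:
    """Expected yearly claims cost = claim frequency x average claim cost."""
    if claims_per_year < 0 or average_cost_per_claim < 0:
        raise ValueError("inputs must be non-negative")
    return claims_per_year * average_cost_per_claim


# Hypothetical example: one claim every 10 years, costing 5,000 on average.
cost = expected_annual_claims_cost(1 / 10, 5000.0)
print(round(cost, 2))
```

A real pricing model would estimate both inputs per property from its location based risk factors; this only shows how they combine.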

location based risk
distance to bushland = a bushfire risk factor
distance to parks = a burglary risk factor
distance to main roads = a collision risk factor
regional geology determines earthquake risk

calculating location based risk
G-NAF
geocoding: location
spatial relationships: what’s nearby?
modelling: what could happen?
claim frequency: how often?
damage curves: how severe?

geocoding & addressing

DI & geocoding
why? geocoding is the only cost effective method of locating risks, at the property level, on a national scale
1:1 pricing - a foundation of our strategy
What do we use?
geocoding

DI & geocoding
status? geocoding and geo-pricing rolled out nationally
100% overall geocoding rate
>95% household geocoding rate
geocoding

what is geocoding?
converting addresses into usable locations

3 areas for improvement
1. input addresses
2. address matching
3. reference addresses (G-NAF)
geocoding

need a good address structure
sub-dwelling type: UNIT
sub-dwelling no: 2
level type: LEVEL
level no: 3
street number: 10-20
street name: ALFRED
street type: STREET
street suffix: NORTH
locality: NORTH SYDNEY
postcode: 2060
state: NSW
legacy addresses will need to be cleansed
Street: 2/3/10-20 Alfred St N
Suburb: Nth Sydney NSW 2060
input addresses
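The field list on this slide maps naturally onto a record type. The sketch below is one possible representation, not the presenter's implementation; the class name `StructuredAddress` and the method `single_line` are hypothetical, and only the field names and the example values come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StructuredAddress:
    # Fields required for a property level address.
    street_number: str
    street_name: str
    street_type: str
    locality: str
    state: str
    postcode: str
    # Optional fields, present only for units, levels and suffixed streets.
    sub_dwelling_type: Optional[str] = None
    sub_dwelling_no: Optional[str] = None
    level_type: Optional[str] = None
    level_no: Optional[str] = None
    street_suffix: Optional[str] = None

    def single_line(self) -> str:
        """Join the populated fields in a fixed order, skipping empty ones."""
        parts = [
            self.sub_dwelling_type, self.sub_dwelling_no,
            self.level_type, self.level_no,
            self.street_number, self.street_name,
            self.street_type, self.street_suffix,
            self.locality, self.state, self.postcode,
        ]
        return " ".join(p for p in parts if p)


addr = StructuredAddress(
    street_number="10-20", street_name="ALFRED", street_type="STREET",
    locality="NORTH SYDNEY", state="NSW", postcode="2060",
    sub_dwelling_type="UNIT", sub_dwelling_no="2",
    level_type="LEVEL", level_no="3", street_suffix="NORTH",
)
print(addr.single_line())
```

Storing each component separately means a legacy string such as "2/3/10-20 Alfred St N" only has to be parsed once, at cleansing time, rather than on every geocoding run.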

good address capture has:
• structured input
• predefined pick lists
• rapid address lookup
input addresses
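As an illustration of pick lists at the point of capture, the sketch below rejects values that are not in a predefined set. The pick lists shown are small illustrative subsets and the function name `validate_capture` is hypothetical; a production form would load complete lists from its reference data.

```python
# Illustrative subsets only; real pick lists would be far longer.
STREET_TYPES = {"STREET", "ROAD", "AVENUE", "LANE", "PLACE"}
STATES = {"ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"}


def validate_capture(street_type: str, state: str, postcode: str) -> list:
    """Return a list of problems; an empty list means the input passed."""
    errors = []
    if street_type.strip().upper() not in STREET_TYPES:
        errors.append("street type not in pick list")
    if state.strip().upper() not in STATES:
        errors.append("state not in pick list")
    # Australian postcodes are four digits.
    pc = postcode.strip()
    if not (pc.isascii() and pc.isdigit() and len(pc) == 4):
        errors.append("postcode must be 4 digits")
    return errors


print(validate_capture("Street", "NSW", "2060"))
print(validate_capture("St", "New South Wales", "206"))
```

In a web form the same lists would normally drive drop-downs, so invalid values cannot be typed in the first place; the check above is the server-side safety net.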

locality issues
• local names vs gazetted
e.g. 96% of customers provide an unofficial suburb name in Tamworth
• vanity suburbs
e.g. Kingswood Heights
input addresses

the geocoding engine
• evaluate, reconfigure, test it
• G-NAF 360° test (geocode G-NAF itself)
• add your own logic
• talk to your vendor
address matching
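One way to read the "360° test" is as a round trip: feed every reference address back through the engine and check that it matches itself. The sketch below assumes a hypothetical `geocode` callable that returns the identifier of the matched reference record, or `None` when nothing matches; it is not the API of any particular geocoding product.

```python
from typing import Callable, Iterable, Optional, Tuple


def round_trip_test(reference: Iterable[Tuple[str, str]],
                    geocode: Callable[[str], Optional[str]]) -> dict:
    """Geocode each (record_id, address) pair and compare with its own id."""
    total = matched = correct = 0
    failures = []
    for ref_id, address in reference:
        total += 1
        result = geocode(address)
        if result is None:
            failures.append((ref_id, address, "no match"))
            continue
        matched += 1
        if result == ref_id:
            correct += 1
        else:
            failures.append((ref_id, address, "matched " + result))
    return {
        "total": total,
        "match_rate": matched / total if total else 0.0,
        "correct_rate": correct / total if total else 0.0,
        "failures": failures,
    }


# Stub engine for illustration: one record maps to the wrong id, one is missing.
reference = [
    ("A1", "10 ALFRED STREET NORTH SYDNEY NSW 2060"),
    ("A2", "12 ALFRED STREET NORTH SYDNEY NSW 2060"),
    ("A3", "1 EXAMPLE ROAD SAMPLETOWN NSW 2000"),
]
stub_index = {reference[0][1]: "A1", reference[1][1]: "A1"}
report = round_trip_test(reference, stub_index.get)
print(report["match_rate"], report["correct_rate"], len(report["failures"]))
```

The failures list is the useful output: each entry is a reference address the engine cannot find or resolves to a different record, which points at either configuration or reference data problems.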

G-NAF

• is it any good – yes it is
• errors – yes, but limited
• completeness - ~95% complete
• timeliness – can take up to 12 months for new addresses to be added
G-NAF (reference addresses)

errors
• mostly transient
• range from amusing to business impact; can impact customers
• use DIY database rules and QA to limit the impact
• examples:
units being 1 km from their building address
addresses assigned to the wrong duplicate locality
aliases and principals with different coordinates
G-NAF
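A "DIY" QA rule for the last example might look like the sketch below, which flags alias addresses whose coordinates differ from their principal address. The record layout (tuples of id, principal id, latitude, longitude), the function name and the tolerance are all assumptions for illustration, not the G-NAF schema.

```python
def aliases_with_different_coords(principals: dict, aliases: list,
                                  tolerance_deg: float = 1e-6) -> list:
    """Flag aliases that are missing a principal or sit at different coordinates.

    principals: {principal_id: (lat, lon)}
    aliases:    [(alias_id, principal_id, lat, lon), ...]
    """
    flagged = []
    for alias_id, principal_id, lat, lon in aliases:
        principal = principals.get(principal_id)
        if principal is None:
            flagged.append((alias_id, "principal missing"))
        elif (abs(lat - principal[0]) > tolerance_deg
              or abs(lon - principal[1]) > tolerance_deg):
            flagged.append((alias_id, "coordinates differ"))
    return flagged


# Hypothetical records for illustration.
principals = {"P1": (-33.8390, 151.2070)}
aliases = [
    ("X1", "P1", -33.8390, 151.2070),   # same point as its principal
    ("X2", "P1", -33.8490, 151.2070),   # roughly 1 km further south
    ("X3", "P9", -33.8390, 151.2070),   # principal not in the table
]
print(aliases_with_different_coords(principals, aliases))
```

Rules like this can run after each reference data update, so that suspect records are quarantined before they reach pricing.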

completeness
• postcodes sparsely populated on addresses
• sub-dwellings generally don’t have unique coords
• Your customer data can be an additional “G-NAF data source”... use it!
G-NAF

timeliness
• Some new houses are insurable before address is in G-NAF
• That’s why G-NAF Live is encouraging
G-NAF

location is fundamental to insurance
geocoding - 3 areas of improvement (to get >95%)
• addresses – clean, structured, well captured
• engine – tested, optimised, customised
• G-NAF – postcodes, sub-dwellings, G-NAF Live
summary

Editor's Notes

Hi everyone, thank you very much for coming to this presentation. I’d like to start off with a bit of interactivity, so a show of hands please! Who works for an organisation that geocodes their address data? Very good. Now, who’s got a property level geocoding rate greater than 95%? (Excellent, not an easy achievement) OR (Who thinks that’s achievable?) It’s definitely achievable: our property level geocoding rate is 9n%. Today I’d like to share with you how it can be done, and also touch on how users, vendors and data custodians, aka the geospatial industry, have the opportunity to enhance addressing and geocoding further. I’d also like to share with you why geocoding is so important to my organisation.
I’d like to start today’s discussion by giving you some background to insurance and IAG Direct Insurance, before diving into the fundamentals of insurance pricing and location based risk. That gives you some context for the core of today’s presentation, which is primarily about geocoding and addressing, and about sharing our experiences implementing a large scale geocoding system using G-NAF.
What is insurance? According to Warren Buffett on a recent Australian visit, it’s the business of manufacturing promises: in return for the purchase of an insurance policy, an insurer promises to help a customer recover financially in the case of a disaster or accident, albeit within the limits of what is covered under that policy.
The promises fulfilled by the Australian insurance industry last financial year equated to $19.7 billion in claims paid to smash repairers, builders, suppliers, and customers. This was over 1% of GDP - not an insignificant amount.
The division I represent is Direct Insurance (DI), which looks after the NRMA, SGIC and SGIO insurance brands. We also have a joint venture with RACV.
To give you an idea of the scale of IAG’s operations: last financial year, we insured over 16 million risks, i.e. over 16 million homes, cars, businesses and farms. We sold almost $9bn worth of policies and insured over $1.5 trillion of personal and commercial property, roughly the same as Australia’s GDP. Locally, DI contributed roughly 50% of the group’s business.
Fundamental to our ability to insure millions of risks, and hundreds of billions of dollars of assets, is accurate pricing. Let’s look at that more closely.
There are many factors that go into pricing an insurance policy, such as reinsurance costs (the insurance that protects insurers against catastrophic loss), competition within the industry, and government fees and charges.
But at the heart of a policy’s price is being able to predict how often a customer will need to make a claim and how much it will cost each time. To do that we need to determine the risk of something happening, but where do we start? We start with location.
Risk is fundamentally defined by location, and location heavily influences the price of an insurance premium. This is because location defines the risk each property faces at the household level, e.g. whether you live in proximity to a park, near bushland, or on a main road. It also defines the risk at the suburb level, like your local crime rate, or at the regional level, like your earthquake risk.
So how do we determine these location based risks for a property? We start by geocoding an address to determine its location. We then look at the spatial relationships between that location and the surrounding area. Is the property near a park? Is it near bushland? Is it on a main road? For this, we use a variety of datasets, such as NAVTEQ street and POI data. That data is then fed into a statistical model, to confirm which spatial relationships explain the risk of an event occurring, such as a bushfire. Using claim frequency data we can then determine how often an event might happen: every 5 years, every 10 years, every 50 years? Now that we’ve determined the types of risks that exist and how often they might occur, we can apply historical damage data to assess the percentage of damage that a particular type of house would sustain. To be able to do this work, we need a reference address set that we can both geocode against and use for spatial analysis, on a national scale. That dataset is obviously G-NAF.
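The "what's nearby?" step can be sketched as a distance-to-nearest-feature calculation on geocoded coordinates. The example below uses the haversine great-circle formula, which is a swapped-in simplification; the presentation does not say how distances are computed, and a production system would more likely use a spatial database or GIS library with real feature geometries. All coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_to_nearest(point, features):
    """Distance in metres from point to the closest feature, or None if none."""
    if not features:
        return None
    return min(haversine_m(point[0], point[1], f[0], f[1]) for f in features)


# Hypothetical geocoded property and bushland reference points.
property_location = (-33.700, 151.100)
bushland_points = [(-33.705, 151.100), (-33.800, 151.300)]
print(round(distance_to_nearest(property_location, bushland_points)))
```

The resulting distance becomes one input variable to the statistical model, alongside distances to parks and main roads.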
I hope that gives you some context as to the value of geocoding to the insurance industry. Let’s have a look at how we’ve implemented geocoding, and at geocoding and addressing issues in detail.
So why does Direct Insurance use geocoding? We use it because it’s the only cost effective method of locating all our customers across Australia, down to the household level. It is a key part of our 1:1 pricing strategy – to be able to price each customer individually based on their localised risk factors, at the household level. We’ve implemented Mastersoft’s Harmony Suite, with G-NAF, for address parsing and matching.
We’ve rolled out geocoding and individual customer pricing across several million policies, nationally. Overall we’ve achieved a 100% geocoding rate. More importantly though, we’ve achieved a 9n% household level match rate.
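The difference between the overall rate and the household level rate comes down to which match levels are counted. A minimal sketch, assuming a geocoder that labels each result with one of the hypothetical levels "address", "street" or "locality" (real engines use their own codes):

```python
from collections import Counter

# Hypothetical match level labels, from most to least precise.
PROPERTY_LEVELS = {"address"}
ALL_LEVELS = {"address", "street", "locality"}


def geocoding_rates(match_levels) -> dict:
    """Share of records geocoded at any level, and at property level."""
    counts = Counter(match_levels)
    total = sum(counts.values())
    if total == 0:
        return {"overall": 0.0, "property": 0.0}
    overall = sum(counts[level] for level in ALL_LEVELS) / total
    prop = sum(counts[level] for level in PROPERTY_LEVELS) / total
    return {"overall": overall, "property": prop}


# Illustrative batch: 19 property level matches and 1 street level fallback.
sample = ["address"] * 19 + ["street"]
rates = geocoding_rates(sample)
print(rates)
```

A 100% overall rate is possible because every record can fall back to at least a locality centroid; the property level figure is the one that matters for household pricing.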
Geocoding at a basic level is simply a process for converting an address into a usable location. The key to a good geocoding rate is straightforward enough - but it can be difficult or expensive to implement, depending on the volume or structure of your address data
So what prevents you from getting a good geocoding rate? There are 3 distinct areas where your geocoding rate can be improved, and these are mostly common sense: 1 - The first point is the most obvious one – the quality of your own address data. 2 - The second is the flexibility of your geocoding system to interpret each input address in a multitude of ways to match it to a known address. 3 – Lastly is the quality of the reference addresses used by your geocoding engine. In other words the quality of G-NAF.
Looking at input address issues: probably the most common issue is poor address structure, i.e. not having addresses stored in a consistent set of fields. Another key issue is that you won’t be able to achieve a high geocoding rate without manually, or at least semi-automatically, cleaning up your addresses. If you’ve been gathering addresses over a long period of time, prior to thinking about using that data as location information, then you may well have a smorgasbord of poorly spelt or downright unintelligible addresses in your database. And if you have hundreds of thousands of addresses then you will potentially need to employ a team, for well over a year, to clean up the data, if you want a high geocoding rate. We have legacy addresses; we have millions of them. In fact we have more addresses on file than there are addresses in Australia! In the past we’ve insured P.O. Boxes! Fortunately for us, a lot of great work was done before we started on geocoding 3 years ago, which meant we had an excellent set of well structured addresses to start with.
Cleaning up your existing addresses is one thing; how you capture new addresses is what keeps your data clean. There are 3 things that can be implemented at the point of address capture, whether it be via your own web page or through your internal applications, to capture clean, well-structured addresses:
1 - Make sure the data is captured using an appropriate set of structured input fields, with rules on those fields. Preferably use a set of fields based on a standard.
2 - Enforce street types, street suffixes, and locality and state names using pre-defined pick lists, not freeform text fields with no rules.
3 - Use a rapid address tool, such as Harmony, to auto-populate the street and locality information as the user is typing.
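As a rough illustration of points 1 and 2 above, the sketch below validates a captured address against structured fields and pick lists. The field names, the `CapturedAddress` class, the `validate` function and the abbreviated pick lists are all assumptions made for this example; they are not taken from any particular standard or from our systems.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Illustrative pick lists only (assumption): a real system would load the
# full sets from whichever addressing standard it has adopted.
STREET_TYPES = {"STREET", "ROAD", "AVENUE", "LANE", "PLACE"}
STATES = {"ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"}


@dataclass
class CapturedAddress:
    """Hypothetical set of structured input fields for one address."""
    number: str
    street_name: str
    street_type: str
    locality: str
    state: str
    postcode: str


def validate(addr: CapturedAddress) -> list[str]:
    """Return a list of rule violations; an empty list means the address passes."""
    problems = []
    if not addr.number.strip():
        problems.append("street number is required")
    if not addr.street_name.strip():
        problems.append("street name is required")
    if addr.street_type.upper() not in STREET_TYPES:
        problems.append(f"unknown street type: {addr.street_type!r}")
    if addr.state.upper() not in STATES:
        problems.append(f"unknown state: {addr.state!r}")
    if not re.fullmatch(r"[0-9]{4}", addr.postcode):
        problems.append("postcode must be 4 digits")
    return problems
```

Rejecting the address at capture time, with a message per failed rule, is what keeps freeform values such as "Strt" or "N.S.W." out of the database in the first place.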
Localities...! Local people will sometimes use the local or common name for their area, even though their gazetted locality name is completely different. This causes a few problems:
Has anyone ever heard of a place called Glenquarie in SW Sydney? Neither has our geocoding engine! Nor G-NAF!
Tamworth is a rural city made up of several suburbs, but everyone says they live in Tamworth. What percentage of our customers in Tamworth do you think give us the wrong locality name? 95%?
Vanity suburbs could be affecting around 10% of your addresses. They come in 3 main flavours:
1 - The real estate agent told me I live here, so I live here, even though it’s the neighbouring suburb.
2 - I want to live in the neighbouring affluent suburb, so I’ll just use that name.
3 - I’ll make one up based on local information.
Your geocoding engine and G-NAF have some smarts to correct some of these issues, but not all of them. The solution is to create a lookup table of common and gazetted locality names.
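A lookup table of common and gazetted locality names could be sketched as follows. The entries are placeholders rather than real mappings, and the `COMMON_TO_GAZETTED` table and `gazetted_locality` function are hypothetical names used only for this example. Note that a name-only lookup cannot fix the Tamworth case, where the right suburb depends on the street address.

```python
# Keys are (common_name, state); values are the gazetted locality name.
# These entries are placeholders (assumption), not real mappings.
COMMON_TO_GAZETTED = {
    ("EXAMPLE HEIGHTS", "NSW"): "EXAMPLEVILLE",
    ("EXAMPLE TOWN CENTRE", "NSW"): "EXAMPLEVILLE",
}


def gazetted_locality(locality: str, state: str) -> str:
    """Map a common or vanity locality name to its gazetted name.

    Names are upper-cased and runs of whitespace collapsed before the
    lookup. Names not in the table are returned normalised but unchanged.
    """
    name = " ".join(locality.upper().split())
    return COMMON_TO_GAZETTED.get((name, state.strip().upper()), name)
```

Applying the lookup before the address reaches the geocoding engine means the engine only ever sees names it has a chance of matching.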
Looking at your geocoding engine and its potential limitations... The key point is to identify the weaknesses in its address matching process, and to work around those issues where possible.
Firstly, don’t just accept the default configuration out of the box. Test the system, reconfigure it and test again. If you really want to stress test the geocoding engine, input the entire raw G-NAF database and see what results you get. You won’t get 100%, but you should get a match rate in the high 90s.
Secondly, if you find limitations in the system, add your own custom logic to it.
Lastly, the most obvious one: talk to your vendor. Log the bugs and change requests if you want the system to perform better.
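The test, reconfigure and test again loop needs a number to compare between runs. Below is a minimal sketch of a match-rate harness, assuming the engine can be wrapped in a function that takes an address string and returns coordinates, or `None` when there is no match. The `match_rate` function and the `geocode` callable are hypothetical; no real vendor API is implied.

```python
from typing import Callable, Iterable, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def match_rate(
    addresses: Iterable[str],
    geocode: Callable[[str], Optional[Coordinates]],
) -> float:
    """Return the fraction of addresses for which `geocode` finds a match.

    `geocode` is a stand-in for whatever call wraps your geocoding
    engine (assumption); it must return None for an unmatched address.
    """
    total = 0
    matched = 0
    for address in addresses:
        total += 1
        if geocode(address) is not None:
            matched += 1
    return matched / total if total else 0.0
```

Running the same address file through this after each configuration change, including the raw reference addresses themselves, shows whether a change helped or hurt.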
In closing – let’s look at some G-NAF issues related to geocoding rates
TIME CHECK
So how good is G-NAF as a reference address dataset? Good enough to give us a greater than 95% property level match rate, but it’s not perfect... The PSMA are well aware of this and are looking into solutions. There are some errors that creep into the data, and there are some significant challenges in making it a more complete reference set of geocoded Australian addresses.
On a positive note: resolving the issue of timely G-NAF updates is well and truly underway.
Errors are a part of any large dataset with a reasonably complicated schema, and G-NAF is no different. In our experience, these errors are mostly transient things, ranging from amusing to having a business impact. They usually aren’t a symptom of wider data quality issues.
The bad news is they can impact a customer. In our case that could mean their premium goes up or down unexpectedly between policy renewals. So we tightly manage pricing impacts whenever we update G-NAF or re-geocode our customers. This problem can also occur between G-NAF versions when coordinates move significantly for valid reasons.
We haven’t been actively logging bugs with the PSMA due to the competitive nature of insurance. That is now changing, so we owe the product team from PSMA a few emails. But I couldn’t help listing a few of my favourites from the last 3 years:
Units whose base address was up to 1 km away.
Addresses associated with the wrong duplicate locality (Hillgrove Wagga/Armidale), 500 km away!
Alias and principal addresses with differing coordinates. That one is a bit more serious and required our manual intervention to ensure customers weren’t affected.
My recommendation, if geocoding is important, is to apply database schema rules to G-NAF and do your own QA, to ensure any little things that have crept into the data don’t impact your business operations.
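One way to do that QA is to compare the coordinates of addresses that should sit close together, such as a unit and its base address, or an alias and its principal address, and flag any pair that is too far apart. The sketch below assumes the pairs have already been extracted into plain tuples; the function names, the tuple layout and the 100 metre threshold are illustrative assumptions, not G-NAF schema details.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_distant_pairs(pairs, max_metres=100.0):
    """Return (id_a, id_b, distance_m) for each pair further apart than max_metres.

    `pairs` is an iterable of (id_a, (lat, lon), id_b, (lat, lon)) tuples,
    for example unit/base or alias/principal addresses. The default
    threshold is an arbitrary illustration, not a recommended value.
    """
    flagged = []
    for id_a, (lat_a, lon_a), id_b, (lat_b, lon_b) in pairs:
        distance = haversine_m(lat_a, lon_a, lat_b, lon_b)
        if distance > max_metres:
            flagged.append((id_a, id_b, round(distance)))
    return flagged
```

Running a check like this against each new release, before it reaches pricing, gives you a list of addresses to review rather than a surprise at renewal time.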
Aside from the obvious candidates of missing reference addresses and addresses without geocodes, there are 2 key issues regarding the completeness of G-NAF:
Postcodes are often viewed as a non-critical part of a structured address. That point of view ignores the fact that the address matching process works best with the maximum amount of information, and postcodes are a valuable piece of information that should be included in the process. However, currently, postcodes only exist for localities with duplicate names within states. Having a postcode, where applicable, for all G-NAF localities has been on the cards for a little while, and we’d like to see them added, as it would give not only a good boost to the geocoding rate but also a reduction in false positives.
Based on analysis of our customer addresses, about 8% of sub-dwellings are not in G-NAF with a geocode. These include townhouse developments, retirement villages and blocks of flats, as well as permanent sites at caravan parks. This is the number one area where our results need to improve. Also, in large developments, having property-accurate coordinates, rather than a set of coordinates in the centre of the land parcel, is of great benefit. We’d love to see some more work done in this space as well.
Just because an address is retired in G-NAF doesn’t mean you have to retire it. If you’ve got a good match to a retired address, why drop the geocode? Your data can be considered as another valid source of good address information, so why not treat it as a 4th G-NAF data source?
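Keeping the geocode of a retired address can be as simple as carrying forward your own previous matches whenever a new release no longer contains them. The sketch below assumes geocodes are held in dictionaries keyed by an address identifier; the `carry_forward` function and the "current"/"retained" labels are hypothetical names for this example.

```python
def carry_forward(previous: dict, current: dict) -> dict:
    """Merge geocodes from a new reference release with earlier matches.

    Both arguments map an address identifier to a (lat, lon) tuple.
    The result maps each identifier to ((lat, lon), source), where source
    is "current" if the address is in the new release, or "retained" if
    it only exists in the previous matches (for example, because it was
    retired from the reference data).
    """
    merged = {address_id: (coords, "current")
              for address_id, coords in current.items()}
    for address_id, coords in previous.items():
        if address_id not in merged:
            merged[address_id] = (coords, "retained")
    return merged
```

Tagging the source keeps retained geocodes distinguishable from current ones, so they can be reviewed or replaced if a better match turns up later.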
Lastly, our take on the near future... New addresses coming in via the Internet, or through our branches and telephone consultants, go through a real-time geocoding service. The geocoding rate we get from this service drops over time, in between quarterly G-NAF updates. In other words, customers are building houses faster than we can get their reference address into the system. This problem shouldn’t be around for too much longer. With the introduction of G-NAF Live in the near future, we can potentially have a geocoding system that is updated far more regularly than every quarter. That will mostly eliminate this problem and allow us to maintain a very high geocoding rate far more easily.
In summary, some key points I’d like you to take away from today:
Location information is fundamental to insurance pricing!
Focus on the 3 areas of improvement to improve your geocoding rates. 95% at the property level is achievable.
We’re very excited by G-NAF Live.


Editor's Notes