Who am I? Richard Cantwell, GIS Specialist / Technical Manager with GAMMA. I’ve been in GIS since 1991, so just about 20 years of experience working with GIS. Who are GAMMA? Founded in 1993, an Irish GIS company specialising in Geocoding, Demographics, GIS consulting and MapInfo sales, development and training.
GAMMA have been working with Irish addresses for a long time. Our combined institutional knowledge of addresses is well over 100 years. I represent about 1% of this.
Irish addressing is a problem. A well-known problem. You’ve probably come across it yourselves. It might be non-unique addresses (typically in townlands), and as many as 35% of Irish addresses fall into this category. Or it might be the problem of shrinking / growing suburbs; take Ballymun and Glasnevin as an example. There are many other issues too. This talk is about some of the Do’s and Don’ts which should go some way to reducing some of these problems.
The first Do/Don’t – and there are 10 in total. In Ireland we’ve had a particularly laissez-faire attitude to addressing. People often make up their own, and people being people they’re often aspirational, and sometimes even wrong!
Here’s an example, 3 different addresses, all within the same building. The centre of Castleknock (Myo’s pub, let’s say) is a good 3km away on the other side of the M50 from this building, by the way.
This is what the official GeoDirectory Address Model is for that building – Porterstown? Nobody mentioned that. Some areas don’t exist in the GeoDirectory database as part of the standard address model – Christchurch for example.
An aside: does Porterstown appear on maps? Not on the OSi Dublin Streets.
Or Bing (in fact not much in the way of area names at all on Bing Mapping)
More area names on the OSM map. Anybody can add them there of course, but no Porterstown
Google has it, but over the canal, further away than Annfield.
Ah, there it is – on the Discovery Series
Another example, this is a rural one. Take the townland of Roskeen. What is its address? Roskeen, Co. Laois? But it’s a very small place, and there are 50,000 townlands in the country. So people often put the name of the nearest town or village – Killeigh in this case, but Killeigh is across the county boundary in Offaly. Maybe a larger town – one that’s in the same county, Portarlington, or one that isn’t, Tullamore. What happens if you're sanitising your data inputs and somebody has put the address as Roskeen, Tullamore, Co. Laois, but Tullamore is in Offaly? And I haven’t yet touched on the non-unique address issue that many of you are familiar with.
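The Roskeen case can be sketched as a consistency check that flags a record rather than rejecting it. This is a minimal Python sketch, assuming a tiny hand-built lookup; TOWN_COUNTY and check_town_county are hypothetical names for illustration, and a real system would consult GeoDirectory instead.

```python
# Hypothetical town -> county lookup, built from the towns named in the talk.
# Illustrative only; it is not GeoDirectory data.
TOWN_COUNTY = {
    "tullamore": "Offaly",
    "killeigh": "Offaly",
    "portarlington": "Laois",
}

def check_town_county(town: str, county: str) -> str:
    """Return 'ok', 'mismatch' or 'unknown' for a town/county pair."""
    known = TOWN_COUNTY.get(town.strip().lower())
    if known is None:
        return "unknown"  # e.g. a townland we have no entry for
    cleaned = county.strip().lower()
    if cleaned.startswith("co."):
        cleaned = cleaned[3:].strip()  # drop the "Co." prefix
    return "ok" if cleaned == known.lower() else "mismatch"

print(check_town_county("Tullamore", "Co. Laois"))
```

A "mismatch" here is not necessarily a user error: the user may simply be naming the nearest large town, so the sensible response is to keep the address and flag it, not to bounce it back.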
Here’s a sample return from the address matching system we use – AutoAddress. You can see that there is a Locality level match, and there are 11 addresses in the townland.
We can plot these addresses on a Google Map
Or even look at them in StreetView, and see the markers indicating the address points. No, those markers aren't in Google Maps because people stood outside their houses holding a big pin when the streetview car drove past. It's pulling the LatLong from the GeoDirectory Database and drawing the markers onto the Streetview image.
So GeoDirectory stores proper addresses, but most of us don’t know what our proper address is. Even if we did, we would probably want to stick with an address we’ve always used and is familiar to us – maybe our address is a bit ‘aspirational’, but we want to keep it.
But this causes a problem – if we’re going to conduct proper geographic analysis based on an address we need to verify it against GeoDirectory, but we also need to preserve the vanity elements that aren’t in GeoDirectory.
Here’s an example. Note the misspelling being fixed, and that the Millfarm element of the address, which is part of the proper GeoDirectory address, may be presented to the user but isn’t stored as their confirmed address. Perhaps they don't want that part of the address used for some reason.
Depending on where the user lives (city centre or rural townland) and on your project requirements, different levels of match can be obtained, or are appropriate.
What purpose are you using the address for? If it’s for the classic case of ambulance routing, for example, you need a precise match to a building / address. If you don’t get this level of match then you need to handle that – presenting the user with a map to click on might be an option, but this is an extra step and is another hoop for the user to jump through.
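One way to keep the verified address and the vanity elements side by side is to store three versions of every address. The following is a minimal Python sketch under that assumption; AddressRecord, vanity_elements and the sample address are all invented for illustration and are not taken from AutoAddress or GeoDirectory.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AddressRecord:
    entered: List[str]    # exactly what the user typed
    standard: List[str]   # standard address returned by the matcher
    confirmed: List[str] = field(default_factory=list)  # what the user agreed to

def vanity_elements(record: AddressRecord) -> List[str]:
    """Confirmed lines that do not appear in the standard address."""
    standard = {line.casefold() for line in record.standard}
    return [line for line in record.confirmed if line.casefold() not in standard]

# Invented example: the spelling is corrected, the user keeps "Castleknock",
# and the standard "Millfarm" element is not part of the confirmed address.
record = AddressRecord(
    entered=["12 Main Stret", "Castleknock", "Dublin 15"],
    standard=["12 Main Street", "Millfarm", "Dublin 15"],
    confirmed=["12 Main Street", "Castleknock", "Dublin 15"],
)
print(vanity_elements(record))
```

Keeping all three versions means analysis can run on the standard address while correspondence uses the confirmed one.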
However, for insurance purposes a locality level match might be sufficient – if that entire locality is subject to flooding, for example. And this can be mix and match: you might need precise building level data for certain parts of the country and locality level or town level matching in others.
Here’s an example of a return showing differing levels of match. The ‘L’ indicates a locality level match, ‘A’ is for address point, ‘B’ for building, ‘T’ for town and ‘S’ for street (or Thoroughfare, as GeoDirectory call it).
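The idea that different purposes need different match levels can be sketched as a simple lookup. In this Python sketch the level codes are the ones described above, but the precision ranking and the purpose table are assumptions made for illustration; they are not defined by AutoAddress or GeoDirectory.

```python
# Match-level codes as described in the talk.
LEVEL_NAMES = {
    "A": "Address point",
    "B": "Building",
    "S": "Street",
    "L": "Locality",
    "T": "Town",
}

# Assumed ranking, higher is more precise (A and B treated as equally precise).
PRECISION = {"A": 4, "B": 4, "S": 3, "L": 2, "T": 1}

# Assumed minimum level per purpose, based on the two examples in the talk.
MINIMUM_FOR_PURPOSE = {"ambulance_routing": "B", "flood_insurance": "L"}

def is_sufficient(match_level: str, purpose: str) -> bool:
    """True if the match level is at least as precise as the purpose requires."""
    required = MINIMUM_FOR_PURPOSE[purpose]
    return PRECISION[match_level] >= PRECISION[required]

print(is_sufficient("L", "flood_insurance"))    # locality match, insurance use
print(is_sufficient("L", "ambulance_routing"))  # locality match, routing use
```

When the check fails, that is the point at which a fallback such as a map click would be offered.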
This tip is a bit conceptual. In some cases not all parts of the address which your customer enters will be matched against GeoDirectory. These can include things such as house names, streets or townlands which are not in GeoDirectory or cases where, despite all of the intelligence in the matching system, the correct address element simply can’t be pulled from the database based on what the user has entered.
An example might be where the user has supplied a street name that doesn’t seem to exist in the town they have specified. This information is not useless; it can help in two ways. Firstly, we can decide that the unmatched part of the address means that we can’t consider the match as valid, so we have to step down to a less granular level – in our example that could mean that we decide that the town level match is invalid and we need to step down to the county level match, while flagging the record for review by a human.
Alternatively we can decide that the match is valid. For example we may have a match to Eyrecourt, with an unmatched street element – we can decide that the match to the town is valid, as the unmatched part contains the word ‘Street’ or ‘Road’ and so on. Alternatively the unmatched part could refer to a townland miles away from the town. So we would step down to a county level match, which would mean that there is a flood risk.
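The two outcomes described above can be sketched as a small decision rule for a town level match with leftover text. This is a Python sketch only: the word list is an assumption, and the "C" code for a county level match is hypothetical, since the talk does not give a code for it.

```python
from typing import Tuple

# Assumed list of words suggesting the leftover text is a street in the town.
STREET_WORDS = {"street", "st", "road", "rd", "avenue", "ave", "lane"}

def resolve_town_match(unmatched: str) -> Tuple[str, bool]:
    """Decide what to do with a town level match that has leftover text.

    Returns (level_to_use, needs_human_review).
    """
    words = {w.strip(".,").lower() for w in unmatched.split()}
    if not words or words & STREET_WORDS:
        # Nothing left over, or it looks like a street: keep the town match.
        return "T", False
    # Could be a townland miles from the town: step down to county and flag.
    return "C", True

print(resolve_town_match("Main Street"))
print(resolve_town_match("Ballynakill"))
```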
It would make all our lives much simpler if everyone was enthused about filling in forms with their address stated as accurately as possible, but in reality this is something the user wants to complete as quickly as possible.
Our primary concern is accuracy, so we are willing to concede simplicity or speed – making things complex and/or slow for the user. However, the user wants things to be fast and simple and is not usually concerned with accuracy at all. So we need to make a trade-off.
Based on usage metrics we’ve gathered from AutoAddress
It’s tempting to ask a user to search again if you can’t find an address, but most will give up after a couple of attempts. There is no sense in trapping a user in an endless loop searching and re-searching for an address that doesn’t seem to match.
Some addresses just do not match. This may be because An Post have not yet captured the address, or because it cannot be found in the database even with all of the intelligence within the address validation software. We estimate that 92% of all addresses can be matched, and if you're dealing with addresses in towns that have 700 people or more then this increases to 98%. It is important to accept this and plan for it – doing so will mean that customers are not lost to the process.
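Planning for unmatched addresses can be as simple as capping the number of searches and then accepting what the user typed. The Python sketch below assumes a limit of two attempts (reading "a couple" literally) and a hypothetical lookup callable that returns None when nothing matches; neither comes from AutoAddress.

```python
MAX_ATTEMPTS = 2  # assumption: "a couple of attempts"

def capture_address(attempts, lookup):
    """Try each entered address in turn, up to MAX_ATTEMPTS.

    `lookup` is a hypothetical callable returning a match or None.
    If nothing matches, the last entered address is kept as unmatched.
    """
    last = None
    for entered in attempts[:MAX_ATTEMPTS]:
        last = entered
        match = lookup(entered)
        if match is not None:
            return {"entered": entered, "match": match, "status": "matched"}
    return {"entered": last, "match": None, "status": "unmatched"}

# Stand-in lookup that never finds anything.
result = capture_address(["Roskeen", "Roskeen, Tullamore", "Roskeen again"],
                         lambda text: None)
print(result["status"], "-", result["entered"])
```

The unmatched record is still stored, so the customer completes the process and the address can be reviewed later.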
Unless you’re a start-up you almost certainly have IT infrastructure in place already. This infrastructure probably stores addresses in a particular number of fields and may have restrictions on the number of characters these fields can hold and so on.
It will be tempting to simply set up address capture forms that mirror your existing infrastructure. Instead, from the user’s point of view it is much simpler to offer a couple of address lines, and maybe a county field – the user then doesn’t have to figure out what to put where, and this can be worked out afterwards and then reformatted to fit the existing database. It is also good practice to retain the address which the user entered.
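Reformatting free-form lines to fit an existing database can be sketched as follows. This is a Python sketch assuming a legacy table with three address fields of 30 characters each; fit_to_fields and both limits are hypothetical and would need to match your own schema.

```python
def fit_to_fields(lines, field_count=3, max_len=30):
    """Pack free-form address lines into fixed, length-limited fields.

    Returns (fields, truncated); `truncated` is True if any text was cut,
    which is why the address as entered should be stored separately.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    if len(cleaned) > field_count:
        # Merge the overflow lines into the last available field.
        cleaned = cleaned[:field_count - 1] + [", ".join(cleaned[field_count - 1:])]
    truncated = any(len(line) > max_len for line in cleaned)
    fields = [line[:max_len] for line in cleaned]
    fields += [""] * (field_count - len(fields))  # pad unused fields
    return fields, truncated

fields, truncated = fit_to_fields(["34 Woodbrook Sq.", "Carpenterstown", "Dublin 15"])
print(fields, truncated)
```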
Entered Address, An Post Standard address, Vanity Address, An Post Address corrected with vanity elements.... At some point you will ask yourself "What is the correct address anyway?" When you do, remember that, no matter what, your user has told you what they consider their address to be and you should use this when corresponding with them. You may come under some pressure to ignore this principle; some common arguments include:
By all means store the An Post GeoDirectory standard address. However, use a user-validated address in communication - you may find as many as 20% of GeoDirectory addresses are undeliverable.
No problem with you verifying that someone lives at an address based on current bills. However, don't let the address capture mistakes of the utilities company become your own. (At least Donald Rumsfeld knows where his famous Unknown Unknown lives)
Real world addresses are useful. However, unless you have matched everyone’s address, you will not pick up on vanity issues in individual cases and will be back arguing with users about their address.
This may feel like a reasonable argument if addressing is starting to feel a bit of a hurdle. However, resist the urge to just take the easy way out. If you do, you will miss a great opportunity to capture user addresses with vanity information included - information which cannot be reverse engineered when the boss comes and asks that validation is extended to include correspondence addresses.
I imagine the audience here today is well aware of the value of locational information. However the power of having properly geocoded and cleaned addresses in your customer database might be something which you have not fully considered.
Most organisations manage a wide variety of databases. Especially in large organisations sharing information between different departments and sections can be difficult – you run the risk of duplicating data, especially when dealing with customer details.
As a result many departments have their own ‘Data Silos’ which they are very protective of. But there is huge value to be gained by unifying these silos, and thus gaining a ‘Single View of the Customer’. Properly geocoded, cleaned and deduplicated data can go a long way to achieving this aim.
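Once every silo carries the same matched address identifier, joining them becomes a grouping exercise. The Python sketch below assumes each record is a dict with a hypothetical "address_id" key holding that identifier; the silo names and customer names are invented.

```python
from collections import defaultdict

def single_customer_view(*silos):
    """Group records from several silos by their matched address ID."""
    merged = defaultdict(list)
    for silo in silos:
        for record in silo:
            merged[record["address_id"]].append(record)
    return dict(merged)

# Invented sample data: the same household appears in two departments.
billing = [{"address_id": 101, "name": "J. Murphy"}]
support = [{"address_id": 101, "name": "John Murphy"},
           {"address_id": 202, "name": "A. Byrne"}]

view = single_customer_view(billing, support)
print(len(view[101]), "records share address 101")
```

Records that share an address ID are candidates for the same customer; deciding whether they really are the same person still needs a name comparison or a human check.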
Moving beyond ‘unknown unknowns’ there are several ‘known’ changes coming down the track. The first of these is the much mooted Postcode system. It is going to happen, but nobody knows yet what form it will take or when it will be introduced.
As far as we are currently aware there are three consortiums who are talking to the Department about the proposed system. We don’t yet know what form it will take, or when it will be introduced, but it could be by the end of 2012.
One thing we can be sure about is the introduction of the Small Areas, with Census data to match. This is an example of the kind of increased granularity available with Small Areas, moving from a single DED to about 120 Small Areas in this particular (extreme) example. Census 2011 data at this level is expected by the end of 2012.
There are also changes brought on by social media services like FourSquare et al. Whether these are a flash in the pan remains to be seen, but their POI database is huge and they recently passed 1 billion check-ins. This data can then be easily consumed in a range of formats and displayed on an ever widening range of mapping platforms, many of which are based on OpenStreetMap.
An example of a new type of Mapping Platform, which we use a lot in GAMMA, is Google Fusion Tables. This is a very powerful way of displaying data and making it visible to the public while retaining control over it.
Thanks for your time. If you have any questions we’ll take them at the end of this session, but you can also contact me at this email address, and I’ll be at the GAMMA stand in the exhibition hall for the rest of the day.
The Do’s and Don’ts of Irish Addressing
Richard Cantwell [email_address] www.gamma.ie
#1 DO: Be prepared for many variations on the same address, and don't argue with users* about the one they use
* Users, Customers, Citizens...
It isn't uncommon for neighbours along a street to write their addresses differently:
56 Woodbrook Sq., Castleknock, Dublin 15
42 Woodbrook Sq., Diswellstown Rd, Clonsilla, Dublin 15
34 Woodbrook Sq., Carpenterstown, Dublin 15