In this presentation I talk about why open addresses are a public good, and the mechanisms that we're using within Open Addresses to navigate the legal and technical challenges of building an open address list for the UK.
BCS Address Day - Open Addresses
1. Address Day
what next after the Address Wars
Jeni Tennison - @JeniT
5 March 2015
https://openaddressesuk.org
@openaddressesuk
2. In economics, a public good is a good that is
both non-excludable and non-rivalrous in
that individuals cannot be effectively
excluded from use and where use by one
individual does not reduce availability to
others.
Wikipedia - Public good
3. "Tompkins Square Park Central Knoll" by David Shankbone - (CC BY-SA 3.0) via Wikimedia Commons
6. Address data should be open data
● National Information Infrastructure
● Not just for posting mail...
○ geocoding for route finding
○ associating people with areas
○ classification for targeting interventions
○ linking datasets together
● Denmark has taken this step
○ 1000% increase in use of address data
○ costs = €0.2M - benefits = €14M
7. Current real life problems
● startup wanting to build an application
○ prohibitive costs
○ prohibitive licensing complexity
● SME with a geodemographic product
○ prohibitive costs
○ limiting customer base & growth
● New build owners
○ 3 months to register to vote, order pizza
8. Funding public goods
● Government via taxation
● Collaboration bound by contract
● Cross-subsidy by selling other goods
● Voluntary effort
● Social norms
9. "The sale of the PAF with the Royal Mail was a mistake.
Public access to public sector data must never be sold or
given away again. This type of information, like census
information and many other data sets, is very expensive
to collect and collate into useable form, but it also has
huge potential value to the economy and society as a
whole if it is kept as an open, public good."
Bernard Jenkin, Chair of Public Administration Select Committee
10. Hypothesis 1: the maintenance of open address
data can only be effectively funded through
taxation
Hypothesis 2: it is possible to build and maintain
a sustainable open address database using
collaboration, cross-subsidy and voluntary effort
11.
12. Goals
● Free, openly licensed, up-to-date bulk
downloads of addresses
● Freemium services over that data
○ eg validation, auto-completion, geocoding
● 100% open source, collaboratively
maintained
● Initial ~£400k investment from government
○ compared with £25M annual cost maintaining PAF
13. Eventual Architecture
“Definitive” UK address list
- where the address data is safe to use
- where each record has confidence and provenance
Bulk
- Download
- Upload
APIs
- Add
- Sort
- Validate
- Search
URLs
- Linked data
- Extensibility
Service Providers
Aggregators, digital, telecoms, public sector, distribution, academics, manufacturers etc
Services
- Websites,
Users
Value
Revenue for sustainability
14. This takes time
Large datasets and inference to tackle the bulk of the challenge
“80/20” rule
Ongoing, collaborative maintenance
Targeted work. Low-volume records to fill existing gaps in available datasets
NB: dates are “just for fun”
15. Approaches
1. Load open datasets containing addresses
2. Build out crowdsourcing mechanisms
3. Use inference to fill gaps
and throughout:
● keep track of provenance
● keep track of confidence
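One way to picture "provenance" and "confidence" travelling with every record is a small data structure like the one below. This is a minimal sketch, not the Open Addresses schema: the class names, field names and the `add_evidence` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Provenance:
    """Where one sighting of an address came from (hypothetical fields)."""
    source: str        # e.g. a dataset name, or "crowdsourced"
    method: str        # "bulk-load", "crowdsourced" or "inferred"
    observed_on: date


@dataclass
class AddressRecord:
    """An address plus the evidence behind it (hypothetical structure)."""
    street: str
    town: str
    postcode: str
    confidence: float = 0.0  # 0.0 to 1.0, filled in by a separate scoring step
    provenance: list[Provenance] = field(default_factory=list)

    def add_evidence(self, item: Provenance) -> None:
        # Keep every sighting so the history of the record stays auditable.
        self.provenance.append(item)


record = AddressRecord("St James Square", "Cheltenham", "GL50 3PR")
record.add_evidence(Provenance("Companies House", "bulk-load", date(2015, 3, 5)))
```

Keeping provenance as a list rather than a single value means the same address seen in several sources accumulates evidence instead of overwriting it.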
16. Loading datasets
Third Party IPR
Possibly infected if validated
against PAF or AddressBase
⇒ most Government “open”
data is infected
A few not:
● Companies House
● err...
17. Platform for loading bulk data
Originally developed for OpenCorporates
Sandboxed environment for running scripts
18. Motivating crowdsourcing
Bulk
- Download
- Upload
APIs
- Add
- Sort
- Validate
- Search
URLs
- Linked data
- Extensibility
Value
Building Blocks
- towns, postcodes, streets
- used to parse data and provide
confidence in the address list
- links between towns, postcodes
and streets are learned from
addresses
Authoritative and definitive UK
address list
- where the address data is safe to
use
- where each record has
confidence and provenance
Revenue for sustainability
19. Address parsing service
● Turn free-text addresses into building blocks
● Can be used with data containing third party IPR
● Optional “contribute” option
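To make "turning free-text addresses into building blocks" concrete, here is a rough sketch of the idea. It is not the parsing service itself: the `parse_address` function, the simplified postcode pattern, and the rule that the last two comma-separated parts are street and town are all assumptions made for illustration, and real addresses will break them.

```python
import re

# Simplified UK postcode pattern: outward code, optional space, inward code.
# An assumption for illustration; it does not cover every special case.
POSTCODE_RE = re.compile(
    r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE
)


def parse_address(text: str) -> dict:
    """Split a comma-separated free-text address into rough building blocks."""
    match = POSTCODE_RE.search(text)
    postcode = f"{match.group(1)} {match.group(2)}".upper() if match else None
    remainder = POSTCODE_RE.sub("", text) if match else text
    parts = [p.strip() for p in remainder.split(",") if p.strip()]
    # Assumption: the last remaining part is the town, the one before it
    # is the street, and anything earlier describes the premises.
    town = parts[-1] if parts else None
    street = parts[-2] if len(parts) >= 2 else None
    premises = ", ".join(parts[:-2]) or None
    return {
        "premises": premises,
        "street": street,
        "town": town,
        "postcode": postcode,
    }


parsed = parse_address("St James House, St James Square, Cheltenham, GL50 3PR")
```

With this input the sketch yields premises "St James House", street "St James Square", town "Cheltenham" and postcode "GL50 3PR".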
26. Calculating confidence
St James House, St James Square, Cheltenham, GL50 3PR
7, St James Square, Cheltenham, GL50 3PT
St James North 1, St James Square, Cheltenham, GL50 3PR
St James North 3, St James Square, Cheltenham, GL50 3PR
3, St James Square, Cheltenham, GL50 3PR
St James House, St James Square, Cheltenham Spa, GL50 3PR
St James North 1, St James Square, Cheltenham, GL50 3PR
St James Place, Jessop Avenue, Cheltenham, GL50 3PR
St James House, St James Square, Cheltenham, GL50 3PR
Apt. 3, St James Place, Jessop Avenue, Cheltenham, GL50 3PR
56, Cheltenham Road, London, SE15 3AR
28. Calculating confidence
Sector   Town             Count   Total   Confidence
...
HD3 4    HUDDERSFIELD        66      66       87.71%
...
DG8 6    NEWTON STEWART      11      12       65.69%
DG8 6    STRANRAER            1      12        0.00%
DG8 7    NEWTON STEWART       1       1        0.00%
...
W3 6     LONDON             196     196       92.96%
...
CH44 4   WALLASEY            23      29       76.06%
CH44 4   WIRRAL               6      29        8.22%
This postcode/town association is right but confidence is low because of the low count
This postcode/town association is incorrect
Another correct postcode/town association, but with a higher count
This is what happens when post towns are re-organised; Wirral is now split into Birkenhead, Wallasey, Wirral and Prenton
This is what a correct postcode/town association looks like
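The slides do not state the formula behind the Confidence column, so the sketch below swaps in a well-known stand-in, the lower bound of the Wilson score interval, to show the two effects the annotations describe: a high share of a sector's addresses raises the score, and a low count pulls it down. The function names and the choice of z = 1.96 are assumptions, and the resulting numbers will not match the percentages in the table above.

```python
import math
from collections import Counter


def wilson_lower_bound(count: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for count/total."""
    if total == 0:
        return 0.0
    p = count / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return max(0.0, (centre - margin) / (1 + z * z / total))


def score_associations(pairs):
    """Score each (sector, town) pair from a list of observed pairs."""
    pair_counts = Counter(pairs)
    sector_totals = Counter(sector for sector, _ in pairs)
    return {
        (sector, town): wilson_lower_bound(count, sector_totals[sector])
        for (sector, town), count in pair_counts.items()
    }


# Counts taken from the table above.
observations = (
    [("HD3 4", "HUDDERSFIELD")] * 66
    + [("DG8 6", "NEWTON STEWART")] * 11
    + [("DG8 6", "STRANRAER")] * 1
    + [("DG8 7", "NEWTON STEWART")] * 1
)
scores = score_associations(observations)
```

Under this stand-in, the single DG8 7 sighting scores well below the 66-out-of-66 Huddersfield association even though both have a 100% share, which is the "right but low confidence" behaviour the slide points out.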
30. Summary
● Built most of the supporting platform
○ parsing free text / messy addresses
○ collaborative loading of data
○ providing downloads, search & URL identity
○ recording provenance & assigning confidence
○ using inference to fill in gaps
● We have low numbers of addresses currently
○ but the right mechanisms to add more
○ and many potential partners
31. What next?
● Building the platform
● Building the community of collaborators
● Building services to aid cross-subsidy
● Increasing quantity & quality of addresses
● Can anyone else reuse the technology?
● Can anyone else reuse the approach?
32. Any Questions?
@JeniT - jeni.tennison@openaddressesuk.org
https://openaddressesuk.org
info@openaddressesuk.org
@openaddressesuk
33. Open Addresses Ltd. is a new company being set
up to create and maintain an address database
for the UK that will be made available to the
public as Open Data. It will facilitate the
collaborative maintenance of the address
database with various stakeholders from the UK
Government, industry and non-profit.
Offices
Where?