Presentation from OpenTech, June 2015, London. Includes lessons from Open Addresses, analogies with cholera, and thoughts on what this means for data infrastructure.
2. Cholera
The last time I was in this building I went to
a talk on an early example of data analysis
and data visualisation.
John Snow famously traced a fatal cholera
epidemic in Soho in 1854 to a local water
pump.
Because of cholera in the pump the water
was not safe to use.
Read more about John Snow: http://en.wikipedia.org/wiki/John_Snow_%28physician%29
@peterkwells
3. Cholera and infrastructure
The Soho outbreak started at a
water pump, but it could have been a
water reservoir.
The cholera bacteria would
spread and contaminate the water
downstream. An entire set of
water infrastructure could have
been contaminated.
The water would not have been
safe to use. Yet water is essential
to life.
Image CC-BY-2.0 by Woodley Wonderworks: https://www.flickr.com/photos/wwworks/
4. Safe water
As a society we invest in water
infrastructure. We have:
- inspections
- alerting systems
- purification
- education
We put more focus at the top of the
infrastructure, on water producers and
distributors, than we do on water users.
The goal is to make water that’s safe
for people to use.
A Doctor from the World Health Organisation
6. Open Addresses
Organisations have to buy lists of UK addresses. Licensing is complicated, the
quality isn’t great and the data doesn’t meet all the needs.
It’s hard to build new services.
Open Addresses explored whether it was possible to build a new UK address list,
to make things simpler and make addresses more widely used.
7. Addressing needs
Denmark saw a 1000% increase in the number of organisations using address data by
making that data simpler to use.
We discovered other needs and benefits:
- people who move into new houses need their addresses to be published faster
- people name their houses and need other people to know about it
- people need it to be easier to enter addresses on websites
- (I could go on…)
More and better services that would make life
a little bit easier
8. Getting addresses
As well as understanding the needs we had to find data.
There are between 26 and 40 million addresses in the UK.
The Land Registry publishes over 18 million addresses in the Price Paid Dataset.
Sounds great!
Aside: we also did some neat stuff on mathematical inference for addresses.
Check out www.openaddressesuk.org...
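A rough sketch of what inference for addresses might look like. This is an illustrative assumption, not a description of the actual Open Addresses method: it assumes the common UK convention of odd house numbers on one side of a street and even on the other, and the function name infer_between is hypothetical. An inferred address is only a guess that still needs confirming.

```python
def infer_between(known_numbers):
    """Guess house numbers lying between known neighbours on the same side of a street.

    Assumption (not from the talk): odd and even numbers sit on opposite sides,
    so gaps are only filled between numbers of the same parity.
    """
    inferred = set()
    for parity in (0, 1):
        # Known numbers on one side of the street, in order.
        side = sorted(n for n in set(known_numbers) if n % 2 == parity)
        # Fill every gap between consecutive known numbers on that side.
        for low, high in zip(side, side[1:]):
            inferred.update(range(low + 2, high, 2))
    return sorted(inferred)


# Known: 1, 5 and 9 on the odd side; 2 and 8 on the even side.
print(infer_between([1, 5, 9, 2, 8]))
```

Nothing is inferred beyond the lowest or highest known number on each side, which keeps the guesses conservative.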
9. Land Registry says no...
Image from Owen Boswarva: http://mapgubbins.tumblr.com/post/107499166390/it-was-all-a-dream-land-registrys-price-paid
10. Third Party Rights are complex and can be fatal
Address datasets can include third-party database rights:
1. if the data was directly copied from an existing address database
2. if an existing list of addresses (obtained through another route) was corrected or
validated based on an existing address database
Unauthorised use of third party rights creates risk for both data publishers and
consumers.
The service can simply… stop.
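The two routes above can be sketched as a check over a record's provenance history. This is a minimal illustration under stated assumptions, not how Open Addresses implemented anything: the class names, the action labels and the source_is_open flag are all hypothetical, and a real decision needs legal review rather than a boolean.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical action labels: the two routes on the slide are "copied" (route 1)
# and "corrected" / "validated" (route 2).
RISKY_ACTIONS = {"copied", "corrected", "validated"}


@dataclass
class ProvenanceStep:
    action: str           # e.g. "collected", "copied", "corrected", "validated"
    source: str           # where the data, or the reference list, came from
    source_is_open: bool  # True if that source is openly licensed


@dataclass
class AddressRecord:
    address: str
    history: List[ProvenanceStep] = field(default_factory=list)


def may_carry_third_party_rights(record: AddressRecord) -> bool:
    """Flag a record if any step copied from, or was checked against, a non-open source."""
    return any(
        step.action in RISKY_ACTIONS and not step.source_is_open
        for step in record.history
    )


clean = AddressRecord(
    "1 Example Street",
    [ProvenanceStep("collected", "resident submission", True)],
)
tainted = AddressRecord(
    "2 Example Street",
    [
        ProvenanceStep("collected", "web form", True),
        ProvenanceStep("validated", "licensed address product", False),
    ],
)
```

The second record shows why this is hard: the address was collected cleanly, yet one validation step later in its history is enough to flag it.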
11. Third party rights, they’re everywhere!
As we inspected other datasets we saw similar issues with unauthorised rights:
- websites for data capture that used third party address products
- datasets that had been cleansed with third party address products
- a clean website followed by automated back-end validation
Even with submission guidelines, provenance tracking and takedown policies, the legal
position for Open Addresses was really complex.
We made a :(
12. Lightbulb
It is complicated to determine if unauthorised third party rights
exist. You need to inspect the data and how it was produced.
Image by Richard Rutter: https://www.flickr.com/photos/clagnut/
13. Safe water - a reprise
As a society we invest in water
infrastructure:
- inspections
- alerting systems
- purification
- education
We put more focus at the top of the
infrastructure, on water producers and
distributors, than we do on water users.
The goal is to make water that’s safe
for people to use.
Image CC-BY-2.0 by Woodley Wonderworks: https://www.flickr.com/photos/wwworks/
A Doctor from the World Health Organisation
14. Digital cholera
Copyright is a good thing (don’t believe me? ask a musician) so I’m using a harsh metaphor, but
the metaphor is useful.
Don’t take away
my copyright!
15. Digital cholera
The water may be infected with
cholera.
Therefore we inspect it to see if
the water is safe to use.
Land Registry address data may
be infected with digital cholera.
Therefore we inspect it to see if
the data is safe to use.
We learnt it wasn’t, so we didn’t…
16. Digital cholera
Not just about unauthorised third party rights.
Inappropriate releases of personal data.
Incomplete data.
Incorrect data.
Remember it’s a metaphor.
19. Purification?
Tricky. There is no equivalent of a purification tablet.
We need to cleanse data infrastructure of digital cholera or we need to rebuild it.
It is simplest if the data is kept pure by whoever creates and maintains it.
Just as with water.
20. Education
The ODI already have a wealth of education material and are including the thinking and
learning from Open Addresses in some future work:
Send your ideas here: http://theodi.org/who-owns-our-data-infrastructure?
21. Water is essential to life so we invest in
maintaining our water infrastructure to make
water safe to use.
Data gives us more and better services. It is
essential to life. We need to invest in
maintaining useful data infrastructure to make
data safe to use.
22. Image by Don Graham: https://www.flickr.com/photos/23155134@N06/
If we don’t look after our
data infrastructure we risk
simply ending up with
some rusty and unused
data pumps…