Joining Data Together
1
John Murray
@MurrayData
Joining Data Together
2
• The real value of data is not the data itself, but the insights
derived from it.
• To achieve maximum economic benefit from data, disparate
sources need to be joined:
• Appending socio-demographic data to a customer
database for marketing insights.
• Merging crime data with benefits and deprivation data
to analyse causes of crime.
• Joining NHS mortality and prescribing data to census
data to examine factors in poor health.
• Geography is a common "currency" in much Open Data
which allows us to join it.
Geography Types in Open Data
3
• Census geography.
• Output areas.
• Administrative geography.
• Local authorities, regions, NHS areas, Police Forces.
• Political geography.
• Electoral wards, Parliamentary constituencies.
• Postal geography.
• Postcodes, sectors, areas, districts.
• Unstructured geography.
• Spatial points.
• Bespoke catchments, e.g. retail stores.
Census Geography
4
• Hierarchy of published area statistics.
• Output Area (OA)
• 40-250 households.
• Lower Super Output Area (LSOA)
• 400-1200 households.
• Middle Super Output Area (MSOA)
• 2000-6000 households.
• Links to administrative geography
• Open data geography tables:
• ONS Postcode Directory (ONSPD)
• National Statistics Postcode Lookup (NSPL)
Administrative and Political Geography
5
• Local Authorities.
• District.
• County.
• Metropolitan Boroughs and Unitary Councils.
• Parish and Town Councils.
• Parliamentary Constituencies.
• Government Regions.
• NHS.
• Police Forces.
• Environment Agency Regions.
• Links Provided in ONSPD and NSPL.
Postal Geography
6
• Based around the postcode.
• Introduced in 1959 on a trial basis.
• Current UK system in use since 1967.
• Designed for the purpose of efficient delivery of
mail.
• Doesn't align exactly with Census and
Administrative Geography.
• 1.8 million postcodes currently in use.
• Mean number of "delivery points" is 14.
Anatomy Of A Postcode
7
CH1 2HS
• CH – Postcode Area
• CH1 – Postcode District
• CH1 2 – Postcode Sector
• CH1 2HS – Postcode
• "HS" is called the walk.
• CH1 referred to as the Out code
• 2HS is referred to as the In code
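The breakdown above can be sketched in a few lines of Python. This is a minimal illustration, not a validator: the regular expression is a deliberately loose assumption and the function name `split_postcode` is made up for this example.

```python
import re

# Loose illustrative pattern (an assumption, not the full Royal Mail rules):
# 1-2 letter area, district digit plus optional letter/digit,
# then a sector digit and a two-letter unit.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2})(\d[A-Z\d]?)\s*(\d)([A-Z]{2})$")


def split_postcode(postcode: str) -> dict:
    """Split a UK postcode into the components named on the slide."""
    match = POSTCODE_RE.match(postcode.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised postcode: {postcode!r}")
    area, district_suffix, sector_digit, unit = match.groups()
    outcode = area + district_suffix
    return {
        "area": area,                                  # CH
        "district": outcode,                           # CH1
        "sector": f"{outcode} {sector_digit}",         # CH1 2
        "postcode": f"{outcode} {sector_digit}{unit}", # CH1 2HS
        "outcode": outcode,                            # CH1
        "incode": sector_digit + unit,                 # 2HS
    }


parts = split_postcode("ch1 2hs")
print(parts)
```

Normalising case and spacing before splitting is a design choice that also helps when the postcode is later used as a join key.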
Postcode Facts
8
• Postcode: mean of 14 delivery points.
• Postcode sector: mean of 2,530 delivery points.
• Postcode district: mean of 9,080 delivery points.
• Postcode area: mean of 200,000 delivery points.
• 26 million delivery points in UK.
• Ordnance Survey Code-Point Open, ONSPD and
NSPL contain grid references for postcode
centroids.
Joining Data
9
• In most cases, use ONSPD
• Although approximate, good enough for most uses.
• Political and public sector, use NSPL
• Specifically designed for that purpose.
• Use postcode to join data.
• Can join individual/household data.
• Augment existing data, e.g. customer database
• Customer demographic profiling.
• Store catchment analysis.
• Join open and closed data sources.
• Common in many open data sources.
• Links easily to other levels of geography.
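As a rough sketch of a postcode join, the example below appends geography codes from a lookup table to customer records. All rows, codes and field names are invented for illustration; they are not the real ONSPD or NSPL column names, and a real lookup would be loaded from the published file.

```python
def normalise(postcode: str) -> str:
    """Strip spaces and upper-case so differently formatted postcodes match."""
    return "".join(postcode.split()).upper()


# Hypothetical lookup rows standing in for a postcode directory extract.
lookup_rows = [
    {"postcode": "CH1 4BA", "lsoa": "LSOA_A", "ward": "Ward 1"},
    {"postcode": "CH1 4BJ", "lsoa": "LSOA_B", "ward": "Ward 2"},
]

# Hypothetical customer database with inconsistently formatted postcodes.
customers = [
    {"id": 1, "postcode": "ch14ba"},
    {"id": 2, "postcode": "CH1 4BJ"},
    {"id": 3, "postcode": "ZZ9 9ZZ"},  # no match in the lookup
]

# Index the lookup by normalised postcode for constant-time matching.
lookup = {normalise(row["postcode"]): row for row in lookup_rows}


def append_geography(records, lookup):
    """Left join: keep every record, filling None where no postcode matches."""
    joined = []
    for record in records:
        geo = lookup.get(normalise(record["postcode"]))
        joined.append({
            **record,
            "lsoa": geo["lsoa"] if geo else None,
            "ward": geo["ward"] if geo else None,
        })
    return joined


joined = append_geography(customers, lookup)
for row in joined:
    print(row)
```

A left join is used here so that unmatched customers stay visible rather than silently dropping out of the analysis.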
Joining Data
10
Geospatial Data in Databases
11
• Spatial data types
• Point (single point)
• Line (set of joined points e.g. road)
• Polygon (closed set of joined points e.g. boundary)
• Most databases support spatial data types
• Proprietary e.g. MS SQL Server, Oracle.
• Open source: MySQL, MariaDB, PostgreSQL (with PostGIS)
• NoSQL: Neo4j, MongoDB
• Spatial queries
• Contained in (point in polygon).
• Intersects (crosses).
• Distance (not supported by all).
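To show what a "contained in" query does, here is a minimal ray-casting sketch in plain Python. It is one common technique and not necessarily what any particular database uses internally; the boundary coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap round to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


# Hypothetical square boundary, coordinates in metres.
boundary = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]

print(point_in_polygon(500, 500, boundary))
print(point_in_polygon(1500, 500, boundary))
```

Points lying exactly on an edge are not handled specially in this sketch; production spatial libraries define that case explicitly.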
Example of Polygon Data
12
Distance Metrics
13
• Euclidean Distance
• “Crow flies” linear distance
• Graph Distance
• Road distance
• Manhattan Distance
• Rectilinear distance
• Great Circle
• Shortest distance between two points on the surface
of a sphere
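Three of these metrics can be written directly as formulas (graph distance needs a network, so it is left out here). The sketch below assumes planar coordinates in metres for the first two and a spherical Earth with a mean radius of 6371 km for the great-circle (haversine) distance; both are simplifying assumptions.

```python
import math


def euclidean(p, q):
    """Straight-line ("crow flies") distance between two planar points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def manhattan(p, q):
    """Rectilinear distance: sum of the absolute differences along each axis."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance on a sphere; inputs in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
print(great_circle_km(0.0, 0.0, 0.0, 90.0))
```

For any pair of points the Manhattan distance is at least the Euclidean distance and at most about 1.41 times it, which gives a quick sanity check on results.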
Euclidean Distance
14
• University of Chester to
Liverpool Airport.
• Euclidean distance 9.4
miles.
• Manhattan distance 11.1
miles.
• Graph distance (fastest)
24.5 miles.
• Used OS Strategi roads
OpenData and the A*
algorithm.
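A graph distance of this kind can be illustrated with a small A* search. The network below is entirely hypothetical (it is not OS data), link costs are assumed to be straight link lengths in metres, and the heuristic is the straight-line distance to the goal. A real "fastest route" would more likely weight links by travel time.

```python
import heapq
import math

# Hypothetical junction coordinates in metres; there is no direct A-G link,
# mimicking an obstacle such as a river between start and goal.
nodes = {
    "A": (0, 0), "B": (3000, 0), "C": (3000, 4000),
    "E": (-6000, 2000), "G": (0, 4000),
}
edges = {
    "A": ["B", "E"], "B": ["A", "C"], "C": ["B", "G"],
    "E": ["A", "G"], "G": ["C", "E"],
}


def straight_line(a, b):
    """Euclidean distance between two named nodes."""
    (x1, y1), (x2, y2) = nodes[a], nodes[b]
    return math.hypot(x2 - x1, y2 - y1)


def a_star(start, goal):
    """Return (path length, path) using straight-line distance as the heuristic."""
    # Queue entries: (estimated total cost, cost so far, node, path taken).
    queue = [(straight_line(start, goal), 0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        _, cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, math.inf):
            continue  # a cheaper route to this node was already found
        for nxt in edges[node]:
            new_cost = cost + straight_line(node, nxt)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                estimate = new_cost + straight_line(nxt, goal)
                heapq.heappush(queue, (estimate, new_cost, nxt, path + [nxt]))
    return math.inf, []


cost, path = a_star("A", "G")
print(cost, path)
print(straight_line("A", "G"))
```

In this toy network the route length is 10,000 m against a straight-line distance of 4,000 m, the same kind of gap the Chester to Liverpool Airport figures show.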
Non-Formal Unstructured Geography
15
• Micro geo-centric analysis
• Describe neighbourhood
• Point based data
• Relate to formal geography through boundaries.
• User defined
• Store catchments
• Sales territories
• Radial/drive time
Point Based Data
16
• The simplest type of spatial object.
• Represents a point relative to the Earth's surface.
• Has at least 2 values for coordinates.
• May optionally have an elevation z value in some
systems.
• Ordnance Survey grid references are Cartesian
Coordinates, in metres, east and north of origin
point.
Converting Between Systems
17
• Use GIS software or conversion software.
• Scripts freely downloadable from Ordnance Survey and
others.
• Ordnance Survey provide comprehensive guides and
resources to write your own scripts.
• Unfortunately, it ISN’T as straightforward as using a
formula.
• Need to take into account tectonic shifts and historic
inaccuracies in surveying.
• OS provides a dataset of shifts to do this.
Geocentric analysis
18
• Use point as centre.
• Use Euclidean distance to aggregate metrics.
• Standardise units.
• Example – population density at postcode level:
• Census Postcode Estimates
• Ordnance Survey Code-Point Open
• Join the datasets.
• Sum the counts within specified radius.
• Convert to standardised unit e.g. people/hectare
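The steps above can be sketched as follows, assuming the two datasets have already been joined into records of (easting, northing, population). The records here are invented for illustration, and the density is simply the summed count divided by the area of the circle, with 1 hectare = 10,000 square metres.

```python
import math

# Hypothetical joined records: (easting_m, northing_m, population).
postcode_points = [
    (0, 0, 50),
    (100, 0, 30),
    (0, 300, 20),
    (600, 0, 100),
]


def density_per_hectare(centre, points, radius_m):
    """Sum population within radius_m of centre and divide by the circle's area in hectares."""
    cx, cy = centre
    total = sum(
        pop for x, y, pop in points
        if math.hypot(x - cx, y - cy) <= radius_m
    )
    area_ha = math.pi * radius_m ** 2 / 10_000
    return total / area_ha


for radius in (250, 500, 1000):
    print(radius, round(density_per_hectare((0, 0), postcode_points, radius), 2))
```

Repeating the calculation for several radii around each postcode centroid gives one density column per radius, which is the shape of a multi-radius results table.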
Geocentric analysis
19
Geocentric analysis
20
Postcode 1km 750m 500m 250m 100m Easting Northing
CH1 4BA 17.24 15.71 17.01 21.13 12.41 340282 367773
CH1 4BB 16.26 14.56 14.23 16.04 20.05 340178 367782
CH1 4BD 16.01 13.43 14.48 15.48 20.05 340143 367784
CH1 4BE 7.72 6.85 5.67 4.32 8.91 339532 368352
CH1 4BF 14.26 12.58 13.58 16.9 13.36 340105 367827
CH1 4BG 14.52 15.36 11.45 9.47 25.78 339790 367448
CH1 4BH 12.84 11.53 8.25 9.37 5.09 339647 367485
CH1 4BJ 19.21 24.05 23.49 23.73 8.91 340104 367217
CH1 4BL 12.84 11.53 8.25 9.37 5.09 339647 367485
CH1 4BN 15.95 17.04 20.1 23.37 36.92 339857 367238
CH1 4BP 13.64 10.3 9.51 14.26 19.73 339982 367851
CH1 4BQ 14.89 16.43 13.2 11.2 25.78 339834 367446
CH1 4BR 18.56 23.03 26.61 24.95 8.91 340059 367161
CH1 4BS 14.64 17.14 21.11 21.13 28.32 339806 367194
CH1 4BT 15.01 15.83 14.03 10.13 41.06 339791 367360
CH1 4BU 15.58 16.85 17.22 16.75 35.01 339827 367314
CH1 4BW 14.87 16.89 17.46 21.95 46.15 339813 367264
CH1 4BX 18.48 16.17 20 21.03 25.46 340323 367747
CH1 4BY 16.06 17.8 22.62 37.84 54.43 339843 367069
CH1 4BZ 15.22 17.29 21.83 33.97 41.38 339808 367083
CH1 4DA 13.37 12.59 10.42 15.53 15.59 340163 367955
CH1 4DB 12.94 9.84 6.48 6.51 18.46 339621 367567
CH1 4DD 14.59 16.39 20.99 35.54 41.38 339768 367073
CH1 4DE 16.15 18.22 27.15 53.62 9.54 339871 366882
CH1 4DF 13.31 14.62 19.21 32.08 27.69 339715 367101
CH1 4DG 14.21 15.69 23.1 41.6 49.01 339759 366931
CH1 4DH 16.34 18.96 28.95 40.64 28.96 339934 366769
CH1 4DJ 15.58 17.63 24.8 53.27 34.37 339837 366904
CH1 4DN 11.4 12.44 16.5 18.84 33.42 339585 366986
CH1 4DP 12.25 13.99 18.69 24.64 21.64 339644 367016
CH1 4DR 25.61 29.57 30.96 26.94 10.5 340393 366940
Wirral Population Density
21
Wirral Anti-Depressant Prescribing
22
Chester Postcode Crime Density (500m)
23
INSPIRE Directive
24
• INfrastructure for Spatial InfoRmation in Europe.
• EU Directive since May 2007.
• Lays down framework for spatial information.
• Aim is to ensure compatibility and usability across
member states.
• Interoperability of spatial datasets.
• Metadata standards.
• Ordnance Survey OpenData.
• Land Registry Cadastral Polygons.
INSPIRE Example – Land Registry Cadastral Polygons
25
Street Level Data
26
• Use proximity to street geometry to link
attributes.
• Interrelation between features.
• Inference of addresses.
• Describe local neighbourhood.
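One way to link a point to street geometry by proximity is to find the nearest street segment. The sketch below uses the standard point-to-segment distance; the street names and coordinates are hypothetical and each street is simplified to a list of straight segments.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b (planar coordinates)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: a == b
    # Project p onto the line, then clamp to the segment's end points.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


# Hypothetical street geometry in metres: name -> list of (start, end) segments.
streets = {
    "High Street": [((0, 0), (100, 0))],
    "Mill Lane": [((0, 50), (0, 150))],
}


def nearest_street(p, streets):
    """Return (distance, street name) for the street closest to point p."""
    return min(
        (min(distance_to_segment(p, a, b) for a, b in segments), name)
        for name, segments in streets.items()
    )


result = nearest_street((40, 10), streets)
print(result)
```

Once a point is matched to its nearest street, attributes held against the street can be attached to the point, subject to a sensible maximum distance.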
Street Level Data Demo – OS OpenMap
27
Screenshot 1 - Roads
28
Screenshot 2 – Add railways
29
Screenshot 3 – Add buildings
30
Screenshot 4 – Add functional sites
31
Screenshot 5 – Add important buildings
32
Screenshot 6 – Add water
33
Screenshot 7 – Add postcode centroids
34
Screenshot 8 – Add INSPIRE polygons
35
Screenshot 9 – Add Census output areas
36
Screenshot 10 – Add proportion of 65+
37
Key: Red high, yellow average, blue low

Joining Disparate Data Sources for Insights

  • 1. Joining Data Together
      John Murray, @MurrayData
  • 2. Joining Data Together
      • The real value of data is not the data itself, but the insights derived from it.
      • To achieve maximum economic benefit from data, disparate sources need to be joined:
          • Appending socio-demographic data to a customer database for marketing insights.
          • Merging crime data with benefits and deprivation data to analyse causes of crime.
          • Joining NHS mortality and prescribing data to census data to examine factors in poor health.
      • Geography is a common "currency" in much Open Data which allows us to join it.
  • 3. Geography Types in Open Data
      • Census geography.
          • Output areas.
      • Administrative geography.
          • Local authorities, regions, NHS areas, Police Forces.
      • Political geography.
          • Electoral wards, Parliamentary constituencies.
      • Postal geography.
          • Postcodes, sectors, areas, districts.
      • Unstructured geography.
          • Spatial points.
          • Bespoke catchments, e.g. retail stores.
  • 4. Census Geography
      • Hierarchy of published area statistics.
          • Output Area (OA): 40-250 households.
          • Lower Super Output Area (LSOA): 400-1200 households.
          • Middle Super Output Area (MSOA): 2000-6000 households.
      • Links to administrative geography.
      • Open data geography tables:
          • ONS Postcode Directory (ONSPD)
          • National Statistics Postcode Lookup (NSPL)
  • 5. Administrative and Political Geography
      • Local Authorities.
          • District.
          • County.
          • Metropolitan Boroughs and Unitary Councils.
          • Parish and Town Councils.
      • Parliamentary Constituencies.
      • Government Regions.
      • NHS.
      • Police Forces.
      • Environment Agency Regions.
      • Links provided in ONSPD and NSPL.
  • 6. Postal Geography
      • Based around the postcode.
      • Introduced in 1959 on a trial basis.
      • Current UK system in use since 1967.
      • Designed for the purpose of efficient delivery of mail.
      • Doesn't align exactly with Census and Administrative Geography.
      • 1.8 million postcodes currently in use.
      • Mean number of "delivery points" is 14.
  • 7. Anatomy Of A Postcode
      Example: CH1 2HS
      • CH – Postcode Area
      • CH1 – Postcode District
      • CH1 2 – Postcode Sector
      • CH1 2HS – Postcode
      • "HS" is called the walk.
      • CH1 is referred to as the out code.
      • 2HS is referred to as the in code.
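
The breakdown above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `split_postcode` and the regular expression are assumptions, and the pattern only covers the common postcode shapes, not every special case. The sector is taken to be the out code plus the first character of the in code.

```python
import re

# Assumed pattern: area letters, district characters, sector digit, two-letter walk.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2})([0-9][0-9A-Z]?)\s*([0-9])([A-Z]{2})$")


def split_postcode(postcode: str) -> dict:
    """Split a UK postcode into area, district, sector, walk, out code and in code."""
    match = POSTCODE_RE.match(postcode.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised postcode format: {postcode!r}")
    area, district_part, sector_digit, walk = match.groups()
    outcode = area + district_part
    return {
        "area": area,                          # e.g. CH
        "district": outcode,                   # e.g. CH1
        "sector": f"{outcode} {sector_digit}",  # e.g. CH1 2
        "walk": walk,                          # e.g. HS
        "outcode": outcode,
        "incode": f"{sector_digit}{walk}",
        "postcode": f"{outcode} {sector_digit}{walk}",
    }


parts = split_postcode("ch1 2hs")
print(parts["area"], parts["district"], parts["sector"], parts["incode"])
# prints: CH CH1 CH1 2 2HS
```

Normalising case and spacing before splitting matters in practice, because postcodes in customer data are rarely formatted consistently.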
  • 8. Postcode Facts
      • Postcode mean 14 delivery points.
      • Postcode sector mean 2530 delivery points.
      • Postcode district mean 9080 delivery points.
      • Postcode area mean 200,000 delivery points.
      • 26 million delivery points in UK.
      • Ordnance Survey Codepoint Open, ONSPD and NSPL contain grid references for postcode centroids.
  • 9. Joining Data
      • In most cases, use ONSPD.
          • Although approximate, good enough for most uses.
      • Political and public sector, use NSPL.
          • Specifically designed for that purpose.
      • Use postcode to join data.
          • Can join individual/household data.
      • Augment existing data, e.g. customer database.
          • Customer demographic profiling.
          • Store catchment analysis.
      • Join open and closed data sources.
          • Common in many open data sources.
          • Links easily to other levels of geography.
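
A postcode join of the kind described above might look like the following sketch. It uses plain Python dictionaries rather than a real ONSPD or NSPL extract, and the column names (`postcode`, `lsoa`, `ward`) and all sample values are hypothetical placeholders, not real codes from either product.

```python
def normalise(postcode: str) -> str:
    """Remove spaces and upper-case, so 'ch1 2hs' and 'CH12HS' compare equal."""
    return "".join(postcode.split()).upper()


def append_geography(records, lookup_rows, fields=("lsoa", "ward")):
    """Left-join lookup fields onto each record by normalised postcode.

    Records with no match keep their data and get None for the new fields.
    """
    index = {normalise(row["postcode"]): row for row in lookup_rows}
    joined = []
    for record in records:
        match = index.get(normalise(record["postcode"]))
        extra = {field: (match[field] if match else None) for field in fields}
        joined.append({**record, **extra})
    return joined


# Hypothetical customer rows and a hypothetical postcode lookup extract.
customers = [
    {"customer_id": 1, "postcode": "ch1 2hs"},
    {"customer_id": 2, "postcode": "CH1 4BA"},
    {"customer_id": 3, "postcode": "ZZ9 9ZZ"},
]
lookup = [
    {"postcode": "CH1 2HS", "lsoa": "LSOA_A", "ward": "WARD_X"},
    {"postcode": "CH14BA", "lsoa": "LSOA_B", "ward": "WARD_Y"},
]

for row in append_geography(customers, lookup):
    print(row)
```

Keeping unmatched records (a left join) makes it easy to count how many postcodes failed to match, which is a useful quality check before any analysis.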
  • 11. Geospatial Data in Databases
      • Spatial data types
          • Point (single point)
          • Line (set of joined points, e.g. road)
          • Polygon (closed set of joined points, e.g. boundary)
      • Most databases support spatial data types
          • Proprietary, e.g. MS SQL Server, Oracle.
          • Open source: MySQL, MariaDB, PostgreSQL (with the PostGIS extension).
          • NoSQL: Neo4j, MongoDB.
      • Spatial queries
          • Contained in (point in polygon).
          • Intersects (crosses).
          • Distance (not supported by all).
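
To illustrate what a "contained in" query does, here is a small sketch of the ray-casting point-in-polygon test in plain Python. It is only an illustration of the idea: spatial databases use their own indexed implementations, and the function name and sample square are assumptions made for this example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a ray going right from (x, y) crosses.

    polygon is a list of (x, y) vertices; the last vertex joins back to the first.
    An odd number of crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical 10 x 10 boundary.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, square))   # True
print(point_in_polygon(15, 5, square))  # False
```

Points lying exactly on a boundary are an edge case that this sketch does not treat specially.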
  • 13. Distance Metrics
      • Euclidean Distance
          • “Crow flies” linear distance
      • Graph Distance
          • Road distance
      • Manhattan Distance
          • Rectilinear distance
      • Great Circle
          • Shortest distance between two points on the surface of a sphere
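
Three of these metrics have simple closed forms, sketched below. The function names are illustrative, the great-circle function uses the haversine formula, and the mean Earth radius of 6371 km is an assumed value. Graph distance needs a road network and a shortest-path search, so it is not covered here.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two planar points (x, y)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def manhattan(p, q):
    """Rectilinear distance: sum of the absolute differences along each axis."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance on a sphere; inputs in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to guard against rounding slightly above 1.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

Euclidean and Manhattan distance only make sense on projected coordinates such as Ordnance Survey eastings and northings; latitude and longitude need the great-circle form.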
  • 14. Euclidean Distance
      • University of Chester to Liverpool Airport.
      • Euclidean distance 9.4 miles.
      • Manhattan distance 11.1 miles.
      • Graph distance (fastest) 24.5 miles.
      • Used OS Strategy Roads Opendata and A* algorithm.
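
The slide does not show how the graph distance was computed beyond naming A*. As a rough sketch of the technique, the following runs A* over a tiny hypothetical road graph, using Euclidean distance to the goal as the heuristic. The node names, coordinates and edge lengths are invented for illustration, and edges are weighted by length rather than travel time.

```python
import heapq
import math


def a_star(graph, coords, start, goal):
    """A* shortest path. graph maps node -> {neighbour: edge_length}.

    The heuristic is straight-line distance to the goal, which never
    overestimates as long as no edge is shorter than the straight line.
    """
    def heuristic(node):
        return math.dist(coords[node], coords[goal])

    frontier = [(heuristic(start), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, length in graph.get(node, {}).items():
            new_cost = cost + length
            if new_cost < best_cost.get(neighbour, math.inf):
                best_cost[neighbour] = new_cost
                priority = new_cost + heuristic(neighbour)
                heapq.heappush(frontier, (priority, new_cost, neighbour, path + [neighbour]))
    return math.inf, []


# Hypothetical network: the direct A-C road is winding (length 9),
# so going via B (3 + 4) is shorter.
coords = {"A": (0, 0), "B": (3, 0), "C": (3, 4)}
graph = {
    "A": {"B": 3.0, "C": 9.0},
    "B": {"A": 3.0, "C": 4.0},
    "C": {"A": 9.0, "B": 4.0},
}
print(a_star(graph, coords, "A", "C"))  # (7.0, ['A', 'B', 'C'])
```

In this toy network the Euclidean distance from A to C is 5 and the graph distance is 7, the same kind of gap the Chester to Liverpool Airport example shows.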
  • 15. Non-Formal Unstructured Geography
      • Micro geo-centric analysis
          • Describe neighbourhood
          • Point based data
          • Relate to formal geography through boundaries.
      • User defined
          • Store catchments
          • Sales territories
          • Radial/drive time
  • 16. Point Based Data
      • The simplest type of spatial object.
      • Represents a point relative to the Earth's surface.
      • Has at least 2 values for coordinates.
      • May optionally have an elevation z value in some systems.
      • Ordnance Survey grid references are Cartesian coordinates, in metres, east and north of an origin point.
  • 17. Converting Between Systems
      • Use GIS software or conversion software.
      • Scripts freely downloadable from Ordnance Survey and others.
      • Ordnance Survey provide comprehensive guides and resources to write your own scripts.
      • Unfortunately, it ISN’T as straightforward as using a formula.
      • Need to take into account tectonic shifts and historic inaccuracies in surveying.
      • OS provides a dataset of shifts to do this.
  • 18. Geocentric analysis
      • Use point as centre.
      • Use Euclidean distance to aggregate metrics.
      • Standardise units.
      • Example – population density at postcode level:
          • Census Postcode Estimates
          • Ordnance Survey Code-Point Open
          • Join the datasets.
          • Sum the counts within specified radius.
          • Convert to standardised unit, e.g. people/hectare.
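
The population density example can be sketched as follows, assuming the two datasets have already been joined into `(easting, northing, population)` tuples in metres. The function name and the sample points are invented for illustration and are not real census figures.

```python
import math


def density_per_hectare(centre, points, radius_m):
    """People per hectare within radius_m of centre.

    centre is (easting, northing); points is an iterable of
    (easting, northing, population) tuples, all in metres.
    """
    total = sum(
        population
        for easting, northing, population in points
        if math.hypot(easting - centre[0], northing - centre[1]) <= radius_m
    )
    area_hectares = math.pi * radius_m ** 2 / 10_000  # 1 hectare = 10,000 square metres
    return total / area_hectares


# Invented postcode centroids with population counts.
sample_points = [
    (340000, 367000, 100),  # at the centre
    (340300, 367400, 50),   # exactly 500 m away
    (342000, 367000, 999),  # 2 km away, outside the circle
]
print(round(density_per_hectare((340000, 367000), sample_points, 500), 2))
```

Dividing by the circle's area is what makes values for different radii comparable with each other.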
  • 20. Geocentric analysis

      Postcode   1km    750m   500m   250m   100m   Easting  Northing
      CH1 4BA    17.24  15.71  17.01  21.13  12.41  340282   367773
      CH1 4BB    16.26  14.56  14.23  16.04  20.05  340178   367782
      CH1 4BD    16.01  13.43  14.48  15.48  20.05  340143   367784
      CH1 4BE    7.72   6.85   5.67   4.32   8.91   339532   368352
      CH1 4BF    14.26  12.58  13.58  16.9   13.36  340105   367827
      CH1 4BG    14.52  15.36  11.45  9.47   25.78  339790   367448
      CH1 4BH    12.84  11.53  8.25   9.37   5.09   339647   367485
      CH1 4BJ    19.21  24.05  23.49  23.73  8.91   340104   367217
      CH1 4BL    12.84  11.53  8.25   9.37   5.09   339647   367485
      CH1 4BN    15.95  17.04  20.1   23.37  36.92  339857   367238
      CH1 4BP    13.64  10.3   9.51   14.26  19.73  339982   367851
      CH1 4BQ    14.89  16.43  13.2   11.2   25.78  339834   367446
      CH1 4BR    18.56  23.03  26.61  24.95  8.91   340059   367161
      CH1 4BS    14.64  17.14  21.11  21.13  28.32  339806   367194
      CH1 4BT    15.01  15.83  14.03  10.13  41.06  339791   367360
      CH1 4BU    15.58  16.85  17.22  16.75  35.01  339827   367314
      CH1 4BW    14.87  16.89  17.46  21.95  46.15  339813   367264
      CH1 4BX    18.48  16.17  20     21.03  25.46  340323   367747
      CH1 4BY    16.06  17.8   22.62  37.84  54.43  339843   367069
      CH1 4BZ    15.22  17.29  21.83  33.97  41.38  339808   367083
      CH1 4DA    13.37  12.59  10.42  15.53  15.59  340163   367955
      CH1 4DB    12.94  9.84   6.48   6.51   18.46  339621   367567
      CH1 4DD    14.59  16.39  20.99  35.54  41.38  339768   367073
      CH1 4DE    16.15  18.22  27.15  53.62  9.54   339871   366882
      CH1 4DF    13.31  14.62  19.21  32.08  27.69  339715   367101
      CH1 4DG    14.21  15.69  23.1   41.6   49.01  339759   366931
      CH1 4DH    16.34  18.96  28.95  40.64  28.96  339934   366769
      CH1 4DJ    15.58  17.63  24.8   53.27  34.37  339837   366904
      CH1 4DN    11.4   12.44  16.5   18.84  33.42  339585   366986
      CH1 4DP    12.25  13.99  18.69  24.64  21.64  339644   367016
      CH1 4DR    25.61  29.57  30.96  26.94  10.5   340393   366940
  • 23. Chester Postcode Crime Density (500m)
  • 24. INSPIRE Directive
      • INfrastructure for Spatial InfoRmation in Europe.
      • EU Directive since May 2007.
      • Lays down framework for spatial information.
      • Aim is to ensure compatibility and usability across member states.
          • Interoperability of spatial datasets.
          • Metadata standards.
      • Ordnance Survey Opendata.
      • Land Registry Cadastral Polygons.
  • 25. INSPIRE Example – Land Registry Cadastral Polygons
  • 26. Street Level Data
      • Use proximity to street geometry to link attributes.
      • Interrelation between features.
      • Inference of addresses.
      • Describe local neighbourhood.
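
One way to link a point to street geometry by proximity is to measure its distance to every street segment and keep the nearest. The sketch below assumes streets are stored as lists of `(x, y)` vertices in metres. The street names and coordinates are invented, and a real dataset would need a spatial index rather than this brute-force scan.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_street(point, streets):
    """Return the name of the street whose polyline is closest to point."""
    def distance_to(vertices):
        return min(
            point_segment_distance(point, a, b)
            for a, b in zip(vertices, vertices[1:])
        )
    return min(streets, key=lambda name: distance_to(streets[name]))


# Invented street centre lines.
streets = {
    "High Street": [(0, 0), (10, 0)],
    "Mill Lane": [(0, 5), (0, 20)],
}
print(nearest_street((4, 1), streets))  # High Street
```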
  • 27. Street Level Data Demo – OS OpenMap
  • 28. Screenshot 1 – Roads
  • 29. Screenshot 2 – Add railways
  • 30. Screenshot 3 – Add buildings
  • 31. Screenshot 4 – Add functional sites
  • 32. Screenshot 5 – Add important buildings
  • 33. Screenshot 6 – Add water
  • 34. Screenshot 7 – Add postcode centroids
  • 35. Screenshot 8 – Add INSPIRE polygons
  • 36. Screenshot 9 – Add Census output areas
  • 37. Screenshot 10 – Add proportion of 65+ (key: red high, yellow average, blue low)