The document describes a project that uses k-means clustering to group local authorities in the UK that are statistically similar based on key metrics related to the UK government's 12 levelling up missions. The analysis found clusters of local authorities with higher/lower levels of health, well-being, connectivity, and educational performance. Future work may develop the analysis over multiple time periods and using additional datasets to understand outcomes based on demographic groupings. User feedback is sought on how similar groupings could best be utilized and presented.
ONS local presents clustering
1. James McCrae - Levelling Up Data
Analysis (LUDA) and Dissemination
Team
08/06/2023
Clustering local authorities
against similar themes –
why Newcastle Upon Tyne
and New Forest are closer
than you think!
2. Join our Slido!
You can use Slido to ask questions throughout the presentation. Later on we’ll be using it to find out
what you as our users would like to see us do next with this work.
To join, either:
1. Go to Slido.com and enter the event code #1666805
2. Click on this link - https://app.sli.do/event/jTgULusV7WpQTH3PcwVvBd
3. Scan the below QR code:
3. Project aims
• Find out which local authorities are performing
similarly to each other against the overall Levelling
Up metrics, and single Levelling Up missions
• Analyse the common characteristics of local
authorities within each cluster; what are the trends
in the demographic make-up of each cluster?
• Identify similar groups of local authorities for a more
nuanced comparison/control group for evaluating
the impact of new levelling up policies and to better
support place-based interventions
4. LUDA/X-Govt platform
• Cross-government levelling up data
collaboration platform powered by Google Cloud Platform
• Currently separate to the Integrated Data Service (IDS) but
using similar software. Long term we will migrate into IDS and
through IDS create access for LA colleagues in LUDA.
Main aims of using this platform:
• Develop suite of dashboards to support decision making
across Govt.
• Space for shared analysis and output production in VertexAI.
• Stores all levelling up data in BigQuery, including Explore
Subnational Statistics (ESS) data.
• Will allow Govt colleagues to respond rapidly to queries using
available data.
• Jupyter notebooks were used for the clustering project; the analysis was
run in Python and the code shared with team members via Git.
5. Data collection
• Data sourced from ONS Subnational Indicators Explorer which presents available metrics for the
12 levelling up missions.
• Sourced for ~40 different metrics, from a range of different Government departments.
• Publicly available – collaborated with departments to resolve any caveating/quality issues
• Data sourced at different geographical granularities, LA level used for this bulletin.
• Data ingested into LUDA/X-Govt as a CSV for use in the analysis
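As an illustration of the ingestion step, the sketch below reads a long-format CSV into an area-by-metric table and separates out areas with missing values. The column names (area, metric, value), the metric names and the sample rows are all hypothetical; the layout of the actual LUDA CSV is not described here.

```python
import csv
import io


def load_metric_table(csv_file, metrics):
    """Pivot long-format rows into {area: {metric: value}}.

    Column names are hypothetical, not the real LUDA schema.
    Returns (complete, incomplete): areas with every metric present,
    and a sorted list of areas missing at least one.
    """
    table = {}
    for row in csv.DictReader(csv_file):
        area_values = table.setdefault(row["area"], {})
        if row["value"].strip():  # a blank cell means the value is missing
            area_values[row["metric"]] = float(row["value"])
    complete = {a: v for a, v in table.items() if all(m in v for m in metrics)}
    incomplete = sorted(set(table) - set(complete))
    return complete, incomplete


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "area,metric,value\n"
    "Area A,gva_per_hour,30.5\n"
    "Area A,employment_rate,75.1\n"
    "Area B,gva_per_hour,28.0\n"
    "Area B,employment_rate,\n"
)
complete, incomplete = load_metric_table(sample, ["gva_per_hour", "employment_rate"])
print(sorted(complete), incomplete)
```

Keeping the incomplete areas in a separate list makes it easy to report which local authorities could not be assigned to a cluster.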
8. Levelling Up missions/metrics
• The Levelling Up White Paper sets out twelve missions that support key levelling up objectives
• Headline and supporting metrics were chosen to monitor progress against the missions.
• Mission 1 (Economy based) - By 2030, pay, employment and productivity will have risen in every area of
the UK, with each containing a globally competitive city, with the gap between the top performing and
other areas closing.
• Metrics we sourced from ONS Subnational Indicators Explorer:
• Gross Value Added (GVA) per hour worked
• Gross median weekly pay (£)
• Employment rate for 16 to 64 year olds
• Gross Disposable Household Income (GDHI)
9. K-means clustering
• K-means clustering identifies and groups similar data points within a
dataset together.
• The algorithm chooses a number (k) of random centroid points; each
data point is assigned to its closest centroid and the total distance
is stored.
• Each centroid is then moved to the centre of its cluster and the
distances are recalculated. This repeats until the centroids no
longer move, at which point the total length of the lines connecting
each point to its centroid (Euclidean distance) is at a (local)
minimum.
• Our model is set to minimise the Euclidean distance and to select
the optimal number of clusters between 4 and 15.
Image source: What Is K-means Clustering? | 365 Data Science
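The steps on this slide can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names, the number of random restarts and the toy points are assumptions, and the mean silhouette score is used here as one possible way of choosing k because the selection criterion used in the project is not stated. The toy data only support a small range of k, so the search runs over 2 to 5 rather than 4 to 15.

```python
import math
import random


def kmeans_once(points, k, rng, max_iter=100):
    """One run of the assign/update loop from k randomly chosen starting centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids have stopped moving
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia


def kmeans(points, k, n_init=50, seed=0):
    """Repeat from several random starts and keep the lowest-inertia run."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


def mean_silhouette(points, labels):
    """Average silhouette score (assumed selection criterion; higher is better)."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in dists.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: three tight groups of five made-up points each.
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
centres = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

scores = {k: mean_silhouette(points, kmeans(points, k)[0]) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Random restarts are included because a single run can settle on a poor local minimum, depending on where the starting centroids fall.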
10. K-means clustering
• Clustering models run for each LU mission and an
overall headline model.
• Metric data at LA level used for each mission
• ‘Negative’ metrics reversed to make radar plots
easier to interpret.
• E.g. smoking metric
• Extreme outliers removed
• E.g. travel time metrics for Isles of Scilly
• White space where data uses different LA
boundaries (2020 used)
Connectivity model with Isles of Scilly removed
Connectivity model with Isles of Scilly included
(cluster 0 is entirely Isles of Scilly)
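The preparation steps above might look like the sketch below. It drops areas with an extreme value on any metric, flips the sign of "negative" metrics so that higher always means better, and then standardises each metric. The outlier rule (distance from the median in units of the median absolute deviation), its threshold and the z-score standardisation are assumptions, since the deck does not say how these steps were done. All area names and values are made up.

```python
from statistics import mean, median, pstdev


def drop_outlier_areas(rows, metrics, threshold=10.0):
    """Remove areas whose value on any metric lies far from the median.

    Assumed rule: |value - median| / MAD > threshold.
    """
    flagged = set()
    for m in metrics:
        values = [r[m] for r in rows.values()]
        mid = median(values)
        mad = median([abs(v - mid) for v in values])
        if mad == 0:
            continue  # no spread, so nothing can be flagged on this metric
        flagged |= {la for la, r in rows.items() if abs(r[m] - mid) / mad > threshold}
    return {la: r for la, r in rows.items() if la not in flagged}


def align_and_scale(rows, metrics, negative_metrics):
    """Flip 'negative' metrics, then z-score each metric (scaling is assumed)."""
    out = {la: {} for la in rows}
    for m in metrics:
        sign = -1.0 if m in negative_metrics else 1.0
        values = [sign * r[m] for r in rows.values()]
        mu, sigma = mean(values), pstdev(values)
        for la, v in zip(rows, values):
            out[la][m] = (v - mu) / sigma if sigma else 0.0
    return out


# Made-up values for eight hypothetical areas.
travel = [20, 22, 25, 21, 23, 24, 22, 300]
smoking = [12, 15, 18, 11, 14, 16, 13, 14]
names = [f"Area {c}" for c in "ABCDEFG"] + ["Remote island"]
rows = {
    n: {"travel_time_mins": t, "smoking_pct": s}
    for n, t, s in zip(names, travel, smoking)
}

metrics = ["travel_time_mins", "smoking_pct"]
cleaned = drop_outlier_areas(rows, metrics)
scaled = align_and_scale(cleaned, metrics, negative_metrics={"smoking_pct"})
print(sorted(cleaned))
```

After the sign flip, the area with the highest smoking rate has the lowest score on that axis, which is what makes a radar plot read as "bigger is better" on every axis.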
11. Using lookups to explore characteristics for
each cluster
• Cross-tabulations with various demographic information to analyse the characteristics of local
authorities within each cluster.
• Adds additional insight into the clustering results showing the typical trends of local authorities
clustered together.
• Geographic, population and deprivation based lookups used:
• Urban/Rural classification (DEFRA)
• Indices of Multiple Deprivation (DLUHC)
• Region (ONS GeoPortal)
• Age (Census 2021)
• Population Density (Census 2021)
• Coastal (ONS & UK Parliament)
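A cross-tabulation of this kind can be sketched as follows: for each cluster, count the areas in each lookup category and express the counts as percentages. The function name, area names, cluster labels and categories are hypothetical.

```python
from collections import Counter, defaultdict


def crosstab_shares(cluster_of, lookup):
    """Percentage of each cluster's areas falling in each lookup category.

    cluster_of: {area: cluster label}; lookup: {area: category}.
    Areas missing from the lookup are skipped.
    """
    counts = defaultdict(Counter)
    for area, cluster in cluster_of.items():
        if area in lookup:
            counts[cluster][lookup[area]] += 1
    return {
        cluster: {cat: 100 * n / sum(cnt.values()) for cat, n in cnt.items()}
        for cluster, cnt in counts.items()
    }


# Made-up cluster assignments and an urban/rural style lookup.
cluster_of = {
    "Area A": 0, "Area B": 0, "Area C": 0, "Area D": 0,
    "Area E": 1, "Area F": 1,
}
urban_rural = {
    "Area A": "Rural", "Area B": "Rural", "Area C": "Rural", "Area D": "Urban",
    "Area E": "Urban", "Area F": "Urban",
}

shares = crosstab_shares(cluster_of, urban_rural)
print(shares)
```

The same function could be reused with a deprivation, region or coastal lookup to build up a profile of each cluster.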
12. Headline model
• Headline model: one metric selected per mission, based on the headline metrics in the LU White Paper
and the most representative metric for each mission.
• Existing data combined/weighted to produce representative metrics, e.g. male and female life
expectancy combined into one weighted metric.
• Model now includes:
• Mission 1: Economy - Gross value added per hour worked
• Mission 3: Transport - Travel time to largest employment centre by public transport or
walking
• Mission 4: Digital Connectivity - Percentage of premises with Gigabit capable broadband
• Mission 5: Education - Percentage of pupils meeting the expected standard in reading,
writing and maths by end of primary school
• Mission 6: Skills - Apprenticeships achievements
• Mission 7: Health - Healthy life expectancy (combined/weighted)
• Mission 8: Wellbeing - Average life satisfaction rating
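A combined metric such as the healthy life expectancy figure above could be built as a weighted average of the male and female values. The sketch below assumes population weights; the weights actually used are not stated here, and the figures are made up.

```python
def combine_weighted(male_value, female_value, male_weight, female_weight):
    """Weighted mean of a male and a female figure (weights assumed to be populations)."""
    total = male_weight + female_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return (male_value * male_weight + female_value * female_weight) / total


# Made-up figures, in years, with made-up population weights.
equal = combine_weighted(62.0, 64.0, 100, 100)
skewed = combine_weighted(60.0, 64.0, 300, 100)
print(equal, skewed)
```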
14. Headline model findings
GVA, Average Travel to employment centre using public transport, broadband, expected
standards by end of primary school, apprenticeships achievements, healthy life expectancy, life
satisfaction
Higher health and well-being, lower
connectivity cluster
Higher connectivity, lower health and
well-being cluster
Higher health and well-being, moderate
educational performance cluster
Higher health and productivity, lower
well-being cluster
Note: some local authorities are not assigned to a cluster due
to missing data and/or older district boundaries
15. Publication of Findings
• Initial findings published on the ONS website on
24th February
• Interactive visualisation tool, users select
specific LAs with summary text presented
• Naming of the clusters to best summarise
results
• Summary characteristics outlined in text
below graph
• Accompanying methodology article to
explain methods and dataset.
16. Future work
• As our key users, we really want to understand what you’d like to see us do to develop this work,
and how you use these statistics.
• These questions are on our Slido. There are three ways to join:
1. Go to Slido.com and enter the event code #1666805
2. Click on this link - https://app.sli.do/event/jTgULusV7WpQTH3PcwVvBd
3. Scan the below QR code:
17. Research questions
• If ONS created groups of statistically similar local authorities, how would you use that
information, and at what geographic scale would you want to see statistically similar
local authorities (e.g. national, regional, neighbouring)?
• What would you like to see ONS do to develop this work further? Examples include analysis
across multiple time periods and creating models based on demographic datasets and
assessing outcomes against those.
• How would you like to see work on similar LAs presented? Was the interactive map helpful?
Would you find a raw dataset/CSV or APIs useful?
If you want to be part of future user research for our Explore Subnational Statistics platform
(which this similarities work will sit within) please get in touch at Subnational@ons.gov.uk
Editor's Notes
Our model using headline metrics produced 4 distinct clusters. Due to data availability, we were only able to create clusters for local authorities based in England. The key characteristics per cluster are as follows:
Cluster 0 (Dark blue) – Higher health, life satisfaction and apprenticeship achievements; lower productivity, connectivity and primary school attainment. 83% predominantly rural. Low population density, older areas. >50% in the two least deprived quintiles.
Cluster 1 (Teal) – Higher connectivity, lower productivity, health, life satisfaction and primary school attainment. 90% predominantly urban. 87% in most deprived two quintiles. High proportion of North East, North West and Yorkshire based LAs.
Cluster 2 (Maroon) – Higher health and life satisfaction, moderate connectivity, lower productivity. High proportion of East and South East LAs. Relatively low deprivation, moderate density. Newcastle-upon-Tyne and New Forest are both in this cluster.
Cluster 3 (Orange) – Higher productivity, connectivity, primary school attainment and health; lower life satisfaction and apprenticeship achievements. 89% predominantly urban. High proportion of London and South East LAs, younger areas. Relatively low deprivation.