Administrative data
census research
Chair: Becky Tinsley, Admin Data Census,
ONS
Becky.tinsley@ons.gov.uk
Population and Household
Estimates – what we have done
and what we plan to do
Chris Hill, Ali Dent and Claire Pereira
Administrative Data Census Project
Census Transformation Programme
Overview
Administrative Data Research Outputs:
• Population estimates – background, our
progress so far and our future plans (Chris
and Ali)
• Producing household estimates from
administrative data (Claire)
Note: These research outputs are NOT official statistics on the
population
Background
Beyond 2011 programme (April 2011)
• review the future provision of population statistics in England and Wales
and inform government and Parliament about options for the next census
• culminated in the National Statistician’s recommendation on the future of the
census and population statistics (March 2014)
The Census Transformation Programme (January 2015) to take forward
the National Statistician’s recommendation:
• deliver a predominantly online census in 2021
• increased use of administrative data and surveys to enhance the statistics
from the 2021 census and improve annual statistics between censuses
Administrative Data Census Project - aim is to produce the type of
information that is collected by a ten-yearly census (on housing, households
and people) from use of administrative data and surveys
Administrative Data Census Project –
Research Outputs
Key aim of the Research Outputs is:
• to replicate as many census outputs as possible using
admin data (and surveys) to compare with the 2021
Census
• Size of population
• Number and structure of households
• Characteristics of housing and the population
Continued development of the methodology based on
acquisition of new data sources and user feedback
Publish an annual assessment each spring to show
progress of our ability to move to an Administrative
Data Census in the next decade
A long way to go… but we have begun
• Oct 2015: published first set of Research Outputs
• May 2016: published first Annual Assessment
• Autumn 2016: next set of Research Outputs published, expanding the range
What was included in the 2015 release?
• research outputs for each LA in England and Wales as a series
of admin data population estimates for 2011, 2013 and 2014 by
5 year age-sex groups
• analytical report comparing these to the 2011 Census and
subsequently the ONS mid-year population estimates
• case studies to highlight quality issues with the admin data
• interactive maps and population pyramids
• administrative data update paper – plans and aspirations for
future years
• feedback from users - aim of improving our methods
Producing population estimates from administrative data (SPD)
Statistical Population Dataset – SPD v1.0
Sources:
• NHS Patient Register (PR)
• DWP/HMRC Customer Information System (CIS)
• Higher Education Statistics Agency (HESA) data (students)
Records included in the Statistical Population Dataset (SPD) are used to produce population estimates
• If a person is in a different location on PR and CIS, they are split half and half across the two addresses
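The half-and-half rule can be illustrated with a minimal sketch. It assumes each linked record carries one PR location and one CIS location; the field names (`pr_area`, `cis_area`) and the use of area codes in place of addresses are hypothetical and not taken from the SPD itself.

```python
from collections import defaultdict

def allocate_weights(records):
    """Count people by area, splitting PR/CIS disagreements half and half."""
    totals = defaultdict(float)
    for rec in records:
        pr_area, cis_area = rec["pr_area"], rec["cis_area"]
        if pr_area == cis_area:
            # Sources agree: the person counts once in that area.
            totals[pr_area] += 1.0
        else:
            # Sources disagree: half a person in each location.
            totals[pr_area] += 0.5
            totals[cis_area] += 0.5
    return dict(totals)

# Hypothetical linked records (illustrative only).
records = [
    {"person": "A", "pr_area": "LA1", "cis_area": "LA1"},
    {"person": "B", "pr_area": "LA1", "cis_area": "LA2"},
    {"person": "C", "pr_area": "LA2", "cis_area": "LA2"},
]
area_totals = allocate_weights(records)
```

The overall total is preserved (three people), but person B contributes 0.5 to each of two areas, which is the uncertainty that later work on half-weights aims to remove.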
Performance of SPD v1.0 compared with the 2011
Census estimates by LA
94% of LA total population
estimates within 3.8% of Census
estimate in 2011
[Figure: percentage difference from 2011 Census estimates by LA, showing where the admin data method is lower or higher than the 2011 Census]
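A headline figure of the form "X% of LA estimates within 3.8% of the comparison estimate" could be computed along the following lines. This is a sketch only: the function name and the four LA totals are invented for illustration and are not ONS data.

```python
def share_within(admin, reference, tolerance_pct):
    """Percentage of areas whose admin estimate is within +/- tolerance_pct of the reference."""
    within = 0
    for la, ref in reference.items():
        diff_pct = 100.0 * (admin[la] - ref) / ref
        if abs(diff_pct) <= tolerance_pct:
            within += 1
    return 100.0 * within / len(reference)

# Hypothetical LA totals (illustrative only).
reference = {"LA1": 100000, "LA2": 200000, "LA3": 50000, "LA4": 80000}
admin = {"LA1": 102000, "LA2": 190000, "LA3": 50500, "LA4": 84000}
share = share_within(admin, reference, 3.8)
```

Here two of the four hypothetical areas differ by 5% and fall outside the tolerance, so the share is 50%.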
Performance of SPDv1.0 compared with the 2014
mid-year estimates by LA
90% of LA total population
estimates within 3.8% of mid-
year estimate in 2014
[Figure: percentage difference from 2014 MYEs by LA; legend: admin data method lower or higher than 2011 Census]
Feedback summary
• the need for a Population Coverage Survey to help with
estimating the size of the population (considering options)
• using ‘activity data’ (1) to help reduce levels of over-coverage
that are seen for particular age groups (some progress)
• refining the Statistical Population Dataset inclusion and
exclusion rules (changes made)
• reviewing the quality standards that are used to assess the
quality of the SPDs (considering options)
• producing population estimates for small areas, within a local
authority (potentially autumn 2016)
1 Information from administrative data sources about when individuals have interacted with
systems or services, such as the National Insurance, tax or benefits systems, or a hospital visit
through the NHS system.
SPD Developments for 2016
The SPD (Statistical Population Dataset) is used to produce estimates of the size of the population by anonymously
linking multiple administrative datasets
• Continue with SPD v1.0 for 2015 estimates (stable)
• SPD v2.0 (improved model) will be used to produce
pop estimates for 2011 and 2015
SPD v2.0 changes
• Improve overall coverage of the usual resident
population
• Redistribute people to the correct location
Plans for 2016 Research Outputs
• Population estimates – expanding the breadth and
detail
• Improvements to the methods used to produce
administrative data population estimates
• Outputs on the number of households
• Research on income from combined PAYE and
benefits data
• Stagger the outputs over the autumn
What are we planning to publish this year?
Package 1: Population estimates (National and LA)
• by LA, sex and 5 year age-group, 2015
• As last year, but extends to include a new time series for 2011 and 2015, and the old time series extends to SYOA
• Autumn 2016
Package 2 (NEW): Population estimates (Small Area)
• by LSOA, sex and 5 year age-group
• Autumn 2016
Package 3 (NEW): Household estimates (number of households) and combined PAYE and benefits research
• Household estimates by LA (2011 and 2015)
• Combined PAYE and benefits research by LA (2013/14 tax year)
• Autumn 2016
All content and timings are provisional
Focus of methodological research for 2016
[Figure: percentage difference between comparison data and official estimates, 2011 (England and Wales), for males and females, showing where comparison data is higher or lower]
Areas of focus annotated on the figure:
• Addition of other admin data
• Activity data
• Improve matching methodology, increasing number of matches
Tackling undercoverage for school age
children
User feedback had suggested additional
administrative sources to use including the
School Census
• School Census = record-level source that includes all
pupils at state schools, produced annually
• SPD V1 includes matches between any two of PR,
DWP-CIS and HESA
With School Census we can find additional
matches to include in an SPD:
• PR-SC matches
• CIS-SC matches
Also - improve use of matches for
students
• In SPD v1 it is possible for the record identifiers to conflict, for example:
HESA ID (via PR)   NHS number (PR)   DWP CIS ID     HESA ID (via CIS)
365421             7404747201        889877261543   739542
(linked by a HESA-PR match, a PR-CIS match and a CIS-HESA match)
• In this case SPD v1 does not choose between the two HESA
IDs, so PR and CIS locations are used to place people in an
area
• The conflict implies that there is one SPD row that might
represent two people, and that we are not using the HESA
location for them in SPD v1
Improved use of matches for students
• Want to ensure there are no rows with more than 1 identifier
from each dataset (e.g. prevent 2 HESA ids in 1 row)
• Resolving conflicts may change number of records in SPD
Could convert this to 2 matches, so each HESA ID only appears
once!
• Achieved by changing how we use the information for the
matches
• All IDs from the datasets are included in a “spine” of records
(even if they are non-matches) – convenient for research
HESA ID   NHS number (PR)
365421    7404747201

DWP CIS ID     HESA ID
889877261543   739542
School Census matches and HESA conflicts:
estimated effect of extra matches compared to SPD v1
[Figure: estimated effect as % of SPD v1.0 estimate (0.0 to 3.5) by single year of age (0 to 50), for females and males; annotations mark the effects from adding SC matches and the effects from HESA conflict resolution]
Estimated proportion of SPD records by source
(after SPD exclusion/inclusion rules)
Source             Frequency    Percent
PR and CIS         45,202,800   81.2
PR, CIS and SC      7,733,800   13.9
PR, CIS and HESA    2,070,800    3.7
PR                    225,100    0.4
PR and HESA           170,500    0.3
PR and SC             143,500    0.3
CIS and HESA           89,700    0.2
CIS and SC             52,200    0.1
Annotations on the slide: new matches; increased number of matches
Coverage improvements for working
ages
• SPD v1 used “exact” or deterministic matches
(e.g. based on combination of name, address, DoB etc).
• Using score based matches (probabilistic) we can find
more PR to CIS matches
• Over 150,000 additional matches of expected good
quality can be identified
• 60% male, majority are distributed evenly across ages
18-50
• 40% Female, many in 18-24 range but also some older
• The extra matches to be added to the
deterministic/matchkey matches to test impact (work in
progress)
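The difference between exact (matchkey) and score-based matching can be sketched as follows. This is an illustration only, not the ONS matching method: the field names, the weights, the 0.9 threshold and the two records are all hypothetical, and a full probabilistic approach would estimate agreement weights from the data rather than fix them by hand.

```python
from difflib import SequenceMatcher

def clean_postcode(value):
    return value.replace(" ", "").upper()

def matchkey(rec):
    """Exact key: every component must agree for a deterministic match."""
    return (rec["forename"].lower(), rec["surname"].lower(),
            rec["dob"], clean_postcode(rec["postcode"]))

def deterministic_match(a, b):
    return matchkey(a) == matchkey(b)

def similarity(x, y):
    """String similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

def match_score(a, b):
    """Weighted agreement score in [0, 1]; the weights are illustrative assumptions."""
    score = 0.3 * similarity(a["forename"], b["forename"])
    score += 0.3 * similarity(a["surname"], b["surname"])
    score += 0.3 * (1.0 if a["dob"] == b["dob"] else 0.0)
    same_postcode = clean_postcode(a["postcode"]) == clean_postcode(b["postcode"])
    score += 0.1 * (1.0 if same_postcode else 0.0)
    return score

# Fictional records: the forename is spelled differently on the two sources.
pr_record = {"forename": "Jonathan", "surname": "Smith",
             "dob": "1980-05-17", "postcode": "AB1 2CD"}
cis_record = {"forename": "Jonathon", "surname": "Smith",
              "dob": "1980-05-17", "postcode": "ab12cd"}

THRESHOLD = 0.9  # hypothetical acceptance threshold
exact = deterministic_match(pr_record, cis_record)
score = match_score(pr_record, cis_record)
accepted = score >= THRESHOLD
```

The exact key fails on the spelling difference, while the score stays above the threshold, which is how a score-based approach can recover additional PR to CIS matches.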
“Activity” data
New activity data acquired from DWP and
HMRC (abbrev = BIDS):
• National Benefits Database (NBD)
• PAYE (Pay as you earn – income tax)
• Single Housing Benefit (SHBE)
• Tax Credits
• excludes: Child Benefit, self-employed and people on
Universal Credit
Research aim:
to derive broad activity to verify residency in E&W and
potentially reduce overcoverage in an SPD
Created combined DWP/HMRC activity
dataset for 2011
• PAYE and TC:
• Anyone present in 2010/2011 and 2011/2012 tax years
• NBD:
• All people with an active claim on 15/03/2011
• Any other people with JSA or ESA claim since 15/03/2009
• Any partners of the above given their own record
• SHBE:
• Active claim on 15/03/2011 or started after
• Partner status was active on 15/03/2011 or after
• Variables:
• Dates
• Additional variables derived from each source – eg account
status, claim status, boolean tax year variables for PAYE/TC
• Links to CIS, Census, PR, SPDv1
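The inclusion rules above could be expressed as a single test per person, as in the sketch below. Several points are assumptions made for illustration: the record layout and field names are hypothetical, "present in 2010/2011 and 2011/2012 tax years" is read as presence in either year, and "JSA or ESA claim since 15/03/2009" is read as a claim still open on or after that date. Partner records are not modelled.

```python
from datetime import date

REFERENCE_DATE = date(2011, 3, 15)
JSA_ESA_LOOKBACK = date(2009, 3, 15)

def claim_active_on(claim, day):
    """A claim is active on `day` if it started by then and has not ended before it."""
    return claim["start"] <= day and (claim["end"] is None or claim["end"] >= day)

def in_activity_dataset(person):
    # PAYE and tax credits (assumption: presence in either tax year qualifies).
    if person["paye_tc_tax_years"] & {"2010/2011", "2011/2012"}:
        return True
    # National Benefits Database.
    for claim in person["nbd_claims"]:
        if claim_active_on(claim, REFERENCE_DATE):
            return True
        recent = claim["end"] is None or claim["end"] >= JSA_ESA_LOOKBACK
        if claim["benefit"] in {"JSA", "ESA"} and recent:
            return True
    # Single Housing Benefit Extract: active on the reference date or started after it.
    for claim in person["shbe_claims"]:
        if claim_active_on(claim, REFERENCE_DATE) or claim["start"] > REFERENCE_DATE:
            return True
    return False

# Hypothetical people (illustrative only).
employee = {"paye_tc_tax_years": {"2010/2011"}, "nbd_claims": [], "shbe_claims": []}
past_jsa = {"paye_tc_tax_years": set(), "shbe_claims": [],
            "nbd_claims": [{"benefit": "JSA", "start": date(2009, 6, 1), "end": date(2010, 1, 10)}]}
old_claim = {"paye_tc_tax_years": set(), "shbe_claims": [],
             "nbd_claims": [{"benefit": "Income Support", "start": date(2005, 1, 1), "end": date(2008, 1, 1)}]}

flags = [in_activity_dataset(p) for p in (employee, past_jsa, old_claim)]
```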
Age-sex distribution of active DWP/HMRC
records in SPDv1 2011
[Figure: number of SPD records (0 to 450,000) by age (0 to 100+), split into male in BIDS, male not in BIDS, female in BIDS and female not in BIDS; annotation marks SPD records that are not active]
Effect of removing inactive records
• Self-employed likely to be excluded from activity dataset
• Child benefit not included
• Would like to acquire MORE activity data!
[Figure: difference from census estimates (-50% to 10%) by 5 year age group (15-19 to 90+); series: proportion of males in BIDS, females in BIDS, males in SPD v1 and females in SPD v1, each compared to census estimates]
Improving local distributions – with
“PDS” activity data
• PDS is our first set of health activity data
• Is based on interactions with NHS, not same as Patient Register -
contains history (multiple rows per person)
• Extract for ONS contains “movers” – a history of locations for each
person
• Aim to remove uncertainty in SPD about location of people, so a
single record is not allocated as a half-person in 2 locations
• SPD v1.0 contains ~3.1 million half-weighted people (5.5%)
• PDS information likely to be more recent than CIS/PR and may
help to resolve half-weighted people
• Many half-weighted records persist in SPD for multiple years, so
linking to PDS from current and previous years may resolve more
• Aimed to link half-weighted records to PDS in current year or
earlier, and categorise
Resolving half weighted people with PDS:
linkage results (2013)
• 48% of 2013 half-weighted records are not linked to any PDS
record from 2013 or earlier, 52% are linked
• Linked records, by the most recent PDS extract on which they are found:
PDS extract   Frequency   Percentage
June 2013     806,125     49.1
June 2012     469,416     28.6
June 2011     323,584     19.7
March 2011     44,480      2.7
• Significant benefit from including earlier years
(maximise information available for half-weighted records)
Example of resolving location: perfect moves
Category 1a – a perfect CIS-PR move:
Source                 LA                      Date
CIS                    E09000027               09/03/2011 (addr start date)
PR                     E06000047               16/04/2013 (mod date)
Most recent PDS move   E09000027 → E06000047   04/04/2013 (effective date)
• Dates on PDS and PR are both later than CIS
• Category 1b is the same except the move is PR to CIS
• Can assign very confidently to destination LA
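The category 1a and 1b checks can be sketched as a small classifier. The record layout and function name are hypothetical; the values are the ones from the example (PR in E06000047 modified 16/04/2013, CIS in E09000027 from 09/03/2011, PDS move effective 04/04/2013), and category 1b is taken to be the mirror image of 1a.

```python
from datetime import date

def classify_move(pr, cis, pds_move):
    """Return (category, assigned LA) for a perfect move, or (None, None)."""
    # 1a: PDS shows a move from the CIS location to the PR location,
    # and both the PDS and PR dates are later than the CIS date.
    if (pds_move["origin"] == cis["la"] and pds_move["destination"] == pr["la"]
            and pds_move["effective"] > cis["date"] and pr["date"] > cis["date"]):
        return "1a", pr["la"]
    # 1b (assumed mirror image): a move from the PR location to the CIS location.
    if (pds_move["origin"] == pr["la"] and pds_move["destination"] == cis["la"]
            and pds_move["effective"] > pr["date"] and cis["date"] > pr["date"]):
        return "1b", cis["la"]
    return None, None

pr = {"la": "E06000047", "date": date(2013, 4, 16)}
cis = {"la": "E09000027", "date": date(2011, 3, 9)}
pds_move = {"origin": "E09000027", "destination": "E06000047",
            "effective": date(2013, 4, 4)}

category, assigned_la = classify_move(pr, cis, pds_move)
```

For the example this gives category 1a and assigns the person to the destination LA, E06000047, with a full weight rather than two half-weights.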
Producing population estimates from administrative data (SPD)
Statistical Population Dataset – SPD v2.0
Sources:
• NHS Patient Register
• DWP/HMRC Customer Information System
• HESA data (students)
• School Census
Records included in the Statistical Population Dataset (SPD) are used to produce SPD population estimates
Changes for v2.0:
• Resolve half-weights
• Add extra PR-CIS matches
To be published this autumn!
Producing household
estimates from
administrative data
Methodology and analysis towards
ONS Research Outputs 2016
What is a household?
A household is defined as:
• one person living alone, or
• a group of people (not necessarily related) living at the same address who share cooking facilities and share a living room or sitting room or dining area.
Beyond 2011
Early research showed potential for admin
data to provide number and sizes of
‘occupied addresses’.
But key challenges….
• Limited data sources available.
• Coverage & measurement error –undercount &
people not in the right place.
• Definitions – occupied address v census definition
Aims
Short term
• Producing numbers of households in England
and Wales by Local Authority for 2011 and 2015
• Deal with key challenges from previous work
Longer term
• Keep developing – build breadth and time series
of households statistics
• Develop alongside SPD production
What data can we use?
• AddressBase
• Tax and Benefits data
• Population Coverage Survey
Comparing with other ONS outputs
Abbreviations:
• OA: Output Area
• DAU: Demographics Analysis Unit
• LFS: Labour Force Survey
• SPD: Statistical Population Dataset
There are no mid-year estimates for households, as there are for population.
Can evaluate quality in 2011 by comparing with Census
estimates, down to OA level.
DAU produce national estimates for 1996 onwards:
• Families and people in families
• Households and people in households
These are produced from the LFS, which has a sample size of 41,000 households
containing around 100,000 individuals. Internally,
estimates can be produced at Local Authority level.
Can AddressBase help?
Primary classifications:
C   Commercial
L   Land
M   Military
O   Other (Ordnance Survey Only)
P   Parent Shell
R   Residential
U   Unclassified
X   Dual Use
Z   Object of Interest
Residential classifications:
RB  Ancillary Building
RC  Car Park Space
RD  Dwelling
RG  Garage
RH  House In Multiple Occupation
RI  Residential Institution
There are 1128 classifications of address on AddressBase, an
Ordnance Survey product, including care home, house boat and
caravan. Classifications can have up to four levels of detail (though many do
not) and have dates attached, which allows further validation.
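As an illustration of how the classification codes might be used to screen addresses, the sketch below looks up the primary and residential codes listed above. The choice of which classes may contain a household (RD and RH) is an assumption made for this example, not a rule stated in the slides.

```python
PRIMARY = {
    "C": "Commercial", "L": "Land", "M": "Military",
    "O": "Other (Ordnance Survey Only)", "P": "Parent Shell",
    "R": "Residential", "U": "Unclassified", "X": "Dual Use",
    "Z": "Object of Interest",
}
RESIDENTIAL = {
    "RB": "Ancillary Building", "RC": "Car Park Space", "RD": "Dwelling",
    "RG": "Garage", "RH": "House In Multiple Occupation",
    "RI": "Residential Institution",
}
# Assumption for illustration: only these classes are treated as possible households.
HOUSEHOLD_CLASSES = {"RD", "RH"}

def describe(code):
    """Describe a classification code using its first one or two characters."""
    primary = PRIMARY.get(code[:1], "Unknown")
    secondary = RESIDENTIAL.get(code[:2])
    return primary if secondary is None else f"{primary}: {secondary}"

def may_contain_household(code):
    return code[:2] in HOUSEHOLD_CLASSES

labels = [describe(c) for c in ("RD", "RG", "C")]
eligible = [may_contain_household(c) for c in ("RD", "RH", "RG", "RI", "C")]
```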
Address Matching
• OSAPR: Ordnance Survey Address-Point Reference Number
• UPRN: Unique Property Reference Number
Address matching methodology is developing at ONS, with an
estimated 5% increase in match rate.
A reliable unique identifier for addresses is needed, hence the transition
from OSAPR to UPRN.
Changes in address identification
[Figure: OSAPRs/UPRNs in millions (0 to 4.00) by region, comparing UPRNs 2014 with OSAPRs 2013]
Currently ONS has attached OSAPRs onto records up to 2013,
with a switch to UPRNs in 2014.
We would expect an increase due to housing stock growth of
around 1%. 77% of LAs show an increase of more than 1%.
Challenges
Our biggest challenges for producing household numbers:
Definition – household/address is not a one-to-one relationship.
Putting people in the right place
• Half weights on SPD – when sources disagree
• Correct address allocation
o data lags
o high churn
o people not deregistering
o poor AddressBase matching/allocation
Dealing with half sizes
Our objective is to count each person in a household, so we need to resolve
unmatched records
Two methods:
1. Source preference (HESA, PR, CIS)
2. Redistribute according to household size distributions
Dealing with half sizes
[Figure: comparing with Census outputs: number of households (0 to 25,000,000) by household size (1, 2, 3, 4, 5 plus) and total households, for Census QS406EW and redistributed half sizes]
[Figure: % differences (-25 to 25) for redistributed half sizes by household size (1, 2, 3, 4, 5 plus) and total households]
Over-counting large household sizes, whilst undercounting 1 and 2 person households.
It is anticipated that better address matching and the use of UPRNs rather than OSAPRs will resolve some of these differences.
Dual System Estimation
ONS often uses DSE to weight up for non-response. To trial the use of
DSE to weight up for undercount, I used a 4% sample by postcode taken
from the Census as a proxy for a survey.
To allow for differences in samples, 400 samples were taken.
In the future, an annual survey similar to a population coverage survey could contribute.
[Figure: SPD % difference (-14 to 0) by region and for England and Wales]
Dual System Estimation
For the sample population:
estimated addresses = (Census addresses × SPD addresses) / matched addresses
Then scale up from the sample to the England and Wales aggregate for the entire population.
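The dual system estimator in the diagram is the product of the two counts divided by the number of matches. A minimal sketch follows, with invented counts for one sample area. The scaling step shown (applying the sample's ratio of DSE to SPD count to a wider SPD total) is an assumption about how the scale-up might work, since the slides do not give the detail.

```python
def dual_system_estimate(n_census, n_spd, n_matched):
    """Estimated true count = (census count * SPD count) / matched count."""
    if n_matched <= 0:
        raise ValueError("at least one matched address is needed")
    return n_census * n_spd / n_matched

# Hypothetical counts of addresses in a sample of postcodes.
n_census, n_spd, n_matched = 1000, 940, 920
sample_estimate = dual_system_estimate(n_census, n_spd, n_matched)

# Assumed scale-up: apply the sample's coverage adjustment to a wider SPD total.
coverage_factor = sample_estimate / n_spd
scaled_estimate = 235000 * coverage_factor  # 235000 is an invented SPD total
```

Because some addresses are missed by each source, the estimate (about 1,022 here) is at least as large as either source's own count.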
Dual System Estimation
Impact of DSE on household counts
85% of Local Authorities are within 0.5% of Census estimate
90% of Local Authorities are within 1% of Census estimate
95% of Local Authorities are within 1.5% of Census estimate
[Figure: DSE % difference and SPD % difference (-14 to 0) by region and for England and Wales]
Allocating address at SPD record level
Using many data sources to find our
‘best’ address.
Benefits
Enables aggregation at different
levels and cross tabulation with other
variables.
Can weight certain data sources for
different demographic groups, e.g.
students
Allocating address at record level
PR: Joe Bloggs, 17/4/1974, UPRN: 12345
CIS: Joe Bloggs, 17/4/1974, UPRN: 12346
Patient register moves - PDS
Joe Bloggs 17/4/1974 move 1 - 1/1/2011: UPRN: 12345
Joe Bloggs 17/4/1974 move 2 - 2/2/2011: UPRN: 22345
Joe Bloggs 17/4/1974 move 3 - 3/3/2013: UPRN: 12346
Can use activity data to locate the newest address.
True match on SPD
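The Joe Bloggs example can be sketched in code: when PR and CIS disagree, the most recent PDS move is used to pick between them. The function and its fallback behaviour (returning nothing when the latest move agrees with neither source) are assumptions for illustration rather than the method used in production.

```python
from datetime import date

def best_address(pr_uprn, cis_uprn, pds_moves):
    """Pick a UPRN using the latest PDS move; pds_moves is a list of (date, uprn)."""
    if pr_uprn == cis_uprn:
        return pr_uprn
    if not pds_moves:
        return None
    latest_date, latest_uprn = max(pds_moves, key=lambda move: move[0])
    # Only accept the latest PDS address if one of the two sources agrees with it.
    if latest_uprn in (pr_uprn, cis_uprn):
        return latest_uprn
    return None

# Values from the example: dates are day/month/year on the slide.
moves = [
    (date(2011, 1, 1), "12345"),
    (date(2011, 2, 2), "22345"),
    (date(2013, 3, 3), "12346"),
]
chosen = best_address("12345", "12346", moves)
```

The latest move (3/3/2013) points to UPRN 12346, which agrees with CIS, so the record is placed at that address.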
Plans for the future
This year
• Numbers of households by LA, England and Wales, 2011
and 2015 for Research Outputs, Autumn 2016 (including
case studies of Local Authorities of interest)
• Focus on issue of definitional differences – what is the real
need vs what can be produced.
• We have initiated an ONS household working group to join
different sectors of work to share ideas and build knowledge.
Future years
• Develop time series of numbers of households
• Explore additional data sources to fill gaps in household
statistics
• Household sizes
• Household composition
• Investigate production of an enhanced address register.
Integrated sources for estimating
population characteristics
Alison Whitworth and Meghan Elkin
Administrative Data Census
Census provides information on:
1. Size of the population
by area, age and sex
2. Household and families
number, size and type of family
3. Population characteristics
information on ethnicity, educational attainment,
religion (etc)
Producing population statistics from
admin data
• Beyond 2011 admin data option :
o Admin data – population size by age and sex
o 4% annual survey – population characteristics
• National Statistician’s recommendation: use
all available sources
o Approach going forward to explore in more depth
the potential of admin data
Outline
• Framework for characteristics post 2021
• Methods for combining sources
• Academic research
• Examples
o Income
o Population by ethnic group
Framework for population characteristics
Admin
• Admin data available for target variable
Integrated sources
• Admin data for characteristic associated with target variable
• Admin data is a proxy of target variable
Survey
• No administrative data available for target variable
Framework for population characteristics
Admin
• Admin data available for target variable: direct counts or estimates from administrative data
Integrated sources
• Admin data for characteristic associated with target variable: regression model for estimates
• Admin data is a proxy of target variable: structural model for estimates
Survey
• No administrative data available for target variable: direct survey estimates
Framework – top down approach
Levels: national / regional; local benchmarks; microlevel / multivariate
Sources: admin, integrated sources, survey
Two key methods
Regression model
• Auxiliary data are correlated
with the target variable
• Model to define relationship
Structural model
• Auxiliary data same structure as
target variable
• Model to define the structure
Local Authority claimant count by
unemployment
Methods – ONS applications
Regression model
✓ Mean household income (MSOA)
✓ Median household income (MSOA) for 2011
✓ Unemployment (LA)
✓ Emigration (LA)
SPREE
✓ Broad ethnic group (LA)
Regression model
× Unemployment (MSOA)
× Informal caring (wards)
× Crime and fear of crime (wards)
× Mental health of children and adolescents (wards)
× Adult Neurotic Disorder (wards)
Role of survey data
Admin
• Direct counts or estimates from administrative data: survey to adjust for under- or over-coverage and measurement error
Integrated sources
• Regression model: survey estimates strengthened by admin sources
• Structural model: survey provides accurate marginal totals
Survey
• No administrative data: direct survey estimates
Academic input
• Collaborative projects: Structure Preserving Estimation (SPREE)
• Expert advisory group: small area estimation
• Funded research / bids: National Centre for Research Methods
• Conferences: NCRM Bath (July), SAE Maastricht (Aug)
Summary
1. Framework: top down
→ gaps at small area, micro level & multivariate
2. Methods: two approaches for combining sources
→ understand wider application
3. Academic momentum
Case Study: Income outputs
Meghan Elkin
Current published income outputs
• Survey outputs: majority of current income outputs (lowest geography parliamentary constituency)
• Integrated sources: Small Area Income Estimates (modelled household weekly income at MSOA)
• Admin outputs: working towards multivariate, small area income outputs
Income definition (Canberra Handbook)
Ideally achieve gross income:
• Income from employment
e.g. employee income and income from self-employment
• Property income
e.g. income from financial and non-financial assets
• Current transfers received
e.g. social security schemes, pensions
Source: UNECE Canberra Group Handbook on Household Income Statistics
Components of income & admin data
[Diagram: total income broken down into income from employment, property income and current transfers received. Components shown: employment via PAYE, self employment, undeclared, financial assets, non-financial assets, dividends, interest, rent, royalties, social security & assistance, benefits, tax credits, state pension, pensions, occupational pension, personal pension, current transfers from non-profit institutions, current transfers from other households]
Colour key:
• Green = access to admin data
• Amber = admin data available
• Red = unknown/admin data not available – will need to estimate using surveys
Size of the bubbles is not relative to proportion of income.
This diagram does not provide a full disaggregation of components. For more detail see the Canberra Group Handbook.
Admin Data Census plans for income
Increased use of admin data, 2016 to 2018
[Diagram: timeline of planned income outputs, moving from admin outputs towards integrated sources, with output quality, coverage (limited coverage) and geography (national, LA, LSOA, OA) shown as dimensions]
Planned steps:
• Direct estimates from admin data: individual income estimates
• Combining admin data with surveys (work with SAIE)
• More developed publication, e.g. components of income
• Produce household income estimates
• More detailed PAYE & self-assessment data
Income research outputs 2016 definition
• Output: Distributions
• Population: Those resident at 30 June 2013 on the Statistical Population Dataset (SPD) aged 16 and over
• Geography: England and Wales
• Geographic level: Local Authorities
• Unit level: Individuals
• Reference period: Annual
• Time period: Tax year 2013/14
• Source of income: PAYE earnings (employment and occupational pensions), child and working tax credits, housing benefit, many DWP benefits
• Accrual or receipt: Receipt – administrative data on income and benefits is recorded on receipt of the income or benefits
• Location: Income by individual’s home address
Proportion of SPD population that has
some income information by age
[Figure: proportion of the population (0% to 100%) with some income information by age (16 to 88), for males and females]
Proportion of SPD population that has
some income information
Proposed income bands for 2016 outputs
[Figure: proportion of the population (0% to 100%) in each income band for two example local authorities, LA1 and LA2]
Income bands: Zero, 0-5K, 5-10K, 10-15K, 15-20K, 20-30K, 30-40K, 40-60K, 60K+, Missing
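Assigning individuals to the proposed bands could look like the sketch below. The treatment of band edges (upper bound inclusive) and of negative values is an assumption, since the slides give only the band labels.

```python
from bisect import bisect_left

# Upper bounds in pounds for the positive-income bands (assumed inclusive).
UPPER_BOUNDS = [5000, 10000, 15000, 20000, 30000, 40000, 60000]
LABELS = ["0-5K", "5-10K", "10-15K", "15-20K", "20-30K", "30-40K", "40-60K", "60K+"]

def income_band(income):
    """Map an annual income to one of the proposed bands."""
    if income is None:
        return "Missing"
    if income < 0:
        raise ValueError("negative income is not covered by the proposed bands")
    if income == 0:
        return "Zero"
    # bisect_left finds the first upper bound that is >= income.
    return LABELS[bisect_left(UPPER_BOUNDS, income)]

bands = [income_band(v) for v in (None, 0, 5000, 12500, 60000, 75000)]
```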
Case Study: Local Authority
Population Estimates by Ethnic
Group using Generalised Structure
Preserving Estimation (GSPREE)
June 2016
Census Table
Population by Local Authority and Ethnic Group
England (March 2011)
Local Authority      White    Mixed   Asian   Chinese   Black   Other   Total
Fareham              107959   1359    1200    467       357     239     111581
Southampton          203528   5678    16443   3449      5067    2717    236882
Portsmouth           181182   5467    9863    2611      3777    2156    205056
Winchester           111577   1626    1894    745       457     296     116595
…                    …        …       …       …         …       …       …
Bath & NE Somerset   166473   2898    2665    1912      1326    742     176016
Data for Ethnic Group
• 2011 Census estimates (Mar 2011)
Proxy: Detailed cross tabulation but outdated
• School Census (Jan 2014)
Proxy: Detailed cross tabulation but age 5-15 only
• Annual Population Survey (2014)
Total population by ethnic group
• Mid Year Population Estimates (June 2014)
Total population by local authority
Data for Ethnic Group
Census 2011 (cells), with MYE 2014 as row totals
                White    Mixed   Asian   Chinese   Black   Other   Total (MYE 2014)
Fareham         107959   1359    1200    467       357     239     …
Southampton     203528   5678    16443   3449      5067    2717    …
Portsmouth      181182   5467    9863    2611      3777    2156    …
…               …        …       …       …         …       …       …
Tower Hamlets   114819   10360   96392   8109      18629   5787    …
Slough          64053    4758    54900   797       12115   3582    …
…               …        …       …       …         …       …       …
APS July 2012 - June 2014 (weighted estimates)
National total  …        …       …       …         …       …
School Census Dec 2014
                White    Mixed   Asian   Chinese   Black   Other
Fareham         …        …       …       …         …       …
Southampton     …        …       …       …         …       …
Portsmouth      …        …       …       …         …       …
…               …        …       …       …         …       …
Tower Hamlets   …        …       …       …         …       …
Slough          …        …       …       …         …       …
…               …        …       …       …         …       …
Solution…
• Combine administrative and census data with
survey data to borrow strength and produce
reliable estimate for each cell (domain) using
GSPREE (Zhang and Chambers, 2004 and
Luna-Hernandez, A. 2014).
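Applying GSPREE
• Step 1: Estimate the association structure by relating
survey counts (Yaj) to census counts (Xaj):
\[ \log Y_{aj} = \gamma_a + \lambda_j + \beta \alpha^{X}_{aj}, \qquad \sum_j \lambda_j = 0, \qquad a = 1, \dots, A; \; j = 1, \dots, J \]
\[ \alpha^{Y}_{aj} = \beta \alpha^{X}_{aj} \]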
2011 Census (Xaj)
                White    Mixed   Asian   Chinese   Black   Other
Fareham         107959   1359    1200    467       357     239
Southampton     203528   5678    16443   3449      5067    2717
Portsmouth      181182   5467    9863    2611      3777    2156
…               …        …       …       …         …       …
Tower Hamlets   114819   10360   96392   8109      18629   5787
Slough          64053    4758    54900   797       12115   3582
…               …        …       …       …         …       …
2013 School Census (Xaj)
                White    Mixed   Asian   Chinese   Black   Other
Fareham         …        …       …       …         …       …
Southampton     …        …       …       …         …       …
Portsmouth      …        …       …       …         …       …
…
Tower Hamlets   …        …       …       …         …       …
Slough
…               …        …       …       …         …       …
APS (Yaj), Jan 2012-Dec 2014
                White    Mixed   Asian   Chinese   Black   Other
Fareham         …        …       …       …         …       …
Southampton     …        …       …       …         …       …
Portsmouth      …        …       …       …         …       …
…
Tower Hamlets   …        …       …       …         …       …
Slough
…               …        …       …       …         …       …
• \(\hat{\beta}\) is obtained via MLE
• Poisson or Multinomial distribution assumed
• Predicts cell counts, but with no benchmarking
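The association structure that Step 1 works with can be made concrete by computing interaction terms from the census table. The sketch below assumes the usual log-linear definition, where the interaction for a cell is its log count minus the row and column means of the logs plus the grand mean; this definition is not spelled out on the slides and is stated here as an assumption. The counts are the five local authorities shown in the 2011 Census table above.

```python
import math

def interaction_terms(table):
    """Double-centred log counts: log X - row mean - column mean + grand mean."""
    logs = [[math.log(v) for v in row] for row in table]
    n_rows, n_cols = len(logs), len(logs[0])
    row_means = [sum(row) / n_cols for row in logs]
    col_means = [sum(logs[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [[logs[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n_cols)]
            for i in range(n_rows)]

# Columns: White, Mixed, Asian, Chinese, Black, Other (2011 Census counts).
census = [
    [107959, 1359, 1200, 467, 357, 239],       # Fareham
    [203528, 5678, 16443, 3449, 5067, 2717],   # Southampton
    [181182, 5467, 9863, 2611, 3777, 2156],    # Portsmouth
    [114819, 10360, 96392, 8109, 18629, 5787], # Tower Hamlets
    [64053, 4758, 54900, 797, 12115, 3582],    # Slough
]
alpha = interaction_terms(census)
```

Each row and column of `alpha` sums to zero, so the terms describe how each area's ethnic composition departs from what the row and column effects alone would imply. A positive value, as for the Asian group in Tower Hamlets, indicates a larger share than those effects predict.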
Applying GSPREE
• Step 2: Benchmark updated cell counts to marginal totals
Iterative Proportional Fitting (IPF) is used to impose the known row
and column totals on the cell counts obtained in step 1
GSPREE estimates (Dec 2014), with MYE 2014 as row totals
                White    Mixed   Asian   Chinese   Black   Other   Total (MYE 2014)
Fareham         …        …       …       …         …       …       …
Southampton     …        …       …       …         …       …       …
Portsmouth      …        …       …       …         …       …       …
…               …        …       …       …         …       …       …
Tower Hamlets   …        …       …       …         …       …       …
Slough          …        …       …       …         …       …       …
…               …        …       …       …         …       …       …
APS July 2012 - June 2014 (weighted estimates)
National total  …        …       …       …         …       …
• Step 3: Obtain precision estimates via bootstrap
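The benchmarking in Step 2 can be illustrated with a minimal Iterative Proportional Fitting sketch. The starting table and the margins are invented numbers, not GSPREE output; in the application described here the rows would be local authorities (benchmarked to MYE totals) and the columns ethnic groups (benchmarked to APS national totals).

```python
def ipf(seed, row_totals, col_totals, tol=1e-8, max_iter=1000):
    """Scale a positive table until its row and column sums match the given totals."""
    if abs(sum(row_totals) - sum(col_totals)) > 1e-6:
        raise ValueError("row and column totals must share the same grand total")
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Scale each row to its target total.
        for i, target in enumerate(row_totals):
            current = sum(table[i])
            table[i] = [v * target / current for v in table[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_totals):
            current = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / current
        # Columns now match exactly; stop once the rows also match.
        if all(abs(sum(row) - t) <= tol for row, t in zip(table, row_totals)):
            break
    return table

# Hypothetical step 1 cell counts and known margins (illustrative only).
seed = [[40.0, 5.0, 5.0],
        [30.0, 10.0, 20.0],
        [10.0, 2.0, 8.0]]
row_totals = [60.0, 70.0, 20.0]
col_totals = [95.0, 20.0, 35.0]
fitted = ipf(seed, row_totals, col_totals)
```

IPF changes the margins but keeps the cross-product ratios of the starting table, which is why it preserves the association structure estimated in step 1.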
Distribution of LA estimates by ethnic
group, 2014
(England)
RMSE. LA by ethnic group, 2014
• Overall, GSPREE is successful in providing reliable
estimates for most LAs.
• However, non-negligible RMSEs (and CVs) are
observed in some areas
Fixed Effects GSPREE estimator (England)
Conclusions
• GSPREE shows good performance
Small RMSE in most LAs
• Work in progress
Validation study (1991/2001 Census)
GSPREE: 2001 Census x 2011 data (APS, MYE, ESC)
Validation: 2011 Census
• Further work …
Modelling strategy for more detailed categories
Consider SPD as row totals
Consider only School Census as proxy data
Consider different attributes
References
Purcell, N. J. and Kish, L. (1980). Postcensal Estimates for Local Areas
(or Domains). International Statistical Review, 48, 3-18.
Zhang, L.C. and Chambers, R. (2004). Small area estimates for cross-
classifications. Journal of the Royal Statistical Society, B, 66, 479–
496.
Luna-Hernandez, A. (2014). On Small Area Estimation for
Compositions Using Structure Preserving Models. Unpublished PhD
upgrade document, Department of Social Statistics and
Demography, University of Southampton.
Contacts
• For further feedback on today’s session, please
contact us at:
Beyond.2021.Research.and.Design@ons.gov.uk
More Related Content

What's hot

Plans for the online 2021 Census with increased use of administrative and sur...
Plans for the online 2021 Census with increased use of administrative and sur...Plans for the online 2021 Census with increased use of administrative and sur...
Plans for the online 2021 Census with increased use of administrative and sur...UKDSCensus
 
Delivering early benefits and trial outputs using administrative data
Delivering early benefits and trial outputs using administrative dataDelivering early benefits and trial outputs using administrative data

Administrative data census research

  • 1. Administrative data census research Chair: Becky Tinsley, Admin Data Census, ONS Becky.tinsley@ons.gov.uk
  • 2. Population and Household Estimates – what we have done and what we plan to do Chris Hill, Ali Dent and Claire Pereira Administrative Data Census Project Census Transformation Programme
  • 3. Overview Administrative Data Research Outputs: • Population estimates – background, our progress so far and our future plans (Chris and Ali) • Producing household estimates from administrative data (Claire) Note: These research outputs are NOT official statistics on the population
  • 4. Background Beyond 2011 programme (April 2011) • review the future provision of population statistics in England and Wales and inform government and Parliament about options for the next census • culminated in the National Statistician’s recommendation on the future of the census and population statistics (March 2014) The Census Transformation Programme (January 2015) to take forward the National Statistician’s recommendation: • deliver a predominantly online census in 2021 • increased use of administrative data and surveys to enhance the statistics from the 2021 census and improve annual statistics between censuses Administrative Data Census Project - aim is to produce the type of information that is collected by a ten-yearly census (on housing, households and people) from use of administrative data and surveys
  • 5. Administrative Data Census Project – Research Outputs Key aim of the Research Outputs is: • to replicate as many census outputs as possible using admin data (and surveys) to compare with the 2021 Census • Size of population • Number and structure of households • Characteristics of housing and the population Continued development of the methodology based on acquisition of new data sources and user feedback Publish an annual assessment each spring to show progress of our ability to move to an Administrative Data Census in the next decade
  • 6. A long way to go…but we have begun • Published first set of Research Outputs (Oct 2015) • Published first Annual Assessment (May 2016) • Next set of Research Outputs to be published, expanding the range (Autumn 2016)
  • 7. What was included in the 2015 release? • research outputs for each LA in England and Wales as a series of admin data population estimates for 2011, 2013 and 2014 by 5 year age-sex groups • analytical report comparing these to the 2011 Census and subsequently the ONS mid-year population estimates • case studies to highlight quality issues with the admin data • interactive maps and population pyramids • administrative data update paper – plans and aspirations for future years • feedback from users - aim of improving our methods
  • 8. Producing population estimates from administrative data (SPD): Statistical Population Dataset – SPD V1.0. Sources: NHS Patient Register (PR); DWP/HMRC Customer Information System (CIS); Higher Education Statistics Agency (HESA) data (students) → included in Statistical Population Dataset (SPD) → population estimates. If a person is in a different location on PR & CIS, they are split half and half across the two addresses.
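The half-and-half rule on this slide can be sketched as below. This is a minimal illustration only: the record layout (a pair of area codes per linked person) and the function name `area_estimates` are assumptions made for the example, not the ONS implementation.

```python
from collections import defaultdict

def area_estimates(records):
    """Sum person weights by area.

    Each record is a hypothetical (pr_area, cis_area) pair for one linked person.
    If the two sources agree, the person counts as 1.0 in that area; if they
    disagree, the person is split 0.5 / 0.5 across the two areas.
    """
    totals = defaultdict(float)
    for pr_area, cis_area in records:
        if pr_area == cis_area:
            totals[pr_area] += 1.0
        else:
            totals[pr_area] += 0.5
            totals[cis_area] += 0.5
    return dict(totals)

# Hypothetical linked records: (area on PR, area on CIS)
linked = [("A", "A"), ("A", "B"), ("B", "B"), ("B", "A")]
estimates = area_estimates(linked)
```

Because every person contributes a total weight of 1, the area estimates always sum to the number of linked people, wherever the half-weights fall.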
  • 9. Performance of SPD v1.0 compared with the 2011 Census estimates by LA: 94% of LA total population estimates within 3.8% of Census estimate in 2011 [Map: percentage difference from 2011 Census estimates; legend: admin data method lower than 2011 Census / admin data method higher than 2011 Census]
  • 10. Performance of SPD v1.0 compared with the 2014 mid-year estimates by LA: 90% of LA total population estimates within 3.8% of mid-year estimate in 2014 [Map: percentage difference from 2014 MYEs; legend as on slide: admin data method lower than 2011 Census / admin data method higher than 2011 Census]
  • 11. Feedback summary • the need for a Population Coverage Survey to help with estimating the size of the population (considering options) • using ‘activity data’ (1) to help reduce levels of over-coverage that are seen for particular age groups (some progress) • refining the Statistical Population Dataset inclusion and exclusion rules (changes made) • reviewing the quality standards that are used to assess the quality of the SPDs (considering options) • producing population estimates for small areas, within a local authority (potentially autumn 2016) 1 Information from administrative data sources about when individuals have interacted with systems or services, such as the National Insurance, tax or benefits systems, or a hospital visit through the NHS system.
  • 12. SPD Developments for 2016 The SPD (Statistical Population Dataset) is used to produce estimates of the size of the population by anonymously linking multiple administrative datasets • Continue with SPD v1.0 for 2015 estimates (stable) • SPD v2.0 (improved model) will be used to produce population estimates for 2011 and 2015 SPD v2.0 changes • Improve overall coverage of the usual resident population • Redistribute people to the correct location
  • 13. Plans for 2016 Research Outputs • Population estimates – expanding the breadth and detail • Improvements to the methods used to produce administrative data population estimates • Outputs on the number of households • Research on income from combined PAYE and benefits data • Stagger the outputs over the autumn
  • 14. What are we planning to publish this year?
      Package 1: Population estimates (National and LA) by LA, sex and 5 year age-group, 2015 (as last year, but extends to include a new time series for 2011 and 2015, and the old time series extends to SYOA). Autumn 2016
      Package 2 (NEW): Population estimates (Small Area) by LSOA, sex and 5 year age-group. Autumn 2016
      Package 3 (NEW): Household estimates (number of households) by LA (2011 and 2015); combined PAYE and benefits research by LA (2013/14 Tax Year). Autumn 2016
      All content and timings are provisional
  • 15. Focus of methodological research for 2016 [Chart: percentage difference for males and females, 2011 (England and Wales), showing where comparison data is higher or lower than official estimates] Areas of work marked on the chart: addition of other admin data; activity data; improving the matching methodology to increase the number of matches
  • 16. Tackling undercoverage for school age children User feedback had suggested additional administrative sources to use including the School Census • School Census = record level source that includes all pupils at state Schools, produced annually • SPD V1 includes matches between any two of PR, DWP-CIS and HESA With School Census we can find additional matches to include in an SPD: • PR-SC matches • CIS-SC matches
  • 17. Also - improve use of matches for students • In SPD V1 it is possible for the record identifiers to conflict, for example:
      HESA ID (via PR)   NHS number (PR)   DWP CIS ID     HESA ID (via CIS)
      365421             7404747201        889877261543   739542
      (links shown: HESA-PR match, PR-CIS match, CIS-HESA match)
    • In this case SPD v1 does not choose between the two HESA IDs, so PR and CIS locations are used to place people in an area • The conflict implies that there is one SPD row that might represent two people, and that we are not using the HESA location for them in SPD v1
  • 18. Improved use of matches for students • Want to ensure there are no rows with more than 1 identifier from each dataset (e.g. prevent 2 HESA IDs in 1 row) • Resolving conflicts may change number of records in SPD • Achieved by changing how we use the information for the matches • All IDs from the datasets are included in a “spine” of records (even if they are non-matches) – convenient for research
      Conflicting row:
      HESA ID (via PR)   NHS number (PR)   DWP CIS ID     HESA ID (via CIS)
      365421             7404747201        889877261543   739542
      Could convert this to 2 matches, so each HESA ID only appears once:
      HESA ID 365421, NHS number (PR) 7404747201
      DWP CIS ID 889877261543, HESA ID 739542
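A minimal sketch of the conflict resolution described on slides 17 and 18, assuming a linked row is held as a dictionary. The field names and the function `resolve_hesa_conflict` are hypothetical; the identifier values are the ones shown on the slide.

```python
def resolve_hesa_conflict(row):
    """Split a linked row that carries two different HESA IDs into two matches.

    `row` is a hypothetical dict with keys: nhs_number, cis_id,
    hesa_id_via_pr, hesa_id_via_cis. Returns a list of rows in which each
    HESA ID appears at most once.
    """
    via_pr, via_cis = row["hesa_id_via_pr"], row["hesa_id_via_cis"]
    if via_pr is None or via_cis is None or via_pr == via_cis:
        # No conflict: keep a single row with whichever HESA ID is present.
        hesa_id = via_pr if via_pr is not None else via_cis
        return [{"nhs_number": row["nhs_number"], "cis_id": row["cis_id"], "hesa_id": hesa_id}]
    # Conflict: one PR-HESA match and one CIS-HESA match.
    return [
        {"nhs_number": row["nhs_number"], "cis_id": None, "hesa_id": via_pr},
        {"nhs_number": None, "cis_id": row["cis_id"], "hesa_id": via_cis},
    ]

conflicting = {
    "nhs_number": "7404747201",
    "cis_id": "889877261543",
    "hesa_id_via_pr": "365421",
    "hesa_id_via_cis": "739542",
}
resolved = resolve_hesa_conflict(conflicting)
```

Splitting the row is why resolving conflicts can change the number of records in the SPD: one row becomes two.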
  • 19. School Census matches and HESA conflicts: estimated effect of extra matches compared to SPD v1 [Chart: effect as % of SPD 1.0 estimate (0.0 to 3.5) by single year of age (0 to 50), for females and males; series: effects from adding SC matches, effects from HESA conflict resolution]
  • 20. Estimated proportion of SPD records by source (after SPD exclusion/inclusion rules)
      Source             Frequency    Percent
      PR and CIS         45,202,800   81.2
      PR, CIS and SC     7,733,800    13.9
      PR, CIS and HESA   2,070,800    3.7
      PR                 225,100      0.4
      PR and HESA        170,500      0.3
      PR and SC          143,500      0.3
      CIS and HESA       89,700       0.2
      CIS and SC         52,200       0.1
      Annotations on slide: new matches; increased number of matches
  • 21. Coverage improvements for working ages • SPD v1 used “exact” or deterministic matches (e.g. based on combination of name, address, DoB etc). • Using score based matches (probabilistic) we can find more PR to CIS matches • Over 150,000 additional matches of expected good quality can be identified • 60% male, majority are distributed evenly across ages 18-50 • 40% Female, many in 18-24 range but also some older • The extra matches to be added to the deterministic/matchkey matches to test impact (work in progress)
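The difference between exact (matchkey) and score-based matching can be illustrated with a toy example. This is a sketch under stated assumptions: the field names, agreement weights and threshold are invented for illustration, and the name comparison uses Python's `difflib` rather than any method named in the slides.

```python
from difflib import SequenceMatcher

# Hypothetical agreement weights and acceptance threshold (not ONS values).
WEIGHTS = {"name": 4.0, "dob": 5.0, "postcode": 3.0}
THRESHOLD = 9.0

def exact_match(a, b):
    """Deterministic rule: every comparison field must agree exactly."""
    return all(a[field] == b[field] for field in WEIGHTS)

def match_score(a, b):
    """Score-based rule: partial credit for similar names, full credit for exact agreement on other fields."""
    name_similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    score = WEIGHTS["name"] * name_similarity
    score += WEIGHTS["dob"] if a["dob"] == b["dob"] else 0.0
    score += WEIGHTS["postcode"] if a["postcode"] == b["postcode"] else 0.0
    return score

pr_record = {"name": "Jo Bloggs", "dob": "1974-04-17", "postcode": "AB1 2CD"}
cis_record = {"name": "Joe Bloggs", "dob": "1974-04-17", "postcode": "AB1 2CD"}
other_record = {"name": "Ann Other", "dob": "1980-01-01", "postcode": "ZZ9 9ZZ"}

is_exact = exact_match(pr_record, cis_record)
is_scored_match = match_score(pr_record, cis_record) >= THRESHOLD
```

Here the spelling difference in the name defeats the exact rule, but the pair still scores above the threshold, which is the kind of additional PR to CIS match the slide describes.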
  • 22. “Activity” data New activity data acquired from DWP and HMRC (abbrev = BIDS): • National Benefits Database (NBD) • PAYE (Pay as you earn – income tax) • Single Housing Benefit (SHBE) • Tax Credits • excludes: Child Benefit, self-employed and people on Universal Credit Research aim: to derive broad activity to verify residency in E&W and potentially reduce overcoverage in an SPD
  • 23. Created combined DWP/HMRC activity dataset for 2011 • PAYE and TC: • Anyone present in 2010/2011 and 2011/2012 tax years • NBD: • All people with an active claim on 15/03/2011 • Any other people with JSA or ESA claim since 15/03/2009 • Any partners of the above given their own record • SHBE: • Active claim on 15/03/2011 or started after • Partner status was active on 15/03/2011 or after • Variables: • Dates • Additional variables derived from each source – eg account status, claim status, boolean tax year variables for PAYE/TC • Links to CIS, Census, PR, SPDv1
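The NBD inclusion rules above can be read as a date filter. The sketch below is one interpretation only: it assumes a claim is described by a benefit code, a start date and an optional end date, and it reads "claim since 15/03/2009" as a JSA or ESA claim that was open at some point between that date and the reference date. The function name `nbd_in_scope` is hypothetical.

```python
from datetime import date

REFERENCE_DATE = date(2011, 3, 15)  # reference date given on the slide
LOOKBACK_DATE = date(2009, 3, 15)   # JSA/ESA look-back date given on the slide

def nbd_in_scope(benefit, start, end=None):
    """Return True if a claim would enter the activity dataset.

    end=None means the claim is still open. Assumed rules:
      1. any claim active on the reference date, or
      2. a JSA or ESA claim open at some point since the look-back date.
    """
    active_on_reference = start <= REFERENCE_DATE and (end is None or end >= REFERENCE_DATE)
    recent_jsa_esa = (
        benefit in {"JSA", "ESA"}
        and start <= REFERENCE_DATE
        and (end is None or end >= LOOKBACK_DATE)
    )
    return active_on_reference or recent_jsa_esa
```

For example, under these assumptions a JSA claim that ran from June 2009 to January 2010 is in scope, while a non-JSA/ESA claim over the same period is not.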
  • 24. Age-sex distribution of active DWP/HMRC records in SPDv1 2011 [Chart: number of SPD records (0 to 450,000) by age (0 to 100+); series: male not in BIDS, male in BIDS, female not in BIDS, female in BIDS; annotation: SPD records not active]
  • 25. Effect of removing inactive records • Self-employed likely to be excluded from activity dataset • Child benefit not included • Would like to acquire MORE activity data! [Chart: percentage difference from census estimates (-50% to 10%) by 5 year age group (15-19 to 90+); series: proportion of males in BIDS, proportion of females in BIDS, proportion of males in SPD v1, proportion of females in SPD v1, each compared to census estimates]
  • 26. Improving local distributions – with “PDS” activity data • PDS is our first set of health activity data • Is based on interactions with NHS, not same as Patient Register - contains history (multiple rows per person) • Extract for ONS contains “movers” – a history of locations for each person • Aim to remove uncertainty in SPD about location of people, so a single record is not allocated as a half-person in 2 locations • SPD v1.0 contains ~3.1 million half-weighted people (5.5%) • PDS information likely to be more recent than CIS/PR and may help to resolve half-weighted people • Many half-weighted records persist in SPD for multiple years, so linking to PDS from current and previous years may resolve more • Aimed to link half-weighted records to PDS in current year or earlier, and categorise
  • 27. Resolving half weighted people with PDS: linkage results (2013) • 48% of 2013 half-weighted records are not linked to any PDS record from 2013 or earlier, 52% are linked • Linked records, by the most recent PDS extract in which they are found:
      PDS extract   Frequency   Percentage
      June 2013     806125      49.1
      June 2012     469416      28.6
      June 2011     323584      19.7
      March 2011    44480       2.7
    • Significant benefit from including earlier years (maximise information available for half-weighted records)
  • 28. Example of resolving location: perfect moves. Category 1a – a perfect CIS-PR move:
      PR:  LA E06000047, Mod date 16/04/2013
      CIS: LA E09000027, Addr start date 09/03/2011
      Most recent PDS move: Origin LA E09000027, Destination LA E06000047, Effective date 04/04/2013
    • Dates on PDS and PR are both later than CIS • Category 1b is the same except PR to CIS • Can assign very confidently to destination LA
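The "perfect move" check on this slide can be written as a small classification function. This is a sketch only: the dictionary layout and the function name `classify_move` are assumptions, and category 1b is implemented as the mirror image of 1a, following the slide's remark that it is "the same except PR to CIS".

```python
from datetime import date

def classify_move(pr, cis, pds):
    """Classify a half-weighted record using its most recent PDS move.

    pr and cis are hypothetical dicts with keys "la" and "date";
    pds has keys "origin", "destination" and "date".
    Returns (category, assigned_la), or (None, None) if neither pattern fits.
    """
    # 1a: PDS move goes from the CIS area to the PR area, and both PDS and PR are dated later than CIS.
    if (pds["origin"] == cis["la"] and pds["destination"] == pr["la"]
            and pds["date"] > cis["date"] and pr["date"] > cis["date"]):
        return "1a", pds["destination"]
    # 1b: mirror image, moving from the PR area to the CIS area.
    if (pds["origin"] == pr["la"] and pds["destination"] == cis["la"]
            and pds["date"] > pr["date"] and cis["date"] > pr["date"]):
        return "1b", pds["destination"]
    return None, None

# Values taken from the slide.
pr = {"la": "E06000047", "date": date(2013, 4, 16)}
cis = {"la": "E09000027", "date": date(2011, 3, 9)}
pds = {"origin": "E09000027", "destination": "E06000047", "date": date(2013, 4, 4)}
category, assigned_la = classify_move(pr, cis, pds)
```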
  • 29. Producing population estimates from administrative data (SPD): Statistical Population Dataset – SPD V2.0, to be published this autumn! Sources: NHS Patient Register; DWP/HMRC Customer Information System; HESA data (students); School Census → included in Statistical Population Dataset (SPD) → SPD population estimates. Changes: resolve half-weights; add extra PR-CIS matches.
  • 30. Producing household estimates from administrative data Methodology and analysis towards ONS Research Outputs 2016
  • 31. What is a household? A household is defined as: one person living alone, or a group of people (not necessarily related) living at the same address who share cooking facilities and share a living room or sitting room or dining area.
  • 32. Beyond 2011 Early research showed potential for admin data to provide number and sizes of ‘occupied addresses’. But key challenges…. • Limited data sources available. • Coverage & measurement error –undercount & people not in the right place. • Definitions – occupied address v census definition
  • 33. Aims Short term • Producing numbers of households in England and Wales by Local Authority for 2011 and 2015 • Deal with key challenges from previous work Longer term • Keep developing – build breadth and time series of households statistics • Develop alongside SPD production
  • 34. What data can we use? Address Base Tax and Benefits data Population Coverage Survey
  • 35. Comparing with other ONS outputs Abbreviations: OA = Output Area; DAU = Demographics Analysis Unit; LFS = Labour Force Survey; SPD = Statistical Population Dataset. There are no mid-year estimates for households as there are for population. Quality can be evaluated in 2011 by comparing with Census estimates, down to OA level. DAU produce national estimates for 1996 onwards: • Families and people in families • Households and people in households. These are produced from the LFS (sample size: 41,000 households containing around 100,000 individuals). Internally, estimates can be produced at Local Authority level.
  • 36. Can AddressBase help? Top-level classification codes: C Commercial; L Land; M Military; O Other (Ordnance Survey Only); P Parent Shell; R Residential; U Unclassified; X Dual Use; Z Object of Interest. Residential codes: RB Ancillary Building; RC Car Park Space; RD Dwelling; RG Garage; RH House In Multiple Occupation; RI Residential Institution. There are 1128 classifications of address on AddressBase, an Ordnance Survey product, including care home, house boat and caravan. Classifications have up to four levels of detail (though many do not use all four) and have dates attached, which allows further validation.
  • 37. Address Matching Abbreviations: OSAPR = Ordnance Survey Address-Point Reference Number; UPRN = Unique Property Reference Number. Address matching methodology is developing at ONS, with an estimated 5% increase in match rate. A reliable unique identifier for addresses is needed, hence the transition from OSAPR to UPRN.
  • 38. Changes in address identification (OSAPR = Ordnance Survey Address-Point Reference Number; UPRN = Unique Property Reference Number) [Chart: OSAPRs 2013 and UPRNs 2014, in millions (0 to 4.00), by region: East Midlands, East of England, London, North East, North West, South East, South West, Wales, West Midlands, Yorkshire and…] Currently ONS has attached OSAPRs onto records up to 2013, with a switch to UPRNs in 2014. We would expect an increase due to housing stock growth of around 1%. 77% of LAs show an increase of more than 1%.
  • 39. Challenges Our biggest challenges for producing household numbers Definition – household/address is not a one to one relationship. Putting people in the right place •Half weights on SPD – when sources disagree •Correct address allocation • data lags • high churn • people not deregistering • poor AddressBase matching/allocation
  • 40. Dealing with half sizes Our objective is to count each person in a household – need to resolve unmatched records Two methods 1. Source preference HESA PR CIS 2. Redistribute according to household size distributions
  • 41. Dealing with half sizes: comparing with Census outputs [Chart 1: number of households (0 to 25,000,000) by household size (1, 2, 3, 4, 5 plus, total hhs); series: Census - QS406EW, redistributed half sizes] [Chart 2: % differences (-25 to 25) by household size for redistributed half sizes] Over-counting large household sizes, whilst undercounting 1 and 2 person households. It is anticipated that better address matching and the use of UPRNs rather than OSAPRs will resolve some of these differences.
  • 42. Dual System Estimation ONS often uses DSE to weight up for non-response. To trial the use of DSE to weight up for undercount, I used a 4% sample by postcode taken from the Census as a proxy for a survey. To allow for differences in samples, 400 samples were taken. In the future, an annual survey similar to a population coverage survey could contribute. [Chart: SPD % diff (-14 to 0) by region: East Midlands, East of England, London, North East, North West, South East, South West, Wales, West Midlands, Yorkshire and The Humber, England and Wales]
  • 43. Dual System Estimation Within the sample population, Census addresses are matched to SPD addresses, and the entire population is estimated as $\hat{N} = \dfrac{\text{Census addresses} \times \text{SPD addresses}}{\text{Matched addresses}}$. The result is then scaled up to the England and Wales aggregate.
  • 44. Dual System Estimation Impact of DSE on household counts: 85% of Local Authorities are within 0.5% of Census estimate; 90% of Local Authorities are within 1% of Census estimate; 95% of Local Authorities are within 1.5% of Census estimate [Chart: DSE % diff and SPD % diff (-14 to 0) by region: East Midlands, East of England, London, North East, North West, South East, South West, Wales, West Midlands, Yorkshire and The Humber, England and Wales]
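The dual system estimator on this slide has the familiar capture-recapture form: the product of the two source counts divided by the number of matches. A minimal sketch follows, assuming address counts for a single sample area; the function name and the example counts are hypothetical.

```python
def dual_system_estimate(n_census, n_spd, n_matched):
    """Dual system (capture-recapture) estimate of the true number of addresses.

    n_census:  addresses found in the Census sample
    n_spd:     addresses found on the SPD for the same area
    n_matched: addresses found in both sources
    """
    if n_matched <= 0:
        raise ValueError("at least one matched address is needed")
    return n_census * n_spd / n_matched

# Hypothetical counts for one sample area.
estimate = dual_system_estimate(n_census=1000, n_spd=950, n_matched=930)
# Implied weight to apply to each SPD address to allow for undercount.
spd_weight = estimate / 950
```

With these made-up counts the SPD misses 70 of the 1,000 Census addresses, so each SPD address is weighted up by about 7.5%.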
  • 45. Allocating address at SPD record level Using many data sources to find our ‘best’ address. Benefits: enables aggregation at different levels and cross tabulation with other variables. Can weight certain data sources for different demographic groups, e.g. students
  • 46. Allocating address at record level
      PR: Joe Bloggs, 17/4/1974, UPRN: 12345
      CIS: Joe Bloggs, 17/4/1974, UPRN: 12346
      Patient register moves (PDS):
        move 1 - 1/1/2011: Joe Bloggs, 17/4/1974, UPRN: 12345
        move 2 - 2/2/2011: Joe Bloggs, 17/4/1974, UPRN: 22345
        move 3 - 3/3/2013: Joe Bloggs, 17/4/1974, UPRN: 12346
      Can use activity data to locate the newest address. True match on SPD
  • 47. Plans for the future This year • Numbers of households by LA, England and Wales, 2011 and 2015 for Research Outputs, Autumn 2016 (including case studies of Local Authorities of interest) • Focus on issue of definitional differences – what is the real need vs what can be produced. • We have initiated an ONS household working group to join different sectors of work, share ideas and build knowledge. Future years • Develop time series of numbers of households • Explore additional data sources to fill gaps in household statistics • Household sizes • Household composition • Investigate production of an enhanced address register.
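The Joe Bloggs example can be sketched as picking the most recently dated PDS move and checking it against the PR and CIS addresses. The record layout and the function name `latest_address` are assumptions for illustration; the dates and UPRNs are those on the slide.

```python
from datetime import date

def latest_address(moves):
    """Return the UPRN of the most recently dated move."""
    return max(moves, key=lambda move: move["date"])["uprn"]

pr_uprn, cis_uprn = "12345", "12346"
pds_moves = [
    {"date": date(2011, 1, 1), "uprn": "12345"},
    {"date": date(2011, 2, 2), "uprn": "22345"},
    {"date": date(2013, 3, 3), "uprn": "12346"},
]

best_uprn = latest_address(pds_moves)
# The half-weight is resolved if the newest PDS address agrees with one of the two sources.
agrees_with = "PR" if best_uprn == pr_uprn else "CIS" if best_uprn == cis_uprn else None
```

In this example the newest move points to UPRN 12346, the CIS address, so the record can be placed there as a whole person instead of being split.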
  • 48. Integrated sources for estimating population characteristics Alison Whitworth and Meghan Elkin
  • 49. Administrative Data Census Census provides information on: 1. Size of the population by area, age and sex 2. Household and families number, size and type of family 3. Population characteristics information on ethnicity, educational attainment, religion (etc)
  • 50. Producing population statistics from admin data • Beyond 2011 admin data option : o Admin data – population size by age and sex o 4% annual survey – population characteristics • National Statistician’s recommendation: use all available sources o Approach going forward to explore in more depth the potential of admin data
  • 51. Outline • Framework for characteristics post 2021 • Methods for combining sources • Academic research • Examples o Income o Population by ethnic group
  • 52. Framework for population characteristics [Diagram: spectrum from Admin through Integrated sources to Survey] • Admin data available for target variable • Admin data for characteristic associated with target variable • Admin data is a proxy of target variable • No administrative data available for target variable
  • 53. Framework for population characteristics [Diagram: spectrum from Admin through Integrated sources to Survey] • Admin data available for target variable: direct counts or estimates from administrative data • Admin data for characteristic associated with target variable: regression model for estimates • Admin data is a proxy of target variable: structural model for estimates • No administrative data available for target variable: direct survey estimates
  • 54. Framework – top down approach [Diagram: levels labelled National / regional, Local benchmarks and Microlevel / multivariate, set against Survey, Integrated sources and Admin]
  • 55. Two key methods Regression model • Auxiliary data are correlated with the target variable • Model to define relationship Structural model • Auxiliary data same structure as target variable • Model to define the structure [Chart: Local Authority claimant count by unemployment]
  • 56. Methods – ONS applications Regression model ✓ Mean Household income (MSOA) ✓ Median Household income (MSOA) for 2011 ✓ Unemployment (LA) ✓ Emigration (LA) SPREE ✓ Broad ethnic group (LA) Regression model × Unemployment (MSOA) × Informal caring (wards) × Crime and fear of crime (wards) × Mental health of children and adolescents (wards) × Adult Neurotic Disorder (wards)
  • 57. Role of survey data [Diagram: spectrum from Admin through Integrated sources to Survey] • Direct counts or estimates from administrative data - survey to adjust for under or over coverage, measurement error • Regression model - survey estimates strengthened by admin sources • Structural model - survey provides accurate marginal totals • No administrative data - direct survey estimates
  • 58. Academic input • Collaborative projects: Structure Preserving Estimation (SPREE) • Expert advisory group: Small area estimation • Funded research / bids: National Centre for Research Methods • Conferences: NCRM Bath (July), SAE Maastricht (Aug)
  • 59. Summary 1. Framework: top down → gaps at small area, micro level & multivariate 2. Methods: two approaches for combining sources → understand wider application 3. Academic momentum
  • 60. Case Study: Income outputs Meghan Elkin
  • 61. Current published income outputs [Diagram spanning Admin outputs, Integrated sources and Survey outputs] • Majority of current income outputs (lowest geography: parliamentary constituency) • Small Area Income Estimates (modelled household weekly income at MSOA) • Working towards multivariate, small area income outputs
  • 62. Income definition (Canberra Handbook) Ideally achieve gross income: • Income from employment e.g. employee income and income from self-employment • Property income e.g. income from financial and non-financial assets • Current transfer received e.g. social security schemes, pensions Source: UNECE Canberra Group Handbook on Household Income Statistics
  • 63. Components of income & admin data [Diagram: Total Income broken into bubbles; size of the bubbles not relative to proportion of income] Labels shown: Income from employment; Employment via PAYE; Self employment; Undeclared; Property income; Financial assets; Non-financial assets; Interest; Dividends; Rent; Royalties; Current transfers received; Social security & assistance; State pension; Benefits; Tax Credits; Pensions; Occupational pension; Personal pension; Current transfers from non-profit institutions; Current transfers from other households. Colour key: Green = access to admin data; Amber = admin data available; Red = unknown/admin data not available – will need to estimate using surveys. This diagram does not provide a full disaggregation of components. For more detail see the Canberra Group Handbook.
  • 64. Admin Data Census plans for income [Diagram: timeline 2016, 2017, 2018 showing increased use of admin data, from admin outputs towards integrated sources; output quality axis with geography labels National, OA, Limited coverage, LA, LSOA, OA] Labels shown: direct estimates from admin data (individual income estimates); combining admin data with surveys (work with SAIE); more developed publication, e.g. components of income; produce household income estimates; more detailed PAYE & self-assessment data
  • 65. Income research outputs 2016 definition
      Output: Distributions
      Population: Those resident at 30 June 2013 on the Statistical Population Dataset (SPD) aged 16 and over
      Geography: England and Wales
      Geographic level: Local Authorities
      Unit level: Individuals
      Reference period: Annual
      Time period: Tax year 2013/14
      Source of income: PAYE earnings (employment and occupational pensions), child and working tax credits, housing benefit, many DWP benefits
      Accrual or receipt: Receipt – administrative data on income and benefits is recorded on receipt of the income or benefits
      Location: Income by individual’s home address
  • 66. Proportion of SPD population that has some income information by age [Chart: proportion of the population (0% to 100%) by age (16 to 88), for males and females]
  • 67. Proportion of SPD population that has some income information
  • 68. Proposed income bands for 2016 outputs [Chart: proportion of the population (0% to 100%) for LA1 and LA2, by income band: Zero, 0-5K, 5-10K, 10-15K, 15-20K, 20-30K, 30-40K, 40-60K, 60K+, Missing]
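Assigning individuals to the proposed bands could look like the sketch below. The band labels come from the slide, but the treatment of boundaries (an income exactly on a boundary falls in the lower band), the assumption of non-negative annual income in pounds, and the function name `income_band` are illustrative choices, not something the slides specify.

```python
# Upper bounds (inclusive, by assumption) and labels for the banded part of the scale.
BANDS = [
    (5_000, "0-5K"),
    (10_000, "5-10K"),
    (15_000, "10-15K"),
    (20_000, "15-20K"),
    (30_000, "20-30K"),
    (40_000, "30-40K"),
    (60_000, "40-60K"),
]

def income_band(income):
    """Map an annual income (non-negative, or None if no income information) to a band label."""
    if income is None:
        return "Missing"
    if income == 0:
        return "Zero"
    for upper, label in BANDS:
        if income <= upper:
            return label
    return "60K+"

# Hypothetical individuals.
bands = [income_band(x) for x in (None, 0, 4_999, 12_500, 75_000)]
```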
  • 69. Case Study: Local Authority Population Estimates by Ethnic Group using Generalised Structure Preserving Estimation (GSPREE) June 2016
  • 70. Census Table: Population by Local Authority and Ethnic Group, England (March 2011)
      Local Authority      White    Mixed   Asian   Chinese  Black   Other   Total
      Fareham              107959   1359    1200    467      357     239     111581
      Southampton          203528   5678    16443   3449     5067    2717    236882
      Portsmouth           181182   5467    9863    2611     3777    2156    205056
      Winchester           111577   1626    1894    745      457     296     116595
      …….                  …        …       …       …        …       …       …
      Bath & NE Somerset   166473   2898    2665    1912     1326    742     176016
  • 71. Data for Ethnic Group • 2011 Census estimates (Mar 2011) Proxy: Detailed cross tabulation but outdated • School Census (Jan 2014) Proxy: Detailed cross tabulation but age 5-15 only • Annual Population Survey (2014) Total population by ethnic group • Mid Year Population Estimates (June 2014) Total population by local authority
  • 72. Data for Ethnic Group
      Census 2011 (cells), with MYE 2014 as the Total column:
      Local Authority   White    Mixed   Asian   Chinese  Black   Other   Total (MYE 2014)
      Fareham           107959   1359    1200    467      357     239     ……..
      Southampton       203528   5678    16443   3449     5067    2717    ……..
      Portsmouth        181182   5467    9863    2611     3777    2156    ……..
      ….                …        …       …       …        …       …       …
      Tower Hamlets     114819   10360   96392   8109     18629   5787    ……..
      Slough            64053    4758    54900   797      12115   3582    ………
      …..               …        …       …       …        …       …       …
      National total (APS July 2012 - June 2014, weighted estimates): ………. for each ethnic group
      School Census Dec 2014: same layout (Local Authority by White, Mixed, Asian, Chinese, Black, Other), with all cells shown as … on the slide
  • 73. Solution… • Combine administrative and census data with survey data to borrow strength and produce a reliable estimate for each cell (domain) using GSPREE (Zhang and Chambers, 2004; Luna-Hernandez, 2014).
  • 74. Applying GSPREE • Step 1: Estimate the association structure by relating survey counts ($Y_{aj}$) to census counts ($X_{aj}$):
      $\log Y_{aj} = \gamma_a + \lambda_j + \beta \alpha^X_{aj}$, with $\sum_j \lambda_j = 0$, for $a = 1, \dots, A$ and $j = 1, \dots, J$
      Association structure: $\alpha^Y_{aj} = \beta \alpha^X_{aj}$
      $\hat{\beta}$: obtained via MLE; Poisson or Multinomial distribution assumed; predicts cell counts but with no benchmarking
      2011 Census ($X_{aj}$):
      Local Authority   White    Mixed   Asian   Chinese  Black   Other
      Fareham           107959   1359    1200    467      357     239
      Southampton       203528   5678    16443   3449     5067    2717
      Portsmouth        181182   5467    9863    2611     3777    2156
      ….                …        …       …       …        …       …
      Tower Hamlets     114819   10360   96392   8109     18629   5787
      Slough            64053    4758    54900   797      12115   3582
      …..               …        …       …       …        …       …
      2013 School Census ($X_{aj}$): same layout, with all cells shown as … on the slide
      APS ($Y_{aj}$), Jan 2012-Dec 2014: same layout, with all cells shown as … on the slide
  • 75. Applying GSPREE • Step 2: Benchmark updated cell counts to margin totals: Iterative Proportional Fitting (IPF) is used to impose the known row and column totals on the cell counts obtained in step 1
      GSPREE Estimates Dec 2014, with row totals from MYE 2014 and column totals from APS July 2012 - June 2014 (weighted estimates):
      Local Authority        White   Mixed   Asian   Chinese  Black   Other   Total (MYE 2014)
      Fareham                …       …       …       …        …       …       ……..
      Southampton            …       …       …       …        …       …       ……..
      Portsmouth             …       …       …       …        …       …       ……..
      ….                     …       …       …       …        …       …       …
      Tower Hamlets          …       …       …       …        …       …       ……..
      Slough                 …       …       …       …        …       …       ……..
      …..                    …       …       …       …        …       …       …
      National total (APS)   ………..   ……..    ……….    ……..     ………     …………
    • Step 3: Obtain precision estimates via bootstrap
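To make the "association structure" concrete, the sketch below computes interaction terms for a proxy table using the standard sum-to-zero log-linear decomposition (double-centred log counts). This is an assumption about how $\alpha^X_{aj}$ is defined, consistent with the usual log-linear parameterisation and not taken from the slides, and it does not estimate $\beta$. The function name `interaction_terms` is hypothetical; the counts are the 2011 Census rows shown on the slide.

```python
import math

def interaction_terms(table):
    """Interaction terms alpha_aj of a two-way table of positive counts.

    alpha_aj = log x_aj - (row mean of logs) - (column mean of logs) + (grand mean of logs),
    so every row and every column of the result sums to zero.
    """
    logs = [[math.log(x) for x in row] for row in table]
    n_rows, n_cols = len(logs), len(logs[0])
    row_means = [sum(row) / n_cols for row in logs]
    col_means = [sum(logs[a][j] for a in range(n_rows)) / n_rows for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [
        [logs[a][j] - row_means[a] - col_means[j] + grand_mean for j in range(n_cols)]
        for a in range(n_rows)
    ]

# 2011 Census counts: Fareham, Southampton, Portsmouth by
# White, Mixed, Asian, Chinese, Black, Other.
census = [
    [107959, 1359, 1200, 467, 357, 239],
    [203528, 5678, 16443, 3449, 5067, 2717],
    [181182, 5467, 9863, 2611, 3777, 2156],
]
alpha_x = interaction_terms(census)
```

Under the model on the slide, the updated association would then be $\beta$ times these terms, with $\beta$ estimated from the survey counts.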
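Step 2 can be sketched with a plain-Python iterative proportional fitting routine. The seed table and margins below are made-up numbers for illustration; in the application described, the seed would be the step 1 cell predictions, the row totals the MYE local authority totals and the column totals the APS ethnic group totals. The function name `ipf` and its parameters are assumptions.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting of a two-way table to given margins.

    seed: list of rows of positive starting values.
    row_targets and col_targets must share the same grand total.
    """
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Scale each row to its target total.
        for a, row in enumerate(table):
            factor = row_targets[a] / sum(row)
            table[a] = [x * factor for x in row]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in table)
            for row in table:
                row[j] *= factor
        # Columns are now exact; stop once the rows are also within tolerance.
        if max(abs(sum(row) - t) for row, t in zip(table, row_targets)) < tol:
            break
    return table

# Hypothetical 2 x 2 example.
seed = [[40.0, 10.0], [20.0, 30.0]]
fitted = ipf(seed, row_targets=[60.0, 40.0], col_targets=[70.0, 30.0])
```

IPF only rescales rows and columns, so the fitted table keeps the interaction (odds-ratio) structure of the seed while matching both sets of margins.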
  • 76. Distribution of LA estimates by ethnic group, 2014 (England)
  • 77. RMSE: LA by ethnic group, 2014. Fixed Effects GSPREE estimator (England) • Overall, GSPREE is successful in providing reliable estimates for most LAs. • However, non-negligible RMSEs (and CVs) are observed in some areas
  • 78. Conclusions • GSPREE shows good performance: small RMSE in most LAs • Work in progress: validation study (1991/2001 Census); GSPREE: 2001 Census x 2011 data (APS, MYE, ESC); validation: 2011 Census • Further work: modelling strategy for more detailed categories; consider SPD as row totals; consider only School Census as proxy data; consider different attributes
  • 79. References
      Purcell, N. J. and Kish, L. (1980). Postcensal Estimates for Local Areas (or Domains). International Statistical Review, 48, 3-18.
      Zhang, L.C. and Chambers, R. (2004). Small area estimates for cross-classifications. Journal of the Royal Statistical Society, B, 66, 479–496.
      Luna-Hernandez, A. (2014). On Small Area Estimation for Compositions Using Structure Preserving Models. Unpublished PhD upgrade document, Department of Social Statistics and Demography, University of Southampton.
  • 80. Contacts • For further feedback on today’s session, please contact us at: Beyond.2021.Research.and.Design@ons.gov.uk