2. Population and Household
Estimates – what we have done
and what we plan to do
Chris Hill, Ali Dent and Claire Pereira
Administrative Data Census Project
Census Transformation Programme
3. Overview
Administrative Data Research Outputs:
• Population estimates – background, our
progress so far and our future plans (Chris
and Ali)
• Producing household estimates from
administrative data (Claire)
Note: These research outputs are NOT official statistics on the
population
4. Background
Beyond 2011 programme (April 2011)
• review the future provision of population statistics in England and Wales
and inform government and Parliament about options for the next census
• culminated in the National Statistician’s recommendation on the future of the
census and population statistics (March 2014)
The Census Transformation Programme (January 2015) to take forward
the National Statistician’s recommendation:
• deliver a predominantly online census in 2021
• increased use of administrative data and surveys to enhance the statistics
from the 2021 census and improve annual statistics between censuses
Administrative Data Census Project - aim is to produce the type of
information that is collected by a ten-yearly census (on housing, households
and people) from use of administrative data and surveys
5. Administrative Data Census Project –
Research Outputs
Key aim of the Research Outputs is:
• to replicate as many census outputs as possible using
admin data (and surveys) to compare with the 2021
Census
• Size of population
• Number and structure of households
• Characteristics of housing and the population
Continued development of the methodology based on
acquisition of new data sources and user feedback
Publish an annual assessment each spring to show
progress of our ability to move to an Administrative
Data Census in the next decade
6. A long way to go…but we have begun
• Published first set of
Research Outputs
Oct 2015
• Published first Annual
Assessment
May 2016
• Next set of Research
Outputs published –
expanding the range
Autumn 2016
7. What was included in the 2015 release?
• research outputs for each LA in England and Wales as a series
of admin data population estimates for 2011, 2013 and 2014 by
5 year age-sex groups
• analytical report comparing these to the 2011 Census and
subsequently the ONS mid-year population estimates
• case studies to highlight quality issues with the admin data
• interactive maps and population pyramids
• administrative data update paper – plans and aspirations for
future years
• feedback from users - aim of improving our methods
8. Producing population estimates from
administrative data (SPD)
NHS Patient
Register (PR)
DWP/HMRC Customer
Information
System (CIS)
Higher Education Statistics Agency (HESA)
data (students)
population
estimates
Included in Statistical
Population
Dataset (SPD)
If in different location
on PR & CIS, split half
and half across two
addresses
Statistical Population Dataset – SPD V1.0
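The half-and-half rule on this slide can be illustrated with a minimal sketch. It assumes each matched record carries one location from the PR and one from the CIS; the function names and the example locations are hypothetical and are not taken from the ONS implementation.

```python
def allocate(pr_location, cis_location):
    """Return {location: weight} for one matched PR-CIS record."""
    if pr_location == cis_location:
        # Sources agree: the person counts once at that location.
        return {pr_location: 1.0}
    # Sources disagree: count half a person at each location.
    return {pr_location: 0.5, cis_location: 0.5}


def location_totals(records):
    """Sum the weights by location over (pr_location, cis_location) pairs."""
    totals = {}
    for pr_location, cis_location in records:
        for location, weight in allocate(pr_location, cis_location).items():
            totals[location] = totals.get(location, 0.0) + weight
    return totals


# Hypothetical records: one agrees on A, one is split, one agrees on B.
example = [("A", "A"), ("A", "B"), ("B", "B")]
totals = location_totals(example)
```

The total weight always equals the number of records, so splitting changes where people are counted but not how many are counted.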
9. Performance of SPD v1.0 compared with the 2011
Census estimates by LA
94% of LA total population
estimates within 3.8% of Census
estimate in 2011
Admin data
method lower
than 2011 Census
Admin data
method higher
than 2011 Census
Percentage difference from 2011 Census estimates
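A summary such as "94% of LA estimates within 3.8% of the Census estimate" can be computed along the following lines. This is a sketch only: the local authority names and counts are made up, and the function names are hypothetical.

```python
def pct_diff(admin, benchmark):
    """Percentage difference of the admin-based estimate from the benchmark."""
    return 100.0 * (admin - benchmark) / benchmark


def share_within(admin_by_la, benchmark_by_la, tolerance_pct):
    """Percentage of LAs whose absolute percentage difference is within tolerance."""
    diffs = [pct_diff(admin_by_la[la], benchmark_by_la[la]) for la in benchmark_by_la]
    within = sum(1 for d in diffs if abs(d) <= tolerance_pct)
    return 100.0 * within / len(diffs)


# Made-up figures for four hypothetical local authorities.
admin = {"LA1": 98_000, "LA2": 205_000, "LA3": 150_000, "LA4": 90_000}
census = {"LA1": 100_000, "LA2": 200_000, "LA3": 150_000, "LA4": 100_000}
share = share_within(admin, census, 3.8)
```

A negative difference means the admin data method is lower than the benchmark, matching the left-hand side of the chart.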
10. Performance of SPDv1.0 compared with the 2014
mid-year estimates by LA
90% of LA total population
estimates within 3.8% of mid-
year estimate in 2014
Admin data
method lower
than 2014 MYEs
Admin data
method higher
than 2014 MYEs
Percentage difference from 2014 MYEs
11. Feedback summary
• the need for a Population Coverage Survey to help with
estimating the size of the population (considering options)
• using ‘activity data’ (1) to help reduce levels of over-coverage
that are seen for particular age groups (some progress)
• refining the Statistical Population Dataset inclusion and
exclusion rules (changes made)
• reviewing the quality standards that are used to assess the
quality of the SPDs (considering options)
• producing population estimates for small areas, within a local
authority (potentially autumn 2016)
1 Information from administrative data sources about when individuals have interacted with
systems or services, such as the National Insurance, tax or benefits systems, or a hospital visit
through the NHS system.
12. SPD Developments for 2016
SPD (Statistical Population Dataset)
used to produce estimates of the size of the population by anonymously
linking multiple administrative datasets
• Continue with SPD v1.0 for 2015 estimates (stable)
• SPD v2.0 (improved model) will be used to produce
pop estimates for 2011 and 2015
SPD v2.0 changes
• Improve overall coverage of the usual resident
population
• Redistribute people to the correct location
13. Plans for 2016 Research Outputs
• Population estimates – expanding the breadth and
detail
• Improvements to the methods used to produce
administrative data population estimates
• Outputs on the number of households
• Research on income from combined PAYE and
benefits data
• Stagger the outputs over the autumn
14. What are we planning to publish this year?
Package 1
Population estimates (National and LA)
by LA, sex and 5 year age-group, 2015
(As last year, but extended to include a new time series for 2011 and 2015;
the old time series is extended to SYOA)
Autumn 2016
Package 2 (NEW)
Population estimates (Small Area)
by LSOA, sex and 5 year age-group
Autumn 2016
Package 3 (NEW)
Household estimates (Number of households), by LA (2011 and 2015)
Combined PAYE and benefits research, by LA (2013/14 Tax Year)
Autumn 2016
All content and timings are provisional
15. Focus of methodological research for 2016
Males and females (where comparison data is higher or lower than
official estimates) percentage difference 2011 (England and Wales)
Addition of other
admin data
Activity
Data
Improve Matching Methodology,
increasing number of matches
16. Tackling undercoverage for school age
children
User feedback had suggested additional
administrative sources to use including the
School Census
• School Census = record level source that includes all
pupils at state Schools, produced annually
• SPD V1 includes matches between any two of PR,
DWP-CIS and HESA
With School Census we can find additional
matches to include in an SPD:
• PR-SC matches
• CIS-SC matches
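The matching rule described here can be sketched as a check on which sources a linked record appears in. This covers only the pairwise matching rule on this slide; the other SPD inclusion and exclusion rules are not modelled, and the names below are hypothetical.

```python
from itertools import combinations

# Source pairs named on the slides: any two of PR, CIS and HESA (SPD v1),
# plus the two new pairs involving the School Census (SC).
QUALIFYING_PAIRS = {
    frozenset(pair)
    for pair in [("PR", "CIS"), ("PR", "HESA"), ("CIS", "HESA"),
                 ("PR", "SC"), ("CIS", "SC")]
}


def has_qualifying_match(sources_present):
    """True if the record is matched across at least one qualifying source pair."""
    return any(
        frozenset(pair) in QUALIFYING_PAIRS
        for pair in combinations(sorted(sources_present), 2)
    )
```

For example, a child found on both the PR and the School Census now qualifies, whereas under SPD v1 the same record had no second source to match against.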
17. Also - improve use of matches for
students
• In SPD V1 – it is possible for the record identifiers to
conflict, for example
• In this case SPD v1 does not choose between the two HESA
IDs, so PR and CIS locations are used to place people in an
area
• The conflict implies that there is one SPD row that might
represent two people, and that we are not using the HESA
location for them in SPD v1
Matches used: HESA-PR match, PR-CIS match, CIS-HESA match
HESA ID (via PR)   NHS number (PR)   DWP CIS ID      HESA ID (via CIS)
365421             7404747201        889877261543    739542
18. Improved use of matches for students
HESA ID (via PR)   NHS number (PR)   DWP CIS ID      HESA ID (via CIS)
365421             7404747201        889877261543    739542
• Want to ensure there are no rows with more than 1 identifier
from each dataset (e.g. prevent 2 HESA ids in 1 row)
• Resolving conflicts may change number of records in SPD
Could convert this to 2 matches, so each HESA ID only appears
once!
• Achieved by changing how we use the information for the
matches
• All IDs from the datasets are included in a “spine” of records
(even if they are non-matches) – convenient for research
HESA ID    NHS number (PR)
365421     7404747201

DWP CIS ID      HESA ID
889877261543    739542
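The conversion of one conflicting row into two matches can be sketched as follows. The field names and the function are hypothetical; the identifiers are the example values from the slide.

```python
def resolve_hesa_conflict(row):
    """Split a row holding two different HESA IDs into two single-HESA rows."""
    via_pr = row["hesa_id_via_pr"]
    via_cis = row["hesa_id_via_cis"]
    if via_pr is None or via_cis is None or via_pr == via_cis:
        # No conflict: keep one row with whichever HESA ID is present.
        hesa_id = via_pr if via_pr is not None else via_cis
        return [{"hesa_id": hesa_id,
                 "nhs_number": row["nhs_number"],
                 "cis_id": row["cis_id"]}]
    # Conflict: keep the HESA-PR match and the CIS-HESA match as separate rows,
    # so each HESA ID appears only once.
    return [
        {"hesa_id": via_pr, "nhs_number": row["nhs_number"], "cis_id": None},
        {"hesa_id": via_cis, "nhs_number": None, "cis_id": row["cis_id"]},
    ]


conflict_row = {
    "hesa_id_via_pr": "365421",
    "nhs_number": "7404747201",
    "cis_id": "889877261543",
    "hesa_id_via_cis": "739542",
}
resolved = resolve_hesa_conflict(conflict_row)
```

Splitting the row increases the record count by one, which is why resolving conflicts may change the number of records in the SPD.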
19. School Census matches and HESA conflicts:
estimated effect of extra matches compared to SPD v1
[Chart: x-axis: % of SPD 1.0 estimate (0.0 to 3.5); y-axis: single year of age (0 to 50); series: Females, Males]
Effects from
adding SC
matches
Effects from
HESA conflict
resolution
20. Estimated proportion of SPD records by source
(after SPD exclusion/inclusion rules)
Source combination   Frequency    Percent
PR and CIS           45,202,800   81.2
PR, CIS and SC        7,733,800   13.9
PR, CIS and HESA      2,070,800    3.7
PR                      225,100    0.4
PR and HESA             170,500    0.3
PR and SC               143,500    0.3
CIS and HESA             89,700    0.2
CIS and SC               52,200    0.1
New
matches
Increased number
of matches
21. Coverage improvements for working
ages
• SPD v1 used “exact” or deterministic matches
(e.g. based on combination of name, address, DoB etc).
• Using score based matches (probabilistic) we can find
more PR to CIS matches
• Over 150,000 additional matches of expected good
quality can be identified
• 60% male, majority are distributed evenly across ages
18-50
• 40% Female, many in 18-24 range but also some older
• The extra matches to be added to the
deterministic/matchkey matches to test impact (work in
progress)
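The slides do not describe how the match score is calculated. As an illustration only, the sketch below uses the standard Fellegi-Sunter form of probabilistic linkage, in which each compared field adds a log-likelihood weight. The field list, the m- and u-probabilities and the threshold are all invented for the example.

```python
import math

# Hypothetical (m, u) pairs: m = P(field agrees | same person),
# u = P(field agrees | different people). Not ONS parameters.
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "postcode": (0.80, 0.001),
}


def match_score(record_a, record_b):
    """Sum of Fellegi-Sunter agreement/disagreement weights over the fields."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if record_a[field] == record_b[field]:
            score += math.log2(m / u)              # agreement weight (positive)
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return score


def is_match(record_a, record_b, threshold=10.0):
    """Accept the pair as a match when the score reaches the threshold."""
    return match_score(record_a, record_b) >= threshold


pr = {"name": "joe bloggs", "date_of_birth": "1974-04-17", "postcode": "AB1 2CD"}
cis_moved = {"name": "joe bloggs", "date_of_birth": "1974-04-17", "postcode": "XY9 8ZW"}
cis_other = {"name": "jane doe", "date_of_birth": "1980-01-01", "postcode": "AB1 2CD"}
```

Unlike an exact match key, the score still accepts `pr` and `cis_moved` as a match even though the postcode differs, which is how a score-based method can find extra PR to CIS matches.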
22. “Activity” data
New activity data acquired from DWP and
HMRC (abbrev = BIDS):
• National Benefits Database (NBD)
• PAYE (Pay as you earn – income tax)
• Single Housing Benefit (SHBE)
• Tax Credits
• excludes: Child Benefit, self-employed and people on
Universal Credit
Research aim:
to derive broad activity to verify residency in E&W and
potentially reduce overcoverage in an SPD
23. Created combined DWP/HMRC activity
dataset for 2011
• PAYE and TC:
• Anyone present in 2010/2011 and 2011/2012 tax years
• NBD:
• All people with an active claim on 15/03/2011
• Any other people with JSA or ESA claim since 15/03/2009
• Any partners of the above given their own record
• SHBE:
• Active claim on 15/03/2011 or started after
• Partner status was active on 15/03/2011 or after
• Variables:
• Dates
• Additional variables derived from each source – eg account
status, claim status, boolean tax year variables for PAYE/TC
• Links to CIS, Census, PR, SPDv1
24. Age-sex distribution of active DWP/HMRC
records in SPDv1 2011
[Chart: number of records (axis 0 to 450,000) by single year of age (0 to 100+); series: Male not in BIDS, Male in BIDS, Female not in BIDS, Female in BIDS; annotation: SPD records not active]
25. Effect of removing inactive records
• Self-employed likely to be excluded from activity dataset
• Child benefit not included
• Would like to acquire MORE activity data!
[Chart: percentage difference (axis -50% to 10%) by five-year age group (15-19 to 90+); series: Proportion of males in BIDS compared to census estimates, Proportion of females in BIDS compared to census estimates, Proportion of males in SPD v1 compared to census estimates, Proportion of females in SPD v1 compared to census estimates]
26. Improving local distributions – with
“PDS” activity data
• PDS is our first set of health activity data
• Is based on interactions with NHS, not same as Patient Register -
contains history (multiple rows per person)
• Extract for ONS contains “movers” – a history of locations for each
person
• Aim to remove uncertainty in SPD about location of people, so a
single record is not allocated as a half-person in 2 locations
• SPD v1.0 contains ~3.1 million half-weighted people (5.5%)
• PDS information likely to be more recent than CIS/PR and may
help to resolve half-weighted people
• Many half-weighted records persist in SPD for multiple years, so
linking to PDS from current and previous years may resolve more
• Aimed to link half-weighted records to PDS in current year or
earlier, and categorise
27. Resolving half weighted people with PDS:
linkage results (2013)
• 48% of 2013 half-weighted records are not linked to any PDS
record from 2013 or earlier, 52% are linked
• Linked records, by the most recent PDS extract on which they are found:
PDS extract   Frequency   Percentage
June 2013     806,125     49.1
June 2012     469,416     28.6
June 2011     323,584     19.7
March 2011     44,480      2.7
• Significant benefit from including earlier years
(maximise information available for half-weighted records)
28. Example of resolving location: perfect moves
Category 1a – a perfect CIS-PR move:
PR:  LA E06000047, Mod date 16/04/2013
CIS: LA E09000027, Addr start date 09/03/2011
Most recent PDS move:
Origin LA    Destination LA    Effective date
E09000027    E06000047         04/04/2013
• Dates on PDS and PR are both later than CIS
• Category 1b is the same except PR to CIS
• Can assign very confidently to destination LA
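The category 1a rule can be written as a small check. This is a sketch under stated assumptions: the function and argument names are hypothetical, and category 1b is read here as the mirror image of 1a (a move from the PR location to the CIS location, with the CIS and PDS dates later than the PR date).

```python
from datetime import date


def classify_move(pr_la, pr_date, cis_la, cis_date, pds_origin, pds_dest, pds_date):
    """Return (category, assigned LA) for a half-weighted record, or (None, None)."""
    # 1a: the PDS move goes from the CIS location to the PR location,
    # and both the PDS and PR dates are later than the CIS date.
    if (pds_origin == cis_la and pds_dest == pr_la
            and pds_date > cis_date and pr_date > cis_date):
        return "1a", pds_dest
    # 1b (assumed mirror case): the PDS move goes from the PR location to the CIS location.
    if (pds_origin == pr_la and pds_dest == cis_la
            and pds_date > pr_date and cis_date > pr_date):
        return "1b", pds_dest
    return None, None


# The example from the slide.
result = classify_move(
    pr_la="E06000047", pr_date=date(2013, 4, 16),
    cis_la="E09000027", cis_date=date(2011, 3, 9),
    pds_origin="E09000027", pds_dest="E06000047", pds_date=date(2013, 4, 4),
)
```

In the slide's example the record is classified as 1a and the whole person is assigned to E06000047, rather than half to each local authority.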
29. Producing population estimates from
administrative data (SPD)
NHS Patient
Register
DWP/HMRC Customer
Information
System
HESA data
(students)
SPD population
estimates
Included in Statistical
Population
Dataset (SPD)
School Census
Statistical Population Dataset – SPD V2.0
resolve half-weights
Add extra
PR-CIS
matches
To be published this autumn!
31. What is a household?
A
household
is defined as:
one person living alone,
or
a group of people (not
necessarily related) living at the
same address who share
cooking facilities and
share a living room or
sitting room or dining area.
32. Beyond 2011
Early research showed potential for admin
data to provide number and sizes of
‘occupied addresses’.
But key challenges….
• Limited data sources available.
• Coverage & measurement error – undercount &
people not in the right place.
• Definitions – occupied address vs census definition
33. Aims
Short term
• Producing numbers of households in England
and Wales by Local Authority for 2011 and 2015
• Deal with key challenges from previous work
Longer term
• Keep developing – build breadth and time series
of households statistics
• Develop alongside SPD production
34. What data can we use?
Address
Base
Tax and
Benefits
data
Population
Coverage
Survey
35. Comparing with other ONS outputs
OA
Output Area
DAU
Demographics
Analysis Unit
LFS
Labour Force
Survey
SPD
Statistical
Population
Dataset
No mid year estimates as with population
Can evaluate quality in 2011 by comparing with Census
estimates, down to OA level.
DAU produce national estimates for 1996 onwards:
• Families and people in families
• Households and people in households
Produced from the LFS, which has a sample size of 41,000 households
containing around 100,000 individuals. Internally,
estimates can be produced at Local Authority level.
36. Can AddressBase help?
C Commercial
L Land
M Military
O Other (Ordnance
Survey Only)
P Parent Shell
R Residential
U Unclassified
X Dual Use
Z Object of Interest
RB Ancillary Building
RC Car Park Space
RD Dwelling
RG Garage
RH House In Multiple Occupation
RI Residential Institution
There are 1128 classifications of address on AddressBase, an
Ordnance Survey product, including care home, house boat and
caravan. Classifications have up to four levels of detail (though many
do not use all four) and have dates attached, which allows further validation.
37. Address Matching
OSAPR
Ordnance Survey
Address-Point
Reference Number
UPRN
Unique Property
Reference Number
Address matching methodology is developing at ONS -
estimate a 5% increase in match rate.
Need a reliable unique identifier for addresses - transition
from OSAPR to UPRN
38. Changes in address identification
[Bar chart: OSAPRs (2013) and UPRNs (2014) in millions (axis 0 to 4.00) by region: East Midlands, East of England, London, North East, North West, South East, South West, Wales, West Midlands, Yorkshire and…]
Currently ONS has attached OSAPRs onto records up to 2013,
with a switch to UPRNs in 2014.
We would expect an increase due to housing stock growth of
around 1%. 77% of LAs show an increase of more than 1%.
39. Challenges
Our biggest challenges for producing household numbers
Definition – household/address is not a one to one relationship.
Putting people in the right place
• Half weights on SPD – when sources disagree
• Correct address allocation
  o data lags
  o high churn
  o people not deregistering
  o poor AddressBase matching/allocation
40. Dealing with half sizes
Our objective is to count each person in a household – need to resolve
unmatched records
Two methods
1. Source preference
HESA PR CIS
2. Redistribute according to
household size distributions
41. Dealing with half sizes
[Chart 1: Comparing with Census Outputs. Number of households (axis 0 to 25,000,000) by household size (1, 2, 3, 4, 5 plus) and total households; series: Census - QS406EW, Redistributed half sizes]
[Chart 2: % differences (axis -25 to 25) for the same household size categories and total households; series: Redistributed half sizes]
Overcounting large
household sizes, whilst
undercounting 1 and 2
person households.
It is anticipated that better
address matching and the
use of UPRNs rather than
OSAPRs will resolve some of
these differences.
42. Dual System Estimation
ONS often uses Dual System Estimation (DSE) to weight up for non-response.
To trial the use of DSE to weight up for undercount, I used a 4% sample by
postcode taken from the Census as a proxy for a survey.
To allow for differences between samples, 400 samples were taken.
In the future, an annual
survey similar to a population
coverage survey could
contribute.
[Bar chart: SPD % diff (axis -14 to 0) by region: East Midlands, East of England, London, North East, North West, South East, South West, Wales, West Midlands, Yorkshire and The Humber, plus England and Wales]
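The slides do not give the exact estimator used in the trial. As a hedged illustration, the classic two-list (Lincoln-Petersen) form of dual system estimation is sketched below, with the admin-based household count as one list and the survey sample as the other. The counts are made up and the function name is hypothetical.

```python
def dual_system_estimate(admin_count, survey_count, matched_count):
    """Two-list DSE: estimated total = admin x survey / matched.

    admin_count:   units found on the admin-based list in the sampled areas
    survey_count:  units found by the survey in the same areas
    matched_count: units found on both lists
    """
    if matched_count <= 0:
        raise ValueError("DSE needs at least one matched unit")
    return admin_count * survey_count / matched_count


# Made-up example: the admin list holds 900 households, the survey finds 100,
# and 90 of the surveyed households are also on the admin list.
estimate = dual_system_estimate(900, 100, 90)
coverage_weight = estimate / 900  # factor to weight up the admin count
```

Here the survey suggests the admin list covers 90% of households, so the admin count is weighted up by a factor of about 1.11.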
44. Dual System Estimation
Impact of DSE on household counts
85% of Local Authorities are within 0.5% of Census estimate
90% of Local Authorities are within 1% of Census estimate
95% of Local Authorities are within 1.5% of Census estimate
[Bar chart: DSE % diff and SPD % diff (axis -14 to 0) by region: East Midlands, East of England, London, North East, North West, South East, South West, Wales, West Midlands, Yorkshire and The Humber, plus England and Wales]
45. Allocating address at SPD record level
Using many data sources to find our
‘best’ address.
Benefits
Enables aggregation at different
levels and cross tabulation with other
variables.
Can weight certain data sources for
different demographic groups, e.g.
students
46. Allocating address at record level
PR: Joe Bloggs, 17/4/1974, UPRN: 12345
CIS: Joe Bloggs, 17/4/1974, UPRN: 12346
Patient register moves - PDS
Joe Bloggs 17/4/1974 move 1 - 1/1/2011: UPRN: 12345
Joe Bloggs 17/4/1974 move 2 - 2/2/2011: UPRN: 22345
Joe Bloggs 17/4/1974 move 3 - 3/3/2013: UPRN: 12346
Can use activity data to locate the newest address.
True match on SPD
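Choosing the newest address from the PDS move history can be sketched as below, using the Joe Bloggs example from this slide. The function name and the day/month/year date parsing are assumptions made for illustration.

```python
from datetime import datetime


def newest_address(moves):
    """Return the UPRN of the most recent move from (date string, UPRN) pairs."""
    latest = max(moves, key=lambda move: datetime.strptime(move[0], "%d/%m/%Y"))
    return latest[1]


# PDS moves for the example record, as shown on the slide.
pds_moves = [
    ("1/1/2011", "12345"),
    ("2/2/2011", "22345"),
    ("3/3/2013", "12346"),
]
pr_uprn, cis_uprn = "12345", "12346"

best = newest_address(pds_moves)
# Record which of the two disagreeing sources the newest PDS address supports.
supported_source = "CIS" if best == cis_uprn else "PR" if best == pr_uprn else None
```

In this example the newest PDS address agrees with the CIS address, so the record can be placed at UPRN 12346 instead of being split between the PR and CIS addresses.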
47. Plans for the future
This year
• Numbers of households by LA, England and Wales, 2011
and 2015 for Research Outputs, Autumn 2016 (including
case studies of Local Authorities of interest)
• Focus on issue of definitional differences – what is the real
need vs what can be produced.
• We have initiated an ONS household working group to bring together
different areas of work, share ideas and build knowledge.
Future years
• Develop time series of numbers of households
• Explore additional data sources to fill gaps in household
statistics
• Household sizes
• Household composition
• Investigate production of an enhanced address register.
48. Integrated sources for estimating
population characteristics
Alison Whitworth and Meghan Elkin
49. Administrative Data Census
Census provides information on:
1. Size of the population
by area, age and sex
2. Households and families
number, size and type of family
3. Population characteristics
information on ethnicity, educational attainment,
religion (etc)
50. Producing population statistics from
admin data
• Beyond 2011 admin data option :
o Admin data – population size by age and sex
o 4% annual survey – population characteristics
• National Statistician’s recommendation: use
all available sources
o Approach going forward to explore in more depth
the potential of admin data
51. Outline
• Framework for characteristics post 2021
• Methods for combining sources
• Academic research
• Examples
o Income
o Population by ethnic group
52. Framework for population characteristics
Spectrum of sources: Admin - Integrated sources - Survey
• Admin data available for target variable
• Admin data for characteristic associated with target variable
• Admin data is a proxy of target variable
• No administrative data available for target variable
53. Framework for population characteristics
Spectrum of sources: Admin - Integrated sources - Survey
• Admin data available for target variable
  o Direct counts or estimates from administrative data
• Admin data for characteristic associated with target variable
  o Regression model for estimates
• Admin data is a proxy of target variable
  o Structural model for estimates
• No administrative data available for target variable
  o Direct survey estimates
55. Two key methods
Regression model
• Auxiliary data are correlated
with the target variable
• Model to define relationship
Structural model
• Auxiliary data same structure as
target variable
• Model to define the structure
Local Authority claimant count by
unemployment
56. Methods – ONS applications
Regression model
Mean Household income
(MSOA)
Median Household
income (MSOA) for 2011
Unemployment (LA)
Emigration (LA)
SPREE
Broad ethnic group (LA)
Regression model
× Unemployment (MSOA)
× Informal caring (wards)
× Crime and fear of crime
(wards)
× Mental health of children
and adolescents (wards)
× Adult Neurotic Disorder
(wards)
57. Role of survey data
Spectrum of sources: Admin - Integrated sources - Survey
• Direct counts or estimates from administrative
data - survey to adjust for under- or over-coverage
and measurement error
• Regression model - survey estimates
strengthened by admin sources
• Structural model - survey provides accurate
marginal totals
• No administrative data - direct survey estimates
58. Academic input
• Collaborative projects Structure Preserving
Estimation (SPREE),
• Expert advisory group Small area estimation
• Funded research/ Bids National Centre Research
Methods
• Conferences NCRM Bath (July)
SAE Maastricht (Aug)
59. Summary
1. Framework: top down
gaps at small area, micro level & multivariate
2. Methods: two approaches for combining sources
understand wider application
3. Academic momentum
61. Current published income outputs
Survey outputs: Majority of current income outputs
(lowest geography: parliamentary constituency)
Integrated sources: Small Area Income Estimates
(modelled household weekly income at MSOA)
Admin outputs: Working towards multivariate, small area
income outputs
62. Income definition (Canberra Handbook)
Ideally achieve gross income:
• Income from employment
e.g. employee income and income from self-employment
• Property income
e.g. income from financial and non-financial assets
• Current transfers received
e.g. social security schemes, pensions
Source: UNECE Canberra Group Handbook on Household Income Statistics
63. Components of income & admin data
[Bubble diagram: components of total income. Labels shown: Total income; Income from employment; Employment via PAYE; Self employment; Property income; Financial assets; Non-financial assets; Interest; Dividends; Rent; Royalties; Current transfers received; Pensions; State pension; Occupational pension; Personal pension; Social security & assistance; Benefits; Tax Credits; Current transfers from non-profit institutions; Current transfers from other households; Undeclared (shown twice)]
Key:
Green = access to admin data
Amber = admin data available
Red = unknown/admin data not available – will need to estimate using surveys
Size of the bubbles not relative to proportion of income
This diagram does not provide a full disaggregation of components. For more detail see the Canberra Group Handbook.
64. Admin Data Census plans for income
Increased use of Admin Data
Direct estimates from
admin data individual
income estimates
Combining admin
data with surveys
(work with SAIE)
More developed
publication e.g.
components of income
Admin
outputs
Produce household
income estimates
More detailed PAYE &
self-assessment data
2016 2017 2018
Integrated
sources
Output
quality
National OA
Limited
coverage
LA LSOA OA
65. Income research outputs 2016 definition
Output: Distributions
Population: Those resident at 30 June 2013 on the Statistical Population Dataset (SPD), aged 16 and over
Geography: England and Wales
Geographic level: Local Authorities
Unit level: Individuals
Reference period: Annual
Time period: Tax year 2013/14
Source of income: PAYE earnings (employment and occupational pensions), child and working tax credits, housing benefit, many DWP benefits
Accrual or receipt: Receipt – administrative data on income and benefits are recorded on receipt of the income or benefits
Location: Income by individual’s home address
66. Proportion of SPD population that has
some income information by age
[Chart: proportion of the population (0% to 100%) by age (16 to 88); series: Males, Females]
68. Proposed income bands for 2016 outputs
[Stacked bar chart: proportion of the population (0% to 100%) in each income band for two example local authorities (LA1, LA2)]
Income bands: Zero, 0-5K, 5-10K, 10-15K, 15-20K, 20-30K, 30-40K, 40-60K, 60K+, Missing
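Assigning an individual's annual income to one of the proposed bands can be sketched as follows. The slide does not say how band boundaries are treated, so this example assumes each band includes its lower bound and excludes its upper bound (an income of exactly 5,000 falls in 5-10K); the function name is hypothetical.

```python
from bisect import bisect_right

# Upper bounds of the banded ranges, in pounds, and the matching labels.
UPPER_BOUNDS = [5_000, 10_000, 15_000, 20_000, 30_000, 40_000, 60_000]
LABELS = ["0-5K", "5-10K", "10-15K", "15-20K", "20-30K", "30-40K", "40-60K", "60K+"]


def income_band(income):
    """Return the band label for an annual income, or 'Missing' if unknown."""
    if income is None:
        return "Missing"
    if income < 0:
        raise ValueError("negative income is not covered by the proposed bands")
    if income == 0:
        return "Zero"
    # bisect_right counts how many upper bounds are <= income,
    # which is the index of the band the income falls into.
    return LABELS[bisect_right(UPPER_BOUNDS, income)]
```

Keeping "Zero" and "Missing" as separate bands distinguishes people with no recorded income from people with no income information at all.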
69. Case Study: Local Authority
Population Estimates by Ethnic
Group using Generalised Structure
Preserving Estimation (GSPREE)
June 2016
70. Census Table
Population by Local Authority and Ethnic Group
England (March 2011)
Local Authority       White    Mixed   Asian   Chinese  Black   Other   Total
Fareham               107959   1359    1200    467      357     239     111581
Southampton           203528   5678    16443   3449     5067    2717    236882
Portsmouth            181182   5467    9863    2611     3777    2156    205056
Winchester            111577   1626    1894    745      457     296     116595
…….                   …        …       …       …        …       …       …
Bath & NE Somerset    166473   2898    2665    1912     1326    742     176016
71. Data for Ethnic Group
• 2011 Census estimates (Mar 2011)
Proxy: Detailed cross tabulation but outdated
• School Census (Jan 2014)
Proxy: Detailed cross tabulation but age 5-15 only
• Annual Population Survey (2014)
Total population by ethnic group
• Mid Year Population Estimates (June 2014)
Total population by local authority
72. Data for Ethnic Group
Census 2011 MYE
2014
White Mixed Asian Chinese Black Other Total
Fareham 107959 1359 1200 467 357 239 ……..
Southampton 203528 5678 16443 3449 5067 2717 ……..
Portsmouth 181182 5467 9863 2611 3777 2156 ….....
…. … … … … … … …
Tower Hamlets 114819 10360 96392 8109 18629 5787 ……..
Slough 64053 4758 54900 797 12115 3582 ………
….. … … … … … … …
APS July 2012 - June 2014 (weighted estimates)
National total ………. ………. ………. ………. ……….. ……….
School Census Dec 2014
White Mixed Asian Chinese Black Other
Fareham … … … … … …
Southampton … … … … … …
Portsmouth … … … … … …
…. … … … … … …
Tower Hamlets … … … … … …
Slough … … … … … …
….. … … … … … …
73. Solution…
• Combine administrative and census data with
survey data to borrow strength and produce
a reliable estimate for each cell (domain) using
GSPREE (Zhang and Chambers, 2004;
Luna-Hernandez, 2014).
74. Applying GSPREE
• Step 1: Estimate the association structure by relating
survey counts (Yaj) to census counts (Xaj):
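\[ \log Y_{aj} = \gamma_a + \lambda_j + \beta \alpha^{X}_{aj}, \qquad \sum_j \lambda_j = 0, \qquad a = 1, \dots, A, \; j = 1, \dots, J \]
\[ \alpha^{Y}_{aj} = \beta \alpha^{X}_{aj} \]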
2011 Census (Xaj)
White Mixed Asian Chinese Black Other
Fareham 107959 1359 1200 467 357 239
Southampton 203528 5678 16443 3449 5067 2717
Portsmouth 181182 5467 9863 2611 3777 2156
…. … … … … … …
Tower Hamlets 114819 10360 96392 8109 18629 5787
Slough 64053 4758 54900 797 12115 3582
….. … … … … … …
2013 School Census (Xaj)
White Mixed Asian Chinese Black Other
Fareham … … … … … …
Southampton … … … … … …
Portsmouth … … … … … …
….
Tower Hamlets … … … … … …
Slough
….. … … … … … …
APS (Yaj)
Jan 2012-Dec 2014
White Mixed Asian Chinese Black Other
Fareham … … … … … …
Southampton … … … … … …
Portsmouth … … … … … …
….
Tower Hamlets … … … … … …
Slough
….. … … … … … …
- \hat{\beta} obtained via MLE
- Poisson or Multinomial distribution assumed
- Predict cell counts but no benchmarking
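The slide does not define the interaction terms. In the log-linear formulation commonly used for SPREE-type estimators they are the double-centred log counts, and the sketch below assumes that definition. It uses two rows of the 2011 Census table from the slides; the value of beta is made up purely to show the proportionality relation in Step 1.

```python
import numpy as np


def interactions(counts):
    """Double-centred log counts: rows and columns of the result each sum to zero."""
    logc = np.log(np.asarray(counts, dtype=float))
    return (logc
            - logc.mean(axis=1, keepdims=True)   # remove row (area) effects
            - logc.mean(axis=0, keepdims=True)   # remove column (group) effects
            + logc.mean())                       # add back the grand mean


# 2011 Census counts for Fareham and Southampton (six ethnic groups).
census = np.array([
    [107959, 1359, 1200, 467, 357, 239],
    [203528, 5678, 16443, 3449, 5067, 2717],
])
alpha_x = interactions(census)

beta = 0.9                  # hypothetical value; in practice estimated by MLE
alpha_y = beta * alpha_x    # updated association structure implied by the model
```

With beta equal to 1 the census association structure is carried forward unchanged; other values of beta scale it up or down. The predicted cell counts still need benchmarking to the known margins, which is Step 2.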
75. Applying GSPREE
• Step 2: Benchmark updated cell counts to marginal totals
Iterative Proportional Fitting (IPF) is used to impose the known row
and column totals on the cell counts obtained in step 1
GSPREE
Estimates
Dec
2014
MYE
2014
White Mixed Asian Chinese Black Other Total
Fareham … … … … … … ……..
Southampton … … … … … … ……..
Portsmouth … … … … … … ……..
…. … … … … … … …
Tower Hamlets … … … … … … ……..
Slough … … … … … … ……..
….. … … … … … … …
APS July 2012 - June 2014 (weighted estimates)
National total ……….. …….. ………. …….. ……… …………
• Step 3: Obtain precision estimates via bootstrap
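The benchmarking in Step 2 can be illustrated with a minimal Iterative Proportional Fitting sketch. The seed table and the margins below are made up; in the application described here the seed would be the Step 1 cell counts, the row totals the mid-year estimates by local authority and the column totals the APS national totals by ethnic group. The sketch assumes strictly positive seed cells and that row and column totals sum to the same grand total.

```python
import numpy as np


def ipf(seed, row_totals, col_totals, tol=1e-6, max_iter=1000):
    """Scale a positive seed table until its margins match the given totals."""
    table = np.asarray(seed, dtype=float).copy()
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        # Scale each row to its target total, then each column to its target total.
        table *= (row_totals / table.sum(axis=1))[:, None]
        table *= col_totals / table.sum(axis=0)
        # Columns match exactly after the last step, so check the rows.
        if np.allclose(table.sum(axis=1), row_totals, rtol=0, atol=tol):
            break
    return table


seed = [[1.0, 2.0], [3.0, 4.0]]   # made-up Step 1 cell counts
fitted = ipf(seed, row_totals=[40.0, 60.0], col_totals=[35.0, 65.0])
```

IPF changes only the margins: the odds ratios of the seed table are preserved, which is why the association structure estimated in Step 1 survives the benchmarking.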
77. RMSE. LA by ethnic group, 2014
• Overall, GSPREE is successful in providing reliable
estimates for most LAs.
• However, non-negligible RMSEs (and CVs) are
observed in some areas
Fixed Effects GSPREE estimator (England)
78. Conclusions
• GSPREE shows good performance
Small RMSE in most LAs
• Work in progress
Validation study (1991/2001 Census)
GSPREE: 2001 Census x 2011 data (APS, MYE, ESC)
Validation: 2011 Census
• Further work …
Modelling strategy for more detailed categories
Consider SPD as row totals
Consider only School Census as proxy data
Consider different attributes
79. References
Purcell, N. J. and Kish, L. (1980). Postcensal Estimates for Local Areas
(or Domains). International Statistical Review, 48, 3-18.
Zhang, L.C. and Chambers, R. (2004). Small area estimates for cross-
classifications. Journal of the Royal Statistical Society, B, 66, 479–
496.
Luna-Hernandez, A. (2014). On Small Area Estimation for
Compositions Using Structure Preserving Models. Unpublished PhD
upgrade document, Department of Social Statistics and
Demography, University of Southampton.
80. Contacts
• For further feedback on today’s session, please
contact us at:
Beyond.2021.Research.and.Design@ons.gov.uk