1
Estimation & Adjustment
in Census 2021
Welcome
Please turn off your camera and mute your microphone
Questions?
Visit www.sli.do and enter code 61866 to ask your questions
Technology problems?
Email sdr.stakeholder.engagement@ons.gov.uk
Starting at 10:15am
#Census2021
2
Speakers
Gareth Powell
Project Lead for Census Estimation
Methodology
#Census2021 www.sli.do 61866
3
Speakers
Viktor Račinskij
Methodologist – Census Estimation
4
Speakers
Kirsten Piller
Methodologist – Census Adjustment
5
Aims for this session:
• Explain why we estimate and adjust the census population
• Explore what an estimated and adjusted census database allows us
to produce
• Describe the statistical methods we will use to estimate and adjust
the census population in 2021
• Describe what our plans are for sharing the details of our methods
6
Why are we running these webinars?
• To explain how the census works in collecting information
and producing great statistics
• To sign-post where more information is available
• To follow on from the material we published in October
  • COVID response
  • Statistical Design
  • Findings from 2019 rehearsal
7
8
Census 2021 Quality Targets
High quality, flexible,
timely, accessible census
statistics for users
94% overall
response
At least 80% in
every local authority
75% Online
Response
Minimise variability
in response
Support
completion
9
Estimation and Adjustment
• How we account for non-response to the census
• Estimation creates key population totals, for each local
authority by age-sex group
• Adjustment takes these population estimates and makes
the record level database consistent
10
Coverage Estimation in
Census 2021
11
Overview of Estimation
• Even with a variety of strategies to maximise response, it is inevitable that
some people will not be counted, or will be counted in the wrong place
• Undercount error – a member of the target population is missing from
the census
• Overcount error – a member of the target population is counted in the wrong
location or counted twice
• In the 2011 Census of England and Wales, undercount was estimated at 6%
and overcount at 0.6%
• Coverage error is not uniformly distributed across the population
12
Census Coverage Estimation
• The process of estimating the coverage error corrected population size at
national and subnational levels
• Estimates for a range of domains such as age-sex, tenure, ethnicity, etc. by
local authority
• Individual and household populations
• Coverage error corrected census estimates have higher accuracy than the
raw census counts
13
Raw census counts at the local authority level
in 2011 (estimated)
14
What coverage error corrected population size
estimates may look like compared to raw counts
15
Broad outline of estimation
• Census coverage survey conducted 6 weeks after the census
day, 1.5% of population sampled
• Census and the coverage survey data are matched
• Population size estimation
• Variance estimation
16
Coverage estimation in 2011
• Data delivered in batches, estimation on groups of local authorities
(estimation areas)
• Census coverage survey
  • ~1.5% of postcodes
  • Stratified by local authority and hard-to-count index
  • Sample in every local authority
  • Two-stage cluster (output areas, postcodes)
  • Optimal allocation with constraints
• Combination of the dual system, ratio and synthetic estimators used
17
Data collection in Census 2021
• Online data collection means that the data will be
delivered more quickly
• Estimation can be done using the data for the whole of England
and Wales
• A different estimation approach, based on logistic / mixed
effects regression modelling
18
Independent re-enumeration of a sample of the
population
19
Independent re-enumeration of a sample of the
population: important points
20
How (and why) the dual system estimator works
\[
\text{Total population} = [\text{Number responding to census}] \times \frac{[\text{Everyone in CCS}]}{[\text{In both CCS and Census}]} = 9 \times \frac{8}{6} = 12
\]
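The slide's arithmetic can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the input checks are assumptions, not part of the ONS method as presented.

```python
def dual_system_estimate(census_count: int, ccs_count: int, matched_count: int) -> float:
    """Dual system estimate: census count x CCS count / count found in both.

    Illustrative helper (hypothetical name); mirrors the slide's formula only.
    """
    if matched_count <= 0:
        raise ValueError("matched_count must be positive")
    if matched_count > min(census_count, ccs_count):
        raise ValueError("matched_count cannot exceed either source count")
    return census_count * ccs_count / matched_count


# Worked example from the slide: 9 in the census, 8 in the CCS, 6 in both.
estimate = dual_system_estimate(census_count=9, ccs_count=8, matched_count=6)
print(estimate)  # 12.0
```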
21
How to go from sample to population?
Ratio estimator
\[
[\text{Population size}] = [\text{DSE estimate within sample}] \times \frac{[\text{census count at population level}]}{[\text{census count within a sample}]} = 12 \times \frac{90}{9} = 120
\]
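The same scaling step as a short Python sketch, using the slide's numbers. The function and argument names are hypothetical and chosen only for this illustration.

```python
def ratio_estimate(dse_in_sample: float,
                   census_count_population: int,
                   census_count_sample: int) -> float:
    """Scale a within-sample DSE up to the population using census counts.

    Illustrative helper (hypothetical name); mirrors the slide's formula only.
    """
    if census_count_sample <= 0:
        raise ValueError("census_count_sample must be positive")
    return dse_in_sample * census_count_population / census_count_sample


# Worked example from the slide: DSE of 12 in the sample, census counts 90 and 9.
population_size = ratio_estimate(12, census_count_population=90, census_count_sample=9)
print(population_size)  # 120.0
```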
22
Sampling error
Every CCS sample would be slightly different
23
Variance Estimation
• Variance estimation quantifies the uncertainty around the coverage-error-corrected estimates
• Gives confidence intervals for our estimates
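The slides do not say which variance estimator is used. One common way to see how "every CCS sample would be slightly different" turns into a confidence interval is a percentile bootstrap over sampled clusters. The sketch below assumes that approach purely for illustration; the cluster counts are made up and the function names are hypothetical.

```python
import random


def dual_system_estimate(census, ccs, matched):
    # Same formula as on the dual system estimator slide.
    return census * ccs / matched


def bootstrap_interval(clusters, n_replicates=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the DSE (assumed method, not from the slides).

    clusters: list of (census_count, ccs_count, matched_count) per sampled cluster.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Resample whole clusters with replacement, then re-estimate.
        resample = [rng.choice(clusters) for _ in clusters]
        census = sum(c[0] for c in resample)
        ccs = sum(c[1] for c in resample)
        matched = sum(c[2] for c in resample)
        replicates.append(dual_system_estimate(census, ccs, matched))
    replicates.sort()
    lower = replicates[int((1 - level) / 2 * n_replicates)]
    upper = replicates[int((1 + level) / 2 * n_replicates) - 1]
    return lower, upper


# Made-up cluster counts, for illustration only.
clusters = [(9, 8, 6), (12, 11, 9), (7, 8, 6), (10, 9, 8), (11, 10, 7)]
lower, upper = bootstrap_interval(clusters)
print(lower, upper)
```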
24
Estimating across many domains – Local
Authorities & characteristics
25
Changes for 2021 – Regression based
modelling approach
• In 2011 we estimated domains separately
• In 2021 we will use a modelling approach to allow us to use all
domains at the same time
• This allows each of them to help in the estimation of all the others
26
Changes for 2021 – Regression based
modelling approach
• Logistic or mixed effects logistic regression is used to estimate the
census response probability for an element (e.g. a person) in a local
authority with a set of characteristics
• Reduces the variability of estimates, allows coverage patterns to be
reflected, and better meets the assumptions of estimation
\[
[\text{Response probability}] = \frac{1}{1 + \exp\left(-\left[\boldsymbol{x}^{T}\beta + \boldsymbol{z}^{T}\kappa + u_{L}\right]\right)}
\]
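A minimal sketch of evaluating this formula in Python. The slides do not define the symbols, so reading x and z as covariate vectors, β and κ as their coefficient vectors, and u_L as a local authority effect is an assumption, as are all names and numbers below.

```python
import math


def response_probability(x, beta, z, kappa, u_la):
    """Evaluate 1 / (1 + exp(-[x'beta + z'kappa + u_L])).

    Illustrative only: the interpretation of the arguments is an assumption.
    """
    linear = (sum(xi * bi for xi, bi in zip(x, beta))
              + sum(zi * ki for zi, ki in zip(z, kappa))
              + u_la)
    # Numerically stable evaluation of the logistic function.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# A zero linear predictor gives a probability of one half.
print(response_probability([0.0], [0.0], [0.0], [0.0], 0.0))  # 0.5

# Made-up coefficients: a larger local authority effect raises the probability.
p_low = response_probability([1.0, 0.0], [2.0, -0.5], [1.0], [0.3], -0.2)
p_high = response_probability([1.0, 0.0], [2.0, -0.5], [1.0], [0.3], 0.2)
```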
27
Why regression?
• Doing estimation separately for each
domain loses some information as
domains can overlap
• E.g. if we estimate for all men aged 40-44
in an area, we aren’t also using their
marital status
• We could break the estimation down
further – but groups would get too small
28
Even better – use all variables and areas at once
29
The rest of estimation
• Overcoverage estimation and correction for overcoverage error are
conceptually similar
• Use the matching of census records to coverage survey records to establish
the cases of overcount within the sample
• Fit a logistic regression model to estimate the probability of correct
enumeration
• Apply it to adjust for overcoverage (downweight)
• Communal Establishments
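One way to picture the downweighting step is to combine the two modelled probabilities into a single weight per census record. The slides do not give the exact formula, so the weight below (probability of correct enumeration divided by response probability) is an assumption for illustration, and the probabilities are made up.

```python
import math


def coverage_weight(p_response: float, p_correct: float) -> float:
    """Assumed per-record weight: upweight for undercount, downweight for overcount."""
    if not 0.0 < p_response <= 1.0:
        raise ValueError("p_response must be in (0, 1]")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be in [0, 1]")
    return p_correct / p_response


# Made-up (response probability, correct enumeration probability) pairs.
records = [(0.90, 0.99), (0.95, 1.00), (0.80, 0.98)]
weights = [coverage_weight(p_r, p_c) for p_r, p_c in records]

# Under this assumption, the estimated population is the sum of the weights.
estimated_total = sum(weights)
print(round(estimated_total, 3))
```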
30
After Estimation
• Quality assurance using demographic analysis and admin data
• Record level totals aren’t the same as the estimates
• To get to small area multivariate outputs, need to impute record level
data – coverage adjustment
31
Coverage Adjustment in
Census 2021
32
Census Coverage Adjustment
• Coverage estimation process provides information on what types of persons
and households are estimated to have been missed
• Population estimates are only broken down by characteristics associated with
non-response:
  • age group by sex, ethnicity, household size, hard-to-count index, activity
  last week and tenure
33
Census Coverage Adjustment continued
• The adjustment process adds households and persons into the
census database
• The database is adjusted so that it is consistent with the population estimates
• Builds on previous census adjustment, which began with the 2001
One-Number-Census approach
• The method for the 2021 Census has changed and better meets the population
estimates for all key variables
34
Post-estimation
• Able to produce census outputs at any level and
for any breakdown of variables
• However, the census data does not account for
missed persons and households so outputs
would be biased
• Undercount does not usually occur uniformly
across all areas or other sub-groups of the
population such as age groups
[Illustration: census data table with columns Household ID, Person ID, Age Group]
35
Post-estimation continued
• Population size estimates available at
local authority (LA) level, by some key
characteristics
• For both person and household
characteristics
• Allows small area outputs and more
characteristic breakdowns
[Illustration: population estimates]
36
2021 Census Coverage Adjustment Strategy
Stage 1: Impute missed households (with persons)
• Add household records into the census database – copy census
households from the database to meet the population estimates.
• Assign new geographical information (e.g. postcode)
Stage 2: Impute remaining characteristics
• Impute characteristics variables of the added records that aren’t
controlled for in the adjustment process.
37
Shortfall in Population Estimates (Stage 1)
• However, we have a
shortfall for several
variable breakdowns by LA
– benchmarks
• When adding in
persons/households to
meet one characteristic’s
estimated shortfall, they
add to other characteristics
too
Example: LA counts of males aged 20-24

LA      Census count   Population estimate   Shortfall (target amount to impute)
LA 1    650            800                   150
LA 2    400            500                   100
LA …    …              …                     …
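The shortfall column is simply the population estimate minus the census count. A minimal Python sketch using the two rows given in the example (the dictionary layout and function name are assumptions for illustration):

```python
# Rows from the slide's example: LA counts of males aged 20-24.
benchmarks = {
    "LA 1": {"census_count": 650, "population_estimate": 800},
    "LA 2": {"census_count": 400, "population_estimate": 500},
}


def shortfall(census_count: int, population_estimate: int) -> int:
    """Target number of records to impute for one benchmark cell."""
    return population_estimate - census_count


shortfalls = {
    la: shortfall(row["census_count"], row["population_estimate"])
    for la, row in benchmarks.items()
}
print(shortfalls)  # {'LA 1': 150, 'LA 2': 100}
```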
38
Selecting Records to Impute (Stage 1)
• In 2021, Combinatorial Optimisation (CO) will be used to select households.
• For the adjustment process, CO will select a combination of households
from the census database that best fits the shortfall benchmarks.
• CO better meets these benchmarks; it performs well when controlling for all
of them simultaneously.
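To give a feel for selecting a combination of households against several benchmarks at once, here is a toy sketch. It uses a simple greedy heuristic as a stand-in, not the Combinatorial Optimisation algorithm ONS uses; the household data, category labels and function names are all made up for illustration.

```python
from collections import Counter


def total_error(selected_counts, shortfall):
    """Total absolute difference between imputed counts and shortfall benchmarks."""
    keys = set(selected_counts) | set(shortfall)
    return sum(abs(shortfall.get(k, 0) - selected_counts.get(k, 0)) for k in keys)


def greedy_select(households, shortfall):
    """Repeatedly copy the donor household that most reduces the total error.

    households: dict of household ID -> list of person category labels.
    Donors may be copied more than once. Greedy stand-in, not true CO.
    """
    selected = []
    counts = Counter()
    best = total_error(counts, shortfall)
    while True:
        best_id = None
        for hid, people in households.items():
            err = total_error(counts + Counter(people), shortfall)
            if err < best:
                best, best_id = err, hid
        if best_id is None:
            return selected, best
        selected.append(best_id)
        counts += Counter(households[best_id])


# Made-up donor households and shortfall benchmarks.
households = {
    "H1": ["M20-24", "F20-24"],
    "H2": ["M20-24"],
    "H3": ["F65+", "F65+"],
}
shortfall = {"M20-24": 3, "F20-24": 1}
selected, error = greedy_select(households, shortfall)
print(selected, error)  # ['H1', 'H2', 'H2'] 0
```

Adding H1 meets part of both benchmarks at once, which is the interaction the shortfall slide describes: households added for one characteristic also count towards the others.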
[Illustration: census data table with columns Household ID, Person ID, Age Group]
39
Assigning Geographical Information (Stage 1)
• We need to assign a postcode to the imputed households
• Dummy forms provide us with records of addresses where we were unable to
enumerate, with basic information collected
• They are strong indicators of where usual resident households could exist
• Assign imputed households a dummy form’s geographical information
• The dummy form has similar characteristics or is in a small area with similar
characteristics
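A toy sketch of the matching idea: give each imputed household the geography of the most similar unused dummy form. The slides do not say which characteristics are compared, so using household size as the similarity measure is an assumption, and all field names and placeholder codes are hypothetical.

```python
def assign_geography(imputed_households, dummy_forms):
    """Assign each imputed household the geography of its closest unused dummy form.

    Similarity is the difference in household size (an assumption for illustration).
    """
    available = list(dummy_forms)
    assignments = {}
    for household in imputed_households:
        if not available:
            break  # more imputed households than dummy forms
        best = min(available, key=lambda form: abs(form["size"] - household["size"]))
        available.remove(best)
        assignments[household["id"]] = (best["postcode"], best["output_area"])
    return assignments


# Made-up records with placeholder geography codes.
imputed = [{"id": "I1", "size": 2}, {"id": "I2", "size": 5}]
dummy_forms = [
    {"postcode": "PC-A", "output_area": "OA-1", "size": 4},
    {"postcode": "PC-B", "output_area": "OA-2", "size": 2},
]
assignments = assign_geography(imputed, dummy_forms)
print(assignments)  # {'I1': ('PC-B', 'OA-2'), 'I2': ('PC-A', 'OA-1')}
```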
[Illustration: census data table with columns Household ID, Person ID, Age Group, Postcode, Output Area]
40
Post-adjustment Edit and Imputation (Stage 2)
• In Stage 2, imputation of the
remaining characteristic
variables will be completed
using CANadian Census
Edit and Imputation System
(CANCEIS).
[Illustration: census data table with columns Household ID, Person ID, Age Group, …, Postcode, Output Area]
41
Resulting Census Database
• We can now produce estimated census outputs that
account for missed persons and households
• And for more detailed breakdowns and at lower
geographical levels
• These outputs will aggregate to any other
outputs (e.g. LA or national level) as they are
from the same database
[Illustration: census data table with columns Household ID, Person ID, Age Group]
42
Communal Establishments
• Communal Establishments (CEs) are dealt with separately to households in
adjustment
• In 2011, there were over 46,000 communal establishments with almost one
million residents in England and Wales
• Small CEs (7-49 beds) and large CEs (50+ beds) are adjusted separately
as different data informs the coverage of each in the census
43
Summary
44
Key points
• Improve the quality of the census results
• Final census results represent our best estimate of the whole
population, not just those we count
• Building on the successful approach used in 2001 and 2011
• Improvements to methods and quality
• The process is flexible and adaptable
• Thorough quality assurance throughout – including
Methodological Assurance Review Panel
45
References
• Overview of our design and plans for Census 2021
(further information is available from that document)
• We publish the papers reviewed by the Methodological
Assurance Review panel (MARP) on the UK Statistics
Authority website
46
Estimation & Adjustment
in Census 2021
Any questions?
Visit www.sli.do and enter code 61866
Please complete our evaluation survey (we will email you a link) and let us know how we did.
If you have any questions, email us at SDR.stakeholder.engagement@ons.gov.uk.
Want to learn more about the Census 2021 Statistical Design? Visit our website to find out
more about our other webinars.
