Producing household
estimates from
administrative data
Methodology and analysis towards
ONS Research Outputs 2017
Definitions
A household is defined as:
one person living alone, or
a group of people (not necessarily related) living at the same address who share cooking facilities and share a living room or sitting room or dining area.
ADCP aims for household outputs
Produce household statistics as part of Research Outputs 2016.
Three types of statistics over the next few years:-
• Number of households
• Household size
• Household composition
Household numbers released in February 2017
Derived from the same SPD as population estimates.
Replicate a similar output package as the population estimates -
time series
Can be produced at various levels of geography
Multiple versions from SPD versions.
SPD: Statistical Population Dataset
Household Numbers
Challenges
Our three biggest challenges for producing household
numbers
Definition – household/address is not a one to one
relationship
Correct address allocation
• data lags
• high churn
• people not deregistering
• poor AddressBase matching/allocation
What data can we use?
• AddressBase
• Population Coverage Survey
• Tax and Benefits data
Definitions
There are some important distinctions between the household
estimates produced in these research outputs and those
published in official statistics:
The definition of ‘households’ used in these research outputs is
based on identifying occupied addresses in administrative data
Occupied addresses on administrative data include those with
at least one ‘usual resident’ included in our Statistical
Population Dataset (SPD V2.0)
Only occupied addresses that have been successfully linked to a
Unique Property Reference Number (UPRN) on AddressBase
have been included in these research outputs
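As a rough illustration of this definition, the sketch below counts "households" as distinct UPRNs that have at least one usual resident. The record layout (`person_id`, `uprn`) and the sample values are hypothetical and are not the ONS data structure; records without a valid UPRN are left out of the count.

```python
from collections import Counter

# Hypothetical SPD-style records; "uprn" is None when no valid UPRN was assigned.
spd_records = [
    {"person_id": "p1", "uprn": "100"},
    {"person_id": "p2", "uprn": "100"},
    {"person_id": "p3", "uprn": "200"},
    {"person_id": "p4", "uprn": None},
]

def occupied_addresses(records):
    """Count usual residents per UPRN, ignoring records with no UPRN."""
    return Counter(r["uprn"] for r in records if r["uprn"] is not None)

residents_per_uprn = occupied_addresses(spd_records)

# One occupied address (UPRN) is treated as one household.
household_count = len(residents_per_uprn)

# Share of records that could not be placed at an address.
residual_share = sum(r["uprn"] is None for r in spd_records) / len(spd_records)
```

Here `residents_per_uprn` also gives an occupied-address size for each UPRN, which is the starting point for size outputs.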
Allocating address at SPD record level
Using many data sources to find our
‘best’ address.
Benefits
Enables aggregation at different
levels and cross tabulation with other
variables.
Can weight certain data sources for
different demographic groups, e.g.
students
Notes:
A non-valid UPRN may occur when the address given cannot be
matched to one on reference data, or is not in England and Wales
4% of SPD V2.0 records could not be assigned to a UPRN (i.e.
‘residual’)
Underestimations
When comparing SPD V2.0 household estimates with official estimates, there is a
clear tendency to underestimate the number of households using this
methodology. Reasons for this can be summarised as follows:
UPRN assignment - Not all records on SPD V2.0 can be assigned to a
UPRN, due to missing address information or failures to link addresses
Complex residential addresses – Addresses with ‘parent’ and ‘child’ UPRN
hierarchies are unlikely to have full coverage on the administrative data we are
using for these research outputs
SPD V2.0 inclusion rules – The rules used to determine usual residence in
our SPD V2.0 population estimates may have resulted in the incorrect exclusion
of some households from our population base
England and Wales –
Comparing with Census for 2011 :-
Outcomes – Numbers of Households
Distribution of Differences (% difference)
             2011      2015
Minimum    -34.57    -25.43
Maximum      0.19     17.46
Mean        -5.39     -3.01
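A summary of this kind could be computed as in the sketch below. It assumes the percentage difference is taken as (SPD-based count minus official count) divided by the official count; the slides do not state the formula, and the area names and counts here are made up.

```python
from statistics import mean

def percent_difference(spd_households, official_households):
    """Percentage difference of the SPD-based count from the official count (assumed formula)."""
    return (spd_households - official_households) / official_households * 100.0

# Hypothetical counts per local authority: (SPD-based, official).
counts = {
    "Area A": (9_000, 10_000),
    "Area B": (20_400, 20_000),
    "Area C": (47_500, 50_000),
}

differences = {
    area: percent_difference(spd, official)
    for area, (spd, official) in counts.items()
}

summary = {
    "minimum": min(differences.values()),
    "maximum": max(differences.values()),
    "mean": mean(differences.values()),
}
```

A negative value means the SPD-based count is below the official figure, which is the direction of most of the differences reported here.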
England and Wales –
Comparing with Census for 2011 and DAU figures for 2011 and
2015:-
[Figure: Regional Percent Differences - 2011 and 2015. Bars for 2011 and 2015 by area (England and Wales, East Midlands, East of England, London, North East, North West, South East, South West, Wales, West Midlands, Yorkshire and The Humber); axis from -14 to 0.]
DAU: Demographics Analysis Unit at ONS
Top Tens – largest differences

2011
LA Name                        Region             % difference
Kensington and Chelsea         London             -34.6
Westminster, City of London    London             -32.3
Islington                      London             -22.2
Gwynedd                        Wales              -21.4
Hammersmith and Fulham         London             -18.6
Camden                         London             -17.4
Tower Hamlets                  London             -16.8
Wandsworth                     London             -16.0
Haringey                       London             -15.6
Brent                          London             -14.5

2015
LA Name                        Region             % difference
Gwynedd                        Wales              -25.4
Westminster, City of London    London             -23.6
Kensington and Chelsea         London             -20.2
Cambridge                      East of England    -20.0
Camden                         London             -18.5
Broxbourne                     East of England     17.5
South Ribble                   North West          16.1
Watford                        East of England    -15.4
Gravesham                      South East         -14.5
Forest Heath                   East of England    -14.4
Household Sizes
Aim: to investigate whether SPREE (Structure Preserving Estimator)
can counteract the definitional differences between census
households and addresses/UPRNs.
SPREE uses Annual Population Survey (APS)
proportions of household sizes to adjust SPD
estimates.
Challenges - sizes
Some categories vary more than others across
geographies, so are harder to estimate.
Some geographies are affected by missing data,
e.g. armed forces data, so may need to
be treated differently.
Some geographies are affected by usual residence
variations, so may need to be treated differently.
If an area is extremely different from the national
distribution, it may be harder to estimate using those
distributions.
Adjustment using SPREE
The Structure Preserving Estimator (SPREE) method uses survey data to support admin data,
adjusting the proportions of each category rather than the numbers.
Source: Office for National Statistics Notes: 1. Statistical Population Dataset 2. Annual Population Survey 3. SPREE - Structure Preserving Estimator
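One common way to carry out a structure preserving adjustment is iterative proportional fitting, and the sketch below assumes that approach. It takes an area-by-household-size table from admin data, keeps each area's total, and rescales until the overall size shares match survey proportions. The table values and the proportions are hypothetical, and the margins ONS actually constrained to are not given in the slides.

```python
def spree_adjust(spd_table, survey_props, tol=1e-10, max_iter=500):
    """Rake an area-by-size table so its column shares match survey_props.

    Row totals (households per area) are kept; the row-by-column
    association structure of the admin table is preserved.
    survey_props is assumed to sum to 1.
    """
    table = [[float(v) for v in row] for row in spd_table]
    row_targets = [sum(row) for row in table]
    grand_total = sum(row_targets)
    col_targets = [p * grand_total for p in survey_props]
    n_cols = len(col_targets)

    for _ in range(max_iter):
        # Scale each column to its survey-based target.
        col_sums = [sum(row[j] for row in table) for j in range(n_cols)]
        for row in table:
            for j in range(n_cols):
                if col_sums[j] > 0:
                    row[j] *= col_targets[j] / col_sums[j]
        # Scale each row back to its original area total.
        for i, row in enumerate(table):
            row_sum = sum(row)
            if row_sum > 0:
                table[i] = [v * row_targets[i] / row_sum for v in row]
        # Stop once the column totals are close enough to the targets.
        col_sums = [sum(row[j] for row in table) for j in range(n_cols)]
        if max(abs(s - t) for s, t in zip(col_sums, col_targets)) <= tol * grand_total:
            break
    return table

# Hypothetical input: 2 areas x 3 household-size categories.
spd_table = [[40, 30, 30], [20, 50, 30]]
aps_props = [0.35, 0.40, 0.25]  # hypothetical survey proportions
adjusted = spree_adjust(spd_table, aps_props)
```

Because only row and column scaling factors are applied, the relative pattern between areas in the admin table carries through to the adjusted table.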
Effects of estimation
[Figures: SPD¹ difference from census percentages versus SPREE² adjustment, 2011, by household size (1, 2, 3, 4, 5 plus), with series "SPD - Census" and "SPREE - Census". Panels: Kensington and Chelsea, Hastings, Newham, Richmondshire.]
Source: Office for National Statistics
Notes: 1. SPD - Statistical Population Dataset
2. SPREE - Structure Preserving Estimator
Household Composition
Classification
Census KS105EW:
• One person household
    • Aged 65 and over
    • Other
• One family household
    • All aged 65 and over
    • Married or same-sex civil partnership couple
        • No children
        • Dependent children
        • All children non-dependent
    • Cohabiting couple
        • No children
        • Dependent children
        • All children non-dependent
    • Lone parent
        • Dependent children
        • All children non-dependent
• Other household types
    • With dependent children
    • All full-time students
    • All aged 65 and over
    • Other
Annual UK estimates from Labour Force Survey:
• One person household
    • Under 65
    • 65 or over
• Two or more unrelated adults
• One family households
    • Couple
        • No children
        • 1-2 dependent children
        • 3 or more dependent children
        • Non-dependent children only
    • Lone parent
        • Dependent children
        • Non-dependent children only
• Multi-family households
Using admin data
To create household composition we need:
1. Population base of usual residents – SPD V2.0
2. Usual residents assigned to an address to create
a household base
Issues with the SPD and household base described earlier
also affect household composition
Other information used for household composition
1. Age, sex, surnames of occupants
2. Relationships from other admin data - ONS now has
access to some admin data containing relationships
Other work and methods
Register based countries: Austria
• Social security, child allowance and tax sources
• Couple, parent-child, sibling, grandparent-grandchild
relationships
• Still have to use imputation method for some relationships
UK: Harper and Mayhew (2015)
• No relationships available
• Count people in broad age groups to assign household type
• Children (0-19), Working age (20-64), Older adults (65+)
ONS method falls between these
• Use the relationships available in admin data where possible
• Use demographic information to infer others
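The age-group counting idea can be sketched as below, using the three bands listed above. Only the band boundaries come from the slides; the function names and the sample households are hypothetical, and the step from a band profile to a named household type is left out because the slides do not give that mapping.

```python
from collections import Counter

def age_band(age):
    """Assign an age to one of the three broad bands listed above."""
    if age <= 19:
        return "children"
    if age <= 64:
        return "working_age"
    return "older_adults"

def band_profile(ages):
    """Return (children, working age, older adults) counts for one household."""
    counts = Counter(age_band(a) for a in ages)
    return (counts["children"], counts["working_age"], counts["older_adults"])

# Hypothetical households, each given as a list of occupants' ages.
households = [[34, 36, 5], [70], [19, 20, 64, 65], [71, 68]]
profiles = [band_profile(ages) for ages in households]

# Tally how often each profile occurs; a profile could then be used as
# the key of a lookup that assigns a household type.
profile_counts = Counter(profiles)
```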
Relationships in admin data
Couple relationships:
1. Housing Benefit
• Partner ID available where applicable
2. National Benefits Database
• Partner ID available for State Pension claimants
If not available, need to infer a couple relationship
[Figure: Age of people with partner ID. Number of people (axis from 0 to 300,000) by age (15 to 95).]
Relationships in admin data
Child Benefit data
• Contains a National Insurance ID for one of the parents
• High coverage of dependent children
• Eligible up to age 16, then up to 19 if in approved education
or training
• Identify whether 16-18 year olds are dependent children, to
match census definition
Non-dependent children
• No longer on Child Benefit dataset
• Infer a relationship to a parent using additional information
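A simple check for dependent children along these lines is sketched below. It assumes that, for 16 to 18 year olds, still appearing on the Child Benefit data is a usable signal of being in approved education or training; that is an assumption made for illustration. The census definition also excludes 16 to 18 year olds who have a spouse, partner or child in the household, which this sketch does not model.

```python
def is_dependent_child(age, on_child_benefit):
    """Rough dependent-child flag.

    Under 16: always dependent.
    16 to 18: dependent only if still on Child Benefit data (assumed proxy
    for approved education or training).
    19 and over: not dependent.
    """
    if age < 16:
        return True
    if age <= 18:
        return on_child_benefit
    return False

# Hypothetical examples.
examples = [
    is_dependent_child(10, True),   # young child
    is_dependent_child(17, True),   # 17, still on Child Benefit data
    is_dependent_child(17, False),  # 17, no longer on Child Benefit data
    is_dependent_child(21, False),  # adult child
]
```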
Algorithm
• Use all possible relationships at address to assign
the household to a major category:
1) Single person
2) All students
3) Lone parent
4) Couple
    a) With Partner ID
    b) No Partner ID
5) Other
    a) More than 2 generations
    b) Unrelated adult
Algorithm
1. Single person households – one person in UPRN
2. Student – all people have HESA record
3. Lone parent families:
[Diagram: two people with the surname Smith; labels: Parent ID, > 18 years.]
4. Couple families:
[Diagram: persons 1 to 4 with the surname Smith, ages 18 and 16 shown; labels: Partner ID, ≤ 12 years, Parent ID, > 18 years.]
Other households
[Diagrams: one household with persons 1 to 5 and the label "> 50 years"; one household with persons 1 to 3 and the label "< 15 years".]
Contain more than one family
More than two generations:
Person 3 too old to be child of 1 or 2
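The order of the checks can be sketched as below. This is a simplified illustration, not the ONS implementation: the `Person` fields are hypothetical, the 12-year limit for inferring a couple is an assumed reading of the "≤ 12 years" label, and inference of non-dependent children and of multi-generation households is left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    person_id: str
    age: int
    has_hesa_record: bool = False
    partner_id: Optional[str] = None  # from benefits data, where available
    parent_id: Optional[str] = None   # from Child Benefit data, where available

MAX_COUPLE_AGE_GAP = 12  # assumption based on the "≤ 12 years" label

def classify_household(people):
    """Assign one address (UPRN) to a major category, checking in slide order."""
    ids = {p.person_id for p in people}
    if len(people) == 1:
        return "Single"
    if all(p.has_hesa_record for p in people):
        return "Student"
    # Children are people whose Parent ID points at someone at the same address.
    children = [p for p in people if p.parent_id in ids]
    adults = [p for p in people if p.parent_id not in ids]
    if children and len(adults) == 1:
        return "Lone parent"
    if len(adults) == 2:
        a, b = adults
        if a.partner_id == b.person_id or b.partner_id == a.person_id:
            return "Couple (with Partner ID)"
        if abs(a.age - b.age) <= MAX_COUPLE_AGE_GAP:
            return "Couple (no Partner ID)"
        return "Missing"
    if len(adults) > 2:
        return "Other"  # simplification: no generation or family checks
    return "Missing"

# Hypothetical households.
single = classify_household([Person("a", 40)])
students = classify_household([Person("a", 20, True), Person("b", 21, True)])
lone_parent = classify_household([Person("a", 40), Person("b", 10, parent_id="a")])
couple_linked = classify_household(
    [Person("a", 40, partner_id="b"), Person("b", 42), Person("c", 8, parent_id="a")]
)
couple_inferred = classify_household([Person("a", 40), Person("b", 45)])
unresolved = classify_household([Person("a", 30), Person("b", 60)])
```

Two adults with an age gap beyond the assumed limit fall through to "Missing", which is the kind of case that would be picked up by a later imputation or rule change.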
Results
[Figure: % of households by major category (Single, Student, Couple, Lone parent, Other, Missing), Census versus SPD; axis from 0 to 60.]
• Percentage distribution to remove household undercount effect
• ‘Missing’ – does not meet any current category criteria
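Comparing percentage distributions rather than counts can be sketched as below. The counts are invented for illustration; the census side has no "Missing" category, so it is treated as zero there.

```python
def percentage_distribution(counts):
    """Convert category counts to percentages of the total."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}

# Hypothetical household counts by major category.
census_counts = {"Single": 300, "Student": 10, "Couple": 450, "Lone parent": 100, "Other": 140}
spd_counts = {"Single": 310, "Student": 10, "Couple": 360, "Lone parent": 100, "Other": 90, "Missing": 30}

census_pct = percentage_distribution(census_counts)
spd_pct = percentage_distribution(spd_counts)

# Percentage-point difference (SPD% minus Census%) for each category.
pct_point_diff = {
    category: spd_pct[category] - census_pct.get(category, 0.0)
    for category in spd_pct
}
```

Working in percentages means the overall shortfall in SPD household numbers does not by itself show up as a difference in every category.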
Minor categories
• Single person
    • Aged 65+
    • Other
• Lone parent
    • With dependent children
    • All children non-dependent
• Couple
    • No children
    • With dependent children
    • All children non-dependent
• Other
    • With dependent children
    • All aged 65+
    • Other
Minor categories results
[Figure: % of households by minor category, Census versus SPD; axis from 0 to 20. Categories grouped as Single (Aged 65 and over, Other), Couple (All aged 65 and over, No children, Dependent children, All children non-dependent), Lone parent (Dependent children, All children non-dependent), Student, Other (With dependent children, All aged 65 and over, Other), Missing.]
Local authorities
• Very nearly all LAs have undercount for ‘Couple’ and ‘Other’
• Low level of ‘Missing’ in areas with high proportion of couple
households and low ‘Other’
• Older population = high proportion of couples with Partner ID
[Figure: Comparison with Census. SPD% - Census% by major category (Single, Student, Couple, Lone parent, Other); axis from -15 to 10.]
[Figure: Missing and Partner ID. % of households for "Missing" and "Couple with Partner ID"; axis from 0 to 40.]
Ranges of values for local authorities:
North East Derbyshire
• Lowest percentage of ‘Missing’ household
composition
Newham
• Highest percentage of ‘Missing’ household
composition
Kensington and Chelsea
• Largest difference for couple family households
Richmondshire
• Missing armed forces affect both distributions
Next Steps
• Assign addresses with ‘Missing’ household
composition to a category
• Many couples but age difference outside current range
• Some are ‘Other’ households, e.g. unrelated adults
• Possibly use imputation method similar to Austria
• Use households containing a Partner ID as donors
• All other relationships in these are ‘non-couple’
• Evaluate effectiveness of algorithm
• Compare to record level census data
Future Plans
Publish Research outputs: occupied address (household)
estimates by size, 2011 – 24th July
Improve estimates of household numbers – output early next
year
Adjust numbers using a coverage survey
Research removal of communal establishments
Use more data e.g. Council Tax to identify students/one person
households
Household Composition – output early next year
Unoccupied addresses - do we need them?

More Related Content

Similar to Ons households july 17 research cp ml

Delivering early benefits and trial outputs using administrative data
Delivering early benefits and trial outputs using administrative dataDelivering early benefits and trial outputs using administrative data
Delivering early benefits and trial outputs using administrative data
UKDSCensus
 
Evaluating the feasibility of using administrative data in the context of cen...
Evaluating the feasibility of using administrative data in the context of cen...Evaluating the feasibility of using administrative data in the context of cen...
Evaluating the feasibility of using administrative data in the context of cen...
UKDSCensus
 
Distribution of Medicare Taxes and Spending by Lifetime Household Earnings
Distribution of Medicare Taxes and Spending by Lifetime Household EarningsDistribution of Medicare Taxes and Spending by Lifetime Household Earnings
Distribution of Medicare Taxes and Spending by Lifetime Household Earnings
Congressional Budget Office
 
Understanding differences in life satisfaction between local authority areas:...
Understanding differences in life satisfaction between local authority areas:...Understanding differences in life satisfaction between local authority areas:...
Understanding differences in life satisfaction between local authority areas:...
DCLGIntegration
 
Data Mining for Community Needs Assessments
Data Mining for Community Needs AssessmentsData Mining for Community Needs Assessments
Data Mining for Community Needs Assessments
Avery Eenigenburg
 

Similar to Ons households july 17 research cp ml (20)

Administrative data census research
Administrative data census researchAdministrative data census research
Administrative data census research
 
Research Outputs for small areas 2017: analysis and findings
Research Outputs for small areas 2017: analysis and findingsResearch Outputs for small areas 2017: analysis and findings
Research Outputs for small areas 2017: analysis and findings
 
FCS_Presentation
FCS_PresentationFCS_Presentation
FCS_Presentation
 
Towards an administrative data census the story so far
Towards an administrative data census   the story so farTowards an administrative data census   the story so far
Towards an administrative data census the story so far
 
Medicaid Reporting Errors in Four National Surveys: ACS, CPS, MEPS, and NHIS
Medicaid Reporting Errors in Four National Surveys: ACS, CPS, MEPS, and NHISMedicaid Reporting Errors in Four National Surveys: ACS, CPS, MEPS, and NHIS
Medicaid Reporting Errors in Four National Surveys: ACS, CPS, MEPS, and NHIS
 
Delivering early benefits and trial outputs using administrative data
Delivering early benefits and trial outputs using administrative dataDelivering early benefits and trial outputs using administrative data
Delivering early benefits and trial outputs using administrative data
 
Admin data census
Admin data censusAdmin data census
Admin data census
 
Evaluating the feasibility of using administrative data in the context of cen...
Evaluating the feasibility of using administrative data in the context of cen...Evaluating the feasibility of using administrative data in the context of cen...
Evaluating the feasibility of using administrative data in the context of cen...
 
adv analysis of com hlth data
adv analysis of com hlth dataadv analysis of com hlth data
adv analysis of com hlth data
 
The importance of place
The importance of placeThe importance of place
The importance of place
 
Demographic data101
Demographic data101Demographic data101
Demographic data101
 
ONS household income statistics user event
ONS household income statistics user event ONS household income statistics user event
ONS household income statistics user event
 
Distribution of Medicare Taxes and Spending by Lifetime Household Earnings
Distribution of Medicare Taxes and Spending by Lifetime Household EarningsDistribution of Medicare Taxes and Spending by Lifetime Household Earnings
Distribution of Medicare Taxes and Spending by Lifetime Household Earnings
 
Clermont County Consolidated Plan Public Presentation 2/4/15-2/5/15
Clermont County Consolidated Plan Public Presentation 2/4/15-2/5/15Clermont County Consolidated Plan Public Presentation 2/4/15-2/5/15
Clermont County Consolidated Plan Public Presentation 2/4/15-2/5/15
 
Opportunities for Improving Wealth Distribution Statistics
Opportunities for Improving Wealth Distribution StatisticsOpportunities for Improving Wealth Distribution Statistics
Opportunities for Improving Wealth Distribution Statistics
 
Webinar for LSC grantees, Estimating LSC Funding Changes Based on Shifts in t...
Webinar for LSC grantees, Estimating LSC Funding Changes Based on Shifts in t...Webinar for LSC grantees, Estimating LSC Funding Changes Based on Shifts in t...
Webinar for LSC grantees, Estimating LSC Funding Changes Based on Shifts in t...
 
Understanding differences in life satisfaction between local authority areas:...
Understanding differences in life satisfaction between local authority areas:...Understanding differences in life satisfaction between local authority areas:...
Understanding differences in life satisfaction between local authority areas:...
 
Data Mining for Community Needs Assessments
Data Mining for Community Needs AssessmentsData Mining for Community Needs Assessments
Data Mining for Community Needs Assessments
 
Grandparenting in Europe 2013- who are the grandparents provoding childcare?
Grandparenting in Europe 2013- who are the grandparents provoding childcare?Grandparenting in Europe 2013- who are the grandparents provoding childcare?
Grandparenting in Europe 2013- who are the grandparents provoding childcare?
 
Constructing families using administrative registers in Estonia, Helle Visk, ...
Constructing families using administrative registers in Estonia, Helle Visk, ...Constructing families using administrative registers in Estonia, Helle Visk, ...
Constructing families using administrative registers in Estonia, Helle Visk, ...
 

Recently uploaded

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 

Recently uploaded (20)

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

Ons households july 17 research cp ml

  • 1. Producing household estimates from administrative data Methodology and analysis towards ONS Research Outputs 2017
  • 2. Definitions A household is defined as: one person living alone, or a group of people (not necessarily related) living at the same address who share cooking facilities and share a living room or sitting room or dining area.
  • 3. ADCP aims for household outputs Produce household statistics as part of Research Outputs 2016. Three types of statistics over the next few years:- • Number of households • Household size • Household composition Household numbers released in February 2017 Derived from the same SPD as population estimates. Replicate a similar output package as the population estimates - time series Can be produced at various levels of geography Multiple versions from SPD versions. SPD: Statistical Population Dataset
  • 5. Challenges Our three biggest challenges for producing household numbers Definition – household/address is not a one to one relationship Correct address allocation • data lags • high churn • people not deregistering • poor AddressBase matching/allocation
  • 6. What data can we use? Address Base Population Coverage Survey Tax and Benefits data
  • 7. Definitions There are some important distinctions between the household estimates produced in these research outputs and those published in official statistics: The definition of ‘households’ used in these research outputs is based on identifying occupied addresses in administrative data Occupied addresses on administrative data include those with at least one ‘usual resident’ included in our Statistical Population Dataset (SPD V2.0) Only occupied addresses that have been successfully linked to a Unique Property Reference Number (UPRN) on AddressBase have been included in these research outputs
  • 8. Allocating address at SPD record level Using many data sources to find our ‘best’ address. Benefits Enables aggregation at different levels and cross tabulation with other variables. Can weight certain data sources for different demographic groups . e.g. students Notes: A non valid UPRN may occur when the address given cannot be matched to one on reference data, or is not in England and Wales 4% of SPD V2.0 records could not be assigned to UPRN (i.e. ‘residual’)
  • 9. Underestimations When comparing SPD V2.0 household estimates with official estimates, there is a clear tendency to underestimate the number of households using this methodology. Reasons for this can be summarised as follows: UPRN assignment - Not all records on SPD V2.0 can be assigned to a UPRN, due to missing address information or failures to link addresses Complex residential addresses – Addresses with ‘parent’ and ‘child’ UPRN hierarchies are unlikely to have full coverage on the administrative data we are using for these research outputs SPD V2.0 inclusion rules – The rules used to determine usual residence in our SPD V2.0 population estimates may have resulting in the incorrect exclusion of some households from our population base
  • 10. England and Wales – Comparing with Census for 2011 :- Outcomes – Numbers of Households
  • 11. Distribution of Differences 2011 2015 Minimum -34.57 -25.43 Maximum 0.19 17.46 Mean -5.39 -3.01
  • 12. England and Wales – Comparing with Census for 2011 and DAU figures for 2011 and 2015:- -14 -12 -10 -8 -6 -4 -2 0 England and Wales East Midlands East of England London North East North West South East South West Wales West Midlands Yorkshire and The Humber Regional Percent Differences - 2011 and 2015 2011 2015 Outcomes – Numbers of Households DAU: Demographics Analysis Unit at ONS
  • 13. LA Name Region % difference Kensington and Chelsea London -34.6 Westminster,City of London London -32.3 Islington London -22.2 Gwynedd Wales -21.4 Hammersmith and Fulham London -18.6 Camden London -17.4 Tower Hamlets London -16.8 Wandsworth London -16.0 Haringey London -15.6 Brent London -14.5 2011 2015 LA Name Region % difference Gwynedd Wales -25.4 Westminster,City of London London -23.6 Kensington and Chelsea London -20.2 Cambridge East of England -20.0 Camden London -18.5 Broxbourne East of England 17.5 South Ribble North West 16.1 Watford East of England -15.4 Gravesham South East -14.5 Forest Heath East of England -14.4 Top Tens – largest differences Outcomes – Numbers of Households
  • 15. Household Sizes To investigate whether we can counteract the definitional differences between census households and addresses/UPRNs, using SPREE (Structure Preserving Estimator) Uses Annual Population Survey (APS) proportions of household sizes to adjust SPD estimates.
  • 16. Challenges - sizes Some categories vary more than others across geographies, so are harder to estimate. Some geographies are affected by certain missingness e.g. armed forces data, so may need to be treated differently Some geographies are affected by usual residence variations, so may need to be treated differently. If an area is extremely different from the national distribution, it may be harder to estimate using those distributions.
  • 17. Adjustment using SPREE Structure Preserving Estimator (SPREE) method uses survey data to support admin data. Adjusting the proportions of each category, rather than numbers. Source: Office for National Statistics Notes: 1. Statistical Population Dataset 2. Annual Population Survey 3. SPREE - Structure Preserving Estimator
  • 18. Adjustment using SPREE Structure Preserving Estimator (SPREE) method uses survey data to support admin data Source: Office for National Statistics Notes: 1. Statistical Population Dataset 2. Annual Population Survey 3. SPREE - Structure Preserving Estimator
  • 19. Adjustment using SPREE Structure Preserving Estimator (SPREE) method uses survey data to support admin data Source: Office for National Statistics Notes: 1. Statistical Population Dataset 2. Annual Population Survey 3. SPREE - Structure Preserving Estimator
  • 20. Effects of estimation. [Charts for Kensington and Chelsea and for Hastings: SPD difference from census percentages versus SPREE adjustment, 2011, by household size (1, 2, 3, 4, 5 plus); series: SPD - Census and SPREE - Census.] Source: Office for National Statistics. Notes: SPD = Statistical Population Dataset; SPREE = Structure Preserving Estimator.
  • 21. Effects of estimation. [Charts for Newham and for Richmondshire: SPD difference from census percentages versus SPREE adjustment, 2011, by household size (1, 2, 3, 4, 5 plus); series: SPD - Census and SPREE - Census.] Source: Office for National Statistics. Notes: SPD = Statistical Population Dataset; SPREE = Structure Preserving Estimator.
  • 23. Classification
      Census KS105EW:
        One person household: Aged 65 and over; Other
        One family household: All aged 65 and over; Married or same-sex civil partnership couple (No children; Dependent children; All children non-dependent); Cohabiting couple (No children; Dependent children; All children non-dependent); Lone parent (Dependent children; All children non-dependent)
        Other household types: With dependent children; All full-time students; All aged 65 and over; Other
      Annual UK estimates from Labour Force Survey:
        One person household: Under 65; 65 or over
        Two or more unrelated adults
        One family households: Couple (No children; 1-2 dependent children; 3 or more dependent children; Non-dependent children only); Lone parent (Dependent children; Non-dependent children only)
        Multi-family households
  • 24. Using admin data
      To create household composition we need:
        1. Population base of usual residents – SPD V2.0
        2. Usual residents assigned to an address to create households base
      Issues with SPD and household base described earlier impact household composition.
      Other information used for household composition:
        1. Age, sex, surnames of occupants
        2. Relationships from other admin data (ONS now has access to some admin data containing relationships)
  • 25. Other work and methods
      Register based countries: Austria
        • Social security, child allowance and tax sources
        • Couple, parent-child, sibling, grandparent-grandchild relationships
        • Still have to use imputation method for some relationships
      UK: Harper and Mayhew (2015)
        • No relationships available
        • Count people in broad age groups to assign household type
        • Children (0-19), Working age (20-64), Older adults (65+)
      ONS method falls between these
        • Use the relationships available in admin data where possible
        • Use demographic information to infer others
  • 26. Relationships in admin data
      Couple relationships:
        1. Housing Benefit: Partner ID available where applicable
        2. National Benefits Database: Partner ID available for State Pension claimants
      If not available, need to infer a couple relationship.
      [Chart: Age of people with partner ID; x-axis: age, 15 to 95; y-axis: count, 0 to 300,000.]
  • 27. Relationships in admin data
      Child Benefit data
        • Contains a National Insurance ID for one of the parents
        • High coverage of dependent children
        • Eligible up to age 16, then up to 19 if in approved education or training
        • Identify whether 16-18 year olds are dependent children, to match census definition
      Non-dependent children
        • No longer on Child Benefit dataset
        • Infer a relationship to a parent using additional information
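The dependent-child rule on this slide can be written as a small predicate. This is only a sketch: it assumes that still appearing on the Child Benefit dataset is an acceptable proxy for a 16-18 year old being in approved education or training, which the slides imply but do not state. The function and parameter names are hypothetical.

```python
def is_dependent_child(age, on_child_benefit):
    """Rough census-style test for a dependent child.

    Assumption: presence on the Child Benefit dataset stands in for
    'in approved education or training' at ages 16 to 18.
    """
    if age <= 15:
        return True
    if 16 <= age <= 18:
        return on_child_benefit
    return False
```

Anyone aged 19 or over is treated as non-dependent here, so a relationship to a parent would have to be inferred from other information.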
  • 28. Algorithm: Use all possible relationships at address to assign the household to a major category:
      1) Single person
      2) All students
      3) Lone parent
      4) Couple: a) With Partner ID; b) No Partner ID
      5) Other: a) More than 2 generations; b) Unrelated adult
  • 29. Algorithm
      1. Single person households – one person in UPRN
      2. Student – all people have HESA record
      3. Lone parent families: [Diagram of three people; labels: surname Smith, ages 18 and 16, Parent ID, > 18 years.]
  • 30. Couple families
      4. Couple families: [Diagram of four people; labels: surname Smith, ages 18 and 16, Partner ID, ≤ 12 years, Parent ID, > 18 years.]
  • 31. Other households
      • Contain more than one family
      • More than two generations: Person 3 too old to be child of 1 or 2
      [Diagrams of a five-person and a three-person household; labels: > 50 years, < 15 years.]
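The major-category steps on slides 28 to 31 can be sketched as a rule cascade. This is an illustration, not the ONS algorithm: the thresholds are one reading of the diagram labels (a parent more than 18 years older than a same-surname child, partners within 12 years of each other), the minimum partner age of 16 is an added assumption, and the "Other" tests (more than two generations, unrelated adults) are left out, so those households fall through to "Missing". All names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional

PARENT_GAP = 18       # assumed: inferred parent is > 18 years older
COUPLE_GAP = 12       # assumed: inferred partners are <= 12 years apart
MIN_PARTNER_AGE = 16  # assumption added for this sketch


@dataclass
class Person:
    pid: int
    age: int
    surname: str
    has_hesa: bool = False            # has a HESA (student) record
    partner_id: Optional[int] = None  # from benefits data, if present
    parent_id: Optional[int] = None   # from Child Benefit data, if present


def is_child_of(child, parent):
    # A recorded partner link rules out a parent-child link.
    if child.partner_id == parent.pid or parent.partner_id == child.pid:
        return False
    if child.parent_id == parent.pid:
        return True
    # Otherwise infer from shared surname and age gap.
    return child.surname == parent.surname and parent.age - child.age > PARENT_GAP


def is_couple(a, b):
    if a.partner_id == b.pid or b.partner_id == a.pid:
        return True
    return min(a.age, b.age) >= MIN_PARTNER_AGE and abs(a.age - b.age) <= COUPLE_GAP


def classify(household):
    if len(household) == 1:
        return "Single"
    if all(p.has_hesa for p in household):
        return "Student"
    for parent in household:
        if all(is_child_of(c, parent) for c in household if c is not parent):
            return "Lone parent"
    for a, b in combinations(household, 2):
        others = [p for p in household if p is not a and p is not b]
        if is_couple(a, b) and all(is_child_of(c, a) or is_child_of(c, b) for c in others):
            return "Couple"
    return "Missing"


lone_parent = [Person(1, 45, "Smith"), Person(2, 16, "Smith", parent_id=1), Person(3, 18, "Smith")]
couple = [Person(1, 45, "Smith", partner_id=2), Person(2, 43, "Smith"),
          Person(3, 18, "Smith"), Person(4, 16, "Smith", parent_id=1)]
unresolved = [Person(1, 80, "Jones"), Person(2, 50, "Patel"), Person(3, 20, "Smith")]
```

The order of the checks matters: testing for a lone parent before a couple follows the numbering on the slides, and a household is only labelled "Couple" when every other resident can be linked as a child of one of the partners.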
  • 32. Results
      [Chart: % of households by category (Single, Student, Couple, Lone parent, Other, Missing), Census versus SPD.]
      • Percentage distribution to remove household undercount effect
      • ‘Missing’ – does not meet any current category criteria
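Comparing percentage distributions rather than counts means a uniform undercount of households does not show up as a difference. A minimal sketch with hypothetical counts, where the admin-based total is 10% below the census total but the mix of household types is identical:

```python
def percentage_distribution(counts):
    """Convert category counts to percentages of the total."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}


def percentage_point_gaps(spd_counts, census_counts):
    """SPD% minus Census% for each category."""
    spd_pct = percentage_distribution(spd_counts)
    census_pct = percentage_distribution(census_counts)
    return {k: spd_pct[k] - census_pct[k] for k in census_counts}


# Hypothetical counts: SPD is 10% lower overall, same composition.
census = {"Single": 300, "Couple": 500, "Other": 200}
spd = {"Single": 270, "Couple": 450, "Other": 180}

gaps = percentage_point_gaps(spd, census)
```

In this example every gap is zero, so any non-zero gap in real data reflects a difference in composition rather than in the overall number of households.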
  • 33. Minor categories
      Single person: Aged 65+; Other
      Lone parent: With dependent children; All children non-dependent
      Couple: No children; With dependent children; All children non-dependent
      Other: With dependent children; All aged 65+; Other
  • 34. Minor categories results. [Chart: % of households by minor category, Census versus SPD. Categories shown: Aged 65 and over; Other; All aged 65 and over; No children; Dependent children; All children non-dependent; Dependent children; All children non-dependent; Student; With dependent children; All aged 65 and over; Other; Missing.]
  • 35. Local authorities
      • Very nearly all LAs have undercount for ‘Couple’ and ‘Other’
      • Low level of ‘Missing’ in areas with high proportion of couple households and low ‘Other’
      • Older population = high proportion of couples with Partner ID
      [Charts: ranges of values for local authorities. Comparison with Census: SPD% - Census% for Single, Student, Couple, Lone parent, Other. Missing and Partner ID: % of households for Missing and for Couple with Partner ID.]
  • 36. North East Derbyshire • Lowest percentage of ‘Missing’ household composition
  • 37. Newham • Highest percentage of ‘Missing’ household composition
  • 38. Kensington and Chelsea • Largest difference for couple family households
  • 39. Richmondshire • Missing armed forces affect both distributions
  • 40. Next Steps
      • Assign addresses with ‘Missing’ household composition to a category
          • Many couples but age difference outside current range
          • Some are ‘Other’ households, e.g. unrelated adults
      • Possibly use imputation method similar to Austria
          • Use households containing a Partner ID as donors
          • All other relationships in these are ‘non-couple’
      • Evaluate effectiveness of algorithm
          • Compare to record level census data
  • 41. Future Plans
      • Publish Research Outputs: occupied address (household) estimates by size, 2011 – 24th July
      • Improve estimates of household numbers – output early next year
      • Adjust numbers using a coverage survey
      • Research removal of communal establishments
      • Use more data, e.g. Council Tax, to identify students/one person households
      • Household Composition – output early next year
      • Unoccupied addresses – do we need them?

Editor's Notes

  1. What we’ve done so far: initial aggregates of numbers, sizes and composition. However, the undercount of numbers of households needed to be addressed, so the focus has been on investigations into estimation, resolution of half weights, and improvements in address matching.
  2. Incorrect allocation seems to be leading to undercount, as does our definition, to some extent.
  3. London has high hh1 and low hh2 on SPD. Hastings has low hh1 and hh2.
  4. Richmondshire has high hh1 and low hh2, due to missing partners. Newham has high expected hh5.
  5. HH composition is on the ADC plan for publication with next year’s HH research outputs, with the possibility of an RO article earlier. Progress and results so far: the algorithm is not complete and there are still some residuals to assign a value.
  6. KS105EW is the less detailed breakdown; QS113EW also splits the ‘Dependent children’ categories into ‘1’ and ‘2 or more’, and shows same-sex civil partnership couples separately from married couples. We are therefore assuming these are the most important categories and starting out aiming at this from admin data, but have user needs evolved? Are there any specific types of households that users would really like more information on? For non-census years, estimates come from the LFS and are only published at UK level. We could reproduce this but improve on it by producing outputs at LA level and below. Outline all the options that can be produced from Census. Are they still most appropriate for current user needs? Are there any additional categories that should be added if possible? Which of these are most important? Which are likely to be difficult with admin data? What level is acceptable?
  7. A few % of people couldn’t be assigned to an address, and there is an undercount of almost 6% on household numbers for E&W. The earlier stages each have their issues, as Claire has discussed, so the success of household composition is limited by these even if a perfect method can be produced.
  8. Austria situation is close to the ideal, Mayhew and Harper method is what has been done in UK. We have some relationships on admin data and information to infer others, so our situation falls between these two cases.
  9. Could archive relationships from earlier years, so if a child is now non-dependent, we can use the parent-child relationship that previously existed on the Child Benefit dataset
  10. Couples affected by the residuals that are still missing but likely to have a slight undercount due to household size distribution where size 2 is undercounted
  11. Smaller difference for couples with dependent children than for couples with no children. Likely due to the household size distribution, which has undercoverage for 2 person and overcoverage for 3 person households. Both categories then have the same issues with the algorithm identifying couples.
  12. The chart shows couple and other households are very nearly always undercounted. The differences among the household types are largely dependent on how close the household size distribution is to census. The proportion of couple households identified by a partner_id tells us something about relative uncertainty for this category. The algorithm has least missing household composition in North East Derbyshire and South Norfolk and highest for Newham. The % of couples identified by partner_id is highest in West Somerset and lowest in City of London and the Lambeth/Wandsworth/Southwark area. This is mainly driven by age of population, due to the largest contribution coming from state pension.
  13. NE Derbyshire has least missing and percentage of couples identified with partner_id of 29.2%. Areas with higher partner_id are similar to this. A high proportion of couples, relatively old population. Residuals look like they should split between couples and other. A small net undercoverage in the size 2 and 3 households, meaning couples should come out quite close if the residuals can be assigned correctly.
  14. A high proportion of large households and a high proportion of ‘Other’ types. After the residuals are assigned, the SPD is likely to overestimate the proportion of ‘Other’ and underestimate the proportion of couples and lone parents. Lone parents are currently low, and the undercoverage is not likely to be made up by the residuals, so expect some undercount for couples too. 2 and 3 person households must also contribute significantly to the ‘Other’ households, since e.g. 2 person households are quite close to census but couples are not.
  15. K&C has largest gap for couples currently. Area is high on missing and low on partner_id. The message here is that the undercoverage of 2 person households means that the algorithm will never be able to match the Census proportion of couples. That would require improvements to the undercoverage in the SPD/residual people not assigned to addresses. Undercoverage on the SPD means we are overestimating the proportion of single person households.
  16. Other things to mention in general households next steps: improvements to prior steps will benefit household composition, e.g. over/undercoverage in the SPD population base, addresses on admin sources, and address matching. Focus on acquiring more admin data with relationships (or marital status), especially partners.
  17. Addresses on HESA