Ons households july 17 research cp ml

Producing household estimates from administrative data
Methodology and analysis towards ONS Research Outputs 2017

Definitions
A household is defined as:
one person living alone, or
a group of people (not necessarily related) living at the same address who share cooking facilities and share a living room or sitting room or dining area.

ADCP aims for household outputs
Produce household statistics as part of Research Outputs 2016.
Three types of statistics over the next few years:
• Number of households
• Household size
• Household composition
Household numbers released in February 2017
Derived from the same SPD as the population estimates.
Replicate an output package similar to that for the population estimates, including a time series
Can be produced at various levels of geography
Multiple versions, derived from different SPD versions.
SPD: Statistical Population Dataset

Challenges
Our three biggest challenges for producing household numbers:
Definition: households and addresses are not in a one-to-one relationship
Correct address allocation:
• data lags
• high churn
• people not deregistering
• poor AddressBase matching/allocation

What data can we use?
• AddressBase
• Population Coverage Survey
• Tax and benefits data

Definitions
There are some important distinctions between the household
estimates produced in these research outputs and those
published in official statistics:
The definition of ‘households’ used in these research outputs is
based on identifying occupied addresses in administrative data
Occupied addresses on administrative data include those with
at least one ‘usual resident’ included in our Statistical
Population Dataset (SPD V2.0)
Only occupied addresses that have been successfully linked to a
Unique Property Reference Number (UPRN) on AddressBase
have been included in these research outputs
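A minimal sketch of this definition in Python, assuming each SPD V2.0 usual-resident record has already been given a UPRN, or `None` where linkage to AddressBase failed. The function name and record layout are hypothetical; the point is that a "household" in these outputs is simply a distinct linked UPRN with at least one usual resident, and unlinked records drop out of the count.

```python
from typing import Iterable, Optional, Set


def occupied_addresses(record_uprns: Iterable[Optional[str]]) -> Set[str]:
    """Return the distinct UPRNs with at least one usual resident.

    Each element is the UPRN assigned to one SPD usual-resident record,
    or None where no UPRN could be assigned; those unlinked records are
    excluded, so they add nothing to the household count.
    """
    return {uprn for uprn in record_uprns if uprn is not None}


# Illustrative (made-up) UPRNs for six resident records.
records = ["100001", "100001", "100002", None, "100003", "100003"]
print(len(occupied_addresses(records)))  # 3
```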

Allocating addresses at SPD record level
Using many data sources to find our ‘best’ address.
Benefits:
• Enables aggregation at different levels and cross-tabulation with other variables.
• Can weight certain data sources differently for particular demographic groups, e.g. students.
Notes:
An invalid UPRN may occur when the address given cannot be matched to one on reference data, or is not in England and Wales.
4% of SPD V2.0 records could not be assigned to a UPRN (the ‘residual’).
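The slides do not say how the sources are weighted, so the following is only a sketch of one possible approach. Each source proposes a UPRN for a record, each source carries a weight that depends on the person's group (here just student or not), and the UPRN with the highest total weight is chosen. The source names and weights are hypothetical, not ONS values. A record for which no source gives a matched UPRN is returned as `None`, i.e. it joins the residual.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical source weights for illustration only; not ONS values.
DEFAULT_WEIGHTS = {"tax_benefits": 1.0, "hesa": 0.5, "other_admin": 0.8}
# For students, give the higher-education source more say (hypothetical).
STUDENT_WEIGHTS = {"tax_benefits": 0.6, "hesa": 1.0, "other_admin": 0.5}


def best_address(candidates: List[Tuple[str, Optional[str]]],
                 is_student: bool) -> Optional[str]:
    """Choose one UPRN for an SPD record from (source, uprn) candidates.

    uprn is None when that source's address could not be matched to
    AddressBase. Returns None if no source gives a UPRN (the residual).
    """
    weights = STUDENT_WEIGHTS if is_student else DEFAULT_WEIGHTS
    scores: Dict[str, float] = defaultdict(float)
    for source, uprn in candidates:
        if uprn is not None:
            scores[uprn] += weights.get(source, 0.0)
    if not scores:
        return None
    # Highest total weight wins; ties go to the first UPRN seen.
    return max(scores, key=lambda u: scores[u])


candidates = [("tax_benefits", "A"), ("hesa", "B")]
print(best_address(candidates, is_student=True))   # B
print(best_address(candidates, is_student=False))  # A
```

The same two candidate addresses resolve differently for a student and a non-student, which is the kind of group-specific weighting the slide describes.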

Underestimations
When comparing SPD V2.0 household estimates with official estimates, there is a
clear tendency to underestimate the number of households using this
methodology. Reasons for this can be summarised as follows:
UPRN assignment - Not all records on SPD V2.0 can be assigned to a
UPRN, due to missing address information or failures to link addresses
Complex residential addresses – Addresses with ‘parent’ and ‘child’ UPRN
hierarchies are unlikely to have full coverage on the administrative data we are
using for these research outputs
SPD V2.0 inclusion rules – The rules used to determine usual residence in our SPD V2.0 population estimates may have resulted in the incorrect exclusion of some households from our population base

England and Wales: comparing with Census for 2011
Outcomes: numbers of households

Distribution of differences (% difference, local authorities)
             2011      2015
Minimum    -34.57    -25.43
Maximum      0.19     17.46
Mean        -5.39     -3.01
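The slides do not give the formula, but a percentage difference of this kind is usually computed per area as 100 × (estimate - benchmark) / benchmark, so negative values mean the estimate is below the benchmark. A minimal sketch, assuming that formula and an unweighted mean across local authorities (the slide does not say whether the mean is weighted); the area names and counts in the example are made up.

```python
from statistics import mean
from typing import Dict


def percent_differences(estimates: Dict[str, float],
                        benchmark: Dict[str, float]) -> Dict[str, float]:
    """100 * (estimate - benchmark) / benchmark for each area in the benchmark."""
    return {area: 100.0 * (estimates[area] - benchmark[area]) / benchmark[area]
            for area in benchmark}


def summarise(diffs: Dict[str, float]) -> Dict[str, float]:
    """Minimum, maximum and unweighted mean of the area-level differences."""
    values = list(diffs.values())
    return {"minimum": min(values), "maximum": max(values), "mean": mean(values)}


diffs = percent_differences({"Area A": 900, "Area B": 1010},
                            {"Area A": 1000, "Area B": 1000})
print(diffs)             # {'Area A': -10.0, 'Area B': 1.0}
print(summarise(diffs))  # {'minimum': -10.0, 'maximum': 1.0, 'mean': -4.5}
```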

England and Wales: comparing with Census for 2011 and DAU figures for 2011 and 2015
[Figure: Regional percent differences, 2011 and 2015, for England and Wales, East Midlands, East of England, London, North East, North West, South East, South West, Wales, West Midlands, and Yorkshire and The Humber; x-axis: % difference, -14 to 0]
DAU: Demographics Analysis Unit at ONS

Top tens: largest differences

2011
LA name                        Region             % difference
Kensington and Chelsea         London             -34.6
Westminster, City of London    London             -32.3
Islington                      London             -22.2
Gwynedd                        Wales              -21.4
Hammersmith and Fulham         London             -18.6
Camden                         London             -17.4
Tower Hamlets                  London             -16.8
Wandsworth                     London             -16.0
Haringey                       London             -15.6
Brent                          London             -14.5

2015
LA name                        Region             % difference
Gwynedd                        Wales              -25.4
Westminster, City of London    London             -23.6
Kensington and Chelsea         London             -20.2
Cambridge                      East of England    -20.0
Camden                         London             -18.5
Broxbourne                     East of England     17.5
South Ribble                   North West          16.1
Watford                        East of England    -15.4
Gravesham                      South East         -14.5
Forest Heath                   East of England    -14.4
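The slides do not state the ranking rule, but the 2015 list mixes negative values with positive ones (Broxbourne, South Ribble) in order of size, which suggests ranking by absolute difference. A minimal sketch under that assumption, using a few of the 2015 values above:

```python
from typing import Dict, List, Tuple


def largest_differences(diffs: Dict[str, float],
                        n: int = 10) -> List[Tuple[str, float]]:
    """Areas with the largest absolute percentage differences, largest first."""
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)[:n]


sample_2015 = {"Gwynedd": -25.4, "Broxbourne": 17.5,
               "Cambridge": -20.0, "South Ribble": 16.1}
print(largest_differences(sample_2015, n=3))
# [('Gwynedd', -25.4), ('Cambridge', -20.0), ('Broxbourne', 17.5)]
```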

Household Sizes
We are investigating whether SPREE (Structure Preserving Estimator) can counteract the definitional differences between census households and addresses/UPRNs.
SPREE uses Annual Population Survey (APS) proportions of household sizes to adjust SPD estimates.

Challenges - sizes
Some categories vary more than others across
geographies, so are harder to estimate.
Some geographies are affected by specific gaps in the data, e.g. missing armed forces data, so may need to be treated differently.
Some geographies are affected by variations in usual residence, so may need to be treated differently.
If an area is very different from the national distribution, it may be harder to estimate using national proportions.

Adjustment using SPREE
Structure Preserving Estimator (SPREE) method uses survey data to support admin data.
Adjusting the proportions of each category, rather than numbers.
[Figure]
Source: Office for National Statistics
Notes: 1. Statistical Population Dataset 2. Annual Population Survey 3. SPREE - Structure Preserving Estimator
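A minimal sketch of the basic SPREE idea, implemented as iterative proportional fitting (IPF) in plain Python. The assumptions here are ours, not the slides': the starting structure is an SPD table of households by area (rows) and household size (columns); the row margins are the SPD household totals for each area, so area counts are kept and only the size proportions change; and the column margins come from national APS household-size proportions applied to the overall total. The APS shares and counts in the example are illustrative.

```python
from typing import List


def spree_ipf(structure: List[List[float]],
              row_targets: List[float],
              col_targets: List[float],
              max_iter: int = 1000,
              tol: float = 1e-10) -> List[List[float]]:
    """Rake `structure` to the target row and column totals by IPF.

    Alternately rescales rows and columns; the result keeps the
    interaction (odds-ratio) structure of the starting table. Row and
    column targets must have the same grand total.
    """
    table = [list(row) for row in structure]
    for _ in range(max_iter):
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [x * target / row_sum for x in table[i]]
        largest_change = 0.0
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                factor = target / col_sum
                largest_change = max(largest_change, abs(factor - 1.0))
                for row in table:
                    row[j] *= factor
        if largest_change < tol:  # columns already matched after the row step
            break
    return table


# Illustrative SPD counts: 2 areas x household sizes (1, 2, 3 or more).
spd = [[40.0, 40.0, 20.0],
       [20.0, 50.0, 30.0]]
area_totals = [sum(row) for row in spd]       # keep SPD area totals
aps_proportions = [0.35, 0.40, 0.25]          # hypothetical APS shares
grand_total = sum(area_totals)
size_totals = [p * grand_total for p in aps_proportions]

adjusted = spree_ipf(spd, area_totals, size_totals)
for row in adjusted:
    print([round(x, 1) for x in row])
```

Because only the proportions within each area move, the household count for each area is unchanged. Areas whose true size distribution departs strongly from the national one are still pulled towards the national margins.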


Effects of estimation
[Figure: SPD¹ difference from census percentages versus SPREE² adjustment, 2011, by household size (1, 2, 3, 4, 5 plus). Panels: Kensington and Chelsea (y-axis -6 to 4); Hastings (y-axis -8 to 4). Series: SPD¹ - Census, SPREE² - Census]
Source: Office for National Statistics
Notes: 1. SPD - Statistical Population Dataset
2. SPREE - Structure Preserving Estimator

Effects of estimation
[Figure: SPD¹ difference from census percentages versus SPREE² adjustment, 2011, by household size (1, 2, 3, 4, 5 plus). Panels: Newham (y-axis -2 to 4); Richmondshire (y-axis -4 to 6). Series: SPD¹ - Census, SPREE² - Census]
Source: Office for National Statistics
Notes: 1. SPD - Statistical Population Dataset
2. SPREE - Structure Preserving Estimator

Classification
Census KS105EW:
• One person household
  - Aged 65 and over
  - Other
• One family household
  - All aged 65 and over
  - Married or same-sex civil partnership couple
    - No children
    - Dependent children
    - All children non-dependent
  - Cohabiting couple
    - No children
    - Dependent children
  - Lone parent
    - Dependent children
• Other household types
  - With dependent children
  - All full-time students
  - Other

Annual UK estimates from Labour Force Survey:
• One person household
  - Under 65
  - 65 or over
• Two or more unrelated adults
• One family households
  - Couple
    - No children
    - 1-2 dependent children
    - 3 or more dependent children
    - Non-dependent children only
  - Lone parent
    - Dependent children
    - Non-dependent children only
• Multi-family households

Using admin data
To create household composition we need:
1. A population base of usual residents: SPD V2.0
2. Usual residents assigned to an address, to create the household base
The issues with the SPD and household base described earlier also affect household composition.
Other information used for household composition:
1. Age, sex and surnames of occupants
2. Relationships from other admin data: ONS now has access to some admin data containing relationships

Other work and methods
Register-based countries: Austria
• Social security, child allowance and tax sources
• Couple, parent-child, sibling, grandparent-grandchild
relationships
• Still have to use imputation method for some relationships
UK: Harper and Mayhew (2015)
• No relationships available
• Count people in broad age groups to assign household type
• Children (0-19), Working age (20-64), Older adults (65+)
The ONS method falls between these:
• Use the relationships available in admin data where possible
• Use demographic information to infer others
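As a sketch of the age-group approach attributed to Harper and Mayhew (2015), assuming only the three bands listed above, a household can be summarised by how many residents fall into each band. How those profiles are then mapped to named household types is not given here, so the function stops at the profile; the names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def age_band(age: int) -> str:
    """Broad age group: children 0-19, working age 20-64, older adults 65+."""
    if age <= 19:
        return "children"
    if age <= 64:
        return "working_age"
    return "older_adults"


def household_profile(ages: Iterable[int]) -> Tuple[int, int, int]:
    """Count residents in each band as (children, working age, older adults)."""
    counts = Counter(age_band(age) for age in ages)
    return counts["children"], counts["working_age"], counts["older_adults"]


print(household_profile([34, 31, 5]))  # (1, 2, 0)
print(household_profile([78, 74]))     # (0, 0, 2)
```

With no relationship data, two households with the same profile cannot be told apart, which is the gap the ONS approach fills by using the relationships available in admin data.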

Relationships in admin data
Couple relationships:
1. Housing Benefit
• Partner ID available where applicable
2. National Benefits Database
• Partner ID available for State Pension claimants
If not available, need to infer a couple relationship
[Figure: Age of people with partner ID; x-axis: age, 15 to 95; y-axis: 0 to 300,000]

Relationships in admin data
Child Benefit data
• Contains a National Insurance ID for one of the parents
• High coverage of dependent children
• Eligible up to age 16, then up to 19 if in approved education
or training
• Identify whether 16-18 year olds are dependent children, to match the census definition
Non-dependent children
• No longer on Child Benefit dataset
• Infer a relationship to a parent using additional information
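A minimal sketch of the 16 to 18 check, assuming a simplified version of the census definition of a dependent child (aged 0 to 15, or aged 16 to 18, in full-time education and living with a parent). The two boolean inputs are hypothetical and would have to be derived from Child Benefit and other data. Under this rule a Child Benefit record for a 19-year-old in approved education would not count as a dependent child.

```python
def is_dependent_child(age: int, in_full_time_education: bool,
                       lives_with_parent: bool) -> bool:
    """Simplified census-style test for a dependent child.

    The two flags are hypothetical inputs, e.g. derived from Child Benefit
    records (which continue past 16 only for approved education or training).
    """
    if 0 <= age <= 15:
        return True
    return 16 <= age <= 18 and in_full_time_education and lives_with_parent


print(is_dependent_child(12, False, True))  # True
print(is_dependent_child(17, True, True))   # True
print(is_dependent_child(19, True, True))   # False: too old for the census definition
```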

Algorithm
• Use all possible relationships at the address to assign the household to a major category:
1) Single person
2) All students
3) Lone parent
4) Couple
  a) With Partner ID
  b) No Partner ID
5) Other
  a) More than 2 generations
  b) Unrelated adult

Algorithm
1. Single person households: one person in the UPRN
2. Student: all people have a HESA record
3. Lone parent families:
[Diagram: residents 1 to 3 on an age scale (16 and 18 marked); residents with the surname Smith; links labelled 'Parent ID' and '> 18 years']

Couple families
4. Couple families:
[Diagram: residents 1 to 4 on an age scale (16 and 18 marked); residents with the surname Smith; links labelled 'Partner ID', '≤ 12 years', 'Parent ID' and '> 18 years']

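Other households
[Diagrams: two example households plotted by age, one with residents 1 to 5 and a gap labelled '> 50 years', one with residents 1 to 3 and a gap labelled '< 15 years']
• Contain more than one family
• More than two generations: person 3 too old to be child of 1 or 2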
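The rules above can be sketched as a simple classifier. This is an illustrative reconstruction, not the ONS implementation: the record fields are hypothetical, and the age thresholds (12, 15 and 50 years) are loose readings of the diagram annotations rather than confirmed parameters. The slides do not specify the ‘unrelated adult’ rule, so it is omitted and households that match no rule fall through to ‘Missing’.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional

# Hypothetical thresholds, loosely read from the slide annotations.
MAX_COUPLE_AGE_GAP = 12   # inferred couple (no Partner ID): gap of 12 years or less
MIN_PARENT_AGE_GAP = 15   # inferred parent must be at least 15 years older
MAX_PARENT_AGE_GAP = 50   # ... and at most 50 years older


@dataclass
class Person:
    person_id: str
    age: int
    surname: str
    has_hesa_record: bool = False
    partner_id: Optional[str] = None  # e.g. Housing Benefit, State Pension
    parent_id: Optional[str] = None   # e.g. Child Benefit


def is_child_of(child: Person, parent: Person) -> bool:
    """Use a Parent ID link if present, otherwise infer from surname and age gap."""
    if child.parent_id is not None:
        return child.parent_id == parent.person_id
    gap = parent.age - child.age
    return (child.surname == parent.surname
            and MIN_PARENT_AGE_GAP <= gap <= MAX_PARENT_AGE_GAP)


def classify_household(people: List[Person]) -> str:
    """Assign the residents of one UPRN to a major household category."""
    if len(people) == 1:
        return "Single"
    if all(p.has_hesa_record for p in people):
        return "Student"

    child_ids = {c.person_id for c in people
                 if any(is_child_of(c, p) for p in people if p is not c)}
    adults = [p for p in people if p.person_id not in child_ids]

    # More than two generations: someone's parent is also someone's child.
    if any(is_child_of(c, p) for c in people for p in people
           if p is not c and p.person_id in child_ids):
        return "Other"

    couples = [(a, b) for a, b in combinations(adults, 2)
               if a.partner_id == b.person_id or b.partner_id == a.person_id]
    if len(couples) > 1:
        return "Other"  # more than one family
    if not couples and len(adults) == 2:
        a, b = adults
        if abs(a.age - b.age) <= MAX_COUPLE_AGE_GAP:
            couples = [(a, b)]  # inferred couple without a Partner ID

    if len(adults) == 1 and child_ids:
        return "Lone parent"
    if couples and len(adults) == 2:
        return "Couple"
    return "Missing"  # no rule matched


lone_parent = [Person("p1", 40, "Smith"), Person("c1", 10, "Smith", parent_id="p1")]
print(classify_household(lone_parent))  # Lone parent
```

Two unrelated adults with a large age gap end up as ‘Missing’ here, which mirrors the point in the next steps that some ‘Missing’ households are couples outside the current age range or unrelated adults.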

Results
[Figure: household composition by major category (Single, Student, Couple, Lone parent, Other, Missing), % of households, Census versus SPD; axis 0 to 60]
• Percentage distribution used to remove the household undercount effect
• ‘Missing’: does not meet any current category criteria
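Converting counts to percentages removes a uniform undercount, since scaling every category by the same factor leaves the shares unchanged. An undercount that falls unevenly across categories (for example on couples) still shows up as a difference. A minimal sketch with illustrative counts:

```python
from typing import Dict


def percentage_distribution(counts: Dict[str, float]) -> Dict[str, float]:
    """Express category counts as percentages of the total."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


census = {"Single": 300, "Couple": 500, "Other": 200}
undercounted = {k: 0.9 * v for k, v in census.items()}  # uniform 10% undercount
print(percentage_distribution(census))  # {'Single': 30.0, 'Couple': 50.0, 'Other': 20.0}
print(percentage_distribution(undercounted))
```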

Minor categories
• Single person
  - Aged 65+
  - Other
• Lone parent
  - With dependent children
  - All children non-dependent
• Couple
  - No children
  - With dependent children
  - All children non-dependent
• Other
  - With dependent children
  - All aged 65+
  - Other

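Minor categories results
[Figure: minor household composition categories, % of households, Census versus SPD; axis 0 to 20. Categories: Aged 65 and over; Other; No children; Dependent children; Dependent children; Student; With dependent children; Other; Missing. Group labels S, C, L, S, O, M: Single, Couple, Lone parent, Student, Other, Missing]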

Local authorities
• Very nearly all LAs have an undercount for ‘Couple’ and ‘Other’
• Low level of ‘Missing’ in areas with a high proportion of couple households and low ‘Other’
• Older population = high proportion of couples with Partner ID
Ranges of values for local authorities:
[Figure: Comparison with Census; SPD% - Census% by category (Single, Student, Couple, Lone parent, Other); axis -15 to 10]
[Figure: Missing and Partner ID; % of households for ‘Missing’ and ‘Couple with Partner ID’; axis 0 to 40]

North East Derbyshire
• Lowest percentage of ‘Missing’ household
composition

Newham
• Highest percentage of ‘Missing’ household
composition

Kensington and Chelsea
• Largest difference for couple family households

Richmondshire
• Missing armed forces data affect both distributions

Next Steps
• Assign addresses with ‘Missing’ household
composition to a category
• Many couples but age difference outside current range
• Some are ‘Other’ households, e.g. unrelated adults
• Possibly use imputation method similar to Austria
• Use households containing a Partner ID as donors
• All other relationships in these are ‘non-couple’
• Evaluate effectiveness of algorithm
• Compare to record level census data

Future Plans
Publish Research outputs: occupied address (household)
estimates by size, 2011 – 24th July
Improve estimates of household numbers – output early next
year
Adjust numbers using a coverage survey
Research removal of communal establishments
Use more data, e.g. Council Tax, to identify students and one-person households
Household Composition – output early next year
Unoccupied addresses - do we need them?
