2. Definitions
A household is defined as:
• one person living alone, or
• a group of people (not necessarily related) living at the same address who share cooking facilities and share a living room or sitting room or dining area.
3. ADCP aims for household outputs
Produce household statistics as part of Research Outputs 2016.
Three types of statistics over the next few years:-
• Number of households
• Household size
• Household composition
Household numbers released in February 2017
Derived from the same SPD as population estimates.
Replicate a similar output package to the population estimates: time series
Can be produced at various levels of geography
Multiple versions from SPD versions.
SPD: Statistical Population Dataset
5. Challenges
Our three biggest challenges for producing household
numbers
Definition – household/address is not a one-to-one relationship
Correct address allocation
• data lags
• high churn
• people not deregistering
• poor AddressBase matching/allocation
6. What data can we use?
• AddressBase
• Population Coverage Survey
• Tax and Benefits data
7. Definitions
There are some important distinctions between the household
estimates produced in these research outputs and those
published in official statistics:
The definition of ‘households’ used in these research outputs is
based on identifying occupied addresses in administrative data
Occupied addresses on administrative data include those with
at least one ‘usual resident’ included in our Statistical
Population Dataset (SPD V2.0)
Only occupied addresses that have been successfully linked to a
Unique Property Reference Number (UPRN) on AddressBase
have been included in these research outputs
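The definition above can be read as a simple counting rule. The following is a minimal sketch, assuming each SPD record is a (person, UPRN) pair with the UPRN empty when linkage to AddressBase failed; the record layout and the names `count_occupied_addresses` and `size_band` are illustrative and not taken from the ONS implementation.

```python
from collections import Counter

def count_occupied_addresses(records):
    """Count occupied addresses and residents per address.

    `records` is an iterable of (person_id, uprn) pairs; `uprn` is None when
    the record could not be linked to a UPRN (an assumed convention).
    """
    residents_per_uprn = Counter(uprn for _, uprn in records if uprn is not None)
    return len(residents_per_uprn), residents_per_uprn

def size_band(n_residents):
    """Band a resident count as '1', '2', '3', '4' or '5 plus'."""
    return str(n_residents) if n_residents < 5 else "5 plus"

# Made-up records: two linked addresses and one unlinked person.
records = [(1, "U100"), (2, "U100"), (3, "U200"), (4, None)]
n_households, sizes = count_occupied_addresses(records)
size_distribution = Counter(size_band(n) for n in sizes.values())
```

In this toy input, person 4 has no UPRN and so contributes to no address, which is one way the undercount described later can arise.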
8. Allocating address at SPD record level
Using many data sources to find our
‘best’ address.
Benefits
Enables aggregation at different
levels and cross tabulation with other
variables.
Can weight certain data sources for different demographic groups, e.g. students
Notes:
A non-valid UPRN may occur when the address given cannot be matched to one on reference data, or is not in England and Wales
4% of SPD V2.0 records could not be assigned to a UPRN (i.e. ‘residual’)
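One way to picture weighting data sources when choosing a ‘best’ address is a weighted vote. This is only a sketch under stated assumptions: the source names, the weights and the function `best_address` are hypothetical, and the slides do not say how ONS combines sources.

```python
from collections import defaultdict

def best_address(candidates, weights):
    """Pick the UPRN with the highest total weight across sources.

    `candidates` maps source name -> UPRN reported by that source (or None).
    `weights` maps source name -> weight; unlisted sources default to 1.0.
    Returns None when no source supplies a UPRN (a 'residual' record).
    """
    scores = defaultdict(float)
    for source, uprn in candidates.items():
        if uprn is not None:
            scores[uprn] += weights.get(source, 1.0)
    if not scores:
        return None
    # Iterating in sorted order makes ties resolve to the smallest UPRN.
    return max(sorted(scores), key=lambda u: scores[u])

# Hypothetical example: a student whose HESA record disagrees with two others.
candidates = {"source_a": "U100", "source_b": "U100", "hesa": "U300"}
unweighted = best_address(candidates, {})
student_weighted = best_address(candidates, {"hesa": 3.0})
```

With equal weights the two agreeing sources win; giving the HESA record more weight for students switches the choice to the address it reports.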
9. Underestimations
When comparing SPD V2.0 household estimates with official estimates, there is a
clear tendency to underestimate the number of households using this
methodology. Reasons for this can be summarised as follows:
UPRN assignment - Not all records on SPD V2.0 can be assigned to a
UPRN, due to missing address information or failures to link addresses
Complex residential addresses – Addresses with ‘parent’ and ‘child’ UPRN
hierarchies are unlikely to have full coverage on the administrative data we are
using for these research outputs
SPD V2.0 inclusion rules – The rules used to determine usual residence in
our SPD V2.0 population estimates may have resulted in the incorrect exclusion
of some households from our population base
10. England and Wales –
Comparing with Census for 2011 :-
Outcomes – Numbers of Households
12. England and Wales –
Comparing with Census for 2011 and DAU figures for 2011 and
2015:-
[Figure: Regional Percent Differences - 2011 and 2015. Series: 2011, 2015. Regions: England and Wales, East Midlands, East of England, London, North East, North West, South East, South West, Wales, West Midlands, Yorkshire and The Humber. Axis: -14 to 0.]
Outcomes – Numbers of Households
DAU: Demographics Analysis Unit at ONS
13. Top Tens – largest differences
Outcomes – Numbers of Households
Years shown: 2011, 2015

LA Name                        Region            % difference
Kensington and Chelsea         London            -34.6
Westminster, City of London    London            -32.3
Islington                      London            -22.2
Gwynedd                        Wales             -21.4
Hammersmith and Fulham         London            -18.6
Camden                         London            -17.4
Tower Hamlets                  London            -16.8
Wandsworth                     London            -16.0
Haringey                       London            -15.6
Brent                          London            -14.5

LA Name                        Region            % difference
Gwynedd                        Wales             -25.4
Westminster, City of London    London            -23.6
Kensington and Chelsea         London            -20.2
Cambridge                      East of England   -20.0
Camden                         London            -18.5
Broxbourne                     East of England   17.5
South Ribble                   North West        16.1
Watford                        East of England   -15.4
Gravesham                      South East        -14.5
Forest Heath                   East of England   -14.4
15. Household Sizes
We are investigating whether we can counteract the definitional differences between census households and addresses/UPRNs using SPREE (Structure Preserving Estimator).
SPREE uses Annual Population Survey (APS) proportions of household sizes to adjust SPD estimates.
16. Challenges - sizes
Some categories vary more than others across
geographies, so are harder to estimate.
Some geographies are affected by certain
missingness e.g. armed forces data, so may need to
be treated differently
Some geographies are affected by usual residence
variations, so may need to be treated differently.
If an area is extremely different from the national
distribution, it may be harder to estimate using those
distributions.
17. Adjustment using SPREE
Structure Preserving Estimator (SPREE) method uses survey data to support admin data.
Adjusting the proportions of each category, rather than numbers.
Source: Office for National Statistics Notes: 1. Statistical Population Dataset 2. Annual Population Survey 3. SPREE - Structure Preserving Estimator
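Structure preserving estimation is commonly carried out with iterative proportional fitting (raking). The sketch below assumes that approach: it rakes an area-by-size table of admin counts so that the overall category shares match survey proportions while each area keeps its admin total. The function names and the example numbers are illustrative, and the slides do not confirm that ONS applies SPREE in exactly this form.

```python
def spree_adjust(admin_counts, survey_props, max_iter=500, tol=1e-9):
    """Rake an area-by-category table of admin counts to survey shares.

    admin_counts: list of rows, one per area, of positive counts per category.
    survey_props: target share of each category over all areas (sums to 1).
    Each area's total stays fixed; only the split between categories moves.
    """
    table = [[float(x) for x in row] for row in admin_counts]
    row_totals = [sum(row) for row in table]
    grand_total = sum(row_totals)
    col_targets = [p * grand_total for p in survey_props]
    n_cols = len(col_targets)

    for _ in range(max_iter):
        # Step 1: scale each category so overall shares match the survey.
        col_sums = [sum(row[j] for row in table) for j in range(n_cols)]
        for row in table:
            for j in range(n_cols):
                if col_sums[j] > 0:
                    row[j] *= col_targets[j] / col_sums[j]
        # Step 2: rescale each area back to its admin total.
        for row, total in zip(table, row_totals):
            s = sum(row)
            if s > 0:
                for j in range(n_cols):
                    row[j] *= total / s
        # Stop once the category totals are close enough to the targets.
        col_sums = [sum(row[j] for row in table) for j in range(n_cols)]
        if max(abs(c - t) for c, t in zip(col_sums, col_targets)) < tol:
            break
    return table

def to_proportions(table):
    """Convert each area's adjusted counts to proportions."""
    result = []
    for row in table:
        total = sum(row)
        result.append([x / total for x in row])
    return result

# Made-up example: two areas, three household-size categories.
admin = [[40, 30, 30], [20, 50, 30]]
aps_shares = [0.35, 0.40, 0.25]
adjusted = spree_adjust(admin, aps_shares)
adjusted_props = to_proportions(adjusted)
```

The adjusted proportions differ between the two areas because the admin table's area-by-category pattern is retained; only the margins are moved towards the survey.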
20. Effects of estimation
[Figure: SPD¹ difference from census percentages versus SPREE² adjustment, 2011. Panels: Kensington and Chelsea; Hastings. Household size categories: 1, 2, 3, 4, 5 plus. Series: SPD¹ - Census; SPREE² - Census.]
Source: Office for National Statistics
Notes: 1. SPD - Statistical Population Dataset
2. SPREE - Structure Preserving Estimator
21. Effects of estimation
[Figure: SPD¹ difference from census percentages versus SPREE² adjustment, 2011. Panels: Newham; Richmondshire. Household size categories: 1, 2, 3, 4, 5 plus. Series: SPD¹ - Census; SPREE² - Census.]
Source: Office for National Statistics
Notes: 1. SPD - Statistical Population Dataset
2. SPREE - Structure Preserving Estimator
23. Classification
Census KS105EW:
One person household
  Aged 65 and over
  Other
One family household
  All aged 65 and over
  Married or same-sex civil partnership couple
    No children
    Dependent children
    All children non-dependent
  Cohabiting couple
    No children
    Dependent children
    All children non-dependent
  Lone parent
    Dependent children
    All children non-dependent
Other household types
  With dependent children
  All full-time students
  All aged 65 and over
  Other
Annual UK estimates from Labour Force Survey:
One person household
  Under 65
  65 or over
Two or more unrelated adults
One family households
  Couple
    No children
    1-2 dependent children
    3 or more dependent children
    Non-dependent children only
  Lone parent
    Dependent children
    Non-dependent children only
Multi-family households
24. Using admin data
To create household composition we need:
1. Population base of usual residents – SPD V2.0
2. Usual residents assigned to an address to create
households base
Issues with SPD and household base described earlier
impact household composition
Other information used for household composition
1. Age, sex, surnames of occupants
2. Relationships from other admin data - ONS now has
access to some admin data containing relationships
25. Other work and methods
Register based countries: Austria
• Social security, child allowance and tax sources
• Couple, parent-child, sibling, grandparent-grandchild
relationships
• Still have to use imputation method for some relationships
UK: Harper and Mayhew (2015)
• No relationships available
• Count people in broad age groups to assign household type
• Children (0-19), Working age (20-64), Older adults (65+)
ONS method falls between these
• Use the relationships available in admin data where possible
• Use demographic information to infer others
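The age-group counting step attributed to Harper and Mayhew can be sketched as below. Only the three age bands come from the slide; the function name is illustrative, and the mapping from these counts to a household type is not given in this deck, so it is not attempted here.

```python
def age_band_counts(ages):
    """Count members in the three broad age groups listed on the slide.

    Returns (children aged 0-19, working age 20-64, older adults 65+).
    """
    children = sum(1 for a in ages if a <= 19)
    working_age = sum(1 for a in ages if 20 <= a <= 64)
    older_adults = sum(1 for a in ages if a >= 65)
    return children, working_age, older_adults

# Made-up households: two adults with a young child, and two pensioners.
family_profile = age_band_counts([3, 35, 37])
pensioner_profile = age_band_counts([70, 68])
```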
26. Relationships in admin data
Couple relationships:
1. Housing Benefit
• Partner ID available where applicable
2. National Benefits Database
• Partner ID available for State Pension claimants
If not available, need to infer a couple relationship
[Figure: Age of people with partner ID. x-axis: Age, 15 to 95; y-axis: 0 to 300,000.]
27. Relationships in admin data
Child Benefit data
• Contains a National Insurance ID for one of the parents
• High coverage of dependent children
• Eligible up to age 16, then up to 19 if in approved education
or training
• Identify whether 16-18 year olds are dependent children, to
match census definition
Non-dependent children
• No longer on Child Benefit dataset
• Infer a relationship to a parent using additional information
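The dependent-child rule for 16-18 year olds might look like the sketch below. It assumes that being on the Child Benefit dataset at ages 16 to 18 is an acceptable proxy for being in approved education or training; that proxy and the function name are assumptions for illustration, not the stated ONS rule.

```python
def is_dependent_child(age, on_child_benefit):
    """Flag a child as dependent using age and Child Benefit presence.

    Assumption: a 16-18 year old still on Child Benefit is treated as being
    in approved education or training, and so as a dependent child.
    """
    if age < 16:
        return True
    if 16 <= age <= 18:
        return on_child_benefit
    return False
```

For example, a 17 year old found on the Child Benefit data would be flagged as dependent, while a 17 year old absent from it would fall into the non-dependent group whose parent link has to be inferred.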
28. Algorithm
• Use all possible relationships at address to assign the household to a major category:
1) Single person
2) All students
3) Lone parent
4) Couple
•a) With Partner ID
•b) No Partner ID
5) Other
•a) More than 2 generations
•b) Unrelated adult
29. Algorithm
1. Single person households – one person in UPRN
2. Student – all people have HESA record
3. Lone parent families:
[Diagram: example address with persons 1 to 3. Labels shown: Parent ID; surname Smith (twice); > 18 years; Age 18 and 16.]
30. Couple families
4. Couple families:
[Diagram: example address with persons 1 to 4. Labels shown: Partner ID; ≤ 12 years; Parent ID; > 18 years; surname Smith (twice); Age 18 and 16.]
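31. Other households
Contain more than one family
More than two generations: Person 3 too old to be child of 1 or 2
[Diagrams: one example address with persons 1 to 5, labels shown: Age; > 50 years. One example address with persons 1 to 3, labels shown: Age; < 15 years.]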
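The major-category steps on the preceding slides can be sketched as follows. This is a simplified illustration under stated assumptions, not the ONS algorithm: the `Person` fields are hypothetical, a parent link is inferred from a shared surname plus an age gap of more than 18 years, a couple without a Partner ID is inferred from an age gap of at most 12 years, and the "unrelated adult" rule is left out because the slides do not specify it. Anything that meets no rule is returned as ‘Missing’.

```python
from dataclasses import dataclass
from typing import Optional

MAX_COUPLE_AGE_GAP = 12  # "≤ 12 years" label on the couple slide
MIN_PARENT_AGE_GAP = 18  # "> 18 years" label on the parent slides

@dataclass
class Person:
    pid: int
    age: int
    surname: str
    has_hesa_record: bool = False
    partner_id: Optional[int] = None  # e.g. from Housing Benefit
    parent_id: Optional[int] = None   # e.g. from Child Benefit

def parents_of(person, household):
    """Members linked to `person` as a parent, by ID or by inference."""
    return [
        q for q in household
        if q is not person and (
            person.parent_id == q.pid
            or (q.surname == person.surname
                and q.age - person.age > MIN_PARENT_AGE_GAP)
        )
    ]

def is_couple(a, b):
    """Couple by Partner ID, else inferred from a small age gap."""
    if a.partner_id == b.pid or b.partner_id == a.pid:
        return True
    return abs(a.age - b.age) <= MAX_COUPLE_AGE_GAP

def classify(household):
    """Assign one address (a list of Person) to a major category."""
    if len(household) == 1:
        return "Single person"
    if all(p.has_hesa_record for p in household):
        return "All students"
    parents = {p.pid: parents_of(p, household) for p in household}
    acts_as_parent = {q.pid for links in parents.values() for q in links}
    # Someone who is both a child and a parent implies three generations.
    if any(parents[p.pid] and p.pid in acts_as_parent for p in household):
        return "Other"
    adults = [p for p in household if not parents[p.pid]]
    children = [p for p in household if parents[p.pid]]
    if len(adults) == 1 and children:
        return "Lone parent"
    if len(adults) == 2 and is_couple(*adults):
        return "Couple"
    return "Missing"

lone_parent = [Person(1, 45, "Smith"),
               Person(2, 18, "Smith", parent_id=1),
               Person(3, 16, "Smith", parent_id=1)]
couple = [Person(1, 45, "Smith", partner_id=2), Person(2, 43, "Smith"),
          Person(3, 18, "Smith", parent_id=1),
          Person(4, 16, "Smith", parent_id=1)]
three_generations = [Person(1, 70, "Smith"), Person(2, 45, "Smith"),
                     Person(3, 16, "Smith")]
```

The age-gap inference is deliberately crude: two siblings of similar age would be read as a couple, which is the kind of case that richer relationship data would need to resolve.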
32. Results
[Figure: % of households by category (Single, Student, Couple, Lone parent, Other, Missing), Census versus SPD. Axis: 0 to 60.]
• Percentage distribution to remove household undercount effect
• ‘Missing’ – does not meet any current category criteria
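Comparing percentage distributions rather than counts can be sketched as below. The function names and the counts are made up for illustration; the point is that each source is scaled to 100% before differencing, so a shortfall in the total number of households does not drive the gaps.

```python
def percentage_distribution(counts):
    """Convert category counts to percentages of the total."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}

def percentage_point_gaps(spd_counts, census_counts):
    """SPD% minus Census% for every category seen in either source."""
    spd = percentage_distribution(spd_counts)
    census = percentage_distribution(census_counts)
    categories = sorted(set(spd) | set(census))
    return {k: spd.get(k, 0.0) - census.get(k, 0.0) for k in categories}

# Made-up counts: the SPD total (1,000) is below the census total (1,200).
spd_counts = {"Single": 300, "Couple": 500, "Missing": 200}
census_counts = {"Single": 300, "Couple": 900}
gaps = percentage_point_gaps(spd_counts, census_counts)
```

Here ‘Single’ shows a positive gap even though the counts are equal, because the SPD total is smaller; this mirrors the note that undercoverage inflates the proportion of single person households.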
33. Minor categories
Single person
  Aged 65+
  Other
Lone parent
  With dependent children
  All children non-dependent
Couple
  No children
  With dependent children
  All children non-dependent
Other
  With dependent children
  All aged 65+
  Other
34. Minor categories results
[Figure: % of households by minor category, Census versus SPD. Axis: 0 to 20. Categories as listed: Single (Aged 65 and over; Other); Couple (All aged 65 and over; No children; Dependent children; All children non-dependent); Lone parent (Dependent children; All children non-dependent); Student; Other (With dependent children; All aged 65 and over; Other); Missing.]
35. Local authorities
• Very nearly all LAs have undercount for ‘Couple’ and ‘Other’
• Low level of ‘Missing’ in areas with high proportion of couple
households and low ‘Other’
• Older population = high proportion of couples with Partner ID
Ranges of values for local authorities:
[Figure: Comparison with Census. SPD% - Census% by category (Single, Student, Couple, Lone parent, Other). Axis: -15 to 10.]
[Figure: Missing and Partner ID. % of households for Missing and for Couple with Partner ID. Axis: 0 to 40.]
40. Next Steps
• Assign addresses with ‘Missing’ household composition to a category
• Many couples but age difference outside current range
• Some are ‘Other’ households e.g. unrelated adults
• Possibly use imputation method similar to Austria
• Use households containing a Partner ID as donors
• All other relationships in these are ‘non-couple’
• Evaluate effectiveness of algorithm
• Compare to record level census data
41. Future Plans
Publish Research outputs: occupied address (household) estimates by size, 2011 – 24th July
Improve estimates of household numbers – output early next year
Adjust numbers using a coverage survey
Research removal of communal establishments
Use more data e.g. Council Tax to identify students/one person households
Household Composition – output early next year
Unoccupied addresses - do we need them?
Editor's Notes
What we’ve done so far – initial aggregates of number, sizes and composition. However the issue of undercount of numbers of households needed to be addressed, so concentration has been on investigations into estimation; resolution of half weights and improvements in address matching.
Incorrect allocation seems to be leading to undercount, as does our definition, to some extent.
London has high hh1 and low hh2 on SPD
Hastings has low hh1 and hh2
Richmondshire has high hh1 and low hh2 – due to missing partners
Newham has high expected hh5
HH composition is on the ADC plan for publication with next year's HH research outputs, with the possibility of an RO article earlier.
Progress and results so far, algorithm is not complete and there are still some residuals to assign a value.
KS105EW is the less detailed breakdown, QS113EW also has the ‘Dependent children’ categories split into ‘1’ and ‘2 or more’ categories, as well as the same-sex CP couples separate to the married couples.
Therefore assuming these are the most important categories, and starting out aiming at this from admin data, but have the user needs evolved? Are there any specific types of households that users would really like more information on?
Non census years, estimates from LFS which are only published at UK level. Could reproduce this but improve by producing outputs at LA level and below.
Outline all the options that can be produced from Census
Are they still most appropriate for current user needs?
Are there any additional categories that should be added if possible?
Which of these are most important?
Which are likely to be difficult with admin data?
What level is acceptable?
A few per cent of people could not be assigned to an address, and there is an undercount of almost 6% on household numbers for E&W.
The earlier stages each have their issues as Claire has discussed, so the success of household composition is limited by these even if a perfect method can be produced.
The Austria situation is close to the ideal; the Harper and Mayhew method is what has been done in the UK.
We have some relationships on admin data and information to infer others, so our situation falls between these two cases.
Could archive relationships from earlier years, so if a child is now non-dependent, we can use the parent-child relationship that previously existed on the Child Benefit dataset
Couples are affected by the residuals that are still missing, and are also likely to be slightly undercounted because size 2 households are undercounted in the household size distribution.
Smaller difference for couples with dependent children than for couples with no children.
Likely due to the household size distribution, which has undercoverage for 2 person and overcoverage for 3 person households.
Both categories then have the same issues with the algorithm identifying couples.
Chart shows couple and other households are very nearly always undercounted.
The differences among the household types are largely dependent on how close the household size distribution is to census.
The proportion of couple households identified by a partner_ID tells us something about relative uncertainty for this category.
Algorithm has least missing household composition in North East Derbyshire and South Norfolk and highest for Newham.
% of couples identified by partner_id is highest in West Somerset and lowest in City of London and Lambeth/Wandsworth/Southwark area.
Mainly driven by age of population due to largest contribution from state pension.
NE Derbyshire has least missing and percentage of couples identified with partner_id of 29.2%. Areas with higher partner_id are similar to this.
A high proportion of couples, relatively old population. Residuals look like they should split between couples and other.
A small net undercoverage in the size 2 and 3 households, meaning couples should come out quite close if the residuals can be assigned correctly.
A high proportion of large households and a high proportion of ‘Other’ types
After the residuals are assigned the SPD is likely to overestimate the proportion of ‘Other’ and underestimate the proportion of couples and lone parents.
Lone parents are currently low, and the undercoverage is not likely to be made up by the residuals, so expect some undercount for couples too.
2 and 3 person households must also contribute significantly to the ‘Other’ households, since e.g. 2 person households are quite close to census but couples are not.
K&C has largest gap for couples currently. Area is high on missing and low on partner_id.
The message here is that the undercoverage of 2 person households means that the algorithm will never be able to match the Census proportion of couples.
That would require improvements to the undercoverage in the SPD/residual people not assigned to addresses.
Undercoverage on the SPD means we are overestimating the proportion of single person households.
Other things to mention in general households next steps:
Improvements to prior steps will benefit household composition eg:
Over/undercoverage in SPD population base
Addresses on admin sources
Address matching
Focus on acquiring more admin data with relationships (or marital status)
Especially partners