2. UK Data Service
• Funded by the ESRC to support researchers who depend on high-quality social and economic data
• Single point of access to a wide range of data including large-scale government surveys, international macrodata, qualitative studies and business microdata
• Around 7,000 datasets
• Today we will focus on UKDS Secure Lab
3. What is Secure Lab?
• Holds around sixty datasets: detailed microdata
• Data deemed more sensitive by data owners
• Same security model as VML (at ONS), HMRC Datalab and ADRN
• Accessed remotely from researchers' institution*
• Nothing goes in or out of the Secure Lab environment without being checked by the Support Team first
* Subject to project approval, training etc.
4. Principles of the security model
SAFE PROJECTS
SAFE PEOPLE
SAFE DATA
SAFE SETTING
SAFE OUTPUTS
SAFE USE
5. What is the Data Protection Act 1998?
• The DPA 1998 provides a framework to ensure that personal information is handled properly
• Guidelines for what you should avoid when dealing with personal data
• But it also allows you to use personal data
6. What is 'personal' data?
• Data which:
  • Relate to a living individual
  • Make it possible for an individual to be identified from those data, or from those data and other information
  • Include any expression of opinion about the individual
7. Data Protection Act 1998 says...
• Personal data should only be disclosed if consent has been given to do so, or if disclosure is legally required
• When handling personal data, it should be:
  • Kept securely
  • Processed in accordance with the rights of data subjects, e.g.:
    • Right to be informed how data will be used, stored, processed, transferred, destroyed etc.
    • Right to access info and data held
  • Processed fairly and lawfully
  • Obtained and processed for a specified purpose
  • Adequate, relevant and not excessive for purpose
  • Accurate
  • Not transferred abroad without adequate protection
8. What is 'sensitive personal' data?
• In the DPA, sensitive personal data are data consisting of information relating to the data subject about a defined set of categories including:
  • Race
  • Ethnicity
  • Politics
  • Trade union membership
  • Physical and mental health
  • Sexual life
  • Offences, sentences or disposals
9. Research exemption
• Section 33 of the DPA provides limited exemptions to some of the data protection principles where personal data are to be processed for 'research purposes'.
• To qualify for the 'research exemption', the researcher must confirm that the personal data will not be processed:
  • In order to support measures or decisions with respect to particular individuals
  • In a way that substantial damage or substantial distress is, or is likely to be, caused to any data subject
10. Statistical disclosure control (SDC)
• Carry out SDC checks on outputs to ensure they aren't disclosive
• Manual process carried out by two staff members
• Two approaches to SDC:
  • Rules-based
  • Principles-based
• We take a principles-based approach
11. Rules of thumb
• We do have two 'rules of thumb':
  1. Threshold rule: No cell should contain fewer than 10 observations
  2. Dominance rule: No observation should dominate the data to a huge extent
12. Why a threshold rule?
• The threshold includes a margin of error, enabling us to assess and clear most outputs quickly and efficiently
• 10 is rarely problematic for users but is high enough to make identification of individuals difficult
• Also about perception:
  • An output could, for example, be published openly on a website.
  • Small numbers can look unsafe (even if they're not).
  • Public perception of tables of small counts could be damaging whatever the actual risk.
13. Threshold rule: basic
• Manufacturing firms with turnover over £10m by region.
• The RDC has a threshold rule of N=10
• Is this data potentially disclosive?

Region       Number of firms
England      152
Wales        28
Scotland     53
N. Ireland   3
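
A minimal sketch of how the threshold check on the table above could be automated, assuming the counts are held in a plain dictionary. The function name `cells_below_threshold` is hypothetical and not part of any Secure Lab tooling; the threshold of 10 and the counts are taken from the slide.

```python
# Hypothetical helper (not Secure Lab tooling): flag cells that breach the threshold rule.
THRESHOLD = 10  # threshold rule stated on the slide (N = 10)


def cells_below_threshold(counts, threshold=THRESHOLD):
    """Return the labels of cells with fewer than `threshold` observations."""
    return [label for label, n in counts.items() if n < threshold]


# Counts copied from the slide's table.
firms_by_region = {
    "England": 152,
    "Wales": 28,
    "Scotland": 53,
    "N. Ireland": 3,
}

print(cells_below_threshold(firms_by_region))  # ['N. Ireland']
```

Under this rule of thumb only the N. Ireland cell (3 firms) would be flagged for further attention.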
14. Threshold rule & cell suppression
Tenure               Gender          Age group                          Total
                     Male   Female   <20    21 - 50  51 - 75  76 - 95
Private rent         440    451      138    472      171      110       891
Social housing       182    346      117    209      104      98        528
Owns outright        198    104      -      54       73       173       302
Owns with mortgage   280    179      -      224      225      -         459

Housing tenure in Bundesrough
15. Dominance rule
• Either:
  • The sum of all but the largest two units must be at least 12.5% of the value of the largest unit; or
  • The largest unit has less than 43.75% of the total.
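
The two conditions above can be sketched in code. This is an illustration only, under two assumptions that the slide does not spell out: that "Either" means satisfying one condition is sufficient, and that the first condition is read as "at least 12.5%". The function name `passes_dominance_rule` and the example values are hypothetical.

```python
def passes_dominance_rule(values):
    """Check one cell against the two dominance conditions on the slide.

    `values` holds the non-negative contribution of each unit to the cell
    (e.g. each firm's turnover). Assumption: passing either condition is enough.
    """
    ordered = sorted(values, reverse=True)
    if not ordered or ordered[0] <= 0:
        # Nothing to assess; treated as a failure here (an assumption).
        return False
    largest = ordered[0]
    total = sum(ordered)
    rest = sum(ordered[2:])  # everything except the two largest units

    # Condition 1: the rest is at least 12.5% of the largest unit.
    rest_is_large_enough = rest >= 0.125 * largest
    # Condition 2: the largest unit is less than 43.75% of the total.
    largest_share_is_small = largest < 0.4375 * total
    return rest_is_large_enough or largest_share_is_small


# Made-up example cells.
print(passes_dominance_rule([400, 300, 200, 100]))  # True
print(passes_dominance_rule([900, 50, 30, 20]))     # False
```

In the second example the largest unit accounts for 90% of the total and the units outside the top two sum to only 50, so neither condition holds.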
16. Rules, procedures etc.
The rules and procedures we have in place are there
to:
• Keep researchers operating within the law
• Comply with the requirements of data owners
• Protect data subjects
• Ensure the continued operation of Secure Lab
MORE SENSITIVE: too sensitive for simple download
We think that this doesn't go far enough and take the attitude that we should all be careful about data which relate to non-living individuals too.
We take the attitude that ALL data may be sensitive to someone (it's a bit subjective!) so best to treat ALL data with the same respect.
The threshold is 30 for HMRC data.
Here, the researcher realises that some cells will not meet the threshold rule, so has suppressed these cells.
HOWEVER:
Some basic maths reveals that:
• <20 'owns outright' is 2 (row total 302 minus the published 54 + 73 + 173 = 300)
• <20 'owns with mortgage' and 76-95 'owns with mortgage' make 10 IN TOTAL (row total 459 minus the published 224 + 225 = 449) and therefore must be below 10 each
SOMETIMES SUPPRESSION ISN'T ENOUGH TO MAKE AN OUTPUT SAFE
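
The subtraction described in this note can be sketched as follows. It is a minimal illustration, assuming suppressed cells are represented as `None` and that the row total is published; the function name `suppressed_remainder` is hypothetical, and the figures are those in the housing tenure table.

```python
def suppressed_remainder(row_total, cells):
    """Return (number of suppressed cells, their combined count).

    Suppressed cells are given as None; the combined count is recovered
    by subtracting the published cells from the published row total.
    """
    published = [n for n in cells if n is not None]
    n_suppressed = len(cells) - len(published)
    return n_suppressed, row_total - sum(published)


# Age-group cells (<20, 21 - 50, 51 - 75, 76 - 95) and row totals from the table.
owns_outright = suppressed_remainder(302, [None, 54, 73, 173])
owns_with_mortgage = suppressed_remainder(459, [None, 224, 225, None])

print(owns_outright)       # (1, 2)
print(owns_with_mortgage)  # (2, 10)
```

With a single suppressed cell in a row the exact value is recovered; with two, only their sum is, which still bounds each cell.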