Big Data: Big Opportunity, Big Brother or Big
Oxford Martin School, December 3rd 2013
Sir Mark Walport, Chief Scientific Adviser to HM Government
The future will not be a repetition
of the past.
James Martin, 1933-2013
Writing in 1978
Credit: Oxford Martin School
Those who cannot remember the
past are condemned to repeat it.
George Santayana, 1863-1952
Writing in 1905
2 Big data and privacy
Florence Nightingale, Crimean War
nurse and pioneer of statistics. In
the 1890s she tried to get a
Professorship of Statistics
established at Oxford University,
specifically for applying statistical
analysis to social problems.
At the time the scheme came to
nothing, but her vision is now
realised all over the world.
Oxford began teaching Applied
Statistics in 1947, and appointed its
first Professor of Mathematical
Statistics in 1962.
1. Identity and identification
2. The promise of big data – opportunities
3. What about privacy?
4. Where is all of this headed, and what do
we need to do?
Identity – the sameness of a person or thing at
all times or in all circumstances; the condition
or fact that a person or thing is itself and not
something else; individuality, personality.
Identity – this is what makes me me
Credit: Wellcome Collection
Identification – The determination of identity;
the action or process of determining what a
thing is; the recognition of a thing as being
what it is
Identification – I will find out who you are
Credit: Wellcome Collection
Society doesn’t work in the absence of
identifiers. So who needs to know about us?
Family and friends
We manage our relationships by selective
disclosure of data - multiple identities
The outside world uses different approaches
to identify us
• Driving licence
• Work pass
• NHS number
• National Insurance Number
Credit: Mark Yuill
Credentials and tokens
• PIN
• RFID embedded device
What is personal information?
Hard to define, but ultimately information that enables particular
attributes to be linked to a unique individual.
Some attributes are more or less
sensitive in different contexts
• Football Team
Richard Nixon's application to the FBI, 1937.
Released under FOI. Contains lots of (redacted)
sensitive health information.
Information Technology and the web have
created new opportunities to create identities
The next generation of products will generate yet
more data – the internet of things
Credit: MIT Media Lab
The data is used by each of us for our personal
Finding things out
Telling other people
Navigating the real
Buying and selling
Recording our lives and
those of friends/families
Socialising with others
Plotting and causing
Information technology has created new ways of
locating or finding us
Image: iPhone tracking data
The consequence of all of this is that we are giving out a lot
of information that others can then use…
Smart meters produce detailed data on energy
The price of the utility is that we are generating
data on a massive scale
Lots of other people are interested in our data. Who
knows the most about us?
How do they use it? Retail suppliers.
• Our data is used to provide
• But is also aggregated for
wholesale purposes - and
they give or sell the
wholesale data to other
Credit: Lotus Head/CC-BY-SA-2.5
…and do we know how they use it?
The myth of consent - do we really read and
understand the full terms and conditions?
In 2008, researchers calculated it would take 76 working days to read all
the privacy policies you encounter in a year. If everyone in the US did so,
it would cost the country more than the GDP of Florida.
How do they use it? Government
Credit: Phillip Ingham/CC-BY-ND-2.0
Credit: South Yorkshire Police
Credit: The Telegraph
Credit: The Guardian
Who else uses it?
• Future employers
• Hostile and
• Criminals and
How do the wholesale collectors of data add
value to it?
What more can we do?
(and research in
optimising cities and
Improving health: diabetes in Scotland
• Total Scottish Population 5.2m
• People with diabetes: 251,132
• People with Type 1 DM: ~27,000
• All patients nationally are
registered onto a single register;
the SCI-DC register
• SCI-DC used in all 38 hospitals
• Nightly capture of data from all
1043 primary care practices across
Courtesy of Andrew Morris
Getting about: Citymapper
• An app for New York and London, which links all transport
systems together so you can easily discover the best way to
get from where you are to where you want to be.
Improving infrastructure: Streetbump
• A project in Boston, a city plagued by potholes and other street
• People can report problems in various easy ways, including an app
that automatically detects bumps driven over.
• Highly successful, the critical element being an efficient system for
getting maintenance crews to the sites of reported issues.
What about the potential harms?
• UK research with 58,000 US
volunteers found that algorithms
based on Facebook “likes”, which
are often public, can predict
• 95% accurate in distinguishing
Caucasian-American and 85% for
differentiating Republican from
• Some odd links as well. Curly
fries correlated with high
Dangers of releasing data into the wild
• Released anonymised search data
for research purposes.
• Journalists were able to pick up
clues to name and location, then
triangulate with embarrassing search
• Programme was halted, its initiators
• Released anonymised film rental data
and set a $1m prize, hoping to improve
• People’s viewing taste beyond usual
blockbusters is highly individual.
• Triangulating with IMDB data, bloggers
identified individual users and were able
to reveal their full list of rentals, not just
those they had “rated”.
Privacy controls are not binary but fall on
Free on the
Access / Environment
Anonymised to the
point of losing
Locked in a steel-lined room
A taxonomy of obfuscation
Anonymisation: Remove all identifiers such
that it is impossible to identify an individual
Encryption: Prevent data from being read without
unlocking. In theory encrypted databases can
be analysed without breaking the encryption, but
in practice they cannot be used for anything
but trivial uses
Credit: University of
Tokenisation or pseudonymisation: remove
as much of the 'personal' information as
possible, and link back to the personal data via an
independent, securely held database
Credit: Robbie Cooper
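The tokenisation idea above can be illustrated with a minimal Python sketch. It is not taken from the talk: the function name `pseudonymise`, the field names and the toy records are all hypothetical, and a real system would hold the linkage table in a separate, access-controlled store.

```python
import secrets

def pseudonymise(records, id_field):
    """Split records into a research dataset and a separate linkage table."""
    token_for = {}       # identifier -> random token
    research_rows = []
    for record in records:
        identifier = record[id_field]
        if identifier not in token_for:
            # Random tokens carry no information about the identifier itself.
            token_for[identifier] = secrets.token_hex(8)
        # Copy every field except the direct identifier.
        row = {k: v for k, v in record.items() if k != id_field}
        row["token"] = token_for[identifier]
        research_rows.append(row)
    # The linkage table (token -> identifier) is the part to hold securely.
    linkage_table = {token: ident for ident, token in token_for.items()}
    return research_rows, linkage_table

# Hypothetical toy data, for illustration only.
records = [
    {"patient_id": "A-001", "year_of_birth": 1970, "diagnosis": "type 1 diabetes"},
    {"patient_id": "A-002", "year_of_birth": 1985, "diagnosis": "type 2 diabetes"},
    {"patient_id": "A-001", "year_of_birth": 1970, "diagnosis": "hypertension"},
]
research_rows, linkage_table = pseudonymise(records, "patient_id")
```

The same person keeps the same token across rows, so analysis can still follow an individual, but only whoever holds `linkage_table` can map a token back to a name. The remaining attributes may still act as indirect identifiers.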
Obfuscation - differential privacy
• Differential privacy: the database itself remains pure, but a small amount
of noise is added to the final answer of each query, to prevent identification
of a single record.
• Good for many situations, but not for small populations or finding needles
in haystacks, such as the common factors behind a rare disease.
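As a rough illustration of adding noise to the answer of a query, the sketch below uses the Laplace mechanism on a counting query. This is one common way to achieve differential privacy, not necessarily the one the talk had in mind; the function names, the toy records and the value of `epsilon` are assumptions made for the example.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer a counting query with noise calibrated to sensitivity 1."""
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical toy data: (age, has_condition). The stored data is untouched;
# only the answer returned to the analyst is perturbed.
records = [(34, True), (51, False), (47, True), (29, False), (62, True)]
rng = random.Random(0)
answer = noisy_count(records, lambda r: r[1], epsilon=0.5, rng=rng)
print(answer)
```

With only five records the noise is comparable in size to the true count, which is the small-population limitation noted above: the smaller the group, the more the noise swamps the signal.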
Access and environment: safe havens
• A safe haven for data is more
like a traditional library, where
controlled access is granted to
people who have the right
• You lose some of the benefit of
making data freely available over
the internet, but the risk of
malicious use is greatly reduced.
• The Administrative Data
Research Network is a scheme
to make HMG data available in
Governance: data protection legislation
• Harm can be done by sharing and not
• The Data Protection Act is rarely the
real barrier to sharing data for the
protection of individuals
• The DPA provides exemptions for
research, which would be tightened
significantly by the proposed EU Data
Protection Regulation, making some
current medical research illegal. A major
Credit: EU dpi
Laws have borders – data does not
Map showing undersea internet cables
Even if a dataset is effectively anonymised on its own,
which is very difficult, once it is freely available it can be
“decrypted” (re-identified) by finding overlaps with other datasets.
These could be a mixture of public and private datasets.
The bottom line: it is very hard to guarantee privacy
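A minimal Python sketch of how such an overlap can be exploited: two datasets are joined on attributes they share, and any row that matches exactly one named person is re-identified. The function `link_records`, the field names and all the records are hypothetical and exist only to show the mechanism.

```python
def link_records(anonymised, public, keys):
    """Match rows of two datasets that agree on every attribute in keys."""
    # Index the named dataset by the shared attributes.
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)
    matches = []
    for row in anonymised:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append((candidates[0]["name"], row))
    return matches

# Hypothetical "anonymised" release: no names, but indirect identifiers remain.
anonymised = [
    {"postcode_area": "OX1", "birth_year": 1961, "sex": "F", "diagnosis": "asthma"},
    {"postcode_area": "OX2", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
# Hypothetical second dataset that carries names.
public = [
    {"name": "Person A", "postcode_area": "OX1", "birth_year": 1961, "sex": "F"},
    {"name": "Person B", "postcode_area": "OX2", "birth_year": 1975, "sex": "M"},
    {"name": "Person C", "postcode_area": "OX2", "birth_year": 1975, "sex": "M"},
]
matches = link_records(anonymised, public, ["postcode_area", "birth_year", "sex"])
```

Here the first row is linked to a single named person, while the second stays ambiguous because two people share the same attributes. The more attributes two datasets have in common, the fewer such ambiguities remain.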
Where is all of this headed, and
what do we need to do?
Credit: Arne Hückelheim/CC-BY-SA-3.0
There are some tough challenges
• The digital infrastructure creates new threats
• Security considerations were not planned into
the internet and web
• The keys to cryptography are only as secure as
those that hold them – importance of human
• Who watches the watchers?
• Should big data be on the National Risk
Juvenal: Roman poet to whom Quis
custodiet ipsos custodes? is attributed
• Don't underplay risk
of releasing data: the
challenge is to
balance utility and
• Recognise that
people who will re-identify are extremely
able and may have
powerful hardware at
What will be the effect on people?
• It is impossible to completely
erase a digital past.
• Future generations may
require the right to be forgiven
rather than the right to be
• Young people are already
becoming more protective of
their data and abandoning
Facebook for Snapchat,
WhatsApp and other platforms.
There are utopian and dystopian futures
Utopia: Knowledge to all,
educating the world,
JMW Turner, The Rise of the Carthaginian Empire, 1815
Dystopia: end of individuality,
disrupted fabric of society,
childhood play disrupted,
monopoly of the state in law
enforcement disrupted, loss of
trust in service providers.
Presidio Modelo prison, Cuba (abandoned)
How do we move forward?
Continue to strongly
support science and skills
Don't underplay risk of
releasing data: challenge
to balance utility and
Reduce risk by choice of
environment - safe
havens with penalties:
proportional to risk of
• There is no going back – the world is shaped
by the digital revolution
• There are new tools for understanding
ourselves and the world
• Huge economic opportunities
• There are unforeseen benefits and harms
• The internet has no borders
• There will be ever more scope for crime and terrorism in
• UK has great strength in cyber security
• We must stay at the leading edge, develop
proportionate regulation, legislation and accountability.
• Need a sophisticated level of debate.
Every effort has been made to trace copyright holders and to obtain their permission for the use of copyright material.
We apologise for any errors or omissions in the included attributions and would be grateful if notified of any corrections
that should be incorporated in future versions of this slide set. We can be contacted through firstname.lastname@example.org .