Making sense of big data


What is big data, and what are its potential benefits and risks?

Presentation given by Sir Mark Walport at the Oxford Martin School on 3 December 2013.

Published in: Technology, News & Politics

Making sense of big data

  1. 1. Big Data: Big Opportunity, Big Brother or Big Trouble? Oxford Martin School, December 3rd 2013 Sir Mark Walport, Chief Scientific Adviser to HM Government
  2. 2. The future will not be a repetition of the past. James Martin, 1933-2013 Writing in 1978 Credit: Oxford Martin School Those who cannot remember the past are condemned to repeat it. George Santayana, 1863-1952 Writing in 1905 2 Big data and privacy PD
  3. 3. • Florence Nightingale, Crimean War nurse and pioneer of statistics. In the 1890s she tried to get a Professorship of Statistics established at Oxford University, specifically for applying statistical analysis to social problems. • At the time the scheme came to nothing, but her vision is now realised all over the world. • Oxford began teaching Applied Statistics in 1947, and appointed its first Professor of Mathematical Statistics in 1962. PD
  4. 4. Overview 1. Identity and identification 2. The promise of big data – opportunities and risks 3. What about privacy? 4. Where is all of this headed, and what do we need to do?
  5. 5. Identity – the sameness of a person or thing at all times or in all circumstances; the condition or fact that a person or thing is itself and not something else; individuality, personality. Identity – this is what makes me me Credit: Wellcome Collection
  6. 6. Identification – The determination of identity; the action or process of determining what a thing is; the recognition of a thing as being what it is Identification – I will find out who you are Credit: Wellcome Collection
  7. 7. Society doesn’t work in the absence of identifiers. So who needs to know about us? Family and friends; civic sector; business; government. Credits: Getty, imagezone
  8. 8. We manage our relationships by selective disclosure of data - multiple identities: age; financial status; place attachments; profession; nationality; hobbies; ethnicity; family role; religion; community & friendship
  9. 9. The outside world uses different approaches to identify us. Direct disclosures: • Passport • Driving licence • Work pass • NHS number • National Insurance Number. Credentials and tokens: • PIN number • Password • RFID embedded device. Credits: Mark Yuill, Shutterstock
  10. 10. What is personal information? Hard to define, but ultimately information that enables particular attributes to be linked to a unique individual. Direct: face, fingerprint, DNA. Indirect: name, address, postcode, workplace, club.
  11. 11. Some attributes are more or less sensitive in different contexts • Age • Sex • Nationality • Religion • Health • Education • Financial • Football Team. Richard Nixon’s application to the FBI, 1937. Released under FOI. Contains lots of (redacted) sensitive health information.
  12. 12. Information Technology and the web have created new opportunities to create identities: anonymous, pseudonymous, real.
  13. 13. The next generation of products will generate yet more data – the internet of things Credit: tedeytan/CC-BY-SA-2.0 Credit: MIT Media Lab Credit: MarkDoliner/CC-BY-2.0 Credit: LG
  14. 14. The data is used by each of us for our personal utility: finding things out; telling other people things; listening and watching things; navigating the real world; navigating fictional worlds; buying and selling stuff; playing games; storing stuff; recording our lives and those of friends/families; socialising with others; stealing things; plotting and causing harm.
  15. 15. Information technology has created new ways of locating or finding us Image: iPhone tracking data The consequence of all of this is that we are giving a lot of information out that others can then use…
  16. 16. Smart meters produce detailed data on energy consumption
  17. 17. The price of the utility is that we are generating data on a massive scale
  18. 18. Lots of other people are interested in our data. Who knows the most about us? Government: ONS, HMRC, NHS. Corporations: Google, Experian, loyalty cards.
  19. 19. How do they use it? Retail suppliers. • Our data is used to provide individual services. • But it is also aggregated for wholesale purposes - and they give or sell the wholesale data to other organisations. …and do we know how they use it? Credits: Lotus Head/CC-BY-SA-2.5, Tesco
  20. 20. The myth of consent - do we really read and understand the full terms and conditions? In 2008, researchers calculated it would take 76 working days to read all the privacy policies you encounter in a year. If everyone in the US did so, it would cost the country more than the GDP of Florida. Credit: Google
  21. 21. How do they use it? Government: voting, taxes, planning, law enforcement. Credits: ClassicStock, Phillip Ingham/CC-BY-ND-2.0, iStockphoto, South Yorkshire Police
  22. 22. National security. Credits: The Telegraph, The Guardian
  23. 23. Who else uses it? • Future employers • Hostile and competing foreign states • Criminals and terrorists • Journalists Credit: Getty
  24. 24. How do the wholesale collectors of data add value to it?
  25. 25. What more can we do? Societal level: improving health (and research in general); understanding and optimising business processes; improving and optimising cities and countries; optimising machine and device performance; understanding, targeting, and serving customers; improving security and law enforcement. Individual level: personal quantification and performance optimisation; improving sports performance.
  26. 26. Improving health: diabetes in Scotland • Total Scottish Population 5.2m • People with diabetes: 251,132 (4.9%) • People with Type 1 DM: ~27,000 (0.5%) • All patients nationally are registered onto a single register; the SCI-DC register • SCI-DC used in all 38 hospitals • Nightly capture of data from all 1043 primary care practices across Scotland. Courtesy of Andrew Morris
  27. 27. Getting about: Citymapper • An app for New York and London, which links all transport systems together so you can easily discover the best way to get from where you are to where you want to be.
  28. 28. Improving infrastructure: Streetbump • A project in Boston, a city plagued by potholes and other street maintenance issues. • People can report problems in various easy ways, including an app that automatically detects bumps driven over. • Highly successful, the critical element being an efficient system for getting maintenance crews to the sites of reported issues. Credit: Streetbump
  29. 29. What about the potential harms? • UK research with 58,000 US volunteers found that algorithms based on Facebook “likes”, which are often public, can predict personality traits. • 95% accurate in distinguishing African-American from Caucasian-American and 85% for differentiating Republican from Democrat. • Some odd links as well. Curly fries correlated with high intelligence… Credit: BBC
  30. 30. Dangers of releasing data into the wild. Search data: • Anonymised search data was released for research purposes. • Journalists were able to pick up clues to name and location, then triangulate with embarrassing search queries. • Programme was halted, its initiators sacked. Film rental data: • Anonymised film rental data was released and a $1m prize set, in the hope of improving recommendation algorithms. • People’s viewing taste beyond the usual blockbusters is highly individual. • Triangulating with IMDB data, bloggers identified individual users and were able to reveal their full list of rentals, not just those they had “rated”.
  31. 31. What about privacy? Credit: mkabakov
  32. 32. Privacy controls are not binary but fall on spectra. Obfuscation: from openly identifiable to anonymised to the point of losing valuable content. Access / Environment: from free on the internet (everyone) to locked in a steel-lined room (accredited researcher). Governance and accountability: from little legislation to highly legislated.
  33. 33. A taxonomy of obfuscation. Anonymisation: remove all identifiers such that it is impossible to identify an individual. Encryption: prevent the data from being read without unlocking it - in theory encrypted databases can be analysed without breaking the encryption, but in practice they cannot be used for anything but trivial uses. Tokenisation or pseudonymisation: remove as much of the 'personal' information as possible - and link back to the personal information via an independent, securely held database. Credits: University of Regensburg, Robbie Cooper
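The tokenisation or pseudonymisation idea on slide 33 can be sketched in Python. This is only an illustration under stated assumptions, not a method taken from the presentation: the function name `pseudonymise`, the `nhs_number` field, the choice of a keyed HMAC-SHA256 token, and the sample records are all hypothetical.

```python
import hashlib
import hmac


def pseudonymise(records, secret_key, id_field="nhs_number"):
    """Split records into token-only research rows and a separate lookup table.

    The lookup table (token -> original identifier) is what would be kept in
    the independent, securely held database; it is returned here only so the
    sketch is self-contained.
    """
    research_rows = []
    lookup = {}
    for record in records:
        identifier = record[id_field]
        # Keyed hash: the same identifier always yields the same token,
        # but the token cannot be recomputed without the secret key.
        token = hmac.new(
            secret_key, identifier.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        lookup[token] = identifier
        # Copy every field except the direct identifier.
        row = {k: v for k, v in record.items() if k != id_field}
        row["token"] = token
        research_rows.append(row)
    return research_rows, lookup


# Hypothetical sample data, invented for this example.
sample = [
    {"nhs_number": "0000000001", "condition": "asthma"},
    {"nhs_number": "0000000001", "condition": "diabetes"},
    {"nhs_number": "0000000002", "condition": "asthma"},
]
rows, table = pseudonymise(sample, secret_key=b"example-key-not-for-real-use")
```

Records for the same person stay linkable through the shared token, which is what keeps the data useful for research. The remaining attributes can still identify someone, so this reduces risk rather than removing it.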
  34. 34. Obfuscation - differential privacy • Differential privacy: the database itself remains pure, but a small amount of noise is added to the final answer of each query, to prevent identification of a single record. • Good for many situations, but not for small populations or finding needles in haystacks, such as the common factors behind a rare disease.
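A minimal Python sketch of the idea on slide 34, adding noise to the final answer of a counting query. It assumes the Laplace mechanism, which is one common way to do this and is not named in the presentation; the function names, the `epsilon` parameter and the sample data are likewise assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Answer a counting query with noise added to the result only.

    The stored records are never modified. Adding or removing one record
    changes a count by at most 1, so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: 1,000 people, 50 of whom have the condition.
people = [{"has_condition": i < 50} for i in range(1000)]
answer = noisy_count(people, lambda p: p["has_condition"], epsilon=0.5)
```

With `epsilon=0.5` the noise has a typical size of a few units. That barely matters for a count of 50 but would swamp a count of 2 or 3, which matches the slide's caveat about small populations and rare diseases.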
  35. 35. Access and environment: safe havens • A safe haven for data is more like a traditional library, where controlled access is granted to people who have the right credentials. • You lose some of the benefit of making data freely available over the internet, but the risk of malicious use is greatly reduced. • The Administrative Data Research Network is a scheme to make HMG data available in safe havens. Credit: QTS
  36. 36. Governance: data protection legislation • Harm can be done both by sharing and by not sharing data • The Data Protection Act is rarely the real barrier to sharing data for the protection of individuals • The DPA provides exemptions for research, which would be tightened significantly by the proposed EU Data Protection Regulation, making some current medical research illegal. A major concern. Credit: EU dpi
  37. 37. Laws have borders – data does not. Map showing undersea internet cables
  38. 38. Even if a dataset is effectively anonymised on its own, which is very difficult, once it is freely available it can be “decrypted” by finding overlaps with other datasets. These could be a mixture of public and private datasets. The bottom line: it is very hard to guarantee privacy
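The overlap problem on slide 38 can be illustrated with a short Python sketch of a linkage attack: joining an "anonymised" dataset to a public one on shared attributes. Everything here is hypothetical and invented for the example, including the function name `link_records`, the choice of postcode, birth year and sex as the overlapping fields, and the sample records.

```python
def link_records(anonymised, public, quasi_identifiers):
    """Return {row index in anonymised: name} for rows matching exactly one person."""
    # Index the public dataset by the combination of overlapping attributes.
    index = {}
    for person in public:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person["name"])

    matches = {}
    for i, row in enumerate(anonymised):
        key = tuple(row[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        # A unique match re-identifies the row; ambiguous matches do not.
        if len(candidates) == 1:
            matches[i] = candidates[0]
    return matches


# Hypothetical "anonymised" health data: no names, but attributes remain.
health = [
    {"postcode": "OX1", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "OX2", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
]
# Hypothetical public dataset that includes names.
register = [
    {"name": "A. Example", "postcode": "OX1", "birth_year": 1970, "sex": "F"},
    {"name": "B. Example", "postcode": "OX2", "birth_year": 1985, "sex": "M"},
    {"name": "C. Example", "postcode": "OX2", "birth_year": 1985, "sex": "M"},
]
reidentified = link_records(health, register, ["postcode", "birth_year", "sex"])
```

In this toy data the first health record matches exactly one person and is re-identified, while the second is protected only because two people happen to share the same attributes.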
  39. 39. Where is all of this headed, and what do we need to do? Credit: Arne Hückelheim/CC-BY-SA-3.0
  40. 40. There are some tough challenges • The digital infrastructure creates new threats and vulnerabilities • Security considerations were not planned into the internet and web • The keys to cryptography are only as secure as those who hold them – importance of human science • Who watches the watchers? • Should big data be on the National Risk Register? Image (PD): Juvenal, the Roman poet to whom “Quis custodiet ipsos custodes?” is attributed.
  41. 41. Balancing risks • Don't underplay the risk of releasing data: the challenge is to balance utility and privacy • Recognise that people who set out to re-identify data are extremely able and may have powerful hardware at their disposal. Source:
  42. 42. What will be the effect on people? Autonomy; privacy; disclosure. Credit: Shutterstock
  43. 43. What will be the effect on people? • It is impossible to completely erase a digital past. • Future generations may require the right to be forgiven rather than the right to be forgotten. • Young people are already becoming more protective of their data and abandoning Facebook for Snapchat, WhatsApp and other platforms.
  44. 44. There are utopian and dystopian futures • Utopia: Knowledge to all, educating the world, accountability and sustainability. (Image: JMW Turner, The Rise of the Carthaginian Empire, 1815. PD) • Dystopia: end of individuality, disrupted fabric of society, childhood play disrupted, monopoly of the state in law enforcement disrupted, loss of trust in service providers. (Image: Presidio Modelo prison, Cuba (abandoned). Credit: Friman/CC-BY-SA-3.0)
  45. 45. How do we move forward? Technology: continue to strongly support the science and skills agenda. Communication: don't underplay the risk of releasing data; the challenge is to balance utility and privacy. Governance: reduce risk by choice of environment (safe havens with penalties), with control of the environment proportional to the risk of harm.
  46. 46. Final messages • There is no going back – the world is shaped by the digital revolution • There are new tools for understanding ourselves and the world • Huge economic opportunities • There are unforeseen benefits and harms
  47. 47. Final messages • The internet has no borders • There will be ever more scope for crime and terrorism in cyberspace • UK has great strength in cyber security • We must stay at the leading edge, develop proportionate regulation, legislation and accountability. • Need a sophisticated level of debate.
  48. 48. @uksciencechief Every effort has been made to trace copyright holders and to obtain their permission for the use of copyright material. We apologise for any errors or omissions in the included attributions and would be grateful if notified of any corrections that should be incorporated in future versions of this slide set. We can be contacted through .