Data Science and EU Privacy
A Storm on the Horizon
David Stephenson, Ph.D.
dsiAnalytics.com
PRIVACY CONSIDERATIONS WITH DATA AND DATA SCIENCE
• Intro & Case Studies
• Data & Data Science: Growth and Usage
• Privacy: Storm on the Horizon
• Concluding Thoughts
Agenda
My Background
Intro & Case Studies
Head of Global Business Analytics
Professor (Advanced Analytics)
Ph.D. Analytics & Computer Science
Financial Analytics, Credit Risk and Insurance
Independent Consultant
Target Corporation makes an embarrassing revelation
Legal
Netflix Plays with Fire and Gets Burned
Netflix
PRIVACY CONSIDERATIONS WITH DATA AND DATA SCIENCE
• Intro & Case Studies
• Data & Data Science: Growth and Usage
– Data Science
– Modern Technology
• Privacy: Storm on the Horizon
Agenda
Data Usage
The Power of Data Science
Propensity
Classification/Profiling
Personalization
Marketing
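The usage labels above are easiest to see in a propensity score: a model that turns observed behavior into an estimated probability that a customer has some trait. The Target case is usually described as this kind of score, a pregnancy-prediction model built from purchase history. What follows is a minimal logistic-score sketch under stated assumptions; the product names, weights, bias and the 0.3 threshold are all hypothetical and not taken from any real model.

```python
import math

# Hypothetical model: each behavioral signal adds to the log-odds of a trait.
WEIGHTS = {"product_a": 1.2, "product_b": 0.9, "product_c": 0.6}
BIAS = -3.0  # baseline log-odds when no signal is observed


def propensity(signals):
    """Logistic score in (0, 1) from a customer's observed signals."""
    log_odds = BIAS + sum(WEIGHTS.get(s, 0.0) for s in set(signals))
    return 1.0 / (1.0 + math.exp(-log_odds))


def segment(customers, threshold=0.3):
    """IDs of customers whose score meets the (hypothetical) targeting threshold."""
    return [cid for cid, signals in customers.items() if propensity(signals) >= threshold]


customers = {"c1": [], "c2": ["product_a", "product_b", "product_c"]}
print(segment(customers))  # ['c2']
```

The privacy point is that the score is new personal data the customer never volunteered: it is inferred, and it can be sensitive.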
More Data Means More Insights
Traditional Data
Big Data
Smart Devices
IoT
William Weld
Healthcare Case Study: Linking Data Destroys Anonymization
Data Science
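The case study refers to the widely cited re-identification of Massachusetts Governor William Weld's hospital records, in which a health dataset stripped of names was linked to a public voter list on ZIP code, date of birth and sex. A minimal sketch of such a linkage attack follows; all records, names and field names are hypothetical toy data.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: names dropped, quasi-identifiers kept.
health_records = [
    {"zip": "00001", "dob": "1950-01-01", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "00002", "dob": "1960-06-15", "sex": "F", "diagnosis": "asthma"},
    {"zip": "00002", "dob": "1960-06-15", "sex": "F", "diagnosis": "diabetes"},
]

# Hypothetical public dataset that lists names next to the same fields.
voter_roll = [
    {"name": "Voter A", "zip": "00001", "dob": "1950-01-01", "sex": "M"},
    {"name": "Voter B", "zip": "00002", "dob": "1960-06-15", "sex": "F"},
    {"name": "Voter C", "zip": "00002", "dob": "1960-06-15", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def quasi_key(record):
    """Combine the quasi-identifier fields into one join key."""
    return tuple(record[f] for f in QUASI_IDENTIFIERS)


def link(released, public):
    """Re-identify released records whose key matches exactly one named person."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[quasi_key(person)].append(person["name"])
    reidentified = {}
    for record in released:
        names = names_by_key.get(quasi_key(record), [])
        if len(names) == 1:  # unique combination: the "anonymous" row has a name again
            reidentified[names[0]] = record["diagnosis"]
    return reidentified


print(link(health_records, voter_roll))  # {'Voter A': 'hypertension'}
```

The two rows sharing ZIP 00002 are not uniquely linked because two voters share that combination. Requiring every quasi-identifier combination to be shared by at least k people (k-anonymity) is one common defense; it reduces linkage risk but does not eliminate it.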
Data Sources
Brainstorm: Today’s Sources of Personal Data?
Sources: Browsing
Data Growth: Modern Technology
Sources: Offline behavior
Sources: Biometrics
Sources: The Internet of Things
Data Storage
Source and Use of Customer Data
Privacy: A Brief Background
Customer data sources: Observed, Volunteered, Data Science
Customer data can be: Known, Used, Stored, Shared with 3rd parties
PRIVACY CONSIDERATIONS WITH DATA AND DATA SCIENCE
• Intro & Case Studies
• Data & Data Science: Growth and Usage
• Privacy: Storm on the Horizon
Agenda
Privacy Legislation
EU Data Protection Directives
Preparing for compliance
Privacy: Storm on the Horizon
What are my data assets?
• Sources: Observed, Volunteered, Data Science
• Usage, Storage, Flow to/from 3rd parties
Considerations:
• Right to be forgotten
• De-anonymization
• Cloud computing
• Explicit and up-front consent
• Restricted profiling
• Privacy by Design
• Potential liabilities from buying, selling and sharing
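One way to start answering "What are my data assets?" is a simple inventory that records, for each asset, where it came from, where it is stored and who receives it, and then flags the considerations listed above. The sketch assumes a hypothetical `DataAsset` record and hand-written rules; the rules are illustrative review prompts, not legal tests of compliance.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataAsset:
    """Hypothetical inventory entry for one personal-data asset."""
    name: str
    source: str   # "observed", "volunteered", or "data science" (inferred)
    storage: str  # e.g. "on-premise" or "cloud"
    third_parties: List[str] = field(default_factory=list)
    has_explicit_consent: bool = False
    used_for_profiling: bool = False
    deletable_on_request: bool = False


def audit(assets: List[DataAsset]) -> List[Tuple[str, str]]:
    """Return (asset name, finding) pairs to review with a privacy officer."""
    findings = []
    for asset in assets:
        if not asset.has_explicit_consent:
            findings.append((asset.name, "no explicit, up-front consent recorded"))
        if asset.used_for_profiling:
            findings.append((asset.name, "used for profiling"))
        if not asset.deletable_on_request:
            findings.append((asset.name, "cannot honor a right-to-be-forgotten request"))
        for partner in asset.third_parties:
            findings.append((asset.name, f"shared with 3rd party: {partner}"))
        if asset.storage == "cloud":
            findings.append((asset.name, "stored in the cloud: review provider and data location"))
        if asset.source == "data science":
            findings.append((asset.name, "inferred data: review de-anonymization risk"))
    return findings


inventory = [
    DataAsset("newsletter_signups", "volunteered", "on-premise",
              has_explicit_consent=True, deletable_on_request=True),
    DataAsset("predicted_interests", "data science", "cloud",
              third_parties=["ad_partner"], used_for_profiling=True),
]
for name, finding in audit(inventory):
    print(f"{name}: {finding}")
```

Keeping the inventory as data rather than a document means the same audit can be rerun whenever a product roadmap adds a new asset or a new 3rd-party flow.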
Moving Forward
• Become aware of your entire data ecosystem and how it may expose you to privacy violations
• Audit current data storage and governance for compliance
• Ensure that all product roadmaps comply with the principles of Privacy by Design
• Ensure that proper user consent is in place from the moment a user first registers
• Initiate a dialogue with your corporate privacy officer or an external expert
Contact
david@dsiAnalytics.com
@Stephenson_data
Appendix
Privacy by Design
1 Proactive not Reactive; Preventative not Remedial
2 Privacy as the Default Setting
3 Privacy Embedded into Design
4 Full Functionality – Positive-Sum, not Zero-Sum
5 End-to-End Security – Full Lifecycle Protection
6 Visibility and Transparency – Keep it Open
7 Respect for User Privacy – Keep it User-Centric
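Principle 2, together with the advice to have consent in place from first registration, can be expressed directly in a data model: every optional use of personal data defaults to off and is only allowed once an explicit consent record exists. This is a minimal sketch; the purpose names, the `PrivacySettings` and `ConsentRecord` classes and the `notice_version` field are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict

# Hypothetical optional purposes, i.e. uses not needed to deliver the core service.
OPTIONAL_PURPOSES = ("personalization", "marketing_emails", "share_with_third_parties")


@dataclass
class ConsentRecord:
    purpose: str
    granted_at: datetime
    notice_version: str  # which consent wording the user was shown


@dataclass
class PrivacySettings:
    # Privacy as the default setting: every optional purpose starts switched off.
    personalization: bool = False
    marketing_emails: bool = False
    share_with_third_parties: bool = False
    consents: Dict[str, ConsentRecord] = field(default_factory=dict)

    def _check(self, purpose: str) -> None:
        if purpose not in OPTIONAL_PURPOSES:
            raise ValueError(f"unknown purpose: {purpose}")

    def grant(self, purpose: str, notice_version: str) -> None:
        """Record an explicit opt-in, then switch the purpose on."""
        self._check(purpose)
        self.consents[purpose] = ConsentRecord(
            purpose, datetime.now(timezone.utc), notice_version)
        setattr(self, purpose, True)

    def withdraw(self, purpose: str) -> None:
        """Switch the purpose off and discard its consent record."""
        self._check(purpose)
        self.consents.pop(purpose, None)
        setattr(self, purpose, False)

    def allowed(self, purpose: str) -> bool:
        """Processing is allowed only if the purpose is on AND consent is on file."""
        self._check(purpose)
        return getattr(self, purpose) and purpose in self.consents


settings = PrivacySettings()  # a user at first registration
print(settings.allowed("marketing_emails"))  # False
settings.grant("marketing_emails", notice_version="v1")
print(settings.allowed("marketing_emails"))  # True
print(settings.allowed("personalization"))  # False
```

Checking for the consent record, not just the flag, means a setting switched on by a bug or a bulk update still does not authorize processing.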
Privacy by Design for Big Data (Jeff Jonas, IBM)
1. FULL ATTRIBUTION: Every observation (record) needs to know from where it came and when. There cannot be merge/purge data survivorship processing whereby some observations or fields are discarded.
2. DATA TETHERING: Adds, changes and deletes occurring in systems of record must be accounted for, in real time, in sub-seconds.
3. ANALYTICS ON ANONYMIZED DATA: The ability to perform advanced analytics (including some fuzzy matching) over cryptographically altered data means organizations can anonymize more data before information sharing.
4. TAMPER-RESISTANT AUDIT LOGS: Every user search should be logged in a tamper-resistant manner; even the database administrator should not be able to alter the evidence contained in this audit log.
5. FALSE NEGATIVE FAVORING METHODS: The capability to more strongly favor false negatives is of critical importance in systems that could be used to affect someone’s civil liberties.
6. SELF-CORRECTING FALSE POSITIVES: With every new data point presented, prior assertions are re-evaluated to ensure they are still correct, and if no longer correct, these earlier assertions can often be repaired in real time.
7. INFORMATION TRANSFER ACCOUNTING: Every secondary transfer of data, whether to human eyeball or a tertiary system, can be recorded to allow stakeholders (e.g., data custodians or the consumers themselves) to understand how their data is flowing.
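Principles 4 and 7 can be approximated with a hash-chained log: each entry stores the SHA-256 hash of its contents plus the previous entry's hash, so editing, reordering or removing an earlier entry breaks verification. This is a minimal sketch, not Jonas's or IBM's implementation, and the class and field names are hypothetical. A hash chain makes tampering detectable rather than impossible: anyone with write access could rebuild the whole chain or drop the newest entry, so in practice the latest hash would also need to be stored somewhere the database administrator cannot change.

```python
import hashlib
import json
import time


class HashChainedAuditLog:
    """Append-only audit log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body, prev_hash):
        # Canonical JSON (sorted keys) so the same body always hashes the same way.
        payload = json.dumps(body, sort_keys=True) + prev_hash
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, actor, action, detail):
        """Record a search or a data transfer (principles 4 and 7)."""
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"ts": time.time(), "actor": actor, "action": action, "detail": detail}
        self.entries.append({"body": body, "prev": prev_hash,
                             "hash": self._digest(body, prev_hash)})

    def verify(self):
        """Recompute the chain; editing, reordering or removing any entry
        except the newest makes this return False."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(entry["body"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


log = HashChainedAuditLog()
log.append("analyst_17", "search", {"query": "customer 4711"})
log.append("nightly_export", "transfer", {"dataset": "orders", "recipient": "partner_x"})
print(log.verify())  # True
log.entries[0]["body"]["detail"]["query"] = "nothing to see here"
print(log.verify())  # False
```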

Data science and pending EU privacy laws - a storm on the horizon


Editor's Notes

  • #9 Photo from http://mac360.com/2012/02/free-make-your-own-photo-puzzles-on-a-mac/
  • #12 Photo from http://www.retrooffice.com/vintage-store/vintage-filing-storage/mcdowell-craig-vintage-steel-retro-vertical-letter-and-legal-file-cabinets.html.