PRIVACY CONSIDERATIONS
WITH DATA AND DATA SCIENCE

David Stephenson, Ph.D.
dsiAnalytics.com
Agenda
•  Intro & Case Studies
•  Data & Data Science: Growth and Usage
•  Privacy: Storm on the Horizon
•  Concluding Thoughts
My Background
Intro & Case Studies
Head of Global Business Analytics
Professor (Advanced Analytics)
Ph.D. Analytics & Computer Science
Financial Analytics, Credit Risk and Insurance
Independent Consultant
Target Corporation Makes an Embarrassing Revelation

Intro & Case Studies
Legal
Netflix Plays with Fire and Gets Burned

Intro & Case Studies
Agenda
•  Intro & Case Studies
•  Data & Data Science: Growth and Usage
– Data Science
– Modern Technology
•  Privacy: Storm on the Horizon
Data Usage
The Power of Data Science
Propensity Classification/Profiling
Personalization
Marketing
More Data Means More Insights
Traditional Data
Big Data
Smart Devices
IoT
William Weld
Healthcare Case Study: Linking Data Destroys Anonymization
Data Science
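
The linking attack named on this slide can be illustrated with a small sketch. It assumes, as in the widely reported account of this case, that the released medical records had names removed but kept ZIP code, birth date and sex, and that a public voter list carried the same three fields next to names. Every record, every name and the function `link_records` below are invented for illustration; this is not the original analysis.

```python
from collections import defaultdict

# Hypothetical "anonymized" medical records: names removed,
# but quasi-identifiers (ZIP, birth date, sex) left in place.
medical = [
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "birth_date": "1962-07-04", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02138", "birth_date": "1971-03-22", "sex": "F", "diagnosis": "condition C"},
]

# Hypothetical public voter list: names plus the same quasi-identifiers.
voters = [
    {"name": "Voter One", "zip": "02138", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Voter Two", "zip": "02139", "birth_date": "1962-07-04", "sex": "F"},
    {"name": "Voter Three", "zip": "02139", "birth_date": "1962-07-04", "sex": "F"},
    {"name": "Voter Four", "zip": "02140", "birth_date": "1980-11-30", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link_records(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where the quasi-identifiers match exactly one person."""
    # Index the public list by its quasi-identifier tuple.
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = []
    for record in anonymized:
        names = index.get(tuple(record[k] for k in keys), [])
        # Only a unique match re-identifies the record with confidence.
        if len(names) == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link_records(medical, voters)
```

In this toy data only the first medical record is re-identified: the second matches two voters and stays ambiguous, and the third matches nobody. The point of the sketch is that removing names alone does not anonymize a dataset when a second dataset shares enough fields with it.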
Data Sources
The Power of Data Science
Brainstorm: Today’s Sources of Personal Data?
Data Growth: Modern Technology
Sources: Browsing
Sources: Offline Behavior
Sources: Biometrics
Sources: The Internet of Things
Data Storage
Source and Use of Customer Data
Privacy: A Brief Background
Can be
Known
Used
Stored
Shared with 3rd parties
Observed
Volunteered
Data Science
Agenda
•  Intro & Case Studies
•  Data & Data Science: Growth and Usage
•  Privacy: Storm on the Horizon
Privacy Legislation
EU Data Protection Directives
Preparing for compliance
Privacy: Storm on the Horizon
What are my data assets?
Usage
Storage
Flow to/from 3rd parties
Observed
Volunteered
Data Science
Right to be forgotten
De-anonymization
Cloud computing
Explicit and up-front consent
Restricted profiling
Privacy by Design
Potential liabilities from buying, selling and sharing
Moving Forward
Privacy: Storm on the Horizon
✓  Become aware of your entire data ecosystem and how it may expose you to privacy violations
✓  Audit current data storage and governance for compliance
✓  Ensure that all product roadmaps comply with the principles of Privacy by Design
✓  Ensure that proper user consent is in place from the moment of first user registration
✓  Initiate dialogue with corporate privacy officer or external expert
Contact
david@dsiAnalytics.com
@DataScienceInno
Appendix
Privacy by Design
1  Proactive not Reactive; Preventative not Remedial
2  Privacy as the Default Setting
3  Privacy Embedded into Design
4  Full Functionality – Positive-Sum, not Zero-Sum
5  End-to-End Security – Full Lifecycle Protection
6  Visibility and Transparency – Keep it Open
7  Respect for User Privacy – Keep it User-Centric
Privacy by Design for Big Data (Jeff Jonas, IBM)
1.  FULL ATTRIBUTION: Every observation (record) needs to know from where it came and when. There cannot be merge/purge data survivorship processing whereby some observations or fields are discarded.
2.  DATA TETHERING: Adds, changes and deletes occurring in systems of record must be accounted for, in real time, in sub-seconds.
3.  ANALYTICS ON ANONYMIZED DATA: The ability to perform advanced analytics (including some fuzzy matching) over cryptographically altered data means organizations can anonymize more data before information sharing.
4.  TAMPER-RESISTANT AUDIT LOGS: Every user search should be logged in a tamper-resistant manner; even the database administrator should not be able to alter the evidence contained in this audit log.
5.  FALSE NEGATIVE FAVORING METHODS: The capability to more strongly favor false negatives is of critical importance in systems that could be used to affect someone’s civil liberties.
6.  SELF-CORRECTING FALSE POSITIVES: With every new data point presented, prior assertions are re-evaluated to ensure they are still correct, and if no longer correct, these earlier assertions can often be repaired in real time.
7.  INFORMATION TRANSFER ACCOUNTING: Every secondary transfer of data, whether to human eyeball or a tertiary system, can be recorded to allow stakeholders (e.g., data custodians or the consumers themselves) to understand how their data is flowing.
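
One common way to approach the tamper-resistant audit log in principle 4 is a hash chain, where each entry stores a hash that covers both its own content and the previous entry's hash. The sketch below is a minimal illustration under that assumption, not a description of any IBM product; the names `append_entry`, `verify` and `GENESIS_HASH`, the entry fields and the sample queries are all hypothetical.

```python
import hashlib
import json

# Placeholder "previous hash" for the first entry in the chain (an assumption of this sketch).
GENESIS_HASH = "0" * 64


def _entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's content."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, user, query, timestamp):
    """Append a search record whose hash chains back to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    payload = {"user": user, "query": query, "timestamp": timestamp}
    log.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, payload),
    })


def verify(log):
    """Return True only if every entry still matches its stored hash and its predecessor."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical usage: two logged searches.
log = []
append_entry(log, "analyst_1", "customer lookup: id 42", "2015-09-01T10:00:00Z")
append_entry(log, "dba", "export: all customers", "2015-09-01T10:05:00Z")
```

Editing any earlier payload makes `verify(log)` return False, because the stored hash no longer matches the content. A chain like this is tamper-evident rather than tamper-proof: someone with full write access could recompute every hash, so in practice the latest hash would also need to be stored somewhere the database administrator cannot change.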

Big Data Expo 2015 - Data Science Innovation Privacy Considerations
