Privacy Issues in Big Data Sharing and Reuse
Bart Custers PhD MSc LLM
Associate professor/head of research
eLaw – Center for Law and Digital Technologies
Leiden University, The Netherlands
Cyber Summit 2016 – Banff, Canada
October 27th 2016, 2:15 pm – 2:45 pm
Introduction: big data and data reuse
The Eudeco project
Generating new data vs data reuse
Legal and ethical issues
Privacy, security
Discrimination, stigmatization, polarization
Consent, autonomy, self-determination
Transparency, integrity, trust
Suggestions for solutions
Conclusions
more data => more opportunities
This calls for data sharing and reuse
The Eudeco project (3 years)
Five partners
Four countries
Modeling the European Data Economy
Focus on big data and data reuse
Legal, societal, economic and technological perspectives
Big Data
• Volume (big)
• Velocity (fast)
• Variety (unstructured)
People
Social media
User-generated content
Devices (Internet of Things)
Sensors
▪ Cameras, microphones
Trackers
▪ RFID tags, web surfing behavior
Other
▪ Mobile phones, wearables
▪ Self-surveillance/quantified self
Data sharing
Active role of data subjects
(hence: consent)
Data reuse
(with/without consent)
Data recycling
Data reuse for the same purpose
Data repurposing
Data reuse for new purposes
Data recontextualisation
Data reuse in a new context
Data reuse may…
• be more efficient
• be more effective
(e.g., larger volumes, greater completeness)
• include historical data
• not always match purposes and
context
• be difficult
• Technological
(e.g., interoperability, data portability)
• Legal
(e.g., privacy laws)
• Economic
(e.g., competition)
• Right to data portability
• Right to be forgotten
Facebook likes can predict:
sexual orientation, ethnicity, religious and
political views, personality traits, intelligence,
happiness, use of addictive substances,
parental separation, age, and gender.
(Kosinski et al. 2013)
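To make the mechanics concrete, a minimal sketch of how such predictions can work: a binary user-by-like matrix fed into logistic regression. All data here is synthetic and the pipeline is simplified (Kosinski et al. additionally reduced the like matrix with SVD before regression); this is an illustration, not their actual method.

```python
# Sketch: predicting a binary trait from Facebook-style "likes".
# Synthetic data; simplified version of the Kosinski et al. (2013) setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_users, n_likes = 1000, 50

# Binary user-by-like matrix: X[i, j] = 1 if user i liked page j.
X = rng.integers(0, 2, size=(n_users, n_likes))

# Synthetic ground truth: the trait depends on a handful of pages.
informative_pages = [3, 7, 19]
score = X[:, informative_pages].sum(axis=1) + rng.normal(0, 0.5, size=n_users)
y = (score > 1.5).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", model.score(X, y))
# The point: a few seemingly innocuous likes suffice to predict the trait.
```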
Legal perspective
Violations of privacy depend on your definition of privacy
Ethical perspective
Violations of privacy depend on your expectations
Subjective: personal expectations
Objective: reasonable expectations
Unwanted disclosure of information
Security (hacking, leaking)
Predictions
Unwanted use of information
Transparency regarding decision-making
Function creep
informational privacy:
Which data are used? For which purposes?
Data may be discriminating:
When police surveillance focuses on black neighborhoods, the people in the database will be black (selective sampling)
Patterns may be discriminating:
A database may show that top managers are male (self-fulfilling prophecy)
People causing car accidents are >16 years old (non-novel pattern)
Discrimination may be concealed/indirect:
Selection on zip code instead of ethnic background (redlining)
Selection on legitimate attributes correlated with discriminating attributes (masking); see the sketch below
Discrimination
Stigmatisation
Polarisation
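A small synthetic sketch of masking/redlining: a decision rule that never sees ethnicity can still discriminate when it selects on zip code and zip code correlates with ethnicity. All numbers below are invented for illustration.

```python
# Sketch of indirect discrimination via a proxy attribute (redlining).
import random

random.seed(0)

people = []
for _ in range(10_000):
    eth = random.choice(["A", "B"])
    # Residential segregation: group B lives mostly in zip code 2000.
    zip_code = "2000" if random.random() < (0.8 if eth == "B" else 0.2) else "1000"
    people.append({"eth": eth, "zip": zip_code})

def rejection_rate(group):
    # The decision rule never looks at ethnicity, only at zip code.
    rejected = [p for p in group if p["zip"] == "2000"]
    return len(rejected) / len(group)

for g in ["A", "B"]:
    rate = rejection_rate([p for p in people if p["eth"] == g])
    print(f"rejection rate, group {g}: {rate:.2f}")
# Prints roughly 0.20 for A and 0.80 for B: the "neutral" zip-code rule
# discriminates anyway, without ever using ethnicity itself.
```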
Privacy policies/Terms & Conditions
People do not read policies
Reading everything would take 244 hours annually
Users are willing to spend 1-5 minutes on this
Facebook: 9,500 words (>1 hour), LinkedIn: 7,500 words (~1 hour); see the sketch below
People do not understand policies
Policies are often highly legalistic, technical, or both
Devil is in the details
People do not grasp consequences
Preferred option is not available
Take-it-or-leave-it decisions: check the box
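A back-of-the-envelope check of the reading times above. The word counts are from the slide; a reading speed of roughly 150 words per minute for dense legal text is an assumption.

```python
# Back-of-the-envelope reading time for privacy policies.
WPM = 150  # assumed reading speed for legalistic text

for name, words in [("Facebook", 9_500), ("LinkedIn", 7_500)]:
    print(f"{name}: {words} words -> ~{words / WPM:.0f} minutes")
# Facebook: ~63 minutes (>1 hour), LinkedIn: ~50 minutes (~1 hour),
# versus the 1-5 minutes users are actually willing to spend.
```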
informational self-determination (Westin, 1967)
People control who gets their data and for which purposes
Past – Current – Future?
Big data is used for a lot of decision-making
Based on what data?
Based on which analyses?
Do you know how many databases you are in?
Limiting Access to Sensitive Data
The basic idea: if sensitive data are absent from the database/cloud, the
resulting decisions/selections cannot be discriminating
However, restricting access is very difficult:
According to information theory, the dissemination of data follows
the laws of entropy:
▪ Information can easily be copied and multiplied
▪ Information can easily be distributed
▪ This process is irreversible
Analyze the problem:
Privacy Impact Assessments
Customize the solution:
Privacy by Design
Privacy-enhancing tools
Privacy-preserving big data analytics
Discrimination-aware data mining
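One concrete instance of privacy-preserving analytics is differential privacy: add calibrated noise to aggregate statistics so that no individual record can be singled out. A minimal sketch of the Laplace mechanism for a counting query follows; the dataset, the query, and the epsilon value are invented for illustration.

```python
# Minimal sketch of the Laplace mechanism (differential privacy),
# one example of privacy-preserving analytics. Not production-ready.
import numpy as np

def dp_count(records, predicate, epsilon):
    """Epsilon-differentially-private count of records matching predicate.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon suffices for epsilon-DP.
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.default_rng().laplace(0.0, 1.0 / epsilon)
    return true_count + noise

# Example: how many people in a (synthetic) dataset are over 50?
ages = [23, 55, 61, 34, 70, 48, 52]
print(dp_count(ages, lambda age: age > 50, epsilon=0.5))
```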
Since there is no single problem, there is no single solution
Combinations of smart solutions are required
New perspectives
Focus less on:
Limiting access to data
Restricting use of data
Focus more on:
Transparency
Responsibility
Restricting data access and use limits big data
opportunities and is difficult to enforce
We need data sharing and data reuse
There are risks, however, regarding
Privacy, discrimination, consent, transparency
These risks can be addressed via responsible innovation
Privacy Impact Assessments
Privacy by Design
▪ Privacy-enhancing tools
▪ Privacy-preserving big data analytics
▪ Discrimination-aware data mining
New approaches
Focus less on limiting access to data and restricting its use
Focus more on transparency and responsibility
Thank you for your attention!
Or contact me later: b.h.m.custers@law.leidenuniv.nl
