Privacy in the Big Data Era
Srinath Perera, Ph.D.
VP Research, WSO2; Apache Member
(srinath@wso2.com)
@srinath_perera
What is Privacy?
Privacy is the ability of an individual or group to seclude themselves, or information about themselves, and thereby express themselves selectively.
I Have Nothing to Hide
"If you have something that you don't want anyone to know, maybe you shouldn't be doing it in the first place."
—Eric Schmidt, the CEO of Google
I Have Nothing to Hide
"If one would give me six lines written by the hand of the most honest man, I would find something in them to have him hanged."
—Cardinal Richelieu ( first used by
Bruce Schneier)
I Have Nothing to Hide
• It is hard to avoid discrimination, bias, and misinterpretation
• We put unreasonable expectations on others (e.g. fundamental attribution error, illusory superiority)
• People behave differently when they are watched (e.g. they become more conformist and take fewer risks)
• People change and norms change, but data is forever
• In a zero-privacy world, competition is hard and the balance of power is skewed
  - Powerful countries have an advantage
  - In a democracy, the reigning government has an advantage
  - Bigger companies have an advantage
“You have zero privacy anyway. Get over it.”
—Scott McNealy
What can anonymized CDR (call detail record) data tell about you?
• Where you live and work, and possibly your name
• When you come home and when you leave home
• Your friends, your family, and your income bracket
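To make the first bullet concrete, here is a minimal sketch of how a home location could be guessed from anonymized records. It assumes a simple heuristic (the tower a subscriber uses most often at night is probably near home); the record layout, the tower IDs, the night window, and the function name `infer_home_tower` are all hypothetical and not taken from any real CDR schema.

```python
from collections import Counter
from datetime import datetime


def infer_home_tower(records, night_start=22, night_end=6):
    """Guess a home tower for ONE pseudonymous subscriber.

    records: iterable of (timestamp, tower_id) pairs (assumed layout).
    Returns the tower seen most often between night_start and night_end,
    or None if there are no night-time records.
    """
    night_towers = Counter(
        tower
        for ts, tower in records
        if ts.hour >= night_start or ts.hour < night_end
    )
    if not night_towers:
        return None
    return night_towers.most_common(1)[0][0]


# Made-up records for a single anonymized subscriber ID.
records = [
    (datetime(2024, 1, 1, 23, 10), "T17"),
    (datetime(2024, 1, 2, 2, 5), "T17"),
    (datetime(2024, 1, 2, 10, 30), "T42"),
    (datetime(2024, 1, 2, 14, 0), "T42"),
    (datetime(2024, 1, 2, 22, 45), "T17"),
    (datetime(2024, 1, 3, 1, 0), "T09"),
]

home = infer_home_tower(records)
```

No name is needed for this to work: once a likely home and work tower are known, they can be matched against other data sets, which is why removing identifiers alone offers limited protection.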
What can cameras tell about you?
• Your face (e.g. via eigenface recognition), your number plate, and where you drive
• What you drive, where you go, who you met (*), when you leave home, your habits, and how you feel
• This happens in public space, where you are not protected by privacy laws
What can electricity data tell about you?
• Are you in the house? When do you leave, and when do you come back?
• What appliances are in the house?
• What are you doing (at limited granularity)?
• What programs are you watching?
Two Phones Traveling Together
• Just from cell-tower-level data: who did you meet, and when?
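The idea on this slide can be sketched in a few lines. The example below assumes each phone produces a list of (minute of day, tower ID) pings and treats two phones as "together" whenever both are seen at the same tower within the same time bucket. The data layout, the 15-minute bucket, and the function name `co_located` are illustrative assumptions, not a description of any operator's system.

```python
def co_located(pings_a, pings_b, bucket_minutes=15):
    """Return sorted (bucket_start_minute, tower_id) pairs where both
    phones were seen at the same tower in the same time bucket.

    pings_*: iterable of (minute_of_day, tower_id) pairs (assumed layout).
    """
    def to_buckets(pings):
        return {(minute // bucket_minutes, tower) for minute, tower in pings}

    shared = to_buckets(pings_a) & to_buckets(pings_b)
    return sorted((bucket * bucket_minutes, tower) for bucket, tower in shared)


# Made-up pings: the two phones start apart, then move through
# towers T5, T6, and T7 at about the same times.
phone_a = [(540, "T1"), (600, "T5"), (615, "T6"), (630, "T7"), (720, "T2")]
phone_b = [(545, "T3"), (605, "T5"), (618, "T6"), (640, "T7"), (730, "T9")]

meetings = co_located(phone_a, phone_b)
```

A single shared bucket can be coincidence, but a run of consecutive shared towers, as in this made-up data, suggests the two phones were traveling together.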
What can we do?
Fighting Back
• Stop sharing
  - Not possible, because the data is too valuable
• Have greater control over what we share and how it is stored
• Law and policy
• Using algorithms
Law and Policy: HIPAA
• In healthcare, people must share data with their healthcare provider
• That data must in turn be shared with other parties (other hospitals, insurers, doctors)
• Health Insurance Portability and Accountability Act of 1996 (US)
• Dictates how individually identifiable health information can be collected, stored, and shared
• It works, but implementation is expensive
Law and Policy: GDPR
• The General Data Protection Regulation (EU) 2016/679
• Personal data can be collected, stored, or processed only on a lawful basis, such as explicit consent
• Dictates how it is stored
• The data owner can revoke consent at any time
• Might be a burden to startups and small companies
Law and Policy: Limiting Correlations
• We can limit what information is legal to use
• We can limit the correlation of data sets and the publication of correlated data
Algorithms: Differential Privacy
• Add noise to the data so that aggregate values are preserved while individual values become harder to recover
• Create artificial data sets for machine learning that yield a similar model
• Apple reported in 2016 that it had adopted differential privacy
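As an illustration of the first bullet, the sketch below releases a noisy count using the Laplace mechanism, one standard way to add calibrated noise. It is a minimal example under stated assumptions: the data set `ages`, the privacy parameter `epsilon`, and the helper names `laplace_noise` and `private_count` are made up for illustration, and this is not a description of how Apple or any other company implements differential privacy.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two
    independent exponential samples with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Release a noisy count of the items matching `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up data set: ages of ten people.
ages = [23, 35, 41, 29, 52, 67, 38, 45, 31, 58]

rng = random.Random(7)  # fixed seed only so the sketch is repeatable
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
```

A smaller `epsilon` means more noise and stronger privacy. Over many people the noise is small relative to the aggregate, but any single released answer says little about whether one particular person is in the data.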
Algorithms: Distributed Data
• Store data in a distributed manner (e.g. on each user's phone or machine) and allow queries against it
• Do computations by combining the results
• Much more expensive than centralized methods
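A minimal sketch of the pattern described above: raw readings stay on each device, each device answers a query with a small summary, and only the summaries are combined centrally. The device names, the readings, and the functions `local_summary` and `combine` are hypothetical.

```python
def local_summary(readings):
    """Runs on the device: reduce raw readings to (sum, count)."""
    return (sum(readings), len(readings))


def combine(summaries):
    """Runs centrally: compute a global mean from per-device summaries."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count if count else None


# Made-up raw data that never leaves the individual devices.
devices = {
    "phone_a": [2.0, 4.0],
    "phone_b": [6.0],
    "laptop_c": [8.0, 10.0, 12.0],
}

summaries = [local_summary(readings) for readings in devices.values()]
global_mean = combine(summaries)
```

Per-device summaries can still leak information (a device with a single reading reveals that reading), so this pattern is usually combined with other techniques such as added noise or encrypted aggregation.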
Algorithms: Homomorphic Encryption
• An encryption technique that enables computations on encrypted data
• Encrypted data can be used to do limited calculations
• It is still computationally expensive, so it cannot be used widely
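To show what a "limited calculation" on encrypted data looks like, here is a toy sketch of the Paillier cryptosystem, which supports addition of encrypted values. The slides do not name a specific scheme, so Paillier is an assumption chosen for illustration. The primes are tiny and the code is for demonstration only; it assumes Python 3.8+ for the modular inverse via `pow(x, -1, n)`.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two primes (uses g = n + 1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m*n.
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


rng = random.Random(0)
n, private_key = keygen(1789, 1861)  # toy primes, far too small for real use

c1 = encrypt(n, 1200, rng)
c2 = encrypt(n, 345, rng)
c_sum = add_encrypted(n, c1, c2)        # computed without the private key
total = decrypt(n, private_key, c_sum)  # 1200 + 345
```

The party computing `c_sum` never sees 1200 or 345. Realistic key sizes make every operation a large modular exponentiation, which is one source of the cost noted above.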
Conclusion
• Why privacy is needed
• How it is challenged
• What can we do?
