Can Privacy Exist With
Machine Learning?
Steve Touw, Chief Technology Officer, Immuta - Gartner Cool Vendor 2018
“Data can be either useful or
perfectly anonymous but never both.”
Paul Ohm,
Broken Promises of Privacy,
57 UCLA Law Review 1701 (2010)
I know stuff about
Judd and Leslie
Judd Apatow & Leslie Mann
Photo Credit: PacificCoastNews.com
© 2017 Immuta All Rights Reserved. 3
New York Taxi &
Limousine Commission
• Data was released containing taxi pickups,
dropoffs, location, time, amount, and tip
amount, among others
• This seems pretty harmless?
Well, Judd and Leslie May
Not Think It’s Harmless
This photo was geotagged (with time), so
by simply querying by medallion and time,
we know how much Judd and Leslie tip!
This is an example of a “link attack”
[Diagram: the photo (Medallion & Photo Time) is linked to the New York Taxi Data (Medallion & Pickup Time)]
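The link attack described above amounts to a join between side information and the released file. The following is a minimal sketch, assuming the attacker has read a medallion and a timestamp off a photo; the records, field names, and the five-minute matching window are all invented for illustration and are not taken from the real release.

```python
from datetime import datetime, timedelta

# Hypothetical released trip records; every value below is invented.
TRIPS = [
    {"medallion": "7A11", "pickup_time": datetime(2013, 7, 8, 19, 32), "fare": 12.50, "tip": 0.00},
    {"medallion": "7A11", "pickup_time": datetime(2013, 7, 8, 22, 5), "fare": 8.00, "tip": 2.00},
    {"medallion": "3C42", "pickup_time": datetime(2013, 7, 8, 19, 33), "fare": 20.00, "tip": 4.00},
]

# Hypothetical side information read off a photo: the cab's medallion and the photo's timestamp.
PHOTO = {"medallion": "7A11", "taken_at": datetime(2013, 7, 8, 19, 30)}


def link_photo_to_trips(photo, trips, window=timedelta(minutes=5)):
    """Return trips with the photo's medallion whose pickup is within `window` of the photo."""
    return [
        trip for trip in trips
        if trip["medallion"] == photo["medallion"]
        and abs(trip["pickup_time"] - photo["taken_at"]) <= window
    ]


matches = link_photo_to_trips(PHOTO, TRIPS)
for trip in matches:
    print(trip["pickup_time"], trip["fare"], trip["tip"])
```

Neither data set names the passenger on its own; the identification comes entirely from the shared medallion and time columns.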
New York Actually Tried to Anonymize the Data
By hashing the medallion
But that didn’t matter…
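The slide does not spell out why hashing failed. One commonly cited weakness of hashing short identifiers is that the space of possible values is small enough to enumerate. The sketch below assumes a hypothetical four-character ID format (digit, letter, digit, digit) and an unsalted SHA-256 hash; both are assumptions for illustration, not details of the actual release.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def hash_id(identifier: str) -> str:
    """Unsalted hash of an identifier, standing in for the 'anonymized' column."""
    return hashlib.sha256(identifier.encode("ascii")).hexdigest()


def build_lookup_table() -> dict:
    """Hash every ID in the assumed format (digit, letter, digit, digit)."""
    table = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        identifier = f"{d1}{letter}{d2}{d3}"
        table[hash_id(identifier)] = identifier
    return table


lookup = build_lookup_table()        # 10 * 26 * 10 * 10 = 26,000 entries
released_value = hash_id("7A11")     # what an analyst would see in the released file
print(lookup[released_value])        # maps the hash back to the original ID
```

Because the same input always produces the same hash, the hashed column also remains a perfectly good join key even when nobody reverses it.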
[Diagram: New York Taxi Data, two chains of linkable attribute pairs.
First chain: Medallion & Photo Time; Pickup Time & Pickup Loc; Pickup Loc & Dropoff Loc; Dropoff Loc & Dropoff Time; Dropoff Time & Receipt.
Second chain: Medallion & Pickup Time; Pickup Time & Pickup Loc; Pickup Loc & Dropoff Loc; Dropoff Loc & Dropoff Time; Dropoff Time & Amount.]
Remember!
Data can be either useful or perfectly
anonymous but never both.
In fact
“...just three data points were enough to
identify an even larger percentage of
people in the data set. That means that
someone with copies of just three of
your recent receipts — or one receipt,
one Instagram photo of you having
coffee with friends, and one tweet about
the phone you just bought — would have
a 94 percent chance of extracting your
credit card records from those of a
million other people”
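The effect in the quote can be illustrated with a toy example: each extra point an attacker knows shrinks the set of people whose records are consistent with it. This is a sketch over invented data, not the method or data of the study being quoted; the names `TRACES` and `candidates` are hypothetical.

```python
# Hypothetical purchase traces: person -> set of (shop, day) points. All values are invented.
TRACES = {
    "person_a": {("bakery", 1), ("cafe", 2), ("bookshop", 3), ("grocer", 4)},
    "person_b": {("bakery", 1), ("cafe", 2), ("cinema", 3), ("grocer", 5)},
    "person_c": {("bakery", 1), ("gym", 2), ("bookshop", 3), ("grocer", 4)},
    "person_d": {("florist", 1), ("cafe", 2), ("cinema", 3), ("grocer", 4)},
}


def candidates(traces, known_points):
    """Return the people whose trace contains every point the attacker knows."""
    known = set(known_points)
    return sorted(person for person, points in traces.items() if known <= points)


# Each extra known point shrinks the candidate set.
print(candidates(TRACES, [("bakery", 1)]))
print(candidates(TRACES, [("bakery", 1), ("cafe", 2)]))
print(candidates(TRACES, [("bakery", 1), ("cafe", 2), ("bookshop", 3)]))
```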
“...one Instagram photo of you having
coffee with friends, and one tweet
about the phone you just bought…”
More data is available to us than ever, which
means link attacks become increasingly simple
It’s very easy to build profiles of individuals...
The European Union responds
General Data Protection Regulation (GDPR)
Effective May 25, 2018
Fines up to 4 percent of global annual revenue or €20 million, whichever is higher
Applies to any company processing personal data of individuals in the EU
GDPR Article 4(1):
'personal data' means any information relating to an identified or identifiable
natural person ('data subject'); an identifiable natural person is one who can be
identified, directly or indirectly, in particular by reference to an identifier such as
a name, an identification number, location data, an online identifier or to one or
more factors specific to the physical, physiological, genetic, mental, economic,
cultural or social identity of that natural person;
In Q3 alone, we’ve seen a huge uptick in regulators’ interest in regulating data, including:
• California Consumer Privacy Act was passed in June 2018, and will take effect in 2020.
• Vermont became the first state in the nation to regulate data brokers.
• In September 2018, the Trump administration, acting through the National Telecommunications and
Information Administration, released a “Request for Comments on Developing the Administration’s
Approach to Consumer Privacy.”
• This is the first concrete illustration that a national-level privacy regulation like the GDPR is coming to the US.
• Immuta prediction: By 2020, no major economic zone will be free of an overarching data protection law.
PRIVACY
MACHINE
LEARNING
MACHINE LEARNING WILL
CHANGE THE ECONOMY AS WE
KNOW IT
It’s all about the data!
What Amazon Teaches Us About the Future
Responding to data is at the core of what Amazon does… and
why organizations across verticals need to follow its lead
• Supply chain optimization: optimize distribution, storage, routes, schedules, products
• Pricing and profit optimization: elastically tailor pricing to products and consumers
• Customer segmentation: real-time analysis to boost marketing/advertising efficiency
• Software/hardware system analytics: optimizing use and distribution of IT infrastructure globally
• Competitive analysis: automatically process billions of data points about the company, its
competitors, and new trends to create daily / hourly / real-time, automated analyses
The Newer Guys Have the Upper Hand
Low Technical Debt
• Futuristic software architectures
Centralized Data
• No data silos
• Specific problem set drove data schemas
Fewer Regulatory Controls
• Not for long!!
They are Data Agile
The Three Pillars to Data Agility
• Centralized Policy Enforcement (focus on this today)
• Rapid Access to Data
• Frictionless to Data Analysts
Centralized Policy Enforcement
Old World
• Policies managed uniquely at each data source
• Use ETL to create “safe” versions of data
• IT interprets legal guidance themselves
• Audit logs are disjointed/inconsistent
New World
• Consistent layer for creating data policies
• Policies are enforced dynamically
• Plain-English policy builder usable by any author and understandable by all
• An unprecedented list of policy logic at your fingertips
• All actions monitored granularly and consistently
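To make "policies are enforced dynamically" concrete, here is a minimal sketch of a single policy layer that rewrites rows at read time instead of producing "safe" copies through ETL. It is an illustration only and not Immuta's API; the policy table, the attribute names, and `apply_policies` are all hypothetical.

```python
# Hypothetical policy table: column -> rule applied whenever a row is read.
POLICIES = {
    "medallion": {"action": "mask", "unless_attribute": "fraud_investigator"},
    "tip": {"action": "null", "unless_attribute": "finance"},
}

# One shared audit trail for every read, regardless of the underlying source.
AUDIT_LOG = []


def apply_policies(row, user_attributes, policies=POLICIES):
    """Return a copy of `row` with each policy applied for this user, and log the access."""
    result = dict(row)
    for column, rule in policies.items():
        if column not in result or rule["unless_attribute"] in user_attributes:
            continue  # column absent, or the user is exempt from this rule
        if rule["action"] == "mask":
            result[column] = "REDACTED"
        elif rule["action"] == "null":
            result[column] = None
    AUDIT_LOG.append({"attributes": sorted(user_attributes), "columns": sorted(row)})
    return result


row = {"medallion": "7A11", "fare": 12.5, "tip": 0.0}
analyst_view = apply_policies(row, {"analyst"})
finance_view = apply_policies(row, {"finance"})
print(analyst_view)
print(finance_view)
```

The stored row is never modified, so changing a policy takes effect on the next read without regenerating any derived data set.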
Introducing
Immuta
Privacy Preserving Techniques
(we do a bunch, I’m only going to touch on a few here)
Right To Privacy?
• Early on, photography was expensive
• Near the turn of the 20th century, the masses had general use of photography
• "instantaneous photographs and newspaper enterprise have invaded the sacred precincts of private and domestic life." - Samuel Warren and Louis Brandeis (later a U.S. Supreme Court Justice), "The Right to Privacy" (1890)
• Proposed right to “be let alone”
• We generally accept being observed, but rarely accept being identified
The End of Privacy
[as we know it]?
• Rise of technology and data science has killed privacy as we know it
• Instead of focusing on how and when our data is gathered...
• Privacy should now be about how our data is being used.
Immuta can do this
The GDPR understands this!
• The cornerstone of GDPR is consent
• You should only process data for the purposes for
which your data subjects have explicitly consented
• In other words: you must consider analytical
context as a guide to what data you can see
• This is very different from role-based access controls
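The difference from role-based access control can be sketched as follows: what a query returns depends on the purpose it is run under and on what each data subject consented to, not on who the analyst is. This is a simplified illustration; `CONSENTS`, `RECORDS`, and `records_for_purpose` are hypothetical names and the data is invented.

```python
# Hypothetical consent records: data subject -> purposes they explicitly agreed to.
CONSENTS = {
    "subject_1": {"billing", "fraud_detection"},
    "subject_2": {"billing"},
    "subject_3": {"billing", "marketing"},
}

RECORDS = [
    {"subject": "subject_1", "amount": 12.5},
    {"subject": "subject_2", "amount": 8.0},
    {"subject": "subject_3", "amount": 20.0},
]


def records_for_purpose(records, consents, purpose):
    """Return only the records whose data subject consented to `purpose`."""
    return [r for r in records if purpose in consents.get(r["subject"], set())]


# The same analyst sees different rows depending on the declared purpose.
print(len(records_for_purpose(RECORDS, CONSENTS, "billing")))
print(len(records_for_purpose(RECORDS, CONSENTS, "marketing")))
```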
Towards Practical Differential Privacy for SQL Queries
Johnson, Near, Song, Aug 2017
The internal study of queries at Uber
• SQL queries written by
employees at Uber
• 8.1 million queries executed
between March 2013 and
August 2016
• Broad range of sensitive data
including rider and driver
information, trip logs, and
customer support data
34% of Uber Data Science
Queries are aggregates
Statistical queries matter!
Data can be either useful or perfectly
anonymous but never both.
IF WE CONSIDER STATISTICAL QUERIES USEFUL, THIS CAN BE A LIE:
How?
Let’s play a game
• Think of a number between 1 and 6
• Now I’m going to ask you a question you
probably don’t want to answer in public
• Do you hide spending from your spouse?
• Now raise your hand if you thought of
a 3 OR answered yes to the above
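The show of hands in this game can be turned into an estimate of the true "yes" rate. The sketch below assumes each person picks a number uniformly at random and independently of their answer, which a real audience may not do, so a hand goes up with probability 1/6 + (5/6) × (true rate). The function name and the example counts are hypothetical.

```python
def estimate_yes_rate(hands_raised: int, audience_size: int, p_forced: float = 1 / 6) -> float:
    """Estimate the share of true 'yes' answers from a randomized-response show of hands."""
    if audience_size <= 0:
        raise ValueError("audience_size must be positive")
    observed = hands_raised / audience_size
    # observed = p_forced + (1 - p_forced) * true_rate  =>  solve for true_rate
    estimate = (observed - p_forced) / (1 - p_forced)
    return min(1.0, max(0.0, estimate))  # clamp sampling noise into [0, 1]


# Example: 120 people in the room, 45 hands raised.
print(round(estimate_yes_rate(45, 120), 3))  # 0.25
```

No individual hand reveals an answer, because any raised hand may simply mean "I thought of a 3".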
This is Differential Privacy
• I protected your privacy by providing plausible deniability
• But I can also understand the percentage of people that hide spending from their
spouse because I understand the probability of you selecting a 3
• Differential Privacy is restricted to only statistical queries and adds the appropriate
amount of noise based on the sensitivity of the question
• ‘Differential privacy formalizes the idea that a "private" computation should not reveal
whether any one person participated in the input or not, much less what their data are.’
- Frank McSherry (https://github.com/frankmcsherry/blog/blob/master/posts/2016-02-03.md)
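One standard way to add "the appropriate amount of noise based on the sensitivity of the question" is the Laplace mechanism, sketched below for a counting query. This is a textbook illustration, not a description of Immuta's implementation; it assumes each person contributes at most one record (so the count has sensitivity 1), and `private_count` and the sample data are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count of values matching `predicate`, with noise scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # assumption: adding or removing one person changes the count by at most 1
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(sensitivity / epsilon, rng)


tips = [0.0, 2.0, 4.0, 0.0, 1.5]   # invented data
rng = random.Random(0)             # seeded only to make the sketch repeatable
print(private_count(tips, lambda t: t == 0.0, epsilon=0.5, rng=rng))
```

A smaller epsilon means more noise and stronger deniability for any one record; averaged over many records, the aggregate remains useful.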
How Could NYT Have Done it?
Localized Sensitivity
How do we do it?
Simple…
In plain English everyone can understand
Can Privacy and
Machine Learning
Exist Together?
We believe they can;
data agility is what
you need
Questions
steve@immuta.com
@steve_touw
www.immuta.com
Come visit our Booth #729!

Can Privacy Exist With Machine Learning?

  • 1. Can Privacy Exist With Machine Learning? Steve Touw, Chief Technology Officer, Immuta - Gartner Cool Vendor 2018
  • 2. “Data can be either useful or perfectly anonymous but never both.” Paul Ohm, Broken Promises of Privacy, 57 UCLA Law Review 1701 (2010)
  • 3. I know stuff about Judd and Leslie Judd Apatow & Leslie Mann Photo Credit: PacificCoastNews.com © 2017 Immuta All Rights Reserved. 3
  • 4. New York Taxi & Limousine Commission • Data was released containing taxi pickups, dropoffs, location, time, amount, and tip amount, among others • This seems pretty harmless? © 2017 Immuta All Rights Reserved. 4
  • 5. Well, Judd and Leslie May Not Think It’s Harmless This photos was geotagged (with time), so by simply querying by medallion and time, we know how much Judd and Leslie tip! © 2017 Immuta All Rights Reserved. 5
  • 6. This is an example of a “link attack” Medallion & Photo Time Medallion & Pickup Time New York Taxi Data © 2017 Immuta All Rights Reserved. 6
  • 7. New York Actually Tried to Anonymize the data By hashing the medallion But that didn’t matter…. © 2017 Immuta All Rights Reserved. 7
  • 8. New York Taxi Data Medallion & Photo Time Pickup Time & Pickup Loc Pickup Loc & Dropoff Loc Dropoff Loc & Dropoff Time Dropoff Time & Receipt Medallion & Pickup Time Pickup Time & Pickup Loc Pickup Loc & Dropoff Loc Dropoff Loc & Dropoff Time Dropoff Time & Amount © 2017 Immuta All Rights Reserved. 8
  • 9. Remember! Data can be either useful or perfectly anonymous but never both.
  • 10. In fact “...just three data points were enough to identify an even larger percentage of people in the data set. That means that someone with copies of just three of your recent receipts — or one receipt, one Instagram photo of you having coffee with friends, and one tweet about the phone you just bought — would have a 94 percent chance of extracting your credit card records from those of a million other people” © 2017 Immuta All Rights Reserved. 10
  • 11. “...one Instagram photo of you having coffee with friends, and one tweet about the phone you just bought…” More data is available to us than ever, which means link attacks become increasingly simple It’s very easy to build profiles of individuals... © 2017 Immuta All Rights Reserved. 11
  • 12. The European Union responds General Data Protection Regulation (GDPR) Effective May 25, 2018 Fines up to 4 percent of global revenue Applies to any company collecting data on EU citizens © 2017 Immuta All Rights Reserved. 12
  • 13. GDPR Article 4(1): 'personal data' means any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
  • 14. In Q3 alone, we’ve seen a huge uptick in interest from regulators in regulating data, to include • California Consumer Privacy Act was passed in June 2018, and will take effect in 2020. • Vermont became the first state in the nation to regulate data brokers. • In September 2018, the Trump administration, acting through National Telecommunications and Information Administration, released a “Request for Comments on Developing the Administration’s Approach to Consumer Privacy.” • This is the first concrete illustration that a national-level privacy regulation like the GDPR is coming to the US. • Immuta prediction: By 2020, no major economic zone will be free of an overarching data protection law. © 2017 Immuta All Rights Reserved. 14
  • 16. MACHINE LEARNING WILL CHANGE THE ECONOMY AS WE KNOW IT
  • 17. It’s all about the data! What Amazon Teaches Us About the Future Responding to data is at the core of Amazon does… and why organizations across verticals need to follow its lead • Supply chain optimization: optimize distribution, storage, routes, schedules, products • Pricing and profit optimization: elastically tailor pricing to products and consumers • Customer segmentation: real-time analysis to boost marketing/advertising efficiency • Software/hardware system analytics: optimizing use and distribution of IT infrastructure globally • Competitive analysis: automatically process billions of data points about the company, its competitors, and new trends to create daily / hourly / real-time, automated analyses © 2017 Immuta All Rights Reserved. 17
  • 18. The Newer Guys Have the Upper Hand Low technical debt • Futuristic software architectures Centralized Data • No data silos • Specific problem- set drove data schemas Fewer Regulatory Controls • Not for long!! They are Data Agile © 2017 Immuta All Rights Reserved. 18
  • 20. The Three Pillars to Data Agility • Centralized Policy Enforcement (focus on this today) • Rapid Access to Data • Frictionless to Data Analysts
  • 21. Centralized Policy Enforcement. Old World: • Policies managed uniquely at each data source • Use ETL to create “safe” versions of data • IT interprets legal guidance themselves • Audit logs are disjointed/inconsistent. New World: • Consistent layer for creating data policies • Policies are enforced dynamically • Plain-English policy builder usable by any author and understandable by all • An unprecedented list of policy logic at your fingertips • All actions monitored granularly and consistently
  • 22. Introducing Immuta
  • 23. Privacy Preserving Techniques (we do a bunch, I’m only going to touch on a few here)
  • 24. Right To Privacy? • Early on, photography was expensive • Near the turn of the 20th century, the masses had general use of photography • “Instantaneous photographs and newspaper enterprise have invaded the sacred precincts of private and domestic life.” - Samuel Warren and Louis Brandeis (U.S. Supreme Court Justice) • They proposed a right to “be let alone” • We generally accept being observed, but rarely accept being identified
  • 25. The End of Privacy [as we know it]? • The rise of technology and data science has killed privacy as we know it • Instead of focusing on how and when our data is gathered... • Privacy should now be about how our data is being used.
  • 26. The GDPR understands this! • The cornerstone of GDPR is consent • You should only process data for the purposes for which your data subjects have explicitly consented • In other words: you must consider analytical context as a guide to what data you can see • This is very different from role-based access controls • Immuta can do this
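To make the contrast with role-based access controls concrete, here is a minimal Python sketch. It is an illustration only, not Immuta's implementation or API: the `Record` type, the `consented_purposes` field, and both function names are hypothetical. A role check asks who the analyst is; a purpose check asks what the analysis is for, and filters rows by the purposes each data subject consented to.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical row: one data subject plus the purposes they consented to."""
    subject_id: str
    consented_purposes: frozenset


def role_based_allowed(user_roles: set, required_role: str) -> bool:
    """Role-based check: the decision depends only on who is asking."""
    return required_role in user_roles


def purpose_based_view(records: list, declared_purpose: str) -> list:
    """Purpose-based check: keep only rows whose subject consented to this purpose."""
    return [r for r in records if declared_purpose in r.consented_purposes]


records = [
    Record("a", frozenset({"billing", "fraud_detection"})),
    Record("b", frozenset({"billing"})),
]

# The same analyst sees different rows depending on the declared purpose.
fraud_rows = purpose_based_view(records, "fraud_detection")
billing_rows = purpose_based_view(records, "billing")
```

In this sketch the analyst's role never changes, yet the visible data does, which is the sense in which analytical context guides what data can be seen.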
  • 27. Towards Practical Differential Privacy for SQL Queries (Johnson, Near, Song, Aug 2017): internal study of queries at Uber • SQL queries written by employees at Uber • 8.1 million queries executed between March 2013 and August 2016 • Broad range of sensitive data, including rider and driver information, trip logs, and customer support data
  • 28. 34% of Uber data science queries are aggregates. Statistical queries matter!
  • 29. “Data can be either useful or perfectly anonymous but never both.” IF WE CONSIDER STATISTICAL QUERIES USEFUL, THIS CAN BE A LIE: How?
  • 30. Let’s play a game • Think of a number between 1 and 6 • Now I’m going to ask you a question you probably don’t want to answer in public • Do you hide spending from your spouse? • Now raise your hand if you thought of a 3 OR answered yes to the above
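The game on this slide can be simulated. The Python sketch below assumes each person picks a number uniformly at random from 1 to 6 (real audiences may not), and all function names are hypothetical. A hand goes up with probability q = 1/6 + (5/6)·p, where p is the true "yes" rate, so the observed fraction of raised hands can be inverted to estimate p = (q − 1/6) / (5/6) without knowing any individual's answer.

```python
import random


def raises_hand(truth: bool, rng: random.Random) -> bool:
    """One participant: hand goes up if the secret number is 3 OR the true answer is yes."""
    secret_number = rng.randint(1, 6)  # assumption: uniform choice between 1 and 6
    return secret_number == 3 or truth


def estimate_yes_rate(hands_raised: int, n: int) -> float:
    """Invert q = 1/6 + (5/6) * p to recover p, clamped to [0, 1]."""
    q = hands_raised / n
    p = (q - 1 / 6) / (5 / 6)
    return min(1.0, max(0.0, p))


def simulate(n: int, true_rate: float, seed: int = 0) -> float:
    """Run the game for n people and return the estimated yes rate."""
    rng = random.Random(seed)
    hands = sum(raises_hand(rng.random() < true_rate, rng) for _ in range(n))
    return estimate_yes_rate(hands, n)


estimate = simulate(n=100_000, true_rate=0.3)
```

One caveat about the game as played: only a raised hand is deniable, since a lowered hand still reveals a "no". Classic randomized-response designs randomize both answers.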
  • 31. This is Differential Privacy • I protected your privacy by providing plausible deniability • But I can also understand the percentage of people that hide spending from their spouse because I understand the probability of you selecting a 3 • Differential Privacy is restricted to only statistical queries and adds the appropriate amount of noise based on the sensitivity of the question • “Differential privacy formalizes the idea that a "private" computation should not reveal whether any one person participated in the input or not, much less what their data are.” - Frank McSherry (https://github.com/frankmcsherry/blog/blob/master/posts/2016-02-03.md)
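The "noise based on the sensitivity of the question" idea is commonly realized with the Laplace mechanism. The Python sketch below is a generic textbook illustration, not necessarily what Immuta or the Uber study uses. The parameter `epsilon` (the privacy budget) and the function names are assumptions that do not appear in the slides. For a counting query, adding or removing one person changes the answer by at most 1, so the sensitivity is 1 and the noise scale is 1/epsilon.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Count the matching rows, then add Laplace noise calibrated to the sensitivity."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
hides_spending = [True] * 40 + [False] * 60  # made-up illustrative data
answer = noisy_count(hides_spending, bool, epsilon=1.0, rng=rng)
```

A smaller `epsilon` means more noise and stronger privacy; a larger one means a more accurate but less private answer. This sketch uses ordinary floating-point sampling and does not track a budget across repeated queries, so it is not suitable for production use.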
  • 32. How Could NYT Have Done it? Localized Sensitivity
  • 33. How do we do it? Simple… In plain English everyone can understand
  • 34. Can Privacy and Machine Learning Exist Together? We believe they can; data agility is what you need.