How businesses can benefit from privacy preserving synthetic data

In this webinar we discuss privacy, its relevance to data science, and how privacy-preserving synthetic data can help organizations build a bridge between compliance and efficient use of data.


  1. Statice Webinar: How can businesses benefit from privacy-preserving synthetic data? Berlin, 2020
  2. Statice Webinar | 2020. Outline
     1. What is privacy?
     2. Data sharing
        a. Why share data?
        b. Data sharing done wrong
        c. Synthetic data as a solution
     3. What can you do with synthetic data?
     4. Customer cases
     5. Q+A
  3. Section 1: Privacy landscape
  4. Privacy landscape
     ● English dictionary definition: “Privacy is a state in which one is not observed or disturbed by other people”
     ● Lack of privacy => behavioral change
     ● Privacy is fundamental to a free society
     Anonymous voting guarantees freedom of choice
  5. Privacy yesterday
  6. Privacy in the present
     ● Digital tracking everywhere
     ● Social circle, browsing habits, shopping details, location tracking, emails, calls ...
  7. Data protection regulations
     ● Protection of individual privacy
     ● Over 80 countries and regions worldwide
     ● Strictest regulation
       ○ GDPR - European Union (2018)
     ● High fines for violations
  8. Data protection regulations make the use of sensitive data in your company practically impossible:
     ● Your data teams are slowed down, as data is generally accessible only after a long governance process.
     ● Your production data cannot be stored or processed using cloud resources, as obtaining customer consent is mostly not feasible for exploratory data analysis.
     ● Your production data cannot be shared with partners for product development or research.
  9. Privacy promise: Opt-out scenario
     ● My data must have no effect on any analysis carried out on the dataset
     ● Problem: if nobody’s data has any effect on any analysis, then the analysis has no utility.
  10. Privacy promise: what can we expect?
      ● A tradeoff
        ○ With or without my data, the outcome of any analysis should be nearly the same
        ○ The impact on me of sharing my information in the dataset is limited to the general learnings, not the specifics of my information
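The slides do not name a mechanism for this tradeoff. One common way to formalize "with or without my data, the outcome is nearly the same" is differential privacy, and the sketch below assumes that reading. It releases a count with Laplace noise, so the published number depends only weakly on any single record. The ages, the `epsilon` value, and all function names are made up for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on 0."""
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def exact_count(records, predicate):
    """Count the records that satisfy the predicate."""
    return sum(1 for record in records if predicate(record))


def noisy_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon.

    Adding or removing one person changes an exact count by at most 1,
    so the noise scale is chosen relative to that maximum change.
    """
    return exact_count(records, predicate) + laplace_noise(1 / epsilon, rng)


def over_40(age):
    return age > 40


# Hypothetical ages; the last entry stands for "my" record.
ages_with_me = [34, 51, 29, 62, 45, 38, 57]
ages_without_me = ages_with_me[:-1]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
released_with = noisy_count(ages_with_me, over_40, epsilon=0.5, rng=rng)
released_without = noisy_count(ages_without_me, over_40, epsilon=0.5, rng=rng)

print("exact counts:", exact_count(ages_with_me, over_40),
      exact_count(ages_without_me, over_40))
print("released counts:", round(released_with, 2), round(released_without, 2))
```

A smaller `epsilon` means more noise and stronger protection for each individual, at the cost of a less accurate count. That is the utility versus privacy tradeoff the slide describes.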
  11. Section 2a: Why share data?
  12. Why share data?
      ● As individuals, we share data all the time
        ○ With our doctors
        ○ With our accountants
        ○ In exchange for a trusted service
      ● Privacy is not necessarily complete non-disclosure
  13. Why share data?
      ● Society benefits from individuals sharing their data
        ○ Medical advances
        ○ Sociological research, understanding societal dynamics
      ● Examples:
        ○ Tracking commute patterns to improve public transport networks
        ○ Detecting epidemics and acting fast by looking at search-engine disease queries and medicine orders
  14. [No text on this slide]
  15. Section 2b: Data sharing done wrong
  16. Illustration: Dataset
  17. Problem? Personally Identifying Information
  18. Illustration: Cambridge Analytica
      ● Infamous leak involved Personally Identifiable Information of over 50 million people
  19. Information not unique to you: "quasi-identifiers"
  20. Illustration: Massachusetts Governor leak
      Sweeney, Latanya. Weaving Technology and Policy Together to Maintain Confidentiality. Journal of Law, Medicine and Ethics, Vol. 25, 1997, pp. 98-110.
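A linkage attack of the kind these two slides illustrate can be sketched in a few lines: a dataset with names removed is joined to a public dataset on quasi-identifiers, and any row that matches exactly one person is re-identified. The records, column names, and the `link_records` helper below are hypothetical. The choice of ZIP code, birth date, and sex as join keys is an assumption, since the slides do not list the quasi-identifiers used.

```python
def link_records(released_rows, public_rows, keys):
    """Return (name, diagnosis) pairs for released rows matching exactly one public row."""
    # Index the public rows by their quasi-identifier values.
    index = {}
    for person in public_rows:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for row in released_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches


QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Made-up "de-identified" medical release: names removed, quasi-identifiers kept.
medical_release = [
    {"zip": "02138", "birth_date": "1950-03-14", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1982-11-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1975-06-30", "sex": "F", "diagnosis": "diabetes"},
]

# Made-up public register with names and the same quasi-identifiers.
public_register = [
    {"name": "Person A", "zip": "02138", "birth_date": "1950-03-14", "sex": "M"},
    {"name": "Person B", "zip": "02139", "birth_date": "1982-11-02", "sex": "F"},
    {"name": "Person C", "zip": "02139", "birth_date": "1982-11-02", "sex": "F"},
]

reidentified = link_records(medical_release, public_register, QUASI_IDENTIFIERS)
print(reidentified)  # [('Person A', 'hypertension')]
```

Only the first medical row is re-identified. The second matches two people in the register, so it stays ambiguous, and the third matches nobody. Removing names alone therefore protects only those rows whose quasi-identifier combination is shared with others.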
  21. Fingerprint-like information
      ● On its own, a fingerprint seems cryptic
      ● Around 100 minutiae in a fingerprint
      ● Experts declare a fingerprint match if 12 minutiae match
      ● Precise identification is possible if fingerprints are indexed and queryable
  22. Illustration: Netflix movie preferences
      Join movie ratings
      Ratings of only 4-5 movies allowed successful identification of a large number of users.
      Narayanan A, Shmatikov V. Robust de-anonymization of large sparse datasets. In: Security and Privacy, 2008 (SP 2008), IEEE Symposium on, 2008 May 18 (pp. 111-125). IEEE.
  23. Illustration: Strava Running Tracks
      ● Heatmap
      ● 30 million runners worldwide
      ● Not that many in the Sahara
      ● French Military Base in Mali
  24. And many more . . .
      ● Search queries
      ● Browser configuration
  25. So how do we enable the use of sensitive customer data while staying privacy-compliant?
  26. Recital 26 of the GDPR: “This regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.”
      The best way to securely access and leverage sensitive customer data is to use anonymous data.
  27. The problem is that traditional anonymization methods are unable to preserve the granularity and quality of the original data required for further processing and analysis. Either they obfuscate data to a large extent or they do not properly protect the data.
      Data utility vs. data privacy
  28. Section 2c: Synthetic data as a solution
  29. Statice is a data anonymization engine that enables the secure anonymization of data while preserving its statistical utility and data structure. This allows you to perform meaningful data analysis without ever exposing the original data.
  30. Data anonymization made easy.
      ● Guaranteed data privacy: Statice generates privacy-preserving synthetic data which is based on mathematical privacy guarantees.
      ● Automatic anonymization and granular data quality: Statice anonymizes your data, preserving statistical utility and data structure, by generating synthetic data.
      ● Flexible integration: Statice can be conveniently used on-premise, either via a CLI or as a Python library.
      ● Support for all structured data: Statice supports the anonymization of tabular, relational, time-series, geolocation and other types of structured data.
  31. How Statice works
      Original data -> Statice engine -> Anonymous synthetic data
      1. Data analysis
         ● Automatic understanding of provided data types
         ● Automatic data classification
      2. Training
         ● Generative algorithms learn the statistical structure and information of the original data
      3. Data generation
         ● Generation of anonymous synthetic data
         ● Provision of automatic utility and risk evaluations
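The three steps above can be illustrated with a deliberately simple toy generator. This is not Statice's algorithm or API, which the slides do not describe. It models every column independently, ignores correlations, and carries no privacy guarantee. The function names `fit` and `generate` and the sample records are assumptions made for illustration.

```python
import random
import statistics
from collections import Counter


def fit(rows):
    """Steps 1 and 2: detect each column's type and learn a simple per-column model."""
    model = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric column: summarise it by mean and standard deviation.
            model[column] = {
                "kind": "numeric",
                "mean": statistics.mean(values),
                "stdev": statistics.stdev(values),
            }
        else:
            # Any other column: treat it as categorical and keep the frequencies.
            counts = Counter(values)
            model[column] = {
                "kind": "categorical",
                "values": list(counts),
                "weights": list(counts.values()),
            }
    return model


def generate(model, n, rng):
    """Step 3: sample n new rows from the learned per-column models."""
    rows = []
    for _ in range(n):
        row = {}
        for column, spec in model.items():
            if spec["kind"] == "numeric":
                row[column] = rng.gauss(spec["mean"], spec["stdev"])
            else:
                row[column] = rng.choices(spec["values"], weights=spec["weights"], k=1)[0]
        rows.append(row)
    return rows


# Made-up customer records standing in for the original data.
customers = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "premium"},
    {"age": 29, "plan": "basic"},
    {"age": 62, "plan": "basic"},
]

model = fit(customers)
synthetic = generate(model, n=5, rng=random.Random(0))
for row in synthetic:
    print(row)
```

None of the generated rows is copied from an original record, but the ages and plan frequencies follow the original distributions. A production engine would also need to capture relationships between columns and to bound how much any single original record can influence the output.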
  32. Automatic evaluation metrics that are part of the Statice software show how well the statistical properties of the original data are preserved in the newly generated anonymous synthetic data.
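The slides do not say which metrics the software computes. As a minimal sketch of what a utility evaluation can look like, the functions below compare one numeric column by mean and standard deviation, and one categorical column by total variation distance between the category frequencies. Function names and data are hypothetical.

```python
import statistics
from collections import Counter


def numeric_gap(original, synthetic):
    """Absolute differences in mean and standard deviation between two numeric columns."""
    return {
        "mean_gap": abs(statistics.mean(original) - statistics.mean(synthetic)),
        "stdev_gap": abs(statistics.stdev(original) - statistics.stdev(synthetic)),
    }


def total_variation(original, synthetic):
    """Total variation distance between category frequencies (0 = identical, 1 = disjoint)."""
    p, q = Counter(original), Counter(synthetic)
    categories = set(p) | set(q)
    return 0.5 * sum(
        abs(p[c] / len(original) - q[c] / len(synthetic)) for c in categories
    )


# Made-up columns from an original and a synthetic dataset.
original_ages = [30, 40, 50]
synthetic_ages = [32, 40, 48]
original_plans = ["basic", "basic", "premium", "basic"]
synthetic_plans = ["basic", "premium", "premium", "basic"]

print(numeric_gap(original_ages, synthetic_ages))
print(total_variation(original_plans, synthetic_plans))
```

Values close to zero indicate that the synthetic column reproduces the original distribution well. Per-column checks like these say nothing about correlations between columns or about privacy risk, which would need separate measurements.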
  33. Section 3: What can you do with synthetic data?
  34. Use data protection to your advantage and get the most value out of your data
      ● Build your data sandbox: Use Statice to effectively protect sensitive data in order to share it easily with partners or across your organization for quick access and collaborative use.
      ● Train your machine learning algorithms: Leverage synthetic data by Statice to train your machine learning models with the same accuracy as when using real-world data.
      ● Protect your customer data for BI analysis: By anonymizing customer data directly, you add a strong safeguard for protecting your customers and enable quick and flexible data analysis.
      ● Enable your scalable use of cloud infrastructures: Process synthetic data in cloud instances without ever putting sensitive data at risk and yet benefit from a scalable infrastructure and the cost-efficient use of cloud resources for your company.
  35. Section 4: Customer cases
  36. Customer case 1: The Statice engine enabling a German insurance provider to tailor products to its customers
      Challenges
      ● Impeded timely access to data and availability of granular information because of legal constraints
      ● Complicated product development due to sensitive customer data and privacy regulations
      ● Biased customer behavior modeling due to lack of access to complete customer data sets
      ● A period of weeks or months between customer data acquisition and data processing
      Solutions
      ● Enabled timely access to data with Statice by generating synthetic data based on real customer data
      ● Creation of an anonymous data warehouse with much lower compliance hurdles to allow data science teams to work faster on more representative data
      Long-term benefits
      ● Unlocked sensitive customer data as a prime resource for product innovation
      ● Massively reduced time-to-data for both internal and external stakeholders (weeks/months to days)
      ● Lowered compliance overhead and enabled innovation prototyping
  37. Customer case 2: The Statice engine allowing a German healthcare enterprise to safely engage in collaborative partnerships
      Challenges
      ● High risk of engaging in collaborative partnerships due to sensitive customer data exchange processes
      ● Potential exposure to customer data leakage and its legal implications
      ● Reduced ability to devise innovative strategies with third parties due to data privacy and security concerns
      Solutions
      ● Statice implemented to produce privacy-preserving synthetic data
      ● Safe data, with much lower compliance hurdles for partnerships, created for external sharing
      Long-term benefits
      ● Compliant and collaborative product development & data monetisation
      ● Facilitated innovative partnerships through unconstrained customer data exchange
  38. Customer case 3: The Statice engine enabling a German telecommunications company to use cloud infrastructure at scale
      Challenges
      ● Hugely valuable data in the business’ data exhaust which cannot be properly exploited due to privacy concerns
      ● Inability to scale a data processing and analysis pipeline on cloud infrastructure due to sensitive data exposure
      ● High costs and major delays in innovation projects due to the inability to perform and scale data processing on the cloud infrastructure
      Solutions
      ● Use customer data in the form of privacy-compliant synthetic data which contains highly similar statistical information
      ● Synthetic data generated with Statice offers the freedom to scale solutions on cloud infrastructure quickly, cost-efficiently and safely, without concerns around customer data privacy
      Long-term benefits
      ● Accelerated, cost-efficient use of cloud resources and data for software testing
  39. Statice ensures full data privacy compliance, allowing your data team to work more efficiently
      Using Statice you can:
      ● Minimize your time-to-data from months to days.
      ● Unlock your sensitive customer data as a prime resource for product innovation.
      ● Ensure your regulatory compliance for the whole data value chain.
  40. Any questions?
  41. Unlock your data with Statice. Ben Nolan, Head of Business Development
  42. Are you interested in learning more about working with us?
  43. We follow three steps on the way to a cooperation (~8 weeks)
      Steps
      1. Feasibility study
      2. Technical planning
      3. Project kick-off
      Goal
      1. Understanding scope of data and use case for the customer
      2. Successful planning of the infrastructure to be used
      3. Successful coordination of joint project plan
      Involved parties (for each step): & the customer
      Results
      ● Evaluation of shared data schema
      ● Implementation plan
      ● Infrastructure plan
      ● Joint project plan
      ● Date for project start