SlideShare a Scribd company logo
1 of 20
Download to read offline
Scaling Privacy with
Apache Spark
Aaron Colcord
Sr. Director Engineering, Northwestern Mutual
Don Durai Bosco
CTO and Co-Founder, Privacera
Agenda
▪ Our background
▪ Why privacy, security,
compliance?
▪ Approaches
▪ Ideal problem solve
▪ Real life meets ideal life
Backgrounds
▪ Building an Enterprise Scale Unified
Framework
▪ Very Long, Respected History ~ 160 Years
▪ Compliance is extremely important to us
▪ Agile Data vs Compliant Data
▪ Founded in 2016 by the creators of Apache
Ranger & Apache Atlas
▪ Extends Ranger's capabilities beyond traditional
Big Data environments to cloud (Databricks,
AWS, Azure, GCP, and more)
▪ Specializes in democratizing data for analytics,
while ensuring compliance with privacy
regulations (GDPR, CCPA, LGPD, HIPAA, & more)
• Privacera
• Northwestern Mutual
Why do we suddenly care about privacy?
• You care if you are regulated in any form
• Simple you need to show you can pass an audit
• You care if you store any information about your users
• Simple because governments have woken up with GDPR and CCPA
• You care if you want to democratize your data
• Simple because the use of your data can be scrutinized
We always did, but technology got ahead of privacy. Privacy is often this assumed competency, and
technology really showed how important it was.
Have you ever...
• Collecting information about your customers can
• Improve the experience
• Allow the company to understand their business better
• At the core, privacy is a policy and legal obligation
• You have the data, it used to be your business to just secure it.
• Do you want your information monetized? Sold? Traded?
• Most companies don’t do this. But the privacy policy is there for you.
• Clicked ‘accept all’ on website, used a digital assistant..
Gone to a website and read their privacy policy, clicked accept cookies, accepted terms of service, or
EULA?
And it’s only going to pick up speed.
• More Regulations are arriving around privacy
• Increasing your ability to execute against data means respecting your user’s rights
• A part of maturity is being able to manage governance
More importantly, why do we care so much?
• Technology like Apache Spark opens the capability to
democratize your data.
• Most every company wants the marketplace to enrich
and share their data.
• Who inside that company can view it? Do we have the
controls to protect your information? Can we verify
that the information is used for the right purposes?
What is the difference between these?
▪ Preventing unauthorized
usage of systems
▪ Ensuring users don’t see the
incorrect information
▪ Creating boundaries to
enforce right action of the
system
• The process of making sure
your company and
employees follow all laws,
regulations, standards, and
ethical practices that apply
to your organization
• Compliance
• Security
• “Data privacy may be
defined as the authorized,
fair, and legitimate
processing of personal
information”
• Consent rights
• Do not share
• Slippery space
• Privacy
Examine strategies to scale agile data w/privacy
• Build a metadata layer that defines PII in its schema
• Users and developers can and will change where PII is stored
• You can literally chase people to do the ‘right thing’ forever
• You could build views with permissions to certain users
• Not very scalable
• Plus you need to always show who accessed and why
• Are these security scenario?
Challenges to that strategy
• Is the metadata layer flexible enough or should we think in policies?
• Privacy is inherently your organization’s position which may evolve based on regulation
• Can your development keep up with views?
• When you discover the extra 10,000 fields, can you keep up?
• Implement a framework that scales
• Security is not Privacy.
• Security has a different domain and set of principles.
• Remember we are protecting the usage of your data.
How can we solve it?
Ideal scalable system
▪ Revocation of
Consent
▪ Portability
▪ Erasure
▪ Rectification
▪ How is data used?
▪ Rights follow Data
Reuse
▪ Flexible to change
▪ Should align with a
Data Governance
program
▪ Should adapt to
changing data
▪ Proactive.
▪ Reclassification
• Classification
• User Rights
▪ How was it used?
▪ How was it
accessed?
▪ How was it
protected?
▪ Did it cross
borders?
• Audit/Governance
▪ Authorization of
User may change
▪ Supports Agile
Access
▪ Business Use is
preserved
▪ Automated
Systems obey
Privacy
• Access
User Rights at Scale
▪ Revocation of Consent/ Right To Be Forgotten
▪ Portability
▪ Erasure
▪ Rectification
▪ How is data used?
▪ Rights follow Data Reuse
▪ Flexible to change
S3 ADLS Redshift Snowflake Synapse
Privacy Challenges in Open Data Ecosystem
Athena Databricks HDInsight
EMR
Dremio Trino PrestoDB
PowerBI Tableau
Storage
SQL Engines
Data Virtualization
BI Tools
Marketing
Data
Analyst
Data
Scientist/A
rchitect
Governance blind spot
Tools & Technology
AUTOMATED DATA DISCOVERY CENTRALIZED ACCESS CONTROL
AUDIT COLLECTION AND REPORTING
Automated Data Discovery
● Automatically detect and catalog sensitive
data
● Detailed classification, e.g. EMAIL, SSN,
GENDER, CC, PHONE_NUMBER, etc.
● Eliminate manual processes
● Catalog data as it is ingested
● Track data movement and propagate tag
● Catalog data across multiple cloud
services
Centralized Access Control
● Global Tag/Classification-based policies
● Purpose and Persona based policies
● Dynamic row filters v/s Views
● Dynamic masking or decryption
● Approval workflows with time and
purpose constraints
Centralized Auditing and Reporting
● Centralize auditing
● Monitoring data access by classification
● Track usage by Purpose
● Generate attestation reports
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

More Related Content

What's hot

Migrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksMigrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksDatabricks
 
Kelley Blue Book Uses Big Data to Increase User Engagement Over 100%
Kelley Blue Book Uses Big Data to Increase User Engagement Over 100%Kelley Blue Book Uses Big Data to Increase User Engagement Over 100%
Kelley Blue Book Uses Big Data to Increase User Engagement Over 100%Cloudera, Inc.
 
Intuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchIntuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchCloudera, Inc.
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsCloudera, Inc.
 
Making Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the EnterpriseMaking Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the EnterpriseCloudera, Inc.
 
Preparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissancePreparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissanceCloudera, Inc.
 
Data Drive Applications_Webinar
Data Drive Applications_WebinarData Drive Applications_Webinar
Data Drive Applications_WebinarSean Spediacci
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18Cloudera, Inc.
 
Big Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondBig Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondDataWorks Summit/Hadoop Summit
 
Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingCloudera, Inc.
 
Using Hadoop to Drive Down Fraud for Telcos
Using Hadoop to Drive Down Fraud for TelcosUsing Hadoop to Drive Down Fraud for Telcos
Using Hadoop to Drive Down Fraud for TelcosCloudera, Inc.
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedCloudera, Inc.
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Cloudera, Inc.
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoopNiel Dunnage
 
RecordService for Unified Access Control
RecordService for Unified Access ControlRecordService for Unified Access Control
RecordService for Unified Access ControlCloudera, Inc.
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Cloudera, Inc.
 
Relying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services ExperienceRelying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services ExperienceCloudera, Inc.
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with HadoopCloudera, Inc.
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Cloudera, Inc.
 

What's hot (20)

Migrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for DatabricksMigrate and Modernize Hadoop-Based Security Policies for Databricks
Migrate and Modernize Hadoop-Based Security Policies for Databricks
 
Kelley Blue Book Uses Big Data to Increase User Engagement Over 100%
Kelley Blue Book Uses Big Data to Increase User Engagement Over 100%Kelley Blue Book Uses Big Data to Increase User Engagement Over 100%
Kelley Blue Book Uses Big Data to Increase User Engagement Over 100%
 
Intuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchIntuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with Search
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
 
Making Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the EnterpriseMaking Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the Enterprise
 
Preparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity RenaissancePreparing for the Cybersecurity Renaissance
Preparing for the Cybersecurity Renaissance
 
Data Drive Applications_Webinar
Data Drive Applications_WebinarData Drive Applications_Webinar
Data Drive Applications_Webinar
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
 
Big Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondBig Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyond
 
Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team Building
 
Using Hadoop to Drive Down Fraud for Telcos
Using Hadoop to Drive Down Fraud for TelcosUsing Hadoop to Drive Down Fraud for Telcos
Using Hadoop to Drive Down Fraud for Telcos
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and Governed
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoop
 
RecordService for Unified Access Control
RecordService for Unified Access ControlRecordService for Unified Access Control
RecordService for Unified Access Control
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Relying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services ExperienceRelying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services Experience
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 

Similar to Privacera and Northwestern Mutual - Scaling Privacy in a Spark Ecosystem

Scaling Privacy in a Spark Ecosystem
Scaling Privacy in a Spark EcosystemScaling Privacy in a Spark Ecosystem
Scaling Privacy in a Spark EcosystemDatabricks
 
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...Cloudera, Inc.
 
Microsoft Cloud GDPR Compliance Options (SUGUK)
Microsoft Cloud GDPR Compliance Options (SUGUK)Microsoft Cloud GDPR Compliance Options (SUGUK)
Microsoft Cloud GDPR Compliance Options (SUGUK)Andy Talbot
 
cloud session uklug
cloud session uklugcloud session uklug
cloud session uklugdominion
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
 
Data Loss Prevention in O365
Data Loss Prevention in O365Data Loss Prevention in O365
Data Loss Prevention in O365Don Daubert
 
Data Services Marketplace
Data Services MarketplaceData Services Marketplace
Data Services MarketplaceDenodo
 
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...IDERA Software
 
BI: How Can Your High-Performance BI System Meet Expectations When You Feed I...
BI: How Can Your High-Performance BI System Meet Expectations When You Feed I...BI: How Can Your High-Performance BI System Meet Expectations When You Feed I...
BI: How Can Your High-Performance BI System Meet Expectations When You Feed I...Ray Mcglew
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
GDPR - Why it matters and how to make it Easy
GDPR - Why it matters and how to make it EasyGDPR - Why it matters and how to make it Easy
GDPR - Why it matters and how to make it EasyPaul McQuillan
 
Webinar - Compliance with the Microsoft Cloud- 2017-04-19
Webinar - Compliance with the Microsoft Cloud- 2017-04-19Webinar - Compliance with the Microsoft Cloud- 2017-04-19
Webinar - Compliance with the Microsoft Cloud- 2017-04-19TechSoup
 
SharePoint Governance 101 SPSSA2016
SharePoint Governance 101  SPSSA2016SharePoint Governance 101  SPSSA2016
SharePoint Governance 101 SPSSA2016Jim Adcock
 
Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
 Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons LearnedCharles Eubanks
 
Lecture Data Classification And Data Loss Prevention
Lecture Data Classification And Data Loss PreventionLecture Data Classification And Data Loss Prevention
Lecture Data Classification And Data Loss PreventionNicholas Davis
 
Data Classification And Loss Prevention
Data Classification And Loss PreventionData Classification And Loss Prevention
Data Classification And Loss PreventionNicholas Davis
 
Lecture data classification_and_data_loss_prevention
Lecture data classification_and_data_loss_preventionLecture data classification_and_data_loss_prevention
Lecture data classification_and_data_loss_preventionNicholas Davis
 
Global Data Privacy Regulation
Global Data Privacy RegulationGlobal Data Privacy Regulation
Global Data Privacy RegulationJatin Kochhar
 
Cybersecurity and Data Protection Executive Briefing
Cybersecurity and Data Protection Executive BriefingCybersecurity and Data Protection Executive Briefing
Cybersecurity and Data Protection Executive BriefingRFA
 
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...DataScienceConferenc1
 

Similar to Privacera and Northwestern Mutual - Scaling Privacy in a Spark Ecosystem (20)

Scaling Privacy in a Spark Ecosystem
Scaling Privacy in a Spark EcosystemScaling Privacy in a Spark Ecosystem
Scaling Privacy in a Spark Ecosystem
 
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
 
Microsoft Cloud GDPR Compliance Options (SUGUK)
Microsoft Cloud GDPR Compliance Options (SUGUK)Microsoft Cloud GDPR Compliance Options (SUGUK)
Microsoft Cloud GDPR Compliance Options (SUGUK)
 
cloud session uklug
cloud session uklugcloud session uklug
cloud session uklug
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
 
Data Loss Prevention in O365
Data Loss Prevention in O365Data Loss Prevention in O365
Data Loss Prevention in O365
 
Data Services Marketplace
Data Services MarketplaceData Services Marketplace
Data Services Marketplace
 
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
 
BI: How Can Your High-Performance BI System Meet Expectations When You Feed I...
BI: How Can Your High-Performance BI System Meet Expectations When You Feed I...BI: How Can Your High-Performance BI System Meet Expectations When You Feed I...
BI: How Can Your High-Performance BI System Meet Expectations When You Feed I...
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
GDPR - Why it matters and how to make it Easy
GDPR - Why it matters and how to make it EasyGDPR - Why it matters and how to make it Easy
GDPR - Why it matters and how to make it Easy
 
Webinar - Compliance with the Microsoft Cloud- 2017-04-19
Webinar - Compliance with the Microsoft Cloud- 2017-04-19Webinar - Compliance with the Microsoft Cloud- 2017-04-19
Webinar - Compliance with the Microsoft Cloud- 2017-04-19
 
SharePoint Governance 101 SPSSA2016
SharePoint Governance 101  SPSSA2016SharePoint Governance 101  SPSSA2016
SharePoint Governance 101 SPSSA2016
 
Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
 Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
 
Lecture Data Classification And Data Loss Prevention
Lecture Data Classification And Data Loss PreventionLecture Data Classification And Data Loss Prevention
Lecture Data Classification And Data Loss Prevention
 
Data Classification And Loss Prevention
Data Classification And Loss PreventionData Classification And Loss Prevention
Data Classification And Loss Prevention
 
Lecture data classification_and_data_loss_prevention
Lecture data classification_and_data_loss_preventionLecture data classification_and_data_loss_prevention
Lecture data classification_and_data_loss_prevention
 
Global Data Privacy Regulation
Global Data Privacy RegulationGlobal Data Privacy Regulation
Global Data Privacy Regulation
 
Cybersecurity and Data Protection Executive Briefing
Cybersecurity and Data Protection Executive BriefingCybersecurity and Data Protection Executive Briefing
Cybersecurity and Data Protection Executive Briefing
 
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...
[DSC Europe 23] Milos Solujic - Data Lakehouse Revolutionizing Data Managemen...
 

More from Privacera

Five Elements of Effective Data Access Governance
Five Elements of Effective Data Access Governance  Five Elements of Effective Data Access Governance
Five Elements of Effective Data Access Governance Privacera
 
Fortifying Data Access and Security Controls to Accelerate Cloud Migration
Fortifying Data Access and Security Controls to Accelerate Cloud MigrationFortifying Data Access and Security Controls to Accelerate Cloud Migration
Fortifying Data Access and Security Controls to Accelerate Cloud MigrationPrivacera
 
Closing the Governance Gap - Enabling Governed Self-Service Analytics
Closing the Governance Gap  - Enabling Governed Self-Service AnalyticsClosing the Governance Gap  - Enabling Governed Self-Service Analytics
Closing the Governance Gap - Enabling Governed Self-Service AnalyticsPrivacera
 
Data & the Machine Sofa Summit
Data & the Machine Sofa Summit Data & the Machine Sofa Summit
Data & the Machine Sofa Summit Privacera
 
Maturing Your Organization's Information Risk Management Strategy
Maturing Your Organization's Information Risk Management StrategyMaturing Your Organization's Information Risk Management Strategy
Maturing Your Organization's Information Risk Management StrategyPrivacera
 
Why enterprise data privacy and security matters more than ever before
Why enterprise data privacy and security matters more than ever beforeWhy enterprise data privacy and security matters more than ever before
Why enterprise data privacy and security matters more than ever beforePrivacera
 
History of Privacera
History of PrivaceraHistory of Privacera
History of PrivaceraPrivacera
 
How to streamline data governance and security across on-prem and cloud?
How to streamline data governance and security across on-prem and cloud?How to streamline data governance and security across on-prem and cloud?
How to streamline data governance and security across on-prem and cloud?Privacera
 
Privacera Databricks CCPA Webinar Feb 2020
Privacera Databricks CCPA Webinar Feb 2020Privacera Databricks CCPA Webinar Feb 2020
Privacera Databricks CCPA Webinar Feb 2020Privacera
 
Privacera Product Overview: Secure Data Sharing Across Cloud Services
Privacera Product Overview: Secure Data Sharing Across Cloud ServicesPrivacera Product Overview: Secure Data Sharing Across Cloud Services
Privacera Product Overview: Secure Data Sharing Across Cloud ServicesPrivacera
 

More from Privacera (10)

Five Elements of Effective Data Access Governance
Five Elements of Effective Data Access Governance  Five Elements of Effective Data Access Governance
Five Elements of Effective Data Access Governance
 
Fortifying Data Access and Security Controls to Accelerate Cloud Migration
Fortifying Data Access and Security Controls to Accelerate Cloud MigrationFortifying Data Access and Security Controls to Accelerate Cloud Migration
Fortifying Data Access and Security Controls to Accelerate Cloud Migration
 
Closing the Governance Gap - Enabling Governed Self-Service Analytics
Closing the Governance Gap  - Enabling Governed Self-Service AnalyticsClosing the Governance Gap  - Enabling Governed Self-Service Analytics
Closing the Governance Gap - Enabling Governed Self-Service Analytics
 
Data & the Machine Sofa Summit
Data & the Machine Sofa Summit Data & the Machine Sofa Summit
Data & the Machine Sofa Summit
 
Maturing Your Organization's Information Risk Management Strategy
Maturing Your Organization's Information Risk Management StrategyMaturing Your Organization's Information Risk Management Strategy
Maturing Your Organization's Information Risk Management Strategy
 
Why enterprise data privacy and security matters more than ever before
Why enterprise data privacy and security matters more than ever beforeWhy enterprise data privacy and security matters more than ever before
Why enterprise data privacy and security matters more than ever before
 
History of Privacera
History of PrivaceraHistory of Privacera
History of Privacera
 
How to streamline data governance and security across on-prem and cloud?
How to streamline data governance and security across on-prem and cloud?How to streamline data governance and security across on-prem and cloud?
How to streamline data governance and security across on-prem and cloud?
 
Privacera Databricks CCPA Webinar Feb 2020
Privacera Databricks CCPA Webinar Feb 2020Privacera Databricks CCPA Webinar Feb 2020
Privacera Databricks CCPA Webinar Feb 2020
 
Privacera Product Overview: Secure Data Sharing Across Cloud Services
Privacera Product Overview: Secure Data Sharing Across Cloud ServicesPrivacera Product Overview: Secure Data Sharing Across Cloud Services
Privacera Product Overview: Secure Data Sharing Across Cloud Services
 

Recently uploaded

Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfPratikPatil591646
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 

Recently uploaded (17)

Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 

Privacera and Northwestern Mutual - Scaling Privacy in a Spark Ecosystem

  • 1. Scaling Privacy with Apache Spark Aaron Colcord Sr. Director Engineering, Northwestern Mutual Don Durai Bosco CTO and Co-Founder, Privacera
  • 2. Agenda ▪ Our background ▪ Why privacy, security, compliance? ▪ Approaches ▪ Ideal problem solve ▪ Real life meets ideal life
  • 3. Backgrounds ▪ Building an Enterprise Scale Unified Framework ▪ Very Long, Respected History ~ 160 Years ▪ Compliance is extremely important to us ▪ Agile Data vs Compliant Data ▪ Founded in 2016 by the creators of Apache Ranger & Apache Atlas ▪ Extends Ranger's capabilities beyond traditional Big Data environments to cloud (Databricks, AWS, Azure, GCP, and more) ▪ Specializes in democratizing data for analytics, while ensuring compliance with privacy regulations (GDPR, CCPA, LGPD, HIPAA, & more) • Privacera • Northwestern Mutual
  • 4. Why do we suddenly care about privacy? • You care if you are regulated in any form • Simple you need to show you can pass an audit • You care if you store any information about your users • Simple because governments have woken up with GDPR and CCPA • You care if you want to democratize your data • Simple because the use of your data can be scrutinized We always did, but technology got ahead of privacy. Privacy is often this assumed competency, and technology really showed how important it was.
  • 5. Have you ever... • Collecting information about your customers can • Improve the experience • Allow the company to understand their business better • At the core, privacy is a policy and legal obligation • You have the data, it used to be your business to just secure it. • Do you want your information monetized? Sold? Traded? • Most companies don’t do this. But the privacy policy is there for you. • Clicked ‘accept all’ on website, used a digital assistant.. Gone to a website and read their privacy policy, clicked accept cookies, accepted terms of service, or EULA?
  • 6. And it’s only going to pick up speed. • More Regulations are arriving around privacy • Increasing your ability to execute against data means respecting your user’s rights • A part of maturity is being able to manage governance
  • 7. More importantly, why do we care so much? • Technology like Apache Spark opens the capability to democratize your data. • Most every company wants the marketplace to enrich and share their data. • Who inside that company can view it? Do we have the controls to protect your information? Can we verify that the information is used for the right purposes?
  • 8. What is the difference between these? ▪ Preventing unauthorized usage of systems ▪ Ensuring users don’t see the incorrect information ▪ Creating boundaries to enforce right action of the system • The process of making sure your company and employees follow all laws, regulations, standards, and ethical practices that apply to your organization • Compliance • Security • “Data privacy may be defined as the authorized, fair, and legitimate processing of personal information” • Consent rights • Do not share • Slippery space • Privacy
  • 9. Examine strategies to scale agile data w/privacy • Build a metadata layer that defines PII in its schema • Users and developers can and will change where PII is stored • You can literally chase people to do the ‘right thing’ forever • You could build views with permissions to certain users • Not very scalable • Plus you need to always show who accessed and why • Are these security scenario?
  • 10. Challenges to that strategy • Is the metadata layer flexible enough or should we think in policies? • Privacy is inherently your organization’s position which may evolve based on regulation • Can your development keep up with views? • When you discover the extra 10,000 fields, can you keep up? • Implement a framework that scales • Security is not Privacy. • Security has a different domain and set of principles. • Remember we are protecting the usage of your data.
  • 11. How can we solve it?
  • 12. Ideal scalable system ▪ Revocation of Consent ▪ Portability ▪ Erasure ▪ Rectification ▪ How is data used? ▪ Rights follow Data Reuse ▪ Flexible to change ▪ Should align with a Data Governance program ▪ Should adapt to changing data ▪ Proactive. ▪ Reclassification • Classification • User Rights ▪ How was it used? ▪ How was it accessed? ▪ How was it protected? ▪ Did it cross borders? • Audit/Governance ▪ Authorization of User may change ▪ Supports Agile Access ▪ Business Use is preserved ▪ Automated Systems obey Privacy • Access
  • 13. User Rights at Scale ▪ Revocation of Consent/ Right To Be Forgotten ▪ Portability ▪ Erasure ▪ Rectification ▪ How is data used? ▪ Rights follow Data Reuse ▪ Flexible to change
  • 14. S3 ADLS Redshift Snowflake Synapse Privacy Challenges in Open Data Ecosystem Athena Databricks HDInsight EMR Dremio Trino PrestoDB PowerBI Tableau Storage SQL Engines Data Virtualization BI Tools Marketing Data Analyst Data Scientist/A rchitect
  • 16. Tools & Technology AUTOMATED DATA DISCOVERY CENTRALIZED ACCESS CONTROL AUDIT COLLECTION AND REPORTING
  • 17. Automated Data Discovery ● Automatically detect and catalog sensitive data ● Detailed classification, e.g. EMAIL, SSN, GENDER, CC, PHONE_NUMBER, etc. ● Eliminate manual processes ● Catalog data as it is ingested ● Track data movement and propagate tag ● Catalog data across multiple cloud services
  • 18. Centralized Access Control ● Global Tag/Classification-based policies ● Purpose and Persona based policies ● Dynamic row filters v/s Views ● Dynamic masking or decryption ● Approval workflows with time and purpose constraints
  • 19. Centralized Auditing and Reporting ● Centralize auditing ● Monitoring data access by classification ● Track usage by Purpose ● Generate attestation reports
  • 20. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.