SlideShare a Scribd company logo
Scaling Privacy with
Apache Spark
Aaron Colcord
Sr. Director Engineering, Northwestern Mutual
Don Durai Bosco
CTO and Co-Founder, Privacera
Agenda
▪ Our background
▪ Why privacy, security,
compliance?
▪ Approaches
▪ Ideal problem solve
▪ Real life meets ideal life
Backgrounds
▪ Building an Enterprise Scale Unified
Framework
▪ Very Long, Respected History ~ 160 Years
▪ Compliance is extremely important to us
▪ Agile Data vs Compliant Data
▪ Founded in 2016 by the creators of Apache
Ranger & Apache Atlas
▪ Extends Ranger's capabilities beyond traditional
Big Data environments to cloud (Databricks,
AWS, Azure, GCP, and more)
▪ Specializes in democratizing data for analytics,
while ensuring compliance with privacy
regulations (GDPR, CCPA, LGPD, HIPAA, & more)
• Privacera
• Northwestern Mutual
Why do we suddenly care about privacy?
• You care if you are regulated in any form
• Simple you need to show you can pass an audit
• You care if you store any information about your users
• Simple because governments have woken up with GDPR and CCPA
• You care if you want to democratize your data
• Simple because the use of your data can be scrutinized
We always did, but technology got ahead of privacy. Privacy is often this assumed competency, and
technology really showed how important it was.
Have you ever...
• Collecting information about your customers can
• Improve the experience
• Allow the company to understand their business better
• At the core, privacy is a policy and legal obligation
• You have the data, it used to be your business to just secure it.
• Do you want your information monetized? Sold? Traded?
• Most companies don’t do this. But the privacy policy is there for you.
• Clicked ‘accept all’ on website, used a digital assistant..
Gone to a website and read their privacy policy, clicked accept cookies, accepted terms of service, or
EULA?
And it’s only going to pick up speed.
• More Regulations are arriving around privacy
• Increasing your ability to execute against data means respecting your user’s rights
• A part of maturity is being able to manage governance
More importantly, why do we care so much?
• Technology like Apache Spark opens the capability to
democratize your data.
• Most every company wants the marketplace to enrich
and share their data.
• Who inside that company can view it? Do we have the
controls to protect your information? Can we verify
that the information is used for the right purposes?
What is the difference between these?
▪ Preventing unauthorized
usage of systems
▪ Ensuring users don’t see the
incorrect information
▪ Creating boundaries to
enforce right action of the
system
• The process of making sure
your company and
employees follow all laws,
regulations, standards, and
ethical practices that apply
to your organization
• Compliance
• Security
• “Data privacy may be
defined as the authorized,
fair, and legitimate
processing of personal
information”
• Consent rights
• Do not share
• Slippery space
• Privacy
Examine strategies to scale agile data w/privacy
• Build a metadata layer that defines PII in its schema
• Users and developers can and will change where PII is stored
• You can literally chase people to do the ‘right thing’ forever
• You could build views with permissions to certain users
• Not very scalable
• Plus you need to always show who accessed and why
• Are these security scenario?
Challenges to that strategy
• Is the metadata layer flexible enough or should we think in policies?
• Privacy is inherently your organization’s position which may evolve based on regulation
• Can your development keep up with views?
• When you discover the extra 10,000 fields, can you keep up?
• Implement a framework that scales
• Security is not Privacy.
• Security has a different domain and set of principles.
• Remember we are protecting the usage of your data.
How can we solve it?
Ideal scalable system
▪ Revocation of
Consent
▪ Portability
▪ Erasure
▪ Rectification
▪ How is data used?
▪ Rights follow Data
Reuse
▪ Flexible to change
▪ Should align with a
Data Governance
program
▪ Should adapt to
changing data
▪ Proactive.
▪ Reclassification
• Classification
• User Rights
▪ How was it used?
▪ How was it
accessed?
▪ How was it
protected?
▪ Did it cross
borders?
• Audit/Governance
▪ Authorization of
User may change
▪ Supports Agile
Access
▪ Business Use is
preserved
▪ Automated
Systems obey
Privacy
• Access
User Rights at Scale
▪Revocation of Consent/ Right To Be Forgotten
▪Portability
▪Erasure
▪Rectification
▪How is data used?
▪Rights follow Data Reuse
▪Flexible to change
S3 ADLS Redshift Snowflake Synapse
Privacy Challenges in Open Data Ecosystem
Athena Databricks HDInsight
EMR
Dremio Trino PrestoDB
PowerBI Tableau
Storage
SQL Engines
Data Virtualization
BI Tools
Marketing
Data
Analyst
Data
Scientist/A
rchitect
Governance blind spot
Tools & Technology
AUTOMATED DATA DISCOVERY CENTRALIZED ACCESS CONTROL
AUDIT COLLECTION AND REPORTING
Automated Data Discovery
● Automatically detect and catalog sensitive
data
● Detailed classification, e.g. EMAIL, SSN,
GENDER, CC, PHONE_NUMBER, etc.
● Eliminate manual processes
● Catalog data as it is ingested
● Track data movement and propagate tag
● Catalog data across multiple cloud
services
Centralized Access Control
● Global Tag/Classification-based policies
● Purpose and Persona based policies
● Dynamic row filters v/s Views
● Dynamic masking or decryption
● Approval workflows with time and
purpose constraints
Centralized Auditing and Reporting
● Centralize auditing
● Monitoring data access by classification
● Track usage by Purpose
● Generate attestation reports
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

More Related Content

What's hot

Introduction to Azure Synapse Webinar
Introduction to Azure Synapse WebinarIntroduction to Azure Synapse Webinar
Introduction to Azure Synapse Webinar
Peter Ward
 
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Cathrine Wilhelmsen
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
Auckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data LakeAuckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data Lake
Sergio Zenatti Filho
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Databricks
 
Analytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret WeaponAnalytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret Weapon
Databricks
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
Nilesh Shah
 
Using Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on DatabricksUsing Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on Databricks
Databricks
 
From Events to Networks: Time Series Analysis on Scale
From Events to Networks: Time Series Analysis on ScaleFrom Events to Networks: Time Series Analysis on Scale
From Events to Networks: Time Series Analysis on Scale
Dr. Mirko Kämpf
 
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
Timothy McAliley
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Michael Rys
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
Sergio Zenatti Filho
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Rakesh Jayaram
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
Sudheer Kondla
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)
Daniel Toomey
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
Cloudera, Inc.
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure Synapse
Nilesh Gule
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
Matthew W. Bowers
 
The Power of Data
The Power of DataThe Power of Data
The Power of Data
DataWorks Summit
 

What's hot (20)

Introduction to Azure Synapse Webinar
Introduction to Azure Synapse WebinarIntroduction to Azure Synapse Webinar
Introduction to Azure Synapse Webinar
 
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Auckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data LakeAuckland SQL Saturday - Azure Data Lake
Auckland SQL Saturday - Azure Data Lake
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
 
Analytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret WeaponAnalytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret Weapon
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
 
Using Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on DatabricksUsing Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on Databricks
 
From Events to Networks: Time Series Analysis on Scale
From Events to Networks: Time Series Analysis on ScaleFrom Events to Networks: Time Series Analysis on Scale
From Events to Networks: Time Series Analysis on Scale
 
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure Synapse
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 
The Power of Data
The Power of DataThe Power of Data
The Power of Data
 

Similar to Scaling Privacy in a Spark Ecosystem

Privacera and Northwestern Mutual - Scaling Privacy in a Spark Ecosystem
Privacera and Northwestern Mutual  - Scaling Privacy in a Spark EcosystemPrivacera and Northwestern Mutual  - Scaling Privacy in a Spark Ecosystem
Privacera and Northwestern Mutual - Scaling Privacy in a Spark Ecosystem
Privacera
 
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Cloudera, Inc.
 
Microsoft Cloud GDPR Compliance Options (SUGUK)
Microsoft Cloud GDPR Compliance Options (SUGUK)Microsoft Cloud GDPR Compliance Options (SUGUK)
Microsoft Cloud GDPR Compliance Options (SUGUK)
Andy Talbot
 
cloud session uklug
cloud session uklugcloud session uklug
cloud session uklugdominion
 
GDPR - Why it matters and how to make it Easy
GDPR - Why it matters and how to make it EasyGDPR - Why it matters and how to make it Easy
GDPR - Why it matters and how to make it Easy
Paul McQuillan
 
SharePoint Governance 101 SPSSA2016
SharePoint Governance 101  SPSSA2016SharePoint Governance 101  SPSSA2016
SharePoint Governance 101 SPSSA2016
Jim Adcock
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
Caserta
 
Data Loss Prevention in O365
Data Loss Prevention in O365Data Loss Prevention in O365
Data Loss Prevention in O365
Don Daubert
 
CRMCS GDPR - Why it matters and how to make it Easy
CRMCS   GDPR - Why it matters and how to make it EasyCRMCS   GDPR - Why it matters and how to make it Easy
CRMCS GDPR - Why it matters and how to make it Easy
Paul McQuillan
 
SharePoint Governance 101 - Austin O365 & SharePoint User Group
SharePoint Governance 101  - Austin O365 & SharePoint User GroupSharePoint Governance 101  - Austin O365 & SharePoint User Group
SharePoint Governance 101 - Austin O365 & SharePoint User Group
Jim Adcock
 
Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
 Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
Charles Eubanks
 
SharePoint Governance 101 - OKCSUG
SharePoint Governance 101 - OKCSUGSharePoint Governance 101 - OKCSUG
SharePoint Governance 101 - OKCSUG
Jim Adcock
 
Webinar - Compliance with the Microsoft Cloud- 2017-04-19
Webinar - Compliance with the Microsoft Cloud- 2017-04-19Webinar - Compliance with the Microsoft Cloud- 2017-04-19
Webinar - Compliance with the Microsoft Cloud- 2017-04-19
TechSoup
 
data_blending
data_blendingdata_blending
data_blendingsubit1615
 
Cybersecurity and Data Protection Executive Briefing
Cybersecurity and Data Protection Executive BriefingCybersecurity and Data Protection Executive Briefing
Cybersecurity and Data Protection Executive Briefing
RFA
 
Global Data Privacy Regulation
Global Data Privacy RegulationGlobal Data Privacy Regulation
Global Data Privacy Regulation
Jatin Kochhar
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
Caserta
 
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
IDERA Software
 
How Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR complianceHow Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR compliance
Cloudera, Inc.
 
Security, Administration & Governance for SharePoint On-Prem, Online, & Every...
Security, Administration & Governance for SharePoint On-Prem, Online, & Every...Security, Administration & Governance for SharePoint On-Prem, Online, & Every...
Security, Administration & Governance for SharePoint On-Prem, Online, & Every...
Christian Buckley
 

Similar to Scaling Privacy in a Spark Ecosystem (20)

Privacera and Northwestern Mutual - Scaling Privacy in a Spark Ecosystem
Privacera and Northwestern Mutual  - Scaling Privacy in a Spark EcosystemPrivacera and Northwestern Mutual  - Scaling Privacy in a Spark Ecosystem
Privacera and Northwestern Mutual - Scaling Privacy in a Spark Ecosystem
 
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
Comprehensive Security for the Enterprise IV: Visibility Through a Single End...
 
Microsoft Cloud GDPR Compliance Options (SUGUK)
Microsoft Cloud GDPR Compliance Options (SUGUK)Microsoft Cloud GDPR Compliance Options (SUGUK)
Microsoft Cloud GDPR Compliance Options (SUGUK)
 
cloud session uklug
cloud session uklugcloud session uklug
cloud session uklug
 
GDPR - Why it matters and how to make it Easy
GDPR - Why it matters and how to make it EasyGDPR - Why it matters and how to make it Easy
GDPR - Why it matters and how to make it Easy
 
SharePoint Governance 101 SPSSA2016
SharePoint Governance 101  SPSSA2016SharePoint Governance 101  SPSSA2016
SharePoint Governance 101 SPSSA2016
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
 
Data Loss Prevention in O365
Data Loss Prevention in O365Data Loss Prevention in O365
Data Loss Prevention in O365
 
CRMCS GDPR - Why it matters and how to make it Easy
CRMCS   GDPR - Why it matters and how to make it EasyCRMCS   GDPR - Why it matters and how to make it Easy
CRMCS GDPR - Why it matters and how to make it Easy
 
SharePoint Governance 101 - Austin O365 & SharePoint User Group
SharePoint Governance 101  - Austin O365 & SharePoint User GroupSharePoint Governance 101  - Austin O365 & SharePoint User Group
SharePoint Governance 101 - Austin O365 & SharePoint User Group
 
Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
 Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
Fuse Analytics - HR & Payroll Cloud Transformation Pitfalls, Lessons Learned
 
SharePoint Governance 101 - OKCSUG
SharePoint Governance 101 - OKCSUGSharePoint Governance 101 - OKCSUG
SharePoint Governance 101 - OKCSUG
 
Webinar - Compliance with the Microsoft Cloud- 2017-04-19
Webinar - Compliance with the Microsoft Cloud- 2017-04-19Webinar - Compliance with the Microsoft Cloud- 2017-04-19
Webinar - Compliance with the Microsoft Cloud- 2017-04-19
 
data_blending
data_blendingdata_blending
data_blending
 
Cybersecurity and Data Protection Executive Briefing
Cybersecurity and Data Protection Executive BriefingCybersecurity and Data Protection Executive Briefing
Cybersecurity and Data Protection Executive Briefing
 
Global Data Privacy Regulation
Global Data Privacy RegulationGlobal Data Privacy Regulation
Global Data Privacy Regulation
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
IDERA Live | Understanding SQL Server Compliance both in the Cloud and On Pre...
 
How Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR complianceHow Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR compliance
 
Security, Administration & Governance for SharePoint On-Prem, Online, & Every...
Security, Administration & Governance for SharePoint On-Prem, Online, & Every...Security, Administration & Governance for SharePoint On-Prem, Online, & Every...
Security, Administration & Governance for SharePoint On-Prem, Online, & Every...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 

Recently uploaded (20)

Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 

Scaling Privacy in a Spark Ecosystem

  • 1. Scaling Privacy with Apache Spark Aaron Colcord Sr. Director Engineering, Northwestern Mutual Don Durai Bosco CTO and Co-Founder, Privacera
  • 2. Agenda ▪ Our background ▪ Why privacy, security, compliance? ▪ Approaches ▪ Ideal problem solve ▪ Real life meets ideal life
  • 3. Backgrounds ▪ Building an Enterprise Scale Unified Framework ▪ Very Long, Respected History ~ 160 Years ▪ Compliance is extremely important to us ▪ Agile Data vs Compliant Data ▪ Founded in 2016 by the creators of Apache Ranger & Apache Atlas ▪ Extends Ranger's capabilities beyond traditional Big Data environments to cloud (Databricks, AWS, Azure, GCP, and more) ▪ Specializes in democratizing data for analytics, while ensuring compliance with privacy regulations (GDPR, CCPA, LGPD, HIPAA, & more) • Privacera • Northwestern Mutual
  • 4. Why do we suddenly care about privacy? • You care if you are regulated in any form • Simple you need to show you can pass an audit • You care if you store any information about your users • Simple because governments have woken up with GDPR and CCPA • You care if you want to democratize your data • Simple because the use of your data can be scrutinized We always did, but technology got ahead of privacy. Privacy is often this assumed competency, and technology really showed how important it was.
  • 5. Have you ever... • Collecting information about your customers can • Improve the experience • Allow the company to understand their business better • At the core, privacy is a policy and legal obligation • You have the data, it used to be your business to just secure it. • Do you want your information monetized? Sold? Traded? • Most companies don’t do this. But the privacy policy is there for you. • Clicked ‘accept all’ on website, used a digital assistant.. Gone to a website and read their privacy policy, clicked accept cookies, accepted terms of service, or EULA?
  • 6. And it’s only going to pick up speed. • More Regulations are arriving around privacy • Increasing your ability to execute against data means respecting your user’s rights • A part of maturity is being able to manage governance
  • 7. More importantly, why do we care so much? • Technology like Apache Spark opens the capability to democratize your data. • Most every company wants the marketplace to enrich and share their data. • Who inside that company can view it? Do we have the controls to protect your information? Can we verify that the information is used for the right purposes?
  • 8. What is the difference between these? ▪ Preventing unauthorized usage of systems ▪ Ensuring users don’t see the incorrect information ▪ Creating boundaries to enforce right action of the system • The process of making sure your company and employees follow all laws, regulations, standards, and ethical practices that apply to your organization • Compliance • Security • “Data privacy may be defined as the authorized, fair, and legitimate processing of personal information” • Consent rights • Do not share • Slippery space • Privacy
  • 9. Examine strategies to scale agile data w/privacy • Build a metadata layer that defines PII in its schema • Users and developers can and will change where PII is stored • You can literally chase people to do the ‘right thing’ forever • You could build views with permissions to certain users • Not very scalable • Plus you need to always show who accessed and why • Are these security scenario?
  • 10. Challenges to that strategy • Is the metadata layer flexible enough or should we think in policies? • Privacy is inherently your organization’s position which may evolve based on regulation • Can your development keep up with views? • When you discover the extra 10,000 fields, can you keep up? • Implement a framework that scales • Security is not Privacy. • Security has a different domain and set of principles. • Remember we are protecting the usage of your data.
  • 11. How can we solve it?
  • 12. Ideal scalable system ▪ Revocation of Consent ▪ Portability ▪ Erasure ▪ Rectification ▪ How is data used? ▪ Rights follow Data Reuse ▪ Flexible to change ▪ Should align with a Data Governance program ▪ Should adapt to changing data ▪ Proactive. ▪ Reclassification • Classification • User Rights ▪ How was it used? ▪ How was it accessed? ▪ How was it protected? ▪ Did it cross borders? • Audit/Governance ▪ Authorization of User may change ▪ Supports Agile Access ▪ Business Use is preserved ▪ Automated Systems obey Privacy • Access
  • 13. User Rights at Scale ▪Revocation of Consent/ Right To Be Forgotten ▪Portability ▪Erasure ▪Rectification ▪How is data used? ▪Rights follow Data Reuse ▪Flexible to change
  • 14. S3 ADLS Redshift Snowflake Synapse Privacy Challenges in Open Data Ecosystem Athena Databricks HDInsight EMR Dremio Trino PrestoDB PowerBI Tableau Storage SQL Engines Data Virtualization BI Tools Marketing Data Analyst Data Scientist/A rchitect
  • 16. Tools & Technology AUTOMATED DATA DISCOVERY CENTRALIZED ACCESS CONTROL AUDIT COLLECTION AND REPORTING
  • 17. Automated Data Discovery ● Automatically detect and catalog sensitive data ● Detailed classification, e.g. EMAIL, SSN, GENDER, CC, PHONE_NUMBER, etc. ● Eliminate manual processes ● Catalog data as it is ingested ● Track data movement and propagate tag ● Catalog data across multiple cloud services
  • 18. Centralized Access Control ● Global Tag/Classification-based policies ● Purpose and Persona based policies ● Dynamic row filters v/s Views ● Dynamic masking or decryption ● Approval workflows with time and purpose constraints
  • 19. Centralized Auditing and Reporting ● Centralize auditing ● Monitoring data access by classification ● Track usage by Purpose ● Generate attestation reports
  • 20. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.