Considerations for Data Access in the Lakehouse
Zachary Friedman
Product Manager at Immuta
Agenda
■ Introduction to Lakehouse Concepts for Governance
■ Role-Based Access Control (RBAC) vs. Attribute-Based Access Control (ABAC)
■ Enterprise-Grade Authorization in Databricks SQL Analytics
Data Governance meets the Lakehouse
What is a Lakehouse?
■ Let’s do a (brief) history lesson
■ Late 1980s: the Data Warehouse
■ Early 2010s: the Data Lake
■ The roaring ’20s: the Data Lakehouse
Key Features of the Lakehouse
Basic Key Attributes
■ Transaction support
■ Schema enforcement and governance
■ BI support
■ Separate storage from compute
■ Support for diverse workloads
Enterprise-Grade Features
■ Scalable security and access control management
■ Additional data governance capabilities such as auditing and lineage
■ Data discovery tools such as data catalogs
Diving Deeper
Key Concepts for Authorization in the Lakehouse
● Role-Based Access Control (RBAC)
● Attribute-Based Access Control (ABAC)
● Enforcement Point
Role-Based Access Control
Role-Based Access Control (RBAC)
To manage access to resources, group permissions into roles, and assign those roles to users
■ User-Role relationships
■ Role-Permission relationships
Role-Based Access Control (RBAC)
Define a User-Role relationship in Databricks SQL Analytics
■ Manage groups using the Admin Console, Groups API, or SCIM API
■ Add users to groups and remove them
Role-Based Access Control (RBAC)
Define a Role-Permission relationship in Databricks SQL Analytics
■ Define the access that a role grants to a user
■ At a high level this can be implemented in terms of the is_member() function
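A role-permission check built on is_member() can be sketched as follows. This is a minimal simulation, not the talk's own code: Python's sqlite3 stands in for Databricks SQL, is_member() is registered as a stand-in function, and the group directory, user names, and table columns are hypothetical.

```python
import sqlite3

# Hypothetical group memberships, standing in for groups kept in sync
# through the Admin Console, Groups API, or SCIM API.
GROUPS = {
    "alice@example.com": {"finance"},
    "bob@example.com": {"sales"},
}

# SELECT that a role-gated view could wrap: rows come back only when the
# caller belongs to the finance group.
FINANCE_ONLY = """
    SELECT account, amount
    FROM profit_loss_statement
    WHERE is_member('finance')
    ORDER BY account
"""


def query_as(user, sql):
    """Run sql against a small in-memory database as a simulated user."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for is_member(): 1 if the simulated user is in the group.
    conn.create_function(
        "is_member", 1, lambda group: int(group in GROUPS.get(user, set()))
    )
    # Hypothetical columns; the talk does not show the table's schema.
    conn.execute("CREATE TABLE profit_loss_statement (account TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO profit_loss_statement VALUES (?, ?)",
        [("revenue", 500), ("costs", -300)],
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

A member of finance gets every row back, while anyone else gets an empty result, because the predicate is evaluated once per query against the caller's groups rather than against the data.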
Attribute-Based Access Control
Attribute-Based Access Control (ABAC)
Represent fine-grained or dynamic permissions based on who the user is and their relationship to the resource they want to access.
■ User relationship to the resource can be expressed as a JOIN on user attributes and values of a resource column
Access Control Dimensions in SQL Analytics
Access Control Dimensions
■ Table: A user can access sales data, but not financial data.
■ Row: A user can access a particular sales opportunity, or a sales opportunity matching certain conditions.
■ Column: A user can access only certain fields of a record, and we can mask the values of a column depending on the user trying to access it.
You’re going to need a framework to manage all of these access controls across your enterprise.
Requirements for Enterprise-Grade Access Controls Framework
Table-level policies
Individuals can be granted access to query tables and views by virtue of:
● membership in a group (role-based)
● possession of an attribute (attribute-based)
● request and approval by an admin
● public access
● individual user selection
● access for a specified period of time
● access only for a specific purpose
Row-level policies
Individuals can be allowed to see rows in a dataset based on:
● membership in a group whose name matches the value in a designated column
● possession of an attribute whose value matches the value in a designated column
● a filter on a time column, so users are entitled to query only rows that meet a specific recency requirement
Column-level policies
Different users see different values in specific columns by virtue of the roles, attributes, and purposes discussed above; examples include:
● Masking a column to NULL
● Masking a column using hashing
● Masking a column to a constant string
● Other advanced privacy-enhancing technologies (PETs) and differential privacy
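The three simplest column masks (NULL, hashing, constant string) can be sketched as small functions. This is an illustrative Python sketch only, not Databricks or Immuta functionality: the function names, the choice of SHA-256, and the REDACTED placeholder are all assumptions.

```python
import hashlib


def mask_null(value):
    """Replace the value with NULL (None in Python)."""
    return None


def mask_hash(value):
    """Replace the value with the hex SHA-256 digest of its string form."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def mask_constant(value, placeholder="REDACTED"):
    """Replace the value with a constant string."""
    return placeholder


def apply_mask(rows, column, mask):
    """Return copies of the row dicts with one column passed through mask."""
    return [{**row, column: mask(row[column])} for row in rows]
```

Hashing keeps equal inputs equal, so masked columns can still be grouped or joined on. The trade-off is that an unsalted hash of a low-entropy value can be recovered by guessing candidate inputs, which is one reason to consider the other options in the list.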
Framework for Managing Table-level Access Controls
RBAC
Users who are part of the Active Directory group called finance are allowed to read profit and loss data.
Provided we’ve kept our groups in sync between our corporate directory and Databricks, using either the Admin Console, Groups API, or SCIM API, we can solve this requirement simply with:
GRANT SELECT ON TABLE
accounting.profit_loss_statement
TO finance;
ABAC
Users with the attribute executive are allowed to read sales data.
This one is a bit more complex. First, we need to store a (user, name, value) triple in some sort of attributes table.
Next, we need to create a secure view on top of the original table, since a GRANT accepts only a user or group as its principal, not a WHERE clause.
Solving for ABAC in our Framework
Users with the attribute executive are allowed to read sales data.
Solving for ABAC in our Framework
Restrict the user to only be able to view their own personal attributes.
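One way to restrict users to their own attributes is a view that filters the attributes table on the current user. The sketch below simulates that idea and is not the code shown in the talk: Python's sqlite3 stands in for Databricks SQL, current_user() is registered as a stand-in function, and the table name user_attributes, its column names, and the sample users are hypothetical.

```python
import sqlite3

# Hypothetical (user, name, value) triples.
ATTRIBUTES = [
    ("alice@example.com", "executive", "true"),
    ("alice@example.com", "territory", "US-EAST"),
    ("bob@example.com", "territory", "EU"),
]

# SELECT that a per-user attributes view could wrap: each caller sees only
# the triples stored under their own user name.
MY_ATTRIBUTES = """
    SELECT name, value
    FROM user_attributes
    WHERE username = current_user()
    ORDER BY name
"""


def my_attributes(user):
    """Return the attribute rows visible to the simulated user."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for current_user(): returns the simulated caller.
    conn.create_function("current_user", 0, lambda: user)
    conn.execute(
        "CREATE TABLE user_attributes (username TEXT, name TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO user_attributes VALUES (?, ?, ?)", ATTRIBUTES)
    rows = conn.execute(MY_ATTRIBUTES).fetchall()
    conn.close()
    return rows
```

Granting users access to this filtered query rather than to the underlying table means nobody can look up, or reason about, another user's attributes.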
Solving for ABAC in our Framework
Putting it all together. Users with the attribute executive are allowed to read sales data.
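The combined policy can be sketched as a query over the sales table that returns rows only when the caller holds the executive attribute. This is a simulation under stated assumptions, not the talk's code: sqlite3 stands in for Databricks SQL, current_user() is a registered stand-in, the attribute is assumed to be stored as name 'executive' with value 'true', and the table and column names for the attributes are hypothetical.

```python
import sqlite3

# Hypothetical attribute triples and a few sample sales rows.
ATTRIBUTES = [
    ("alice@example.com", "executive", "true"),
    ("bob@example.com", "territory", "EU"),
]
SALES = [(1, 1000000, "US-EAST"), (3, 175000, "EU"), (4, 800000, "APAC")]

# SELECT that a secure view over fct_sales could wrap: the EXISTS check
# passes only if the caller has the executive attribute.
SEC_SALES = """
    SELECT s.sale_id, s.amount, s.territory
    FROM fct_sales AS s
    WHERE EXISTS (
        SELECT 1
        FROM user_attributes AS a
        WHERE a.username = current_user()
          AND a.name = 'executive'
          AND a.value = 'true'
    )
    ORDER BY s.sale_id
"""


def read_sales_as(user):
    """Return the sales rows visible to the simulated user."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for current_user(): returns the simulated caller.
    conn.create_function("current_user", 0, lambda: user)
    conn.execute(
        "CREATE TABLE user_attributes (username TEXT, name TEXT, value TEXT)"
    )
    conn.execute(
        "CREATE TABLE fct_sales (sale_id INTEGER, amount INTEGER, territory TEXT)"
    )
    conn.executemany("INSERT INTO user_attributes VALUES (?, ?, ?)", ATTRIBUTES)
    conn.executemany("INSERT INTO fct_sales VALUES (?, ?, ?)", SALES)
    rows = conn.execute(SEC_SALES).fetchall()
    conn.close()
    return rows
```

EXISTS is used here instead of a plain join so that a duplicated attribute row cannot duplicate sales rows in the result.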
Managing Row-level Access Controls
A user can access a particular sales opportunity, or a sales opportunity matching certain conditions.
■ Let’s consider a sales dataset that has a territory column, and we only want users with the attribute territory to be able to see rows with the corresponding value in the territory column
fct_sales
sale_id  amount   territory
1        1000000  US-EAST
2        150000   US-EAST
3        175000   EU
4        800000   APAC
5        50000    US-WEST
6        75000    US-CENTRAL
7        50000    US-EAST
Row-level ABAC
A user can access a particular sales opportunity, or a sales opportunity matching certain conditions.
sec_fct_sales
visible  sale_id  amount   territory
YES      1        1000000  US-EAST
YES      2        150000   US-EAST
NO       3        175000   EU
NO       4        800000   APAC
NO       5        50000    US-WEST
NO       6        75000    US-CENTRAL
YES      7        50000    US-EAST
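The row-level filter can be sketched as a JOIN between the sales table and the caller's territory attribute. As with any simulation, the details are assumptions rather than the talk's code: sqlite3 stands in for Databricks SQL, current_user() is a registered stand-in, and the user_attributes table, its columns, and the sample users are hypothetical. The sales rows are the ones from fct_sales.

```python
import sqlite3

# Hypothetical attribute triples: each user is tied to one territory.
ATTRIBUTES = [
    ("alice@example.com", "territory", "US-EAST"),
    ("bob@example.com", "territory", "EU"),
]
SALES = [
    (1, 1000000, "US-EAST"), (2, 150000, "US-EAST"), (3, 175000, "EU"),
    (4, 800000, "APAC"), (5, 50000, "US-WEST"), (6, 75000, "US-CENTRAL"),
    (7, 50000, "US-EAST"),
]

# SELECT that a secure view such as sec_fct_sales could wrap: a row is
# returned only when its territory matches the caller's territory attribute.
SEC_FCT_SALES = """
    SELECT s.sale_id, s.amount, s.territory
    FROM fct_sales AS s
    JOIN user_attributes AS a
      ON a.name = 'territory' AND a.value = s.territory
    WHERE a.username = current_user()
    ORDER BY s.sale_id
"""


def visible_sale_ids(user):
    """Return the sale_id values the simulated user is allowed to see."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for current_user(): returns the simulated caller.
    conn.create_function("current_user", 0, lambda: user)
    conn.execute(
        "CREATE TABLE user_attributes (username TEXT, name TEXT, value TEXT)"
    )
    conn.execute(
        "CREATE TABLE fct_sales (sale_id INTEGER, amount INTEGER, territory TEXT)"
    )
    conn.executemany("INSERT INTO user_attributes VALUES (?, ?, ?)", ATTRIBUTES)
    conn.executemany("INSERT INTO fct_sales VALUES (?, ?, ?)", SALES)
    ids = [row[0] for row in conn.execute(SEC_FCT_SALES)]
    conn.close()
    return ids
```

For a user whose territory attribute is US-EAST, the query returns sales 1, 2, and 7, the same rows marked YES in the table. Adding a second territory triple for a user widens their view without changing the query.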
Column-level Masking
Only executives can see the amount of a sale.
sec_fct_sales
visible  sale_id  amount   territory
YES      1        1000000  US-EAST
YES      2        150000   US-EAST
NO       3        175000   EU
NO       4        800000   APAC
NO       5        50000    US-WEST
NO       6        75000    US-CENTRAL
YES      7        50000    US-EAST
sec_fct_sales (for user without the executive attribute)
visible  sale_id  amount  territory
YES      1        NULL    US-EAST
YES      2        NULL    US-EAST
NO       3        NULL    EU
NO       4        NULL    APAC
NO       5        NULL    US-WEST
NO       6        NULL    US-CENTRAL
YES      7        NULL    US-EAST
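Masking the amount column for non-executives can be sketched with a CASE expression in the view's SELECT list. This is a simulation, not the talk's code: sqlite3 stands in for Databricks SQL, current_user() is a registered stand-in, the executive attribute is assumed to be stored as name 'executive' with value 'true', and the user_attributes table and sample users are hypothetical. Only the column mask is shown; the row-level filter is left out.

```python
import sqlite3

# Hypothetical attribute triples: only alice is an executive.
ATTRIBUTES = [
    ("alice@example.com", "executive", "true"),
    ("bob@example.com", "territory", "US-EAST"),
]
SALES = [
    (1, 1000000, "US-EAST"), (2, 150000, "US-EAST"), (3, 175000, "EU"),
    (4, 800000, "APAC"), (5, 50000, "US-WEST"), (6, 75000, "US-CENTRAL"),
    (7, 50000, "US-EAST"),
]

# SELECT that a masking view could wrap: amount is passed through for
# executives and replaced with NULL for everyone else.
MASKED_SALES = """
    SELECT s.sale_id,
           CASE WHEN EXISTS (
                    SELECT 1
                    FROM user_attributes AS a
                    WHERE a.username = current_user()
                      AND a.name = 'executive'
                      AND a.value = 'true'
                )
                THEN s.amount
                ELSE NULL
           END AS amount,
           s.territory
    FROM fct_sales AS s
    ORDER BY s.sale_id
"""


def masked_sales_as(user):
    """Return (sale_id, amount, territory) rows as the simulated user sees them."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for current_user(): returns the simulated caller.
    conn.create_function("current_user", 0, lambda: user)
    conn.execute(
        "CREATE TABLE user_attributes (username TEXT, name TEXT, value TEXT)"
    )
    conn.execute(
        "CREATE TABLE fct_sales (sale_id INTEGER, amount INTEGER, territory TEXT)"
    )
    conn.executemany("INSERT INTO user_attributes VALUES (?, ?, ?)", ATTRIBUTES)
    conn.executemany("INSERT INTO fct_sales VALUES (?, ?, ?)", SALES)
    rows = conn.execute(MASKED_SALES).fetchall()
    conn.close()
    return rows
```

The row count is the same for every caller; only the amount values differ, which is what separates a column mask from a row filter.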
Thanks for coming to my talk. My name is Zachary and I’m a product manager at Immuta, which provides an enterprise-grade access controls platform to data teams just like this. AMA!
Thank You!
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

More Related Content

What's hot

Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
Sergio Zenatti Filho
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
Rajesh Kumar
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
Adam Doyle
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
HostedbyConfluent
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Dr. Arif Wider
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
James Serra
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
Databricks
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
Databricks
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
Databricks
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
Dalibor Wijas
 

What's hot (20)

Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 

Similar to Considerations for Data Access in the Lakehouse

2022-12-02 Trailblazer Winter Coming to the Town.pptx
2022-12-02 Trailblazer Winter Coming to the Town.pptx2022-12-02 Trailblazer Winter Coming to the Town.pptx
2022-12-02 Trailblazer Winter Coming to the Town.pptx
Jihun Jung
 
Hovitaga authorization concept and setup guide
Hovitaga authorization concept and setup guideHovitaga authorization concept and setup guide
Hovitaga authorization concept and setup guide
Hovitaga Kft.
 
Salesforce talk
Salesforce talkSalesforce talk
Salesforce talk
Suvendu Roy
 
An expert guide to new sap bi security features
An expert guide to new sap bi security featuresAn expert guide to new sap bi security features
An expert guide to new sap bi security featuresShazia_Sultana
 
SAP BI Security Features
SAP BI Security FeaturesSAP BI Security Features
SAP BI Security Features
dw_anil
 
Introducing Visualforce
Introducing VisualforceIntroducing Visualforce
Introducing Visualforce
Mohammed Safwat Abu Kwaik
 
Best salesforce training Institute in Hyderabad
Best salesforce training Institute in HyderabadBest salesforce training Institute in Hyderabad
Best salesforce training Institute in Hyderabad
N Benchmark IT Solutions
 
Global Azure Bootcamp 2018 - Oh no my organization went Azure
Global Azure Bootcamp 2018 - Oh no my organization went AzureGlobal Azure Bootcamp 2018 - Oh no my organization went Azure
Global Azure Bootcamp 2018 - Oh no my organization went Azure
Karim Vaes
 
Bi requirements checklist
Bi requirements checklistBi requirements checklist
Bi requirements checklist
Henrique Ravanelli Martins
 
SAP BI 7 security concepts
SAP BI 7 security conceptsSAP BI 7 security concepts
SAP BI 7 security concepts
Siva Pradeep Bolisetti
 
SAP Business Objects Trianing
SAP Business Objects TrianingSAP Business Objects Trianing
Obiee interview questions and answers faq
Obiee interview questions and answers faqObiee interview questions and answers faq
Obiee interview questions and answers faqmaheshboggula
 
Scalable security modeling sap bw analysis authorizations
Scalable security modeling   sap bw analysis authorizationsScalable security modeling   sap bw analysis authorizations
Scalable security modeling sap bw analysis authorizations
Pallavi Koppula
 
Our API Evolution: From Metadata to Tooling API for Building Incredible Apps
Our API Evolution: From Metadata to Tooling API for Building Incredible AppsOur API Evolution: From Metadata to Tooling API for Building Incredible Apps
Our API Evolution: From Metadata to Tooling API for Building Incredible Apps
Dreamforce
 
OBIEE Interview Questions
OBIEE Interview QuestionsOBIEE Interview Questions
OBIEE Interview Questions
mrinalsingh385
 
Salesforce Spring 20 Highlights
Salesforce Spring 20 HighlightsSalesforce Spring 20 Highlights
Salesforce Spring 20 Highlights
Nishant Singh Panwar
 
8034.ppt
8034.ppt8034.ppt
8034.ppt
ssuser77162c
 
Building Modern Data Platform with AWS
Building Modern Data Platform with AWSBuilding Modern Data Platform with AWS
Building Modern Data Platform with AWS
Dmitry Anoshin
 

Similar to Considerations for Data Access in the Lakehouse (20)

2022-12-02 Trailblazer Winter Coming to the Town.pptx
2022-12-02 Trailblazer Winter Coming to the Town.pptx2022-12-02 Trailblazer Winter Coming to the Town.pptx
2022-12-02 Trailblazer Winter Coming to the Town.pptx
 
Hovitaga authorization concept and setup guide
Hovitaga authorization concept and setup guideHovitaga authorization concept and setup guide
Hovitaga authorization concept and setup guide
 
Salesforce talk
Salesforce talkSalesforce talk
Salesforce talk
 
Casa engl
Casa englCasa engl
Casa engl
 
An expert guide to new sap bi security features
An expert guide to new sap bi security featuresAn expert guide to new sap bi security features
An expert guide to new sap bi security features
 
SAP BI Security Features
SAP BI Security FeaturesSAP BI Security Features
SAP BI Security Features
 
Introducing Visualforce
Introducing VisualforceIntroducing Visualforce
Introducing Visualforce
 
Best salesforce training Institute in Hyderabad
Best salesforce training Institute in HyderabadBest salesforce training Institute in Hyderabad
Best salesforce training Institute in Hyderabad
 
Global Azure Bootcamp 2018 - Oh no my organization went Azure
Global Azure Bootcamp 2018 - Oh no my organization went AzureGlobal Azure Bootcamp 2018 - Oh no my organization went Azure
Global Azure Bootcamp 2018 - Oh no my organization went Azure
 
Bi requirements checklist
Bi requirements checklistBi requirements checklist
Bi requirements checklist
 
SAP BI 7 security concepts
SAP BI 7 security conceptsSAP BI 7 security concepts
SAP BI 7 security concepts
 
Oracle_Procurement_Cloud_Release_8_Whats_New
Oracle_Procurement_Cloud_Release_8_Whats_NewOracle_Procurement_Cloud_Release_8_Whats_New
Oracle_Procurement_Cloud_Release_8_Whats_New
 
SAP Business Objects Trianing
SAP Business Objects TrianingSAP Business Objects Trianing
SAP Business Objects Trianing
 
Obiee interview questions and answers faq
Obiee interview questions and answers faqObiee interview questions and answers faq
Obiee interview questions and answers faq
 
Scalable security modeling sap bw analysis authorizations
Scalable security modeling   sap bw analysis authorizationsScalable security modeling   sap bw analysis authorizations
Scalable security modeling sap bw analysis authorizations
 
Our API Evolution: From Metadata to Tooling API for Building Incredible Apps
Our API Evolution: From Metadata to Tooling API for Building Incredible AppsOur API Evolution: From Metadata to Tooling API for Building Incredible Apps
Our API Evolution: From Metadata to Tooling API for Building Incredible Apps
 
OBIEE Interview Questions
OBIEE Interview QuestionsOBIEE Interview Questions
OBIEE Interview Questions
 
Salesforce Spring 20 Highlights
Salesforce Spring 20 HighlightsSalesforce Spring 20 Highlights
Salesforce Spring 20 Highlights
 
8034.ppt
8034.ppt8034.ppt
8034.ppt
 
Building Modern Data Platform with AWS
Building Modern Data Platform with AWSBuilding Modern Data Platform with AWS
Building Modern Data Platform with AWS
 

More from Databricks

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Databricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Databricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 

More from Databricks (20)

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 

Recently uploaded

一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理


Considerations for Data Access in the Lakehouse

  • 1. Considerations for Data Access in the Lakehouse. Zachary Friedman, Product Manager at Immuta
  • 2. Agenda: Introduction to Lakehouse Concepts for Governance; Role-Based Access Control (RBAC) vs. Attribute-Based Access Control (ABAC); Enterprise-Grade Authorization in Databricks SQL Analytics
  • 3. Data Governance meets the Lakehouse
  • 4. What is a Lakehouse?
  • 5. What is a Lakehouse? ■ Let’s do a (brief) history lesson ■ Late 1980s: the Data Warehouse ■ Early 2010s: the Data Lake ■ The roaring 20s: the Data Lakehouse
  • 6. Key Features of the Lakehouse
      Basic Key Attributes: ■ Transaction support ■ Schema enforcement and governance ■ BI support ■ Separate storage from compute ■ Support for diverse workloads
      Enterprise-Grade Features: ■ Scalable security and access control management ■ Additional data governance capabilities such as auditing and lineage ■ Data discovery tools such as data catalogs
  • 8. Key Concepts for Authorization in the Lakehouse ● Role-Based Access Control (RBAC) ● Attribute-Based Access Control (ABAC) ● Enforcement Point
  • 10. Role-Based Access Control (RBAC) To manage access to resources, group permissions into roles, and assign those roles to users ■ User-Role relationships ■ Role-Permission relationships
  • 11. Role-Based Access Control (RBAC) Define a User-Role relationship in Databricks SQL Analytics ■ Manage groups using the Admin Console, Groups API, or SCIM API ■ Add users to groups and remove them
  • 12. Role-Based Access Control (RBAC) Define a Role-Permission relationship in Databricks SQL Analytics ■ Define the access that a role grants to a user ■ At a high level this can be implemented in terms of the is_member() function
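The two RBAC relationships on slides 10 to 12 can be modelled in a few lines. This is a minimal in-memory sketch, not Databricks code: the users, the `sales.fct_sales` resource name, and both helper functions are hypothetical, and the local `is_member(user, group)` only imitates the idea behind the Databricks `is_member()` function, which checks the current user.

```python
# Hypothetical in-memory model of RBAC; all users and grants are illustrative.

# User-Role relationships: which groups each user belongs to.
USER_GROUPS = {
    "alice@example.com": {"finance"},
    "bob@example.com": {"sales"},
}

# Role-Permission relationships: which (action, resource) pairs each group grants.
GROUP_GRANTS = {
    "finance": {("SELECT", "accounting.profit_loss_statement")},
    "sales": {("SELECT", "sales.fct_sales")},
}


def is_member(user: str, group: str) -> bool:
    """Local stand-in for a group-membership check (assumed helper)."""
    return group in USER_GROUPS.get(user, set())


def is_allowed(user: str, action: str, resource: str) -> bool:
    """True if any group the user belongs to grants the action on the resource."""
    return any(
        (action, resource) in GROUP_GRANTS.get(group, set())
        for group in USER_GROUPS.get(user, set())
    )
```

Access is decided purely by group membership here: a user who is in no group, or in a group without a matching grant, is denied.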
  • 14. Attribute-Based Access Control (ABAC) Represent fine-grained or dynamic permissions based on who the user is and their relationship to the resource they want to access. ■ User relationship to the resource can be expressed as a JOIN on user attributes and values of a resource column
  • 15. Access Control Dimensions in SQL Analytics
  • 16. Access Control Dimensions
      Table: A user can access sales data, but not financial data.
      Row: A user can access a particular sales opportunity, or a sales opportunity matching certain conditions.
      Column: A user can access only certain fields of a record, and we can mask the values of a column depending on the user trying to access it.
  • 17. Me, just now: “You’re going to need a framework to manage all of these access controls across your Enterprise.”
  • 18. Requirements for Enterprise-Grade Access Controls Framework
      Table-level policies: individuals can be granted access to query tables and views by virtue of:
        ● membership in a group (role-based)
        ● possession of an attribute (attribute-based)
        ● request and approval by an admin
        ● public access
        ● individual user selection
        ● access for a specified period of time
        ● access only for a specific purpose
      Row-level policies: individuals can be allowed to see rows in a dataset based on:
        ● membership in a group whose name matches the value in a column
        ● possession of an attribute whose value matches the value in a column
        ● a filter on a time column, so users are entitled to query only rows that meet a specific recency requirement
      Column-level policies: different users see different values in specific columns by virtue of the roles, attributes, and purposes discussed above; examples include:
        ● Masking a column to NULL
        ● Masking a column using hashing
        ● Masking a column to a constant string
        ● Other advanced privacy-enhancing technologies (PETs) and differential privacy
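The first three column-level masking options named on this slide can be sketched as plain functions. This is only an illustration: the function names and the `REDACTED` placeholder are assumptions, and SHA-256 is one possible choice of hash, since the slides do not name an algorithm.

```python
import hashlib


def mask_null(value):
    """Mask a value to NULL (None in Python), discarding it entirely."""
    return None


def mask_hash(value):
    """Mask a value with an unsalted SHA-256 digest (assumed algorithm).

    Equal inputs give equal digests, so joins and group-bys on the masked
    column still work, but low-cardinality values can be guessed.
    """
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def mask_constant(value, replacement="REDACTED"):
    """Mask a value to a constant string (placeholder text is hypothetical)."""
    return replacement
```

The three differ in how much utility survives: NULL and constant masking remove all information, while hashing keeps equality comparisons intact.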
  • 19. Framework for Managing Table-level Access Controls
      RBAC: Users who are part of the Active Directory group called finance are allowed to read profit and loss data. Provided we’ve kept our groups in sync between our corporate directory and Databricks, using either the Admin Console, Groups API, or SCIM API, we can solve this requirement simply with:
        GRANT SELECT ON TABLE accounting.profit_loss_statement TO finance;
      ABAC: Users with the attribute executive are allowed to read sales data. This one is a bit more complex. First, we need to store a (user, name, value) triple in some sort of attributes table. Next, we need to create a secure view on top of the original table, since a GRANT accepts only a user or group as its principal, not a WHERE clause.
  • 20. Solving for ABAC in our Framework Users with the attribute executive are allowed to read sales data.
  • 21. Solving for ABAC in our Framework Restrict the user to only be able to view their own personal attributes.
  • 22. Solving for ABAC in our Framework Putting it all together. Users with the attribute executive are allowed to read sales data.
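The code shown on slides 20 to 22 did not survive in this transcript. The following is a rough local sketch of the approach they describe, using SQLite from Python rather than Databricks. All table, view, and user names except `fct_sales` are assumptions, and the one-row `session_context` table stands in for a built-in such as Databricks’ `current_user()`, which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- (user, name, value) triples, as described on slide 19.
CREATE TABLE user_attributes (user_name TEXT, name TEXT, value TEXT);
INSERT INTO user_attributes VALUES
  ('alice@example.com', 'executive', 'true'),
  ('bob@example.com', 'territory', 'US-EAST');

CREATE TABLE fct_sales (sale_id INTEGER, amount INTEGER, territory TEXT);
INSERT INTO fct_sales VALUES (1, 1000000, 'US-EAST'), (3, 175000, 'EU');

-- Hypothetical one-row table holding the identity of the querying user.
CREATE TABLE session_context (user_name TEXT);

-- Slide 21: a user can only view their own attributes.
CREATE VIEW my_attributes AS
  SELECT name, value FROM user_attributes
  WHERE user_name = (SELECT user_name FROM session_context);

-- Slides 20 and 22: sales rows are returned only to executives.
CREATE VIEW sec_fct_sales AS
  SELECT * FROM fct_sales
  WHERE EXISTS (SELECT 1 FROM my_attributes WHERE name = 'executive');
""")


def query_as(user, sql):
    """Run a query with session_context set to the given user."""
    conn.execute("DELETE FROM session_context")
    conn.execute("INSERT INTO session_context VALUES (?)", (user,))
    return conn.execute(sql).fetchall()
```

In a real deployment the view, not the base table, would be what users are granted SELECT on, so the attribute check cannot be bypassed.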
  • 23. Managing Row-level Access Controls A user can access a particular sales opportunity, or a sales opportunity matching certain conditions. ■ Let’s consider a sales dataset that has a territory column, and we only want users with the attribute territory to be able to see rows with the corresponding value in the territory column
  • 24. fct_sales
      sale_id  amount   territory
      1        1000000  US-EAST
      2        150000   US-EAST
      3        175000   EU
      4        800000   APAC
      5        50000    US-WEST
      6        75000    US-CENTRAL
      7        50000    US-EAST
  • 25. Row-level ABAC A user can access a particular sales opportunity, or a sales opportunity matching certain conditions.
  • 26. Row-level ABAC A user can access a particular sales opportunity, or a sales opportunity matching certain conditions.
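The row-level rule can be expressed as a JOIN between the user’s attributes and the territory column. The sketch below runs that JOIN in SQLite from Python using the fct_sales rows from slide 24. The `user_attributes` table, its column names, and the example users are assumptions, and the `?` parameter takes the place of a current-user function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fct_sales (sale_id INTEGER, amount INTEGER, territory TEXT);
INSERT INTO fct_sales VALUES
  (1, 1000000, 'US-EAST'), (2, 150000, 'US-EAST'), (3, 175000, 'EU'),
  (4, 800000, 'APAC'), (5, 50000, 'US-WEST'), (6, 75000, 'US-CENTRAL'),
  (7, 50000, 'US-EAST');

-- Hypothetical attribute assignments.
CREATE TABLE user_attributes (user_name TEXT, name TEXT, value TEXT);
INSERT INTO user_attributes VALUES
  ('bob@example.com', 'territory', 'US-EAST'),
  ('carol@example.com', 'territory', 'EU'),
  ('carol@example.com', 'territory', 'APAC');
""")

# A sale is visible when the user holds a territory attribute whose value
# equals the row's territory column.
ROW_POLICY = """
SELECT s.sale_id, s.amount, s.territory
FROM fct_sales AS s
JOIN user_attributes AS a
  ON a.name = 'territory' AND a.value = s.territory
WHERE a.user_name = ?
ORDER BY s.sale_id
"""


def visible_sales(user):
    """Return the fct_sales rows the given user is entitled to see."""
    return conn.execute(ROW_POLICY, (user,)).fetchall()
```

For a user whose only territory is US-EAST, `visible_sales` returns sales 1, 2, and 7. A user with no territory attribute gets no rows at all, because the inner JOIN finds nothing to match.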
  • 27. sec_fct_sales
      visible  sale_id  amount   territory
      YES      1        1000000  US-EAST
      YES      2        150000   US-EAST
      NO       3        175000   EU
      NO       4        800000   APAC
      NO       5        50000    US-WEST
      NO       6        75000    US-CENTRAL
      YES      7        50000    US-EAST
  • 28. Column-level Masking Only executives can see the amount of a sale.
  • 29. sec_fct_sales
      visible  sale_id  amount   territory
      YES      1        1000000  US-EAST
      YES      2        150000   US-EAST
      NO       3        175000   EU
      NO       4        800000   APAC
      NO       5        50000    US-WEST
      NO       6        75000    US-CENTRAL
      YES      7        50000    US-EAST
  • 30. sec_fct_sales (for user without the executive attribute)
      visible  sale_id  amount  territory
      YES      1        NULL    US-EAST
      YES      2        NULL    US-EAST
      NO       3        NULL    EU
      NO       4        NULL    APAC
      NO       5        NULL    US-WEST
      NO       6        NULL    US-CENTRAL
      YES      7        NULL    US-EAST
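Masking the amount column for non-executives can be sketched with a CASE expression. As with the other examples, this is a local SQLite model run from Python, not the talk’s own code: the `user_attributes` table and the users are assumptions, and the `?` parameter stands in for the current user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fct_sales (sale_id INTEGER, amount INTEGER, territory TEXT);
INSERT INTO fct_sales VALUES (1, 1000000, 'US-EAST'), (3, 175000, 'EU');

-- Hypothetical attribute assignments.
CREATE TABLE user_attributes (user_name TEXT, name TEXT, value TEXT);
INSERT INTO user_attributes VALUES ('alice@example.com', 'executive', 'true');
""")

# CASE with no ELSE branch yields NULL, so amount is NULL for any user
# who lacks the executive attribute.
MASK_POLICY = """
SELECT s.sale_id,
       CASE WHEN EXISTS (
              SELECT 1 FROM user_attributes
              WHERE user_name = ? AND name = 'executive')
            THEN s.amount END AS amount,
       s.territory
FROM fct_sales AS s
ORDER BY s.sale_id
"""


def masked_sales(user):
    """Return fct_sales with amount masked to NULL for non-executives."""
    return conn.execute(MASK_POLICY, (user,)).fetchall()
```

Every row is still returned; only the amount value changes. A row-level policy would be layered on top of this to hide rows as well.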
  • 31. Thanks for coming to my talk. My name is Zachary and I’m a product manager at Immuta, which provides an Enterprise-grade access controls platform to Data teams just like this. AMA! Thank You!
  • 32. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.