SECURITY UPDATES:
More Seamless Access Controls with
Apache Spark and Apache Ranger
Dongjoon Hyun @ Hortonworks Spark Team
Jason Dere @ Hortonworks Hive Team
June 2017
3 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Agenda
Security Issues
Goals
Components
How it works
Demo
Background – Security
▪ One of the fundamental features for enterprise adoption
– Multi-tenancy: Billing team / Data science team / Marketing team
▪ Row- and column-level access control for SQL users
– Row filtering
– Column masking
▪ Shared policies must be enforced across multiple SQL engines simultaneously
– E.g., Apache Spark 2.1/1.6 and Apache Hive 2.1
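To make the two policy types concrete, here is a minimal sketch in plain Python. It is not Ranger's API: the sample rows, the `state == "CA"` filter, and the `mask_name` rule are all illustrative assumptions.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

def mask_name(value: str) -> str:
    """Hypothetical masking rule: keep the first character, hide the rest."""
    return value[:1] + "*" * (len(value) - 1)

def apply_policies(rows: List[Row],
                   row_filter: Callable[[Row], bool],
                   column_masks: Dict[str, Callable[[str], str]]) -> List[Row]:
    """Row filtering drops whole rows; column masking rewrites the values of a column."""
    visible = []
    for row in rows:
        if not row_filter(row):
            continue  # this user is not allowed to see the row at all
        visible.append({col: column_masks.get(col, lambda v: v)(val)
                        for col, val in row.items()})
    return visible

# Made-up sample data for illustration only.
users = [
    {"name": "Alice", "state": "CA"},
    {"name": "Bob", "state": "NY"},
]
result = apply_policies(users, lambda r: r["state"] == "CA", {"name": mask_name})
print(result)
```

The same query returns different data per user because the filter and masks come from the policy, not from the query text.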
Issue 1
▪ Spark reads all or nothing
▪ Directory/file-based permissions are insufficient for fine-grained access control
Apache Spark is a general data processing engine
scala> val textFile = sc.textFile("/apps/hive/warehouse/…")
textFile: org.apache.spark.rdd.RDD[String] = …
Issue 2
▪ Permission 777 on warehouse?
Security starts from storage
Bad
Good
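As a small illustration of why mode 777 is the "bad" case, the sketch below checks the permission bits granted to users outside the owner and group. The restrictive mode `0o700` is an assumed example of a "good" setting, not a value taken from the slide.

```python
import stat

def others_can_access(mode: int) -> bool:
    """True if users outside the owner and group have any permission bit set."""
    return bool(mode & stat.S_IRWXO)

# 777 lets every user read, write, and list the warehouse directory.
print(others_can_access(0o777))
# An assumed restrictive mode: only the owning service user has access.
print(others_can_access(0o700))
```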
Issue 3
▪ New policies for SparkSQL?
▪ Rewrite Spark apps?
– Special data source tables
▪ Duplicated data maintained manually
– Filtered rows
– Removed or masked columns
Overhead in setting up and maintaining security policies
Goal 1: Spark SQL Apps
Support row/column-level security with the batch apps
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .enableHiveSupport() \
    .getOrCreate()
spark.sql("SELECT * FROM db_common.t_customer").show()
db_common
t_customer
…
Goal 2: Spark shells (1/2)
Support row/column-level security in all shells
spark-shell
pyspark
Goal 2: Spark shells (2/2)
Support row/column-level security in all shells
sparkR
spark-sql
Goal 3: Spark Thrift Server
Support row/column-level security with Spark Thrift Server
Login as `billing`
Login as `datascience`
What is required?
▪ Apache Ranger
▪ Apache Hive with LLAP
▪ Spark-LLAP (Apache License)
– A library and patches that integrate the above technologies with SparkSQL
Apache Ranger
Provide a standard authorization method across many Hadoop components
https://hortonworks.com/apache/ranger/#section_2
Ranger Policies – Column Access
Ranger Policies – Column Masking
Ranger Policies – Row Filtering
Apache Hive with LLAP
[Diagram: Client App, HiveServer2, Hive Query Coordinator, and LLAP Daemons in a YARN cluster]
SQL Query: select name from users
Plan Generation: TableScan: users; Projection: name
1. Client sends query to HiveServer2.
2. Query plan generation by HiveServer2.
3. Query plan sent to query coordinator.
4. Query plan sent to LLAP daemons for execution.
5. Results consolidated and sent to client.
Hive Security with Ranger
▪ Seamless integration with Ranger user-level access policies
– Column/row-based security policies are applied automatically
– Hive query plans are rewritten to apply masking/filtering functions on top of the base table data
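The plan rewriting described above can be sketched as follows. This is a simplified illustration, not HiveServer2's planner: the `Policy` class and `rewrite_plan` function are hypothetical names, and plans are modeled as plain lists of operator strings in the style of the deck's `select name from users` example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Policy:
    """Hypothetical per-user policy for one table."""
    row_filter: Optional[str] = None            # e.g. "state = 'CA'"
    masked_columns: frozenset = frozenset()     # columns wrapped in mask(...)

def rewrite_plan(table: str, columns: List[str], policy: Policy) -> List[str]:
    """Build a scan plan, inserting the policy's filter and masks on top of the base table."""
    plan = [f"TableScan: {table}"]
    if policy.row_filter:
        plan.append(f"Filter: {policy.row_filter}")
    projected = [f"mask({c})" if c in policy.masked_columns else c for c in columns]
    plan.append("Projection: " + ", ".join(projected))
    return plan

# Same query, two users: one without restrictions, one with a filter and a mask.
unrestricted = rewrite_plan("users", ["name"], Policy())
restricted = rewrite_plan("users", ["name"],
                          Policy(row_filter="state = 'CA'",
                                 masked_columns=frozenset({"name"})))
print(unrestricted)
print(restricted)
```

Because the rewrite happens during plan generation, the user's SQL text never changes; only the plan that reaches the executors does.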
HiveServer2 + LLAP + Ranger
[Diagram: Client App, HiveServer2, Hive Query Coordinator, and LLAP Daemons in a YARN cluster, with Ranger supplying dynamic policies to HiveServer2]
SQL Query: select name from users
Plan Generation: TableScan: users; Filter: state = 'CA'; Projection: mask(name)
1. Client sends query to HiveServer2.
2. Query plan generation by HiveServer2. Ranger security policies applied. Plan modified based on dynamic security policies.
3. Query plan sent to query coordinator.
4. Query plan sent to LLAP daemons for execution. Filtering/masking performed.
5. Results consolidated and sent to client.
External LLAP Client
▪ LLAP Daemon
– Persistent daemons combining query execution and in-memory caching
– External applications can also use LLAP to retrieve data
• Provide a secure relational datanode view of the data
External LLAP Client
[Diagram: a Client App using the LLAP InputFormat talks to HiveServer2 (steps 1 to 3) and to the LLAP Daemons (step 4) in a YARN cluster]
SQL Query: select name from users
Plan Generation: TableScan: users; Projection: name
1. Client requests data locations, known as "splits", from HiveServer2.
2. Query plan generation by HiveServer2.
3. Splits returned to client, including the signed query plan.
4. Client uses the LLAP splits to securely submit the query plan to LLAP. Data returned to client.
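The point of a signed query plan is that the client cannot alter the plan between steps 3 and 4. The deck does not say how LLAP signs plans, so the sketch below swaps in a generic HMAC check purely as an illustration; the shared secret and the plan encoding are assumptions.

```python
import hashlib
import hmac

def sign_plan(plan: bytes, secret: bytes) -> bytes:
    """Signer side (HiveServer2 in the deck's flow): compute a MAC over the plan."""
    return hmac.new(secret, plan, hashlib.sha256).digest()

def daemon_accepts(plan: bytes, signature: bytes, secret: bytes) -> bool:
    """Verifier side (an LLAP daemon in the deck's flow): reject plans whose MAC does not match."""
    return hmac.compare_digest(sign_plan(plan, secret), signature)

secret = b"known-only-to-server-side-components"   # hypothetical secret
plan = b"TableScan: users; Projection: name"
signature = sign_plan(plan, secret)

# The client forwards the plan unchanged: accepted.
print(daemon_accepts(plan, signature, secret))
# The client tries to add a column it was not granted: rejected.
tampered = b"TableScan: users; Projection: name, ssn"
print(daemon_accepts(tampered, signature, secret))
```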
External LLAP Client + Ranger
[Diagram: a Client App using the LLAP InputFormat talks to HiveServer2 (steps 1 to 3) and to the LLAP Daemons (step 4) in a YARN cluster, with Ranger supplying dynamic policies to HiveServer2]
SQL Query: select name from users
Plan Generation: TableScan: users; Filter: state = 'CA'; Projection: mask(name)
1. Client requests data locations, known as "splits", from HiveServer2.
2. Query plan generation by HiveServer2. Ranger security policies applied. Plan modified based on dynamic security policies.
3. Splits returned to client, including the signed query plan.
4. Client uses the LLAP splits to securely submit the query plan to LLAP. Filtering/masking performed. Data returned to client.
Spark-LLAP
▪ Spark connector library + patches on top of Spark
▪ Table data read securely through LLAP
▪ Leverages standard Ranger policies to control per-user access/masking/filtering of data
Spark-LLAP: Credentials
▪ HDFS Delegation Token (existing)
– HDFSCredentialProvider gets it from the NameNode
▪ Hive Metastore Delegation Token (existing)
– HiveCredentialProvider gets it from Hive Metastore
▪ HiveServer2 Delegation Token (Spark-LLAP)
– HiveServer2CredentialProvider gets it from HiveServer2
Get and renew delegation tokens
Spark-LLAP: LlapMetastoreCatalog
LlapMetastoreCatalog: Replaces MetastoreRelation with LlapRelation
SELECT gender, count(*)
FROM db_common.t_customer
WHERE name LIKE '%Obama'
GROUP BY gender
Parsed Logical Plan
Aggregate: gender
  Filter: name like %Obama
    UnresolvedRelation
Analyzed Logical Plan
Aggregate: gender
  Filter: name like %Obama
    SubqueryAlias
      LlapRelation
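The effect of the catalog swap can be pictured as a substitution at the leaf of the analyzed plan. The sketch below is a simplified model in plain Python, not Spark's analyzer: `Node`, `replace_relation`, and `render` are hypothetical helpers, and the plan is reduced to a single chain of operators.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Node:
    """One operator in a linear logical plan (hypothetical model)."""
    name: str
    child: Optional["Node"] = None

def replace_relation(plan: Node, old: str, new: str) -> Node:
    """Return a copy of the plan in which every node named `old` is named `new`."""
    child = replace_relation(plan.child, old, new) if plan.child else None
    return Node(new if plan.name == old else plan.name, child)

def render(plan: Optional[Node]) -> List[str]:
    """List operator names from the root down to the leaf."""
    names = []
    while plan is not None:
        names.append(plan.name)
        plan = plan.child
    return names

# Analyzed plan as it would look without Spark-LLAP (leaf is MetastoreRelation).
without_llap = Node("Aggregate", Node("Filter", Node("SubqueryAlias", Node("MetastoreRelation"))))
with_llap = replace_relation(without_llap, "MetastoreRelation", "LlapRelation")
print(render(without_llap))
print(render(with_llap))
```

Only the leaf changes, which is why existing queries need no rewriting: everything above the relation is planned as before.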
Spark-LLAP: LlapMetastoreCatalog
LlapMetastoreCatalog: Replaces MetastoreRelation with LlapRelation
Without Spark-LLAP
With Spark-LLAP
Spark-LLAP: LlapRelation
Uses LLAP external client API to read table data
[Diagram: LlapRelation, through the LLAP InputFormat, talks to HiveServer2 (steps 1 to 3) and to the LLAP Daemons (step 4) in a YARN cluster, with Ranger supplying dynamic policies to HiveServer2]
SQL Query: select name from users
Plan Generation: TableScan: users; Filter: state = 'CA'; Projection: mask(name)
Spark-LLAP: LlapRelation
LlapRelation supports predicate pushdown and column pruning
Analyzed Logical Plan
Aggregate: gender
  Filter: name like %Obama
    SubqueryAlias
      LlapRelation
Optimized Logical Plan
Aggregate: gender
  Project: gender
    Filter: EndsWith(name,Obama)
      LlapRelation
Physical Plan
…
  HashAggregate: gender
    Project: gender
      Filter: EndsWith(name, Obama)
        Scan LlapRelation
          PushedFilter: StringEndsWith(…)
          ReadSchema: gender
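The plans above show `name LIKE '%Obama'` becoming `EndsWith(name, Obama)`, a simple predicate that a data source can evaluate itself. A minimal sketch of that kind of LIKE simplification follows; `simplify_like` is a hypothetical helper written for illustration, not Spark's optimizer rule.

```python
from typing import Optional, Tuple

def simplify_like(column: str, pattern: str) -> Optional[Tuple[str, str, str]]:
    """Turn simple LIKE patterns into cheaper string predicates.

    Returns (predicate, column, literal), or None when the pattern
    still needs full LIKE matching.
    """
    if "\\" in pattern:
        return None  # escaped wildcards are not handled in this sketch
    body = pattern.strip("%")
    if not body or "%" in body or "_" in body:
        return None  # wildcards in the middle cannot be simplified
    leading, trailing = pattern.startswith("%"), pattern.endswith("%")
    if leading and trailing:
        return ("Contains", column, body)
    if leading:
        return ("EndsWith", column, body)
    if trailing:
        return ("StartsWith", column, body)
    return ("EqualTo", column, body)

print(simplify_like("name", "%Obama"))
print(simplify_like("name", "O%a"))
```

Once the filter is in this form, it can be handed to the relation as a pushed filter, and only the columns the query references need to be read.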
Using Spark-LLAP
Launch Spark jobs
spark-submit \
  --jars spark-llap.jar \
  --conf spark.sql.hive.llap=true \
  --conf spark.yarn.security.credentials.hiveserver2.enabled=true \
  --master yarn \
  --deploy-mode cluster \
  sql.py
– The `--packages` option is supported, too
– Easy to turn on/off (spark.sql.hive.llap)
– The hiveserver2 credentials setting is only used for YARN cluster mode
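For jobs launched from a script, the command above can be assembled programmatically. The helper below is a hypothetical sketch: the function name is made up, passing the connector with `--jars` is an assumption of this example, and toggling both settings with one flag is a simplification.

```python
from typing import List

def spark_llap_submit_args(app: str, jar: str, enabled: bool = True) -> List[str]:
    """Build a spark-submit argument list with Spark-LLAP switched on or off."""
    flag = "true" if enabled else "false"
    return [
        "spark-submit",
        "--jars", jar,                       # assumed way of shipping the connector
        "--conf", f"spark.sql.hive.llap={flag}",
        "--conf", f"spark.yarn.security.credentials.hiveserver2.enabled={flag}",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        app,
    ]

args = spark_llap_submit_args("sql.py", "spark-llap.jar")
print(" ".join(args))
```

On a host with Spark installed, the list could be passed to `subprocess.run(args, check=True)`; the application script itself stays unchanged whether the feature is on or off.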
Milestone
HDP 2.5.X: Spark-LLAP for Spark 1.6 (TP)
• Use Ranger for SELECT statement
• Use LlapContext
HDP 2.6.0: Spark-LLAP for Spark 2.1.0 (TP)
• Use Ranger for more statements (in STS)
• No need to rewrite code
• Support all languages and shells
HDP 2.6.1: Spark-LLAP for Spark 2.1.1 (TP)
• Support YARN cluster mode
• Support Hive complex types
HDP X.X.X: Spark-LLAP for Spark 2.2.0
• Available soon on GitHub
Resources
▪ GitHub
– https://github.com/hortonworks-spark/spark-llap
▪ Maven
– http://repo.hortonworks.com/content/groups/public/com/hortonworks/spark/spark-llap_2.11/
▪ YouTube Demo
– https://www.youtube.com/watch?v=_-oYpQGWm5k (HDP 2.6.1)
▪ Hortonworks Blog
– https://hortonworks.com/blog/row-column-level-control-apache-spark/
▪ Hortonworks Community Connection Article
– https://community.hortonworks.com/articles/101181/rowcolumn-level-security-in-sql-for-apache-spark-2.html
▪ Support Matrix
– https://github.com/hortonworks-spark/spark-llap/wiki/7.-Support-Matrix
Summary
▪ Support row/column-level security with
– Spark apps in YARN client/cluster mode
– Spark shells
– Spark Thrift Server
▪ You can use existing Spark 2.X SQL apps and scripts
▪ Easy to turn on/off through configuration alone
▪ Ranger enforces policies on Hive and Spark simultaneously and consistently
Spark-LLAP with HDP 2.6.1 is TP
Acknowledgement
▪ Apache Hive / Apache Spark / Apache Ranger Community
▪ Bikas Saha, Mingjie Tang, Saisai Shao, Siddharth Seth, Sergey Shelukhin, Thejas Nair, Zhan Zhang, and many others
Thank you

 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Security Updates: More Seamless Access Controls with Apache Spark and Apache Ranger

  • 1. SECURITY UPDATES: More Seamless Access Controls with Apache Spark and Apache Ranger. Dongjoon Hyun @ Hortonworks Spark Team, Jason Dere @ Hortonworks Hive Team. June 2017
  • 2. SECURITY UPDATES: More Seamless Access Controls with Apache Spark and Apache Ranger Dongjoon Hyun
  • 3. © Hortonworks Inc. 2011 – 2017. All Rights Reserved. Agenda: Security Issues, Goals, Components, How it works, Demo
  • 4. Background – Security: One of the fundamental features for enterprise adoption (multi-tenancy: Billing team / Data science team / Marketing teams). Row- and column-level access control for SQL users (row filtering, column masking). Shared policies must be enforced across multiple SQL engines simultaneously, e.g. Apache Spark 2.1/1.6 and Apache Hive 2.1.
  • 5. Issue 1: Apache Spark is a general data processing engine. Spark reads all or nothing. Directory/file-based permissions are insufficient for fine-grained access control. scala> val textFile = sc.textFile("/apps/hive/warehouse/…") textFile: org.apache.spark.rdd.RDD[String] = …
  • 6. Issue 2: Security starts from storage. Permission 777 on warehouse? [Figure labels: Bad, Good]
  • 7. Issue 3: Overhead when setting up and maintaining security policies. New policies for SparkSQL? Rewrite Spark apps (special data source tables)? Duplicated data maintained manually (filtered rows; removed or masked columns).
  • 8. Agenda: Security Issues, Goals, Components, How it works, Demo
  • 9. Goal 1: Spark SQL Apps. Support row/column-level security with batch apps. from pyspark.sql import SparkSession; spark = SparkSession.builder.enableHiveSupport().getOrCreate(); spark.sql("SELECT * FROM db_common.t_customer").show() [Figure labels: db_common, t_customer, …]
  • 10. Goal 2: Spark shells (1/2). Support row/column-level security in all shells: spark-shell, pyspark.
  • 11. Goal 2: Spark shells (2/2). Support row/column-level security in all shells: sparkR, spark-sql.
  • 12. Goal 3: Spark Thrift Server. Support row/column-level security with Spark Thrift Server. [Screenshots: login as `billing`; login as `datascience`]
  • 13. SECURITY UPDATES: More Seamless Access Controls with Apache Spark and Apache Ranger Jason Dere @ Hortonworks Hive Team
  • 14. Agenda: Security Issues, Goals, Components, How it works, Demo
  • 15. What is required? Apache Ranger; Apache Hive with LLAP; Spark-LLAP (Apache License), a library and patches to integrate the above technologies with SparkSQL.
  • 16. Apache Ranger: Provides a standard authorization method across many Hadoop components. https://hortonworks.com/apache/ranger/#section_2
  • 17. Ranger Policies – Column Access
  • 18. Ranger Policies – Column Masking
  • 19. Ranger Policies – Row Filtering
  • 20. Apache Hive with LLAP (diagram: Client App, HiveServer2, Hive Query Coordinator, and LLAP daemons on a YARN cluster). SQL query: select name from users. Generated plan: TableScan: users; Projection: name. Steps: 1. Client sends query to HiveServer2. 2. Query plan generation by HiveServer2. 3. Query plan sent to query coordinator. 4. Query plan sent to LLAP daemons for execution. 5. Results consolidated and sent to client.
  • 21. Hive Security with Ranger: Seamless integration with Ranger user-level access policies. Column/row-based security policies are applied automatically. Hive query plans are rewritten to apply masking/filtering functions on top of the base table data.
  • 22. HiveServer2 + LLAP (diagram: Client App, HiveServer2, Hive Query Coordinator, and LLAP daemons on a YARN cluster). SQL query: select name from users. Generated plan: TableScan: users; Projection: name. Steps: 1. Client sends query to HiveServer2. 2. Query plan generation by HiveServer2. 3. Query plan sent to query coordinator. 4. Query plan sent to LLAP daemons for execution. 5. Results consolidated and sent to client.
  • 23. HiveServer2 + LLAP + Ranger (diagram: Client App, HiveServer2, Ranger dynamic policies, Hive Query Coordinator, and LLAP daemons on a YARN cluster). SQL query: select name from users. Generated plan: TableScan: users; Filter: state = 'CA'; Projection: mask(name). Steps: 1. Client sends query to HiveServer2. 2. Query plan generation by HiveServer2; Ranger security policies are applied and the plan is modified based on dynamic security policies. 3. Query plan sent to query coordinator. 4. Query plan sent to LLAP daemons for execution; filtering/masking performed. 5. Results consolidated and sent to client.
  • 24. External LLAP Client: LLAP daemons are persistent daemons combining query execution and in-memory caching. External applications are also able to use LLAP to retrieve data, which provides a secure relational datanode view of the data.
  • 25. External LLAP Client (diagram: Client App with LLAP InputFormat, HiveServer2, Hive Query Coordinator, and LLAP daemons on a YARN cluster). SQL query: select name from users. Generated plan: TableScan: users; Projection: name. Steps: 1. Client requests data locations known as "splits" from HiveServer2. 2. Query plan generation by HiveServer2. 3. Splits returned to client, which include the signed query plan. 4. LLAP splits used by client to securely submit query plan to LLAP; data returned to client.
  • 26. External LLAP Client + Ranger (diagram: Client App with LLAP InputFormat, HiveServer2, Ranger dynamic policies, Hive Query Coordinator, and LLAP daemons on a YARN cluster). SQL query: select name from users. Generated plan: TableScan: users; Filter: state = 'CA'; Projection: mask(name). Steps: 1. Client requests data locations known as "splits" from HiveServer2. 2. Query plan generation by HiveServer2; Ranger security policies are applied and the plan is modified based on dynamic security policies. 3. Splits returned to client, which include the signed query plan. 4. LLAP splits used by client to securely submit query plan to LLAP; filtering/masking performed; data returned to client.
  • 27. Agenda: Security Issues, Goals, Components, How it works, Demo
  • 28. Spark-LLAP: Spark connector library + patches on top of Spark. Table data is read securely through LLAP. Leverages standard Ranger policies to control per-user access/masking/filtering of data.
  • 29. Spark-LLAP: Credentials. Get and renew delegation tokens. HDFS Delegation Token: HDFSCredentialProvider gets it from namenode. Hive Metastore Delegation Token: HiveCredentialProvider gets it from Hive Metastore. HiveServer2 Delegation Token: HiveServer2CredentialProvider gets it from HiveServer2. [Figure legend: Spark-LLAP, Existing]
  • 30. Spark-LLAP: LlapMetastoreCatalog. LlapMetastoreCatalog replaces MetastoreRelation with LlapRelation. Query: SELECT gender, count(*) FROM db_common.t_customer WHERE name LIKE '%Obama' GROUP BY gender. Parsed Logical Plan (top to bottom): Aggregate: gender; Filter: name like %Obama; UnresolvedRelation. Analyzed Logical Plan (top to bottom): Aggregate: gender; Filter: name like %Obama; SubqueryAlias; LlapRelation.
  • 31. Spark-LLAP: LlapMetastoreCatalog. LlapMetastoreCatalog replaces MetastoreRelation with LlapRelation. [Screenshots: without Spark-LLAP; with Spark-LLAP]
  • 32. Spark-LLAP: LlapRelation. Uses the LLAP external client API to read table data (diagram with numbered steps 1 to 4: LlapRelation with LLAP InputFormat, HiveServer2, Ranger dynamic policies, Hive Query Coordinator, and LLAP daemons on a YARN cluster). SQL query: select name from users. Generated plan: TableScan: users; Filter: state = 'CA'; Projection: mask(name).
  • 33. Spark-LLAP: LlapRelation. LlapRelation supports predicate pushdown and column pruning. Analyzed Logical Plan (top to bottom): Aggregate: gender; Filter: name like %Obama; SubqueryAlias; LlapRelation. Optimized Logical Plan (top to bottom): Aggregate: gender; Project: gender; Filter: EndsWith(name, Obama); LlapRelation. Physical Plan (top to bottom): HashAggregate: gender …; Project: gender; Filter: EndsWith(name, Obama); Scan LlapRelation (PushedFilter: StringEndsWith(…), ReadSchema: gender).
  • 34. Using Spark-LLAP: Launch Spark jobs with spark-submit --jars spark-llap.jar --conf spark.sql.hive.llap=true --conf spark.yarn.security.credentials.hiveserver2.enabled=true --master yarn --deploy-mode cluster sql.py. Slide callouts: the `--packages` option is supported, too; easy to turn on/off; only used for YARN cluster mode.
  • 35. Agenda: Security Issues, Goals, Components, How it works, Demo
  • 36. Milestone. HDP 2.5.X: Spark-LLAP for Spark 1.6 (TP): use Ranger for SELECT statements; use LlapContext. HDP 2.6.0: Spark-LLAP for Spark 2.1.0 (TP): use Ranger for more statements (in STS); no need to rewrite code; support all languages and shells. HDP 2.6.1: Spark-LLAP for Spark 2.1.1 (TP): support YARN cluster mode; support Hive complex types. HDP X.X.X: Spark-LLAP for Spark 2.2.0: available soon on GitHub.
  • 37. Resources. GitHub: https://github.com/hortonworks-spark/spark-llap. Maven: http://repo.hortonworks.com/content/groups/public/com/hortonworks/spark/spark-llap_2.11/. YouTube demo (HDP 2.6.1): https://www.youtube.com/watch?v=_-oYpQGWm5k. Hortonworks Blog: https://hortonworks.com/blog/row-column-level-control-apache-spark/. Hortonworks Community Connection article: https://community.hortonworks.com/articles/101181/rowcolumn-level-security-in-sql-for-apache-spark-2.html. Support Matrix: https://github.com/hortonworks-spark/spark-llap/wiki/7.-Support-Matrix
  • 38. Summary: Row/column-level security is supported with Spark apps in YARN client/cluster mode, Spark shells, and Spark Thrift Server. You can use existing Spark 2.X SQL apps and scripts. Easy to turn on/off with configuration only. Ranger enforces policies for Hive and Spark simultaneously and consistently. Spark-LLAP with HDP 2.6.1 is TP.
  • 39. Acknowledgement: Apache Hive / Apache Spark / Apache Ranger Community. Bikas Saha, Mingjie Tang, Saisai Shao, Siddharth Seth, Sergey Shelukhin, Thejas Nair, Zhan Zhang, and many others.
  • 40. Thank you
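
Editor's illustration (not part of the original deck): slides 23 and 26 describe a plan for "select name from users" being rewritten into TableScan, Filter: state = 'CA', Projection: mask(name). The sketch below mimics that effect in plain Python so the behavior is easy to see. It does not use Ranger, Hive, or Spark-LLAP APIs. The Policy class, the POLICIES table, secure_select, the sample rows, and the choice of which user gets which policy are all hypothetical, and the mask function is only assumed to resemble a character-class masking function (uppercase to X, lowercase to x, digits to n).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]


def mask(value: str) -> str:
    """Hypothetical character-class mask: upper -> X, lower -> x, digit -> n."""
    out = []
    for ch in value:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("n")
        else:
            out.append(ch)  # keep spaces and punctuation as-is
    return "".join(out)


@dataclass
class Policy:
    """Hypothetical per-user policy: an optional row filter plus masked columns."""
    row_filter: Optional[Callable[[Row], bool]] = None
    masked_columns: Tuple[str, ...] = ()


# Assumed policy assignment, for illustration only.
POLICIES: Dict[str, Policy] = {
    "billing": Policy(),  # sees all rows, unmasked
    "datascience": Policy(
        row_filter=lambda r: r["state"] == "CA",  # Filter: state = 'CA'
        masked_columns=("name",),                 # Projection: mask(name)
    ),
}


def secure_select(user: str, rows: List[Row], columns: List[str]) -> List[Row]:
    """Apply the user's row filter, then project and mask the requested columns."""
    policy = POLICIES[user]  # unknown users raise KeyError (deny by default)
    result = []
    for row in rows:
        if policy.row_filter is not None and not policy.row_filter(row):
            continue
        result.append(
            {c: mask(row[c]) if c in policy.masked_columns else row[c]
             for c in columns}
        )
    return result


# Made-up sample data standing in for the "users" table.
USERS: List[Row] = [
    {"name": "Alice Kim", "state": "CA"},
    {"name": "Bob Lee", "state": "NY"},
]

if __name__ == "__main__":
    print(secure_select("billing", USERS, ["name"]))
    print(secure_select("datascience", USERS, ["name"]))
```

The same query text yields different results per user because the filter and mask are attached by the policy layer rather than written into the query, which is the property the deck attributes to the Ranger-modified plan.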