1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Fine-Grained Security
for Spark and Hive
Carter Shanklin - Director PM
Don Bosco Durai - Security Architect
June 29, 2016
Agenda
● Current security options and challenges
● Apache Ranger Overview
● LLAP Overview
● Use Cases and Demo
● Apache Atlas Integration
Current Options and Challenges
⬢ Access control for Spark, Pig and MR is limited to the storage level
⬢ Column-level access is available via HiveServer2
⬢ Row-level filtering needs Hive views
– Multiple Hive views need to be created and managed
– Explicit permissions need to be given for each view/user
– Users need to know which view to use
⬢ Masking needs a custom UDF
– The UDF needs to be wrapped using views
Apache Ranger Overview
Apache Ranger
Authorization
• Centralized platform to define, administer and manage security policies consistently
• Enforce policies within each component
Auditing
• Central audit location for all access requests
• Support multiple destination sources (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support HDFS TDE
• Integration with HSM
Ranger Architecture
[Diagram. Labels: Ranger Administration Portal; Ranger Policy Server; Ranger Audit Server; Integration API; Enterprise Users; Legacy Tools and Data Governance; RDBMS; HDFS. Hadoop components, shown with Ranger Plugin boxes: HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas.]
Ranger Audits - Data Access
Ranger Audits - Admin Actions
LLAP Overview
Hive 2.0 and LLAP
⬢ At a High Level:
– 2000+ features, improvements and bug fixes in Hive since HDP 2.4.
– 600+ of these from outside of Hortonworks.
⬢ Major Improvements:
– Preview: Hive LLAP: Persistent query servers with intelligent in-memory caching.
– ACID GA: Hardened and proven at scale.
– Expanded SQL Compliance: More capable integration with BI tools.
– Performance: Interactive query, 2x faster ETL.
– Security: Row / Column security extending to views, Column level security for Spark.
Hive 2 with LLAP: Architecture Overview
Hive 2 with LLAP: Open Interfaces
Ranger Integration with Hive and LLAP
Hive / LLAP Security Capabilities with Ranger
⬢ Ranger Hive plugin provides authorization / access controls.
⬢ Column Masking:
– Inject Hive UDFs that mask characters or hash values.
– Dynamic, per-user.
⬢ Dynamic Row Filtering:
– Query is analyzed and policies applied.
– Dynamic, per-user.
⬢ All operations run as ordinary SQL queries:
– Masking policies convert to expressions in the SQL SELECT clause.
– Filters convert to predicates in the SQL WHERE clause.
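The idea that masks land in the select list and filters land in the where clause can be pictured with a small sketch. This is only an illustration, not Ranger or Hive code: the rewrite_query helper, the policy dictionary layout and the sample policies for frank and mark are all hypothetical.

```python
# Illustrative only: a toy rewrite that mimics how masking and row-filter
# policies could end up as ordinary SQL. Not Ranger or Hive internals.

def rewrite_query(table, columns, user, policies):
    """Return a SELECT statement with the user's masks and row filter applied."""
    policy = policies.get(user, {})
    masks = policy.get("masks", {})        # column name -> masking UDF name
    row_filter = policy.get("row_filter")  # SQL predicate, or None

    select_items = []
    for col in columns:
        if col in masks:
            # A masking policy becomes an expression in the SELECT list.
            select_items.append(f"{masks[col]}({col}) AS {col}")
        else:
            select_items.append(col)

    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    if row_filter:
        # A row-filter policy becomes a predicate in the WHERE clause.
        sql += f" WHERE {row_filter}"
    return sql


# Hypothetical per-user policies (assumed layout, for illustration).
POLICIES = {
    "frank": {"masks": {"customer_ccn": "mask_hash", "customer_email": "mask"}},
    "mark": {"row_filter": "customer_state = 'CA'"},
}

print(rewrite_query("users", ["customer_id", "customer_ccn"], "frank", POLICIES))
print(rewrite_query("users", ["customer_id", "customer_state"], "mark", POLICIES))
```

The user still submits a plain query; only the statement that actually runs differs per user.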
Native Hive Masking Capabilities
UDF | Purpose | Example Start | Example Result
mask | Convert letters to X/x and numbers to n. | 123 Fake St. | nnn Xxxx Xx.
mask_first_n | Mask only the first n characters. | 433-54-3937 | nnn-54-3937
mask_last_n | Mask only the last n characters. | 433-54-3937 | 433-54-nnnn
mask_show_first_n | Mask, showing only the first n characters. | 555-233-1234 | 555-nnn-nnnn
mask_show_last_n | Mask, showing only the last n characters. | 433-54-3937 | nnn-nn-3937
mask_hash | Produce a consistent hash of the field. | CA | 21f241cccaa5cfa33190f56ff1510e37
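To make the table concrete, here is a plain-Python emulation of the masking behavior it describes. It is a sketch written from the examples in the table, not the Hive UDF source. The values of n are inferred from the example results, and the use of MD5 in mask_hash is an assumption based only on the 32-character example digest.

```python
import hashlib


def _mask_char(ch):
    """Mask one character: uppercase -> X, lowercase -> x, digit -> n."""
    if ch.isupper():
        return "X"
    if ch.islower():
        return "x"
    if ch.isdigit():
        return "n"
    return ch  # spaces and punctuation pass through unchanged


def mask(value):
    return "".join(_mask_char(ch) for ch in value)


def mask_first_n(value, n):
    # Mask the first n characters, leave the rest as-is.
    return mask(value[:n]) + value[n:]


def mask_last_n(value, n):
    # Mask the last n characters, leave the rest as-is.
    split = max(len(value) - n, 0)
    return value[:split] + mask(value[split:])


def mask_show_first_n(value, n):
    # Show the first n characters, mask everything after them.
    return value[:n] + mask(value[n:])


def mask_show_last_n(value, n):
    # Show the last n characters, mask everything before them.
    split = max(len(value) - n, 0)
    return mask(value[:split]) + value[split:]


def mask_hash(value):
    # Assumption: MD5 stands in for whatever digest the real UDF uses.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


print(mask("123 Fake St."))
print(mask_first_n("433-54-3937", 3))
print(mask_last_n("433-54-3937", 4))
print(mask_show_first_n("555-233-1234", 3))
print(mask_show_last_n("433-54-3937", 4))
```

With n chosen as 3, 4, 3 and 4 respectively, the calls above reproduce the Example Result column for the first five rows.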
Delivering Spark Security
Key Features: Spark Column Security with LLAP
⬢ Fine-Grained Column Level Access Control for SparkSQL.
⬢ Fully dynamic policies per user. Doesn’t require views.
⬢ Use Standard Ranger policies and tools to control access and masking policies.
Flow:
1. SparkSQL gets data locations, known as “splits”, from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP. Filtering / masking is guaranteed by the LLAP server.
Example: Per-User Row Filtering by Region in SparkSQL
Use Cases
Demo Setup
⬢ Customer User and Sales data in ORC (metadata in the Metastore)
⬢ Data can be accessed via SparkSQL or HiveServer2
⬢ Marketing needs access to Sales and Users data for analytics
⬢ Fraud Investigation team needs access to data for fraud detection
⬢ Billing team needs access to Sales and Users data for billing
Tables
Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
Sales: customer_id, product_id, promotion_id, cookie_id, tracking_id
Group | Users
Fraud | frank
Marketing | mark
Billing | bill
Use Case 1: Restricting Column Access
This is a simple use case where certain groups or users don’t have permission to view certain columns.
⬢ Billing group has access to all columns in table Users
⬢ Marketing group can’t access the credit card column in table Users
Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
User/Column | customer_phone | customer_ccn
bill (Billing) | 😀 | 😀
mark (Marketing) | 😀 | 😡
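The access matrix above can be sketched as a small authorization check. This is a hypothetical model for illustration only: the deny-set layout is chosen for brevity and is not how Ranger stores policies, and the sketch assumes a query that touches a denied column is rejected as a whole.

```python
# Hypothetical policy model for the Users table (not Ranger's data model).
USER_GROUPS = {"bill": "Billing", "mark": "Marketing", "frank": "Fraud"}

# Columns each group may NOT read; groups not listed are denied nothing.
DENIED_COLUMNS = {"Marketing": {"customer_ccn"}}


def authorize(user, requested_columns):
    """Return (allowed, denied_columns) for a query over the Users table."""
    group = USER_GROUPS[user]
    denied = sorted(set(requested_columns) & DENIED_COLUMNS.get(group, set()))
    # Assumption: one denied column rejects the whole query.
    return (not denied, denied)


print(authorize("bill", ["customer_phone", "customer_ccn"]))
print(authorize("mark", ["customer_phone"]))
print(authorize("mark", ["customer_phone", "customer_ccn"]))
```

The three calls correspond to the cells of the matrix: bill may read both columns, mark may read customer_phone, and mark is refused once customer_ccn is requested.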
Ranger Policy - Restrict Columns
Ranger Policy - Restrict Columns - Results
[Query results shown for bill (Billing) and mark (Marketing)]
Ranger Policy - Restrict Columns - Audit Screen
Use Case 2: Column Masking
In this use case, certain groups or users can't see the real value of certain columns.
⬢ Billing group can see the real/raw values for all columns in table Users
⬢ Fraud group can only see masked values of PII and PCI fields from table Users
Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
User/Column | customer_email, customer_phone, customer_ccn
bill (Billing) | 😀
frank (Fraud) | 😎
Ranger Policies - Mask Fields
Ranger Policy - Column Masking - Results
[Query results shown for bill (Billing) and frank (Fraud)]
Ranger Policy - Column Masking - Audit Screen
Use Case 3: Row Filtering
In this use case, certain groups or users can't see all the rows from certain tables.
⬢ Billing group can see all the rows in the table Users
⬢ Marketing can only see rows/data from their region in the table Users
Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
User/Column | Rows in Users table
bill (Billing) | 😀
mark (Marketing-CA) | Only CA Users
Ranger Policies - Row Filtering
Ranger Policy - Row Filtering - Results
[Query results shown for bill (Billing) and mark (Marketing)]
Use Case 4: Row Filtering - Cross Table
This is an extension of the previous use cases, where the context information for filtering the rows is in another table.
⬢ Billing group can see all the rows in the table Sales
⬢ Marketing can only see rows/data from their region in the table Sales. However, the Sales table doesn’t have the customer geographic information, so it needs to be derived from the Users table
Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
Sales: customer_id, product_id, promotion_id, cookie_id, tracking_id
User/Column | Rows in Sales table
bill (Billing) | 😀
mark (Marketing-CA) | Only CA Users
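A cross-table row filter of this kind can be tried out with a subquery on customer_id. The sketch below uses SQLite from the Python standard library purely as a stand-in for Hive. The sample rows are made up, and the filter expression for mark is a hypothetical form of such a policy, since the policy screenshot is not part of this transcript.

```python
import sqlite3

# In-memory database standing in for the Hive tables (made-up sample rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (customer_id INTEGER, customer_name TEXT, customer_state TEXT);
    CREATE TABLE sales (customer_id INTEGER, product_id INTEGER);
    INSERT INTO users VALUES (1, 'Ann', 'CA'), (2, 'Raj', 'NY'), (3, 'Lee', 'CA');
    INSERT INTO sales VALUES (1, 100), (2, 200), (3, 300), (2, 400);
""")

# Hypothetical row-filter expressions per user; None means no filter.
ROW_FILTERS = {
    "bill": None,
    "mark": "customer_id IN "
            "(SELECT customer_id FROM users WHERE customer_state = 'CA')",
}


def visible_sales(user):
    """Return the Sales rows the given user would see, ordered by product_id."""
    sql = "SELECT customer_id, product_id FROM sales"
    row_filter = ROW_FILTERS[user]  # unknown users raise KeyError
    if row_filter:
        # The region lives in users, so the filter reaches it via a subquery.
        sql += f" WHERE {row_filter}"
    sql += " ORDER BY product_id"
    return conn.execute(sql).fetchall()


print(visible_sales("bill"))
print(visible_sales("mark"))
```

Here bill sees all four Sales rows, while mark sees only the rows whose customer_id belongs to a CA customer in Users.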
Ranger Policies - Row Filtering - Cross Table
Apache Atlas Integration
Cross Product Symbiosis
[Diagram. Apache Atlas: Classification/Tagging, Governance, Lineage. Apache Ranger: Tag Based Policies, Dynamic Custom Policies. LLAP: Enforcement hooks. Storage: HDFS, S3, Metastore.]
* Column Masking and Row Filtering not yet supported by tag based policy
Ranger - Tag Based Policies
Q & A

More Related Content

What's hot

Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationHortonworks
 
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseEnabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseDataWorks Summit/Hadoop Summit
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureDataWorks Summit
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFiHortonworks
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureDataWorks Summit
 
Intro to Spark with Zeppelin
Intro to Spark with ZeppelinIntro to Spark with Zeppelin
Intro to Spark with ZeppelinHortonworks
 
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3DataWorks Summit
 
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHortonworks
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and TroubleshootingApache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and TroubleshootingDataWorks Summit/Hadoop Summit
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & FutureDataWorks Summit
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPHortonworks
 

What's hot (20)

Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseEnabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFi
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, Future
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
 
Why is my Hadoop* job slow?
Why is my Hadoop* job slow?Why is my Hadoop* job slow?
Why is my Hadoop* job slow?
 
Intro to Spark with Zeppelin
Intro to Spark with ZeppelinIntro to Spark with Zeppelin
Intro to Spark with Zeppelin
 
#HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course #HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course
 
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
 
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Apache Hadoop Crash Course
Apache Hadoop Crash CourseApache Hadoop Crash Course
Apache Hadoop Crash Course
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and TroubleshootingApache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
 
Scalable Real-time analytics using Druid
Scalable Real-time analytics using DruidScalable Real-time analytics using Druid
Scalable Real-time analytics using Druid
 

Similar to Fine-Grained Security for Spark and Hive

Security Updates: More Seamless Access Controls with Apache Spark and Apache ...
Security Updates: More Seamless Access Controls with Apache Spark and Apache ...Security Updates: More Seamless Access Controls with Apache Spark and Apache ...
Security Updates: More Seamless Access Controls with Apache Spark and Apache ...DataWorks Summit
 
Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerDataWorks Summit
 
Don't Let the Spark Burn Your House: Perspectives on Securing Spark
Don't Let the Spark Burn Your House: Perspectives on Securing SparkDon't Let the Spark Burn Your House: Perspectives on Securing Spark
Don't Let the Spark Burn Your House: Perspectives on Securing SparkDataWorks Summit
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Artem Ervits
 
Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXSpark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXKirk Haslbeck
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...DataWorks Summit
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017alanfgates
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemDataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Top Ten Tips for IBM i Security and Compliance
Top Ten Tips for IBM i Security and ComplianceTop Ten Tips for IBM i Security and Compliance
Top Ten Tips for IBM i Security and CompliancePrecisely
 
Avaya IP Office Customer Call Reporter
Avaya IP Office Customer Call ReporterAvaya IP Office Customer Call Reporter
Avaya IP Office Customer Call ReporterMotty Ben Atia
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Honeywell Experion HS
Honeywell Experion HSHoneywell Experion HS
Honeywell Experion HSShivam Singh
 
Paris FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging ManagerParis FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging ManagerAbdelkrim Hadjidj
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies DataWorks Summit/Hadoop Summit
 

Similar to Fine-Grained Security for Spark and Hive (20)

Security Updates: More Seamless Access Controls with Apache Spark and Apache ...
Security Updates: More Seamless Access Controls with Apache Spark and Apache ...Security Updates: More Seamless Access Controls with Apache Spark and Apache ...
Security Updates: More Seamless Access Controls with Apache Spark and Apache ...
 
Kafka/SMM Crash Course
Kafka/SMM Crash CourseKafka/SMM Crash Course
Kafka/SMM Crash Course
 
Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging Manager
 
Don't Let the Spark Burn Your House: Perspectives on Securing Spark
Don't Let the Spark Burn Your House: Perspectives on Securing SparkDon't Let the Spark Burn Your House: Perspectives on Securing Spark
Don't Let the Spark Burn Your House: Perspectives on Securing Spark
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
 
Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXSpark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWX
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystem
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Top Ten Tips for IBM i Security and Compliance
Top Ten Tips for IBM i Security and ComplianceTop Ten Tips for IBM i Security and Compliance
Top Ten Tips for IBM i Security and Compliance
 
What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
 
Avaya IP Office Customer Call Reporter
Avaya IP Office Customer Call ReporterAvaya IP Office Customer Call Reporter
Avaya IP Office Customer Call Reporter
 
Streaming analytics manager
Streaming analytics managerStreaming analytics manager
Streaming analytics manager
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Honeywell Experion HS
Honeywell Experion HSHoneywell Experion HS
Honeywell Experion HS
 
Paris FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging ManagerParis FOD meetup - Streams Messaging Manager
Paris FOD meetup - Streams Messaging Manager
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 

Fine-Grained Security for Spark and Hive

  • 1. Fine-Grained Security for Spark and Hive
      Carter Shanklin - Director PM
      Don Bosco Durai - Security Architect
      June 29, 2016
      © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 2. Agenda
      ● Current security options and challenges
      ● Apache Ranger Overview
      ● LLAP Overview
      ● Use Cases and Demo
      ● Apache Atlas Integration
  • 3. Current Options and Challenges
  • 4. Current Options and Challenges
      ⬢ Limited to storage-level access control for Spark, Pig and MR
      ⬢ Column-level access via HiveServer2
      ⬢ Row-level filtering needs Hive views
        – Multiple Hive views need to be created and managed
        – Explicit permissions need to be given for each view/user
        – Users need to know which view to use
      ⬢ Masking needs a custom UDF
        – Needs to be wrapped using views
  • 5. Apache Ranger Overview
  • 6. Apache Ranger
      ⬢ Auditing
        – Central audit location for all access requests
        – Support multiple destination sources (HDFS, Solr, etc.)
        – Real-time visual query interface
      ⬢ Authorization
        – Centralized platform to define, administer and manage security policies consistently
        – Enforce policies within each component
      ⬢ Ranger KMS
        – Store and manage encryption keys
        – Support HDFS TDE
        – Integration with HSM
  • 9. Ranger Architecture
      [Diagram. Labels shown: Ranger Administration Portal, Ranger Policy Server, Ranger Audit Server, Integration API; Hadoop Components with Ranger Plugins (HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas); Enterprise Users; Legacy Tools and Data Governance; RDBMS, HDFS, Solr.]
  • 10. Ranger Audits - Data Access
  • 11. Ranger Audits - Admin Actions
  • 12. LLAP Overview
  • 13. Hive 2.0 and LLAP
      ⬢ At a High Level:
        – 2000+ features, improvements and bug fixes in Hive since HDP 2.4.
        – 600+ of these from outside of Hortonworks.
      ⬢ Major Improvements:
        – Preview: Hive LLAP: Persistent query servers with intelligent in-memory caching.
        – ACID GA: Hardened and proven at scale.
        – Expanded SQL Compliance: More capable integration with BI tools.
        – Performance: Interactive query, 2x faster ETL.
        – Security: Row / Column security extending to views, Column level security for Spark.
  • 14. Hive 2 with LLAP: Architecture Overview
  • 15. Hive 2 with LLAP: Open Interfaces
  • 16. Ranger Integration with Hive and LLAP
  • 17. Hive / LLAP Security Capabilities with Ranger
      ⬢ Ranger Hive plugin provides authorization / access controls.
      ⬢ Column Masking:
        – Inject Hive UDFs that mask characters or hash values.
        – Dynamic, per-user.
      ⬢ Dynamic Row Filtering:
        – Query is analyzed and policies applied.
        – Dynamic, per-user.
      ⬢ All operations run as ordinary SQL queries:
        – Masking statements convert to expressions in the SQL SELECT clause.
        – Filters convert to predicates in the SQL WHERE clause.
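The last bullet on slide 17 can be illustrated with a small sketch. This is not how Ranger or Hive implements the rewrite; the function `rewrite_query` and its arguments are hypothetical, and the table and column names are borrowed from the demo slides. It only shows the idea that a mask ends up as a function call in the select list and a row filter ends up as a predicate in the WHERE clause.

```python
# Illustrative only: a toy query rewriter, not Ranger's or Hive's implementation.
# The function name and the shape of its arguments are hypothetical.

def rewrite_query(table, columns, masks, row_filter=None):
    """Build a SELECT statement with masks and an optional row filter applied.

    masks maps a column name to the name of a masking function (e.g. "mask_hash").
    row_filter is a SQL predicate string (e.g. "customer_state = 'CA'").
    """
    select_items = []
    for col in columns:
        if col in masks:
            # A masked column becomes a function call in the select list.
            select_items.append(f"{masks[col]}({col}) AS {col}")
        else:
            select_items.append(col)
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    if row_filter:
        # A row filter becomes a predicate in the WHERE clause.
        sql += f" WHERE {row_filter}"
    return sql


print(rewrite_query(
    "users",
    ["customer_id", "customer_ccn"],
    masks={"customer_ccn": "mask_hash"},
    row_filter="customer_state = 'CA'",
))
```

With these inputs the sketch prints `SELECT customer_id, mask_hash(customer_ccn) AS customer_ccn FROM users WHERE customer_state = 'CA'`, so the user still issues and receives an ordinary SQL query.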
  • 18. Native Hive Masking Capabilities
      UDF: purpose (example start → example result)
        – mask: Convert letters to X/x and numbers to n. (123 Fake St. → nnn Xxxx Xx.)
        – mask_first_n: Mask only the first n characters. (433-54-3937 → nnn-54-3937)
        – mask_last_n: Mask only the last n characters. (433-54-3937 → 433-54-nnnn)
        – mask_show_first_n: Mask, showing only the first n characters. (555-233-1234 → 555-nnn-nnnn)
        – mask_show_last_n: Mask, showing only the last n characters. (433-54-3937 → nnn-nn-3937)
        – mask_hash: Produce a consistent hash of the field. (CA → 21f241cccaa5cfa33190f56ff1510e37)
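To make the masking behaviour on slide 18 concrete, here is a minimal Python re-implementation of the character-masking functions. These are plain Python functions written for illustration, not the Hive UDFs themselves. The slide does not state the value of n used in its examples, so the values 3 and 4 below are assumptions chosen to reproduce the slide's results. `mask_hash` is left out because the slide does not say which hash function it uses.

```python
# Illustrative re-implementation of the masking behaviour listed on the slide.
# Plain Python, not the Hive UDFs; the n values used below are assumptions.

def mask(value):
    """Replace uppercase letters with X, lowercase letters with x, digits with n."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("n")
        elif ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        else:
            out.append(ch)  # punctuation and spaces are left unchanged
    return "".join(out)


def mask_first_n(value, n):
    """Mask only the first n characters."""
    return mask(value[:n]) + value[n:]


def mask_last_n(value, n):
    """Mask only the last n characters."""
    if n <= 0:
        return value
    return value[:-n] + mask(value[-n:])


def mask_show_first_n(value, n):
    """Mask everything except the first n characters."""
    return value[:n] + mask(value[n:])


def mask_show_last_n(value, n):
    """Mask everything except the last n characters."""
    if n <= 0:
        return mask(value)
    return mask(value[:-n]) + value[-n:]


print(mask("123 Fake St."))                 # nnn Xxxx Xx.
print(mask_first_n("433-54-3937", 3))       # nnn-54-3937
print(mask_last_n("433-54-3937", 4))        # 433-54-nnnn
print(mask_show_first_n("555-233-1234", 3)) # 555-nnn-nnnn
print(mask_show_last_n("433-54-3937", 4))   # nnn-nn-3937
```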
  • 19. Delivering Spark Security
  • 20. Key Features: Spark Column Security with LLAP
      ⬢ Fine-grained column-level access control for SparkSQL.
      ⬢ Fully dynamic policies per user. Doesn’t require views.
      ⬢ Use standard Ranger policies and tools to control access and masking policies.
      Flow:
        1. SparkSQL gets data locations known as “splits” from HiveServer2 and plans the query.
        2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
        3. Spark gets a modified query plan based on dynamic security policy.
        4. Spark reads data from LLAP. Filtering / masking guaranteed by LLAP server.
  • 21. Example: Per-User Row Filtering by Region in SparkSQL
  • 22. Use Cases
  • 23. Demo Setup
      ⬢ Customer User and Sales data in ORC (metadata in MetaStore)
      ⬢ Data can be accessed via SparkSQL or HiveServer2
      ⬢ Marketing needs access to Sales and Users data for analytics
      ⬢ Fraud Investigation team needs access to data for fraud detection
      ⬢ Billing team needs access to Sales and Users data for billing
      Tables
        – Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
        – Sales: customer_id, product_id, promotion_id, cookie_id, tracking_id
      Groups and users
        – Fraud: frank
        – Marketing: mark
        – Billing: bill
  • 24. Use Case 1: Restricting Column Access
      This is a simple use case where certain groups or users don’t have permission to view certain columns.
      ⬢ Billing group has access to all columns in table Users
      ⬢ Marketing group can’t access the credit card column from table Users
      Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
      User/Column
        – bill (Billing): customer_phone 😀, customer_ccn 😀
        – mark (Marketing): customer_phone 😀, customer_ccn 😡
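The access matrix in Use Case 1 can be expressed as a small in-memory check. This is a hypothetical stand-in for a column-level policy, not Ranger's policy model or API: the dictionaries and the function names `blocked_columns` and `authorize` are invented for illustration, while the users, groups and column names come from the demo setup.

```python
# Hypothetical in-memory stand-in for a column-level access policy.
# Users, groups and columns come from the slides; everything else is made up.

USERS_COLUMNS = [
    "customer_id", "customer_name", "customer_email", "customer_phone",
    "customer_ccn", "customer_state", "customer_zip",
]

GROUP_OF = {"bill": "Billing", "mark": "Marketing", "frank": "Fraud"}

# Columns of the Users table that each group may not read.
DENIED_COLUMNS = {"Marketing": {"customer_ccn"}}


def blocked_columns(user, requested_columns):
    """Return the requested columns that the user's group may not read."""
    denied = DENIED_COLUMNS.get(GROUP_OF[user], set())
    return [c for c in requested_columns if c in denied]


def authorize(user, requested_columns):
    """Raise PermissionError if any requested column is denied, else return True."""
    blocked = blocked_columns(user, requested_columns)
    if blocked:
        raise PermissionError(f"{user} may not read: {', '.join(blocked)}")
    return True


print(authorize("bill", USERS_COLUMNS))                          # True
print(blocked_columns("mark", ["customer_phone", "customer_ccn"]))  # ['customer_ccn']
```

In this sketch a query that touches a denied column is rejected as a whole, rather than having the column silently dropped.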
  • 25. Ranger Policy - Restrict Columns
  • 26. Ranger Policy - Restrict Columns - Results: bill from Billing, mark from Marketing
  • 27. Ranger Policy - Restrict Columns - Audit Screen
  • 28. Use Case 2: Column Masking
      In this use case, certain groups or users can’t see the real values of certain columns.
      ⬢ Billing group can see the real/raw values for all columns in table Users
      ⬢ Fraud group can only see masked values of PII and PCI fields from table Users
      Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
      User/Column (customer_email, customer_phone, customer_ccn)
        – bill (Billing): 😀
        – frank (Fraud): 😎
  • 29. Ranger Policies - Mask Fields
  • 30. Ranger Policy - Column Masking - Results: bill from Billing, frank from Fraud
  • 31. Ranger Policy - Column Masking - Audit Screen
  • 32. Use Case 3: Row Filtering
      In this use case, certain groups or users can’t see all the rows from certain tables.
      ⬢ Billing group can see all the rows in the table Users
      ⬢ Marketing can only see rows/data from their region in the table Users
      Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
      User/Column (rows in Users table)
        – bill (Billing): 😀
        – mark (Marketing, CA): Only CA Users
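A minimal sketch of the per-user row filter in Use Case 3, applied to rows held in memory. The `ROW_FILTERS` mapping, the `visible_rows` function and the sample rows are all invented for illustration; the slides show the real policy only as a Ranger screenshot. Users without a filter see every row here, which assumes that table-level access has already been granted separately.

```python
# Hypothetical sketch of per-user row filtering on in-memory rows.
# The sample data is invented; only the column names come from the slides.

SAMPLE_USERS = [
    {"customer_id": 1, "customer_name": "Ann", "customer_state": "CA"},
    {"customer_id": 2, "customer_name": "Bob", "customer_state": "NY"},
    {"customer_id": 3, "customer_name": "Cid", "customer_state": "CA"},
]

# One predicate per user who has a row filter; mark is limited to CA.
ROW_FILTERS = {
    "mark": lambda row: row["customer_state"] == "CA",
}


def visible_rows(user, rows):
    """Return the rows the user may see, applying that user's filter if one exists."""
    predicate = ROW_FILTERS.get(user)
    if predicate is None:
        return list(rows)  # no filter defined: all rows are visible
    return [row for row in rows if predicate(row)]


print([r["customer_id"] for r in visible_rows("bill", SAMPLE_USERS)])  # [1, 2, 3]
print([r["customer_id"] for r in visible_rows("mark", SAMPLE_USERS)])  # [1, 3]
```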
  • 33. Ranger Policies - Row Filtering
  • 34. Ranger Policy - Row Filtering - Results: bill from Billing, mark from Marketing
  • 35. Use Case 4: Row Filtering - Cross Table
      This is an extension of the previous use cases, where the context information for filtering the rows is in another table.
      ⬢ Billing group can see all the rows in the table Sales
      ⬢ Marketing can only see rows/data from their region in the table Sales. However, the Sales table doesn’t have the customer geographic information, so it needs to be derived from the Users table
      Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
      Sales: customer_id, product_id, promotion_id, cookie_id, tracking_id
      User/Column (rows in Sales table)
        – bill (Billing): 😀
        – mark (Marketing, CA): Only CA Users
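The cross-table case can be sketched with a subquery that looks up the customer's state in the Users table. The example below uses Python's built-in `sqlite3` module as a stand-in for Hive, with invented sample rows and a reduced set of columns. The filter predicate is an assumed form, since the actual Ranger policy appears only as a screenshot in the slides.

```python
import sqlite3

# Stand-in demo using SQLite instead of Hive; sample rows are invented.


def build_demo_db():
    """Create an in-memory database with small users and sales tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (customer_id INTEGER, customer_state TEXT);
        CREATE TABLE sales (customer_id INTEGER, product_id INTEGER);
        INSERT INTO users VALUES (1, 'CA'), (2, 'NY'), (3, 'CA');
        INSERT INTO sales VALUES (1, 100), (2, 200), (3, 300), (2, 400);
    """)
    return conn


# Assumed form of the row filter for a Marketing user restricted to CA:
# the region is derived from users because sales has no state column.
CROSS_TABLE_FILTER = (
    "customer_id IN "
    "(SELECT customer_id FROM users WHERE customer_state = 'CA')"
)


def sales_for(conn, row_filter=None):
    """Query sales, appending the row filter as a WHERE predicate if given."""
    sql = "SELECT customer_id, product_id FROM sales"
    if row_filter:
        sql += " WHERE " + row_filter
    sql += " ORDER BY product_id"
    return conn.execute(sql).fetchall()


conn = build_demo_db()
print(sales_for(conn))                      # all rows, as Billing would see them
print(sales_for(conn, CROSS_TABLE_FILTER))  # only sales to CA customers
```

The filtered call returns only the sales rows whose customer_id belongs to a CA customer, here `[(1, 100), (3, 300)]`.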
  • 36. Ranger Policies - Row Filtering - Cross Table
  • 37. Apache Atlas Integration
  • 38. Cross Product Symbiosis
      [Diagram. Labels shown: Apache Atlas (Classification/Tagging, Governance, Lineage); Apache Ranger (Tag Based Policies, Dynamic Custom Policies); LLAP (Enforcement hooks); HDFS, S3, Meta Store.]
      * Column Masking and Row Filtering not yet supported by tag based policy
  • 39. Ranger - Tag Based Policies
  • 40. Q & A