1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamic Row Filtering and Column Masking
with Apache Ranger
Srikanth Venkat
Senior Director, Product Management
Disclaimer
• This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed.
• Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback, and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
• This document's description of these features and technology directions does not represent a contractual commitment, promise, or obligation from Hortonworks to deliver these features in any generally available product.
• Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
Agenda
Background
Dynamic Column Masking and Row Filtering
Spark SQL Security via Hive LLAP/Ranger
Demo
Security Challenges of Today’s Data Platforms
• Central repository of critical and sensitive data
– Grey Data
• Data maintained over long duration
– Forever
• External ecosystem is in flux
– The Zoo
• Users can access and analyze data in new and different ways
– Democratization
Apache Ranger
Auditing
• Central audit location for all access requests
• Supports multiple audit destinations (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Stores and manages encryption keys
• Supports HDFS Transparent Data Encryption
• Integration with HSMs
– SafeNet Luna
Authorization
• Centralized platform to define, administer, and manage security policies consistently across Hadoop components
– HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible architecture
– Custom policy conditions, user context enrichers
– Easy to add new component types for authorization
Ranger Architecture
[Figure: Ranger architecture. Components shown: Ranger Administration Portal, Ranger Policy Server, Ranger Audit Server, an Integration API for enterprise users and legacy tools / data governance, and Ranger plugins in the Hadoop components HDFS, HBase, HiveServer2, YARN, Kafka, Solr, Storm, Knox, NiFi, and Atlas.]
Apache Ranger - Intuitive and Granular Policy Management
⬢ Simple, intuitive UI for policy editing and setup
⬢ Fine-grained specificity by resource type, user context, tags, and operation
⬢ Supports access, tag-based, dynamic data masking, and row filtering policy types
Apache Ranger Audits - Data Access
⬢ Comprehensive, scalable audit logging
⬢ Audits for:
– Resource access events with user context
– Policy edits, creation, and deletion
– User session information
– Component plugin policy sync operations
Row Filtering in Hive
Control Access to Rows in Hive Tables based on Context!
Goal: Improve the reliability and robustness of HDP by providing row-level security for Hive tables and reducing the surface area of the security system
⬢ Capabilities
– Restrict access to data rows based on user characteristics (e.g. group membership) AND runtime context
⬢ Access restriction logic at the Hive layer => no changes to apps!
– Hive applies the access restrictions every time data access is attempted
– Seamless, behind-the-scenes enforcement of row-level segmentation without having to add this logic to the query predicate
– No need for multiple views to filter rows for different groups and users!
⬢ Core Technologies: Ranger, Hive
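The row-filtering behavior described above (restrict rows by user characteristics AND runtime context, enforced on every read without touching the application's query) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ranger's or Hive's implementation: the `RowFilterRule` type, the first-match rule ordering, the `location` context key, and returning no rows when no rule matches are all hypothetical choices made for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Row = dict  # one table row: column name -> value


@dataclass
class RowFilterRule:
    """Hypothetical row-filter rule: who it applies to and which rows it keeps."""
    groups: frozenset[str]              # user characteristic: matching group names
    context_ok: Callable[[dict], bool]  # runtime context check (e.g. access location)
    keep_row: Callable[[Row], bool]     # row predicate enforced by the engine


def effective_rule(rules: list[RowFilterRule], user_groups: set[str],
                   context: dict) -> Optional[RowFilterRule]:
    """Return the first rule whose groups AND runtime context both match."""
    for rule in rules:
        if rule.groups & user_groups and rule.context_ok(context):
            return rule
    return None


def read_table(rows: list[Row], rules: list[RowFilterRule],
               user_groups: set[str], context: dict) -> list[Row]:
    """Apply the row filter on every read; the caller's query is never edited."""
    rule = effective_rule(rules, user_groups, context)
    if rule is None:
        return []  # sketch choice: no matching rule means no rows
    return [row for row in rows if rule.keep_row(row)]


# Hypothetical sample data and rules.
EU = {"DE", "FR"}
ROWS = [{"id": 1, "country": "US"}, {"id": 2, "country": "DE"}, {"id": 3, "country": "US"}]
RULES = [
    RowFilterRule(frozenset({"us_employees"}),
                  lambda ctx: ctx.get("location") == "US",
                  lambda row: row["country"] == "US"),
    RowFilterRule(frozenset({"eu_employees"}),
                  lambda ctx: True,
                  lambda row: row["country"] in EU),
]
```

For example, `read_table(ROWS, RULES, {"us_employees"}, {"location": "US"})` returns only the two US rows, while the same user connecting from outside the US gets no rows: the application issues the same read in both cases.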
Row Filtering in Hive
Control Access to Rows in Hive Tables based on Context!
⬢ Use Cases: Cross-industry application for data protection:
Healthcare
• A hospital can create a security policy that allows doctors to view data rows only for their own patients
• Insurance claims administrators can view only the rows for their specific site
Financial Services
• A bank can create a policy to restrict access to rows of financial data based on the employee's business division, locale, or role
• Employees in the finance department are allowed to see customer invoices, payments, and accrual data
• European HR employees can see European employee data
Information Technology
• A multi-tenant application can create a logical separation of each tenant's data so that each tenant sees only its own data rows
Dynamic Data Masking of Hive Columns
Protect Sensitive Data in real-time with Dynamic Data Masking/Obfuscation!
Goal: Mask or anonymize sensitive columns of data (e.g. PII, PCI, PHI) in Hive query output
⬢ Benefits
– Does not physically alter the data or make a copy of it
– Original sensitive data does not leave the data store; it is obfuscated only when presented to the user
– No changes are required at the application or Hive layer
– No need to produce additional protected duplicate versions of datasets
– Simple and easy to set up masking policies
⬢ Core Technologies: Ranger, Hive
Dynamic Masking and Row Level Filtering
Source data:
Country | National ID | CC No | Name | DOB | MRN | Policy ID
US | 232323233 | 4539067047629850 | John Doe | 9/12/1969 | 8233054331 | nj23j424
US | 333287465 | 5391304868205600 | Jane Doe | 8/13/1979 | 3736885376 | cadsd984
Germany | T22000129 | 4532786256545550 | Ernie Schwarz | 3/5/1963 | 876452830A | KK-2345909
Ranger Policy Enforcement
Users from the US customer support group see row-filtered data for US persons only, with CC and National ID (SSN) shown as masked values and MRN nullified:
Country | National ID | CC No | MRN | Name
US | xxxxx3233 | 4539 xxxx xxxx xxxx | null | John Doe
US | xxxxx7465 | 5391 xxxx xxxx xxxx | null | Jane Doe
EU health policy admins view the relevant columns unmasked, but row filtering policies restrict them to data for EU persons only:
Country | National ID | Name | MRN
Germany | T22000129 | Ernie Schwarz | 876452830A
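The before/after tables above can be reproduced with a small Python sketch that applies the row filter first and then projects and masks columns per group. The group names (`us_customer_support`, `eu_health_admins`), the abbreviated two-country `EU_COUNTRIES` set, and the helper functions are hypothetical; the per-group column lists simply mirror the slide's output tables. The slide displays masked card numbers in groups of four (`4539 xxxx xxxx xxxx`), whereas this sketch returns them without spaces.

```python
# Rows from the slide's source table.
SOURCE = [
    {"Country": "US", "National ID": "232323233", "CC No": "4539067047629850",
     "Name": "John Doe", "DOB": "9/12/1969", "MRN": "8233054331", "Policy ID": "nj23j424"},
    {"Country": "US", "National ID": "333287465", "CC No": "5391304868205600",
     "Name": "Jane Doe", "DOB": "8/13/1979", "MRN": "3736885376", "Policy ID": "cadsd984"},
    {"Country": "Germany", "National ID": "T22000129", "CC No": "4532786256545550",
     "Name": "Ernie Schwarz", "DOB": "3/5/1963", "MRN": "876452830A", "Policy ID": "KK-2345909"},
]
EU_COUNTRIES = {"Germany", "France"}  # hypothetical, abbreviated EU list


def mask_keep_last_4(v: str) -> str:
    """Replace all but the last four characters with 'x'."""
    return "x" * max(len(v) - 4, 0) + v[-4:]


def mask_keep_first_4(v: str) -> str:
    """Keep the first four characters and replace the rest with 'x'."""
    return v[:4] + "x" * max(len(v) - 4, 0)


# Per-group policy: row filter, visible columns (mirroring the slide), column masks.
POLICIES = {
    "us_customer_support": {
        "row_filter": lambda row: row["Country"] == "US",
        "columns": ["Country", "National ID", "CC No", "MRN", "Name"],
        "masks": {"National ID": mask_keep_last_4, "CC No": mask_keep_first_4,
                  "MRN": lambda v: None},
    },
    "eu_health_admins": {
        "row_filter": lambda row: row["Country"] in EU_COUNTRIES,
        "columns": ["Country", "National ID", "Name", "MRN"],
        "masks": {},
    },
}


def enforce(group: str, rows: list) -> list:
    """Filter rows, then project and mask columns; `rows` itself is never modified."""
    policy = POLICIES[group]
    result = []
    for row in rows:
        if not policy["row_filter"](row):
            continue
        result.append({col: policy["masks"].get(col, lambda v: v)(row[col])
                       for col in policy["columns"]})
    return result
```

Because `enforce` builds new dictionaries for each request, the stored rows keep their original values, matching the slide's point that masking happens only on output.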
SparkSQL Security via Hive LLAP
Spark SQL Security: Row Filtering and Column Masking
• Spark SQL + Hive enables users to explore very large data sets using SQL
• Enterprises want to enable Spark SQL for ad-hoc analysis using BI tools with fine-grained security
• Spark provides strong authentication via Kerberos and wire encryption via SSL, but as a general-purpose compute engine it has no built-in authorization subsystem
• Spark also has no way to define a pluggable module that contains policies for fine-grained authorization
– With structured data organized into rows and columns in Hive, fine-grained security becomes a challenge
• Co-mingled data in the same table may belong to two different groups, each with its own regulatory requirements
• Data may have regional restrictions, time-based availability restrictions, departmental restrictions, etc.
Hive 2 with LLAP: Open Interfaces
Key Features: Spark Column Security with LLAP
• Fine-grained column-level access control for SparkSQL
• Fully dynamic policies per user; no views required
• Use standard Ranger policies and tools to control access and masking policies
Flow:
1. SparkSQL gets data locations, known as "splits", from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies such as row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP. Filtering and masking are guaranteed by the LLAP server.
[Figure: Spark Client, HiveServer2 (authorization), Hive Metastore (data locations, view definitions), Ranger Server (dynamic policies), and LLAP (data read, filter pushdown), with arrows numbered 1 to 4 matching the flow steps.]
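The four-step flow can be modeled as a toy simulation that shows where enforcement happens: HiveServer2 attaches a Ranger-derived predicate to the plan, and LLAP evaluates it before any row reaches Spark. Every class and method here (`RangerServer`, `HiveServer2.plan`, `LLAP.read`, `SparkClient.run`) is a hypothetical stand-in written for illustration; none of them is an actual Spark, Hive, LLAP, or Ranger API, and the sample data is invented.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Row = dict
Predicate = Callable[[Row], bool]


class RangerServer:
    """Hypothetical stand-in for Ranger: per-user dynamic row-filter policies."""

    def __init__(self, filters: dict[str, Predicate]):
        self.filters = filters

    def row_filter(self, user: str) -> Predicate:
        # Sketch choice: users without a policy see no rows.
        return self.filters.get(user, lambda row: False)


@dataclass(frozen=True)
class QueryPlan:
    splits: tuple[int, ...]      # data locations ("splits") to read
    query_predicate: Predicate   # the filter from Spark's own query
    policy_predicate: Predicate  # the filter HiveServer2 added from Ranger


class HiveServer2:
    def __init__(self, ranger: RangerServer, num_splits: int):
        self.ranger = ranger
        self.num_splits = num_splits

    def plan(self, user: str, query_predicate: Predicate) -> QueryPlan:
        # Steps 1-3: return the splits plus a plan modified by the user's policy.
        return QueryPlan(tuple(range(self.num_splits)), query_predicate,
                         self.ranger.row_filter(user))


class LLAP:
    def __init__(self, splits: list[list[Row]]):
        self.splits = splits

    def read(self, plan: QueryPlan) -> list[Row]:
        # Step 4: the policy predicate is evaluated here, server side,
        # so rows it rejects never reach the Spark client.
        return [row for i in plan.splits for row in self.splits[i]
                if plan.policy_predicate(row) and plan.query_predicate(row)]


class SparkClient:
    def __init__(self, user: str, hs2: HiveServer2, llap: LLAP):
        self.user, self.hs2, self.llap = user, hs2, llap

    def run(self, query_predicate: Predicate) -> list[Row]:
        plan = self.hs2.plan(self.user, query_predicate)
        return self.llap.read(plan)


# Invented sample data spread across two splits.
SPLITS = [[{"id": 1, "dept": "finance"}, {"id": 2, "dept": "hr"}],
          [{"id": 3, "dept": "finance"}]]
ranger = RangerServer({"alice": lambda row: row["dept"] == "finance"})
hs2 = HiveServer2(ranger, num_splits=len(SPLITS))
llap = LLAP(SPLITS)
```

Because the policy predicate travels inside the plan that LLAP executes, a client that omits or weakens its own filter (here, `SparkClient("alice", hs2, llap).run(lambda r: True)`) still receives only the rows the policy allows.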
Example: Per-User Row Filtering by Region in SparkSQL
Original query (submitted by both Spark User 1, West Region, and Spark User 2, East Region):
SELECT * FROM CUSTOMERS
WHERE total_spend > 10000
Query rewrites based on dynamic Ranger policies
LLAP data access:
User ID | Region | Total Spend
1 | East | 5,131
2 | East | 27,828
3 | West | 55,493
4 | West | 7,193
5 | East | 18,193
Dynamic rewrite for Spark User 2 (East Region):
SELECT * FROM CUSTOMERS
WHERE total_spend > 10000
AND region = 'east'
Dynamic rewrite for Spark User 1 (West Region):
SELECT * FROM CUSTOMERS
WHERE total_spend > 10000
AND region = 'west'
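The rewrite on this slide can be tried locally with Python's built-in `sqlite3` module standing in for Hive/LLAP. The `ROW_FILTERS` mapping and the `rewrite` function are hypothetical stand-ins for the Ranger policies, not Ranger's actual mechanism. The slide writes the filter literals in lower case ('east', 'west') while the sample table stores 'East' and 'West'; since string equality is case-sensitive in both Hive and SQLite, the sketch uses the capitalized values so the filters actually match.

```python
import sqlite3

# Hypothetical per-user row-filter expressions standing in for Ranger policies.
ROW_FILTERS = {
    "spark_user_1": "region = 'West'",
    "spark_user_2": "region = 'East'",
}


def rewrite(query_where: str, user: str) -> str:
    """AND the user's row filter onto the query's own WHERE clause."""
    row_filter = ROW_FILTERS.get(user, "1 = 0")  # sketch choice: unknown users see nothing
    return f"SELECT * FROM customers WHERE ({query_where}) AND ({row_filter})"


def run_as(conn: sqlite3.Connection, user: str, query_where: str) -> list:
    return conn.execute(rewrite(query_where, user) + " ORDER BY user_id").fetchall()


# Sample table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (user_id INTEGER, region TEXT, total_spend INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "East", 5131), (2, "East", 27828), (3, "West", 55493),
     (4, "West", 7193), (5, "East", 18193)],
)
```

With the same `total_spend > 10000` query, `spark_user_1` gets only user 3, and `spark_user_2` gets users 2 and 5. Wrapping the user's own WHERE clause in parentheses matters: without them, a query such as `user_id = 1 OR total_spend > 50000` would bind the appended `AND` to the second term only and leak user 1's East-region row to the West-region user.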
Agenda
Demo
Demo Setup
• Hortonia: a mid-size financial services company expanding from the US into international markets
• Employees in the EU and US
• Multiple business units need access to customer data: analysts, HR
• Customer data is both co-mingled and isolated
• Needs rational security policies that provide the right level of access control to customer data across geographies and business functions, and that comply with external regulations (PII, HIPAA, EU privacy, etc.)
Demo Data
• Customer data in the hortoniabank DB
• 2 customer tables: 50K customer records each, with 38 fields (PII, PHI, PCI, and non-sensitive data)
– us_customers: USA person data only
– ww_customers: multi-language, multi-country, localized person data from across the world
• 1 reference table: eu_countries (for looking up mappings from EU country codes to countries, e.g. to account for Brexit)
• All user passwords: hadoop
Ranger Policies Setup for Demo
• Only US employees can see data in the us_customers table, and only from locations within the US (access_us_customers)
• Only US employees can see data rows of US persons in the ww_customers table (filter_ww_customers_table + access_ww_customers)
• Only EU employees can see rows with EU person data in the ww_customers table (filter_ww_customers_table + access_ww_customers)
• US HR team members can see all original unmasked data (PCI, PII, etc.)
• Analysts can view masked versions of sensitive data from the ww_customers table but are prohibited from viewing PII data in US tables (all masking policies are under the Masking tab of resource-based policies)
• Zip code, MRN, and blood group data may not be combined in any query (prohibition policy)
Personas Setup for Demo
User | Groups | Access Privileges
joe-analyst | us_employees, analyst | US data only; non-sensitive data only, the rest masked or forbidden depending on sensitivity
kate-hr | us_employees, hr | US data only; all sensitive data (PCI, PII, PHI)
ivana-eu-hr | eu_employees, hr | EU data only; all sensitive data
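The persona and policy matrix above can also be sketched with `sqlite3`: group membership selects a row filter (US persons for `us_employees`, a subquery against the `eu_countries` reference table for `eu_employees`) and decides whether `nationalid` is returned in the clear (hr) or masked (everyone else). The column names, sample rows, and the composition logic in `build_query` are assumptions for illustration only; the real demo tables have 38 fields, and the policies live in Ranger rather than in application code.

```python
import sqlite3

# Personas from the slide: user -> group memberships.
USERS = {
    "joe-analyst": {"us_employees", "analyst"},
    "kate-hr": {"us_employees", "hr"},
    "ivana-eu-hr": {"eu_employees", "hr"},
}


def build_query(user: str) -> str:
    """Compose a row filter and a column mask for one user (hypothetical logic)."""
    groups = USERS.get(user, set())
    # Row filter, in the spirit of filter_ww_customers_table.
    if "us_employees" in groups:
        row_filter = "country_code = 'US'"
    elif "eu_employees" in groups:
        row_filter = "country_code IN (SELECT country_code FROM eu_countries)"
    else:
        row_filter = "1 = 0"  # sketch choice: unknown users see no rows
    # HR sees original values; everyone else gets a show-last-4 style mask.
    nationalid = ("nationalid" if "hr" in groups
                  else "'xxxxx' || substr(nationalid, -4)")
    return (f"SELECT id, name, country_code, {nationalid} AS nationalid "
            f"FROM ww_customers WHERE {row_filter} ORDER BY id")


# Hypothetical, heavily trimmed versions of the demo tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE eu_countries (country_code TEXT, country_name TEXT);
    CREATE TABLE ww_customers (id INTEGER, name TEXT, country_code TEXT, nationalid TEXT);
    INSERT INTO eu_countries VALUES ('DE', 'Germany'), ('FR', 'France');
    INSERT INTO ww_customers VALUES
        (1, 'John Doe', 'US', '232323233'),
        (2, 'Ernie Schwarz', 'DE', 'T22000129'),
        (3, 'Sample Customer', 'JP', 'AB1234567');
""")


def rows_for(user: str) -> list:
    return conn.execute(build_query(user)).fetchall()
```

Here joe-analyst and kate-hr see the same US row but with different `nationalid` values, and ivana-eu-hr sees only the German row, matching the access privileges listed for each persona.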
Data Masking Policies Set Up for us_customers Data (analyst group)
Data Column | Column Description | Masking Type | Sample Output | Ranger Masking Policy
password | Password | Hash | 237672b21819462ff39fcea7d990c3e5 | mask_password_hash
nationalid | National ID | Show Last 4 | xx-xx-9324 | mask_nationalid_last4
ccnumber | Credit Card Number | Show First 4 | 4532xxxxxxxxxxxx | mask_ccnumber_first4
streetaddress | Street Address | Redact | nnn Xxxxxx Xxxxx | mask_streetaddress_redact
MRN | MRN | Nullify | null | mask_mrn_nullify
age | Age | CUSTOM | (adds a random number below 20 to the actual age) | mask_age_custom
birthday | Date of Birth | CUSTOM | 01-01-1987 (keeps the year of birth and sets the day and month to 01-01) | mask_dob_custom
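The masking types in this table can be imitated in plain Python to make their effect concrete. Hive provides masking UDFs (`mask`, `mask_show_first_n`, `mask_show_last_n`, `mask_hash`) that Ranger's masking types are commonly built on, but the functions below are hypothetical re-implementations of the visible behavior only, not those UDFs. Assumptions: `mask_hash` uses MD5 solely because the sample output is 32 hex characters (check which digest your Hive version actually uses), and the two CUSTOM rules follow the table's descriptions.

```python
import hashlib
import random
from datetime import date
from typing import Optional


def mask_chars(value: str, upper: str = "X", lower: str = "x", digit: str = "n") -> str:
    """Replace letters and digits, keeping spaces and punctuation."""
    out = []
    for ch in value:
        if ch.isupper():
            out.append(upper)
        elif ch.islower():
            out.append(lower)
        elif ch.isdigit():
            out.append(digit)
        else:
            out.append(ch)
    return "".join(out)


def mask_hash(value: str) -> str:
    # MD5 only because the slide's sample is 32 hex characters (assumption).
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def show_last_4(value: str) -> str:
    return mask_chars(value[:-4], "x", "x", "x") + value[-4:]


def show_first_4(value: str) -> str:
    return value[:4] + mask_chars(value[4:], "x", "x", "x")


def redact(value: str) -> str:
    return mask_chars(value)  # "123 Harbor Drive" -> "nnn Xxxxxx Xxxxx"


def nullify(value: str) -> Optional[str]:
    return None


def age_custom(age: int, rng: random.Random) -> int:
    # Adds a random number below 20 (0 to 19) to the actual age.
    return age + rng.randrange(20)


def dob_custom(dob: date) -> date:
    # Keeps the year of birth; month and day become 01-01.
    return dob.replace(month=1, day=1)
```

Passing the random generator into `age_custom` keeps the CUSTOM age rule testable; a real policy would evaluate its expression inside the query engine instead.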
Backup

JMeter webinar - integration with InfluxDB and Grafana
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 

Dynamic Column Masking and Row-Level Filtering in HDP

  • 1. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. Dynamic Row Filtering and Column Masking with Apache Ranger. Srikanth Venkat, Senior Director, Product Management
  • 2. Disclaimer: This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback, and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document's description of these features and technology directions does not represent a contractual commitment, promise, or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  • 3. Agenda: Background; Dynamic Column Masking and Row Filtering; Spark SQL Security via Hive LLAP/Ranger; Demo
  • 4. Security Challenges of Today's Data Platforms: central repository of critical and sensitive data (Grey Data); data maintained over a long duration (Forever); external ecosystem is in flux (The Zoo); users can access and analyze data in new and different ways (Democratization).
  • 5. Apache Ranger. Auditing: central audit location for all access requests; support for multiple destination sources (HDFS, Solr, etc.); real-time visual query interface. Ranger KMS: store and manage encryption keys; support for HDFS Transparent Data Encryption; integration with HSM (SafeNet Luna). Authorization: centralized platform to define, administer, and manage security policies consistently across Hadoop components (HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas); extensible architecture (custom policy conditions, user context enrichers); easy to add new component types for authorization.
  • 6. Ranger Architecture (diagram): the Ranger Administration Portal, Ranger Policy Server, Ranger Audit Server, and Integration API; enterprise users and legacy tools/data governance; and Ranger plugins in the Hadoop components HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm, and Atlas.
  • 7. Apache Ranger - Intuitive and Granular Policy Management: simple, intuitive UI for policy editing and setup; fine-grained specificity by resource type, user context, tags, and operation; supports Access, Tag Based, Dynamic Data Masking, and Row Filtering policy types.
  • 8. Apache Ranger Audits - Data Access: comprehensive, scalable audit logging. Audits for: resource access events with user context; policy edits/creation/deletion; user session information; component plugin policy sync operations.
  • 9. Row Filtering in Hive: Control Access to Rows in Hive Tables based on Context! Goal: improve the reliability and robustness of HDP by providing row-level security for Hive tables and reducing the surface area of the security system. Capabilities: restrict data row access based on user characteristics (e.g., group membership) AND runtime context. The access restriction logic lives in the Hive layer, so no changes to apps are needed: Hive applies the access restrictions every time data access is attempted, enforcing row-level segmentation seamlessly behind the scenes without anyone having to add this logic to the query predicate. No need for multiple views to filter rows for different groups and users! Core technologies: Ranger, Hive.
  • 10. Row Filtering in Hive: Control Access to Rows in Hive Tables based on Context! Use cases (cross-industry application for data protection). Healthcare: a hospital can create a security policy that allows doctors to view data rows only for their own patients; insurance claims administrators can view only the rows for their specific site. Financial Services: a bank can create a policy to restrict access to rows of financial data based on the employee's business division, locale, or role; employees in the finance department are allowed to see customer invoices, payments, and accrual data; European HR employees can see European employee data. Information Technology: a multi-tenant application can create logical separation of each tenant's data so that each tenant can see only its own data rows.
  • 11. Dynamic Data Masking of Hive Columns: Protect Sensitive Data in Real Time with Dynamic Data Masking/Obfuscation! Goal: mask or anonymize sensitive columns of data (e.g., PII, PCI, PHI) in Hive query output. Benefits: does not physically alter the data or make a copy of it; the original sensitive data does not leave the data store but is obfuscated when presented to the user; no changes are required at the application or Hive layer; no need to produce additional protected duplicate versions of datasets; masking policies are simple and easy to set up. Core technologies: Ranger, Hive.
  • 12. Dynamic Masking and Row Level Filtering
    Source data:
    Country | National ID | CC No | Name | DOB | MRN | Policy ID
    US | 232323233 | 4539067047629850 | John Doe | 9/12/1969 | 8233054331 | nj23j424
    US | 333287465 | 5391304868205600 | Jane Doe | 8/13/1979 | 3736885376 | cadsd984
    Germany | T22000129 | 4532786256545550 | Ernie Schwarz | 3/5/1963 | 876452830A | KK-2345909
    After Ranger policy enforcement, users from the US customer support group see row-filtered data for US persons only, with CC and National ID (SSN) masked and MRN nullified:
    Country | National ID | CC No | MRN | Name
    US | xxxxx3233 | 4539 xxxx xxxx xxxx | null | John Doe
    US | xxxxx7465 | 5391 xxxx xxxx xxxx | null | Jane Doe
    EU Health Policy Admins see the relevant columns unmasked, but row-filtering policies restrict them to data for EU persons only:
    Country | National ID | Name | MRN
    Germany | T22000129 | Ernie Schwarz | 876452830A
  • 13. Spark SQL Security via Hive LLAP
  • 14. Spark SQL Security: Row Filtering and Column Masking. Spark SQL + Hive enables users to explore very large data sets using SQL. Enterprises want to enable Spark SQL for ad-hoc analysis using BI tools, with fine-grained security. Spark provides strong authentication via Kerberos and wire encryption via SSL, but as a general-purpose compute engine it has no built-in authorization subsystem. Spark also has no way to define a pluggable module that contains policies for fine-grained authorization; with structured Hive data organized in columns and rows, fine-grained security becomes a challenge. Co-mingled data in the same table may belong to two different groups, each with its own regulatory requirements. Data may have regional restrictions, time-based availability restrictions, departmental restrictions, etc.
  • 15. Hive 2 with LLAP: Open Interfaces
  • 16. Key Features: Spark Column Security with LLAP. Fine-grained column-level access control for Spark SQL. Fully dynamic policies per user; no views required. Standard Ranger policies and tools control access and masking. Flow: 1. Spark SQL gets data locations, known as "splits", from HiveServer2 and plans the query. 2. HiveServer2 authorizes access using Ranger and applies per-user policies such as row filtering. 3. Spark gets a modified query plan based on the dynamic security policy. 4. Spark reads data from LLAP; filtering and masking are guaranteed by the LLAP server. Diagram components: Spark Client; HiveServer2 (authorization); Hive Metastore (data locations, view definitions); Ranger Server (dynamic policies); LLAP (data read, filter pushdown).
  • 17. Example: Per-User Row Filtering by Region in Spark SQL
    Original query (both users): SELECT * FROM CUSTOMERS WHERE total_spend > 10000
    LLAP data access:
    User ID | Region | Total Spend
    1 | East | 5,131
    2 | East | 27,828
    3 | West | 55,493
    4 | West | 7,193
    5 | East | 18,193
    Query rewrites based on dynamic Ranger policies:
    Spark User 2 (East region): SELECT * FROM CUSTOMERS WHERE total_spend > 10000 AND region = 'East'
    Spark User 1 (West region): SELECT * FROM CUSTOMERS WHERE total_spend > 10000 AND region = 'West'
  • 18. Agenda: Demo
  • 19. Demo Setup: Hortonia is a mid-size financial services company expanding from the US to international markets, with employees in the EU and the US. Multiple business units (analysts, HR) need access to customer data. Customer data is both co-mingled and isolated. The company needs rational security policies that provide the right level of access control to customer data across geographies and business functions, and that comply with external regulations (PII, HIPAA, EU privacy, etc.).
  • 20. Demo Data: Customer data in the hortoniabank DB. 2 customer tables, each with 50K customer records of 38 fields (PII, PHI, PCI, and non-sensitive data): us_customers (USA person data only) and ww_customers (multi-language, multi-country, localized person data from around the world). 1 reference table: eu_countries (for looking up EU country codes to country mappings, accounting for Brexit etc.). All user passwords: hadoop.
  • 21. Ranger Policies Setup for Demo: Only US employees can see data in the us_customers table, and only from locations within the US (access_us_customers). Only US employees can see rows of US persons in the ww_customers table (filter_ww_customers_table + access_ww_customers). Only EU employees can see rows with EU person data in the ww_customers table (filter_ww_customers_table + access_ww_customers). US HR team members can see all original unmasked data (PCI, PII, ...). Analysts can view masked versions of sensitive data from the ww_customers table but are prohibited from viewing PII data in US tables (all masking policies are under the Masking tab of Resource Based Policies). No combination of zip code, MRN, and blood group data is permitted to be joined in any query (prohibition policy).
  • 22. Personas Setup for Demo
    User | Group | Access Privileges
    joe-analyst | us_employees, analyst | US data only; non-sensitive data only, the rest masked or forbidden depending on sensitivity
    kate-hr | us_employees, hr | US data only; all sensitive data (PCI, PII, PHI)
    ivana-eu-hr | eu_employees, hr | EU data only; all sensitive data
  • 23. Data Masking Policies Set Up on us_customers Data for the Analyst Group
    Data Column | Data Column Description | Masking Type | Sample Output | Ranger Masking Policy
    password | Password | Hash | 237672b21819462ff39fcea7d990c3e5 | mask_password_hash
    nationalid | National ID | Show Last 4 | xx-xx-9324 | mask_nationalid_last4
    ccnumber | Credit Card Number | Show First 4 | 4532xxxxxxxxxxxx | mask_ccnumber_first4
    streetaddress | Street Address | Redact | nnn Xxxxxx Xxxxx | mask_streetaddress_redact
    MRN | MRN | Nullify | null | mask_mrn_nullify
    age | Age | CUSTOM (adds a random number below 20 to the actual age) | n/a | mask_age_custom
    birthday | Date of Birth | CUSTOM (keeps the year of birth and sets month and day to 01-01) | 01-01-1987 | mask_dob_custom
  • 24. Backup
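
Slide 17 shows a row-filter policy being ANDed onto the user's own predicate. The Python sketch below reproduces that rewrite against the slide's sample data in an in-memory SQLite table. It is an illustration under stated assumptions, not Ranger's or Hive's implementation: the group names `east_region`/`west_region` and the `ROW_FILTERS` map are hypothetical, the "first matching group wins" rule is a sketch choice, and the string handling is deliberately naive (it assumes a single `SELECT ... FROM t [WHERE p]`). Both sides of the `AND` are parenthesized so that an `OR` in the user's predicate cannot widen access.

```python
import re
import sqlite3
from typing import Dict, List, Optional

# Hypothetical row-filter policies keyed by group name; in Ranger these
# would be row-level filter policies defined on the customers table.
ROW_FILTERS: Dict[str, str] = {
    "east_region": "region = 'East'",
    "west_region": "region = 'West'",
}


def row_filter_for(groups: List[str]) -> Optional[str]:
    # Sketch choice: the first group that has a filter wins.
    return next((ROW_FILTERS[g] for g in groups if g in ROW_FILTERS), None)


def rewrite(query: str, groups: List[str]) -> str:
    """AND the caller's row filter onto a simple 'SELECT ... FROM t [WHERE p]'."""
    security = row_filter_for(groups)
    if security is None:
        return query  # no filter matched; table-level access is decided elsewhere
    match = re.search(r"\bWHERE\b", query, flags=re.IGNORECASE)
    if match is None:
        return f"{query} WHERE {security}"
    head = query[:match.start()].rstrip()
    predicate = query[match.end():].strip()
    # Parenthesize both sides so an OR in the user's predicate cannot bypass the filter.
    return f"{head} WHERE ({predicate}) AND ({security})"


def make_customers() -> sqlite3.Connection:
    # Sample rows from slide 17, loaded into an in-memory SQLite table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (user_id INTEGER, region TEXT, total_spend INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "East", 5131), (2, "East", 27828), (3, "West", 55493),
         (4, "West", 7193), (5, "East", 18193)],
    )
    return conn


def user_ids(conn: sqlite3.Connection, sql: str) -> List[int]:
    return [row[0] for row in conn.execute(sql + " ORDER BY user_id")]


if __name__ == "__main__":
    conn = make_customers()
    original = "SELECT user_id FROM customers WHERE total_spend > 10000"
    for user, groups in [("spark_user_1", ["west_region"]), ("spark_user_2", ["east_region"])]:
        sql = rewrite(original, groups)
        print(user, sql, user_ids(conn, sql))
```

The sketch selects `user_id` rather than `*` only to keep the output short. On the slide's data the West user gets `[3]` and the East user gets `[2, 5]`. The parentheses matter: for the predicate `total_spend > 10000 OR 1 = 1`, a naive append would also return user 3 to the East user, while the parenthesized rewrite returns only East rows `[1, 2, 5]`.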
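
The masking types in the slide 23 table can be approximated with small Python functions, as in the sketch below. The function names are hypothetical and are not Ranger's or Hive's masking functions. The masking characters (`x` for hidden letters and digits in the show-first/last-4 masks; `X`, `x`, and `n` for upper-case letters, lower-case letters, and digits in the redact mask) were chosen to match the sample outputs on slides 12 and 23. MD5 is used for the hash only because the sample hash is 32 hex digits long; that choice is an assumption.

```python
import hashlib
import random
from datetime import date
from typing import Optional


def _hide(ch: str) -> str:
    # Letters and digits become 'x'; separators such as '-' are kept.
    return "x" if ch.isalnum() else ch


def mask_hash(value: str) -> str:
    # Assumption: MD5, chosen only because the sample output is 32 hex digits.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def mask_show_last_n(value: str, n: int = 4) -> str:
    cut = max(len(value) - n, 0)
    return "".join(_hide(c) for c in value[:cut]) + value[cut:]


def mask_show_first_n(value: str, n: int = 4) -> str:
    return value[:n] + "".join(_hide(c) for c in value[n:])


def mask_redact(value: str) -> str:
    # Upper case -> 'X', lower case -> 'x', digit -> 'n', anything else unchanged.
    def sub(c: str) -> str:
        if c.isupper():
            return "X"
        if c.islower():
            return "x"
        if c.isdigit():
            return "n"
        return c
    return "".join(sub(c) for c in value)


def mask_nullify(_value: object) -> None:
    return None


def mask_age_custom(age: int, rng: Optional[random.Random] = None) -> int:
    # "Adds a random number below 20": assumed to mean 0..19 inclusive.
    rng = rng or random.Random()
    return age + rng.randrange(20)


def mask_dob_custom(dob: date) -> date:
    # Keep the year of birth and force month and day to 01-01.
    return date(dob.year, 1, 1)
```

For example, `mask_show_first_n("4532786256545550")` returns `"4532xxxxxxxxxxxx"`, `mask_show_last_n("232323233")` returns `"xxxxx3233"` as on slide 12, and `mask_redact("123 Nelson Court")` (a made-up address) returns `"nnn Xxxxxx Xxxxx"`.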

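
Slides 21 and 22 combine two kinds of policy for the three demo personas: a row filter on `ww_customers` chosen by region group, and column masking that applies to analysts but not to HR. The Python sketch below composes the two over the sample rows from slide 12. The column names, the group-to-filter mapping, and the rule that HR membership exempts a user from the analyst masks are assumptions that mirror the slide text. They are not the demo's actual Ranger policy definitions. `EU_COUNTRIES` is an illustrative stand-in for the `eu_countries` reference table.

```python
from typing import Callable, Dict, FrozenSet, List

Row = Dict[str, object]

# Illustrative stand-in for the eu_countries reference table (subset only).
EU_COUNTRIES = {"Austria", "France", "Germany", "Italy", "Spain"}

# Demo personas and their groups, as listed on slide 22.
PERSONAS: Dict[str, FrozenSet[str]] = {
    "joe-analyst": frozenset({"us_employees", "analyst"}),
    "kate-hr": frozenset({"us_employees", "hr"}),
    "ivana-eu-hr": frozenset({"eu_employees", "hr"}),
}


def _hide(ch: str) -> str:
    return "x" if ch.isalnum() else ch


# Assumed analyst masks on the sensitive columns (cf. slides 12 and 23).
ANALYST_MASKS: Dict[str, Callable[[object], object]] = {
    "nationalid": lambda v: "".join(_hide(c) for c in str(v)[:-4]) + str(v)[-4:],
    "ccnumber": lambda v: str(v)[:4] + "".join(_hide(c) for c in str(v)[4:]),
    "mrn": lambda v: None,
}


def row_filter(groups: FrozenSet[str]) -> Callable[[Row], bool]:
    # Assumed reading of filter_ww_customers_table: US staff see US rows,
    # EU staff see EU rows, and everyone else sees nothing.
    if "us_employees" in groups:
        return lambda row: row["country"] == "US"
    if "eu_employees" in groups:
        return lambda row: row["country"] in EU_COUNTRIES
    return lambda row: False


def visible_rows(user: str, rows: List[Row]) -> List[Row]:
    groups = PERSONAS[user]
    keep = row_filter(groups)
    # Assumption: HR membership exempts a user from the analyst masks.
    apply_masks = "analyst" in groups and "hr" not in groups
    result = []
    for row in rows:
        if not keep(row):
            continue  # the row filter runs first, on unmasked values
        if apply_masks:
            row = {col: ANALYST_MASKS.get(col, lambda v: v)(val) for col, val in row.items()}
        result.append(row)
    return result


# Sample rows taken from slide 12.
SAMPLE: List[Row] = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "name": "John Doe", "mrn": "8233054331"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "name": "Jane Doe", "mrn": "3736885376"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "name": "Ernie Schwarz", "mrn": "876452830A"},
]
```

With these assumptions, `joe-analyst` sees the two US rows with national ID and card number masked and MRN nulled, `kate-hr` sees the same two rows unmasked, and `ivana-eu-hr` sees only the German row, unmasked.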
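
The last policy on slide 21 forbids combining zip code, MRN, and blood group data in one query. The Python sketch below shows a minimal version of such a check. It assumes that the engine can report which columns a query references and that the rule fires only when all of the listed columns appear together. The column names `zipcode`, `mrn`, and `bloodgroup` are hypothetical, and this is not Ranger's implementation.

```python
from typing import FrozenSet, Iterable

# Hypothetical column names for zip code, MRN and blood group.
PROHIBITED_TOGETHER: FrozenSet[str] = frozenset({"zipcode", "mrn", "bloodgroup"})


def normalize(column: str) -> str:
    # Drop any db/table qualifier so columns from joined tables compare equal.
    return column.rsplit(".", 1)[-1].lower()


def violates_prohibition(columns_accessed: Iterable[str],
                         prohibited: FrozenSet[str] = PROHIBITED_TOGETHER) -> bool:
    """True when every prohibited column is referenced by the same query."""
    accessed = {normalize(c) for c in columns_accessed}
    return prohibited <= accessed
```

Stripping the table qualifier lets the check catch the case the slide names: the three columns being joined in from different tables. A stricter "any two of them" reading would replace the subset test with `len(prohibited & accessed) >= 2`.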
Editor's Notes

  1. The Ranger Admin portal is the central interface for security administration. Users can create and update policies, which are then stored in a policy database. Plugins within each component poll these policies at regular intervals. The portal also includes an audit server that takes the audit data collected by the plugins and stores it in HDFS or in a relational database. Ranger plugins: plugins are lightweight Java programs that are embedded in the processes of each cluster component. For example, the Apache Ranger plugin for Apache Hive is embedded within HiveServer2. These plugins pull in policies from a central server and store them locally in a file. When a user request comes through the component, the plugin intercepts the request and evaluates it against the security policy. Plugins also collect data from the user request and use a separate thread to send this data back to the audit server. User group sync: Apache Ranger provides a user synchronization utility to pull users and groups from Unix, LDAP, or Active Directory. The user and group information is stored within the Ranger portal and used for policy definition.
  2. Row-level filtering brings convenience to apps running on Hive. Because the access restriction logic moves down into the Hive layer, Hive applies the access restrictions every time data access is attempted. This simplifies query authoring and enforces row-level segmentation seamlessly behind the scenes, without anyone having to add this logic to the query predicate.
  3. Dynamic data masking via Apache Ranger enables security administrators to ensure that only authorized users can see the data they are permitted to see, while for other users or groups the same data is masked or anonymized to protect sensitive content.
  4. Interactive query: low-latency interactive query with persistent servers ready to process SQL; intelligent in-memory caching; builds on the Hive engine and its SQL capabilities; long-running processes that can read from HDFS/S3, cache the data, and serve it out; open, composable interfaces for reading data; extensible interfaces that let Spark read data out of LLAP and process it; relies on LLAP to deliver trusted security. Client-side mechanisms can be circumvented, so we have focused on server-side enforcement of security.
  5. Spark has its own execution engine and SQL dialect, so it needs to be able to deal with data in a raw manner. All runtime and execution is delegated to Spark itself. A Spark plugin called LLAP context is aware of the LLAP daemons, knows how to read data from an LLAP daemon, and is aware of Ranger query transformations. Spark SQL issues a query, which is routed to HiveServer2 and into Ranger, and split locations are returned. Data is read in parallel based on the split locations with the assigned plan; Ranger applies query transformations to provide column masking and row filtering. Then Spark is free to … LLAP is a trusted daemon.
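
Note 1 describes the plugin lifecycle: policies are pulled from the central server and cached locally, each request is evaluated against the cached copy, and audit events are shipped on a separate thread. The Python sketch below illustrates that shape only. Ranger's real plugins are Java. The `Policy` fields, the `PluginSketch` class, and the "any matching deny wins, otherwise any matching allow, else deny" rule are simplifying assumptions made for this example.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical, heavily simplified policy: one resource, the groups it
    # applies to, the access types it covers, and whether it allows or denies.
    resource: str
    groups: frozenset
    accesses: frozenset
    allow: bool = True


class PluginSketch:
    """Evaluates requests against a locally cached policy copy and ships
    audit events to a sink on a background thread."""

    def __init__(self, audit_sink: list):
        self._policies: list = []        # local cache, replaced on each refresh
        self._lock = threading.Lock()
        self._audit_q: queue.Queue = queue.Queue()
        self._sink = audit_sink
        self._worker = threading.Thread(target=self._drain_audits, daemon=True)
        self._worker.start()

    def refresh(self, policies) -> None:
        # A real plugin would poll the admin server on a timer; here the
        # caller supplies the new policy list directly.
        with self._lock:
            self._policies = list(policies)

    def is_allowed(self, user: str, groups, resource: str, access: str) -> bool:
        with self._lock:
            matching = [p for p in self._policies
                        if p.resource == resource
                        and access in p.accesses
                        and p.groups & set(groups)]
        # Simplified rule: any matching deny wins, otherwise any allow, else deny.
        if any(not p.allow for p in matching):
            decision = False
        else:
            decision = any(p.allow for p in matching)
        self._audit_q.put((user, resource, access, decision))  # does not block on I/O
        return decision

    def _drain_audits(self) -> None:
        while True:
            event = self._audit_q.get()
            if event is None:            # sentinel sent by close()
                break
            self._sink.append(event)

    def close(self) -> None:
        self._audit_q.put(None)
        self._worker.join()
```

Putting audit events on a queue keeps the authorization decision off the audit I/O path, which is the point of the separate thread mentioned in the note. For instance, after `refresh([Policy("hortoniabank.us_customers", frozenset({"us_employees"}), frozenset({"select"}))])`, a `select` by kate-hr (in `us_employees`) is allowed and one by ivana-eu-hr (in `eu_employees`) is denied, and both decisions are audited.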
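
Notes 4 and 5 (and the flow on slide 16) split responsibilities: HiveServer2 consults Ranger and gives Spark a plan, and the trusted LLAP daemon, not the Spark client, applies the row filter and masks when it serves rows. The toy Python model below illustrates only that division of labour. Every class and method name (`RangerPoliciesSketch`, `HiveServer2Sketch`, `LlapDaemonSketch`, `plan_query`, `read`) is hypothetical. The real HiveServer2/LLAP protocol and its split handling are not modelled. The design point shown is the one from note 4: the client receives only an opaque handle, so it cannot drop the security filter itself. As a further choice of this sketch, masks are applied before the client's own predicate runs.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class SecurePlan:
    # Built on the server side from policy; the client never sees or edits it.
    row_filter: Callable[[Row], bool]
    masks: Dict[str, Callable[[object], object]] = field(default_factory=dict)


class RangerPoliciesSketch:
    """Hypothetical stand-in for the Ranger server: user -> region, plus masks."""

    def __init__(self, region_by_user: Dict[str, str],
                 masks: Dict[str, Callable[[object], object]]):
        self._region_by_user = region_by_user
        self._masks = masks

    def plan_for(self, user: str) -> SecurePlan:
        region = self._region_by_user.get(user)  # unknown user -> matches no rows
        return SecurePlan(
            row_filter=lambda row: region is not None and row["region"] == region,
            masks=dict(self._masks),
        )


class LlapDaemonSketch:
    """Trusted reader: enforces the registered plan on every row it returns."""

    def __init__(self, table: List[Row]):
        self._table = table
        self._plans: Dict[str, SecurePlan] = {}

    def register(self, plan: SecurePlan) -> str:
        handle = uuid.uuid4().hex
        self._plans[handle] = plan
        return handle

    def read(self, handle: str, client_predicate: Callable[[Row], bool]) -> List[Row]:
        plan = self._plans[handle]  # an unknown or forged handle raises KeyError
        out = []
        for row in self._table:
            if not plan.row_filter(row):
                continue  # the policy filter runs on the raw row
            masked = {k: plan.masks.get(k, lambda v: v)(v) for k, v in row.items()}
            if client_predicate(masked):  # the client only ever sees masked values
                out.append(masked)
        return out


class HiveServer2Sketch:
    """Authorizes via the policy store and hands back only an opaque handle."""

    def __init__(self, ranger: RangerPoliciesSketch, llap: LlapDaemonSketch):
        self._ranger = ranger
        self._llap = llap

    def plan_query(self, user: str) -> str:
        return self._llap.register(self._ranger.plan_for(user))
```

Loaded with the slide 17 rows and the user-to-region mapping from that slide, reading with the predicate `total_spend > 10000` returns user IDs 2 and 5 for the East user and 3 for the West user. A user with no region policy gets no rows.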