SlideShare a Scribd company logo
1 of 31
Download to read offline
1
Apache Sentry: Enterprise-grade
Security for Hadoop
Xuefu Zhang, Srayva Tirukkovalur | Cloudera
April 16, 2014
Outline
• Introduction
• Hadoop security primer
• Authentication
• Authorization
• Data Protection
• Governance and Auditing
• Introducing Apache Sentry
• What's Sentry
• Sentry Architecture
• Sentry Internal
• Future Work
• Demo
• Q&A
2
Introduction
●
Hadoop gets bigger ...
●
Hadoop has been enjoying an increasing adoption rate
●
More and more data on Hadoop Cluster
●
More and more access to the data
●
Data warehouse offload is the most common use case
●
Apache Hive, Apache Drill, Cloudera Impala
●
SQL on Hadoop is phenomenon
3
Introduction (cont'd)
●
But more encumbrance ...
●
Enterprises wants to protect sensitive data
●
Government regulations, compliance, like HIPPA, PII,
FISMA
●
Existing security problems with Hadoop has hindered
the adoption
●
Security has become the top priority
4
Introduction (cont'd)
●
Reality is ...
●
Different components, different security mechanisms
●
Multiple components may access the same data set
●
Hadoop was born out of trust, not security
●
Thinking of Windows
5
Outline
• Introduction
• Hadoop security primer
• Authentication
• Authorization
• Data Protection
• Governance and Auditing
• Introducing Apache Sentry
• What's Sentry
• Sentry Architecture
• Sentry Internal
• Future work
• Demo
• Q&A
6
Hadoop Security Primer
• Authentication
●
Identify who you are
●
Untrusted users has no access to the cluster network
●
In a trusted network, every one is good citizen
●
Who you are is determined by client host
7
Hadoop Security Primer
• Strong Authentication
●
Kerberos
●
LDAP, ActiveDirectory
●
LDAP, AD integrated with Kerberos, establishing a
single point of truth
●
Single Sign On
8
Hadoop Security Primer (cont'd)
• Kerberos
●
Strong authentication
●
Provides mutual authentication
●
Protects against eavesdropping and replay attacks
●
Every user and service has a Kerberos “principal”
●
Credentials: keytabs (service), password (user)
9
Hadoop Security Primer (cont'd)
• Authorization
●
Determine if you can access
●
HDFS Posix style permission R/W/X for U/G/O, coarse-
grained
●
Other components have authorization
●
MR job queue
●
HBase ACLs on table and column family.
●
Accumulo provides cell-level access control
●
Impersonation
10
Hadoop Security Primer (cont'd)
• Data Protection
●
Data at rest and in transit
●
Hadoop provides encryption on data in transit: DTP,
HTTP, RPC, JDBC/ODBC
●
Hadoop has no native encryption on data at rest (HDFS-
6134)
●
Relying on OS-level encryption
11
Hadoop Security Primer (cont'd)
• Governance and auditing
●
Again, component to component
●
DFS and MapReduce provide base audit support
●
Apache Hive metastore records audit (who/when)
information for Hive interactions.
●
Apache Oozie provides audit trail for services
12
Outline
• Introduction
• Hadoop security primer
• Authentication
• Authorization
• Data Protection
• Governance and Auditing
• Introducing Apache Sentry
• What's Sentry
• Sentry Architecture
• Sentry Internal
• Future work
• Demo
• Q&A
13
Introducing Apache Sentry
14
●
Hadoop Authorization
●
Existing authorization is fragmented, coarse-grained,
and manual
●
A lot of times data is just unprotected for simplicity
●
Enterprises need a centralized authorization component
that work across components with ease of use, fine-
grained, role based
Introducing Apache Sentry (cont'd)
15
●
What's Sentry
●
Sentry is an authorization module for Hive, Search,
Impala, and beyond
●
It unlocks Key RBAC Requirements: secure, fine-
grained, role-based authorization, multi-tenant
administration
●
Open Source, Apache Incubator project
●
Ecosystem Support: Apache SOLR, HiveServer2, &
Impala 1.1+
Introducing Apache Sentry (cont'd)
16
●
Key Benefits
●
Store Sensitive Data in Hadoop
●
Extend Hadoop to More Users
●
Comply with Regulations
Introducing Apache Sentry (cont'd)
17
●
Key Capabilities
●
Fine-Grained: SERVERS, DATABASES, TABLES &
VIEWS; INDEXES, COLLECTIONS
●
Role-Based: role including privileges such as SELECT,
INSERT, ALL; UPDATE, QUERY
●
Multi-Tenant administration
●
Separate policies for each database/schema
●
Can be maintained by separate admins
Introducing Apache Sentry (cont'd)
18
Binding
Layer
Impala
Impala Hive
Policy Engine
Policy Provider
File Database
HiveServer
2
Authorization
Provider
Local
FS/HDFS
Search
SOLR
Pig …
Sentry Architecture
Introducing Apache Sentry (cont'd)
19
QueryMR
SQL
Parse
Build
Check
Plan
Sentry
Validate SQL grammar
Construct statement tree
Validate statement objects
•
First check: Authorization
Forward to execution planner
Introducing Apache Sentry (cont'd)
• Actors
●
User
●
User group membership
●
Resources
●
Privilege
●
Role
20
Introducing Apache Sentry (cont'd)
• User
●
User authenticated
●
User identity obtained from session context
21
Introducing Apache Sentry (cont'd)
• User group membership
●
Defined outside sentry policy
●
Obtained from user directory (LDAP, AD, HDFS)
●
Maybe available from session context
22
Introducing Apache Sentry (cont'd)
• Resources
●
Data to be protected
●
File or directory on HDFS
●
Table or views in Hive
●
URI
●
Resource can be hierarchical
23
Introducing Apache Sentry (cont'd)
• Privilege
●
Action or operation associated with a resource
●
Exists in a role only
●
SELECT on a given TABLE or VIEW
●
CREATE a TABLE or VIEW
●
QUERY on a search COLLECTION
●
DELETE a FILE or DIRECTORY
●
Example
collection=customerCol->action=query
24
Introducing Apache Sentry (cont'd)
• Roles
●
A collection of privileges
●
Defined in Sentry policy
●
Example
[roles]
ana_query_role = collection=sentryColl->action=query
ana_update_role = collection=sentryColl->action=update
test_role = collection=testColl->action=update
full_admin_role = collection=*
25
Introducing Apache Sentry (cont'd)
• (Group, Role) mapping
●
Defined in policy
●
One-to-Many
●
Example
[groups]
analyts = ana_query_role, ana_update_role
admins = full_admin_role
testgroup = test_role
hbase = full_admin_role
26
Introducing Apache Sentry (cont'd)
• Rule evaluation
●
Who's the user?
●
Which group(s) does the user belong to?
●
What resource to be accessed?
●
How the resource is accessed (READ, SELECT, etc.)?
●
Does any of the user's groups have a role, which has
the right privilege?
●
Yes – great! Go head!
●
No – sorry! No sufficient privilege!
27
Outline
• Introduction
• Hadoop security primer
• Authentication
• Authorization
• Data Protection
• Governance and Auditing
• Introducing Apache Sentry
• What's Sentry
• Sentry Architecture
• Sentry Internal
• Future work
• Demo
• Q&A
28
Future Work
29
●
Introduce Sentry to more Hadoop components for their
authorization needs
●
Centralized policy store aiming for the whole enterprise
●
Grant/Revoke
●
Centralized authorization service for all protected
resources including metadata

We appreciate your contribution and support
Outline
• Introduction
• Hadoop security primer
• Authentication
• Authorization
• Data Protection
• Governance and Auditing
• Introducing Apache Sentry
• What's Sentry
• Sentry Architecture
• Sentry Internal
• Future work
• Demo
• Q&A
30
Click to edit Master title style
31

More Related Content

What's hot

Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
DataWorks Summit
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
DataWorks Summit
 

What's hot (20)

Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happy
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happy
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Hadoop operations
Hadoop operationsHadoop operations
Hadoop operations
 
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...
Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...
Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Overview of HDFS Transparent Encryption
Overview of HDFS Transparent Encryption Overview of HDFS Transparent Encryption
Overview of HDFS Transparent Encryption
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
Redis for Security Data : SecurityScorecard JVM Redis Usage
Redis for Security Data : SecurityScorecard JVM Redis UsageRedis for Security Data : SecurityScorecard JVM Redis Usage
Redis for Security Data : SecurityScorecard JVM Redis Usage
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
 

Viewers also liked

Viewers also liked (20)

One Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsOne Hadoop, Multiple Clouds
One Hadoop, Multiple Clouds
 
Cloudera security and enterprise license by Athemaster(繁中)
Cloudera security and enterprise license by Athemaster(繁中)Cloudera security and enterprise license by Athemaster(繁中)
Cloudera security and enterprise license by Athemaster(繁中)
 
Data: Open for Good and Secure by Default | Eddie Garcia
Data: Open for Good and Secure by Default | Eddie GarciaData: Open for Good and Secure by Default | Eddie Garcia
Data: Open for Good and Secure by Default | Eddie Garcia
 
BIG Data & Hadoop Applications in E-Commerce
BIG Data & Hadoop Applications in E-CommerceBIG Data & Hadoop Applications in E-Commerce
BIG Data & Hadoop Applications in E-Commerce
 
Presto - SQL on anything
Presto  - SQL on anythingPresto  - SQL on anything
Presto - SQL on anything
 
RecordService for Unified Access Control
RecordService for Unified Access ControlRecordService for Unified Access Control
RecordService for Unified Access Control
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Securing Your Apache Spark Applications
Securing Your Apache Spark ApplicationsSecuring Your Apache Spark Applications
Securing Your Apache Spark Applications
 
Tech Backpack Brief v1
Tech Backpack Brief v1Tech Backpack Brief v1
Tech Backpack Brief v1
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1

 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
 
Top 5 IoT Use Cases
Top 5 IoT Use CasesTop 5 IoT Use Cases
Top 5 IoT Use Cases
 
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudData Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache Kudu
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
 
Enabling the Connected Car Revolution

Enabling the Connected Car Revolution
Enabling the Connected Car Revolution

Enabling the Connected Car Revolution

 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
 

Similar to April 2014 HUG : Apache Sentry

Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
DataWorks Summit
 
Hive contributors meetup apache sentry
Hive contributors meetup   apache sentryHive contributors meetup   apache sentry
Hive contributors meetup apache sentry
Brock Noland
 
The Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data HubThe Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data Hub
DataWorks Summit
 

Similar to April 2014 HUG : Apache Sentry (20)

Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at Uber
 
SolrCloud on Hadoop
SolrCloud on HadoopSolrCloud on Hadoop
SolrCloud on Hadoop
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoop
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of View
 
BigDataTech 2015 Is Hadoop Enterprise ready?
BigDataTech 2015 Is Hadoop Enterprise ready?BigDataTech 2015 Is Hadoop Enterprise ready?
BigDataTech 2015 Is Hadoop Enterprise ready?
 
Search On Hadoop
Search On HadoopSearch On Hadoop
Search On Hadoop
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Hive contributors meetup apache sentry
Hive contributors meetup   apache sentryHive contributors meetup   apache sentry
Hive contributors meetup apache sentry
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi MensahTurning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
 
Oracle big data discovery 994294
Oracle big data discovery   994294Oracle big data discovery   994294
Oracle big data discovery 994294
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
The Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data HubThe Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data Hub
 

More from Yahoo Developer Network

Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
Yahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
Yahoo Developer Network
 

More from Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Recently uploaded

Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
RafigAliyev2
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
cyebo
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
DilipVasan
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
pyhepag
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
pyhepag
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
cyebo
 

Recently uploaded (20)

Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 

April 2014 HUG : Apache Sentry

  • 1. 1 Apache Sentry: Enterprise-grade Security for Hadoop Xuefu Zhang, Srayva Tirukkovalur | Cloudera April 16, 2014
  • 2. Outline • Introduction • Hadoop security primer • Authentication • Authorization • Data Protection • Governance and Auditing • Introducing Apache Sentry • What's Sentry • Sentry Architecture • Sentry Internal • Future Work • Demo • Q&A 2
  • 3. Introduction ● Hadoop gets bigger ... ● Hadoop has been enjoying an increasing adoption rate ● More and more data on Hadoop Cluster ● More and more access to the data ● Data warehouse offload is the most common use case ● Apache Hive, Apache Drill, Cloudera Impala ● SQL on Hadoop is phenomenon 3
  • 4. Introduction (cont'd) ● But more encumbrance ... ● Enterprises wants to protect sensitive data ● Government regulations, compliance, like HIPPA, PII, FISMA ● Existing security problems with Hadoop has hindered the adoption ● Security has become the top priority 4
  • 5. Introduction (cont'd) ● Reality is ... ● Different components, different security mechanisms ● Multiple components may access the same data set ● Hadoop was born out of trust, not security ● Thinking of Windows 5
  • 6. Outline • Introduction • Hadoop security primer • Authentication • Authorization • Data Protection • Governance and Auditing • Introducing Apache Sentry • What's Sentry • Sentry Architecture • Sentry Internal • Future work • Demo • Q&A 6
  • 7. Hadoop Security Primer • Authentication ● Identify who you are ● Untrusted users has no access to the cluster network ● In a trusted network, every one is good citizen ● Who you are is determined by client host 7
  • 8. Hadoop Security Primer • Strong Authentication ● Kerberos ● LDAP, ActiveDirectory ● LDAP, AD integrated with Kerberos, establishing a single point of truth ● Single Sign On 8
  • 9. Hadoop Security Primer (cont'd) • Kerberos ● Strong authentication ● Provides mutual authentication ● Protects against eavesdropping and replay attacks ● Every user and service has a Kerberos “principal” ● Credentials: keytabs (service), password (user) 9
  • 10. Hadoop Security Primer (cont'd) • Authorization ● Determine if you can access ● HDFS Posix style permission R/W/X for U/G/O, coarse- grained ● Other components have authorization ● MR job queue ● HBase ACLs on table and column family. ● Accumulo provides cell-level access control ● Impersonation 10
  • 11. Hadoop Security Primer (cont'd) • Data Protection ● Data at rest and in transit ● Hadoop provides encryption on data in transit: DTP, HTTP, RPC, JDBC/ODBC ● Hadoop has no native encryption on data at rest (HDFS- 6134) ● Relying on OS-level encryption 11
  • 12. Hadoop Security Primer (cont'd) • Governance and auditing ● Again, component to component ● DFS and MapReduce provide base audit support ● Apache Hive metastore records audit (who/when) information for Hive interactions. ● Apache Oozie provides audit trail for services 12
  • 13. Outline • Introduction • Hadoop security primer • Authentication • Authorization • Data Protection • Governance and Auditing • Introducing Apache Sentry • What's Sentry • Sentry Architecture • Sentry Internal • Future work • Demo • Q&A 13
  • 14. Introducing Apache Sentry 14 ● Hadoop Authorization ● Existing authorization is fragmented, coarse-grained, and manual ● A lot of times data is just unprotected for simplicity ● Enterprises need a centralized authorization component that work across components with ease of use, fine- grained, role based
  • 15. Introducing Apache Sentry (cont'd) 15 ● What's Sentry ● Sentry is an authorization module for Hive, Search, Impala, and beyond ● It unlocks Key RBAC Requirements: secure, fine- grained, role-based authorization, multi-tenant administration ● Open Source, Apache Incubator project ● Ecosystem Support: Apache SOLR, HiveServer2, & Impala 1.1+
  • 16. Introducing Apache Sentry (cont'd) 16 ● Key Benefits ● Store Sensitive Data in Hadoop ● Extend Hadoop to More Users ● Comply with Regulations
  • 17. Introducing Apache Sentry (cont'd) 17 ● Key Capabilities ● Fine-Grained: SERVERS, DATABASES, TABLES & VIEWS; INDEXES, COLLECTIONS ● Role-Based: role including privileges such as SELECT, INSERT, ALL; UPDATE, QUERY ● Multi-Tenant administration ● Separate policies for each database/schema ● Can be maintained by separate admins
  • 18. Introducing Apache Sentry (cont'd) 18 Binding Layer Impala Impala Hive Policy Engine Policy Provider File Database HiveServer 2 Authorization Provider Local FS/HDFS Search SOLR Pig … Sentry Architecture
  • 19. Introducing Apache Sentry (cont'd) 19 QueryMR SQL Parse Build Check Plan Sentry Validate SQL grammar Construct statement tree Validate statement objects • First check: Authorization Forward to execution planner
  • 20. Introducing Apache Sentry (cont'd) • Actors ● User ● User group membership ● Resources ● Privilege ● Role 20
  • 21. Introducing Apache Sentry (cont'd) • User ● User authenticated ● User identity obtained from session context 21
  • 22. Introducing Apache Sentry (cont'd) • User group membership ● Defined outside sentry policy ● Obtained from user directory (LDAP, AD, HDFS) ● Maybe available from session context 22
  • 23. Introducing Apache Sentry (cont'd) • Resources ● Data to be protected ● File or directory on HDFS ● Table or views in Hive ● URI ● Resource can be hierarchical 23
  • 24. Introducing Apache Sentry (cont'd) • Privilege ● Action or operation associated with a resource ● Exists in a role only ● SELECT on a given TABLE or VIEW ● CREATE a TABLE or VIEW ● QUERY on a search COLLECTION ● DELETE a FILE or DIRECTORY ● Example collection=customerCol->action=query 24
  • 25. Introducing Apache Sentry (cont'd) • Roles ● A collection of privileges ● Defined in Sentry policy ● Example [roles] ana_query_role = collection=sentryColl->action=query ana_update_role = collection=sentryColl->action=update test_role = collection=testColl->action=update full_admin_role = collection=* 25
  • 26. Introducing Apache Sentry (cont'd) • (Group, Role) mapping ● Defined in policy ● One-to-Many ● Example [groups] analyts = ana_query_role, ana_update_role admins = full_admin_role testgroup = test_role hbase = full_admin_role 26
  • 27. Introducing Apache Sentry (cont'd) • Rule evaluation ● Who's the user? ● Which group(s) does the user belong to? ● What resource to be accessed? ● How the resource is accessed (READ, SELECT, etc.)? ● Does any of the user's groups have a role, which has the right privilege? ● Yes – great! Go head! ● No – sorry! No sufficient privilege! 27
  • 28. Outline • Introduction • Hadoop security primer • Authentication • Authorization • Data Protection • Governance and Auditing • Introducing Apache Sentry • What's Sentry • Sentry Architecture • Sentry Internal • Future work • Demo • Q&A 28
  • 29. Future Work 29 ● Introduce Sentry to more Hadoop components for their authorization needs ● Centralized policy store aiming for the whole enterprise ● Grant/Revoke ● Centralized authorization service for all protected resources including metadata  We appreciate your contribution and support
  • 30. Outline • Introduction • Hadoop security primer • Authentication • Authorization • Data Protection • Governance and Auditing • Introducing Apache Sentry • What's Sentry • Sentry Architecture • Sentry Internal • Future work • Demo • Q&A 30
  • 31. Click to edit Master title style 31