Apache Sentry: Enterprise-grade
Security for Hadoop
Xuefu Zhang, Srayva Tirukkovalur | Cloudera
April 16, 2014
April 2014 HUG : Apache Sentry

Published in: Data & Analytics, Technology
Transcript of "April 2014 HUG : Apache Sentry"

  1. Apache Sentry: Enterprise-grade Security for Hadoop
     Xuefu Zhang, Srayva Tirukkovalur | Cloudera
     April 16, 2014
  2. Outline
     • Introduction
     • Hadoop security primer
     • Authentication
     • Authorization
     • Data Protection
     • Governance and Auditing
     • Introducing Apache Sentry
     • What's Sentry
     • Sentry Architecture
     • Sentry Internal
     • Future Work
     • Demo
     • Q&A
  3. Introduction
     ● Hadoop gets bigger ...
     ● Hadoop has been enjoying an increasing adoption rate
     ● More and more data on Hadoop clusters
     ● More and more access to the data
     ● Data warehouse offload is the most common use case
     ● Apache Hive, Apache Drill, Cloudera Impala
     ● SQL on Hadoop is a phenomenon
  4. Introduction (cont'd)
     ● But more encumbrance ...
     ● Enterprises want to protect sensitive data
     ● Government regulations and compliance requirements, such as HIPAA, PII, FISMA
     ● Existing security problems with Hadoop have hindered adoption
     ● Security has become the top priority
  5. Introduction (cont'd)
     ● Reality is ...
     ● Different components, different security mechanisms
     ● Multiple components may access the same data set
     ● Hadoop was born out of trust, not security
     ● Thinking of Windows
  7. Hadoop Security Primer
     • Authentication
     ● Identify who you are
     ● Untrusted users have no access to the cluster network
     ● In a trusted network, everyone is a good citizen
     ● Who you are is determined by the client host
  8. Hadoop Security Primer
     • Strong Authentication
     ● Kerberos
     ● LDAP, Active Directory
     ● LDAP and AD integrated with Kerberos, establishing a single point of truth
     ● Single Sign-On
  9. Hadoop Security Primer (cont'd)
     • Kerberos
     ● Strong authentication
     ● Provides mutual authentication
     ● Protects against eavesdropping and replay attacks
     ● Every user and service has a Kerberos “principal”
     ● Credentials: keytabs (service), password (user)
  10. Hadoop Security Primer (cont'd)
      • Authorization
      ● Determine if you can access
      ● HDFS: POSIX-style permissions, R/W/X for U/G/O, coarse-grained
      ● Other components have their own authorization
      ● MR job queue ACLs
      ● HBase ACLs on table and column family
      ● Accumulo provides cell-level access control
      ● Impersonation
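To make the "coarse-grained" point concrete, here is a minimal Python sketch of a POSIX-style R/W/X check for user, group, and other. It is an illustration only, not HDFS code: the `may_access` helper, the user names, and the group names are hypothetical, and superuser handling is left out.

```python
# Illustrative sketch of a POSIX-style permission check (not HDFS source code).
READ, WRITE, EXECUTE = 4, 2, 1

def may_access(mode, owner, group, user, user_groups, want):
    """Return True if `user` may perform `want` on an object with octal `mode`."""
    if user == owner:
        bits = (mode >> 6) & 0o7   # user (U) triplet
    elif group in user_groups:
        bits = (mode >> 3) & 0o7   # group (G) triplet
    else:
        bits = mode & 0o7          # other (O) triplet
    return (bits & want) == want

# A file with mode 640 owned by alice:analysts (hypothetical names).
owner_can_write = may_access(0o640, "alice", "analysts", "alice", ["analysts"], WRITE)
group_can_read = may_access(0o640, "alice", "analysts", "bob", ["analysts"], READ)
group_can_write = may_access(0o640, "alice", "analysts", "bob", ["analysts"], WRITE)
other_can_read = may_access(0o640, "alice", "analysts", "carol", ["interns"], READ)
```

The decision applies to the whole file or directory. Nothing in this model can express "read these columns of this table", which is the gap the later slides address.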
  11. Hadoop Security Primer (cont'd)
      • Data Protection
      ● Data at rest and in transit
      ● Hadoop provides encryption on data in transit: DTP, HTTP, RPC, JDBC/ODBC
      ● Hadoop has no native encryption on data at rest (HDFS-6134)
      ● Relying on OS-level encryption
  12. Hadoop Security Primer (cont'd)
      • Governance and auditing
      ● Again, component to component
      ● DFS and MapReduce provide base audit support
      ● Apache Hive metastore records audit (who/when) information for Hive interactions
      ● Apache Oozie provides audit trail for services
  14. Introducing Apache Sentry
      ● Hadoop Authorization
      ● Existing authorization is fragmented, coarse-grained, and manual
      ● Data is often left unprotected for the sake of simplicity
      ● Enterprises need a centralized authorization component that works across components and is easy to use, fine-grained, and role-based
  15. Introducing Apache Sentry (cont'd)
      ● What's Sentry
      ● Sentry is an authorization module for Hive, Search, Impala, and beyond
      ● It unlocks key RBAC requirements: secure, fine-grained, role-based authorization; multi-tenant administration
      ● Open source, Apache Incubator project
      ● Ecosystem support: Apache Solr, HiveServer2, and Impala 1.1+
  16. Introducing Apache Sentry (cont'd)
      ● Key Benefits
      ● Store Sensitive Data in Hadoop
      ● Extend Hadoop to More Users
      ● Comply with Regulations
  17. Introducing Apache Sentry (cont'd)
      ● Key Capabilities
      ● Fine-Grained: SERVERS, DATABASES, TABLES & VIEWS; INDEXES, COLLECTIONS
      ● Role-Based: role including privileges such as SELECT, INSERT, ALL; UPDATE, QUERY
      ● Multi-Tenant administration
      ● Separate policies for each database/schema
      ● Can be maintained by separate admins
  18. Introducing Apache Sentry (cont'd): Sentry Architecture
      [Architecture diagram. Labels: Binding Layer; Impala; Hive; Search; Pig; …; Policy Engine; Policy Provider; File; Database; Authorization Provider; HiveServer 2; SOLR; Local FS/HDFS]
  19. Introducing Apache Sentry (cont'd)
      [Flow diagram: SQL → Parse → Build → Check (Sentry) → Plan → Query / MR]
      ● Parse: validate SQL grammar
      ● Build: construct statement tree
      ● Check: validate statement objects
      • First check: Authorization
      ● Plan: forward to execution planner
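The flow on this slide (parse, build, check, plan) can be sketched as a small Python pipeline in which the authorization check runs before anything reaches the planner. This is a toy model, not Sentry or Hive code: every function name, the one-statement "grammar", and the `allowed` set of (user, action, table) triples are assumptions made for illustration.

```python
class AuthorizationError(Exception):
    """Raised when the check stage rejects a statement."""

def parse(sql):
    # Stand-in for grammar validation: accepts only "SELECT ... FROM <table>".
    tokens = sql.strip().rstrip(";").split()
    if len(tokens) < 4 or tokens[0].upper() != "SELECT" or tokens[-2].upper() != "FROM":
        raise ValueError("syntax error")
    return tokens

def build(tokens):
    # Stand-in for the statement tree: just the action and the target table.
    return {"action": "select", "table": tokens[-1]}

def check(stmt, user, allowed):
    # The authorization check happens here, before any planning work.
    if (user, stmt["action"], stmt["table"]) not in allowed:
        raise AuthorizationError(f"{user} may not {stmt['action']} {stmt['table']}")
    return stmt

def plan(stmt):
    # Stand-in for handing the statement to the execution planner.
    return f"scan({stmt['table']})"

def compile_query(sql, user, allowed):
    return plan(check(build(parse(sql)), user, allowed))

allowed = {("alice", "select", "sales")}   # hypothetical grant
result = compile_query("SELECT * FROM sales", "alice", allowed)
```

Placing the check between build and plan means an unauthorized statement is rejected before any execution plan exists, which matches the ordering on the slide.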
  20. Introducing Apache Sentry (cont'd)
      • Actors
      ● User
      ● User group membership
      ● Resources
      ● Privilege
      ● Role
  21. Introducing Apache Sentry (cont'd)
      • User
      ● User authenticated
      ● User identity obtained from session context
  22. Introducing Apache Sentry (cont'd)
      • User group membership
      ● Defined outside Sentry policy
      ● Obtained from user directory (LDAP, AD, HDFS)
      ● May be available from session context
  23. Introducing Apache Sentry (cont'd)
      • Resources
      ● Data to be protected
      ● File or directory on HDFS
      ● Tables or views in Hive
      ● URI
      ● Resources can be hierarchical
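One plausible reading of "hierarchical" is that a grant on a parent resource also covers everything beneath it. The Python sketch below models a resource as a path of names and tests whether a granted path is an ancestor of the requested one. The `covers` helper and the server, database, and table names are hypothetical, and the inheritance rule itself is an assumption, not something the slide spells out.

```python
def covers(granted, requested):
    """True if `granted` is `requested` itself or one of its ancestors."""
    granted, requested = tuple(granted), tuple(requested)
    return len(granted) <= len(requested) and requested[:len(granted)] == granted

# Hypothetical resource paths: (server, database, table).
db_grant = ("server1", "sales_db")
same_db_table = covers(db_grant, ("server1", "sales_db", "orders"))
other_db_table = covers(db_grant, ("server1", "hr_db", "salaries"))
```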
  24. Introducing Apache Sentry (cont'd)
      • Privilege
      ● Action or operation associated with a resource
      ● Exists in a role only
      ● SELECT on a given TABLE or VIEW
      ● CREATE a TABLE or VIEW
      ● QUERY on a search COLLECTION
      ● DELETE a FILE or DIRECTORY
      ● Example
        collection=customerCol->action=query
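The example privilege string is a chain of `key=value` segments joined by `->`. A minimal Python sketch of splitting it into its parts follows. `parse_privilege` is a hypothetical helper written for this illustration, not a Sentry API.

```python
def parse_privilege(text):
    """Split 'key=value->key=value' into a dict, keeping segment order."""
    parts = {}
    for segment in text.split("->"):
        key, _, value = segment.strip().partition("=")
        parts[key.lower()] = value
    return parts

privilege = parse_privilege("collection=customerCol->action=query")
# privilege == {"collection": "customerCol", "action": "query"}
```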
  25. Introducing Apache Sentry (cont'd)
      • Roles
      ● A collection of privileges
      ● Defined in Sentry policy
      ● Example
        [roles]
        ana_query_role = collection=sentryColl->action=query
        ana_update_role = collection=sentryColl->action=update
        test_role = collection=testColl->action=update
        full_admin_role = collection=*
  26. Introducing Apache Sentry (cont'd)
      • (Group, Role) mapping
      ● Defined in policy
      ● One-to-Many
      ● Example
        [groups]
        analysts = ana_query_role, ana_update_role
        admins = full_admin_role
        testgroup = test_role
        hbase = full_admin_role
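The `[roles]` and `[groups]` examples share a simple sectioned `name = value, value` layout. The Python sketch below reads such text into nested dictionaries. It is a hand-rolled illustration and not Sentry's own parser: `parse_policy` is a hypothetical helper, and the policy text is a trimmed copy of the slide examples.

```python
POLICY = """\
[groups]
analysts = ana_query_role, ana_update_role
admins = full_admin_role

[roles]
ana_query_role = collection=sentryColl->action=query
ana_update_role = collection=sentryColl->action=update
full_admin_role = collection=*
"""

def parse_policy(text):
    """Return {section: {name: [comma-separated values]}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            raise ValueError("entry found before any [section] header")
        # Split on the first '=' only; privilege strings contain '=' themselves.
        name, _, value = line.partition("=")
        current[name.strip()] = [item.strip() for item in value.split(",")]
    return sections

policy = parse_policy(POLICY)
analyst_roles = policy["groups"]["analysts"]
```

Splitting each entry on the first `=` only is the important detail, because the privilege strings on the right-hand side contain `=` themselves.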
  27. Introducing Apache Sentry (cont'd)
      • Rule evaluation
      ● Who's the user?
      ● Which group(s) does the user belong to?
      ● What resource is to be accessed?
      ● How is the resource accessed (READ, SELECT, etc.)?
      ● Does any of the user's groups have a role that has the right privilege?
      ● Yes – great! Go ahead!
      ● No – sorry! Insufficient privilege!
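The questions above amount to a walk from user to groups to roles to privileges. A minimal Python sketch of that walk follows, using the role and group names from the earlier slide examples. The user names, the in-memory dictionaries, and the treatment of `collection=*` as "any action on any collection" are assumptions for illustration and do not reflect Sentry's actual implementation.

```python
# Hypothetical directory lookup: user -> groups (defined outside the policy).
GROUPS_OF_USER = {"alice": ["analysts"], "bob": ["testgroup"], "carol": ["admins"]}

# Group -> roles and role -> (collection, action) privileges, as in the examples.
ROLES_OF_GROUP = {
    "analysts": ["ana_query_role", "ana_update_role"],
    "testgroup": ["test_role"],
    "admins": ["full_admin_role"],
}
PRIVILEGES_OF_ROLE = {
    "ana_query_role": [("sentryColl", "query")],
    "ana_update_role": [("sentryColl", "update")],
    "test_role": [("testColl", "update")],
    "full_admin_role": [("*", "*")],   # assumption: collection=* allows everything
}

def is_authorized(user, collection, action):
    """Return True if any role of any of the user's groups grants the access."""
    for group in GROUPS_OF_USER.get(user, []):
        for role in ROLES_OF_GROUP.get(group, []):
            for granted_collection, granted_action in PRIVILEGES_OF_ROLE.get(role, []):
                if granted_collection in ("*", collection) and granted_action in ("*", action):
                    return True
    return False   # no matching privilege: insufficient privilege

alice_query = is_authorized("alice", "sentryColl", "query")
bob_query = is_authorized("bob", "sentryColl", "query")
```

Access is denied by default: a user with no groups, or whose groups have no role with a matching privilege, falls through to `False`.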
  29. Future Work
      ● Introduce Sentry to more Hadoop components for their authorization needs
      ● Centralized policy store aiming for the whole enterprise
      ● Grant/Revoke
      ● Centralized authorization service for all protected resources including metadata
      We appreciate your contribution and support