April 2014 HUG : Apache Sentry

  • 1. Apache Sentry: Enterprise-grade Security for Hadoop
    Xuefu Zhang, Srayva Tirukkovalur | Cloudera | April 16, 2014
  • 2. Outline
    ● Introduction
    ● Hadoop security primer
      ● Authentication
      ● Authorization
      ● Data Protection
      ● Governance and Auditing
    ● Introducing Apache Sentry
      ● What's Sentry
      ● Sentry Architecture
      ● Sentry Internals
    ● Future Work
    ● Demo
    ● Q&A
  • 3. Introduction
    ● Hadoop is getting bigger:
      ● Hadoop adoption keeps increasing
      ● More and more data is stored on Hadoop clusters
      ● More and more users and applications access that data
      ● Data warehouse offload is the most common use case
      ● Apache Hive, Apache Drill, Cloudera Impala
      ● SQL on Hadoop is a phenomenon
  • 4. Introduction (cont'd)
    ● But growth brings more encumbrances:
      ● Enterprises want to protect sensitive data
      ● Government regulations and compliance requirements, such as HIPAA, PII protection, and FISMA
      ● Existing security gaps in Hadoop have hindered its adoption
      ● Security has become the top priority
  • 5. Introduction (cont'd)
    ● The reality:
      ● Different components have different security mechanisms
      ● Multiple components may access the same data set
      ● Hadoop was born out of trust, not security
      ● Thinking of Windows
  • 7. Hadoop Security Primer
    ● Authentication: identifying who you are
      ● Untrusted users have no access to the cluster network
      ● In a trusted network, everyone is a good citizen
      ● Who you are is determined by the client host
  • 8. Hadoop Security Primer
    ● Strong authentication
      ● Kerberos
      ● LDAP, Active Directory
      ● LDAP/AD integrated with Kerberos, establishing a single source of truth
      ● Single sign-on
  • 9. Hadoop Security Primer (cont'd)
    ● Kerberos
      ● Strong authentication
      ● Provides mutual authentication
      ● Protects against eavesdropping and replay attacks
      ● Every user and service has a Kerberos “principal”
      ● Credentials: keytabs (services), passwords (users)
  • 10. Hadoop Security Primer (cont'd)
    ● Authorization: determining what you can access
      ● HDFS: POSIX-style R/W/X permissions for user/group/other (U/G/O); coarse-grained
      ● Other components have their own authorization:
        ● MapReduce job queue ACLs
        ● HBase ACLs on tables and column families
        ● Accumulo provides cell-level access control
      ● Impersonation
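To make the owner/group/other model concrete, here is a minimal Python sketch of a POSIX-style permission check of the kind HDFS performs. It is a simplification: `Inode` and `check_permission` are hypothetical names, and the HDFS superuser, ACLs, and the execute check on parent directories are deliberately left out.

```python
from dataclasses import dataclass

# Bits within one rwx triplet (same values as in a POSIX mode).
READ, WRITE, EXECUTE = 0o4, 0o2, 0o1


@dataclass
class Inode:
    """Hypothetical stand-in for an HDFS file or directory."""
    owner: str
    group: str
    mode: int  # e.g. 0o640: owner rw-, group r--, other ---


def check_permission(inode: Inode, user: str, user_groups: set, requested: int) -> bool:
    """Return True if `user` holds every bit in `requested` (READ | WRITE | EXECUTE)."""
    # Exactly one triplet applies, chosen in POSIX order: owner, then group, then other.
    # Superuser bypass, ACLs and parent-directory traversal checks are omitted here.
    if user == inode.owner:
        bits = (inode.mode >> 6) & 0o7
    elif inode.group in user_groups:
        bits = (inode.mode >> 3) & 0o7
    else:
        bits = inode.mode & 0o7
    return (bits & requested) == requested


if __name__ == "__main__":
    report = Inode(owner="alice", group="analysts", mode=0o640)
    print(check_permission(report, "alice", {"staff"}, READ | WRITE))  # True: owner has rw-
    print(check_permission(report, "bob", {"analysts"}, WRITE))        # False: group has r-- only
```

The check works only at file or directory granularity; it cannot express "SELECT on this table" or "QUERY on this collection", which is what the slide means by coarse-grained.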
  • 11. Hadoop Security Primer (cont'd)
    ● Data protection: data at rest and in transit
      ● Hadoop provides encryption for data in transit: DTP (data transfer protocol), HTTP, RPC, JDBC/ODBC
      ● Hadoop has no native encryption for data at rest (tracked in HDFS-6134)
      ● Data at rest relies on OS-level encryption
  • 12. Hadoop Security Primer (cont'd)
    ● Governance and auditing
      ● Again, varies from component to component
      ● HDFS and MapReduce provide basic audit support
      ● The Apache Hive metastore records audit (who/when) information for Hive interactions
      ● Apache Oozie provides an audit trail for services
  • 14. Introducing Apache Sentry
    ● Hadoop authorization
      ● Existing authorization is fragmented, coarse-grained, and manual
      ● Often, data is simply left unprotected for the sake of simplicity
      ● Enterprises need a centralized authorization component that works across components and is easy to use, fine-grained, and role-based
  • 15. Introducing Apache Sentry (cont'd)
    ● What's Sentry?
      ● Sentry is an authorization module for Hive, Search, Impala, and beyond
      ● It unlocks key RBAC (role-based access control) requirements: secure, fine-grained, role-based authorization and multi-tenant administration
      ● Open source; an Apache Incubator project
      ● Ecosystem support: Apache Solr, HiveServer2, and Impala 1.1+
  • 16. Introducing Apache Sentry (cont'd)
    ● Key Benefits
      ● Store Sensitive Data in Hadoop
      ● Extend Hadoop to More Users
      ● Comply with Regulations
  • 17. Introducing Apache Sentry (cont'd)
    ● Key capabilities
      ● Fine-grained: SERVERS, DATABASES, TABLES & VIEWS; INDEXES, COLLECTIONS
      ● Role-based: roles hold privileges such as SELECT, INSERT, ALL; UPDATE, QUERY
      ● Multi-tenant administration:
        ● Separate policies for each database/schema
        ● Policies can be maintained by separate admins
  • 18. Introducing Apache Sentry (cont'd): Sentry Architecture [diagram]
    ● Binding layer: Impala, HiveServer2 (Hive), Search (Solr), Pig, …
    ● Authorization provider: policy engine and policy provider
    ● Policy storage: file (local FS/HDFS) or database
  • 19. Introducing Apache Sentry (cont'd): query flow [diagram]
    ● Query → Parse → Build → Check → Plan → MR
      ● Parse: validate SQL grammar
      ● Build: construct the statement tree
      ● Check: validate statement objects; the first check is authorization, performed by Sentry
      ● Plan: forward to the execution planner
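The ordering on this slide (authorization is the first semantic check, before any plan is built) can be illustrated with a toy Python pipeline. Everything here is hypothetical: the one-statement grammar, the stage functions, the `server1` server name, and the `authorize` callback that stands in for Sentry's binding. A real engine such as HiveServer2 or Impala does far more at each stage.

```python
import re
from typing import Callable


class ParseError(ValueError):
    """Raised when the SQL does not match the (toy) grammar."""


class AuthorizationError(PermissionError):
    """Raised when the authorization check rejects the statement."""


# Toy grammar: only "SELECT <columns> FROM <db>.<table>" is accepted.
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<db>\w+)\.(?P<table>\w+)\s*;?\s*$",
    re.IGNORECASE,
)


def parse(sql: str) -> re.Match:
    """Parse: validate the SQL grammar."""
    match = _SELECT.match(sql)
    if match is None:
        raise ParseError(f"unsupported statement: {sql!r}")
    return match


def build(match: re.Match) -> dict:
    """Build: construct a (tiny) statement tree."""
    return {
        "action": "select",
        "db": match["db"],
        "table": match["table"],
        "columns": [c.strip() for c in match["cols"].split(",")],
    }


def check(stmt: dict, user: str, authorize: Callable[[str, str], bool]) -> None:
    """Check: the first validation of the statement objects is authorization."""
    # Hypothetical privilege string; "server1" is an assumed server name.
    privilege = f"server=server1->db={stmt['db']}->table={stmt['table']}->action={stmt['action']}"
    if not authorize(user, privilege):
        raise AuthorizationError(f"user {user!r} lacks {privilege}")


def plan(stmt: dict) -> str:
    """Plan: hand the validated statement to the execution planner."""
    return f"SCAN {stmt['db']}.{stmt['table']} PROJECT {', '.join(stmt['columns'])}"


def compile_query(sql: str, user: str, authorize: Callable[[str, str], bool]) -> str:
    stmt = build(parse(sql))
    check(stmt, user, authorize)  # raises before any planning happens
    return plan(stmt)
```

For example, with `authorize = lambda user, priv: user == "alice"`, `compile_query("SELECT id, total FROM sales.orders", "alice", authorize)` returns `"SCAN sales.orders PROJECT id, total"`, while the same call for any other user raises `AuthorizationError` without ever reaching `plan`.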
  • 20. Introducing Apache Sentry (cont'd)
    ● Actors
      ● User
      ● User group membership
      ● Resources
      ● Privilege
      ● Role
  • 21. Introducing Apache Sentry (cont'd)
    ● User
      ● The user is authenticated
      ● The user identity is obtained from the session context
  • 22. Introducing Apache Sentry (cont'd)
    ● User group membership
      ● Defined outside the Sentry policy
      ● Obtained from a user directory (LDAP, AD, HDFS)
      ● May be available from the session context
  • 23. Introducing Apache Sentry (cont'd)
    ● Resources
      ● Data to be protected
      ● A file or directory on HDFS
      ● A table or view in Hive
      ● A URI
      ● Resources can be hierarchical
  • 24. Introducing Apache Sentry (cont'd)
    ● Privilege
      ● An action or operation associated with a resource
      ● Exists only within a role
      ● SELECT on a given TABLE or VIEW
      ● CREATE a TABLE or VIEW
      ● QUERY on a search COLLECTION
      ● DELETE a FILE or DIRECTORY
      ● Example: collection=customerCol->action=query
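The privilege strings on these slides are ordered `key=value` parts joined by `->`, running from the broadest resource down to the action. Below is a minimal Python sketch of how such a string could be parsed and how a granted privilege could cover a requested one when resources are hierarchical. The matching rules are assumptions modeled loosely on Sentry's wildcard semantics, not its actual code: `*` matches any value, `all` additionally matches any action, a grant that stops at a higher level (for example, at a database) covers everything beneath it, and names are compared case-insensitively. The Hive-style names `server1`, `sales`, and `orders` are hypothetical.

```python
def parse_privilege(text: str) -> list:
    """Split 'k1=v1->k2=v2' into [('k1', 'v1'), ('k2', 'v2')], lowercased."""
    parts = []
    for segment in text.split("->"):
        key, sep, value = segment.strip().partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"malformed privilege part: {segment!r}")
        parts.append((key.strip().lower(), value.strip().lower()))
    return parts


def _is_wildcard(key: str, value: str) -> bool:
    # Assumption: '*' matches any value; 'all' additionally matches any action.
    return value == "*" or (key == "action" and value == "all")


def implies(granted: str, requested: str) -> bool:
    """True if the granted privilege covers the requested one."""
    grant, request = parse_privilege(granted), parse_privilege(requested)
    for i, (g_key, g_value) in enumerate(grant):
        if i >= len(request):
            # The grant is more specific than the request: its extra parts must be wildcards.
            if not _is_wildcard(g_key, g_value):
                return False
            continue
        r_key, r_value = request[i]
        if g_key != r_key:
            return False
        if not _is_wildcard(g_key, g_value) and g_value != r_value:
            return False
    # Request parts below the grant's last level are implied (resource hierarchy).
    return True
```

Under these rules, a grant of `server=server1->db=sales` would cover `SELECT` on every table in `sales`, while `collection=customerCol->action=query` would not cover an `update` on the same collection.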
  • 25. Introducing Apache Sentry (cont'd)
    ● Roles
      ● A collection of privileges
      ● Defined in the Sentry policy
      ● Example:
          [roles]
          ana_query_role = collection=sentryColl->action=query
          ana_update_role = collection=sentryColl->action=update
          test_role = collection=testColl->action=update
          full_admin_role = collection=*
  • 26. Introducing Apache Sentry (cont'd)
    ● (Group, Role) mapping
      ● Defined in the policy
      ● One-to-many: a group can be mapped to several roles
      ● Example:
          [groups]
          analysts = ana_query_role, ana_update_role
          admins = full_admin_role
          testgroup = test_role
          hbase = full_admin_role
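The `[roles]` and `[groups]` sections on the two slides above are INI-style, so one way to load them into lookup tables is Python's standard `configparser`. This is a minimal sketch: `load_policy` is a hypothetical helper, and it assumes that multiple entries on a right-hand side are comma-separated, as in the group example.

```python
import configparser

# The example policy from the two slides above.
POLICY = """
[groups]
analysts = ana_query_role, ana_update_role
admins = full_admin_role
testgroup = test_role
hbase = full_admin_role

[roles]
ana_query_role = collection=sentryColl->action=query
ana_update_role = collection=sentryColl->action=update
test_role = collection=testColl->action=update
full_admin_role = collection=*
"""


def _split(value: str) -> list:
    """Split a comma-separated right-hand side into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_policy(text: str):
    """Return (group -> roles, role -> privileges) dictionaries."""
    # Only '=' separates keys from values (URIs in privileges may contain ':'),
    # and interpolation is off so a '%' in a value is left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # configparser lowercases keys by default; keep names as written
    parser.read_string(text)
    group_roles = {group: _split(v) for group, v in parser["groups"].items()}
    role_privileges = {role: _split(v) for role, v in parser["roles"].items()}
    return group_roles, role_privileges
```

Because each group maps to a list, a single group can hold several roles, which is the one-to-many mapping described on the slide.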
  • 27. Introducing Apache Sentry (cont'd)
    ● Rule evaluation
      ● Who is the user?
      ● Which group(s) does the user belong to?
      ● Which resource is being accessed?
      ● How is the resource accessed (READ, SELECT, etc.)?
      ● Does any of the user's groups have a role with the required privilege?
        ● Yes: great! Go ahead!
        ● No: sorry! Insufficient privileges!
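The evaluation steps on this slide can be sketched in Python as a walk from user to groups to roles to privileges. This is illustrative only: `USER_GROUPS` is a hypothetical stand-in for the LDAP/AD/HDFS group lookup, the mappings mirror the example policy on the previous slides, and `implies` uses a simplified matching rule (a `*` value matches anything, and a grant that stops at a higher resource level covers everything beneath it) rather than Sentry's own implementation.

```python
# Hypothetical group membership, standing in for an LDAP/AD/HDFS lookup.
USER_GROUPS = {"alice": {"analysts"}, "bob": {"testgroup"}, "carol": {"admins"}}

# Group -> roles and role -> privileges, mirroring the example policy.
GROUP_ROLES = {
    "analysts": ["ana_query_role", "ana_update_role"],
    "admins": ["full_admin_role"],
    "testgroup": ["test_role"],
    "hbase": ["full_admin_role"],
}
ROLE_PRIVILEGES = {
    "ana_query_role": ["collection=sentryColl->action=query"],
    "ana_update_role": ["collection=sentryColl->action=update"],
    "test_role": ["collection=testColl->action=update"],
    "full_admin_role": ["collection=*"],
}


def implies(granted: str, requested: str) -> bool:
    """Simplified match: '*' is a wildcard; request parts below the grant are implied."""
    grant = [part.split("=", 1) for part in granted.lower().split("->")]
    request = [part.split("=", 1) for part in requested.lower().split("->")]
    if len(grant) > len(request):
        return False
    return all(g_key == r_key and g_value in ("*", r_value)
               for (g_key, g_value), (r_key, r_value) in zip(grant, request))


def is_authorized(user: str, requested: str) -> bool:
    """Who is the user -> which groups -> which roles -> is any privilege sufficient?"""
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, []):
            for privilege in ROLE_PRIVILEGES.get(role, []):
                if implies(privilege, requested):
                    return True  # "Yes: go ahead!"
    return False  # "No: insufficient privileges!"
```

The loop returns on the first sufficient privilege, so the order in which groups and roles are visited affects only speed, not the decision; a user with no known groups is always denied.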
  • 29. Future Work
    ● Introduce Sentry to more Hadoop components for their authorization needs
    ● A centralized policy store for the whole enterprise
    ● Grant/Revoke
    ● A centralized authorization service for all protected resources, including metadata
    We appreciate your contributions and support!