Hive contributors meetup apache sentry


  1. Apache Sentry (incubating) on Hive Integration, November 18th, 2013
  2. Current State of Authorization in Hive
     •  Advisory Authorization
        -  Facilitates self-regulation to safeguard against accidental changes
        -  Users can grant themselves privileges as necessary
        -  Problem: Insufficient to guard against malicious users
     •  Impersonation
        -  Data is protected at the file level by HDFS permissions
        -  Problem: File-level access is not granular enough
        -  Problem: Not role-based
  3. Authorization Requirements
     •  Secure Authorization: Ability to control access to data and/or privileges on data for authenticated users
     •  Fine-Grained Authorization: Ability to give users access to a subset of data in files
     •  Role-Based Authorization: Ability to create/apply templatized privileges based on functional roles
     •  Multi-Tenant Administration: Ability for a central admin group to empower lower-level admins to manage security for each database/schema
  4. Introducing Sentry: Authorization module for the Hadoop ecosystem
     •  Unlocks Key RBAC Requirements
        ᵒ  Secure, fine-grained, role-based authorization
        ᵒ  Multi-tenant administration
        ᵒ  Open Source via Apache Incubator
        ᵒ  Modular RBAC Framework
        ᵒ  Multiple users in production for months
  5. Sentry: Fine-Grained Authorization
     [Architecture diagram. Concepts: Binding, Policy, Policy Provider. Implementations: Hive Binding, Solr Binding; Database Policy, Search Policy; File-based Provider.]
  6. Sentry: Fine-Grained Authorization
     •  Ability to specify privileges on
        ᵒ  SERVER, DATABASE, TABLE, VIEW, URI
     •  Privilege Granularity
        ᵒ  SELECT
        ᵒ  INSERT
        ᵒ  ALL
     •  Multi-Tenant Administration
        ᵒ  Administration per database
  7. Granting Privileges
     •  Example: Grant SELECT on table CUSTOMER from database SALES:
        server=server1->db=sales->table=customer->action=SELECT
     •  Objects are represented by a containment hierarchy
     •  A privilege is granted for the leaf object and its contents
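
To make the containment hierarchy concrete, here is a minimal Python sketch of how a granted privilege string might be checked against a requested one. It is not Sentry's implementation: the helper names `parse_privilege` and `implies` are hypothetical, and treating a missing action as ALL and comparing names case-insensitively are assumptions made for illustration.

```python
from typing import Dict


def parse_privilege(rule: str) -> Dict[str, str]:
    """Split 'server=server1->db=sales->...' into a key/value mapping.

    Hypothetical helper; keys and values are lower-cased (an assumption).
    """
    parts: Dict[str, str] = {}
    for segment in rule.split("->"):
        key, _, value = segment.strip().partition("=")
        parts[key.lower()] = value.lower()
    return parts


def implies(granted: str, requested: str) -> bool:
    """Return True if the granted privilege covers the requested one."""
    g = parse_privilege(granted)
    r = parse_privilege(requested)
    # Assumption: a rule without an explicit action means ALL.
    g_action = g.pop("action", "all")
    r_action = r.pop("action", "all")
    # Every scope named in the grant must match the request; scopes the
    # grant omits (e.g. no table) are covered by containment.
    for key, value in g.items():
        if r.get(key) != value:
            return False
    return g_action == "all" or g_action == r_action


grant = "server=server1->db=sales->table=customer->action=SELECT"
request = "server=server1->db=sales->table=customer->action=select"
allowed = implies(grant, request)
```

Under these assumptions, a grant on `server=server1->db=sales` also covers every table inside that database, while a table-level SELECT grant does not cover INSERT on the same table.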
  8. Specifying Roles
     •  A role is a collection of privileges
     •  Example: A role Seller that allows SELECT on table CUSTOMER and INSERT on table ITEMS
        seller_role = server=server1->db=sales->table=customer->action=Select, server=server1->db=sales->table=items->action=Insert
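
The role definition on this slide can be read as "role name = comma-separated privileges". The Python sketch below splits such a line into its parts; `parse_role` is a hypothetical helper written for illustration, not part of Sentry, and it assumes privilege strings never contain commas.

```python
from typing import List, Tuple


def parse_role(line: str) -> Tuple[str, List[str]]:
    """Split 'role = priv1, priv2' into (role name, list of privileges).

    Only the first '=' separates the name from the body, because the
    privilege strings themselves contain '=' characters.
    """
    name, _, body = line.partition("=")
    privileges = [p.strip() for p in body.split(",") if p.strip()]
    return name.strip(), privileges


role_line = (
    "seller_role = server=server1->db=sales->table=customer->action=Select, "
    "server=server1->db=sales->table=items->action=Insert"
)
role_name, role_privileges = parse_role(role_line)
```

Here `role_name` is `seller_role` and `role_privileges` holds the two privilege strings from the slide, one per table.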
  9. Users and Groups
     •  Works with existing authentication mechanisms
     •  A group connects the authentication system with the authorization system
        ᵒ  A set of roles can be assigned to a group:
           analyst = sales_reporting, data_export, audit_report
     •  User-to-group mapping:
        ᵒ  Using Hadoop groups
        ᵒ  Or specified locally in the sentry-site.xml file
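
The chain from user to group to role can be sketched as two dictionary lookups. In the Python example below, the group-to-role mapping comes from the slide, while the user `alice` and the function `roles_for_user` are hypothetical and exist only for illustration; in a real deployment the user-to-group mapping would come from Hadoop groups or local configuration.

```python
from typing import Dict, List

# Group-to-role mapping, taken from the slide.
GROUP_ROLES: Dict[str, List[str]] = {
    "analyst": ["sales_reporting", "data_export", "audit_report"],
}

# User-to-group mapping. The user name is an invented example.
USER_GROUPS: Dict[str, List[str]] = {
    "alice": ["analyst"],
}


def roles_for_user(user: str) -> List[str]:
    """Collect the roles of every group the user belongs to, without duplicates."""
    roles: List[str] = []
    for group in USER_GROUPS.get(user, []):
        for role in GROUP_ROLES.get(group, []):
            if role not in roles:
                roles.append(role)
    return roles


alice_roles = roles_for_user("alice")
unknown_roles = roles_for_user("bob")  # no group membership, so no roles
```

Roles are never attached to users directly in this sketch; a user with no group membership ends up with an empty role list.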
  10. User Feedback
      I have implemented HiveServer2 Authentication (openLDAP) and Authorization (using Cloudera Sentry). I am super-excited because we can now open our Hive Data Platform in "read only" mode to remote clients in the company and SAS clients.
      Source: Apache, Tue, 17 Sep 2013 19:10:43 GMT
  11. Future Direction
      •  Integration with other systems
      •  More Granular Privileges
      •  Usability Improvements
  12. Hive Requirements
      •  Sentry plugs into existing hooks such as the Semantic Analyzer hook interface
      •  The required changes are minor, an estimated ~600 LOC including unit tests
  13. Hive Requirements
      Follow Hive integration via SENTRY-67
      •  HIVE-4670 - Authentication module should pass the instance part of the Kerberos principal
      •  HIVE-4390 - Enable capturing input URI entities for DML statements
      •  HIVE-4741 - Add Hive config API to modify the restrict list
      •  HIVE-4641 - Support post execution/fetch hook for HiveServer2