
Apache Ranger Hive Metastore Security

  1. Apache Ranger Hive Metastore Security. Yan Zhou (zhouya@us.ibm.com), Tanping Wang (wangta@us.ibm.com), IBM BigInsights Product Lead Architects, Silicon Valley Lab, IBM. Hadoop Summit – San Jose 2016. © 2016 IBM Corporation.
  2. Apache Ranger
     • Provides centralized policy definition for authorizing and auditing access to resources in a consistent manner.
     • [Architecture diagram. Labels: an Agent attached to each of HBase, Hive, YARN, Knox, Storm, Solr, Kafka and HDFS; Audit Server; Policy Server; Administration Portal; REST APIs; DB; SOLR; HDFS; KMS; LDAP/AD user/group sync; Log4j.]
  3. HiveServer2 Ranger Authorization Model
     [Workflow diagram. Components: Ranger Policy Manager, HiveServer2 with Ranger Agent, User Application, Ranger Audit Database. Steps are numbered 1 to 5 on the slide (step 2 appears twice), plus a "Policy Refreshing" arrow.]
     • Admin sets policies for Hive databases/tables/columns …
     • Users access Hive data through an application.
     • IT/analysis users access HiveServer2 through Beeline.
     • HiveServer2 uses the agent for authorization.
     • Audit logs are pushed to the DB.
     • HiveServer2 provides table data access to the user/client.
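The flow on this slide (central policies, a locally refreshed agent, an audit record per decision) can be illustrated with a small self-contained sketch. It is written in Python for brevity and is not Ranger's implementation: `Policy`, `PolicyManager`, `Agent` and their methods are hypothetical names, and the policy model (one user, one table, a set of actions) is a deliberate simplification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy: one user, one table, allowed actions."""
    user: str
    database: str
    table: str
    actions: frozenset


class PolicyManager:
    """Stand-in for the central policy store that the admin edits."""

    def __init__(self):
        self.policies = []

    def add(self, policy):
        self.policies.append(policy)


@dataclass
class Agent:
    """Stand-in for the agent embedded in HiveServer2."""
    manager: PolicyManager
    cache: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def refresh(self):
        # Policy refreshing: copy the current central policies into the cache.
        self.cache = list(self.manager.policies)

    def is_allowed(self, user, database, table, action):
        allowed = any(
            p.user == user and p.database == database
            and p.table == table and action in p.actions
            for p in self.cache
        )
        # Every decision, allowed or denied, is recorded for auditing.
        self.audit_log.append((user, database, table, action, allowed))
        return allowed


manager = PolicyManager()
manager.add(Policy("hr1", "default", "employee", frozenset({"select"})))
agent = Agent(manager)

before_refresh = agent.is_allowed("hr1", "default", "employee", "select")
agent.refresh()
after_refresh = agent.is_allowed("hr1", "default", "employee", "select")
denied = agent.is_allowed("hr1", "default", "employee", "drop")
```

In this sketch the agent only sees a policy after its next refresh, which is why the first check is denied even though the policy already exists centrally.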
  4. Motivation: Gaps in the Current Hive Ranger Authorization Model
     Hive CLI
     • DO: (nothing listed)
     • DO NOT: Hive CLI does not work with Ranger.
     HiveServer2
     • DO: Provides ACLs for databases, tables, columns and locks. Supports Ranger policy creation or deletion from Hive GRANT or REVOKE statements.
     • DO NOT: Does not adjust Hive-created policies as a result of DDL statements. Once a DB object is renamed by a DDL statement, the Hive-created policy in Ranger is out of sync; once a DB object is deleted, the Hive-created policy in Ranger becomes orphaned.
  5. Motivation: Gaps in the Current HiveServer2 Ranger Authorization Model (cont'd): Resource ACL Sync Up
     Storage-based Authorization
     • GOOD: Consistent access controls by Hive and HDFS.
     • NOT GOOD: Weak at controlling SQL data access at finer granularity, such as COLUMN.
     SQL Standard-based Authorization
     • GOOD: Fits well with the SQL standard privilege model.
     • NOT GOOD: Does not provide consistent privileges across Hive and HDFS, and potentially prevents sharing Hive data with other Hadoop apps.
     A holistic view of the HDFS and Hive ACLs is needed to provide consistent privilege control.
  6. We Introduce: The New Hive Metastore Ranger Security Agent
     Hive CLI
     • Provides: ACLs for Hive CLI.
     • Use case:
         hive> SELECT * FROM employee;
       Before: Hive decides the ACL on its own.
       After: Hive invokes the Hive Metastore Ranger security agent to get the ACL from Ranger.
     HiveServer2
     • Provides: authorization for the Metastore objects; ACLs that stay in sync with the SQL objects at all times.
     • Use case:
         hive> GRANT SELECT ON TABLE employee TO USER hr1;
         hive> ALTER TABLE employee RENAME TO employees;
       Before: no change to the Ranger policy for user hr1 on table employee.
       After: the Ranger policy for hr1 now applies to employees.
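The rename case on this slide comes down to re-keying the stored grants when the object name changes, and dropping them when the object is dropped. The sketch below shows that idea with an in-memory store; it is an illustration only, written in Python, and `PolicyStore` with its methods is a hypothetical stand-in rather than anything from Ranger or Hive.

```python
class PolicyStore:
    """Hypothetical in-memory policy store keyed by (database, table)."""

    def __init__(self):
        # (database, table) -> {user: set of privileges}
        self._grants = {}

    def grant(self, database, table, user, privilege):
        users = self._grants.setdefault((database, table), {})
        users.setdefault(user, set()).add(privilege)

    def privileges(self, database, table, user):
        return set(self._grants.get((database, table), {}).get(user, set()))

    def on_rename_table(self, database, old_name, new_name):
        # Move the grants to the new name so the policy follows the table.
        grants = self._grants.pop((database, old_name), None)
        if grants is not None:
            self._grants[(database, new_name)] = grants

    def on_drop_table(self, database, table):
        # Remove the grants so no orphaned policy is left behind.
        self._grants.pop((database, table), None)

    def resources(self):
        return sorted(self._grants)


store = PolicyStore()
store.grant("default", "employee", "hr1", "select")       # GRANT SELECT ...
store.on_rename_table("default", "employee", "employees") # ALTER TABLE ... RENAME
renamed_privileges = store.privileges("default", "employees", "hr1")
stale_privileges = store.privileges("default", "employee", "hr1")
store.on_drop_table("default", "employees")                # DROP TABLE
remaining = store.resources()
```

Without the two `on_*` handlers, the grant would stay attached to `employee` after the rename and would survive the drop, which are the out-of-sync and orphaned cases described earlier in the deck.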
  7. We Introduce: The New Hive Metastore Ranger Security Agent (cont'd)
     Resource ACL Sync Up
     • Provides: consistent access control between Hive and HDFS for the SQL standard-based privilege model.
     • Use case:
         beeline> CREATE TABLE employee(name STRING);  -- run by user hr1
         beeline> LOAD DATA LOCAL INPATH '/data/input.txt' OVERWRITE INTO TABLE employee;
         pig> LOAD '/user/hive/warehouse/employee' USING PigStorage() AS (name:chararray)
       Before: not allowed for user hr1.
       After: allowed for user hr1.
  8. Ranger Hive Metastore Security Workflow – Hive CLI
     [Workflow diagram. Components: Ranger Policy Manager, User Application, Hive CLI, Ranger Metastore Agents, Ranger Audit Database. Steps are numbered 1 to 5 on the slide (step 2 appears twice), plus a "Policy Refreshing" arrow.]
     • Admin sets policies for Hive databases/tables/columns …
     • Users access Hive data through an application invoking Hive CLI.
     • IT/analysis users access Hive data through the CLI.
     • Hive CLI uses the agents for authorization and for policy object sync from DDL.
     • Audit logs are pushed to the DB.
     • Hive CLI provides table data access to the user/client.
  9. Ranger Hive Metastore Security Workflow – HiveServer2
     [Workflow diagram. Components: Ranger Policy Manager, Ranger HiveServer2 Agent, User Application, HiveServer2, Ranger Metastore Agents, Ranger Audit Database. Steps are numbered 1 to 6 on the slide (step 2 appears twice), plus a "Policy Refreshing" arrow.]
     • Admin sets policies for Hive databases/tables/columns …
     • Users access Hive data through an application.
     • IT/analysis users access HiveServer2 through Beeline.
     • HiveServer2 uses the Ranger Agent for authorization.
     • HiveServer2 uses the Ranger Metastore agent for ACL object sync on DDL.
     • Audit logs are pushed to the DB.
     • HiveServer2 provides table data access to the user/client.
  10. Metastore Security Workflow – HDFS ACL Sync (Ongoing)
      [Workflow diagram. Components: Ranger Policy Manager, HiveServer2, IT/analysis user Joe, Ranger Metastore Agents, Ranger HDFS Agent, HDFS NameNode, PIG. Steps are numbered 1 to 6 on the slide (steps 1 and 2 each appear twice), plus a "Policy Refreshing" arrow.]
      • Admin sets policies for Hive databases/tables/columns …
      • Create table t1.
      • HiveServer2 passes Hive metadata to the Metastore Agents.
      • Sets new HDFS policy for Joe on /user/hive/warehouse/t1.
      • Passes HDFS security info to the Policy Manager.
      • HDFS uses the agent for authorization.
      • Joe uses PIG to read Hive data in /user/hive/warehouse/t1.
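The step that sets a new HDFS policy for the table creator can be pictured as deriving a path-based policy from the table's warehouse location. The following Python sketch makes several assumptions that are not on the slide: the helper names are hypothetical, the policy is a plain dictionary rather than Ranger's policy format, the granted access types are a guess, and the non-default database layout (`<db>.db/<table>`) follows Hive's usual default configuration.

```python
import posixpath

WAREHOUSE_ROOT = "/user/hive/warehouse"  # warehouse path shown on the slides


def table_location(table, database="default"):
    # Assumption: default Hive layout, where tables of the default database
    # sit directly under the warehouse root and others under <db>.db/.
    if database == "default":
        return posixpath.join(WAREHOUSE_ROOT, table)
    return posixpath.join(WAREHOUSE_ROOT, f"{database}.db", table)


def hdfs_policy_for_new_table(owner, table, database="default"):
    # Hypothetical policy shape; the access list is an illustrative choice.
    return {
        "resource": table_location(table, database),
        "recursive": True,
        "user": owner,
        "accesses": ["read", "write", "execute"],
    }


def can_read(policies, user, path):
    # A path is covered if it is the policy resource itself or, for
    # recursive policies, anything below that directory.
    for policy in policies:
        resource = policy["resource"]
        covered = path == resource or (
            policy["recursive"] and path.startswith(resource + "/")
        )
        if covered and policy["user"] == user and "read" in policy["accesses"]:
            return True
    return False


policies = [hdfs_policy_for_new_table("joe", "t1")]
joe_reads_data = can_read(policies, "joe", "/user/hive/warehouse/t1/part-00000")
joe_reads_other = can_read(policies, "joe", "/user/hive/warehouse/t10")
other_user_reads = can_read(policies, "hr1", "/user/hive/warehouse/t1")
```

The prefix check appends a slash before comparing, so a policy on `t1` does not accidentally cover a sibling directory such as `t10`.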
  11. Hive Security Hooks and Their Ranger Implementation/Extensions
      • RangerHiveAuthorizer implements Hive's HiveAuthorizer.
      • RangerHiveMetastoreAuthorizer extends Hive's MetaStorePreEventListener.
      • RangerHiveMetastorePrivilegeHandler extends Hive's MetaStoreEventListener.
  12. Ranger Implementation/Extensions of Hive Security Hooks
      • RangerHiveAuthorizer
          Existing Ranger Hive Agent
          Methods: check/grant/revokePrivileges
          Handles: HiveServer2 authorization; Grant/Revoke
      • RangerHiveMetastoreAuthorizer
          New Ranger Hive Metastore Agent
          Methods: on(Create/Drop/Alter)(Table/Database/Index/…)
          Handles: CLI authorization
      • RangerHiveMetastorePrivilegeHandler
          New Ranger Hive Metastore Agent
          Methods: (create/drop/alter)(Table/Database/Index/…)
          Handles: sync of Hive ACL objects and resource ACLs
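The split between an authorizing hook and a syncing hook can be sketched as a pre-event listener that may veto an operation and a post-event listener that reacts once it has succeeded. The real Hive hook classes are Java; the Python below is only a model of the pattern, and every class and method name in it (`PreEventListener`, `PostEventListener`, `Metastore`, `on_event`) is hypothetical.

```python
class AccessDenied(Exception):
    """Raised by the pre-event listener to veto a metastore operation."""


class PreEventListener:
    """Runs before the metastore change; raises to block it."""

    def __init__(self, allowed):
        self.allowed = allowed  # set of (user, operation) pairs

    def on_event(self, user, operation, table):
        if (user, operation) not in self.allowed:
            raise AccessDenied(f"{user} may not {operation} {table}")


class PostEventListener:
    """Runs after the change succeeded; records the sync work to perform."""

    def __init__(self):
        self.synced = []

    def on_event(self, user, operation, table):
        self.synced.append((operation, table))


class Metastore:
    """Toy metastore that calls the listeners around each DDL operation."""

    def __init__(self, pre, post):
        self.tables = set()
        self.pre = pre
        self.post = post

    def create_table(self, user, table):
        self.pre.on_event(user, "create_table", table)
        self.tables.add(table)
        self.post.on_event(user, "create_table", table)

    def drop_table(self, user, table):
        self.pre.on_event(user, "drop_table", table)
        self.tables.discard(table)
        self.post.on_event(user, "drop_table", table)


pre = PreEventListener(allowed={("hr1", "create_table")})
post = PostEventListener()
metastore = Metastore(pre, post)

metastore.create_table("hr1", "employee")
try:
    metastore.drop_table("hr1", "employee")
    vetoed = False
except AccessDenied:
    vetoed = True
```

Because the veto happens before the change is applied, a denied operation never reaches the post-event listener, so no policy sync is attempted for it.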
  13. Status, Future Plan and References
      • Patch ready:
          o CLI access control
          o Policy object sync from DDL
      • Ongoing work:
          o Resource ACL sync
      • References:
          o https://issues.apache.org/jira/browse/RANGER-768
          o https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization
          o https://cwiki.apache.org/confluence/display/Hive/Storage+Based+Authorization+in+the+Metastore+Server
  14. Demo
      • Software versions: Ranger 6.0 + Hadoop 2.7.0 + Hive 1.2.1
      • Test cases:
          With the Ranger HiveServer2 Agent but without the Ranger Hive Metastore Security Agents
            o CLI: SQL is not subject to Ranger ACLs
            o HiveServer2: no object sync of Ranger ACLs as a result of SQL DDL
          With the Ranger HiveServer2 Agent and the Ranger Hive Metastore Security Agents
            o CLI: SQL is subject to Ranger ACLs
            o HiveServer2: object sync of Ranger ACLs as a result of SQL DDL
  15. Q & A
