Deploying Enterprise-grade Security for Hadoop
 


Deploying enterprise-grade security for Hadoop, or six security problems with Apache Hive. In this talk we will discuss the security problems with Hive and then secure Hive with Apache Sentry. Additional topics will include Hadoop security and Role-Based Access Control (RBAC).

  • Other aspects are confidentiality and audit.
  • Many, many ways to execute arbitrary code: Hive was originally created by web companies that simply don't care about security. In fact, we often run into pushback from the community when integrating security. In my presentation at the TC HUG I will explain in detail all the ways in which Hive is insecure. The point is that, by default, any user can execute any code they wish.
    Users grant themselves permissions: users can query any data they please by granting themselves permissions.
    Zero metadata security: it is not possible to stop users from modifying or viewing any metadata.
  • Manual file permission management: when users want to share tables and data with other users, they must modify file permissions. Can anyone guess what happens next?
    End state is world writable/readable: users end up making data world writable and readable.
    No ability to restrict access to columns or rows: users cannot be restricted to a subset of the data, so tables are copied simply to restrict access, which results in thousands of out-of-date tables with full read and write permissions.
  • Role-Based Access Control (RBAC): For finer-grained access to data accessible via schema (that is, data structures described by the Apache Hive Metastore and used by computing engines like Hive and Impala, as well as collections and indices within Cloudera Search), Cloudera developed Apache Sentry, which offers a highly modular, role-based privilege model for this data and its given schema. (Cloudera donated Apache Sentry to the Apache Software Foundation in 2013.)
    Sentry governs access to each schema object in the Metastore via a set of privileges like SELECT and INSERT. The schema objects are common entities in data management, such as SERVER, DATABASE, TABLE, COLUMN, and URI (a file location within HDFS). Cloudera Search has its own set of privileges (e.g. QUERY) and objects (e.g. COLLECTION).
    As with other RBAC systems that IT teams are already familiar with, Sentry provides for:
        • Hierarchies of objects, with permissions automatically inherited by objects that exist within a larger umbrella object;
        • Roles containing a set of multiple object/permission pairs;
        • Groups that can be granted one or more roles;
        • Users who can be assigned to one or more groups.
    Sentry is normally configured to deny access to services and data by default, so users have limited rights until they are assigned to a group that has explicit access roles.
    Column-level Security, Row-level Security and Masked Access: Using the combination of Sentry-based permissions, SQL views, and User Defined Functions (UDFs), developers can gain a high degree of access control granularity for SQL computing engines through HiveServer2 and Impala, including:
        • Column-level security: to limit access to only particular columns of entire tables, users can access the data through a view that either contains a subset of the table's columns or has certain columns masked. For example, a view can reduce a column to only the last four digits of a US Social Security number.
        • Row-level security: to limit access by particular values, views can employ CASE statements to control which rows a group of users can access. For example, a broker at a financial services firm may only be able to see data within her managed accounts.
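The two view-based techniques above can be modeled outside the database to make the idea concrete. The following Python sketch is a hypothetical illustration, not Hive, Impala, or Sentry code: `mask_ssn` plays the role of a masking UDF in a view's SELECT list, and the filter in `broker_view` stands in for the row-restricting logic (CASE or WHERE) that a real view would use. The `Account` fields and sample rows are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # Hypothetical row layout for an accounts table.
    account: str
    broker: str
    ssn: str
    balance: int


def mask_ssn(ssn: str) -> str:
    """Column masking: keep only the last four digits, like a masking UDF in a view."""
    digits = [c for c in ssn if c.isdigit()]
    return "XXX-XX-" + "".join(digits[-4:])


def broker_view(rows, current_broker):
    """Row-level security plus column masking, modeled as a view.

    Only rows managed by current_broker are returned (row filter), and the
    ssn column is masked (column-level security).
    """
    return [
        {"account": r.account, "ssn": mask_ssn(r.ssn), "balance": r.balance}
        for r in rows
        if r.broker == current_broker
    ]


if __name__ == "__main__":
    table = [
        Account("A-1", "alice", "123-45-6789", 500),
        Account("A-2", "bob", "987-65-4321", 900),
    ]
    print(broker_view(table, "alice"))
    # [{'account': 'A-1', 'ssn': 'XXX-XX-6789', 'balance': 500}]
```

In the database, users would be granted SELECT on the view but not on the base table, so the view becomes the only path to the data.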
  • Impala metadata queries, e.g. "SHOW TABLES", query the Hive Metastore directly and then query Sentry to filter the results before returning them.
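Filtering metadata by privilege can be sketched as a post-processing step on the Metastore's answer. This is a hypothetical Python model of the user → group → role → privilege chain described in these notes, applied to a SHOW TABLES result; it is not Impala's or Sentry's implementation, and all user, group, role, database, and table names are made up. Access is denied by default: a table is listed only if one of the user's roles holds a privilege on it or on its enclosing database.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical policy: users belong to groups, groups are granted roles,
# and roles hold (object, action) privileges. Objects are paths such as
# ("sales",) for a database or ("sales", "orders") for a table.
USER_GROUPS: Dict[str, Set[str]] = {
    "analyst1": {"analysts"},
    "manager1": {"managers", "analysts"},
}
GROUP_ROLES: Dict[str, Set[str]] = {
    "analysts": {"analyst_role"},
    "managers": {"manager_role"},
}
ROLE_PRIVILEGES: Dict[str, Set[Tuple[Tuple[str, ...], str]]] = {
    "analyst_role": {(("sales", "orders"), "SELECT")},
    "manager_role": {(("sales",), "ALL")},
}


def privileges_for(user: str) -> Set[Tuple[Tuple[str, ...], str]]:
    """Resolve user -> groups -> roles -> privileges (empty set if unknown)."""
    privs: Set[Tuple[Tuple[str, ...], str]] = set()
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            privs |= ROLE_PRIVILEGES.get(role, set())
    return privs


def can_see(user: str, obj: Tuple[str, ...]) -> bool:
    """Deny by default; a privilege on an object also covers everything inside it."""
    return any(obj[: len(granted)] == granted for granted, _ in privileges_for(user))


def show_tables(user: str, database: str, metastore_tables: List[str]) -> List[str]:
    """Filter the raw Metastore listing down to tables the user may see."""
    return [t for t in metastore_tables if can_see(user, (database, t))]


if __name__ == "__main__":
    tables = ["orders", "salaries"]
    print(show_tables("analyst1", "sales", tables))  # ['orders']
    print(show_tables("manager1", "sales", tables))  # ['orders', 'salaries']
    print(show_tables("nobody", "sales", tables))    # []
```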

Presentation Transcript

  • 6 ways to exploit Hive – and what to do about it
    Brock Noland | Software Engineer, Cloudera
    January 23, 2013
  • Outline
    • Introduction
    • Hadoop security primer
        • Authentication
        • Authorization
    • Security options
        • Default
        • Kerberos with Impersonation
        • Kerberos with Sentry
    • Demo
  • Introduction
    • Tonight's focus is SQL-on-Hadoop
    • The vast majority of Hadoop users use Hive or Cloudera Impala
    • Data warehouse offload is the most common use case
    • Data warehouse offload is a two-step process:
        1. Automatic transformations moved to Hadoop
        2. Data analysts given query access
  • Data warehouse use case
    [Diagram: Online Database → Hadoop → Data Warehouse]
  • Authentication
    • Authentication is who you are
    • Hadoop models:
        • Default: "trusted network"
        • Strong: Kerberos
  • Default Authentication – trusted network
    • Default security mechanism
    • Hadoop client uses local username
    • Used in:
        • POCs
        • Startups
        • Demos
        • Pre-prod environments
  • Default Authentication – trusted network
    [Diagram: client host (user: brock; file: a.txt, contents: "some data") → Hadoop]
    $ whoami
    brock
    $ cat a.txt
    some data
    $ hadoop fs -put a.txt .
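In the default ("simple") mode the cluster trusts whatever identity the client reports. The Python sketch below models that trust decision; it is not Hadoop's code. It assumes the behavior of Hadoop's `HADOOP_USER_NAME` environment variable, which simple-mode clients honor as an override of the local login name, to show why any user can claim to be anyone.

```python
import getpass
from typing import Mapping, Optional


def simple_auth_identity(env: Mapping[str, str], login_name: Optional[str] = None) -> str:
    """Model of the identity a client presents under default (simple) auth.

    The name comes entirely from the client side: a non-empty
    HADOOP_USER_NAME value wins, otherwise the local OS login name is used.
    Nothing on the server side verifies either value.
    """
    override = env.get("HADOOP_USER_NAME")
    if override:
        return override
    return login_name if login_name is not None else getpass.getuser()


if __name__ == "__main__":
    # brock is logged in locally, but can simply claim to be the hdfs service user.
    print(simple_auth_identity({}, login_name="brock"))                            # brock
    print(simple_auth_identity({"HADOOP_USER_NAME": "hdfs"}, login_name="brock"))  # hdfs
```

This is why the trusted-network model is limited to POCs, demos, and similar environments.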
  • Strong Authentication – Kerberos
    • Hadoop is secured with Kerberos
        • Provides mutual authentication
        • Protects against eavesdropping and replay attacks
    • Every user and service has a Kerberos "principal"
        • Service: impala/hostname@MYCOMPANY.COM
        • User: brock@MYCOMPANY.COM
    • Credentials
        • Service: keytabs
        • User: password
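The two principals on this slide follow the Kerberos naming pattern `primary[/instance]@REALM`: services carry a host as the instance, users usually do not. The Python sketch below splits a principal into those parts; it is a simplified illustration that ignores escaping and other corner cases of the full Kerberos name syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    primary: str             # user or service name, e.g. "brock" or "impala"
    instance: Optional[str]  # host name for services, usually absent for users
    realm: str               # Kerberos realm, e.g. "MYCOMPANY.COM"


def parse_principal(name: str) -> Principal:
    """Split a principal of the form primary[/instance]@REALM (simplified)."""
    if name.count("@") != 1:
        raise ValueError(f"expected exactly one '@' in {name!r}")
    left, realm = name.split("@")
    primary, sep, instance = left.partition("/")
    if not primary or not realm or (sep and not instance):
        raise ValueError(f"malformed principal {name!r}")
    return Principal(primary, instance or None, realm)
```

For example, `parse_principal("impala/hostname@MYCOMPANY.COM")` yields primary `impala`, instance `hostname`, and realm `MYCOMPANY.COM`, while `parse_principal("brock@MYCOMPANY.COM")` has no instance.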
  • Strong Authentication – Kerberos
    [Diagram: client host (user: brock) sends <kerberos ticket> and <encrypted data>* to Hadoop]
    $ whoami
    brock
    $ kinit
    Password: *******
    $ cat a.txt
    some data
    $ hadoop fs -put a.txt .
    * RPC encryption must be enabled
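Kerberos replay protection rests on authenticators: alongside the ticket, the client sends its name and a fresh timestamp protected with the session key, and the service rejects anything forged, too old, or already seen. The Python sketch below models only that check, under stated assumptions: an HMAC-SHA256 tag stands in for Kerberos encryption, timestamps are integer seconds, and the 300-second window mirrors the common 5-minute clock-skew default. It is not the Kerberos wire protocol and does not model confidentiality (per the slide's footnote, RPC encryption must be enabled for the data itself to be encrypted).

```python
import hashlib
import hmac
from typing import Set, Tuple

MAX_SKEW_SECONDS = 300  # assumption: the common 5-minute Kerberos clock-skew default

Authenticator = Tuple[str, int, bytes]  # (client name, timestamp, tag)


def _tag(session_key: bytes, client: str, timestamp: int) -> bytes:
    """MAC over client name and timestamp; stands in for encryption under the session key."""
    return hmac.new(session_key, f"{client}|{timestamp}".encode(), hashlib.sha256).digest()


def make_authenticator(session_key: bytes, client: str, timestamp: int) -> Authenticator:
    """Client side: bind the client name and a fresh timestamp to the session key."""
    return (client, timestamp, _tag(session_key, client, timestamp))


class ReplayCheckingService:
    """Service side: verify the tag, enforce the time window, keep a replay cache."""

    def __init__(self, session_key: bytes) -> None:
        self.session_key = session_key
        self.seen: Set[Authenticator] = set()

    def accept(self, auth: Authenticator, now: int) -> bool:
        client, timestamp, tag = auth
        if not hmac.compare_digest(tag, _tag(self.session_key, client, timestamp)):
            return False  # forged or tampered: sender does not hold the session key
        if abs(now - timestamp) > MAX_SKEW_SECONDS:
            return False  # outside the allowed clock-skew window
        if auth in self.seen:
            return False  # replay of an authenticator that was already accepted
        self.seen.add(auth)
        return True
```

An eavesdropper who captures an authenticator gains nothing by resending it: the second `accept` call with the same value returns `False`, and after the window passes the captured value is rejected as stale anyway.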
  • Strong Authentication – Kerberos
    • Keytab
        • Encrypted key for servers (similar to a "password")
        • Generated by a server such as MIT Kerberos or Active Directory
  • Hive Server 2 and Oozie
    [Diagram: Beeline (Hive CLI), Tableau, and other JDBC clients → Hive Server 2 (HS2) → Hadoop; Oozie CLI and Control-M → Oozie → Hadoop]
  • Strong Authentication – Kerberos
    • Impersonation
        • Services such as HiveServer2 impersonate users
        • Data loaded by "joe" via HS2 is owned by "joe"
        • Oozie jobs submitted by "brock" are run as "brock"
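Impersonation is not automatic: a Hadoop service account may act on behalf of other users only if the cluster configuration allows it, through the `hadoop.proxyuser.<service>.hosts` and `hadoop.proxyuser.<service>.groups` properties. The Python sketch below models that decision in simplified form; it is not Hadoop's implementation (real host matching, for instance, is more involved), and the host names are hypothetical.

```python
from typing import Dict, Set


def _allowed(value: str, candidates: Set[str]) -> bool:
    """True if the comma-separated config value is '*' or names any candidate."""
    entries = {v.strip() for v in value.split(",") if v.strip()}
    return "*" in entries or bool(entries & candidates)


def may_impersonate(conf: Dict[str, str], real_user: str, client_host: str,
                    target_user_groups: Set[str]) -> bool:
    """Model of a proxy-user check: may real_user, connecting from client_host,
    act on behalf of a user who belongs to target_user_groups?

    Both the host rule and the group rule must match; a missing key means no permission.
    """
    hosts = conf.get(f"hadoop.proxyuser.{real_user}.hosts", "")
    groups = conf.get(f"hadoop.proxyuser.{real_user}.groups", "")
    return _allowed(hosts, {client_host}) and _allowed(groups, target_user_groups)
```

For example, with `hadoop.proxyuser.hive.hosts` set to a single HiveServer2 host and `hadoop.proxyuser.hive.groups` set to `analysts,senioranalysts`, the `hive` account can impersonate analysts only from that host, and a user with no proxyuser entries cannot impersonate anyone.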
  • Authorization
    • HDFS permissions
        • Unix style
        • Read/Write/Execute for Owner/Group/Other
        • Coarse grained
    • Other Hadoop components have authorization
        • MapReduce: who can use which job queues
        • HBase: table ACLs
  • HDFS Permissions
    $ hadoop fs -ls file
    -rw-r-----   1 analyst1 analysts       2244 2014-01-19 12:15 file
    • Permissions
        • Unix style permissions
        • Read/Write/Execute
        • Owner/Group/Other
    • Owner
        • One and only one owner
    • Group
        • One and only one group
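The owner/group/other rule on this slide can be stated as code: exactly one permission class applies to a given user, chosen in the order owner, group, other. The Python sketch below is a simplified model, not HDFS code; it ignores the HDFS superuser and the sticky bit, and `analyst2` is a hypothetical user.

```python
from typing import Set


def check_access(mode: str, owner: str, group: str, user: str,
                 user_groups: Set[str], wanted: str) -> bool:
    """Model of an owner/group/other permission check.

    mode is a 9-character string such as "rw-r-----" (no sticky bit);
    wanted is a subset of "rwx". Exactly one class applies: owner if the
    user owns the file, else group if the user is in the file's single
    group, else other.
    """
    if len(mode) != 9 or set(mode) - set("rwx-"):
        raise ValueError("expected a 9-character mode such as 'rw-r-----'")
    if user == owner:
        bits = mode[0:3]
    elif group in user_groups:
        bits = mode[3:6]
    else:
        bits = mode[6:9]
    return all(flag in bits for flag in wanted)
```

Applied to the file above (`rw-r-----`, owner `analyst1`, group `analysts`): the owner can read and write, another member of `analysts` can only read, and anyone outside the group gets nothing. Because each file has a single group, sharing with a second set of users means changing that group or opening up "other".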
  • Back to our use case
    • Scenario facts
        • ETL offload is a success
        • Data warehouse is expensive and at capacity
        • Same data is in Hadoop
    • Next step
        • End users start using Hadoop to augment the DW
        • Security becomes primary concern
  • End users need to share data
    • Unlike automated ETL jobs, end users want to share data with peers
    • Must manage HDFS permissions manually
    • Each file has a single group
    • End result is users set permissions to world readable/writable
  • Hive: Security holes
    CREATE TEMPORARY FUNCTION custom_udf AS 'com.mycompany.MaliciousClass';
    SELECT TRANSFORM(stuff) USING 'malicious-script.pl' AS thing1, thing;
    CREATE EXTERNAL TABLE external_table(column1 string) LOCATION '/path/to/any/table';
  • Hive: Security holes
    CREATE TABLE test (c1 string) ROW FORMAT SERDE 'com.mycompany.MaliciousClass';
    FROM (
      FROM t1
      MAP t1.c1
      USING 'malicious-script1.pl'
      CLUSTER BY key) map_output
    INSERT OVERWRITE TABLE t2
      REDUCE t2.c1
      USING 'malicious-script2.pl'
      AS c2;
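The statements on the two "security holes" slides fall into four families: custom UDF classes, external scripts run through TRANSFORM/MAP/REDUCE ... USING, custom SerDe classes, and external tables pointed at arbitrary paths. The Python sketch below labels which families a statement uses, as a hypothetical audit aid only. String matching like this is easy to evade and is not how Sentry works; it is no substitute for real authorization.

```python
import re
from typing import List

# Hypothetical patterns for the constructs shown on the two "security holes" slides.
RISKY_PATTERNS = {
    "custom UDF class": r"\bCREATE\s+TEMPORARY\s+FUNCTION\b",
    "external script (TRANSFORM/MAP/REDUCE ... USING)": r"\b(TRANSFORM|MAP|REDUCE)\b.*?\bUSING\s+'",
    "custom SerDe class": r"\bROW\s+FORMAT\s+SERDE\b",
    "table over an arbitrary path": r"\bCREATE\s+EXTERNAL\s+TABLE\b.*?\bLOCATION\b",
}


def risky_constructs(statement: str) -> List[str]:
    """Return labels of the risky constructs found in one HiveQL statement."""
    return [
        label
        for label, pattern in RISKY_PATTERNS.items()
        if re.search(pattern, statement, flags=re.IGNORECASE | re.DOTALL)
    ]
```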
  • Default: Authorization
    • Hive ships with an "advisory" authorization system
        • All users see all databases/tables/columns
        • Does not fix any security holes
        • Users grant themselves permissions
  • Kerberos with impersonation: Sharing data
    The user "manager1" wants to share the table "manager1_table" with senior analysts but not junior analysts.
    # hadoop fs -ls -R /user/hive/warehouse
    drwxr-x--T   - analyst1   analyst1            0 analyst1_table
    drwxr-x--T   - jranalyst1 jranalyst1          0 jranalyst1_table
    drwxr-x--T   - manager1   manager1            0 manager1_table
  • Kerberos with impersonation: Sharing data
    IT must create a group:
    # groupadd senioranalysts
    Then add the appropriate members to the group:
    # usermod -G analyst,senioranalysts analyst1
    # usermod -G management,analyst,senioranalysts manager1
  • Kerberos with impersonation: Sharing data
    Then "manager1" can manually change the file permissions:
    $ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
    $ hadoop fs -ls /user/hive/warehouse/
    Found 3 items
    drwxr-x--T   - analyst1   analyst1            0 analyst1_table
    drwxr-x--T   - jranalyst1 jranalyst1          0 jranalyst1_table
    drwxr-x--T   - manager1   senioranalysts      0 manager1_table
  • Kerberos with impersonation: Sharing data
    Now any senior-level analyst can query the data:
    $ whoami
    analyst1
    $ beeline
    ...
    Connected to: Hive (version 0.10.0)
    0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
    +------------+
    | count(*)   |
    +------------+
    | 47         |
    +------------+
  • Kerberos with impersonation: Sharing data
    Junior analysts cannot query the data:
    $ whoami
    jranalyst1
    $ beeline
    ...
    Connected to: Hive (version 0.10.0)
    0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
    Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=jranalyst1, access=READ_EXECUTE, inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T
  • Kerberos with impersonation: Sharing data
    What happens in the real world?
  • Kerberos with impersonation: Sharing data
    Table "manager1_table" is owned by user/group "manager1":
    $ hadoop fs -ls /user/hive/warehouse/
    Found 3 items
    drwxr-x--T   - analyst1   analyst1            0 analyst1_table
    drwxr-x--T   - jranalyst1 jranalyst1          0 jranalyst1_table
    drwxr-x--T   - manager1   manager1            0 manager1_table
  • Kerberos with impersonation: Sharing data
    User "manager1" makes "manager1_table" world readable/writable:
    $ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
    $ hadoop fs -ls /user/hive/warehouse/
    Found 3 items
    drwxr-x--T   - analyst1   analyst1            0 analyst1_table
    drwxr-x--T   - jranalyst1 jranalyst1          0 jranalyst1_table
    drwxrwxrwt   - manager1   manager1            0 manager1_table
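The listings in this walkthrough use Unix `ls`-style notation, where the final character also reflects the sticky bit: a capital `T` means sticky without execute for "other", a lowercase `t` means both. The small Python function below, an illustration rather than Hadoop code, renders octal modes that way; it reproduces `drwxr-x--T` for mode 1750 and `drwxrwxrwt` for 777 with the sticky bit still set, as in the last listing.

```python
def mode_string(mode: int, is_dir: bool = True) -> str:
    """Render permission bits the way the listings above show them, e.g. 'drwxr-x--T'."""
    chars = ["d" if is_dir else "-"]
    for shift in (6, 3, 0):  # owner, group, other
        triad = (mode >> shift) & 0o7
        chars.append("r" if triad & 0o4 else "-")
        chars.append("w" if triad & 0o2 else "-")
        chars.append("x" if triad & 0o1 else "-")
    if mode & 0o1000:  # sticky bit: shown in the "other execute" position
        chars[-1] = "t" if chars[-1] == "x" else "T"
    return "".join(chars)
```

The difference between `mode_string(0o1750)` and `mode_string(0o1777)` is exactly the change "manager1" made: every user on the cluster gained read, write, and execute on the table directory.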
  • Kerberos with impersonation: Summary
    • Securing Hive with Kerberos makes Hive unusable for DW offload:
        • Manual file permission management
        • End state is world writable/readable
        • No ability to restrict access to columns or rows
        • All users see all databases/tables/columns
  • Fine Grained Security: Apache Sentry
    Authorization module for Hive, Search, & Impala
    • Unlocks key RBAC requirements
        • Secure, fine-grained, role-based authorization
        • Multi-tenant administration
    • Open source
        • Apache Incubator project
    • Ecosystem support
        • Apache SOLR, HiveServer2, & Impala 1.1+
  • Key Benefits of Sentry
    • Store sensitive data in Hadoop
    • Extend Hadoop to more users
    • Comply with regulations
  • Key Capabilities of Sentry
    • Fine-grained authorization
        • Specify security for SERVERS, DATABASES, TABLES & VIEWS
    • Role-based authorization
        • SELECT privilege on views & tables
        • INSERT privilege on tables
        • ALL privilege on the server, databases, tables & views
        • ALL privilege is needed to create/modify schema
    • Multi-tenant administration
        • Separate policies for each database/schema
        • Can be maintained by separate admins
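The capabilities on this slide combine an object hierarchy (server, database, table or view) with a small set of actions. The Python sketch below is a simplified model under our own assumptions, not Sentry's code: a grant on an object covers everything beneath it, ALL implies SELECT and INSERT, and creating or modifying schema is modeled as requiring ALL on the enclosing object. Server, database, and table names are hypothetical, and the real rules have more cases (for example, which actions apply to views).

```python
from typing import Iterable, Tuple

Obj = Tuple[str, ...]  # hierarchical path: (server,), (server, db), (server, db, table_or_view)

# In this model ALL implies every action; SELECT and INSERT imply only themselves.
IMPLIED = {
    "ALL": {"ALL", "SELECT", "INSERT"},
    "SELECT": {"SELECT"},
    "INSERT": {"INSERT"},
}


def covers(grant_obj: Obj, grant_action: str, obj: Obj, action: str) -> bool:
    """True if the grant is on obj or an ancestor of obj and its action implies the request."""
    return obj[: len(grant_obj)] == grant_obj and action in IMPLIED.get(grant_action, set())


def authorized(grants: Iterable[Tuple[Obj, str]], obj: Obj, action: str) -> bool:
    """Deny by default: allow only if some grant covers the request."""
    return any(covers(g_obj, g_action, obj, action) for g_obj, g_action in grants)
```

With grants of ALL on `("server1", "sales")` and SELECT on `("server1", "hr", "employees")`, a user can insert into any table in `sales` and create tables there, but can only read `hr.employees` and cannot change the `hr` schema.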
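  • Sentry Architecture
    [Diagram: query engines (Impala, HiveServer2, SOLR/Search, Pig) connect through bindings (Impala Binding Layer, Hive Authorization Provider, …) to the Policy Engine, which reads policies from a Policy Provider backed by a file (local FS/HDFS), a database, or …]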
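The "File" policy provider in this diagram reads an INI-style policy file. As we understand Sentry's file-based format, a `[groups]` section maps groups to roles and a `[roles]` section maps each role to comma-separated privilege strings such as `server=server1->db=default->table=t->action=select`. The Python sketch below parses such a file with `configparser`; it is an illustration, not Sentry's parser, and the group, role, and object names are hypothetical (reusing the demo's analysts).

```python
import configparser
from typing import Dict, List

# Example policy in the style of Sentry's file-based provider (hypothetical names).
POLICY = """
[groups]
senioranalysts = senior_analyst_role
jranalysts = junior_analyst_role

[roles]
senior_analyst_role = server=server1->db=default->table=manager1_table->action=select
junior_analyst_role = server=server1->db=default->table=jranalyst1_table->action=select
"""


def parse_policy(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Map each group to the privileges of its roles.

    A privilege string like 'server=s->db=d->table=t->action=select' becomes
    {'server': 's', 'db': 'd', 'table': 't', 'action': 'select'}.
    """
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep group and role names case-sensitive
    parser.read_string(text)
    roles: Dict[str, List[Dict[str, str]]] = {}
    for role, value in parser["roles"].items():
        privileges = []
        for priv in value.split(","):
            parts = (p.split("=", 1) for p in priv.strip().split("->"))
            privileges.append({k.strip(): v.strip() for k, v in parts})
        roles[role] = privileges
    result: Dict[str, List[Dict[str, str]]] = {}
    for group, value in parser["groups"].items():
        result[group] = [p for r in value.split(",") for p in roles.get(r.strip(), [])]
    return result
```

Keeping policy in a plain file (on local disk or HDFS) makes it easy to review and version; the diagram's database-backed provider is the alternative when policies need to be changed through GRANT/REVOKE-style administration.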
  • Query Execution Flow
    [Diagram: SQL query → Parse (validate SQL grammar) → Build (construct statement tree) → Check (validate statement objects; first check: authorization via Sentry) → Plan (forward to execution planner) → MR]
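The point of this flow is ordering: authorization runs on the statement tree, after parsing and building but before any plan or MapReduce job exists, so a denied query never reaches the planner. The Python sketch below shows that ordering with toy stand-ins; the one-statement grammar, the `(action, table)` tree, and the `scan(...)` plan string are all hypothetical simplifications, not Hive internals.

```python
from typing import List, Set, Tuple


class AuthorizationError(Exception):
    """Raised by the check step when the statement is not allowed."""


def parse(sql: str) -> List[str]:
    """Toy parse step: accept only `SELECT * FROM <table>` (hypothetical grammar)."""
    tokens = sql.strip().rstrip(";").split()
    if len(tokens) != 4 or [t.upper() for t in tokens[:3]] != ["SELECT", "*", "FROM"]:
        raise SyntaxError(f"unsupported statement: {sql!r}")
    return tokens


def build(tokens: List[str]) -> Tuple[str, str]:
    """Toy build step: produce a statement tree, here just (action, table)."""
    return ("SELECT", tokens[3])


def check(tree: Tuple[str, str], allowed: Set[Tuple[str, str]]) -> Tuple[str, str]:
    """Authorization before planning: unauthorized trees never reach the planner."""
    if tree not in allowed:
        raise AuthorizationError(f"{tree[0]} on {tree[1]} denied")
    return tree


def plan(tree: Tuple[str, str]) -> str:
    """Toy plan step standing in for MapReduce job generation."""
    return f"scan({tree[1]})"


def execute_flow(sql: str, allowed: Set[Tuple[str, str]]) -> str:
    """Run the stages in the slide's order: parse, build, check, plan."""
    return plan(check(build(parse(sql)), allowed))
```

For instance, `execute_flow("SELECT * FROM manager1_table;", {("SELECT", "manager1_table")})` returns `"scan(manager1_table)"`, while the same query with an empty allowed set raises `AuthorizationError` before `plan` is ever called.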