
Fine-Grained Security for Spark and Hive

  1. Fine-Grained Security for Spark and Hive. Carter Shanklin, Director PM; Don Bosco Durai, Security Architect. June 29, 2016. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. Agenda ● Current security options and challenges ● Apache Ranger Overview ● LLAP Overview ● Use Cases and Demo ● Apache Atlas Integration
  3. Current Options and Challenges
  4. Current Options and Challenges ⬢ Spark, Pig and MapReduce are limited to storage-level access control ⬢ Column-level access control is available via HiveServer2 ⬢ Row-level filtering needs Hive views – Multiple Hive views need to be created and managed – Explicit permissions need to be granted for each view/user – Users need to know which view to use ⬢ Masking needs custom UDFs – These need to be wrapped in views
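For a sense of the overhead described on this slide, the Python sketch below generates the HiveQL that view-based row filtering would require: one view plus one explicit grant per region. It only builds strings and executes nothing; the view names, role names and the `customer_state` filter column are illustrative assumptions, not taken from the demo.

```python
from typing import List


def region_view_ddl(table: str, regions: List[str], column: str = "customer_state") -> List[str]:
    """Generate the per-region views and grants that view-based row filtering requires.

    View and role names (e.g. users_ca, marketing_ca) are a hypothetical naming convention.
    """
    statements = []
    for region in regions:
        suffix = region.lower()
        view = f"{table}_{suffix}"
        # One view per region, each hard-coding its own WHERE clause.
        statements.append(
            f"CREATE VIEW {view} AS SELECT * FROM {table} WHERE {column} = '{region}'"
        )
        # Each view then needs its own explicit grant.
        statements.append(f"GRANT SELECT ON TABLE {view} TO ROLE marketing_{suffix}")
    return statements


if __name__ == "__main__":
    for stmt in region_view_ddl("users", ["CA", "NY", "TX"]):
        print(stmt + ";")
```

Every new region, or every additional table that needs the same filter, adds more statements to maintain, and analysts still have to know which view to query.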
  5. Apache Ranger Overview
  6. Apache Ranger. Authorization: • Centralized platform to define, administer and manage security policies consistently • Enforces policies within each component. Auditing: • Central audit location for all access requests • Supports multiple destinations (HDFS, Solr, etc.) • Real-time visual query interface. Ranger KMS: • Stores and manages encryption keys • Supports HDFS TDE (transparent data encryption) • Integrates with HSMs
  9. Ranger Architecture (diagram). Components shown: Enterprise Users; Legacy Tools and Data Governance; Ranger Administration Portal; Ranger Policy Server; Ranger Audit Server; Integration API; Ranger Plugins in Hadoop components (HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm); RDBMS; Atlas
  10. Ranger Audits - Data Access
  11. Ranger Audits - Admin Actions
  12. LLAP Overview
  13. Hive 2.0 and LLAP ⬢ At a high level: – 2000+ features, improvements and bug fixes in Hive since HDP 2.4. – 600+ of these from outside Hortonworks. ⬢ Major improvements: – Preview: Hive LLAP: persistent query servers with intelligent in-memory caching. – ACID GA: hardened and proven at scale. – Expanded SQL compliance: more capable integration with BI tools. – Performance: interactive query, 2x faster ETL. – Security: row/column security extending to views; column-level security for Spark.
  14. Hive 2 with LLAP: Architecture Overview
  15. Hive 2 with LLAP: Open Interfaces
  16. Ranger Integration with Hive and LLAP
  17. Hive / LLAP Security Capabilities with Ranger ⬢ The Ranger Hive plugin provides authorization / access control. ⬢ Column masking: – Injects Hive UDFs that mask characters or hash values. – Dynamic, per-user. ⬢ Dynamic row filtering: – The query is analyzed and policies are applied. – Dynamic, per-user. ⬢ All operations run as ordinary SQL queries: – Masking policies become expressions in the SQL SELECT clause. – Row filters become predicates in the SQL WHERE clause.
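The last bullet can be pictured as a query rewrite. The Python sketch below is a toy model, not Ranger's or Hive's implementation: the `Policy` class, `rewrite_query` and the `{col}` template format are hypothetical. It wraps masked columns in a masking UDF call inside the SELECT list and appends the row filter as a WHERE predicate, so the result is still ordinary SQL.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Policy:
    """Hypothetical per-user policy: column -> masking expression template, plus a row filter."""
    masks: Dict[str, str] = field(default_factory=dict)  # e.g. {"customer_ccn": "mask_show_last_n({col}, 4)"}
    row_filter: str = ""                                  # e.g. "customer_state = 'CA'"


def rewrite_query(table: str, columns: List[str], policy: Policy) -> str:
    """Build the SQL a user effectively runs once masking and row filtering are applied."""
    select_items = []
    for col in columns:
        template = policy.masks.get(col)
        if template is None:
            select_items.append(col)  # column passes through unchanged
        else:
            # Masked column keeps its original name so downstream code is unaffected.
            select_items.append(f"{template.format(col=col)} AS {col}")
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    if policy.row_filter:
        sql += f" WHERE {policy.row_filter}"  # row filter becomes a WHERE predicate
    return sql


if __name__ == "__main__":
    fraud = Policy(masks={"customer_ccn": "mask_show_last_n({col}, 4)"})
    marketing = Policy(row_filter="customer_state = 'CA'")
    print(rewrite_query("users", ["customer_id", "customer_ccn"], fraud))
    print(rewrite_query("users", ["customer_id", "customer_state"], marketing))
```

A real engine would work on the parsed query rather than concatenated strings; the sketch only shows where each kind of policy ends up.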
  18. Native Hive Masking Capabilities (UDF: purpose; example input → result)
      – mask: convert letters to X/x and numbers to n; 123 Fake St. → nnn Xxxx Xx.
      – mask_first_n: mask only the first n characters; 433-54-3937 → nnn-54-3937
      – mask_last_n: mask only the last n characters; 433-54-3937 → 433-54-nnnn
      – mask_show_first_n: mask, showing only the first n characters; 555-233-1234 → 555-nnn-nnnn
      – mask_show_last_n: mask, showing only the last n characters; 433-54-3937 → nnn-nn-3937
      – mask_hash: produce a consistent hash of the field; CA → 21f241cccaa5cfa33190f56ff1510e37
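To make the table's semantics concrete, here is a minimal Python re-implementation of the default behavior shown in the examples (upper-case letters to X, lower-case letters to x, digits to n, other characters kept). It is an illustrative sketch, not Hive's code. `mask_hash` assumes an MD5 hex digest, which is consistent with the 32-character result on the slide but may differ from what a given Hive release uses.

```python
import hashlib


def _mask_char(ch: str) -> str:
    """Default substitution: upper -> X, lower -> x, digit -> n, anything else unchanged."""
    if ch.isupper():
        return "X"
    if ch.islower():
        return "x"
    if ch.isdigit():
        return "n"
    return ch


def mask(value: str) -> str:
    return "".join(_mask_char(c) for c in value)


def mask_first_n(value: str, n: int = 4) -> str:
    return mask(value[:n]) + value[n:]


def mask_last_n(value: str, n: int = 4) -> str:
    split = max(len(value) - n, 0)
    return value[:split] + mask(value[split:])


def mask_show_first_n(value: str, n: int = 4) -> str:
    return value[:n] + mask(value[n:])


def mask_show_last_n(value: str, n: int = 4) -> str:
    split = max(len(value) - n, 0)
    return mask(value[:split]) + value[split:]


def mask_hash(value: str) -> str:
    # Assumption: MD5 hex digest; deterministic, so equal inputs still join/group together.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(mask("123 Fake St."))
    print(mask_show_last_n("433-54-3937", 4))
```

Note that separators such as "-" are not letters or digits, so they survive masking; that is why the SSN and phone examples keep their shape.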
  19. Delivering Spark Security
  20. Key Features: Spark Column Security with LLAP ⬢ Fine-grained column-level access control for SparkSQL. ⬢ Fully dynamic policies per user; no views required. ⬢ Uses standard Ranger policies and tools to control access and masking. Flow: 1. SparkSQL gets data locations, known as “splits”, from HiveServer2 and plans the query. 2. HiveServer2 authorizes access using Ranger; per-user policies such as row filtering are applied. 3. Spark gets a modified query plan based on the dynamic security policy. 4. Spark reads data from LLAP; filtering/masking is guaranteed by the LLAP server.
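The four-step flow can be modeled in a few functions. In this Python toy model every name (`hs2_get_splits`, `llap_read`, `spark_scan`, the `Split` type and the policy dictionaries) is hypothetical, and a row filter is reduced to a single column/value pair. What it illustrates is the division of labor: HiveServer2 attaches per-user policy to the splits, and the LLAP side applies it while reading, so the Spark client only ever receives filtered rows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of a row filter: (column, required value).
RowFilter = Optional[Tuple[str, str]]

# Stand-ins for Ranger state (hypothetical): who may read the table, and per-user row filters.
ALLOWED_USERS = {"bill", "mark", "frank"}
ROW_FILTERS: Dict[str, RowFilter] = {"mark": ("customer_state", "CA")}


@dataclass(frozen=True)
class Split:
    """A unit of work handed to Spark: a data location plus the policy HiveServer2 attached."""
    location: str
    row_filter: RowFilter


def hs2_get_splits(user: str, table: str, locations: List[str]) -> List[Split]:
    """Steps 1-3: authorize the user, look up per-user policy, return policy-carrying splits."""
    if user not in ALLOWED_USERS:
        raise PermissionError(f"{user} may not read {table}")
    row_filter = ROW_FILTERS.get(user)
    return [Split(location, row_filter) for location in locations]


def llap_read(split: Split, storage: Dict[str, List[dict]]) -> List[dict]:
    """Step 4: the LLAP side reads the data and applies the attached filter itself."""
    rows = storage[split.location]
    if split.row_filter is None:
        return list(rows)
    column, value = split.row_filter
    return [row for row in rows if row[column] == value]


def spark_scan(user: str, table: str, storage: Dict[str, List[dict]]) -> List[dict]:
    """The client-side view of a SparkSQL scan: it never handles unfiltered rows."""
    splits = hs2_get_splits(user, table, sorted(storage))
    return [row for split in splits for row in llap_read(split, storage)]


if __name__ == "__main__":
    demo = {
        "users/part-0": [{"customer_id": 1, "customer_state": "CA"},
                         {"customer_id": 2, "customer_state": "NY"}],
        "users/part-1": [{"customer_id": 3, "customer_state": "CA"}],
    }
    print(spark_scan("mark", "users", demo))
```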
  21. Example: Per-User Row Filtering by Region in SparkSQL
  22. Use Cases
  23. Demo Setup ⬢ Customer Users and Sales data stored in ORC (metadata in the Metastore) ⬢ Data can be accessed via SparkSQL or HiveServer2 ⬢ Marketing needs access to Sales and Users data for analytics ⬢ Fraud Investigation team needs access to data for fraud detection ⬢ Billing team needs access to Sales and Users data for billing. Tables: Users (customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip); Sales (customer_id, product_id, promotion_id, cookie_id, tracking_id). Groups and users: Fraud – frank; Marketing – mark; Billing – bill.
  24. Use Case 1: Restricting Column Access. This is a simple use case where certain groups or users don’t have permission to view certain columns. ⬢ Billing group has access to all columns in table Users ⬢ Marketing group can’t access the credit card column in table Users. Table Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip. Access by user (customer_phone / customer_ccn): bill (Billing) 😀 / 😀; mark (Marketing) 😀 / 😡
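A minimal sketch of the Use Case 1 policy expressed as data, in Python. The group-to-column mapping mirrors the slide; the function names are hypothetical. The sketch denies by default (a group with no column policy, such as Fraud here, may read no columns) and rejects a whole query if any requested column is denied, which matches mark's result for customer_ccn.

```python
from typing import Dict, List, Set

USERS_COLUMNS = ["customer_id", "customer_name", "customer_email", "customer_phone",
                 "customer_ccn", "customer_state", "customer_zip"]
USER_GROUPS = {"bill": "Billing", "mark": "Marketing", "frank": "Fraud"}

# Hypothetical column-level policy on table Users: group -> columns it may SELECT.
COLUMN_POLICY: Dict[str, Set[str]] = {
    "Billing": set(USERS_COLUMNS),
    "Marketing": set(USERS_COLUMNS) - {"customer_ccn"},
}


def denied_columns(user: str, columns: List[str]) -> List[str]:
    """Return the requested columns the user's group is not allowed to read (default deny)."""
    allowed = COLUMN_POLICY.get(USER_GROUPS.get(user, ""), set())
    return [column for column in columns if column not in allowed]


def authorize_select(user: str, columns: List[str]) -> None:
    """Reject the whole query if any requested column is denied (SELECT * requests every column)."""
    denied = denied_columns(user, columns)
    if denied:
        raise PermissionError(f"{user}: SELECT denied on {', '.join(denied)}")


if __name__ == "__main__":
    authorize_select("mark", ["customer_id", "customer_phone"])  # allowed
    print(denied_columns("mark", USERS_COLUMNS))
```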
  25. Ranger Policy - Restrict Columns
  26. Ranger Policy - Restrict Columns - Results: bill from Billing, mark from Marketing
  27. Ranger Policy - Restrict Columns - Audit Screen
  28. Use Case 2: Column Masking. In this use case, certain groups or users can’t see the real values of certain columns. ⬢ Billing group can see the real/raw values of all columns in table Users ⬢ Fraud group can see only masked values of PII and PCI fields in table Users. Table Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip. Access by user (customer_email, customer_phone, customer_ccn): bill (Billing) 😀; frank (Fraud) 😎
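The Fraud group's view of a row can be sketched as a per-column function applied to each value. The slide does not say which masking function the demo used for each field, so the choices below (a full `mask` for email and phone, `mask_show_last_n` with n = 4 for the card number) are assumptions, and the helper names are hypothetical Python stand-ins for the Hive UDFs.

```python
from typing import Callable, Dict


def mask(value: str) -> str:
    """Default-style mask: upper -> X, lower -> x, digit -> n, other characters unchanged."""
    out = []
    for ch in value:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("n")
        else:
            out.append(ch)
    return "".join(out)


def mask_show_last_n(value: str, n: int = 4) -> str:
    split = max(len(value) - n, 0)
    return mask(value[:split]) + value[split:]


USER_GROUPS = {"bill": "Billing", "frank": "Fraud"}

# Hypothetical masking policy: group -> {column: masking function}; Billing has none.
MASKING_POLICY: Dict[str, Dict[str, Callable[[str], str]]] = {
    "Fraud": {
        "customer_email": mask,
        "customer_phone": mask,
        "customer_ccn": lambda v: mask_show_last_n(v, 4),
    },
}


def apply_masking(user: str, row: Dict[str, str]) -> Dict[str, str]:
    """Return the row as this user would see it; unmasked columns pass through unchanged."""
    rules = MASKING_POLICY.get(USER_GROUPS.get(user, ""), {})
    return {col: rules[col](val) if col in rules else val for col, val in row.items()}
```

Because masking happens per value at read time, bill and frank can run the identical query and still see different results, with no per-group view required.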
  29. Ranger Policies - Mask Fields
  30. Ranger Policy - Column Masking - Results: bill from Billing, frank from Fraud
  31. Ranger Policy - Column Masking - Audit Screen
  32. Use Case 3: Row Filtering. In this use case, certain groups or users can’t see all the rows of certain tables. ⬢ Billing group can see all rows in table Users ⬢ Marketing can see only rows from their region in table Users. Table Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip. Rows visible in Users: bill (Billing) 😀; mark (Marketing, CA): only CA users
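Row filtering reduces to a per-group predicate. In Ranger the filter would be a SQL expression such as `customer_state = 'CA'`; the Python sketch below stands in for it with a lambda. The group name `Marketing-CA` and the function name are hypothetical, and access to the table itself is assumed to be granted by a separate policy.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

USER_GROUPS = {"bill": "Billing", "mark": "Marketing-CA"}

# Hypothetical row-filter policy on table Users (SQL equivalent shown in the comment).
ROW_FILTERS: Dict[str, Callable[[Row], bool]] = {
    "Marketing-CA": lambda r: r["customer_state"] == "CA",  # customer_state = 'CA'
}


def visible_rows(user: str, rows: List[Row]) -> List[Row]:
    """Apply the user's row filter; groups without one (e.g. Billing) see every row."""
    predicate = ROW_FILTERS.get(USER_GROUPS.get(user, ""))
    if predicate is None:
        return list(rows)
    return [r for r in rows if predicate(r)]
```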
  33. Ranger Policies - Row Filtering
  34. Ranger Policy - Row Filtering - Results: bill from Billing, mark from Marketing
  35. Use Case 4: Row Filtering - Cross Table. This is an extension of the previous use cases, where the context for filtering rows is in another table. ⬢ Billing group can see all rows in table Sales ⬢ Marketing can see only rows from their region in table Sales; however, the Sales table doesn’t contain customer geographic information, so it must be derived from the Users table. Tables: Users (customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip); Sales (customer_id, product_id, promotion_id, cookie_id, tracking_id). Rows visible in Sales: bill (Billing) 😀; mark (Marketing, CA): only rows for CA users
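For the cross-table case, the filter on Sales has to look up each customer's state in Users. A natural SQL form is an `IN` subquery such as `customer_id IN (SELECT customer_id FROM users WHERE customer_state = 'CA')`; the slides do not show the exact expression used in the demo, so treat that as an assumption. The Python sketch below computes the same semantics with a set of allowed customer IDs; all names are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]

USER_GROUPS = {"bill": "Billing", "mark": "Marketing-CA"}
# Hypothetical mapping from a filtered group to the state it may see.
REGION_BY_GROUP = {"Marketing-CA": "CA"}


def visible_sales(user: str, sales: List[Row], users: List[Row]) -> List[Row]:
    """Filter Sales using customer_state, which exists only in Users.

    SQL analogue (assumed): customer_id IN (SELECT customer_id FROM users WHERE customer_state = 'CA')
    """
    state = REGION_BY_GROUP.get(USER_GROUPS.get(user, ""))
    if state is None:
        return list(sales)  # no row filter for this group (e.g. Billing)
    allowed_ids = {u["customer_id"] for u in users if u["customer_state"] == state}
    return [s for s in sales if s["customer_id"] in allowed_ids]
```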
  36. Ranger Policies - Row Filtering - Cross Table
  37. Apache Atlas Integration
  38. Cross Product Symbiosis (diagram): Apache Atlas, Apache Ranger and LLAP, with labels Classification/Tagging, Governance, Lineage, Tag-Based Policies, Dynamic Custom Policies, Enforcement, and hooks into HDFS, S3 and the Metastore. * Column masking and row filtering are not yet supported by tag-based policies.
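The idea behind tag-based policies is that one policy per classification covers every column carrying that tag, wherever it lives. The Python sketch below models only allow/deny, since the footnote notes that masking and row filtering were not yet supported by tag-based policies. The tag assignments and the groups cleared for each tag are hypothetical, loosely following the PII/PCI fields from Use Case 2.

```python
from typing import Dict, List, Set

# Hypothetical Atlas classifications for table Users (column -> tags).
COLUMN_TAGS: Dict[str, Set[str]] = {
    "customer_email": {"PII"},
    "customer_phone": {"PII"},
    "customer_ccn": {"PCI"},
}

# Hypothetical Ranger tag-based policy: tag -> groups allowed to read columns carrying it.
TAG_POLICY: Dict[str, Set[str]] = {
    "PII": {"Billing", "Fraud"},
    "PCI": {"Billing"},
}


def tag_denied_columns(group: str, columns: List[str]) -> List[str]:
    """Return columns the group may not read because they carry a tag it is not cleared for."""
    denied = []
    for col in columns:
        for tag in COLUMN_TAGS.get(col, set()):
            if group not in TAG_POLICY.get(tag, set()):
                denied.append(col)
                break  # one uncleared tag is enough to deny the column
    return denied
```

Tagging a new column as PII in this model immediately brings it under the PII rule without editing any per-table policy.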
  39. Ranger - Tag Based Policies
  40. Q & A
