Is your Enterprise Data lake Metadata Driven AND Secure?

Published in: Technology

  1. Is Your Enterprise Data Lake Metadata Driven AND Secure? Apache Atlas + Ranger
  2. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. Disclaimer: This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  3. Agenda • Introduction • Overview: Apache Atlas & Ranger • Technical Preview: Dynamic, Tag-based Policies • Q & A
  4. Speakers • Andrew Ahn, Director, Governance Product Management • Madhan Neethiraj, Director, Enterprise Security Engineering
  5. Apache Atlas + Ranger Overview
  6. Apache Atlas is Metadata Services
     Metadata Services Foundation (HDP 2.3) • Business Catalog: taxonomy-based classification • Technical Data: e.g. model for Hive: DB, Tables, Views and Columns • Centralized location for all metadata inside HDP and a single interface point for metadata exchange with platforms outside of HDP
     Metadata that enriches every component (diagram: Apache Atlas, Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, NiFi)
     Available now with HDP 2.3 • Hive: complete lineage, every SQL statement tracked • Ambari: setup & monitoring
     1Q2016 Technical Preview • Sqoop: supplement Hive lineage based on Sqoop import/export • Storm & Kafka: lineage for topologies and participating queues/topics • Ranger: dynamic security policies leveraging metadata tags • Falcon: process entities lineage
     Roadmap • HDFS: correlated with other components • Spark: support for SparkSQL • NiFi: integrate fine-grained data provenance with Atlas
  7. Big Data Management Through Metadata Management
     Scalability: Many traditional tools and patterns do not scale when applied to multi-tenant data lakes. Many enterprises have siloed data and metadata stores that collide in the data lake. This is compounded by the ability to have very large windows (years). Can traditional EDW tools manage 100 million entities effectively with room to grow?
     Metadata Tools: Scalable, decoupled, de-centralized management driven through metadata is the only viable solution. This allows quick integration with automation and other metamodels.
     Tags for Management, Discovery and Security: Proper metadata is the foundation for business taxonomy, stewardship, attribute-based security and self-service.
  8. Dynamic Access Policy Requirements • Basic tag policy: PII example. Access and entitlements must be tag-based (ABAC) and scalable in implementation. • Geo-based policy: policy based on IP address; proxy IP substitution may be required. The rule enforcement must be geo-aware. • Time-based tag policy: timer for data access, de-coupled from deletion of data. • Prohibitions: prevention of combinations of Hive tables/columns that may pose a risk together.
  9. How does Atlas work with Ranger at scale?
     Atlas provides: Metadata • Business classification (taxonomy): Company > HR > Driver • Hierarchy with inheritance of attributes to child objects: the sensitive “PII” tag of department HR will be inherited by group HR > Driver • Atlas will notify Ranger of changes via a Kafka topic
     Ranger provides: Access & Entitlements • Ranger will cache tags and asset mappings for performance • Ranger will have policies based on tags instead of roles • Example: PII = <group>. This can work for many assets.
     Diagram: Apache Atlas, Hive, Ranger, Falcon, Kafka, Storm. Atlas provides the metadata tags used to create policies.
  10. Apache Ranger: Dynamic classification-based Security
  11. Apache Ranger: Introduction
      Centralized authorization and auditing across Hadoop components • HDFS, Hive, HBase, Knox, Storm, YARN, Kafka, Solr, ... • Audit logs to: Solr, HDFS, RDBMS, Log4j, ...
      Resource-based security • Policies for a specific set of resources • Requires revision of policies as resources get added/moved
      Classification-based security • Policies for classifications and not for specific resources • A single policy protects resources in multiple components • As the classification of resources changes, the appropriate policies are automatically applied • Enables separation of duties: resource classification and security policies
  12. Apache Ranger: Authorization and Auditing (architecture diagram; labels only) • Enterprise Users • Ranger Administration Portal • Ranger Policy Store • Ranger Audit Store • Storage labels: RDBMS, HDFS, Solr, Log4j • Hadoop Components, each with a Ranger Plugin: HDFS, Hive Server2, HBase, Knox, Storm, YARN, Kafka, Solr
  13. Apache Atlas + Ranger integration (diagram; labels only)
      Atlas • Metastore: Tags, Assets, Entities • Notification Framework: Kafka Topics
      Ranger • Atlas Client: subscribes to topic, gets metadata updates • PDP • Resource Cache
      Notification • Metadata updates • Message durability • Optimized for speed • Event-driven updates
  14. DEMO
  15. Setup for the demo
      Data • Database finance, table tax_2010: table access expires on 12/31/2015 • Database hr, table employee: column SSN tagged as PII
      Users • analyst: no access to PII, no access to expired data • admin: access to PII, access to expired data
  16. Apache Atlas: tag a column as PII 1. Search for the column 2. Select the column 3. Select the ‘Tags’ tab 4. Click on ‘Add Tag’ 5. Select the PII tag & click ‘Save’
  17. Apache Atlas: tag a table for expiry_date • Select the EXPIRES_ON tag and enter a value for expiry_date
  18. Apache Ranger: authorization policy for PII • Pick the tag • Deny access to PII data to all users, with the exception of the ‘admin’ user
  19. Apache Ranger: authorization policy for expiry_date • Pick the tag • Deny access to data after the expiry date, with the exception of the ‘admin’ user
  20. Apache Ranger: access audit logs • Tags associated with resources • Resources accessed • Policy that allowed/denied access
  21. Questions
  22. References
  23. References
      • Apache Atlas: http://atlas.apache.org, http://hortonworks.com/apache/atlas
      • Apache Ranger: http://ranger.apache.org, http://hortonworks.com/apache/ranger
      • Apache Ranger wiki: https://cwiki.apache.org/confluence/display/RANGER
      • Tag based policies: https://cwiki.apache.org/confluence/display/RANGER/Tag+Based+Policies
      • Geo-location based policies: https://cwiki.apache.org/confluence/display/RANGER/Geo-location+based+policies
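
The two demo policies (slides 15, 18 and 19) can be illustrated with a small, self-contained Python sketch. This is not Ranger's policy engine or API. The `Resource` class, the `DENY_EXCEPTIONS` table and the `is_denied` function are hypothetical names invented for illustration, and the ISO date format for `expiry_date` is an assumption. The sketch only mirrors the rules stated in the slides: deny access to PII-tagged data, and to data past its expiry date, for everyone except the 'admin' user.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Set


@dataclass
class Resource:
    """A data asset plus its tags (tag name -> tag attributes). Hypothetical model."""
    name: str
    tags: Dict[str, Dict[str, str]] = field(default_factory=dict)


# Hypothetical policy table: users exempt from each tag's deny rule.
DENY_EXCEPTIONS: Dict[str, Set[str]] = {
    "PII": {"admin"},
    "EXPIRES_ON": {"admin"},
}


def is_denied(user: str, resource: Resource, today: date) -> bool:
    """Return True if any tag-based deny rule applies to this user."""
    for tag, attrs in resource.tags.items():
        if user in DENY_EXCEPTIONS.get(tag, set()):
            continue  # this user is an exception for this tag's rule
        if tag == "PII":
            return True
        # Assumed ISO format (YYYY-MM-DD) for the expiry_date attribute.
        if tag == "EXPIRES_ON" and today > date.fromisoformat(attrs["expiry_date"]):
            return True
    return False


# Demo setup from slide 15.
ssn = Resource("hr.employee.ssn", {"PII": {}})
tax = Resource("finance.tax_2010", {"EXPIRES_ON": {"expiry_date": "2015-12-31"}})
```

Because the rules are keyed on tags rather than on resource names, tagging another column as PII would bring it under the same rule without any policy change, which is the point the deck makes about classification-based security.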

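The tag inheritance described on slide 9 (the "PII" tag on department HR being inherited by HR > Driver) can be sketched in the same way. This is a minimal illustration, not Atlas's data model or API: the `DIRECT_TAGS` mapping and the `effective_tags` function are hypothetical, and the path string format simply follows the "Company > HR > Driver" notation from the slide.

```python
from typing import Dict, Set

# Hypothetical: tags attached directly to taxonomy nodes, keyed by full path.
DIRECT_TAGS: Dict[str, Set[str]] = {
    "Company > HR": {"PII"},
}


def effective_tags(path: str) -> Set[str]:
    """Collect the tags of a taxonomy node and of all of its ancestors."""
    parts = [p.strip() for p in path.split(">")]
    tags: Set[str] = set()
    for depth in range(1, len(parts) + 1):
        ancestor = " > ".join(parts[:depth])
        tags |= DIRECT_TAGS.get(ancestor, set())
    return tags
```

With this mapping, `effective_tags("Company > HR > Driver")` includes "PII" even though the tag was attached only to "Company > HR", while a sibling such as "Company > Finance" has no inherited tags.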