Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Most enterprises with large data lakes are flying blind: they have little insight into how the data in their lakes is organized, accessed, and used to create real business value. Coupled with the need to democratize data, this often leaves enterprises with a data swamp, loaded with all kinds of data assets without any curation or appropriate security controls, in the hope that developers and analysts can collaborate responsibly to generate insights. In this talk we provide a broad overview of how organizations can use open source frameworks such as Apache Ranger and Apache Knox to secure their data lakes, and Apache Atlas to provide open metadata and governance services for the Hadoop ecosystem. We also cover the features recently added to each of these Apache projects and how enterprises can use them to build a robust security and governance model for their data lakes.

Speaker
Owen O'Malley, Co-Founder & Technical Fellow, Hortonworks

Published in: Technology

Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

  1. Treat Your Enterprise Data Lake Indigestion: Enterprise Ready Security And Governance For Hadoop Ecosystem
     Owen O’Malley – Co-founder & Technical Fellow
     Srikanth Venkat – Senior Director, Product Management
     © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. Presenters
     • Owen O’Malley, Co-Founder & Technical Fellow, Hortonworks
     • Srikanth Venkat, Senior Director of Product Management, Security & Governance (Apache Ranger, Apache Atlas, Apache Knox)
  3. Agenda
     • HDP Security
       – Authentication (Kerberos, Apache Knox)
       – Authorization & Audits (Apache Ranger)
       – Data Protection
     • HDP Governance: Apache Atlas Overview
  4. HDP Security: Comprehensive, Complete, Extensible
     • Administration (central management and consistent security): single administrative console to set policy across the entire cluster. Apache Ranger
     • Authentication (authenticate users and systems): authentication for perimeter and cluster; integrates with existing Active Directory and LDAP solutions. Kerberos | Apache Knox
     • Authorization (provision access to data): consistent authorization controls across all Apache components within HDP. Apache Ranger
     • Audit (maintain a record of data access): record of data access events across all components that is consistent and accessible. Apache Ranger
     • Data Protection (protect data at rest and in motion): secure data in motion and data at rest. HDFS TDE w/ Ranger KMS + HSM, Ranger Data Masking + Row Filtering, Wire encryption + Partner Solutions
  5. Authentication & API Security: Apache Knox
  6. Apache Knox Community Snapshot (@apache_knox)
     Timeline:
     • Mar 2013: Entered Incubator
     • Oct 2013: 0.1.0 - 0.3.0 Incubator Releases
     • Feb 2014: Graduates to Apache TLP
     • Apr 2014: 0.4.0 TLP Release
     • Nov 2014: 0.5.0
     • May 2015: 0.6.0
     • Dec 2015: 0.7.0
     • Feb 2016: 0.8.0
     • Apr/Aug 2016: 0.9.0/0.9.1
     • Nov 2016: 0.10.0
     • Dec 2016: 0.11.0
     • Mar 2017: 0.12.0
     • TBD: 1.0.0 Target Release Date
     Community:
     • Committers: 17
     • Contributors from: Hortonworks, IBM, CGI, Uber, Oracle, Blue Talon
     Apache 0.12.0/HDP 2.6:
     • Client SDK/DSL Improvements
     • Apache Zeppelin Proxying
     • YARN RM UI HA Support
     • Knox Token Service
     • Solr API and UI
     Apache 0.11.0:
     • LDAP Improvements
     • Hadoop Group Lookup Support
     • Phoenix Server Support (Avatica)
     • Management UI
     • Metrics
  7. Apache Knox
     Proxying Services
     ★ Provide access to Hadoop via proxying of HTTP resources
     ★ Ecosystem APIs and UIs + Hadoop oriented dispatching for Kerberos + doAs (impersonation) etc.
     Authentication Services
     ★ REST API access, WebSSO flow for UIs
     ★ LDAP/AD, Header based PreAuth
     ★ Kerberos, SAML, OAuth
     Client DSL/SDK Services
     ★ Scripting through DSL
     ★ Using Knox Shell classes directly as SDK
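As a rough illustration of the proxying described on the Apache Knox slide, the sketch below builds the URL a client might use to reach WebHDFS through a Knox gateway, together with an HTTP Basic header that Knox could check against LDAP/AD. The host name, the topology name `default`, the user, and the password are hypothetical. The `gateway/{topology}/webhdfs/v1` layout and port 8443 are assumed Knox defaults; a given deployment may be configured differently.

```python
import base64
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op, port=8443):
    """Build a WebHDFS URL routed through a Knox gateway.

    Assumes the default gateway path ("gateway") and the given port.
    """
    path = quote(hdfs_path.lstrip("/"))
    query = urlencode({"op": op})
    return f"https://{gateway_host}:{port}/gateway/{topology}/webhdfs/v1/{path}?{query}"


def basic_auth_header(user, password):
    """Return an HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Hypothetical gateway host and credentials, for illustration only.
    print(knox_webhdfs_url("knox.example.com", "default", "/tmp", "LISTSTATUS"))
    print(basic_auth_header("guest", "guest-password"))
```

The client only ever talks to the gateway over HTTPS; Kerberos and doAs impersonation toward the cluster are handled on the Knox side.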
  8. Authentication: Kerberos
  9. Background: Kerberos
     ⬢ Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop
     ⬢ Users need to be able to reliably “identify” themselves and have identity propagated throughout the Hadoop cluster
     ⬢ Design & implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworks co-founder Owen O’Malley!
     ⬢ Why Kerberos?
       ⬢ Establishes identity for clients, hosts and services
       ⬢ Prevents impersonation; passwords are never sent over the wire
       ⬢ Integrates w/ enterprise identity mgmt tools such as LDAP & Active Directory
       ⬢ More granular auditing of data access/job execution
  10. Automated Kerberos Setup with Ambari
     • Wizard driven and automated Kerberos support (Kerberos principal creation for service accounts, keytab generation and distribution for appropriate hosts, permissions, etc.)
     • Removes cumbersome, time consuming and error prone administration of Kerberos
     • Works with existing Kerberos infrastructure, including Active Directory, to automate common tasks, removing the burden from the operator:
       – Add/Delete Host
       – Add Service
       – Add/Delete Component
       – Regenerate Keytabs
       – Disable Kerberos
  11. Kerberos + Active Directory (diagram: a client authenticating to the Hadoop cluster, with a cross realm trust between AD / LDAP and the KDC)
     • AD / LDAP (user store): users, e.g. smith@EXAMPLE.COM. Use existing directory tools to manage users
     • KDC: hosts, e.g. host1@HADOOP.EXAMPLE.COM, and services, e.g. hdfs/host1@HADOOP.EXAMPLE.COM. Use Kerberos tools to manage host + service principals
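The principal names on the Kerberos + Active Directory slide follow the usual `primary[/instance]@REALM` shape. The sketch below, which is only an illustration and not part of any Hadoop or Kerberos library, splits a principal into those parts and checks whether it belongs to a different realm than the cluster, which is the case a cross realm trust has to cover. The names `Principal`, `parse_principal`, and `needs_cross_realm_trust` are hypothetical.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "smith" or "hdfs"
    instance: Optional[str]  # host part of a service principal, if any
    realm: str               # Kerberos realm, e.g. "EXAMPLE.COM"


def parse_principal(text):
    """Split 'primary[/instance]@REALM' into its components."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a fully qualified principal: {text!r}")
    primary, slash, instance = name.partition("/")
    return Principal(primary, instance if slash else None, realm)


def needs_cross_realm_trust(principal, cluster_realm):
    """True when the principal lives in a realm other than the cluster's."""
    return principal.realm != cluster_realm


if __name__ == "__main__":
    for raw in ("smith@EXAMPLE.COM", "hdfs/host1@HADOOP.EXAMPLE.COM"):
        p = parse_principal(raw)
        print(p, needs_cross_realm_trust(p, "HADOOP.EXAMPLE.COM"))
```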
  12. Authorization & Audits: Apache Ranger
  13. Apache Ranger Community Snapshot
     Timeline:
     • May 2014: XASecure Acquisition
     • July 2014: Enters Apache Incubation
     • Nov 2014: Ranger 0.4.0 Release
     • July 2015: Ranger 0.5/HDP2.3
     • Aug 2016: Ranger 0.6/HDP2.5
     • Nov 2016: Ranger 0.6.2/HDP2.5.3
     • Jan 2017: Ranger TLP graduation!
     • Apr 2017: Ranger 0.7/HDP2.6
     • Jun 2017: Ranger 0.7.1/HDP2.6.1
     • TBD: 1.0.0 Target Release Date
     Community:
     • Committers: 22
     • Contributors from: Ebay, MSFT, Huawei, Pandora, Accenture, ING, Talend
     Ranger 0.7/HDP 2.6:
     • Export/import of Policies
     • $User and macros
     • Plugin status tab
     • “Show columns” and “describe extended support”
     • Incremental LDAP Sync
     • SmartSense Metrics
     Ranger 0.6/HDP2.5:
     • Classification (tag) based security (ABAC)
     • Dynamic Column Masking & Row Filtering
     • KMS HSM Integration (Safenet)
     • Dynamic Policies & Deny Conditions
     • LDAP Improvements & Audit Scalability
  14. Apache Ranger
     Authorization
     • Centralized platform to define, administer and manage security policies consistently across Hadoop components
       – HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
     • Extensible Architecture
       – Custom policy conditions, user context enrichers
       – Easy to add new component types for authorization
     Auditing
     • Central audit location for all access requests
     • Support multiple destination sources (HDFS, Solr, etc.)
     • Real-time visual query interface
     Ranger KMS
     • Store and manage encryption keys
     • Support HDFS Transparent Data Encryption
     • Integration with HSM
       – Safenet LUNA
  15. Ranger – Attribute Based Access Control (ABAC) Model
     ⬢ ABAC Model
       ⬢ Combination of the subject, action, resource, and environment
       ⬢ Uses descriptive attributes: AD group, Apache Atlas-based tags or classifications, geo-location, etc.
     ⬢ Ranger approach is consistent with NIST 800-162
     ⬢ Avoid role proliferation and manageability issues
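To make the ABAC idea on the slide above concrete, here is a minimal sketch of a decision that combines subject (group), action, resource (tag), and environment (location) attributes. It is not Ranger's policy engine or API; `Request`, `Policy`, and `is_allowed` are hypothetical names, and the default-deny behaviour is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_groups: frozenset    # subject attributes, e.g. AD groups
    action: str               # e.g. "select"
    resource_tags: frozenset  # classifications attached to the resource
    location: str             # environment attribute, e.g. geo-location


@dataclass(frozen=True)
class Policy:
    tag: str
    allowed_groups: frozenset
    allowed_actions: frozenset
    allowed_locations: frozenset


def is_allowed(request, policies):
    """Allow only if some policy matches on all four attribute kinds."""
    for p in policies:
        if (
            p.tag in request.resource_tags
            and request.user_groups & p.allowed_groups
            and request.action in p.allowed_actions
            and request.location in p.allowed_locations
        ):
            return True
    return False  # nothing matched: deny by default (assumption)


if __name__ == "__main__":
    pii_policy = Policy("PII", frozenset({"hr"}), frozenset({"select"}), frozenset({"EU"}))
    req = Request(frozenset({"hr"}), "select", frozenset({"PII"}), "EU")
    print(is_allowed(req, [pii_policy]))
```

Because the policy is written against the `PII` tag rather than a specific table, it applies to any resource that carries the tag, which is how attribute-based rules avoid a separate role per resource.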
  16. Ranger Architecture (diagram). Labels shown: Enterprise Users; Legacy Tools and Data Governance; Integration API; Ranger Administration Portal; Ranger Policy Server; Ranger Audit Server; HDFS; Solr; and, under Hadoop Components, a Ranger Plugin in each of HDFS, HBase, Hive Server2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas
  17. HDP – Security & Governance. Industry First: Dynamic Tag-based Security Policies (diagram)
     • Atlas: track metadata and lineage. Atlas Metastore: tags, assets, entities
     • Entities in Data Lake: streams, pipelines, feeds, Hive tables, HDFS files, HBase tables
     • Atlas Client: subscribers to topic get metadata updates
     • Ranger: manage access policies and audit logs. Policies (classification, prohibition, time, location), PDP, resource cache
  18. Dynamic Row Filtering & Column Masking: Apache Ranger with Apache Hive
     Source table ww_customers:
       Country | National ID | CC No            | DOB       | MRN        | Name          | Policy ID
       US      | 232323233   | 4539067047629850 | 9/12/1969 | 8233054331 | John Doe      | nj23j424
       US      | 333287465   | 5391304868205600 | 8/13/1979 | 3736885376 | Jane Doe      | cadsd984
       Germany | T22000129   | 4532786256545550 | 3/5/1963  | 876452830A | Ernie Schwarz | KK-2345909
     Ranger Policy Enforcement: query rewritten based on dynamic Ranger policies; filter rows by region & apply relevant column masking
     User 1: Joe (Location: US, Group: Analyst)
     Original Query: SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
       Country | National ID | CC No               | MRN  | Name
       US      | xxxxx3233   | 4539 xxxx xxxx xxxx | null | John Doe
       US      | xxxxx7465   | 5391 xxxx xxxx xxxx | null | Jane Doe
     Users from US Analyst group see data for US persons with CC and National ID (SSN) as masked values and MRN is nullified
     User 2: Ivanna (Location: EU, Group: HR)
     Original Query: SELECT country, nationalid, name, mrn FROM ww_customers
       Country | National ID | Name          | MRN
       Germany | T22000129   | Ernie Schwarz | 876452830A
     EU HR Policy Admins can see unmasked but are restricted by row filtering policies to see data for EU persons only
     Groups shown: Analysts, HR, Marketing
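The behaviour on the row filtering and column masking slide can be imitated in a few lines. The sketch below is not how Ranger or Hive implement it (Ranger rewrites the Hive query); it only applies a per-group row filter and per-column masks to the slide's sample rows in memory. The policy table, the function names, and the choice of `Germany` as the only EU country are assumptions for the example.

```python
def mask_show_last4(value):
    """Replace all but the last four characters with 'x'."""
    return "x" * (len(value) - 4) + value[-4:]


def mask_show_first4(value):
    """Keep the first four characters, mask the rest in groups of four."""
    return value[:4] + " xxxx" * ((len(value) - 4) // 4)


def nullify(_value):
    """Hide the value entirely."""
    return None


ROWS = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]

EU_COUNTRIES = {"Germany"}  # assumption: only what the sample data needs

# Hypothetical per-group policies: a row filter plus column masks.
POLICIES = {
    "Analyst": {
        "row_filter": lambda row: row["country"] == "US",
        "masks": {"nationalid": mask_show_last4,
                  "ccnumber": mask_show_first4,
                  "mrn": nullify},
    },
    "HR": {
        "row_filter": lambda row: row["country"] in EU_COUNTRIES,
        "masks": {},
    },
}


def run_query(group, columns, rows=ROWS):
    """Return the selected columns as the given group would see them."""
    policy = POLICIES[group]
    visible = []
    for row in rows:
        if not policy["row_filter"](row):
            continue  # row filtering
        visible.append({
            col: policy["masks"].get(col, lambda v: v)(row[col])  # column masking
            for col in columns
        })
    return visible


if __name__ == "__main__":
    print(run_query("Analyst", ["country", "nationalid", "ccnumber", "mrn", "name"]))
    print(run_query("HR", ["country", "nationalid", "name", "mrn"]))
```

Both users issue an ordinary `SELECT`; the difference in what they see comes entirely from the policy attached to their group.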
  19. Agenda: Data Protection
  20. Data Protection: data protection must be applied at three different layers in Apache Hadoop
     • Storage: encrypt data while it is at rest. Transparent Data Encryption in HDFS, Ranger KMS + HSM, Partner Products (HPE Voltage, Protegrity, Dataguise)
     • Transmission: encrypt data as it is in motion. Native Apache Hadoop 2.0 provides wire encryption.
     • Upon Access: apply restrictions when accessed. Ranger (Dynamic Column Masking + Row Filtering), Partner Masking + Encryption
  21. Ranger KMS: Transparent Data Encryption in HDFS (diagram: an HDFS client, the NameNode (NN) and three DataNodes (DN) holding blocks A, B, C, D, with Ranger KMS backed by a SafeNet Luna HSM)
     Benefits
     • Selective encryption of relevant files/folders
     • Prevent rogue admin access to sensitive data
     • Fine grained access controls
     • Transparent to end application w/o changes
     • Ranger KMS integrated to external HSM (Safenet Luna) adding to reliability/security of KMS
  22. Agenda: Apache Atlas: Vision & Features Overview
  23. Background: DGI Community becomes Apache Atlas (DGI: Data Governance Initiative)
     Timeline:
     • Dec 2014: DGI group Kickoff
     • May 2015: Apache Atlas Incubation
     • Aug 2016: Apache 0.7 Foundation Release
     • Apr 2017: Apache 0.8 Release
     • Jun 2017: Atlas Becomes TLP!
     Community:
     • Committers – 35
     • Code contributors from - Hortonworks, IBM, Aetna, Merck, Target
     • Label in figure: Global Financial Company
     Apache Atlas 0.8/HDP2.6:
     • Simplified Search UI
     • Simplified APIs
     • Classification-based security for HDFS, Kafka, HBase
     • Knox SSO
     • Performance/scalability improvements
     Apache Atlas 0.7.1/HDP2.5:
     • High availability support
     • LDAP Authentication/Authorization
     • Classification based security for Hive
     • UI Redesign
  24. Apache Atlas Vision: Open Metadata & Governance Services
     Vision: Metadata-driven foundational governance services for enterprise data ecosystem
     • Open frameworks and APIs
     • Agile and secure collaboration around data and advanced analytics
     • Reduce operational costs while extracting economic value of data
     Comprehensive Enterprise Data Catalog
     • Lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
     • Integrate both on-premise and cloud platforms to provide enterprise wide view
     Open Enterprise Data Connectors
     • Interoperable connector framework to connect to your data catalog out of the box with many vendor technologies
     • No expensive population of proprietary siloed metadata repositories
     Dynamic Metadata Discovery
     • Metadata is added automatically to the catalog as new data is created or data is updated
     • Extensible discovery processes that characterize and classify the data
     Enabling Collaboration & Workflows
     • Subject matter experts locate the data they need quickly and efficiently, share their knowledge about the data and its usage to help others
     • Interested parties and processes are notified automatically
     Automated Governance Processes
     • Metadata-driven access control
     • Auditing, metering, and monitoring
     • Quality control and exception management
     • Rights (entitlement) management
     Predefined standards for glossaries, data schemas, rules and regulations
     Diagram labels: Structured, Traditional RDBMS, Metadata, MPP Appliances, Kafka, Storm, Sqoop, Hive, Atlas Metadata, Falcon, Ranger, Custom, Partners
  25. High Level Architecture: 4 Key points
     1 Data Lineage: Only product that captures lineage across Hadoop components at platform level.
     2 Agile Data Modeling: Type system allows custom metadata structures in a hierarchy taxonomy
     3 REST API: Modern, flexible access to Atlas services, HDP components, UI & external tools
     4 Exchange: Leverage existing metadata / models by importing it from current tools. Export metadata to downstream systems
     Diagram labels: Type System, Repository, Search DSL, Bridge (Hive, Storm, Falcon, Custom), REST API, Graph DB, Search, Kafka, Sqoop, Connectors, Messaging Framework
  26. Apache Atlas: Lineage
     Lineage
     • Where does this data originate from (source/provenance)?
     • Upstream path: Path through all data assets and processes leading up to current data asset
     Impact
     • How is this data being used?
     • What other data assets (derivative/dependent) does this impact?
     • Downstream path: Path through all data assets and processes leading out of current data asset
     Used for
     • Forensics
     • Impact analysis
     • Auditing and Compliance
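The upstream and downstream paths on the lineage slide amount to walking a directed graph of data assets and processes in opposite directions. The sketch below shows that idea with a breadth-first traversal over a small, made-up edge list; it does not use the Atlas API, and the asset names and the functions `lineage` and `impact` are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: (input, output) pairs linking assets and processes.
EDGES = [
    ("raw_events", "etl_job"),
    ("etl_job", "clean_events"),
    ("clean_events", "report_job"),
    ("report_job", "daily_report"),
]


def _walk(start, neighbours):
    """Breadth-first traversal returning reachable nodes in visit order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def lineage(asset, edges):
    """Upstream path: everything leading up to the asset."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    return _walk(asset, parents)


def impact(asset, edges):
    """Downstream path: everything derived from the asset."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    return _walk(asset, children)


if __name__ == "__main__":
    print("lineage:", lineage("clean_events", EDGES))
    print("impact:", impact("clean_events", EDGES))
```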
  27. Apache Atlas: Lineage and Impact
  28. Apache Atlas: Classification
     • Categorize and curate data assets for easier discovery
     • Associate context with data assets – Governance, Security, Business, …
  29. Apache Atlas Classification: use case – cross component. Classification based security on cross-component data assets
  30. Metadata Catalog Search: Basic Search
     Example: search for a hive_table classified as ‘PII’ and name starting with ‘prov’
     • Filter by Data Asset type
     • Filter by Classification
     • Search text
       – Wildcards: prov*, *sum*
       – Logical expressions: prov* AND *sum*
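The three basic search filters (asset type, classification, and wildcard text) can be pictured as successive predicates over catalog entries. The sketch below applies them to a small in-memory list using Python's `fnmatch` wildcards. It illustrates the semantics only; Atlas evaluates searches against its own index, and the asset names and the `basic_search` function here are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical catalog entries.
ASSETS = [
    {"type": "hive_table", "name": "provider_summary", "classifications": {"PII"}},
    {"type": "hive_table", "name": "provider_claims", "classifications": {"PII"}},
    {"type": "hive_table", "name": "sales_summary", "classifications": set()},
    {"type": "kafka_topic", "name": "provider_topic", "classifications": {"PII"}},
]


def basic_search(assets, type_name=None, classification=None, patterns=()):
    """Filter by type and classification; all wildcard patterns must match (AND)."""
    hits = []
    for asset in assets:
        if type_name and asset["type"] != type_name:
            continue
        if classification and classification not in asset["classifications"]:
            continue
        if all(fnmatchcase(asset["name"], pattern) for pattern in patterns):
            hits.append(asset["name"])
    return hits


if __name__ == "__main__":
    # hive_table classified as PII with a name starting with "prov"
    print(basic_search(ASSETS, "hive_table", "PII", ["prov*"]))
    # prov* AND *sum*
    print(basic_search(ASSETS, "hive_table", "PII", ["prov*", "*sum*"]))
```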
  31. Metadata Catalog Search: Advanced
     • Filter by Data asset type. Example: search for a hive_table named ‘employees’ and owner ‘hive’
     • DSL search with SQL like syntax. Example: select columns from impressions table in raw database
     • DSL query string: hive_column where table.name='impressions' and table.db.name='raw'
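A DSL query string like the one on the advanced search slide would normally be sent to Atlas over its REST API. The sketch below only builds the request URL and does not contact a server. The host `atlas.example.com`, port 21000, and the `/api/atlas/v2/search/dsl` path with a `query` parameter are assumptions about a typical Atlas installation and should be checked against the version in use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# DSL query from the slide, written with plain ASCII quotes.
DSL_QUERY = "hive_column where table.name='impressions' and table.db.name='raw'"


def atlas_dsl_search_url(base_url, dsl_query, limit=25):
    """Build a URL for a DSL search (endpoint path is an assumption)."""
    params = urlencode({"query": dsl_query, "limit": limit})
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/dsl?{params}"


if __name__ == "__main__":
    url = atlas_dsl_search_url("http://atlas.example.com:21000", DSL_QUERY)
    print(url)
    # Decoding the query string recovers the original DSL text.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

URL-encoding matters here because the DSL contains spaces, quotes, and `=` characters that would otherwise be misread as part of the query string syntax.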
  32. Key Takeaways
     • Secure APIs and UIs in Hadoop ecosystem using Apache Knox gateway
     • Enforce appropriate security controls to monitor data access across your businesses with Apache Ranger
       – Implement fine-grained policy based controls to grant and monitor data access
       – Track user activity on data using user access audit logging features to help with forensic auditing for breach notification purposes
       – Protect sensitive data through anonymization and pseudonymization using dynamic masking and row filtering
     • Establish an Enterprise Data Catalog with Apache Atlas
       – Identify and classify data
       – Harvest and maintain metadata
     • Track and map the movement of data through your enterprise with Apache Atlas
       – Maintain a “Near Real Time” view to track data movement
       – Understand data proliferation (especially sensitive data) with data lineage and impact analysis
  33. More Information…
     Coming up next: BoF session – Security, Governance & Cybersecurity
     • When: 6:00pm, Thursday September 21st 2017
     • Where: C4.7
     Also check out other sessions on Apache Atlas & Apache Ranger from recent DataWorks Summits
     • https://dataworkssummit.com/san-jose-2017/
     • https://dataworkssummit.com/munich-2017/
     Hortonworks Product Pages
     • https://hortonworks.com/apache/ranger/
     • https://hortonworks.com/apache/atlas
     Hortonworks Community Connection
     • https://community.hortonworks.com/spaces/64/governance-lifecycle-track.html
     • https://community.hortonworks.com/spaces/62/security-track_2.html
     Apache Software Foundation
     • http://ranger.apache.org/
     • http://atlas.apache.org/
