Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies

4,868 views

Published on

Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies

Published in: Technology
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies

  1. 1. Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
  2. 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Disclaimer This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache, however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all effect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  3. 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Speakers Andrew Ahn Governance Director Product Management
  4. 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda • Atlas Overview • Near term roadmap • Taxonomy Benefits • Questions
  5. 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Atlas Overview
  6. 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved DGI* Community becomes Apache Atlas May 2015 Proto-type Built Apache Atlas Incubation DGI group Kickoff Feb 2015 Dec 2014 July 2015 HDP 2.3 Foundation GA Release First kickoff to GA in 7 months Global Financial Company * DGI: Data Governance Initiative Faster & Safer Co-Development driven by customer use cases
  7. 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved STRUCTURED UNSTRUCTURED Vision - Enterprise Data Governance Across Platfroms TRADITIONAL RDBMS METADATA MPP APPLIANCES Project 1 Project 5 Project 4 Project 3 Metadata Project 6 DATA LAKE Atlas: Metadata Truth in Hadoop Data Management along the entire data lifecycle with integrated provenance and lineage capability Modeling with Metadata enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities Interoperable Solutions across the Hadoop ecosystem, through a common metadata store
  8. 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Atlas: Metadata Services • Cross- component dataset lineage. Centralized location for all metadata inside HDP • Single Interface point for Metadata Exchange with platforms outside of HDP • Business Taxonomy based classification. Conceptual, Logical And Technical Apache Atlas Hive Ranger Falcon Sqoop Storm Kafka Spark NiFi
  9. 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Big Data Management Through Metadata Management Scalability Many traditional tools and patterns do not scale when applied to multi-tenant data lakes. Many enterprise have silo’d data and metadata stores that collide in the data lake. This is compounded by the ability to have very large windows (years). Can traditional EDW tools manage 100 million entities effectively with room to grow ? Metadata Tools Scalable, decoupled, de-centralized manage driven through metadata is the only via solution. This allows quick integration with automation and other metamodels Tags for Management, Discovery and Security Proper metadata is the foundation for business taxonomy, stewardship, attribute based security and self-service.
  10. 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Atlas High Level Architecture Type System Repository Search DSL Bridge Hive Storm Falcon Others REST API Graph DB Search Kafka Sqoop Connectors MessagingFramework
  11. 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Taxonomies Benefits: • Discovery – Business catalog of conceptual, logical and physical assets • Security --Dynamic metadata based Access control
  12. 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Near Term Roadmap: Summer 2016
  13. 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Sqoop Teradata Connector Apache Kafka Expanded Native Connector: Dataset Lineage Custom Activity Reporter Metadata Repository RDBMS
  14. 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Business Catalog Breadcrumbs for taxonomy context path Contents at taxonomy context
  15. 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Technical and Logical Metadata Exchange Knowledge Store Atlas REST API Structured Unstructured Files: XML / JSON 3rd Party Vendors Custom Reporter Non-Hadoop Taxonomy Data Lineage Technical Metadata
  16. 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Governance Ready Certification Program Discovery Tagging Prep / Cleanse ETL Governance BPM Self Service Visualization Curated: Selected group of vendor partners to provide rich, complimentary and complete features Choice: Customers choose features that they want to deploy—a la carte versus vendor lock Agile: Low switching costs, Faster deployement and innovation Standard: Common SLA & common open metadata store Flexibility: Interoperability of products through Atlas metadata HDP at core to provide stability and interoperability
  17. 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Business Taxonomy Inheritance Human Resources Drivers (Dimension) Timesheets (Facts) PII PIIPII Parent ChildChild Logical Business Taxonomy Data Assets
  18. 18. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dynamic Access Policy Apache Ranger + Atlas Integration
  19. 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dynamic Access Policy Driven by metadata
  20. 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Tag-based Access Policy Requirements • Basic Tag policy – PII example. Access and entitlements must be tag based ABAC and scalable in implementation. • Geo-based policy – Policy based on IP address, proxy IP substitution maybe required. The rule enforcement but be geo aware. • Time-based policy – Timer for data access, de-coupled from deletion of data. • Prohibitions – Prevention of combination of Hive tables that may pose a risk together.
  21. 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved How does Atlas work with Ranger at scale? Atlas provides: Metadata • Business Classification (taxonomy): Company > HR > Driver • Hierarchy with Inheritance of attribute to child objects: Sensitive “PII” tag of department HR will be inherited by group HR> Driver • Atlas will notify Ranger via Kafka Topic for changes Apache Atlas Hive Ranger Falcon Kafka Storm Atlas provides the metadata tag to create policies Ranger provides: Access & Entitlements • Ranger will cache tags and asset mapping for performance • Ranger will have a policy based on tags instead of roles. • Example: PII = <group> This can work for a may assets.
  22. 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Use cases drives design – high reliability Metastore • Tags • Assets • Entities Notification Framework Kafka Topics Atlas Atlas Client • Subscribes to Topic • Gets Metadata Updates PDP Resource Cache Ranger Notification Metadata updates Message durability Optimized for Speed Event driven updates
  23. 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved • Security • Discovery & Lineage Preview Demo
  24. 24. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Availability: - Tech Preview VMs: May 2016 - GA Release: Summer 2016
  25. 25. 25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Questions ?
  26. 26. 26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Reference
  27. 27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Online Resources VM: https://s3.amazonaws.com/demo-drops.hortonworks.com/HDP- Atlas-Ranger-TP.ova —> Download Public Preview VM Tutorial: https://github.com/hortonworks/tutorials/tree/atlas-ranger- tp/tutorials/hortonworks/atlas-ranger-preview Blog: http://hwxjojo.wpengine.com/blog/the-next-generation-of- hadoop-based-security-data-governance/ (this is giving an error, right now) Learn More: http://hortonworks.com/solutions/atlas-ranger- integration/
  28. 28. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Tag Based Security Video: https://drive.google.com/file/d/0B0wjjMSH77srLXFZN3lmWHVJWVU/view?usp=sharing https://drive.google.com/file/d/0B0wjjMSH77srLXFZN3lmWHVJWVU/view ?usp=sharing
  29. 29. 29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Thank You
  30. 30. 30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDF: Dataflow Governance Solution
  31. 31. 31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Dataflow Security Use case Requirements Accelerated Data Collection: An integrated, data source agnostic collection platform Increased Security and Unprecedented Chain of Custody: Secure from source to storage with high fidelity data provenance The Internet of Any Thing (IoAT): A Proven Platform for the Internet of Things http://hortonworks.com/hdf/
  32. 32. 32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Enterprise Grade Governance Dataflow Solution Filtered Metadata • HDP Taxonomy • Centrallized Metadata Repository • Downstream HDP Impacts • Cross component lineage • 3rd Party integration • Guaranteed Delivery • Data Buffering • Prioritized Queueing • Flow specific QoS • Visual Command & Control Months Lineage Years Lineage Reference Taxonomy (Tags) Event level versus Dataset level HDF - NiFI Operation Control Maximum Fidelity Event Level HDP – Atlas Governance Management Medium / Low Fidelity Dataset Level
  33. 33. 33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Expanded visibility throughout the eco-system HDF ETL Hive Hive Hook (Native) Security Appliance Data Metadata NiFi NiFi NiFi NiFi Kafka Hive Hook (Native) Hive Hive Hook (Native) HDP Atlas Metadata Repository Centralized Repository for multiple NiFi Deployments End to end data lineage Security Appliance Security Appliance Security Appliance Security Appliance Security Appliance

×