What the #$* is a Business Catalog and why you need it

Published in: Technology

  1. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. What the #$* is a Business Catalog and Why You Need It! June 28, 2016. Apache Atlas
  2. Disclaimer: This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  3. The Problem • Low confidence in data – Fragmentation of metadata across the enterprise • Duplicate or MIA – Incorrect or missing classification • Rigid governance – Traditional MDM tools are not agile and cannot keep up with the rate of data change
  4. Atlas Solution • Cross-component lineage: Dynamically capture dataset lineage • Single source: Combine and centralize information about your data • Dynamic access control: Integration with Ranger • Taxonomy (Business Catalog!): Common business language. Hierarchically organized – No dupes!
  5. What is the Atlas Business Catalog? Organize data assets along business terms • Authoritative: Hierarchical taxonomy creation • Agile modeling: Model conceptual, logical and physical assets • Definition and assignment of tags like PII (Personally Identifiable Information). Comprehensive features for compliance • Multiple user profiles, including Data Steward and Business Analyst • Object auditing to track “Who did it?” • Metadata versioning to track “What did they do?” Key Benefits: Organize data assets along business terms; impact analysis, compliance, acceptable use; faster insight
  6. Taxonomies (catalog) enable: • Search / Discovery – Business catalog of conceptual, logical and physical assets • Security – Dynamic, metadata-based access control
  7. Research: Focused on Hadoop. We conduct open-ended user interviews so that we can learn more about who our users are and what their needs are. This helps us validate whether or not we’re solving the right problem.
  8. Usability Testing. We test our prototype in InVision, a click-through prototyping tool that allows users to interact with static mockups.
  9. Principal Roles & Activities • Data Steward – Curator, responsible for catalog veracity • Data Scientist – Analyst, primary consumer of Business Catalog • Administrator – Role management only • Data Engineer – Data ingress and egress, semantic data quality • 50% to 80%+ of time spent looking for data • Profit center • Primary user of Atlas • Enables scientist. Goal: < 25% of time spent finding data, empowering scientists to spend their time uncovering insights faster
  10. Key Concepts. Business Taxonomy (Catalog): The practice and science of classification of things or concepts, including the principles that underlie such classification. The business organization model is hierarchical, making it authoritative with no duplication. Data Lineage (Provenance): Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes. It helps provide visibility into the analytics pipeline and simplifies tracing errors back to their sources. Tags (Traits vs. Labels vs. Business Taxonomy): Atlas has tags that are authoritative and prevent duplication. A tag can span different parts of the business taxonomy. For example, a PII tag can be used in HR as well as in Finance or Sales. Benefits: A view of data assets organized by business language; impact analysis, compliance, acceptable use; common tags across Hadoop components
  11. Walk Through • User Setup Atlas via Ranger • Create & Browse Taxonomy of Business Terms • Create & Browse Tags • Search for Assets • Classify Assets with Business Terms • Associate Assets with Tags. Summer GA
  12. Atlas Value • Designed for Hadoop at the platform level, not the application level • High-confidence data in Hadoop for regulated verticals • Compliance and business objectives aligned to data organization • Faster discovery for analysts – reduces time to value • Agile and adaptable – native connectors keep information current • Dynamic protection with Ranger through simple, audited policies
  13. In Flight: Feature patches being reviewed & committed • Object Versioning UX – Current state of object, active or deleted • Comment Tab – User can add comments for collaboration • DQ / Profile Notes Tab – Populated by 3rd parties or by Steward via UI
  14. Additional Atlas Sessions • Top 3 Big Data Governance Issues: Tuesday 4:10PM @ Room 212 • Extend Governance in Hadoop with the Atlas Ecosystem: integrations with partners Waterline, Trifacta and Attivo: Thursday 4:10PM @ Room 210A
  15. Learn More: • Hortonworks links: http://hortonworks.com/solutions/security-and-governance/ • Tutorials: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview
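
The key concepts from slide 10 (a hierarchical taxonomy with no duplicate terms, tags that span the taxonomy, and lineage traced back to its origins) can be illustrated with a small in-memory sketch. This is not the Apache Atlas API: the `Term` and `Catalog` classes, their methods, and the asset names are all hypothetical, chosen only to make the ideas concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A node in a hierarchical business taxonomy (hypothetical model)."""
    name: str
    children: dict = field(default_factory=dict)
    assets: set = field(default_factory=set)

    def add_child(self, name):
        # A term name may appear only once under a given parent,
        # which keeps the hierarchy free of duplicates.
        if name in self.children:
            raise ValueError(f"duplicate term: {name}")
        self.children[name] = Term(name)
        return self.children[name]


class Catalog:
    """Toy catalog: taxonomy root, cross-cutting tags, and lineage edges."""

    def __init__(self):
        self.root = Term("Catalog")
        self.tags = {}      # tag name -> set of asset names
        self.upstream = {}  # asset -> set of assets it was derived from

    def tag(self, asset, tag_name):
        # Tags are independent of the hierarchy, so one tag can be
        # attached to assets that live under different terms.
        self.tags.setdefault(tag_name, set()).add(asset)

    def record_lineage(self, source, target):
        self.upstream.setdefault(target, set()).add(source)

    def origins(self, asset):
        """Walk lineage back to assets with no recorded upstream."""
        seen, stack, roots = set(), [asset], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            parents = self.upstream.get(current, set())
            if not parents:
                roots.add(current)
            stack.extend(parents)
        return roots


# Example data (all names are made up for illustration).
catalog = Catalog()
hr = catalog.root.add_child("HR")
finance = catalog.root.add_child("Finance")
hr.assets.add("employees")
finance.assets.add("payroll")

# The same PII tag spans two branches of the taxonomy.
catalog.tag("employees", "PII")
catalog.tag("payroll", "PII")

# employees -> payroll -> payroll_report
catalog.record_lineage("employees", "payroll")
catalog.record_lineage("payroll", "payroll_report")
print(catalog.origins("payroll_report"))
```

Keeping tags in a separate mapping rather than on the taxonomy nodes is one way to reflect the slide's point that a tag such as PII applies in HR as well as in Finance or Sales; a real deployment would rely on Atlas itself for storage, search, and policy integration.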
