Apache Atlas. Data Governance for Hadoop. Strata London 2015

Apache Hadoop is being adopted across all industries for its ability to store and process an abundance of new types of data in a modern data architecture. But this “Any Data” architecture presents a challenge when organizations must reconcile data management realities as they bring existing and new data from disparate platforms under management.

Apache Atlas proposes to provide governance capabilities in Hadoop that use both prescriptive and forensic models, enriched by business taxonomical metadata. It is designed to exchange metadata with other tools and processes within and outside of the Hadoop stack, thereby enabling platform-agnostic governance.

Published in: Data & Analytics

  1. Apache Atlas: Data Governance for Hadoop. Sean Roberts, Partner Engineering, London & EMEA, @seano
  2. Data Governance: Availability, Usability, Integrity, Security
  3. Data Governance Technology. Transparency, Reproducibility, Auditability, Consistency. ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM, Archive. Common Governance Framework
  4. Use Cases. Financial: Reporting, Chain of custody, Lineage narratives. Healthcare: 30 day measures reporting. Retail: Point of sale analysis, Price optimization. Telco: Device log management, Correlation, Analysis & Mitigation
  5. Challenges in Hadoop ecosystem: Ecosystem, No holistic approach, Business Demand
  6. Apache Atlas: Data Governance for Hadoop
  7. Apache Atlas: Open & co-development with users! wiki.apache.org/incubator/AtlasProposal
  8. Atlas: Capabilities. Data Classification; Metadata Exchange; Centralized Auditing; Search & Lineage; Policy Engine; Security. Architecture diagram (Apache Atlas): Knowledge Store (Models, Type-System, Policy Rules, Taxonomies); Audit Store; Data Lifecycle Management; Policy Engine; Security; REST API; Services (Search, Lineage, Exchange); Healthcare (HIPAA, HL7); Financial (SOX, Dodd-Frank); Energy (PPDM); Retail (PCI, PII); Other (CWM)
  9. Certification: Metadata exchange; Stability; Interoperability (low cost to switch); Fosters innovation. Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization
  10. Apache Atlas Components
  11. Atlas: Knowledge Store. Metadata exchange. Flexible Taxonomy: Data sets/objects; Tables/Columns; Logical Context; Source/Destination. Tech: Titan with HBase (pluggable)
  12. Type System: Class; Struct; Trait; Primitives; Collections (Map, Array); Instances (Entity): Referenceable
  13. Type System
  14. Atlas: Data Lifecycle Management. Focus on: Provenance; Replication; Data retention/eviction; Late data handling; Automation. Tech: Falcon
  15. Atlas: Audit Store. Historical repository: Security & Operational; Indexed; Searchable (DSL). Tech: YARN ATS, HBase, Hive; Solr, ElasticSearch (pluggable)
  16. Atlas: Policy Engine. Metadata driven; Rationalized at runtime; Geo/Time based rules; Prohibitions
  17. Atlas: Security. Enforces policies; Metadata driven; ABAC (attribute-based access control), not simple RBAC. Tech: Ranger
  18. Atlas: RESTful Interface. API everything
  19. Atlas: Metadata Exchange
  20. Apache Atlas: Now & Future
  21. MVP: ASF Incubated. REST API; UI; Centralized Taxonomy; Import / Export Metadata; Documentation
  22. 2015 mid-year GA: Policy Rules Engine; Real-time Access Control; Column Level Tagging; Audit Store
  23. 2015 2H: Enhanced Audit Store (Immutable File Format, Event Metadata Tagging, Advanced Reporting); Advanced Policy Engine; Row / Column Masking; 3rd Party Metadata Exchange
  24. Apache Atlas: Data Governance for Hadoop. Sean Roberts, @seano
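
The Policy Engine and Security slides describe access decisions that are metadata driven, evaluated at runtime, and attribute based (ABAC) rather than purely role based, with geo and time rules. The following is a minimal Python sketch of that idea only. It does not use the Atlas or Ranger APIs; every class, field, and function name (`Request`, `Asset`, `Rule`, `is_allowed`) is hypothetical, and the tag `PII` is borrowed from the deck's diagram purely as an illustration.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass(frozen=True)
class Request:
    """Attributes of one access attempt (hypothetical model, not an Atlas type)."""
    user_groups: frozenset
    user_country: str
    at: time


@dataclass(frozen=True)
class Asset:
    """A data asset plus the classification tags attached to it as metadata."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Rule:
    """Applies to any asset carrying `tag`; permits only if every condition holds."""
    tag: str
    allowed_groups: frozenset
    allowed_countries: frozenset
    start: time
    end: time

    def permits(self, req: Request) -> bool:
        return (
            bool(self.allowed_groups & req.user_groups)      # who is asking
            and req.user_country in self.allowed_countries   # geo-based condition
            and self.start <= req.at <= self.end             # time-based condition
        )


def is_allowed(req: Request, asset: Asset, rules: list) -> bool:
    """Evaluate at request time: every rule matching one of the asset's tags must permit.

    Simplifying assumption: an asset with no matching rule is allowed.
    """
    matching = [r for r in rules if r.tag in asset.tags]
    return all(r.permits(req) for r in matching)


if __name__ == "__main__":
    rules = [Rule("PII", frozenset({"compliance"}), frozenset({"GB"}), time(8), time(18))]
    emails = Asset("customers.email", frozenset({"PII"}))
    daytime = Request(frozenset({"compliance"}), "GB", time(10, 30))
    print(is_allowed(daytime, emails, rules))
```

Because the rule is keyed on a tag rather than on a table or a role, tagging a new column as `PII` brings it under the same rule without editing any policy. That is the property the "metadata driven" bullet points at.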