Apache Atlas
Data Governance
for Hadoop
Sean Roberts
Partner Engineering
London & EMEA
@seano
Data Governance
Availability
Usability
Integrity
Security
Data Governance Technology
Transparency
Reproducibility
Auditability
Consistency
ETL/DQ
BPM
Business Analytics
Visualization & Dashboards
ERP
CRM
SCM
MDM
ARCHIVE
Common Governance Framework
Use Cases
Financial Reporting
Chain of custody, Lineage narratives
Healthcare
30 day measures reporting
Retail
Point of sale analysis, Price optimization
Telco
Device log management, Correlation, Analysis & Mitigation
Challenges in Hadoop ecosystem
Ecosystem
No holistic approach
Business Demand
Apache Atlas
Data Governance
for Hadoop
Open & co-development with users!
wiki.apache.org/incubator/AtlasProposal
Apache Atlas
Atlas: Capabilities
● Data Classification
● Metadata Exchange
● Centralized Auditing
● Search & Lineage
● Policy Engine
● Security
Apache Atlas
Knowledge Store: Models, Type-System, Policy Rules, Taxonomies
Audit Store
Data Lifecycle Management
Policy Engine
Security
REST API
Services: Search, Lineage, Exchange
Healthcare: HIPAA, HL7
Financial: SOX, Dodd-Frank
Energy: PPDM
Retail: PCI, PII
Other: CWM
Certification
● Metadata exchange
● Stability
● Interoperability
○ Low cost to switch
● Fosters innovation
Discovery
Tagging
Prep / Cleanse
ETL
Governance
BPM
Self Service
Visualization
Apache Atlas
Components
Atlas: Knowledge Store
Metadata exchange
Flexible Taxonomy
● Data sets/objects
● Tables/Columns
● Logical Context
● Source/Destination
Tech: Titan with HBase
● Pluggable
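The Source/Destination bullet above is what makes lineage queries possible. As a rough illustration only, the sketch below keeps source-to-destination edges in memory and walks them upstream. The class `MetadataGraph`, its methods, and the data set names are all hypothetical; this is not the Atlas or Titan API, just a minimal stand-in for a graph-backed store.

```python
from collections import defaultdict, deque


class MetadataGraph:
    """Toy in-memory stand-in for a graph-backed metadata store (hypothetical)."""

    def __init__(self):
        # Maps each data set to the data sets it was directly derived from.
        self._sources = defaultdict(set)

    def add_flow(self, source, destination):
        """Record that `destination` is produced from `source`."""
        self._sources[destination].add(source)

    def upstream(self, dataset):
        """Return every data set that `dataset` depends on, directly or indirectly."""
        seen = set()
        queue = deque([dataset])
        while queue:
            current = queue.popleft()
            for parent in self._sources.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


# Hypothetical retail example: a report built from cleaned point-of-sale data.
graph = MetadataGraph()
graph.add_flow("raw.pos_events", "staging.pos_clean")
graph.add_flow("staging.pos_clean", "reports.daily_sales")
graph.add_flow("ref.price_list", "reports.daily_sales")
lineage = graph.upstream("reports.daily_sales")
```

Here `lineage` holds the three upstream data sets of the report. A real deployment would delegate the traversal to the graph database rather than hold edges in a Python dict.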
Type System
Class
Struct
Trait
Primitives
Collections
● Map
● Array
Instances (Entity)
● Referenceable
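To make the vocabulary above concrete, here is a minimal sketch of how class, struct, and trait definitions might be registered and how an entity instance could be checked against them. Everything in it is an assumption for illustration: the names `TypeDef`, `Entity`, `TypeRegistry`, the primitive names, and the `array<...>` notation are hypothetical and are not taken from the Atlas codebase.

```python
from dataclasses import dataclass, field

# Assumed primitive names; the slide lists "Primitives" without naming them.
PRIMITIVES = {"string": str, "int": int, "boolean": bool}


@dataclass
class TypeDef:
    name: str
    kind: str  # "class", "struct", or "trait"
    attributes: dict = field(default_factory=dict)  # attribute name -> type name


@dataclass
class Entity:
    type_name: str
    values: dict
    traits: list = field(default_factory=list)


class TypeRegistry:
    def __init__(self):
        self._types = {}

    def register(self, type_def):
        self._types[type_def.name] = type_def

    def _matches(self, value, type_name):
        # Primitives map directly to Python types.
        if type_name in PRIMITIVES:
            return isinstance(value, PRIMITIVES[type_name])
        # Collections: only arrays are handled in this sketch.
        if type_name.startswith("array<") and type_name.endswith(">"):
            inner = type_name[len("array<"):-1]
            return isinstance(value, list) and all(
                self._matches(item, inner) for item in value
            )
        return False

    def validate(self, entity):
        """Return a list of problems; an empty list means the entity conforms."""
        type_def = self._types[entity.type_name]
        errors = []
        for attr, type_name in type_def.attributes.items():
            if attr not in entity.values:
                errors.append(f"missing attribute: {attr}")
            elif not self._matches(entity.values[attr], type_name):
                errors.append(f"wrong type for {attr}: expected {type_name}")
        for trait in entity.traits:
            known = self._types.get(trait)
            if known is None or known.kind != "trait":
                errors.append(f"unknown trait: {trait}")
        return errors


registry = TypeRegistry()
registry.register(TypeDef("Table", "class", {"name": "string", "columns": "array<string>"}))
registry.register(TypeDef("PII", "trait"))

good = Entity("Table", {"name": "customers", "columns": ["id", "email"]}, traits=["PII"])
bad = Entity("Table", {"name": "orders", "columns": "id"}, traits=["Secret"])
good_errors = registry.validate(good)
bad_errors = registry.validate(bad)
```

The point of the design is that traits such as `PII` are attached to instances independently of their class, which is what lets tags drive classification and policy later on.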
Atlas: Data Lifecycle Management
Focus on:
● Provenance
● Replication
● Data retention/eviction
● Late data handling
● Automation
Tech: Falcon
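Two of the bullets above, retention/eviction and late data handling, reduce to simple date arithmetic. The sketch below shows one plausible reading of each. The function names, parameters, and sample dates are hypothetical; this is not Falcon's configuration format or API.

```python
from datetime import date, timedelta


def partitions_to_evict(partition_dates, today, retention_days):
    """Return partitions older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)


def accepts_late_data(partition_date, arrival_date, late_cutoff_days):
    """Data arriving within the cut-off after its partition date is still processed."""
    return arrival_date - partition_date <= timedelta(days=late_cutoff_days)


# Hypothetical daily partitions with a 30-day retention window.
partitions = [date(2015, 3, 1), date(2015, 4, 20), date(2015, 5, 5)]
evicted = partitions_to_evict(partitions, today=date(2015, 5, 6), retention_days=30)
on_time = accepts_late_data(date(2015, 5, 1), date(2015, 5, 3), late_cutoff_days=2)
too_late = accepts_late_data(date(2015, 5, 1), date(2015, 5, 4), late_cutoff_days=2)
```

With these inputs only the 1 March partition falls outside the window, and data for 1 May is accepted on 3 May but rejected on 4 May.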
Atlas: Audit Store
Historical repository
● Security & Operational
● Indexed
● Searchable (DSL)
Tech:
● YARN ATS, HBase, Hive
● Solr, ElasticSearch
○ Pluggable
Atlas: Policy Engine
Metadata driven
Rationalized at runtime
Geo/Time based rules
Prohibitions
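One way to picture a metadata-driven policy that is rationalized at runtime: rules are keyed on tags, each rule may be limited to certain regions and hours, and any matching prohibition overrides any matching allowance. The sketch below is an illustration under those assumptions. `Rule`, `rule_applies`, `is_permitted`, and the tag and region values are hypothetical and do not describe the Atlas policy engine's actual rule format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Rule:
    tag: str                           # metadata tag the rule is attached to
    effect: str                        # "allow" or "prohibit"
    regions: frozenset = frozenset()   # empty means any region
    hours: range = range(0, 24)        # hours of the day in which the rule applies


def rule_applies(rule, tags, region, when):
    """A rule applies when its tag, region, and time window all match the request."""
    return (
        rule.tag in tags
        and (not rule.regions or region in rule.regions)
        and when.hour in rule.hours
    )


def is_permitted(rules, tags, region, when):
    """Evaluate at request time: prohibitions win, otherwise an allow is required."""
    matching = [r for r in rules if rule_applies(r, tags, region, when)]
    if any(r.effect == "prohibit" for r in matching):
        return False
    return any(r.effect == "allow" for r in matching)


rules = [
    Rule("PII", "allow", regions=frozenset({"EU"}), hours=range(8, 18)),
    Rule("PCI", "prohibit"),
]
morning = datetime(2015, 5, 6, 10, 0)
night = datetime(2015, 5, 6, 22, 0)

eu_office_hours = is_permitted(rules, {"PII"}, "EU", morning)
wrong_region = is_permitted(rules, {"PII"}, "US", morning)
after_hours = is_permitted(rules, {"PII"}, "EU", night)
prohibited = is_permitted(rules, {"PII", "PCI"}, "EU", morning)
```

Because the decision depends only on the tags attached to the data and the request context, retagging a data set changes its effective policy without editing any rule.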
Atlas: Security
Enforces policies
Metadata driven
ABAC (not simple RBAC)
● Attribute-based access control
Tech: Ranger
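The difference between RBAC and ABAC can be shown in a few lines. In the sketch below, the RBAC check looks only at role membership, while the ABAC check combines attributes of the user, the resource, and the environment. The attribute names (`department`, `clearance`, `sensitivity`, `network`) and the policy itself are hypothetical examples, not Ranger's policy model or API.

```python
def rbac_allows(user_roles, required_role):
    """Role-based check: access depends only on holding the required role."""
    return required_role in user_roles


def abac_allows(subject, resource, environment):
    """Attribute-based check under an assumed example policy.

    Everyone needs sufficient clearance; PII-tagged resources additionally
    require a matching department and an internal network connection.
    """
    if subject["clearance"] < resource["sensitivity"]:
        return False
    if "PII" in resource["tags"]:
        return (
            subject["department"] == resource["owner_department"]
            and environment["network"] == "internal"
        )
    return True


analyst = {"department": "finance", "clearance": 2}
ledger = {"tags": {"PII"}, "owner_department": "finance", "sensitivity": 2}

role_only = rbac_allows({"analyst"}, "analyst")
inside = abac_allows(analyst, ledger, {"network": "internal"})
outside = abac_allows(analyst, ledger, {"network": "vpn"})
```

The same analyst with the same role gets different answers depending on where the request comes from, which a pure role check cannot express.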
Atlas: RESTful Interface
API everything
Atlas: Metadata Exchange
Metadata
Apache Atlas
Now & Future
MVP: ASF Incubated
● REST API
● UI
● Centralized Taxonomy
● Import / Export Metadata
● Documentation
2015 mid-year GA
● Policy Rules Engine
● Real-time Access Control
● Column Level Tagging
● Audit Store
2015 2H
● Enhanced Audit Store
○ Immutable File Format
○ Event Metadata Tagging
○ Advanced Reporting
● Advanced Policy Engine
● Row / Column Masking
● 3rd Party Metadata Exchange
Apache Atlas
Data Governance
for Hadoop
Sean Roberts
@seano

Apache Atlas. Data Governance for Hadoop. Strata London 2015
