Apache Atlas & Open Metadata
Dataworks Sydney 2017
Nigel Jones
Software Architect
IBM
Ferd Scheepers
Chief Information Architect
ING
Open Metadata and Governance will allow…
… metadata to be captured when the data is created, to move with the data, and to be
augmented and processed by any vendor's tools.
What is Open Metadata and Governance?
Open Metadata and Governance consists of:
1. A standardized, extensible set of metadata types
2. Metadata exchange APIs and notifications
3. Frameworks for automated governance
Open Metadata and Governance will allow you to have:
1. An enterprise data catalogue that lists all of your data, where it is located, its origin (lineage),
owner, structure, meaning, classification and quality
2. New data tools (from any vendor) that connect to your data catalogue out of the box
3. Metadata added automatically to the catalogue as new data is created and analysed
4. Subject matter experts collaborating around the data
5. Automated governance processes that protect and manage your data
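As a rough illustration of what one entry in such an enterprise data catalogue might carry, here is a minimal sketch. The class name, field names and sample values are hypothetical, chosen only to mirror the attributes listed on this slide; they are not Apache Atlas types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogueEntry:
    """Hypothetical catalogue record; fields mirror the attributes on the slide."""
    name: str
    location: str
    origin: List[str] = field(default_factory=list)      # lineage: upstream sources
    owner: str = ""
    structure: List[str] = field(default_factory=list)   # e.g. column names
    meaning: str = ""                                     # e.g. a glossary term
    classifications: List[str] = field(default_factory=list)
    quality: str = "unknown"


def find_by_classification(entries: List[CatalogueEntry], tag: str) -> List[str]:
    """Return the names of all entries that carry the given classification."""
    return [e.name for e in entries if tag in e.classifications]


# Illustrative data only.
entries = [
    CatalogueEntry(
        name="employee_records",
        location="hdfs://example/hr/employee",
        origin=["hr_system_export"],
        owner="HR",
        structure=["EMPNAME", "EMPNO", "JOBCODE", "SALARY"],
        meaning="Employee",
        classifications=["Sensitive"],
    ),
    CatalogueEntry(name="office_locations", location="hdfs://example/ref/offices"),
]
sensitive_names = find_by_classification(entries, "Sensitive")
```

Keeping classification as a plain list on the entry is a simplification; it is only meant to show how a catalogue query can be driven by metadata rather than by knowledge of each data store.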
Positioning of Apache Atlas for Open Metadata
[Diagram: Open and Unified Metadata]
Metadata repositories: Apache Atlas, IBM, SAS
Open Metadata Repository Service (OMRS)
Open Metadata Access Service (OMAS)
Metadata highway
OMRS and OMAS: components defined and being developed by the Open Metadata & Governance project
Role of Apache Atlas
• Apache Atlas provides an open community for developing the reference implementation
for open metadata and governance. In essence, Apache Atlas delivers two main
capabilities:
• it acts as a metadata repository (graph database) for a metadata end-user tool
• it delivers the federated/unified metadata layer
across the entire landscape of an enterprise
• The software development governance of the Apache Software Foundation (ASF)
creates confidence that the technology will be maintained and enhanced as appropriate
in an equitable manner.
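To make the idea of a federated/unified metadata layer more concrete, the following is a minimal sketch of one lookup fanned out across several metadata repositories. All class and method names here (`MetadataRepository`, `InMemoryRepository`, `FederatedCatalogue`, `get_asset`) are hypothetical and are not the OMRS or Apache Atlas API; the repositories are simple in-memory stand-ins.

```python
from typing import Dict, Iterable, List, Optional, Protocol


class MetadataRepository(Protocol):
    """Hypothetical interface that each repository adapter would implement."""
    name: str

    def get_asset(self, qualified_name: str) -> Optional[Dict[str, str]]: ...


class InMemoryRepository:
    """Stand-in for a real metadata repository, for illustration only."""

    def __init__(self, name: str, assets: Dict[str, Dict[str, str]]) -> None:
        self.name = name
        self._assets = assets

    def get_asset(self, qualified_name: str) -> Optional[Dict[str, str]]:
        return self._assets.get(qualified_name)


class FederatedCatalogue:
    """Queries every registered repository and merges the answers."""

    def __init__(self, repositories: Iterable[MetadataRepository]) -> None:
        self._repositories = list(repositories)

    def find(self, qualified_name: str) -> List[Dict[str, str]]:
        results = []
        for repo in self._repositories:
            asset = repo.get_asset(qualified_name)
            if asset is not None:
                # Record which repository answered so the caller can trace the source.
                results.append({**asset, "repository": repo.name})
        return results


catalogue = FederatedCatalogue([
    InMemoryRepository("repo-a", {"hr.employee": {"owner": "HR"}}),
    InMemoryRepository("repo-b", {"hr.employee": {"owner": "HR Ops"},
                                  "fin.ledger": {"owner": "Finance"}}),
])
hits = catalogue.find("hr.employee")
```

The caller sees one catalogue while each repository keeps its own store, which is the property the slide describes. A real implementation would also need to reconcile conflicting answers, which this sketch leaves out.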
Doing all of this under the Apache Atlas flag is not enough…
… because Apache is mostly focused on development, and we are missing a governance
body for managing adoption of, and compliance with, the Open Metadata and Governance
standards. We envision the following roles for ODPI:
1. Be an advocate of the Open Metadata and Governance standards, making them visible
and their value understood.
2. Facilitate discussions around the evolution, maintenance and development of the
Open Metadata and Governance standards.
3. Test and sign off compliance of vendor offerings with the Open Metadata and Governance
standards.
Who is in?
1. Hands-on community members:
• ING
• IBM
• Hortonworks
2. Companies we have had conversations with:
• CIBC
• SAS
• Microsoft
• Oracle
• Informatica
• Waterline
• RBC
• DBS
Timeline and next steps
1. Ambition level:
• End of September 2017: Open Metadata working demo.
• Mid-December November 2017: first version of user access.
• Google for Data
2. Next steps:
• End of Q2 2018: production-ready version of Virtual Data Connector.
About Me
https://www.linkedin.com/in/nigelljones
https://www.twitter.com/planetf1
jonesn@uk.ibm.com
Objective
Why
How
Excite &
Engage
Apache Atlas
Open
Metadata
Atlas has graduated!
DOB: 2015-05-05
R: 0.8.1
Atlas Architecture
Storage Repository
Graph
• Type System
• REST API
• Models
• UI & Apps
Hooks & Bridges
https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance
A reminder of our problem… and solution
Open and
Unified Metadata
Extend beyond Hadoop
++
Common Core Data model
• Data Assets
• Governance
• Lineage
• Glossary
• Collaboration
• Models & Reference Data
• Base Types, Systems & Infrastructure
• Metadata Discovery
https://cwiki.apache.org/confluence/display/ATLAS/Building+out+the+Open+Metadata+Typesystem
Open APIs - OMRS
Metadata Highway
Adapter
Plugin
Open Connector Framework
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258803
Open APIs - OMAS
OMRS
Governance Engine OMAS
Glossary OMAS
Asset OMAS
Information View OMAS
++ ...
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258799
OMAS – detail
Project List Metadata Service
Data/Asset
Community Metadata Service
Landscape Definition Metadata Service
Asset Catalog Metadata Service
Classification and Mapping Metadata Service
Information View Metadata Service
Connector Directory Metadata Service
Governance Definitions Metadata Service
Information Process Metadata Service
Glossary and Taxonomy Metadata Service
Asset Metadata Service
Discovery Metadata Service
Governance Action Metadata Service
Roles and Access Metadata Service
Models and Schema Metadata Service
Connector
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258799
New glossary function for semantic processing
[Diagram: business metadata linked to structural metadata for a data store]
Structural metadata for a data store: EMPLOYEE RECORD (EMPNAME, EMPNO, JOBCODE, SALARY)
Business metadata:
Employee
Work Location
Annual Salary
Job Title
Employee Id
Employee Name
Hourly Pay Rate
Manager Compensation Plan
Sensitive
Relationship labels: HAS-A, IS-A
Data: 00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
https://cwiki.apache.org/confluence/display/ATLAS/Area+3+-+Glossary
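The glossary slide links column names (EMPNAME, EMPNO, JOBCODE, SALARY) to business terms through HAS-A and IS-A relationships. A minimal sketch of how such links could be used for semantic processing follows. The specific column-to-term assignments and the IS-A links to "Sensitive" are assumptions made for illustration; the slide text does not state which term maps to which column.

```python
from typing import Dict, List, Optional

# Assumed column-to-term assignments (hypothetical; not taken from the slide).
COLUMN_TERMS: Dict[str, str] = {
    "EMPNAME": "Employee Name",
    "EMPNO": "Employee Id",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}

# Assumed IS-A links between glossary terms (hypothetical).
IS_A: Dict[str, str] = {
    "Annual Salary": "Sensitive",
    "Hourly Pay Rate": "Sensitive",
}


def is_a(term: Optional[str], target: str) -> bool:
    """Follow IS-A links from `term` until `target` is reached or the chain ends."""
    seen = set()
    while term is not None and term not in seen:
        if term == target:
            return True
        seen.add(term)  # guards against accidental cycles in the glossary
        term = IS_A.get(term)
    return False


def sensitive_columns(columns: List[str]) -> List[str]:
    """Return the columns whose glossary term IS-A 'Sensitive'."""
    return [c for c in columns if is_a(COLUMN_TERMS.get(c), "Sensitive")]


flagged = sensitive_columns(["EMPNAME", "EMPNO", "JOBCODE", "SALARY"])
```

The point of the sketch is that sensitivity is decided once, on the glossary term, and every column linked to that term inherits it.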
Glossary
Replacing v1 Taxonomy (tech preview)
Categories
Terms
Hierarchies
Rich relationships
Classifications
https://cwiki.apache.org/confluence/display/ATLAS/Area+3+-+Glossary
Governance Action Framework
Metadata drives enforcement
Classification (tag) based – scalable, glossary driven
Access, Masking, Filtering
Supports Apache Ranger but open APIs for others
Audit, Rights - Exception management, Rights, Privacy (to look at in future)
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=70258801
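A minimal sketch of classification (tag) based masking, the kind of enforcement this slide describes. It is not the Apache Ranger or Apache Atlas API: the function name, the tag names, the mask string and the rule "mask a column if it carries any tag the user is not cleared for" are all assumptions chosen for illustration.

```python
from typing import Dict, Set

# Hypothetical tags attached to columns, e.g. derived from glossary classifications.
COLUMN_TAGS: Dict[str, Set[str]] = {
    "SALARY": {"Sensitive"},
    "EMPNAME": {"PII"},
}

MASK = "****"


def mask_row(row: Dict[str, str],
             column_tags: Dict[str, Set[str]],
             user_clearances: Set[str]) -> Dict[str, str]:
    """Mask every column that carries a tag the user is not cleared for."""
    masked = {}
    for column, value in row.items():
        uncleared = column_tags.get(column, set()) - user_clearances
        masked[column] = MASK if uncleared else value
    return masked


row = {"EMPNAME": "Lemmie Stage", "JOBCODE": "27", "SALARY": "45324"}
analyst_view = mask_row(row, COLUMN_TAGS, user_clearances={"PII"})
payroll_view = mask_row(row, COLUMN_TAGS, user_clearances={"PII", "Sensitive"})
```

Because the rule refers to tags rather than to individual columns, newly tagged data is covered without writing a new policy, which is the scalability argument made on the slide.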
Open Discovery Framework
Open Framework
Plugins characterize data & relationships
Updates metadata with results
Initial implementation in master
https://cwiki.apache.org/confluence/display/ATLAS/Automated+metadata+discovery
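The slide describes plugins that characterize data and then update the metadata with their results. A minimal sketch of that plugin pattern follows. The plugin signature, the annotation keys (`inferredType`, `emptyFraction`) and the `run_discovery` function are hypothetical and do not reflect the actual discovery framework code in Apache Atlas.

```python
from typing import Callable, Dict, List

# A plugin takes sample values for one column and returns annotations for it.
DiscoveryPlugin = Callable[[List[str]], Dict[str, object]]


def infer_type(values: List[str]) -> Dict[str, object]:
    """Classify a column as integer if every sample value parses as one."""
    if values and all(v.lstrip("-").isdigit() for v in values):
        return {"inferredType": "integer"}
    return {"inferredType": "string"}


def empty_fraction(values: List[str]) -> Dict[str, object]:
    """Report the share of empty sample values as a simple quality signal."""
    if not values:
        return {"emptyFraction": 0.0}
    return {"emptyFraction": sum(1 for v in values if v == "") / len(values)}


def run_discovery(metadata: Dict[str, Dict[str, object]],
                  samples: Dict[str, List[str]],
                  plugins: List[DiscoveryPlugin]) -> Dict[str, Dict[str, object]]:
    """Run every plugin over every column and merge results into the metadata."""
    for column, values in samples.items():
        annotations = metadata.setdefault(column, {})
        for plugin in plugins:
            annotations.update(plugin(values))
    return metadata


metadata = run_discovery(
    {},
    {"EMPNO": ["1", "22"], "EMPNAME": ["Ann", ""]},
    [infer_type, empty_fraction],
)
```

New analyses can be added by appending a function to the plugin list, without changing `run_discovery`.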
Open ecosystem
https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance
Summary
Open Metadata
Enterprise Catalog
Discovery
Multi Vendor
Open, Layered APIs
Metadata store integration
Open Source
&
Governance
ubiquitous
Standard
Models
How can I get involved?
Discuss: Mailing List
Document, Explain: Wiki
Report, Design: Jira
Face to face
Code
Vendors!
https://cwiki.apache.org/confluence/display/ATLAS/Getting+Involved
Governance & Security BOF
Thursday 18:00
C4.7
Owen O’Malley
Nigel Jones
Ferd Scheepers
Backup
VDC End to End
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69407333
