The rise of big data governance: Insight on this emerging trend from active open source initiatives
June 21, 2018 – Dataworks Summit San Jose 2018
@ODPiOrg
TODAY’S SPEAKER
John Mertic, Director of Program Management, Linux Foundation
IMAGINE …
An enterprise data catalogue that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
No matter where the data resides
Search
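To make the catalogue idea concrete, here is a minimal sketch of what one catalogue entry and a search over it might look like. The class names, field names and sample values are assumptions chosen to mirror the attributes listed on the slide; they are not taken from any ODPi or Apache Atlas API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One data set, described by the attributes named on the slide.

    All field names here are hypothetical.
    """
    name: str
    location: str        # where the data resides
    origin: str          # lineage: the upstream source it came from
    owner: str
    structure: dict      # column name -> type
    meaning: str         # business description
    classification: str  # e.g. "confidential"
    quality: float       # 0.0 (unknown or poor) to 1.0 (trusted)


@dataclass
class Catalogue:
    entries: list = field(default_factory=list)

    def add(self, entry):
        self.entries.append(entry)

    def search(self, term):
        """Case-insensitive match on name or meaning, wherever the data lives."""
        needle = term.lower()
        return [e for e in self.entries
                if needle in e.name.lower() or needle in e.meaning.lower()]


catalogue = Catalogue()
catalogue.add(CatalogueEntry(
    name="customer_orders", location="hdfs://cluster-a/sales/orders",
    origin="orders_app", owner="sales-ops",
    structure={"order_id": "int", "customer_id": "int"},
    meaning="Orders placed by customers", classification="confidential",
    quality=0.9))
catalogue.add(CatalogueEntry(
    name="web_clicks", location="s3://logs/clicks",
    origin="web_frontend", owner="marketing",
    structure={"url": "string"},
    meaning="Raw clickstream events", classification="internal",
    quality=0.6))

hits = catalogue.search("customer")
print([h.name for h in hits])
```

The point of the sketch is that the search does not care whether an entry lives on HDFS or in object storage: location is just one more attribute of the entry.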
New tools from any vendor connect to your data catalogue out of the box
No vendor lock-in and no expensive population of yet another proprietary, siloed metadata repository
Search
Open Metadata Management & Governance
IMAGINE …
Metadata is added automatically to the catalogue as new data is created
Databases
Applications
Function
Function
Functions
Files
It’s possible if data-driven enterprises collaborate to build it
Let’s talk about how
IMAGINE …
• The Metadata Problem
• Building an Open Ecosystem
• Benefits for Data Governance Professionals
AGENDA
1. Use data outside the application that created it
2. Find the right data sets
3. Automate governance processes
WHY DO WE NEED METADATA?
• Many data platforms do not have metadata support
• Proprietary tools support a limited range of data sources and governance actions
• Expensive efforts to create an enterprise data catalogue
TODAY’S REALITY
i. The maintenance of metadata must be automated
ii. Metadata management must become ubiquitous
iii. Metadata access must become open and remotely accessible
iv. Metadata should be used to drive the governance of data
v. Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
METADATA GOVERNANCE MANIFESTO
Open and Unified Metadata
Atlas Metadata repository
IBM Metadata repository
Custom Metadata repository
Open Metadata Repository Service
Open Metadata Access Service
Open and Unified Metadata
WHAT NEEDS TO CHANGE
Update to Apache Atlas
Automation
Capture of metadata from data platforms, data movement engines and data protection engines. Exception management and stewardship
Business Value
Specialized services for key data roles such as CDO, Data Scientist, Developer, DevOps Operator, Asset Owner, Applications
Connectivity
Metadata Highway offering open metadata exchange,
linking and federation between heterogeneous
metadata repositories.
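The federation idea behind the Metadata Highway can be sketched as follows: each repository keeps its own native layout but exposes one shared query interface, and a federated query merges the answers by unique identifier. Everything in this sketch (class names, the `find` method, the `guid` key) is hypothetical and is not the actual Apache Atlas or Open Metadata Repository Service API.

```python
from abc import ABC, abstractmethod


class MetadataRepository(ABC):
    """Hypothetical common interface each repository exposes to the exchange."""

    @abstractmethod
    def find(self, term):
        """Return a list of {"guid", "name", "source"} dicts matching term."""


class DictBackedRepository(MetadataRepository):
    """Stands in for a repository that stores metadata as guid -> name."""

    def __init__(self, source, assets):
        self.source = source
        self.assets = assets

    def find(self, term):
        return [{"guid": g, "name": n, "source": self.source}
                for g, n in self.assets.items() if term.lower() in n.lower()]


class RowBackedRepository(MetadataRepository):
    """Stands in for a repository with a different layout: (name, guid) rows."""

    def __init__(self, source, rows):
        self.source = source
        self.rows = rows

    def find(self, term):
        return [{"guid": g, "name": n, "source": self.source}
                for n, g in self.rows if term.lower() in n.lower()]


def federated_find(repositories, term):
    """Query every repository and merge results, keeping one entry per guid."""
    merged = {}
    for repo in repositories:
        for hit in repo.find(term):
            merged.setdefault(hit["guid"], hit)  # first repository wins
    return list(merged.values())


repo_a = DictBackedRepository(
    "repo-a", {"g-1": "Customer orders", "g-2": "Web clicks"})
repo_b = RowBackedRepository(
    "repo-b", [("Customer profile", "g-3"), ("Customer orders", "g-1")])

results = federated_find([repo_a, repo_b], "customer")
print(sorted(r["guid"] for r in results))
```

Merging on a shared identifier is what lets the same asset, known to two repositories, appear once in the result. This is one reason open metadata access implies unique identifiers for metadata elements.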
Open and Unified Metadata
Atlas Metadata repository
IBM Metadata repository
Microsoft SSAS
Open Metadata Repository Service
OMAS
Open and Unified Metadata
CURRENTLY IN DEVELOPMENT
Information
View
Asset Catalog
Subject Area
Catalog
Search UI
Power BI
OPEN SOURCE COLLABORATION
Good metadata enables subject matter experts to
collaborate around the data
Locate the data they need, quickly and efficiently
Feed back their knowledge about the data and the uses they have made of it, to help others and support economic evaluation of data
CO-CREATION WITH PRACTITIONERS
Data Governance Professionals
Your governance program is based on established definitions
Allow a broader range of tools in your organization
Automated governance processes protect and manage your data
Metadata-driven access control
Auditing, metering and monitoring
Quality control and exception management
Rights management
Vendors
Your metadata offerings will deliver value faster as they tap into metadata collected by other vendors’ tools.
ODPi packages extend your metadata system’s and tools’ capabilities
Conformance tests minimize your effort in being compliant with key standards and regulations.
Customers have increased confidence in your tools and services due to ODPi certification.
HOW THIS HELPS
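As an illustration of the "metadata-driven access control" and "auditing" items on this slide, the sketch below decides access purely from the classification recorded in the catalogue and logs every decision. The classification levels, roles and asset names are invented for the example; a real deployment would read them from the metadata repository.

```python
# Hypothetical classification levels, lowest to highest sensitivity.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}

# Hypothetical clearance per role.
ROLE_CLEARANCE = {"analyst": "internal", "steward": "confidential"}

# Metadata held in the catalogue: asset name -> classification.
ASSET_CLASSIFICATION = {
    "web_clicks": "internal",
    "customer_orders": "confidential",
}

audit_log = []


def can_read(role, asset):
    """Allow access only when the role's clearance covers the classification.

    Unknown roles and unclassified assets are denied (fail closed).
    """
    clearance = ROLE_CLEARANCE.get(role)
    classification = ASSET_CLASSIFICATION.get(asset)
    allowed = (clearance is not None and classification is not None
               and LEVELS[clearance] >= LEVELS[classification])
    audit_log.append((role, asset, allowed))  # auditing hook
    return allowed


print(can_read("analyst", "web_clicks"))
print(can_read("analyst", "customer_orders"))
print(can_read("steward", "customer_orders"))
```

Because the rule is written against classifications rather than individual data sets, newly catalogued data is governed as soon as it is classified, without a per-asset policy change.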
ROADMAP
March April May June July August September
Data Governance PMC meets weekly
• Focus of meetings is to develop the open metadata usage guidelines, best practices, connector descriptions
• Two threads every other week on the PMC
• Thread 1: Compliance tools and packs
• Thread 2: Practitioner - Subject matter experts
• Learn more at https://lists.odpi.org/g/odpi-pmc-datagovernance
Strata, San Jose
Dataworks Summit, Berlin
IBM Think, Las Vegas
Webinar for Offering Managers
Webinar for Developers
Privacy Pack GA
Apache Atlas 1.0 GA
Releases upcoming
• Privacy pack due in June (https://jira.odpi.org/browse/DG-3)
• Apache Atlas 1.0 GA to support work due in late June (https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance)
Future work
• Metadata tools and solutions will integrate through the open metadata interfaces
• Integrated solutions and products with the open metadata interfaces
Dataworks Summit, San Jose
Apache Atlas 1.0 beta
Strata, NYC
ODPi – A NEUTRAL HOME FOR COLLABORATION
FOUNDATIONS ENABLE TRUSTED INNOVATION
Successful projects depend on members, developers and infrastructure to develop technology, which is turned into products that the market will adopt.
Ecosystem
GET INVOLVED WITH ODPi DATA GOVERNANCE
Have your organization support ODPi
https://www.odpi.org/about/join
Visit ODPi website and join the quarterly newsletter
https://www.odpi.org/
Learn more about Data Governance PMC
https://www.odpi.org/projects/data-governance-pmc
Join the Data Governance PMC Mailing List
https://lists.odpi.org/g/odpi-pmc-datagovernance
Questions?

Editor's Notes

  • #8 Metadata enables data to be used outside of the application that created it: analytics and decision making; new business applications; reporting and compliance. Metadata describes the format and content of data, allowing people to judge which data set to use for a new project: structure; meaning; origin; valid values and quality; usage and ownership; regulations and classifications that apply. Metadata describes the business context and classification of data, allowing automated governance processes to operate.
  • #9 Many data platforms do not have metadata support. Proprietary tools support a range of data sources and governance actions, but no one supports everything you need, and each assumes all tools come from its own suite. Each tool starts “empty”, requiring effort to populate metadata. Each tool operates as if it is the only tool. There is no integration or interoperability of metadata repositories from different vendors. The result is expensive efforts to create an enterprise data catalogue.
  • #11 The maintenance of metadata must be automated to scale to the sheer volumes and variety of data involved in modern business. Metadata management must become ubiquitous in cloud platforms and large data platforms, such as Apache Hadoop, so that the processing engines on these platforms can rely on its availability and build capability around it. Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms. This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata and standard interfaces for manipulating metadata. Metadata should be used to drive the governance of data and create a business friendly logical interface to the data landscape. Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
  • #16 Code development and standards development relationship
  • #19 ODPi, a Linux Foundation Project, can provide the platform for industry collaboration on shared technology. In pursuit of its mission to make Apache Hadoop and associated Big Data solutions ready for enterprise-wide deployment, ODPi is focused on the biggest hurdles. In 2016, the largest hurdle was cross-distro harmonization. Today, a key blocker to broad-based production use of Big Data is governance.
  • #21 Mention that individuals can get involved.