
Building an open metadata and governance ecosystem


The frustration of working in the data industry is that so much time is spent finding, understanding, cleaning and reorganising data rather than putting it to good use. The cause comes down to a gap in the capabilities of our data processing platforms.

In software engineering we teach people that data is private to an application and should only be accessed through the application interface. However, the moment we want to do any form of analysis, we rip the data out of the application, copy it around and start using it for different projects. Very quickly the original context of the data is lost and downstream users waste time reconstructing it.

ODPi Egeria is an open source project delivering embeddable metadata management libraries and interchange technology for our data platforms, ensuring that metadata can flow with the data in a form that is accessible to tools from many vendors. This open metadata management is coupled with open governance APIs that enable business owners to set policies that are then pushed down into the data platforms, engines and tools, simplifying compliance with regulatory requirements and the protection of valuable data assets.

The technology includes a comprehensive metadata type model seeded from many popular standards and enhanced with semantics and governance concepts. The underlying metamodel is a graph designed to be distributed across multiple heterogeneous metadata servers. Metadata is then accessible through replication, event notification and federated queries, ensuring that metadata is shared and linked to build a rich body of knowledge around the data.
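The federated query idea can be pictured with a small sketch. This is not the Egeria API: `Entity`, `MetadataServer` and `federated_find` are hypothetical names invented for illustration, and the repositories are plain in-memory dictionaries. It assumes every entity carries a globally unique identifier and remembers its home server, so that answers from several servers can be merged without duplicates.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    guid: str          # globally unique identifier (assumed)
    type_name: str     # illustrative type names, e.g. "GlossaryTerm"
    name: str
    home_server: str   # server that owns the master copy


@dataclass
class MetadataServer:
    """Hypothetical in-memory stand-in for one metadata repository."""
    name: str
    entities: dict = field(default_factory=dict)

    def add(self, guid, type_name, name):
        entity = Entity(guid, type_name, name, home_server=self.name)
        self.entities[guid] = entity
        return entity

    def find_by_type(self, type_name):
        return [e for e in self.entities.values() if e.type_name == type_name]


def federated_find(cohort, type_name):
    """Ask every server in the cohort and merge the answers by GUID."""
    merged = {}
    for server in cohort:
        for entity in server.find_by_type(type_name):
            merged.setdefault(entity.guid, entity)
    return sorted(merged.values(), key=lambda e: e.guid)


server1 = MetadataServer("Server 1")
server2 = MetadataServer("Server 2")
server1.add("col-001", "DatabaseColumn", "EMPNAME")
server2.add("term-001", "GlossaryTerm", "Employee Name")
server2.add("term-002", "GlossaryTerm", "Annual Salary")

terms = federated_find([server1, server2], "GlossaryTerm")
print([t.name for t in terms])  # ['Employee Name', 'Annual Salary']
```

The caller sees one merged result and does not need to know which server holds which entity. Replication and event notification would be layered on top of this and are not shown.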

In this presentation I will cover the basic mechanisms of Egeria and how its use across our data platforms and tools could revolutionise the data industry.

Published in: Technology

Building an open metadata and governance ecosystem

  1. https://github.com/odpi/egeria BUILDING AN OPEN METADATA AND GOVERNANCE ECOSYSTEM Mandy Chessell
  2. AI is having an increasing impact on every aspect of modern life: Energy & Utilities, Financial Services, Government, Manufacturing, Healthcare, Insurance, Retail, Telecommunication, High Tech, Hospital, Oil & Gas, Travel & Hotel, Transportation, Multi-channel integration, Stock Market
  3. AI feeds on data, extracted from applications and services. Where do you live?
  4. Context. AI feeds on data, extracted from applications and services. Where do you live?
  5. Good metadata at work
  6. AI feeds on data, extracted from applications and services
  7. Curation
     00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
     00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3
     00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3
     I know I wonder what this means
  8. Metadata should bring as much information about the data sets to Callie’s data science as is known collectively by the organization.
     Employee Directory: Name, Band, Job Title
     Data Set Name: Employee Directory
     Description: Core attributes describing all employees of Coco Pharmaceuticals created from a daily extract from Kenexa.
     Owner: Penny Payer
     Status:
     Last accessed: 6th May 2016
     Records: 3488
     Last Update: 1st May 2016
     Contents: Structure … Contents … Lineage …
     Column: Band
     Classification Ranges: Confidentiality: Public, Confidential, Sensitive; Confidence: Authoritative; Retention: Indefinitely
     Characteristics, Lineage, Description
     Position reference number for non-exempt employees. The value ranges from 01 to 06 where 01 is the most senior and 06 is the most junior.
     Type: String
     Classification: Public
  9. Scared to share
     Faith Broker Human Resources
     00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
     00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3
     00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3
     00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 ##### ### 27 Code St Harlem NY 1 3
     00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 ##### ### 27 Code St Harlem NY 1 3
     00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 ##### ### 27 Code St Harlem NY 1 3
     Callie Quartile, Data Scientist
     Very Sensitive Data Very Sensitive Data
  10. AI feeds on data, extracted from applications and services. Metadata Repository
  11. Today’s reality
  12. What needs to change? Open and Unified Metadata
  13. Example of a simple cohort: Cohort A, Chief Data Office, Data Lake, Systems of Record
  14. Connecting to multiple cohorts: Cohort B, Cohort A, Chief Data Office, Data Lake, Systems of Record, Mobile Apps, Data Lake, Systems of Record, Marketing
  15. Using glossary function for semantic processing
      Business metadata
      Structural metadata for a data store: EMPNAME, EMPNO, JOBCODE, SALARY, EMPLOYEE RECORD
      Employee, Work Location, Annual Salary, Job Title, Employee Id, Employee Name, Hourly Pay Rate, Manager, Compensation Plan
      HAS-A HAS-A HAS-A HAS-A HAS-A HAS-A IS-A IS-A Sensitive IS-A Data
      00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
  16. Importance of the Graph Model: Database Column, Glossary Term, Server 1, Server 2, Entity, Entity
  17. Importance of the Graph Model: Database Column, Glossary Term, Glossary Term, Meaning, Server 1, Server 2, Reference Copy, Relationship
  18. Importance of the Graph Model: Database Column, Glossary Term, Server 1, Server 3, Server 2, Database Column, Glossary Term, Meaning
  19. Importance of the Graph Model – Using Entity Proxies: Database Column, Glossary Term, Server 1, Server 3, Server 2, Meaning, Database Column, Glossary Term, Entity Proxy
  20. Making metadata actionable: Metadata Repository, Metadata Repository, Security Tools, Data Tools, Open Metadata Highway, Open APIs for tools and engines, Open metadata exchange and federated queries
  21. Automating governance example: IBM Information Governance Catalog, Apache Atlas, Apache Ranger, Gaian, Define Policies, Hadoop Metadata, Manage Data Access, Egeria (Open metadata exchange and federated queries), Access Data, Egeria Open Governance APIs, configure, configure
  22. Metadata and governance digital platform: Open Metadata and Governance, Reporting Platform, ETL Platform, Analytics Platform, Virtualization Platform, Governance Platform, Data Platform
  23. Design philosophy: Search; Open Metadata Access Services; Open Metadata Repository Services; Use cases, Personas, Practitioners input; Data integration, availability and integrity best practices
  24. Coco Pharmaceuticals persona: Jules Keeper, CDO; Tessa Tube, Chief Researcher; Erin Overview, Information Architect; Faith Broker, Chief Privacy Officer; Bob Nitter, Integration Developer; Callie Quartile, Data Scientist; Nancy Noah, Cloud Specialist; Gary Geeke, IT Infrastructure. https://odpi.github.io/data-governance/coco-pharmaceuticals/personas/
  25. Using design thinking
      • Open Metadata Types
      • Access Service Identification
      • Samples and API design
      • Best Practices
  26. Different personas need different services
      Callie Quartile, Data Scientist: Find data; Understand data; Manage analytics models
      Jules Keeper, Chief Data Officer: Build data strategy; Define governance program; Monitor progress
  27. Different personas need different services
      Tanya Tidie, Clinical Trials Administrator: Maintain accurate patient records; Catalog clinical trials data; Demonstrate good data management practices
      Ivor Padlock, Chief Security Officer: Understand risks to organization; Set up protection; Monitor for suspicious activity
  28. Event-driven governance: Open Metadata, New Database, Assign Owner, Classify Data, Use Data
  29. Open metadata type model summary: Glossary; Collaboration; Governance; Models and Reference Data; Metadata Discovery; Lineage; Data Assets; Base Types, Systems and Infrastructure
  30. Each area caters for appropriate metadata structures: Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics); Governance Actions and Processes; Augmentation; Mapping; Implementation; Business Objects and Relationships, Taxonomies and Ontologies; Business Attributes; Organization; Teaming Metadata (people profiles, communities, projects, notebooks, …); Models and Schemas; Physical Asset Descriptions (Data stores, APIs, models and components); Asset Collections (Sets, Typed Sets, Type Organized Sets); Information Views; Rights Management; Reference Data; Feedback Metadata (tags, comments, ratings, …); Classification Schemes; Classification Strategy; Subject Area Definition; Campaigns and Projects; Rollout; Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …); Augmentation; Instrument; Association; Information Process Instrumentation (design lineage); Connectors; Basic Types, Infrastructure and Systems; Access. Area numbers: 4, 3, 1, 5, 2, 6, 7, 0
  31. Current Open Metadata Access Services (OMASs): Project Management, Community Profile, Asset Catalog, Stewardship Action, Information View, Governance Program, Data Process, Subject Area, Connected Asset, Discovery Engine, Governance Engine, Data Protection, Software Developer, Data Platform, Asset Owner, Digital Architecture, Data Science, DevOps, Asset Consumer, Data Infrastructure, Data Privacy, Asset Lineage
  32. Realizing open metadata and governance
      • Delivering core technology
      • Recruiting vendors
      • Assisting practitioners
      Vendors, Practitioners, Core Technology, Compliance Suite, Best Practices, Project Egeria, Project Data Governance
  33. Help wanted
      • Governance practice leaders needed to build out best practices
      • If you buy data technology, please encourage your vendors to consume the Egeria technology.
      • Looking for developers:
        • UI development
        • Graph repository (e.g. JanusGraph/TinkerPop)
        • Python clients
      • Join the ODPi to help fund our work
      • Tell everyone about what we do
  34. Questions? Open forum
  35. Links
      • Press Release and Podcast
      • Data Privacy Pack
      • Coco Pharmaceuticals Persona
      • Open source repositories
      • https://github.com/odpi/data-governance
      • https://github.com/odpi/egeria
      • https://odpi.github.io/data-governance/coco-pharmaceuticals/personas/
      • https://github.com/odpi/data-governance/tree/master/webinars/july2018
      • https://odpi.github.io/data-governance/data-privacy-pack/
      • https://www.linuxfoundation.org/press-release/2018/08/odpi-announces-egeria-for-open-sharing-exchange-and-governance-of-metadata/
      • https://roaringelephant.org/2018/09/25/episode-107-open-metadata-and-governance-masterclass-with-mandy-chessell-part-1/
      • https://roaringelephant.org/2018/10/09/episode-109-open-metadata-and-governance-masterclass-with-mandy-chessell-part-2/
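The entity proxy slides (16 to 19) describe a relationship stored on one server whose two ends live on other servers. A minimal sketch of that idea follows. It is not Egeria code: `EntityProxy`, `Relationship`, `Server` and `resolve` are hypothetical names, and the only assumption taken from the slides is that Server 3 holds a "Meaning" relationship between a Database Column on Server 1 and a Glossary Term on Server 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityProxy:
    """Minimal stub for an entity whose full copy lives on another server."""
    guid: str
    type_name: str
    home_server: str


@dataclass(frozen=True)
class Relationship:
    type_name: str
    end1: EntityProxy
    end2: EntityProxy
    home_server: str


class Server:
    """Hypothetical in-memory metadata server."""

    def __init__(self, name):
        self.name = name
        self.entities = {}       # guid -> full properties
        self.proxies = {}        # guid -> EntityProxy for remote entities
        self.relationships = []

    def add_entity(self, guid, type_name, **properties):
        self.entities[guid] = {"type_name": type_name, **properties}
        return EntityProxy(guid, type_name, home_server=self.name)

    def link(self, type_name, end1, end2):
        # Keep a proxy for any end that this server does not own.
        for end in (end1, end2):
            if end.guid not in self.entities:
                self.proxies[end.guid] = end
        rel = Relationship(type_name, end1, end2, home_server=self.name)
        self.relationships.append(rel)
        return rel


def resolve(proxy, servers):
    """Follow a proxy back to its home server to fetch the full entity."""
    return servers[proxy.home_server].entities[proxy.guid]


server1, server2, server3 = Server("Server 1"), Server("Server 2"), Server("Server 3")
column = server1.add_entity("col-001", "DatabaseColumn", name="EMPNAME")
term = server2.add_entity("term-001", "GlossaryTerm", name="Employee Name")
meaning = server3.link("Meaning", column, term)

servers = {s.name: s for s in (server1, server2, server3)}
print(resolve(meaning.end2, servers)["name"])  # Employee Name
```

In this sketch Server 3 stores only the relationship and two small proxies, so the full column and glossary term definitions stay with the servers that own them.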
