https://github.com/odpi/egeria
WHY ODPI EGERIA?
Open metadata and Governance
How can we become more effective with data?
The value of open, standardized metadata
Using a metadata repository to describe data
Today’s reality – organizations buy lots of tools
A new manifesto for metadata and governance
 The maintenance of metadata must be automated to scale to the sheer volumes and variety
of data involved in modern business. Similarly, metadata should be used to drive the
governance of data and create a business-friendly logical interface to the data landscape.
 The availability of metadata management must become ubiquitous in cloud platforms and
large data platforms, such as Apache Hadoop so that the processing engines on these
platforms can rely on its availability and build capability around it.
 Metadata access must become open and remotely accessible so that tools from
different vendors can work with metadata located on different platforms. This implies
unique identifiers for metadata elements, some level of standardization in the types and
formats for metadata and standard interfaces for manipulating metadata.
 Wherever possible, discovery and maintenance of metadata has to be an integral part of all
tools that access, change and move information.
ODPi Egeria enables exchange of metadata between tools
from different vendors
Open and
Unified Metadata
Development DevOps Data Science
EGERIA’S DISTRIBUTED VIRTUAL GRAPH
Uniting metadata from many tools
ODPi Egeria enables exchange of metadata between tools
Open and
Unified Metadata
Development DevOps Data Science
Search
Open Metadata Access Services
Design philosophy
Open Metadata Repository Services
Use cases,
Personas,
Practitioners
input
Data integration,
availability and
integrity best
practices
A Cohort of OMAG Servers
[Figure: several Open Metadata and Governance (OMAG) Servers, each with Open Metadata Access Services on top of Open Metadata Repository Services, connected through an OMRS Cohort. The figure also shows a Search component.]
Egeria Open Metadata Repository Services (OMRS)
 The OMRS defines a protocol and a set of connectors
 The Enterprise Connector performs cohort-wide operations, such as issuing queries to the
cohort. When metadata is replicated from another server, it can use the local connector
and repository to cache it for availability and performance
 The Local Connector performs local operations and provides a
default Event Mapper that enables events relating to local
operations to be sent to the cohort
 The Repository Connector interfaces to a specific repository –
and optionally, may be accompanied by a custom Event
Mapper
 Egeria provides two built in repositories and there are
connectors to other repositories
 The interface to a repository connector is the MetadataCollection
API, described on the next slide
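The split between cohort-wide and local operations can be sketched in a few lines of Python. This is a toy model, not Egeria code: the class names ToyRepository and ToyEnterpriseConnector, the dictionary layout and the sample identifiers are all assumptions made for illustration, and caching on read is a simplification of the replication described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepository:
    """Stand-in for one cohort member's repository (hypothetical, not Egeria code)."""
    metadata_collection_id: str
    entities: dict = field(default_factory=dict)  # guid -> entity dict

    def get_entity(self, guid):
        return self.entities.get(guid)


@dataclass
class ToyEnterpriseConnector:
    """Answers from the local repository first, then asks the other cohort members."""
    local: ToyRepository
    remote_members: list

    def get_entity(self, guid):
        found = self.local.get_entity(guid)
        if found is not None:
            return found
        for member in self.remote_members:
            found = member.get_entity(guid)
            if found is not None:
                # Assumption of this sketch: cache a copy locally on read.
                # The copy keeps the metadataCollectionId of its home repository.
                self.local.entities[guid] = dict(found)
                return found
        return None


server1 = ToyRepository("mc-1")
server2 = ToyRepository(
    "mc-2",
    {"g-42": {"guid": "g-42", "type": "GlossaryTerm", "metadataCollectionId": "mc-2"}},
)
enterprise = ToyEnterpriseConnector(local=server1, remote_members=[server2])

term = enterprise.get_entity("g-42")
print(term["metadataCollectionId"])  # id of the entity's home repository
print("g-42" in server1.entities)    # whether a copy is now cached locally
```

The point of the sketch is only the layering: callers talk to one connector, and that connector decides whether the answer comes from the local repository or from elsewhere in the cohort.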
OMRS Enterprise Connector
OMRS Local Connector
& Event Mapper
OMRS Repository
Connector
Repository
Cohort
MetadataCollection
API
The OMRSMetadataCollection interface
 The interface to an Egeria repository is the OMRSMetadataCollection interface
 It includes groups of operations:
 Group 1: Identification of metadata repository - metadataCollectionId
 Group 2: Type definitions (types, attributes) - add, find, get, remove, …
 Group 3: Find instances (entities, relationships) - get, find, graph-queries, …
 Group 4: Maintain instances (entities, relationships) - addEntity, deleteEntity, …
 Group 5: Change control information (entities, relationships) - reIdentify, reHome, …
 Group 6: Maintenance of reference (replica) copies – save, purge, refresh,…
Egeria metadata – a distributed graph
[Figure: business metadata (glossary terms including Employee, Employee Id, Employee Name, Job Title, Work Location, Annual Salary, Hourly Pay Rate and Manager Compensation Plan, connected by HAS-A and IS-A relationships, with a Sensitive Data classification) linked to structural metadata for a data store (EMPLOYEE RECORD with columns EMPNAME, EMPNO, JOBCODE, SALARY)]
 The interconnected nature of metadata forms a graph
 The distributed nature of Egeria leads to a distributed graph…
Egeria distributed graph model
Database
Column
Glossary
Term
OMAG Server 1 OMAG Server 2
Entity Entity
 A pair of entities are stored in separate servers
Egeria distributed graph model
Database
Column
Glossary
Term
Glossary
Term
Meaning
OMAG Server 1 OMAG Server 2
Reference
Copy
Relationship
 One entity could be replicated to the other server, as a ‘reference copy’
 The original Glossary Term on OMAG Server 2 is still the master
 A relationship could be defined between the local DB column and the reference copy of the Glossary Term
Egeria distributed graph model
Database
Column
Glossary
Term
OMAG Server 1
OMAG Server 3
OMAG Server 2
Database
Column
Glossary
Term
Meaning
 Both entities could be replicated to a third server, as reference copies
 The originals are still the masters
 A relationship could be defined between the local reference copies
Egeria distributed graph model
Database
Column
Glossary
Term
OMAG Server 1
OMAG Server 3
OMAG Server 2
Meaning
Database
Column
Glossary
Term
Entity
Proxy
 Instead of replication, the third server could relate the original entities using entity proxies
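The difference between a reference copy and an entity proxy can be sketched as follows. This is an illustrative model only: ToyEntity, reference_copy and entity_proxy are hypothetical names, the type names and GUIDs are made up, and the sketch assumes a proxy carries identity but no properties, which simplifies whatever a real proxy holds.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyEntity:
    guid: str
    type_name: str
    metadata_collection_id: str  # id of the home repository, which stays the master
    is_proxy: bool = False
    properties: tuple = ()       # (name, value) pairs


def reference_copy(entity):
    """Full copy held by another server; it still names the original home repository."""
    return replace(entity)


def entity_proxy(entity):
    """Stub that identifies the remote entity (assumption: no properties are kept)."""
    return replace(entity, is_proxy=True, properties=())


column = ToyEntity("g-col", "DatabaseColumn", "server-1", properties=(("name", "SALARY"),))
term = ToyEntity("g-term", "GlossaryTerm", "server-2", properties=(("name", "Annual Salary"),))

# Server 3 holds only proxies, plus a relationship that it owns itself.
server3_store = {e.guid: e for e in (entity_proxy(column), entity_proxy(term))}
meaning = {"type": "Meaning", "end1": "g-col", "end2": "g-term",
           "metadata_collection_id": "server-3"}

for guid in (meaning["end1"], meaning["end2"]):
    end = server3_store[guid]
    print(guid, "home:", end.metadata_collection_id, "proxy:", end.is_proxy)
```

In both cases the home metadataCollectionId is preserved, so any server can tell which repository remains the master for each entity.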
Egeria Local Graph Repository
 The Egeria distribution includes a persistent repository and a non-persistent repository
 The persistent repository is a graph repository built on JanusGraph
 JanusGraph is an open-source project, hosted by the Linux Foundation
 http://janusgraph.org
 http://github.com/janusgraph/janusgraph
 The built-in graph repository provides an OMAG Server with a persistent metadata store and is built
using Egeria’s ‘plugin’ pattern
 The graph repository can store instances of metadata owned by the local server
 It can also store reference copies of metadata instances replicated to the local server
 It also supports relationship instances that refer to entity proxy instances
Anatomy of the local graph repository
Graph Repository
JanusGraph
persistence
search
OMAG Server
OMAS – access services
OMRS Enterprise Connector OMRS topics
in
out
Apache
Tinkerpop
OMRS Local Connector
& Event Mapper
OMRS Graph Connector
JanusGraph
Management
Cohort
Graph Repository components
 GraphOMRSRepositoryConnector - implements the open connector framework interface
 GraphOMRSRepositoryConnectorProvider – implements the mechanism for brokering a connector
 GraphOMRSMetadataCollection – top level interface supporting type and instance operations
 GraphOMRSMetadataStore – implements the MetadataCollection using a graph database
 GraphOMRSGraphFactory – creation, schema, indexing - encapsulates JanusGraph-specifics
 Mappers – convert between OMRS objects and graph vertices and edges
 GraphOMRSEntityMapper
 GraphOMRSRelationshipMapper
 GraphOMRSClassificationMapper
 Plus various utility classes – error codes, audit logging, constants and utility methods
https://github.com/odpi/egeria/
See open-metadata-implementation/adapters/open-connectors/repository-services-connectors/open-metadata-collection-store-connectors/graph-repository-connector
To use the Egeria Graph Repository
 Configure the OMAG Server repository-mode = ‘local-graph-repository’
 e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/local-repository/mode/local-graph-repository
 Subsequently, start the OMRS instance in the server
 e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/instance
 When OMRS starts, the graph repository auto-creates a JanusGraph database – including:
 Persistence backend
 Search backend
 Graph schema
 Search indexes
 For now, the persistence backend is embedded Berkeley DB and the indexing backend is Lucene –
further options could be added
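The two admin calls above could be scripted roughly as follows. This is a sketch under assumptions: the URL templates are taken from the slide, but the helper names, the placeholder user and server names, and the choice of an empty POST body are illustrative and have not been checked against a running OMAG Server.

```python
import urllib.request

# Base URL as shown on the slide; adjust host and port for a real platform.
BASE = "http://localhost:8080/open-metadata/admin-services"


def repository_mode_url(user, server, mode="local-graph-repository"):
    """URL that sets the local repository mode for one server."""
    return f"{BASE}/users/{user}/servers/{server}/local-repository/mode/{mode}"


def start_instance_url(user, server):
    """URL that starts the server instance (and with it the OMRS)."""
    return f"{BASE}/users/{user}/servers/{server}/instance"


def planned_requests(user, server):
    """The two POSTs in the order the slide describes: configure, then start."""
    return [repository_mode_url(user, server), start_instance_url(user, server)]


def post(url):
    """Issue an HTTP POST with an empty body (assumption) and return the status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def configure_and_start(user, server, send=post):
    """Send both requests in order; 'send' is injectable so the flow can be dry-run."""
    return [send(url) for url in planned_requests(user, server)]


# Dry run: print what would be sent, without contacting any server.
for url in planned_requests("exampleuser", "exampleserver"):
    print("POST", url)
```

Calling configure_and_start("exampleuser", "exampleserver") would send the requests for real; passing a different send function keeps the ordering logic testable without a server.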
Graph Schema
The MetadataCollection interface is the formal interface to an Egeria repository.
Whilst it is possible to look at the graph directly (e.g. using the Gremlin console),
please don’t rely on the schema: it is likely to evolve
Type data:
 The Graph Repository does not store type definitions
 It delegates all type operations to the Repository Content Manager
Instance data:
 The Egeria Graph Repository stores instance data, using a JanusGraph schema that has:
 vertices for entities and classifications
 edges for relationships and classifiers
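A minimal sketch of this mapping, using plain dictionaries rather than JanusGraph calls. The four labels come from the slides; everything else (the function name, the dictionary keys, the direction of the classifier edge and the sample instances) is an assumption made for illustration, and the real schema is likely to differ.

```python
def to_graph_elements(entities, relationships):
    """Map toy OMRS-style instances to labelled vertices and edges (plain dicts)."""
    vertices, edges = [], []
    for entity in entities:
        vertices.append({"label": "entity", "id": entity["guid"],
                         "properties": entity.get("properties", {})})
        for classification in entity.get("classifications", []):
            vertex_id = f'{entity["guid"]}/{classification["name"]}'
            vertices.append({"label": "classification", "id": vertex_id,
                             "properties": classification.get("properties", {})})
            # Assumption: the classifier edge runs from the entity to its classification.
            edges.append({"label": "classifier", "from": entity["guid"], "to": vertex_id})
    for relationship in relationships:
        edges.append({"label": "relationship", "id": relationship["guid"],
                      "from": relationship["end1"], "to": relationship["end2"],
                      "properties": relationship.get("properties", {})})
    return vertices, edges


entities = [
    {"guid": "e1", "properties": {"name": "SALARY"},
     "classifications": [{"name": "Confidentiality", "properties": {"level": "sensitive"}}]},
    {"guid": "e2", "properties": {"name": "Annual Salary"}},
]
relationships = [{"guid": "r1", "end1": "e1", "end2": "e2"}]

vertices, edges = to_graph_elements(entities, relationships)
print([v["label"] for v in vertices])
print([e["label"] for e in edges])
```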
Instance representations in the OMRS
Graph mapping – vertices and edges
OMRS instance representation and the graph schema element it maps to:
Classification instance (attributes: primitives, enums, collections): vertex, label : “classification”, with properties
Entity instance (attributes: primitives, enums, collections): vertex, label : “entity”, with properties
Relationship instance (attributes: primitives, enums, collections): edge, label : “relationship”, with properties
Classifier: edge, label : “classifier”, with properties
Graph mapping – vertices and edges
[Figure: example graph of entity and classification vertices and the edges between them, each carrying properties]
Metadata Repository API
 A MetadataCollection supports a comprehensive API
 Metadata collection Id
 Query types
 Define/maintain types
 Search/query metadata instances
 Maintain metadata instances
 Historical (as of time) queries
 Effectivity dating
 Versioning
 Metadata
 Advanced maintenance
 Managing reference copies
 The protocol is forgiving: a repository may support only a minimal capability, such as
metadata instance search/query
Local instances, reference copies and proxies
The graph contains one vertex per entity – whether the entity is local, a reference copy or a proxy
The graph contains one edge per relationship – whether the relationship is local or a reference copy
Reference Copies
• The metadataCollectionId core attribute is set to the ‘guid’ of the home repository
Entity Proxy objects
• Each entity instance has a vertex property of type Boolean, to indicate whether the instance is a proxy
The MetadataCollection ‘graph-query’ methods
 There are 4 sub-graph query methods:
 getRelatedEntities()
 Returns the entity and its immediate neighbors
 getEntityNeighborhood()
 Returns the entity and its neighbors up to the depth specified by the
‘level’ parameter
 getLinkingEntities()
 Returns the relationships and intermediate entities that connect the
specified pair of entities
 getRelationshipsForEntity()
 Returns relationships associated with entity, optionally filtered by
relationship type and status
level = 2
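Two of these queries can be illustrated with a breadth-first search over a list of relationships. The functions below are stand-ins written for this sketch, not the Egeria implementations: their names, the tuple layout and the sample graph are assumptions, and linking_path returns a single shortest path, which may be narrower than what getLinkingEntities() returns.

```python
from collections import deque

# Toy relationships as (relationship id, end1 guid, end2 guid); illustrative data only.
RELATIONSHIPS = [("r1", "A", "B"), ("r2", "B", "C"), ("r3", "C", "D")]


def neighbors(guid, relationships):
    """Yield (relationship id, other end) for every relationship touching guid."""
    for rel_id, end1, end2 in relationships:
        if end1 == guid:
            yield rel_id, end2
        elif end2 == guid:
            yield rel_id, end1


def entity_neighborhood(start, relationships, level):
    """Entities and relationships within 'level' hops of start (breadth-first)."""
    seen, used = {start}, set()
    frontier = deque([(start, 0)])
    while frontier:
        guid, depth = frontier.popleft()
        if depth == level:
            continue  # do not expand beyond the requested depth
        for rel_id, other in neighbors(guid, relationships):
            used.add(rel_id)
            if other not in seen:
                seen.add(other)
                frontier.append((other, depth + 1))
    return seen, used


def linking_path(start, end, relationships):
    """One shortest chain of (relationship id, entity) steps from start to end, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        guid = queue.popleft()
        if guid == end:
            break
        for rel_id, other in neighbors(guid, relationships):
            if other not in previous:
                previous[other] = (guid, rel_id)
                queue.append(other)
    if end not in previous:
        return None
    path, node = [], end
    while previous[node] is not None:
        parent, rel_id = previous[node]
        path.append((rel_id, node))
        node = parent
    path.reverse()
    return path


print(entity_neighborhood("A", RELATIONSHIPS, level=2))
print(linking_path("A", "D", RELATIONSHIPS))
```

Bounding the search by level is what keeps neighborhood queries affordable on a large graph, which is the same concern the UI slides raise about limiting neighborhood depth.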
Graph Repository – supported functions
 The GraphRepository supports most of the OMRS MetadataCollection API, including:
 Save and purge of reference copies
 Use of entity proxies
 Delete and restore as well as purge – delete is a soft, restorable delete; purge is permanent
 Re-type of instances
 Re-identify of instances
 Re-home of instances
 The four ‘graph queries’ – described on the previous slide
 The ‘find’ methods – find..ByProperty, find..ByPropertyValue, findEntityByClassification
 The Graph Repository does not (yet) support:
 Historic queries – find methods that specify an asOfTime parameter
 Undo of previous instance updates
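The distinction between delete, restore and purge can be sketched with a small in-memory store. ToyInstanceStore and its status values are hypothetical; the sketch only mirrors the behaviour stated above (delete is soft and restorable, purge is permanent) and says nothing about how the graph repository implements it.

```python
class ToyInstanceStore:
    """In-memory illustration of soft delete, restore and purge (not Egeria code)."""

    def __init__(self):
        self._instances = {}  # guid -> {"status": str, "properties": dict}

    def add(self, guid, properties):
        self._instances[guid] = {"status": "ACTIVE", "properties": dict(properties)}

    def delete(self, guid):
        """Soft delete: the instance is hidden from reads but kept in the store."""
        self._instances[guid]["status"] = "DELETED"

    def restore(self, guid):
        """Bring a soft-deleted instance back; purged instances cannot be restored."""
        instance = self._instances[guid]
        if instance["status"] != "DELETED":
            raise ValueError(f"{guid} is not deleted")
        instance["status"] = "ACTIVE"

    def purge(self, guid):
        """Permanent removal."""
        del self._instances[guid]

    def get(self, guid):
        instance = self._instances.get(guid)
        if instance is None or instance["status"] == "DELETED":
            return None
        return instance["properties"]


store = ToyInstanceStore()
store.add("g-1", {"name": "SALARY"})
store.delete("g-1")
print(store.get("g-1"))   # hidden while soft-deleted
store.restore("g-1")
print(store.get("g-1"))   # visible again after restore
store.purge("g-1")
print(store.get("g-1"))   # gone for good
```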
Further Information
 Please visit us at Booth #53 on the 4th floor
 Project website:
 https://www.odpi.org/projects/egeria
 Open source repositories:
 http://github.com/odpi/egeria
 http://github.com/janusgraph/janusgraph
DEPLOYMENT PATTERNS
From large scale cloud services, on-premises local
deployments to edge IoT devices
A hybrid multi-cloud world
Data Lake
Mobile
Apps
Databases
Applications
Files
Independent
metadata
Repository
Linked
metadata
Repositories
Business Partners
Sharing data
IoT devices and
systems
Applications
New applications
deployed to cloud
Open metadata ecosystem
Data Lake
Mobile
Apps
Databases
Applications
Files
Independent
metadata
Repository
Linked
metadata
Repositories
Business Partners
Sharing data
IoT devices and
systems
Applications
New applications
deployed to cloud
The OMAG Server Platform
OMAG
Server
Platform
OMAG
Server
Platform
OMAG
Server
Platform
OMAG
Server
Platform
Egeria Server 1
Egeria Server 2
Egeria Server 3
Kubernetes
OMAG Server
Platform
Egeria
Server 1
Egeria
Server 2
Egeria
Server 3
Multi-tenant
OMAG Server
Platform
Egeria
Server 1
Edge
Metadata Tool Integration Patterns
Example of a simple cohort
Cohort A
Chief Data Office
Data Lake
Systems of
Record
Virtualizer
Security-Sync
Data Bridge
Apache Ranger
Gaian
Stewardship
Stewardship
Stewardship
Data Onboarding
USER INTERFACE DESIGN
Supporting business and technical people
UI: good and the not so good.
Not so good:
Confusing
Not my language (too technical or not technical enough)
Not meeting my needs
Not using my words
Mismatches my world view
Good:
Presented for my role
Logically flows to complete the tasks I do
Underpinned by relevant (persona-specific) APIs
Someone from my role was involved in creating the UI
UIs
ODPi Egeria design
Search
Open Metadata Access Services
Open Metadata Repository Services
Use cases,
Personas,
Practitioners
input
Data integration,
availability and
integrity best
practices
ODPi
Egeria
Metadata
repositories
UIs
ODPi Egeria UI types
Open Metadata Access Services
Open Metadata Repository Services
Search
Daemon
Type 1
OMAS only
Type 2
OMAS and OCF
Connector
Type 3
OMRS
Type 4
Daemon UI
Data
store
UIs
ODPi Egeria UI types work in progress
Search
Type 1
OMAS only
Type 2
OMAS and OCF
Connector
Type 3
OMRS
Type 4
Daemon UI
IBM creating
Subject Area UI
ING creating
Asset Search
IBM creating
Type explorer
and instance
explorer
ING creating
Lineage viewer
Tomcat *
• configuration
Current UI implementation
Web app
Egeria
OMAG Server
Rest call
* Egeria UIs are coded to work with Tomcat. We expect other web servers to be supported as the community
requires and implements them.
UI design – profile driven
Login
Personal
Profile
A user’s roles define which UI capabilities
the user should see
Subject
area
Type
explorer
Asset
Search
Many more to come ……..
Dealing well with potentially large amounts of data in a persona-specific way is the challenge, e.g. by paging or by limiting neighborhood depth in graph calls
Egeria UI technology experiences
• Polymer is a web component technology providing web components. It is not a framework
• + nice separation of components – hiding implementation in shadow dom
• + communicate with property binding
• + support for events
• + many existing paper and iron components for simple things.
David’s (Polymer newbie) experiences:
• - quirky – spent a lot of time finding the happy path to get things working, especially around web
components not being initialized when you want to use them (a big frustration was trying to issue a REST call
from the ready() method).
• +/- need to be rigorous with architecture, it seems best to use one way bindings and events and
a top level controller component to drive state transitions for MVC e.g. around a grid. Redux may make
sense to hold state and define state transitions
• - There is no free commercial smart (editable) grid I can find (this seems true for other frameworks as well)
The sort of architecture more complex web components require.
• Controller controls all transitions
• The model allows data updates to occur on
the model with simple CRUD operations
• The model changes are then reflected into
the view.
Considerations:
- Operations are currently synchronous. Redux
would be asynchronous
- The spinner would need to lock across the complete
user interaction, not just the REST call
- Changes to the view made by the user and
changes to the view from the model need to be
managed
- Paging required.
Call for action!
Call to the community for open source UI developers!
Be part of showing how powerful open metadata is using visualization!
Fuel the ODPi rocket!
COMMUNITY AND ECOSYSTEM
Building a strong community for the future.
Open source dependencies
Spring Boot
Using ODPi Egeria …
 Eases the cost of metadata integration
through
 Comprehensive standards and libraries.
 Active vendor recruitment program.
 Provides direct support to many
governance roles, filling the gaps
between function offered through
commercial tools.
 Provides best practices and content
packs to accelerate an organization’s
journey to becoming data driven.
Egeria Conformance Program - it’s an “imitation game”
Workbench
Vendors that pass the
conformance suite can
display this mark
Running the Conformance Suite
The ODPi is a non-profit that is part of The Linux Foundation
 Delivering core technology
 Recruiting vendors
 Assisting practitioners
Vendors
Practitioners
Core
Technology
Conformance
Suite
Best
Practices
Project
Egeria
Project
Data
Governance
Links
 Open source repositories
• https://github.com/odpi/data-governance
• https://github.com/odpi/egeria
 Press Releases and Podcast
• https://www.linuxfoundation.org/press-release/2018/08/odpi-announces-egeria-for-open-sharing-exchange-and-governance-of-metadata/
• https://www.linuxfoundation.org/press-release/2019/02/odpi-announces-new-egeria-conformance-program-to-advance-open-metadata-exchange-between-vendor-tools/
• https://roaringelephant.org/2018/09/25/episode-107-open-metadata-and-governance-masterclass-with-mandy-chessell-part-1/
• https://roaringelephant.org/2018/10/09/episode-109-open-metadata-and-governance-masterclass-with-mandy-chessell-part-2/
• https://youtu.be/ryd3KFWT1mc
INTEGRATING WITH PARTNERS
Working with different vendors
Metadata Repository Integration Patterns
 Adapter
 Native
 Plug-in
 Caller
 Special
IBM Information Governance Catalog Integration
 Egeria’s IGC integration uses the
Adapter Pattern
 There are two connectors to IGC running
in the repository proxy server.
 They translate IGC APIs and events into
open metadata APIs and events.
 Egeria handles the interaction with the
cohort.
 No need to upgrade IGC to adopt
 Outbound metadata only
Information
Governance
Catalog
Repository
Proxy
Repository
Connector
Event
Mapper
Connector
Open Metadata Highway
ODPi Egeria
Apache Atlas Integration
 The Egeria community is working on a similar
integration for Apache Atlas.
 Again there are two connectors in the repository
proxy server.
 These connectors translate Atlas APIs and events
into open metadata APIs and events.
 Egeria handles the interaction with the cohort.
 No need to upgrade Atlas to adopt
 Two-way exchange of native Atlas metadata
Apache Atlas
Repository
Proxy
Repository
Connector
Event
Mapper
Connector
Open Metadata Highway
ODPi Egeria
Native Integration
 An alternative approach is the Native Pattern
 There are still two connectors. They translate
internal APIs and events into open metadata APIs
and events.
 ODPi Egeria handles the interaction with the cohort.
 The connectors and the ODPi Egeria libraries reside
in the metadata server.
 No additional server; less network traffic; upgrade
required.
Repository
Connector
Event
Mapper
Connector
Open Metadata Highway
ODPi Egeria
Metadata
Server
Plug-in Integration
 The plug-in pattern allows different repository back-ends
to be plugged into ODPi Egeria’s OMAG
Server.
 Egeria includes:
 In-memory Repository (Testing and demos)
 JanusGraph Repository (All scenarios)
 Supports the full protocol and fills in the gaps left by
the proprietary tools.
Repository
Connector
Open Metadata Highway
Open Metadata and
Governance (OMAG)
Server
COHORT PROTOCOL
Server registration and metadata exchange
First server
 The first server to join the cohort issues a registration request and waits for
others to join.
Establishing contact
 When another server joins the cohort they exchange registration information.
Federated queries
 Once the registration is complete the cohort members can query each other.
Caching metadata for availability and performance
 Metadata can also be replicated through the cohort
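The registration exchange described on the preceding slides can be simulated with a toy publish/subscribe topic. This is only a sketch of the idea: the class names, the event kinds REGISTRATION_REQUEST and REGISTRATION_REPLY and the in-process topic are invented for illustration and are not the event names or transport Egeria uses.

```python
class ToyCohortTopic:
    """In-process stand-in for the shared cohort topic (hypothetical)."""

    def __init__(self):
        self.members = []

    def publish(self, sender, event):
        # Deliver the event to every member except the one that sent it.
        for member in list(self.members):
            if member is not sender:
                member.on_event(event)


class ToyCohortMember:
    def __init__(self, server_name, metadata_collection_id, topic):
        self.server_name = server_name
        self.metadata_collection_id = metadata_collection_id
        self.topic = topic
        self.registry = {}  # metadata collection id -> server name of known members

    def _event(self, kind):
        return {"kind": kind,
                "metadata_collection_id": self.metadata_collection_id,
                "server_name": self.server_name}

    def join(self):
        """Announce this server to the cohort; with no other members, nothing answers yet."""
        self.topic.members.append(self)
        self.topic.publish(self, self._event("REGISTRATION_REQUEST"))

    def on_event(self, event):
        # Record the sender, and answer a request so the newcomer learns about us too.
        self.registry[event["metadata_collection_id"]] = event["server_name"]
        if event["kind"] == "REGISTRATION_REQUEST":
            self.topic.publish(self, self._event("REGISTRATION_REPLY"))


topic = ToyCohortTopic()
server1 = ToyCohortMember("server1", "mc-1", topic)
server2 = ToyCohortMember("server2", "mc-2", topic)
server3 = ToyCohortMember("server3", "mc-3", topic)
for server in (server1, server2, server3):
    server.join()

for server in (server1, server2, server3):
    print(server.server_name, "knows", sorted(server.registry.values()))
```

Once every member's registry is populated, each server knows which metadata collections exist in the cohort, which is the precondition for the federated queries and replication described above.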
OPEN METADATA TYPES
What is the scope of open metadata?
Scope of metadata covered
Glossary Collaboration
Governance
Models and
Reference Data
Metadata
Discovery
Lineage Data Assets
Base Types, Systems
and Infrastructure
Scope of metadata covered
[Figure: the open metadata type areas, numbered 0 to 7. Boxes shown:
Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics)
Governance Actions and Processes
Business Objects and Relationships, Taxonomies and Ontologies
Business Attributes
Organization
Teaming Metadata (people profiles, communities, projects, notebooks, …)
Models and Schemas
Physical Asset Descriptions (Data stores, APIs, models and components)
Asset Collections (Sets, Typed Sets, Type Organized Sets)
Information Views
Rights Management
Reference Data
Feedback Metadata (tags, comments, ratings, …)
Classification Schemes
Strategy
Subject Area Definition
Campaigns and Projects
Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …)
Information Process Instrumentation (design lineage)
Connectors
Basic Types, Infrastructure and Systems
Arrows between boxes are labelled Augmentation, Mapping, Implementation, Classification, Rollout, Instrument, Association and Access.]
USING DESIGN THINKING
Introducing Coco Pharmaceuticals
Search
Open Metadata Access Services
Design philosophy
Open Metadata Repository Services
Use cases,
Personas,
Practitioners
input
Data integration,
availability and
integrity best
practices
Coco Pharmaceuticals persona
Jules Keeper, CDO
Tessa Tube, Chief Researcher
Erin Overview, Information Architect
Faith Broker, Chief Privacy Officer
Bob Nitter, Integration Developer
Callie Quartile, Data Scientist
Nancy Noah, Cloud Specialist
Gary Geeke, IT Infrastructure
https://odpi.github.io/data-governance/coco-pharmaceuticals/personas/
Using design thinking
 Open Metadata Types
 Access Service Identification
 Samples and API design
 Best Practices
Different personas need different services
Callie Quartile, Data Scientist
Find data
Understand data
Manage analytics models
Jules Keeper, Chief Data Officer
Build data strategy
Define governance program
Monitor progress
Different personas need different services
Tanya Tidie, Clinical Trials Administrator
Maintain accurate patient records
Catalog clinical trials data
Demonstrate good data management practices
Ivor Padlock, Chief Security Officer
Understand risks to organization
Set up protection
Monitor for suspicious activity
Event-driven governance
Open
Metadata
New
Database
Assign
Owner
Classify
Data
Use
Data
Current Open Metadata Access Services (OMASs)
Project Management
Community Profile
Asset Catalog
Stewardship Action
Information View
Governance Program
Data Process
Subject Area
Connected Asset
Discovery Engine
Governance Engine
Data Protection
Software Developer
Data Platform
Asset Owner
Digital Architecture
Data Science
DevOps
Asset Consumer
Data Infrastructure
Data Privacy
Asset Lineage
Open Metadata Access Service (OMAS) instance
VIRTUAL DATA CONNECTOR
Using metadata to control access to data
Automating governance example
IBM
Information
Governance
Catalog
Apache Atlas
Apache Ranger
Gaian
Define
Policies
Hadoop
Metadata
Manage Data Access
Egeria Cohort
(Open metadata exchange and federated queries)
Access
Data
Egeria
Open
Governance APIs
configure
configure
Scared to share (example)
Faith Broker
Human Resources
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 ##### ### 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 ##### ### 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 ##### ### 27 Code St Harlem NY 1 3
Callie Quartile
Data Scientist
Very Sensitive Data
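The masking shown in the rows above can be sketched as a lookup from column to glossary term to classification. Everything in this example is an assumption for illustration: the column-to-term mapping, the choice of Annual Salary as the sensitive term, the sample row and the mask_row helper are not taken from Egeria or from the virtual data connector's implementation.

```python
# Assumed links from physical columns to glossary terms (illustrative only).
COLUMN_TERMS = {
    "EMPNAME": "Employee Name",
    "EMPNO": "Employee Id",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}

# Assumed set of glossary terms classified as sensitive data.
SENSITIVE_TERMS = {"Annual Salary"}


def mask_row(row, may_see_sensitive):
    """Return a copy of the row with sensitive columns masked for unauthorized readers."""
    masked = {}
    for column, value in row.items():
        term = COLUMN_TERMS.get(column)
        if term in SENSITIVE_TERMS and not may_see_sensitive:
            masked[column] = "#" * len(str(value))
        else:
            masked[column] = value
    return masked


row = {"EMPNAME": "Callie Quartile", "EMPNO": "0001", "JOBCODE": "Data Scientist",
       "SALARY": "56944"}

print(mask_row(row, may_see_sensitive=True))
print(mask_row(row, may_see_sensitive=False))
```

The design point is that the data store itself is unchanged: the decision to mask is driven entirely by metadata, so reclassifying a term changes what every consumer sees.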
What does metadata look like?
[Figure: business metadata (glossary terms including Employee, Employee Id, Employee Name, Job Title, Work Location, Annual Salary, Hourly Pay Rate and Manager Compensation Plan, connected by HAS-A and IS-A relationships, with a Sensitive Data classification) linked to structural metadata for a data store (EMPLOYEE RECORD with columns EMPNAME, EMPNO, JOBCODE, SALARY), shown above a sample data row]
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
EVOLUTION OF GOVERNANCE
Egeria guidance on governance
Governance maturity seen in terms of Value and Scope
Building governance maturity is a gradual process
 Organizations may operate different
levels of maturity in different parts of
their business.
 Choices determined by where the
most value lies.
 Many organizations aspire to provide
all employees with the data they need
(data citizenship*)
https://opengovernance.odpi.org/maturity-model/
Implementing Data Awareness
Implementing Governance Awareness
Implementing Embedded Governance
Implementing Business Driven Governance
Implementing Data Citizenship

More Related Content

What's hot

CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
confluent
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
ScyllaDB
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
Snowflake Automated Deployments / CI/CD Pipelines
Snowflake Automated Deployments / CI/CD PipelinesSnowflake Automated Deployments / CI/CD Pipelines
Snowflake Automated Deployments / CI/CD Pipelines
Drew Hansen
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
Sheetal Pratik
 
Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium
confluent
 
Data Mesh
Data MeshData Mesh
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
Kai Wähner
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
Taro L. Saito
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
DataStax Academy
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kai Wähner
 
ELT vs. ETL - How they’re different and why it matters
ELT vs. ETL - How they’re different and why it mattersELT vs. ETL - How they’re different and why it matters
ELT vs. ETL - How they’re different and why it matters
Matillion
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
Databricks
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
David Groozman
 
From my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debeziumFrom my sql to postgresql using kafka+debezium
From my sql to postgresql using kafka+debezium
Clement Demonchy
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
Prakash Chockalingam
 
An overview of Neo4j Internals
An overview of Neo4j InternalsAn overview of Neo4j Internals
An overview of Neo4j InternalsTobias Lindaaker
 

OSS NA 2019 - Demo Booth deck overview of Egeria

  • 2. How can we become more effective with data?
  • 3. The value of open, standardized metadata
  • 4. Using a metadata repository to describe data. Diagram label: Metadata Repository.
  • 5. Today’s reality – organizations buy lots of tools
  • 8. A new manifesto for metadata and governance
      • The maintenance of metadata must be automated to scale to the sheer volumes and variety of data involved in modern business. Similarly, metadata should be used to drive the governance of data and create a business-friendly logical interface to the data landscape.
      • The availability of metadata management must become ubiquitous in cloud platforms and large data platforms, such as Apache Hadoop, so that the processing engines on these platforms can rely on its availability and build capability around it.
      • Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms. This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata, and standard interfaces for manipulating metadata.
      • Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
  • 9. ODPi Egeria enables exchange of metadata between tools from different vendors. Diagram labels: Open and Unified Metadata; Development; DevOps; Data Science.
  • 10. EGERIA’S DISTRIBUTED VIRTUAL GRAPH: Uniting metadata from many tools
  • 11. ODPi Egeria enables exchange of metadata between tools. Diagram labels: Open and Unified Metadata; Development; DevOps; Data Science.
  • 12. Design philosophy. Diagram labels: Search; Open Metadata Access Services; Open Metadata Repository Services; Use cases, Personas, Practitioners input; Data integration, availability and integrity best practices.
  • 13. A Cohort of OMAG Servers. Diagram labels: Search; Open Metadata Repository Services; OMRS Cohort; Open Metadata Access Services; Open Metadata And Governance (OMAG) Server.
  • 14. Egeria Open Metadata Repository Services (OMRS)
      • The OMRS defines a protocol and a set of connectors.
      • The Enterprise Connector performs cohort-wide operations. This includes issuing queries to the cohort; when metadata is replicated from another server, it can use the local connector and repository to cache it for availability and performance.
      • The Local Connector performs local operations and provides a default Event Mapper that enables events relating to local operations to be sent to the cohort.
      • The Repository Connector interfaces to a specific repository and may optionally be accompanied by a custom Event Mapper.
      • Egeria provides two built-in repositories, and there are connectors to other repositories.
      • The interface to a repository connector is the MetadataCollection API, described on the next slide.
      • Diagram labels: OMRS Enterprise Connector; OMRS Local Connector & Event Mapper; OMRS Repository Connector; Repository; Cohort; MetadataCollection API.
  • 15. The OMRSMetadataCollection interface
      • The interface to an Egeria repository is the OMRSMetadataCollection interface.
      • It includes groups of operations:
          • Group 1: Identification of metadata repository - metadataCollectionId
          • Group 2: Type definitions (types, attributes) - add, find, get, remove, …
          • Group 3: Find instances (entities, relationships) - get, find, graph-queries, …
          • Group 4: Maintain instances (entities, relationships) - addEntity, deleteEntity, …
          • Group 5: Change control information (entities, relationships) - reIdentify, reHome, …
          • Group 6: Maintenance of reference (replica) copies - save, purge, refresh, …
  • 16. Egeria metadata – a distributed graph
      • Diagram: structural metadata for a data store (EMPLOYEE RECORD with fields EMPNAME, EMPNO, JOBCODE, SALARY) alongside business metadata (Employee, Work Location, Annual Salary, Job Title, Employee Id, Employee Name, Hourly Pay Rate, Manager, Compensation Plan, Sensitive Data), connected by HAS-A and IS-A relationships.
      • The interconnected nature of metadata forms a graph.
      • The distributed nature of Egeria leads to a distributed graph…
  • 17. Egeria distributed graph model
      • Diagram labels: Database Column; Glossary Term; OMAG Server 1; OMAG Server 2; Entity.
      • A pair of entities are stored in separate servers.
  • 18. Egeria distributed graph model
      • Diagram labels: Database Column; Glossary Term; Glossary Term; Meaning; OMAG Server 1; OMAG Server 2; Reference Copy; Relationship.
      • One entity could be replicated to the other server, as a ‘reference copy’.
      • The original Glossary Term on OMAG Server 2 is still the master.
      • A relationship could be defined between the local DB column and the reference copy of the Glossary Term.
  • 19. Egeria distributed graph model
      • Diagram labels: Database Column; Glossary Term; OMAG Server 1; OMAG Server 2; OMAG Server 3; Database Column; Glossary Term; Meaning.
      • Both entities could be replicated to a third server, as reference copies.
      • The originals are still the masters.
      • A relationship could be defined between the local reference copies.
  • 20. Egeria distributed graph model
      • Diagram labels: Database Column; Glossary Term; OMAG Server 1; OMAG Server 2; OMAG Server 3; Meaning; Database Column; Glossary Term; Entity Proxy.
      • Instead of replication, the third server could relate the original entities using entity proxies.
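The three arrangements on slides 18 to 20 can be sketched in a few lines of Python. This is an illustrative toy model only, not Egeria code: the `Server` and `Entity` classes, the method names and the `mc-1`, `col-1` style identifiers are all hypothetical. The one idea taken from the slides is that a replicated entity keeps the identifier of its home repository, so the original remains the master.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entity:
    guid: str
    type_name: str
    home_id: str            # metadataCollectionId of the home repository
    is_proxy: bool = False  # True when this server holds only a proxy


class Server:
    """Hypothetical stand-in for the local repository of one OMAG Server."""

    def __init__(self, metadata_collection_id):
        self.metadata_collection_id = metadata_collection_id
        self.entities = {}
        self.relationships = []

    def add_local(self, guid, type_name):
        entity = Entity(guid, type_name, self.metadata_collection_id)
        self.entities[guid] = entity
        return entity

    def save_reference_copy(self, entity):
        # The copy keeps the home repository's id, so the original stays the master.
        self.entities[entity.guid] = replace(entity, is_proxy=False)

    def save_proxy(self, entity):
        # Same identity and home id, but flagged as a proxy.
        self.entities[entity.guid] = replace(entity, is_proxy=True)

    def is_master(self, guid):
        return self.entities[guid].home_id == self.metadata_collection_id

    def relate(self, name, guid1, guid2):
        # Both ends must be known locally (as original, reference copy or proxy).
        for guid in (guid1, guid2):
            if guid not in self.entities:
                raise KeyError(f"{guid} is not known to this server")
        self.relationships.append((name, guid1, guid2))


server1, server2, server3 = Server("mc-1"), Server("mc-2"), Server("mc-3")
column = server1.add_local("col-1", "Database Column")
term = server2.add_local("term-1", "Glossary Term")

# Slide 18: replicate the term to server 1, then relate it to the local column.
server1.save_reference_copy(term)
server1.relate("Meaning", column.guid, term.guid)

# Slide 20: server 3 relates the two originals through entity proxies.
server3.save_proxy(column)
server3.save_proxy(term)
server3.relate("Meaning", column.guid, term.guid)
```

In this toy model, `server1.is_master("term-1")` is false because the Glossary Term's home is still `mc-2`, which mirrors the statement that the original on OMAG Server 2 remains the master.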
  • 21. Egeria Local Graph Repository
      • The Egeria distribution includes a persistent repository and a non-persistent repository.
      • The persistent repository is a graph repository built on JanusGraph.
      • JanusGraph is an open-source project, hosted by the Linux Foundation:
          • http://janusgraph.org
          • http://github.com/janusgraph/janusgraph
      • The built-in graph repository provides an OMAG Server with a persistent metadata store and is built using Egeria’s ‘plugin’ pattern.
      • The graph repository can store instances of metadata owned by the local server.
      • It can also store reference copies of metadata instances replicated to the local server.
      • It also supports relationship instances that refer to entity proxy instances.
  • 22. Anatomy of the local graph repository. Diagram labels: OMAG Server; OMAS – access services; OMRS Enterprise Connector; OMRS topics (in, out); Cohort; OMRS Local Connector & Event Mapper; OMRS Graph Connector; Graph Repository; Apache Tinkerpop; JanusGraph (persistence, search); JanusGraph Management.
  • 23. Graph Repository components
      • GraphOMRSRepositoryConnector - implements the open connector framework interface
      • GraphOMRSRepositoryConnectorProvider - implements the mechanism for brokering a connector
      • GraphOMRSMetadataCollection - top level interface supporting type and instance operations
      • GraphOMRSMetadataStore - implements the MetadataCollection using a graph database
      • GraphOMRSGraphFactory - creation, schema, indexing; encapsulates JanusGraph-specifics
      • Mappers - convert between OMRS objects and graph vertices and edges
          • GraphOMRSEntityMapper
          • GraphOMRSRelationshipMapper
          • GraphOMRSClassificationMapper
      • Plus various utility classes - error codes, audit logging, constants and utility methods
      • See https://github.com/odpi/egeria/ under open-metadata-implementation/adapters/open-connectors/repository-services-connectors/open-metadata-collection-store-connectors/graph-repository-connector
  • 24. To use the Egeria Graph Repository
      • Configure the OMAG Server repository-mode = ‘local-graph-repository’
          • e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/local-repository/mode/local-graph-repository
      • Subsequently, start the OMRS instance in the server
          • e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/instance
      • When OMRS starts, the graph repository auto-creates a JanusGraph database, including:
          • Persistence backend
          • Search backend
          • Graph schema
          • Search indexes
      • For now, the persistence backend is embedded Berkeley DB and the indexing backend is Lucene; further options could be added.
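The two HTTP POST calls above could be scripted along the following lines. This is a minimal sketch using only the Python standard library; the URL paths are the ones shown on the slide, while the helper names and the `alice` and `server1` values in the usage note are hypothetical. The slide does not say whether the endpoints expect a request body, headers or authentication, so the sketch assumes an empty body and nothing else.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Path prefix as shown on the slide.
ADMIN_ROOT = "/open-metadata/admin-services"


def admin_url(base, username, servername, suffix):
    """Build an admin-services URL for one user and one OMAG Server."""
    user = quote(username, safe="")
    server = quote(servername, safe="")
    return f"{base}{ADMIN_ROOT}/users/{user}/servers/{server}/{suffix}"


def graph_repository_requests(base, username, servername):
    """Return the two POST requests from the slide, in the order they are issued."""
    suffixes = [
        "local-repository/mode/local-graph-repository",  # select the graph repository
        "instance",                                      # start the OMRS instance
    ]
    return [
        # Assumption: an empty body is acceptable to both endpoints.
        Request(admin_url(base, username, servername, s), data=b"", method="POST")
        for s in suffixes
    ]


def configure_and_start(base, username, servername):
    """Send both requests; urlopen raises an exception on an HTTP error status."""
    for request in graph_repository_requests(base, username, servername):
        with urlopen(request) as response:
            print(request.full_url, response.status)
```

Calling `configure_and_start("http://localhost:8080", "alice", "server1")` would issue the two calls against a locally running server. Keeping URL construction separate from sending makes the requests easy to inspect without a server.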
  • 25. Graph Schema
      • The MetadataCollection interface is the formal interface to an Egeria repository. Whilst it is possible to look at the graph directly (e.g. using the Gremlin console), please don’t rely on the schema: it is likely to evolve.
      • Type data:
          • The Graph Repository does not store type definitions
          • It delegates all type operations to the Repository Content Manager
      • Instance data:
          • The Egeria Graph Repository stores instance data, using a JanusGraph schema that has:
              • vertices for entities and classifications
              • edges for relationships and classifiers
  • 27. Graph mapping – vertices and edges
      • Diagram: OMRS instance representation mapped to graph schema elements. The attributes of each instance (primitives, enums, collections) map to properties of the graph element.
          • Entity Instance: vertex, label : “entity”
          • Classification Instance: vertex, label : “classification”
          • Relationship Instance: edge, label : “relationship”
          • Classifier: edge, label : “classifier”
  • 28. Graph mapping – vertices and edges. Diagram labels: entity; classification; Properties.
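The mapping on slides 25 to 28 can be illustrated with plain dictionaries. This is a sketch under stated assumptions, not the Graph Repository's actual schema (which the slides warn is likely to evolve): the input dictionary shape, the function names and the `guid:name` identifier for classification vertices are hypothetical. Only the four labels and the vertex or edge choice come from the slides.

```python
def entity_to_graph(entity):
    """Map one entity and its classifications to (vertices, edges)."""
    vertices = [{
        "label": "entity",
        "id": entity["guid"],
        "properties": dict(entity.get("properties", {})),
    }]
    edges = []
    for classification in entity.get("classifications", []):
        # Hypothetical id scheme for the classification vertex.
        vertex_id = f'{entity["guid"]}:{classification["name"]}'
        vertices.append({
            "label": "classification",
            "id": vertex_id,
            "properties": dict(classification.get("properties", {})),
        })
        # A 'classifier' edge links the entity vertex to its classification vertex.
        edges.append({"label": "classifier", "from": entity["guid"], "to": vertex_id})
    return vertices, edges


def relationship_to_edge(relationship):
    """Map one relationship instance to a single 'relationship' edge."""
    return {
        "label": "relationship",
        "id": relationship["guid"],
        "from": relationship["end1"],
        "to": relationship["end2"],
        "properties": dict(relationship.get("properties", {})),
    }
```

For example, an entity with one classification named `Sensitive` yields two vertices (labels `entity` and `classification`) joined by one `classifier` edge.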
  • 29. Metadata Repository API
      • A MetadataCollection supports a comprehensive API
          • Metadata collection Id
          • Query types
          • Define/maintain types
          • Search/query metadata instances
          • Maintain metadata instances
          • Historical (as of time) queries
          • Effectivity dating
          • Versioning
          • Metadata
          • Advanced maintenance
          • Managing reference copies
      • Protocol is forgiving – allowing minimal capability - metadata instance search/query
  • 30. Local instances, reference copies and proxies
      • The graph contains one vertex per entity, whether the entity is local, a reference copy or a proxy.
      • The graph contains one edge per relationship, whether the relationship is local or a reference copy.
      • Reference Copies: the metadataCollectionId core attribute is set to the ‘guid’ of the home repository.
      • Entity Proxy objects: each entity instance has a vertex property of type Boolean, to indicate whether the instance is a proxy.
  • 31. https://github.com/odpi/egeria The MetadataCollection ‘graph-query’ methods • There are 4 sub-graph query methods: • getRelatedEntities(): returns the entity and its immediate neighbors • getEntityNeighborhood(): returns the entity and its neighbors up to the depth specified by the ‘level’ parameter • getLinkingEntities(): returns the relationships and intermediate entities that connect the specified pair of entities • getRelationshipsForEntity(): returns relationships associated with the entity, optionally filtered by relationship type and status [Diagram: example neighborhood with level = 2]
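The idea behind a depth-limited neighborhood query can be illustrated with a breadth-first traversal. This is only a sketch of the concept, not the signature or implementation of getEntityNeighborhood(); the function name, the adjacency-dictionary input and the returned shape are assumptions.

```python
from collections import deque


def entity_neighborhood(adjacency, start_guid, level):
    """Return {guid: depth} for every entity within `level` hops of start_guid.

    `adjacency` maps an entity guid to the guids it is related to.
    """
    depths = {start_guid: 0}
    queue = deque([start_guid])
    while queue:
        guid = queue.popleft()
        if depths[guid] == level:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(guid, ()):
            if neighbor not in depths:
                depths[neighbor] = depths[guid] + 1
                queue.append(neighbor)
    return depths


# A chain a - b - c - d, queried from "a" with level = 2.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = entity_neighborhood(chain, "a", 2)
```

With level = 2 the entity "d", three hops away, is left out of the result.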
  • 32. https://github.com/odpi/egeria Graph Repository – supported functions  The GraphRepository supports most of the OMRS MetadataCollection API, including:  Save and purge of reference copies  Use of entity proxies  Delete and restore as well as purge – delete is a soft, restorable delete; purge is permanent  Re-type of instances  Re-identify of instances  Re-home of instances  The four ‘graph queries’ – described on the previous slide  The ‘find’ methods – find..ByProperty, find..ByPropertyValue, findEntityByClassification  The Graph Repository does not (yet) support:  Historic queries – find methods that specify an asOfTime parameter  Undo of previous instance updates
  • 33. https://github.com/odpi/egeria Further Information  Please visit us at Booth #53 on the 4th floor  Project website:  https://www.odpi.org/projects/egeria  Open source repositories:  http://github.com/odpi/egeria  http://github.com/janusgraph/janusgraph 33
  • 34. https://github.com/odpi/egeria DEPLOYMENT PATTERNS 34 From large scale cloud services, on-premises local deployments to edge IoT devices
  • 35. https://github.com/odpi/egeria A hybrid multi-cloud world Data Lake Mobile Apps Databases ApplicationsFiles Independent metadata Repository Linked metadata Repositories Business Partners Sharing data IoT devices and systems Applications New applications deployed to cloud
  • 36. https://github.com/odpi/egeria Open metadata ecosystem Data Lake Mobile Apps Databases ApplicationsFiles Independent metadata Repository Linked metadata Repositories Business Partners Sharing data IoT devices and systems Applications New applications deployed to cloud
  • 37. https://github.com/odpi/egeria The OMAG Server Platform 37 OMAG Server Platform OMAG Server Platform OMAG Server Platform OMAG Server Platform Egeria Server 1 Egeria Server 2 Egeria Server 3 Kubernetes OMAG Server Platform Egeria Server 1 Egeria Server 2 Egeria Server 3 Multi-tenant OMAG Server Platform Egeria Server 1 Edge
  • 40. https://github.com/odpi/egeria Example of a simple cohort Cohort A Chief Data Office Data Lake Systems of Record 40 Virtualizer Security-Sync Data Bridge Apache Ranger Gaian Stewardship Stewardship Stewardship Data Onboarding
  • 43. https://github.com/odpi/egeria UI: good and the not so good. Not so good: • Confusing • Not my language (too technical or not technical enough) • Not meeting my needs • Not using my words • Mismatches my world view. Good: • Presented for my role • Logically flows to complete the tasks I do • Underpinned by relevant (persona specific) APIs • Someone from my role was involved in creating the UI.
  • 44. https://github.com/odpi/egeria UIs ODPi Egeria design 44 Search Open Metadata Access Services Open Metadata Repository Services 44 Use cases, Personas, Practitioners input Data integration, availability and integrity best practices ODPi Egeria Metadata repositories
  • 45. https://github.com/odpi/egeria UIs ODPi Egeria UI types 45 Open Metadata Access Services Open Metadata Repository Services 45 Search Daemon Type 1 OMAS only Type 2 OMAS and OCF Connector Type 3 OMRS Type 4 Daemon UI Data store
  • 46. https://github.com/odpi/egeria UIs ODPi Egeria UI types work in progress 4646 Search Type 1 OMAS only Type 2 OMAS and OCF Connector Type 3 OMRS Type 4 Daemon UI IBM creating Subject Area UI ING creating Asset Search IBM creating Type explorer and instance explorer ING creating Lineage viewer
  • 47. https://github.com/odpi/egeria Current UI implementation [Diagram: a web app, with its configuration, runs in Tomcat* and makes REST calls to the Egeria OMAG Server.] * Egeria UIs are coded to work with Tomcat. We expect other web servers will be used as the community requires and implements.
  • 48. https://github.com/odpi/egeria UI design – profile driven 48 Login Personal Profile User’s roles defines what UI capabilities a user should see Subject area Type explorer Asset Search Many more to come …….. Dealing well with potentially large amounts of data in a persona specific way is the challenge. E.g. by paging, limiting by neighborhood depth in graph calls
  • 49. https://github.com/odpi/egeria Egeria UI technology experiences • Polymer: web component technology providing web components. It is not a framework • + nice separation of components – hiding implementation in shadow DOM • + communicate with property binding • + support for events • + many existing paper and iron components for simple things. David’s (Polymer newbie) experiences: • - quirky – spent a lot of time finding the happy path to get things working, especially around web components not being initialized when you want to use them (a big frustration was trying to issue a REST call from the ready() method). • +/- need to be rigorous with architecture; it seems best to use one-way bindings and events and a top-level controller component to drive state transitions for MVC, e.g. around a grid. Redux may make sense to hold state and define state transitions • - There is no free commercial smart (editable) grid I can find (this seems true for other frameworks as well)
  • 50. https://github.com/odpi/egeria The sort of architecture more complex web components require. • Controller controls all transitions • The model allows data updates to occur on the model with simple CRUD operations • The model changes are then reflected into the view. Considerations: - Operations are currently synchronous. Redux would be asynchronous - Spinner would need to lock across the complete user interaction, not just the REST call - Changes to the view made by the user and changes to the view from the model need to be managed - Paging required.
  • 51. https://github.com/odpi/egeria Call for action! 51 Call to the community for open source UI developers! Be part of showing how powerful open metadata is using visualization! Fuel the ODPi rocket!
  • 54. https://github.com/odpi/egeria Using ODPi Egeria …  Eases the cost of metadata integration through  Comprehensive standards and libraries.  Active vendor recruitment program.  Provides direct support to many governance roles, filling the gaps between function offered through commercial tools.  Provides best practices and content packs to accelerate an organization’s journey to becoming data driven. 54
  • 55. https://github.com/odpi/egeria Egeria Conformance Program - it’s an “imitation game” Workbench Vendors that pass the conformance suite can display this mark
  • 57. https://github.com/odpi/egeria The ODPi is a non-profit that is part of The Linux Foundation  Delivering core technology  Recruiting vendors  Assisting practitioners 57 Vendors Practitioners Core Technology Conformance Suite Best Practices Project Egeria Project Data Governance
  • 58. https://github.com/odpi/egeria Links • Open source repositories • https://github.com/odpi/data-governance • https://github.com/odpi/egeria • Press releases and podcast • https://www.linuxfoundation.org/press-release/2018/08/odpi-announces-egeria-for-open-sharing-exchange-and-governance-of-metadata/ • https://www.linuxfoundation.org/press-release/2019/02/odpi-announces-new-egeria-conformance-program-to-advance-open-metadata-exchange-between-vendor-tools/ • https://roaringelephant.org/2018/09/25/episode-107-open-metadata-and-governance-masterclass-with-mandy-chessell-part-1/ • https://roaringelephant.org/2018/10/09/episode-109-open-metadata-and-governance-masterclass-with-mandy-chessell-part-2/ • https://youtu.be/ryd3KFWT1mc
  • 60. https://github.com/odpi/egeria Metadata Repository Integration Patterns  Adapter  Native  Plug-in  Caller  Special 60
  • 61. https://github.com/odpi/egeria IBM Information Governance Catalog Integration  Egeria’s IGC integration uses the Adapter Pattern  There are two connectors to IGC running in the repository proxy server.  They translate IGC APIs and events into open metadata APIs and events.  Egeria handles the interaction with the cohort.  No need to upgrade IGC to adopt  Outbound metadata only 61 Information Governance Catalog Repository Proxy Repository Connector Event Mapper Connector Open Metadata Highway ODPi Egeria
  • 62. https://github.com/odpi/egeria Apache Atlas Integration  The Egeria community is working on a similar integration for Apache Atlas.  Again there are two connectors in the repository proxy server.  These connectors translate Atlas APIs and events into open metadata APIs and events.  Egeria handles the interaction with the cohort.  No need to upgrade Atlas to adopt  Two-way exchange of native Atlas metadata 62 Apache Atlas Repository Proxy Repository Connector Event Mapper Connector Open Metadata Highway ODPi Egeria
  • 63. https://github.com/odpi/egeria Native Integration  An alternative approach is the Native Pattern  There are still two connectors. They translate internal APIs and events into open metadata APIs and events.  ODPi Egeria handles the interaction with the cohort.  The connectors and the ODPi Egeria libraries reside in the metadata server.  No additional server; less network traffic; upgrade required. 63 Repository Connector Event Mapper Connector Open Metadata Highway ODPi Egeria Metadata Server
  • 64. https://github.com/odpi/egeria Plug-in Integration  The plug-in pattern allows different repository back- ends to be plugged into the ODPi Egeria’s OMAG Server.  Egeria includes:  In-memory Repository (Testing and demos)  JanusGraph Repository (All scenarios)  Supports the full protocol and fills in the gaps left by the proprietary tools. 64 Repository Connector Open Metadata Highway Open Metadata and Governance (OMAG) Server
  • 66. https://github.com/odpi/egeria First server  The first server to join the cohort issues a registration request and waits for others to join. 66
  • 67. https://github.com/odpi/egeria Establishing contact  When another server joins the cohort they exchange registration information. 67
  • 68. https://github.com/odpi/egeria Federated queries  Once the registration is complete the cohort members can query each other. 68
  • 69. https://github.com/odpi/egeria Caching metadata for availability and performance  Metadata can also be replicated through the cohort 69
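The four steps on the preceding slides (first server, establishing contact, federated queries, caching) can be walked through with a toy model. In Egeria the registration exchange happens through events on a cohort topic; here it is replaced by direct method calls, and every name (`Member`, `Cohort`, `join`, `find`, `cache`) is an assumption made for illustration only.

```python
class Member:
    """Hypothetical cohort member holding local instances and cached copies."""

    def __init__(self, name, instances=None):
        self.name = name
        self.local = dict(instances or {})   # guid -> properties, homed here
        self.reference_copies = {}           # guid -> (home name, properties)
        self.peers = {}                      # registered peers, by name

    def find(self, guid):
        # Federated query: local store, then cached copies, then each peer.
        if guid in self.local:
            return self.name, self.local[guid]
        if guid in self.reference_copies:
            return self.reference_copies[guid]
        for peer in self.peers.values():
            if guid in peer.local:
                return peer.name, peer.local[guid]
        return None

    def cache(self, guid):
        # Keep a reference copy so the instance survives a peer outage.
        found = self.find(guid)
        if found is not None and found[0] != self.name:
            home, properties = found
            self.reference_copies[guid] = (home, dict(properties))
        return found


class Cohort:
    def __init__(self):
        self.members = []

    def join(self, member):
        # Existing members and the newcomer exchange registration details.
        for peer in self.members:
            peer.peers[member.name] = member
            member.peers[peer.name] = peer
        self.members.append(member)


cohort = Cohort()
cdo = Member("cdo", {"term-1": {"name": "Annual Salary"}})
lake = Member("lake")
cohort.join(cdo)    # first server: registers and waits for others
cohort.join(lake)   # second server: both sides now know each other
answer = lake.find("term-1")
lake.cache("term-1")
```

After `lake.cache("term-1")`, the lake member can still answer the query even if it loses contact with the `cdo` member.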
  • 71. https://github.com/odpi/egeria Scope of metadata covered Glossary Collaboration Governance Models and Reference Data Metadata Discovery Lineage Data Assets Base Types, Systems and Infrastructure 71
  • 72. https://github.com/odpi/egeria Scope of metadata covered Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics) Governance Actions and Processes Augmentation MappingImplementation Business Objects and Relationships, Taxonomies and Ontologies Business Attributes Organization Teaming Metadata (people profiles, communities, projects, notebooks, …) Models and Schemas 4 3 1 5 Physical Asset Descriptions (Data stores, APIs, models and components) Asset Collections (Sets, Typed Sets, Type Organized Sets) Information Views Rights Management Reference Data Feedback Metadata (tags, comments, ratings, …) ClassificationSchemes Classification Strategy Subject Area Definition Campaigns and Projects Rollout 2 Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …) Augmentation Instrument Association Information Process Instrumentation (design lineage) 6 7 ConnectorsBasic Types, Infrastructure and Systems Access 0 72
  • 74. https://github.com/odpi/egeria Search Open Metadata Access Services Design philosophy Open Metadata Repository Services 74 Use cases, Personas, Practitioners input Data integration, availability and integrity best practices
  • 75. https://github.com/odpi/egeria Coco Pharmaceuticals personas Jules Keeper, CDO; Tessa Tube, Chief Researcher; Erin Overview, Information Architect; Faith Broker, Chief Privacy Officer; Bob Nitter, Integration Developer; Callie Quartile, Data Scientist; Nancy Noah, Cloud Specialist; Gary Geeke, IT Infrastructure. https://odpi.github.io/data-governance/coco-pharmaceuticals/personas/
  • 76. https://github.com/odpi/egeria Using design thinking  Open Metadata Types  Access Service Identification  Samples and API design  Best Practices 76
  • 77. https://github.com/odpi/egeria Different personas need different services Callie Quartile Data Scientist Jules Keeper Chief Data Officer Find data Understand data Manage analytics models Build data strategy Define governance program Monitor progress 77
  • 78. https://github.com/odpi/egeria Different personas need different services Tanya Tidie Clinical Trials Administrator Ivor Padlock Chief Security Officer Maintain accurate patient records Catalog clinical trials data Demonstrate good data management practices Understand risks to organization Set up protection Monitor for suspicious activity 78
  • 80. https://github.com/odpi/egeria Current Open Metadata Access Services (OMASs) 80 Project Management Community ProfileAsset Catalog Stewardship Action Information View Governance Program Data Process Subject Area Connected Asset Discovery EngineGovernance Engine Data Protection Software Developer Data Platform Asset Owner Digital Architecture Data Science DevOps Asset Consumer Data Infrastructure Data Privacy Asset Lineage
  • 83. https://github.com/odpi/egeria Automating governance example IBM Information Governance Catalog Apache Atlas Apache Ranger Gaian Define Policies Hadoop Metadata Manage Data Access Egeria Cohort (Open metadata exchange and federated queries) Access Data Egeria Open Governance APIs configure configure 83
  • 84. https://github.com/odpi/egeria Scared to share (example) Faith Broker Human Resources 00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3 00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3 00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3 00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 ##### ### 27 Code St Harlem NY 1 3 00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 ##### ### 27 Code St Harlem NY 1 3 00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 ##### ### 27 Code St Harlem NY 1 3 Callie Quartile Data Scientist Very Sensitive DataVery Sensitive Data 84
  • 85. https://github.com/odpi/egeria What does metadata look like? Business metadata Structural metadata for a data store EMPNAME EMPNO JOBCODE SALARY EMPLOYEE RECORD Employee Work Location Annual Salary Job Title Employee Id Employee Name Hourly Pay Rate Manager Compensation Plan HAS-A HAS-A HAS-A HAS-A HAS-A HAS-A IS-A IS-A Sensitive IS-A Data 00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3 85
  • 86. https://github.com/odpi/egeria Automating governance example IBM Information Governance Catalog Apache Atlas Apache Ranger Gaian Define Policies Hadoop Metadata Manage Data Access Egeria Cohort (Open metadata exchange and federated queries) Access Data Egeria Open Governance APIs configure configure 86
  • 89. https://github.com/odpi/egeria Building governance maturity is a gradual process  Organizations may operate different levels of maturity in different parts of their business.  Choices determined by where the most value lies.  Many organizations aspire to provide all employees with the data they need (data citizenship*) 89 https://opengovernance.odpi.org/maturity-model/

Editor's Notes

  1. AUTOMATED – Metadata is created by the application at the same time as the data is created, in a standard manner that is easily consumable by all with the necessary permissions. For a photograph, for example: the device that took the picture, the name of the picture, the settings the picture was taken at, the location geo tag of the picture, etc. – all automatic, all done at data creation time.
  2. Egeria is an Open Source framework that can be used to provide a distributed, unified view of metadata from different sources, including different stores and tools from different vendors. Egeria creates a unified view of metadata residing in those tools and stores, so users can collaborate and share metadata, without needing to visit multiple tools or stores. Egeria does not attempt to consolidate the metadata into one repository or tool – it’s better to leave it in place - the current owners stay in control of their metadata, and it stays local to its native store or tool. Egeria provides an open type system, plus APIs, protocols, connectors and local metadata repositories.
  3. The internal architecture of Egeria has two distinct layers. The Open Metadata Access Services layer supports the different types of user and use case. The Open Metadata Repository Services layer provides the unified view of metadata across distinct systems, using protocols and repositories for access and exchange of metadata objects. Egeria’s OMRS layer includes the ability to refer to remote objects or replicate cached copies of remote objects for performance and availability. Egeria can store this distributed model in its own local repositories, which support the storing of local objects, replicas of remote objects and proxy-references to remote objects.
  4. This slide shows a physical embodiment of a cohort of OMAG Servers. An OMAG Server is a deployable unit of function, and each OMAG Server can be configured to run a set of OMAS services, to support a repository, or to combine these roles. An Egeria cohort is a collection of cooperating OMAG Servers. An OMAG Server may belong to multiple cohorts. The OMAS services are local to a server. Each server runs the set of OMAS services listed in its configuration – it is OK to run 0, 1 or multiple OMAS services in a server. Each OMAS is for a specific purpose or persona. The OMRS protocol layer is supported by all servers. The OMAG Servers use OMRS to access/exchange metadata across the cohort. A server shares its metadata over OMRS – sending an event each time a change occurs, or sending a query to other servers. A server may optionally maintain a local Egeria repository. A server may optionally connect to a 3rd party metadata repository. In a few slides we’ll see that the OMRS itself is composed of distinct layers that focus on cross-cohort (“Enterprise”) functions and Local functions.
  5. The role of OMRS is to provide a location transparent, unified view of metadata within a cohort. Cross-cohort operations are supported by the OMRS ‘Enterprise Connector’, including sending queries to the cohort and receiving the results, as well as receiving replicated metadata and saving copies via the local connector. Meanwhile the ‘Local Connector’ handles interactions with an (optional) local repository and provides a default event mapper that sends events when the local state changes. The OMRS protocol uses publish/subscribe over Kafka topics, but the communication/messaging system is pluggable so different transports could be used. The interface to the repository connector is the MetadataCollection API, which is described on the next slide.
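The publish/subscribe behavior described in this note can be sketched with an in-memory topic standing in for the pluggable transport. Nothing here is Egeria's API: `InMemoryTopic`, `LocalRepository`, the `NEW_ENTITY` event type and the event field names are all hypothetical, and a real deployment would use a Kafka topic rather than direct callbacks.

```python
class InMemoryTopic:
    """Hypothetical transport; stands in for a pluggable messaging system."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for callback in list(self.subscribers):
            callback(event)


class LocalRepository:
    def __init__(self, metadata_collection_id, topic):
        self.metadata_collection_id = metadata_collection_id
        self.topic = topic
        self.instances = {}          # instances homed in this repository
        self.reference_copies = {}   # replicas received from the cohort
        topic.subscribe(self.on_event)

    def save_local(self, guid, properties):
        # A local state change is followed by an outbound event.
        self.instances[guid] = dict(properties)
        self.topic.publish({
            "type": "NEW_ENTITY",    # hypothetical event type name
            "origin": self.metadata_collection_id,
            "guid": guid,
            "properties": dict(properties),
        })

    def on_event(self, event):
        if event["origin"] == self.metadata_collection_id:
            return  # ignore our own events
        # Save a reference copy that remembers its home repository.
        self.reference_copies[event["guid"]] = {
            "metadataCollectionId": event["origin"],
            **event["properties"],
        }


topic = InMemoryTopic()
repo_a = LocalRepository("A", topic)
repo_b = LocalRepository("B", topic)
repo_a.save_local("g1", {"name": "Employee Record"})
```

Because the repositories only depend on `subscribe` and `publish`, a different transport could be swapped in without touching them.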
  6. We’re not going to describe this interface in detail – but it’s worth being aware of it, especially as we’re going to talk later about the graph-queries in Group 3.
  7. Egeria’s model of metadata is graph-oriented, both at the business layer and beneath that in the structural metadata. Business metadata describes the data that the business needs, what it means and how it should be classified and protected. Structural metadata describes how the data is actually stored and labelled in the data store. The linkages within and between the business and technical metadata form a graph that can be used to switch between these two perspectives. One of the built-in repositories in Egeria is a graph repository: a natural fit for the metadata graph that also accommodates the distributed nature of OMRS. The Egeria local graph repository is built on the open-source JanusGraph graph database.
  8. It may not always be practical to replicate an instance. There are two occasions where using a proxy is advantageous: 1. An OMAS wants to save a relationship in a repository and the replication has not happened yet (or the set up is such that replication of that type is not enabled). 2. The repository does not support the full entity type but does support proxies (all proxies have the same storage requirement). A key point about the distributed graph is that whether the relationship refers to a replica entity or uses an entity proxy, it is location transparent. The Enterprise OMRS layer can select the repository in which to save an instance, based on capability and proximity.
  9. Egeria provides a persistent graph repository. It’s built using JanusGraph and currently uses version 0.3.1. JanusGraph is an open source project hosted by the Linux Foundation that supports the Apache Tinkerpop 3.3 interface. The Egeria graph repository is built using the Egeria ‘plugin’ repository pattern, in which the repository connector is both the connector and the implementation of the repository. The graph repository supports instances originating locally, instances replicated from a remote server and proxy instances.
  10. This slide shows (some of) the layers within an OMAG Server. We talked earlier about the access services and about the Enterprise Connector and Local Connectors within OMRS. Now we want to focus on the relationship between the Egeria graph repository connector and repository implementation (both in aqua-blue) and the JanusGraph code (in green). As far as possible the repository uses Apache Tinkerpop for graph operations. The reason is simply that, while we like JanusGraph, it is probably sensible to stay as close as possible to the Tinkerpop interface for possible future portability. There are some aspects of interacting with a graph database that are inherently implementation-specific – things like the configuration (e.g. of backends), schema and indexing. For these types of interaction it is necessary to use the JanusGraph Management interface.
  11. Whilst you could look inside the graph for debugging or development, please don’t write code that relies on the schema, as it is very likely to evolve. The graph does not contain type information – Egeria provides a repository helper that manages types. The graph is used to store instance data, as described in more detail on the following slides.
  12. Here is an example of a number of OMRS instance objects – there are two entities, that are connected by a relationship. Also, one of the entities has two classifications. All of the instances have attributes – some will be core attributes used for type or control information; others will be attributes that are specific to the instance type (known as type-defined attributes). You don’t need to remember this picture – we’ll stick a copy of it in the top corner so we can refer back to it…..
  13. Entities and classifications are vertices. Relationships and classifiers are edges. The graph schema defines labels for Entity, Relationship, Classification and Classifier. Vertex and edge properties are used to store OMRS instance data, which includes type, control and property information. Type is referenced by name, not linked by an edge; types are held in the repository content manager, not stored in the graph. Control information is stored in ‘core attribute’ properties. Instance properties are stored in serialized form and under unique custom keys to support search.
  15. Within Group 3 of the MDC API ….
  16. Experts in a field have their own jargon and ways of doing things. A search report writer is interested in assets, not security policies; a security policy author is not interested in assets. Each role has its own goals, tasks and associated artifacts.
  17. Type 1, OMAS only: e.g. Subject Area; the UI only uses the OMAS interfaces to communicate with Egeria. Type 2, OMAS and connector: e.g. VDC; metadata is obtained from Egeria using OMAS calls, and the actual data is accessed using an RDB connector. Type 3, OMRS-oriented UIs: e.g. Tex, used to explore Egeria types. Type 4, Daemon UIs: displaying lineage.
  19. For this to work we need to know hostnames, ports and URL structures. Configuration for Tomcat is via application.properties. Configuration of the server is held in a file and authored via admin REST calls.
  20. Example here is the glossary grid. A grid for authoring glossaries in the subject area UI. Work in progress
  21. ODPi
  22. Business metadata describes the data that the business needs, what it means and how it should be classified and protected. Structural metadata describes how the data is actually stored and labelled in the data store. The linkage between the business and technical metadata allows our technology to switch between these two perspectives. For example, a request for data expressed in business terminology can be translated into a query for data from a data store. An integration engine copying data into a sandbox can discover which fields the business classifies as sensitive and then mask these values dynamically.
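The masking scenario in this note can be sketched as follows. The column names and glossary terms are taken from the "What does metadata look like?" slide, but the pairing of each column with a term, the dictionary layout and the function name `mask_row` are assumptions for illustration; a real integration engine would look these links up through Egeria rather than in hard-coded dictionaries.

```python
# Structural metadata (column) linked to business metadata (glossary term).
# The specific pairings are assumed for this sketch.
COLUMN_TO_TERM = {
    "EMPNAME": "Employee Name",
    "EMPNO": "Employee Id",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}

# Business classification attached to glossary terms.
TERM_CLASSIFICATIONS = {"Annual Salary": {"Sensitive"}}


def mask_row(row, column_to_term, term_classifications):
    """Return a copy of `row` with values of sensitive columns masked."""
    masked = {}
    for column, value in row.items():
        term = column_to_term.get(column)
        if "Sensitive" in term_classifications.get(term, set()):
            masked[column] = "#" * len(str(value))
        else:
            masked[column] = value
    return masked


row = {"EMPNAME": "Callie Quartile", "EMPNO": 328080, "SALARY": 56944}
sandbox_row = mask_row(row, COLUMN_TO_TERM, TERM_CLASSIFICATIONS)
```

The decision to mask is driven entirely by the business classification, so reclassifying a glossary term changes what is masked without touching the copying code.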