https://github.com/odpi/egeria
FROM BIG DATA TO ACTION: HOW TO BREAK
OUT OF THE SILOS AND LEVERAGE DATA
GOVERNANCE FOR YOUR ORGANIZATION?
Open metadata and Governance
Introduction
• Who:
  • John Mertic – The Linux Foundation
  • Chris Replogle – SAS Institute
• What:
  • Using open practices and open software to govern your data.
How ING is becoming a metadata driven enterprise using Egeria
WHY ODPI EGERIA?
Open metadata and Governance
How can we become more effective with data?
The value of open, standardized metadata
Using a metadata repository to describe data
Metadata
Repository
Today’s reality – organizations buy lots of tools
A new manifesto for metadata and governance
• The maintenance of metadata must be automated to scale to the sheer volume and variety of data involved in modern business. Similarly, metadata should be used to drive the governance of data and to create a business-friendly logical interface to the data landscape.
• Metadata management must become ubiquitous in cloud platforms and large data platforms, such as Apache Hadoop, so that the processing engines on these platforms can rely on its availability and build capability around it.
• Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms. This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata, and standard interfaces for manipulating metadata.
• Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
ODPi Egeria enables exchange of metadata between tools
from different vendors
Open and
Unified Metadata
Development DevOps Data Science
EGERIA’S DISTRIBUTED VIRTUAL GRAPH
Uniting metadata from many tools
ODPi Egeria enables exchange of metadata between tools
Open and
Unified Metadata
Development DevOps Data Science
Search
Open Metadata Access Services
Design philosophy
Open Metadata Repository Services
Use cases,
Personas,
Practitioners
input
Data integration,
availability and
integrity best
practices
Search
A Cohort of OMAG Servers
Open Metadata Repository Services
OMRS Cohort
Open Metadata
Access Services
Open Metadata
Access Services Open Metadata
Access Services
Open Metadata
And Governance
(OMAG) Server
Egeria Open Metadata Repository Services (OMRS)
• The OMRS defines a protocol and a set of connectors.
• The Enterprise Connector performs cohort-wide operations. This includes issuing queries to the cohort. When metadata is replicated from another server, it can use the local connector and repository to cache that metadata for availability and performance.
• The Local Connector performs local operations and provides a default Event Mapper that enables events relating to local operations to be sent to the cohort.
• The Repository Connector interfaces to a specific repository and may optionally be accompanied by a custom Event Mapper.
• Egeria provides two built-in repositories, and there are connectors to other repositories.
• The interface to a repository connector is the MetadataCollection API, described on the next slide.
OMRS Enterprise Connector
OMRS Local Connector
& Event Mapper
OMRS Repository
Connector
Repository
Cohort
MetadataCollection
API
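The layering of connectors on this slide can be sketched roughly as follows. This is a minimal Python illustration and not Egeria's Java implementation: the class and method names (`InMemoryCollection`, `EnterpriseConnector`, `find_entities`, `save_reference_copy`) are hypothetical, and caching every remote query result locally is a simplifying assumption standing in for event-driven replication.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryCollection:
    """Hypothetical stand-in for one repository behind a MetadataCollection-style API."""
    metadata_collection_id: str
    entities: dict = field(default_factory=dict)  # guid -> entity dict

    def add_entity(self, guid, name):
        self.entities[guid] = {
            "guid": guid,
            "name": name,
            "metadataCollectionId": self.metadata_collection_id,  # home repository
        }

    def find_entities(self, text):
        return [e for e in self.entities.values() if text.lower() in e["name"].lower()]

    def save_reference_copy(self, entity):
        # Keep the home metadataCollectionId so the copy stays recognisable as a replica.
        self.entities.setdefault(entity["guid"], dict(entity))


class EnterpriseConnector:
    """Issues a query to every cohort member and caches remote results locally."""

    def __init__(self, local, remote_members):
        self.local = local
        self.remote_members = remote_members

    def find_entities(self, text):
        results = {e["guid"]: e for e in self.local.find_entities(text)}
        for member in self.remote_members:
            for entity in member.find_entities(text):
                results.setdefault(entity["guid"], entity)
                self.local.save_reference_copy(entity)
        return list(results.values())


server1 = InMemoryCollection("collection-1")
server2 = InMemoryCollection("collection-2")
server1.add_entity("guid-a", "EMPLOYEE.SALARY column")
server2.add_entity("guid-b", "Annual Salary")

enterprise = EnterpriseConnector(local=server1, remote_members=[server2])
found = enterprise.find_entities("salary")
```

After the query, `server1` holds a copy of `guid-b` whose `metadataCollectionId` still names `collection-2`, which is how a replica remains distinguishable from locally owned metadata.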
Egeria metadata – a distributed graph
[Diagram: structural metadata for a data store (EMPLOYEE RECORD with columns EMPNAME, EMPNO, JOBCODE, SALARY) linked to business metadata (Employee, Work Location, Annual Salary, Job Title, Employee Id, Employee Name, Hourly Pay Rate, Manager, Compensation Plan, Sensitive Data) by HAS-A and IS-A relationships]
• The interconnected nature of metadata forms a graph
• The distributed nature of Egeria leads to a distributed graph…
Egeria distributed graph model
Database
Column
Glossary
Term
OMAG Server 1 OMAG Server 2
Entity Entity
• A pair of entities are stored in separate servers
Egeria distributed graph model
Database
Column
Glossary
Term
Glossary
Term
Meaning
OMAG Server 1 OMAG Server 2
Reference
Copy
Relationship
• One entity could be replicated to the other server, as a ‘reference copy’
• The original Glossary Term on OMAG Server 2 is still the master
• A relationship could be defined between the local DB column and the reference copy of the Glossary Term
Egeria distributed graph model
Database
Column
Glossary
Term
OMAG Server 1
OMAG Server 3
OMAG Server 2
Database
Column
Glossary
Term
Meaning
• Both entities could be replicated to a third server, as reference copies
• The originals are still the masters
• A relationship could be defined between the local reference copies
Egeria distributed graph model
Database
Column
Glossary
Term
OMAG Server 1
OMAG Server 3
OMAG Server 2
Meaning
Database
Column
Glossary
Term
Entity
Proxy
• Instead of replication, the third server could relate the original entities using entity proxies
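The three ways an entity can be present on a server (as a local instance, as a reference copy, or as an entity proxy) can be illustrated with a small data model. This is a sketch under stated assumptions: the `Provenance`, `Entity`, `Relationship` and `relate` names are invented for the example and do not come from Egeria, and the rule that both ends must be known locally before a relationship is created is an assumption made to keep the example tight.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    LOCAL = "local"               # this server is the home of the instance
    REFERENCE_COPY = "reference"  # full replica; the home server stays the master
    PROXY = "proxy"               # minimal stub standing in for a remote entity


@dataclass(frozen=True)
class Entity:
    guid: str
    type_name: str
    home_server: str
    provenance: Provenance


@dataclass(frozen=True)
class Relationship:
    type_name: str
    end1: Entity
    end2: Entity


def relate(server, type_name, end1, end2, local_store):
    """Create a relationship on `server`; both ends must already be present there."""
    for end in (end1, end2):
        if end.guid not in local_store:
            raise ValueError(f"{end.guid} is not known on {server}")
    return Relationship(type_name, local_store[end1.guid], local_store[end2.guid])


column = Entity("col-1", "Database Column", "OMAG Server 1", Provenance.LOCAL)
term = Entity("term-1", "Glossary Term", "OMAG Server 2", Provenance.LOCAL)

# OMAG Server 3 owns neither entity, so in this scenario it holds proxies of both.
server3_store = {
    column.guid: Entity(column.guid, column.type_name, column.home_server, Provenance.PROXY),
    term.guid: Entity(term.guid, term.type_name, term.home_server, Provenance.PROXY),
}
meaning = relate("OMAG Server 3", "Meaning", column, term, server3_store)
```

Swapping `Provenance.PROXY` for `Provenance.REFERENCE_COPY` in `server3_store` gives the replication variant from the previous slide; in both cases `home_server` still points at the master.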
DEPLOYMENT PATTERNS
From large scale cloud services, on-premises local
deployments to edge IoT devices
A hybrid multi-cloud world
Data Lake
Mobile
Apps
Databases
Applications
Files
Independent
metadata
Repository
Linked
metadata
Repositories
Business Partners
Sharing data
IoT devices and
systems
Applications
New applications
deployed to cloud
Open metadata ecosystem
Data Lake
Mobile
Apps
Databases
Applications
Files
Independent
metadata
Repository
Linked
metadata
Repositories
Business Partners
Sharing data
IoT devices and
systems
Applications
New applications
deployed to cloud
The OMAG Server Platform
OMAG
Server
Platform
OMAG
Server
Platform
OMAG
Server
Platform
OMAG
Server
Platform
Egeria Server 1
Egeria Server 2
Egeria Server 3
Kubernetes
OMAG Server
Platform
Egeria
Server 1
Egeria
Server 2
Egeria
Server 3
Multi-tenant
OMAG Server
Platform
Egeria
Server 1
Edge
Metadata Tool Integration Patterns
Metadata Tool Integration Patterns
Example of a simple cohort
Cohort A
Chief Data Office
Data Lake
Systems of
Record
Virtualizer
Security-Sync
Data Bridge
Apache Ranger
Gaian
Stewardship
Stewardship
Stewardship
Data Onboarding
Metadata Tool
Integration Patterns
COHORT PROTOCOL
Server registration and metadata exchange
First server
• The first server to join the cohort issues a registration request and waits for others to join.
Establishing contact
• When another server joins the cohort, the servers exchange registration information.
Federated queries
• Once registration is complete, the cohort members can query each other.
Caching metadata for availability and performance
• Metadata can also be replicated through the cohort.
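The registration exchange described in these slides can be sketched as a tiny in-process simulation. Everything here is an assumption made for illustration: the `CohortTopic` and `CohortMember` classes, the event dictionaries and the event type names are hypothetical and are not Egeria's actual event formats.

```python
class CohortTopic:
    """Hypothetical in-process stand-in for the shared cohort topic."""

    def __init__(self):
        self.members = []

    def publish(self, sender, event):
        # Deliver the event to every member except the one that sent it.
        for member in self.members:
            if member is not sender:
                member.on_event(event)


class CohortMember:
    def __init__(self, name, topic):
        self.name = name
        self.topic = topic
        self.registry = set()  # names of the other members this server knows about

    def join(self):
        self.topic.members.append(self)
        # Registration request: announce ourselves to whoever is already listening.
        self.topic.publish(self, {"type": "REGISTRATION_REQUEST", "origin": self.name})

    def on_event(self, event):
        self.registry.add(event["origin"])
        if event["type"] == "REGISTRATION_REQUEST":
            # Reply so that the newcomer learns about this member too.
            self.topic.publish(self, {"type": "RE_REGISTRATION", "origin": self.name})


topic = CohortTopic()
s1 = CohortMember("server1", topic)
s2 = CohortMember("server2", topic)
s3 = CohortMember("server3", topic)

s1.join()
registry_after_first_join = set(s1.registry)  # nobody else has joined yet
s2.join()
s3.join()
```

The first member's registry stays empty until a second member joins. After that, every request is answered with a reply, so each member ends up knowing all the others and can direct federated queries to them.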
OPEN METADATA TYPES
What is the scope of open metadata?
Scope of metadata covered
Glossary Collaboration
Governance
Models and
Reference Data
Metadata
Discovery
Lineage Data Assets
Base Types, Systems
and Infrastructure
Scope of metadata covered
Policy Metadata (Principles,
Regulations, Standards,
Approaches, Rule Specifications,
Roles and Metrics)
Governance
Actions and
Processes
Augmentation
Mapping
Implementation
Business Objects and
Relationships, Taxonomies
and Ontologies
Business Attributes
Organization
Teaming Metadata
(people profiles,
communities, projects,
notebooks, …)
Models and Schemas
4
3
1
5
Physical Asset Descriptions
(Data stores, APIs,
models and components)
Asset Collections
(Sets, Typed Sets, Type
Organized Sets)
Information Views
Rights
Management
Reference Data
Feedback Metadata
(tags, comments, ratings, …)
Classification Schemes
Classification
Strategy Subject Area Definition
Campaigns and Projects
Rollout
2
Discovery
Metadata (profile data,
technical classification, data
classification,
data quality assessment, …)
Augmentation
Instrument
Association
Information Process
Instrumentation (design lineage)
6
7
Connectors
Basic Types, Infrastructure and Systems
Access
0
USING DESIGN THINKING
Introducing Coco Pharmaceuticals
Search
Open Metadata Access Services
Design philosophy
Open Metadata Repository Services
Use cases,
Personas,
Practitioners
input
Data integration,
availability and
integrity best
practices
Coco Pharmaceuticals persona
Jules Keeper, CDO
Tessa Tube, Chief Researcher
Erin Overview, Information Architect
Faith Broker, Chief Privacy Officer
Bob Nitter, Integration Developer
Callie Quartile, Data Scientist
Nancy Noah, Cloud Specialist
Gary Geeke, IT Infrastructure
https://odpi.github.io/data-governance/coco-pharmaceuticals/personas/
Using design thinking
• Open Metadata Types
• Access Service Identification
• Samples and API design
• Best Practices
Different personas need different services
Callie Quartile
Data Scientist
Jules Keeper
Chief Data Officer
Find data
Understand data
Manage analytics models
Build data strategy
Define governance program
Monitor progress
Different personas need different services
Tanya Tidie
Clinical Trials Administrator
Ivor Padlock
Chief Security Officer
Maintain accurate patient records
Catalog clinical trials data
Demonstrate good data management practices
Understand risks to organization
Set up protection
Monitor for suspicious activity
Event-driven governance
Open
Metadata
New
Database
Assign
Owner
Classify
Data
Use
Data
Current Open Metadata Access Services (OMASs)
Project Management
Community Profile
Asset Catalog
Stewardship Action
Information View
Governance Program
Data Process
Subject Area
Connected Asset
Discovery Engine
Governance Engine
Data Protection
Software Developer
Data Platform
Asset Owner
Digital Architecture
Data Science
DevOps
Asset Consumer
Data Infrastructure
Data Privacy
Asset Lineage
Open Metadata Access Service (OMAS) instance
EVOLUTION OF GOVERNANCE
Egeria guidance on governance
Governance maturity seen in terms of Value and Scope
Building governance maturity is a gradual process
• Organizations may operate at different levels of maturity in different parts of their business.
• Choices are determined by where the most value lies.
• Many organizations aspire to provide all employees with the data they need (data citizenship*)
https://opengovernance.odpi.org/maturity-model/
Implementing Data Awareness
Implementing Governance Awareness
Implementing Embedded Governance
Implementing Business Driven Governance
Implementing Data Citizenship
Further Information
• ODPi:
  • Website: https://www.odpi.org/
• ODPi / Egeria
  • Website: https://www.odpi.org/projects/egeria
  • Technical Information: https://egeria.odpi.org/
• ODPi Guidance on Governance
  • https://opengovernance.odpi.org/
• Open source repositories:
  • http://github.com/odpi/egeria
  • https://github.com/odpi/data-governance
ADDITIONAL INFORMATION
COMMUNITY AND ECOSYSTEM
Building a strong community for the future.
Open source dependencies
Spring Boot
Using ODPi Egeria …
• Eases the cost of metadata integration through
  • Comprehensive standards and libraries.
  • Active vendor recruitment program.
• Provides direct support to many governance roles, filling the gaps between the functions offered by commercial tools.
• Provides best practices and content packs to accelerate an organization’s journey to becoming data driven.
Egeria Conformance Program -
it’s an “imitation game”
Workbench
Vendors that pass the
conformance suite can
display this mark
Running the Conformance Suite
The ODPi is a non-profit that is part of The Linux Foundation
• Delivering core technology
• Recruiting vendors
• Assisting practitioners
Vendors
Practitioners
Core
Technology
Conformance
Suite
Best
Practices
Project
Egeria
Project
Data
Governance
Links
• Open source repositories
  • https://github.com/odpi/data-governance
  • https://github.com/odpi/egeria
• Press Releases and Podcast
  • https://www.linuxfoundation.org/press-release/2018/08/odpi-announces-egeria-for-open-sharing-exchange-and-governance-of-metadata/
  • https://www.linuxfoundation.org/press-release/2019/02/odpi-announces-new-egeria-conformance-program-to-advance-open-metadata-exchange-between-vendor-tools/
  • https://roaringelephant.org/2018/09/25/episode-107-open-metadata-and-governance-masterclass-with-mandy-chessell-part-1/
  • https://roaringelephant.org/2018/10/09/episode-109-open-metadata-and-governance-masterclass-with-mandy-chessell-part-2/
  • https://youtu.be/ryd3KFWT1mc
VIRTUAL DATA CONNECTOR
Using metadata to control access to data
Automating governance example
IBM
Information
Governance
Catalog
Apache Atlas
Apache Ranger
Gaian
Define
Policies
Hadoop
Metadata
Manage Data Access
Egeria Cohort
(Open metadata exchange and federated queries)
Access
Data
Egeria
Open
Governance APIs
configure
configure
Scared to share (example)
Faith Broker
Human Resources
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 ##### ### 27 Code St Harlem NY 1 3
00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 ##### ### 27 Code St Harlem NY 1 3
00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 ##### ### 27 Code St Harlem NY 1 3
Callie Quartile
Data Scientist
Very Sensitive Data
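The masked rows above suggest how metadata can drive access control: values are hidden when the column's business meaning is classified as sensitive and the requester is not cleared to see it. The following is a minimal sketch of that idea. The lookup tables, the `mask` and `view_for` functions and the choice of `SALARY` as the sensitive column are all assumptions for illustration; they are not taken from Egeria or from the virtual data connector itself.

```python
# Hypothetical metadata: which business term each column means, and which
# terms carry a sensitivity classification. None of these names come from Egeria.
COLUMN_MEANING = {
    "EMPNAME": "Employee Name",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}
SENSITIVE_TERMS = {"Annual Salary"}


def mask(value):
    """Replace every non-space character with '#', keeping the field's shape."""
    return "".join(" " if ch == " " else "#" for ch in value)


def view_for(row, may_see_sensitive):
    """Return the row as the requester is allowed to see it."""
    if may_see_sensitive:
        return dict(row)
    return {
        column: mask(value) if COLUMN_MEANING.get(column) in SENSITIVE_TERMS else value
        for column, value in row.items()
    }


row = {"EMPNAME": "Lemmie Stage", "JOBCODE": "DataStage Expert", "SALARY": "45324 300"}
hr_view = view_for(row, may_see_sensitive=True)
analyst_view = view_for(row, may_see_sensitive=False)
```

The point of the design is that the masking rule is attached to the glossary term rather than to the physical column, so any column linked to a sensitive term is protected without a per-table rule.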
What does metadata look like?
[Diagram: structural metadata for a data store (EMPLOYEE RECORD with columns EMPNAME, EMPNO, JOBCODE, SALARY) linked to business metadata (Employee, Work Location, Annual Salary, Job Title, Employee Id, Employee Name, Hourly Pay Rate, Manager, Compensation Plan, Sensitive Data) by HAS-A and IS-A relationships. Example record:]
00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3
INTEGRATING WITH PARTNERS
Working with different vendors
Metadata Repository Integration Patterns
• Adapter
• Native
• Plug-in
• Caller
• Special
IBM Information Governance Catalog Integration
• Egeria’s IGC integration uses the Adapter Pattern
• There are two connectors to IGC running in the repository proxy server.
• They translate IGC APIs and events into open metadata APIs and events.
• Egeria handles the interaction with the cohort.
• No need to upgrade IGC to adopt
• Outbound metadata only
Information
Governance
Catalog
Repository
Proxy
Repository
Connector
Event
Mapper
Connector
Open Metadata Highway
ODPi Egeria
Apache Atlas Integration
• The Egeria community is working on a similar integration for Apache Atlas.
• Again there are two connectors in the repository proxy server.
• These connectors translate Atlas APIs and events into open metadata APIs and events.
• Egeria handles the interaction with the cohort.
• No need to upgrade Atlas to adopt
• Two-way exchange of native Atlas metadata
Apache Atlas
Repository
Proxy
Repository
Connector
Event
Mapper
Connector
Open Metadata Highway
ODPi Egeria
Native Integration
• An alternative approach is the Native Pattern
• There are still two connectors. They translate internal APIs and events into open metadata APIs and events.
• ODPi Egeria handles the interaction with the cohort.
• The connectors and the ODPi Egeria libraries reside in the metadata server.
• No additional server; less network traffic; upgrade required.
Repository
Connector
Event
Mapper
Connector
Open Metadata Highway
ODPi Egeria
Metadata
Server
Plug-in Integration
• The plug-in pattern allows different repository back-ends to be plugged into ODPi Egeria’s OMAG Server.
• Egeria includes:
  • In-memory Repository (Testing and demos)
  • JanusGraph Repository (All scenarios)
• Supports the full protocol and fills in the gaps left by the proprietary tools.
Repository
Connector
Open Metadata Highway
Open Metadata and
Governance (OMAG)
Server
EGERIA LOCAL GRAPH REPOSITORY
The OMRSMetadataCollection interface
• The interface to an Egeria repository is the OMRSMetadataCollection interface
• It includes groups of operations:
  • Group 1: Identification of metadata repository - metadataCollectionId
  • Group 2: Type definitions (types, attributes) - add, find, get, remove, …
  • Group 3: Find instances (entities, relationships) - get, find, graph-queries, …
  • Group 4: Maintain instances (entities, relationships) - addEntity, deleteEntity, …
  • Group 5: Change control information (entities, relationships) - reIdentify, reHome, …
  • Group 6: Maintenance of reference (replica) copies – save, purge, refresh, …
Egeria Local Graph Repository
• The Egeria distribution includes a persistent repository and a non-persistent repository
• The persistent repository is a graph repository built on JanusGraph
  • JanusGraph is an open-source project, hosted by the Linux Foundation
  • http://janusgraph.org
  • http://github.com/janusgraph/janusgraph
• The built-in graph repository provides an OMAG Server with a persistent metadata store and is built using Egeria’s ‘plugin’ pattern
• The graph repository can store instances of metadata owned by the local server
• It can also store reference copies of metadata instances replicated to the local server
• It also supports relationship instances that refer to entity proxy instances
Anatomy of the local graph repository
Graph Repository
JanusGraph
persistence
search
OMAG Server
OMAS – access services
OMRS Enterprise Connector OMRS topics
in
out
Apache
Tinkerpop
OMRS Local Connector
& Event Mapper
OMRS Graph Connector
JanusGraph
Management
Cohort
Graph Repository components
• GraphOMRSRepositoryConnector – implements the open connector framework interface
• GraphOMRSRepositoryConnectorProvider – implements the mechanism for brokering a connector
• GraphOMRSMetadataCollection – top level interface supporting type and instance operations
• GraphOMRSMetadataStore – implements the MetadataCollection using a graph database
• GraphOMRSGraphFactory – creation, schema, indexing; encapsulates JanusGraph-specifics
• Mappers – convert between OMRS objects and graph vertices and edges
  • GraphOMRSEntityMapper
  • GraphOMRSRelationshipMapper
  • GraphOMRSClassificationMapper
• Plus various utility classes – error codes, audit logging, constants and utility methods
https://github.com/odpi/egeria/
See open-metadata-implementation/adapters/open-connectors/repository-services-connectors/
open-metadata-collection-store-connectors/graph-repository-connector
To use the Egeria Graph Repository
• Configure the OMAG Server with repository-mode = ‘local-graph-repository’
  • e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/local-repository/mode/local-graph-repository
• Subsequently, start the OMRS instance in the server
  • e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/instance
• When OMRS starts, the graph repository auto-creates a JanusGraph database, including:
  • Persistence backend
  • Search backend
  • Graph schema
  • Search indexes
• For now, the persistence backend is embedded Berkeley DB and the indexing backend is Lucene; further options could be added
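The two admin calls on this slide can be assembled with the Python standard library as shown below. This is a sketch, not an official client: the helper names are invented, the user and server names are placeholders, and the requests are only built, not sent, so the example runs without a live OMAG Server Platform.

```python
from urllib.parse import quote
from urllib.request import Request

PLATFORM_ROOT = "http://localhost:8080"  # host and port as shown on the slide


def admin_url(username, servername, suffix):
    """Build an admin-services URL following the pattern shown on the slide."""
    return (
        f"{PLATFORM_ROOT}/open-metadata/admin-services"
        f"/users/{quote(username, safe='')}/servers/{quote(servername, safe='')}/{suffix}"
    )


def build_requests(username, servername):
    """Build (but do not send) the two POST requests, in the order they are issued."""
    set_mode = Request(
        admin_url(username, servername, "local-repository/mode/local-graph-repository"),
        method="POST",
    )
    start_instance = Request(admin_url(username, servername, "instance"), method="POST")
    return [set_mode, start_instance]


# "admin-user" and "server1" are placeholder values for {username} and {servername}.
requests_to_send = build_requests("admin-user", "server1")
for request in requests_to_send:
    print(request.get_method(), request.full_url)
    # To send against a running platform: urllib.request.urlopen(request)
```

The order matters: the repository mode is part of the server's configuration, so it has to be set before the instance is started.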
Graph Schema
The MetadataCollection interface is the formal interface to an Egeria repository.
Whilst it is possible to look at the graph directly (e.g. using Gremlin console):
Please don’t rely on the schema – it is likely to evolve
Type data:
• The Graph Repository does not store type definitions
• It delegates all type operations to the Repository Content Manager
Instance data:
• The Egeria Graph Repository stores instance data, using a JanusGraph schema that has:
  • vertices for entities and classifications
  • edges for relationships and classifiers
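The mapping from instances to graph elements can be pictured with a toy property graph. This sketch does not use JanusGraph, and as the slide warns, the real schema should not be relied upon; `TinyGraph`, the property keys and the direction of the classifier edge are assumptions chosen for the illustration.

```python
class TinyGraph:
    """A toy property graph; JanusGraph itself is not used in this sketch."""

    def __init__(self):
        self.vertices = []  # dicts: {"id", "label", "properties"}
        self.edges = []     # dicts: {"label", "out", "in", "properties"}

    def add_vertex(self, label, **properties):
        vertex = {"id": len(self.vertices), "label": label, "properties": properties}
        self.vertices.append(vertex)
        return vertex

    def add_edge(self, label, out_vertex, in_vertex, **properties):
        edge = {
            "label": label,
            "out": out_vertex["id"],
            "in": in_vertex["id"],
            "properties": properties,
        }
        self.edges.append(edge)
        return edge


def store_entity(graph, guid, type_name, classifications=()):
    """One 'entity' vertex, plus a 'classification' vertex and 'classifier' edge each."""
    entity = graph.add_vertex("entity", guid=guid, typeName=type_name)
    for name in classifications:
        classification = graph.add_vertex("classification", name=name)
        graph.add_edge("classifier", entity, classification)
    return entity


def store_relationship(graph, type_name, end1, end2):
    """A relationship instance becomes a single 'relationship' edge."""
    return graph.add_edge("relationship", end1, end2, typeName=type_name)


graph = TinyGraph()
salary_column = store_entity(graph, "col-1", "Database Column", ["Sensitive"])
salary_term = store_entity(graph, "term-1", "Glossary Term")
store_relationship(graph, "Meaning", salary_column, salary_term)
```

Two entities, one classification and one relationship therefore produce three vertices and two edges.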
Instance representations in the OMRS
Graph mapping – vertices and edges
Classification
Instance
Entity
Instance
Relationship
Instance
Attributes
Primitives
Enums
Collections
Attributes
Attributes
Primitives
Enums
Collections
Primitives
Enums
Collections
label : “classification” label : “entity” label : “relationship”
Properties Properties Properties
vertex
label : “classifier”
Properties
OMRS instance
representation
Graph schema
element
vertex edge edge
Graph mapping – vertices and edges
Properties
Properties Properties
Properties
Properties
entity
entity
classification
classification
Metadata Repository API
• A MetadataCollection supports a comprehensive API
  • Metadata collection Id
  • Query types
  • Define/maintain types
  • Search/query metadata instances
  • Maintain metadata instances
  • Historical (as of time) queries
  • Effectivity dating
  • Versioning
  • Metadata
  • Advanced maintenance
  • Managing reference copies
• Protocol is forgiving – allowing minimal capability – metadata instance search/query
Local instances, reference copies and proxies
The graph contains one vertex per entity – whether the entity is local, a reference copy or a proxy
The graph contains one edge per relationship – whether the relationship is local or a reference copy
Reference Copies
• The metadataCollectionId core attribute is set to the ‘guid’ of the home repository
Entity Proxy objects
• Each entity instance has a vertex property of type Boolean, to indicate whether the instance is a proxy
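Given those two vertex properties, telling the three kinds of entity apart is a short check. The sketch below assumes a property key named `isProxy` for the Boolean flag; only `metadataCollectionId` is named in the text above, and the function name is invented for the example.

```python
def describe_entity_vertex(properties, local_collection_id):
    """Classify an entity vertex as 'proxy', 'reference copy' or 'local'."""
    if properties.get("isProxy", False):  # "isProxy" is an assumed key name
        return "proxy"
    if properties["metadataCollectionId"] != local_collection_id:
        return "reference copy"  # homed in another repository
    return "local"


LOCAL_COLLECTION_ID = "collection-1"
vertices = [
    {"guid": "col-1", "metadataCollectionId": "collection-1", "isProxy": False},
    {"guid": "term-1", "metadataCollectionId": "collection-2", "isProxy": False},
    {"guid": "term-2", "metadataCollectionId": "collection-2", "isProxy": True},
]
kinds = {v["guid"]: describe_entity_vertex(v, LOCAL_COLLECTION_ID) for v in vertices}
```

The proxy check comes first because a proxy also carries a foreign `metadataCollectionId` and would otherwise be reported as a reference copy.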
The MetadataCollection ‘graph-query’ methods
• There are 4 sub-graph query methods:
  • getRelatedEntities()
    • Returns the entity and its immediate neighbors
  • getEntityNeighborhood()
    • Returns the entity and its neighbors up to the depth specified by the ‘level’ parameter
  • getLinkingEntities()
    • Returns the relationships and intermediate entities that connect the specified pair of entities
  • getRelationshipsForEntity()
    • Returns relationships associated with the entity, optionally filtered by relationship type and status
level = 2
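The effect of the ‘level’ parameter can be shown with a plain breadth-first walk. This is an illustration of the idea behind a neighborhood query, not the repository's implementation: the function name, the tuple layout for relationships and the sample graph are assumptions made for the example.

```python
from collections import deque


def entity_neighborhood(relationships, start_guid, level):
    """Breadth-first walk returning the entities and relationships within `level` hops.

    `relationships` is a list of (relationship_guid, end1_guid, end2_guid) tuples.
    """
    adjacency = {}
    for rel_guid, end1, end2 in relationships:
        adjacency.setdefault(end1, []).append((rel_guid, end2))
        adjacency.setdefault(end2, []).append((rel_guid, end1))

    depth = {start_guid: 0}
    found_relationships = set()
    queue = deque([start_guid])
    while queue:
        current = queue.popleft()
        if depth[current] == level:
            continue  # do not expand beyond the requested depth
        for rel_guid, neighbour in adjacency.get(current, []):
            found_relationships.add(rel_guid)
            if neighbour not in depth:
                depth[neighbour] = depth[current] + 1
                queue.append(neighbour)
    return set(depth), found_relationships


relationships = [("r1", "A", "B"), ("r2", "B", "C"), ("r3", "C", "D"), ("r4", "A", "E")]
entities_l1, rels_l1 = entity_neighborhood(relationships, "A", level=1)
entities_l2, rels_l2 = entity_neighborhood(relationships, "A", level=2)
```

With `level=1` the walk stops at the immediate neighbors of `A`; with `level=2` it also reaches `C` through `B`, while `D` stays out of range.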
Graph Repository – supported functions
• The Graph Repository supports most of the OMRS MetadataCollection API, including:
  • Save and purge of reference copies
  • Use of entity proxies
  • Delete and restore as well as purge – delete is a soft, restorable delete; purge is permanent
  • Re-type of instances
  • Re-identify of instances
  • Re-home of instances
  • The four ‘graph queries’ – described on the previous slide
  • The ‘find’ methods – find..ByProperty, find..ByPropertyValue, findEntityByClassification
• The Graph Repository does not (yet) support:
  • Historic queries – find methods that specify an asOfTime parameter
  • Undo of previous instance updates
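The difference between a soft delete and a purge is easy to see in a small store. This sketch uses invented names (`InstanceStore` and its methods) and a simple status flag; it shows the behavior described above and makes no claim about how the Graph Repository tracks deleted instances internally.

```python
class InstanceStore:
    """Toy instance store distinguishing soft delete, restore and purge."""

    def __init__(self):
        self._instances = {}  # guid -> {"properties": ..., "status": "ACTIVE" | "DELETED"}

    def add(self, guid, properties):
        self._instances[guid] = {"properties": properties, "status": "ACTIVE"}

    def delete(self, guid):
        """Soft delete: the instance stays in the store and can be restored."""
        self._instances[guid]["status"] = "DELETED"

    def restore(self, guid):
        if self._instances[guid]["status"] != "DELETED":
            raise ValueError(f"{guid} is not deleted")
        self._instances[guid]["status"] = "ACTIVE"

    def purge(self, guid):
        """Permanent removal; there is nothing left to restore afterwards."""
        del self._instances[guid]

    def get(self, guid):
        instance = self._instances.get(guid)
        if instance is None or instance["status"] == "DELETED":
            return None
        return instance["properties"]


store = InstanceStore()
store.add("term-1", {"name": "Annual Salary"})
store.delete("term-1")
hidden = store.get("term-1")      # soft-deleted instances are not returned
store.restore("term-1")
restored = store.get("term-1")    # visible again after restore
store.purge("term-1")
gone = store.get("term-1")        # purged instances cannot come back
```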
USER INTERFACE DESIGN
Supporting business and technical people
UI: good and the not so good.
Confusing
Not my language
(too technical or not technical enough)
Not meeting my needs
Presented for my role
Logically flows to complete the
tasks I do.
Underpinned by relevant
(persona specific) APIs
Not using my words
Mismatches my world view
Someone from my role was involved in creating the UI.
UIs
ODPi Egeria design
Search
Open Metadata Access Services
Open Metadata Repository Services
Use cases,
Personas,
Practitioners
input
Data integration,
availability and
integrity best
practices
ODPi
Egeria
Metadata
repositories
UIs
ODPi Egeria UI types
Open Metadata Access Services
Open Metadata Repository Services
Search
Daemon
Type 1
OMAS only
Type 2
OMAS and OCF
Connector
Type 3
OMRS
Type 4
Daemon UI
Data
store
UIs
ODPi Egeria UI types work in progress
Search
Type 1
OMAS only
Type 2
OMAS and OCF
Connector
Type 3
OMRS
Type 4
Daemon UI
IBM creating
Subject Area UI
ING creating
Asset Search
IBM creating
Type explorer
and instance
explorer
ING creating
Lineage viewer
Tomcat *
• configuration
Current UI implementation
Web app
Egeria
OMAG Server
REST call
* Egeria UIs are coded to work with Tomcat. We expect other web servers will be used as the community requires and implements.
UI design – profile driven
Login
Personal
Profile
A user’s roles define which UI capabilities
that user should see
Subject
area
Type
explorer
Asset
Search
Many more to come ……..
Dealing well with potentially large amounts of data in a persona-specific way is the challenge, e.g. by paging or by limiting the neighborhood depth in graph calls
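Paging is the simpler of the two techniques mentioned here, and can be sketched in a few lines. The function names and the `start_from` / `page_size` parameters are illustrative assumptions rather than the signature of any Egeria call.

```python
def get_page(items, start_from, page_size):
    """Return one page of results plus the offset of the next page (or None)."""
    if start_from < 0 or page_size <= 0:
        raise ValueError("start_from must be >= 0 and page_size must be > 0")
    chunk = items[start_from:start_from + page_size]
    next_start = start_from + page_size
    return chunk, (next_start if next_start < len(items) else None)


def all_pages(items, page_size):
    """Yield successive pages until the result set is exhausted."""
    start = 0
    while start is not None:
        chunk, start = get_page(items, start, page_size)
        yield chunk


assets = [f"asset-{n}" for n in range(1, 8)]
first_page, next_start = get_page(assets, 0, 3)
pages = list(all_pages(assets, 3))
```

A UI would typically request only the first page and fetch further pages on demand, so that a persona browsing a large catalog never pulls the whole result set at once.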
Egeria UI technology experiences
• Polymer is a web component technology providing web components. It is not a framework
• + nice separation of components – hiding implementation in shadow DOM
• + communicate with property binding
• + support for events
• + many existing paper and iron components for simple things.
David’s (Polymer newbie) experiences:
• - quirky – spent a lot of time finding the happy path to get things working, especially around web components not being initialized when you want to use them (a big frustration was trying to issue a REST call from the ready() method).
• +/- need to be rigorous with architecture; it seems best to use one-way bindings and events and a top-level controller component to drive state transitions for MVC, e.g. around a grid. Redux may make sense to hold state and define state transitions
• - There is no free commercial smart (editable) grid I can find (this seems true for other frameworks as well)
The sort of architecture that more complex web components require.
• Controller controls all transitions
• The model accepts data updates through simple CRUD operations
• The model changes are then reflected into the view.
Considerations:
- Operations are currently synchronous. Redux would be asynchronous
- Spinner would need to lock across the complete user interaction, not just the REST call
- Changes to the view made by the user, and changes to the view from the model, need to be managed
- Paging required.
Call for action!
Call to the community for open source UI developers!
Be part of showing how powerful open metadata is using visualization!
Fuel the ODPi rocket!

More Related Content

What's hot

Leveraging Apache Spark and Delta Lake for Efficient Data Encryption at Scale
Leveraging Apache Spark and Delta Lake for Efficient Data Encryption at ScaleLeveraging Apache Spark and Delta Lake for Efficient Data Encryption at Scale
Leveraging Apache Spark and Delta Lake for Efficient Data Encryption at ScaleDatabricks
 
A Query Model for Ad Hoc Queries using a Scanning Architecture
A Query Model for Ad Hoc Queries using a Scanning ArchitectureA Query Model for Ad Hoc Queries using a Scanning Architecture
A Query Model for Ad Hoc Queries using a Scanning ArchitectureFlurry, Inc.
 
Splunk for db_connect
Splunk for db_connectSplunk for db_connect
Splunk for db_connectGreg Hanchin
 
Data Management Systems for Government Agencies - with CKAN
Data Management Systems for Government Agencies - with CKANData Management Systems for Government Agencies - with CKAN
Data Management Systems for Government Agencies - with CKANSteven De Costa
 
PERFORMANCE EVALUATION OF SOCIAL NETWORK ANALYSIS ALGORITHMS USING DISTRIBUTE...
PERFORMANCE EVALUATION OF SOCIAL NETWORK ANALYSIS ALGORITHMS USING DISTRIBUTE...PERFORMANCE EVALUATION OF SOCIAL NETWORK ANALYSIS ALGORITHMS USING DISTRIBUTE...
PERFORMANCE EVALUATION OF SOCIAL NETWORK ANALYSIS ALGORITHMS USING DISTRIBUTE...Journal For Research
 
Talend Open Studio Introduction - OSSCamp 2014
Talend Open Studio Introduction - OSSCamp 2014Talend Open Studio Introduction - OSSCamp 2014
Talend Open Studio Introduction - OSSCamp 2014OSSCube
 
Big data & hadoop framework
Big data & hadoop frameworkBig data & hadoop framework
Big data & hadoop frameworkTu Pham
 
File Repository on GAE
File Repository on GAEFile Repository on GAE
File Repository on GAElynneblue
 
Data analysis using hive ql & tableau
Data analysis using hive ql & tableauData analysis using hive ql & tableau
Data analysis using hive ql & tableaupkale1708
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnDatabricks
 
CKAN and Australian open data updates for Wikimedia - 7 October 2015
CKAN and Australian open data updates for Wikimedia - 7 October 2015CKAN and Australian open data updates for Wikimedia - 7 October 2015
CKAN and Australian open data updates for Wikimedia - 7 October 2015Steven De Costa
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Databricks
 
Databricks with R: Deep Dive
Databricks with R: Deep DiveDatabricks with R: Deep Dive
Databricks with R: Deep DiveDatabricks
 
Introduction To Hibernate
Introduction To HibernateIntroduction To Hibernate
Introduction To Hibernateashishkulkarni
 

FROM BIG DATA TO ACTION: HOW TO BREAK OUT OF THE SILOS AND LEVERAGE DATA GOVERNANCE FOR YOUR ORGANIZATION

  • 1. https://github.com/odpi/egeria FROM BIG DATA TO ACTION: HOW TO BREAK OUT OF THE SILOS AND LEVERAGE DATA GOVERNANCE FOR YOUR ORGANIZATION? Open Metadata and Governance
  • 2. Introduction. Who: John Mertic (The Linux Foundation) and Chris Replogle (SAS Institute). What: Using open practices and open software to govern your data.
  • 3. How ING is becoming a metadata-driven enterprise using Egeria
  • 5. How can we become more effective with data?
  • 6. The value of open, standardized metadata
  • 7. Using a metadata repository to describe data (diagram label: Metadata Repository)
  • 8. Today’s reality: organizations buy lots of tools
  • 11. A new manifesto for metadata and governance
      • The maintenance of metadata must be automated to scale to the sheer volume and variety of data involved in modern business. Similarly, metadata should be used to drive the governance of data and create a business-friendly logical interface to the data landscape.
      • Metadata management must become ubiquitous in cloud platforms and large data platforms, such as Apache Hadoop, so that the processing engines on these platforms can rely on its availability and build capability around it.
      • Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms. This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata, and standard interfaces for manipulating metadata.
      • Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
  • 12. ODPi Egeria enables exchange of metadata between tools from different vendors. Open and Unified Metadata: Development; DevOps; Data Science
  • 13. EGERIA’S DISTRIBUTED VIRTUAL GRAPH: Uniting metadata from many tools
  • 14. ODPi Egeria enables exchange of metadata between tools. Open and Unified Metadata: Development; DevOps; Data Science
  • 15. Design philosophy. Diagram labels: Search; Open Metadata Access Services; Open Metadata Repository Services; Use cases, personas, practitioners' input; Data integration, availability and integrity best practices
  • 16. A Cohort of OMAG Servers. Diagram labels: Search; Open Metadata Access Services (shown three times); Open Metadata Repository Services; OMRS Cohort; Open Metadata and Governance (OMAG) Server
  • 17. Egeria Open Metadata Repository Services (OMRS)
      • The OMRS defines a protocol and a set of connectors.
      • The Enterprise Connector performs cohort-wide operations. This includes issuing queries to the cohort; when metadata is replicated from another server, it can use the local connector and repository to cache it for availability and performance.
      • The Local Connector performs local operations and provides a default Event Mapper that enables events relating to local operations to be sent to the cohort.
      • The Repository Connector interfaces to a specific repository and may optionally be accompanied by a custom Event Mapper.
      • Egeria provides two built-in repositories, and there are connectors to other repositories.
      • The interface to a repository connector is the MetadataCollection API, described on the next slide.
      • Diagram labels: OMRS Enterprise Connector; OMRS Local Connector & Event Mapper; OMRS Repository Connector; Repository; Cohort; MetadataCollection API
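The division of labour between the enterprise connector and the repository connectors can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not Egeria's API: the class names `Entity`, `RepositoryConnector` and `EnterpriseConnector`, their methods, and the sample GUID are all hypothetical, and the real MetadataCollection interface is much richer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entity:
    guid: str          # unique identifier for the metadata element
    type_name: str
    home_server: str   # server that owns the original


class RepositoryConnector:
    """Hypothetical stand-in for a connector that wraps one repository."""

    def __init__(self, server_name: str) -> None:
        self.server_name = server_name
        self._store: Dict[str, Entity] = {}

    def save(self, entity: Entity) -> None:
        self._store[entity.guid] = entity

    def get(self, guid: str) -> Optional[Entity]:
        return self._store.get(guid)


class EnterpriseConnector:
    """Hypothetical cohort-wide lookup: local repository first, then remotes."""

    def __init__(self, local: RepositoryConnector,
                 remotes: List[RepositoryConnector]) -> None:
        self.local = local
        self.remotes = remotes

    def get(self, guid: str) -> Optional[Entity]:
        found = self.local.get(guid)
        if found is not None:
            return found
        for remote in self.remotes:
            found = remote.get(guid)
            if found is not None:
                # Cache a copy locally; home_server still names the owner.
                self.local.save(found)
                return found
        return None


server1 = RepositoryConnector("server1")
server2 = RepositoryConnector("server2")
server2.save(Entity("term-001", "GlossaryTerm", home_server="server2"))

enterprise = EnterpriseConnector(local=server1, remotes=[server2])
result = enterprise.get("term-001")
```

After the call, `server1` holds a cached copy of the glossary term, while `home_server` still records that `server2` owns it.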
  • 18. Egeria metadata: a distributed graph
      • Diagram labels: Business metadata (Employee; Employee Id; Employee Name; Job Title; Annual Salary; Hourly Pay Rate; Work Location; Manager; Compensation Plan; Sensitive Data); HAS-A and IS-A relationships; Structural metadata for a data store (EMPLOYEE RECORD: EMPNAME, EMPNO, JOBCODE, SALARY)
      • The interconnected nature of metadata forms a graph.
      • The distributed nature of Egeria leads to a distributed graph…
  • 19. Egeria distributed graph model. Diagram labels: Database Column (entity) on OMAG Server 1; Glossary Term (entity) on OMAG Server 2
      • A pair of entities is stored in separate servers.
  • 20. Egeria distributed graph model. Diagram labels: Database Column; Glossary Term; Glossary Term (reference copy); Meaning (relationship); OMAG Server 1; OMAG Server 2
      • One entity could be replicated to the other server as a "reference copy".
      • The original Glossary Term on OMAG Server 2 is still the master.
      • A relationship could be defined between the local DB column and the reference copy of the Glossary Term.
  • 21. Egeria distributed graph model. Diagram labels: Database Column; Glossary Term; Meaning; OMAG Server 1; OMAG Server 2; OMAG Server 3
      • Both entities could be replicated to a third server as reference copies.
      • The originals are still the masters.
      • A relationship could be defined between the local reference copies.
  • 22. Egeria distributed graph model. Diagram labels: Database Column; Glossary Term; Meaning; Entity Proxy; OMAG Server 1; OMAG Server 2; OMAG Server 3
      • Instead of replication, the third server could relate the original entities using entity proxies.
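The difference between a master, a reference copy and an entity proxy in slides 19 to 22 can be made concrete with a small Python sketch. Everything here is hypothetical and for illustration only: the `Server`, `StoredEntity` and `Provenance` names, the method names, the GUIDs and the sample property values are assumptions, not Egeria's types. The sketch assumes a reference copy carries the full properties of the original while a proxy carries only enough to identify it.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, List, Tuple


class Provenance(Enum):
    MASTER = "master"
    REFERENCE_COPY = "reference copy"
    PROXY = "entity proxy"


@dataclass(frozen=True)
class StoredEntity:
    guid: str
    type_name: str
    home_server: str                       # where the master lives
    provenance: Provenance
    properties: Tuple[Tuple[str, str], ...] = ()


class Server:
    """Hypothetical metadata server holding entities and relationships."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entities: Dict[str, StoredEntity] = {}
        self.relationships: List[Tuple[str, str, str]] = []

    def create(self, guid: str, type_name: str, **properties: str) -> StoredEntity:
        entity = StoredEntity(guid, type_name, self.name, Provenance.MASTER,
                              tuple(sorted(properties.items())))
        self.entities[guid] = entity
        return entity

    def save_reference_copy(self, master: StoredEntity) -> None:
        # Full read-only copy; the home server is unchanged.
        self.entities[master.guid] = replace(
            master, provenance=Provenance.REFERENCE_COPY)

    def save_proxy(self, master: StoredEntity) -> None:
        # Identity only: enough to anchor one end of a relationship.
        self.entities[master.guid] = StoredEntity(
            master.guid, master.type_name, master.home_server, Provenance.PROXY)

    def relate(self, end1_guid: str, label: str, end2_guid: str) -> None:
        if end1_guid not in self.entities or end2_guid not in self.entities:
            raise KeyError("both ends must be known to this server")
        self.relationships.append((end1_guid, label, end2_guid))


server1 = Server("OMAG Server 1")
server2 = Server("OMAG Server 2")
server3 = Server("OMAG Server 3")

column = server1.create("col-1", "DatabaseColumn", name="EMPNO")
term = server2.create("term-1", "GlossaryTerm", name="Employee Id")

# Slide 20: server 1 keeps a reference copy of the term and links to it.
server1.save_reference_copy(term)
server1.relate("col-1", "Meaning", "term-1")

# Slide 22: server 3 links the two originals through proxies.
server3.save_proxy(column)
server3.save_proxy(term)
server3.relate("col-1", "Meaning", "term-1")
```

In both cases the Glossary Term on OMAG Server 2 remains the master; only the amount of detail held by the other server differs.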
  • 23. DEPLOYMENT PATTERNS: from large-scale cloud services and on-premises local deployments to edge IoT devices
  • 24. A hybrid multi-cloud world. Diagram labels: Data Lake; Mobile Apps; Databases; Applications; Files; Independent metadata repository; Linked metadata repositories; Business partners sharing data; IoT devices and systems; Applications; New applications deployed to cloud
  • 25. Open metadata ecosystem. Diagram labels: Data Lake; Mobile Apps; Databases; Applications; Files; Independent metadata repository; Linked metadata repositories; Business partners sharing data; IoT devices and systems; Applications; New applications deployed to cloud
  • 26. The OMAG Server Platform. Diagram labels: Kubernetes (OMAG Server Platforms hosting Egeria Server 1, Egeria Server 2 and Egeria Server 3); Multi-tenant (OMAG Server Platform hosting Egeria Server 1, Egeria Server 2 and Egeria Server 3); Edge (OMAG Server Platform hosting Egeria Server 1)
  • 29. Example of a simple cohort. Diagram labels: Cohort A; Chief Data Office; Data Lake; Systems of Record; Virtualizer; Security-Sync; Data Bridge; Apache Ranger; Gaian; Stewardship (shown three times); Data Onboarding
  • 32. First server: The first server to join the cohort issues a registration request and waits for others to join.
  • 33. Establishing contact: When another server joins the cohort, the two exchange registration information.
  • 34. Federated queries: Once registration is complete, the cohort members can query each other.
  • 35. Caching metadata for availability and performance: Metadata can also be replicated through the cohort.
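The four steps in slides 32 to 35 (register, exchange registrations, query, cache) can be sketched as a toy in-memory model. The `Cohort` and `CohortMember` classes, their methods and the sample asset are hypothetical names chosen for illustration; the sketch says nothing about how Egeria actually transports registration events between servers.

```python
from typing import Dict, List, Optional


class CohortMember:
    """Hypothetical server that can register with peers and query them."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.known_members: Dict[str, "CohortMember"] = {}
        self.metadata: Dict[str, str] = {}   # guid -> description (owned here)
        self.cache: Dict[str, str] = {}      # guid -> description (replicated)

    def on_registration(self, other: "CohortMember") -> None:
        self.known_members[other.name] = other

    def query(self, guid: str) -> Optional[str]:
        if guid in self.metadata:
            return self.metadata[guid]
        if guid in self.cache:
            return self.cache[guid]
        # Federated query: ask each registered member in turn.
        for member in self.known_members.values():
            value = member.metadata.get(guid)
            if value is not None:
                self.cache[guid] = value     # replicate for later lookups
                return value
        return None


class Cohort:
    """Hypothetical registry that introduces new members to existing ones."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members: List[CohortMember] = []

    def join(self, newcomer: CohortMember) -> None:
        for existing in self.members:
            existing.on_registration(newcomer)
            newcomer.on_registration(existing)
        self.members.append(newcomer)


cohort = Cohort("Cohort A")
cdo = CohortMember("Chief Data Office")
lake = CohortMember("Data Lake")

cohort.join(cdo)     # first server: nobody to exchange registrations with yet
cohort.join(lake)    # second server: both sides learn about each other

lake.metadata["asset-42"] = "customer table"
answer = cdo.query("asset-42")
```

The first query reaches the Data Lake server; later queries for the same GUID are served from the Chief Data Office server's local cache.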
  • 37. Scope of metadata covered: Glossary; Collaboration; Governance; Models and Reference Data; Metadata Discovery; Lineage; Data Assets; Base Types, Systems and Infrastructure
  • 38. Scope of metadata covered. Diagram labels (the diagram numbers its areas 0 to 7): Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics); Governance Actions and Processes; Augmentation; Mapping; Implementation; Business Objects and Relationships, Taxonomies and Ontologies; Business Attributes; Organization; Teaming Metadata (people profiles, communities, projects, notebooks, …); Models and Schemas; Physical Asset Descriptions (Data stores, APIs, models and components); Asset Collections (Sets, Typed Sets, Type Organized Sets); Information Views; Rights Management; Reference Data; Feedback Metadata (tags, comments, ratings, …); Classification Schemes; Classification Strategy; Subject Area Definition; Campaigns and Projects; Rollout; Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …); Augmentation; Instrument; Association; Information Process Instrumentation (design lineage); Connectors; Basic Types, Infrastructure and Systems; Access
  • 40. https://github.com/odpi/egeria Search Open Metadata Access Services Design philosophy Open Metadata Repository Services 40 Use cases, Personas, Practitioners input Data integration, availability and integrity best practices
  • 41. https://github.com/odpi/egeria Coco Pharmaceuticals personas: Jules Keeper, CDO; Tessa Tube, Chief Researcher; Erin Overview, Information Architect; Faith Broker, Chief Privacy Officer; Bob Nitter, Integration Developer; Callie Quartile, Data Scientist; Nancy Noah, Cloud Specialist; Gary Geeke, IT Infrastructure. https://odpi.github.io/data-governance/coco-pharmaceuticals/personas/ 41
  • 42. https://github.com/odpi/egeria Using design thinking  Open Metadata Types  Access Service Identification  Samples and API design  Best Practices 42
  • 43. https://github.com/odpi/egeria Different personas need different services Callie Quartile Data Scientist Jules Keeper Chief Data Officer Find data Understand data Manage analytics models Build data strategy Define governance program Monitor progress 43
  • 44. https://github.com/odpi/egeria Different personas need different services Tanya Tidie Clinical Trials Administrator Ivor Padlock Chief Security Officer Maintain accurate patient records Catalog clinical trials data Demonstrate good data management practices Understand risks to organization Set up protection Monitor for suspicious activity 44
  • 46. https://github.com/odpi/egeria Current Open Metadata Access Services (OMASs): Project Management, Community Profile, Asset Catalog, Stewardship Action, Information View, Governance Program, Data Process, Subject Area, Connected Asset, Discovery Engine, Governance Engine, Data Protection, Software Developer, Data Platform, Asset Owner, Digital Architecture, Data Science, DevOps, Asset Consumer, Data Infrastructure, Data Privacy, Asset Lineage 46
  • 50. https://github.com/odpi/egeria Building governance maturity is a gradual process  Organizations may operate different levels of maturity in different parts of their business.  Choices determined by where the most value lies.  Many organizations aspire to provide all employees with the data they need (data citizenship*) 50 https://opengovernance.odpi.org/maturity-model/
  • 56. https://github.com/odpi/egeria Further Information  ODPi :  Website: https://www.odpi.org/  ODPi / Egeria  Website: https://www.odpi.org/projects/egeria  Technical Information: https://egeria.odpi.org/  ODPi Guidance on Governance  https://opengovernance.odpi.org/  Open source repositories:  http://github.com/odpi/egeria  https://github.com/odpi/data-governance 56
  • 60. https://github.com/odpi/egeria Using ODPi Egeria …  Eases the cost of metadata integration through  Comprehensive standards and libraries.  Active vendor recruitment program.  Provides direct support to many governance roles, filling the gaps between function offered through commercial tools.  Provides best practices and content packs to accelerate an organization’s journey to becoming data driven. 60
  • 61. https://github.com/odpi/egeria Egeria Conformance Program - it's an “imitation game” 61 Workbench Vendors that pass the conformance suite can display this mark
  • 63. https://github.com/odpi/egeria The ODPi is a non-profit that is part of The Linux Foundation  Delivering core technology  Recruiting vendors  Assisting practitioners 63 Vendors Practitioners Core Technology Conformance Suite Best Practices Project Egeria Project Data Governance
  • 64. https://github.com/odpi/egeria Links  Press Releases and Podcast  Open source repositories • https://github.com/odpi/data-governance • https://github.com/odpi/egeria • https://www.linuxfoundation.org/press-release/2018/08/odpi-announces-egeria-for-open-sharing-exchange-and-governance-of-metadata/ • https://www.linuxfoundation.org/press-release/2019/02/odpi-announces-new-egeria-conformance-program-to-advance-open-metadata-exchange-between-vendor-tools/ • https://roaringelephant.org/2018/09/25/episode-107-open-metadata-and-governance-masterclass-with-mandy-chessell-part-1/ • https://roaringelephant.org/2018/10/09/episode-109-open-metadata-and-governance-masterclass-with-mandy-chessell-part-2/ • https://youtu.be/ryd3KFWT1mc 64
  • 66. https://github.com/odpi/egeria Automating governance example IBM Information Governance Catalog Apache Atlas Apache Ranger Gaian Define Policies Hadoop Metadata Manage Data Access Egeria Cohort (Open metadata exchange and federated queries) Access Data Egeria Open Governance APIs configure configure 66
  • 67. https://github.com/odpi/egeria Scared to share (example) Faith Broker Human Resources 00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3 00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 56944 045 27 Code St Harlem NY 1 3 00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 43800 215 27 Code St Harlem NY 1 3 00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 ##### ### 27 Code St Harlem NY 1 3 00 3809890 3 7 Callie Quartile 328080 7432 5 New York 4 27 Data Scientist 1 ##### ### 27 Code St Harlem NY 1 3 00 3809890 1 7 Tanya Tidie 209482 4051 2 New York 4 27 Data Steward 1 ##### ### 27 Code St Harlem NY 1 3 Callie Quartile Data Scientist Very Sensitive DataVery Sensitive Data 67
  • 68. https://github.com/odpi/egeria What does metadata look like? Business metadata Structural metadata for a data store EMPNAME EMPNO JOBCODE SALARY EMPLOYEE RECORD Employee Work Location Annual Salary Job Title Employee Id Employee Name Hourly Pay Rate Manager Compensation Plan HAS-A HAS-A HAS-A HAS-A HAS-A HAS-A IS-A IS-A Sensitive IS-A Data 00 3809890 6 7 Lemmie Stage 818928 3082 4 New York 4 27 DataStage Expert 1 45324 300 27 Code St Harlem NY 1 3 68
  • 69. https://github.com/odpi/egeria Automating governance example IBM Information Governance Catalog Apache Atlas Apache Ranger Gaian Define Policies Hadoop Metadata Manage Data Access Egeria Cohort (Open metadata exchange and federated queries) Access Data Egeria Open Governance APIs configure configure 69
  • 71. https://github.com/odpi/egeria Metadata Repository Integration Patterns  Adapter  Native  Plug-in  Caller  Special 71
  • 72. https://github.com/odpi/egeria IBM Information Governance Catalog Integration  Egeria’s IGC integration uses the Adapter Pattern  There are two connectors to IGC running in the repository proxy server.  They translate IGC APIs and events into open metadata APIs and events.  Egeria handles the interaction with the cohort.  No need to upgrade IGC to adopt  Outbound metadata only 72 Information Governance Catalog Repository Proxy Repository Connector Event Mapper Connector Open Metadata Highway ODPi Egeria
  • 73. https://github.com/odpi/egeria Apache Atlas Integration  The Egeria community is working on a similar integration for Apache Atlas.  Again there are two connectors in the repository proxy server.  These connectors translate Atlas APIs and events into open metadata APIs and events.  Egeria handles the interaction with the cohort.  No need to upgrade Atlas to adopt  Two-way exchange of native Atlas metadata 73 Apache Atlas Repository Proxy Repository Connector Event Mapper Connector Open Metadata Highway ODPi Egeria
  • 74. https://github.com/odpi/egeria Native Integration  An alternative approach is the Native Pattern  There are still two connectors. They translate internal APIs and events into open metadata APIs and events.  ODPi Egeria handles the interaction with the cohort.  The connectors and the ODPi Egeria libraries reside in the metadata server.  No additional server; less network traffic; upgrade required. 74 Repository Connector Event Mapper Connector Open Metadata Highway ODPi Egeria Metadata Server
  • 75. https://github.com/odpi/egeria Plug-in Integration  The plug-in pattern allows different repository back- ends to be plugged into the ODPi Egeria’s OMAG Server.  Egeria includes:  In-memory Repository (Testing and demos)  JanusGraph Repository (All scenarios)  Supports the full protocol and fills in the gaps left by the proprietary tools. 75 Repository Connector Open Metadata Highway Open Metadata and Governance (OMAG) Server
  • 77. https://github.com/odpi/egeria The OMRSMetadataCollection interface  The interface to an Egeria repository is the OMRSMetadataCollection interface  It includes groups of operations:  Group 1: Identification of metadata repository - metadataCollectionId  Group 2: Type definitions (types, attributes) - add, find, get, remove, …  Group 3: Find instances (entities, relationships) - get, find, graph-queries, …  Group 4: Maintain instances (entities, relationships) - addEntity, deleteEntity, …  Group 5: Change control information (entities, relationships) - reIdentify, reHome, …  Group 6: Maintenance of reference (replica) copies – save, purge, refresh,…
  • 78. https://github.com/odpi/egeria Egeria Local Graph Repository  The Egeria distribution includes a persistent repository and a non-persistent repository  The persistent repository is a graph repository built on JanusGraph  JanusGraph is an open-source project, hosted by the Linux Foundation  http://janusgraph.org  http://github.com/janusgraph/janusgraph  The built-in graph repository provides an OMAG Server with a persistent metadata store and is built using Egeria’s ‘plugin’ pattern  The graph repository can store instances of metadata owned by the local server  It can also store reference copies of metadata instances replicated to the local server  It also supports relationship instances that refer to entity proxy instances
  • 79. https://github.com/odpi/egeria Anatomy of the local graph repository 79 Graph Repository JanusGraph persistence search OMAG Server OMAS – access services OMRS Enterprise Connector OMRS topics in out Apache Tinkerpop OMRS Local Connector & Event Mapper OMRS Graph Connector JanusGraph Management Cohort
  • 80. https://github.com/odpi/egeria Graph Repository components  GraphOMRSRepositoryConnector - implements the open connector framework interface  GraphOMRSRepositoryConnectorProvider – implements the mechanism for brokering a connector  GraphOMRSMetadataCollection – top level interface supporting type and instance operations  GraphOMRSMetadataStore – implements the MetadataCollection using a graph database  GraphOMRSGraphFactory – creation, schema, indexing - encapsulates JanusGraph-specifics  Mappers – convert between OMRS objects and graph vertices and edges  GraphOMRSEntityMapper  GraphOMRSRelationshipMapper  GraphOMRSClassificationMapper  Plus various utility classes – error codes, audit logging, constants and utility methods https://github.com/odpi/egeria/ See open-metadata-implementation/adapters/open-connectors/repository-services-connectors/ open-metadata-collection-store-connectors/graph-repository-connector
  • 81. https://github.com/odpi/egeria To use the Egeria Graph Repository  Configure the OMAG Server repository-mode = ‘local-graph-repository’  e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/local-repository/mode/local-graph-repository  Subsequently, start the OMRS instance in the server  e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/instance  When OMRS starts, the graph repository auto-creates a JanusGraph database – including:  Persistence backend  Search backend  Graph schema  Search indexes  For now, the persistence backend is embedded Berkeley DB and the indexing backend is Lucene – further options could be added
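The two admin calls on slide 81 can be sketched with the Python standard library. The URL paths are taken from the slide; everything else is an assumption made for illustration: the function names are hypothetical, the POST is sent with an empty body, and the slide does not say whether the real endpoints expect headers, a body or authentication.

```python
import urllib.request

# Base URL as shown on the slide; adjust host and port for a real deployment.
BASE = "http://localhost:8080/open-metadata/admin-services"


def repository_mode_url(user, server, base=BASE):
    """Build the slide's URL that selects the local graph repository."""
    return (f"{base}/users/{user}/servers/{server}"
            "/local-repository/mode/local-graph-repository")


def start_instance_url(user, server, base=BASE):
    """Build the slide's URL that starts the server instance."""
    return f"{base}/users/{user}/servers/{server}/instance"


def post(url):
    """Send an empty-bodied HTTP POST (an assumption) and return the status."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def configure_and_start(user, server):
    # Order matters: choose the repository mode first, then start the instance.
    return [post(repository_mode_url(user, server)),
            post(start_instance_url(user, server))]
```

Calling `configure_and_start("someUser", "someServer")` (placeholder values) only works against a running OMAG Server; the URL builders can be checked without one.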
  • 82. https://github.com/odpi/egeria Graph Schema The MetadataCollection interface is the formal interface to an Egeria repository. Whilst it is possible to look at the graph directly (e.g. using Gremlin console): Please don’t rely on the schema – it is likely to evolve Type data:  The Graph Repository does not store type definitions  It delegates all type operations to the Repository Content Manager Instance data:  The Egeria Graph Repository stores instance data, using a JanusGraph schema that has:  vertices for entities and classifications  edges for relationships and classifiers
  • 84. https://github.com/odpi/egeria Graph mapping – vertices and edges Classification Instance Entity Instance Relationship Instance Attributes Primitives Enums Collections AttributesAttributes Primitives Enums Collections Primitives Enums Collections label : “classification” label : “entity” label : “relationship” Properties Properties Properties vertex label : “classifier” Properties OMRSinstance representation Graphschema element vertex edge edge
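The mapping on this slide (entities and classifications become vertices; relationships and classifiers become edges) can be illustrated with plain dictionaries. This is a minimal sketch, not the Graph Repository's mapper code: the input shape, the `to_graph` function, the way classification vertex ids are formed and the direction of the classifier edge are all assumptions made for the example. Only the four labels come from the slide.

```python
def to_graph(entities, relationships):
    """Map toy instances onto labelled vertices and edges (illustrative only)."""
    vertices, edges = [], []
    for entity in entities:
        vertices.append({"label": "entity", "id": entity["guid"]})
        for name in entity.get("classifications", []):
            # Each classification gets its own vertex, tied to its entity
            # by a 'classifier' edge.
            classification_id = f"{entity['guid']}/{name}"
            vertices.append({"label": "classification", "id": classification_id})
            edges.append({"label": "classifier",
                          "from": entity["guid"], "to": classification_id})
    for relationship in relationships:
        edges.append({"label": "relationship",
                      "from": relationship["end1"], "to": relationship["end2"]})
    return vertices, edges


# Two entities joined by one relationship; one entity has two classifications.
entities = [
    {"guid": "e1", "classifications": ["Confidentiality", "Retention"]},
    {"guid": "e2"},
]
relationships = [{"guid": "r1", "end1": "e1", "end2": "e2"}]
vertices, edges = to_graph(entities, relationships)
```

The example input yields four vertices (two entities, two classifications) and three edges (one relationship, two classifiers).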
  • 85. https://github.com/odpi/egeria Graph mapping – vertices and edges Properties Properties Properties Properties Properties entity entity classification classification
  • 86. https://github.com/odpi/egeria Metadata Repository API  A MetadataCollection supports a comprehensive API  Metadata collection Id  Query types  Define/maintain types  Search/query metadata instances  Maintain metadata instances  Historical (as of time) queries  Effectivity dating  Versioning  Metadata  Advanced maintenance  Managing reference copies  Protocol is forgiving – allowing minimal capability - metadata instance search/query 86
  • 87. https://github.com/odpi/egeria Local instances, reference copies and proxies 87 The graph contains one vertex per entity – whether the entity is local, a reference copy or a proxy The graph contains one edge per relationship – whether the relationship is local or a reference copy Reference Copies • The metadataCollectionId core attribute is set to the ‘guid’ of the home repository Entity Proxy objects • Each entity instance has a vertex property of type Boolean, to indicate whether the instance is a proxy
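The distinction between local instances, reference copies and proxies can be sketched as a small classifier over a vertex's properties. The slide only states that `metadataCollectionId` holds the home repository's guid and that a Boolean property marks proxies; the property name `isProxy`, the dictionary shape and the `describe_vertex` function are hypothetical.

```python
def describe_vertex(vertex, local_collection_id):
    """Classify a stored entity vertex as 'proxy', 'reference copy' or 'local'."""
    if vertex.get("isProxy", False):
        # A proxy is a stub that stands in for an entity held elsewhere.
        return "proxy"
    if vertex["metadataCollectionId"] != local_collection_id:
        # Full copy of an instance whose home is another repository.
        return "reference copy"
    return "local"


local_id = "collection-A"
kinds = [
    describe_vertex({"metadataCollectionId": "collection-A"}, local_id),
    describe_vertex({"metadataCollectionId": "collection-B"}, local_id),
    describe_vertex({"metadataCollectionId": "collection-B", "isProxy": True}, local_id),
]
```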
  • 88. https://github.com/odpi/egeria The MetadataCollection ‘graph-query’ methods  There are 4 sub-graph query methods:  getRelatedEntities()  Returns the entity and its immediate neighbors  getEntityNeighborhood()  Returns the entity and its neighbors up to the depth specified by the ‘level’ parameter  getLinkingEntities()  Returns the relationships and intermediate entities that connect the specified pair of entities  getRelationshipsForEntity()  Returns relationships associated with entity, optionally filtered by relationship type and status level = 2
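The idea behind a depth-limited neighborhood query can be shown with a breadth-first walk over an adjacency map. This is a sketch of the general technique only: `entity_neighborhood` and its arguments are hypothetical, it returns just the entities reached and their distance, and it says nothing about how `getEntityNeighborhood()` is actually implemented or what it returns.

```python
from collections import deque


def entity_neighborhood(adjacency, start, level):
    """Return {entity: distance} for every entity within `level` hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == level:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return distance


# A simple chain A - B - C - D, listed in both directions.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
within_one = entity_neighborhood(adjacency, "A", 1)
within_two = entity_neighborhood(adjacency, "A", 2)
```

With `level` set to 1 the walk stops at the immediate neighbors, which corresponds to the scope described for `getRelatedEntities()`.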
  • 89. https://github.com/odpi/egeria Graph Repository – supported functions  The GraphRepository supports most of the OMRS MetadataCollection API, including:  Save and purge of reference copies  Use of entity proxies  Delete and restore as well as purge – delete is a soft, restorable delete; purge is permanent  Re-type of instances  Re-identify of instances  Re-home of instances  The four ‘graph queries’ – described on the previous slide  The ‘find’ methods – find..ByProperty, find..ByPropertyValue, findEntityByClassification  The Graph Repository does not (yet) support:  Historic queries – find methods that specify an asOfTime parameter  Undo of previous instance updates
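The difference between a soft, restorable delete and a permanent purge can be sketched with a small in-memory store. The `InstanceStore` class, its method names and the status strings are hypothetical and chosen only to illustrate the behaviour the slide describes.

```python
class InstanceStore:
    """Toy store showing soft delete, restore and purge (illustrative only)."""

    def __init__(self):
        self._instances = {}

    def add(self, guid, properties):
        self._instances[guid] = {"properties": properties, "status": "ACTIVE"}

    def get(self, guid):
        # Soft-deleted instances are hidden from normal reads.
        instance = self._instances.get(guid)
        if instance is None or instance["status"] == "DELETED":
            return None
        return instance["properties"]

    def delete(self, guid):
        # Soft delete: keep the data, mark it as deleted.
        self._instances[guid]["status"] = "DELETED"

    def restore(self, guid):
        # Only possible because delete() kept the data.
        self._instances[guid]["status"] = "ACTIVE"

    def purge(self, guid):
        # Permanent removal: nothing is left to restore.
        del self._instances[guid]


store = InstanceStore()
store.add("g1", {"name": "Employee Record"})
store.delete("g1")
hidden = store.get("g1")
store.restore("g1")
restored = store.get("g1")
store.purge("g1")
gone = store.get("g1")
```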
  • 91. https://github.com/odpi/egeria UI: good and the not so good. 91 Confusing Not my language (too technical or not technical enough) Not meeting my needs Presented for my role Logically flows to complete the tasks I do. Underpinned by relevant (persona specific) APIs Not using my words Mismatches my world view Someone from my role was involved In creating the UI.
  • 92. https://github.com/odpi/egeria UIs ODPi Egeria design 92 Search Open Metadata Access Services Open Metadata Repository Services 92 Use cases, Personas, Practitioners input Data integration, availability and integrity best practices ODPi Egeria Metadata repositories
  • 93. https://github.com/odpi/egeria UIs ODPi Egeria UI types 93 Open Metadata Access Services Open Metadata Repository Services 93 Search Daemon Type 1 OMAS only Type 2 OMAS and OCF Connector Type 3 OMRS Type 4 Daemon UI Data store
  • 94. https://github.com/odpi/egeria UIs ODPi Egeria UI types work in progress 9494 Search Type 1 OMAS only Type 2 OMAS and OCF Connector Type 3 OMRS Type 4 Daemon UI IBM creating Subject Area UI ING creating Asset Search IBM creating Type explorer and instance explorer ING creating Lineage viewer
  • 95. https://github.com/odpi/egeria Tomcat * • configuration Current UI implementation 95 Web app Egeria OMAG Server REST call * Egeria UIs are coded to work with Tomcat. We expect other web servers will be used as the community requires and implements.
  • 96. https://github.com/odpi/egeria UI design – profile driven 96 Login Personal Profile User’s roles defines what UI capabilities a user should see Subject area Type explorer Asset Search Many more to come …….. Dealing well with potentially large amounts of data in a persona specific way is the challenge. E.g. by paging, limiting by neighborhood depth in graph calls
  • 97. https://github.com/odpi/egeria Egeria UI technology experiences 97 • Web component technology providing web components. It is not a framework • + nice separation of components – hiding implementation in shadow DOM • + communicate with property binding • + support for events • + many existing paper and iron components for simple things. David’s (Polymer newbie) experiences: • - quirky – spent a lot of time finding the happy path to get things working, especially around web components not being initialized when you want to use them (a big frustration was trying to issue a REST call from the ready() method). • +/- need to be rigorous with architecture; it seems best to use one-way bindings and events and a top-level controller component to drive state transitions for MVC, e.g. around a grid. Redux may make sense to hold state and define state transitions • - There is no free commercial smart (editable) grid I can find (this seems true for other frameworks as well)
  • 98. https://github.com/odpi/egeria The sort of architecture more complex web components require. 98 • Controller controls all transitions • The model allows data updates to occur on the model with simple CRUD operations • The model changes are then reflected into the view. Considerations: - Operations are currently synchronous. Redux would be asynchronous - Spinner would need to lock across the complete User interaction not just the rest call - Changes to the view made by the user and changes to the view from the model, need to be managed - Paging required.
  • 99. https://github.com/odpi/egeria Call for action! 99 Call to the community for open source UI developers! Be part of showing how powerful open metadata is using visualization! Fuel the ODPi rocket!

Editor's Notes

  1. 3 minute video gives a great intro into the why/how… let this lead us forward.
  2. AUTOMATED – Metadata is created by the application at the same time as the data is created, in a standard manner that is easily consumable by everyone with the necessary permissions. For example: the device that took the picture / name of picture / settings the picture was taken at / location geotag of picture etc – all automatic – all done at the time the data is created.
  3. Egeria is an Open Source framework that can be used to provide a distributed, unified view of metadata from different sources, including different stores and tools from different vendors. Egeria creates a unified view of metadata residing in those tools and stores, so users can collaborate and share metadata, without needing to visit multiple tools or stores. Egeria does not attempt to consolidate the metadata into one repository or tool – it’s better to leave it in place - the current owners stay in control of their metadata, and it stays local to its native store or tool. Egeria provides an open type system, plus APIs, protocols, connectors and local metadata repositories.
  4. The internal architecture of Egeria has two distinct layers. The Open Metadata Access Services layer supports the different types of user and use case. The Open Metadata Repository Services layer provides the unified view of metadata across distinct systems, using protocols and repositories for access and exchange of metadata objects. Egeria’s OMRS layer includes the ability to refer to remote objects or replicate cached copies of remote objects for performance and availability. Egeria can store this distributed model in its own local repositories, which support the storing of: local objects, replicas of remote objects and proxy-references to remote objects.
  5. This slide shows a physical embodiment of a cohort of OMAG Servers. An OMAG Server is a deployable unit of function and each OMAG Server can be configured to either run a set of OMAS services or support a repository, or a combination of these roles. An Egeria cohort is a collection of cooperating OMAG Servers. An OMAG Server may belong to multiple cohorts. The OMAS services are local to a server. Each server runs the set of OMAS services listed in its configuration – it is OK to run 0, 1 or multiple OMAS services in a server. Each OMAS is for a specific purpose or persona. The OMRS protocol layer is supported by all servers. The OMAG Servers use OMRS to access/exchange metadata across the cohort. A server shares its metadata over OMRS – sending an event each time a change occurs, or sending a query to other servers. A server may optionally maintain a local Egeria repository. A server may optionally connect to a 3rd party metadata repository. In a few slides we’ll see that the OMRS itself is composed of distinct layers that focus on cross-cohort (“Enterprise”) functions and Local functions.
  6. The role of OMRS is to provide a location transparent, unified view of metadata within a cohort. Cross-cohort operations are supported by the OMRS ‘Enterprise Connector’, including sending queries to the cohort and receiving the results, as well as receiving replicated metadata and saving copies via the local connector. Meanwhile the ‘Local Connector’ handles interactions with an (optional) local repository and provides a default event mapper that sends events when the local state changes. The OMRS protocol uses publish/subscribe over Kafka topics, but the communication/messaging system is pluggable so different transports could be used. The interface to the repository connector is the MetadataCollection API, which is described on the next slide.
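The combination of "send an event when local state changes" and a pluggable transport can be sketched with an in-memory publish/subscribe object standing in for Kafka. Everything here is hypothetical (the `InMemoryTransport` and `LocalRepository` classes, the topic name, the event fields); it illustrates the pattern in the note, not the OMRS event format or connector interfaces.

```python
class InMemoryTransport:
    """Minimal publish/subscribe transport; a real one might wrap Kafka."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, event):
        for callback in list(self._subscribers.get(topic, [])):
            callback(event)


class LocalRepository:
    """Toy repository that emits an event on every local change."""

    def __init__(self, collection_id, transport, topic="cohort-topic"):
        self.collection_id = collection_id
        self.transport = transport
        self.topic = topic
        self.instances = {}          # metadata owned by this repository
        self.reference_copies = {}   # copies of metadata owned elsewhere
        transport.subscribe(topic, self._on_event)

    def save(self, guid, properties):
        self.instances[guid] = properties
        # Plays the role of the event mapper: announce the local change.
        self.transport.publish(self.topic, {"origin": self.collection_id,
                                            "guid": guid,
                                            "properties": properties})

    def _on_event(self, event):
        # Ignore our own events; keep a reference copy of everyone else's.
        if event["origin"] != self.collection_id:
            self.reference_copies[event["guid"]] = event["properties"]


transport = InMemoryTransport()
repo_a = LocalRepository("collection-A", transport)
repo_b = LocalRepository("collection-B", transport)
repo_a.save("g1", {"name": "Employee Record"})
```

Because the repositories only depend on `subscribe` and `publish`, a different transport object with the same two methods could be swapped in without changing them.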
  7. Egeria’s model of metadata is graph-oriented, both at the business layer and beneath that in the structural metadata. Business metadata describes the data that the business needs, what it means and how it should be classified and protected. Structural metadata describes how the data is actually stored and labelled in the data store. The linkages within and between the business and technical metadata form a graph that can be used to switch between these two perspectives. One of the built-in repositories in Egeria is a graph repository; a natural fit for the metadata graph that also accommodates the distributed nature of OMRS. The Egeria local graph repository is built on the open-source JanusGraph graph database.
  8. It may not always be practical to replicate an instance. There are 2 occasions where using a proxy is advantageous: 1. An OMAS wants to save a relationship in a repository and the replication has not happened yet (or the set up is such that replication of that type is not enabled). 2. The repository does not support the full entity type but does support proxies (all proxies have the same storage requirement). A key point about the distributed graph is that whether the relationship refers to a replica entity or uses an entity proxy – it is location transparent. The Enterprise OMRS layer can select the repository in which to save an instance – based on capability and proximity.
  9. This is ambitious.
  10. Beyond this is where we put stretch-goal material and deeper dive information.
  11. ODPi
  12. Business metadata describes the data that the business needs, what it means and how it should be classified and protected. Structural metadata describes how the data is actually stored and labelled in the data store. The linkage between the business and technical metadata allows our technology to switch between these two perspectives. For example, a request for data expressed in business terminology can be translated into a query for data from a data store. An integration engine copying data into a sandbox can discover which fields the business classifies as sensitive and then mask these values dynamically.
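The masking scenario in this note can be sketched as a lookup from structural field names to business terms, followed by a check of each term's classification. The field names (EMPNAME, EMPNO, JOBCODE, SALARY) and business terms appear on the "What does metadata look like?" slide; which term is classified as sensitive, the pairing of fields to terms, the `mask_record` function and the sample values are assumptions made for the example.

```python
# Structural field -> business term (pairing assumed for illustration).
FIELD_TO_TERM = {
    "EMPNAME": "Employee Name",
    "EMPNO": "Employee Id",
    "JOBCODE": "Job Title",
    "SALARY": "Annual Salary",
}

# Business terms carrying a 'Sensitive' classification (assumed).
SENSITIVE_TERMS = {"Annual Salary"}


def mask_record(record, field_to_term=FIELD_TO_TERM, sensitive=SENSITIVE_TERMS):
    """Return a copy of the record with sensitive values replaced by '#'."""
    masked = {}
    for field_name, value in record.items():
        term = field_to_term.get(field_name)
        if term in sensitive:
            # Keep the length so the shape of the data is preserved.
            masked[field_name] = "#" * len(str(value))
        else:
            masked[field_name] = value
    return masked


record = {"EMPNAME": "Lemmie Stage", "EMPNO": "818928",
          "JOBCODE": "DataStage Expert", "SALARY": "45324"}
masked = mask_record(record)
```

The copying engine never needs to know that `SALARY` is sensitive; it only follows the link from the field to the business term and reads the classification there.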
  13. We’re not going to describe this interface in detail – but it’s worth being aware of it, especially as we’re going to talk later about the graph-queries in Group 3.
  14. Egeria provides a persistent graph repository. It’s built using JanusGraph and currently uses version 0.3.1. JanusGraph is an open source project hosted by the Linux Foundation that supports the Apache Tinkerpop 3.3 interface. The Egeria graph repository is built using the Egeria ‘plugin’ repository pattern – in which the repository connector is both the connector and the implementation of the repository. The graph repository supports instances originating locally, instances replicated from a remote server and proxy instances.
  15. This slide shows (some of) the layers within an OMAG Server. We talked earlier about the access services and about the Enterprise Connector and Local Connectors within OMRS. Now we want to focus on the relationship between the Egeria graph repository connector and repository implementation (both in aqua-blue) and the JanusGraph code (in green). As far as possible the repository uses Apache Tinkerpop for graph operations. The reason is simply that, while we like JanusGraph, it is probably sensible to stay as far as possible with the Tinkerpop interface for possible future portability. There are some aspects of interacting with a graph database that are inherently implementation-specific – things like the configuration (e.g. of backends), schema and indexing. For these types of interaction it is necessary to use the JanusGraph Management interface.
  16. Whilst you could look inside the graph for debugging or development – please don’t write code that relies on the schema as it is very likely to evolve. The graph does not contain type information – Egeria provides a repository helper that manages types. The graph is used to store instance data - as described in more detail on the following slides…
  17. Here is an example of a number of OMRS instance objects – there are two entities, that are connected by a relationship. Also, one of the entities has two classifications. All of the instances have attributes – some will be core attributes used for type or control information; others will be attributes that are specific to the instance type (known as type-defined attributes). You don’t need to remember this picture – we’ll stick a copy of it in the top corner so we can refer back to it…..
  18. Entities and classifications are vertices. Relationships and classifiers are edges. The graph schema defines labels for Entity, Relationship, Classification and Classifier. Vertex and edge properties are used to store OMRS instance data, which includes type, control and property information: Type is referenced by name – not linked by an edge; types are held in the repository content manager, not stored in the graph. Control information is stored in ‘core attribute’ properties. Instance properties are stored in serialized form and under unique custom keys to support search.
  19. Entities and classifications are vertices. Relationships and classifiers are edges. The graph schema defines labels for Entity, Relationship, Classification and Classifier. Vertex and edge properties are used to store OMRS instance data, which includes type, control and property information: Type is referenced by name – not linked by an edge; types are held in the repository content manager, not stored in the graph. Control information is stored in ‘core attribute’ properties. Instance properties are stored in serialized form and under unique custom keys to support search.
  20. Within Group 3 of the MDC API ….
  21. Experts in a field with their own jargon and ways of doing things. Search report writer interested in assets and not security policies. Security policy author not interested in assets Goals tasks associated artifacts for a role.
  22. 1 OMAS only, e.g. Subject area: the UI only uses the OMAS interfaces to communicate with Egeria. 2 OMAS and connector, e.g. VDC: metadata is obtained from Egeria using OMAS calls, the actual data is accessed using an RDB connector. 3 OMRS oriented UIs – e.g. Tex, used to explore Egeria types. 4 Daemon UIs – displaying Lineage.
  23. 1 OMAS only, e.g. Subject area: the UI only uses the OMAS interfaces to communicate with Egeria. 2 OMAS and connector, e.g. VDC: metadata is obtained from Egeria using OMAS calls, the actual data is accessed using an RDB connector. 3 OMRS oriented UIs – e.g. Tex, used to explore Egeria types. 4 Daemon UIs – displaying Lineage.
  24. For this to work we need to know the hostname, ports and URL structures. Configuration for Tomcat is via application.properties. Configuration of the server is held in a file and authored via admin REST calls.
  25. Example here is the glossary grid. A grid for authoring glossaries in the subject area UI. Work in progress