© 2022 Neo4j, Inc. All rights reserved.
A Real World Case Study for
Implementing an Enterprise
Scale Data Fabric
Lulit Tesfaye
Partner and Division Director,
Data and Information Management
Enterprise Knowledge, LLC.
Joe Hilger
CTO, Solution Architect
Enterprise Knowledge,
LLC.
@EKConsulting
Meet Enterprise Knowledge
10 AREAS OF EXPERTISE
KM STRATEGY & DESIGN
TAXONOMY & ONTOLOGY DESIGN
AGILE, DESIGN THINKING & FACILITATION
CONTENT & DATA STRATEGY
KNOWLEDGE GRAPHS, DATA MODELING, & AI
ENTERPRISE SEARCH
INTEGRATED CHANGE MANAGEMENT
ENTERPRISE LEARNING
CONTENT AND DATA MANAGEMENT
ENTERPRISE AI
Top Implementer of Leading Knowledge
and Data Management Tools
400+ Thought Leadership
Pieces Published
Clients in 25+ Countries Across Multiple Industries
HEADQUARTERED IN
ARLINGTON, VIRGINIA,
USA
GLOBAL OFFICE IN
BRUSSELS, BELGIUM
80+
EXPERT
CONSULTANTS
KMWORLD’S
100 COMPANIES THAT MATTER IN KM (2015, 2016, 2017, 2018,
2019, 2020, 2021)
TOP 50 TRAILBLAZERS IN AI (2020, 2021)
CIO REVIEW’S
20 MOST PROMISING KM SOLUTION PROVIDERS (2016)
INC MAGAZINE
THE 5000 FASTEST GROWING COMPANIES 2018, 2019, 2020,
2021, 2022
INC MAGAZINE
BEST WORKPLACES (2018, 2019, 2021, 2022)
WASHINGTONIAN MAGAZINE’S
TOP 50 GREAT PLACES TO WORK (2017)
WASHINGTON BUSINESS JOURNAL’S
BEST PLACES TO WORK (2017, 2018, 2019, 2020)
ARLINGTON ECONOMIC DEVELOPMENT’S
FAST FOUR AWARD – FASTEST GROWING COMPANY (2016)
VIRGINIA CHAMBER OF COMMERCE’S
FANTASTIC 50 AWARD – FASTEST GROWING COMPANY
(2019, 2020)
AWARD-WINNING
CONSULTANCY
Why are we all here today?
THE CHALLENGE PERSISTS
● Related data is fragmented - Information not accessible at the time of need
● Complex infrastructure and proprietary platforms make it hard to enable consistent or meaningful connections
● Business meaning and knowledge is lost
● Pace and dynamism of data - trust in and integrity of evolving data
WHAT IT’S COSTING US
● Siloed decisions, missing holistic context
● Restricted collaboration across the organization
● Compliance, security, and regulatory violations
● Expensive migrations
● Stifled automation and progress towards innovation and enterprise AI
What is a data fabric?
Data Fabric: A data fabric is a logical data architecture that serves as a data connection and knowledge layer.
What does it do?
• Enables data federation and virtualization through semantic labels or rules (e.g., taxonomies/business glossaries or ontologies)
• Captures and connects data based on business or domain meaning and value
• Aggregates and unifies unstructured and structured data to connect data of all formats
• Makes data available for both humans and machines to understand
Data connected by the fabric:
• Unstructured Data: Documentation, Presentations, Multimedia
• Existing Metadata: Metadata repositories, CMS/Catalog data, Taxonomy
• Structured Data: Datasets, Data Warehouse, ETL Lineage
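As a rough illustration of connecting data "based on business or domain meaning", the sketch below links three kinds of assets (a dataset, a presentation, a catalog entry) to one shared concept using subject-predicate-object triples. The asset identifiers, the `hasTopic` predicate, and the `assets_about` helper are all hypothetical and do not come from the presentation; a real data fabric would typically hold these links in a graph database rather than a Python list.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical assets of different formats, all linked to one business concept.
TRIPLES: List[Triple] = [
    ("dataset:sales_2022", "hasTopic", "concept:Customer"),       # structured data
    ("doc:onboarding_deck", "hasTopic", "concept:Customer"),      # unstructured data
    ("catalog:crm_table", "hasTopic", "concept:Customer"),        # existing metadata
    ("doc:onboarding_deck", "hasFormat", "format:Presentation"),
]


def assets_about(concept: str) -> List[str]:
    """Return every asset tagged with the given concept, regardless of format."""
    return sorted(s for s, p, o in TRIPLES if p == "hasTopic" and o == concept)


print(assets_about("concept:Customer"))
```

The point of the sketch is that the lookup is driven by the shared concept, not by where or how each asset is stored.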
Deconstructing the Components of a Data Fabric
● Metadata: Data/content abstraction and enrichment models through labeling or cataloging (data catalogs)
● Taxonomy and Ontology: Standardizes business terms and attaches meaning/context and relationships to data
● Knowledge Graph: A key component of a data fabric that serves as the data orchestration or discovery layer
● Connections and Integrations: Data integration or orchestration (ETL/ELT pipelines) and data virtualization solutions to support multiple types of data users and applications
● Applications: Semantic search, data visualization, chatbots, recommendation systems
A sample architecture
Presentation Layer: Search, Research & Analytics, Recommendations & Chatbots, Admin & Governance
Application APIs
Data Fabric Layer: Knowledge Graph, Metadata Service, Data Catalog, Taxonomy/Ontology Management
APIs, ETL
Data Sources: Content Storage, Content Management System, Data Lake / Data Warehouse, Subscriptions, External Sources
When is a data fabric the right solution?
● Lack of business context is hindering the results from your data efforts
● Your data changes frequently and you need a flexible model and/or schema to keep it up to date
● The results from your enterprise AI efforts and applications are a black-box or not explainable
● Your organization has highly interrelated data but is having challenges with unifying data from different locations, business units, and formats
Building a Data Fabric
A phased approach
Design and Development: A Phased Approach
Pilot
● Alignment on vision and goals of data fabric
● Demoable, testable use case achieved
● Gaps and risks in data quality surfaced
● Standards-based foundation to scale
Minimum Viable Product (MVP)
● Integration with production environment
● Tested and validated with users
● Expansion to new use cases
Scale
● Data fabric becomes business-critical
● Data quality and standards established
● Increased data governance
● Iterative expansion to new use cases, new data sources, and new systems
Getting Started: What Are You Trying to Prove?
Your time, resources, and effort should focus on the critical outcomes of your Pilot. Think about your priorities, and focus your use case on these goals.
● Intuitive to Customers
● Can Integrate with Source Systems
● Enables Context-Based Standardization / Federation
● Meets Security Standards
The Implementation Process
01 Define the Problem or Use Case: Define the problem to be solved and the overarching solution architecture
02 Prioritize sources of data and metadata: Define the metadata and data sources that are important to the solution
03 Model data fabric layer: Standard framework and federation model for development and consumption of data domains and subdomains
04 Data Enrichment: Enrich, add, and connect data concepts via metadata and semantics
05 Governance & Setting Global Data Standards: Global data standards and embedded governance to support production and consumption across assets
Design Components
● Semantic Model
● Use Case Definition and Target Personas
● Solutions Architecture
● Consumption Applications
Who Should Be Involved?
Work areas: Product Success and Coordination; Knowledge Modeling and Data Prep; Data and Infrastructure Engineering
Roles:
● Knowledge / Semantic Engineer
● Technical Analysts
● Implementation Engineer
● Systems Owners
● IT/Infrastructure Point of Contact
● Data Architects
● Product Owner/Lead
● Business Sponsors
● Content Services Representative
● Domain SMEs
Case Studies
Real-world Applications and Lessons Learned
Use Case 1
Data Fabric: Improved Data Consistency and Usability
Domain Standardization and Data Ecosystem
The Challenge
A large financial corporation lacked alignment around the meaning, format, and intent of data elements across organizational divisions, reducing the ability of data producers and consumers to find, use, and trust data.
The Solution
● Develop an enterprise ontology to standardize data from multiple systems and migrate from an existing physical data model.
● Implement a federated ontology governance and contribution model.
● Leverage standardized ontology concepts throughout the data lifecycle.
Results
● Enabled 10+ departments to contribute and lead federated development and governance of the ontology.
● Drove the implementation of data standards through the publication of the enterprise ontology.
● Increased data awareness, consistent understanding, and alignment for users across departments and technologies.
The Challenge
Siloed processes and systems have led to inconsistent data across
the organization.
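Field names used for the state code: stateCode, CountrySubdivisionCode, Subdivision_Code, state_postal_code, USPS_Code, STATE
Field names used for the country code: countryCode, IsoCountryCode, alpha_code_country, geo.country_code2, iso_3166_country_code_2, CTRY
Sample values (country, state): US AL; US AK; US AZ; …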
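To make the naming problem concrete, here is a minimal sketch of how the field-name variants listed on this slide could be mapped onto one standardized concept each. The variant names are taken from the slide; the canonical names `countrySubdivisionCode` and `countryCode` and the `standardize_record` helper are assumptions chosen for illustration, not the ontology terms used in the actual engagement.

```python
from typing import Dict, List

# Canonical concept names are hypothetical; the variants come from the slide.
FIELD_SYNONYMS: Dict[str, List[str]] = {
    "countrySubdivisionCode": [
        "stateCode", "CountrySubdivisionCode", "Subdivision_Code",
        "state_postal_code", "USPS_Code", "STATE",
    ],
    "countryCode": [
        "countryCode", "IsoCountryCode", "alpha_code_country",
        "geo.country_code2", "iso_3166_country_code_2", "CTRY",
    ],
}

# Reverse lookup: source field name -> canonical concept name.
CANONICAL_BY_FIELD: Dict[str, str] = {
    variant: concept
    for concept, variants in FIELD_SYNONYMS.items()
    for variant in variants
}


def standardize_record(record: dict) -> dict:
    """Rename known variant fields to canonical names; leave unknown fields as-is."""
    return {CANONICAL_BY_FIELD.get(key, key): value for key, value in record.items()}


legacy_a = {"CTRY": "US", "STATE": "AL"}
legacy_b = {"iso_3166_country_code_2": "US", "state_postal_code": "AK"}
print(standardize_record(legacy_a))
print(standardize_record(legacy_b))
```

Once both records use the same field names, they can be compared or joined without system-specific logic.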
The Solution
Embedding standards and automating their implementation
throughout the data lifecycle.
1 Ontology Development & Formalization
2 Dataset Registration & Creation Using Standardized Concepts
3 Data Validation & Enforcements
4 Data Discovery & Consumption
Lifecycle stages: Standards, Data Production, Data Consumption
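Steps 2 and 3 above (registering a dataset against standardized concepts, then validating its data) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Concept` class, the tiny `ONTOLOGY` dictionary, the allowed values, and the function names are hypothetical and are not described in the presentation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Concept:
    """A standardized concept with the values it permits (hypothetical model)."""
    name: str
    allowed_values: FrozenSet[str]


# Hypothetical mini-ontology with a deliberately small value list.
ONTOLOGY: Dict[str, Concept] = {
    "countryCode": Concept("countryCode", frozenset({"US"})),
    "countrySubdivisionCode": Concept(
        "countrySubdivisionCode", frozenset({"AL", "AK", "AZ"})
    ),
}


def register_dataset(columns: List[str]) -> List[str]:
    """Step 2: accept a dataset only if every column is a known concept."""
    unknown = [c for c in columns if c not in ONTOLOGY]
    if unknown:
        raise ValueError(f"Columns not defined in the ontology: {unknown}")
    return columns


def validate_rows(columns: List[str], rows: List[dict]) -> List[str]:
    """Step 3: report every value that the concept does not allow."""
    errors = []
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)
            if value not in ONTOLOGY[col].allowed_values:
                errors.append(f"row {i}: {col}={value!r} is not an allowed value")
    return errors


columns = register_dataset(["countryCode", "countrySubdivisionCode"])
rows = [
    {"countryCode": "US", "countrySubdivisionCode": "AL"},
    {"countryCode": "US", "countrySubdivisionCode": "Alabama"},
]
errors = validate_rows(columns, rows)
print(errors)
```

Rejecting unknown columns at registration time is one way to embed the standard in data production rather than checking it after the fact.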
Use Case 2
Data Fabric: Bioengineering Process Connection and
Data Standardization
Bioengineering Process Connection and Data Standardization
The Challenge
A bioengineering company needed to quickly find and get insights about drug development processes across 4 legacy systems and 5 departments. However, their insights were limited to what the scientists could manually aggregate from siloed legacy systems with different naming conventions. The organization looked to EK for a solution to easily access data for regulatory filings while maintaining necessary integrity.
The Solution
● Develop a comprehensive ontology to model the drug development process and standardize nomenclature.
● Create the foundation for a knowledge graph that aggregates and normalizes disparate data from four legacy systems and enables automated report generation for regulatory filings and advanced analytics on dozens of process parameters.
● This solution allowed scientists to get more value from an immense data set and focus their time on strategic decisions.
Results
● Reduced effort to compile data from legacy systems without search to only 5 clicks.
● Increased process comparison capabilities from 5 to 1,000+ at a glance, enabling scientists to make unprecedented strategic decisions.
● Eliminated analysts’ reliance on reaching out to individuals to aggregate data captured over the years; they can instead access it in seconds.
Data Fabric Solution Architecture
Content & Data Sources: Data Lake, Structured Data, Text
Data orchestration tool (Data & Metadata)
Taxonomy/Ontology Manager
Graph Storage
Data Catalog: Metadata Repository, Core Metadata Collection
Queries & APIs: Enriched Metadata (with Knowledge), Federated Query Across Sources
Advanced End-user Applications: Search, Chatbots/Q&A, Data Visualization and Reporting, Custom Applications, Recommender Systems
DataOps Applications: Quality, Lineage, Observability
Usage and Governance: Analysis, Data, Glossary
Processing steps noted in the diagram: transforms data to fit ontology model; adds structure to unstructured content via auto-tagging; standardization and deduplication
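One step named in this architecture, adding structure to unstructured content via auto-tagging, can be approximated with simple label matching against a taxonomy. The sketch below is only an illustration under that assumption: the taxonomy terms, the sample sentence, and the `auto_tag` function are hypothetical, and production auto-taggers usually rely on a dedicated taxonomy management tool or NLP model rather than regular expressions.

```python
import re
from typing import Dict, List, Set

# Hypothetical taxonomy: preferred label -> labels that should trigger the tag.
TAXONOMY: Dict[str, List[str]] = {
    "Cell Culture": ["cell culture", "cell cultivation"],
    "Purification": ["purification", "chromatography step"],
}


def auto_tag(text: str) -> Set[str]:
    """Return the preferred labels whose trigger labels appear in the text."""
    lowered = text.lower()
    tags: Set[str] = set()
    for preferred, labels in TAXONOMY.items():
        for label in labels:
            # Word boundaries avoid matching inside longer words.
            if re.search(r"\b" + re.escape(label) + r"\b", lowered):
                tags.add(preferred)
                break
    return tags


doc = "Batch 12: cell cultivation completed before the chromatography step."
tags = auto_tag(doc)
print(sorted(tags))
```

Because every synonym resolves to one preferred label, documents that use different wording end up with the same tag, which is what makes later standardization and deduplication possible.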
Considerations for Operationalization
and Scale
Phases: Pilot, MVP, Scale
Implementation Team: Collaboration with enterprise teams and end users
Quality Level: Higher validation and quality expectations
Development Environment: Production
Timeframe: 6-7 months
Solutions Architecture
● Integration with source systems
● Metadata management
● Semantic and graph solutions
● Integration with end user application(s)
Thank you!
JHILGER@ENTERPRISE-KNOWLEDGE.COM
Joe Hilger
WWW.LINKEDIN.COM/IN/JOSEPH-HILGER/
LTESFAYE@ENTERPRISE-KNOWLEDGE.COM
Lulit Tesfaye
WWW.LINKEDIN.COM/IN/LULIT-TESFAYE/