FAIR data: Superior data visibility and reuse without warehousing
Alan Morrison
Data Architecture Best Practices Summit
April 18, 2023
1
Photo: Alain Audet at https://pixabay.com/photos/lake-foggy-lake-nature-landscape-6839357/
Outline of today’s talk
2
● Problem
○ Lack of desiloed, high quality, well-integrated data and logic at scale
○ Shortfalls of data warehousing
● Solution
○ FAIR data and knowledge graphs
■ Blended data + logic centered infrastructure
● Result
○ Case study examples
○ Organic, data-centric systems
○ Zero-copy integration feasibility
Problem: Data quality, siloing and poor integration
3
Yes, data warehousing focused on the integration problem
4
● Pro: Identified the critical problem to solve
● Con: Advocated a method that doesn’t delve deep enough to solve today’s problem
● Still have the unified data model challenge
Data warehousing can’t solve today’s integration challenge
5
● Thousands of databases per enterprise (siloing)
● Thousands of applications (code sprawl)
● Data models buried in the app code
● Every app a special snowflake with its own data model
How did we get here? By selling the old as new
6
How data warehousing stopped scaling
“They recognized that these themes ended up in all these legacy apps. Sales rolled up against a
geographic and a product hierarchy, and an organizational hierarchy…. They said, Let’s have
those conformed dimensions and a small number of facts. Let’s bring the facts from all the
different systems and snap them together according to these conformed dimensions….
Brilliant idea, but I think what actually happened over time is the workload just got greater and
greater. The ability of people to actually conform those dimensions kept eroding….”
–Dave McComb, President, Semantic Arts
“Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021
7
Data warehousing model conformance doesn’t scale
“I spent a good 15 years working in financial services at some
pretty big banks. Half of the IT change budget is spent on
integration and the by-products of integration….I saw as the
technology was advancing that the percentage wasn’t going
down – in fact, it was going up. At some point, is the integration
tax going to be 100 percent?”
– Dan DeMers, CEO of Cinchy
“Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video,
https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021
8
What’s a data model? What is data integration?
9
An effective data model describes and unifies the contexts necessary for true data integration. It gives machines enough clues to detect and discover layered context.
“What is data integration?
Let's start with a short list of what data integration is not:
● It's not shoveling data around between systems.
● It's not calling an API.
● It's not creating a data connection to a source system.
It can include one or more of the jobs in the list here above, but what is the ingredient that cannot be missing?
It's connecting data from different source systems together in a consistent and coherent data model.”
–Wouter Trappers, CDAO
Why large-scale integration?
10
Large-scale integration is essential to avoiding observational bias. The analogy of the drunk looking for his money under the lamppost describes the nature of this bias: the drunk looks for his money where the light is, even though he knows the money is in the shadows.
To manage today’s business at scale, enterprises need light and visibility across departments, organizations and supply networks.
Solution: Scale FAIR data development using data-centric architecture, semantics and knowledge graph methods
11
The Five Commingled Phases of Compute, Networking and Storage
12
● 1st: Mainframe and Green Screens. Centralized storage and compute, with minimal networking
● 2nd: Client-Server and Desktops. Application Distribution via Proprietary and IP Networking
● 3rd: Early Web (on Client-Server). Simple web hosting + legacy Client-Server storage
● 4th: Distributed Cloud and Mobile Devices. Commodity servers + storage + some virtualization
● 5th: “Decoupled” and “Decentralized” Cloud. Compute and storage more loosely coupled, virtualized, controlled and data-centric
[Figure axis labels: Time; Less centralized / More centralized; Application Centric / Data Centric]
All phases are still active and evolving
Data-centric knowledge graphs allow desiloed visibility and interoperation at scale
13
Opportunity: Unitary data + description logic = knowledge
14
● “Data management” (structured data, mostly)
● Knowledge management (internally shared)
● Content management (externally shared)
● Learning management (internal coursework)
FAIR data and associated description logic
FAIR data is data users can have confidence in for many purposes.
Data becomes FAIR when it disambiguates concepts, individuals and roles and how they interact and relate to one another.
In a knowledge graph context, documented knowledge = FAIR data.
FAIR stands for findable, accessible, interoperable, and reusable. Under the FAIR data umbrella are all heterogeneous types of data/content.
Semantics is the path to FAIR, smart, siloless data sharing
15
James Kobielus, 2016
Association of European Libraries, 2017
Compare FAIR and TRUST principles
16
Lin, D., Crabtree, J., Dillo, I. et al. The TRUST Principles for digital repositories. Sci Data 7, 144 (2020).
https://doi.org/10.1038/s41597-020-0486-7
FAIR data leads to TRUSTed data repositories.
Who’s behind the FAIR data movement? Big pharma, for one.
“From 2023, drug submissions to the European Medicines Agency (EMA) must
comply with select Identification of Medicinal Products (IDMP) standards. By
developing an IDMP-compliant ontology with machine-ready data, the Alliance will
support the move to automate this process, improving efficiency and patient
safety, reducing costs and time burden, and driving innovation in the drug
development pipeline.
“The project is managed by the Pistoia Alliance, with a project team of
experts from Bayer, Novartis, Roche, Merck KGaA, and GSK.”
17
–Erik Schultes, et al., “FAIR Digital Twins for Data-Intensive Research,” Perspective article, Front. Big Data, Sec. Data Science, Volume 5, 11 May 2022, https://doi.org/10.3389/fdata.2022.883341
To create FAIR data, users can start with a single triple
18
Linked Open Data Cloud, 2022
Starter triple for a knowledge graph
A standard knowledge graph consists of triplified, relationship-rich data. The data model, or ontology, is also described in triples and lives with the rest of the data, so ontologies can also be managed as data. Linking triples merely requires a verb (a predicate, or described edge).
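A minimal sketch of the starter-triple idea, assuming plain Python tuples stand in for RDF statements. All identifiers (ex:Sale42, ex:soldProduct and so on) are hypothetical and not from the talk; a production graph would more likely use an RDF store.

```python
# A triple is (subject, predicate, object). All identifiers are hypothetical.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("ex:Sale42", "ex:soldProduct", "ex:Product7"),  # the starter triple
}

# Grow the graph: a second triple that shares the node ex:Product7.
graph.add(("ex:Product7", "ex:madeBy", "ex:SupplierA"))

# The data model (ontology) lives in the same graph, also as triples.
graph.add(("ex:soldProduct", "rdfs:range", "ex:Product"))
graph.add(("ex:Product7", "rdf:type", "ex:Product"))


def neighbors(g: set[Triple], node: str) -> set[Triple]:
    """Return every triple in which `node` is the subject or the object."""
    return {t for t in g if t[0] == node or t[2] == node}


linked = neighbors(graph, "ex:Product7")
```

Because the sale, the supplier and the type statement all mention ex:Product7, a lookup on that one node returns all three; the shared identifier is what links the triples.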
Simple way to start a business knowledge graph
● “Use JSON-LD to atomise your enterprise data down into three-part statements and voila!
You get a connected graph!
● ✨ Decentralize the process by having each team publish their own JSON-LD, for example,
let the sales team publish the sales data and ask them to link each sale to the correct product
and client.
● 🤖 Connect GPT to the JSON-LD that your teams have published. Then, harness the power
of GPT to assist new teams in publishing their JSON-LD and integrating it back into your
enterprise-wide Knowledge Graph.”
Key to scaling external/internal integration: use the schema.org-modeled JSON-LD from websites that GPT is trained on, and connect it with internal data also modeled with schema.org.
–#HT Tony Seale, UBS
https://www.linkedin.com/posts/tonyseale_mlops-dataintegration-ai-activity-7052551060237819904-bAZc
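The "atomise into three-part statements" step in the quote above can be sketched as follows. This is an illustration under stated assumptions, not a conformant JSON-LD processor: the sales record, its example.com identifiers and the to_triples helper are hypothetical, and the helper does not expand terms against @context.

```python
import json

# Hypothetical sales record using schema.org terms; all values are illustrative.
doc = json.loads("""
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/sale/42",
  "@type": "Order",
  "orderedItem": {"@id": "https://example.com/product/7"},
  "customer": {"@id": "https://example.com/client/acme"}
}
""")


def to_triples(node: dict) -> list[tuple[str, str, str]]:
    """Flatten one flat JSON-LD-style node into (subject, predicate, object).

    Simplified on purpose: handles only plain values and {"@id": ...}
    references, and leaves property names unexpanded.
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue  # metadata, not a statement about the subject
        predicate = "rdf:type" if key == "@type" else key
        obj = value["@id"] if isinstance(value, dict) else str(value)
        triples.append((subject, predicate, obj))
    return triples


sale_triples = to_triples(doc)
```

Each team publishing records of this shape, with @id values that point at the shared product and client identifiers, is what lets the separate documents join into one graph.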
19
To scale FAIR data, use an assisted, hybrid AI approach
20
Amit Sheth, “From NLP to NLU: Why we need varied, comprehensive, and stratified knowledge (Neuro-symbolic AI),” USC Information Sciences Institute on YouTube, March 2023, https://www.youtube.com/watch?v=xyxQXka6dRY&t=2377s
21
How hybrid AI helps in research
“LLMs have amazing abilities in manipulating natural language text, but generating timely and factually verified recommendations is one thing LLMs are not naturally great at.”
–Mike Tung, CEO of Diffbot
Diffbot Blog, April 2023, https://blog.diffbot.com/generating-company-recommendations-using-large-language-models-and-knowledge-graphs/
LLMs alone aren’t a reliable research tool because they hallucinate. You can’t trust the answers unless you already know them.
Mike Tung recommends more precise prompting on the query side and answer verification via a knowledge graph such as Diffbot. Both capabilities harness the precise logical description missing from current LLM Q&A.
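The answer-verification half of that recommendation can be sketched as a lookup against known triples. This is a hedged illustration only: it does not use Diffbot's API, and the facts, the company name and the verify helper are all hypothetical stand-ins for a real knowledge graph query.

```python
# Hypothetical facts; a real deployment would query a knowledge graph service.
KNOWN_FACTS = {
    ("AcmeCorp", "headquarteredIn", "Berlin"),
    ("AcmeCorp", "industry", "Robotics"),
}


def verify(claims, facts=KNOWN_FACTS):
    """Split LLM-proposed triples into verified and unverified lists."""
    verified = [c for c in claims if c in facts]
    unverified = [c for c in claims if c not in facts]
    return verified, unverified


# Claims as they might be extracted from an LLM answer (illustrative only).
llm_claims = [
    ("AcmeCorp", "headquarteredIn", "Berlin"),
    ("AcmeCorp", "foundedIn", "1875"),
]

ok, needs_review = verify(llm_claims)
```

Claims absent from the graph are not necessarily false; in this sketch they are simply routed to a human or a further lookup rather than passed on as fact.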
KGs and data-centric architecture
22
Semantic standards allow a desiloed data landscape
23
How shared graph semantics helps
● Boosts meaningful results and relevancy, which suffer when data and logic lack transparency and cohesiveness
● Contextualizes data for management and reuse with relationship logic
● Scales meaningful connections between contexts (relevant relationships living with entities)
● Enables Metcalfe’s network of networks effect (network_effect^N)
● Enables model-driven development (code once, reuse anywhere)
● Scales efficiencies and economies so that energy consumption is reduced
24
Case study examples
25
IKEA’s product knowledge graph
26
Katariina Kari, “IKEA’s Knowledge Graph and Why It Has Three Layers,” August 2022,
https://medium.com/flat-pack-tech/ikeas-knowledge-graph-and-why-it-has-three-layers-a38fca436349
Currently designed to be customer facing; can evolve for logistics purposes with more detailed product data
Blue Brain Nexus – graph-based bioinformatics collaboration
27
Montefiore Health’s Patient-centered Analytical Learning Machine (“PALM”) – Personalized medicine at scale
28
Enterprise decentralized app environment: OriginTrail.io
29
https://origintrail.io/
OriginTrail + BSI’s supply chain tracking and tracing
30
OriginTrail and the British Standards Institution (BSI), https://twitter.com/origin_trail/status/1339606640887152642?s=20, Dec. 2020
The Monasterevin whiskey produced in Ireland is tracked and traced from “grain to glass” with the OriginTrail.io approach.
OT uses a decentralized knowledge graph that connects to one of several different blockchains.
This method enables shared data reuse and other synergies across the supply chain.
SOLID shared, federated XaaS: Construction industry
31
“TrinPod™: World's first conceptually indexed space-time digital twin using Solid,” Graphmetrix, 2022, https://graphmetrix.com/trinpod
Company-specific SOLID storage pods and access control can be managed by each supply chain partner. Graphmetrix as digital twin provider manages the system and system-level apps.
Digital twins and agents: Better data sharing than APIs?
32
[Figure labels: Autonomous agents; Digital twins; Sensor nets. Locale: Portsmouth, UK]
Iotics, 2019 and 2023
Final thoughts
33
Organic data when nurtured grows from seeds into trees
34
● Rich data ecosystems evolve naturally by comparison with underdescribed, fragmented data assets
● Zero-copy integration becomes possible, reducing complexity, labor and energy waste by up to 90 percent
● Second-order cybernetics (humans in the loop) and precise facts and contextualization complement probabilistic methods
Seven obstacles to adoption of FAIR data development at scale
35
Thoughts and Reactions?
Feel free to ping me anytime with questions, etc.
Alan Morrison
Data Science Central
LinkedIn | Twitter | Quora | Slideshare
+1 408 205 5109
a.s.morrison@gmail.com
36

FAIR data_ Superior data visibility and reuse without warehousing.pdf

  • 1. FAIR data: Superior data visibility and reuse without warehousing. Alan Morrison, Data Architecture Best Practices Summit, April 18, 2023. Photo: Alain Audet at https://pixabay.com/photos/lake-foggy-lake-nature-landscape-6839357/
  • 2. Outline of today’s talk
      ● Problem
        ○ Lack of desiloed, high-quality, well-integrated data and logic at scale
        ○ Shortfalls of data warehousing
      ● Solution
        ○ FAIR data and knowledge graphs
          ■ Blended data + logic centered infrastructure
      ● Result
        ○ Case study examples
        ○ Organic, data-centric systems
        ○ Zero-copy integration feasibility
  • 3. Problem: Data quality, siloing and poor integration
  • 4. Yes, data warehousing focused on the integration problem
      ● Pro: Identified the critical problem to solve
      ● Con: Advocated a method that doesn’t delve deep enough to solve today’s problem
      ● Still have the unified data model challenge
  • 5. Data warehousing can’t solve today’s integration challenge
      ● Thousands of databases per enterprise (siloing)
      ● Thousands of applications (code sprawl)
      ● Data models buried in the app code
      ● Every app a special snowflake with its own data model
  • 6. How did we get here? By selling the old as new
  • 7. How data warehousing stopped scaling
      “They recognized that these themes ended up in all these legacy apps. Sales rolled up against a geographic and a product hierarchy, and an organizational hierarchy…. They said, Let’s have those conformed dimensions and a small number of facts. Let’s bring the facts from all the different systems and snap them together according to these conformed dimensions…. Brilliant idea, but I think what actually happened over time is the workload just got greater and greater. The ability of people to actually conform those dimensions kept eroding….”
      –Dave McComb, President, Semantic Arts, “Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021
  • 8. Data warehousing model conformance doesn’t scale
      “I spent a good 15 years working in financial services at some pretty big banks. Half of the IT change budget is spent on integration and the by-products of integration…. I saw as the technology was advancing that the percentage wasn’t going down – in fact, it was going up. At some point, is the integration tax going to be 100 percent?”
      –Dan DeMers, CEO of Cinchy, “Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021
  • 9. What’s a data model? What is data integration?
      An effective data model describes and unifies the contexts necessary for true data integration. It gives machines enough clues to detect and discover layered context.
      “What is data integration? Let's start with a short list of what data integration is not:
        ● It's not shoveling data around between systems.
        ● It's not calling an API.
        ● It's not creating a data connection to a source system.
      It can include one or more of the jobs in the list here above, but what is the ingredient that cannot be missing? It's connecting data from different source systems together in a consistent and coherent data model.”
      –Wouter Trappers, CDAO
  • 10. Why large-scale integration?
      Large-scale integration is essential to avoiding observational bias. The analogy of the drunk looking for his money under the lamppost describes the nature of this bias: he searches where the light is, even though he knows the money is in the shadows. To manage today’s business at scale, enterprises need light and visibility across departments, organizations and supply networks.
  • 11. Solution: Scale FAIR data development using data-centric architecture, semantics and knowledge graph methods
  • 12. The Five Commingled Phases of Compute, Networking and Storage
      ● 1st: Mainframe and Green Screens. Centralized storage and compute, with minimal networking
      ● 2nd: Client-Server and Desktops. Application distribution via proprietary and IP networking
      ● 3rd: Early Web (on Client-Server). Simple web hosting + legacy Client-Server storage
      ● 4th: Distributed Cloud and Mobile Devices. Commodity servers + storage + some virtualization
      ● 5th: “Decoupled” and “Decentralized” Cloud. Compute and storage more loosely coupled, virtualized, controlled and data-centric
      Diagram axes: time; more centralized to less centralized; application-centric to data-centric. All phases are still active and evolving.
  • 13. Data-centric knowledge graphs allow desiloed visibility and interoperation at scale
  • 14. Opportunity: Unitary data + description logic = knowledge
      Diagram labels: “Data management” (structured data, mostly); Knowledge management (internally shared); Content management (externally shared); Learning management (internal coursework); FAIR data and associated description logic.
      FAIR stands for findable, accessible, interoperable, and reusable. FAIR data is data users can have confidence in for many purposes. Data becomes FAIR when it disambiguates concepts, individuals and roles and how they interact and relate to one another. In a knowledge graph context, documented knowledge = FAIR data. Under the FAIR data umbrella are all heterogeneous types of data/content.
  • 15. Semantics is the path to FAIR, smart, siloless data sharing (James Kobielus, 2016; Association of European Libraries, 2017)
  • 16. Compare FAIR and TRUST principles
      FAIR data leads to TRUSTed data repositories.
      Lin, D., Crabtree, J., Dillo, I. et al. The TRUST Principles for digital repositories. Sci Data 7, 144 (2020). https://doi.org/10.1038/s41597-020-0486-7
  • 17. Who’s behind the FAIR data movement? Big pharma, for one.
      “From 2023, drug submissions to the European Medicines Agency (EMA) must comply with select Identification of Medicinal Products (IDMP) standards. By developing an IDMP-compliant ontology with machine-ready data, the Alliance will support the move to automate this process, improving efficiency and patient safety, reducing costs and time burden, and driving innovation in the drug development pipeline.
      “The project is managed by the Pistoia Alliance, with a project team of experts from Bayer, Novartis, Roche, Merck KGaA, and GSK.”
      –Erik Schultes, et al., “FAIR Digital Twins for Data-Intensive Research,” Perspective article, Front. Big Data, 11 May 2022, Sec. Data Science, Volume 5, https://doi.org/10.3389/fdata.2022.883341
  • 18. To create FAIR data, users can start with a single triple
      Figure labels: Linked Open Data Cloud, 2022; starter triple for a knowledge graph.
      A standard knowledge graph consists of triplified, relationship-rich data. The data model, or ontology, is also described in triples and lives with the rest of the data. Ontologies can also be managed as data. Linking triples merely requires a verb (or predicate, or described edge) to connect them.
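A minimal sketch of the starter-triple idea on slide 18, assuming a plain Python set as the store rather than any particular graph database. Every `ex:` identifier is a hypothetical illustration value; `rdfs:domain` and `rdfs:range` are standard RDFS terms, written here as plain strings. It shows the data model sitting in the same graph as the data, and a new triple linking in through a node that already exists.

```python
# A toy in-memory graph: each fact is a (subject, predicate, object) triple.
# Every "ex:" identifier is a hypothetical illustration value.
graph = {
    # Starter triple: one instance-level fact.
    ("ex:Alice", "ex:worksFor", "ex:AcmeCorp"),
    # Ontology (data model) triples live in the same graph as the data.
    ("ex:worksFor", "rdfs:domain", "ex:Person"),
    ("ex:worksFor", "rdfs:range", "ex:Organization"),
}


def objects(g, subject, predicate):
    """Return, in sorted order, every object linked from subject via predicate."""
    return sorted(o for s, p, o in g if s == subject and p == predicate)


# Linking only takes one more triple whose subject is already in the graph.
graph.add(("ex:AcmeCorp", "ex:locatedIn", "ex:Dublin"))

# Follow two edges: who Alice works for, then where that employer is located.
employer = objects(graph, "ex:Alice", "ex:worksFor")[0]
print(employer, "->", objects(graph, employer, "ex:locatedIn"))
```

Because the model triples are ordinary data, the same `objects` lookup answers questions about the schema, for example `objects(graph, "ex:worksFor", "rdfs:range")`.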
  • 19. Simple way to start a business knowledge graph
      ● “Use JSON-LD to atomise your enterprise data down into three-part statements and voila! You get a connected graph!
      ● ✨ Decentralize the process by having each team publish their own JSON-LD, for example, let the sales team publish the sales data and ask them to link each sale to the correct product and client.
      ● 🤖 Connect GPT to the JSON-LD that your teams have published. Then, harness the power of GPT to assist new teams in publishing their JSON-LD and integrating it back into your enterprise-wide Knowledge Graph.”
      –#HT Tony Seale, UBS, https://www.linkedin.com/posts/tonyseale_mlops-dataintegration-ai-activity-7052551060237819904-bAZc
      Key to scaling external/internal integration: use the schema.org modeled JSON-LD from websites GPT is trained on and connect it with internal data also modeled with schema.org.
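The "three-part statements" step on slide 19 can be sketched as follows, assuming a hypothetical sales record modeled with schema.org's `Order`, `customer` and `orderedItem` terms. The `to_triples` helper is an illustrative simplification, not the JSON-LD processing algorithm: it handles only nested objects that carry an `@id` and plain literal values, and it does not expand the `@context`. The example.com identifiers are placeholders.

```python
import json

# Hypothetical JSON-LD a sales team might publish: one sale linked to its
# client and its product. All example.com identifiers are placeholders.
doc = json.loads("""
{
  "@context": "https://schema.org",
  "@id": "https://example.com/sales/1001",
  "@type": "Order",
  "customer": {
    "@id": "https://example.com/clients/42",
    "@type": "Organization",
    "name": "Example Client"
  },
  "orderedItem": {
    "@id": "https://example.com/products/7",
    "@type": "Product",
    "name": "Example Widget"
  }
}
""")


def to_triples(node):
    """Naively flatten a JSON-LD-like dict into (subject, predicate, object) triples.

    Simplification: supports nested dicts that have an "@id" and literal values
    only; lists, blank nodes and context expansion are deliberately left out.
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        predicate = "rdf:type" if key == "@type" else key
        if isinstance(value, dict):
            # Link to the nested node, then flatten the nested node itself.
            triples.append((subject, predicate, value["@id"]))
            triples.extend(to_triples(value))
        else:
            triples.append((subject, predicate, value))
    return triples


triples = to_triples(doc)
for triple in triples:
    print(triple)
```

In practice a JSON-LD library would do this conversion; the point of the sketch is only that each key/value pair under an `@id` becomes one statement, and nested `@id` values become the links between teams' data.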
  • 20. To scale FAIR data, use an assisted, hybrid AI approach
      Amit Sheth, “From NLP to NLU: Why we need varied, comprehensive, and stratified knowledge (Neuro-symbolic AI),” USC Information Sciences Institute on YouTube, March 2023, https://www.youtube.com/watch?v=xyxQXka6dRY&t=2377s
  • 21. How hybrid AI helps in research
      “LLMs have amazing abilities in manipulating natural language text, but generating timely and factually verified recommendations is one thing LLMs are not naturally great at.”
      –Mike Tung, CEO of Diffbot, Diffbot Blog, April 2023, https://blog.diffbot.com/generating-company-recommendations-using-large-language-models-and-knowledge-graphs/
      LLMs alone aren’t a reliable research tool because they hallucinate. You can’t trust the answers unless you know the answer already. Mike Tung recommends more precise prompting on the query side and answer verification via a knowledge graph such as Diffbot. Both of these capabilities harness the precise logical description missing in current LLM Q&As.
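The answer-verification idea on slide 21 can be sketched as a lookup of claimed facts against trusted triples. This is an illustration under stated assumptions, not Diffbot's API: the knowledge graph is a hypothetical in-memory set, the `ex:` names are invented, and the step that extracts claims from an LLM answer is not shown.

```python
# Hypothetical trusted facts, standing in for a curated knowledge graph.
KNOWLEDGE_GRAPH = {
    ("ex:AcmeCorp", "ex:headquarteredIn", "ex:Dublin"),
    ("ex:AcmeCorp", "ex:industry", "ex:Logistics"),
}


def verify_claims(claims, kg):
    """Split claimed triples into those the graph supports and those it does not."""
    supported = [c for c in claims if c in kg]
    unsupported = [c for c in claims if c not in kg]
    return supported, unsupported


# Claims as they might be extracted from an LLM answer (extraction not shown).
llm_claims = [
    ("ex:AcmeCorp", "ex:headquarteredIn", "ex:Dublin"),
    ("ex:AcmeCorp", "ex:foundedIn", "ex:1875"),
]

supported, unsupported = verify_claims(llm_claims, KNOWLEDGE_GRAPH)
print("supported:", supported)
print("needs review:", unsupported)
```

A claim missing from the graph is unverified rather than proven false, so a reasonable design routes the unsupported list to a human reviewer instead of discarding it.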
  • 22. KGs and data-centric architecture
  • 23. Semantic standards allow a desiloed data landscape
  • 24. How shared graph semantics helps
      ● Boosts meaningful results and relevancy (poor results stem from a lack of data and logic transparency and cohesiveness)
      ● Contextualizes data for management and reuse with relationship logic
      ● Scales meaningful connections between contexts (relevant relationships living with entities)
      ● Enables Metcalfe’s network of networks effect (network effect^N)
      ● Enables model-driven development (code once, reuse anywhere)
      ● Scales efficiencies and economies so that energy consumption is reduced
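A rough arithmetic sketch of the network effect named on slide 24, assuming the common reading of Metcalfe's law in which value tracks the number of possible pairwise links, n(n - 1)/2. The silo count and size are hypothetical, and the pairwise-link measure is an assumption of this sketch rather than the slide's own formula.

```python
def potential_links(n):
    """Number of distinct pairwise links among n nodes: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical enterprise: 10 silos holding 100 entities each.
silos, nodes_per_silo = 10, 100

# Siloed: entities can only connect within their own silo.
siloed = silos * potential_links(nodes_per_silo)

# Shared semantics: all entities sit in one connected graph.
unified = potential_links(silos * nodes_per_silo)

print(siloed, unified, round(unified / siloed, 1))
```

With these numbers the siloed landscape allows 49,500 possible links and the shared graph allows 499,500, roughly ten times as many for the same entities.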
  • 26. IKEA’s product knowledge graph
      Currently designed to be customer facing; can evolve for logistics purposes with more detailed product data.
      Katariina Kari, “IKEA’s Knowledge Graph and Why It Has Three Layers,” August 2022, https://medium.com/flat-pack-tech/ikeas-knowledge-graph-and-why-it-has-three-layers-a38fca436349
  • 27. Blue Brain Nexus: graph-based bioinformatics collaboration
  • 28. Montefiore Health’s Patient-centered Analytical Learning Machine (“PALM”): Personalized medicine at scale
  • 29. Enterprise decentralized app environment: OriginTrail.io, https://origintrail.io/
  • 30. OriginTrail + BSI’s supply chain tracking and tracing
      The Monasteriven whiskey produced in Ireland is tracked and traced from “grain to glass” with the OriginTrail.io approach. OriginTrail uses a decentralized knowledge graph that connects to one of several different blockchains. This method enables shared data reuse and other synergies across the supply chain.
      OriginTrail and the British Standards Institution (BSI), https://twitter.com/origin_trail/status/1339606640887152642?s=20, Dec. 2020
  • 31. SOLID shared, federated XaaS: Construction industry
      Company-specific SOLID storage pods and access control can be managed by each supply chain partner. Graphmetrix, as the digital twin provider, manages the system and system-level apps.
      “TrinPod™: World's first conceptually indexed space-time digital twin using Solid,” Graphmetrix, 2022, https://graphmetrix.com/trinpod
  • 32. Digital twins and agents: Better data sharing than APIs?
      Diagram labels: autonomous agents, digital twins, sensor nets. Locale: Portsmouth, UK. (Iotics, 2019 and 2023)
  • 34. Organic data when nurtured grows from seeds into trees
      ● Rich data ecosystems evolve naturally by comparison with underdescribed, fragmented data assets
      ● Zero-copy integration becomes possible, reducing complexity, labor and energy waste by up to 90 percent
      ● Second-order cybernetics (humans in the loop) and precise facts and contextualization complement probabilistic methods
  • 35. Seven obstacles to adoption of FAIR data development at scale
  • 36. Thoughts and Reactions? Feel free to ping me anytime with questions, etc. Alan Morrison, Data Science Central. LinkedIn | Twitter | Quora | Slideshare. +1 408 205 5109, a.s.morrison@gmail.com