DCA: Current Themes and Trends*
Alan Morrison
Data-Centric Architecture Forum
May 2023
1
Alain Audet at https://pixabay.com/photos/lake-foggy-lake-nature-landscape-6839357/
*Separate talk to cover NLP/LLMs
Business goals enabled by a connected, shared data ecosystem
2
Buying Helping
Making Selling
Sharing
Inhibitors to ecosystem-level sharing
● Data feudalism
● Poorly defined regulatory challenges
● Weak public sector
● Public apathy
● Technology + investor inertia and lack of clear vision
● Magic bullet syndrome
● Media groupthink
● Idol worship
● Pervasive myopia
● Lack of organizational empowerment of foxes over hedgehogs
3
Unclaimed data market territory
FAIR*
Actionability
Immediacy
Divining purpose
Divining intent
Synthesis
Reasoning
Abstraction
Contextualization
Connection
Classification
Identification
Unclaimed market territory
Staked claims
Present vs Future Shared Data Market Map
12 steps to FAIR data power
*Findable, accessible, interoperable, reusable data
Reach of current ML efforts
Challenge: Seamless, at-scale, FAIR data collaboration
5
James Kobielus, 2016
Association of European Libraries, 2017
6
Opportunity: Unitary data + description logic = knowledge
7
“Data management” (structured data, mostly)
Knowledge management (internally shared)
Content management (externally shared)
Learning management (internal coursework)
FAIR data and associated description logic
FAIR data is data users can have confidence in for many purposes.
Data becomes FAIR when it disambiguates concepts, individuals and roles and how they interact and relate to one another.
In a knowledge graph context, documented knowledge = FAIR data.
Under the FAIR data umbrella are all heterogeneous
types of data/content.
To create a knowledge graph, users can start with a single triple
8
Linked Open Data Cloud, 2022
Starter triple for a knowledge graph
A standard knowledge graph consists of triplified, relationship-rich
data. The data model, or ontology, is also described in triples and
lives with the rest of the data. Ontologies can also be managed as
data. Linking triples merely requires a verb (or predicate, or
described edge) to link them.
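As a minimal sketch of the starter-triple idea, plain Python tuples can stand in for triples. Every identifier here (`ex:AcmeCorp`, `ex:sells`, `ex:Offering` and so on) is hypothetical and not drawn from gist or any real ontology. The point is only that the ontology statement sits in the same set as the instance data, and that a shared node plus a predicate is all that links one statement to the next.

```python
# Each triple is (subject, predicate, object). All names are illustrative.
graph = {
    ("ex:AcmeCorp", "ex:sells", "ex:Widget"),          # the starter triple
    ("ex:Widget", "rdf:type", "ex:Product"),           # instance data
    ("ex:Product", "rdfs:subClassOf", "ex:Offering"),  # ontology, stored as data
}


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def add_triple(graph, subject, predicate, obj):
    """Linking new data only requires one more (subject, verb, object) statement."""
    graph.add((subject, predicate, obj))


add_triple(graph, "ex:AcmeCorp", "ex:locatedIn", "ex:Portsmouth")
sold = objects(graph, "ex:AcmeCorp", "ex:sells")
```

A real knowledge graph would use a triple store and full IRIs; the set of tuples is only meant to show the shape of the data.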
Simple way to start a business knowledge graph (besides using gist)
● “Use JSON-LD to atomise your enterprise data down into three-part statements and voila!
You get a connected graph!
● ✨ Decentralize the process by having each team publish their own JSON-LD, for example,
let the sales team publish the sales data and ask them to link each sale to the correct product
and client.
● 🤖 Connect GPT to the JSON-LD that your teams have published. Then, harness the power
of GPT to assist new teams in publishing their JSON-LD and integrating it back into your
enterprise-wide Knowledge Graph.”
Key to scaling external/internal integration: use the schema.org modeled JSON-LD from websites
GPT is trained on and connect it with internal data also modeled with schema.org
–#HT Tony Seale, UBS
https://www.linkedin.com/posts/tonyseale_mlops-dataintegration-ai-activity-7052551060237819904-bAZc
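The "atomise into three-part statements" step can be sketched roughly as follows. The sales record and its `https://example.com/...` identifiers are invented for illustration, and `to_triples` is a hypothetical helper that handles only flat nodes; it is not a conforming JSON-LD processor and performs no context expansion.

```python
import json

# Hypothetical record a sales team might publish; the @id values are made up.
doc = json.loads("""
{
  "@context": "https://schema.org",
  "@id": "https://example.com/sales/1001",
  "@type": "Order",
  "customer": {"@id": "https://example.com/clients/42"},
  "orderedItem": {"@id": "https://example.com/products/7"},
  "orderDate": "2023-05-01"
}
""")


def to_triples(node):
    """Flatten one flat JSON-LD-style node into (subject, predicate, object) statements.

    Simplified on purpose: handles only string values and {"@id": ...}
    references, and leaves property names unexpanded.
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        predicate = "rdf:type" if key == "@type" else key
        obj = value["@id"] if isinstance(value, dict) else value
        triples.append((subject, predicate, obj))
    return triples


triples = to_triples(doc)
```

Because the sale points at the product and client by `@id`, any other team's document that uses the same identifiers joins the same graph without a separate integration step.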
9
Yes, data warehousing focused on the integration problem
10
● Pro: Identified the critical problem to solve
● Con: Advocated a method that doesn’t delve deep enough to solve today’s
problem
● Still face the unified data model challenge
No, data warehousing model conformance doesn’t scale
“I spent a good 15 years working in financial services at some
pretty big banks. Half of the IT change budget is spent on
integration and the by-products of integration….I saw as the
technology was advancing that the percentage wasn’t going
down – in fact, it was going up. At some point, is the integration
tax going to be 100 percent?”
– Dan DeMers, CEO of Cinchy
“Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video,
https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021
11
How data warehousing stopped scaling
“They recognized that these themes ended up in all these legacy apps. Sales rolled up against a
geographic and a product hierarchy, and an organizational hierarchy…. They said, Let’s have
those conformed dimensions and a small number of facts. Let’s bring the facts from all the
different systems and snap them together according to these conformed dimensions….
Brilliant idea, but I think what actually happened over time is the workload just got greater and
greater. The ability of people to actually conform those dimensions kept eroding….”
–Dave McComb, President, Semantic Arts
“Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021
12
Data warehousing can’t solve today’s integration challenge
13
● Thousands of databases per enterprise (siloing)
● Thousands of applications (code sprawl)
● Data models buried in the app code
● Every app a special snowflake with its own data model
How did we get here? By selling the old as new
14
Why large-scale integration?
15
Large-scale integration is essential to avoiding observational bias. The analogy of the drunk looking for his money under the lamppost describes the nature of this bias: the drunk looks for his money where the light is, even though he knows the money is in the shadows.
To manage today’s business at scale, enterprises need light and visibility across departments, organizations and supply networks.
Semantic standards allow a desiloed data landscape for
interactive, interoperable digital twins and agents
16
Promise of digital twins and agents–way beyond APIs
17
Autonomous agents
Digital twins/
Small KGs
Locale: Portsmouth, UK
Sensor nets
Iotics, 2019 and 2023
How shared graph semantics helps
● Boosts meaningful results and relevancy (poor results stem from a lack of data and logic transparency and cohesiveness)
● Contextualizes data for management and reuse with relationship logic
● Scales meaningful connections between contexts (relevant relationships
living with entities)
● Enables Metcalfe’s network of networks effect (network_effect^N)
● Enables model-driven development via knowledge graphs (code once, reuse
anywhere)
● Provides access via KGs to logic programs as well as heterogeneous, smart data
● Scales efficiencies and economies so that energy consumption is reduced
18
KG centricity makes reliable, automated data webs possible
19
“Data teams report spending 25-30% of their time cleaning, labelling, and gathering data sets.... [Some can spend 80% plus]
What we know for sure is that data teams and knowledge workers generally spend a noteworthy amount of their time procuring data points that are available on the public web…”
It took Google knowledge panels one month and twenty days to update
following the inception of a new CEO at Citi, a F100 company. In Diffbot’s
Knowledge Graph, a new fact was logged within the week, with zero
human intervention and sourced from the public web.
– Merrill Cook, Diffbot Blog, 2021-2022
Example capabilities in Diffbot’s AI automated KG
20
Mike Tung, “VLDB2020: The Diffbot Knowledge Graph,” 2020
“Decentralization”: Why you should care
● Further desiloing
● More systems federation
● More interorganizational use potential
● Data Centric approach to architecture
● “Decentralized/Web3 stack”
● More storage options and tiering
● Options at different temperatures (hot vs. cold storage) for new use cases
● More captive and independent storage
21
The Five Commingled Phases of Compute, Networking and Storage
22
1st: Mainframe and Green Screens. Centralized storage and compute, with minimal networking
2nd: Client-Server and Desktops. Application Distribution via Proprietary and IP Networking
3rd: Early Web (on Client-Server). Simple web hosting + legacy Client-Server storage
4th: Distributed Cloud and Mobile Devices. Commodity servers + storage + some virtualization
5th: “Decoupled” and “Decentralized” Cloud. Compute and storage more loosely coupled, virtualized, controlled and data-centric
Figure labels: Time; Less centralized / More centralized; Application Centric / Data Centric
All phases are still active and evolving
Degree of control assumes a continuum–not a binary split
23
See Thomas W. Malone, Inventing the Organizations of the 21st Century, MIT Press, 2003, 45ff.
SOLID: Federated storage and decentralized apps
24
Ruben Verborgh, “Decentralizing personal data management with Solid: a hands-on workshop,” SEMIC Workshop, October 2020
SOLID shared, federated XaaS: Construction industry
25
“TrinPod™: World's first conceptually indexed space-time
digital twin using Solid,” Graphmetrix, 2022,
https://graphmetrix.com/trinpod
Company-specific SOLID storage pods and access
control can be managed by each supply chain partner.
Graphmetrix as digital twin provider manages the
system and system-level apps.
Peergos makes personal file storage management possible via IPFS and a browser
26
Peergos technology logical architecture, https://peergos.org/technology, 2019
Peergos is a personal data dcloud storage environment that also uses blockchain-based decentralized public key infrastructure (DPKI). Consider it as an alternative to Google or Amazon Photos, for example.
Enterprise decentralized app environment: OriginTrail.io
27
https://origintrail.io/
OriginTrail + BSI’s supply chain tracking and tracing
28
OriginTrail and the British Standards Institution (BSI), https://twitter.com/origin_trail/status/1339606640887152642?s=20, Dec. 2020
The Monasteriven whiskey produced in Ireland is tracked and traced from “grain to glass” with the OriginTrail.io approach.
OT uses a decentralized knowledge graph that connects to one of several different blockchains.
This method enables shared data reuse and other synergies across the supply chain.
Seven obstacles to adoption of decentralized, interorganizational environments
29
To succeed, organizations will have to become more bona fide data-centric organizations first
30
Seven obstacles to adoption of FAIR data development at scale
31
Thoughts and Reactions?
Feel free to ping me anytime with questions, etc.
Alan Morrison
Data Science Central
LinkedIn | Twitter | Quora | Slideshare
+1 408 205 5109
a.s.morrison@gmail.com
32
From NLP, to stochastic parrots, to neurosymbolic AI
Alan Morrison
Data-Centric Architecture Forum
May 2023
33
What’s a “stochastic parrot” and one who worships the same?
“A Language Model is a system for haphazardly stitching together sequences of linguistic
forms it has observed in its vast training data, according to probabilistic information about
how they combine, but without any reference to meaning: a stochastic parrot.”
–Emily Bender, et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,”
ACM paper presented at FAccT ’21, March 3–10, 2021, virtual event, Canada
Stochastic parrot worshippers: Those who mindlessly praise LLMs without realizing they’ve
mistaken the parrot part (probabilistic language methods alone) for the whole. These
worshippers seem to assume those methods alone will deliver artificial general intelligence.
Related term: Documentation debt (also per Bender, et al.)
“When we rely on ever larger datasets we risk incurring documentation debt,” they say, “i.e.,
putting ourselves in the situation where the datasets are both undocumented and too large to
document post hoc…. The solution, we propose, is to budget for documentation as part of the
planned costs of dataset creation.”
34
Deep learning guru Yann LeCun on LLMs
35
What’s Natural Language Processing (NLP)?
36
“The root of Natural Language Processing dates back to the 1950s when Alan Turing first devised the Turing Test.
“The objective of the Turing Test was to determine whether a computer was truly intelligent based on its ability to interpret and generate natural language as a criterion of intelligence.”
– Tithy Sreemani, Analytics Vidhya blog, 2022
What’s natural language understanding (NLU)?
1. A form of overpromising and underdelivering, or
2. A serious, ongoing linguistics + cognition endeavor to model how human
understanding works.
37
A sentence-level model based on Role and Reference Grammar by PAT Inc., 2022.
What’s a large language model (LLM)?
1. A neural network with many layers (“deep learning”).
2. A transformer model that “learns” context a token at a time, in sequence.
3. A tokenizer that converts words to numbers and numbers to words.
4. A token-to-embedding (vectorization) transformer.
5. An ML model with millions or billions of parameters (akin to multi-dimensional
topographic features), trained on very large data sets.
6. The NLP (natural language processing) system currently in vogue.
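Item 3 in the list above, the tokenizer, is the easiest piece to illustrate. What follows is a toy sketch only: it splits on whitespace and assigns integer ids in order of first appearance, whereas production LLMs typically use subword schemes. The function names are hypothetical.

```python
def build_vocab(corpus):
    """Assign each distinct whitespace-separated word an integer id."""
    vocab = {}
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Words to numbers."""
    return [vocab[w] for w in text.split()]


def decode(ids, vocab):
    """Numbers back to words."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)


vocab = build_vocab("knowledge graphs know and LLMs figure it out")
ids = encode("LLMs figure it out", vocab)   # [4, 5, 6, 7]
text = decode(ids, vocab)                   # "LLMs figure it out"
```

The embedding step in item 4 would then map each of these ids to a vector; that part is omitted here.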
38
LLM Leaderboard (partial)
39
Dan Saatrup Nielsen, Alexandra Institute, LinkedIn post, 2023
Solving arithmetic or chasing “facts” with LLMs wastes time and energy
“Suppose that I wanted to find out the square root of five. If I asked an LLM (say ChatGPT), getting this answer involves the
following steps:
● Me: Send a prompt saying “What is the square root of 5?”
● ChatGPT: Do I understand the concept of square root? Yes, I do … it’s a math function.
● ChatGPT: There is a Python function that can be used to invoke that function, in the Python Math Library. Retrieve
that library.
● ChatGPT: Evaluate the number 5 with the function call to get the value 2.236.
● ChatGPT: Construct a response and send that response back to the client.
This assumes that everything goes right.”
– Kurt Cagle, The Cagle Report
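For comparison, the direct computation that the quoted steps route around is a single standard-library call. This is only a sketch of the contrast, not a measurement of cost on either side.

```python
import math

root = math.sqrt(5)        # one library call, no model inference
rounded = round(root, 3)   # 2.236
```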
40
Knowledge graphs know; LLMs need prompts and figure it out, sort of.
“LLMs have to figure things out. They follow an iterative feedback loop called a
langchain, with either a human, itself, or a combination of the two. This
langchain model should be emulatable with SPARQL.
“Update. I’m playing around with this idea on Jena/Fuseki, and the early results
are … intriguing. The key is to recognize that you are doing mutations to the
database, which makes many DBAs cringe. However, I don’t think there is any
way you can get to conversational AI on a knowledge graph without constantly
building (and, when necessary, destroying) contextual graphs.”
–Kurt Cagle, “Figuring Out vs. Knowing,” The Cagle Report
41
Idea: Connect the LLM directly to a KG such as Wikidata
“We can just use the SPARQL query generation ability directly and ask queries
against Wikidata. Not only can we connect the LLM to a knowledge graph, but
also to a repository of functions such as Wikifunctions.” LLMs can learn to use KGs
and functions as tools.
–Denny Vrandečić, Wikimedia Foundation, 2023
42
Each machine learning answer creates some uncertainty
“You can use machine learning to retrieve Obama’s birthplace every time you
need it, but it costs a lot, and you’re never sure it’s correct.”
–Jamie Taylor of Google
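The birthplace lookup mentioned here can be expressed as a SPARQL query against Wikidata, which is the kind of query an LLM could be asked to generate. The sketch below only builds the request URL for the public Wikidata query endpoint and sends nothing; `build_request_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# wd:Q76 is Barack Obama and wdt:P19 is "place of birth" in Wikidata.
QUERY = """
SELECT ?placeLabel WHERE {
  wd:Q76 wdt:P19 ?place .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request_url(query, endpoint=WIKIDATA_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


url = build_request_url(QUERY)
```

Fetching `url` with any HTTP client would return the stored fact as a lookup, which is the contrast the quote draws with recomputing the answer by inference each time.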
43
Efficiency argument for knowledge graphs
“Why would you ever use a 96-layer, 156 billion parameter large language model
to do multiplication, when that’s something you can do in a single operation on
your CPU?”
“Why internalize knowledge in an LLM, when you can externalize it in a graph
store and look it up when you need it?”
“Use LLMs where they are efficient.”
– Denny Vrandečić of the Wikimedia Foundation
44
To scale FAIR data, use an assisted, hybrid AI approach
45
Amit Sheth, “From NLP to NLU: Why we need varied, comprehensive, and stratified knowledge (Neuro-symbolic AI),” USC Information Sciences Institute on YouTube, March 2023, https://www.youtube.com/watch?v=xyxQXka6dRY&t=2377s
46
How hybrid AI helps in research
“LLMs have amazing abilities in manipulating natural language text, but generating timely and factually verified recommendations is one thing LLMs are not naturally great at.”
–Mike Tung, CEO of Diffbot
Diffbot Blog, April 2023, https://blog.diffbot.com/generating-company-recommendations-using-large-language-models-and-knowledge-graphs/
LLMs aren’t a reliable research tool alone because they hallucinate. You can’t trust the answers unless you know the answer already.
Mike Tung recommends more precise prompting on the query side and answer verification via a knowledge graph such as Diffbot. Both of these capabilities harness the precise logical description missing in current LLM Q&As.
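Answer verification against a knowledge graph might look something like the following sketch. The fact table, the `ex:` identifiers and the `verify` function are all hypothetical stand-ins; this is not Diffbot's API, and a production system would query a real knowledge graph rather than an in-memory dictionary.

```python
# Hypothetical curated facts standing in for a knowledge graph.
FACTS = {
    ("ex:AcmeCorp", "ex:headquarteredIn"): "ex:Portsmouth",
}


def verify(subject, predicate, claimed):
    """Compare an LLM-claimed value with the stored fact, if one exists."""
    known = FACTS.get((subject, predicate))
    if known is None:
        return "unknown"  # the graph can neither confirm nor refute the claim
    return "verified" if known == claimed else "contradicted"


ok = verify("ex:AcmeCorp", "ex:headquarteredIn", "ex:Portsmouth")
bad = verify("ex:AcmeCorp", "ex:headquarteredIn", "ex:London")
gap = verify("ex:AcmeCorp", "ex:foundedIn", "ex:1999")
```

Returning "unknown" separately from "contradicted" keeps gaps in the graph from being read as refutations.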
NLP’s compost grinder data mentality
47
https://pixabay.com/photos/compost-grinder-compost-chipper-3389088/
Versus KGs growing naturally in companion plant mode
48
Rich data ecosystems evolve naturally by comparison with underdescribed, fragmented data assets
Zero-copy integration becomes possible, reducing complexity, labor and energy waste by up to 90 percent
Second-order cybernetics (humans in the loop) and precise facts and contextualization complement probabilistic methods
https://www.fruitsaladtrees.com/blogs/news/ediblegarden
AI’s Wave III: Less wasteful, more explicit smart data management via a knowledge graph foundation
49
NLP versus NLU: Most true understanding is unclaimed territory
51
Unclaimed data market territory
FAIR*
Actionability
Immediacy
Divining purpose
Divining intent
Synthesis
Reasoning
Abstraction
Contextualization
Connection
Classification
Identification
Unclaimed market territory
Staked claims
Present vs Future Data Market Map
12 steps to FAIR data power
*Findable, accessible, interoperable, reusable data
Reach of current ML efforts
History of LLMs
52
Feature growth
53
Energy consumption
54
Stochastic parrots and hallucination
55
Neurosymbolic AI
56
Teaching LLMs to query knowledge graphs
57
Datalanguage hackathon results
58
Semantic community LLM use results
59
Goal: Develop FAIR data efficiently
60

More Related Content

What's hot

Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Codetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep LearningCodetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep Learning
Matthew Opala
 
How to Choose the Right Database for Your Workloads
How to Choose the Right Database for Your WorkloadsHow to Choose the Right Database for Your Workloads
How to Choose the Right Database for Your Workloads
InfluxData
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
Mykola Zerniuk
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
Mark Kromer
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
Databricks
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviate
NETWAYS
 
Pr057 mask rcnn
Pr057 mask rcnnPr057 mask rcnn
Pr057 mask rcnn
Taeoh Kim
 
Azure Data Engineering.pptx
Azure Data Engineering.pptxAzure Data Engineering.pptx
Azure Data Engineering.pptx
priyadharshini626440
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
WinWire Technologies Inc
 
New Features in OBIEE 12c
New Features in OBIEE 12c New Features in OBIEE 12c
New Features in OBIEE 12c
Michelle Kolbe
 
Introduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionIntroduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detection
NAVER Engineering
 
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
Timothy McAliley
 
NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016
Mark Kilgard
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
changedaeoh
 
Azure Digital Twins.pdf
Azure Digital Twins.pdfAzure Digital Twins.pdf
Azure Digital Twins.pdf
Tomasz Kopacz
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsMongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
SpringPeople
 
adb.pdf
adb.pdfadb.pdf
Understanding neural radiance fields
Understanding neural radiance fieldsUnderstanding neural radiance fields
Understanding neural radiance fields
Varun Bhaseen
 

What's hot (20)

Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Codetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep LearningCodetecon #KRK 3 - Object detection with Deep Learning
Codetecon #KRK 3 - Object detection with Deep Learning
 
How to Choose the Right Database for Your Workloads
How to Choose the Right Database for Your WorkloadsHow to Choose the Right Database for Your Workloads
How to Choose the Right Database for Your Workloads
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviate
 
Pr057 mask rcnn
Pr057 mask rcnnPr057 mask rcnn
Pr057 mask rcnn
 
Azure Data Engineering.pptx
Azure Data Engineering.pptxAzure Data Engineering.pptx
Azure Data Engineering.pptx
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
 
New Features in OBIEE 12c
New Features in OBIEE 12c New Features in OBIEE 12c
New Features in OBIEE 12c
 
Introduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionIntroduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detection
 
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
 
NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016NVIDIA OpenGL in 2016
NVIDIA OpenGL in 2016
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
 
Azure Digital Twins.pdf
Azure Digital Twins.pdfAzure Digital Twins.pdf
Azure Digital Twins.pdf
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsMongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
 
adb.pdf
adb.pdfadb.pdf
adb.pdf
 
Understanding neural radiance fields
Understanding neural radiance fieldsUnderstanding neural radiance fields
Understanding neural radiance fields
 

Similar to DCAF 2023 1 and 2.pdf

FAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdfFAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdf
Alan Morrison
 
Scaling the mirrorworld with knowledge graphs
Scaling the mirrorworld with knowledge graphsScaling the mirrorworld with knowledge graphs
Scaling the mirrorworld with knowledge graphs
Alan Morrison
 
Tecnologias Estratégicas
Tecnologias Estratégicas Tecnologias Estratégicas
Tecnologias Estratégicas sucesu68
 
HEC Digital Business. Sharing Economy and other trends
HEC Digital Business. Sharing Economy and other trendsHEC Digital Business. Sharing Economy and other trends
HEC Digital Business. Sharing Economy and other trends
André Blavier
 
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualizationMyth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Denodo
 
Big Data Science Workshop Documentation V1.0
Big Data Science Workshop Documentation V1.0Big Data Science Workshop Documentation V1.0
Big Data Science Workshop Documentation V1.0
Abdelrahman Astro
 
Agile Data Management with Enterprise Data Fabric (ASEAN)
Agile Data Management with Enterprise Data Fabric (ASEAN)Agile Data Management with Enterprise Data Fabric (ASEAN)
Agile Data Management with Enterprise Data Fabric (ASEAN)
Denodo
 
DCA Symposium 6 Feb 2023.pdf
DCA Symposium 6 Feb 2023.pdfDCA Symposium 6 Feb 2023.pdf
DCA Symposium 6 Feb 2023.pdf
Alan Morrison
 
Data centric business and knowledge graph trends
Data centric business and knowledge graph trendsData centric business and knowledge graph trends
Data centric business and knowledge graph trends
Alan Morrison
 
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data VirtualizationDAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
Denodo
 
3 Reasons Data Virtualization Matters in Your Portfolio
3 Reasons Data Virtualization Matters in Your Portfolio3 Reasons Data Virtualization Matters in Your Portfolio
3 Reasons Data Virtualization Matters in Your Portfolio
Denodo
 
Big Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingBig Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely heading
Paco Nathan
 
Five Key Focus Areas for New-Age Collaboration
Five Key Focus Areas for New-Age CollaborationFive Key Focus Areas for New-Age Collaboration
Five Key Focus Areas for New-Age Collaboration
Cognizant
 
The FAIR data movement and 22 Feb 2023.pdf
The FAIR data movement and 22 Feb 2023.pdfThe FAIR data movement and 22 Feb 2023.pdf
The FAIR data movement and 22 Feb 2023.pdf
Alan Morrison
 
The boom in Xaas and the knowledge graph
The boom in Xaas and the knowledge graphThe boom in Xaas and the knowledge graph
The boom in Xaas and the knowledge graph
Alan Morrison
 
A blueprint for data in a multicloud world
A blueprint for data in a multicloud worldA blueprint for data in a multicloud world
A blueprint for data in a multicloud world
Mehdi Charafeddine
 
Big Data.pdf
Big Data.pdfBig Data.pdf
Big Data.pdf
AnilaAbid2
 
AI Trends.pdf
AI Trends.pdfAI Trends.pdf
AI Trends.pdf
RevaldiAnggara
 
Meetup 10 here&now: Megatris Comp design method (Part 1)
Meetup 10 here&now: Megatris Comp design method (Part 1)Meetup 10 here&now: Megatris Comp design method (Part 1)
Meetup 10 here&now: Megatris Comp design method (Part 1)
Megatris Comp
 

Similar to DCAF 2023 1 and 2.pdf (20)

FAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdfFAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdf
 
Scaling the mirrorworld with knowledge graphs
Scaling the mirrorworld with knowledge graphsScaling the mirrorworld with knowledge graphs
Scaling the mirrorworld with knowledge graphs
 
Tecnologias Estratégicas
Tecnologias Estratégicas Tecnologias Estratégicas
Tecnologias Estratégicas
 
HEC Digital Business. Sharing Economy and other trends
HEC Digital Business. Sharing Economy and other trendsHEC Digital Business. Sharing Economy and other trends
HEC Digital Business. Sharing Economy and other trends
 
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualizationMyth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization
 
Big Data Science Workshop Documentation V1.0
Big Data Science Workshop Documentation V1.0Big Data Science Workshop Documentation V1.0
Big Data Science Workshop Documentation V1.0
 


DCAF 2023 1 and 2.pdf

  • 1. DCA: Current Themes and Trends* Alan Morrison Data-Centric Architecture Forum May 2023 1 Alain Audet at https://pixabay.com/photos/lake-foggy-lake-nature-landscape-6839357/ *Separate talk to cover NLP/LLMs
  • 2. Business goals enabled by a connected, shared data ecosystem 2 Buying Helping Making Selling Sharing
  • 3. Inhibitors to ecosystem-level sharing ● Data feudalism ● Poorly defined regulatory challenges ● Weak public sector ● Public apathy ● Technology + investor inertia and lack of clear vision ● Magic bullet syndrome ● Media groupthink ● Idol worship ● Pervasive myopia ● Lack of organization fox empowerment over hedgehogs 3
  • 4. Unclaimed data market territory FAIR* Actionability Immediacy Divining purpose Divining intent Synthesis Reasoning Abstraction Contextualization Connection Classification Identification Unclaimed market territory Staked claims Present vs Future Shared Data Market Map 12 steps to FAIR data power *Findable, accessible, interoperable, reusable data Reach of current ML efforts
  • 5. Challenge: Seamless, at-scale, FAIR data collaboration 5 James Kobelius, 2016 Association of European Libraries, 2017
  • 6. 6
  • 7. Opportunity: Unitary data + description logic = knowledge 7 “Data management” (structured data, mostly) Knowledge management (internally shared) Content management (externally shared) Learning management (internal coursework) FAIR data and associated description logic FAIR data is data users can have confidence in for many purposes. Data becomes FAIR when it disambiguates concepts, individuals and roles and how they interact and relate to one another. In a knowledge graph context, documented knowledge = FAIR data. Under the FAIR data umbrella are all heterogeneous types of data/content.
  • 8. To create a knowledge graph, users can start with a single triple 8 Linked Open Data Cloud, 2022 Starter triple for a knowledge graph A standard knowledge graph consists of triplified, relationship-rich data. The data model, or ontology, is also described in triples and lives with the rest of the data. Ontologies can also be managed as data. Linking triples merely requires a verb (or predicate, or described edge) to link them.
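The starter-triple idea on slide 8 can be sketched in a few lines. This is a minimal illustration only: the `ex:` identifiers and the `objects` helper are hypothetical and do not come from the slides, and a plain Python set stands in for a real triple store.

```python
# A triple is (subject, predicate, object). All ex: names are made up for illustration.
Triple = tuple[str, str, str]

# The single starter triple.
starter: Triple = ("ex:AcmeCorp", "ex:sells", "ex:Widget")
graph: set[Triple] = {starter}

# The data model (ontology) is also expressed as triples and lives with the data.
graph.add(("ex:sells", "rdfs:domain", "ex:Organization"))
graph.add(("ex:AcmeCorp", "rdf:type", "ex:Organization"))

# Linking to more data only requires another predicate (a described edge).
graph.add(("ex:Widget", "ex:madeIn", "ex:Portsmouth"))


def objects(graph: set[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)
```

Calling `objects(graph, "ex:AcmeCorp", "ex:sells")` walks the one edge the starter triple defined; the graph grows by adding statements, not by altering a schema.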
  • 9. Simple way to start a business knowledge graph (besides using gist) ● “Use JSON-LD to atomise your enterprise data down into three-part statements and voila! You get a connected graph! ● ✨ Decentralize the process by having each team publish their own JSON-LD, for example, let the sales team publish the sales data and ask them to link each sale to the correct product and client. ● 🤖 Connect GPT to the JSON-LD that your teams have published. Then, harness the power of GPT to assist new teams in publishing their JSON-LD and integrating it back into your enterprise-wide Knowledge Graph.” Key to scaling external/internal integration: use the schema.org modeled JSON-LD from websites GPT is trained on and connect it with internal data also modeled with schema.org –#HT Tony Seale, UBS https://www.linkedin.com/posts/tonyseale_mlops-dataintegration-ai-activity-7052551060237819904-bAZc 9
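As a rough sketch of what "atomising" data into three-part statements might look like, the snippet below flattens one JSON-LD record into triples. The order, client and product URLs are hypothetical, and `to_statements` is a deliberately simplified stand-in, not a conforming JSON-LD processor: it assumes a string `@context` used as a vocabulary prefix and a single flat node.

```python
import json

# Hypothetical record a sales team might publish, using schema.org terms.
sale_jsonld = json.loads("""
{
  "@context": "https://schema.org/",
  "@id": "https://example.com/orders/1001",
  "@type": "Order",
  "customer": {"@id": "https://example.com/clients/42"},
  "orderedItem": {"@id": "https://example.com/products/7"}
}
""")


def to_statements(doc: dict) -> list[tuple[str, str, str]]:
    """Flatten one flat JSON-LD node into (subject, predicate, object) statements.

    Simplification: handles only a string @context, @type, and properties whose
    values are plain literals or {"@id": ...} references.
    """
    vocab = doc["@context"]
    subject = doc["@id"]
    statements = []
    for key, value in doc.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            statements.append((subject, "rdf:type", vocab + value))
        elif isinstance(value, dict) and "@id" in value:
            # A reference to another node: this is the link to product or client.
            statements.append((subject, vocab + key, value["@id"]))
        else:
            statements.append((subject, vocab + key, str(value)))
    return statements


statements = to_statements(sale_jsonld)
```

Because the sale points at the client and product by `@id`, statements published separately by other teams about those same identifiers would join up without a shared database.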
  • 10. Yes, data warehousing focused on the integration problem 10 ● Pro: Identified the critical problem to solve ● Con: Advocated a method that doesn’t delve deep enough to solve today’s problem ● Still face the unified data model challenge
  • 11. No, data warehousing model conformance doesn’t scale “I spent a good 15 years working in financial services at some pretty big banks. Half of the IT change budget is spent on integration and the by-products of integration….I saw as the technology was advancing that the percentage wasn’t going down – in fact, it was going up. At some point, is the integration tax going to be 100 percent?” – Dan DeMers, CEO of Cinchy “Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021 11
  • 12. How data warehousing stopped scaling “They recognized that these themes ended up in all these legacy apps. Sales rolled up against a geographic and a product hierarchy, and an organizational hierarchy…. They said, Let’s have those conformed dimensions and a small number of facts. Let’s bring the facts from all the different systems and snap them together according to these conformed dimensions…. Brilliant idea, but I think what actually happened over time is the workload just got greater and greater. The ability of people to actually conform those dimensions kept eroding….” –Dave McComb, President, Semantic Arts “Disambiguation of Data Mesh, Fabric, Centric, Driven, and Everything!” YouTube video, https://www.youtube.com/watch?v=M5XlGloj4UY&t=564s, 2021 12
  • 13. Data warehousing can’t solve today’s integration challenge 13 ● Thousands of databases per enterprise (siloing) ● Thousands of applications (code sprawl) ● Data models buried in the app code ● Every app a special snowflake with its own data model
  • 14. How did we get here? By selling the old as new 14
  • 15. Why large-scale integration? 15 Large scale integration is essential to avoiding observational bias. The drunk looking for his money under the lamppost analogy describes the nature of this bias. The drunk is looking for his money where the light is, even though he knows the money is in the shadows. To manage today’s business at scale, enterprises need light and visibility across departments, organizations and supply networks
  • 16. Semantic standards allow a desiloed data landscape for interactive, interoperable digital twins and agents 16
  • 17. Promise of digital twins and agents–way beyond APIs 17 Autonomous agents Digital twins/ Small KGs Locale: Portsmouth, UK Sensor nets Iotics, 2019 and 2023
  • 18. How shared graph semantics helps ● Boosts meaningful results (result of lack of data and logic transparency and cohesiveness) and relevancy ● Contextualizes data for management and reuse with relationship logic ● Scales meaningful connections between contexts (relevant relationships living with entities) ● Enables Metcalfe’s network of networks effect (network effect^N) ● Enables model-driven development via knowledge graphs (code once, reuse anywhere) ● Provides access via KGs to logic programs as well as heterogeneous, smart data ● Scales efficiencies and economies so that energy consumption is reduced 18
  • 19. KG centricity makes reliable, automated data webs possible 19 “Data teams report spending 25-30% of their time cleaning, labelling, and gathering data sets.... [Some can spend 80% plus] What we know for sure is that data teams and knowledge workers generally spend a noteworthy amount of their time procuring data points that are available on the public web…” It took Google knowledge panels one month and twenty days to update following the inception of a new CEO at Citi, a F100 company. In Diffbot’s Knowledge Graph, a new fact was logged within the week, with zero human intervention and sourced from the public web. – Merrill Cook, Diffbot Blog, 2021-2022
  • 20. Example capabilities in Diffbot’s AI automated KG 20 Mike Tung, “VLDB2020: The Diffbot Knowledge Graph,” 2020
  • 21. “Decentralization”: Why you should care ● Further desiloing ● More systems federation ● More interorganizational use potential ● Data Centric approach to architecture ● “Decentralized/Web3 stack” ● More storage options and tiering ● Options at different temperatures (hot vs. cold storage) for new use cases ● More captive and independent storage 21
  • 22. The Five Commingled Phases of Compute, Networking and Storage. 1st: Mainframe and Green Screens (centralized storage and compute, with minimal networking). 2nd: Client-Server and Desktops (application distribution via proprietary and IP networking). 3rd: Early Web, on Client-Server (simple web hosting + legacy client-server storage). 4th: Distributed Cloud and Mobile Devices (commodity servers + storage + some virtualization). 5th: “Decoupled” and “Decentralized” Cloud (compute and storage more loosely coupled, virtualized, controlled and data-centric). Axes: time; more centralized to less centralized; application centric to data centric. All phases are still active and evolving. 22
  • 23. Degree of control assumes a continuum–not a binary split 23 See Thomas W. Malone, Inventing the Organizations of the 21st Century, MIT Press, 2003, 45ff.
  • 24. SOLID: Federated storage and decentralized apps 24 Ruben Verborgh, “Decentralizing personal data management with Solid: a hands-on workshop,” SEMIC Workshop, October 2020
  • 25. SOLID shared, federated XaaS: Construction industry 25 “TrinPod™: World's first conceptually indexed space-time digital twin using Solid,” Graphmetrix, 2022, https://graphmetrix.com/trinpod Company-specific SOLID storage pods and access control can be managed by each supply chain partner. Graphmetrix as digital twin provider manages the system and system-level apps.
  • 26. Peergos makes personal file storage management possible via IPFS and a browser 26 Peergos technology logical architecture, https://peergos.org/technology, 2019 Peergos is a personal data dcloud storage environment that also uses blockchain based decentralized public-key-infrastructure (dpki). Consider as an alternative to Google or Amazon Photos, for example.
  • 27. Enterprise decentralized app environment: OriginTrail.io 27 https://origintrail.io/
  • 28. OriginTrail + BSI’s supply chain tracking and tracing 28 OriginTrail and the British Standards Institute (BSI), https://twitter.com/origin_trail/status/1339606640887152642?s=20, Dec. 2020 The Monasteriven whiskey produced in Ireland is tracked and traced from “grain to glass” with the OriginTrail.io approach. OT uses a decentralized knowledge graph that connects to one of several different blockchains. This method enables shared data reuse and other synergies across the supply chain.
  • 29. Seven obstacles to adoption of decentralized, interorganizational environments 29
  • 30. To succeed, organizations will have to become more bona fide data-centric organizations first 30
  • 31. Seven obstacles to adoption of FAIR data development at scale 31
  • 32. Thoughts and Reactions? Feel free to ping me anytime with questions, etc. Alan Morrison Data Science Central LinkedIn | Twitter | Quora | Slideshare +1 408 205 5109 a.s.morrison@gmail.com 32
  • 33. From NLP, to stochastic parrots, to neurosymbolic AI Alan Morrison Data-Centric Architecture Forum May 2023 33
  • 34. What’s a “stochastic parrot” and one who worships the same? “A Language Model is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.” –Emily Bender, et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,” ACM paper presented at FAccT ’21, March 3–10, 2021, virtual event, Canada Stochastic parrot worshippers: Those who mindlessly praise LLMs without realizing they’ve mistaken the parrot part—probabilistic language methods alone–for the whole. These worshippers seem to assume those methods alone will deliver artificial general intelligence. Related term: Documentation debt (also per Bender, et al.) “When we rely on ever larger datasets we risk incurring documentation debt,” they say, “i.e., putting ourselves in the situation where the datasets are both undocumented and too large to document post hoc…. The solution, we propose, is to budget for documentation as part of the planned costs of dataset creation.” 34
  • 35. Deep learning guru Yann LeCun on LLMs 35
  • 36. What’s Natural Language Processing (NLP)? 36 “The root of Natural Language Processing dates back to the 1950s when Alan Turing first devised the Turing Test. “The objective of the Turing Test was to determine whether a computer was truly intelligent based on its ability to interpret and generate natural language as a criterion of intelligence.” – Tithy Sreemani, Analytics Vidhya blog, 2022
  • 37. What’s natural language understanding (NLU)? 1. A form of overpromising and underdelivering, or 2. A serious, ongoing linguistics + cognition endeavor to model how human understanding works. 37 A sentence-level model based on Role and Reference Grammar by PAT Inc., 2022.
  • 38. What’s a large language model (LLM)? 1. A neural network with many layers (“deep learning”). 2. A transformer model that “learns” context a token at a time, in sequence. 3. A tokenizer that converts words to numbers and numbers to words. 4. A token-to-embedding (vectorization) transformer. 5. An ML model that is trained on very large data sets with millions or billions of parameters (akin to multi-dimensional topographic features) 6. The NLP (natural language processing) system currently in vogue. 38
  • 39. LLM Leaderboard (partial) 39 Dan Saatrup Nielsen, Alexandra Institute, LinkedIn post, 2023
  • 40. Solving arithmetic or chasing “facts” with LLMs wastes time and energy “Suppose that I wanted to find out the square root of five. If I asked an LLM (say ChatGPT), getting this answer involves the following steps: ● Me: Send a prompt saying “What is the square root of 5?” ● ChatGPT: Do I understand the concept of square root? Yes, I do … it’s a math function. ● ChatGPT: There is a Python function that can be used to invoke that function, in the Python Math Library. Retrieve that library. ● ChatGPT: Evaluate the number 5 with the function call to get the value 2.236. ● ChatGPT: Construct a response and send that response back to the client. This assumes that everything goes right.” – Kurt Cagle, The Cagle Report 40
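For contrast with the multi-step LLM path quoted on slide 40, here is the direct computation. It is only meant to show scale: one standard-library call, with no prompt, retrieval step or model round trip.

```python
import math

# One deterministic library call replaces the whole prompt/response loop.
root = math.sqrt(5)

# Round to three decimals for display.
print(f"{root:.3f}")
```

The result is the same on every run, which is the point of the slide: arithmetic is a lookup-or-compute problem, not a language problem.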
  • 41. Knowledge graphs know; LLMs need prompts and figure it out, sort of. “LLMs have to figure things out. They follow an iterative feedback loop called a langchain, with either a human, itself, or a combination of the two. This langchain model should be emulatable with SPARQL. “Update. I’m playing around with this idea on Jena/Fuseki, and the early results are … intriguing. The key is to recognize that you are doing mutations to the database, which makes many DBAs cringe. However, I don’t think there is any way you can get to conversational AI on a knowledge graph without constantly building (and, when necessary, destroying) contextual graphs.” – Kurt Cagle, “Figuring Out vs. Knowing,” The Cagle Report 41
  • 42. Idea: Connect the LLM directly to a KG such as Wikidata “We can just use the SPARQL query generation ability directly and ask queries against Wikidata. Not only can we connect the LLM to a knowledge graph, but also to a repository of functions such as wiki functions.” “LLMs can learn to use KGs and functions as tools.” –Denny Vrandečić, Wikimedia Foundation, 2023 42
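A sketch of the kind of query an LLM could be asked to generate for slide 42: a SPARQL lookup of a person's place of birth on Wikidata. The helper names `birthplace_query` and `request_url` are hypothetical; the endpoint, the item Q76 (Barack Obama) and the property P19 (place of birth) are real Wikidata identifiers. The code only builds the request and makes no network call.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def birthplace_query(entity_id: str) -> str:
    """Build a SPARQL query for an entity's place of birth (property P19)."""
    return (
        "SELECT ?placeLabel WHERE {\n"
        f"  wd:{entity_id} wdt:P19 ?place .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def request_url(query: str) -> str:
    """Return a GET URL for the query; nothing is sent from this function."""
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


query = birthplace_query("Q76")  # Q76 is the Wikidata item for Barack Obama
url = request_url(query)
```

In a tool-use setup the model would emit only the query text, and ordinary code like `request_url` would carry it to the graph, so the fact comes from the store and not from the model's weights.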
  • 43. Each machine learning answer creates some uncertainty “You can use machine learning to retrieve Obama’s birthplace every time you need it, but it costs a lot, and you’re never sure it’s correct.” –Jamie Taylor of Google 43
  • 44. Efficiency argument for knowledge graphs “Why would you ever use a 96-layer, 156 billion parameter large language model to do multiplication, when that’s something you can do in a single operation on your CPU?” “Why internalize knowledge in an LLM, when you can externalize it in a graph store and look it up when you need it?” “Use LLMs where they are efficient.” – Denny Vrandečić of the Wikimedia Foundation 44
  • 45. To scale FAIR data, use an assisted, hybrid AI approach 45 Amit Sheth, “From NLP to NLU: Why we need varied, comprehensive, and stratified knowledge (Neuro-symbolic AI),” USC Information Sciences Institute on YouTube, March 2023, https://www.youtube.com/watch?v=xyxQXka6dRY&t=2377s
  • 46. 46 How hybrid AI helps in research “LLMs have amazing abilities in manipulating natural language text, but generating timely and factually verified recommendations is one thing LLMs are not naturally great at.” –Mike Tung, CEO of Diffbot Diffbot Blog, April 2023, https://blog.diffbot.com/generating-company-recommendations-using-large-language-models-and-knowledge-graphs/ LLMs aren’t a reliable research tool alone because they hallucinate. You can’t trust the answers unless you know the answer already. Mike Tung recommends more precise prompting on the query side and answer verification via a knowledge graph such as Diffbot. Both of these capabilities harness precise logical description missing in current LLM Q&As.
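The answer-verification step described on slide 46 might look roughly like the sketch below. It does not use Diffbot's API; a small in-memory set of triples stands in for the knowledge graph, and the `verify` function and predicate name are hypothetical. It also assumes each predicate holds a single value per subject.

```python
Triple = tuple[str, str, str]

# Hypothetical curated facts standing in for a knowledge graph.
knowledge_graph: set[Triple] = {
    ("Barack Obama", "placeOfBirth", "Honolulu"),
}


def verify(claim: Triple, graph: set[Triple]) -> str:
    """Classify a model-generated claim as 'supported', 'contradicted' or 'unknown'."""
    subject, predicate, _ = claim
    if claim in graph:
        return "supported"
    # Assumption: the predicate is single-valued, so a different stored
    # object for the same subject and predicate contradicts the claim.
    if any(s == subject and p == predicate for s, p, _ in graph):
        return "contradicted"
    return "unknown"
```

A claim such as `("Barack Obama", "placeOfBirth", "Chicago")` comes back as contradicted, while a claim about a subject the graph has never seen comes back as unknown, which is the honest result when the graph cannot confirm or deny.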
  • 47. NLP’s compost grinder data mentality 47 https://pixabay.com/photos/compost-grinder-compost-chipper-3389088/
  • 48. Versus KGs growing naturally in companion plant mode 48 Rich data ecosystems evolve naturally by comparison with underdescribed, fragmented data assets Zero-copy integration becomes possible, reducing complexity, labor and energy waste by up to 90 percent Second-order cybernetics (humans in the loop) and precise facts and contextualization complement probabilistic methods https://www.fruitsaladtrees.com/blogs/news/ediblegarden
  • 49. AI’s Wave III: Less wasteful, more explicit smart data management via a knowledge graph foundation 49
  • 50. Thoughts and Reactions? Feel free to ping me anytime with questions, etc. Alan Morrison Data Science Central LinkedIn | Twitter | Quora | Slideshare +1 408 205 5109 a.s.morrison@gmail.com 50
  • 51. NLP versus NLU: Most true understanding is unclaimed territory 51 Unclaimed data market territory FAIR* Actionability Immediacy Divining purpose Divining intent Synthesis Reasoning Abstraction Contextualization Connection Classification Identification Unclaimed market territory Staked claims Present vs Future Data Market Map 12 steps to FAIR data power *Findable, accessible, interoperable, reusable data Reach of current ML efforts
  • 55. Stochastic parrots and hallucination 55
  • 57. Teaching LLMs to query knowledge graphs 57
  • 59. Semantic community LLM use results 59
  • 60. Goal: Develop FAIR data efficiently 60
  • 61. Thoughts and Reactions? Feel free to ping me anytime with questions, etc. Alan Morrison Data Science Central LinkedIn | Twitter | Quora | Slideshare +1 408 205 5109 a.s.morrison@gmail.com 61