Ever wonder how these concepts contrast with and yet complement each other in a next-generation system?
Enterprise semantics
Knowledge graphs
Model-driven development
Digital twins
Self-Sovereign Identity
Own your own data
Data deduplication
Autonomous agents
Large language systems
Data-Centric Architecture (DCA) combines the major technologies behind each of these concepts. In fact, it’s essential to the real-world implementation of general AI, because it enables the context behind contextual computing, which DARPA calls the third wave of AI. To deliver, DCA needs to simplify and scale data ecosystems using these pieces of the puzzle.
This talk will provide an overview of how these pieces of the data-centric puzzle are fitting together. It’s a best practice to see how these pieces can fit together side by side in an enterprise context and to envision next-gen systems from the viewpoint of some of the most demanding enterprise use cases.
It’s also best practice to study how one industry vertical is moving ahead and contrast that progress with your own industry. Remember, as the data-centric ecosystem emerges and the benefits of true digitization start to pay off, many more techniques can be borrowed from other verticals and used in your own vertical. This talk will summarize several powerful recent case studies and highlight the key takeaways.
DCA Symposium 6 Feb 2023.pdf
1. How semantic systems are coming together
Alan Morrison
Enterprise Data Transformation Symposium
Presented on February 6, 2023
2. Topics covered in this talk:
● Semantics
● Data centricity
● Knowledge graph and data mesh types in use
● Decentralization and semantics
● Digital twins and agents vs APIs
● Reducing duplication and rework
● Large language models and semantics
4. Web semantics harnesses the power of machine-readable knowledge models to create quality data shared at scale
John Sowa, AWS, 2020
Semantics is the science of shared meaning in the form of contextualized data
5. Semantics is the means to FAIR, smart, siloless data sharing
James Kobielus, 2016
Association of European Libraries, 2017
6. Problem: Semantics is the creative, capable stepmother the kids all resent
● The kids miss Legacy Mom and insist on keeping the house the way Mom had it, despite its evident problems.
● Legacy Dad is slowly dying. He plans to will the bulk of the estate to the stepmother, for good reason: she knows how to manage, keep the family together, and fix what’s wrong with the house.
● The stepmother has a great vision for how to fix the house and bring the family together, but the kids won’t hear of it.
8. How to fix the house – one unified, multi-domain model
9. Solution: How shared graph semantics helps
● Boosts meaningful results and relevancy (addressing the lack of data and logic transparency and cohesiveness)
● Contextualizes data for better management and reuse with relationship logic
● Scales meaningful connections between contexts (relevant relationships living with entities)
● Enables Metcalfe’s network of networks effect (network_effect^N)
● Enables model-driven development (code once, reuse anywhere)
● Spans the management gap between structured data and unstructured “content” (content being digitized and thus a subset of data)
● Scales overall data (and most application logic) management capability (organic growth and evolution of the full resource)
● Moves beyond APIs to empower digital twins and agents (self-describing subgraphs and the agents who do the message management)
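A minimal sketch of what "relevant relationships living with entities" can look like in practice, assuming a toy set of subject-predicate-object triples. The prefixes (`acme:`, `crm:`, `erp:`, `shared:`, `geo:`) and every fact below are hypothetical, invented for illustration, and no particular graph database or library is implied.

```python
# Hypothetical triples from three applications that share one vocabulary.
TRIPLES = {
    ("acme:Order42", "rdf:type", "shared:Order"),
    ("acme:Order42", "shared:placedBy", "crm:Customer7"),
    ("crm:Customer7", "rdf:type", "shared:Customer"),
    ("crm:Customer7", "shared:locatedIn", "geo:Portsmouth"),
    ("erp:Invoice9", "shared:bills", "acme:Order42"),
}


def neighbors(entity, triples=TRIPLES):
    """Return (predicate, other_entity, direction) for every triple touching entity."""
    outgoing = [(p, o, "out") for s, p, o in triples if s == entity]
    incoming = [(p, s, "in") for s, p, o in triples if o == entity]
    return sorted(outgoing + incoming)


def context(entity, hops=2, triples=TRIPLES):
    """Collect the entities reachable from entity within `hops` relationships."""
    seen, frontier = {entity}, {entity}
    for _ in range(hops):
        reached = {other for e in frontier for _p, other, _d in neighbors(e, triples)}
        frontier = reached - seen
        seen |= frontier
    return seen - {entity}


if __name__ == "__main__":
    # One hop finds the customer and the invoice; two hops also reach the location.
    print(sorted(context("acme:Order42", hops=1)))
    print(sorted(context("acme:Order42", hops=2)))
```

Because the order, customer and invoice records use the same predicates, one traversal function can assemble context across all three sources without app-specific joins.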
10. Data centricity = more human-machine interaction from a lifecycle perspective
11. Terpsichore: Human-in-the-loop semantic data lifecycle for urban heritage/smart cities
An iterative, bottom-up, user-driven process:
● User engagement
● Collection
● Digestion
● Semantic classification
● Automated suggestion loops
Results:
● Enrichment of useful data collections
● Improved dialogue between user communities
Artopoulos, Giorgos & Smaniotto Costa, Carlos. (2019). Data-Driven Processes in Participatory Urbanism: The “Smartness” of Historical Cities. Architecture and Culture. 7. 1-19. 10.1080/20507828.2019.1631061.
13. Problem: The “modern data stack” perpetuates the application-centric architecture
From the Modern Data Stack to Knowledge Graphs by Bob Muglia, RelationalAI, Knowledge Graph Conference, June 2022
A different database and data model for every app
14. Solution: RelationalAI and Goldman Sachs collaborate on semantic, data-centric, model-driven apps
"The model becomes the program, and so business analysts can become involved, and make changes to the data structures.
"Think about thousands of people getting involved
who know about the business — think about that!"
– Bob Muglia, RelationalAI, Knowledge Graph Conference, June 2022
Legend apps are available via the GS app store.
16. The Five Commingled Phases of Compute, Networking and Storage
● 1st: Mainframe and Green Screens (centralized storage and compute, with minimal networking)
● 2nd: Client-Server and Desktops (application distribution via proprietary and IP networking)
● 3rd: Early Web (on Client-Server) (simple web hosting + legacy Client-Server storage)
● 4th: Distributed Cloud and Mobile Devices (commodity servers + storage + some virtualization)
● 5th: “Decoupled” and “Decentralized” Cloud (compute and storage more loosely coupled, virtualized, controlled and data-centric)
Chart labels: Time; Less centralized / More centralized; Application Centric / Data Centric
All phases are still active and evolving
17. Data centralization versus decentralization
File:Decentralization.jpg, by Adam Aladdin, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=35018016
Ethereum’s contribution: Each peer node can play a role in confirming blocks of transactions.
This method also enables tamperproof smart contracts, or legal agreements expressed in self-executing code.
P2P data networks such as IPFS + blockchains = decentralized infrastructure that enables dApps
Has a host, but one that’s less of a bottleneck
18. Evolution of open source decentralized file sharing and file systems
Erik Daniel and Florian Tschorsch, “IPFS and Friends: A Qualitative Comparison of Next-Generation Peer-to-Peer Networks,” 2021, https://arxiv.org/abs/2102.12737.
19. Shared transactions require tamperproof ledgers
Blockchains are shared tamperproof ledgers of concise, deterministic transaction messages.
The graph provides the iterative collaboration and refined data and logic sharing loop.
Without the data quality of a knowledge graph, blockchains are garbage in/garbage out.
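To make "tamperproof ledger" concrete, here is a minimal single-machine sketch of a hash chain: each entry stores the hash of the previous entry, so editing any earlier message breaks every later link. The function names and the shipment messages are hypothetical, and the sketch deliberately omits what makes a real blockchain shared (peers, consensus, signatures). It also shows why data quality still matters: the chain only proves a message was not altered, not that it was correct to begin with.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def block_hash(index, message, prev_hash):
    """Hash the entry's contents together with the previous entry's hash."""
    payload = json.dumps(
        {"index": index, "message": message, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger, message):
    """Add a message to the ledger, linking it to the current last entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    index = len(ledger)
    ledger.append(
        {
            "index": index,
            "message": message,
            "prev": prev_hash,
            "hash": block_hash(index, message, prev_hash),
        }
    )


def is_valid(ledger):
    """Recompute every hash and check that each entry points at its predecessor."""
    prev_hash = GENESIS
    for block in ledger:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["message"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append(ledger, "shipment 1 received")
    append(ledger, "shipment 1 inspected")
    print(is_valid(ledger))  # the untouched chain verifies
    ledger[0]["message"] = "shipment 1 rejected"
    print(is_valid(ledger))  # the edit is detected
```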
21. Data ownership and control is becoming a major bone of contention
“Every time you drive (a post-2017 Tesla), it records the whole track of where you drive, the GPS coordinates and certain other metrics for every mile driven.
“They say that they are anonymizing the trigger results, but you could probably match everything to a single person if you wanted to.”
–Anonymous reverse engineer of Tesla data, as quoted by Mark Harris in IEEE Spectrum, Aug 2022
22. Self-sovereign identity = personal or B2B data ownership/control
Markus Sabadello, “Decentralized Identifiers (DIDs),” W3C Workshop on Privacy and Linked Data, Vienna, 2018
Amazon controls the user agreements, data and how it’s stored
User controls PII and grants permission and access; PII stays in place
PII = Personally Identifiable Information
23. Content addressing = rich, end-to-end encrypted identities for represented entities
https://commons.wikimedia.org/wiki/File:Identity-concept.svg
Representation, linking and encryption are all automated and built into P2P data networks.
You choose whether or not to share your content addressed graph with others, and if so, how.
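The core idea of content addressing can be sketched in a few lines: the address of a piece of data is a hash of the data itself, so identical content always gets the same address and any retrieved copy can be verified. The `ContentStore` class below is hypothetical and simplified. It uses a bare SHA-256 hex digest as the address, whereas real P2P networks such as IPFS wrap hashes in their own identifier formats and split large files into chunks, and it does not show the encryption step the slide mentions.

```python
import hashlib


class ContentStore:
    """Toy in-memory store where the key is the SHA-256 digest of the value."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store data and return its content address."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data  # identical content lands on the same key
        return address

    def get(self, address: str) -> bytes:
        """Fetch data and verify that it still matches its address."""
        data = self._blobs[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = ContentStore()
    first = store.put(b"grain-to-glass batch record")
    second = store.put(b"grain-to-glass batch record")
    print(first == second, len(store))  # same address, stored once
```

Deduplication falls out of the addressing scheme: storing the same bytes twice yields one entry, which is one reason content addressing pairs well with the goal of reducing duplicated data.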
25. Example dCloud services base infrastructure today: IPFS
“In IPFS, content* is delivered from the closest peers that possess a copy of the content removing the single-node pressure and improving the user experience.”
–zK Capital Research, “IPFS: The Interplanetary File system,” 2018
*Content infrastructure and management = data infrastructure and management.
IPFS = InterPlanetary File System
P2P
26. The InterPlanetary File System versus HTTP
Rachael Zisk, “Lockheed and Filecoin Foundation Partner to Deploy IPFS,” Payload, May 2022
29. OriginTrail + BSI’s supply chain tracking and tracing
OriginTrail and the British Standards Institution (BSI), https://twitter.com/origin_trail/status/1339606640887152642?s=20, Dec. 2020
The Monasteriven whiskey produced in Ireland is tracked and traced from “grain to glass” with the OriginTrail.io approach.
OT uses a decentralized knowledge graph that connects to one of several different blockchains.
This method enables shared data reuse and other synergies across the supply chain.
30. SOLID: Federated storage and decentralized apps
Ruben Verborgh, “Decentralizing personal data management with Solid: a hands-on workshop,” SEMIC Workshop, October 2020
31. SOLID shared, federated XaaS: Construction industry
“TrinPod™: World's first conceptually indexed space-time digital twin using Solid,” Graphmetrix, 2022, https://graphmetrix.com/trinpod
Company-specific SOLID storage pods and access control can be managed by each supply chain partner.
Graphmetrix as digital twin provider manages the system and system-level apps.
32. Digital twins and agents: Better data sharing than APIs?
Autonomous agents
Digital twins
Locale: Portsmouth, UK
Sensor nets
Iotics, 2019 and 2023
34. JP Morgan Chase creates a different lake for each product domain
Raj Grover of Transform Partner and AWS, 2023
Claim is that the data mesh is the means of secure, FAIR data
36. Example of ChatGPT being led astray by a clever user
Mike Igartúa (u/mikeigartua) on Reddit
37. In December, tech Q&A site Stack Overflow banned ChatGPT
“Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking or looking for correct answers,”
– “Temporary policy: ChatGPT is banned,” Stack Overflow, December 2022
38. A data management guru’s assessment of ChatGPT
“Don’t get me wrong, the technology is great in theory and I can see many wonderful use cases for it. But if we are not VERY VERY careful we will end up with the ens***tening of knowledge.”
– Daragh O’Brien, Managing Director of Castlebridge and Irish Computer Society Fellow
39. Reaction to OpenAI’s success with ChatGPT
Google has invested about $300mn in artificial intelligence start-up Anthropic, making it the latest tech giant to throw its money and computing power behind a new generation of companies trying to claim a place in the booming field of “generative AI”.
– Financial Times, 3 Feb 2023
https://www.ft.com/content/583ead66-467c-4bd5-84d0-ed5df7b5bf9c
40. Today, humans in the ChatGPT quality loop are labelers
Renu Khandelwal, “A Basic Understanding of the ChatGPT Model,” December 2022
https://arshren.medium.com/a-basic-understanding-of-the-chatgpt-model-92aba741eea1
41. Semantic properties in biochemistry are at the atomic layer?
Large language models (LLMs) are helping biochemists discover new protein sequences.
Syntax helps identify chemically valid molecules at a high level.
But semantics describes emergent properties, i.e., what atoms are present and how they are connected to each other.
At left, three molecules with the same formula but different semantic properties:
● Resorcinol, an antiseptic and disinfectant
● Hydroquinone, a skin lightening agent
● Catechol, a toxic molecule
Francesca Grisoni, Chemical language models for de novo drug design: Challenges and opportunities, Current Opinion in Structural Biology, Volume 79, 2023, 102527, ISSN 0959-440X, https://doi.org/10.1016/j.sbi.2023.102527. (https://www.sciencedirect.com/science/article/pii/S0959440X23000015)
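The point about identical formulas can be checked with a small sketch. All three molecules are benzene rings carrying two hydroxyl (OH) groups: catechol at adjacent positions (1,2), resorcinol one carbon apart (1,3), and hydroquinone on opposite sides (1,4). The ring-position encoding and the function names below are assumptions made for this illustration, not a cheminformatics standard. A flat formula cannot tell the molecules apart, while the connectivity can.

```python
from collections import Counter

RING_SIZE = 6  # carbons in the benzene ring

# Ring positions (0-5) that carry an OH group; encoding is illustrative only.
MOLECULES = {
    "catechol": (0, 1),      # 1,2-dihydroxybenzene
    "resorcinol": (0, 2),    # 1,3-dihydroxybenzene
    "hydroquinone": (0, 3),  # 1,4-dihydroxybenzene
}


def formula(hydroxyl_positions):
    """Count atoms: each ring carbon bears either an H or an OH group."""
    atoms = Counter()
    for position in range(RING_SIZE):
        atoms["C"] += 1
        atoms["H"] += 1  # the ring hydrogen, or the hydrogen of the OH group
        if position in hydroxyl_positions:
            atoms["O"] += 1
    return "".join(f"{element}{atoms[element]}" for element in ("C", "H", "O"))


def ring_separation(hydroxyl_positions):
    """Shortest distance around the ring between the two OH-bearing carbons."""
    first, second = hydroxyl_positions
    distance = abs(first - second) % RING_SIZE
    return min(distance, RING_SIZE - distance)


if __name__ == "__main__":
    for name, positions in MOLECULES.items():
        print(name, formula(positions), ring_separation(positions))
```

Every entry yields the formula C6H6O2, yet the ring separations are 1, 2 and 3, which is the structural difference behind the very different properties listed on the slide.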
43. Humans-in-the-loop = second-order cybernetics: Involving users and SMEs to create context with the help of machines
First order (Engineer outside box)
Second order (Users and domain experts inside box)
Stewart Brand, et al., Co-Evolution Quarterly, 1976
44. Seven obstacles to adoption of decentralized, interorganizational environments
45. Q&A
Feel free to ping me anytime with questions, etc.
Alan Morrison
Data Science Central
LinkedIn | Twitter | Quora | Slideshare
+1 408 205 5109
a.s.morrison@gmail.com