Presentation on Knowledge Graph Foundations and how they're used.
Presented at TechTarget ML and AI Summit
September 20, 2022
View the full video recording of this deck at https://www.brighttalk.com/webcast/9059/556690
2. An argument for knowledge graph foundations for AI
Proposition: Without a scalable, contextualized data or knowledge foundation,
every major advanced analytics project becomes a one-off.
This presentation makes the case for richer, more relevant and timely analytics
made possible with a knowledge graph foundation and data-centric architecture.
The observations that inspire ongoing research about KG/DCA methods often
have to do with taking knowledge representation and management techniques
honed over decades into the realm of data management with the help of
semantics and graph technology.
The technical problem to solve is how to build next-generation systems that
bring the right data together in the right way at the right time for the right
purpose.
3. AI’s data/knowledge problem: Provincial IT legacy infrastructure
● Thousands of databases per enterprise (siloing)
● Thousands of applications (code sprawl)
● Data models buried in the app code
● Every app a special snowflake with its own data model
4. Consider the oil & gas industry as a model:
● Pipeline networks
● Refineries
● Transportation networks
● Thousands of refined petroleum use cases
5. First step: Build knowledge graphs and link them
Linked Open Data Cloud, 2022
Starter triple for a knowledge graph
A standard knowledge graph consists of triplified, relationship-rich
data. The data model, or ontology, is also described in triples and
lives with the rest of the data. Ontologies can also be managed as
data. Linking triples merely requires a verb (also called a predicate or
described edge) between them.
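As a minimal sketch of the description above, the snippet below stores instance data and ontology statements as plain (subject, predicate, object) tuples in one set, then links two nodes by adding a single triple. All identifiers (`ex:Refinery42`, `ex:feeds`, the `match` helper) are hypothetical and chosen only for illustration; a real system would typically use an RDF library and a triple store rather than Python sets.

```python
# Each triple is a (subject, predicate, object) tuple.
# All names below are hypothetical, for illustration only.

# Instance data: facts about individual things.
instance_triples = {
    ("ex:Refinery42", "rdf:type", "ex:Refinery"),
    ("ex:PipelineA", "rdf:type", "ex:Pipeline"),
}

# Ontology (data model): also triples, stored with the rest of the data.
ontology_triples = {
    ("ex:Refinery", "rdfs:subClassOf", "ex:Facility"),
    ("ex:Pipeline", "rdfs:subClassOf", "ex:Facility"),
    ("ex:feeds", "rdfs:domain", "ex:Pipeline"),
}

# One graph holds both the model and the instances.
graph = instance_triples | ontology_triples

# Linking two nodes only requires one more triple with a predicate.
graph.add(("ex:PipelineA", "ex:feeds", "ex:Refinery42"))


def match(triples, s=None, p=None, o=None):
    """Return sorted triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # The same query function works on model and instance statements alike.
    print(match(graph, p="ex:feeds"))
    print(match(graph, p="rdfs:subClassOf"))
```

Because the ontology is just more triples, it can be queried, versioned and extended with the same tooling as the instance data.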
6. Knowledge graphs share data, context and rules
Example Datalog expression by Joachim x775,
from his primer
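The expression shown on the slide is not reproduced in this transcript. As a stand-in, and not the primer's own example, here is a rough sketch of how a Datalog-style recursive rule can be evaluated by naive forward chaining. The rule, the `feeds` facts and the function name are all assumptions made for illustration:

```python
# Hypothetical Datalog-style rules, written as comments:
#   connected(X, Y) :- feeds(X, Y).
#   connected(X, Z) :- feeds(X, Y), connected(Y, Z).

feeds = {
    ("well", "pipeline"),
    ("pipeline", "refinery"),
    ("refinery", "terminal"),
}


def derive_connected(edges):
    """Naive forward chaining: apply the rules until no new facts appear."""
    derived = set(edges)  # first rule: every feeds fact is a connected fact
    while True:
        # Second rule: join feeds(X, Y) with connected(Y, Z).
        new_facts = {
            (x, z)
            for (x, y) in edges
            for (y2, z) in derived
            if y == y2
        } - derived
        if not new_facts:
            return derived
        derived |= new_facts


connected = derive_connected(feeds)

if __name__ == "__main__":
    for fact in sorted(connected):
        print(fact)
```

The point of the sketch is that a rule stored alongside the data lets every consumer of the graph derive the same implied facts, instead of re-implementing the logic in each application.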
8. Biggest challenge: Mentality of provincial IT is still prevalent today
● We have the means today to build an intelligent web, a shared AI resource
● But we have the tribal, siloed mentality of the 2000s:
○ Business units subscribe to their own SaaSes
○ IT departments defend their own turf
○ Only tabular, structured data is catalogued
○ Data, content and knowledge are all managed separately
○ Data is treated as inorganic and static, rather than organically
10. Some definitions of intelligence and analytics
According to Webster, the term “intelligence” can refer to:
● the ability to learn or understand or to deal with new or trying situations
● the skilled use of reason
● the ability to apply knowledge to manipulate one's environment or to
think abstractly
● information concerning an enemy or possible enemy or an area
● the act of understanding
The term “analytics”, meanwhile, refers only to:
● The method of logical analysis
11. Agency approach to enterprise intelligence vs. passivity
Intelligence agency approach | Approach of most enterprises
Find the right questions to ask | Ask questions everyone else asks
Collect whatever you may need, however, whenever and wherever you can | Use siloed applications to quantify, tabulate and analyze
Use a holistic lifecycle management approach for all collected data | Focus on what’s easily quantifiable
Embrace an investigators’ culture of disambiguation, synthesis and abstraction | Fix problems in the immediate dataset for the immediate purpose
Multiple, interacting feedback loops: Triangulate across sources and share the disambiguations | Determine sources on a project-by-project basis; single small feedback loops
12. To succeed, organizations will have to become more like intelligence agencies: bona fide data-centric organizations
15. ML tribes need to collaborate to make The Third Wave real
16. Data foundation of analytics: Comparability at multiple levels and contextualization for all relevant points of view
Just some of those levels are listed below:
● Individual things
● Classes of things
● Disparate aggregations of things
● Interactions of things or groups
● Changes of things or groups
● Creations of things or groups
● Degradation or elimination of things or groups
● Lifecycles and other patterns of evolution
● Lifecycle changes and durations
This simultaneous articulation, disambiguation, abstraction and coherent
context (in other words, knowledge representation) is what semantics gives you.
Building a true knowledge graph gives semantics (in the ontology or data model)
a place to live and grow symbiotically and organically with instance data.
18. An effective data model describes and unifies the contexts necessary for true data
integration. It gives machines enough clues to detect and discover layered context.
“What is data integration?
Let's start with a short list of what data integration is not:
● It's not shoveling data around between systems.
● It's not calling an API.
● It's not creating a data connection to a source system.
It can include one or more of the jobs in the list here above, but what is the
ingredient that cannot be missing?
It's connecting data from different source systems together in a consistent and
coherent data model.”
–Wouter Trappers, CDAO
What’s a data model? What is data integration?
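To make the quoted definition concrete, the sketch below maps records from two differently shaped sources into one shared model and merges them by a normalized key. It is an assumption-laden illustration: the `Customer` class, the field names (`CustID`, `account_ref` and so on) and the sample rows are all hypothetical, and real integration work involves far richer models and matching logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Shared data model that both (hypothetical) sources map into."""
    customer_id: str
    name: str
    country: str


def from_crm(row):
    """Map a row from a hypothetical CRM export into the shared model."""
    return Customer(
        customer_id=row["CustID"].strip().upper(),
        name=row["FullName"].strip(),
        country=row["Country"].strip().upper(),
    )


def from_billing(row):
    """Map a row from a hypothetical billing system into the shared model."""
    return Customer(
        customer_id=row["account_ref"].strip().upper(),
        name=row["company"].strip(),
        country=row["iso_country"].strip().upper(),
    )


crm_rows = [{"CustID": " c-001", "FullName": "Acme Ltd", "Country": "gb"}]
billing_rows = [
    {"account_ref": "C-001", "company": "Acme Ltd", "iso_country": "GB"},
    {"account_ref": "C-002", "company": "Globex Inc", "iso_country": "US"},
]

# Integration step: both sources land in one consistent model, keyed by id.
integrated = {c.customer_id: c for c in map(from_crm, crm_rows)}
for customer in map(from_billing, billing_rows):
    integrated.setdefault(customer.customer_id, customer)

if __name__ == "__main__":
    for customer in integrated.values():
        print(customer)
```

Moving the rows or calling the source APIs is the easy part; the mapping functions and the shared `Customer` definition are where the integration actually happens.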
21. Scaled out, purpose-specific intelligence platforms and communities
● Blue Brain Nexus: Reverse engineering the brain
● Diffbot: Crawling the whole web for ecommerce intelligence
● Strise.ai: Bringing together 160,000 sources for anti-money laundering and fraud detection
● Montefiore/Einstein: Improving hospital outcomes and efficiencies at the same time with a KG foundation
● AirBnB: Seeking out new opportunities via relationship mining in its knowledge graph
23. Blue Brain Nexus knowledge graph uses
Serves to unify most data handling, management and transformation functions.
Starting point: Find out what we can about the neocortical microcircuits of
rats, given ten years’ worth of heterogeneous data on these circuits.
26. Diffbot’s automated e-commerce graph
Diffbot’s Knowledge as a Service (KaaS) relates and brings a billion ecommerce
facts together in a single automated graph. The goal of this graph is more
trustworthy analytics.
The company offers a labeled training set service for ML.
27. Diffbot’s “fact” reliability is probability ranked
Source: Mike Tung’s demo during his VLDB2020 keynote presentation
28. Graphs with decentralized storage: End-to-end, scalable intelligence sharing for supply chains
● Graphmetrix: Smart document sharing for large-scale construction projects using SOLID pods
● OriginTrail.io: Decentralized supply chain tracking and tracing using knowledge graphs + blockchains
More trusted supply-chain collaboration with the help of customer-managed data.
30. Most of the code in knowledge graph-based systems is in the data model
31. Result: Reusability and agility at scale
The more data-mature the company, the more able it is to skate along pathways
to new opportunity.
32. Reducing observational bias
Alleviates the “drunk under the lamppost looking for his money” problem.
Linking together contexts from many heterogeneous data sources makes it
possible to discover connections and answers that weren’t discoverable in data
warehouses, for example.
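One way to picture this, as a rough sketch only: three small sets of triples from hypothetical sources each look unremarkable in isolation, but once merged into one graph a breadth-first search finds a chain of relationships that no single source contains. The source names, entities, predicates and the `find_path` helper are all invented for illustration, and the search follows edges in their stated direction only.

```python
from collections import deque

# Three hypothetical sources, each holding (subject, predicate, object) triples.
hr_triples = {("alice", "worksFor", "acme")}
procurement_triples = {("acme", "suppliedBy", "globex")}
news_triples = {("globex", "investigatedBy", "regulator_x")}


def find_path(triples, start, goal):
    """Breadth-first search over directed triples.

    Returns the list of triples linking start to goal, or None if no
    chain of relationships exists.
    """
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((p, o))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for p, o in sorted(adjacency.get(node, [])):
            if o not in seen:
                seen.add(o)
                queue.append((o, path + [(node, p, o)]))
    return None


# No single source connects alice to regulator_x ...
isolated = find_path(hr_triples, "alice", "regulator_x")

# ... but the linked graph does.
linked_graph = hr_triples | procurement_triples | news_triples
linked = find_path(linked_graph, "alice", "regulator_x")

if __name__ == "__main__":
    print(isolated)
    print(linked)
```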
34. Surprise–Transformation requires transformative methods
Key methods
● Diagnosing the root cause
● Openness to new approaches
● Building a new foundation, step by step
● Focusing on key, but manageable pain points first
● Picking the right teams to lead innovation projects
● Proving the value of the solution you’re building
● Infiltrating the organizational tribes that are at first resistant
● Then long-term commitment by leadership, with a bit of faith
35. Q&A
Feel free to ping me anytime with questions, etc.
Alan Morrison
Data Science Central
+1 408 205 5109
a.s.morrison@gmail.com