Graph Foundations for Advanced Analytics and Collaboration
Alan Morrison
TechTarget ML and AI Summit
September 20, 2022
An argument for knowledge graph foundations for AI
Proposition: Without a scalable, contextualized data or knowledge foundation,
every major advanced analytics project becomes a one-off.
This presentation makes the case for richer, more relevant and timely analytics
made possible with a knowledge graph foundation and data-centric architecture.
The observations that inspire ongoing research about KG/DCA methods often
have to do with taking knowledge representation and management techniques
honed over decades into the realm of data management with the help of
semantics and graph technology.
The technical problem to solve: building next-generation systems that bring the
right data together in the right way at the right time for the right purpose.
2
AI’s data/knowledge problem: Provincial IT legacy infrastructure
3
● Thousands of databases per enterprise (siloing)
● Thousands of applications (code sprawl)
● Data models buried in the app code
● Every app a special snowflake with its own data model
Consider the oil & gas industry as a model:
● Pipeline networks
● Refineries
● Transportation networks
● Thousands of refined petroleum use cases
First step: Build knowledge graphs and link them
5
Linked Open Data Cloud, 2022
Starter triple for a knowledge graph
A standard knowledge graph consists of triplified, relationship-rich
data. The data model, or ontology, is also described in triples and
lives with the rest of the data. Ontologies can also be managed as
data. Linking triples merely requires a verb (or predicate, or
described edge) to link them.
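
As a minimal sketch of the idea above, the snippet below holds a tiny graph as plain (subject, predicate, object) tuples, with the data model stated in the same form as the instance data. The `ex:` identifiers and the helper `objects` are hypothetical illustrations built on the oil and gas example, not part of the original deck; a production system would more likely use an RDF store.

```python
# A knowledge graph sketched as a set of (subject, predicate, object) triples.
# Every "ex:" name below is a made-up illustration.
triples = {
    # Ontology (data model), expressed as triples like everything else
    ("ex:Refinery", "rdfs:subClassOf", "ex:Facility"),
    ("ex:Pipeline", "rdfs:subClassOf", "ex:Facility"),
    # Instance data
    ("ex:Refinery1", "rdf:type", "ex:Refinery"),
    ("ex:PipelineA", "rdf:type", "ex:Pipeline"),
    # The predicate "ex:feeds" is the verb that links two things
    ("ex:PipelineA", "ex:feeds", "ex:Refinery1"),
}


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# The same lookup works for instance data and for the data model.
fed_by_pipeline = objects(triples, "ex:PipelineA", "ex:feeds")
refinery_parents = objects(triples, "ex:Refinery", "rdfs:subClassOf")
```

Because the ontology sits in the same structure as the data, one query function serves both, which is the sense in which the model "lives with the rest of the data."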
Knowledge graphs share data, context, and rules
6
Example Datalog expression by Joachim x775,
from his primer
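
The Datalog expression from the slide is not reproduced in this text, so the following is a separate, hypothetical illustration of how a Datalog-style rule can live alongside graph data. It assumes a made-up `feeds` relation and derives a `reachable` relation by naive forward chaining; the rule text in the docstring is an assumption, not the example from the primer.

```python
def transitive_closure(edges):
    """Evaluate two Datalog-style rules to a fixpoint:

        reachable(X, Y) :- feeds(X, Y).
        reachable(X, Z) :- reachable(X, Y), feeds(Y, Z).
    """
    reachable = set(edges)  # first rule: every direct edge is reachable
    while True:
        # second rule: extend each known fact by one more "feeds" hop
        derived = {
            (x, z)
            for (x, y) in reachable
            for (y2, z) in edges
            if y == y2
        }
        if derived <= reachable:  # nothing new was derived: fixpoint
            return reachable
        reachable |= derived


# Hypothetical facts, loosely following the oil and gas example
feeds = {
    ("WellA", "PipelineA"),
    ("PipelineA", "Refinery1"),
    ("Refinery1", "TerminalX"),
}
reachable = transitive_closure(feeds)
```

The rule is stated once and applies to any facts added later, which is one way of reading "sharing rules" along with data and context.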
Connected logic in the graph
7
Biggest challenge: Mentality of provincial IT is still prevalent
today
● We have the means today to build an intelligent web, a shared AI
resource
● But we have the tribal, siloed mentality of the 2000s:
○ Business units subscribe to their own SaaSes
○ IT departments defend their own turf
○ Only tabular, structured data is catalogued
○ Data, content and knowledge are all managed separately
○ Data is treated as inorganic and static, rather than organic
8
A broader view of intelligence
9
Some definitions of intelligence and analytics
10
According to Webster, the term “intelligence” can refer to:
● the ability to learn or understand or to deal with new or trying situations
● the skilled use of reason
● the ability to apply knowledge to manipulate one's environment or to
think abstractly
● information concerning an enemy or possible enemy or an area
● the act of understanding
The term “analytics”, meanwhile, refers only to:
● The method of logical analysis
Agency approach to enterprise intelligence vs. passivity
11
Intelligence agency approach | Approach of most enterprises
Find the right questions to ask | Ask questions everyone else asks
Collect whatever you may need, however, whenever and wherever you can | Use siloed applications to quantify, tabulate and analyze
Use a holistic lifecycle management approach for all collected data | Focus on what’s easily quantifiable
Embrace an investigators’ culture of disambiguation, synthesis and abstraction | Fix problems in the immediate dataset for the immediate purpose
Multiple, interacting feedback loops: Triangulate across sources and share the disambiguations | Determine sources on a project-by-project basis; single small feedback loops
To succeed, organizations will have to become more
like intelligence agencies–bona fide data-centric
organizations
12
A broader view of AI
13
The three waves of AI
14
ML tribes need to collaborate to make The Third Wave real
15
Data foundation of analytics: Comparability at multiple
levels and contextualization for all relevant points of view
Just some of those levels are listed below:
● Individual things
● Classes of things
● Disparate aggregations of things
● Interactions of things or groups
● Changes of things or groups
● Creations of things or groups
● Degradation or elimination of things or groups
● Lifecycles and other patterns of evolution
● Lifecycle changes and durations
16
This simultaneous articulation, disambiguation, abstraction and coherent
context (in other words, knowledge representation) is what semantics gives you.
Building a true knowledge graph gives semantics (in the ontology or data model)
a place to live and grow symbiotically and organically with instance data.
Semantic graph innovations in data
management
17
An effective data model describes and unifies the contexts necessary for true data
integration. It gives machines enough clues to detect and discover layered context.
“What is data integration?
Let's start with a short list of what data integration is not:
● It's not shoveling data around between systems.
● It's not calling an API.
● It's not creating a data connection to a source system.
It can include one or more of the jobs in the list here above, but what is the
ingredient that cannot be missing?
It's connecting data from different source systems together in a consistent and
coherent data model.”
–Wouter Trappers, CDAO
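
To make the quoted point concrete, here is a minimal sketch in which two hypothetical source systems with different local field names are mapped into one shared model of triples. All record contents, field names, `ex:` predicates and helper functions are assumptions made for illustration.

```python
# Hypothetical rows from two source systems with different local schemas
crm_rows = [{"cust_id": "C1", "full_name": "Acme Corp"}]
billing_rows = [{"customer": "C1", "invoice_no": "INV-9", "amount_usd": 1200}]

# Local field names mapped to predicates in the shared data model
CRM_MAP = {"full_name": "ex:name"}
BILLING_MAP = {"invoice_no": "ex:invoiceNumber", "amount_usd": "ex:amountUSD"}


def crm_to_triples(rows):
    for row in rows:
        subject = f"ex:customer/{row['cust_id']}"
        yield (subject, "rdf:type", "ex:Customer")
        for field, predicate in CRM_MAP.items():
            yield (subject, predicate, row[field])


def billing_to_triples(rows):
    for row in rows:
        invoice = f"ex:invoice/{row['invoice_no']}"
        yield (invoice, "rdf:type", "ex:Invoice")
        # Both systems resolve to the same customer identifier
        yield (invoice, "ex:billedTo", f"ex:customer/{row['customer']}")
        for field, predicate in BILLING_MAP.items():
            yield (invoice, predicate, row[field])


graph = set(crm_to_triples(crm_rows)) | set(billing_to_triples(billing_rows))


def invoices_for(graph, customer_name):
    """Answer a question that needs both sources: invoices by customer name."""
    customers = {s for s, p, o in graph if p == "ex:name" and o == customer_name}
    return {s for s, p, o in graph if p == "ex:billedTo" and o in customers}


acme_invoices = invoices_for(graph, "Acme Corp")
```

Moving the rows is the easy part; the integration happens where both mappings agree on shared identifiers and predicates.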
What’s a data model? What is data integration?
18
Knowledge graphs put any-to-any relationships first
19
Case study examples
20
Scaled out, purpose-specific intelligence platforms
and communities
21
● Blue Brain Nexus: Reverse engineering the brain
● Diffbot: Crawling the whole web for ecommerce intelligence
● Strise.ai: Bringing together 160,000 sources for anti-money laundering and fraud detection
● Montefiore/Einstein: Improving hospital outcomes and efficiencies at the same time with a KG foundation
● AirBnB: Seeking out new opportunities via relationship mining in its knowledge graph
Blue Brain Nexus–graph-based Bioinformatics collaboration
22
Blue Brain Nexus knowledge graph uses
23
Serves to unify most data handling, management and transformation functions
Starting point: Find out what we can about the neocortical microcircuits of
rats, given ten years’ worth of heterogeneous data on these circuits.
Montefiore Health’s Patient-centered Analytical Learning
Machine – (“PALM”) – Personalized medicine at scale
24
AirBnB’s relationship mining (i.e., contextual computing)
25
Diffbot’s automated e-commerce graph
26
Diffbot’s Knowledge As a Service (KaaS) relates and brings a billion ecommerce
facts together in a single automated graph. The goal of this graph is more
trustworthy analytics.
The company offers a labeled training set service for ML.
Diffbot’s “fact” reliability is probability ranked
27
Source: Mike Tung’s demo during his VLDB2020 keynote presentation
Graphs with decentralized storage: End-to-end,
scalable intelligence sharing for supply chains
28
● Graphmetrix: Smart document sharing for large-scale construction projects using SOLID pods
● OriginTrail.io: Decentralized supply chain tracking and tracing using knowledge graphs + blockchains
More trusted supply-chain collaboration with the help of customer-managed data.
Knowledge graphs as contemporary data
integration and sharing platforms
29
Most of the code in knowledge graph-based systems is in
the data model
30
Result: Reusability and agility at scale
31
The more data-mature the company, the more able it is to skate along
pathways to new opportunity.
Reducing observational bias
32
Alleviates the “drunk under the lamppost looking for his
money” problem
Linking together contexts from many heterogeneous data sources makes it
possible to discover connections and answers that weren’t discoverable in
data warehouses, for example.
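
A small sketch of that claim, under invented data: each hypothetical source on its own holds no path between a supplier and a product, but the linked graph does. The source contents and the helper `find_path` are assumptions for illustration only.

```python
from collections import deque


def find_path(edges, start, goal):
    """Breadth-first search over undirected edges; returns a path or None."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Two hypothetical sources that share the node "PartP"
source_a = {("SupplierX", "PartP")}   # e.g. a procurement system
source_b = {("PartP", "ProductQ")}    # e.g. a product catalog

alone = find_path(source_a, "SupplierX", "ProductQ")
linked = find_path(source_a | source_b, "SupplierX", "ProductQ")
```

Here `alone` is `None` while `linked` is a three-node path: the connection only becomes discoverable once the two contexts share an identifier.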
Seven obstacles to adoption of collaborative AI
environments
33
Surprise–Transformation requires transformative methods
Key methods
● Diagnosing the root cause
● Openness to new approaches
● Building a new foundation, step by step
● Focusing on key, but manageable pain points first
● Picking the right teams to lead innovation projects
● Proving the value of the solution you’re building
● Infiltrating the organizational tribes that are at first resistant
● Then long-term commitment by leadership, with a bit of faith
34
Q&A
35
Feel free to ping me anytime with questions, etc.
Alan Morrison
Data Science Central
LinkedIn | Twitter | Quora | Slideshare
+1 408 205 5109
a.s.morrison@gmail.com
