making sense of text and data
October, 2019
Connected Data London
Semantic Similarity for Faster
Knowledge Graph Delivery at Scale
Why Knowledge Graphs?
“Cross-industry studies show that on average, less than half of an
organization’s structured data is actively used in making decisions—and
less than 1% of its unstructured data is analyzed or used at all”
Source: “What’s Your Data Strategy?”, Leandro DalleMule and Thomas H. Davenport, Harvard Business Review
Top 5 USA Banks
Presentation Outline
Enterprise Knowledge Graphs
Smart Graphs with Embeddings
Implementing Knowledge Graphs
What is a Knowledge Graph?
Graph, Semantics, Smart, Alive
Multiple Enterprise Data Management Systems
KG platforms combine capabilities of several enterprise systems:
o Master and reference data management
o Corporate/Enterprise Taxonomy
o Data warehouse
o Metadata management
o Digital asset management
o Enterprise search
Challenges in Enterprise Semantic Integration
IMDB
Type              Titles
TV Episodes       4’044’529
Short film          681’067
Feature film        516’726
Video               164’061
TV series           164’061
TV movies           126’206
…                         …
Total *           5’838’514

WikiData
Type                Titles
film               235’707
silent short film   16’377
television film     15’345
short film          11’225
animated film        3’785
…                        …
Total              289’650

* Later tests use only the 5K crawled dataset
Challenges in Enterprise Semantic Integration
Multiple levels of inconsistencies:
o Types: “film” vs. “TV movie”
o Metadata: “science fiction”, “military science fiction” vs. “Sci-Fi”
o Reference data: “US” vs. “United States”
o Cross-links (!) are manually curated and exist for testing purposes only
A Classical Approach
o Start with string matching of the titles:
“Harry Potter and the Deathly Hallows: Part II” vs. “Harry Potter and the Deathly Hallows – Part 2”
“Perfume: The Story of a Murderer” vs. “Perfume”
“Pirate Radio” vs. “The Boat That Rocked”
“Avatar” vs. “Avatar” (4 movies)
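A minimal sketch of the classical title-matching baseline, assuming simple normalization (lowercasing, stripping accents and punctuation, and a small hypothetical roman-numeral table) followed by the ratio from Python's `difflib.SequenceMatcher`. It is an illustration, not the pipeline used in the talk, and it shows why the pairs above are hard: the two Harry Potter titles normalize to the same string, “Pirate Radio” and “The Boat That Rocked” share almost nothing, and the four “Avatar” movies cannot be told apart by title alone.

```python
import difflib
import re
import unicodedata

# Hypothetical, deliberately small table: only multi-letter numerals, so that
# ordinary words such as "I" or "V" in titles are left alone.
ROMAN = {"ii": "2", "iii": "3", "iv": "4"}


def normalize_title(title: str) -> str:
    """Lowercase, drop accents and punctuation, map roman numerals to digits."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(ROMAN.get(tok, tok) for tok in text.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return difflib.SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


PAIRS = [
    ("Harry Potter and the Deathly Hallows: Part II",
     "Harry Potter and the Deathly Hallows – Part 2"),
    ("Perfume: The Story of a Murderer", "Perfume"),
    ("Pirate Radio", "The Boat That Rocked"),
    ("Avatar", "Avatar"),
]

if __name__ == "__main__":
    for left, right in PAIRS:
        print(f"{title_similarity(left, right):.2f}  {left!r} vs. {right!r}")
```

A perfect score for “Avatar” vs. “Avatar” is exactly the problem: string equality alone cannot disambiguate the four films.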
A Classical Approach with Extra Rules
o Add release date matching
Loses 10% of the matches due to bad dates
o Ambiguity is greatly reduced, but many ambiguous matches remain:
tt0238520, 16 October 1995, 50 min
tt1125875, 11 April 1995, 48 min
tt0238520, 23 June 1995, 1h 21 min
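The extra rule can be sketched as a filter applied after title matching. This is a hedged illustration: the `Record` type, the shared placeholder title `"same title"` (the slide does not give the titles), the query item, and the year-level tolerance are all assumptions; only the three IMDB ids, dates and running times come from the slide. Year-level matching keeps all three candidates, an exact date narrows them to one, and a missing date drops the pair entirely, mirroring the 10% loss noted on the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    ident: str
    title: str                 # assumed to be normalized already
    released: Optional[date]   # None models a missing or unparseable date
    minutes: Optional[int] = None


def candidates(query: Record, pool: list[Record], exact_date: bool = False) -> list[str]:
    """Ids of pool records whose title equals the query title and whose
    release date agrees (same year, or same day if exact_date is True)."""
    matches = []
    for rec in pool:
        if rec.title != query.title:
            continue
        if query.released is None or rec.released is None:
            continue  # a bad date silently drops the pair
        if exact_date:
            agrees = rec.released == query.released
        else:
            agrees = rec.released.year == query.released.year
        if agrees:
            matches.append(rec.ident)
    return matches


# IMDB ids, dates and running times from the slide; the title is a placeholder.
POOL = [
    Record("tt0238520", "same title", date(1995, 10, 16), 50),
    Record("tt1125875", "same title", date(1995, 4, 11), 48),
    Record("tt0238520", "same title", date(1995, 6, 23), 81),
]
QUERY = Record("hypothetical-wikidata-item", "same title", date(1995, 6, 23))

if __name__ == "__main__":
    print(candidates(QUERY, POOL))                   # year only
    print(candidates(QUERY, POOL, exact_date=True))  # exact date
```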
Presentation Outline
Enterprise Knowledge Graphs
Smart Graphs with Embeddings
Implementing Knowledge Graphs
What is Knowledge Graph Embedding?
o Predicts similar graph nodes or properties
o Requires no input training data
o Represents graph nodes mathematically as vectors:
[Figure: The Godfather (2h 58m) vs. American Pie (1h 15 min) as vectors along the axes duration, drama and comedy]
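To make “graph nodes as vectors” concrete, here is a hand-built sketch on the figure's three axes. The durations come from the slide; the drama/comedy scores and the third film are hypothetical, and a real embedding would derive its dimensions from the graph rather than from hand-picked features. Cosine similarity is used for comparison, a common choice in vector space models.

```python
import math

# (duration in hours, drama score, comedy score); the scores are hypothetical.
FILMS = {
    "The Godfather": (178 / 60, 1.0, 0.0),       # 2h 58m, as on the slide
    "American Pie": (75 / 60, 0.0, 1.0),         # 1h 15 min, as on the slide
    "Hypothetical drama": (150 / 60, 1.0, 0.0),  # invented for comparison
}


def cosine(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    ref = FILMS["The Godfather"]
    for name, vec in FILMS.items():
        print(f"{cosine(ref, vec):.3f}  {name}")
```

Note that the unscaled duration axis dominates: The Godfather and American Pie still score about 0.74, so features would normally be normalized before comparison.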
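Knowledge Graph Embedding Example
o For each film, include all actors, the director and the country of origin
o The result is a vast matrix of entities and literals
[Figure: term-document matrix. Rows (documents) are movies such as wd:Q550232 and imdb:tt0344854; columns (terms) are property values such as [Actor] “Adam LeFevre”, [Actor] “Anthony Anderson”, [Actor] “Mia Farrow”, [Country] “France”, [Country] “US”, [Country] “United states”, [Director] “Luc Besson”, …; a 1 marks that the movie has that value (five 1s in the wd:Q550232 row, four in the imdb:tt0344854 row)]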
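A minimal sketch of building the binary term-document matrix from <s, p, o> triples, where each distinct “predicate=value” pair becomes a term column. The example triples are illustrative: the two movie ids sharing the actor “Adam LeFevre” appear in the deck, while the two `hypothetical:` movies and their countries are invented to show that “US” and “United States” land in separate columns until reference data is aligned.

```python
def term_document_matrix(triples):
    """Binary matrix: one row per subject (document), one column per
    'predicate=object' term; a cell is 1 if the subject has that value."""
    docs = sorted({s for s, _, _ in triples})
    terms = sorted({f"{p}={o}" for _, p, o in triples})
    row = {d: i for i, d in enumerate(docs)}
    col = {t: j for j, t in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in docs]
    for s, p, o in triples:
        matrix[row[s]][col[f"{p}={o}"]] = 1
    return docs, terms, matrix


TRIPLES = [
    ("wd:Q550232", "actor", "Adam LeFevre"),
    ("imdb:tt0344854", "actor", "Adam LeFevre"),
    ("hypothetical:movieA", "country", "United States"),  # hypothetical
    ("hypothetical:movieB", "country", "US"),             # hypothetical
]

if __name__ == "__main__":
    docs, terms, matrix = term_document_matrix(TRIPLES)
    print(terms)
    for doc, cells in zip(docs, matrix):
        print(doc, cells)
```

With millions of movies and property values this becomes the “vast matrix” of the slide, which is what Random Indexing compresses.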
Random Indexing (RI) Algorithm
o Reduces the matrix dimension with elemental vectors:
For each term w, calculate a context vector S(w) by summing the elemental (index) vectors of all items x appearing in the context of w, i.e. S(w) = Σ e(x) over every x in the context of w
o Lightweight and fast (a 250K × 1.45M matrix in under 5 min)
o Sub-second searches with limited RAM requirements
Elemental vectors (figure):
Movie \ Actors    Adam LeFevre   Anthony Anderson   Mia Farrow   …
wd:Q550232        1              1                  1
imdb:tt0344854    1              0                  1
…                 …              …                  …
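A minimal, stdlib-only sketch of Random Indexing, assuming movies are the contexts: each movie gets a sparse random elemental vector, each term's context vector S(w) is the sum of the elemental vectors of the movies it appears in, and each movie vector is the sum of its terms' vectors. The dimensionality (300) and the number of non-zero entries (10) are assumptions; the actor sets follow the slide's matrix, and `hypothetical:movie3` with placeholder actors is invented. This illustrates the technique, not the GraphDB plugin's implementation.

```python
import math
import random
from collections import defaultdict

DIM = 300      # assumed reduced dimensionality
NONZERO = 10   # assumed number of +1/-1 entries per elemental vector


def elemental_vector(rng: random.Random) -> list[float]:
    """Sparse ternary random vector: NONZERO random +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * DIM
    for idx in rng.sample(range(DIM), NONZERO):
        vec[idx] = rng.choice((1.0, -1.0))
    return vec


def add_into(target: list[float], vec: list[float]) -> None:
    """In-place element-wise addition."""
    for i, x in enumerate(vec):
        target[i] += x


def random_indexing(doc_terms: dict[str, set[str]], seed: int = 0):
    """Return (term_vectors, doc_vectors) in the same DIM-dimensional space."""
    rng = random.Random(seed)
    elemental = {doc: elemental_vector(rng) for doc in sorted(doc_terms)}
    # S(w): sum of the elemental vectors of every context (movie) containing w.
    term_vecs: dict[str, list[float]] = defaultdict(lambda: [0.0] * DIM)
    for doc, terms in doc_terms.items():
        for term in terms:
            add_into(term_vecs[term], elemental[doc])
    # A movie vector is the sum of the context vectors of its terms.
    doc_vecs = {}
    for doc, terms in doc_terms.items():
        vec = [0.0] * DIM
        for term in terms:
            add_into(vec, term_vecs[term])
        doc_vecs[doc] = vec
    return dict(term_vecs), doc_vecs


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Actor sets follow the slide's matrix (1 1 1 and 1 0 1); the last movie is hypothetical.
MOVIES = {
    "wd:Q550232": {"Adam LeFevre", "Anthony Anderson", "Mia Farrow"},
    "imdb:tt0344854": {"Adam LeFevre", "Mia Farrow"},
    "hypothetical:movie3": {"Actor X", "Actor Y"},
}

TERM_VECS, DOC_VECS = random_indexing(MOVIES)

if __name__ == "__main__":
    ref = DOC_VECS["wd:Q550232"]
    for doc, vec in DOC_VECS.items():
        print(f"{cosine(ref, vec):.3f}  {doc}")
```

The reduced vectors have 300 dimensions regardless of how many terms exist, which is where the speed and small memory footprint come from.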
Random Indexing (RI) Algorithm #2
o Supports similarity searches for:
Document to Document – similar movies
Document to Term – actors/directors specific to a movie
Term to Term – similar actors/directors
Term to Document – movies specific to this actor/director
o Has all the properties of a vector space model
o Partial matching, weighting, ranking and context-sensitive semantic search
[Figure: the same movie/actor matrix with elemental vectors as on the previous slide]
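The four search modes can be sketched as nearest-neighbour lookups by cosine similarity, since term and document vectors live in the same reduced space. The compact index builder below rests on the same assumptions as a plain Random Indexing sketch (movies as contexts, 300 dimensions, 10 non-zero entries); the data mirror the slide's matrix plus one hypothetical movie, and `nearest` is an illustrative helper, not a plugin API.

```python
import math
import random

DIM, NONZERO = 300, 10  # assumed parameters


def build_index(doc_terms: dict[str, set[str]], seed: int = 0):
    """Random Indexing: term vector = sum of the elemental vectors of its movies;
    movie vector = sum of its term vectors."""
    rng = random.Random(seed)
    elemental = {}
    for doc in sorted(doc_terms):
        vec = [0.0] * DIM
        for idx in rng.sample(range(DIM), NONZERO):
            vec[idx] = rng.choice((1.0, -1.0))
        elemental[doc] = vec
    terms: dict[str, list[float]] = {}
    for doc in sorted(doc_terms):
        for term in doc_terms[doc]:
            acc = terms.setdefault(term, [0.0] * DIM)
            for i, x in enumerate(elemental[doc]):
                acc[i] += x
    docs = {d: [sum(terms[t][i] for t in ts) for i in range(DIM)]
            for d, ts in doc_terms.items()}
    return terms, docs


def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, space, k=3, exclude=None):
    """Keys of the k vectors in `space` most similar to `query`."""
    scored = [(cosine(query, vec), key) for key, vec in space.items() if key != exclude]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


MOVIES = {
    "wd:Q550232": {"Adam LeFevre", "Anthony Anderson", "Mia Farrow"},
    "imdb:tt0344854": {"Adam LeFevre", "Mia Farrow"},
    "hypothetical:movie3": {"Actor X", "Actor Y"},  # hypothetical
}
TERMS, DOCS = build_index(MOVIES)

doc_to_doc = nearest(DOCS["wd:Q550232"], DOCS, exclude="wd:Q550232")          # similar movies
doc_to_term = nearest(DOCS["wd:Q550232"], TERMS)                              # actors specific to a movie
term_to_term = nearest(TERMS["Adam LeFevre"], TERMS, exclude="Adam LeFevre")  # similar actors
term_to_doc = nearest(TERMS["Anthony Anderson"], DOCS)                        # movies for an actor

if __name__ == "__main__":
    print(doc_to_doc, doc_to_term, term_to_term, term_to_doc, sep="\n")
```

Because every query is just a vector compared against a set of vectors, partial matching and ranking come for free: results are ordered by score rather than filtered by exact equality.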
Presentation Outline
Enterprise Knowledge Graphs
Smart Graphs with Embeddings
Implementing Knowledge Graphs
Reference Software Architecture
o Easy consumption of data
o No backend development
o Flexible data processing tools
o Standard and open interfaces
[Diagram: KG Consumers, Ontotext Platform, GraphDB with Similarity Plugin, RDF / structured data; interfaces: GQL query, GQL mutation, GQL Federation, SPARQL]
Transform CSV to RDF
o Perform standard ETL tasks
o Trim spaces, parse numbers and dates
o Parse IMDB IDs from links (for testing)
o Map table data to RDF
o Run SPARQL over tabular data
o Split multi-valued fields like “Action|Thriller”
o Schema-level alignment not yet applied
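The ETL steps can be sketched with the standard `csv` and `re` modules. All of the following are hypothetical: the column names (modeled on the `imdb:actor_2_name` predicate that appears later in the deck), the `imdb:` predicate names, the regular expression for IMDB ids, and the placeholder sample row. Triples are returned as plain Python tuples with prefixed names rather than serialized RDF; a real pipeline would emit IRIs and typed literals.

```python
import csv
import io
import re

IMDB_ID = re.compile(r"/title/(tt\d+)")  # assumed link format


def rows_to_triples(csv_text: str) -> list[tuple]:
    """Map each CSV row to (subject, predicate, object) tuples.
    Assumes well-formed rows (no extra unnamed fields)."""
    triples = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {key.strip(): (value or "").strip() for key, value in raw.items()}  # trim spaces
        match = IMDB_ID.search(row.get("movie_imdb_link", ""))
        if not match:
            continue  # no IMDB id, no subject
        subject = f"imdb:{match.group(1)}"
        triples.append((subject, "imdb:movie_title", row["movie_title"]))
        if row.get("title_year"):
            # Parse numbers: "2004.0" -> 2004
            triples.append((subject, "imdb:title_year", int(float(row["title_year"]))))
        if row.get("actor_2_name"):
            triples.append((subject, "imdb:actor_2_name", row["actor_2_name"]))
        # Split multi-valued fields such as "Action|Thriller".
        for genre in (g.strip() for g in row.get("genres", "").split("|")):
            if genre:
                triples.append((subject, "imdb:genre", genre))
    return triples


# Placeholder row: title, actor, id and year are all hypothetical.
SAMPLE = (
    "movie_title,genres,actor_2_name,movie_imdb_link,title_year\n"
    "  Example Movie ,Action|Thriller, Jane Doe ,http://www.imdb.com/title/tt0000000/?ref_=x,2004.0\n"
)
TRIPLES = rows_to_triples(SAMPLE)

if __name__ == "__main__":
    for triple in TRIPLES:
        print(triple)
```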
Similarity Plugin API
subject           predicate   object
wd:Q550232        :actor      “Adam LeFevre”
imdb:tt0344854    :actor      “Adam LeFevre”
…                 …           …
o Accepts a graph described as <s, p, o> triples
o Indexes any RDF type
o Works with virtual overlays like:
imdb:tt0344854 imdb:actor_2_name “Adam LeFevre”
wd:Q550232 wdt:P161 wd:Q2702964
wd:Q2702964 rdfs:label “Adam LeFevre”
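One way to read the overlay: Wikidata links the movie to an actor entity (`wdt:P161`, “cast member”) that carries an `rdfs:label`, while the IMDB data holds the actor's name as a literal. A virtual overlay can present both in the same string form without materializing new data. The sketch below assumes this reading; `label_overlay` is a hypothetical helper, this simplified model does not distinguish IRIs from literals, and multiple labels per entity (e.g. per language) are not handled.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


def label_overlay(triples: Iterable[Triple], label_predicate: str = "rdfs:label") -> Iterator[Triple]:
    """Yield triples in which any object that has a label is replaced by that label.

    Label triples themselves are consumed rather than yielded. If an entity has
    several labels, the last one seen wins (a simplification).
    """
    triples = list(triples)
    labels = {s: o for s, p, o in triples if p == label_predicate}
    for s, p, o in triples:
        if p != label_predicate:
            yield s, p, labels.get(o, o)  # things become strings; literals pass through


# The three triples read off the overlay diagram.
SLIDE_TRIPLES = [
    ("imdb:tt0344854", "imdb:actor_2_name", "Adam LeFevre"),
    ("wd:Q550232", "wdt:P161", "wd:Q2702964"),
    ("wd:Q2702964", "rdfs:label", "Adam LeFevre"),
]

if __name__ == "__main__":
    for triple in label_overlay(SLIDE_TRIPLES):
        print(triple)
```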
Specify KG Embeddings – Select Predicates
o The Similarity plugin expects <s, p, o> triples
Specify KG Embeddings – Align Schema
o Define a translation table for the predicates
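Predicate selection and schema alignment can be sketched together as one translation table: predicates absent from the table are dropped, and the remaining ones are renamed to a shared predicate before indexing. The table entries are assumptions: `wdt:P161` is Wikidata's “cast member” property and `imdb:actor_2_name` appears in the deck, while `imdb:actor_1_name`, `imdb:actor_3_name`, `imdb:genre` and the shared `:actor` name are illustrative. Applied to label-resolved input, the output matches the <s, p, o> table on the Similarity Plugin API slide.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical translation table: source predicate -> shared predicate.
TRANSLATION: Dict[str, str] = {
    "wdt:P161": ":actor",           # Wikidata "cast member"
    "imdb:actor_1_name": ":actor",  # assumed column-derived predicates
    "imdb:actor_2_name": ":actor",
    "imdb:actor_3_name": ":actor",
}


def align(triples: List[Triple], table: Dict[str, str]) -> List[Triple]:
    """Select only the predicates listed in `table` and rename them."""
    return [(s, table[p], o) for s, p, o in triples if p in table]


INPUT = [
    ("wd:Q550232", "wdt:P161", "Adam LeFevre"),                # after label resolution
    ("imdb:tt0344854", "imdb:actor_2_name", "Adam LeFevre"),
    ("hypothetical:movie3", "imdb:genre", "Action"),           # not selected, dropped
]

if __name__ == "__main__":
    for triple in align(INPUT, TRANSLATION):
        print(triple)
```

Keeping selection and renaming in one table makes the embedding's input explicit: adding a predicate to the index is a one-line change.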
Results
o Find RDF resources similar to “Pirate Radio”
o Even a limited set of predicates returns acceptable results
o An important independent alternative for entity matching
Important Design Considerations
o Prefer RDF over property graphs
o Much richer technology ecosystem (schema, dataset, reasoning, strings vs. things)
o Virtualization versus consolidation
o Virtualization works only for simple lookup queries, not for real data integration
o Push result federation to the GraphQL data consumption layer
o Integrate Random Indexing in the KG database
o Push heavy computation as close to the data as possible
o Choose GraphQL over SPARQL for app developers
Questions & Answers

More Related Content

Similar to Semantic similarity for faster Knowledge Graph delivery at scale

lecture04_movie_discussion.pdf
lecture04_movie_discussion.pdflecture04_movie_discussion.pdf
lecture04_movie_discussion.pdf
KRISLAM4
 
Xuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent ApplicationsXuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent Applications
Machine Learning Prague
 
An analytic framework for estimating puzzle quality
An analytic framework for estimating puzzle qualityAn analytic framework for estimating puzzle quality
An analytic framework for estimating puzzle qualityguestd6c836
 
An analytic framework for estimating puzzle quality
An analytic framework for estimating puzzle qualityAn analytic framework for estimating puzzle quality
An analytic framework for estimating puzzle qualitysblom
 
"Why the Semantic Web will Never Work" (note the quotes)
"Why the Semantic Web will Never Work"  (note the quotes)"Why the Semantic Web will Never Work"  (note the quotes)
"Why the Semantic Web will Never Work" (note the quotes)
James Hendler
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
Paco Nathan
 
IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entitie...
IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entitie...IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entitie...
IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entitie...
Abhay Prakash
 
Simple Slide Design and Data Visualization Crash Course
Simple Slide Design and Data Visualization Crash CourseSimple Slide Design and Data Visualization Crash Course
Simple Slide Design and Data Visualization Crash Course
Bessie Chu
 
Semantics In Declarative Systems
Semantics In Declarative SystemsSemantics In Declarative Systems
Semantics In Declarative Systems
Optum
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
DianaGray10
 
The Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelThe Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global Level
Inside Analysis
 
Evolving as a professional software developer
Evolving as a professional software developerEvolving as a professional software developer
Evolving as a professional software developer
Anton Kirillov
 
Traversing Graphs with Gremlin
Traversing Graphs with GremlinTraversing Graphs with Gremlin
Traversing Graphs with Gremlin
Artem Chebotko
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
Paco Nathan
 
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Jeff Magnusson
 
Jim Hendler's Presentation at SSSW 2011
Jim Hendler's Presentation at SSSW 2011Jim Hendler's Presentation at SSSW 2011
Jim Hendler's Presentation at SSSW 2011sssw2011
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
Dataiku
 
Visualizing your data in JavaScript
Visualizing your data in JavaScriptVisualizing your data in JavaScript
Visualizing your data in JavaScript
Mandi Cai
 
Graphs for Ai and ML
Graphs for Ai and MLGraphs for Ai and ML
Graphs for Ai and ML
Neo4j
 
State of NLP and Amazon Comprehend
State of NLP and Amazon ComprehendState of NLP and Amazon Comprehend
State of NLP and Amazon Comprehend
Egor Pushkin
 

Similar to Semantic similarity for faster Knowledge Graph delivery at scale (20)

lecture04_movie_discussion.pdf
lecture04_movie_discussion.pdflecture04_movie_discussion.pdf
lecture04_movie_discussion.pdf
 
Xuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent ApplicationsXuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent Applications
 
An analytic framework for estimating puzzle quality
An analytic framework for estimating puzzle qualityAn analytic framework for estimating puzzle quality
An analytic framework for estimating puzzle quality
 
An analytic framework for estimating puzzle quality
An analytic framework for estimating puzzle qualityAn analytic framework for estimating puzzle quality
An analytic framework for estimating puzzle quality
 
"Why the Semantic Web will Never Work" (note the quotes)
"Why the Semantic Web will Never Work"  (note the quotes)"Why the Semantic Web will Never Work"  (note the quotes)
"Why the Semantic Web will Never Work" (note the quotes)
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 
IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entitie...
IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entitie...IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entitie...
IJCAI 2015 Presentation: Did you know?- Mining Interesting Trivia for Entitie...
 
Simple Slide Design and Data Visualization Crash Course
Simple Slide Design and Data Visualization Crash CourseSimple Slide Design and Data Visualization Crash Course
Simple Slide Design and Data Visualization Crash Course
 
Semantics In Declarative Systems
Semantics In Declarative SystemsSemantics In Declarative Systems
Semantics In Declarative Systems
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
The Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelThe Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global Level
 
Evolving as a professional software developer
Evolving as a professional software developerEvolving as a professional software developer
Evolving as a professional software developer
 
Traversing Graphs with Gremlin
Traversing Graphs with GremlinTraversing Graphs with Gremlin
Traversing Graphs with Gremlin
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
 
Jim Hendler's Presentation at SSSW 2011
Jim Hendler's Presentation at SSSW 2011Jim Hendler's Presentation at SSSW 2011
Jim Hendler's Presentation at SSSW 2011
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
Visualizing your data in JavaScript
Visualizing your data in JavaScriptVisualizing your data in JavaScript
Visualizing your data in JavaScript
 
Graphs for Ai and ML
Graphs for Ai and MLGraphs for Ai and ML
Graphs for Ai and ML
 
State of NLP and Amazon Comprehend
State of NLP and Amazon ComprehendState of NLP and Amazon Comprehend
State of NLP and Amazon Comprehend
 

More from Connected Data World

Systems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van HarmelenSystems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van Harmelen
Connected Data World
 
Graph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora LassilaGraph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora Lassila
Connected Data World
 
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Connected Data World
 
How to get started with Graph Machine Learning
How to get started with Graph Machine LearningHow to get started with Graph Machine Learning
How to get started with Graph Machine Learning
Connected Data World
 
Graphs in sustainable finance
Graphs in sustainable financeGraphs in sustainable finance
Graphs in sustainable finance
Connected Data World
 
The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is here
Connected Data World
 
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
Connected Data World
 
From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3
Connected Data World
 
In Search of the Universal Data Model
In Search of the Universal Data ModelIn Search of the Universal Data Model
In Search of the Universal Data Model
Connected Data World
 
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph DatabaseGraph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
Connected Data World
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Connected Data World
 
Graph Realities
Graph RealitiesGraph Realities
Graph Realities
Connected Data World
 
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Connected Data World
 
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Connected Data World
 
Schema, Google & The Future of the Web
Schema, Google & The Future of the WebSchema, Google & The Future of the Web
Schema, Google & The Future of the Web
Connected Data World
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
Connected Data World
 
Elegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property GraphsElegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property Graphs
Connected Data World
 
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
Connected Data World
 
Graph for Good: Empowering your NGO
Graph for Good: Empowering your NGOGraph for Good: Empowering your NGO
Graph for Good: Empowering your NGO
Connected Data World
 
What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?
Connected Data World
 

More from Connected Data World (20)

Systems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van HarmelenSystems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van Harmelen
 
Graph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora LassilaGraph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora Lassila
 
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
 
How to get started with Graph Machine Learning
How to get started with Graph Machine LearningHow to get started with Graph Machine Learning
How to get started with Graph Machine Learning
 
Graphs in sustainable finance
Graphs in sustainable financeGraphs in sustainable finance
Graphs in sustainable finance
 
The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is here
 
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
 
From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3
 
In Search of the Universal Data Model
In Search of the Universal Data ModelIn Search of the Universal Data Model
In Search of the Universal Data Model
 
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph DatabaseGraph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
 
Graph Realities
Graph RealitiesGraph Realities
Graph Realities
 
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
 
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
 
Schema, Google & The Future of the Web
Schema, Google & The Future of the WebSchema, Google & The Future of the Web
Schema, Google & The Future of the Web
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
Elegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property GraphsElegant and Scalable Code Querying with Code Property Graphs
Elegant and Scalable Code Querying with Code Property Graphs
 
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
 
Graph for Good: Empowering your NGO
Graph for Good: Empowering your NGOGraph for Good: Empowering your NGO
Graph for Good: Empowering your NGO
 
What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?
 

Recently uploaded

做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 

Semantic similarity for faster Knowledge Graph delivery at scale

  • 1. making sense of text and data October, 2019 Connected Data London Semantic Similarity for Faster Knowledge Graph Delivery at Scale
  • 2. Why Knowledge Graphs? “Cross-industry studies show that on average, less than half of an organization’s structured data is actively used in making decisions—and less than 1% of its unstructured data is analyzed or used at all” What’s Your Data Strategy? Leandro DalleMule and Thomas H. Davenport, Harvard Business Review Top 5 USA Banks
  • 3. Presentation Outline Enterprise Knowledge Graphs Smart Graphs with Embeddings Implementing Knowledge Graphs Presentation Outline
  • 4. What is a Knowledge Graph? Graph, Semantics, Smart, Alive
  • 5. Multiple Enterprise Data Management Systems KG platforms combine capabilities of several enterprise systems: o Master and reference data management o Corporate/Enterprise Taxonomy o Datawarehouse o Metadata management o Digital asset management o Enterprise search
  • 6. Challenges in Enterprise Semantic Integration Type Titles TV Episodes 4’044’529 Short film 681’067 Feature film 516’726 Video 164’061 TV series 164’061 TV movies 126’206 … … Total * 5’838’514 Type Titles film 235’707 silent short film 16’377 television film 15’345 short film 11’225 animated film 3’785 … … … … Total 289’650 IMDB WikiData * Later the tests use only 5K crawled datasets
  • 7. Challenges in Enterprise Semantic Integration Multiple levels of inconsistencies: o Types: film vs “TV movie” o Meta-data: “science fiction”, “military science fiction” vs “Sci-Fi” o Reference data: “US” vs. “United States” o Manually curated cross-links (!) for testing purposes only
  • 8. A Classical Approach
    o Start with string matching of the titles:
      “Harry Potter and the Deathly Hallows: Part II” vs. “Harry Potter and the Deathly Hallows – Part 2”
      “Perfume: The Story of a Murderer” vs. “Perfume”
      “Pirate Radio” vs. “The Boat That Rocked”
      “Avatar” vs. “Avatar” (4 movies)
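A minimal sketch of this classical title matching, using the standard library's difflib.SequenceMatcher as a stand-in string metric (the slides do not say which metric was used). The normalization rules in the hypothetical normalize_title helper, including the single Roman-numeral rule, are illustrative assumptions. Light normalization is enough for the Harry Potter pair, but no string metric can link “Pirate Radio” to “The Boat That Rocked”, and identical titles such as the four “Avatar” films all score the same.

```python
import difflib
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, unify dashes/colons and one Roman numeral, drop punctuation."""
    t = unicodedata.normalize("NFKC", title).lower()
    t = re.sub(r"[\u2013\u2014:]", " ", t)   # en dash, em dash, colon -> space
    t = re.sub(r"\bpart ii\b", "part 2", t)   # illustrative rule, not exhaustive
    t = re.sub(r"[^\w\s]", " ", t)            # remaining punctuation -> space
    return " ".join(t.split())                # collapse whitespace


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized titles."""
    return difflib.SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


pairs = [
    ("Harry Potter and the Deathly Hallows: Part II",
     "Harry Potter and the Deathly Hallows \u2013 Part 2"),
    ("Perfume: The Story of a Murderer", "Perfume"),
    ("Pirate Radio", "The Boat That Rocked"),
]
for a, b in pairs:
    print(f"{title_similarity(a, b):.2f}  {a} vs. {b}")
```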
  • 9. A Classical Approach with Extra Rules
    o Add release date matching
      Lose 10% of the matches due to bad dates
    o Ambiguity is greatly reduced, but many ambiguous candidates remain:
      tt0238520   16 October 1995   50 min
      tt1125875   11 April 1995     48 min
      tt0238520   23 June 1995      1h 21 min
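A minimal sketch of the extra release-date rule, assuming the candidates already passed title matching. The Candidate record, the 30-day tolerance and the query dates are hypothetical; the three candidate rows are copied from the slide as given. A missing or unparseable date (None here) can never satisfy the rule, which is one way bad dates cost matches, and a wide tolerance leaves several candidates in play.

```python
from datetime import date
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    imdb_id: str
    released: Optional[date]   # None models a missing or unparseable date
    runtime_min: int


def filter_by_release_date(query, candidates, tolerance_days=30):
    """Keep candidates released within tolerance_days of the query date.

    A missing date on either side drops the candidate, so bad dates
    lose otherwise-correct matches.
    """
    if query is None:
        return []
    return [
        c for c in candidates
        if c.released is not None
        and abs((c.released - query).days) <= tolerance_days
    ]


# Candidate rows as listed on the slide (ids copied as given there).
candidates = [
    Candidate("tt0238520", date(1995, 10, 16), 50),
    Candidate("tt1125875", date(1995, 4, 11), 48),
    Candidate("tt0238520", date(1995, 6, 23), 81),
]
# Hypothetical release date taken from the other source's record.
print(filter_by_release_date(date(1995, 10, 1), candidates))
```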
  • 10. Presentation Outline
    o Enterprise Knowledge Graphs
    o Smart Graphs with Embeddings
    o Implementing Knowledge Graphs
  • 11. What is Knowledge Graph Embedding?
    o Predict similar graph nodes or properties
    o Require no input training data
    o Mathematical representation of graph nodes as vectors
    [Figure: The Godfather (2h 58m) vs. American Pie (1h 15 min) as vectors over the axes duration, drama, comedy]
  • 12. Knowledge Graph Embedding Example
    o For each film, include all actors, the director and the country of origin
    o Vast matrix with entities and literals
    Document (Movie) × Terms: [Actor] “Adam LeFevre”, [Actor] “Anthony Anderson”, [Actor] “Mia Farrow”, [Country] “France”, [Country] “US”, [Country] “United states”, [Director] “Luc Besson”, …
    wd:Q550232       1 1 1 1 1
    imdb:tt0344854   1 1 1 1
    …                …
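A matrix like the one on this slide can be built from per-film property lists. In this minimal sketch the document ids ex:film1 and ex:film2 and their property values are hypothetical, chosen only to show that “US” and “United states” become separate columns until reference data is aligned.

```python
def build_term_document_matrix(films):
    """films: document id -> list of (property, value) pairs.

    Returns (terms, rows): terms is the sorted list of "[Property] value"
    column labels; rows maps each document id to a 0/1 list aligned with terms.
    """
    def label(prop, value):
        return f"[{prop}] {value}"

    terms = sorted({label(p, v) for props in films.values() for p, v in props})
    column = {t: i for i, t in enumerate(terms)}
    rows = {}
    for doc, props in films.items():
        row = [0] * len(terms)
        for p, v in props:
            row[column[label(p, v)]] = 1   # binary incidence: term occurs in document
        rows[doc] = row
    return terms, rows


# Hypothetical input; not the actual cast or countries of any listed film.
films = {
    "ex:film1": [("Actor", "Adam LeFevre"), ("Country", "US"), ("Director", "Luc Besson")],
    "ex:film2": [("Actor", "Adam LeFevre"), ("Actor", "Mia Farrow"), ("Country", "United states")],
}
terms, rows = build_term_document_matrix(films)
for doc, row in rows.items():
    print(doc, row)
```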
  • 13. Random Indexing (RI) Algorithm
    o Reduces the matrix dimension with elemental vectors
      For each term w, calculate a context vector S(w) by summing the elemental (index) vectors of all contexts x in which w appears
    o Lightweight and fast (250K x 1.45M matrix in under 5 minutes)
    o Fast sub-second searches; requires limited RAM
    Elemental vectors (Movie × Actors):
    Movie            Adam LeFevre   Anthony Anderson   Mia Farrow
    wd:Q550232       1              1                  1
    imdb:tt0344854   1              0                  1
    …                …              …                  …
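One standard way to write the Random Indexing step as formulas. The notation is introduced here for illustration: A is the binary document-by-term matrix of the previous slide (n documents, m terms), e(x) is the sparse random elemental vector of document x, and k ≪ n is the reduced dimension. Summing elemental vectors is the same as multiplying by the stacked elemental vectors E, so the n × m matrix is replaced by an m × k one, and similarity is then ordinary cosine.

```latex
\begin{aligned}
e(x) &\in \{-1, 0, +1\}^{k}, \qquad k \ll n
  && \text{(few non-zero entries, chosen at random)} \\
C(w) &= \{\, x : A_{x,w} = 1 \,\}
  && \text{(documents that contain term } w\text{)} \\
S(w) &= \sum_{x \in C(w)} e(x) = \sum_{x=1}^{n} A_{x,w}\, e(x)
  \quad\Longleftrightarrow\quad S = A^{\top} E,
  \quad A \in \{0,1\}^{n \times m},\; E \in \{-1,0,+1\}^{n \times k} \\
\operatorname{sim}(w_1, w_2) &= \frac{S(w_1) \cdot S(w_2)}{\lVert S(w_1) \rVert \, \lVert S(w_2) \rVert}
\end{aligned}
```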
  • 14. Random Indexing (RI) Algorithm #2
    o Supports similarity searches for:
      Document to Document – similar movies
      Document to Term – specific actor/director
      Term to Term – similar actors/directors
      Term to Document – find movies specific to this actor/director
    o Features all properties of a Vector Space model
    o Partial matching, weights, ranking + context-sensitive semantic search
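A minimal pure-Python sketch of Random Indexing covering all four search directions. It assumes sparse ternary elemental vectors per document and builds each document vector as the sum of its terms' context vectors, one common choice; the dimension, sparsity, seed and toy data are hypothetical, and this is not the Similarity Plugin's implementation.

```python
import math
import random


def elemental_vector(dim, nonzeros, rng):
    """Sparse ternary random vector: `nonzeros` entries of +1/-1, the rest 0."""
    v = [0.0] * dim
    for i in rng.sample(range(dim), nonzeros):
        v[i] = rng.choice((-1.0, 1.0))
    return v


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def random_indexing(docs, dim=512, nonzeros=8, seed=42):
    """docs: document id -> set of terms. Returns (term_vectors, doc_vectors)."""
    rng = random.Random(seed)
    elemental = {d: elemental_vector(dim, nonzeros, rng) for d in docs}
    term_vectors = {}
    for d, terms in docs.items():
        for t in terms:
            # S(w): sum of the elemental vectors of every document containing w
            term_vectors[t] = add(term_vectors.get(t, [0.0] * dim), elemental[d])
    doc_vectors = {}
    for d, terms in docs.items():
        v = [0.0] * dim
        for t in terms:
            v = add(v, term_vectors[t])   # document = sum of its terms' context vectors
        doc_vectors[d] = v
    return term_vectors, doc_vectors


def top_k(query, candidates, k=3, exclude=None):
    """Rank candidate vectors by cosine similarity to the query vector."""
    scored = [(cosine(query, v), key) for key, v in candidates.items() if key != exclude]
    return sorted(scored, reverse=True)[:k]


# Hypothetical toy data, not real film casts.
docs = {
    "ex:film1": {"Mia Farrow", "Adam LeFevre", "Luc Besson"},
    "ex:film2": {"Mia Farrow", "Adam LeFevre"},
    "ex:film3": {"Anthony Anderson"},
}
term_vecs, doc_vecs = random_indexing(docs)
print(top_k(doc_vecs["ex:film1"], doc_vecs, exclude="ex:film1"))      # document to document
print(top_k(doc_vecs["ex:film1"], term_vecs))                         # document to term
print(top_k(term_vecs["Mia Farrow"], term_vecs, exclude="Mia Farrow"))  # term to term
print(top_k(term_vecs["Luc Besson"], doc_vecs))                       # term to document
```

Because the elemental vectors are nearly orthogonal, documents that share terms end up with similar vectors without any training data, which matches the slide's “no input training data” point.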
  • 15. Presentation Outline
    o Enterprise Knowledge Graphs
    o Smart Graphs with Embeddings
    o Implementing Knowledge Graphs
  • 16. Reference Software Architecture
    o Easy consumption of data
    o No backend development
    o Flexible data processing tools
    o Standard and open interfaces
    [Diagram: KG Consumers; Ontotext Platform (GQL query, GQL mutation, GQL Federation); GraphDB with Similarity Plugin (SPARQL); RDF / Structured data]
  • 17. Transform CSV to RDF
    o Perform standard ETL tasks
    o Trim spaces, parse numbers and dates
    o Parse IMDB ids from links for testing
    o Map table data to RDF
    o SPARQL over tabular data
    o Split multi-valued fields like “Action|Thriller”
    o Schema-level alignment not yet applied
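A minimal sketch of these ETL steps in plain Python, emitting N-Triples strings instead of using an RDF library or SPARQL over tabular data. The column names (movie_title, title_year, genres, actor_2_name, movie_imdb_link), the http://example.org/ namespace and the sample row are assumptions for illustration. As on the slide, no schema-level alignment is attempted.

```python
import csv
import io
import re

EX = "http://example.org/"   # hypothetical namespace
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"


def literal(value: str) -> str:
    """Serialize a string as an N-Triples literal (escapes backslash, quote, newline)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def csv_to_ntriples(csv_text: str) -> list:
    triples = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Trim spaces (str.strip also removes non-breaking spaces)
        row = {k.strip(): (v or "").strip() for k, v in raw.items() if k is not None}
        match = re.search(r"/title/(tt\d+)", row.get("movie_imdb_link", ""))
        if not match:
            continue                                   # no IMDb id: skip the row
        s = f"<{EX}imdb/{match.group(1)}>"
        triples.append(f"{s} <{EX}title> {literal(row['movie_title'])} .")
        year = row.get("title_year", "")
        if re.fullmatch(r"\d{4}(\.0+)?", year):         # accepts "2001" or "2001.0"
            triples.append(f'{s} <{EX}year> "{year[:4]}"^^<{XSD_GYEAR}> .')
        if row.get("actor_2_name"):
            triples.append(f"{s} <{EX}actor_2_name> {literal(row['actor_2_name'])} .")
        for genre in row.get("genres", "").split("|"):  # split multi-valued fields
            if genre.strip():
                triples.append(f"{s} <{EX}genre> {literal(genre.strip())} .")
    return triples


sample = (
    "movie_title,title_year,genres,actor_2_name,movie_imdb_link\n"
    "Example Film\u00a0,2001,Action|Thriller, Jane Doe ,"
    "http://www.imdb.com/title/tt0000001/?ref_=fn_tt_tt_1\n"
)
print("\n".join(csv_to_ntriples(sample)))
```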
  • 18. Similarity Plugin API
    subject           predicate   object
    wd:Q550232        :actor      “Adam LeFevre”
    imdb:tt0344854    :actor      “Adam LeFevre”
    …                 …           …
    o Accepts a graph described by <s, p, o> triples
    o Indexes any RDF types
    o Works with virtual overlays like:
    [Diagram: imdb:tt0344854 imdb:actor_2_name “Adam LeFevre”; wd:Q550232 wdt:P161 wd:Q2702964; wd:Q2702964 rdfs:label “Adam LeFevre”]
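The virtual overlay idea can be illustrated by resolving property paths into one shared predicate. This plain-Python sketch is not the Similarity Plugin's own mechanism; the OVERLAY table and the function names are hypothetical, and the triples follow the slide (wdt:P161 is Wikidata's “cast member” property). Both films end up with the same :actor value, although the IMDb source reaches it in one hop and Wikidata in two.

```python
# Hypothetical overlay: virtual predicate -> property paths in the source graphs
OVERLAY = {
    ":actor": [
        ("imdb:actor_2_name",),          # IMDb film -> actor name literal
        ("wdt:P161", "rdfs:label"),      # Wikidata film -> cast member -> label
    ],
}


def follow_path(index, start, path):
    """Return every node reachable from start by following path in order."""
    frontier = {start}
    for predicate in path:
        frontier = {o for s in frontier for o in index.get((s, predicate), ())}
    return frontier


def virtual_triples(triples, overlay):
    """Materialize <s, virtual predicate, o> triples defined by the overlay."""
    index = {}
    for s, p, o in triples:
        index.setdefault((s, p), set()).add(o)
    subjects = {s for s, _, _ in triples}
    return {
        (s, vpred, o)
        for vpred, paths in overlay.items()
        for s in subjects
        for path in paths
        for o in follow_path(index, s, path)
    }


source = [
    ("imdb:tt0344854", "imdb:actor_2_name", '"Adam LeFevre"'),
    ("wd:Q550232", "wdt:P161", "wd:Q2702964"),
    ("wd:Q2702964", "rdfs:label", '"Adam LeFevre"'),
]
for triple in sorted(virtual_triples(source, OVERLAY)):
    print(triple)
```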
  • 19. Specify KG Embeddings – Select Predicates o Similarity plugin expects triples <s, p, o>
  • 20. Specify KG Embeddings – Align Schema o Set a translation table of the predicates
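Selecting predicates and aligning the schema can be done in one pass over the triples. In this minimal sketch the translation table and the imdb: predicate names are hypothetical: triples whose predicate is not in the table are dropped (selection), and the rest are renamed to a shared predicate (alignment). The same table could also map Wikidata predicates such as wdt:P57 (director) once their IRI values are resolved to labels.

```python
# Hypothetical translation table: source predicate -> shared predicate
TRANSLATION = {
    "imdb:actor_2_name": ":actor",
    "imdb:director_name": ":director",
    "imdb:country": ":country",
}


def select_and_align(triples, translation):
    """Keep triples whose predicate is in the table and rename the predicate."""
    return [(s, translation[p], o) for s, p, o in triples if p in translation]


triples = [
    ("ex:film1", "imdb:actor_2_name", '"Jane Doe"'),
    ("ex:film1", "imdb:num_voted_users", '"1234"'),   # not selected, dropped
    ("ex:film1", "imdb:country", '"US"'),
]
print(select_and_align(triples, TRANSLATION))
```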
  • 21. Results
    o Find RDF resources similar to “Pirate Radio”
    o Even a limited set of predicates returns acceptable results
    o Important independent alternative for entity matching
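Using similarity as an independent entity-matching signal can be sketched as a mutual-best-match rule. For brevity this sketch scores entities with plain cosine similarity over binary term sets instead of Random Indexing vectors; the threshold, the entity keys and their term sets are hypothetical. The two records are paired through shared terms even though the titles “Pirate Radio” and “The Boat That Rocked” have nothing in common as strings.

```python
import math


def set_cosine(a, b):
    """Cosine similarity of two binary term vectors given as sets."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def match_entities(left, right, threshold=0.5):
    """Pair each left entity with its most similar right entity, keeping
    only mutual best matches whose similarity reaches the threshold."""
    best_right = {l: max(right, key=lambda r: set_cosine(left[l], right[r])) for l in left}
    best_left = {r: max(left, key=lambda l: set_cosine(left[l], right[r])) for r in right}
    return {
        l: r for l, r in best_right.items()
        if best_left[r] == l and set_cosine(left[l], right[r]) >= threshold
    }


# Hypothetical term sets; placeholders, not real casts or directors.
imdb = {
    "imdb:pirate_radio": {"actor:A", "actor:B", "director:X"},
    "imdb:other_film": {"actor:C", "director:Y"},
}
wikidata = {
    "wd:the_boat_that_rocked": {"actor:A", "actor:B", "director:X", "actor:E"},
    "wd:unrelated_film": {"actor:C", "actor:F"},
}
print(match_entities(imdb, wikidata, threshold=0.6))
```

Requiring the match to be the best in both directions, plus a threshold, keeps weak pairs such as the two placeholder films sharing a single actor from being linked.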
  • 22. Important Design Considerations
    o Prefer RDF over Property Graph
      Much richer technology ecosystem (schema, dataset, reasoning, strings vs. things)
    o Virtualization versus Consolidation
      Virtualization works only for simple lookup queries, not for real data integration
      Push result federation to the GraphQL data consumption layer
    o Integrating Random Indexing in the KG database
      Push heavy computation as close to the data as possible
    o Choose GraphQL over SPARQL for app developers: