SlideShare a Scribd company logo
1 of 43
7/2/19 Heiko Paulheim 1
Machine Learning & Embeddings
for Large Knowledge Graphs
Heiko Paulheim
7/2/19 Heiko Paulheim 2
Crossing the Bridge from the Other Side
7/2/19 Heiko Paulheim 3
Crossing the Bridge from the Other Side
• There are plenty of established ML and DM toolkits...
– Weka
– RapidMiner
– scikit-learn
– R
• ...implementing all your favorite algorithms...
– Naive Bayes
– Random Forests
– SVMs
– (Deep) Neural Networks
– ...
• ...but they all work on feature vectors, not graphs!
7/2/19 Heiko Paulheim 4
Typical Tasks
• Knowledge Graph Internal
– Type prediction
– Link prediction
– Link validation
• Knowledge Graph External
– i.e., using the KG as background knowledge in some other task
– e.g., content-based recommender systems
– e.g., predictive modeling
●
who is the next nobel prize winner?
Gao et al.: Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics.
Scientific Programming, 2014
Xu et al.: Explainable Reasoning over Knowledge Graphs for Recommendation. ebay tech blog, 2019
7/2/19 Heiko Paulheim 5
Example: Knowledge Graph Internal
• Type prediction
– Many instances in KGs are not typed or have very abstract types
– e.g., many actors are just typed as persons
• Classic approach
– Exploit ontology
– Shown to be rather sensitive to noise
• Example: ontology-based typing of Germany in DBpedia
– Airport, Award, Building, City, Country, Ethnic Group, Genre,
Language, Military Conflict, Mountain, Mountain Range, Person
Function, Place, Populated Place, Race, Route of Transportation,
Settlement, Stadium, Wine Region
Paulheim & Bizer: Type Inference on Noisy RDF Data. ISWC, 2013
Melo et al.: Type Prediction in Noisy RDF Knowledge Bases using Hierarchical Multilabel Classification with
Graph and Latent Features. IJAIT, 2017
7/2/19 Heiko Paulheim 6
Example: Knowledge Graph Internal
• Alternative: learn model for type prediction
– Train classifier to predict types (binary or hierarchical)
– More noise tolerant
Paulheim & Bizer: Improving the quality of linked data using statistical distributions. IJSWIS, 2014
7/2/19 Heiko Paulheim 7
Example: Knowledge Graph External
• Example machine learning task: predicting book sales
ISBN City Sold
3-2347-3427-1 Darmstadt 124
3-43784-324-2 Mannheim 493
3-145-34587-0 Roßdorf 14
...
ISBN City Population ... Genre Publisher ... Sold
3-2347-3427-1 Darm-
stadt
144402 ... Crime Bloody
Books
... 124
3-43784-324-2 Mann-
heim
291458 … Crime Guns Ltd. … 493
3-145-34587-0 Roß-
dorf
12019 ... Travel Up&Away ... 14
...
→ Crime novels sell better in larger cities
Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data.
WIMS, 2012
7/2/19 Heiko Paulheim 8
Example: The FeGeLOD Framework
IS B N
3 -2 3 4 7 -3 4 2 7 -1
C ity
D a r m s ta d t
# s o ld
1 2 4
N a m e d E n t it y
R e c o g n it io n
IS B N
3 -2 3 4 7 -3 4 2 7 - 1
C ity
D a r m s ta d t
# s o ld
1 2 4
C ity _ U R I
h ttp : / / d b p e d ia .o r g / r e s o u r c e/ D a r m s ta d t
F e a t u r e
G e n e r a t io n
IS B N
3 - 2 3 4 7 -3 4 2 7 -1
C ity
D a r m s ta d t
# s o ld
1 2 4
C ity _ U R I
h ttp : / / d b p e d ia .o r g / r e s o u r c e / D a r m s ta d t
C ity _ U R I_ d b p e d ia -o w l: p o p u la tio n T o ta l
1 4 1 4 7 1
C ity _ U R I_ ...
...
F e a t u r e
S e le c t io n
IS B N
3 -2 3 4 7 -3 4 2 7 - 1
C ity
D a r m s ta d t
# s o ld
1 2 4
C ity _ U R I
h ttp : / / d b p e d ia .o r g / r e s o u r c e/ D a r m s ta d t
C ity _ U R I_ d b p e d ia -o w l:p o p u la tio n T o ta l
1 4 1 4 7 1
Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data.
WIMS, 2012
7/2/19 Heiko Paulheim 9
The FeGeLOD Framework
• Entity Recognition
– Simple approach: guess DBpedia URIs
– Hit rate >95% for cities and countries (by English name)
• Feature Generation
– augmenting the dataset with additional attributes from KG
• Feature Selection
– Filter noise: >95% unknown, identical, or different nominals
Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data.
WIMS, 2012
7/2/19 Heiko Paulheim 10
Propositionalization
• Bridge Problem: Knowledge Graphs
vs. ML algorithms expecting Feature Vectors
→ wanted: a transformation from nodes to sets of features
?
Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked
Open Data. LD4KD, 2014
7/2/19 Heiko Paulheim 11
Propositionalization
• Bridge Problem: Knowledge Graphs
vs. ML algorithms expecting Feature Vectors
→ wanted: a transformation from nodes to sets of features
• Basic strategies:
– literal values (e.g., population) are used directly
– instance types become binary features
– relations are counted (absolute, relative, TF-IDF)
– combinations of relations and object types are counted
(absolute, relative, TF-IDF)
– ...
Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked
Open Data. LD4KD, 2014
7/2/19 Heiko Paulheim 12
Propositionalization ctd.
• Observations
– Even simple features (e.g., add all numbers and types)
can help on many problems
– More sophisticated features often bring additional improvements
●
Combinations of relations and individuals
– e.g., movies directed by Steven Spielberg
●
Combinations of relations and types
– e.g., movies directed by Oscar-winning directors
●
…
– But
●
The search space is enormous!
●
Generate first, filter later does not scale well
Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked
Open Data. LD4KD, 2014
7/2/19 Heiko Paulheim 13
From Naive Propositionalization to
Knowledge Graph Embeddings
• Reconsidering the previous examples:
– We want to predict some attribute of a KG entity
●
e.g., types
●
e.g., sales figures of books
– ...given the entity’s vector representation
• How do we get a “good” vector representation for an entity?
– ...and: what is “good” in the first place?
7/2/19 Heiko Paulheim 14
From Naive Propositionalization to
Knowledge Graph Embeddings
• How do we get a “good” vector representation for an entity?
– ...and: what is “good” in the first place?
• “good” for machine learning means separable
– similar entities are close together
– different entities are further away
https://appliedmachinelearning.blog/2017/03/09/understanding-support-vector-machines-a-primer/
7/2/19 Heiko Paulheim 15
A Brief Excursion to word2vec
• A vector space model for words
• Introduced in 2013
• Each word becomes a vector
– similar words are close
– relations are preserved
– vector arithmetics are possible
https://www.adityathakker.com/introduction-to-word2vec-how-it-works/
7/2/19 Heiko Paulheim 16
A Brief Excursion to word2vec
• Assumption:
– Similar words appear in similar contexts
{Bush,Obama,Trump} was elected president of the United States
United States president {Bush,Obama,Trump} announced…
…
• Idea
– Train a network that can predict a word from its context (CBOW)
or the context from a word (Skip Gram)
Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
7/2/19 Heiko Paulheim 17
A Brief Excursion to word2vec
• Skip Gram: train a neural network with one hidden layer
• Use output values at hidden layer as vector representation
• Observation:
– Bush, Obama, Trump will activate similar context words
– i.e., their output weights at the projection layer have to be similar
Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
7/2/19 Heiko Paulheim 18
From word2vec to RDF2vec
• Word2vec operates on sentences, i.e., sequences of words
• Idea of RDF2vec
– First extract “sentences” from a graph
– Then train embedding using RDF2vec
• “Sentences” are extracted by performing random graph walks:
Year Zero Nine Inch Nails Trent
Reznor
• Experiments
– RDF2vec can be trained on large KGs (DBpedia, Wikidata)
– 300-500 dimensional vectors outperform
other propositionalization strategies
artist member
Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
7/2/19 Heiko Paulheim 19
From word2vec to RDF2vec
• RDF2vec example
– similar instances form clusters
– direction of relations is stable
Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
7/2/19 Heiko Paulheim 20
From word2vec to RDF2vec
• RecSys example: using proximity in latent RDF2vec feature space
Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
7/2/19 Heiko Paulheim 21
Extensions of RDF2vec
• Maybe random walks are not such a good idea
– They may give too much weight on less-known entities and facts
●
Strategies:
– Prefer edges with more frequent predicates
– Prefer nodes with higher indegree
– Prefer nodes with higher PageRank
– …
– They may cover less-known entities and facts too little
●
Strategies:
– The opposite of all of the above strategies
• Bottom line of experimental evaluation:
– Not one strategy fits all
Cochez et al.: Biased Graph Walks for RDF Graph Embeddings. WIMS, 2017
7/2/19 Heiko Paulheim 22
Other Word Embedding Methods
• GloVe (Global Word Embedding Vectors)
• Computes embeddings out of co-occurence statistics
– Using matrix factorization
• Has been applied to random RDF walks as well
• Experimental evaluation:
– In some cases, RDFGloVe outperforms RDF2vec
https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-
glove.html
Cochez et al.: Global RDF Vector Space Embeddings, ISWC, 2017
7/2/19 Heiko Paulheim 23
Other Word Embedding Methods
• There is a lot of promising stuff not yet tried
– e.g., biasing walks based on human factors
– e.g., more recent word embedding methods such as ELMo and BERT
https://www.nbcnews.com/feature/nbc-out/bert-ernie-are-gay-couple-sesame-street-writer-claims-n910701
7/2/19 Heiko Paulheim 24
TransE and its Descendants
• In RDF2vec, relation preservation is a by-product
• TransE: direct modeling
– Formulates RDF embedding as an optimization problem
– Find mapping of entities and relations to Rn
so that
●
across all triples <s,p,o>
Σ ||s+p-o|| is minimized
●
try to obtain a smaller error
for existing triples
than for non-existing ones
Bordes et al: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013.
Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete
Repositories. WI 2016
7/2/19 Heiko Paulheim 25
Limitations of TransE
• Symmetric properties
– we have to minimize
||Barack + spouse – Michelle|| and ||Michelle + spouse – Barack||
simultaneously
– ideally, Barack + spouse = Michelle and Michelle + spouse = Barack
●
Michelle and Barack become infinitely close
●
spouse becomes 0 vector
Michelle
Barack
7/2/19 Heiko Paulheim 26
Limitations of TransE
• Transitive Properties
– we have to minimize
||Miami + partOf – Florida|| and ||Florida + partOf – USA||, but also
||Miami + partOf – USA||
– ideally, Miami + partOf = Florida, Florida + partOf = USA,
Miami + partOf = USA
●
Again: all three become infinitely close
●
partOf becomes 0 vector
Florida
Miami
USA
7/2/19 Heiko Paulheim 27
Limitations of TransE
• One to many properties
– we have to minimize
||New York + partOf – USA||, ||Florida + partOf – USA||,
||Ohio + partOf – USA||, …
– ideally, NewYork + partOf = USA, Florida + partOf = USA,
Ohio + partOf = USA
●
all the subjects become infinitely close
Florida
USA
New York
Ohio
7/2/19 Heiko Paulheim 28
Limitations of TransE
• Reflexive properties
– we have to minimize
||Tom + knows - Tom||
– ideally, Tom + knows = Tom
●
Knows becomes 0 vector
Tom
7/2/19 Heiko Paulheim 29
TransE
RDF2Vec
HolE
DistMult
RESCAL
NTN
TransR
TransH
TransD
KG2E
ComplEx
Limitations of TransE
• Numerous variants of TransE have been proposed
to overcome limitations (e.g., TransH, TransR, TransD, …)
• Plus: embedding approaches based on tensor factorization etc.
7/2/19 Heiko Paulheim 30
Are we Driving on the Wrong Side of the Road?
7/2/19 Heiko Paulheim 31
Are we Driving on the Wrong Side of the Road?
• Original ideas:
– Assign meaning to data
– Allow for machine inference
– Explain inference results to the user
Berners-Lee et al: The Semantic Web. Scientific American, May 2001
7/2/19 Heiko Paulheim 32
Running Example: Recommender Systems
• Content based recommender systems backed by Semantic Web
data
– (today: knowledge graphs)
• Advantages
– use rich background information about recommended items (for free)
– justifications can be generated (e.g., you like movies by that director)
https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and-matrix-factorization-in-python/
7/2/19 Heiko Paulheim 33
The 2009 Semantic Web Layer Cake
7/2/19 Heiko Paulheim 34
The 2019 Semantic Web Layer Cake
Embeddings
7/2/19 Heiko Paulheim 35
Towards Semantic Vector Space Embeddings
cartoon
superhero
Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
7/2/19 Heiko Paulheim 36
The Holy Grail
• Combine semantics and embeddings
– e.g., directly create meaningful dimensions
– e.g., learn interpretation of dimensions a posteriori
– ...
7/2/19 Heiko Paulheim 37
A New Design Space
quantitative
performance
semantic
interpretability
7/2/19 Heiko Paulheim 38
Software to Check Out
• http://openke.thunlp.org/
– Implements many embedding approaches
– Pre-trained vectors available, e.g., for Wikidata
7/2/19 Heiko Paulheim 39
Software to Check Out
• Loading RDF in Python: https://github.com/RDFLib/rdflib
7/2/19 Heiko Paulheim 40
RapidMiner Linked Open Data Extension
caution: works
only until RM6! :-(
7/2/19 Heiko Paulheim 41
References (1)
• Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific american, 284(5),
28-37.
• Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating
embeddings for modeling multi-relational data. In NIPS (pp. 2787-2795).
• Cochez, M., Ristoski, P., Ponzetto, S. P., & Paulheim, H. (2017). Biased graph walks for RDF
graph embeddings. In WIMS (p. 21). ACM.
• Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805.
• Melo, A., Völker, J., & Paulheim, H. (2017). Type prediction in noisy RDF knowledge bases using
hierarchical multilabel classification with graph and latent features. IJAIT, 26(02).
• Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781.
• Paulheim, H., & Fümkranz, J. (2012). Unsupervised generation of data mining features from
linked open data. In WIMS (p. 31). ACM.
• Paulheim, H., & Bizer, C. (2013). Type inference on noisy RDF data. In International semantic
web conference (pp. 510-525). Springer, Berlin, Heidelberg.
7/2/19 Heiko Paulheim 42
References (2)
• Paulheim, H., & Bizer, C. (2014). Improving the quality of linked data using statistical
distributions. IJSWIS, 10(2), 63-86.
• Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation
methods. Semantic web, 8(3), 489-508.
• Paulheim, H. (2018). Make Embeddings Semantic Again! ISWC (Blue Sky Track)
• Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018).
Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
• Ristoski, P., & Paulheim, H. (2014). A comparison of propositionalization strategies for creating
features from linked open data. Linked Data for Knowledge Discovery, 6.
• Ristoski, P., Bizer, C., & Paulheim, H. (2015). Mining the web of linked data with rapidminer. Web
Semantics: Science, Services and Agents on the World Wide Web, 35, 142-151.
• Ristoski, P., & Paulheim, H. (2016). Semantic Web in data mining and knowledge discovery: A
comprehensive survey. Web semantics, 36, 1-22.
• Ristoski, P., & Paulheim, H. (2016). RDF2vec: RDF graph embeddings for data mining. In
International Semantic Web Conference (pp. 498-514). Springer, Cham.
• Ristoski, P., Rosati, J., Di Noia, T., De Leone, R., & Paulheim, H. (2019). RDF2Vec: RDF graph
embeddings and their applications. Semantic Web, 10(4), 1-32.
7/2/19 Heiko Paulheim 43
Machine Learning & Embeddings
for Large Knowledge Graphs
Heiko Paulheim

More Related Content

What's hot

Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix ScaleAish Fenton
 
The Knowledge Graph Explosion
The Knowledge Graph ExplosionThe Knowledge Graph Explosion
The Knowledge Graph ExplosionNeo4j
 
Recommendations at Zillow
Recommendations at ZillowRecommendations at Zillow
Recommendations at Zillownjstevens
 
The three layers of a knowledge graph and what it means for authoring, storag...
The three layers of a knowledge graph and what it means for authoring, storag...The three layers of a knowledge graph and what it means for authoring, storag...
The three layers of a knowledge graph and what it means for authoring, storag...Neo4j
 
ML Infrastracture @ Dropbox
ML Infrastracture @ Dropbox ML Infrastracture @ Dropbox
ML Infrastracture @ Dropbox Tsahi Glik
 
Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Mounia Lalmas-Roelleke
 
Graphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your DataGraphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your DataNeo4j
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at SpotifyErik Bernhardsson
 
Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2...
Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2...Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2...
Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2...Neo4j
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureKai Wähner
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
LogMap: Logic-based and Scalable Ontology Matching
LogMap: Logic-based and Scalable Ontology MatchingLogMap: Logic-based and Scalable Ontology Matching
LogMap: Logic-based and Scalable Ontology MatchingErnesto Jimenez Ruiz
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsNeo4j
 
Deep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptxDeep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptxTomazBratanic1
 
Workshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data ScienceWorkshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data ScienceNeo4j
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyNeville Li
 
Neo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 

What's hot (20)

Machine Learning at Netflix Scale
Machine Learning at Netflix ScaleMachine Learning at Netflix Scale
Machine Learning at Netflix Scale
 
The Knowledge Graph Explosion
The Knowledge Graph ExplosionThe Knowledge Graph Explosion
The Knowledge Graph Explosion
 
Recommendations at Zillow
Recommendations at ZillowRecommendations at Zillow
Recommendations at Zillow
 
The three layers of a knowledge graph and what it means for authoring, storag...
The three layers of a knowledge graph and what it means for authoring, storag...The three layers of a knowledge graph and what it means for authoring, storag...
The three layers of a knowledge graph and what it means for authoring, storag...
 
ML Infrastracture @ Dropbox
ML Infrastracture @ Dropbox ML Infrastracture @ Dropbox
ML Infrastracture @ Dropbox
 
Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)
 
Recommending and searching @ Spotify
Recommending and searching @ SpotifyRecommending and searching @ Spotify
Recommending and searching @ Spotify
 
Graphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your DataGraphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your Data
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at Spotify
 
Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2...
Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2...Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2...
Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2...
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
LogMap: Logic-based and Scalable Ontology Matching
LogMap: Logic-based and Scalable Ontology MatchingLogMap: Logic-based and Scalable Ontology Matching
LogMap: Logic-based and Scalable Ontology Matching
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph Algorithms
 
Deep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptxDeep dive into LangChain integration with Neo4j.pptx
Deep dive into LangChain integration with Neo4j.pptx
 
Workshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data ScienceWorkshop Tel Aviv - Graph Data Science
Workshop Tel Aviv - Graph Data Science
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
Neo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time Analytics
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 

Similar to Machine Learning & Embeddings for Large Knowledge Graphs

New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Heiko Paulheim
 
Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Shenghui Wang
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSHarsh Thakkar
 
Exploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data MiningExploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data MiningHeiko Paulheim
 
BDS14 Big Data Analytics to the masses
BDS14 Big Data Analytics to the massesBDS14 Big Data Analytics to the masses
BDS14 Big Data Analytics to the massesJose Luis Lopez Pino
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesPetar Ristoski
 
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkGezim Sejdiu
 
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfSparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfHarsh Thakkar
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsHeiko Paulheim
 
Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Doug Needham
 
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014James Powell
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?Samet KILICTAS
 
The Hitchhiker's Guide to Machine Learning with Python & Apache Spark
The Hitchhiker's Guide to Machine Learning with Python & Apache SparkThe Hitchhiker's Guide to Machine Learning with Python & Apache Spark
The Hitchhiker's Guide to Machine Learning with Python & Apache SparkKrishna Sankar
 
Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Julien PLU
 
Linked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryLinked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryRuben Schalk
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczIoan Toma
 
Data-mining the Semantic Web @TCD
Data-mining the Semantic Web @TCDData-mining the Semantic Web @TCD
Data-mining the Semantic Web @TCDFrank Lynam
 

Similar to Machine Learning & Embeddings for Large Knowledge Graphs (20)

New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
 
Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
 
Exploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data MiningExploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data Mining
 
BDS14 Big Data Analytics to the masses
BDS14 Big Data Analytics to the massesBDS14 Big Data Analytics to the masses
BDS14 Big Data Analytics to the masses
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spaces
 
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
 
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfSparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
 
Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview. Social Network Analysis Introduction including Data Structure Graph overview.
Social Network Analysis Introduction including Data Structure Graph overview.
 
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?
 
The Hitchhiker's Guide to Machine Learning with Python & Apache Spark
The Hitchhiker's Guide to Machine Learning with Python & Apache SparkThe Hitchhiker's Guide to Machine Learning with Python & Apache Spark
The Hitchhiker's Guide to Machine Learning with Python & Apache Spark
 
Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?
 
Linked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryLinked Open Data Utrecht University Library
Linked Open Data Utrecht University Library
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
 
Data-mining the Semantic Web @TCD
Data-mining the Semantic Web @TCDData-mining the Semantic Web @TCD
Data-mining the Semantic Web @TCD
 

More from Heiko Paulheim

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...Heiko Paulheim
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsHeiko Paulheim
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsHeiko Paulheim
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph BlockHeiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Heiko Paulheim
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphHeiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Heiko Paulheim
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Heiko Paulheim
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterHeiko Paulheim
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingHeiko Paulheim
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the WebHeiko Paulheim
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyHeiko Paulheim
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine LearningHeiko Paulheim
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopHeiko Paulheim
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionHeiko Paulheim
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesHeiko Paulheim
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataHeiko Paulheim
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge DiscoveryHeiko Paulheim
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerHeiko Paulheim
 

More from Heiko Paulheim (20)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly Detection
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge Discovery
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMiner
 

Recently uploaded

Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 

Recently uploaded (20)

Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 

Machine Learning & Embeddings for Large Knowledge Graphs

  • 1. 7/2/19 Heiko Paulheim 1 Machine Learning & Embeddings for Large Knowledge Graphs Heiko Paulheim
  • 2. 7/2/19 Heiko Paulheim 2 Crossing the Bridge from the Other Side
  • 3. 7/2/19 Heiko Paulheim 3 Crossing the Bridge from the Other Side • There are plenty of established ML and DM toolkits... – Weka – RapidMiner – scikit-learn – R • ...implementing all your favorite algorithms... – Naive Bayes – Random Forests – SVMs – (Deep) Neural Networks – ... • ...but they all work on feature vectors, not graphs!
  • 4. 7/2/19 Heiko Paulheim 4 Typical Tasks • Knowledge Graph Internal – Type prediction – Link prediction – Link validation • Knowledge Graph External – i.e., using the KG as background knowledge in some other task – e.g., content-based recommender systems – e.g., predictive modeling ● who is the next nobel prize winner? Gao et al.: Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics. Scientific Programming, 2014 Xu et al.: Explainable Reasoning over Knowledge Graphs for Recommendation. ebay tech blog, 2019
  • 5. 7/2/19 Heiko Paulheim 5 Example: Knowledge Graph Internal • Type prediction – Many instances in KGs are not typed or have very abstract types – e.g., many actors are just typed as persons • Classic approach – Exploit ontology – Shown to be rather sensitive to noise • Example: ontology-based typing of Germany in DBpedia – Airport, Award, Building, City, Country, Ethnic Group, Genre, Language, Military Conflict, Mountain, Mountain Range, Person Function, Place, Populated Place, Race, Route of Transportation, Settlement, Stadium, Wine Region Paulheim & Bizer: Type Inference on Noisy RDF Data. ISWC, 2013 Melo et al.: Type Prediction in Noisy RDF Knowledge Bases using Hierarchical Multilabel Classification with Graph and Latent Features. IJAIT, 2017
  • 6. 7/2/19 Heiko Paulheim 6 Example: Knowledge Graph Internal • Alternative: learn model for type prediction – Train classifier to predict types (binary or hierarchical) – More noise tolerant Paulheim & Bizer: Improving the quality of linked data using statistical distributions. IJSWIS, 2014
  • 7. 7/2/19 Heiko Paulheim 7 Example: Knowledge Graph External • Example machine learning task: predicting book sales ISBN City Sold 3-2347-3427-1 Darmstadt 124 3-43784-324-2 Mannheim 493 3-145-34587-0 Roßdorf 14 ... ISBN City Population ... Genre Publisher ... Sold 3-2347-3427-1 Darm- stadt 144402 ... Crime Bloody Books ... 124 3-43784-324-2 Mann- heim 291458 … Crime Guns Ltd. … 493 3-145-34587-0 Roß- dorf 12019 ... Travel Up&Away ... 14 ... → Crime novels sell better in larger cities Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  • 8. 7/2/19 Heiko Paulheim 8 Example: The FeGeLOD Framework IS B N 3 -2 3 4 7 -3 4 2 7 -1 C ity D a r m s ta d t # s o ld 1 2 4 N a m e d E n t it y R e c o g n it io n IS B N 3 -2 3 4 7 -3 4 2 7 - 1 C ity D a r m s ta d t # s o ld 1 2 4 C ity _ U R I h ttp : / / d b p e d ia .o r g / r e s o u r c e/ D a r m s ta d t F e a t u r e G e n e r a t io n IS B N 3 - 2 3 4 7 -3 4 2 7 -1 C ity D a r m s ta d t # s o ld 1 2 4 C ity _ U R I h ttp : / / d b p e d ia .o r g / r e s o u r c e / D a r m s ta d t C ity _ U R I_ d b p e d ia -o w l: p o p u la tio n T o ta l 1 4 1 4 7 1 C ity _ U R I_ ... ... F e a t u r e S e le c t io n IS B N 3 -2 3 4 7 -3 4 2 7 - 1 C ity D a r m s ta d t # s o ld 1 2 4 C ity _ U R I h ttp : / / d b p e d ia .o r g / r e s o u r c e/ D a r m s ta d t C ity _ U R I_ d b p e d ia -o w l:p o p u la tio n T o ta l 1 4 1 4 7 1 Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  • 9. 7/2/19 Heiko Paulheim 9 The FeGeLOD Framework • Entity Recognition – Simple approach: guess DBpedia URIs – Hit rate >95% for cities and countries (by English name) • Feature Generation – augmenting the dataset with additional attributes from KG • Feature Selection – Filter noise: >95% unknown, identical, or different nominals Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  • 10. 7/2/19 Heiko Paulheim 10 Propositionalization • Bridge Problem: Knowledge Graphs vs. ML algorithms expecting Feature Vectors → wanted: a transformation from nodes to sets of features ? Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
  • 11. 7/2/19 Heiko Paulheim 11 Propositionalization • Bridge Problem: Knowledge Graphs vs. ML algorithms expecting Feature Vectors → wanted: a transformation from nodes to sets of features • Basic strategies: – literal values (e.g., population) are used directly – instance types become binary features – relations are counted (absolute, relative, TF-IDF) – combinations of relations and object types are counted (absolute, relative, TF-IDF) – ... Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
  • 12. 7/2/19 Heiko Paulheim 12 Propositionalization ctd. • Observations – Even simple features (e.g., add all numbers and types) can help on many problems – More sophisticated features often bring additional improvements ● Combinations of relations and individuals – e.g., movies directed by Steven Spielberg ● Combinations of relations and types – e.g., movies directed by Oscar-winning directors ● … – But ● The search space is enormous! ● Generate first, filter later does not scale well Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
  • 13. 7/2/19 Heiko Paulheim 13 From Naive Propositionalization to Knowledge Graph Embeddings • Reconsidering the previous examples: – We want to predict some attribute of a KG entity ● e.g., types ● e.g., sales figures of books – ...given the entity’s vector representation • How do we get a “good” vector representation for an entity? – ...and: what is “good” in the first place?
  • 14. 7/2/19 Heiko Paulheim 14 From Naive Propositionalization to Knowledge Graph Embeddings • How do we get a “good” vector representation for an entity? – ...and: what is “good” in the first place? • “good” for machine learning means separable – similar entities are close together – different entities are further away https://appliedmachinelearning.blog/2017/03/09/understanding-support-vector-machines-a-primer/
  • 15. 7/2/19 Heiko Paulheim 15 A Brief Excursion to word2vec • A vector space model for words • Introduced in 2013 • Each word becomes a vector – similar words are close – relations are preserved – vector arithmetics are possible https://www.adityathakker.com/introduction-to-word2vec-how-it-works/
  • 16. 7/2/19 Heiko Paulheim 16 A Brief Excursion to word2vec • Assumption: – Similar words appear in similar contexts {Bush,Obama,Trump} was elected president of the United States United States president {Bush,Obama,Trump} announced… … • Idea – Train a network that can predict a word from its context (CBOW) or the context from a word (Skip Gram) Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
  • 17. 7/2/19 Heiko Paulheim 17 A Brief Excursion to word2vec • Skip Gram: train a neural network with one hidden layer • Use output values at hidden layer as vector representation • Observation: – Bush, Obama, Trump will activate similar context words – i.e., their output weights at the projection layer have to be similar Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
  • 18. 7/2/19 Heiko Paulheim 18 From word2vec to RDF2vec • Word2vec operates on sentences, i.e., sequences of words • Idea of RDF2vec – First extract “sentences” from a graph – Then train embedding using RDF2vec • “Sentences” are extracted by performing random graph walks: Year Zero Nine Inch Nails Trent Reznor • Experiments – RDF2vec can be trained on large KGs (DBpedia, Wikidata) – 300-500 dimensional vectors outperform other propositionalization strategies artist member Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
  • 19. 7/2/19 Heiko Paulheim 19 From word2vec to RDF2vec • RDF2vec example – similar instances form clusters – direction of relations is stable Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
  • 20. 7/2/19 Heiko Paulheim 20 From word2vec to RDF2vec • RecSys example: using proximity in latent RDF2vec feature space Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
  • 21. 7/2/19 Heiko Paulheim 21 Extensions of RDF2vec • Maybe random walks are not such a good idea – They may give too much weight on less-known entities and facts ● Strategies: – Prefer edges with more frequent predicates – Prefer nodes with higher indegree – Prefer nodes with higher PageRank – … – They may cover less-known entities and facts too little ● Strategies: – The opposite of all of the above strategies • Bottom line of experimental evaluation: – Not one strategy fits all Cochez et al.: Biased Graph Walks for RDF Graph Embeddings. WIMS, 2017
  • 22. 7/2/19 Heiko Paulheim 22 Other Word Embedding Methods • GloVe (Global Word Embedding Vectors) • Computes embeddings out of co-occurence statistics – Using matrix factorization • Has been applied to random RDF walks as well • Experimental evaluation: – In some cases, RDFGloVe outperforms RDF2vec https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data- glove.html Cochez et al.: Global RDF Vector Space Embeddings, ISWC, 2017
  • 23. 7/2/19 Heiko Paulheim 23 Other Word Embedding Methods • There is a lot of promising stuff not yet tried – e.g., biasing walks based on human factors – e.g., more recent word embedding methods such as ELMo and BERT https://www.nbcnews.com/feature/nbc-out/bert-ernie-are-gay-couple-sesame-street-writer-claims-n910701
  • 24. 7/2/19 Heiko Paulheim 24 TransE and its Descendants • In RDF2vec, relation preservation is a by-product • TransE: direct modeling – Formulates RDF embedding as an optimization problem – Find mapping of entities and relations to Rn so that ● across all triples <s,p,o> Σ ||s+p-o|| is minimized ● try to obtain a smaller error for existing triples than for non-existing ones Bordes et al: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013. Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete Repositories. WI 2016
  • 25. 7/2/19 Heiko Paulheim 25 Limitations of TransE • Symmetric properties – we have to minimize ||Barack + spouse – Michelle|| and ||Michelle + spouse – Barack|| simultaneously – ideally, Barack + spouse = Michelle and Michelle + spouse = Barack ● Michelle and Barack become infinitely close ● spouse becomes 0 vector Michelle Barack
  • 26. 7/2/19 Heiko Paulheim 26 Limitations of TransE • Transitive Properties – we have to minimize ||Miami + partOf – Florida|| and ||Florida + partOf – USA||, but also ||Miami + partOf – USA|| – ideally, Miami + partOf = Florida, Florida + partOf = USA, Miami + partOf = USA ● Again: all three become infinitely close ● partOf becomes 0 vector Florida Miami USA
  • 27. 7/2/19 Heiko Paulheim 27 Limitations of TransE • One to many properties – we have to minimize ||New York + partOf – USA||, ||Florida + partOf – USA||, ||Ohio + partOf – USA||, … – ideally, NewYork + partOf = USA, Florida + partOf = USA, Ohio + partOf = USA ● all the subjects become infinitely close Florida USA New York Ohio
  • 28. 7/2/19 Heiko Paulheim 28 Limitations of TransE • Reflexive properties – we have to minimize ||Tom + knows - Tom|| – ideally, Tom + knows = Tom ● Knows becomes 0 vector Tom
  • 29. 7/2/19 Heiko Paulheim 29 TransE RDF2Vec HolE DistMult RESCAL NTN TransR TransH TransD KG2E ComplEx Limitations of TransE • Numerous variants of TransE have been proposed to overcome limitations (e.g., TransH, TransR, TransD, …) • Plus: embedding approaches based on tensor factorization etc.
  • 30. 7/2/19 Heiko Paulheim 30 Are we Driving on the Wrong Side of the Road?
  • 31. 7/2/19 Heiko Paulheim 31 Are we Driving on the Wrong Side of the Road? • Original ideas: – Assign meaning to data – Allow for machine inference – Explain inference results to the user Berners-Lee et al: The Semantic Web. Scientific American, May 2001
  • 32. 7/2/19 Heiko Paulheim 32 Running Example: Recommender Systems • Content based recommender systems backed by Semantic Web data – (today: knowledge graphs) • Advantages – use rich background information about recommended items (for free) – justifications can be generated (e.g., you like movies by that director) https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and-matrix-factorization-in-python/
  • 33. 7/2/19 Heiko Paulheim 33 The 2009 Semantic Web Layer Cake
  • 34. 7/2/19 Heiko Paulheim 34 The 2019 Semantic Web Layer Cake Embeddings
  • 35. 7/2/19 Heiko Paulheim 35 Towards Semantic Vector Space Embeddings cartoon superhero Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
  • 36. 7/2/19 Heiko Paulheim 36 The Holy Grail • Combine semantics and embeddings – e.g., directly create meaningful dimensions – e.g., learn interpretation of dimensions a posteriori – ...
  • 37. 7/2/19 Heiko Paulheim 37 A New Design Space quantitative performance semantic interpretability
  • 38. 7/2/19 Heiko Paulheim 38 Software to Check Out • http://openke.thunlp.org/ – Implements many embedding approaches – Pre-trained vectors available, e.g., for Wikidata
  • 39. 7/2/19 Heiko Paulheim 39 Software to Check Out • Loading RDF in Python: https://github.com/RDFLib/rdflib
  • 40. 7/2/19 Heiko Paulheim 40 RapidMiner Linked Open Data Extension caution: works only until RM6! :-(
  • 41. 7/2/19 Heiko Paulheim 41 References (1) • Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific american, 284(5), 28-37. • Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In NIPS (pp. 2787-2795). • Cochez, M., Ristoski, P., Ponzetto, S. P., & Paulheim, H. (2017). Biased graph walks for RDF graph embeddings. In WIMS (p. 21). ACM. • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. • Melo, A., Völker, J., & Paulheim, H. (2017). Type prediction in noisy RDF knowledge bases using hierarchical multilabel classification with graph and latent features. IJAIT, 26(02). • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. • Paulheim, H., & Fümkranz, J. (2012). Unsupervised generation of data mining features from linked open data. In WIMS (p. 31). ACM. • Paulheim, H., & Bizer, C. (2013). Type inference on noisy RDF data. In International semantic web conference (pp. 510-525). Springer, Berlin, Heidelberg.
  • 42. 7/2/19 Heiko Paulheim 42 References (2) • Paulheim, H., & Bizer, C. (2014). Improving the quality of linked data using statistical distributions. IJSWIS, 10(2), 63-86. • Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic web, 8(3), 489-508. • Paulheim, H. (2018). Make Embeddings Semantic Again! ISWC (Blue Sky Track) • Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365. • Ristoski, P., & Paulheim, H. (2014). A comparison of propositionalization strategies for creating features from linked open data. Linked Data for Knowledge Discovery, 6. • Ristoski, P., Bizer, C., & Paulheim, H. (2015). Mining the web of linked data with rapidminer. Web Semantics: Science, Services and Agents on the World Wide Web, 35, 142-151. • Ristoski, P., & Paulheim, H. (2016). Semantic Web in data mining and knowledge discovery: A comprehensive survey. Web semantics, 36, 1-22. • Ristoski, P., & Paulheim, H. (2016). RDF2vec: RDF graph embeddings for data mining. In International Semantic Web Conference (pp. 498-514). Springer, Cham. • Ristoski, P., Rosati, J., Di Noia, T., De Leone, R., & Paulheim, H. (2019). RDF2Vec: RDF graph embeddings and their applications. Semantic Web, 10(4), 1-32.
  • 43. 7/2/19 Heiko Paulheim 43 Machine Learning & Embeddings for Large Knowledge Graphs Heiko Paulheim