Machine Learning & Embeddings for Large Knowledge Graphs

Slides for my summer school talk on ML and Embeddings for Knowledge Graphs

  1. Machine Learning & Embeddings for Large Knowledge Graphs (Heiko Paulheim, 7/2/19)
  2. Crossing the Bridge from the Other Side
  3. Crossing the Bridge from the Other Side
     • There are plenty of established ML and DM toolkits...
       – Weka
       – RapidMiner
       – scikit-learn
       – R
     • ...implementing all your favorite algorithms...
       – Naive Bayes
       – Random Forests
       – SVMs
       – (Deep) Neural Networks
       – ...
     • ...but they all work on feature vectors, not graphs!
  4. Typical Tasks
     • Knowledge Graph Internal
       – Type prediction
       – Link prediction
       – Link validation
     • Knowledge Graph External
       – i.e., using the KG as background knowledge in some other task
       – e.g., content-based recommender systems
       – e.g., predictive modeling
         ● Who is the next Nobel Prize winner?
     Gao et al.: Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics. Scientific Programming, 2014
     Xu et al.: Explainable Reasoning over Knowledge Graphs for Recommendation. eBay Tech Blog, 2019
  5. Example: Knowledge Graph Internal
     • Type prediction
       – Many instances in KGs are not typed or have very abstract types
       – e.g., many actors are just typed as persons
     • Classic approach
       – Exploit ontology
       – Shown to be rather sensitive to noise
     • Example: ontology-based typing of Germany in DBpedia
       – Airport, Award, Building, City, Country, Ethnic Group, Genre, Language, Military Conflict, Mountain, Mountain Range, Person Function, Place, Populated Place, Race, Route of Transportation, Settlement, Stadium, Wine Region
     Paulheim & Bizer: Type Inference on Noisy RDF Data. ISWC, 2013
     Melo et al.: Type Prediction in Noisy RDF Knowledge Bases using Hierarchical Multilabel Classification with Graph and Latent Features. IJAIT, 2017
  6. Example: Knowledge Graph Internal
     • Alternative: learn a model for type prediction
       – Train a classifier to predict types (binary or hierarchical)
       – More noise tolerant
     Paulheim & Bizer: Improving the quality of linked data using statistical distributions. IJSWIS, 2014
  7. Example: Knowledge Graph External
     • Example machine learning task: predicting book sales
     Original data:
       ISBN          | City      | Sold
       3-2347-3427-1 | Darmstadt | 124
       3-43784-324-2 | Mannheim  | 493
       3-145-34587-0 | Roßdorf   | 14
       ...
     Data augmented with attributes from the knowledge graph:
       ISBN          | City      | Population | ... | Genre  | Publisher    | ... | Sold
       3-2347-3427-1 | Darmstadt | 144402     | ... | Crime  | Bloody Books | ... | 124
       3-43784-324-2 | Mannheim  | 291458     | …   | Crime  | Guns Ltd.    | …   | 493
       3-145-34587-0 | Roßdorf   | 12019      | ... | Travel | Up&Away      | ... | 14
       ...
     → Crime novels sell better in larger cities
     Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  8. Example: The FeGeLOD Framework
     [Figure: the FeGeLOD pipeline applied to one example row]
       – Input: ISBN 3-2347-3427-1, City Darmstadt, # sold 124
       – Named Entity Recognition adds City_URI = http://dbpedia.org/resource/Darmstadt
       – Feature Generation adds City_URI_dbpedia-owl:populationTotal = 141471, City_URI_... = ...
       – Feature Selection keeps City_URI_dbpedia-owl:populationTotal = 141471
     Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
  9. The FeGeLOD Framework
     • Entity Recognition
       – Simple approach: guess DBpedia URIs
       – Hit rate >95% for cities and countries (by English name)
     • Feature Generation
       – Augment the dataset with additional attributes from the KG
     • Feature Selection
       – Filter noise: features with >95% unknown, identical, or different nominal values
     Paulheim & Fürnkranz: Unsupervised Generation of Data Mining Features from Linked Open Data. WIMS, 2012
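The three FeGeLOD steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's actual code: the URI-guessing rule (prefix plus name with underscores), the in-memory `toy_kg` dictionary standing in for DBpedia, and all function names are hypothetical. The rows and population figures are the ones from the book sales example.

```python
from collections import Counter

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

def guess_dbpedia_uri(name):
    """Entity recognition by guessing: build a DBpedia-style URI from an English name."""
    return DBPEDIA_PREFIX + name.strip().replace(" ", "_")

def generate_features(rows, kg, key="City"):
    """Feature generation: append the KG attributes of the linked entity to each row."""
    out = []
    for row in rows:
        uri = guess_dbpedia_uri(row[key])
        new_row = dict(row)
        new_row[key + "_URI"] = uri
        for attr, value in kg.get(uri, {}).items():
            new_row[key + "_URI_" + attr] = value
        out.append(new_row)
    return out

def select_features(rows, threshold=0.95):
    """Feature selection: drop columns that are mostly unknown, mostly identical,
    or (for string-valued columns) mostly different."""
    n = len(rows)
    columns = sorted({c for row in rows for c in row})
    keep = []
    for col in columns:
        known = [row[col] for row in rows if row.get(col) is not None]
        if 1 - len(known) / n > threshold:
            continue  # mostly unknown
        if Counter(known).most_common(1)[0][1] / n > threshold:
            continue  # mostly identical
        if all(isinstance(v, str) for v in known) and len(set(known)) / n > threshold:
            continue  # nominal column whose values are almost all different
        keep.append(col)
    return keep

# Toy stand-in for the knowledge graph (hypothetical, not a DBpedia client).
toy_kg = {
    guess_dbpedia_uri("Darmstadt"): {"populationTotal": 144402},
    guess_dbpedia_uri("Mannheim"): {"populationTotal": 291458},
    guess_dbpedia_uri("Roßdorf"): {"populationTotal": 12019},
}
rows = [
    {"ISBN": "3-2347-3427-1", "City": "Darmstadt", "Sold": 124},
    {"ISBN": "3-43784-324-2", "City": "Mannheim", "Sold": 493},
    {"ISBN": "3-145-34587-0", "City": "Roßdorf", "Sold": 14},
]
augmented = generate_features(rows, toy_kg)
selected = select_features(augmented)
print(selected)
```

On this toy table the identifier-like string columns (ISBN, city name, URI) are filtered out because every value is different, while the numeric population feature survives.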
  10. Propositionalization
     • Bridge Problem: Knowledge Graphs vs. ML algorithms expecting Feature Vectors
       → wanted: a transformation from nodes to sets of features
     Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
  11. Propositionalization
     • Bridge Problem: Knowledge Graphs vs. ML algorithms expecting Feature Vectors
       → wanted: a transformation from nodes to sets of features
     • Basic strategies:
       – literal values (e.g., population) are used directly
       – instance types become binary features
       – relations are counted (absolute, relative, TF-IDF)
       – combinations of relations and object types are counted (absolute, relative, TF-IDF)
       – ...
     Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
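A minimal sketch of the first three basic strategies, assuming the graph is available as a plain list of (subject, predicate, object) tuples. The function name, the feature naming scheme, and the toy triples are illustrative assumptions; TF-IDF weighting and relation/type combinations are left out.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

def propositionalize(entity, triples):
    """Turn the outgoing triples of one entity into a flat feature dictionary."""
    features = {}
    relation_counts = Counter()
    for s, p, o in triples:
        if s != entity:
            continue
        if p == RDF_TYPE:
            features["type=" + o] = 1        # instance types become binary features
        elif isinstance(o, (int, float)):
            features[p] = o                  # literal values are used directly
        else:
            relation_counts[p] += 1          # relations are counted
    total = sum(relation_counts.values())
    for p, count in relation_counts.items():
        features["count(" + p + ")"] = count          # absolute count
        features["rel(" + p + ")"] = count / total    # relative count
    return features

# Toy triples for illustration only.
toy_triples = [
    ("Darmstadt", RDF_TYPE, "City"),
    ("Darmstadt", "populationTotal", 144402),
    ("Darmstadt", "twinTown", "Alkmaar"),
    ("Darmstadt", "twinTown", "Brescia"),
    ("Darmstadt", "country", "Germany"),
    ("Mannheim", RDF_TYPE, "City"),
]
features = propositionalize("Darmstadt", toy_triples)
print(features)
```

Every entity then gets a dictionary of named features, which any of the toolkits mentioned earlier can consume once the dictionaries are aligned into a table.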
  12. Propositionalization ctd.
     • Observations
       – Even simple features (e.g., add all numbers and types) can help on many problems
       – More sophisticated features often bring additional improvements
         ● Combinations of relations and individuals, e.g., movies directed by Steven Spielberg
         ● Combinations of relations and types, e.g., movies directed by Oscar-winning directors
         ● …
       – But
         ● The search space is enormous!
         ● "Generate first, filter later" does not scale well
     Ristoski & Paulheim: A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data. LD4KD, 2014
  13. From Naive Propositionalization to Knowledge Graph Embeddings
     • Reconsidering the previous examples:
       – We want to predict some attribute of a KG entity
         ● e.g., types
         ● e.g., sales figures of books
       – ...given the entity’s vector representation
     • How do we get a “good” vector representation for an entity?
       – ...and: what is “good” in the first place?
  14. From Naive Propositionalization to Knowledge Graph Embeddings
     • How do we get a “good” vector representation for an entity?
       – ...and: what is “good” in the first place?
     • “good” for machine learning means separable
       – similar entities are close together
       – different entities are further away
     https://appliedmachinelearning.blog/2017/03/09/understanding-support-vector-machines-a-primer/
  15. A Brief Excursion to word2vec
     • A vector space model for words
     • Introduced in 2013
     • Each word becomes a vector
       – similar words are close
       – relations are preserved
       – vector arithmetic is possible
     https://www.adityathakker.com/introduction-to-word2vec-how-it-works/
  16. A Brief Excursion to word2vec
     • Assumption:
       – Similar words appear in similar contexts
         ● {Bush, Obama, Trump} was elected president of the United States
         ● United States president {Bush, Obama, Trump} announced…
         ● …
     • Idea
       – Train a network that can predict a word from its context (CBOW) or the context from a word (Skip Gram)
     Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
  17. A Brief Excursion to word2vec
     • Skip Gram: train a neural network with one hidden layer
     • Use output values at hidden layer as vector representation
     • Observation:
       – Bush, Obama, Trump will activate similar context words
       – i.e., their output weights at the projection layer have to be similar
     Mikolov et al.: Efficient Estimation of Word Representations in Vector Space. 2013
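To make the "similar contexts" observation concrete, the sketch below generates the (target, context) training pairs that a Skip Gram model would be trained on. It does not train a network; the window size, helper names, and shortened example sentences are assumptions chosen for illustration.

```python
def skipgram_pairs(sentence, window=2):
    """Return all (target, context) pairs within the given window."""
    pairs = []
    for i, target in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sentence[j]))
    return pairs

def contexts_of(word, sentences, window=2):
    """Collect the set of context words a target word is paired with."""
    return {c for s in sentences for t, c in skipgram_pairs(s, window) if t == word}

# Shortened versions of the example sentences.
sentences = [
    "Bush was elected president".split(),
    "Obama was elected president".split(),
    "Trump was elected president".split(),
]
for name in ("Bush", "Obama", "Trump"):
    print(name, sorted(contexts_of(name, sentences)))
```

All three names are paired with the same context words, which is why a network trained on these pairs is pushed toward similar hidden-layer values for them.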
  18. From word2vec to RDF2vec
     • Word2vec operates on sentences, i.e., sequences of words
     • Idea of RDF2vec
       – First extract “sentences” from a graph
       – Then train the embedding using word2vec
     • “Sentences” are extracted by performing random graph walks:
       Year Zero → artist → Nine Inch Nails → member → Trent Reznor
     • Experiments
       – RDF2vec can be trained on large KGs (DBpedia, Wikidata)
       – 300-500 dimensional vectors outperform other propositionalization strategies
     Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
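The walk extraction step can be sketched as follows, assuming the graph is stored as an adjacency dictionary mapping each node to its outgoing (predicate, object) edges. The function name, parameters, and data layout are illustrative assumptions, not the RDF2vec reference implementation.

```python
import random

def random_walks(graph, start, depth, n_walks, seed=0):
    """Extract n_walks random walks of up to `depth` hops starting at `start`.

    Each walk alternates nodes and predicates, so it reads like a sentence.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = graph.get(node, [])
            if not edges:
                break  # dead end: stop this walk early
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.append(walk)
    return walks

# The example from the slide as a tiny graph.
toy_graph = {
    "Year Zero": [("artist", "Nine Inch Nails")],
    "Nine Inch Nails": [("member", "Trent Reznor")],
}
walks = random_walks(toy_graph, "Year Zero", depth=2, n_walks=1)
print(walks)
```

The resulting token sequences can then be handed to any word2vec implementation in place of natural-language sentences.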
  19. From word2vec to RDF2vec
     • RDF2vec example
       – similar instances form clusters
       – direction of relations is stable
     Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
  20. From word2vec to RDF2vec
     • RecSys example: using proximity in latent RDF2vec feature space
     Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
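One simple way to use proximity in an embedding space for recommendation is to rank items by cosine similarity to an item the user liked. The sketch below assumes that reading; the two-dimensional vectors and item names are made up for illustration and are not real RDF2vec output.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(liked, embeddings, k=2):
    """Return the k items whose vectors are closest to the liked item's vector."""
    scores = {
        item: cosine(embeddings[liked], vec)
        for item, vec in embeddings.items()
        if item != liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical embedding vectors.
toy_embeddings = {
    "movie_a": [1.0, 0.1],
    "movie_b": [0.9, 0.2],
    "movie_c": [0.0, 1.0],
}
print(recommend("movie_a", toy_embeddings, k=1))
```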
  21. Extensions of RDF2vec
     • Maybe random walks are not such a good idea
       – They may give too much weight to less-known entities and facts
         ● Strategies:
           – Prefer edges with more frequent predicates
           – Prefer nodes with higher indegree
           – Prefer nodes with higher PageRank
           – …
       – They may cover less-known entities and facts too little
         ● Strategies:
           – The opposite of all of the above strategies
     • Bottom line of experimental evaluation:
       – Not one strategy fits all
     Cochez et al.: Biased Graph Walks for RDF Graph Embeddings. WIMS, 2017
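As a hedged illustration of one of these strategies, the sketch below biases a single walk step by predicate frequency, and by inverse frequency for the "opposite" variant. The weighting scheme, function names, and toy graph are assumptions for illustration and do not reproduce the exact weights used by Cochez et al.

```python
import random
from collections import Counter

def predicate_frequencies(graph):
    """Count how often each predicate occurs in the whole graph."""
    return Counter(p for edges in graph.values() for p, _ in edges)

def biased_step(graph, node, freq, rng, inverse=False):
    """Pick one outgoing edge, weighted by predicate frequency.

    With inverse=True, rare predicates are preferred instead.
    Returns None if the node has no outgoing edges.
    """
    edges = graph.get(node, [])
    if not edges:
        return None
    weights = [freq[p] for p, _ in edges]
    if inverse:
        weights = [1.0 / w for w in weights]
    return rng.choices(edges, weights=weights, k=1)[0]

# Toy graph: "common" occurs three times, "rare" once.
toy_graph = {
    "A": [("common", "B"), ("rare", "C")],
    "B": [("common", "D")],
    "D": [("common", "A")],
}
freq = predicate_frequencies(toy_graph)
rng = random.Random(42)
print(biased_step(toy_graph, "A", freq, rng))
```

Replacing the uniform choice in a random walk with `biased_step` yields biased walks. Indegree or PageRank weights would plug into the same place, given a per-node score instead of a per-predicate one.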
  22. Other Word Embedding Methods
     • GloVe (Global Vectors for Word Representation)
     • Computes embeddings out of co-occurrence statistics
       – Using matrix factorization
     • Has been applied to random RDF walks as well
     • Experimental evaluation:
       – In some cases, RDFGloVe outperforms RDF2vec
     https://www.kdnuggets.com/2018/04/implementing-deep-learning-methods-feature-engineering-text-data-glove.html
     Cochez et al.: Global RDF Vector Space Embeddings, ISWC, 2017
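The co-occurrence statistics that such a method starts from can be collected from graph walks as sketched below. This covers only the counting step, not the factorization; the symmetric window and the 1/distance weighting follow the usual GloVe convention for text, and applying them to walks in exactly this way is an assumption.

```python
from collections import defaultdict

def cooccurrence_counts(walks, window=2):
    """Accumulate symmetric, distance-weighted co-occurrence counts over walks."""
    counts = defaultdict(float)
    for walk in walks:
        for i, token in enumerate(walk):
            for j in range(i + 1, min(len(walk), i + window + 1)):
                weight = 1.0 / (j - i)  # closer tokens contribute more
                counts[(token, walk[j])] += weight
                counts[(walk[j], token)] += weight
    return dict(counts)

example_walk = ["Year Zero", "artist", "Nine Inch Nails", "member", "Trent Reznor"]
counts = cooccurrence_counts([example_walk], window=2)
print(counts[("Year Zero", "Nine Inch Nails")])
```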
  23. Other Word Embedding Methods
     • There is a lot of promising stuff not yet tried
       – e.g., biasing walks based on human factors
       – e.g., more recent word embedding methods such as ELMo and BERT
     https://www.nbcnews.com/feature/nbc-out/bert-ernie-are-gay-couple-sesame-street-writer-claims-n910701
  24. TransE and its Descendants
     • In RDF2vec, relation preservation is a by-product
     • TransE: direct modeling
       – Formulates RDF embedding as an optimization problem
       – Find a mapping of entities and relations to R^n so that
         ● across all triples <s,p,o>, Σ ||s + p - o|| is minimized
         ● the error for existing triples is smaller than for non-existing ones
     Bordes et al.: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013.
     Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete Repositories. WI 2016
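The two conditions can be written down as a score and a margin-based loss. This is a minimal sketch with hand-picked two-dimensional vectors and no training loop; the function names, the L2 norm, and the margin value of 1.0 are illustrative assumptions.

```python
import math

def transe_score(s, p, o):
    """Error of a triple: the L2 norm ||s + p - o||. Lower means more plausible."""
    return math.sqrt(sum((si + pi - oi) ** 2 for si, pi, oi in zip(s, p, o)))

def margin_loss(positive, negative, margin=1.0):
    """Penalize cases where an existing triple does not beat a
    non-existing (corrupted) one by at least `margin`."""
    return max(0.0, margin + transe_score(*positive) - transe_score(*negative))

# Hand-picked toy vectors: s + p lands exactly on o.
s, p, o = [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]
far_object = [0.0, 3.0]    # corrupted triple that is already far away
near_object = [1.0, 0.5]   # corrupted triple that is still too close

print(transe_score(s, p, o))
print(margin_loss((s, p, o), (s, p, far_object)))
print(margin_loss((s, p, o), (s, p, near_object)))
```

Training would adjust the vectors by gradient descent until the loss over all sampled pairs of existing and corrupted triples is small.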
  25. Limitations of TransE
     • Symmetric properties
       – we have to minimize ||Barack + spouse – Michelle|| and ||Michelle + spouse – Barack|| simultaneously
       – ideally, Barack + spouse = Michelle and Michelle + spouse = Barack
         ● Michelle and Barack become infinitely close
         ● spouse becomes 0 vector
     [Figure: Michelle and Barack in vector space]
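The collapse follows directly from the two ideal equations. Writing $b$, $m$, and $r$ for the vectors of Barack, Michelle, and spouse (symbols introduced here only for the derivation):

```latex
\begin{align*}
b + r &= m, \qquad m + r = b \\
(b + r) + (m + r) &= m + b \quad\Longrightarrow\quad 2r = 0 \quad\Longrightarrow\quad r = 0 \\
b + 0 &= m \quad\Longrightarrow\quad b = m
\end{align*}
```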
  26. Limitations of TransE
     • Transitive properties
       – we have to minimize ||Miami + partOf – Florida|| and ||Florida + partOf – USA||, but also ||Miami + partOf – USA||
       – ideally, Miami + partOf = Florida, Florida + partOf = USA, Miami + partOf = USA
         ● Again: all three become infinitely close
         ● partOf becomes 0 vector
     [Figure: Miami, Florida, and USA in vector space]
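The same argument works for the transitive case. With $m$, $f$, $u$ for Miami, Florida, USA and $p$ for partOf (symbols introduced here only for the derivation):

```latex
\begin{align*}
m + p &= f, \qquad f + p = u, \qquad m + p = u \\
f &= m + p = u \quad\Longrightarrow\quad f = u \\
f + p &= u = f \quad\Longrightarrow\quad p = 0 \\
m + 0 &= f \quad\Longrightarrow\quad m = f = u
\end{align*}
```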
  27. Limitations of TransE
     • Many-to-one properties
       – we have to minimize ||New York + partOf – USA||, ||Florida + partOf – USA||, ||Ohio + partOf – USA||, …
       – ideally, New York + partOf = USA, Florida + partOf = USA, Ohio + partOf = USA
         ● all the subjects become infinitely close
     [Figure: New York, Florida, Ohio, and USA in vector space]
  28. Limitations of TransE
     • Reflexive properties
       – we have to minimize ||Tom + knows – Tom||
       – ideally, Tom + knows = Tom
         ● knows becomes 0 vector
     [Figure: Tom in vector space]
  29. Limitations of TransE
     • Numerous variants of TransE have been proposed to overcome limitations (e.g., TransH, TransR, TransD, …)
     • Plus: embedding approaches based on tensor factorization etc.
     [Figure labels: TransE, RDF2Vec, HolE, DistMult, RESCAL, NTN, TransR, TransH, TransD, KG2E, ComplEx]
  30. Are we Driving on the Wrong Side of the Road?
  31. Are we Driving on the Wrong Side of the Road?
     • Original ideas:
       – Assign meaning to data
       – Allow for machine inference
       – Explain inference results to the user
     Berners-Lee et al.: The Semantic Web. Scientific American, May 2001
  32. Running Example: Recommender Systems
     • Content-based recommender systems backed by Semantic Web data
       – (today: knowledge graphs)
     • Advantages
       – use rich background information about recommended items (for free)
       – justifications can be generated (e.g., you like movies by that director)
     https://lazyprogrammer.me/tutorial-on-collaborative-filtering-and-matrix-factorization-in-python/
  33. The 2009 Semantic Web Layer Cake
  34. The 2019 Semantic Web Layer Cake
     [Figure label: Embeddings]
  35. Towards Semantic Vector Space Embeddings
     [Figure axis labels: cartoon, superhero]
     Ristoski et al.: RDF2Vec: RDF Graph Embeddings and their Applications. SWJ 10(4), 2019
  36. The Holy Grail
     • Combine semantics and embeddings
       – e.g., directly create meaningful dimensions
       – e.g., learn interpretation of dimensions a posteriori
       – ...
  37. A New Design Space
     [Figure axes: quantitative performance, semantic interpretability]
  38. Software to Check Out
     • http://openke.thunlp.org/
       – Implements many embedding approaches
       – Pre-trained vectors available, e.g., for Wikidata
  39. Software to Check Out
     • Loading RDF in Python: https://github.com/RDFLib/rdflib
  40. RapidMiner Linked Open Data Extension
     • Caution: works only up to RapidMiner 6 (RM6)! :-(
  41. References (1)
     • Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific American, 284(5), 28-37.
     • Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In NIPS (pp. 2787-2795).
     • Cochez, M., Ristoski, P., Ponzetto, S. P., & Paulheim, H. (2017). Biased graph walks for RDF graph embeddings. In WIMS (p. 21). ACM.
     • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
     • Melo, A., Völker, J., & Paulheim, H. (2017). Type prediction in noisy RDF knowledge bases using hierarchical multilabel classification with graph and latent features. IJAIT, 26(02).
     • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
     • Paulheim, H., & Fürnkranz, J. (2012). Unsupervised generation of data mining features from linked open data. In WIMS (p. 31). ACM.
     • Paulheim, H., & Bizer, C. (2013). Type inference on noisy RDF data. In International Semantic Web Conference (pp. 510-525). Springer, Berlin, Heidelberg.
  42. References (2)
     • Paulheim, H., & Bizer, C. (2014). Improving the quality of linked data using statistical distributions. IJSWIS, 10(2), 63-86.
     • Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3), 489-508.
     • Paulheim, H. (2018). Make Embeddings Semantic Again! ISWC (Blue Sky Track).
     • Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
     • Ristoski, P., & Paulheim, H. (2014). A comparison of propositionalization strategies for creating features from linked open data. Linked Data for Knowledge Discovery, 6.
     • Ristoski, P., Bizer, C., & Paulheim, H. (2015). Mining the web of linked data with RapidMiner. Web Semantics: Science, Services and Agents on the World Wide Web, 35, 142-151.
     • Ristoski, P., & Paulheim, H. (2016). Semantic Web in data mining and knowledge discovery: A comprehensive survey. Web Semantics, 36, 1-22.
     • Ristoski, P., & Paulheim, H. (2016). RDF2vec: RDF graph embeddings for data mining. In International Semantic Web Conference (pp. 498-514). Springer, Cham.
     • Ristoski, P., Rosati, J., Di Noia, T., De Leone, R., & Paulheim, H. (2019). RDF2Vec: RDF graph embeddings and their applications. Semantic Web, 10(4), 1-32.
  43. Machine Learning & Embeddings for Large Knowledge Graphs (Heiko Paulheim)
