
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web



Current research on knowledge graph completion focuses on employing graph embeddings for the task of link prediction. But knowledge graph completion is more than link prediction: adding formerly unknown long-tail entities to the graph, extending the schema of the graph with additional properties, and completing and updating numeric values are equally important tasks. In the talk, Christian Bizer will review recent results on using data from large numbers of independent websites to accomplish these tasks. He will focus on two types of web content, relational HTML tables and semantic annotations within HTML pages, and will discuss the potential of these types of content for set completion, schema extension, and fact checking, as well as their utility as training data for matching textual entity descriptions.

Published in: Data & Analytics


  1. Completing Knowledge Graphs using Data from the Open Web. Prof. Dr. Christian Bizer, Data and Web Science Group. JIST2019, Nov. 25-27, 2019, Hangzhou, China.
  2. Knowledge Graphs
     – Cross-Domain Knowledge Graphs
     – Domain-Specific Knowledge Graphs, e.g. product, scholarly, life science, and enterprise knowledge graphs
     Christian Bizer: Completing Knowledge Graphs. JIST2019, Hangzhou, 2019.11.26
  3. Knowledge Graphs are Incomplete
     [Figure: matrix for one class in the KG with entities E1 … Em as rows and properties P1 … PN as columns; each cell is a fact, and many cells are unknown ("?"). Source: Luna Dong]
  4. Types of Incompleteness
     [Figure: entity/property matrix of a class in the KG, annotated with three kinds of gaps]
     – Relevant facts are missing
     – Relevant properties are missing in the schema
     – Relevant entities are not described
  5. Knowledge Graph Completion Tasks
     – Link Prediction: add object property values for existing instances and properties
     – Literal Slot Filling: add datatype property values for existing instances and properties
     – Schema Expansion: add and fill new properties
     – Entity Expansion: add new entities and their descriptions (values for existing properties)
  6. Outline
     1. Internal Knowledge Graph Completion
     2. KG Completion using Small Sets of External Sources
     3. KG Completion using Web Table Data
        – Profile of HTML Tables on the Web
        – Potential for KG Completion
     4. KG Completion using Schema.org Annotations
        – Profile of Schema.org Annotations on the Web
        – Potential for KG Completion
     5. Conclusions
  7. 1. Internal Knowledge Graph Completion
     – Predict missing facts by exploiting patterns in the graph
     – Classic value imputation applied to knowledge graphs
     – Symbolic methods
        – Rule learning, e.g. AMIE, AnyBURL
        – Heuristics, e.g. SDType
     – Sub-symbolic methods
        – Knowledge Graph Embeddings
     Paulheim: Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods. SWJ 2017.
     Nguyen: An Overview of Embedding Models of Entities and Relationships for Knowledge Base Completion. ArXiv, 2019.
  8. Knowledge Graph Embeddings: Hits@10 Results on Link Prediction Task
     Source: Nguyen: An Overview of Embedding Models of Entities and Relationships for Knowledge Base Completion. ArXiv, 2019.
  9. Weaknesses of Current Research on Knowledge Graph Embeddings
     1. Only link prediction is covered
        – What about literal slot filling? Entity expansion? Schema expansion?
     2. Evaluations are self-referential
        – Hits@10 is not relevant for KB completion
        – Hits@1 performance is too low for practical use
        – Accuracy is evaluated on a simplified dataset: FB13, 6 relations dropped!
     Hits@1 on FB15K-237:
        – ComplEx (2018): 0.26
        – ConvE (2018): 0.24
        – TuckER (2019): 0.26
        – AnyBURL (2019): 0.23 (symbolic rule learner)
     Meilicke, et al.: Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion.
     Wang, Ruffinelli: On Evaluating Embedding Models for Knowledge Base Completion. RepL4NLP-2019.
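The Hits@k metric discussed on this slide can be sketched as follows. This is a minimal illustration, assuming each test query comes with a ranked candidate list; the function name `hits_at_k` and the toy rankings are hypothetical and not taken from the cited evaluations.

```python
def hits_at_k(ranked_candidates, true_answers, k):
    """Fraction of test queries whose correct answer is among the top-k candidates."""
    hits = 0
    for candidates, truth in zip(ranked_candidates, true_answers):
        if truth in candidates[:k]:
            hits += 1
    return hits / len(true_answers)


# Toy data (hypothetical): one ranked candidate list per link prediction query.
rankings = [
    ["Berlin", "Paris", "Rome"],   # correct answer at rank 1
    ["Paris", "Berlin", "Rome"],   # correct answer at rank 2
    ["Rome", "Paris", "Berlin"],   # correct answer at rank 3
    ["Paris", "Rome", "Berlin"],   # correct answer not ranked at all
]
truths = ["Berlin", "Berlin", "Berlin", "Madrid"]

hits1 = hits_at_k(rankings, truths, 1)
hits3 = hits_at_k(rankings, truths, 3)
```

Only a rank-1 prediction can be written into a knowledge graph without review, which is why the slide treats Hits@1 as the figure that matters for completion.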
  10. 2. KG Completion Using a Small Set of External Sources
     – Open License Sources
     – Commercial Sources
     [Figure: logos of example sources, including Baike]
  11. Integration Process
     – Classic data integration techniques apply
     – Small set of sources
        – we can invest effort into the integration of each data source
        – write rules, label training data
        – quality depends on the effort spent
     – All tasks are covered
        – link prediction, slot filling
        – entity expansion, schema expansion
     Process steps: Data Collection / Extraction, Schema Mapping, Data Translation, Entity Matching, Data Quality Assessment, Data Fusion
     Doan, Halevy, Ives: Principles of Data Integration. Morgan Kaufmann, 2012.
  12. KG Completion Example
     – YAGO2 completed using Geonames
        – 7 million additional entities from Geonames
        – 10 additional object properties (relations)
        – 320 million triples from Geonames
     – Accuracy of spatial relations (manual evaluation)
     Hoffart, et al.: YAGO2: A spatially and temporally enhanced knowledge base extracted from Wikipedia. JWS 2013.
  13. Challenges
     1. Coverage of external sources
        – Relevant entities and relevant properties might not be covered
        – e.g. Wikipedia
     2. Up-to-dateness of external sources
        – Are sources regularly maintained?
        – e.g. LOD cloud, research data
     3. Cost of external sources
        – Complete and up-to-date data often costs money
        – Large companies can pay, other organizations often cannot
  14. Let's use the Open Web
  15. Let's use the Open Web
     – Potentials
        1. Coverage of long-tail entities
        2. Coverage of long-tail properties
        3. Content often up-to-date
     – Challenges
        1. Heterogeneity
        2. Data quality depends on the website
  16. Web Tables
  17. Schema.org Annotations
     Example 1 (hotel):
        <div itemscope itemtype="http://schema.org/Hotel">
          <span itemprop="name">Vienna Marriott Hotel</span>
          <span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
            <span itemprop="streetAddress">Parkring 12a</span>
            <span itemprop="addressLocality">Vienna</span>
          </span>
        </div>
     Example 2 (product):
        <div itemscope itemtype="http://schema.org/Product">
          <h1 itemprop="name">Signature Instant Dome 7</h1>
          Item # <span itemprop="productID">200734246807</span>
          <img itemprop="image" src="../20073424680.jpg">
          <span itemprop="review" itemscope itemtype="http://schema.org/Review">
            <span itemprop="ratingValue">5</span>
            <span itemprop="reviewBody">Great waterproof tent!</span>
          </span>
        </div>
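To illustrate how such annotations can be read by a program, here is a minimal sketch using Python's standard `html.parser`. The class name `ItempropExtractor` is hypothetical, nested items are flattened, and this is not the extraction framework used by Web Data Commons; a real Microdata parser handles many more cases (`content` attributes, `itemref`, URLs in `src`/`href`).

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "meta", "link", "input", "hr"}  # elements without an end tag


class ItempropExtractor(HTMLParser):
    """Collects (itemprop, text) pairs from Microdata markup; nesting is flattened."""

    def __init__(self):
        super().__init__()
        self.open_props = []  # one entry per open element: itemprop name or None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        # An element that opens a nested item (itemtype) has no literal value itself.
        prop = attrs.get("itemprop") if "itemtype" not in attrs else None
        self.open_props.append(prop)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_props:
            self.open_props.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_props and self.open_props[-1]:
            self.pairs.append((self.open_props[-1], text))


html = """
<div itemscope itemtype="http://schema.org/Hotel">
  <span itemprop="name">Vienna Marriott Hotel</span>
  <span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">Parkring 12a</span>
    <span itemprop="addressLocality">Vienna</span>
  </span>
</div>
"""

parser = ItempropExtractor()
parser.feed(html)
pairs = parser.pairs
```

For the hotel example, `pairs` holds the name, street address, and locality as property/value tuples, which is the shape of data a KG completion pipeline would then match against existing entities.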
  18. 3. Web Tables
     – In a corpus of 14B raw tables, 154M are relational tables (1.1%). (Cafarella et al. 2008)
     – Relational tables are our focus
     Cafarella, et al.: WebTables: Exploring the Power of Tables on the Web. VLDB 2008.
     Crestan, Pantel: Web-Scale Table Census and Classification. WSDM 2011.
  19. Web Data Commons – Web Tables Corpus
     – extracted from Common Crawl 2015
        – 1.78 billion pages
        – 10.2 billion raw HTML tables
     – Size of the web table corpus
        – 90 million relational tables (0.9%)
        – 139 million attribute/value tables (1.3%)
     – extraction uses 100 machines on Amazon EC2
        – approx. 2,000 machine hours, 250 €
     – Download
        – http://webdatacommons.org/webtables/
     Lehmberg, Ritze, Meusel, Bizer: A Large Public Corpus of Web Tables containing Time and Context Metadata. WWW2016 Companion.
  20. Task 1: Slot Filling and Link Prediction
     [Figure: matrix for a DBpedia class with entities E1 … Em as rows and attributes A1 … AN as columns; some cells are unknown ("?")]
  21. Two Step Process
     1. Matching
     2. Data Fusion
     [Figure: example attribute "IATA Code"]
  22. Holistic Entity- and Schema Matching
     – T2K+ matches millions of web tables to the DBpedia knowledge base
     – Exploits
        – a wide range of features
        – synergies between matching tasks
        – the table corpus for indirect matching
        – patterns from the KB for post filtering
     – Results (Task: Precision / Recall / F1)
        – Class: .94 / .94 / .94
        – Instance: .90 / .76 / .82
        – Property: .77 / .65 / .70
     Dominique Ritze, Christian Bizer: Matching Web Tables To DBpedia – A Feature Utility Study. EDBT 2017.
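The F1 column in the results above is the harmonic mean of precision and recall. The following sketch recomputes it from the reported precision and recall values; the variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per matching task, as listed on the slide.
reported = {
    "Class": (0.94, 0.94, 0.94),
    "Instance": (0.90, 0.76, 0.82),
    "Property": (0.77, 0.65, 0.70),
}

recomputed = {task: round(f1_score(p, r), 2) for task, (p, r, _) in reported.items()}
```

The recomputed values agree with the reported F1 column to two decimals, and they show that property matching is the weakest of the three steps.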
  23. ISWC2019 Challenge on Tabular Data to Knowledge Graph Matching
     – Multiple tasks
     – CPA task = property matching
     http://www.cs.ox.ac.uk/isg/challenges/sem-tab/
  24. Large-Scale Matching Results
     – 1 million out of 30 million tables match DBpedia (~3%)
     – 301,450 matching tables have attribute correspondences (~1%)
     – Results in 8 million triples describing 717,000 DBpedia entities
     Ritze, et al.: Profiling the Potential of Web Tables for Augmenting Cross-domain Knowledge Bases. WWW2016.
  25. Data Fusion Results
     – Fusion Results (Fusion Strategy: Precision / Recall / F1)
        – Voting: .369 / .823 / .509
        – PageRank: .365 / .814 / .504
        – Knowledge-based Trust: .639 / .785 / .705
     – Group Sizes
        – Large groups (Head): many values already exist in the KB
        – Small groups (Tail): often new values, but difficult to fuse correctly
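The difference between plain voting and a trust-weighted strategy can be sketched as below. This is a simplified illustration: the function `fuse`, the site names, the values, and the trust scores are all hypothetical, and the weighting shown is ordinary weighted voting rather than the actual Knowledge-based Trust computation.

```python
from collections import defaultdict


def fuse(candidates, source_weight=None):
    """Pick one value from (source, value) claims.

    Without weights every source counts once (voting); with weights each
    claim counts as much as its source is trusted.
    """
    scores = defaultdict(float)
    for source, value in candidates:
        weight = 1.0 if source_weight is None else source_weight.get(source, 0.0)
        scores[value] += weight
    return max(scores, key=scores.get)


# Hypothetical claims about one slot, e.g. the population of a town.
claims = [("siteA", "8000"), ("siteB", "8000"), ("siteC", "8400")]

# Hypothetical trust scores, e.g. estimated from overlap with facts already in the KB.
trust = {"siteA": 0.2, "siteB": 0.3, "siteC": 0.9}

voted = fuse(claims)
trusted = fuse(claims, trust)
```

Voting follows the majority, while the weighted variant lets one reliable source outweigh two unreliable ones, which is one plausible reading of why the trust-based strategy reaches higher precision in the results above.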
  26. What is going wrong? Topical Mismatch
     Websites contributing many tables (website: # tables, topic)
        – apple.com: 50,910, Music
        – baseball-reference.com: 25,647, Sports
        – latestf1news.com: 17,726, Sports
        – nascar.com: 17,465, Sports
        – amazon.com: 16,551, Products
        – wikipedia.org: 13,993, Various
        – inkjetsuperstore.com: 12,282, Products
        – flightmemory.com: 8,044, Flights
        – windshieldguy.com: 7,305, Products
        – citytowninfo.com: 6,293, Cities
        – blogspot.com: 4,762, Various
        – 7digital.com: 4,462, Music
  27. What is going wrong? Attribute Semantics
     https://www.bls.gov/oes/2017/may/oes_ca.htm
  28. Misinterpretation of Attribute Semantics
     – Matching and fusion methods expect binary relations
     – But many relations in Web tables are not binary!
     Example: two tables with the same columns (Occupation code, Annual wage) but different values
        – Table 1: 11-3021: $128,805; 15-0000: $113,689; 15-1111: $39,180
        – Table 2: 11-3021: $93,805; 15-0000: $83,472; 15-1111: $39,180
  29. Synthesizing N-ary Relations from Web Tables
     – Exploiting page context: title and URL
     – Stitching tables from the same site
     Oliver Lehmberg, Christian Bizer: Synthesizing N-ary Relations from Web Tables. WIMS2019.
     Oliver Lehmberg, Christian Bizer: Stitching web tables for improving matching quality. VLDB Endowment, 2017.
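A minimal sketch of the stitching idea, using the wage values from the occupation tables shown earlier in the deck. It assumes each table carries context extracted from its page; the context attribute `state` and its values "State A" and "State B" are placeholders invented for illustration, and the helper names are hypothetical rather than taken from the cited papers.

```python
def stitch_with_context(tables):
    """Union same-schema tables from one site, adding page context as extra columns."""
    stitched = []
    for table in tables:
        for row in table["rows"]:
            stitched.append({**table["context"], **row})
    return stitched


def is_key(rows, columns):
    """True if the given column combination identifies every row uniquely."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


tables = [
    {"context": {"state": "State A"},  # placeholder context value
     "rows": [{"occupation_code": "11-3021", "annual_wage": 128805},
              {"occupation_code": "15-0000", "annual_wage": 113689},
              {"occupation_code": "15-1111", "annual_wage": 39180}]},
    {"context": {"state": "State B"},  # placeholder context value
     "rows": [{"occupation_code": "11-3021", "annual_wage": 93805},
              {"occupation_code": "15-0000", "annual_wage": 83472},
              {"occupation_code": "15-1111", "annual_wage": 39180}]},
]

stitched = stitch_with_context(tables)
binary_key_ok = is_key(stitched, ["occupation_code"])
nary_key_ok = is_key(stitched, ["state", "occupation_code"])
```

After stitching, the occupation code alone no longer identifies a wage, but the code together with the context attribute does. That is the sense in which the relation is n-ary rather than binary.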
  30. Amount of N-ary Relations
     Results for the WDC2015 corpus:
        – 37.50% of all relations are non-binary
        – 50.47% of these require context attributes
     The time dimension is often part of the keys, but so are many other attributes
     Oliver Lehmberg, Christian Bizer: Profiling the Semantics of N-ary Web Table Data. SBD Workshop @ SIGMOD 2019.
  31. Task 2: Add Long-Tail Entities to Knowledge Graph
     [Figure: matrix for a class in the KB with entities E1 … Em as rows and attributes A1 … AN as columns; additional rows of unknown cells ("?") stand for entities not yet described]
     Entity Expansion:
        1. Find new entities
        2. Compile descriptions of these new entities
  32. Long-Tail Entity Expansion Pipeline
     Pipeline: Web tables, Schema Matching, Row Clustering, Entity Creation, New Detection, new entities added to the knowledge base
        1. Cluster rows that describe the same instance together
        2. Create entities by fusing the rows in each cluster
        3. Determine which entities describe new instances by comparing them to instances in the KB
     The output of the first iteration is used to refine the schema mapping in a second iteration
     Yaser Oulabi, Christian Bizer: Extending cross-domain knowledge bases with long tail entities using web table data. EDBT 2019.
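The three pipeline steps can be sketched in a few lines. This is a deliberately simplified illustration: it clusters rows by exact match on a normalized label and fuses by majority vote, whereas the cited work relies on learned similarity measures. All function names and the sample rows (including the song "Blue Harbor") are hypothetical.

```python
from collections import Counter, defaultdict


def normalize(label):
    """Lower-case and collapse whitespace so that label variants compare equal."""
    return " ".join(label.lower().split())


def cluster_rows(rows):
    """Step 1: group rows that appear to describe the same instance."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[normalize(row["name"])].append(row)
    return clusters


def create_entity(cluster):
    """Step 2: fuse the rows of one cluster by taking the most frequent value per attribute."""
    entity = {}
    attributes = {attribute for row in cluster for attribute in row}
    for attribute in sorted(attributes):
        values = [row[attribute] for row in cluster if attribute in row]
        entity[attribute] = Counter(values).most_common(1)[0][0]
    return entity


def detect_new(entities, kb_labels):
    """Step 3: keep only entities whose label is not already in the knowledge base."""
    known = {normalize(label) for label in kb_labels}
    return {key: entity for key, entity in entities.items() if key not in known}


rows = [
    {"name": "Blue  Harbor", "artist": "The Lanterns"},
    {"name": "blue harbor", "artist": "The Lanterns", "year": "2011"},
    {"name": "Blue Harbor", "artist": "Lanterns"},
    {"name": "Yesterday", "artist": "The Beatles"},
]
kb_labels = ["Yesterday"]

entities = {key: create_entity(cluster) for key, cluster in cluster_rows(rows).items()}
new_entities = detect_new(entities, kb_labels)
```

In this toy run only the song that is missing from the knowledge base survives new detection, and its description combines values contributed by several rows.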
  33. Entity Expansion Results
     – Football Player: 13,983 new entities (+67%), 43,800 new facts (+32%); precision on the gold standard: 0.82 (new entities), 0.85 (new facts)
     – Song: 186,943 new entities (+356%), 393,711 new facts (+125%); precision on the gold standard: 0.72 (new entities), 0.79 (new facts)
     [Figure: bar chart of existing vs. new entities for Football Player and Song]
     DBpedia has low coverage for songs, thus it is easy to find new ones
  34. Conclusion: Web Tables
     – Extremely wide range of very specific attributes
        – many n-ary relations with complex keys / time dimension
        – table context is often required to understand the data
     – Coverage of long-tail entities beyond Wikipedia
     – Things work OK, but not good enough yet
        – fact precision slot filling: 0.64 (without stitching)
        – new entities accuracy: 0.72-0.82
     – For researchers: rewarding but challenging domain
     – For practitioners: human in the loop still required for verification
  35. 4. Schema.org Annotations
     – Search engines have asked site owners since 2011 to annotate data for enriching search results
     – 675 Types: Event, Place, Local Business, Product, Review, Person
     – Encoding: Microdata, RDFa, JSON-LD
  36. Usage of Schema.org Data
     – Data snippets within search results
     – Local businesses on maps
     – Data snippets in knowledge panels
  37. Web Data Commons – Structured Data
     – extracts all Microformat, Microdata, RDFa, JSON-LD data from the Common Crawl
     – analyzes and provides the extracted data for download
     – statistics of some extraction runs
        – 2018 CC Corpus: 2.5 billion HTML pages, 31.5 billion RDF triples
        – 2017 CC Corpus: 3.1 billion HTML pages, 38.2 billion RDF triples
        – 2014 CC Corpus: 2.0 billion HTML pages, 20.4 billion RDF triples
        – 2010 CC Corpus: 2.8 billion HTML pages, 5.1 billion RDF triples
     – Download
        – http://webdatacommons.org/structureddata/
  38. Overall Adoption 2018
     – 944 million HTML pages out of the 2.5 billion pages provide semantic annotations (37.1%).
     – 9.6 million pay-level-domains (PLDs) out of the 32.8 million pay-level-domains covered by the crawl provide semantic annotations (29.3%).
     http://webdatacommons.org/structureddata/2018-12/
  39. Frequently used Schema.org Classes
     Top classes by # websites (PLDs), Microdata / JSON-LD:
        – schema:WebPage: 1,124,583 / 121,393
        – schema:Product: 812,205 / 40,169
        – schema:Offer: 676,899 / 57,756
        – schema:BreadcrumbList: 621,344 / 205,971
        – schema:Article: 612,361 / 57,082
        – schema:Organization: 510,069 / 1,349,775
        – schema:PostalAddress: 502,615 / 176,500
        – schema:ImageObject: 360,875 / 111,946
        – schema:Blog: 337,843 / 12,174
        – schema:Person: 324,349 / 335,784
        – schema:LocalBusiness: 294,390 / 249,017
        – schema:AggregateRating: 258,078 / 23,105
        – schema:Review: 124,022 / 6,622
        – schema:Place: 92,127 / 66,396
        – schema:Event: 88,130 / 63,605
     http://webdatacommons.org/structureddata/2018-12/
  40. Entities in WDC Dataset for KG Completion
     Class: # Entities Names
        – Product: 368,386,801
        – Organization: 176,365,220
        – Local Business: 44,100,625
        – Place: 31,748,612
        – Event: 26,035,295
        – Movie: 5,233,520
        – Hotel: 2,537,089
        – College or University: 2,392,797
        – Recipe: 2,036,417
        – Music Album: 1,594,941
        – Restaurant: 1,480,443
        – Airport: 1,145,323
     as Common Crawl is rather shallow
     http://webdatacommons.org/structureddata/2018-12/stats/schema_org_subsets.html
  41. Use Case 1: Completing KG Describing Books and Movies
     – Results of KnowMore experiments
        – Entity matching F1: book 0.88, movie 0.83
        – New fact precision: book 0.88, movie 0.95
     – Density increase for properties of existing entities
     [Figure: density charts for movies and books]
     Yu, et al.: KnowMore – knowledge base augmentation with structured web markup. SWJ 2019.
  42. Use Case 2: Completing Product KGs
     Top attributes (Microdata), # PLDs and % of PLDs:
        – schema:Product/name: 754,812 (92 %)
        – schema:Product/offers: 645,994 (79 %)
        – schema:Offer/price: 639,598 (78 %)
        – schema:Offer/priceCurrency: 606,990 (74 %)
        – schema:Product/image: 573,614 (70 %)
        – schema:Product/description: 520,307 (64 %)
        – schema:Offer/availability: 477,170 (58 %)
        – schema:Product/url: 364,889 (44 %)
        – schema:Product/sku: 160,343 (19 %)
        – schema:Product/aggregateRating: 141,194 (17 %)
        – schema:Product/brand: 113,209 (13 %)
        – schema:Product/category: 62,170 (7 %)
        – schema:Product/productID: 47,088 (5 %)
        – …
     Example product description: "The Samsung Galaxy S4 is the entertaining and helpful companion for your mobile life. It connects you with your loved ones. It lets you experience and capture unforgettable moments together. It simplifies your everyday life." UPC 610214632623 000214632623
     http://webdatacommons.org/structureddata/2018-12/stats/html-md.xlsx
  43. HTML Tables Containing Product Specifications
     [Figure: product page with the annotations s:name and s:breadcrumb and the specifications given as an HTML table]
     Qiu, et al.: DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web. VLDB 2015.
     Petrovski, et al.: The WDC Gold Standards for Product Feature Extraction and Product Matching. ECWeb 2016.
  44. Use Product ID Annotations as Supervision for Product Matching
     – Some e-shops annotate product IDs
     – Most e-shops do not
     Properties, # PLDs and % of PLDs:
        – schema:Product/name: 754,812 (92 %)
        – schema:Product/description: 520,307 (64 %)
        – schema:Product/sku: 160,343 (19 %)
        – schema:Product/productID: 47,088 (5 %)
        – schema:Product/mpn: 12,882 (1.6 %)
        – schema:Product/gtin13: 7,994 (1 %)
     Bizer, Primpeli, Peeters: Using the Semantic Web as a Source of Training Data. Datenbank Spektrum, 2019.
  45. Combine Specification Table Content from All Offers for the Same Product
     [Figure: product offers grouped into clusters]
        1. Build clusters of offers having the same product ID
        2. Learn a matcher from these clusters
        3. Match additional offers without IDs
        4. Combine the spec table content of matching offers
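The way ID clusters turn into supervision can be sketched as follows. Offers that share a product ID yield positive training pairs and offers with different IDs yield negative pairs. The function name `build_training_pairs`, the offer titles, and the second ID are hypothetical; the first ID reuses the UPC shown in the product example earlier in the deck.

```python
from itertools import combinations


def build_training_pairs(offers):
    """Label every pair of offers: 1 if the annotated product IDs agree, else 0."""
    pairs = []
    for a, b in combinations(offers, 2):
        label = 1 if a["product_id"] == b["product_id"] else 0
        pairs.append((a["title"], b["title"], label))
    return pairs


# Hypothetical offers from different e-shops that annotate a product ID.
offers = [
    {"title": "Samsung Galaxy S4 16GB black", "product_id": "610214632623"},
    {"title": "Galaxy S4 smartphone (Samsung), 16 GB", "product_id": "610214632623"},
    {"title": "Signature Instant Dome 7 tent", "product_id": "200734246807"},
]

pairs = build_training_pairs(offers)
positives = [pair for pair in pairs if pair[2] == 1]
```

A matcher trained on such labeled title pairs could then be applied to offers from the many shops that publish no IDs, which is the step the slide calls matching additional offers.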
  46. Use Case 3: Distant Supervision for Information Extraction
     – Goal: learn how to extract data from websites that do not provide schema.org annotations, using existing annotations as training data
     – Example: Small Events Paper @ SIGIR 2015
        – extract data about small venue concerts, theatre performances, garage sales, etc.
        – the authors use 217,000 schema.org events from ClueWeb2012 as distant supervision, e.g.
        <div itemscope itemtype="http://schema.org/MusicEvent">
          <h1 itemprop="name">School Cool Concert</h1>
          <p itemprop="startDate">25.09.2018 15:00</p>
          <p itemprop="location" itemscope itemtype="http://schema.org/Place">School Gym Puero High</p>
        </div>
     Foley, Bendersky, Josifovski: Learning to Extract Local Events from the Web. SIGIR2015.
  47. Small Events Paper @ SIGIR 2015
     – Approach
        – train an information extraction method that combines structural and text features
        – extracted attributes: What? When? Where?
        – apply the method to pages without semantic annotations
     – Result
        – extract 200,000 additional small-scale events from ClueWeb2012
        – double recall at 0.85 precision
     Foley, Bendersky, Josifovski: Learning to Extract Local Events from the Web. SIGIR2015.
  48. Use Case 4: Training Data for Taxonomy Matching and Taxonomy Clustering
     – E-shops annotate their product categorization
     – The marketing ideas behind these categorizations differ widely
     Relevant properties, # PLDs and %:
        – schema:Offer/category: 62,170 (7 % of all shops)
        – schema:WebPage/BreadcrumbList: 621,344 (11 % of all sites)
     Example category paths:
        – Apparel > Summer > Mens > Jerseys
        – Philadelphia Eagles > Philadelphia Eagles Mens > Philadelphia Eagles Mens Jersey > over $60 s
        – Home > Outdoor & Garden > Barbecues & Outdoor Living > Garden Furniture > Tables
        – Shop > Tables > Dining Tables > Aland Wood Tables
     Meusel, et al.: Exploiting Microdata Annotations to Consistently Categorize Product Offers at Web Scale. EC-Web 2015.
  49. Cool Taxonomy Clustering Problem
     – How to cluster 62,000 taxonomies in order to discover the most frequent categorization ideas within a domain?
        – How do shops categorize action cameras?
        – What subcategories are frequently used for action cameras?
  50. Learn How to Match Product Taxonomies using Schema.org Data as Supervision
     [Figure: Category A contains products P1, P2; Category B contains products P1, P2, P3]
        1. Take taxonomies that share some products
        2. Learn a matcher from them
        3. Apply the matcher to unseen categories: similar category?
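One way to derive labeled category pairs from shared products is to compare the product sets of two categories. The sketch below uses Jaccard overlap with a fixed threshold. Both the overlap measure and the threshold of 0.5 are assumptions made for illustration, not details given in the talk, and the function names, shop contents, and product IDs are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two product sets: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_category_pairs(shop_a, shop_b, threshold=0.5):
    """Label category pairs as matching (1) if their product sets overlap enough, else 0."""
    examples = []
    for cat_a, products_a in shop_a.items():
        for cat_b, products_b in shop_b.items():
            label = 1 if jaccard(products_a, products_b) >= threshold else 0
            examples.append((cat_a, cat_b, label))
    return examples


# Hypothetical shops: category path -> product IDs found via schema.org annotations.
shop_a = {
    "Home > Garden Furniture > Tables": {"P1", "P2"},
    "Apparel > Jerseys": {"P9"},
}
shop_b = {
    "Shop > Tables > Dining Tables": {"P1", "P2", "P3"},
}

examples = label_category_pairs(shop_a, shop_b)
```

The resulting labeled pairs of category paths could serve as training data for a matcher that compares category names directly, so that it also works for unseen categories with no shared products.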
  51. Conclusions
     1. Internal Knowledge Graph Completion
        – Interesting, but not ready for prime time yet
        – Focus on link prediction, hardly any results on other tasks
     2. KG Completion using Small Sets of External Sources
        – Production quality results for all tasks
        – Manual effort per source required
        – Bottleneck is the availability and cost of external data sources
  52. Conclusions
     3. KG Completion using Web Table Data
        – Wide coverage of long-tail entities
        – Wide coverage of long-tail properties
        – Difficult to get property semantics right (page context required)
     4. KG Completion using Schema.org Annotations
        – Comprehensive coverage of specific domains
        – Potential as training data for different KG completion tasks
  53. Thank you.
     – Web Table Corpus: http://webdatacommons.org/webtables/
     – Semantic Annotations: http://webdatacommons.org/structureddata/
     – Product Matching Training Data and Gold Standard: http://webdatacommons.org/largescaleproductcorpus/v2/
