The Future of Semantics on the Web

A keynote at the Web Science Conference 2018, held at VU Amsterdam [1]. It mainly describes the output of the Semantic Technology Institute International (STI2) Summit, an event for senior researchers in the Semantic Web field, held in Crete in September 2017 [2].

1. https://websci18.webscience.org/
2. https://www.sti2.org/events/2017-sti2-semantic-summit

Published in: Data & Analytics

  1. 1. The Future of Semantics on the Web. Web Science Conference. Prof. John Domingue (@johndmk), Director, Knowledge Media Institute, The Open University, UK; President, STI International. On behalf of the STI 2017 Summit attendees. http://kmi.open.ac.uk/ https://www.sti2.org/events/2017-sti2-semantic-summit
  2. 2. Introduction
  3. 3. Agenda • Introduction • Scalable Data • Dynamics • Real World Applications • Conclusions
  4. 4. STI2 Summit 2017 • Invitation only – mainly for senior researchers • Run every 2 years • Individually submitted papers • Aggregation into 3 topics • Sub-groups tackle topics in turn
  5. 5. STI Summit Attendees: Dieter Fensel, Claudia d’Amato, John Domingue, Sung-Kook Han, Stefan Decker, Juan Miguel Gómez Berbis, Andreas Harth, Sabrina Kirrane, Atanas Kiryakov, Martin Hepp, Axel Ngonga, Maria Maleshkova, Jens Lehmann, York Sure, Elena Simperl, Adrian Paschke, Ioan Toma, Juan Sequeda, Oleksandra Panasiuk, Raphaël Troncy
  6. 6. Caveats • Only a portion of the event • Personal likes and prejudices remain • All the mistakes are mine • Happy to forward requests to authors of ideas/research 6
  7. 7. Brief History
  8. 8. Semantic Web History Important. “Those who cannot remember the past are condemned to repeat it.” George Santayana, 1905/06 https://en.wikiquote.org/wiki/George_Santayana “People will not look forward to posterity, who never look backward to their ancestors.” Edmund Burke, 1790 https://en.wikiquote.org/wiki/Edmund_Burke
  9. 9. Turing Test 9
  10. 10. Knowledge Level, Allen Newell https://dl.acm.org/citation.cfm?id=3015714 http://diva.library.cmu.edu/Newell/
  11. 11. CommonKADS https://commonkads.org/introduction/
  12. 12. IBROW https://www.researchgate.net/figure/The-IBROW-3-architecture-for-knowledge-based-systems_fig1_2643592 • Linked Open Data • Web APIs • Schema.org actions
  13. 13. Cyc Common Sense Knowledge http://www.cyc.com/kb/ • 500K terms • 17K types of relations • 7M assertions based on terms
  14. 14. Agents in Original Semantic Web “The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available. The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics.” Tim Berners-Lee, James Hendler and Ora Lassila 14
  15. 15. Linked Open Data 1. Use URIs to name (identify) things. 2. Use HTTP URIs so that these things can be looked up (interpreted, “dereferenced”). 3. Provide useful information about what a name identifies when it’s looked up, using open standards such as RDF, SPARQL, etc. 4. Refer to other things using their HTTP URI-based names when publishing data on the Web.
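
The four Linked Data principles on this slide can be illustrated with a small in-memory sketch. It is not a real HTTP client: the `WEB` dictionary stands in for the Web, and the `example.org` URIs, the `ex` vocabulary term and the function names are all assumptions made for illustration.

```python
# Hypothetical in-memory "web": each HTTP URI dereferences to a list of
# (subject, predicate, object) triples. All example.org URIs are made up.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX_COUNTRY = "http://example.org/vocab/country"

WEB = {
    "http://example.org/city/amsterdam": [
        ("http://example.org/city/amsterdam", RDFS_LABEL, "Amsterdam"),
        ("http://example.org/city/amsterdam", EX_COUNTRY,
         "http://example.org/country/nl"),
    ],
    "http://example.org/country/nl": [
        ("http://example.org/country/nl", RDFS_LABEL, "Netherlands"),
    ],
}


def dereference(uri):
    """Principles 1-3: an HTTP URI names a thing and can be looked up."""
    return WEB.get(uri, [])


def linked_uris(uri):
    """Principle 4: objects that are HTTP URIs are links to follow."""
    return [o for (_, _, o) in dereference(uri) if o.startswith("http://")]


def label(uri):
    """Return the first rdfs:label found when dereferencing the URI."""
    for _, predicate, obj in dereference(uri):
        if predicate == RDFS_LABEL:
            return obj
    return None
```

Following the link from the Amsterdam URI and asking for the label of the target is the "follow your nose" step that the fourth principle enables.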
  16. 16. May 2007; 12 datasets
  17. 17. October 2007; 25 datasets
  18. 18. November 2007; 28 datasets
  19. 19. November 2007; 28 datasets
  20. 20. February 2008; 32 datasets
  21. 21. March 2008; 34 datasets
  22. 22. September 2008; 45 datasets
  23. 23. March 2009; 89 datasets
  24. 24. March 2009; 93 datasets
  25. 25. July 2009; 95 datasets
  26. 26. September 2010; 203 datasets
  27. 27. September 2011; 295 datasets
  28. 28. August 2014; 570 datasets
  29. 29. January 2017; 1146 datasets
  30. 30. February 2017; 1139 datasets
  31. 31. August 2017; 1163 datasets
  32. 32. April 2018; 1184 datasets
  33. 33. 33
  34. 34. 34
  35. 35. Amsterdam. Adapted from https://www.ambiverse.com/knowledge-graphs-encyclopaedias-for-machines/ using data from http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryClass=place&QueryString=amsterdam
  36. 36. Cyc -> Knowledge Graph? Jamie Taylor, Manager of Schema Team at Google, keynote at ISWC 2017 http://videolectures.net/iswc2017_taylor_applied_semantics/
  37. 37. Domains with Triples http://webdatacommons.org/structureddata/ • Total data: 66 terabytes • Parsed HTML URLs: 3,155,601,774 • URLs with triples: 1,228,129,002 • Domains in crawl: 26,271,491 • Domains with triples: 7,422,886 • Typed entities: 9,430,164,323 • Triples: 38,721,044,133
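
To put the crawl figures on this slide in proportion, the short sketch below derives a few ratios from them. The numbers are copied from the slide; the variable and function names are illustrative, and the derived ratios are not stated in the talk.

```python
# Figures as quoted on the slide (Web Data Commons structured data crawl).
parsed_urls = 3_155_601_774
urls_with_triples = 1_228_129_002
domains_in_crawl = 26_271_491
domains_with_triples = 7_422_886
triples = 38_721_044_133


def percentage(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


url_share = percentage(urls_with_triples, parsed_urls)
domain_share = percentage(domains_with_triples, domains_in_crawl)
triples_per_url = triples / urls_with_triples

print(f"URLs with triples:    {url_share:.1f}%")
print(f"Domains with triples: {domain_share:.1f}%")
print(f"Triples per URL with triples: {triples_per_url:.1f}")
```

On these figures, roughly 39% of parsed URLs and about 28% of crawled domains carry structured data.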
  38. 38. Problems and Issues Scalable Data
  39. 39. Over Centralisation
  40. 40. Thin Files and Data Poor 40
  41. 41. Thin Files and Data Poor http://www.theweek.co.uk/92944/who-are-the-windrush-generation-and-why-are-they-facing-deportation
  42. 42. Claim 42
  43. 43. Proof 43
  44. 44. Attestation 44
  45. 45. Data Changes 45
  46. 46. Fake Data 46
  47. 47. Centralised Data 47
  48. 48. Toxic Data 48
  49. 49. Jurisdictional Politics 49
  50. 50. Monopolistic Tendencies
  51. 51. Verifiable Claims WG
  52. 52. Self Sovereign Identity 52
  53. 53. Data Consumption • Is still problematic • No good paradigm and best practices for making data good for a purpose 53
  54. 54. Ontology engineering and data mapping • Ontology engineering and data mapping, as done today, are complex and do not scale • Matching/linking entities is still too complex • Keeping mappings up to date as data changes at source • This makes it hard to come up with reliable “data architecture” that uses LOD for enterprise applications 54
  55. 55. Scalable Data: Future Directions
  56. 56. http://gmonster320.blogspot.com/2016/12/black-mirror-season-3-episodes-1-6.html
  57. 57. PrivOn: Society, Privacy and the Semantic Web - Policy and Technology
  58. 58. FAIR Principles • Findable – Identifiers, rich metadata, indexed • Accessible – Retrievable, open free protocol, authentication • Interoperable – Broadly applicable KR, vocabularies • Reusable – Clear licences, provenance, standards
  59. 59. Decentralised Data Web Scalable Data: Future Directions
  60. 60. Linked Data Fragments: data and processing now split. Ruben Verborgh
  61. 61. Interplanetary File System (IPFS) • Content-addressed distributed storage (CADS) • Files identified by hash of contents • Shared across a BitTorrent-inspired peer-to-peer network
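
The idea of identifying a file by the hash of its contents can be sketched in a few lines. This is a simplified illustration, not IPFS itself: real IPFS addresses are content identifiers built on multihash, while this sketch uses a plain SHA-256 hex digest, and `ContentStore` is a hypothetical class name.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key of a block is the hash of its bytes."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not chosen by the publisher.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Any peer can verify that what it received matches what it asked for.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
address = store.put(b"The Future of Semantics on the Web")
```

Because the address depends only on the bytes, identical content always gets the same address and any change to the content yields a different one.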
  62. 62. Blockchain = an Immutable Linked List 62
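
The slide's description of a blockchain as an immutable linked list can be made concrete with a minimal sketch. It covers only the hash-linking idea; consensus, signatures and networking are deliberately left out, and all function and field names are assumptions.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's content together with the hash of its predecessor."""
    payload = json.dumps(
        {"index": index, "data": data, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a new block that points back to the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every back-link."""
    prev_hash = GENESIS_PREV
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True
```

Editing the data of an earlier block changes its hash, so validation fails unless every later block is rewritten as well; that is the sense in which the list is immutable.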
  63. 63. Linked Data Fragments, Ruben Verborgh
  64. 64. Linked Data Fragments 64
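
The split between data and processing mentioned for Linked Data Fragments can be sketched as follows: the server only answers single triple patterns, and the client performs the join itself. This is an in-memory illustration under stated assumptions; a real Triple Pattern Fragments interface is served over HTTP with paging and metadata, and the `ex:` data and function names here are invented.

```python
# Illustrative dataset held by the "server" (all identifiers are made up).
TRIPLES = [
    ("ex:amsterdam", "ex:locatedIn", "ex:netherlands"),
    ("ex:amsterdam", "rdfs:label", "Amsterdam"),
    ("ex:heraklion", "ex:locatedIn", "ex:greece"),
    ("ex:heraklion", "rdfs:label", "Heraklion"),
    ("ex:utrecht", "ex:locatedIn", "ex:netherlands"),
    ("ex:utrecht", "rdfs:label", "Utrecht"),
]


def fragment(s=None, p=None, o=None):
    """Server side: answer exactly one triple pattern; None is a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def labels_of_places_in(country):
    """Client side: join two patterns by issuing several simple requests."""
    labels = []
    for subject, _, _ in fragment(p="ex:locatedIn", o=country):
        for _, _, name in fragment(s=subject, p="rdfs:label"):
            labels.append(name)
    return labels
```

The server's work per request stays cheap and cacheable, while the cost of evaluating the full query moves to the client.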
  65. 65. Semantic Interoperability and Indexing 65
  66. 66. Reducing the Burden Scalable Data: Future Directions
  67. 67. Reducing the Burden: Proto Data 68
  68. 68. Benchmarking the Knowledge Graph Lifecycle • Goals not fully known • Not always clear what to measure • Solution – Semantic definitions – Open benchmarks – Open Toolset – Queryable results – Apply sFAIR – Benchmarking body 69
  69. 69. https://project-hobbit.eu 70
  70. 70. http://gerbil.aksw.org 71
  71. 71. Better tools for Ontology Engineering & Mapping
  72. 72. Tools, Incentives, Approaches Data Publication
  73. 73. Data publication • Incentives • Tools – Search, ranking, fast federated querying • Micro publishing and micro attribution • Provenance – micro provenance for whole life-cycle • Licence interoperability – Semantic description of licenses – Computation on licenses • Aggregating open and closed data 74
  74. 74. https://creativecommons.org/ns
  75. 75. Data Publication Incentives https://chiefmartec.com/2010/01/7-business-models-for-linked-data/ 76
  76. 76. https://www.coindesk.com/amazon-sees-bitcoin-use-case-data-marketplaces/ https://blog.enigma.co/we-are-launching-the-enigma-data-marketplace-a7d251ce0bfd https://repux.io/ https://datum.org/
  77. 77. KG Lifecycle Coin 78
  78. 78. Embedding Knowledge Graphs Scaling and Linking Data
  79. 79. Embedding Knowledge Graphs. Knowledge Graph Embedding: A Survey of Approaches and Applications. Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 12, December 2017
  80. 80. Embedding Knowledge Graphs 81
  81. 81. Embedding Knowledge Graphs 82
  82. 82. Embedding Knowledge Graphs. Photo by Paul Clarke https://www.wikidata.org/wiki/Q20202034
  83. 83. Embedding Knowledge Graphs 84
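
As one concrete instance of knowledge graph embedding, the sketch below scores triples in the style of TransE, a translational model in which a relation acts as a vector offset so that head + relation should land near tail. The slides do not say which model was shown, so this choice is an assumption; the two-dimensional vectors are hand-picked toy values rather than learned embeddings.

```python
import math

# Toy 2-D embeddings chosen by hand for illustration; real ones are learned.
ENTITY = {
    "amsterdam": (1.0, 0.0),
    "netherlands": (1.0, 1.0),
    "heraklion": (3.0, 0.0),
    "greece": (3.0, 1.0),
}
RELATION = {"located_in": (0.0, 1.0)}


def transe_score(head, relation, tail):
    """Negative Euclidean distance between (head + relation) and tail."""
    translated = tuple(
        h + r for h, r in zip(ENTITY[head], RELATION[relation])
    )
    return -math.dist(translated, ENTITY[tail])


def rank_tails(head, relation):
    """Rank all other entities as candidate tails, most plausible first."""
    candidates = [entity for entity in ENTITY if entity != head]
    return sorted(
        candidates,
        key=lambda tail: transe_score(head, relation, tail),
        reverse=True,
    )
```

Ranking candidate tails this way is how such models are typically used for link prediction, that is, for proposing edges missing from the graph.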
  84. 84. Additional Data Issues • Evolution of knowledge graphs – Evolution of machine learning models built from knowledge graph • Temporal data • Geo spatial data • Uncertainty • Usability – decent tools
  85. 85. Issues and Challenges Dynamics
  86. 86. Distributed M2M Context 88
  87. 87. Protocols • HTTP(S) • The Constrained Application Protocol (CoAP) web transfer protocol for IoT from IETF – Compatible with XML and JSON • oneM2M - Standards for M2M and IoT – Has a base ontology in OWL • Discoverability, security, scalability and minimal resource use http://www.onem2m.org/ http://coap.technology/
  88. 88. Challenge 1: Actions on entities. What modelling language do we need? – Actions – Context – Constraints (devices, environment, user, etc.) – Interactions – Compositions
  89. 89. Challenge 2: Distributed Processing 91
  90. 90. Challenge 3: Data Velocity 92
  91. 91. Challenge 4: Describing Policies • Supporting offers, agreements, negotiation, constraints etc... • Digital rights management 93
  92. 92. Distributed Context http://www.iphonehacks.com/2015/07/popular-ios-apps-susceptible-passwords.html 2,100,000 iOS apps
  93. 93. Distributed Context https://www.voicebot.ai/2018/03/22/amazon-alexa-skill-count-surpasses-30000-u-s/ 95
  94. 94. > 1M Smart Contracts 96
  95. 95. https://www.w3.org/2018/vocabws/
  96. 96. Dynamic Vocabularies
  97. 97. WoT Thing Description https://www.w3.org/TR/wot-thing-description/ 99
  98. 98. IoT Lite Ontology https://www.w3.org/Submission/2015/SUBM-iot-lite-20151126/
  99. 99. Schema.org Action <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "ListenAction", "agent": { "@type": "Person", "name": "John" }, "object": { "@type": "MusicGroup", "name": "Pink!" }, "participant": { "@type": "Person", "name": "Steve" }, "location": { "@type": "Residence", "name": "Ann's apartment" }, "instrument": { "@type": "Product", "name": "iPod" } } </script> http://schema.org/Action John listened to Pink with Steve at Ann’s apartment on his iPod.
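
A consumer of the ListenAction markup on this slide could read it roughly as sketched below. The sketch treats the JSON-LD as plain JSON using only the standard library; a full consumer would run a JSON-LD processor to expand the `@context`. The helper name `role_names` is an assumption.

```python
import json

# The ListenAction from the slide, with straight quotes so it parses as JSON.
JSON_LD = """
{
  "@context": "http://schema.org",
  "@type": "ListenAction",
  "agent": {"@type": "Person", "name": "John"},
  "object": {"@type": "MusicGroup", "name": "Pink!"},
  "participant": {"@type": "Person", "name": "Steve"},
  "location": {"@type": "Residence", "name": "Ann's apartment"},
  "instrument": {"@type": "Product", "name": "iPod"}
}
"""


def role_names(action):
    """Map each role whose value is a named node to that node's name."""
    return {
        key: value["name"]
        for key, value in action.items()
        if isinstance(value, dict) and "name" in value
    }


action = json.loads(JSON_LD)
roles = role_names(action)
print(action["@type"], roles)
```

The roles (agent, object, participant, location, instrument) are what let a machine reconstruct the sentence on the slide from the markup.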
  100. 100. Background Real World Applications
  101. 101. Background: Tourism and Tyrol • Tourism: 40% of GDP • 40 million overnight stays per year • Tyrol, Salzburg, Vienna -> 40% of Austria’s GDP • Tourism sector is 3rd largest in the EU: 10% of GDP/employees • Especially important for Southern Europe http://www.alpine-space.eu/projects/alpes/en/test-regions/regions/south-tyrol-italy
  102. 102. Issues and Challenges Real World Applications
  103. 103. 105
  104. 104. Plethora of Social Media & Communication Channels
  105. 105. ChatBots – Hello Hipmunk https://www.30secondstofly.com/ai-software/ultimate-travel-bot-list/ 107
  106. 106. Smart Home Speakers/Personal Assistants 108
  107. 107. Approach • Ontologies based on schema.org • Static content and dynamic data – Bookings and booking engines – JSON-LD • Publish guidelines for accommodation providers • Pilots and partnerships – Local agencies for hotels and ski resorts, Tyrol tourism state agency, Austrian Ministry • New approach for dealing with multiple channels 109
  108. 108. Other Regions • 3cixty.com – French Riviera, Singapore, London, Milan, Amsterdam, St Barthelemy… – data from official tourist offices feeds & web harvesting (Google, Facebook, Foursquare, Yelp) – partnerships with booking/ticketing retailers (booking.com, airbnb, ticketmaster) • pas-time.org - personalize the package to individuals based on past trips and activities (using Amadeus data lake) 110
  109. 109. Challenges and Issues • GDPR vs security (terrorism) – Telecom data (+ IoT wearables) track location & increase personalization • Heterogeneity of data sources • Bias in data sources • Data quality – live data, updates, outdated data • Licensing • Multilinguality – Google Translate can help – semantics helps with controlled vocabularies (e.g. events)
  110. 110. Future Real World Applications
  111. 111. Tools https://semantify.it/
  112. 112. Open Speech Recognition 114
  113. 113. “It’s a small world… but I wouldn’t want to have to paint it.” Steven Wright https://morristowngreen.com/2012/12/02/comedian-steven-wright-at-the-mayo-in-morristown-we-could-have-used-him-during-the-gas-lines/
  114. 114. Touristic Knowledge Graphs 116
  115. 115. Touristic Knowledge Graphs
  116. 116. ‘Pure’ machine readable web 118
  117. 117. Conclusions
  118. 118. Conclusions 1/3 • The main role and benefit of semantics is its ability to sit between the machine and humans: – Raising the level of discourse – Supporting interoperability – Enabling automation • What has changed – Scale – Industrial take-up • Future – Adapting to ever changing context – Hitting the sweet spot between reasoning power and simplicity
  119. 119. Conclusions 2/3 • Context is very different to pre and early web days – Ubiquity of devices – Volume of data • Societal Concerns – Over centralization – Data exacerbating inequality – need for ‘data justice’ – Privacy – Security 121
  120. 120. Conclusions 3/3 • Scalable Data – Need for incentivized, benchmarked full life-cycle – Benefits from a more open publishing process – Decentralised platforms • Dynamics – Unclear how to support decentralized semantic applications in M2M context • Real world applications – Development of comprehensive framework to support creation of sector specific KGs 122
  121. 121. kmi.open.ac.uk sti2.org
