The Semantic Web – A Vision Come True, or Giving Up the Great Plan?


  1. The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
     Martin Hepp, @mfhepp
  2. Semantic Web: A Decade of Achievement?
     •  Linked Open Data Cloud
     •
     •  Google Knowledge Graph
     •  Bing Satori
     •  Linked Data in Libraries
     •  Linked Data in Public Data Initiatives
     •  Etc.
     Semantic Web and Linked Data Success Stories
     http://www.heppresearch.com
  3. The LOD Cloud
     A hard-wired, small-scale data integration project with no quality of service guarantees.
     Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak.
  4. Web Data Commons
     A pretty outdated RDF representation of information extracted from a biased sample of popular Web pages, missing a lot of data in deep detail pages.
     2015-04-02: RDFa, Microdata, and Microformat data sets extracted from the December 2014 Common Crawl corpus available for download.
  5. The Old Testament of the Semantic Web
     Mostly WHAT a better Web should allow
     §  Computers should be able to help us process information from the Web
  6. The New Testament of the Semantic Web
     Detailed technical assumptions about the HOW
     §  Largely driven by applying principles from small-scale, controlled settings to the Web.
     §  The need to extend old paradigms was acknowledged.
     §  But the fundamental question of whether these paradigms match the Web ecosystem remained largely unchallenged.
  7. The Modern Sects and their Cults
     Turned assumptions and drafts into laws
     §  Linked Data Principles
        –  URIs over strings
           §  Entity identifiers
           §  Qualitative values (enumerations)
        –  Page vs. Entity / Conneg / Redirects
        –  Open Licenses
     §  SPARQL endpoints
     §  Reuse visible content in RDFa and Microdata
     Berners-Lee, Tim: Linked Data,
  8. And now they fight a useless war over the details of their interpretation…
     3rd Commandment: Thou shalt not make unto thee any graven image
     §  Exodus 20:4-6
     §  Minimal ontological commitment, folks!
     §  Occam's razor
     §  Ludwig Wittgenstein: Tractatus Logico-Philosophicus:
        –  “Occam's Razor is, of course, not an arbitrary rule nor one justified by its practical success. It simply says that unnecessary elements in a symbolism mean nothing. Signs which serve one purpose are logically equivalent; signs which serve no purpose are logically meaningless.” (*)
     Image Credit: PD, (*) Taken from's_razor#Ludwig_Wittgenstein
  9. What is What is GoodRelations?
     1. Official Characterization
     2. Purpose:
        §  Focus on information extraction on the Web
        §  Other uses as a by-product
     3. Knowledge Representation Perspective
        §  Entity Types
        §  Relationship Types
        §  Weak Domain / Range Semantics
        §  Syntax-independent Meta-Model
     And how are they related?
     Questions? Suggestions? Contact me at @mfhepp!
  10. Official Characterization from
      This site provides a collection of schemas that webmasters can use to markup HTML pages in ways recognized by major search providers, and that can also be used for structured data interoperability (e.g. in JSON). Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right Web pages.
      Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.
  11. Overview and Motivation: There is REAL Momentum
      A lot of data
      §  Since 2011, has been added to >25% of top-ranked e-commerce sites' product detail pages.
      §  RDF-based representations are specified.
      Table: Random sample of n=73 product detail pages from high-ranking Google results. Note that these numbers have a strong bias towards popular, professionally operated sites.
  12. A Data Publication Ontology
      Not designed for raw data consumption (only as a by-product)
      §  Historically, ontologies in computer science aimed at harmonizing the conceptualization and representation of data for publishers and consumers of the data.
      §  Implicit goal of the traditional Semantic Web stack: More or less, consumption of raw data.
      §  This requires detailed consensus on the level of data granularity and data semantics at scale, and high data quality.
      §  does not make this assumption, since its sponsors have the power to work on semi-structured data at Web scale.
  13. The Semantic Web Vision Come True?
      1. No OWL. Not even an ontology in the narrow sense.
      2. Direct consumption difficult
         §  Crawling
         §  Cleansing
         §  Lifting
      3. No broad use of Linked Data principles
         §  Mostly no global entity identifiers
         §  Page = Entity (vs. httpRange-14)
         §  No vocabulary reuse (*)
      Likely not what the Semantic Web community had hoped for.
  14. Web Ontology Engineering Patterns
      1. Dynamic Degree of Disambiguation
      2. Dynamic Data Granularity
      3. Sweet Spots Rule
         §  Distinctions that can be populated reliably and with little effort
         §  Distinctions that are hard to reconstruct by the recipient
      Hepp (2015, forthcoming)
  15. The Fallacy of Raw Consumption of Web Data
      Naïve Type Membership Interpretation: SPARQL
          # Find former STI members who are professors
          PREFIX dbpedia-owl: <>
          SELECT *
          { ?s a dbpedia-owl:Professor }
          LIMIT 100
  16. Naïve Type Membership Interpretation: SPARQL
      Find all professors from Web markup
          <html prefix="schema: dbpedia:">
            <!-- .. -->
            <div typeof="schema:Person dbpedia:Professor" about="#person">
              <span property="schema:honorificPrefix">Prof. Dr.</span>&nbsp;
              <span property="schema:givenName">Zaphod</span>
              <span property="schema:familyName">Beeblebrox</span>
            </div>
          </html>
  17. Type Membership as a Machine Learning Problem
      Supervised Learning: Logistic Regression
      §  Input:
         –  Entity e
         –  Type t
         –  Origin (Graph / Domain / URI) o
         –  Optional: Properties and property values [(p1,v1), (p2,v2),…]
      §  Output:
         –  t’(e) = f(e, t, o)
         –  p(t(e) == True)
      Example data: (, …#person, (, …#event1,
      Hepp (2015b, forthcoming)
  18. Let’s Do Science, not Cult!
      §  Challenge paradigms and approaches
      §  Use hard data, not beliefs and assumptions (neither your own nor those inherited from the old folks)
      CC BY-SA 3.0 / Nicor /'s_cult_of_personality#/media/File:Mansudae_Grand_Monument_08.JPG
  19. Thank you.
      HEPP RESEARCH GmbH
      Prof. Dr. Martin Hepp, CEO
      Contact us!
      Kuppelnaustrasse 5
      88212 Ravensburg, Germany
      Phone +49 751 2708 5256-0
      Fax +49 751 2708 5256-9
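
Slide 7 lists "Page vs. Entity / Conneg / Redirects" among the Linked Data conventions, while slide 13 notes that real-world markup mostly treats the page as the entity (cf. httpRange-14). For readers unfamiliar with that convention, here is a minimal Python sketch of how a server might answer a request for an entity URI with an HTTP 303 See Other redirect, choosing an HTML page or an RDF document by content negotiation. The /id/, /page/ and /data/ URI layout and all function names are hypothetical, and the Accept-header parsing is simplified (only the */* wildcard is recognized).

```python
# Sketch of the "entity URI vs. page URI" convention: a request for an entity
# identifier is answered with 303 See Other, pointing to a document about it.
# The /id/, /page/, /data/ layout is a hypothetical convention for illustration.

RDF_TYPES = {"text/turtle", "application/rdf+xml", "application/ld+json"}
HTML_TYPES = {"text/html", "application/xhtml+xml", "*/*"}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media type, q) pairs from an Accept header, highest q first."""
    result = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        result.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(result, key=lambda pair: pair[1], reverse=True)


def resolve_entity(path: str, accept: str) -> tuple[int, str]:
    """Map an entity URI such as /id/alice to (status code, redirect target)."""
    if not path.startswith("/id/"):
        return 404, ""
    name = path[len("/id/"):]
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in RDF_TYPES:
            return 303, f"/data/{name}"
        if media_type in HTML_TYPES:
            return 303, f"/page/{name}"
    # Nothing matched: fall back to the human-readable page
    return 303, f"/page/{name}"
```

With this sketch, `resolve_entity("/id/alice", "text/turtle")` yields `(303, "/data/alice")`, while a typical browser header such as `"text/html,application/xhtml+xml;q=0.9"` yields `(303, "/page/alice")`. The extra round trip is the price of keeping the entity and the document about it apart, which is exactly the distinction slide 13 reports most publishers ignore.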
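
The first two patterns on slide 14, Dynamic Degree of Disambiguation and Dynamic Data Granularity, suggest letting publishers choose how much structure and disambiguation to provide. One possible reading, sketched in Python under that assumption: a consumer accepts the same property value as a plain string, as an entity with a name, or as an entity with a global identifier (the JSON-LD-style "@id" key), and exploits the identifier only when it is present. The function names and the three-level classification are hypothetical illustrations, not part of schema.org or GoodRelations.

```python
from typing import Union

# A property value as it might appear in JSON-LD-style markup: either a bare
# string or an object that may carry a "name" and/or an "@id".
Value = Union[str, dict]


def normalize(value: Value) -> dict:
    """Record what the publisher provided and how far it was disambiguated."""
    if isinstance(value, str):
        return {"name": value.strip(), "id": None, "level": "string"}
    name = value.get("name")
    ident = value.get("@id")
    if ident:
        return {"name": name, "id": ident, "level": "identifier"}
    return {"name": name, "id": None, "level": "entity"}


def entity_key(value: Value) -> str:
    """Use the global identifier when present; otherwise a case-folded name."""
    norm = normalize(value)
    if norm["id"]:
        return norm["id"]
    return (norm["name"] or "").strip().casefold()


# The same (hypothetical) manufacturer, published at three degrees of disambiguation.
samples = [
    "ACME Corp.",
    {"@type": "Organization", "name": "ACME Corp."},
    {"@type": "Organization", "name": "ACME Corp.", "@id": "http://acme.example/#org"},
]
for s in samples:
    print(normalize(s)["level"], entity_key(s))
```

The first two variants map to the same key only because their names match exactly; linking them to the third, identified variant needs additional evidence. That is the work the recipient takes on whenever a publisher stops at a string.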
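
To make the point of slide 15 concrete, the following Python sketch evaluates the equivalent of `SELECT * {?s a dbpedia-owl:Professor} LIMIT 100` over a handful of (subject, predicate, object, source) quads. All identifiers and sources are made up. The naive query has no notion of where a type assertion came from, so a self-description on a personal page counts as much as an institutional staff list.

```python
# Hypothetical data: (subject, predicate, object, source) quads.
RDF_TYPE = "rdf:type"
PROFESSOR = "dbpedia-owl:Professor"

QUADS = [
    ("ex:alice", RDF_TYPE, PROFESSOR, "http://university.example/staff"),
    ("ex:zaphod", RDF_TYPE, PROFESSOR, "http://zaphod.example/about"),
    ("ex:zaphod", RDF_TYPE, "schema:Person", "http://zaphod.example/about"),
    ("ex:bob", RDF_TYPE, "schema:Person", "http://university.example/staff"),
]


def naive_instances(quads, type_iri, limit=100):
    """Distinct subjects typed as type_iri by *any* source, in data order."""
    if limit <= 0:
        return []
    subjects = []
    for subject, predicate, obj, _source in quads:  # the source is ignored
        if predicate == RDF_TYPE and obj == type_iri and subject not in subjects:
            subjects.append(subject)
            if len(subjects) >= limit:
                break
    return subjects


def instances_by_source(quads, type_iri):
    """Same pattern, but keep the provenance that the naive query throws away."""
    return [(s, src) for s, p, o, src in quads if p == RDF_TYPE and o == type_iri]


print(naive_instances(QUADS, PROFESSOR))
# ['ex:alice', 'ex:zaphod']
```

Keeping the source, as `instances_by_source` does, does not answer the question by itself, but it gives a consumer something to weigh the assertion against.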
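
Slide 16 shows how little markup it takes to publish such a type assertion. The sketch below uses Python's standard `html.parser` to pull triples out of that markup (extraction artifacts removed, `familyName` spelled as in schema.org, prefix URIs still elided as on the slide). It is a deliberately tiny subset of RDFa, not a conforming RDFa 1.1 processor: it ignores `prefix` and `vocab`, nested subjects and datatypes, and it assumes a property element is not nested inside another element with the same tag name.

```python
from html.parser import HTMLParser

# Markup from slide 16 (artifacts removed; prefix URIs elided as on the slide).
SAMPLE = """<html prefix="schema: dbpedia:">
<div typeof="schema:Person dbpedia:Professor" about="#person">
<span property="schema:honorificPrefix">Prof. Dr.</span>&nbsp;
<span property="schema:givenName">Zaphod</span>
<span property="schema:familyName">Beeblebrox</span>
</div>
</html>"""


class TinyRDFaParser(HTMLParser):
    """Collects rdf:type and plain-literal triples from typeof/about/property."""

    def __init__(self):
        super().__init__()
        self.triples = []      # (subject, predicate, object) tuples
        self.subject = None    # subject set by the nearest typeof/about element
        self.prop = None       # property whose text is being collected
        self.prop_tag = None   # tag name of the element that opened that property
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser lower-cases attribute names (typeOf -> typeof)
        if "typeof" in a:
            self.subject = a.get("about") or "_:b0"
            for type_name in (a["typeof"] or "").split():
                self.triples.append((self.subject, "rdf:type", type_name))
        if "property" in a and self.subject is not None:
            self.prop, self.prop_tag, self.buffer = a["property"], tag, []

    def handle_data(self, data):
        if self.prop is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.prop is not None and tag == self.prop_tag:
            literal = "".join(self.buffer).strip()
            self.triples.append((self.subject, self.prop, literal))
            self.prop = None


def extract(html: str) -> list[tuple[str, str, str]]:
    parser = TinyRDFaParser()
    parser.feed(html)
    parser.close()
    return parser.triples


for triple in extract(SAMPLE):
    print(triple)
```

A consumer that takes these five triples at face value concludes that Zaphod Beeblebrox is a `dbpedia:Professor`, which is the naive interpretation the slide warns about.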
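
Slide 17 frames type membership as supervised learning with logistic regression, estimating p(t(e) == True) from the entity, the type and the origin. The Python sketch below is one way to instantiate that idea and is not the model from Hepp (2015b): it assumes one model per type (here "is a professor"), uses three hypothetical binary features of the entity and its origin plus a bias term, and fits the weights by batch gradient descent on the mean cross-entropy loss. The training examples are invented for illustration.

```python
import math


def features(props: dict, origin: str) -> list[float]:
    """Hypothetical feature vector for 'is this entity a professor?'."""
    prefix = props.get("schema:honorificPrefix", "")
    return [
        1.0,                                        # bias term
        1.0 if origin.endswith(".edu") else 0.0,    # institutional domain
        1.0 if "Prof" in prefix else 0.0,           # title claims professorship
        1.0 if "schema:affiliation" in props else 0.0,
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def p_type(w, props: dict, origin: str) -> float:
    """Estimated p(t(e) == True) for entity properties observed at origin."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(props, origin))))


# Invented labelled examples: (properties, origin host, is a professor?)
TRAIN = [
    ({"schema:honorificPrefix": "Prof. Dr.", "schema:affiliation": "CS"}, "cs.uni.edu", 1),
    ({"schema:honorificPrefix": "Prof."}, "math.state.edu", 1),
    ({"schema:honorificPrefix": "Prof. Dr."}, "blog.example.com", 0),
    ({}, "shop.example.com", 0),
    ({"schema:honorificPrefix": "Mr."}, "students.uni.edu", 0),
]

weights = train([features(p, o) for p, o, _ in TRAIN], [y for _, _, y in TRAIN])
print(p_type(weights, {"schema:honorificPrefix": "Prof. Dr."}, "zaphod.example.com"))
```

Unlike the naive query, the output is a probability that depends on where the claim was found, so a "Prof. Dr." on a personal page and one on a university site need not be treated alike. Real features and labels would have to come from hard data, as slide 18 demands.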