schema.org: Linked Data's Gateway Drug

This deck, presented at Connected Data London 2018 (7 Nov. 2018), looks at the past, present and possible future of schema.org.

In particular, it examines the degree to which schema.org has helped move us toward the web of linked data envisioned by Tim Berners-Lee, and what lessons can be learned from what, I argue, has been the successful launch of a collaboratively developed structured data vocabulary.

  1. schema.org: Linked Data’s Gateway Drug
     <script type="application/ld+json">
     {
       "@context": "http://schema.org",
       "@type": "Drug",
       "name": "schema.org",
       "activeIngredient": "Linked data",
       "dosageForm": "Structured data",
       "recognizingAuthority": [
         { "@type": "Organization", "name": "Bing" },
         { "@type": "Organization", "name": "Google" },
         { "@type": "Organization", "name": "Yahoo" },
         { "@type": "Organization", "name": "Yandex" }
       ]
     }
     </script>
     Aaron Bradley, Connected Data London 2018 ▪ #CDL2018 ▪ @aaranged
  2. Electronic Arts schema.org/worksFor
  3. bit.ly/semsearch schema.org pending.schema.org/knowsAbout bit.ly/sdataevents schema.org/WebSite
  4. schema.org pending.schema.org/knowsAbout
  5. History and adoption
     schema.org followed in the footsteps of other structured data initiatives, but appears to have enjoyed much broader adoption.
  6. The road to schema.org
     Structured data existed prior to schema.org, but often with little or no search engine support.
     [Timeline: FOAF (2000), DCMI Terms (2003), Microformats (2004), Open Graph Protocol (2007), GoodRelations (2007), data-vocabulary.org (2009), schema.org (2011). Legend: broad search engine support, partial search engine support, no explicit search engine support.]
  7. schema.org in a nutshell
     A “collection of shared vocabularies … that can be understood by the major search engines”
     Structure
     • A collection of schemas consisting of types, properties and enumerations
     • Types: classes and subclasses (e.g. “Book”)
     • Properties: attributes expecting a value of a particular data type (e.g. “sameAs”), or relations expecting an instance of a particular type (e.g. “author”) or an enumeration member (e.g. “availability”)
     • Enumerations: a class (e.g. “ItemAvailability”) whose members are considered neither types nor properties (e.g. “InStock”)
     Search engine support
     • A joint initiative supported at launch by Bing, Google and Yahoo, and soon after by Yandex
     Supported encoding formats
     • Microdata and RDFa supported at launch, with RDFa Lite and JSON-LD support following
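Slide 7's distinction between types, properties and enumerations can be sketched in a few lines of Python that assemble a JSON-LD description of a book. The book, author and URL are made-up placeholders, and using `offers` to carry `availability` is an assumption made for illustration; only the term names (`Book`, `sameAs`, `author`, `availability`, `ItemAvailability`, `InStock`) come from the slide.

```python
import json

# Placeholder data: the book, author and URL are invented for illustration.
book = {
    "@context": "http://schema.org",
    "@type": "Book",  # a type (class)
    "name": "An Example Book",
    # A property expecting a value of a particular data type (here a URL).
    "sameAs": "http://example.com/an-example-book",
    # A relation expecting an instance of a particular type (here a Person).
    "author": {"@type": "Person", "name": "Jane Example"},
    "offers": {
        "@type": "Offer",
        # A property expecting a member of the ItemAvailability enumeration.
        "availability": "http://schema.org/InStock",
    },
}

# Serialize the dictionary as JSON-LD text ready to embed in a page.
print(json.dumps(book, indent=2))
```

The enumeration member is written as a full URL rather than a nested object, which reflects that it is neither a type nor a property.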
  8. schema.org adoption as inferred from Web Data Commons data
     Robust schema.org adoption data is hard to come by, but format use helps paint the picture.
     [Chart: Format Use as a Percentage of Sampled Domains. Series: RDFa, Microdata, JSON-LD. X-axis: Aug. 2012, Nov. 2013, Dec. 2014, Nov. 2015, Oct. 2016, Nov. 2017. Y-axis: 0% to 16%. All data from Web Data Commons.]
  9. For microdata and JSON-LD, it’s schema.org all the way down
     What’s currently being encoded with these syntaxes is almost exclusively schema.org.
     [Charts: Top Classes, Microdata, Nov. 2017; Top Classes, JSON-LD, Nov. 2017.]
  10. A relatively large vocabulary results in more assertions
      Raw Web Data Commons format usage data belies the relative expressiveness of schema.org.
      [Charts: Format Use by Number of Domains in Sample, 2012 and 2017. All data from Web Data Commons.]
  11. A relatively large vocabulary results in more assertions
      Raw Web Data Commons format usage data belies the relative expressiveness of schema.org.
      <span class="author vcard"><a href="http://www.seoskeptic.com/aaron-bradley/" class="url fn">Aaron Bradley</a></span>
      “... OGP (Open Graph Protocol) and microformat approaches can be found on approximately as many sites as Schema.org, but given their much smaller vocabularies, they appear on fewer than half as many pages and contain fewer than a quarter as many logical assertions.” Guha, Brickley and Macbeth, Dec. 2015
  12. schema.org use by the numbers
      Such as they are.
      • Apr. 2014: 0.3% of domains (SearchMetrics, 500K domains; Microdata only?)
      • Dec. 2014: 22.0% of pages (Guha, Brickley, Macbeth; 10B pages)
      • Dec. 2015: 31.3% of pages (Guha, Brickley, Macbeth; 10B pages)
      • Nov. 2018: 21.9% JSON-LD, 15.6% Microdata, as % of websites (W3Techs; top 10M websites, Alexa)
  13. The path to adoption
      The vocabulary launched with a clear value proposition for webmasters, and has been buoyed since by a collaborative vocabulary development model, a modified extension mechanism and the added flexibility afforded by JSON-LD.
  14. Rich results at launch
      The search engines incentivized schema.org use right out of the gate with rich snippets.
      [Examples: Event; Recipe, AggregateRating; Product, AggregateRating.]
  15. Rich results post-launch
      The search engines have been steadily adding new search features as the vocabulary grows.
      [Examples: Organization.logo, Organization.sameAs; JobPosting; ClaimReview.]
  16. A living vocabulary
      Over the course of time schema.org has become more and more expressive.
      [Chart: Classes in schema.org, 2011-2018. Series: Core, Extensions, Pending. X-axis: Jun-11, Nov-15, Nov-18. Y-axis: 0 to 1000. Dates shown: 4 May 2016, 23 March 2017.]
  17. Making vocabulary development a community affair
      schema.org provides multiple mechanisms for collaborative vocabulary development.
      [Shown: public-schemaorg W3C mailing list; schema.org on GitHub; partnerships.]
  18. Extending schema.org with more specialized vocabulary
      schema.org’s extension mechanism was completely revamped in v2.0 (May 2015). GS1’s SmartSearch is powered by a schema.org external extension.
      [Shown: SmartSearch in action at Tesco.]
  19. JSON-LD: developer-friendly linked data
      schema.org endorsed JSON-LD in 2013; Google started using it in 2014, with full support by 2016.
      “…the whole point about it is, it is JSON first and RDF second. And the fact that it carries RDF is simply unimportant. And it's particularly unimportant to people who are JSON users – which is basically every web developer these days.
      “People don't need to know everything, they can create really cool applications, and if they find JSON-LD useful – fantastic. If they don't know that it's RDF, I don't care.” Phil Archer, Aug. 2014
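Archer's point that JSON-LD is "JSON first and RDF second" can be illustrated with nothing but a standard JSON parser. The sketch below reads an abbreviated copy of the markup from slide 1 using Python's built-in `json` module; no RDF tooling is involved, and the variable names are illustrative.

```python
import json

# Abbreviated copy of the slide 1 markup (two of the four authorities).
payload = """
{
  "@context": "http://schema.org",
  "@type": "Drug",
  "name": "schema.org",
  "activeIngredient": "Linked data",
  "recognizingAuthority": [
    {"@type": "Organization", "name": "Bing"},
    {"@type": "Organization", "name": "Google"}
  ]
}
"""

# Plain JSON parsing: the result is an ordinary dictionary.
doc = json.loads(payload)

# Keys such as "@type" are read like any other JSON key.
authorities = [org["name"] for org in doc["recognizingAuthority"]]
print(doc["@type"], doc["name"], authorities)
```

A developer who never resolves the `@context` can still use the data this way, which is the sense in which the RDF underneath is optional knowledge.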
  20. JSON-LD versus inline markup: no contest
      Separation of the data and presentation layers makes life considerably easier for web developers.
      [Panels: Product Details Page: Before; Product Details Page: After.]
      <script type="application/ld+json">
      {
        "@context": "http://schema.org",
        "@type": "Product",
        "name": "Bob's Best Basic T",
        "image": "bbbt-pink.jpg",
        "offers": {
          "@type": "Offer",
          "price": "28",
          "priceCurrency": "USD"
        },
        "aggregateRating": { …
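One way to picture the separation of data and presentation layers: the structured data is generated from a product record on its own, and the visible HTML is produced separately. This is a minimal sketch rather than how any particular site works; the record layout and the `product_jsonld` and `render_page` helpers are hypothetical.

```python
import json
from html import escape

# Hypothetical product record, e.g. one row from a catalogue database.
record = {
    "name": "Bob's Best Basic T",
    "image": "bbbt-pink.jpg",
    "price": "28",
    "currency": "USD",
}


def product_jsonld(rec: dict) -> dict:
    """Build the data layer: a schema.org Product with a nested Offer."""
    return {
        "@context": "http://schema.org",
        "@type": "Product",
        "name": rec["name"],
        "image": rec["image"],
        "offers": {
            "@type": "Offer",
            "price": rec["price"],  # bare number, no currency symbol
            "priceCurrency": rec["currency"],  # ISO 4217 code
        },
    }


def render_page(rec: dict) -> str:
    """Emit the JSON-LD block and the visible markup as separate pieces."""
    data_block = json.dumps(product_jsonld(rec), indent=2)
    visible = f'<h1>{escape(rec["name"])}</h1>\n<p>${escape(rec["price"])}</p>'
    return f'<script type="application/ld+json">\n{data_block}\n</script>\n{visible}'


print(render_page(record))
```

Because the visible markup carries no structured data attributes, a redesign of the page template does not touch the JSON-LD, and vice versa.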
  21. schema.org beyond search
      Seemingly striking the right balance between expressiveness and complexity, the vocabulary is being used for applications outside of search, and is increasingly the starting point for ground-up linked data initiatives.
  22. Leveraging structured data to enhance the presentation layer
      Pinterest uses schema.org to populate Article, Product and Recipe Rich Pins.
      [Shown: Pinterest Product Rich Pin; Offer information on pin source page.]
  23. Virtual assistants and schema.org
      When Google needed vocabulary for its Assistant it unsurprisingly turned to schema.org.
  24. Virtual assistants and schema.org
      When Google needed vocabulary for its Assistant it unsurprisingly turned to schema.org.
  25. Virtual assistants and schema.org
      Amazon’s Alexa Meaning Representation Language is based on schema.org.
      “The Alexa ontology utilized schema.org as its base and has been updated to include support for spoken language. In addition, using schema.org as the base of the Alexa Ontology means that it shares a vocabulary used by more than 10 million websites, which can be linked to the Alexa ontology” Thomas Kollar et al., Jun. 2018
  26. Bootstrapping development with schema.org
      A New Zealand health insurance company used the vocabulary to kickstart product development. (David Gibson, Feb. 2018)
  27. Bootstrapping development with schema.org
      The vocabulary allows linked data practitioners to construct knowledge graphs with relative ease.
      “…the knowledge graph is implemented as a triple store where the data has been represented using a small number of vocabularies (mostly schema.org with some terms borrowed from TAXREF-LD and the TDWG LSID vocabularies).” Rod Page, Ozymandias
  28. Bootstrapping development with schema.org
      Chinese search engine Baidu appears to have based its knowledge graph on schema.org.
      [Shown via Google Translate.]
  29. Bootstrapping development with schema.org
      Electronic Arts used the vocabulary as the basis for their domain ontology.
  30. Boundaries of the vocabulary
      As schema.org is adopted for use in increasingly diverse domains, there are more and more demands to add to the vocabulary: does it risk becoming too much of “an ontology of everything”, or is it actually not expressive enough?
  31. Let’s play 20 questions using schema.org vocabulary!
      Just how much can we say about each entity?
      • Is it an animal? It’s a Thing. More expressive exceptions: Person, Product
      • Is it a vegetable? It’s a Thing. More expressive exception: Product
      • Is it a mineral? It’s a Thing. More expressive exception: Product
  32. The “add animals and plants” discussion has recently reignited
      But there’s always a tension between adding to schema.org and referencing existing vocabularies.
  33. The “add animals and plants” discussion has recently reignited
      But there’s always a tension between adding to schema.org and referencing existing vocabularies.
  34. Recent developments and future directions
      Even as the improved ability of machines to understand content makes structured data use less of an imperative, schema.org is increasingly proving useful as a mechanism for serialized linked data.
  35. Will AI and related technologies render schema.org obsolete?
      If machines are eventually able to parse content like humans, will structured data still be necessary?
  36. Bridging the semantic gap with Dataset Search
      Leveraging schema.org allows Google to improve the discoverability of datasets.
      Year of Birth   No. of cases
      1976            1
      1977            1
      1980            1
      1981            2
      1982            7
      1983            8
      1984            7
      1985            7
      1986            11
      …
      Total           89
  37. Bridging the action gap with Google Media Actions
      JSON-LD data feeds enable publishers to support user-initiated video or audio playback.
      <script type="application/ld+json">
      {
        "@context": ["http://schema.org", {"@language": "en"}],
        "@type": "Movie",
        "@id": "http://example.com/M",
        "url": "http://example.com/M",
        "name": "M",
        "potentialAction": {
          "@type": "WatchAction",
          "target": {
            "@type": "EntryPoint",
            "urlTemplate": "http://example.com/M?autoplay=true",
            "inLanguage": "en",
            "actionPlatform": [
              "http://schema.org/DesktopWebPlatform",
              "http://schema.org/MobileWebPlatform",
              "http://schema.org/AndroidPlatform",
              "http://schema.org/IOSPlatform",
              "http://schema.googleapis.com/GoogleVideoCast"
            ] …
  38. Bridging the markup gap with the Fact Check Markup Tool
      This Google tool supports direct entry of ClaimReview data, which then appears on dataCommons.org.
      ...
      "@type" : "DataFeedItem",
      "dateModified" : "2018-10-24T15:00:14.238315+00:00",
      "item" : [ {
        "@context" : "schema.org",
        "@type" : "ClaimReview",
        "author" : {
          "@type" : "Organization",
          "name" : "Sens3",
          "url" : "http://fct.sens3.com/"
        },
        "claimReviewed" : "I play the trumpet!",
        "datePublished" : "2018-10-09",
        "itemReviewed" : {
          "@type" : "Claim",
          "author" : {
            "@type" : "Person",
            "name" : "Paul McCartney"
          }
        },
        "reviewRating" : ...
  39. Bridging the markup gap with the Fact Check Markup Tool
      This Google tool supports direct entry of ClaimReview data, which then appears on dataCommons.org.
      ...
      "@type" : "DataFeedItem",
      "dateModified" : "2018-10-24T15:00:14.238315+00:00",
      "item" : [ {
        "@context" : "schema.org",
        "@type" : "ClaimReview",
        "author" : {
          "@type" : "Organization",
          "name" : "Sens3",
          "url" : "http://fct.sens3.com/"
        },
        "claimReviewed" : "I play the trumpet!",
        "datePublished" : "2018-10-09",
        "itemReviewed" : {
          "@type" : "Claim",
          "author" : {
            "@type" : "Person",
            "name" : "Paul McCartney"
          }
        },
        "reviewRating" : ...
      Rating detail:
        "@type": "Rating",
        "ratingValue": "2",
        "alternateName": "Mostly False",
        "bestRating": "5",
        "worstRating": "1"
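To show how a consumer of this feed might read the `reviewRating` fragment, the sketch below places the rating on its declared scale. The values are the ones on the slide; the `normalized_rating` helper and the idea of rescaling to the range 0 to 1 are assumptions for illustration, not a description of what the Fact Check Markup Tool or any search engine does.

```python
# Rating values as they appear on the slide (all strings in the feed).
review_rating = {
    "@type": "Rating",
    "ratingValue": "2",
    "alternateName": "Mostly False",
    "bestRating": "5",
    "worstRating": "1",
}


def normalized_rating(rating: dict) -> float:
    """Rescale ratingValue to 0..1 using the declared worst and best ratings."""
    value = float(rating["ratingValue"])
    best = float(rating["bestRating"])
    worst = float(rating["worstRating"])
    return (value - worst) / (best - worst)


score = normalized_rating(review_rating)
print(f'{review_rating["alternateName"]}: {score:.2f}')
```

With the slide's values, a rating of 2 on a scale from 1 to 5 sits a quarter of the way up, which is consistent with the human-readable label "Mostly False".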
  40. Questions of identity
      schema.org has established common ground on shared terminology: is it time to address identifiers?
      “Very early in the formation of schema.org we made a strong decision, which was not to support canonical IDs, and I think it was an important thing because it would have been very politically contentious at the time to support it, because we basically would have had to pick somebody's ID system to have canonical IDs.
      “I think the time has come for canonical IDs, so I would love to see schema.org or some other organization take on canonical IDs.” Steve Macbeth, Microsoft, Apr. 2018
  41. Thanks!
      Let’s keep the conversation going.
      <script type="application/ld+json">
      {
        "@context": "http://schema.org",
        "@type": "CommunicateAction",
        "agent": { "@type": "Person", "name": "Aaron" },
        "recipient": { "@type": "PeopleAudience", "name": "CDL2018 Attendees" },
        "object": "Stay in touch!"
      }
      </script>
      Twitter: @aaranged
      LinkedIn: linkedin.com/in/aaranged/
      Semantic Search Marketing: bit.ly/semsearch
