semantic markup using schema.org

A basic intro to microdata and schema.org, along with a new schema.org extension for datasets and data catalogs. "TWed" talk April 4, 2012.

semantic markup using schema.org: Presentation Transcript

  • Joshua Shinavier, Wednesday Nights in the Tetherless World (TWed), April 4th, 2012
  • Outline
      • rich snippets
          - microformats
          - RDFa
          - microdata
      • microdata syntax
      • schema.org
          - deployment
          - mappings, tools, extensions
      • the Dataset extension
  • the three syntaxes
      • there are several solutions for embedding semantic data in Web pages
      • three syntaxes are known (by Google) as “rich snippets”
          - microformats
          - RDFa
          - HTML microdata
      • all three are supported by Google, but microdata is the “recommended” syntax
  • First came microformats
      • microformats emerged around 2005
      • some key principles
          - start by solving simple, specific problems
          - design for humans first, machines second
      • wide deployment
          - used on billions of Web pages
          - usage share was at 94% vis-a-vis competing formats (before microdata, anyway)
      • formats exist for marking up Atom feeds, calendars, addresses and contact info, geo-location, multimedia, news, products, recipes, reviews, resumes, social relationships, etc.
  • microformats example
      <div class="vcard">
        <a class="fn org url" href="http://www.commerce.net/">CommerceNet</a>
        <div class="adr">
          <span class="type">Work</span>:
          <div class="street-address">169 University Avenue</div>
          <span class="locality">Palo Alto</span>,
          <abbr class="region" title="California">CA</abbr>&nbsp;&nbsp;
          <span class="postal-code">94301</span>
          <div class="country-name">USA</div>
        </div>
        <div class="tel">
          <span class="type">Work</span> +1-650-289-4040
        </div>
        <div>Email: <span class="email">info@commerce.net</span>
        </div>
      </div>
  • then came RDFa
      • RDFa aims to bridge the gap between human-oriented HTML and machine-oriented RDF documents
      • provides XHTML attributes to indicate machine-understandable information
      • uses the RDF data model and Semantic Web vocabularies directly
  • RDFa example
      <div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/">
        <p property="foaf:name"> Alice Birpemswick </p>
        <p> Email: <a rel="foaf:mbox" href="mailto:alice@example.com">alice@example.com</a> </p>
        <p> Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </p>
      </div>
  • last but not least, microdata
      • microdata syntax is based on nested groups of name-value pairs
      • HTML microdata specification includes
          - an unambiguous parsing model
          - an algorithm to convert microdata to RDF
      • compatible with the Semantic Web via mappings
  • microdata properties
      • annotate an item with text-valued properties using the “itemprop” attribute
      <div itemscope>
        <p>My name is <span itemprop="name">Daniel</span>.</p>
      </div>
  • multiple values are OK
      • as in RDF, an item (subject) can have several properties with the same name (predicate) and different values (objects)
      <div itemscope>
        <p>Flavors in my favorite ice cream:</p>
        <ul>
          <li itemprop="flavor">Lemon sorbet</li>
          <li itemprop="flavor">Apricot sorbet</li>
        </ul>
      </div>
  • item types
      • these correspond to classes in RDF
      <section itemscope itemtype="http://example.org/animals#cat">
        <h1 itemprop="name">Hedral</h1>
        <p itemprop="desc">Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly.</p>
        <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
      </section>
  • global IDs
      • items may be given global identifiers, which are URLs
      • they may be, but do not need to be, Semantic Web URIs
      <dl itemscope itemtype="http://vocab.example.net/book" itemid="urn:isbn:0-330-34032-8">
        <dt>Title <dd itemprop="title">The Reality Dysfunction
        <dt>Author <dd itemprop="author">Peter F. Hamilton
        <dt>Publication date <dd><time itemprop="pubdate" datetime="1996-01-26">26 January 1996</time>
      </dl>
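The microdata slides above describe items as nested groups of name-value pairs. As a rough illustration, here is a minimal Python sketch of an extractor built on the standard library's `html.parser`. It is not the parsing algorithm from the HTML microdata specification: the class name `MicrodataSketch`, the function `extract_items`, and the output shape (`type`, `id`, `properties`) are assumptions made for this example. It ignores `itemref`, does not resolve relative URLs, and assumes end tags are written out.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose property value is read from a URL attribute, not from text.
URL_ATTRS = {"a": "href", "area": "href", "link": "href", "img": "src",
             "audio": "src", "video": "src", "embed": "src",
             "iframe": "src", "source": "src", "track": "src",
             "object": "data"}


class MicrodataSketch(HTMLParser):
    """Collects top-level microdata items as nested dicts (simplified)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []   # top-level items, in document order
        self.stack = []   # one frame per currently open element

    def _enclosing_item(self):
        # Innermost open element that carries itemscope, if any.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        owner = self._enclosing_item()
        item = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"),
                    "id": attrs.get("itemid"),
                    "properties": {}}
        names = (attrs.get("itemprop") or "").split()
        frame = {"tag": tag, "item": item, "text_for": [], "text": []}
        if names and owner is not None:
            for name in names:
                values = owner["properties"].setdefault(name, [])
                if item is not None:
                    values.append(item)                      # nested item
                elif tag == "meta":
                    values.append(attrs.get("content") or "")
                elif tag in URL_ATTRS:
                    values.append(attrs.get(URL_ATTRS[tag]) or "")
                elif tag == "time" and attrs.get("datetime"):
                    values.append(attrs["datetime"])
                else:
                    frame["text_for"].append(values)         # filled at end tag
        elif item is not None:
            self.items.append(item)                          # top-level item
        if tag not in VOID_TAGS:
            self.stack.append(frame)

    def handle_data(self, data):
        # Text belongs to every open element that is waiting for a text value.
        for frame in self.stack:
            if frame["text_for"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if not any(frame["tag"] == tag for frame in self.stack):
            return  # stray end tag (or a self-closed void element)
        while self.stack:
            frame = self.stack.pop()
            text = " ".join("".join(frame["text"]).split())
            for values in frame["text_for"]:
                values.append(text)
            if frame["tag"] == tag:
                break


def extract_items(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<section itemscope itemtype="http://example.org/animals#cat">
  <h1 itemprop="name">Hedral</h1>
  <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>
<div itemscope>
  <ul>
    <li itemprop="flavor">Lemon sorbet</li>
    <li itemprop="flavor">Apricot sorbet</li>
  </ul>
</div>
"""
items = extract_items(SAMPLE)
```

With the sample input, `items` holds two top-level items: a typed one with `name` and `img` properties, and an untyped one whose `flavor` property has two values. Storing every property as a list keeps the "multiple values are OK" case uniform with the single-value case.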
  • the schema.org vocabulary
      • schema.org is one of a number of microdata vocabularies
      • it is a shared collection of microdata schemas for use by webmasters
      • includes a type hierarchy, like an RDFS schema
          - starts with top-level Thing and DataType types
          - properties are inherited by descendant types
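To make "properties are inherited by descendant types" concrete, here is a small Python sketch over a toy hierarchy. The three types and the properties listed for each are an illustrative subset chosen for this example, and the placement of each property on a type is an assumption, not a copy of the schema.org definitions. The names `SCHEMA`, `lineage`, and `all_properties` are hypothetical.

```python
# Toy subset of a schema.org-style hierarchy. Each type names its parent and
# the properties it declares itself (assumed placement, for illustration only).
SCHEMA = {
    "Thing": {
        "parent": None,
        "properties": ["name", "description", "url", "image"],
    },
    "CreativeWork": {
        "parent": "Thing",
        "properties": ["author", "publisher"],
    },
    "Dataset": {
        "parent": "CreativeWork",
        "properties": ["catalog", "distribution", "spatial"],
    },
}


def lineage(type_name):
    """Return the chain of types from the root down to type_name."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = SCHEMA[type_name]["parent"]
    return list(reversed(chain))


def all_properties(type_name):
    """Return declared plus inherited properties, ancestors first."""
    props = []
    for ancestor in lineage(type_name):
        props.extend(SCHEMA[ancestor]["properties"])
    return props


dataset_lineage = lineage("Dataset")
dataset_properties = all_properties("Dataset")
```

In this toy model a Dataset can carry `name` (declared on Thing) and `publisher` (declared on CreativeWork) without redeclaring them, which is the behavior the slide describes.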
  • Why should you use schema.org? There are several reasons.
  • current schema.org types (there are around 300 of them)
  • In terms of deployment... a few key types stand out.
  • Top types
      type              occurrences  relative
      Product               5001966  0.27689260175
      PostalAddress         1437388  0.07956913403
      WebPage               1402426  0.07763375119
      Offer                 1267545  0.07016717684
      Book                  1111463  0.06152698395
      Person                 968737  0.05362613587
      AggregateRating        780967  0.04323179816
      GeoCoordinates         546586  0.03025722678
      LocalBusiness          544662  0.03015072039
      Article                525487  0.02908925463
      Place                  490433  0.02714877897
      Residence              451652  0.02500198869
      ItemPage               421911  0.02335562347
      Organization           405876  0.02246797792
      Blog                   268582  0.01486782772
  • Who’s using it? Over 1,000 domains found (through Sindice)
  • Some early adopters
      domain                      occurrences  relative
      www.couponcabin.com                3662  0.04400596
      www.digifotopro.nl                 2852  0.034272255
      www.weg.de                         2336  0.028071525
      futpedia.globo.com                 2003  0.02406989
      www.the-plug.com                   2001  0.024045857
      www.virtualtourist.com             1953  0.023469044
      gdgt.com                           1857  0.02231542
      www.notasdeprensa.es               1564  0.018794463
      www.libreriadelsanto.it            1294  0.015549894
      liriklaguindonesia.net             1274  0.015309556
      www.direct2florist.com             1080  0.012978273
      www.bluefountainmedia.com          1065  0.01279802
      www.alphabetsigns.com              1059  0.012725918
      www.tasit.com                      1004  0.012064988
      www.teachstreet.com                1001  0.012028937
  • schema.rdfs.org
      • maintains schema.org ↔ RDF mappings
          - there are mappings for BIBO, DBpedia, Dublin Core, FOAF, GoodRelations, SIOC, and WordNet
      • also provides examples, tutorials, and data dumps
      See: http://schema.rdfs.org/mappings.html
  • schema.org tools
      • Google’s Rich Snippets Testing Tool
      • schema.org libraries are available in Java, JavaScript, Perl, PHP, Python, and Ruby
      • there are schema.org modules for Drupal, Joomla!, WordPress, and Virtuoso
      • online tools include microdata extractors, generators and validators
      • sindice.com supports microdata
      See: http://schema.rdfs.org/tools.html
  • schema.org extensions
      • there are dozens of schema.org community proposals
          - they extend existing schema.org vocabulary
      • several have already been accepted into schema.org, incl.
          - Job Postings
          - IPTC/rNews integration
          - User Comments
      • others: Comics, Learning Resources, TV and Radio, Software Application, etc.
  • motivation: open government data
  • the Dataset vocabulary: types
      • DataCatalog
          - a collection of datasets
          - e.g. the International Open Government Data catalog
      • Dataset
          - an individual, abstract data set
          - e.g. a data set about seismic hazard zones near San Francisco
      • DataDownload
          - a dataset in downloadable form
          - e.g. an RDF/XML dump of the seismic hazard zones data set
  • the Dataset vocabulary: properties
      • catalog
          - the catalog containing a dataset
      • dataset
          - a dataset contained in a catalog
      • distribution
          - a data download for a dataset
      • keyword
          - the topic of a dataset
      • spatial
          - the spatial extent of a data set (e.g. United States)
  • Dataset extension RDF
      • the Dataset extension maps to a subset of the Data Catalog Vocabulary (DCAT)
      • many other types and properties are inherited from schema.org
      • collectively, they cover
          - around 2/3 of DCAT, and
          - around half of the Asset Description Metadata Schema (ADMS)
  • Dataset example (microdata)
      <div itemscope="itemscope" itemid="http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/datasf.org/version/2011-Jun-07/thing_89" itemtype="http://schema.org/Dataset">
        <a href="http://www.datasf.org/story.php?title=seismic-hazard-zones-"><span itemprop="name"> <b>Seismic Hazard Zones</b> </span></a>
        <div><meta itemprop="url" content="http://www.datasf.org/story.php?title=seismic-hazard-zones-"/>
          <span itemprop="description">The dataset represents the Liquefaction and Landslide Zones [...]</span></div>
        <div><i>Country:</i> <a href="http://dbpedia.org/resource/United_States"><span itemprop="spatial" itemscope="itemscope" itemtype="http://schema.org/Country">
          <span itemprop="name">United States</span> </span> </a></div>
        <div><i>Publisher:</i> <span itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization">
          <span itemprop="name">Department of Technology</span> </span> </div>
      </div>
  • Dataset example (RDFa)
      <div about="http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/datasf.org/version/2011-Jun-07/thing_89" typeof="dcat:Dataset">
        <div><b><a href="http://www.datasf.org/story.php?title=seismic-hazard-zones-">
          <span property="dcterms:title">Seismic Hazard Zones</span> </a></b></div>
        <div property="dcterms:description">The dataset represents the Liquefaction and Landslide Zones [...]</div>
        <div rel="dcterms:spatial" resource="http://dbpedia.org/resource/United_States"><i>Country:</i>
          <a href="http://dbpedia.org/resource/United_States">
            <span about="http://dbpedia.org/resource/United_States" typeof="adms:Country">
              <span property="dcterms:title">United States</span> </span> </a> </div>
        <div rel="dcterms:publisher"><i>Publisher:</i>
          <span typeof="foaf:Organization">
            <span property="dcterms:title">Department of Technology</span> </span> </div>
      </div>
  • Google extracts this data
      Item
        Type: http://schema.org/dataset
        name = Seismic Hazard Zones
        url = http://www.datasf.org/story.php?title=seismic-hazard-zones-
        description = The dataset represents the Liquefaction and Landslide Zones [...]
        spatial = Item( 1 )
        publisher = Item( 2 )
      Item 1
        Type: http://schema.org/country
        name = United States
      Item 2
        Type: http://schema.org/organization
        name = Department of Technology
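The microdata and RDFa versions of the Dataset example describe the same resource with different terms. The Python sketch below reads a term mapping off those two examples (for instance `name` to `dcterms:title`) and applies it to an extracted item to produce RDF-style triples. It is an illustration only: it is neither the microdata-to-RDF algorithm from the HTML microdata specification nor the published schema.rdfs.org mappings, and the item dict shape, `TYPE_MAP`, `PROPERTY_MAP`, and `to_triples` are assumptions made for this example.

```python
from itertools import count

# Term pairs read off the two Dataset examples (microdata vs. RDFa).
TYPE_MAP = {
    "http://schema.org/Dataset": "dcat:Dataset",
    "http://schema.org/Country": "adms:Country",
    "http://schema.org/Organization": "foaf:Organization",
}
PROPERTY_MAP = {
    "name": "dcterms:title",
    "description": "dcterms:description",
    "spatial": "dcterms:spatial",
    "publisher": "dcterms:publisher",
}


def to_triples(item, bnodes=None):
    """Return (subject, triples) for a nested item dict.

    Items without an id get a blank node label such as "_:b0".
    URIs and literals are not distinguished in this sketch.
    """
    bnodes = bnodes if bnodes is not None else count()
    subject = item.get("id") or f"_:b{next(bnodes)}"
    triples = []
    rdf_type = TYPE_MAP.get(item.get("type"))
    if rdf_type:
        triples.append((subject, "rdf:type", rdf_type))
    for name, values in item["properties"].items():
        predicate = PROPERTY_MAP.get(name)
        if predicate is None:
            continue  # no counterpart in the RDFa example; skipped here
        for value in values:
            if isinstance(value, dict):
                child, child_triples = to_triples(value, bnodes)
                triples.append((subject, predicate, child))
                triples.extend(child_triples)
            else:
                triples.append((subject, predicate, value))
    return subject, triples


# Hand-written item mirroring the microdata example (hypothetical dict shape).
dataset = {
    "type": "http://schema.org/Dataset",
    "id": "http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/"
          "datasf.org/version/2011-Jun-07/thing_89",
    "properties": {
        "name": ["Seismic Hazard Zones"],
        "url": ["http://www.datasf.org/story.php?title=seismic-hazard-zones-"],
        "spatial": [{"type": "http://schema.org/Country", "id": None,
                     "properties": {"name": ["United States"]}}],
        "publisher": [{"type": "http://schema.org/Organization", "id": None,
                       "properties": {"name": ["Department of Technology"]}}],
    },
}
subject, triples = to_triples(dataset)
```

One visible difference from the RDFa example: the country there is identified by its DBpedia URI, while the microdata version gives it no `itemid`, so this sketch assigns it a blank node.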
  • Resources
      • HTML microdata
          - http://www.w3.org/TR/microdata
      • Schema.RDFS.org
          - http://schema.rdfs.org
      • W3C Web Schemas group (public-vocabs@w3.org)
          - http://lists.w3.org/Archives/Public/public-vocabs
      • The Dataset proposal
          - http://www.w3.org/wiki/WebSchemas/Datasets
      • Rich Snippets Testing Tool
          - http://google.com/webmasters/tools/richsnippets
  • Credits
      • word clouds by
          - http://wordle.net
      • deployment statistics discovered using Sindice and Sindice4j
          - http://sindice.com
          - http://sindice4j.googlecode.com
  • Thanks!
      • Tetherless World Constellation
          - http://tw.rpi.edu
      • Contact:
          - josh@fortytwo.net, @joshsh