Semantic Web fundamentals
- information linking and layering in practice
(or: what RDF might do for Drupal)
(Vrije Universiteit Amsterdam)
- Where I’m coming from
- Layered & linked information
- Web History & Linking Open Data
- What’s out there now: data, apps, ecosystem
- Semantic Web standards - acronym attack:
- RDF/S, OWL, SPARQL, SKOS, RDFa
- POWDER, GRDDL, RDB2RDF...
- What’s in it for Drupal & its users?
RDFa
• “I don’t know what it is, but Dries likes it”
• a set of attributes for adding typed links
and properties to document markup
• a bit like microformats, but with different
strengths & weaknesses
• a way of putting data into Web pages
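A minimal sketch of what "putting data into Web pages" can look like, assuming the attributes in question are RDFa's `about`, `property` and `rel`. The page fragment, the person URI and the `TinyRDFaReader` class are made up for illustration, and the reader handles only a tiny subset of RDFa (it is not a conforming processor).

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the URIs and the name are invented.
PAGE = """
<div about="http://example.org/people/alice#me">
  <span property="foaf:name">Alice Example</span>
  <a rel="foaf:homepage" href="http://example.org/alice/">home page</a>
</div>
"""


class TinyRDFaReader(HTMLParser):
    """Collects (subject, predicate, object) tuples from a small RDFa subset.

    Handles `about`, `property` with text content, and `rel` with `href`.
    The subject is never reset on closing tags, which real RDFa would do.
    """

    def __init__(self):
        super().__init__()
        self.subject = None
        self.pending_property = None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "rel" in a and "href" in a:
            # A typed link: the link target is the object.
            self.triples.append((self.subject, a["rel"], a["href"]))
        if "property" in a:
            # The property value is the text that follows.
            self.pending_property = a["property"]

    def handle_data(self, data):
        if self.pending_property and data.strip():
            self.triples.append(
                (self.subject, self.pending_property, data.strip())
            )
            self.pending_property = None


def extract(html):
    reader = TinyRDFaReader()
    reader.feed(html)
    reader.close()
    return reader.triples


for triple in extract(PAGE):
    print(triple)
```

The same markup still renders as an ordinary page for people; the extra attributes are what a machine reads.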
Semantic Web & RDF
(De-fragmenting the Web of data?)
• Accidental standards-nerd
• Making things in the Web since ’94
• Found metadata community & W3C
through trying to make better Web sites
• W3C staff ’99-’05, edited RDFS, co-chaired
RDFCore work; SKOS, POWDER; FOAF...
• TV stuff at Joost & now VU University
Amsterdam, see www.NoTube.tv
Layers of data?
Can we also “layer” the kind of information
we find in ordinary Web pages & databases?
How do we select, compose, compare layers?
How do we figure out when two data sources
are talking about the same thing?
How can we model, create, publish, consume,
and query general-purpose layered data?
What links the layers?
• when we don’t have time & space indexes...
• Good old-fashioned HTML hyperlinks
• Data files that form a Web (Linked Data)
• Two documents mentioning same thing
• Two identifiers referring to the same thing
• Typed links between things (including docs)
Where the Web came from...
Original image credits: Tim Berners-Lee
“The Semantic Web isn't just about putting data on
the web. It is about making links, so that a person or
machine can explore the web of data.
With linked data, when you have some of it, you
can find other, related, data.”
“Use URIs as names for things.
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards.
Include links to other URIs, so that they can discover more things.”
From linked documents to databases & websites...
Aside: the value of linking
• GMail - labelling messages or conversations
• W3C POWDER - label pages or collections?
• Google Social Graph API - data linking
• Biz/ed Web site - my first webmaster job
WHAT’S OUT THERE?
RDF DATA - lots
From life-science data and all the BBC’s TV and music
listings to FOAF profiles, agricultural research, views
of bestbuy.com for machines, and an ever-growing
collection of taxonomies and thesauri.
Thesauri and taxonomies?
- formal, logical taxonomies - OWL, Web Ontology Language
- library-style thesauri, looser topical concept hierarchies - SKOS
Library of Congress
Subject Headings (LCSH)
“The Library of Congress Authorities and Vocabularies service enables both humans
and machines to programmatically access authority data at the Library of Congress.
This service is influenced by -- and implements -- the Linked Data movement's
approach of exposing and inter-connecting data on the Web via dereferenceable URIs.”
One XHTML/RDFa page per topic,
LCSH topics for Paris
More linked data on Paris...
In HTML at http://www.geonames.org/2988507/paris.html
<link rel="alternate" type="application/rdf+xml"
title="RDF Version" href="http://sws.geonames.org/2988507/about.rdf" />
In RDF/XML at http://sws.geonames.org/2988507/about.rdf
<alternateName xml:lang="ja"> </alternateName>
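A small sketch of how a client might find the RDF version of a page from the `<link rel="alternate">` element shown above. The markup is the one from the slide; the `AlternateLinkFinder` class and `find_rdf_alternates` function are names invented here, and nothing is fetched over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# The <link> element as shown on the GeoNames page for Paris.
HEAD = """<head>
<link rel="alternate" type="application/rdf+xml"
title="RDF Version" href="http://sws.geonames.org/2988507/about.rdf" />
</head>"""


class AlternateLinkFinder(HTMLParser):
    """Collects hrefs of <link rel="alternate" type="application/rdf+xml">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        is_rdf = a.get("type") == "application/rdf+xml"
        if "alternate" in rels and is_rdf and a.get("href"):
            # Resolve relative hrefs against the page URL.
            self.rdf_links.append(urljoin(self.base_url, a["href"]))


def find_rdf_alternates(html, page_url):
    finder = AlternateLinkFinder(page_url)
    finder.feed(html)
    finder.close()
    return finder.rdf_links


print(find_rdf_alternates(HEAD, "http://www.geonames.org/2988507/paris.html"))
```

The HTML page stays the human view; the advertised RDF/XML file is the layer a program would go on to load.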
Data: In Summary...
More every day.
All use RDF’s data model and vocabularies.
Some write RDF in RDF/XML, others XHTML/RDFa.
Varying focus, detail, quality, linkage.
Serious Library RDF data in Dublin Core and SKOS.
FOAF visualization by Tim O’Brien
Yahoo’s BOSS (Build your Own Search Service):
“By combining your unique assets and ideas with
our search technology assets, BOSS is a platform
for the next generation of search innovation, serving
hundreds of millions of users across the Web.”
Google Rich Snippets: RDFa & Microformats
FILES & WORKFLOW
Since 2001, RDF/XML inside TIFF, JPEG, PNG, GIF, PDF, ...
OpenOffice.org & OASIS OpenDocument Format 1.2:
Multi-file archives, with a manifest.rdf and inline RDFa (coming soon...)
OpenOffice.org reports 2.5 million downloads per week and more than 100 million users.
RDF in Drupal
DC - Dublin Core Metadata Initiative
DC & Education
DC & Museums
DC & Libraries...
& the “cold start” taxonomy problem
SKOS in Drupal
“OrganiK extends the Taxonomy XML module by adding SKOS support. Load...
This allows you to import/export taxonomies using SKOS.”
"OrganiK is a partially EU-funded Research project for the benefit of SMEs."
see also: http://drupal.org/node/560326
Drupal Fields API
• Attach arbitrary properties to anything in
• RDF does the same for the Web at large
• Natural to bridge these two
Drupal & Taxonomies
• Many existing add-ons, patches, applications
• SKOS (a W3C standard since August :) will
bring dozens of thesauri, glossaries etc.
• News sites can be pre-populated with
externally managed categories and topics
• Using standard identifiers for topics aids
Drupal & Data APIs
• SPARQL is to RDF as SQL is to RDBMSs
• SPARQL scripting looks very familiar to
anyone working with SQL
• Drupal code can call external SPARQL DBs
• SPARQL queries can include “GRAPH”
clauses which specify chosen data layers...
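A rough sketch of the points above: a SPARQL query whose GRAPH clause picks one data layer, packaged as the HTTP GET request a client would send to an external SPARQL service. Drupal itself would do this in PHP; Python is used here only to show the shape. The endpoint URL and the graph URI are hypothetical, while the `query` parameter name comes from the SPARQL Protocol.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and named graph (the chosen "layer").
ENDPOINT = "http://sparql.example.org/query"
LAYER = "http://data.example.org/graphs/library-subjects"


def build_query(graph_uri, limit=10):
    """Return a SELECT query restricted to one named graph."""
    return f"""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE {{
  GRAPH <{graph_uri}> {{
    ?concept skos:prefLabel ?label .
  }}
}}
LIMIT {limit}
""".strip()


def build_request_url(endpoint, query):
    """SPARQL Protocol: the query text travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query(LAYER)
print(query)
print(build_request_url(ENDPOINT, query))
```

Anyone who has written `SELECT ... WHERE ... LIMIT` in SQL should find the query readable; the GRAPH clause is the part with no direct SQL counterpart.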
• the “lowest hanging fruit”
• data need no longer be hidden in the db
• SEO (Yahoo SearchMonkey, Google Rich
Snippets), cross-linking and improved
• A foundation to build on
• RDF is a data model for linked/layered data
• “Links” are made with URIs
• “Layers” represent independent data sources
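One way to picture this, as a sketch only: each layer is a set of (subject, property, value) triples from one source, and the shared URI is what links them. The two source URLs and the place and person URIs are invented; the FOAF and RDFS property URIs are the real ones.

```python
# Real vocabulary terms.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"

# Made-up things and sources.
PARIS = "http://places.example.org/paris"
ALICE = "http://people.example.org/alice#me"
PLACES_SOURCE = "http://places.example.org/data"
PEOPLE_SOURCE = "http://people.example.org/alice.rdf"

# Each layer: the triples published by one independent source.
layers = {
    PLACES_SOURCE: {(PARIS, RDFS_LABEL, "Paris")},
    PEOPLE_SOURCE: {
        (ALICE, FOAF_NAME, "Alice"),
        (ALICE, FOAF_BASED_NEAR, PARIS),  # link into the other layer
    },
}


def merge(layers, chosen=None):
    """Union the triples of the chosen layers (all layers by default)."""
    names = layers.keys() if chosen is None else chosen
    merged = set()
    for name in names:
        merged |= layers[name]
    return merged


def people_near(label, triples):
    """Who is based near a place carrying this label?"""
    places = {s for (s, p, o) in triples if p == RDFS_LABEL and o == label}
    return sorted(
        s for (s, p, o) in triples if p == FOAF_BASED_NEAR and o in places
    )


print(people_near("Paris", merge(layers)))                   # both layers
print(people_near("Paris", merge(layers, [PEOPLE_SOURCE])))  # one layer
```

The question can only be answered once both layers are merged, because the label lives in one source and the link in the other.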
• Types of thing (classes) and relationship
(property) are defined using RDFS and OWL
• So there is an RDFS or OWL definition for all
the terms defined by Dublin Core, FOAF etc.
• Basically a simple machine-readable dictionary
• RDF/XML is the oldest RDF syntax
• RDFa is new, integrates with XHTML
• GRDDL lets you map from other XML or
• You can also map from SQL, JSON, CSV ...
SPARQL
• a language for asking questions
• looks a lot like SQL, but simpler and webby
• comes with XML and JSON Web service API
• Working Group is working on an update
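For the JSON side of that Web service API, a short sketch of reading a SPARQL JSON results document. The `head`/`vars` and `results`/`bindings` layout follows the SPARQL JSON results format; the sample data and the `rows` helper are invented for illustration.

```python
import json

# Invented sample response in the SPARQL JSON results layout.
SAMPLE = """
{
  "head": {"vars": ["concept", "label"]},
  "results": {"bindings": [
    {"concept": {"type": "uri",
                 "value": "http://data.example.org/concepts/paris"},
     "label": {"type": "literal", "xml:lang": "en", "value": "Paris"}}
  ]}
}
"""


def rows(document):
    """Flatten each binding into a plain {variable: value} dict."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    return [
        # A variable may be unbound in a given row, so check first.
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in data["results"]["bindings"]
    ]


for row in rows(SAMPLE):
    print(row["concept"], row["label"])
```

This drops the type and language information for brevity; a real client would usually keep it.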
POWDER
• lets you say “everything on
mp3.myserver.example.com/ is a
CC-SA-licensed MP3 music file”
• Or that “everything under w3.org/People/
is published by W3C”
• Or that everything on images.playboy.com
is an image that might not be child-suitable
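The idea of describing whole groups of URLs at once can be sketched as follows. This is not POWDER's actual XML syntax or its formal semantics, just an illustration of labelling by host and path prefix; the rule format, the `describe` function and the assumption that a host rule also covers subdomains are all choices made for this sketch.

```python
from urllib.parse import urlsplit

# Illustrative rules built from the first two examples above.
RULES = [
    {
        "host": "mp3.myserver.example.com",
        "path_prefix": "/",
        "description": {"format": "MP3 music file", "license": "CC-SA"},
    },
    {
        "host": "w3.org",
        "path_prefix": "/People/",
        "description": {"publisher": "W3C"},
    },
]


def describe(url, rules=RULES):
    """Return the combined descriptions of every rule that covers the URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    found = {}
    for rule in rules:
        # Assumption: a rule for "w3.org" also covers "www.w3.org".
        host_ok = host == rule["host"] or host.endswith("." + rule["host"])
        if host_ok and path.startswith(rule["path_prefix"]):
            found.update(rule["description"])
    return found


print(describe("http://www.w3.org/People/Berners-Lee/"))
print(describe("http://mp3.myserver.example.com/albums/track1.mp3"))
print(describe("http://www.w3.org/TR/"))
```

One rule stands in for a statement about every page it covers, which is the point of labelling collections rather than individual pages.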
SKOS
• Simple Knowledge Organization System
• An RDF-based way of encoding thesauri
• Or other things that are thesaurus-like
• Like site hierarchy, IMAP folders, tags
• bringing lots of library data into the Web
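A thesaurus-like structure of this kind can be sketched as concepts with a preferred label and links to broader concepts, mirroring SKOS's `prefLabel` and `broader` properties. The concept URIs and the `ancestors` helper are made up; a Drupal taxonomy with parent terms has much the same shape.

```python
# Invented concept scheme: URI -> preferred label and broader concepts.
BASE = "http://vocab.example.org/c/"
CONCEPTS = {
    BASE + "europe": {"prefLabel": "Europe", "broader": []},
    BASE + "france": {"prefLabel": "France", "broader": [BASE + "europe"]},
    BASE + "paris": {"prefLabel": "Paris", "broader": [BASE + "france"]},
}


def ancestors(uri, concepts=CONCEPTS):
    """Labels of all broader concepts, nearest first."""
    seen = []
    queue = list(concepts[uri]["broader"])
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # guard against loops in the hierarchy
        seen.append(current)
        queue.extend(concepts[current]["broader"])
    return [concepts[c]["prefLabel"] for c in seen]


print(ancestors(BASE + "paris"))
```

Because every concept has its own URI, two sites that import the same thesaurus end up tagging content with the same identifiers.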
There’s a growing ecosystem around linked RDF data.
Drupal can expect to manage files containing it,
link to sites carrying it, and be populated with
taxonomies expressed in it.