2. Overview
• historical origins of the Semantic Web initiative
• example of SPARQL querying ‘Linked Data’
• some conclusions and suggestions
A brief introduction to Semantic Web data sharing,
focussing on underlying principles.
Tuesday, 2 November 2010
13. Part 2: SemWeb today
• lessons: no global consistency; Web pages that
make claims; intertwingularity...
• what does this mean for modern RDF tools?
• how can we share and link data on the Web, in
practice?
14. over 24.7 billion triples
over 436 million links between datasets
18. Linked Data guidelines
• 1. Use URIs as names for things (eg. schools!)
• 2. Use HTTP URIs to allow people to get info.
• 3. Publish useful info there (eg. using RDF).
• 4. Include links to other URIs in your data.
see: http://www.w3.org/DesignIssues/LinkedData.html
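Guideline 4 can be checked from the consumer side. A minimal sketch, assuming a SPARQL endpoint that holds triples about the district URI used in the example query: list the other URIs this resource links to. What comes back depends entirely on what that endpoint stores.

```sparql
# Guideline 4 in practice: which other URIs does this resource link to?
# Assumes the endpoint holds triples whose subject is this district URI.
SELECT ?property ?target WHERE {
  <http://statistics.data.gov.uk/id/local-authority-district/00HA> ?property ?target .
  # Keep only IRI objects (links), not literal values.
  FILTER isIRI(?target)
}
```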
19. RDF/SPARQL example
“Q: Which schools in the BANES area have a nursery?”
prefix sch-ont: <http://education.data.gov.uk/def/school/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name WHERE {
  ?school a sch-ont:School;
    sch-ont:establishmentName ?name;
    sch-ont:districtAdministrative
      <http://statistics.data.gov.uk/id/local-authority-district/00HA> ;
    sch-ont:nurseryProvision "true"^^xsd:boolean
}
ORDER BY ?name
examples by Leigh Dodds, Talis: http://blogs.talis.com/n2/archives/818
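A variant of the query above, sketched with the same properties: list every school in the district and show the nursery flag only where the data records one. It illustrates that a missing triple is not the same as "false"; the extra ?school column is added here so results carry identifiers, not just names.

```sparql
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>

# Every school in the district, with its nursery flag where recorded.
SELECT ?school ?name ?nursery WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:districtAdministrative
            <http://statistics.data.gov.uk/id/local-authority-district/00HA> .
  # OPTIONAL keeps schools with no nurseryProvision triple;
  # for those rows ?nursery is left unbound rather than "false".
  OPTIONAL { ?school sch-ont:nurseryProvision ?nursery }
}
ORDER BY ?name
```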
21. Answer:
Fosse Way School, Fosseway Infant School, Keynsham Primary
School, King Edward's School, Midsomer Norton Primary School,
Monkton Prep School, Peasedown St John Primary School, Royal
High School, Southdown Community Infant School, St Andrew's
CofE Primary School, St Keyna Primary School, St Martin's
Garden Primary School, St Saviour's CofE Infant School, The
Paragon School, Junior School of Prior Park College, Trinity CofE
VC Primary, Twerton Infant School...
(according to the SPARQL RDF database at
http://services.data.gov.uk/education/sparql)
23. More SPARQL-able queries from UK Linked Data:
• Select the name, lowest and highest ages,
capacity and pupil:teacher ratio for all schools
in the Bath & North East Somerset district.
• What is the URI, name, and opening date
of the oldest school in the UK?
• Select the name, easting and northing for
the 100 newest schools in the UK.
• Select the URI, name, and reason for closing for all
schools that are currently scheduled for closure. The reason
is a URI from a controlled vocabulary in the ontology.
• In which parliamentary constituencies did schools open in 2008?
examples by Leigh Dodds, Talis: http://blogs.talis.com/n2/archives/818
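A sketch of the "oldest school" question, reusing the prefix from the nursery query. The opening-date property name sch-ont:openDate is an assumption, not taken from these slides; check the ontology for the real term.

```sparql
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>

# Assumption: sch-ont:openDate is a hypothetical name for the
# opening-date property.
SELECT ?school ?name ?opened WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:openDate ?opened .
}
ORDER BY ?opened
LIMIT 1
```

Sorting is only chronological if the dates are typed literals (e.g. xsd:date), and LIMIT 1 breaks ties arbitrarily. The "100 newest schools" question follows the same pattern with ORDER BY DESC(?opened) LIMIT 100.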
24. Lessons from part 1
• no global consistency: RDF and SPARQL
allow for contradictory, competing data
• semantics: RDF/XML, RDFa and GRDDL give
several ways to get RDF statements from a
document; there are several publishing models
for RDF on your Web site.
• intertwingularity: “the interconnectedness
of all things” as an engineering problem...
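"No global consistency" can be made visible with a query. A minimal sketch using SPARQL 1.1 aggregates (standardised after this talk, in 2013): find resources that carry more than one establishmentName. RDF stores every value; nothing forces them to agree.

```sparql
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>

# Resources with more than one distinct establishmentName value.
SELECT ?school (COUNT(DISTINCT ?name) AS ?nameCount) WHERE {
  ?school sch-ont:establishmentName ?name .
}
GROUP BY ?school
HAVING (COUNT(DISTINCT ?name) > 1)
```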
25. ‘Scope creep’
• “intertwingularity” is a silly name for a
serious problem: scope creep
• Schema designers are under constant
pressure to change, add, improve their
designs. Problems are not tidily packaged.
• RDF is built to survive this: independent
schemas and datasets can be freely mixed
together, without always ‘asking permission’.
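Mixing independent schemas, sketched: one query joins the schools vocabulary with FOAF. The assumption here is hypothetical: that someone else has published foaf:homepage triples about the same school URIs. Neither schema had to change for the join to work.

```sparql
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>

SELECT ?name ?homepage WHERE {
  # From the schools dataset.
  ?school sch-ont:establishmentName ?name .
  # Hypothetical: homepage triples from a separately maintained
  # dataset that uses FOAF for the same school URIs.
  ?school foaf:homepage ?homepage .
}
```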
26. In practice
• Each school could have an HTML/RDFa
page (or RDF/XML too)
• Datasets that distinguish institution from
location might publish RDF that models them
separately; others that flatten these aspects
together can publish their data as it stands.
• Cross-dataset consistency comes later, if at
all.
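A consumer can still query both modelling styles at once. A sketch, assuming coordinates use the W3C Basic Geo vocabulary and that the "institution has a site" link is a hypothetical ex:site property (a placeholder, not a real vocabulary term):

```sparql
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX ex:  <http://example.org/hypothetical#>

SELECT ?school ?lat ?long WHERE {
  {
    # Style 1: institution and location are separate resources.
    ?school ex:site ?site .
    ?site geo:lat ?lat ;
          geo:long ?long .
  }
  UNION
  {
    # Style 2: coordinates attached directly to the school.
    ?school geo:lat ?lat ;
            geo:long ?long .
  }
}
```

If one school appears in both styles it will show up twice; SELECT DISTINCT removes exact duplicates.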
27. Problems don't come nicely scoped and packaged into cleanly distinct
domains. Whenever you try to solve one problem, it borders on a dozen
others that are a higher priority for people elsewhere.
You think you're working with 'events' data but find yourself with
information describing musicians; you think you're describing musicians,
but find yourself describing digital images; you think you're describing
digital images, but find yourself describing geographic locations; you
think you're building a database of geographic locations, and find
yourself modeling the opening hours of the businesses based at those
locations.
To a poet or idealist, these interconnections might be beautiful or
inspiring; to a project manager or product manager, they are as likely to
be terrifying.
By dropping in identifiers that link to a big pile of other people's
data, we can hopefully make it easier to keep projects nicely scoped
without needlessly restricting future functionality.
An events database can remain an events database, but use identifiers
for artists and performers, making it possible to filter events by
properties of those participants. A database of places can be only a
link or two away from records describing the opening hours or business
offerings of the things at those places.
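The events example above, sketched as a query: the events data only stores artist identifiers, and the filter uses a fact about the artist held elsewhere. All ex: terms are hypothetical placeholders.

```sparql
# All ex: terms are hypothetical placeholders.
PREFIX ex: <http://example.org/hypothetical#>

SELECT ?event ?artist WHERE {
  ?event a ex:Event ;            # from the events database
         ex:performer ?artist .  # an artist URI minted elsewhere
  ?artist ex:genre ex:Jazz .     # a fact held in someone else's dataset
}
```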
28. “Pay as you go” integration
• there is no single “right” ontology
• data can be mixed and merged ad hoc
• relations like owl:sameAs, skos:closeMatch
can be used to interlink datasets later
• common models emerge from the bottom up:
“pave the cowpaths...”
*
* analogy by Richard Cyganiak
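Interlinking "later" with owl:sameAs, sketched with a SPARQL 1.1 property path. The second dataset's ex:inspectionRating is a hypothetical property. Without an OWL reasoner, owl:sameAs is just another triple, so the query follows it explicitly, in either direction.

```sparql
PREFIX owl:     <http://www.w3.org/2002/07/owl#>
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>
PREFIX ex:      <http://example.org/hypothetical#>

SELECT ?name ?rating WHERE {
  ?school sch-ont:establishmentName ?name ;
          # Follow an owl:sameAs link in either direction.
          (owl:sameAs|^owl:sameAs) ?other .
  # Hypothetical property from an independently published dataset.
  ?other ex:inspectionRating ?rating .
}
```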
29. Geo questions
• Can GML, KML etc. be handled in RDF?
• yes: either as links or as textual ‘islands’;
some RDF systems also have extensions to
support spatial queries within SPARQL.
• Which geo-related ontology to use?
• several exist, simple and complex. It depends.
• Is it better to use a common ontology, or to capture
our data exactly in a custom one?
• you can do both and let others decide.
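Even without spatial extensions, simple geo queries work in plain SPARQL. A sketch, assuming points are described with the W3C Basic Geo vocabulary; the bounding box (roughly around Bath) is illustrative only, not an official boundary.

```sparql
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Plain-SPARQL bounding box over latitude/longitude values.
SELECT ?thing ?lat ?long WHERE {
  ?thing geo:lat ?lat ;
         geo:long ?long .
  # Cast to decimal in case coordinates were published as plain strings.
  FILTER (xsd:decimal(?lat)  >= 51.30 && xsd:decimal(?lat)  <= 51.45 &&
          xsd:decimal(?long) >= -2.45 && xsd:decimal(?long) <= -2.25)
}
```

A box filter like this scans every point; the extensions mentioned above exist largely to add spatial indexing and richer geometry than a box.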
30. Suggestions
• Build a Linked Data test-bed with several
datasets whose coverage overlaps in scope
• each dataset initially mapped to its own RDF
• experiment with finding common models
(schemas/ontologies) and shared identifiers
• evaluate against use cases expressed as
SPARQL queries
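One way a use case might become a repeatable check, sketched with the schools vocabulary and SPARQL 1.1 FILTER NOT EXISTS. The rule "every school has a name" is an illustrative example, not one taken from these slides.

```sparql
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>

# Use case as a test: "every school has a name".
# ASK answers true if some school has NO name, i.e. the test fails.
ASK {
  ?school a sch-ont:School .
  FILTER NOT EXISTS { ?school sch-ont:establishmentName ?anyName }
}
```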
31. Conclusions
• The Semantic Web project applies Web ideas to data
sharing.
• Linked RDF datasets have different emphasis (eg.
geo, schools, politics, events), accuracy and focus.
• Treated properly, this is a strength, as it allows the
Web of data to grow organically without central
control.
• Location-related data is a natural ‘hub’, often mixed
with non-geo data. RDF and SPARQL offer Web
standards for sharing and querying such mixed data,
allowing for decentralised schemas.
32. Questions?
Credits: original NeXT browser, see
http://en.wikipedia.org/wiki/WorldWideWeb
Images: Tim Berners-Lee, Richard Cyganiak, Anja Jentzsch