Linked Data
at The Science Museum

Tristan Roddis, Cogapp
Daniel Evans, Science Museum
MCN 2012, Seattle
Agenda
The context
The big idea
Why linked data?
The approach
The challenges
Where next?
Questions
The context
Specifically
Collections Management System (MultiMimsy XG)
Digital Asset Management System (iBase)
Archives Management System (AdLib)
Web Content Management System (Sitecore)
Legacy Web content (XML files)
Etc.
[Diagram: five separate data silos (Sitecore CMS, Mimsy CMS, iBase DAM, AdLib AMS, legacy docs)]
The big idea
Extract data from all silos and connect
Use Linked Data for extensibility
[Diagram: Sitecore CMS, Mimsy CMS, iBase DAM, AdLib AMS and legacy docs all connected through a central triple store]
Why linked data?
A brief history of data
Relational
Hierarchical
Graph
Linked Data is easy!

   subject          predicate           object
   cog:tristan      foaf:firstName      "Tristan"

   <http://data.cogapp.com/id/tristan> <http://xmlns.com/foaf/0.1/firstName> "Tristan" .
Linked Data ingredients

RDF triples
Triple-store
SPARQL endpoint
(Inferencing engine)
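A SPARQL endpoint answers queries over the triple store by matching triple patterns and joining the matches on shared variables. The toy sketch below shows only that core idea (a basic graph pattern, with no FILTER, OPTIONAL or inferencing), in plain Python with short prefixed names standing in for full URIs. The `sm:` prefix, the `schema:worksFor` links and the organisation nodes are hypothetical example data.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(graph: Iterable[Triple], pattern: Triple,
                  binding: Binding) -> Iterator[Binding]:
    """Yield every extension of `binding` under which `pattern` matches a triple.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            yield candidate


def select(graph: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Join triple patterns left to right, like a SPARQL basic graph pattern."""
    graph = list(graph)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match_pattern(graph, pattern, old)]
    return bindings


# Hypothetical example data using prefixed names.
graph = {
    ("cog:tristan", "foaf:firstName", "Tristan"),
    ("cog:tristan", "schema:worksFor", "cog:cogapp"),
    ("cog:cogapp", "foaf:name", "Cogapp"),
    ("sm:daniel", "foaf:firstName", "Daniel"),
    ("sm:daniel", "schema:worksFor", "sm:museum"),
    ("sm:museum", "foaf:name", "Science Museum"),
}

# Roughly: SELECT ?first ?orgName WHERE { ?person foaf:firstName ?first .
#          ?person schema:worksFor ?org . ?org foaf:name ?orgName . }
rows = select(graph, [
    ("?person", "foaf:firstName", "?first"),
    ("?person", "schema:worksFor", "?org"),
    ("?org", "foaf:name", "?orgName"),
])
for row in sorted(rows, key=lambda r: r["?first"]):
    print(row["?first"], "works for", row["?orgName"])
# Daniel works for Science Museum
# Tristan works for Cogapp
```

A real store indexes triples rather than scanning them, but the join-on-variables behaviour is the same.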
Benefits of Linked Data

Flexible storage
Flexible querying
Evolution of data
Standard format and interface
Linking to the web of data
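The storage, evolution and linking benefits follow from a graph being just a set of triples. In this sketch (plain Python; hypothetical triples built on the URI patterns used elsewhere in this deck, plus an example.org image URL), output from two systems is merged by set union. The shared object URI joins the two, and the predicate that only the second source uses needs no schema change.

```python
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

OBJ = "http://data.sciencemuseum.org.uk/id/objects/smxg-12345"
AGENT = "http://data.sciencemuseum.org.uk/id/agents/sm-12345"
IMAGE = "http://example.org/images/12345.jpg"  # hypothetical image URL

# Hypothetical triples extracted from the collections system...
from_collections = {
    (OBJ, DCT + "title", "Example object"),
    (OBJ, DCT + "creator", AGENT),
}
# ...and from the digital asset system, using a predicate the first source lacks.
from_dam = {
    (OBJ, FOAF + "depiction", IMAGE),
}

# Merging graphs is a set union; identical triples collapse automatically.
merged = from_collections | from_dam


def describe(graph, subject):
    """Return the (predicate, object) pairs for one subject, sorted for display."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, value in describe(merged, OBJ):
    print(predicate, value)
```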
The approach
[Diagram: data flows into a central triple store from Mimsy CMS and iBase DAM via COBOAT, from Sitecore CMS via a workflow hook, from AdLib AMS via its API plus rdflib, and from legacy docs via a one-off import]
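According to the speaker notes, the AdLib channel uses custom Python scripts (with rdflib) against a REST API. The sketch below leaves rdflib out and shows only the mapping idea in plain Python: a fetched record (a dict with hypothetical field names) becomes triples under the deck's URI pattern, and several source fields collapse onto one predicate. The `documents` URI segment and the predicate choices are assumptions for illustration.

```python
# Hypothetical extraction step: map one source record to RDF triples.
# Field names, the "documents" URI segment and the predicates are
# illustrative assumptions, not the museum's actual mapping.
BASE = "http://data.sciencemuseum.org.uk/id/"
DCT = "http://purl.org/dc/terms/"

# Several source fields collapse onto a single target predicate.
FIELD_MAP = {
    "title": DCT + "title",
    "maker": DCT + "creator",
    "manufacturer": DCT + "creator",
    "designer": DCT + "creator",
    "date": DCT + "date",
}


def record_to_triples(record: dict, kind: str = "documents") -> set:
    """Turn a flat record into (subject, predicate, object) triples."""
    subject = f"{BASE}{kind}/{record['id']}"
    triples = set()
    for field, value in record.items():
        predicate = FIELD_MAP.get(field)
        if predicate is None or value in (None, ""):
            continue  # skip unmapped fields (including 'id') and empty values
        # Values stay as literal strings; turning them into nodes is a later step.
        triples.add((subject, predicate, str(value)))
    return triples


example = {"id": "ms-0001", "title": "Letter", "maker": "A. Writer",
           "designer": "", "notes": "internal"}
for triple in sorted(record_to_triples(example)):
    print(triple)
```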
The challenges
Identifiers

Stability
303 redirects
Opaque versus human-readable

http://data.sciencemuseum.org.uk/id/objects/smxg-12345
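A minimal sketch of stable identifiers with 303 redirects, assuming an /id/ (things) versus /doc/ (documents about them) split of the kind described in the Cool URIs and UK public-sector guidance cited in the notes. `mint_uri` and `resolve` are hypothetical helpers; a real server would also use content negotiation to choose between HTML and RDF.

```python
from typing import Tuple
from urllib.parse import urlsplit, urlunsplit

BASE = "http://data.sciencemuseum.org.uk"


def mint_uri(kind: str, local_id: str) -> str:
    """Build a stable identifier URI from an opaque local ID."""
    return f"{BASE}/id/{kind}/{local_id}"


def resolve(uri: str) -> Tuple[int, str]:
    """Return the (status, location) a server could send for a GET on `uri`.

    A URI under /id/ names a thing rather than a web page, so the server
    answers 303 See Other and points at a document under /doc/ (assumed).
    """
    parts = urlsplit(uri)
    if parts.path.startswith("/id/"):
        doc_path = "/doc/" + parts.path[len("/id/"):]
        return 303, urlunsplit((parts.scheme, parts.netloc, doc_path,
                                parts.query, parts.fragment))
    return 200, uri


print(resolve(mint_uri("objects", "smxg-12345")))
# (303, 'http://data.sciencemuseum.org.uk/doc/objects/smxg-12345')
```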
Ontologies

Dublin Core (DC)
Dublin Core Terms (DCT)
Friend Of A Friend (FOAF)
Simple Knowledge Organization System (SKOS)
CIDOC Conceptual Reference Model (CIDOC CRM)
Europeana Data Model (EDM)
schema.org
Etc.
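Reusing several ontologies in one dataset means working with several namespaces. A small sketch of a prefix table for the vocabularies listed above, with helpers to expand prefixed names and emit Turtle @prefix lines. The namespace URIs are the commonly published ones (CIDOC CRM in particular has more than one namespace in circulation), and the prefix labels are conventional choices rather than requirements.

```python
# Commonly used namespace URIs for the ontologies on the slide.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "schema": "http://schema.org/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'foaf:firstName' to a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def turtle_header(prefixes=PREFIXES) -> str:
    """Emit one Turtle @prefix declaration per namespace."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items())


print(turtle_header())
print([expand(c) for c in ("dct:title", "foaf:depiction", "skos:prefLabel")])
```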
Linked Data is still young

Immature tools and technology
Small pool of experts
Mindset change
Linked Data doesn’t solve everything

Filtering tasks
Reconciliation tasks
Exposes inconsistencies
Exposes copyright issues
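Once data is held as triples, filtering and reconciliation become simple graph operations. A hypothetical sketch: `isPublic` and `valuation` are invented predicates in an example.org namespace, standing in for whatever access flags the source systems actually hold. `public_view` filters at item and field level, and `orphans` lists URIs in the museum namespace that are linked to but never described.

```python
BASE = "http://data.sciencemuseum.org.uk/id/"
DCT = "http://purl.org/dc/terms/"
IS_PUBLIC = "http://example.org/vocab/isPublic"            # hypothetical flag
PRIVATE_FIELDS = {"http://example.org/vocab/valuation"}    # hypothetical field


def public_view(graph):
    """Drop non-public items (item level) and private predicates (field level)."""
    private_items = {s for s, p, o in graph if p == IS_PUBLIC and o == "false"}
    return {
        (s, p, o)
        for s, p, o in graph
        if s not in private_items        # item-level filter
        and o not in private_items       # no links pointing at hidden items
        and p not in PRIVATE_FIELDS      # field-level filter
        and p != IS_PUBLIC               # the flag itself stays internal
    }


def orphans(graph):
    """Return URIs in our namespace that are linked to but never described."""
    described = {s for s, _, _ in graph}
    return {o for _, _, o in graph if o.startswith(BASE) and o not in described}


item = BASE + "objects/smxg-12345"
hidden = BASE + "objects/smxg-99999"
graph = {
    (item, DCT + "title", "Example object"),
    (item, "http://example.org/vocab/valuation", "1000"),
    (item, DCT + "creator", BASE + "agents/sm-12345"),
    (hidden, IS_PUBLIC, "false"),
    (hidden, DCT + "title", "Unpublished object"),
}
visible = public_view(graph)
print(len(visible), orphans(visible))
```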
Where next?

Data opportunities:

    More sources (Sitecore, legacy sites, new content)

    More data from existing sources
    (reconciliation between systems, turning literal strings into nodes)

    From Linked Data to Linked Open Data: link to DBpedia,
    Geonames, VIAF, BNB, etc.

    Inferencing to expose data via different ontologies
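Inferencing can publish the same data under several ontologies without storing every variant by hand. A minimal forward-chaining sketch in plain Python, assuming simple subproperty-style rules; the particular mappings are illustrative choices, and the speaker-notes example (hasMother or hasFather implies hasParent) would be expressed the same way.

```python
# Illustrative rules: (source predicate, target predicate). Each says that a
# triple using `source` also holds with `target` (the rdfs:subPropertyOf pattern).
DCT = "http://purl.org/dc/terms/"
DC = "http://purl.org/dc/elements/1.1/"
SCHEMA = "http://schema.org/"

RULES = [
    (DCT + "title", SCHEMA + "name"),
    (DCT + "creator", SCHEMA + "creator"),
    (DCT + "creator", DC + "creator"),
]


def infer(graph, rules=RULES):
    """Forward-chain the rules until no new triples appear (a fixed point)."""
    closed = set(graph)
    while True:
        new = {
            (s, target, o)
            for source, target in rules
            for (s, p, o) in closed
            if p == source
        } - closed
        if not new:
            return closed
        closed |= new


OBJ = "http://data.sciencemuseum.org.uk/id/objects/smxg-12345"
AGENT = "http://data.sciencemuseum.org.uk/id/agents/sm-12345"
graph = {(OBJ, DCT + "title", "Example object"), (OBJ, DCT + "creator", AGENT)}
view = infer(graph)
print(len(view))  # 5: the 2 stored triples plus 3 inferred ones
```

Running to a fixed point means chained rules (A implies B, B implies C) are also applied.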
Where next?

Publishing opportunities:

    Public SPARQL Endpoint

    REST API

    Website

    Pull in UGC

    Pull in external data
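A public SPARQL endpoint would normally follow the W3C SPARQL 1.1 Protocol, in which a client can send the query as a `query` URL parameter and request JSON results with the `application/sparql-results+json` media type. The endpoint URL below is hypothetical; the sketch only builds the request with the standard library and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL: a public endpoint is listed only as a future step.
ENDPOINT = "http://data.sciencemuseum.org.uk/sparql"

QUERY = """PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?object ?title WHERE {
  ?object dct:title ?title .
}
LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
# Sending it would be urllib.request.urlopen(request); not done here.
```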
[Diagram: Mimsy CMS, Sitecore CMS, iBase DAM, AdLib AMS, legacy docs, UGC, DBpedia and Geonames connected to a central triple store, which is exposed through SPARQL, a REST API and web pages]
Questions?


Editor's Notes

  • #2 Daniel can’t be here
  • #4 In common with other institutions, SM has disparate data silos
  • #13 Not consolidation. Instead: scheduled, repeatable extraction
  • #17 This is called RDF. Everything is a triple. Unique URIs. Triples interconnect to form a graph.
  • #19 Individual triples with the same subject or object interconnect to form a graph of data
  • #20 Inferencing: e.g. from hasMother or hasFather, infer hasParent
  • #21 Storage: no schema changes. Querying: no inherent hierarchy. Evolution: easy to change, layer or merge. Standard: works regardless of underlying systems. Linking: LD -> LOD; unambiguously connect.
  • #22 http://richard.cyganiak.de/2007/10/lod/
  • #24 Overall scheme:
        Collections Management System (MultiMimsy XG) = RDBMS = COBOAT
        Digital Asset Management System (iBase) = RDBMS = COBOAT
        Archives Management System (AdLib) = REST API = custom Python scripts (rdflib)
        Web Content Management System (Sitecore) = .NET = custom workflow hook (dotNetRDF)
        Legacy Web content (XML files) = custom script for one-off import
      Some channels still pending.
  • #26 Example query
  • #27 Results shown as a graph
  • #29 Explain the structure of URIs such as http://data.sciencemuseum.org.uk/id/agents/sm-12345; mention Cool URIs and the UK government guidelines; possibly give an example of opaque vs human-readable URIs.
  • #30 References:
        http://www.w3.org/TR/cooluris/
        http://data.gov.uk/resources/uris
        http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
  • #38 Mention that we talked to the BL, the BM, the Archives Hub and Kasabi
  • #39 Discuss design principles: minimise the number of different classes; reuse popular ontologies; model our domain; limit repetition.
  • #40 Popular ontologies: FOAF
  • #41 Useful ontology: SKOS. Mention unresolved questions, e.g. how much duplication of predicates, and how to map to other ontologies (inferencing?).
  • #42 Brand new in terms of adoption. Tools: subtle differences between systems; inferencing not standard. Experts: compare finding info on Jena versus finding info on the Apache web server. Mindset change: lots of relational database developers, comparatively few for LD.
  • #43 Filtering: only public data should be exposed (at both item and field level). Reconciliation: making sure identifiers sync and data links up (i.e. no orphaned content). Inconsistencies: e.g. many source predicates for ‘maker’ collapsed to a few in the final structure; different uses of similar systems, such as iBase BtL vs iBase Ingenious. Copyright: text or images not owned by the institution; if the data is made available as CC0, the triple store would need to hold only references to such material.
  • #46 UGC example: crowdsourced Babbage transcriptions. External example: Wikipedia bios for people, Geonames data for places.
  • #47 Verbal summary: the move to linked data can be challenging, but is exciting and liberating