Linked Data at The Science Museum
Tristan Roddis, Cogapp
Daniel Evans, Science Museum
MCN 2012, Seattle

Slides to accompany my talk at the Museum Computer Network conference 2012.

I discuss how we extract information from various repositories at the museum, and convert it to RDF format, as well as some of the challenges along the way.

The video of the talk can be seen at http://www.youtube.com/watch?v=NZZhkyEnxhk and the description of the conference session is at http://www.mcn.edu/2012/linked-open-data-science-museum
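
The "convert it to RDF" step can be pictured with a minimal sketch. It uses only the Python standard library and reuses the example triple from slide 11; the record, the field names, and the field-to-predicate mapping are hypothetical and are not the museum's actual data or pipeline.

```python
from typing import NamedTuple

FOAF = "http://xmlns.com/foaf/0.1/"


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the property
    obj: str        # literal value (this sketch handles plain string literals only)


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_triples(uri: str, record: dict, mapping: dict) -> list:
    """Turn one source record into triples using a field-to-predicate mapping."""
    return [
        Triple(uri, mapping[field], value)
        for field, value in record.items()
        if field in mapping and value  # skip unmapped or empty fields
    ]


def to_ntriples(triples) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(
        f'<{t.subject}> <{t.predicate}> "{escape_literal(t.obj)}" .' for t in triples
    )


# Hypothetical record and mapping, for illustration only.
record = {"first_name": "Tristan", "internal_note": "not mapped"}
mapping = {"first_name": FOAF + "firstName"}

triples = record_to_triples("http://data.cogapp.com/id/tristan", record, mapping)
print(to_ntriples(triples))
```

Keeping the mapping as data rather than code means a new source field can be exposed by adding one entry, which fits the "no schema changes" benefit listed in the notes.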

Notes
  • Daniel can’t be here
  • In common with other institutions, SM has disparate data silos
  • Not consolidation. Instead: scheduled, repeatable extraction
  • This is called RDF. Everything is a triple. Unique URIs. Triples interconnect to form a graph.
  • Individual triples with the same subject or object interconnect to form a graph of data
  • Inferencing: e.g. hasMother and hasFather both imply hasParent
  • Storage: no schema changes. Querying: no inherent hierarchy. Evolution: easy to change or layer or merge. Standard: works regardless of underlying systems. Linking: LD->LOD. Unambiguously connect.
  • http://richard.cyganiak.de/2007/10/lod/
  • Overall scheme:
        Collections Management System (MultiMimsy XG) = RDBMS = COBOAT
        Digital Asset Management System (iBase) = RDBMS = COBOAT
        Archives Management System (AdLib) = REST API = custom Python scripts (rdflib)
        Web Content Management System (Sitecore) = .NET = custom workflow hook (dotNetRDF)
        Legacy Web content (XML files) = custom script for one-off import
    Some channels still pending.
  • Example query
  • Results shown as a graph
  • Explain structure of e.g. http://data.sciencemuseum.org.uk/id/agents/sm-12345. Mention Cool URIs, and UK government guidelines. Possible example of opaque vs non.
  • http://www.w3.org/TR/cooluris/
    http://data.gov.uk/resources/uris
    http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
  • Mention that we talked to the BL, the BM, archives hub, Kasabi
  • Discussion about design principles: minimise number of different classes; reuse popular ontologies; model our domain; limit repetition.
  • Popular ontologies: FOAF
  • Useful ontology: SKOS. Mention unresolved questions, e.g. how much duplication of predicates; how to map to others (inferencing?).
  • Brand new in terms of adoption. Tools: subtle differences between systems. Inferencing not standard. Experts: compare finding info on Jena versus finding info on Apache web server. Mindset change: lots of relational database developers, comparatively few for LD.
  • Filtering: only public data should be exposed (at both item and field level). Reconciliation: making sure identifiers sync; making sure data links up (i.e. no orphaned content). Inconsistencies: example of many source predicates for ‘maker’ collapsed to a few in final structure; example of different uses of similar systems such as iBase BtL vs iBase Ingenious. Copyright: issue of text or images not owned by institution; triple store would need to only have references if made available as CC0.
  • UGC example: crowdsourced Babbage transcriptions. External example: Wikipedia bios for people, Geonames data for places.
  • Verbal summary: the move to linked data can be challenging, but is exciting and liberating
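
The inferencing note (hasMother and hasFather implying hasParent) can be illustrated with a small sketch. It is plain Python over a set of tuples, not a real inferencing engine or triple store; the `ex:` identifiers and the rule table are hypothetical and stand in for what an ontology would declare as sub-properties.

```python
# Hypothetical predicate names taken from the hasMother/hasFather note;
# the rule table is an assumption, not the museum's actual ontology.
SUBPROPERTY_OF = {
    "hasMother": "hasParent",
    "hasFather": "hasParent",
}


def infer(triples: set) -> set:
    """Return the input triples plus any implied by the sub-property rules."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triples appear (handles chained rules)
        changed = False
        for s, p, o in list(inferred):
            parent = SUBPROPERTY_OF.get(p)
            if parent and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


asserted = {
    ("ex:child", "hasMother", "ex:mum"),
    ("ex:child", "hasFather", "ex:dad"),
}
graph = infer(asserted)

# A query for parents now matches although no hasParent triple was asserted.
parents = sorted(o for s, p, o in graph if s == "ex:child" and p == "hasParent")
print(parents)
```

The asserted data stays untouched; the extra triples are derived on top of it, which is one way to expose the same data through a different vocabulary.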

    1. Linked Data at The Science Museum. Tristan Roddis, Cogapp; Daniel Evans, Science Museum. MCN 2012, Seattle
    2. Agenda: The context; The big idea; Why linked data?; The approach; The challenges; Where next?; Questions
    3. The context
    4. Specifically: Collections Management System (MultiMimsy XG); Digital Asset Management System (iBase); Archives Management System (AdLib); Web Content Management System (Sitecore); Legacy Web content (XML files); Etc.
    5. [Diagram: Sitecore (CMS), Mimsy (CMS), iBase (DAM), AdLib (AMS), Legacy docs]
    6. The big idea
    7. Extract data from all silos and connect. Use Linked Data for extensibility
    8. [Diagram: Sitecore (CMS), Mimsy (CMS), iBase (DAM), AdLib (AMS) and Legacy docs, with a Triple store]
    9. Why linked data?
    10. A brief history of data: Relational; Hierarchical; Graph
    11. Linked Data is easy! [Diagram: cog:tristan (subject), foaf:firstName (predicate), “Tristan” (object)] <http://data.cogapp.com/id/tristan> <http://xmlns.com/foaf/0.1/firstName> "Tristan".
    12. Linked Data ingredients: RDF triples; Triple-store; SPARQL endpoint; (Inferencing engine)
    13. Benefits of Linked Data: Flexible storage; Flexible querying; Evolution of data; Standard format and interface; Linking to the web of data
    14. The approach
    15. [Diagram: Mimsy (CMS) and iBase (DAM) via COBOAT; Sitecore (CMS) via workflow hook; AdLib (AMS) via API + rdflib; Legacy docs via one-off import; all into the Triple store]
    16. The challenges
    17. Identifiers: Stability; 303 redirects; Opaque versus human-readable; http://data.sciencemuseum.org.uk/id/objects/smxg-12345
    18-25. Ontologies (list built up one item per slide): Dublin Core (DC); Dublin Core Terms (DCT); Friend Of A Friend (FOAF); Simple Knowledge Organization System (SKOS); CIDOC Conceptual Reference Model (CIDOC CRM); Europeana Data Model (EDM); schema.org; Etc.
    26. Linked Data is still young: Immature tools and technology; Small pool of experts; Mindset change
    27. Linked Data doesn’t solve everything: Filtering tasks; Reconciliation tasks; Exposes inconsistencies; Exposes copyright issues
    28. Where next?
    29. Where next? Data opportunities: More sources (Sitecore, legacy sites, new content); More data from existing sources (reconciliation between systems, turning literal strings into nodes); From Linked Data to Linked Open Data: link to DBpedia, Geonames, VIAF, BNB, etc.; Inferencing to expose data via different ontologies
    30. Where next? Publishing opportunities: Public SPARQL Endpoint; REST API; Website; Pull in UGC; Pull in external data
    31. [Diagram: the Triple store with Sitecore (CMS), Mimsy (CMS), iBase (DAM), AdLib (AMS), Legacy docs, Geonames, DBpedia and UGC on one side, and SPARQL, REST API and Web pages on the other]
    32. Questions?
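
The identifier points on slide 17 (stable URIs answered with 303 redirects) can be sketched as below. The URI shape follows the slide's example; the `/doc/` redirect target, the path pattern, and the function names are assumptions for illustration, not the museum's published scheme.

```python
import re

BASE = "http://data.sciencemuseum.org.uk"

# Assumed shape: /id/<collection>/<source-prefix>-<number>, as in the slide's example.
ID_PATTERN = re.compile(r"^/id/(?P<collection>[a-z]+)/(?P<local>[a-z]+-\d+)$")


def mint_uri(collection: str, source_prefix: str, record_number: int) -> str:
    """Build an identifier URI from a source system prefix and record number."""
    return f"{BASE}/id/{collection}/{source_prefix}-{record_number}"


def resolve(path: str):
    """Return (status, headers) for a request path.

    Identifier URIs name real-world things, so they answer 303 See Other
    and point at a document about the thing. The /doc/ target is an assumption.
    """
    match = ID_PATTERN.match(path)
    if match is None:
        return 404, {}
    location = f"{BASE}/doc/{match['collection']}/{match['local']}"
    return 303, {"Location": location}


uri = mint_uri("objects", "smxg", 12345)
print(uri)
print(resolve("/id/objects/smxg-12345"))
```

Deriving the local part from the source system's own record number keeps the identifier stable across repeated extractions, at the cost of being opaque rather than human-readable.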
