Linked data at the Science Museum
 

Slides to accompany my talk at the Museum Computer Network conference 2012.

I discuss how we extract information from various repositories at the museum, and convert it to RDF format, as well as some of the challenges along the way.

The video of the talk can be seen at http://www.youtube.com/watch?v=NZZhkyEnxhk and the description of the conference session is at http://www.mcn.edu/2012/linked-open-data-science-museum

Usage Rights

CC Attribution License

  • Daniel can’t be here
  • In common with other institutions, SM has disparate data silos
  • Not consolidation. Instead: scheduled, repeatable extraction
  • This is called RDF. Everything is a triple. Unique URIs. Triples interconnect to form a graph.
  • Individual triples with the same subject or object interconnect to form a graph of data
  • Inferencing: e.g. hasMother and hasFather both imply hasParent
  • Storage: no schema changes. Querying: no inherent hierarchy. Evolution: easy to change or layer or merge. Standard: works regardless of underlying systems. Linking: LD -> LOD; unambiguously connect.
  • http://richard.cyganiak.de/2007/10/lod/
  • Overall scheme (source system, access route, extraction tool):
    Collections Management System (MultiMimsy XG): RDBMS, extracted with COBOAT
    Digital Asset Management System (iBase): RDBMS, extracted with COBOAT
    Archives Management System (AdLib): REST API, extracted with custom Python scripts (rdflib)
    Web Content Management System (Sitecore): .NET, extracted with a custom workflow hook (dotNetRDF)
    Legacy Web content (XML files): custom script for one-off import
    Some channels still pending
  • Example query
  • Results shown as a graph
  • Explain structure of e.g. http://data.sciencemuseum.org.uk/id/agents/sm-12345. Mention Cool URIs, and UK government guidelines. Possible example of opaque vs non-opaque.
  • http://www.w3.org/TR/cooluris/ ; http://data.gov.uk/resources/uris ; http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
  • Mention that we talked to the BL, the BM, Archives Hub, Kasabi
  • Discussion about design principles: minimise the number of different classes; reuse popular ontologies; model our domain; limit repetition
  • Popular ontologies: FOAF
  • Useful ontology: SKOS. Mention unresolved questions, e.g. how much duplication of predicates; how to map to others – inferencing?
  • Brand new in terms of adoption. Tools: subtle differences between systems; inferencing not standard. Experts: compare finding info on Jena versus finding info on Apache web server. Mindset change: lots of relational database developers, comparatively few for LD.
  • Filtering: only public data should be exposed (at both item and field level). Reconciliation: making sure identifiers sync and data links up (i.e. no orphaned content). Inconsistencies: e.g. many source predicates for ‘maker’ collapsed to a few in the final structure; different uses of similar systems, such as iBase BtL vs iBase Ingenious. Copyright: text or images not owned by the institution; the triple store would need to hold only references if made available as CC0.
  • UGC example: crowdsourced Babbage transcriptions. External example: Wikipedia bios for people, Geonames data for places.
  • Verbal summary: the move to linked data can be challenging, but is exciting and liberating
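
The notes above on triples forming a graph and on inferencing (hasMother, hasFather: hasParent) can be illustrated with a minimal plain-Python sketch. It does not use a triple store or rdflib; the `http://example.org/` namespace, the property URIs, the people, and the helper names `infer` and `index_by_subject` are all hypothetical, chosen only for illustration. It applies a single level of sub-property entailment, which is far less than a real inferencing engine does.

```python
from collections import defaultdict

# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/"

# Sub-property rules: each key property implies its value property.
SUB_PROPERTY_OF = {
    EX + "hasMother": EX + "hasParent",
    EX + "hasFather": EX + "hasParent",
}


def infer(triples):
    """Return the input triples plus those entailed by SUB_PROPERTY_OF."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUB_PROPERTY_OF:
            inferred.add((s, SUB_PROPERTY_OF[p], o))
    return inferred


def index_by_subject(triples):
    """Group (predicate, object) pairs under their shared subject."""
    graph = defaultdict(set)
    for s, p, o in triples:
        graph[s].add((p, o))
    return graph


# Two asserted triples about the same (made-up) subject.
triples = {
    (EX + "ada", EX + "hasMother", EX + "annabella"),
    (EX + "ada", EX + "hasFather", EX + "george"),
}

graph = index_by_subject(infer(triples))
parents = sorted(o for p, o in graph[EX + "ada"] if p == EX + "hasParent")
print(parents)
```

Neither source triple mentions hasParent, yet both parents are found once the rule is applied, which is the point of exposing data through an additional vocabulary without changing the stored statements.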

Presentation Transcript

  • Linked Data at The Science Museum. Tristan Roddis, Cogapp; Daniel Evans, Science Museum. MCN 2012, Seattle
  • Agenda: The context; The big idea; Why linked data?; The approach; The challenges; Where next?; Questions
  • The context
  • Specifically: Collections Management System (MultiMimsy XG); Digital Asset Management System (iBase); Archives Management System (AdLib); Web Content Management System (Sitecore); Legacy Web content (XML files); etc.
  • [Diagram: separate silos: Sitecore (CMS), Mimsy (CMS), iBase (DAM), AdLib (AMS), legacy docs]
  • The big idea
  • Extract data from all silos and connect. Use Linked Data for extensibility.
  • [Diagram: Sitecore (CMS), Mimsy (CMS), iBase (DAM), AdLib (AMS) and legacy docs all feeding a central triple store]
  • Why linked data?
  • A brief history of data: Relational; Hierarchical; Graph
  • Linked Data is easy! A triple is subject, predicate, object: cog:tristan foaf:firstName “Tristan”. In full: <http://data.cogapp.com/id/tristan> <http://xmlns.com/foaf/0.1/firstName> "Tristan" .
  • Linked Data ingredients: RDF triples; triple store; SPARQL endpoint; (inferencing engine)
  • Benefits of Linked Data: flexible storage; flexible querying; evolution of data; standard format and interface; linking to the web of data
  • The approach
  • [Diagram: Mimsy (CMS) and iBase (DAM) feed the triple store via COBOAT; Sitecore (CMS) via a workflow hook; AdLib (AMS) via its API and rdflib; legacy docs via a one-off import]
  • The challenges
  • Identifiers: stability; 303 redirects; opaque versus human-readable; http://data.sciencemuseum.org.uk/id/objects/smxg-12345
  • Ontologies: Dublin Core (DC); Dublin Core Terms (DCT); Friend Of A Friend (FOAF); Simple Knowledge Organization System (SKOS); CIDOC Conceptual Reference Model (CIDOC CRM); Europeana Data Model (EDM); schema.org; etc.
  • Linked Data is still young: immature tools and technology; small pool of experts; mindset change
  • Linked Data doesn’t solve everything: filtering tasks; reconciliation tasks; exposes inconsistencies; exposes copyright issues
  • Where next?
  • Where next? Data opportunities: more sources (Sitecore, legacy sites, new content); more data from existing sources (reconciliation between systems, turning literal strings into nodes); from Linked Data to Linked Open Data (link to DBpedia, Geonames, VIAF, BNB, etc.); inferencing to expose data via different ontologies
  • Where next? Publishing opportunities: public SPARQL endpoint; REST API; website; pull in UGC; pull in external data
  • [Diagram: Sitecore (CMS), Mimsy (CMS), iBase (DAM), AdLib (AMS), legacy docs, UGC, Geonames and DBpedia feed the triple store, which is published as web pages, a REST API and a SPARQL endpoint]
  • Questions?
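
The identifier slide (stability, 303 redirects, opaque identifiers such as http://data.sciencemuseum.org.uk/id/objects/smxg-12345) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the museum's implementation: the `/doc/` path for the describing document follows the pattern recommended in the Cool URIs note, but whether the museum used it is not stated in the slides, and the function names `mint_uri` and `respond` are hypothetical.

```python
BASE = "http://data.sciencemuseum.org.uk"


def mint_uri(collection, source_prefix, record_id):
    """Build an opaque identifier URI from a source-system prefix and record id."""
    return f"{BASE}/id/{collection}/{source_prefix}-{record_id}"


def respond(path):
    """Return (status, location) for a request path, Cool URIs style.

    An identifier URI names the thing itself, so it answers with
    303 See Other and points at a document describing the thing.
    """
    if path.startswith("/id/"):
        # Assumption: describing documents live under /doc/ with the same tail.
        return 303, BASE + "/doc/" + path[len("/id/"):]
    if path.startswith("/doc/"):
        return 200, None
    return 404, None


uri = mint_uri("objects", "smxg", 12345)
print(uri)
print(respond(uri[len(BASE):]))
```

Keeping the identifier opaque (a source prefix plus a record number) means the URI does not have to change when an object's title or classification is corrected, which is what the stability bullet asks for.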