From EAD to Linked Data: (still) a work in progress

Short presentation on challenges encountered in publishing EAD data as Linked Data in the LOCAH and Linking Lives projects.

Archives & Linked Data meeting, JISC, London, Tuesday 7 February 2012

1. From EAD to Linked Data: (still) a work in progress
   Archives & Linked Data meeting, JISC, London, 7 Feb 2012
   Pete Johnston, Technical Researcher, Eduserv, pete.johnston@eduserv.org.uk
2. How?
   • Model our “world”
   • Design URI patterns
   • Select/create RDF vocabularies
   • Design mapping of existing data to RDF
   • Convert/transform data
   • Generate links
   • Publish/expose data
   • Maintain/sustain
3. [Diagram: the full data model. Entities include Finding Aid, Repository (Agent), Place, Postcode Unit, EAD Document, Archival Resource, Biographical History, Level, Language, Extent, Creation, Temporal Entity, Agent (Person, Family, Organisation), Concept, Concept Scheme, Object, Book, Genre, Function, Birth and Death. Relationships include maintainedBy/maintains, administeredBy/administers, in, encodedAs/encodes, accessProvidedBy/providesAccessTo, hasBiogHist/isBiogHistFor, hasPart/partOf, origination, product of, associatedWith, topic/page, inScheme, foaf:focus, representedBy, participates in and at time.]
4. [Diagram: simplified data model. Finding Aid, Repository (Agent), Place, Archival Resource, Agent (Person, Family, Organisation), Concept, Concept Scheme, Book, Genre and Function, linked by maintainedBy/maintains, administeredBy/administers, accessProvidedBy/providesAccessTo, origination, hasPart/partOf, associatedWith, topic/page, inScheme and foaf:focus.]
5. Design URI patterns
   • Cool URIs for the Semantic Web
   • Designing URI Sets for the UK Public Sector
     http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
   • Example URIs:
     http://example.org/id/person/p123456 (the person)
     http://example.org/doc/person/p123456 (document describing the person)
     http://example.org/doc/person/p123456.html (HTML representation)
     http://example.org/doc/person/p123456.rdf (RDF representation)
   • Identifying the “things”: URI Patterns for the Hub Linked Data
     http://blogs.ukoln.ac.uk/locah/2010/11/16/identifying-the-things-uri-patterns-for-the-hub-linked-data/
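The /id/ versus /doc/ split in the example URIs can be sketched in a few lines of Python. This is a minimal illustration, not the LOCAH implementation: it assumes the example.org base and a two-format mapping, both hypothetical. A server following this pattern would answer a request for an /id/ URI with a 303 redirect to the /doc/ URI, which in turn resolves to the .html or .rdf variant depending on the Accept header.

```python
from typing import Optional

# Hypothetical base URI, as in the slide's examples.
BASE = "http://example.org"

# Hypothetical mapping from media type to the file extension of a representation.
FORMATS = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
}


def thing_uri(kind: str, local_id: str) -> str:
    """URI identifying the thing itself (e.g. a person), not a document."""
    return f"{BASE}/id/{kind}/{local_id}"


def doc_uri(thing: str, media_type: Optional[str] = None) -> str:
    """Document URI describing `thing`; format-specific if media_type is given."""
    prefix = f"{BASE}/id/"
    if not thing.startswith(prefix):
        raise ValueError(f"not an /id/ URI: {thing}")
    doc = f"{BASE}/doc/{thing[len(prefix):]}"
    if media_type is None:
        return doc  # generic document, resolved by content negotiation
    return f"{doc}.{FORMATS[media_type]}"


def negotiate(accept: str) -> str:
    """Return the first supported media type listed in an Accept header.

    Simplification: q-values and wildcards are ignored; HTML is the default.
    """
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in FORMATS:
            return media_type
    return "text/html"


person = thing_uri("person", "p123456")
print(person)           # http://example.org/id/person/p123456
print(doc_uri(person))  # http://example.org/doc/person/p123456
print(doc_uri(person, negotiate("application/rdf+xml")))  # http://example.org/doc/person/p123456.rdf
```

Keeping the thing URI and the document URIs distinct means statements about the person (birth date, name) are never confused with statements about the page (licence, last modified), which matters later for provenance.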
6. [Diagram: overall workflow. EAD XML documents are transformed to RDF/XML and loaded into a triple store; the store is enhanced with external data sets and exposed as HTML, XHTML+RDFa and via SPARQL/API to other apps.]
7. [Diagram: EAD XML documents → Transform → Triple Store.]
8. Transform
   • Transform EAD XML to RDF/XML using XSLT
   • Translate RDF/XML to N-Triples
   • Split N-Triples into chunks
   • Post to triple store
   • Manage inputs
   • Capture metadata about each step of the process
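The later steps of this pipeline (split, post, record) can be sketched with the Python standard library alone. This is a hedged sketch: the XSLT and RDF/XML-to-N-Triples steps would need an XSLT processor and an RDF library and are not shown; the endpoint URL and source file name are hypothetical; and the `application/n-triples` media type is the registered one, though a given store may expect something else (older stores often accepted `text/plain`).

```python
import hashlib
import urllib.request
from datetime import datetime, timezone
from typing import Iterator, List


def split_ntriples(lines: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield chunks of at most `chunk_size` statements, skipping blanks and comments.

    N-Triples puts one statement per line, so splitting on lines never breaks a triple.
    """
    statements = [ln.strip() for ln in lines
                  if ln.strip() and not ln.lstrip().startswith("#")]
    for start in range(0, len(statements), chunk_size):
        yield statements[start:start + chunk_size]


def build_post(endpoint: str, chunk: List[str]) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one chunk as N-Triples."""
    body = ("\n".join(chunk) + "\n").encode("utf-8")
    return urllib.request.Request(
        endpoint, data=body, method="POST",
        headers={"Content-Type": "application/n-triples"},
    )


def step_record(step: str, source: str, payload: bytes) -> dict:
    """Capture simple metadata about one processing step for later auditing."""
    return {
        "step": step,
        "source": source,
        "sha1": hashlib.sha1(payload).hexdigest(),
        "bytes": len(payload),
        "at": datetime.now(timezone.utc).isoformat(),
    }


ntriples = [
    f'<http://example.org/id/person/p{i}> <http://xmlns.com/foaf/0.1/name> "Person {i}" .'
    for i in range(1, 6)
] + ["", "# comment line"]

log = []
for n, chunk in enumerate(split_ntriples(ntriples, chunk_size=2)):
    req = build_post("http://localhost:8080/store/data", chunk)  # hypothetical endpoint
    # urllib.request.urlopen(req) would send the chunk; omitted in this sketch.
    log.append(step_record(f"post-chunk-{n}", "ead-doc-X.xml", req.data))

print(len(log))  # 3
```

Hashing each payload gives a cheap way to tell later whether a rerun of the pipeline produced the same data.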
9. Challenges
   • Archival description/Encoded Archival Description
     • Document v data
   • Hub as aggregation
     • Messy data, from multiple sources
   • Versioning
     • What happens when EAD doc X is updated?
   • Tracking triple/graph provenance
     • Graph/quad support in store
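One way to approach the versioning and provenance questions is to load the triples derived from each EAD document into its own named graph. An updated document then replaces exactly its own triples, and any triple can be traced back to the documents that assert it. The toy in-memory store below sketches that idea only, assuming one graph per EAD document and hypothetical graph URIs and prefixed names; a real deployment would rely on the triple store's own quad support (for example SPARQL 1.1 Update `DROP GRAPH` followed by `INSERT DATA`), which is exactly the "graph/quad support in store" concern.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class QuadStore:
    """Toy in-memory quad store holding one named graph per source EAD document."""

    def __init__(self) -> None:
        self.graphs: Dict[str, Set[Triple]] = defaultdict(set)

    def load(self, graph: str, triples: Iterable[Triple]) -> None:
        self.graphs[graph].update(triples)

    def replace_graph(self, graph: str, triples: Iterable[Triple]) -> None:
        """EAD doc updated: discard its old triples and load the new ones."""
        self.graphs[graph] = set(triples)

    def sources_of(self, triple: Triple) -> Set[str]:
        """Provenance: the graphs (source documents) asserting this triple."""
        return {g for g, ts in self.graphs.items() if triple in ts}

    def all_triples(self) -> Set[Triple]:
        """Default-graph view: the union of all named graphs."""
        return set().union(*self.graphs.values())


X = "http://example.org/graph/ead/X"  # hypothetical graph URIs
Y = "http://example.org/graph/ead/Y"
shared = ("ex:p1", "foaf:name", '"Person 1"')
old = ("ex:r1", "dct:title", '"Old title"')
new = ("ex:r1", "dct:title", '"New title"')

store = QuadStore()
store.load(X, [shared, old])
store.load(Y, [shared])
store.replace_graph(X, [shared, new])  # EAD doc X was updated
print(sorted(store.sources_of(shared)))
```

The triple shared by both documents survives the update of X because document Y still asserts it, which a flat triple store could not tell you.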
10. [Diagram: the Enhance stage. External data sets are linked into the triple store, which is exposed via SPARQL/API.]
11. Enhance
    • Add supplementary data
      • Repository postcode data
      • Data about the project (DOAP), dataset (VoID), etc.
    • Internal links/consolidation
    • Generate links to external resources
      • Ordnance Survey – trivial from postcode
      • VIAF – script to look up candidate matches
      • LCSH – script to look up and match
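The Ordnance Survey link is trivial because, given a clean postcode, the target URI can be built directly rather than looked up. The sketch below assumes the OS postcode unit URI pattern `http://data.ordnancesurvey.co.uk/id/postcodeunit/` followed by the postcode in upper case with spaces removed (verify against the current OS service before relying on it), and uses a hypothetical linking predicate and repository URI. The regular expression is only a simplified shape check, so postcode data still needs cleaning. VIAF and LCSH links, by contrast, need a real lookup and candidate matching, which this example does not attempt.

```python
import re

# Assumed URI pattern for Ordnance Survey postcode units.
OS_POSTCODE_UNIT = "http://data.ordnancesurvey.co.uk/id/postcodeunit/"

# Simplified UK postcode shape check (not the full official grammar).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}$")

# Hypothetical predicate linking a repository to its postcode unit.
HAS_POSTCODE = "http://example.org/def/postcodeUnit"


def os_postcode_uri(postcode: str) -> str:
    """Normalise a postcode (upper case, no spaces) and build the OS URI."""
    compact = "".join(postcode.split()).upper()
    if not POSTCODE_RE.match(compact):
        raise ValueError(f"unrecognised postcode: {postcode!r}")
    return OS_POSTCODE_UNIT + compact


def postcode_link(repository: str, postcode: str) -> str:
    """One N-Triples statement linking a repository URI to its postcode unit."""
    return f"<{repository}> <{HAS_POSTCODE}> <{os_postcode_uri(postcode)}> ."


print(postcode_link("http://example.org/id/repository/r1", "sw1a 1aa"))
```

Rejecting malformed postcodes, rather than building a URI from them, keeps broken links out of the store and surfaces the records that need cleaning.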
12. Enhance
    • Tools
      • Silk – pattern matching
      • Google Refine
    • Use third-party links
      • e.g. get DBpedia link from VIAF
    • Use aggregator services
      • e.g. sameas.org
    • Capture metadata about each process
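Following third-party links, such as reaching DBpedia via VIAF, amounts to computing the transitive closure of sameAs-style links, which is also what aggregator services like sameas.org expose. The sketch below uses a union-find structure and assumes the links have already been harvested as URI pairs; every URI in it, including the VIAF and DBpedia ones, is a hypothetical placeholder. Because the closure is transitive, a single wrong link silently merges two clusters, which is one reason identity and verification remain open challenges.

```python
from typing import Dict, Iterable, List, Set, Tuple


def sameas_clusters(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs into equivalence clusters from pairwise sameAs links (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in links:
        union(a, b)

    clusters: Dict[str, Set[str]] = {}
    for uri in parent:
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


links = [
    ("http://example.org/id/person/p123456", "http://viaf.org/viaf/0000000"),
    ("http://viaf.org/viaf/0000000", "http://dbpedia.org/resource/Example_Person"),
    ("http://example.org/id/person/p999", "http://viaf.org/viaf/1111111"),
]
for cluster in sameas_clusters(links):
    print(sorted(cluster))
```

The Hub person URI ends up in the same cluster as the DBpedia resource even though no direct link between them was harvested.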
13. Challenges
    • Various target interfaces for lookup
    • Identity/similarity/“sameAs” issues, verification
    • Workflow
      • Repeatability?
    • Versioning
    • Tracking triple/graph provenance
      • Graph/quad support in store
    • Exposing triple provenance
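The identity and verification issues can be made explicit by recording a similarity score for every candidate match, asserting a link automatically only above a threshold, and queueing borderline cases for human review. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever matcher is actually used (Silk, Google Refine or a lookup script); the normalisation rules and both thresholds are hypothetical and would need tuning against verified matches.

```python
import difflib
import unicodedata
from typing import Tuple

# Hypothetical thresholds; tune them against a sample of verified matches.
ACCEPT = 0.95
REVIEW = 0.75


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def classify_match(local: str, candidate: str) -> Tuple[str, float]:
    """Return ("sameAs" | "review" | "reject", score) for a candidate name."""
    score = difflib.SequenceMatcher(None, normalise(local), normalise(candidate)).ratio()
    if score >= ACCEPT:
        return "sameAs", score
    if score >= REVIEW:
        return "review", score
    return "reject", score


print(classify_match("Brontë, Charlotte", "Bronte, Charlotte"))     # ('sameAs', 1.0)
print(classify_match("Darwin, Charles", "Darwin, Charles Robert"))  # ('review', 0.8)
```

Storing the score next to each decision helps with repeatability: the same inputs always give the same scores, so a rerun can be compared with the previous one, and the scores themselves can be exposed as provenance for each link.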
14. [Diagram: licensing. RDF input graphs (i1, i2 … iX) derived from EAD 1 and EAD 2 carry licences A, B and C; output graphs (o1, o2, o3, o3-1, o3-2) combine inputs under different licences.]
15. [Diagram: metadata and provenance. Input graphs RDF i1 (from Linked Archives Hub, Meta A) and RDF i2 (from DBpedia, Meta B) feed HTML and RDF outputs (o1, o2, o2-1, o2-2, o3-2), each output carrying its own metadata (Meta oA, Meta oB).]
16. Summary: challenges
    • Archival description/EAD
    • Data consistency, cleaning
    • Lookups, linking & identity
    • Time, versioning, persistence, workflows
    • Trust, provenance, graphs, metadata