From EAD to Linked Data: (still) a work in progress
Archives & Linked Data meeting, JISC London, 7 Feb 2012
Pete Johnston
Short presentation on challenges encountered in publishing EAD data as Linked Data in LOCAH and Linking Lives projects.

Archives & Linked Data meeting, JISC, London, Tuesday 7 February 2012

Published in: Technology, Education
Transcript of "From EAD to Linked Data: (still) a work in progress"

  1. From EAD to Linked Data: (still) a work in progress
     Archives & Linked Data meeting, JISC London, 7 Feb 2012
     Pete Johnston, Technical Researcher, Eduserv, pete.johnston@eduserv.org.uk
  2. How?
     • Model our “world”
     • Design URI patterns
     • Select/create RDF vocabularies
     • Design mapping of existing data to RDF
     • Convert/transform data
     • Generate links
     • Publish/expose data
     • Maintain/sustain
  3. [Diagram: full data model]
     • Entities: Finding Aid, Repository (Agent), Place, Postcode Unit, EAD Document, Level, Biographical History, Language, Archival Resource, Creation, Temporal Entity, Extent, Agent, Concept, Concept Scheme, Object, Person, Family, Organisation, Place, Book, Birth, Death, Genre, Function
     • Relationships: maintainedBy/maintains, administeredBy/administers, in, hasPart/partOf, encodedAs/encodes, accessProvidedBy/providesAccessTo, hasBiogHist/isBiogHistFor, topic/page, level, language, origination, product of, at time, extent, associatedWith, inScheme, representedBy, foaf:focus, is-a, participates in
  4. [Diagram: simplified data model]
     • Entities: Finding Aid, Repository (Agent), Place, Archival Resource, Agent, Concept, Concept Scheme, Book, Person, Family, Organisation, Place, Genre, Function
     • Relationships: maintainedBy/maintains, administeredBy/administers, accessProvidedBy/providesAccessTo, topic/page, origination, hasPart/partOf, associatedWith, inScheme, foaf:focus, is-a
  5. Design URI Patterns
     • Cool URIs for the Semantic Web
     • Designing URI Sets for the UK Public Sector
       http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
     • Example patterns:
       http://example.org/id/person/p123456
       http://example.org/doc/person/p123456
       http://example.org/doc/person/p123456.html
       http://example.org/doc/person/p123456.rdf
     • Identifying the “things”: URI Patterns for the Hub Linked Data
       http://blogs.ukoln.ac.uk/locah/2010/11/16/identifying-the-things-uri-patterns-for-the-hub-linked-data/
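
The four example URIs on slide 5 suggest a regular relationship between the URI of a “thing” and the URIs of documents about it. The following is a minimal sketch of that relationship, assuming (my reading of the examples, not something the slide states) that `/id/` identifies the thing, `/doc/` identifies a generic document about it, and the `.html` and `.rdf` suffixes identify format-specific documents. The function name `document_uris` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def document_uris(thing_uri: str) -> dict:
    """Derive document URIs from a 'thing' URI of the form .../id/<type>/<key>.

    Assumption: the /id/ -> /doc/ mapping and the .html/.rdf suffixes follow
    the example URIs on the slide; a real deployment may differ.
    """
    parts = urlsplit(thing_uri)
    if not parts.path.startswith("/id/"):
        raise ValueError(f"not a thing URI: {thing_uri}")
    # Swap the leading /id/ segment for /doc/ and keep the rest of the path.
    doc_path = "/doc/" + parts.path[len("/id/"):]
    generic = urlunsplit((parts.scheme, parts.netloc, doc_path, "", ""))
    return {
        "thing": thing_uri,
        "generic": generic,
        "html": generic + ".html",
        "rdf": generic + ".rdf",
    }


uris = document_uris("http://example.org/id/person/p123456")
print(uris["rdf"])
```

Keeping the mapping in one function means the pattern can change in one place if the URI design is revised.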
  6. [Diagram: overall pipeline. Components: EAD XML documents, Transform, RDF/XML, Triple Store, Enhance (Data Sets), Expose (HTML, XHTML+RDFa, SPARQL), SPARQL/API, Other Apps]
  7. [Diagram: transform stage. Components: EAD XML documents, Transform, Triple Store]
  8. Transform
     • Transform EAD XML to RDF/XML using XSLT
     • Translate RDF/XML to N-Triples
     • Split N-Triples into chunks
     • Post to Triple Store
     • Manage inputs
     • Capture metadata about each step of process
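
The “Split N-Triples into chunks” step on slide 8 can be sketched as follows. N-Triples is a line-based format with one triple per line, so a file can be cut at any line boundary and each piece is still valid on its own. The function name `chunk_ntriples` and the default chunk size are assumptions for illustration; the slides do not say what size or tooling the project used.

```python
from typing import Iterable, Iterator, List


def chunk_ntriples(lines: Iterable[str], max_triples: int = 1000) -> Iterator[str]:
    """Yield N-Triples documents containing at most max_triples triples each.

    Blank lines and comment lines are skipped, so every yielded line is a triple.
    """
    batch: List[str] = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        batch.append(stripped)
        if len(batch) == max_triples:
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:
        # Emit the final, possibly shorter, chunk.
        yield "\n".join(batch) + "\n"


sample = [
    f'<http://example.org/id/person/p{i}> <http://example.org/name> "Person {i}" .'
    for i in range(5)
]
for number, chunk in enumerate(chunk_ntriples(sample, max_triples=2), start=1):
    print(number, len(chunk.splitlines()))
```

Each yielded string could then be posted to the triple store as a separate request, which keeps individual uploads small.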
  9. Challenges
     • Archival description/Encoded Archival Description
         • Document vs. data
     • Hub as aggregation
         • Messy data, from multiple sources
     • Versioning
         • What happens when EAD doc X is updated?
     • Tracking triple/graph provenance
         • Graph/quad support in store
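
One common way to approach the versioning and provenance questions on slide 9 is to keep the triples from each EAD document in their own named graph, so that an updated document can be replaced as a unit. The sketch below is an illustration under that assumption, not a description of what the project did: it presumes a store with graph/quad support (which the slide lists as an open issue), and the graph URI pattern and function names are hypothetical.

```python
from typing import Iterable, List


def to_nquads(ntriples_lines: Iterable[str], graph_uri: str) -> List[str]:
    """Turn N-Triples lines into N-Quads lines in the given named graph.

    An N-Quads statement is an N-Triples statement with a graph IRI
    inserted before the terminating full stop.
    """
    quads = []
    for line in ntriples_lines:
        statement = line.strip()
        if not statement or statement.startswith("#"):
            continue
        if not statement.endswith("."):
            raise ValueError(f"not a complete N-Triples statement: {statement}")
        quads.append(f"{statement[:-1].rstrip()} <{graph_uri}> .")
    return quads


def drop_graph_update(graph_uri: str) -> str:
    """Build a SPARQL 1.1 Update request that removes one named graph."""
    return f"DROP SILENT GRAPH <{graph_uri}>"


# Hypothetical graph URI for the triples derived from one EAD document.
graph = "http://example.org/graph/ead/doc-x"
triples = ['<http://example.org/id/person/p123456> <http://example.org/name> "A. Person" .']
print(drop_graph_update(graph))
print(to_nquads(triples, graph)[0])
```

When EAD doc X changes, the old graph is dropped and the regenerated quads are loaded, and the graph URI gives a place to attach provenance metadata about that document's transformation.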
  10. [Diagram: enhance stage. Components: Triple Store, SPARQL/API, Enhance, Data Sets]
  11. Enhance
     • Add supplementary data
         • Repository postcode data
         • Data about project (DOAP), dataset (VoID), etc.
     • Internal links/consolidation
     • Generate links to external resources
         • Ordnance Survey: trivial from postcode
         • VIAF: script to look up candidate matches
         • LCSH: script to look up, match
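
Slide 11 describes the Ordnance Survey link as “trivial from postcode”, presumably because the target URI can be built directly from the postcode string without a lookup. A minimal sketch of that idea follows. The base URI in `OS_POSTCODE_BASE` is my assumption about the Ordnance Survey linked data pattern and should be checked against their documentation; the function name and the simplified validation pattern are also illustrative.

```python
import re

# Assumed URI pattern for Ordnance Survey postcode units; verify before use.
OS_POSTCODE_BASE = "http://data.ordnancesurvey.co.uk/id/postcodeunit/"

# Simplified shape check for UK postcodes, not a full validation.
POSTCODE_SHAPE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}")


def postcode_unit_uri(postcode: str) -> str:
    """Build a postcode-unit URI by removing whitespace and upper-casing."""
    compact = re.sub(r"\s+", "", postcode).upper()
    if not POSTCODE_SHAPE.fullmatch(compact):
        raise ValueError(f"does not look like a UK postcode: {postcode!r}")
    return OS_POSTCODE_BASE + compact


print(postcode_unit_uri("ba2 7ay"))
```

Rejecting strings that do not look like postcodes matters here because the source data is described elsewhere in the deck as messy and aggregated from multiple sources.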
  12. Enhance
     • Tools
         • Silk: pattern matching
         • Google Refine
     • Use third-party links
         • e.g. get DBpedia link from VIAF
     • Use aggregator services
         • e.g. sameas.org
     • Capture metadata about each process
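
The “get DBpedia link from VIAF” idea on slide 12 amounts to following a link that a third party already publishes: if a local person is linked to a VIAF record, and VIAF links that record to DBpedia, a candidate local-to-DBpedia link can be derived. The sketch below works on in-memory pairs of URIs and makes no network calls. The function name, the sample URIs, and the choice of `owl:sameAs` as the output predicate are all assumptions for illustration; given the identity and verification concerns raised in the deck, derived links are best treated as candidates to check.

```python
from typing import Dict, Iterable, List, Set, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def links_via_intermediary(
    local_links: Iterable[Tuple[str, str]],
    third_party_links: Iterable[Tuple[str, str]],
    target_prefix: str,
) -> List[Tuple[str, str, str]]:
    """Derive candidate (local, predicate, target) triples through a shared URI.

    local_links: (local URI, intermediary URI) pairs, e.g. local person to VIAF.
    third_party_links: (intermediary URI, other URI) pairs published elsewhere.
    target_prefix: only other URIs starting with this prefix are kept.
    """
    by_intermediary: Dict[str, Set[str]] = {}
    for intermediary, other in third_party_links:
        by_intermediary.setdefault(intermediary, set()).add(other)

    derived = []
    for local, intermediary in local_links:
        for other in sorted(by_intermediary.get(intermediary, ())):
            if other.startswith(target_prefix):
                derived.append((local, OWL_SAME_AS, other))
    return derived


# All URIs below are made-up placeholders.
local = [("http://example.org/id/person/p123456", "http://viaf.org/viaf/0000")]
third_party = [
    ("http://viaf.org/viaf/0000", "http://dbpedia.org/resource/Example_Person"),
    ("http://viaf.org/viaf/0000", "http://example.net/other/1"),
]
for triple in links_via_intermediary(local, third_party, "http://dbpedia.org/resource/"):
    print(triple)
```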
  13. Challenges
     • Various target interfaces for lookup
     • Identity/similarity/“sameAs” issues, verification
     • Workflow
         • Repeatability?
     • Versioning
     • Tracking triple/graph provenance
         • Graph/quad support in store
     • Exposing triple provenance
  14. [Diagram: licences attached to inputs and outputs. Labels: EAD 1, EAD 2; RDF i1, RDF i2, RDF iX; RDF o1, RDF o2, RDF o3, RDF o3-1, RDF o3-2; Lic A, Lic B, Lic C]
  15. [Diagram: metadata attached to inputs and outputs. Labels: RDF i1 (from Linked Archives Hub), RDF i2 (from DBpedia); RDF o1, RDF o2, RDF o2-1, RDF o3-2; HTML o1, HTML o2, HTML o2-1, HTML o2-2; Meta A, Meta B, Meta oA, Meta oB]
  16. Summary: challenges
     • Archival description/EAD
     • Data consistency, cleaning
     • Lookups, linking & identity
     • Time, versioning, persistence, workflows
     • Trust, provenance, graphs, metadata