Cigs lod rcahms_seneschal_pm_20131118

SENESCHAL: Semantic ENrichment Enabling Sustainability of arCHAeological Links / Peter McKeague, RCAHMS, on behalf of the SENESCHAL Project team

Presented at Linked Open Data: current practice in libraries and archives (Cataloguing & Indexing Group in Scotland 3rd Linked Open Data Conference), Edinburgh, 18 Nov 2013

Published in: Technology, Education

  1. 1. SENESCHAL SENESCHAL: Semantic ENrichment Enabling Sustainability of arCHAeological Links Peter McKeague (On behalf of project partners) peter.mckeague@rcahms.gov.uk www.rcahms.gov.uk http://canmore.rcahms.gov.uk
  2. 2. Outline of talk • Part I RCAHMS: What we do; What we hold; Classifying • Part II Drivers for Linked Data • Part III SENESCHAL: Project Partners; The Project so far; Prospects
  3. 3. RCAHMS Mission Statement • Identifies, surveys and analyses the historic and built environment of Scotland • Preserves, cares for and adds to the information and items in its national collection • Promotes understanding, education and enjoyment through interpretation of the information it collects and the items it looks after
  4. 4. RCAHMS vocabularies SC656461 SC335945 Monuments SC1224403 Objects Maritime Craft SC694685 Events
  5. 5. Standards: Midas Heritage http://www.english-heritage.org.uk/publications/midas-heritage/ CIDOC Conceptual Reference Model (CRM) http://www.cidoc-crm.org/
  6. 6. Monuments: Internal staff database Thesaurus: Events Pick lists Thesauri Monuments Objects Maritime Craft Pick list Pick list
  7. 7. Information is published on Canmore Thesauri Monuments Objects Maritime Craft http://canmore.rcahms.gov.uk
  8. 8. Information is published on Canmore Thesauri Monuments Objects Maritime Craft
  9. 9. RCAHMS thesauri: text search http://orapweb.rcahms.gov.uk/apex/f?p=210:1:
  10. 10. RCAHMS thesauri: term definition http://orapweb.rcahms.gov.uk/apex/f?p=210:1:
  11. 11. RCAHMS thesauri : suggest a term http://orapweb.rcahms.gov.uk/apex/f?p=210:1:
  12. 12. Part II: Drivers for Linked Data We already publish our thesauri as key reference datasets for use by professional archaeologists in national organisations, in local authority Historic Environment Records as well as by anyone interested in the historic environment. BUT • Our vocabularies (and other data) are not visible • The thesaurus architecture limits the potential of the terminology • Terms lack the persistent URIs that would allow our resources to act as hubs for the Web of Data • Interoperability: for heritage, the main exponents of Linked Data are from the research community, and in Scotland primarily from computer scientists
  13. 13. Drivers for Linked Open Data It is Government policy • Open Data White Paper, June 2012: http://data.gov.uk/sites/default/files/Open_data_White_Paper.pdf • Scotland’s Digital Future, April 2013: http://www.scotland.gov.uk/Resource/0042/00421478.pdf
  14. 14. Drivers for Linked Open Data It is Government policy: Open Data White Paper June 2012: • Public data policy and practice will be clearly driven by the public and businesses who want to use the data, including what data is released, when and in what form • Public data will be published in reusable, machine-readable form • Public data will be released under the same open licence which enables free reuse, including commercial reuse • Public data will be published using open standards, and following relevant recommendations of the World Wide Web Consortium • Public data from different departments about the same subject will be published in the same, standard formats and with the same definitions • Public data underlying the Government’s own website will be published in re-usable form • Release data quickly, and then work to make sure it is available in open standard formats, including Linked data forms.
  15. 15. ... And a practical use An online submission form to report fieldwork from contractors to curators
  16. 16. Part III: The partners “the key to interoperability” http://www.heritagedata.org/ ©University of Glamorgan
  17. 17. Lineage STAR: Semantic Technologies for Archaeological resources 2007-2010 AHRC funded project with English Heritage to apply semantic and knowledge-based technologies to the digital archaeological domain. STAR developed new methods for linking digital archive databases, vocabularies and the associated grey literature, exploiting the potential of a high level, core ontology and natural language processing techniques. http://hypermedia.research.southwales.ac.uk/kos/star/ STELLAR: Semantic Technologies Enhancing Links and Linked data for Archaeological Resources 2010-2011 AHRC funded project with the ADS and English Heritage. Building on the outcomes of STAR, STELLAR provided support for non-specialist users to map and extract datasets. http://hypermedia.research.southwales.ac.uk/kos/stellar/ SENESCHAL: Semantic ENrichment Enabling Sustainability of arCHAeological Links 2013-2014 AHRC funded project with the ADS, English Heritage, RCAHMS, RCAHMW and Wessex Archaeology. http://hypermedia.research.southwales.ac.uk/kos/SENESCHAL/ and http://www.heritagedata.org
  18. 18. The SENESCHAL Project • seneschal n. Historical: the steward or major-domo of a medieval great house • 12-month AHRC-funded project, March 2013 - February 2014 • Deliverables • Controlled vocabularies online: Linked data (SKOS); Downloadable files • Web services: term suggestion, term validation, legacy data alignment • Tools to align data with controlled vocabularies: Browser-based ‘widget’ controls
  19. 19. Interoperability • “The terminology of a subject is the key to interoperability” (John F. Sowa) • Interoperability requires more than just a common data model • Data compatibility occurs on 2 levels: semantic and syntactic. Ontologies / data structures deal with the semantic but not necessarily the syntactic • “The CRM relies on existing syntactic interoperability and is concerned only with adding semantic interoperability” (CIDOC CRM documentation)
  20. 20. You say potato, I say tomato… • Multiple datasets, multiple organisations, multiple languages • Unification of data structures is possible, BUT… • Incompatible terminology hinders cross search and prevents greater interoperability • Applications attempting to reuse data must all individually sort out the same old problems • E.g. Get all the iron age post holes… (Feature / Period, as recorded): Post-hole / IRON AGE; Posthole / |ron age; POST HOLE / Iron age?; POSTHLOLE / EARLY IRON AGE; POST HOLE (POSSIBLE) / 250 BC; POSTHOLES / C 500-200 B.C. • Solution: data cleansing and controlled vocabularies?
  21. 21. Typical interoperability issues encountered • Simple spelling errors: “POSTHLOLE”, “CESS PITT”, “FURRROWS”, “FLINT SCRAPPER” • Alternate word forms: “BOUNDARY”/“BOUNDARIES”, “GULLEY”/“GULLIES” • Prefixes / suffixes: “RED HILL (POSSIBLE)”, “TRACKWAY (COBBLED)”, “CROFT?”, “CAIRN (POSSIBLE)”, “PORTAL DOLMEN (RE-ERECTED)” • Nested delimiters: “POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS” • Terms not intended for indexing: “NONE”, “UNIDENTIFIED OBJECT”, “N/A”, “NA”, “INCOHERENT” • Terms that would not be in (any) thesauri: “WOTSITS PACKET”, “CHARLES 2ND COIN”, “ROMAN STRUCTURE POSSIBLY A VILLA”, “ST GUTHLACS BENEDICTINE PRIORY”, “WORCESTER-BIRMINGHAM CANAL”, “KUNGLIGA SLOTTET”, “SUB-FOSSIL BEETLES” • More specific phrases: “SIDE WALL OF POT WITH LUG”, “BRICK-LINED INDUSTRIAL WELL OR MINE SHAFT”, “ALIGNMENT OF PLATFORMS AND STONES”
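The issue types listed on this slide suggest what a data cleansing step might look like. The following is a minimal Python sketch, not the SENESCHAL alignment tooling: the tiny `VOCABULARY` list, the `NON_TERMS` set, the function names and the 0.75 similarity cutoff are all assumptions made for illustration, and the fuzzy matching uses the standard library's `difflib` rather than any technique named in the talk.

```python
import re
from difflib import get_close_matches

# Hypothetical miniature vocabulary; a real thesaurus holds thousands of terms.
VOCABULARY = ["POST HOLE", "CESS PIT", "FURROW", "BOUNDARY", "GULLY", "CAIRN", "TRACKWAY"]

# Placeholder values that were never intended for indexing.
NON_TERMS = {"NONE", "N/A", "NA", "UNIDENTIFIED OBJECT", "INCOHERENT"}


def normalise(raw):
    """Upper-case a term, drop a trailing qualifier or '?', and unify separators."""
    term = raw.strip().upper()
    term = re.sub(r"\s*\([^)]*\)\s*$", "", term)  # trailing "(POSSIBLE)", "(COBBLED)"
    term = term.rstrip("?").strip()
    term = term.replace("-", " ")
    return re.sub(r"\s+", " ", term)


def align(raw, vocabulary=VOCABULARY, cutoff=0.75):
    """Map each item of a comma-delimited field to its closest vocabulary term, if any."""
    results = {}
    for part in raw.split(","):
        term = normalise(part)
        if not term or term in NON_TERMS:
            continue  # skip empty items and non-indexing placeholders
        results[term] = get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return results


if __name__ == "__main__":
    for field_value in ["POSTHLOLE", "CAIRN (POSSIBLE)", "POTTERY, CERAMIC TILE", "N/A"]:
        print(field_value, "->", align(field_value))
```

Stripping a qualifier such as "(POSSIBLE)" discards information about certainty, so a production tool would probably record it separately instead of dropping it. An empty candidate list marks a term that needs human review.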
  22. 22. Solutions - SENESCHAL • Controlled vocabularies (again): Commonly agreed concepts, terminology and identifiers; Existing / new thesauri – community contributions? • Openness and availability: Licensing, web services, downloads, data formats • Alignment of existing data: Data cleansing tools; Alignment techniques • Alignment of new data: Interactive embedded data entry tools; Validation at point of data entry • Rather than trying to solve this vocabulary problem, help to prevent it from happening in the first place
  23. 23. Vocabularies online as (SKOS) Linked Data • Vocabularies from English Heritage: Monument Types Thesaurus; Objects Thesaurus; Event Types Thesaurus; Maritime Craft Thesaurus; RCHME Cultural Periods List / MIDAS Archaeological Periods List • Vocabularies from RCAHMS: Monument Thesaurus (Scotland), multilingual, includes Scottish Gaelic translations!; Objects (Scotland); Maritime Craft (Scotland) • Vocabularies from RCAHMW: Monument Thesaurus (Wales); Event (Wales); Period (Wales) • Moving from term based towards concept based indexing • Start to create links between concepts… between vocabularies… between datasets… between sites… between countries • Cross searching of (multilingual) cultural heritage resources
  24. 24. (partial) SKOS model • skos:ConceptScheme: dc:title, dc:description [literal value]; skos:hasTopConcept to skos:Concept • skos:Concept: skos:topConceptOf and skos:inScheme to skos:ConceptScheme; skos:broader, skos:narrower, skos:related to other concepts; skos:prefLabel, skos:altLabel, skos:notation, skos:scopeNote, skos:changeNote [literal value] • skos:Collection: skos:member to skos:Concept
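To make the model on this slide concrete, here is a minimal Python sketch that writes one concept out as N-Triples using only the standard library. The `Concept` class, the `to_ntriples` function and the `en` language tag are assumptions made for illustration. The concept URI, scheme URI, label and scope note are those given for TENEMENT (Scotland) later in this deck; the project's own serialisations may well differ.

```python
from dataclasses import dataclass, field

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Concept:
    """Illustrative container for a handful of the SKOS properties on the slide."""
    uri: str
    scheme: str
    pref_label: str
    scope_note: str = ""
    broader: list = field(default_factory=list)  # URIs of broader concepts


def to_ntriples(concept, lang="en"):
    """Serialise one concept as a list of N-Triples lines."""
    def literal(text):
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"@{lang}'

    subject = f"<{concept.uri}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .",
        f"{subject} <{SKOS}inScheme> <{concept.scheme}> .",
        f"{subject} <{SKOS}prefLabel> {literal(concept.pref_label)} .",
    ]
    if concept.scope_note:
        lines.append(f"{subject} <{SKOS}scopeNote> {literal(concept.scope_note)} .")
    for parent in concept.broader:
        lines.append(f"{subject} <{SKOS}broader> <{parent}> .")
    return lines


tenement = Concept(
    uri="http://purl.org/heritagedata/schemes/1/concepts/467",
    scheme="http://purl.org/heritagedata/schemes/1",
    pref_label="TENEMENT",
    scope_note="A large building containing a number of rooms or flats, "
               "access to which is usually gained via a common stairway.",
)

if __name__ == "__main__":
    print("\n".join(to_ntriples(tenement)))
```

The `broader` list is left empty here because the deck does not state the parent concept of TENEMENT.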
  25. 25. Data licensing and attribution using CC REL • Attribution back to original data providers • skos:ConceptScheme and skos:Concept each carry: cc:license [URI]; cc:attributionURL [URI]; cc:attributionName [literal value]; dct:creator [URI]; dc:source [URI]
  26. 26. General System Architecture Native vocabularies Additional metadata web controls & applications Linked Data REST API STELLAR (SKOS) templates (upload) SKOS RDF vocabularies Web Services REST API SPARQL query endpoint SENESCHAL data store
  27. 27. Linked Data API (preliminary) • The project will implement a Linked Data (RESTful) API • The base URI may be http://www.heritagedata.org/ or http://purl.org/xxx/.. • Seneschal is a sub-project within the wider scope of ‘heritagedata.org’, so: http://www.heritagedata.org/seneschal is the wiki/blog for project details, and <base uri>/schemes/123 (e.g.) is for the actual data API – see below… • Proposed REST API: • /schemes – return list of all SKOS concept schemes held • /schemes/search (with parameters) – search for schemes • /schemes/{id} – return details of specified SKOS concept scheme (current version) • /schemes/{id}.html, .n3, .rdf, .json – return different serializations of that data, obtained either by content negotiation or by direct request including extension • /schemes/{id}/concepts – return list of ALL SKOS concepts in specified scheme • /schemes/{id}/concepts/search – search for concepts in the specified scheme • /concepts – return list of all SKOS concepts in ALL schemes • /concepts/search (with parameters) – search for concepts in any scheme • /concepts/{id} – return details of specified SKOS concept (current version) • /concepts/{id}.html, .n3, .rdf, .json – return different serializations of the data, obtained either by content negotiation or by direct request including extension • /concepts/{id}/schemes – return list of all schemes referencing the specified concept
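The route patterns proposed on this slide can be sketched as a small URL builder. This is a Python illustration only. The slide leaves the base URI undecided, so `BASE_URI` below is a placeholder. The function names are hypothetical, and so is the `q` search parameter in the usage line, since the slide says only "with parameters".

```python
from urllib.parse import urlencode

BASE_URI = "http://example.org/data"  # placeholder: the slide leaves the base URI open
KINDS = ("schemes", "concepts")
FORMATS = ("html", "n3", "rdf", "json")  # extensions listed on the slide


def resource_url(kind, identifier=None, fmt=None, base=BASE_URI):
    """Build /schemes, /schemes/{id}, /schemes/{id}.{fmt} and the /concepts equivalents."""
    if kind not in KINDS:
        raise ValueError(f"unknown resource kind: {kind!r}")
    url = f"{base}/{kind}"
    if identifier is None:
        if fmt is not None:
            raise ValueError("a format extension only applies to a specific id")
        return url
    url += f"/{identifier}"
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unsupported format: {fmt!r}")
        url += f".{fmt}"
    return url


def concepts_in_scheme(scheme_id, base=BASE_URI):
    """Build /schemes/{id}/concepts."""
    return resource_url("schemes", scheme_id, base=base) + "/concepts"


def search_url(kind, base=BASE_URI, **params):
    """Build /schemes/search or /concepts/search; parameter names are hypothetical."""
    url = resource_url(kind, base=base) + "/search"
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    print(resource_url("concepts", 409, "json"))
    print(concepts_in_scheme(1))
    print(search_url("concepts", q="cairn"))
```

A client could also request `/concepts/{id}` without an extension and rely on content negotiation through the HTTP `Accept` header, which the slide names as the alternative to an explicit extension.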
  28. 28. Project deliverables http://www.heritagedata.org/blog/
  29. 29. Schema List http://heritagedata.org/test/getAllSchemes.php
  30. 30. Scottish Monument types http://heritagedata.org/test/schemes/1.html
  31. 31. Scottish Monument types: Top level http://heritagedata.org/test/schemes/1/concepts/405.html
  32. 32. Scottish Monument types: concept http://purl.org/heritagedata/schemes/1/concepts/409
  33. 33. http://heritagedata.org/test/searchForm.php
  34. 34. http://heritagedata.org/test/sparql.php
  35. 35. Versioning (preliminary) • /schemes/{id} – returns current version of the specified scheme • /schemes/{id}/versions – returns all versions of the specified scheme • /schemes/{id}/versions/{id} – returns specified version of the specified scheme • /concepts/{id} – returns current version of the specified concept • /concepts/{id}/versions – returns all versions of the specified concept • /concepts/{id}/versions/{id} – returns specified version of the specified concept • Example: [skos:ConceptScheme] data:schemes/123 dct:hasVersion (inverse: dct:isVersionOf) [skos:ConceptScheme] data:schemes/123/versions/20111005 and [skos:ConceptScheme] data:schemes/123/versions/2013020301
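The version links in the example on this slide can be generated mechanically from the URI pattern. The Python sketch below is illustrative only: the `version_links` function is hypothetical, and it returns plain tuples of prefixed names in place of real RDF. The `data:` prefix, the scheme id 123 and the two version identifiers are taken from the slide.

```python
def version_links(kind, identifier, versions):
    """Return (subject, predicate, object) tuples linking a resource to its versions."""
    current = f"data:{kind}/{identifier}"
    triples = []
    for version in versions:
        versioned = f"{current}/versions/{version}"
        triples.append((current, "dct:hasVersion", versioned))
        triples.append((versioned, "dct:isVersionOf", current))  # inverse link
    return triples


if __name__ == "__main__":
    for triple in version_links("schemes", 123, ["20111005", "2013020301"]):
        print(" ".join(triple))
```

The slide does not say how the current version is chosen among those listed, so the sketch makes no attempt to order them.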
  36. 36. Published vocabularies Vocabulary England Scotland Wales Monument type YES YES YES Objects YES YES Maritime craft YES YES Period YES Events (activities) YES ??? Archaeological Sciences YES ??? Components YES Building materials YES Evidence YES YES
  37. 37. A question of jurisdiction TENEMENT (Scotland) http://purl.org/heritagedata/schemes/1/concepts/467 A large building containing a number of rooms or flats, access to which is usually gained via a common stairway. TENEMENT (England) http://purl.org/heritagedata/schemes/eh_tmt2/concepts/68997 A parcel of land. TENEMENT (Wales) http://purl.org/heritagedata/schemes/10/concepts/68997 TENEMENT BLOCK (England) http://purl.org/heritagedata/schemes/eh_tmt2/concepts/71489 Use for speculatively built 19th century "model dwellings", rather than those built by a philanthropic society. TENEMENT BLOCK (Wales) http://purl.org/heritagedata/schemes/10/concepts/71489 TENEMENT HOUSE (England) http://purl.org/heritagedata/schemes/eh_tmt2/concepts/71476 SC674834 289 Allison Street, Glasgow: TENEMENT http://canmore.rcahms.gov.uk/en/site/148111/ Originally built as a family house. Converted into flats during the 19th or 20th century.
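The TENEMENT example shows why a label alone cannot identify a concept across national thesauri. The short Python sketch below illustrates this with the URIs from the slide. The `CONCEPTS` table layout and the `resolve` function are assumptions made for illustration and do not describe how heritagedata.org stores or serves its data.

```python
# (label, jurisdiction, concept URI) as listed on the slide.
CONCEPTS = [
    ("TENEMENT", "Scotland", "http://purl.org/heritagedata/schemes/1/concepts/467"),
    ("TENEMENT", "England", "http://purl.org/heritagedata/schemes/eh_tmt2/concepts/68997"),
    ("TENEMENT", "Wales", "http://purl.org/heritagedata/schemes/10/concepts/68997"),
    ("TENEMENT BLOCK", "England", "http://purl.org/heritagedata/schemes/eh_tmt2/concepts/71489"),
    ("TENEMENT BLOCK", "Wales", "http://purl.org/heritagedata/schemes/10/concepts/71489"),
    ("TENEMENT HOUSE", "England", "http://purl.org/heritagedata/schemes/eh_tmt2/concepts/71476"),
]


def resolve(label, jurisdiction=None):
    """Return every concept URI whose label matches, optionally within one jurisdiction."""
    wanted = label.strip().upper()
    return [
        uri
        for term, place, uri in CONCEPTS
        if term == wanted and (jurisdiction is None or place == jurisdiction)
    ]


if __name__ == "__main__":
    print(resolve("tenement"))              # three URIs, one per jurisdiction
    print(resolve("tenement", "Scotland"))  # a single, unambiguous URI
```

Indexing records against the URI instead of the label keeps the Scottish sense (a building of flats) distinct from the English one (a parcel of land).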
  38. 38. A question of jurisdiction SC683414 Cruck Framed Byre, Latheron, Caithness http://canmore.rcahms.gov.uk/en/site/86630/ A Cruck House in Wick, Worcestershire Cruck cottage in Wick Philip Halling http://creativecommons.org/licenses/by-sa/2.0/
  39. 39. A bheil Gàidhlig agaibh? DP151933 The Cenotaph, George Square, Glasgow: http://canmore.rcahms.gov.uk/en/site/143264/
  40. 40. A bheil Gàidhlig agaibh?
  41. 41. Multilinguality • Multilingual labels & notes • Search in one language, retrieve another • Potential to manage regional terms
  42. 42. Challenges for RCAHMS Controlled vocabularies online • Integration of project deliverables into RCAHMS processes • Managing candidate terms • Publishing additional vocabularies • Jurisdiction: a single British thesaurus for Cultural heritage? • Adding images • Moving the goalposts
  43. 43. Summary • Controlled vocabularies online: Linked data (SKOS); Downloadable files • Linking out: Mapping between the different thesauri • Web services: term suggestion, term validation, legacy data alignment • Tools to align data with controlled vocabularies: Browser-based ‘widget’ controls http://www.heritagedata.org/blog/work-in-the-pipeline/
  44. 44. “the key to interoperability” http://www.heritagedata.org/ ©University of Glamorgan
