
Cultural Heritage: when data are much worse than one can believe


Presentation by Franco Niccolucci, University of Florence at RDA National Event in Florence, Italy, November 2016

Published in: Data & Analytics


  1. PARTHENOS and the DMP. Firenze, 14/11/2016. Franco Niccolucci, Scientific Coordinator, PARTHENOS
  2. What is PARTHENOS? PARTHENOS is a “cluster” project (2015-2019) that brings together Research Infrastructures in the Humanities, Language and Cultural Heritage sectors, adopting a bottom-up approach to develop joint strategies regarding:
     • Data policies: data management and quality; open data and access
     • Standards: produce recommendations to document primary sources, reference resources, and protocols and procedures
     • Interoperability and Semantics: develop a common semantic framework and a joint Cataloguing Dataset Model
     • Services and Tools
  3. Current project progress (✓ = completed, • = in progress)
     ✓ Starting points set after wide consultation of the reference communities, also through needs reports contributed by participating infrastructure projects
     ✓ Basic standardization kit defined
     • Work on recommendations in progress, delivery of drafts in 2017
     ✓ Dataset Model prototype created, currently under test
     • Work on data management in progress, delivery planned for May 2017
     • Parallel work on scientific data started within E-RIHS (European Research Infrastructure for Heritage Science), an ESFRI project approved after PARTHENOS started but collaborating with it; this work needs to be taken up in the second iteration of PARTHENOS work
  4. What makes a Heritage Science DMP more demanding than others?
  5. Guidelines for a HS DMP
     • Guidelines for creating a DMP do not always consider the “special needs” of HS; in the best case they state: “take a note”. Here are some examples:
         • A survey of 47 DMPs submitted to NEH (USA) in 2015 and 2016: only 2 concerned heritage, and they did not address the issue
         • U. Minnesota and Colorado School of Mines provide DMP examples with non-digital data, but they do not address HS issues
         • The MIT DMP guide recommends recording “how the data was generated, including equipment or software used, experimental protocol, other things you might include in a lab notebook”
         • The DCC (UK) 2011 guide states: “It is fundamental to capture contextual details about how and why the data were created.”
         • Plan de gestion de Datos (PaGoDa) (ES) references UK good practice but does not mention this HS issue
         • The Humboldt University (DE) tool does not require specific information for the DMP, maybe because it aims at H2020 only
         • In Italy (and probably elsewhere) the DMP is linked to H2020, and the main reason stated for producing one is that it is mandatory and that failing to do so may cause the withdrawal of funding (the path to the brain passes through the wallet…)
  6. What are the requirements for a HS DMP: F & A
     • Findability & Accessibility: humanities and heritage sciences belong to the “long tail of science” = many small datasets, little use of IT, limited (but increasing) deposit with institutional/domain/national repositories → need for registries and search systems
         • The ARIADNE Registry and Portal for archaeological datasets
         • The (forthcoming) E-RIHS DIGILAB, a registry of HS-related datasets
         • Both will use the PARTHENOS CDM, which is designed to support all dataset documentation needs in the humanities and heritage science
  7. What are the requirements for a HS DMP: I
     • Interoperability: the goal is to define an overarching “umbrella” ontology with specializations for individual domains. However, there are two main standards:
         • TEI for texts/humanities
         • CIDOC CRM for cultural heritage
     • They respond to different needs
     • Forking is probable, with reconciliation at a high level
     • The main difference is in the nature of the objects studied
         • Are they the focus of research (HS) or an information carrier (HUM)?
         • Depending on which perspective prevails, there are different documentation requirements
  8. What are the requirements for a HS DMP: R
     • Re-use: to be able to re-use data, a large amount of additional metadata must be provided: the 5W+H. This may be cumbersome for the data creator
     • Who: a question not only of the researcher’s credibility (that too!) but also of the scholarly perspective the creator adopted, which may not match the re-user’s
         • Example: classifications of flint tools use different approaches, e.g. use vs. manufacturing
     • Why: what is the research question underlying the data creation? It may influence the way the data were generated
         • Example: 3D models created for communication may be unsuitable for research
     • What: which object(s) were studied? Are they acceptable for data re-use?
         • Example: the object’s condition may have been altered, causing differences
  9. What are the requirements for a HS DMP: R (cont.)
     • When/where: this is particularly relevant for legacy data, and all data become legacy after some time... Technologies change, and this may limit the reliability of results.
         • Example: the FLAME project (U. Oxford) studies the movement, exchange, and transformation of metal in Eurasian societies during the Bronze and Early Iron Age. It collects metallurgical analyses made on archaeological material since the 19th century; dating each analysis is paramount to qualify its reliability
     • How: which protocol was used to generate the data? If the data were created using equipment (from a digital camera to a particle accelerator), the instrument’s features and settings may significantly influence the response. Different methods may also lead to different results.
         • Example: 3D scan models may differ according to scanner type (laser, structured light, etc.), model (Minolta vs. Breuckmann), settings (resolution chosen) and post-processing (decimation)
  10. A (tentative) solution
     • The CIDOC CRM is developing a global system to address these issues and enable human and machine re-use of heritage science data through proper documentation
         • CRMdig defines an ontology for documenting digital data acquisition (e.g. photo, 3D)
         • CRMsci (ext) defines an ontology for documenting the results of analytical experiments in heritage science (e.g. XRF, XRD, FTIR, etc.)
         • CRMrel defines ways to express confidence in one’s results and communicate them to future re-users
     • All need to be agreed upon, tested and assessed in real cases
  11. THANK YOU! PARTHENOS is a project funded by the European Commission under Horizon 2020. Franco Niccolucci, PARTHENOS
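
The 5W+H re-use metadata discussed in slides 8 and 9 can be pictured as a small record attached to each dataset. The following is only a minimal sketch: the class name `ReuseMetadata`, its field names, and all sample values are hypothetical illustrations, not part of the PARTHENOS CDM, CIDOC CRM, or any real schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class ReuseMetadata:
    """Hypothetical 5W+H record for one heritage science dataset."""
    who: str    # creator and the scholarly perspective adopted
    why: str    # research question behind the data creation
    what: str   # object(s) studied and their condition
    when: str   # date of acquisition (matters for legacy data)
    where: str  # place of acquisition
    how: Dict[str, str] = field(default_factory=dict)  # instrument, settings, post-processing

    def missing(self) -> List[str]:
        """Return the names of the 5W+H fields that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


# Illustrative values only; they do not describe a real dataset.
scan = ReuseMetadata(
    who="example lab, typological perspective",
    why="model created for communication",
    what="example artefact, surface partly eroded",
    when="2016-11-14",
    where="Florence",
    how={"scanner_type": "structured light", "post_processing": "decimation"},
)

# A record whose creator skipped the research question and the protocol.
incomplete = ReuseMetadata(
    who="example lab", why="", what="example artefact",
    when="2016-11-14", where="Florence",
)

print(scan.missing())
print(incomplete.missing())
```

A check like `missing()` could flag datasets whose documentation is too thin for re-use, for example a record with no research question or acquisition protocol, before they are deposited in a registry.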