Lightning Talk Session 3: Enabling FAIR Research Data and Other Outputs
The Irish ORCID Consortium
presented by Catherine Ferris, IReL;
Exploring Large-Scale Open Data: The Curatr Platform
presented by Derek Greene, University College Dublin;
A Workflow for Research Data Management (RDM): Aligning the Management of Research Data
presented by Gail Birkbeck, University College Dublin;
Making Cultural Heritage Data FAIR: Developing Recommendations for the WorldFAIR Project at the Digital Repository of Ireland
presented by Joan Murphy, Digital Repository of Ireland.
2. •VICTEUR ERC Project
• Interdisciplinary ERC-funded project at UCD aims to unlock large historical datasets to
map the dynamics of migration and cultural change in Britain from the 19th to 21st
century.
➢ Key requirement of the ERC is to make research outputs
openly available and accessible without technical obstacles.
https://projectvicteur.com
3. •Case Study: British Library Digital Collection
• Our collaborators at the British Library provided a collection of
digitised, out-of-copyright 18th-19th century books.
• The original scans are stored as high-resolution image files. We
work with plain text versions created via Optical Character
Recognition produced by Microsoft.
lish the money they paid to the Italians for silk. He
also tried to raise many valuable tropical plants.
There was hardly any good thing which needed
doing that A GEORGIA ROAD. was undertaken by
the new colony. Such an enterprise! appealed
strongly to the benevo- 1 lent, and many
thousands of pounds | were given to help on this
good work. Parlia ment also voted a donation to
Georgia. No one H was allowed to make any profit
out of the new col- I ony, on the seal of which was
a device of silk-worms | spinning, with a motto in
Latin which meant, " Not for themselves, but for
others " (Non sibi sed aliis). In 1732 Oglethorpe
took out his first company of a hundred and
sixteen people, with whom he began the town of
Savannah. Many others were added, among
whom were a regiment of Scotch Highlanders,
highland regiment 6
The Household History of the United States and its people.
Edward Eggleston (1889)
Rivers of Great Britain, the Thames, from
Source to Sea. Author Unknown (1891)
4. •Case Study: British Library Digital Collection
• Plain text files for the books and separate metadata have been made available online
by the British Library.
• However, it is challenging for researchers to filter the metadata or find relevant
segments of text in a dataset of this scale.
➢ Can we go beyond providing open data and develop tools to
make digital text collections more accessible and useful?
‣ 49,509 books with associated metadata
‣ 63,985 text files representing individual volumes
‣ 12.3 million pages of text in these volumes
‣ 4.08 billion words of text in total
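To illustrate the kind of metadata filtering that is hard to do by hand at this scale, here is a minimal sketch in Python. The `Book` fields and the `filter_books` helper are assumptions made for illustration, not the British Library's metadata schema or Curatr's code; the two sample records reuse the titles shown on the earlier slide.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Book:
    # Hypothetical, simplified metadata record (not the real schema).
    title: str
    author: str
    year: int


def filter_books(
    books: Iterable[Book],
    keyword: Optional[str] = None,
    start_year: Optional[int] = None,
    end_year: Optional[int] = None,
) -> List[Book]:
    """Keep books whose title contains `keyword` (case-insensitive)
    and whose publication year falls within the optional range."""
    results = []
    for book in books:
        if keyword is not None and keyword.lower() not in book.title.lower():
            continue
        if start_year is not None and book.year < start_year:
            continue
        if end_year is not None and book.year > end_year:
            continue
        results.append(book)
    return results


# Two records taken from the titles cited on the earlier slide.
SAMPLE = [
    Book("The Household History of the United States and its people",
         "Edward Eggleston", 1889),
    Book("Rivers of Great Britain, the Thames, from Source to Sea",
         "Author Unknown", 1891),
]

if __name__ == "__main__":
    for match in filter_books(SAMPLE, keyword="thames", start_year=1890):
        print(match.year, match.title)
```

A loop like this works for a handful of records; for tens of thousands of books and billions of words, an indexed search platform is the more practical route.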
5. •Curatr Platform
• To make the British Library Digital Collection accessible to other
researchers, we have developed the Curatr platform.
https://curatr.ucd.ie
6. •Curatr Platform
• Curatr has been designed for non-technical users, providing multiple entry points into
the collection.
• The full texts in Curatr can be searched and filtered based on a range of criteria,
including both textual content and metadata.
8. •Curatr Platform
• Curatr allows researchers to move seamlessly between "distant reading" of the corpus
at a high level, and "close reading" of individual volumes and pages.
9. •Curatr Platform
• The Curatr user interface provides integration with other resources, such as the British
Library Online Catalogue and the British Library Ark Viewer.
10. •Curatr Platform
• Curatr provides visual methods for exploring the British Library Collection, such as by
navigating semantic networks.
Semantic network for query "egypt"
Semantic network for "contagion" + "disease"
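The slides do not describe how Curatr builds its semantic networks. As a rough illustration of the general idea, the sketch below links words whose vectors have high cosine similarity. The three-dimensional vectors are invented toy values, and `neighbours` and `build_network` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbours(vectors: Dict[str, Vector], query: str,
               threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Words at least `threshold` similar to `query`, most similar first."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    kept = [(w, s) for w, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def build_network(vectors: Dict[str, Vector], query: str,
                  threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Edges among the query word and its neighbours, as (a, b, similarity)."""
    nodes = [query] + [w for w, _ in neighbours(vectors, query, threshold)]
    edges = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                edges.append((a, b, round(sim, 3)))
    return edges


# Invented toy vectors, for illustration only.
TOY_VECTORS = {
    "egypt": (1.0, 0.1, 0.0),
    "nile": (0.9, 0.2, 0.0),
    "cairo": (0.8, 0.0, 0.1),
    "disease": (0.0, 1.0, 0.1),
    "contagion": (0.1, 0.9, 0.0),
}

if __name__ == "__main__":
    for edge in build_network(TOY_VECTORS, "egypt"):
        print(edge)
```

In practice the vectors would be learned from the corpus itself, and the threshold controls how dense the resulting network is.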
11. •Conclusions
• For large-scale digital datasets, open access needs to mean
more than availability online or consistent metadata.
• Standard digital repositories are often poorly suited for hosting
and exploring datasets of this scale.
• Many custom resources for cultural research require high levels
of technical expertise to use productively.
• Developing usable tools for exploring such datasets can be
challenging, requiring close interdisciplinary collaborations.
• Such platforms can provide significant potential to make digital
datasets more accessible and usable for non-technical
researchers and the general public.
• Beyond open access, platforms such as Curatr offer inbuilt tools
to help overcome research inequities in technical training and
geographic access.
12. ● Any Questions?
Curatr platform: https://curatr.ucd.ie
Accounts: <ucdcuratr@gmail.com>
Data + code: https://github.com/derekgreene/curatr
Contact: <derek.greene@ucd.ie>
13. A workflow for research data
management
Gail Birkbeck | November 2, 2023
15. Project Activity Listing
200+ activities
Key: Research activity; Data activity

Date | Activity | Work | Task | Activity type (code based on ISRALM)
31/03/2020 | Research question, study rationale | Initial proposal, scale of study, identify possible partners. Review literature (separate activity?) | Study overview to share with partners | Research Activity
 | Open / FAIR requirement | Review funder data sharing requirements and deliverables, additional policies, DMP template and capacity of team to meet requirements | DM and DS tasks and related roles | Data Activity
01/04/2020 | Proposal development | Write proposal: including drafting survey; inviting project partners; admin (liaison with foundation office and budget?) | Project mgt aspects | Research Activity
 | Proposal development | Write section in proposal on data mgt and sharing | Project mgt aspects | Data Activity
 | Ethics proposal | Develop submission for ethics review | Project mgt aspects (legal) | Research Activity
 | Ethics proposal | Addressed questions related to open data | Aspects of the dataset | Data Activity
03/04/2020 | Send to Uni research office | | Project mgt aspects | Research Activity or Project management?

Coded activity based on William et al. (2017) on Data Management Topics
20. Routine – to get to an output
Engagement: Engage critically with relevant texts on the issue.
Appropriation: Make sense of these in a way that relates to the project.
Autonomization: Outputs that stand alone and are usable independently.
27. Creating a naming convention

LPEIQ2.53_F
English: Did you become unemployed during the COVID-19 pandemic?
Chinese: 你在COVID-19疫情期間失業了嗎?
Hindi: क्या आप कोविड-19 महामारी के दौरान बेरोजगार हो गए?
Swedish: Blev du arbetslös under COVID-19-pandemin?
Hebrew: האם איבדת את עבודתך בתקופת הקורונה?

LPEIQ2.54_F
English: Did you stop working because you needed to support your family member?
Chinese: 疫情期間,你是否因為需要照顧你的有智力及發展障礙的家人而停止工作?
Hindi: क्या आपने काम करना बंद कर दिया क्योंकि आपको अपने परिवार के सदस्य को समर्थन करने की आवश्यकता थी?
Swedish: Slutade du arbeta under pandemin eftersom du behövde stödja din familjemedlem?
Hebrew: האם הפסקת לעבוד כדי שתוכל לתמוך באדם עם המוגבלות?

LPEIQ2.55_F
English: Did you have to reduce the hours that you normally go to work because you needed to support your family member?
Chinese: 疫情期間,你是否因為需要你的有智力及發展障礙的家人而不得不減少返工的時間?
Hindi: क्या आपको उन घंटों को कम करना था जिन्हें आप सामान्य रूप से काम पर जाते हैं क्योंकि आपको अपने परिवार के सदस्य का समर्थन करने की आवश्यकता है?
Swedish: Var du tvungen gå ned i arbetstid för stödja din familjemedlem?
Hebrew: האם היית צריך לצמצם את שעות העבודה שלך כדי שתוכל לתמוך באדם עם המוגבלות?

LPEIQ2.56_F
English: Did you work from home during the COVID-19 pandemic
Chinese: 疫情期間,你是否完全在家工作?
Hindi: क्या आपने कोविड-19 महामारी के दौरान घर से काम किया था
Swedish: Arbetade du hemifrån under COVID-19-pandemin
Hebrew: האם עבדת מהבית בשכר במהلך הקורונה?
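A consistent naming convention lets the same item be matched across every language version by its code alone. The sketch below parses codes shaped like `LPEIQ2.53_F`. The slide does not define what each part of the code means, so the field names (`prefix`, `section`, `item`, `suffix`) are assumptions chosen for illustration, as is the `parse_item_code` helper.

```python
import re
from typing import NamedTuple


class ItemCode(NamedTuple):
    # Field names are assumptions; the slide does not define the parts.
    prefix: str
    section: int
    item: int
    suffix: str


# Letters, a number, a dot, a number, an underscore, then letters.
ITEM_CODE_PATTERN = re.compile(
    r"^(?P<prefix>[A-Z]+)(?P<section>\d+)\.(?P<item>\d+)_(?P<suffix>[A-Z]+)$"
)


def parse_item_code(code: str) -> ItemCode:
    """Split a code such as 'LPEIQ2.53_F' into its parts.

    Raises ValueError if the code does not follow the assumed pattern,
    which makes inconsistent names easy to catch early."""
    match = ITEM_CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"Item code does not follow the convention: {code!r}")
    return ItemCode(
        prefix=match.group("prefix"),
        section=int(match.group("section")),
        item=int(match.group("item")),
        suffix=match.group("suffix"),
    )


if __name__ == "__main__":
    codes = ["LPEIQ2.56_F", "LPEIQ2.53_F", "LPEIQ2.55_F", "LPEIQ2.54_F"]
    # Sorting on the parsed tuple orders items numerically, not as text.
    for code in sorted(codes, key=parse_item_code):
        print(code, parse_item_code(code))
```

Parsing the numeric parts as integers means that, for example, item 9 sorts before item 53, which plain string sorting would get wrong.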
29. NORFest, 2 November 2023
Joan Murphy, Research Associate
Digital Repository of Ireland
‘Global cooperation on FAIR data policy and practice’ (WorldFAIR) has received funding from the European Union’s Horizon Europe project call HORIZON-WIDERA-2021-ERA-01-01, grant agreement 101058393. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union.
Making Cultural Heritage Data FAIR
Developing Recommendations for the WorldFAIR
Project at the Digital Repository of Ireland
30. Digital Repository of Ireland
56 member institutions across the island of Ireland contributing data from the Arts, Humanities, Social Sciences and Cultural Heritage.
CoreTrustSeal certified
FAIR-enabling repository
Promoting best practice in:
● Digital preservation
● Digital archiving
● Open Research
● FAIR data sharing
31. What is FAIR?
F: Is the data easy for both humans and computers to FIND?
● persistent identifier
● rich metadata
● indexed in a searchable resource
A: Once the user finds the required data, is it ACCESSIBLE?
● mediated without specialised or proprietary tools or communication methods
● provide the exact conditions under which the data are accessible
I: Is the data INTEROPERABLE with other data?
● formal, accessible, shared language for knowledge representation
● references to other (meta)data
R: Is the data REUSABLE?
● adequate context
● accurate and relevant attributes
● clear and accessible data usage licence
● detailed provenance
Adapted from Bishop, B. W., & Hank, C. (2018). Measuring FAIR principles to inform fitness for use.
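The FAIR questions above can be read as a simple checklist over a metadata record. The sketch below is one way to express that, assuming a record stored as a Python dictionary. The field names in `FAIR_CHECKS` are hypothetical and loosely mirror the bullet points; this is not a formal FAIR assessment tool.

```python
from typing import Any, Dict, List

# Hypothetical field names, loosely mirroring the bullets on the slide.
FAIR_CHECKS: Dict[str, List[str]] = {
    "findable": ["persistent_identifier", "metadata", "indexed_in"],
    "accessible": ["access_conditions"],
    "interoperable": ["vocabulary", "related_identifiers"],
    "reusable": ["context", "licence", "provenance"],
}


def fair_report(record: Dict[str, Any]) -> Dict[str, List[str]]:
    """For each FAIR principle, list the expected fields that are
    missing or empty in `record`. An empty list means nothing is missing."""
    report = {}
    for principle, fields in FAIR_CHECKS.items():
        report[principle] = [name for name in fields if not record.get(name)]
    return report


if __name__ == "__main__":
    # Illustrative record with placeholder values only.
    example = {
        "persistent_identifier": "example-identifier",
        "metadata": {"title": "Example image"},
        "licence": "CC BY 4.0",
    }
    for principle, missing in fair_report(example).items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{principle}: {status}")
```

A check like this only tests whether a field is present; judging whether the metadata is rich enough or the provenance detailed enough still needs human review.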
34. The WorldFAIR Project
▪ Each case study explores aspects of FAIR implementation in its domain or cross-domain research area.
▪ FAIR Implementation Profiles (how do you implement FAIR, and what FAIR-enabling resources do you use?)
▪ Findings from each case study feed into the development of a Cross-Domain Interoperability Framework, with case studies from a range of research areas.
▪ Recommendations for domain-sensitive FAIR assessment.
35. WP13 - Cultural Heritage
D13.1 Undertake a landscape report on image sharing practices in the Cultural Heritage domain (Feb 2023).
D13.2 Make recommendations to improve FAIR practices for Cultural Heritage, focusing on image sharing (May 2023).
Test the FAIR-aligning recommendations at the DRI (by actually implementing them!).
D13.3 Share our results and explain what worked and what didn’t during DRI’s implementation (April 2024).
36. Dr Sümeyye Akça Marmara University, Information and Records Management | Dr Renata Oliveira de Araujo Universidade Federal do Estado do
Rio de Janeiro | Keren Barner Younes & Soraya Nazarian Library, University of Haifa | Dr Kathryn Cassidy Digital Repository of Ireland | Dr Isabel
Ceron Academy of the Social Sciences in Australia | Dr Josiline Chigwada University of South Africa | Dr Steven Claeyssens KB, The National
Library of the Netherlands | Anita Cooper Royal Irish Academy | Dr Claudio Cortese 4Science SpA | Dr Milena Dobreva GATE Institute, Sofia
University St Kliment Ohridski | Dr Kristina Hettne Leiden University Libraries | Eileen J. Manchester Digital Innovation Division (LC Labs), Library
of Congress | Mikala Narlock University of Minnesota | Dr Esther Olembe University of Yaounde II-SOA National Archives | Gina O'Kelly Irish
Museums Association | Dr Rebecca O’Neill Wikimedia Community Ireland | Thomas Padilla Internet Archive | Dana Reijerkerk Stony Brook
University | Glen Robson IIIF Consortium | Dr Antje Schmidt Museum für Kunst und Gewerbe, Hamburg | Dr Tim Sherratt Centre for Creative and
Cultural Research, University of Canberra | Margaret Warren Institute for Human & Machine Cognition, Metadata Authoring Systems
37. Recommendations
Recommendation 1: Citation Model
A formal citation model for cultural heritage images should be adopted which recognises digital surrogates as research objects and includes references to revisions of either image data or metadata.
Recommendation 2: Transparency
Information about the creation, management and preservation of files should be visible and understandable by both humans and machines.
Recommendation 3: Data Documentation
The process of selection, scope and completeness of the data should always be documented.
Recommendation 4: Licensing and Rights
Rights, licences and labels should be applied consistently and in a standardised way.
Recommendation 5: Delivery
Improved support for identifier schemes, APIs and machine-readable contextual data.
38. Landscape report - easy wins!
Key Findings
● Data formats used by GLAMs are open, limited in number and widely accessible.
● Metadata interoperability is well-developed and supported by a variety of technologies, standards, crosswalks and
data models; however, there is a tendency towards reducing both the granularity of the metadata and the complexity of
the metadata structures in order to facilitate this.
● Data interoperability is facilitated by a limited number of web technologies, with IIIF perhaps the most significant
new development in the field.
● URIs (expressed as URLs) are favoured over DOIs and other PIDs, and have been successfully used to support
both persistent data retrieval and the use of LOD.
● While copyright is generally stated clearly, data and metadata usage licences are sometimes unclear or more
restrictive than the copyright status might suggest. It is generally implied that data usage without attribution is not
allowed, although the image sharing platforms encourage open licences for metadata.
● Provenance information relating to the stewardship of collections is generally available, but limited. Information
about the acquisition and ongoing care of digital collections (both original objects and their surrogates) is not
usually made available.
● Despite established and robust practices and policies for digital preservation in the sector, there is no requirement
for organisations or image sharing platforms to maintain either data or metadata records as originally published,
and there is a noticeable lack of user-facing transparency around administrative or preservation actions.
✅ Open file formats allow for easy reuse of image data
✅ Metadata standards are widely used and information is easily exchanged between systems
✅ Widely used technologies facilitate access
✅ Permalinks and other registration numbers allow for consistent retrieval
✅ Provenance information is available
✅ Trust in the persistence of collections is established
39. …but also areas for improvement
🆇 Reduced granularity makes it more difficult to query, parse, and analyse the data
🆇 Why not PIDs? Are all cultural heritage links stable?
🆇 The user is expected to determine whether they have the rights to use the data
🆇 Provenance information might not be provided
🆇 Is trust always warranted? What happens when cultural records change?
40. Implementation Phase
[Timeline chart, September to June. Phases: Research and Review; Implementation; Reporting; Comms & CDIF.]
Citation Model: review DRI’s citation model; implement changes; update docs.
Transparency: review DRI embedded metadata workflow and rationale; survey members*; statement of intent.
Data Documentation: RO-Crate & FAIR Signposting assessment and review; develop training materials; datasheets template development; publication; training.
Licensing & Rights: review DRI documentation; agree licences; develop training materials; training; tech implementation; plain language page.
Delivery: review LOD; identify use cases; SPARQL.
Development of recommendations, guidance etc.
41. WP13 Mapping Report and
Recommendations Report
https://worldfair-project.eu/cultural-heritage/
Thanks for listening!
j.murphy@ria.ie
b.knazook@ria.ie