POST-INGEST CURATION: CURATING WITHOUT AN INSTITUTIONAL REPOSITORY
Kevin Leonard - Evelien Dhollander
Data Curation Requires Datasets
• Data curation: adding value to (meta)data for long-term preservation
• Imagined (ideal) workflow:
  1. Researcher provides data to curator for curation
     • Voluntary submission
     • Automatic part of ingest in institutional repository
  2. Curator makes changes and recommendations
  3. Data is put online for long-term preservation
• Is that realistic for many institutions?
What Often Happens to Datasets, Really?
[Diagram: Scientist, Curator]
Proposed Workflow
1. Find datasets online
   • Employ existing data linking architectures
   • Use repository APIs
2. Produce (meta)data augmentation plan for discovered datasets
   • Develop plan based on current best practices for FAIR metadata
   • Recommend changes that maintain existing DOI networks
3. Provide researchers with an easily actionable curation plan
Step 1: Where Are The Datasets?
• Difficulties:
  • Datasets are broadly distributed
  • Affiliation information is not stored in a consistent location (or format!)
  • Existing data linking systems (e.g., Scholix, DataCite) have limited coverage
• Solution:
  • Use repository APIs to search for institutional datasets
  • Search beyond the <creator><affiliation> field
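Searching beyond <creator><affiliation> can be sketched as a scan over every field of a record. A minimal sketch, assuming records arrive as JSON dictionaries shaped loosely like the `attributes` object of a DataCite REST API record; the example record and the function names are hypothetical, not taken from the authors' code. It reports which top-level fields mention any institutional search term.

```python
from typing import Any, Iterable, List

# Search terms from the slide: name variants plus the ROR ID.
SEARCH_TERMS = ["Universiteit Gent", "UGent", "Ghent University", "00cv9y106"]

# Hypothetical record: the institution appears only outside the creators.
EXAMPLE_RECORD = {
    "doi": "10.1234/example",
    "titles": [{"title": "Soil samples"}],
    "creators": [{"name": "Doe, Jane", "affiliation": ["Another University"]}],
    "contributors": [
        {
            "name": "Roe, Rick",
            "affiliation": [
                {
                    "name": "Ghent University",
                    "affiliationIdentifier": "https://ror.org/00cv9y106",
                }
            ],
        }
    ],
    "descriptions": [
        {"description": "Samples processed at UGent.", "descriptionType": "Abstract"}
    ],
}


def iter_strings(value: Any) -> Iterable[str]:
    """Yield every string nested anywhere inside dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def matched_fields(attributes: dict, terms: List[str]) -> List[str]:
    """Return the top-level fields whose text mentions any search term.

    Every field is checked, not only the creators' affiliations, so mentions
    in contributors, descriptions, funding, etc. are also caught.
    Matching is case-insensitive substring matching.
    """
    lowered = [term.lower() for term in terms]
    hits = []
    for field, value in attributes.items():
        text = " ".join(iter_strings(value)).lower()
        if any(term in text for term in lowered):
            hits.append(field)
    return hits


if __name__ == "__main__":
    print(matched_fields(EXAMPLE_RECORD, SEARCH_TERMS))
```

Substring matching is deliberately loose; a short term such as "UGent" may need word-boundary checks if it produces false positives.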
Example Python Code
• Python code to search for institutional records
• searchQuery can include multiple terms:
  • Universiteit Gent
  • UGent
  • Ghent University
  • 00cv9y106 (ROR ID)
• Saves the DOIs of all datasets found to a CSV file
• Can use OAI-PMH to harvest additional metadata
• Currently covers several popular repositories and is easily extended:
  • Zenodo, OSF, Dryad, Figshare, PANGAEA
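The search-and-save step can be sketched for a single source. This covers DataCite only and is not the authors' code; it assumes DataCite's public REST endpoint `https://api.datacite.org/dois`, its `query`, `resource-type-id` and `page[size]` parameters, and a `links.next` URL in each paged response. All function names are hypothetical. Only `harvest` touches the network.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from typing import List, Optional

# Assumed endpoint: DataCite's public REST API. Other repositories
# (Zenodo, OSF, Dryad, Figshare, PANGAEA) would each need their own builder.
DATACITE_DOIS = "https://api.datacite.org/dois"


def build_query(terms: List[str]) -> str:
    """OR together all search terms, quoting multi-word phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


def first_page_url(terms: List[str], page_size: int = 100) -> str:
    """URL for the first page of dataset records matching any term."""
    params = {
        "query": build_query(terms),
        "resource-type-id": "dataset",
        "page[size]": str(page_size),
    }
    return f"{DATACITE_DOIS}?{urllib.parse.urlencode(params)}"


def extract_dois(page: dict) -> List[str]:
    """Pull the DOI of every record on one JSON response page."""
    return [item["attributes"]["doi"] for item in page.get("data", [])]


def harvest(terms: List[str], max_pages: int = 50) -> List[str]:
    """Follow 'links.next' until no pages remain (requires network access)."""
    url: Optional[str] = first_page_url(terms)
    dois: List[str] = []
    for _ in range(max_pages):
        if not url:
            break
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        dois.extend(extract_dois(page))
        url = page.get("links", {}).get("next")
    return dois


def dois_to_csv(dois: List[str]) -> str:
    """CSV text with a 'doi' header and one unique, lower-cased DOI per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["doi"])
    # DOIs are case-insensitive, so lower-case before de-duplicating.
    for doi in dict.fromkeys(d.lower() for d in dois):
        writer.writerow([doi])
    return buf.getvalue()


def save_dois(dois: List[str], path: str) -> None:
    """Write the CSV text to disk."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(dois_to_csv(dois))
```

A typical call might be `save_dois(harvest(["Universiteit Gent", "UGent", "Ghent University", "00cv9y106"]), "ugent_datasets.csv")`, where the output filename is only an example. Keeping URL building and parsing separate from the network call makes the per-repository pieces easy to test and to extend.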
Step 2: What To Do With What You’ve Found
• Repositories often allow metadata fields to be edited
  • WITHOUT triggering the creation of a new version (and therefore a new DOI)
• Editable fields vary by repository, on a scale from strict to lenient:
  • STRICT: editing any metadata field creates a new version
  • LENIENT: most fields can be edited (e.g., title, authors, relatedTo)
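To keep existing DOI networks intact, recommendations can be sorted by whether a given repository lets a field be edited in place. A sketch under stated assumptions: the repository keys and their field sets are hypothetical placeholders for the strict and lenient ends of the scale (the lenient set uses only the fields named on the slide), not documented policies of any real repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical policies; each real repository must be checked individually.
IN_PLACE_FIELDS: Dict[str, Set[str]] = {
    "strict_repo": set(),  # any edit creates a new version (and DOI)
    "lenient_repo": {"title", "authors", "relatedTo"},  # fields named on the slide
}


@dataclass
class Edit:
    """One recommended change to a metadata field."""
    field: str
    new_value: str


def split_edits(repo: str, edits: List[Edit]) -> Tuple[List[Edit], List[Edit]]:
    """Split edits into (editable in place, would create a new version)."""
    allowed = IN_PLACE_FIELDS[repo]
    in_place = [e for e in edits if e.field in allowed]
    new_version = [e for e in edits if e.field not in allowed]
    return in_place, new_version
```

Edits in the second list can still be recommended, but flagged so the researcher knows they would mint a new version.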
Develop Recommendation Plan
• Is the title clear?
• Are keywords provided?
• Are there links to related publications?
• Do the authors have linked ORCIDs or affiliations?
• Is there sufficient documentation in a README?
  • If not, can this information be provided in <description descriptionType="Abstract">?
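Several checklist questions can be pre-screened automatically before a curator reviews the record. A minimal sketch, assuming a simplified record with hypothetical keys (`title`, `keywords`, `related_publications`, `creators` with `name`/`orcid`/`affiliation`, and `has_readme`); title clarity cannot really be automated, so the word-count threshold is only a hypothetical heuristic that flags titles for human review.

```python
from typing import Dict, List

# Hypothetical heuristic: very short titles are flagged for a human to review.
MIN_TITLE_WORDS = 4


def review_record(rec: Dict) -> List[str]:
    """Return checklist issues for one simplified metadata record."""
    issues: List[str] = []
    if len(rec.get("title", "").split()) < MIN_TITLE_WORDS:
        issues.append("Title may be too short to be clear")
    if not rec.get("keywords"):
        issues.append("No keywords provided")
    if not rec.get("related_publications"):
        issues.append("No links to related publications")
    for creator in rec.get("creators", []):
        name = creator.get("name", "unknown creator")
        if not creator.get("orcid"):
            issues.append(f"{name}: no linked ORCID")
        if not creator.get("affiliation"):
            issues.append(f"{name}: no affiliation")
    if not rec.get("has_readme"):
        issues.append(
            'No README; consider documenting the data in <description descriptionType="Abstract">'
        )
    return issues
```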
Step 3: Communicating the Recommendations
• Implementation relies on the participation of the researcher
• The curation plan must be easily actionable, with clearly articulated benefits
• Reduce the researcher's burden of interpreting instructions

Metadata Field | Current Value | Recommended Changes | Rationale
Title          |               |                     |
Abstract       |               |                     |
…
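A curation plan in the four-column layout from this slide can be written out as a spreadsheet-friendly file for each dataset. A minimal sketch; the column names come from the slide, while the `Recommendation` type and the example row are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple

# Column headings taken from the slide's curation-plan table.
HEADER = ["Metadata Field", "Current Value", "Recommended Changes", "Rationale"]


class Recommendation(NamedTuple):
    """One row of the curation plan."""
    field: str
    current: str
    recommended: str
    rationale: str


def plan_to_csv(recs: List[Recommendation]) -> str:
    """Render the plan as CSV text; cells containing commas are quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(recs)
    return buf.getvalue()
```

CSV opens directly in common spreadsheet tools, so the researcher sees one row per field with the reason for each change alongside it.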
Current results
• The code currently harvests more than 2,000 records in total
• Frequently encountered issues:
  • Abstracts redundant with the associated publication
  • No direct contact information
  • Missing keywords
Source   | Number of Records Found
DataCite | 236
Dryad    | 196
Figshare | 302
OSF      | 186
PANGAEA  | 724
Zenodo   | 710
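Repositories such as Zenodo and Dryad register their DOIs with DataCite, so the same dataset may be returned by more than one source. Per-source counts then need not equal the number of distinct datasets. A minimal sketch of tallying hits per source while also counting distinct DOIs; the normalization rules and the example DOIs are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

# Common ways a DOI is written; stripping them lets duplicates match.
RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in RESOLVER_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def tally(hits: Iterable[Tuple[str, str]]) -> Tuple[Counter, int]:
    """Return (records per source, number of distinct DOIs overall)."""
    hits = list(hits)
    per_source = Counter(source for source, _ in hits)
    distinct = {normalize_doi(doi) for _, doi in hits}
    return per_source, len(distinct)
```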
Conclusions
• A relatively simple method of adding value to existing datasets
• Benefits even if the author declines to make the recommended edits:
  • Helps the institution find its research outputs
  • Provides researchers with FAIRness recommendations that they can implement for future datasets
  • Communicates the existence (and utility!) of data support staff
More information?
Contact us:
Kevin Leonard
kevinmichael.leonard@ugent.be
Evelien Dhollander
evelien.dhollander@ugent.be

Leonard&Dhollander_OpenScienceBelgium.pptx


Editor's Notes

  1. Thank you. Today we are going to talk about post-ingest curation: how we at UGent have been approaching the curation of research data despite not having an institutional repository.
  2. As you are all aware, research institutions increasingly recognize the importance and utility of data curation, which, for our purposes, we can broadly define as any activity that adds value to data or metadata prior to its long-term preservation in a data repository. It is usually conceptualized as an ideal workflow in which the researcher provides their data to a curator for curation. This can either be a voluntary submission, as envisioned here in this diagram from the Data Curation Network, with the researcher actively seeking out and requesting the assistance of a curator, or it can be automatic, such as when a researcher deposits their data in an institutional repository and that institution’s curators can immediately begin to work on the data, making curation a necessary part of the data’s path towards preservation. Either way, the curator is then able to make changes and recommendations, ideally through some kind of back-and-forth dialogue with the researcher, before the curated data is finally put online for long-term preservation. In this conception, therefore, curators always get to curate the data BEFORE it is published online. What we asked in this project is how realistic that workflow is for most researchers and institutions, and whether an alternative model might be necessary for cases in which curation is not so automatic.
  3. Looking beyond the ideal, we turned our focus to what actually tends to happen to datasets in the research data lifecycle. The scientist completes a research project, generating a manuscript for publication and some associated data. They submit their manuscript to a journal and, as it approaches acceptance, have to publish their data online. Even if the researcher knows about the curation services available at their institution, they might feel they do not have time to go through rounds of curation because they need their datasets online NOW, and so they bypass the data curators and deposit the datasets directly into the general or domain-specific repository of their choice. They annotate the dataset with metadata according to their own understanding of best practices and the little time they can dedicate to documentation, and the data then sits online in the repository without ever having had value added to it by data curation specialists. Note that, while this workflow is possible for researchers from any institution, it’s especially likely at institutions that don’t have their own institutional repository, as datasets will never automatically pass across the data curators’ desks on the way to the institutional repository. What we aimed to do in this project is to define an alternative workflow for curators, in which they go out and find these datasets where they are posted online. Then, once they know which datasets are associated with their institution, they can develop individualized recommendation plans for the creators of those datasets, in the hope that the researchers implement those changes and thereby improve the FAIRness of those datasets.
  4. Our proposed workflow comes in three steps. First, we find the datasets that have already been posted online. To do so, we first looked at existing data linking architectures, such as Scholix, but ultimately relied on the APIs of the repositories most popular with researchers from our institution. Second, once we’ve found the datasets, we develop augmentation plans for them. This is possible because many popular repositories allow users to edit the metadata of their published datasets without triggering the generation of a new version, therefore preserving existing DOI link networks. So a dataset should not be considered “too late” to curate just because it is already hosted online; there are still things that can be done to improve its FAIRness. Finally, once we’ve developed a set of recommendations for a given dataset, we create an action plan that can be communicated to the researcher, giving them an easily actionable way to improve the FAIRness of their own datasets.
  5. So, the first step is to find the datasets online. This is more difficult than it sounds, because datasets are broadly distributed across many different repositories. To make matters worse, affiliation information is inconsistent in both location and format. Some records attach the affiliation to the creators, while others place it elsewhere in the record. Some write out the name of the institution in full, whereas others use a ROR ID, a persistent identifier for research organizations. For these and other reasons, existing data linking systems miss a lot of the datasets that are out there. This is easy to verify: if you compare the results from these services with what you get by simply going to one of these repository pages and entering the name of your institution into the search bar, you’ll find many records that these systems fail to pick up. Our solution, then, is to use the repository APIs to find as many additional institutional datasets as possible, searching beyond the CREATOR:AFFILIATION field wherever we can.
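The idea of looking beyond the creator affiliation can be sketched as a check that scans every text field of a harvested record for any of the institution's name variants. This is a minimal sketch, not the authors' code: it assumes each record has already been parsed into a JSON-like dict, and the helper names `iter_strings` and `mentions_institution` are hypothetical.

```python
import re

# Name variants listed on the slides (full names, abbreviation, and ROR ID).
INSTITUTION_NAMES = ["Universiteit Gent", "UGent", "Ghent University", "00cv9y106"]


def iter_strings(value):
    """Yield every string nested anywhere in a JSON-like record (dicts, lists, strings)."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)
    # Numbers, booleans and None carry no affiliation text and are skipped.


def mentions_institution(record, names=INSTITUTION_NAMES):
    """Return True if any name variant appears in any text field of the record.

    Matching is case-insensitive and anchored on word boundaries, so "UGent"
    is not matched inside a longer word.
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return any(pattern.search(text) for text in iter_strings(record))


# The affiliation is missing from the creator but present in the description.
record = {
    "creators": [{"name": "Doe, J.", "affiliation": None}],
    "description": "Data collected at Ghent University (Belgium).",
}
print(mentions_institution(record))  # True
```

Scanning all fields trades precision for recall: a record that merely cites work from the institution would also match, so hits are candidates for review rather than confirmed institutional datasets.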
  6. We’ve written some Python code that harnesses the APIs of popular repositories to search for institutional datasets. Important for us, and probably for many institutions in Belgium, is that our institution is known by many names (Universiteit Gent, UGent, Ghent University, and its ROR ID), all of which we see authors freely use when tagging their datasets, so the search query includes all of them. As currently implemented, the code saves the DOIs of all the datasets it finds to a CSV file, because DOIs are what matter most for ingest into the systems we use, but you could easily use OAI-PMH or other systems to extract more metadata. Lastly, we focused on the main repositories used by researchers from Ghent University (Zenodo, OSF, Dryad, Figshare, and PANGAEA), but the code could easily be extended to other repositories, insofar as they have APIs to plug into.
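A search of this kind could look roughly like the sketch below, which combines the name variants into a single OR query, queries one repository, and writes the de-duplicated DOIs to a CSV file. It is not the authors' implementation: the Zenodo endpoint `https://zenodo.org/api/records`, its `q`/`page`/`size` parameters and the `hits.hits[].doi` response shape are assumptions to check against the current Zenodo REST API documentation, and all function names are hypothetical.

```python
import csv
import json
import urllib.parse
import urllib.request

NAMES = ["Universiteit Gent", "UGent", "Ghent University", "00cv9y106"]


def build_query(names):
    """Join name variants into one OR query, quoting each so multi-word names stay phrases."""
    return " OR ".join(f'"{name}"' for name in names)


def normalize_doi(doi):
    """Lowercase a DOI and strip resolver prefixes so the same DOI found twice compares equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def search_zenodo(query, page=1, size=25):
    """Fetch one page of Zenodo search results and return the DOIs (assumed response shape)."""
    params = urllib.parse.urlencode({"q": query, "page": page, "size": size})
    with urllib.request.urlopen(f"https://zenodo.org/api/records?{params}") as response:
        data = json.load(response)
    return [hit.get("doi", "") for hit in data["hits"]["hits"]]


def save_dois(dois, path):
    """Write unique, normalized DOIs to a one-column CSV file and return how many were written."""
    unique = sorted({normalize_doi(doi) for doi in dois if doi})
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["doi"])
        writer.writerows([doi] for doi in unique)
    return len(unique)
```

For example, `save_dois(search_zenodo(build_query(NAMES)), "ugent_datasets.csv")` would store the DOIs from the first page of hits. Paging through all results and adding one search function per repository (OSF, Dryad, Figshare, PANGAEA, DataCite) is omitted here; the DOI normalization matters once several sources are combined, because the same dataset can be returned by more than one of them (Zenodo DOIs, for instance, are also registered with DataCite).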
  7. Once the dataset records have been located, the next step is to figure out what to do with what you’ve found. The first part of that is determining what you can edit without triggering the creation of a new version (and therefore a new DOI). Even though these new DOIs are typically linked to the DOIs of the older versions, we thought it best to avoid these potential issues. Repositories vary in which metadata fields are editable without triggering a new version, from very strict repositories like Dryad, which allow essentially no editing, to very lenient repositories like Zenodo, where you can edit almost anything, including the title, abstract, and authors.
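One way to make this distinction operational is a small policy table that records, per repository, which fields can be edited in place, and uses it to sort a record's recommendations. The sketch below is hypothetical: it fills in only the two extremes named in the talk (Dryad, and for Zenodo the title, abstract and authors), uses invented field names, and treats any repository it does not know as strict. Each repository's current documentation should be checked before relying on such a table.

```python
# Hypothetical policy table: fields editable WITHOUT minting a new version (and DOI).
# Only the examples named in the talk are included; Zenodo allows more than this.
EDITABLE_WITHOUT_NEW_VERSION = {
    "dryad": set(),                              # strict: essentially no in-place edits
    "zenodo": {"title", "abstract", "authors"},  # lenient: almost anything
}


def split_recommendations(repository, recommended_fields):
    """Split recommended fields into (editable in place, would need a new version).

    Unknown repositories are treated as strict, so nothing is promised as
    DOI-preserving unless it has been checked.
    """
    editable = EDITABLE_WITHOUT_NEW_VERSION.get(repository.lower(), set())
    in_place = [field for field in recommended_fields if field in editable]
    new_version = [field for field in recommended_fields if field not in editable]
    return in_place, new_version


print(split_recommendations("Zenodo", ["title", "files", "abstract"]))
# (['title', 'abstract'], ['files'])
```

Defaulting to "strict" keeps the plan conservative: a recommendation is only presented as DOI-preserving once the repository's behaviour has been confirmed.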
  8. Once you’ve decided which metadata fields are in principle editable, you can develop an individualized recommendation plan for that record. Exactly which recommendations you provide will depend on your institution’s priorities and current best practices, but we’ve collected here a few of the major items that could be included in such a plan: Is the title clear? Are there keywords? Has the dataset been linked to a publication? Are the authors’ ORCIDs linked? Have the authors provided an affiliation identifier such as a ROR ID? Did they provide a detailed README, and if not, could that information be provided in the abstract field?
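Part of this checklist can be automated as a first pass before a curator reviews the record by hand. The sketch below assumes a simplified, hypothetical record layout (`title`, `keywords`, `related_publications`, `creators` with `orcid` and `affiliation_id`, `readme`); mapping each repository's real metadata onto it is left out. Whether a title is clear cannot really be decided by code, so the word-count rule here is only an arbitrary proxy that flags titles for manual review.

```python
MIN_TITLE_WORDS = 4  # arbitrary proxy threshold, not from the talk


def review_record(record):
    """Return (field, finding) pairs for a simplified, hypothetical metadata record."""
    findings = []
    if len(record.get("title", "").split()) < MIN_TITLE_WORDS:
        findings.append(("title", "Title may be too short to be clear; review manually"))
    if not record.get("keywords"):
        findings.append(("keywords", "No keywords provided"))
    if not record.get("related_publications"):
        findings.append(("related_publications", "No link to a related publication"))
    creators = record.get("creators", [])
    if any(not creator.get("orcid") for creator in creators):
        findings.append(("creators", "Not every creator has a linked ORCID"))
    if any(not creator.get("affiliation_id") for creator in creators):
        findings.append(("creators", "Not every creator has an affiliation identifier such as a ROR ID"))
    if not record.get("readme"):
        findings.append(("abstract", "No README; consider adding README-style documentation to the abstract"))
    return findings


record = {
    "title": "Data",
    "keywords": [],
    "creators": [{"orcid": "0000-0002-1825-0097", "affiliation_id": None}],
}
for field, finding in review_record(record):
    print(f"{field}: {finding}")
```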
  9. The last step is to communicate the recommendation plan to the researcher. Because implementing the plan relies on the researcher’s participation, steps should be taken to maximize the likelihood that they cooperate. For this, we envision a clearly articulated plan like the table shown here, which lists the metadata field in question, its current value, what the curators believe should be changed in that field, and their rationale. Anything that reduces the burden on the researcher and lets them clearly see the reasoning and benefit behind the recommendations will help.
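The four-column plan described here can be generated directly from the curator's findings, for example as a fixed-width plain-text table that pastes cleanly into an email. This is a minimal sketch; the `Recommendation` class and `render_plan` function are hypothetical, and the example row is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    field: str
    current_value: str
    recommended_change: str
    rationale: str


HEADER = ("Metadata Field", "Current Value", "Recommended Changes", "Rationale")


def render_plan(recommendations):
    """Render recommendations as a fixed-width plain-text table (header, rule, rows)."""
    rows = [HEADER] + [
        (r.field, r.current_value or "(empty)", r.recommended_change, r.rationale)
        for r in recommendations
    ]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(HEADER))]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip() for row in rows]
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)


plan = [
    Recommendation("Keywords", "", "Add keywords from the related article and dataset-specific terms",
                   "Makes the dataset easier to find"),
]
print(render_plan(plan))
```

Plain text is used rather than HTML so the table survives any mail client; an HTML or spreadsheet version would follow the same four columns.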
  10. This is all very interesting, of course, but as the saying goes, ‘the proof of the pudding is in the eating’, so here are some of our results. By running the code we gathered over 2000 dataset records from five major repositories and DataCite. We analysed a subset of these records to get an idea of the issues we will encounter in the future. A first issue is redundant abstracts: most datasets in that subset have the same abstract as their corresponding publication. This isn’t necessarily a big problem when a datasheet or README is provided with the dataset, but when the abstract is the only information on the content of the dataset, there may not be enough information for researchers to reuse the data. A possible solution was mentioned a few slides ago: README-style documentation could be provided in the abstract metadata field. A second issue is the absence of contact information in the dataset metadata. In most cases contact information can be found via the linked publication, but this is not good practice; we want to encourage researchers to provide contact information in their dataset metadata as well. A third issue concerns keywords: frequently, no keywords are provided at all. A dataset should at least carry some of the related publication’s keywords, and ideally its own specific keywords, to improve its FAIRness.
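The three issues reported here lend themselves to simple automated flags. The sketch below is illustrative only: the record keys (`abstract`, `contact`, `keywords`) are hypothetical, the publication abstract is assumed to have been retrieved separately, and the word-level similarity test with Python's `difflib.SequenceMatcher` and its 0.9 threshold are choices made for this example, not the method used in the study.

```python
import difflib
import re


def words(text):
    """Lowercase and split into words so punctuation and spacing differences are ignored."""
    return re.findall(r"\w+", text.lower())


def abstract_is_redundant(dataset_abstract, publication_abstract, threshold=0.9):
    """Flag a dataset abstract that is (nearly) identical to the publication's abstract."""
    a, b = words(dataset_abstract), words(publication_abstract)
    if not a or not b:
        return False
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio() >= threshold


def metadata_issues(dataset, publication_abstract=""):
    """Check a dataset record for the three issues seen most often in the results."""
    issues = []
    if abstract_is_redundant(dataset.get("abstract", ""), publication_abstract):
        issues.append("abstract duplicates the publication abstract")
    if not dataset.get("contact"):
        issues.append("no direct contact information")
    if not dataset.get("keywords"):
        issues.append("no keywords")
    return issues


paper = "We measured soil moisture across 40 sites in Flanders during 2020."
dataset = {"abstract": "We measured soil moisture across 40 sites in Flanders during 2020!"}
print(metadata_issues(dataset, paper))
# ['abstract duplicates the publication abstract', 'no direct contact information', 'no keywords']
```

Comparing word sequences rather than raw characters keeps the check insensitive to punctuation and line breaks, which often differ between a journal abstract and a copy pasted into a repository form.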
  11. This is the basis of our proposal for providing curatorial benefits after a dataset has already been uploaded to a repository. Of course, we don’t want to suggest that this solves all problems, and it will not find ALL datasets, but it still has several benefits. Even if the author declines to make the recommended edits, the integration with repository APIs helps institutions find their research outputs. And the emails to the researchers, which include the detailed plan for improving the FAIRness of their datasets, give the researchers knowledge they can carry with them into future projects, and let them know about the curatorial services your institution might offer and how those services can help improve their online data.
  12. If you would like more information, please contact us. Thank you for your attention!