Setting up a data repository, what does it entail?

International Food Policy Research Institute (IFPRI)
International Food Policy Research Institute (IFPRI)International Food Policy Research Institute (IFPRI)
Setting up a data repository,
what does it entail?
Experience sharing with the
Ethiopian Nutrition Information Platform for nutrition
Indira Yerramareddy and Nilam Prasai
International Food Policy Research Institute (IFPRI)
Webinar, April 8, 2019
Data Repository: The What
 IT infrastructure (cloud based/online) set up to manage, share,
access, maintain, and archive datasets
 An application database specialized in storing metadata of data
files/datasets/databases
 Differs from publication repository mainly in its ability
 to store metadata at different level/hierarchy
 to store and ingest data files in various formats for long-term
preservation
Data Repository: The Why
 Easy information discovery
 Easy and efficient access
 More exposure and amplify impact
 Persistent access (through persistent URL)
 Long-term storage and preservation
Additionally,
 Enables unprecedented use, analysis, and discovery through
interoperability and interlinking with other repositories
 Enhances Open science scholarship
Data Repository: The Software
 Data specific – Dataverse, Comprehensive Knowledge Archive
Network (CKAN), HUBzero
 Expanded from text-based institutional repository – DSpace,
Fedora, Hydra
 Open source – Dataverse, HUBzero, Dspace, Fedora, CKAN
 Proprietary/commercial – Figshare, Interuniversity Consortium for
Political and Social Resource [ICSPR], Digital Commons
Data Repository Software: Limitations
Source: Amorim R.C., Castro J.A., da Silva J.R., Ribeiro C. (2015) A Comparative Study of Platforms for Research Data
Management: Interoperability, Metadata Capabilities and Integration Potential. https://doi.org/10.1007/978-3-319-16486-1_10
Data Repository: Hosting??
Data Repository: Hosting or Partnering
 Cost
oFree ??
oDirect and ongoing cost of purchasing and keeping sever or
cloud space
oOngoing cost for maintenance
 Inhouse technology infrastructure and expertise
 Number of datasets
 Size/volume of data
 Sustainability
Data Repository: Selecting partners
 Reputation
o Is the repository service provider reputable
o Does the repository demonstrate acceptance and usage in the field of
its scholarship
 Visibility
o How will other discover datasets
o Is the repository indexed by google and other databases
 Sustainability
o Does the repository have long-term data management plan
o Does the repository have contingency plans
 Cost
o Free
o Direct cost for depositing datasets
o Indirect cost for maintaining and preserving datasets
 Persistence identifier
o Does the repository assigns persistence identifier (PID) for
datasets/datafiles
Data Repository: Selecting partners cont…
 Open Archives Initiatives Protocol for Metadata Harvesting (OAI-PMH)
o Is the repository OAI-PMH compliant
 Policies and licenses
o Does the repository allow data use agreements and licensing to state
explicitly what uses the data owners will allow
o Does the repository allow to set open, closed or restricted access
o Does the repository allow creative commons licensing
 Scholarly impact
o Does the repository track data citation and impact
o How does the repository track downloads and usage
o Does the repository allow integration with third party impact tracking
software (Altmetrics, Plum etc.)
 Support services
o Is the support service of the repository well-established and quick
Preferred,
 Recognized by the FAIRsharing.org
Data Curation: A Crucial Function for
Successful Repository
 What is data curation
o active and ongoing management of data through lifecycle of interest
and usefulness
o processes and activities related to organizing, describing, cleaning,
annotating, enhancing and preserving data for public use.
o centered around maintaining and managing the metadata rather than
the data itself [documenting]
o directly linked with preservation and maintenance, interoperability, and
reuse
 Begins with the research lifecycle
 Requires change in culture/practice
Data Curation Mountain: Understanding the
concept
 What is data curation
o active and ongoing management of data through lifecycle of interest
and usefulness
o processes and activities related to organizing, describing, cleaning,
annotating, enhancing and preserving data for public use.
o centered around maintaining and managing the metadata rather than
the data itself [documenting]
o directly linked with preservation and maintenance, interoperability, and
reuse
 Begins with the research lifecycle
 Requires change in culture/practice
Dataverse: The What
 Harvard’s open source research data repository software
Dataverse: The What
 40 installations around the
world
 Harvard Dataverse, Odum
Dataverse, Abacus, INRA,
Bostwana Harvard Data
 CGIAR Centers with local
installation: CIFOR, CIMMYT,
ICRISAT
Dataverse: Pros and Cons
Pros Cons
Open source software Metadata uniformity across
domains (e.g. Social sciences,
geology, computer sciences,
etc.)
Evolving and addresses users
concern
Not effective in handling
sensitive data
DOI minting; Handles Can’t delete datasets; only
deaccessioned them
OAI-PMH compliant Custom version control
API calls
Harvard Dataverse
One installation of Dataverse software hosted and
managed by Harvard University
Harvard
Dataverse
3166
Dataverses
IFPRI
Dataverse
10 collections
(sub-Dataverse)
Study
Metadata Data Files
Study
Metadata Data Files
Study
Metadata Data Files
Setting up a data repository, what does it entail?
Data citation and Metadata Example from IFPRI Dataverse
Data Sharing: IFPRI’s History
CD-ROMs/IFPRI.Org
o Began in 1999
o Shared through IFPRI.org as a zipped file
o Data files in proprietary format
o No guarantee for long term preservation
o No unique identifier for dataset/datafile
o Cart system check out imposed barrier to access
o No licensing mechanism for dataset/datafiles
Harvard Dataverse
o Started migrating datasets to Dataverse in 2008
o All datasets are in Dataverse since 2013
Harvard Dataverse Benefits: IFPRI’s Perspectives
 Apparently free of cost than staff time for curation
 No stress of maintaining server and backups
 Always evolving and open to customer’s feedback
 Quick customer support service
 More sustainable for IFPRI in resource constrained environment
 Complies with the standards of data sharing and FAIR principles
 Easy access and data discovery
 Recognized by FAIRsharing.org, open data registry, and many journal
publishers
 Indexed by google datasets
Harvard Dataverse Hiccups: IFPRI’s Perspectives
 Difficult to fit-in IFPRI specific feature requests unless it serves the wider
community
 Slow rendering because of growing data volume
 Not designed for handling sensitive data
 Batch modification a challenge if you don’t have programming/API
expertise in-house
 Limited options for customizing interface
 Free service – can’t enforce to execute the new features quickly
Data Services at IFPRI: What and How
 Data policy awareness
o IFPRI Data Policy
o CGIAR Open Access and Open Data Policy
o Donor Policies (USA, BMGF, EU)
 Data management plan
o Assist in developing data management plan
o Review data management plan
 Data publishing
o Review datasets (direct/indirect personally identifiable information)
o Create metadata records (using controlled vocabulary and data
publishing standards)
o Create codebooks
o Develop guidelines for researchers (preparing data for publishing, data
anonymization etc)
o Registering datasets in donor specific library (USAID data
development library
Questions and Comments ??
1 of 22

Recommended

Data-Ed Online: Data Operations Management: Turning your Challenges into Success by
Data-Ed Online: Data Operations Management: Turning your Challenges into SuccessData-Ed Online: Data Operations Management: Turning your Challenges into Success
Data-Ed Online: Data Operations Management: Turning your Challenges into SuccessDATAVERSITY
6.9K views53 slides
You Need a Data Catalog. Do You Know Why? by
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?Precisely
440 views38 slides
Cloud Deployments Models by
Cloud Deployments ModelsCloud Deployments Models
Cloud Deployments ModelsMohamed Sami El-Tahawy
16.8K views28 slides
Big Data Analytics by
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsGhulam Imaduddin
35.2K views52 slides
What is Big Data? by
What is Big Data?What is Big Data?
What is Big Data?Bernard Marr
585.3K views25 slides
Introduction to Metadata by
Introduction to MetadataIntroduction to Metadata
Introduction to MetadataEUDAT
9.9K views27 slides

More Related Content

What's hot

Data Catalog as a Business Enabler by
Data Catalog as a Business EnablerData Catalog as a Business Enabler
Data Catalog as a Business EnablerSrinivasan Sankar
352 views16 slides
Presentation Introduction to Alteryx by
Presentation Introduction to AlteryxPresentation Introduction to Alteryx
Presentation Introduction to Alteryxravnorge
6.8K views14 slides
Metadata ppt by
Metadata pptMetadata ppt
Metadata pptShashikant Kumar
10.4K views12 slides
Evolution of Data Spaces by
Evolution of Data SpacesEvolution of Data Spaces
Evolution of Data SpacesBoris Otto
1.1K views8 slides
Cloud Computing & Big Data by
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big DataMrinal Kumar
7K views30 slides
The future of FAIR by
The future of FAIRThe future of FAIR
The future of FAIRSarah Jones
299 views32 slides

What's hot(20)

Presentation Introduction to Alteryx by ravnorge
Presentation Introduction to AlteryxPresentation Introduction to Alteryx
Presentation Introduction to Alteryx
ravnorge6.8K views
Evolution of Data Spaces by Boris Otto
Evolution of Data SpacesEvolution of Data Spaces
Evolution of Data Spaces
Boris Otto1.1K views
Cloud Computing & Big Data by Mrinal Kumar
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big Data
Mrinal Kumar7K views
The future of FAIR by Sarah Jones
The future of FAIRThe future of FAIR
The future of FAIR
Sarah Jones299 views
Cloud Computing Business Models by Mourad ZEROUKHI
Cloud Computing Business ModelsCloud Computing Business Models
Cloud Computing Business Models
Mourad ZEROUKHI15.6K views
Data warehouse and data mining by Pradnya Saval
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
Pradnya Saval191 views
Driving Data Intelligence in the Supply Chain Through the Data Catalog at TJX by DATAVERSITY
Driving Data Intelligence in the Supply Chain Through the Data Catalog at TJXDriving Data Intelligence in the Supply Chain Through the Data Catalog at TJX
Driving Data Intelligence in the Supply Chain Through the Data Catalog at TJX
DATAVERSITY1.1K views
Workshop - Build a Graph Solution by Neo4j
Workshop - Build a Graph SolutionWorkshop - Build a Graph Solution
Workshop - Build a Graph Solution
Neo4j94 views
DATA WAREHOUSING by King Julian
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
King Julian232.7K views
Data Catalog as the Platform for Data Intelligence by Alation
Data Catalog as the Platform for Data IntelligenceData Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data Intelligence
Alation2K views
The rise of “Big Data” on cloud computing by Minhazul Arefin
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computing
Minhazul Arefin1.9K views
Design Principles for a Modern Data Warehouse by Rob Winters
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data Warehouse
Rob Winters12.6K views
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A... by Cathrine Wilhelmsen
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
DataMinds 2022 Azure Purview Erwin de Kreuk by Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukDataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de Kreuk
Erwin de Kreuk416 views
FIBO in Neo4j: Applying Knowledge Graphs in the Financial Industry by Neo4j
FIBO in Neo4j: Applying Knowledge Graphs in the Financial IndustryFIBO in Neo4j: Applying Knowledge Graphs in the Financial Industry
FIBO in Neo4j: Applying Knowledge Graphs in the Financial Industry
Neo4j372 views
Multidimensional Database Design & Architecture by hasanshan
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecture
hasanshan25.4K views
Dimensional Modeling by Sunita Sahu
Dimensional ModelingDimensional Modeling
Dimensional Modeling
Sunita Sahu19.4K views

Similar to Setting up a data repository, what does it entail?

Guiding users through data deposit by
Guiding users through data depositGuiding users through data deposit
Guiding users through data depositRobin Rice
1.1K views9 slides
University of Minho Data Repository - features to publish & share data and w... by
University of Minho Data Repository - features to publish & share data and  w...University of Minho Data Repository - features to publish & share data and  w...
University of Minho Data Repository - features to publish & share data and w...Pedro Príncipe
101 views18 slides
Digital Curation 101 - Taster by
Digital Curation 101 - TasterDigital Curation 101 - Taster
Digital Curation 101 - TasterDigital Curation Centre (DCC)
1K views35 slides
CARARE: Can I use this data? FAIR into practice by
CARARE: Can I use this data? FAIR into practiceCARARE: Can I use this data? FAIR into practice
CARARE: Can I use this data? FAIR into practiceCARARE
320 views40 slides
Open Data and Institutional Repositories by
Open Data and Institutional RepositoriesOpen Data and Institutional Repositories
Open Data and Institutional RepositoriesRobin Rice
1.1K views23 slides
Providing support and services for researchers in good data governance by
Providing support and services for researchers in good data governanceProviding support and services for researchers in good data governance
Providing support and services for researchers in good data governanceRobin Rice
83 views29 slides

Similar to Setting up a data repository, what does it entail?(20)

Guiding users through data deposit by Robin Rice
Guiding users through data depositGuiding users through data deposit
Guiding users through data deposit
Robin Rice1.1K views
University of Minho Data Repository - features to publish & share data and w... by Pedro Príncipe
University of Minho Data Repository - features to publish & share data and  w...University of Minho Data Repository - features to publish & share data and  w...
University of Minho Data Repository - features to publish & share data and w...
Pedro Príncipe101 views
CARARE: Can I use this data? FAIR into practice by CARARE
CARARE: Can I use this data? FAIR into practiceCARARE: Can I use this data? FAIR into practice
CARARE: Can I use this data? FAIR into practice
CARARE320 views
Open Data and Institutional Repositories by Robin Rice
Open Data and Institutional RepositoriesOpen Data and Institutional Repositories
Open Data and Institutional Repositories
Robin Rice1.1K views
Providing support and services for researchers in good data governance by Robin Rice
Providing support and services for researchers in good data governanceProviding support and services for researchers in good data governance
Providing support and services for researchers in good data governance
Robin Rice83 views
Practicing Open Science by dri_ireland
Practicing Open SciencePracticing Open Science
Practicing Open Science
dri_ireland346 views
eROSA Stakeholder WS1: Thoughts on FAIRifying data by e-ROSA
eROSA Stakeholder WS1: Thoughts on FAIRifying dataeROSA Stakeholder WS1: Thoughts on FAIRifying data
eROSA Stakeholder WS1: Thoughts on FAIRifying data
e-ROSA198 views
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu | by EUDAT
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu | Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
EUDAT6.8K views
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT by OpenAIRE
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
OpenAIRE1.6K views
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT by Tony Ross-Hellauer
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
FAIR Ddata in trustworthy repositories: the basics by OpenAIRE
FAIR Ddata in trustworthy repositories: the basicsFAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basics
OpenAIRE769 views
Meeting the NSF DMP Requirement June 13, 2012 by IUPUI
Meeting the NSF DMP Requirement June 13, 2012Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012
IUPUI692 views
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|... by EUDAT
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...
EUDAT1.3K views
Making Data FAIR (Findable, Accessible, Interoperable, Reusable) by Tom Plasterer
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Tom Plasterer768 views
Introduction to research data management by dri_ireland
Introduction to research data managementIntroduction to research data management
Introduction to research data management
dri_ireland147 views

More from International Food Policy Research Institute (IFPRI)

Targeting in Development Projects: Approaches, challenges, and lessons learned by
Targeting in Development Projects: Approaches, challenges, and lessons learnedTargeting in Development Projects: Approaches, challenges, and lessons learned
Targeting in Development Projects: Approaches, challenges, and lessons learnedInternational Food Policy Research Institute (IFPRI)
15 views48 slides
Prevalence and Impact of Landmines on Ukrainian Agricultural Production by
Prevalence and Impact of Landmines on Ukrainian Agricultural ProductionPrevalence and Impact of Landmines on Ukrainian Agricultural Production
Prevalence and Impact of Landmines on Ukrainian Agricultural ProductionInternational Food Policy Research Institute (IFPRI)
17 views11 slides
Impact of the Russian Military Invasion on Ukraine’s Agriculture and Trade by
Impact of the Russian Military Invasion on Ukraine’s Agriculture and Trade Impact of the Russian Military Invasion on Ukraine’s Agriculture and Trade
Impact of the Russian Military Invasion on Ukraine’s Agriculture and Trade International Food Policy Research Institute (IFPRI)
49 views14 slides
Mapping cropland extent over a complex landscape: An assessment of the best a... by
Mapping cropland extent over a complex landscape: An assessment of the best a...Mapping cropland extent over a complex landscape: An assessment of the best a...
Mapping cropland extent over a complex landscape: An assessment of the best a...International Food Policy Research Institute (IFPRI)
8 views13 slides
Examples of remote sensing application in agriculture monitoring by
Examples of remote sensing application in agriculture monitoringExamples of remote sensing application in agriculture monitoring
Examples of remote sensing application in agriculture monitoringInternational Food Policy Research Institute (IFPRI)
6 views18 slides

More from International Food Policy Research Institute (IFPRI)(20)

Recently uploaded

A Guide to Applying for the Wells Mountain Initiative Scholarship 2023 by
A Guide to Applying for the Wells Mountain Initiative Scholarship 2023A Guide to Applying for the Wells Mountain Initiative Scholarship 2023
A Guide to Applying for the Wells Mountain Initiative Scholarship 2023Excellence Foundation for South Sudan
69 views26 slides
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively by
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks EffectivelyISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks EffectivelyPECB
651 views18 slides
GCSE Spanish by
GCSE SpanishGCSE Spanish
GCSE SpanishWestHatch
53 views166 slides
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx by
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptxPharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptxMs. Pooja Bhandare
120 views51 slides
Gross Anatomy of the Liver by
Gross Anatomy of the LiverGross Anatomy of the Liver
Gross Anatomy of the Liverobaje godwin sunday
69 views12 slides
GCSE Media by
GCSE MediaGCSE Media
GCSE MediaWestHatch
48 views46 slides

Recently uploaded(20)

ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively by PECB
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks EffectivelyISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
PECB 651 views
GCSE Spanish by WestHatch
GCSE SpanishGCSE Spanish
GCSE Spanish
WestHatch53 views
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx by Ms. Pooja Bhandare
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptxPharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx
Pharmaceutical Inorganic chemistry UNIT-V Radiopharmaceutical.pptx
Ms. Pooja Bhandare120 views
11.28.23 Social Capital and Social Exclusion.pptx by mary850239
11.28.23 Social Capital and Social Exclusion.pptx11.28.23 Social Capital and Social Exclusion.pptx
11.28.23 Social Capital and Social Exclusion.pptx
mary850239383 views
A-Level Art by WestHatch
A-Level ArtA-Level Art
A-Level Art
WestHatch48 views
Relationship of psychology with other subjects. by palswagata2003
Relationship of psychology with other subjects.Relationship of psychology with other subjects.
Relationship of psychology with other subjects.
palswagata200377 views
GCSE Geography by WestHatch
GCSE GeographyGCSE Geography
GCSE Geography
WestHatch47 views
The Accursed House by Émile Gaboriau by DivyaSheta
The Accursed House  by Émile GaboriauThe Accursed House  by Émile Gaboriau
The Accursed House by Émile Gaboriau
DivyaSheta234 views
Pharmaceutical Inorganic Chemistry Unit IVMiscellaneous compounds Expectorant... by Ms. Pooja Bhandare
Pharmaceutical Inorganic Chemistry Unit IVMiscellaneous compounds Expectorant...Pharmaceutical Inorganic Chemistry Unit IVMiscellaneous compounds Expectorant...
Pharmaceutical Inorganic Chemistry Unit IVMiscellaneous compounds Expectorant...
Ms. Pooja Bhandare166 views
CUNY IT Picciano.pptx by apicciano
CUNY IT Picciano.pptxCUNY IT Picciano.pptx
CUNY IT Picciano.pptx
apicciano56 views
BÀI TẬP BỔ TRỢ TIẾNG ANH FAMILY AND FRIENDS NATIONAL EDITION - LỚP 4 (CÓ FIL... by Nguyen Thanh Tu Collection
BÀI TẬP BỔ TRỢ TIẾNG ANH FAMILY AND FRIENDS NATIONAL EDITION - LỚP 4 (CÓ FIL...BÀI TẬP BỔ TRỢ TIẾNG ANH FAMILY AND FRIENDS NATIONAL EDITION - LỚP 4 (CÓ FIL...
BÀI TẬP BỔ TRỢ TIẾNG ANH FAMILY AND FRIENDS NATIONAL EDITION - LỚP 4 (CÓ FIL...

Setting up a data repository, what does it entail?

  • 1. Setting up a data repository, what does it entail? Experience sharing with the Ethiopian Nutrition Information Platform for nutrition Indira Yerramareddy and Nilam Prasai International Food Policy Research Institute (IFPRI) Webinar, April 8, 2019
  • 2. Data Repository: The What  IT infrastructure (cloud based/online) set up to manage, share, access, maintain, and archive datasets  An application database specialized in storing metadata of data files/datasets/databases  Differs from publication repository mainly in its ability  to store metadata at different level/hierarchy  to store and ingest data files in various formats for long-term preservation
  • 3. Data Repository: The Why  Easy information discovery  Easy and efficient access  More exposure and amplify impact  Persistent access (through persistent URL)  Long-term storage and preservation Additionally,  Enables unprecedented use, analysis, and discovery through interoperability and interlinking with other repositories  Enhances Open science scholarship
  • 4. Data Repository: The Software  Data specific – Dataverse, Comprehensive Knowledge Archive Network (CKAN), HUBzero  Expanded from text-based institutional repository – DSpace, Fedora, Hydra  Open source – Dataverse, HUBzero, Dspace, Fedora, CKAN  Proprietary/commercial – Figshare, Interuniversity Consortium for Political and Social Resource [ICSPR], Digital Commons
  • 5. Data Repository Software: Limitations Source: Amorim R.C., Castro J.A., da Silva J.R., Ribeiro C. (2015) A Comparative Study of Platforms for Research Data Management: Interoperability, Metadata Capabilities and Integration Potential. https://doi.org/10.1007/978-3-319-16486-1_10
  • 7. Data Repository: Hosting or Partnering  Cost oFree ?? oDirect and ongoing cost of purchasing and keeping sever or cloud space oOngoing cost for maintenance  Inhouse technology infrastructure and expertise  Number of datasets  Size/volume of data  Sustainability
  • 8. Data Repository: Selecting partners  Reputation o Is the repository service provider reputable o Does the repository demonstrate acceptance and usage in the field of its scholarship  Visibility o How will other discover datasets o Is the repository indexed by google and other databases  Sustainability o Does the repository have long-term data management plan o Does the repository have contingency plans  Cost o Free o Direct cost for depositing datasets o Indirect cost for maintaining and preserving datasets  Persistence identifier o Does the repository assigns persistence identifier (PID) for datasets/datafiles
  • 9. Data Repository: Selecting partners cont…  Open Archives Initiatives Protocol for Metadata Harvesting (OAI-PMH) o Is the repository OAI-PMH compliant  Policies and licenses o Does the repository allow data use agreements and licensing to state explicitly what uses the data owners will allow o Does the repository allow to set open, closed or restricted access o Does the repository allow creative commons licensing  Scholarly impact o Does the repository track data citation and impact o How does the repository track downloads and usage o Does the repository allow integration with third party impact tracking software (Altmetrics, Plum etc.)  Support services o Is the support service of the repository well-established and quick Preferred,  Recognized by the FAIRsharing.org
  • 10. Data Curation: A Crucial Function for Successful Repository  What is data curation o active and ongoing management of data through lifecycle of interest and usefulness o processes and activities related to organizing, describing, cleaning, annotating, enhancing and preserving data for public use. o centered around maintaining and managing the metadata rather than the data itself [documenting] o directly linked with preservation and maintenance, interoperability, and reuse  Begins with the research lifecycle  Requires change in culture/practice
  • 11. Data Curation Mountain: Understanding the concept  What is data curation o active and ongoing management of data through lifecycle of interest and usefulness o processes and activities related to organizing, describing, cleaning, annotating, enhancing and preserving data for public use. o centered around maintaining and managing the metadata rather than the data itself [documenting] o directly linked with preservation and maintenance, interoperability, and reuse  Begins with the research lifecycle  Requires change in culture/practice
  • 12. Dataverse: The What  Harvard’s open source research data repository software
  • 13. Dataverse: The What  40 installations around the world  Harvard Dataverse, Odum Dataverse, Abacus, INRA, Bostwana Harvard Data  CGIAR Centers with local installation: CIFOR, CIMMYT, ICRISAT
  • 14. Dataverse: Pros and Cons Pros Cons Open source software Metadata uniformity across domains (e.g. Social sciences, geology, computer sciences, etc.) Evolving and addresses users concern Not effective in handling sensitive data DOI minting; Handles Can’t delete datasets; only deaccessioned them OAI-PMH compliant Custom version control API calls
  • 15. Harvard Dataverse One installation of Dataverse software hosted and managed by Harvard University Harvard Dataverse 3166 Dataverses IFPRI Dataverse 10 collections (sub-Dataverse) Study Metadata Data Files Study Metadata Data Files Study Metadata Data Files
  • 17. Data citation and Metadata Example from IFPRI Dataverse
  • 18. Data Sharing: IFPRI’s History CD-ROMs/IFPRI.Org o Began in 1999 o Shared through IFPRI.org as a zipped file o Data files in proprietary format o No guarantee for long term preservation o No unique identifier for dataset/datafile o Cart system check out imposed barrier to access o No licensing mechanism for dataset/datafiles Harvard Dataverse o Started migrating datasets to Dataverse in 2008 o All datasets are in Dataverse since 2013
  • 19. Harvard Dataverse Benefits: IFPRI’s Perspectives  Apparently free of cost than staff time for curation  No stress of maintaining server and backups  Always evolving and open to customer’s feedback  Quick customer support service  More sustainable for IFPRI in resource constrained environment  Complies with the standards of data sharing and FAIR principles  Easy access and data discovery  Recognized by FAIRsharing.org, open data registry, and many journal publishers  Indexed by google datasets
  • 20. Harvard Dataverse Hiccups: IFPRI’s Perspectives  Difficult to fit-in IFPRI specific feature requests unless it serves the wider community  Slow rendering because of growing data volume  Not designed for handling sensitive data  Batch modification a challenge if you don’t have programming/API expertise in-house  Limited options for customizing interface  Free service – can’t enforce to execute the new features quickly
  • 21. Data Services at IFPRI: What and How  Data policy awareness o IFPRI Data Policy o CGIAR Open Access and Open Data Policy o Donor Policies (USA, BMGF, EU)  Data management plan o Assist in developing data management plan o Review data management plan  Data publishing o Review datasets (direct/indirect personally identifiable information) o Create metadata records (using controlled vocabulary and data publishing standards) o Create codebooks o Develop guidelines for researchers (preparing data for publishing, data anonymization etc) o Registering datasets in donor specific library (USAID data development library

Editor's Notes

  1. Each software contains different ranges of management and collaborative options; ingestion of various data types
  2. Software framework that enables institutions to host research data repositories Allows data sharing, control, persistent data citation, data publishing and versioning management Rich data support: Tabular Data: STATA, SPSS, CSV, Tab delimited Social network data: graph ML Other data or complimentary files : all formats are accepted; only tabular files have full data support
  3. Harvard Dataverse- A centralized software installation and data repository Dataverse- Individual distributed data archive with its own branding