| 1
Anita de Waard 0000-0002-9034-4119
VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
NSF Workshop
February 28, March 1, 2017
Data Repositories: Recommendation, Certification and Models for Cost Recovery
| 2
Four Types of Repositories:
Diagram labels: Research Question; Object of Study; Raw Data; Processed Data; Data With Paper; Curated Record; Publication; Method; Analysis; Tables/Figures; Curate; Methods; Software
•  Local Storage/Instrument Repositories (Size: PB; Nr of files: Trillions)
-  NOAA: 20 TB/
-  NASA streaming > 24 PB/day
-  NASA Reverb: 12 PB Data
-  NSSD: > 230 TB of digital data
-  NSIDC: 1 PB data, : 1 PB total
-  ALMA Telescope: 40 TB/day
•  Institutional/Local Repositories (Size: GB; Nr of files: Billions)
-  Deep Blue (Umich): 80 k
-  MIT Dspace: 75 k
-  HAL (France): 60 k
-  D-Space Cambr: 1.5 k
-  Of which data: hundreds
•  Non-Domain Repositories (Size: MB; Nr of files: Millions)
-  Figshare: 1.2 M
-  DataDryad: 3 k
-  Dataverse: 58 k
•  Domain Repositories (Size: kB; Nr of files: 100ks)
-  PetDB: 6 k
-  PDB: 100 k
-  NIST ASD: 170 k
| 3
Recommended vs Certified Data Repositories [1]
•  Studied repositories recommended by 17 organisations:
-  Compiled list of 242 recommended repositories
-  Identified criteria for recommendation
-  Identified overlap between recommendations (Fig 1)
•  Identified 5 certification schemes:
-  Compiled list of 129 certified repositories
-  Identified criteria for certification
-  Identified overlap between recommended & certified repositories (Fig 2)
Figure 1: Most repositories are recommended by < 3 parties
Figure 2: Most recommended repositories are not certified
[1] All data is openly available at doi:10.17632/zx2kcyvvwm.1
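The two overlap analyses above (Fig 1 and Fig 2) can be illustrated with a minimal Python sketch. This is not the study's actual method or data: the organisation and repository names below are hypothetical placeholders, and the real lists are in the dataset cited in [1].

```python
from collections import Counter

# Hypothetical toy data: organisation -> repositories it recommends.
recommendations = {
    "org_a": {"repo_1", "repo_2", "repo_3"},
    "org_b": {"repo_2", "repo_4"},
    "org_c": {"repo_2", "repo_3", "repo_5"},
}
# Hypothetical toy data: repositories holding at least one certification.
certified = {"repo_2", "repo_5", "repo_6"}


def recommendation_counts(recs):
    """Count how many organisations recommend each repository (cf. Fig 1)."""
    return Counter(repo for repos in recs.values() for repo in repos)


def overlap(recs, certified_repos):
    """Split recommended repositories into certified / not certified (cf. Fig 2)."""
    recommended = set().union(*recs.values())
    return recommended & certified_repos, recommended - certified_repos


counts = recommendation_counts(recommendations)
# Repositories recommended by fewer than 3 parties.
few = sorted(repo for repo, n in counts.items() if n < 3)
both, not_certified = overlap(recommendations, certified)

print("Recommended by < 3 parties:", few)
print("Recommended and certified:", sorted(both))
print("Recommended but not certified:", sorted(not_certified))
```

With the real lists substituted for the toy data, the same two functions would give the counts behind each figure.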
| 4
Set Of Shared Criteria Between Recommendation and Certification of Repositories
Table columns: Umbrella Categories; Shared Meaning; Recommended Repository Criteria; Repository Certification Scheme Criteria. Cells are listed per category in reading order:
•  Mission
-  Explicit mission statement in providing long-term responsibility, persistence, and management of data(sets)
•  Community/Recognition
-  Evidence of use by downloads or citations from an identifiable and active user community
-  Understand and meet the needs of the designated and defined target community
•  Legal and Contractual Compliance
-  Repository operates within a legal framework/Ensures compliance with legal regulations
-  When applicable, have contractual regulations governing the protection of human subjects
-  Contracts and agreements maintained with relevant parties on relevant subjects
•  Access/Accessibility
-  Public access to the scientific/repository designated community
-  Anonymous referees (including peer-reviewers) have access to the data before public release as indicated by policies
•  Technical Structure/Interface
-  The software system supports data organisation and searchability by both humans and computers. The interface is intuitive and mobile user-friendly
-  The technical (infra)structure is appropriate, protective, and secure
•  Retrievability
-  Data need to have enough metadata. All data receive a persistent identifier
•  Preservation
-  Long-term and formal preservation/succession plan for the data, even if the repository ceases to exist
-  If the data are retracted, the persistent identifier needs to be maintained
-  Preservation of data information properties and metadata
Final report: Husen, Sean Edward; de Wilde, Zoë G.; de Waard, Anita; Cousijn, Helena (2017), “Recommended versus Certified Repositories: Mind the Gap”, Submitted for Revision Codata Data Science Journal, Feb 20, 2017
| 5
Two Economies of Science [3]:
Debit Economy (like a pie)
•  Single pile of ‘stuff’ gets divided:
-  Thing can only be for one person at one time
-  “If you get more, I get less”
•  Examples:
-  Money
-  Jobs
-  Samples, equipment, space, etc.
•  Behaviors:
-  Hoarding, secrecy
-  (Cut-throat) competition
-  Winning by owning (and not sharing)
Credit Economy (like a song)
•  Credit comes from visibility:
-  The more you give away, the more you benefit
-  “Only if I share do I really own” (“You need me to do you!” JW)
•  Examples:
-  Papers, citations
-  Good ideas (if credited)
-  Skills
•  Behaviors:
-  Open access, citation game
-  Collaboration with top-X
-  Winning by sharing (to enable priority & visibility)
[3] Paula Stephan: “How Economics Shapes Science”, Harvard University Press, 2012: http://www.jstor.org/stable/j.ctt2jbqd1
<<<DATA???
| 6
RDA Repository Cost Recovery IG
•  Interviewed 22 repositories & reported [2]
•  Different income streams:
1.  Structurally funded
2.  Mostly data access charges
3.  Mostly data deposit fees
4.  Membership fees (for deposits and/or access)
5.  Serial project funding
6.  Supported by host institution
•  Different new models under consideration:
-  Sponsorships/services for the commercial sector
-  Contracts for specific services offered (hosting, archiving, curation)
-  Expanding the number of affiliated institutions
-  Deposit fees
-  More services for “national memory institutes”
•  Some comments:
-  Some countries structurally fund repositories (not US!)
-  Some repositories embedded in scholarly practice
-  Hard to come up with new models: no time, no skill sets!
•  Next step: OECD/GSF WG studies more in-depth, more countries:
http://www.codata.org/working-groups/oecd-gsf-sustainable-business-models
[2] Available at https://www.rd-alliance.org/final-report-income-streams-data-repositories.html
| 7
Thank you!
More on Elsevier’s RDM program and other interesting efforts:
•  https://www.hivebench.com
•  https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
•  http://www.journals.elsevier.com/softwarex/
•  https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
•  https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
•  https://rd-alliance.org/bof-data-search.html
•  https://datasearch.elsevier.com/
•  https://data.mendeley.com/
•  https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
•  https://www.force11.org/
•  http://www.nationaldataservice.org/
•  https://rd-alliance.org/
•  https://www.elsevier.com/about/open-science/research-data
Anita de Waard, a.dewaard@elsevier.com

