Ten Habits of Highly Effective Data 
Anita de Waard 
VP Research Data Collaborations 
a.dewaard@elsevier.com 
http://researchdata.elsevier.com/
The Maslow Hierarchy for humans: 
A Maslow Hierarchy for Data: 
9. Usable (allow tools to run on it) 
8. Citable (able to point & track citations) 
7. Trusted (validated/checked by reviewers) 
6. Reproducible (others can redo 
experiments) 
5. Discoverable (can be indexed by a system) 
4. Comprehensible (others can understand 
data & processes) 
3. Accessible (can be accessed by others) 
2. Archived (long-term & format-independent) 
1. Preserved (existing in some form)
1. Preserve: Data Rescue Challenge 
• With IEDA/Lamont: award successful data 
rescue attempts 
• Awarded at AGU 2013 
• 23 submissions of data that was digitized, 
preserved, made available 
• Winner: NIMBUS Data Rescue: 
– Recovery, reprocessing and digitization of the 
infrared and visible observations along with their 
navigation and formatting. 
– Over 4000 7-track tapes of global infrared 
satellite data were read and reprocessed. 
– Nearly 200,000 visible light images were 
scanned, rectified and navigated. 
– All the resultant data was converted to HDF-5 
(NetCDF) format and freely distributed to users 
from NASA and NSIDC servers. 
– This data was then used to calculate monthly sea 
ice extents for both the Arctic and the Antarctic. 
• Conclusion: we (collectively) need to do more 
of this! How can we fund it? 
2. Archive: Olive Project 
• CMU CS & Library: funded by a grant 
from the IMLS, Elsevier is partner 
• Goal: Preservation of executable content 
- nowadays a large part of intellectual 
output, and very fragile 
• Identified a series of software packages 
and prepared VM to preserve 
• Does it work? Yes – see video (1:24)
3. Access: Urban Legend 
• Part 1: Metadata acquisition 
• Step through experimental process in series of dropdown 
menus in simple web UI 
• Can be tailored to workflow of individual researcher 
• Connected to shared ontologies through lookup table, 
managed centrally in lab 
• Connect to data input console (Igor Pro)
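The lookup-table idea above can be sketched as follows. This is a minimal illustration only, not the Urban Legend implementation: the dropdown labels, the `LOOKUP` table and the `EXAMPLE:` term IDs are all hypothetical placeholders.

```python
# Hypothetical lab-managed lookup table: local dropdown label -> ontology term ID.
# The IDs below are placeholders, not real ontology identifiers.
LOOKUP = {
    "whole-cell patch clamp": "EXAMPLE:0001",
    "mouse": "EXAMPLE:0002",
}

def annotate(selections):
    """Attach a term ID to each (step, label) dropdown selection; None if unmapped."""
    return [
        {"step": step, "label": label, "term_id": LOOKUP.get(label.strip().lower())}
        for step, label in selections
    ]

records = annotate([("method", "Whole-cell patch clamp"), ("species", "Rat")])
for record in records:
    print(record)
```

Keeping the table in one central place means a researcher can tailor the dropdown labels without touching the shared term IDs; unmapped labels surface as `None` so they can be reviewed.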
4. Comprehend: Urban Legend 
• Part 2: Data Dashboard 
• Access, select and manipulate data (calculate 
properties, sort and plot) 
• Final goal: interactive figures linked to data 
• Plan to expand to more labs, other data
5. Discover: Data Discovery Index 
• NIH interested in creating DDI consortium 
• Three places where data is deposited: 
1. Curated sources for a single data type (e.g. Protein 
Data Bank, VentDB, Hubble Space Data) 
2. Non- or semicurated sources for different data types 
(e.g. DataDryad, Dataverse, Figshare) 
3. Tables in papers: 
• Ways to find this: 
– Cross-domain query tools, e.g. NIF, DataONE, etc. 
– Search for papers -> link to data 
– How to find data in papers?? 
• Propose to build prototypes across all of these 
data sources: 
– Needs NLP, models of data patterns? What else? 
[Figure labels: Papers; Non-curated DBs; Curated DBs]
6. Reproduce: Resource Identifier Initiative 
Force11 Working Group to add data identifiers 
to articles that are: 
– 1) Machine readable; 
– 2) Free to generate and access; 
– 3) Consistent across publishers and journals. 
• Authors publishing in participating journals 
will be asked to provide RRIDs for their 
resources; these are added to the keyword 
field 
• RRIDs will be drawn from: 
– The Antibody Registry 
– Model Organism Databases 
– NIF Resource Registry 
• So far, Springer, Wiley, Biomednet and Elsevier 
have signed up, with 11 journals participating 
and more to come 
• Wide community adoption! 
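As a minimal sketch of what "machine readable" could mean in practice, the snippet below pulls identifiers out of a keyword field. The regular expression is an assumed shape ("RRID:" plus a registry prefix and an accession), and the sample IDs are made-up placeholders, not real registry entries.

```python
import re

# Assumed pattern: "RRID:" followed by a registry prefix and an accession.
# Real RRIDs may take other shapes; adjust before relying on this.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+[_:][A-Za-z0-9_:-]+)")

def extract_rrids(keyword_field: str) -> list[str]:
    """Return the RRIDs found in a keyword field, in order, without duplicates."""
    seen = []
    for match in RRID_PATTERN.finditer(keyword_field):
        rrid = "RRID:" + match.group(1)
        if rrid not in seen:
            seen.append(rrid)
    return seen

# Placeholder identifiers for illustration only.
keywords = "synaptic plasticity; RRID:AB_0000001; RRID:SCR_0000002; RRID:AB_0000001"
print(extract_rrids(keywords))
```

Because the identifier has one consistent prefix across publishers and journals, a single pattern like this is enough to index resources across articles.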
7. Trust: Moonrocks 
How can we scale up data curation? 
Pilot project with IEDA: 
• Lunar geochemistry database: 
leapfrog & improve curation time 
• 1-year pilot, funded by Elsevier 
• If spreadsheet columns/headers 
map to RDB schema, we can scale up 
curation process and move from 
tables > curated databases!
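The header-to-schema mapping could look roughly like the sketch below. The schema column names and header synonyms are hypothetical examples, not the pilot's actual schema; the point is only that headers which map cleanly can be loaded automatically, while the rest are routed to a curator.

```python
# Hypothetical schema columns and header synonyms; a real project would
# maintain these alongside the database.
SCHEMA_SYNONYMS = {
    "sample_id": {"sample", "sample id", "sample no."},
    "sio2_wt_pct": {"sio2", "sio2 (wt%)"},
    "mgo_wt_pct": {"mgo", "mgo (wt%)"},
}

def map_headers(headers):
    """Map spreadsheet headers to schema columns; collect the ones that fail."""
    mapped, unmapped = {}, []
    for header in headers:
        key = header.strip().lower()
        for column, synonyms in SCHEMA_SYNONYMS.items():
            if key in synonyms:
                mapped[header] = column
                break
        else:
            # No schema column claimed this header: leave it for manual curation.
            unmapped.append(header)
    return mapped, unmapped

mapped, unmapped = map_headers(["Sample", "SiO2 (wt%)", "MgO", "Notes"])
print(mapped)
print(unmapped)
```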
8. Cite: Force11 Data Citation Principles 
• Another Force11 Working Group 
• Defined 8 principles: 
• Now seeking endorsement/working on 
implementation 
1. Importance: Data should be considered legitimate, citable products of 
research. Data citations should be accorded the same importance in 
the scholarly record as citations of other research objects, such as 
publications. 
2. Credit and attribution: Data citations should facilitate giving scholarly 
credit and normative and legal attribution to all contributors to the 
data, recognizing that a single style or mechanism of attribution may 
not be applicable to all data. 
3. Evidence: Where a specific claim rests upon data, the corresponding 
data citation should be provided. 
4. Unique Identification: A data citation should include a persistent 
method for identification that is machine actionable, globally unique, 
and widely used by a community. 
5. Access: Data citations should facilitate access to the data themselves 
and to such associated metadata, documentation, and other materials, 
as are necessary for both humans and machines to make informed use 
of the referenced data. 
6. Persistence: Metadata describing the data, and unique identifiers 
should persist, even beyond the lifespan of the data they describe. 
7. Versioning and granularity: Data citations should facilitate 
identification and access to different versions and/or subsets of data. 
Citations should include sufficient detail to verifiably link the citing 
work to the portion and version of data cited. 
8. Interoperability and flexibility: Data citation methods should be 
sufficiently flexible to accommodate the variant practices among 
communities but should not differ so much that they compromise 
interoperability of data citation practices across communities.
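To make principles 4 and 7 concrete, here is a minimal sketch of a citation record that carries a persistent identifier plus optional version and subset details. The field names, the output style and the example DOI are assumptions for illustration; the principles themselves do not prescribe a format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataCitation:
    creators: tuple[str, ...]
    year: int
    title: str
    repository: str
    identifier: str    # persistent, globally unique ID (principle 4)
    version: str = ""  # which version was used (principle 7)
    subset: str = ""   # which portion was used (principle 7)

    def render(self) -> str:
        """Render a human-readable citation string in one assumed style."""
        names = "; ".join(self.creators)
        parts = [f"{names} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        if self.subset:
            parts.append(f"Subset: {self.subset}.")
        parts.append(f"{self.repository}. {self.identifier}")
        return " ".join(parts)

# Placeholder values; the DOI below is not a real identifier.
citation = DataCitation(
    creators=("Doe, J.", "Roe, R."),
    year=2014,
    title="Example lunar sample measurements",
    repository="Example Repository",
    identifier="https://doi.org/10.0000/example",
    version="2",
)
print(citation.render())
```

Keeping version and subset as separate fields, rather than folding them into the title, lets a machine link the citing work to the exact portion of data cited.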
9. Use: Executable Papers 
• Result of a challenge to come up with 
cyberinfrastructure components to 
enable executable papers 
• Pilot in Computer Science journals 
– See all code in the paper 
– Save it, export it 
– Change it and rerun on data set: 
10: Let’s allow our data to be happy! 
Experimental Metadata: 
Objects, Procedures, Properties 
Prepare: Reagents, species/specimen/cell type, preparation details 
Entity IDs 
Execute: Direct settings on equipment, circumstances of measurement 
Raw Data 
Analyze: Mathematical/computational processes and analytics 
Processed Data 
Record Metadata: 
DOI, Date, Author, Institute, etc. 
Validation Metadata: 
Reproduction, Curation; Selection, Citation, Usage, Metrics
Minimize your metadata footprint! 
Reuse: 
• ‘The good thing about standards is that there are 
so many to choose from’ 
• Haendel et al. looked at 54 (!!) data standards: 
many have only been used once/for one group 
• Employ a common element set + modular 
additions over whole new schema 
Recycle: 
• Make sure you design upstream metadata 
with downstream processes in mind 
• Useful exercise: ‘buy a tag’ where 
users/systems that will store/query/cite data 
say what they need to do their job 
• Learn from genetics: one datum can play 
several different roles! 
Reduce: 
• Every tag needs to be added and read by 
someone/thing: this adds cost and waste 
• Consider ‘return on investment’ per metadata item 
• TBL: what if “http://” was “h/”?

More Related Content

Viewers also liked

Is Assessment Really So Horrible?
Is Assessment Really So Horrible?Is Assessment Really So Horrible?
Is Assessment Really So Horrible?
OPUS Management
 
Knowledge Media Panel U Toronto, Sept 30 2010
Knowledge Media Panel U Toronto, Sept 30 2010Knowledge Media Panel U Toronto, Sept 30 2010
Knowledge Media Panel U Toronto, Sept 30 2010Anita de Waard
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data Services
Anita de Waard
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research Paper
Anita de Waard
 
Argumentation in biology papers
Argumentation in biology papersArgumentation in biology papers
Argumentation in biology papers
Anita de Waard
 
Enabling your Human Resource Information System to support HR Strategic Roles
Enabling your Human Resource Information System to support HR Strategic RolesEnabling your Human Resource Information System to support HR Strategic Roles
Enabling your Human Resource Information System to support HR Strategic Roles
OPUS Management
 
Keep the fires burning
Keep the fires burningKeep the fires burning
Keep the fires burning
OPUS Management
 

Viewers also liked (10)

Is Assessment Really So Horrible?
Is Assessment Really So Horrible?Is Assessment Really So Horrible?
Is Assessment Really So Horrible?
 
Knowledge Media Panel U Toronto, Sept 30 2010
Knowledge Media Panel U Toronto, Sept 30 2010Knowledge Media Panel U Toronto, Sept 30 2010
Knowledge Media Panel U Toronto, Sept 30 2010
 
Assessment
AssessmentAssessment
Assessment
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data Services
 
Epistemics
EpistemicsEpistemics
Epistemics
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research Paper
 
Argumentation in biology papers
Argumentation in biology papersArgumentation in biology papers
Argumentation in biology papers
 
Enabling your Human Resource Information System to support HR Strategic Roles
Enabling your Human Resource Information System to support HR Strategic RolesEnabling your Human Resource Information System to support HR Strategic Roles
Enabling your Human Resource Information System to support HR Strategic Roles
 
Keep the fires burning
Keep the fires burningKeep the fires burning
Keep the fires burning
 
Vu210610futurejournal
Vu210610futurejournalVu210610futurejournal
Vu210610futurejournal
 

Similar to Ten Habits of Highly Effective Data

Ten habits of highly effective data
Ten habits of highly effective dataTen habits of highly effective data
Ten habits of highly effective data
Anita de Waard
 
Ten Habits of Highly Successful Data
Ten Habits of Highly Successful DataTen Habits of Highly Successful Data
Ten Habits of Highly Successful Data
Anita de Waard
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data Management
Anita de Waard
 
Effective research data management
Effective research data managementEffective research data management
Effective research data management
Catherine Gold
 
Metadata for Research Objects
Metadata for Research ObjectsMetadata for Research Objects
Metadata for Research Objects
seanb
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Anita de Waard
 
Enhance your rese​arch impact through open science
Enhance your rese​arch impact through open scienceEnhance your rese​arch impact through open science
Enhance your rese​arch impact through open science
London School of Hygiene and Tropical Medicine
 
FAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basicsFAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basics
OpenAIRE
 
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataElsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
Anita de Waard
 
Managing data throughout the research lifecycle
Managing data throughout the research lifecycleManaging data throughout the research lifecycle
Managing data throughout the research lifecycle
Marieke Guy
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing dataWorld Agroforestry (ICRAF)
 
Research data management workshop april12 2016
Research data management workshop april12 2016 Research data management workshop april12 2016
Research data management workshop april12 2016
Rebecca Raworth, MLIS
 
Research data management workshop April 2016
Research data management workshop April 2016Research data management workshop April 2016
Research data management workshop April 2016
Rebecca Raworth, MLIS
 
CARARE: Can I use this data? FAIR into practice
CARARE: Can I use this data? FAIR into practiceCARARE: Can I use this data? FAIR into practice
CARARE: Can I use this data? FAIR into practice
CARARE
 
FAIR BioData Management
FAIR BioData ManagementFAIR BioData Management
FAIR BioData Management
Ulrike Wittig
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial
Josh Young
 
Scientific Data and peer review session at Dryad event, May 2015
Scientific Data and peer review session at Dryad event, May 2015 Scientific Data and peer review session at Dryad event, May 2015
Scientific Data and peer review session at Dryad event, May 2015
Susanna-Assunta Sansone
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
University of Arizona
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data Sharing
Merce Crosas
 
Support Your Data, Kyoto University
Support Your Data, Kyoto UniversitySupport Your Data, Kyoto University
Support Your Data, Kyoto University
Stephanie Simms
 

Similar to Ten Habits of Highly Effective Data (20)

Ten habits of highly effective data
Ten habits of highly effective dataTen habits of highly effective data
Ten habits of highly effective data
 
Ten Habits of Highly Successful Data
Ten Habits of Highly Successful DataTen Habits of Highly Successful Data
Ten Habits of Highly Successful Data
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data Management
 
Effective research data management
Effective research data managementEffective research data management
Effective research data management
 
Metadata for Research Objects
Metadata for Research ObjectsMetadata for Research Objects
Metadata for Research Objects
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
 
Enhance your rese​arch impact through open science
Enhance your rese​arch impact through open scienceEnhance your rese​arch impact through open science
Enhance your rese​arch impact through open science
 
FAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basicsFAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basics
 
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataElsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
 
Managing data throughout the research lifecycle
Managing data throughout the research lifecycleManaging data throughout the research lifecycle
Managing data throughout the research lifecycle
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing data
 
Research data management workshop april12 2016
Research data management workshop april12 2016 Research data management workshop april12 2016
Research data management workshop april12 2016
 
Research data management workshop April 2016
Research data management workshop April 2016Research data management workshop April 2016
Research data management workshop April 2016
 
CARARE: Can I use this data? FAIR into practice
CARARE: Can I use this data? FAIR into practiceCARARE: Can I use this data? FAIR into practice
CARARE: Can I use this data? FAIR into practice
 
FAIR BioData Management
FAIR BioData ManagementFAIR BioData Management
FAIR BioData Management
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial
 
Scientific Data and peer review session at Dryad event, May 2015
Scientific Data and peer review session at Dryad event, May 2015 Scientific Data and peer review session at Dryad event, May 2015
Scientific Data and peer review session at Dryad event, May 2015
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data Sharing
 
Support Your Data, Kyoto University
Support Your Data, Kyoto UniversitySupport Your Data, Kyoto University
Support Your Data, Kyoto University
 

More from Anita de Waard

Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Anita de Waard
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
Anita de Waard
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Anita de Waard
 
NFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR Data
Anita de Waard
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data Commons
Anita de Waard
 
Enabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring Guidelines
Anita de Waard
 
Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.
Anita de Waard
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?
Anita de Waard
 
History of the future
History of the futureHistory of the future
History of the future
Anita de Waard
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
Anita de Waard
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of Publishing
Anita de Waard
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Anita de Waard
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Anita de Waard
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
Anita de Waard
 
Public Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingPublic Identifiers in Scholarly Publishing
Public Identifiers in Scholarly Publishing
Anita de Waard
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
Anita de Waard
 
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
Anita de Waard
 
RDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupRDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest Group
Anita de Waard
 
Publishing the Full Research Data Lifecycle
Publishing the Full Research Data LifecyclePublishing the Full Research Data Lifecycle
Publishing the Full Research Data Lifecycle
Anita de Waard
 
The Rocky Road to Reuse
The Rocky Road to ReuseThe Rocky Road to Reuse
The Rocky Road to Reuse
Anita de Waard
 

More from Anita de Waard (20)

Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
NFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR Data
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data Commons
 
Enabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring Guidelines
 
Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?
 
History of the future
History of the futureHistory of the future
History of the future
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of Publishing
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
 
Public Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingPublic Identifiers in Scholarly Publishing
Public Identifiers in Scholarly Publishing
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
 
RDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupRDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest Group
 
Publishing the Full Research Data Lifecycle
Publishing the Full Research Data LifecyclePublishing the Full Research Data Lifecycle
Publishing the Full Research Data Lifecycle
 
The Rocky Road to Reuse
The Rocky Road to ReuseThe Rocky Road to Reuse
The Rocky Road to Reuse
 

Ten Habits of Highly Effective Data

  • 1. Ten Habits of Highly Effective Data. Anita de Waard, VP Research Data Collaborations, a.dewaard@elsevier.com, http://researchdata.elsevier.com/
  • 2. The Maslow Hierarchy for humans: 9. Usable (allow tools to run on it) 8. Citable (able to point & track citations) 7. Trusted (validated/checked by reviewers) 6. Reproducible (others can redo experiments) 5. Discoverable (can be indexed by a system) 4. Comprehensible (others can understand data & processes) 3. Accessible (can be accessed by others) 2. Archived (long-term & format-independent) 1. Preserved (existing in some form)
  • 3. A Maslow Hierarchy for Data:
    9. Usable (allow tools to run on it)
    8. Citable (able to point & track citations)
    7. Trusted (validated/checked by reviewers)
    6. Reproducible (others can redo experiments)
    5. Discoverable (can be indexed by a system)
    4. Comprehensible (others can understand data & processes)
    3. Accessible (can be accessed by others)
    2. Archived (long-term & format-independent)
    1. Preserved (existing in some form)
  • 4. 1. Preserve: Data Rescue Challenge
    • With IEDA/Lamont: award successful data rescue attempts
    • Awarded at AGU 2013
    • 23 submissions of data that was digitized, preserved, made available
    • Winner: NIMBUS Data Rescue:
      – Recovery, reprocessing and digitization of the infrared and visible observations along with their navigation and formatting.
      – Over 4000 7-track tapes of global infrared satellite data were read and reprocessed.
      – Nearly 200,000 visible light images were scanned, rectified and navigated.
      – All the resultant data was converted to HDF-5 (NetCDF) format and freely distributed to users from NASA and NSIDC servers.
      – This data was then used to calculate monthly sea ice extents for both the Arctic and the Antarctic.
    • Conclusion: we (collectively) need to do more of this! How can we fund it?
  • 5. 2. Archive: Olive Project
    • CMU CS & Library: funded by a grant from the IMLS; Elsevier is a partner
    • Goal: preservation of executable content, which is nowadays a large part of intellectual output and very fragile
    • Identified a series of software packages and prepared VMs to preserve them
    • Does it work? Yes – see video (1:24)
  • 6. 3. Access: Urban Legend
    • Part 1: Metadata acquisition
    • Step through the experimental process in a series of dropdown menus in a simple web UI
    • Can be tailored to the workflow of an individual researcher
    • Connected to shared ontologies through a lookup table, managed centrally in the lab
    • Connects to the data input console (Igor Pro)
  • 7. 4. Comprehend: Urban Legend
    • Part 2: Data Dashboard
    • Access, select and manipulate data (calculate properties, sort and plot)
    • Final goal: interactive figures linked to data
    • Plan to expand to more labs, other data
  • 8. 5. Discover: Data Discovery Index
    • NIH interested in creating DDI consortium
    • Three places where data is deposited:
      1. Curated sources for a single data type (e.g. Protein Data Bank, VentDB, Hubble Space Data)
      2. Non- or semi-curated sources for different data types (e.g. DataDryad, Dataverse, Figshare)
      3. Tables in papers
    • Ways to find this:
      – Cross-domain query tools, e.g. NIF, DataOne, etc.
      – Search for papers -> link to data
      – How to find data in papers??
    • Propose to build prototypes across all of these data sources:
      – Needs NLP, models of data patterns? What else?
    [Figure labels: Papers, Non-curated DBs, Curated DBs]
  • 9. 6. Reproduce: Resource Identifier Initiative
    Force11 Working Group to add data identifiers to articles that are:
      – 1) Machine readable;
      – 2) Free to generate and access;
      – 3) Consistent across publishers and journals.
    • Authors publishing in participating journals will be asked to provide RRIDs for their resources; these are added to the keyword field
    • RRIDs will be drawn from:
      – The Antibody Registry
      – Model Organism Databases
      – NIF Resource Registry
    • So far, Springer, Wiley, Biomednet and Elsevier have signed up, with 11 journals; more to come
    • Wide community adoption!
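The "machine readable" requirement can be illustrated with a minimal Python sketch that pulls RRID-style identifiers out of a methods sentence. The regular expression, the function name `extract_rrids`, and the sample identifiers are assumptions made for illustration; they are not the initiative's official syntax, and the accessions are placeholders rather than real registry entries.

```python
import re

# Assumed pattern: the literal prefix "RRID:" followed by a registry prefix
# (letters), an underscore, and an accession. Not an official grammar.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9_:\-]+)")


def extract_rrids(text):
    """Return RRID accessions found in text, in order, without duplicates."""
    found = []
    for match in RRID_PATTERN.finditer(text):
        # Drop punctuation that may trail the identifier in running prose.
        accession = match.group(1).rstrip(".,;:)")
        if accession not in found:
            found.append(accession)
    return found


# Hypothetical methods sentence with placeholder identifiers.
sample = (
    "Sections were stained with anti-GFAP (RRID:AB_0000001) and "
    "analyzed in ExampleTool (RRID:SCR_0000002)."
)

print(extract_rrids(sample))
```

Because the identifier carries a fixed prefix, a publisher or indexer could run a pass like this over the keyword field without any knowledge of the journal's layout.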
  • 10. 7. Trust: Moonrocks
    How can we scale up data curation? Pilot project with IEDA:
    • Lunar geochemistry database: leapfrog & improve curation time
    • 1-year pilot, funded by Elsevier
    • If spreadsheet columns/headers map to RDB schema, we can scale up the curation process and move from tables to curated databases!
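The idea of mapping spreadsheet headers to a database schema can be sketched as follows. This is only an illustration under stated assumptions: the column names, the synonym table `SCHEMA_SYNONYMS`, and the helper `map_headers` are hypothetical and do not describe the IEDA pilot's actual schema or tooling.

```python
# Hypothetical target schema columns and the header spellings accepted for
# each; a real curation team would maintain its own lookup table.
SCHEMA_SYNONYMS = {
    "sample_id": {"sample", "sample id", "sample no"},
    "sio2_wt_pct": {"sio2", "sio2 (wt%)", "sio2 wt%"},
    "mgo_wt_pct": {"mgo", "mgo (wt%)", "mgo wt%"},
}


def normalize(header):
    """Lower-case a header and collapse surrounding/repeated whitespace."""
    return " ".join(header.strip().lower().split())


def map_headers(headers):
    """Split spreadsheet headers into (mapped, unmapped).

    mapped is a dict from the original header to a schema column;
    unmapped lists headers that still need a human curator.
    """
    lookup = {
        synonym: column
        for column, synonyms in SCHEMA_SYNONYMS.items()
        for synonym in synonyms
    }
    mapped, unmapped = {}, []
    for header in headers:
        column = lookup.get(normalize(header))
        if column is None:
            unmapped.append(header)
        else:
            mapped[header] = column
    return mapped, unmapped


mapped, unmapped = map_headers(["Sample ID", "SiO2 (wt%)", " MgO ", "Notes"])
print(mapped)
print(unmapped)
```

Headers that resolve automatically can be loaded without manual work, so curator time is spent only on the unmapped remainder.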
  • 11. 8. Cite: Force11 Data Citation Principles
    • Another Force11 Working Group
    • Defined 8 principles:
      1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
      2. Credit and attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
      3. Evidence: Where a specific claim rests upon data, the corresponding data citation should be provided.
      4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
      5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
      6. Persistence: Metadata describing the data, and unique identifiers should persist, even beyond the lifespan of the data they describe.
      7. Versioning and granularity: Data citations should facilitate identification and access to different versions and/or subsets of data. Citations should include sufficient detail to verifiably link the citing work to the portion and version of data cited.
      8. Interoperability and flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities.
    • Now seeking endorsement/working on implementation
  • 12. 9. Use: Executable Papers
    • Result of a challenge to come up with cyberinfrastructure components to enable executable papers
    • Pilot in Computer Science journals:
      – See all code in the paper
      – Save it, export it
      – Change it and rerun on data set
  • 13. 10: Let’s allow our data to be happy!
    Experimental Metadata: Objects, Procedures, Properties
    • Prepare: Reagents, species/specimen/cell type, preparation details (Entity IDs)
    • Execute: Direct settings on equipment, circumstances of measurement (Raw Data)
    • Analyze: Mathematical/computational processes and analytics (Processed Data)
    Record Metadata: DOI, Date, Author, Institute, etc.
    Validation Metadata: Reproduction, Curation; Selection, Citation, Usage, Metrics
  • 14. Minimize your metadata footprint!
    Reuse:
    • ‘The good thing about standards is that there are so many to choose from’
    • Haendel et al. looked at 54 (!!) data standards: many have only been used once or by one group
    • Employ a common element set + modular additions rather than a whole new schema
    Recycle:
    • Make sure you design upstream metadata with downstream processes in mind
    • Useful exercise: ‘buy a tag’, where users/systems that will store/query/cite data say what they need to do their job
    • Learn from genetics: one datum can play several different roles!
    Reduce:
    • Every tag needs to be added and read by someone/thing: this adds cost and waste
    • Consider ‘return on investment’ per metadata item
    • TBL: what if “http://” was “h/”?
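A "common element set + modular additions" design can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the core elements, the two example modules, and the `validate` helper are hypothetical and not taken from any specific standard.

```python
# Assumed common element set shared by every dataset record.
CORE_ELEMENTS = {"identifier", "title", "creator", "date"}

# Hypothetical domain modules that add a few elements on top of the core,
# instead of each community defining a whole new schema.
MODULES = {
    "geochemistry": {"sample_id", "analytical_method"},
    "electrophysiology": {"cell_type", "recording_device"},
}


def allowed_elements(*modules):
    """Return the core elements plus those of the selected modules."""
    allowed = set(CORE_ELEMENTS)
    for name in modules:
        allowed |= MODULES[name]
    return allowed


def validate(record, *modules):
    """Return (missing core elements, elements outside core + modules)."""
    keys = set(record)
    missing = sorted(CORE_ELEMENTS - keys)
    unknown = sorted(keys - allowed_elements(*modules))
    return missing, unknown


record = {
    "identifier": "example-001",  # placeholder values
    "title": "Example dataset",
    "creator": "A. Researcher",
    "sample_id": "S-17",
    "colour": "red",
}

print(validate(record, "geochemistry"))
```

The `unknown` list is where the "return on investment" question applies: each tag reported there is a candidate either for a shared module or for removal.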