Ten Habits of Highly Effective Data 
Anita de Waard 
VP Research Data Collaborations 
a.dewaard@elsevier.com 
http://researchdata.elsevier.com/
The Maslow Hierarchy for humans: 
A Maslow Hierarchy for Data: 
9. Usable (allow tools to run on it) 
8. Citable (able to point & track citations) 
7. Trusted (validated/checked by reviewers) 
6. Reproducible (others can redo 
experiments) 
5. Discoverable (can be indexed by a system) 
4. Comprehensible (others can understand 
data & processes) 
3. Accessible (can be accessed by others) 
2. Archived (long-term & format-independent) 
1. Preserved (existing in some form)
1. Preserve: Data Rescue Challenge 
• With IEDA/Lamont: award successful data 
rescue attempts 
• Awarded at AGU 2013 
• 23 submissions of data that was digitized, 
preserved, made available 
• Winner: NIMBUS Data Rescue: 
– Recovery, reprocessing and digitization of the 
infrared and visible observations along with their 
navigation and formatting. 
– Over 4000 7-track tapes of global infrared 
satellite data were read and reprocessed. 
– Nearly 200,000 visible light images were 
scanned, rectified and navigated. 
– All the resultant data was converted to HDF-5 
(NetCDF) format and freely distributed to users 
from NASA and NSIDC servers. 
– This data was then used to calculate monthly sea 
ice extents for both the Arctic and the Antarctic. 
• Conclusion: we (collectively) need to do more 
of this! How can we fund it? 
2. Archive: Olive Project 
• CMU CS & Library: funded by a grant 
from the IMLS, with Elsevier as a partner 
• Goal: Preservation of executable content 
- nowadays a large part of intellectual 
output, and very fragile 
• Identified a series of software packages 
and prepared VMs to preserve them 
• Does it work? Yes – see video (1:24)
3. Access: Urban Legend 
• Part 1: Metadata acquisition 
• Step through experimental process in series of dropdown 
menus in simple web UI 
• Can be tailored to workflow of individual researcher 
• Connected to shared ontologies through lookup table, 
managed centrally in lab 
• Connect to data input console (Igor Pro)
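The lookup-table idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Urban Legend implementation: the dropdown labels, the placeholder ontology IDs (`EX:0001` and so on) and the function name `annotate` are all hypothetical.

```python
# Hypothetical lab-managed lookup table: local dropdown labels -> shared ontology IDs.
# The IDs below are placeholders, not real ontology terms.
LOOKUP = {
    "whole-cell patch": "EX:0001",
    "mouse": "EX:0002",
    "hippocampus": "EX:0003",
}


def annotate(selections):
    """Map each dropdown selection to an ontology ID, keeping unmapped terms visible."""
    record = {}
    for field, label in selections.items():
        record[field] = {
            "label": label,
            # None means the lab has not mapped this term yet.
            "ontology_id": LOOKUP.get(label.lower()),
        }
    return record


example = annotate({"method": "Whole-cell patch", "species": "Mouse", "region": "Cortex"})
```

Keeping the table in one central place means a researcher can keep using local terms in the UI while the stored metadata points at shared identifiers; unmapped terms surface as `None` instead of being dropped.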
4. Comprehend: Urban Legend 
• Part 2: Data Dashboard 
• Access, select and manipulate data (calculate 
properties, sort and plot) 
• Final goal: interactive figures linked to data 
• Plan to expand to more labs, other data
5. Discover: Data Discovery Index 
• NIH interested in creating DDI consortium 
• Three places where data is deposited: 
1. Curated sources for a single data type (e.g. Protein 
Data Bank, VentDB, Hubble Space Data) 
2. Non- or semicurated sources for different data types 
(e.g. DataDryad, Dataverse, Figshare) 
3. Tables in papers 
• Ways to find these data: 
– Cross-domain query tools, e.g. NIF, DataONE, etc. 
– Search for papers -> link to data 
– How to find data in papers?? 
• Propose to build prototypes across all of these 
data sources: 
– Needs NLP, models of data patterns? What else? 
6. Reproduce: Resource Identifier Initiative 
Force11 Working Group to add data identifiers 
to articles that are 
– 1) Machine readable; 
– 2) Free to generate and access; 
– 3) Consistent across publishers and journals. 
• Authors publishing in participating journals 
will be asked to provide RRIDs for their 
resources; these are added to the keyword 
field 
• RRIDs will be drawn from: 
– The Antibody Registry 
– Model Organism Databases 
– NIF Resource Registry 
• So far, Springer, Wiley, Biomednet and Elsevier 
have signed up with 11 journals in total; 
more to come 
• Wide community adoption! 
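To illustrate what "machine readable" buys, here is a minimal sketch that pulls RRIDs out of a methods paragraph. The slides do not define the identifier syntax, so the pattern below (the literal `RRID:` followed by a registry prefix, an underscore and an accession) is an assumption, and the identifiers and product names in the sample text are made up.

```python
import re

# Assumed pattern: "RRID:" + registry prefix + "_" + accession (not taken from the slides).
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9_-]+)")


def extract_rrids(text):
    """Return the unique RRIDs found in text, in order of first appearance."""
    seen = []
    for match in RRID_PATTERN.finditer(text):
        rrid = "RRID:" + match.group(1)
        if rrid not in seen:
            seen.append(rrid)
    return seen


# Made-up sample text with placeholder identifiers.
methods = (
    "Sections were stained with anti-GFAP (RRID:AB_0000001) and "
    "analyzed with ExampleSoft (RRID:SCR_0000002); see also RRID:AB_0000001."
)
rrids = extract_rrids(methods)
```

Because the identifier has one consistent shape across publishers and journals, a single expression like this is enough to index which resources a paper used.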
7. Trust: Moonrocks 
How can we scale up data curation? 
Pilot project with IEDA: 
• Lunar geochemistry database: 
leapfrog & improve curation time 
• 1-year pilot, funded by Elsevier 
• If spreadsheet columns/headers 
map to an RDB schema, we can scale up the 
curation process and move from 
tables to curated databases!
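The header-to-schema mapping can be sketched as below. This is only an illustration of the idea, assuming a CSV export of the spreadsheet; the header names, the column names in `HEADER_TO_COLUMN` and the sample values are hypothetical and do not come from the IEDA pilot.

```python
import csv
import io

# Hypothetical mapping from spreadsheet headers to database column names.
HEADER_TO_COLUMN = {
    "sample id": "sample_id",
    "sio2 (wt%)": "sio2_wt_pct",
    "mgo (wt%)": "mgo_wt_pct",
}


def map_rows(csv_text):
    """Rename known headers to schema columns; report headers that need a curator."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    unmapped = [h for h in headers if h.strip().lower() not in HEADER_TO_COLUMN]
    rows = []
    for raw in reader:
        row = {}
        for header, value in raw.items():
            column = HEADER_TO_COLUMN.get(header.strip().lower())
            if column is not None:
                row[column] = value
        rows.append(row)
    return rows, unmapped


# Made-up spreadsheet content for illustration only.
sheet = "Sample ID,SiO2 (wt%),MgO (wt%),Notes\nS-1,45.1,9.8,made-up values\n"
rows, unmapped = map_rows(sheet)
```

Columns that map cleanly load without manual work, so curator time is spent only on the headers returned in `unmapped`.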
8. Cite: Force11 Data Citation Principles 
• Another Force11 Working group 
• Defined 8 principles: 
• Now seeking endorsement/working on 
implementation 
1. Importance: Data should be considered legitimate, citable products of 
research. Data citations should be accorded the same importance in 
the scholarly record as citations of other research objects, such as 
publications. 
2. Credit and attribution: Data citations should facilitate giving scholarly 
credit and normative and legal attribution to all contributors to the 
data, recognizing that a single style or mechanism of attribution may 
not be applicable to all data. 
3. Evidence: Where a specific claim rests upon data, the corresponding 
data citation should be provided. 
4. Unique Identification: A data citation should include a persistent 
method for identification that is machine actionable, globally unique, 
and widely used by a community. 
5. Access: Data citations should facilitate access to the data themselves 
and to such associated metadata, documentation, and other materials, 
as are necessary for both humans and machines to make informed use 
of the referenced data. 
6. Persistence: Metadata describing the data, and unique identifiers 
should persist, even beyond the lifespan of the data they describe. 
7. Versioning and granularity: Data citations should facilitate 
identification and access to different versions and/or subsets of data. 
Citations should include sufficient detail to verifiably link the citing 
work to the portion and version of data cited. 
8. Interoperability and flexibility: Data citation methods should be 
sufficiently flexible to accommodate the variant practices among 
communities but should not differ so much that they compromise 
interoperability of data citation practices across communities.
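As a rough illustration of principles 2, 4 and 7, a data citation can be modelled as a small record that always carries contributors, a persistent identifier and, where relevant, a version. The principles do not prescribe a citation format, so the field names and the rendered layout below are assumptions, and every value in the example is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    authors: tuple      # credit and attribution (principle 2)
    year: int
    title: str
    repository: str
    identifier: str     # persistent, globally unique identifier (principle 4)
    version: str = ""   # versioning and granularity (principle 7)

    def render(self):
        """Render one possible human-readable form; the layout is an assumption."""
        names = "; ".join(self.authors)
        version = f" Version {self.version}." if self.version else ""
        return (
            f"{names} ({self.year}). {self.title}.{version} "
            f"{self.repository}. {self.identifier}"
        )


# Placeholder values only; the DOI below does not resolve to a real data set.
citation = DataCitation(
    authors=("Doe, J.", "Roe, R."),
    year=2014,
    title="Example lunar sample measurements",
    repository="Example Repository",
    identifier="https://doi.org/10.0000/example",
    version="2",
)
text = citation.render()
```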
9. Use: Executable Papers 
• Result of a challenge to come up with 
cyberinfrastructure components to 
enable executable papers 
• Pilot in Computer Science journals 
– See all code in the paper 
– Save it, export it 
– Change it and rerun on the data set 
10. Let’s allow our data to be happy! 
Experimental Metadata: 
Objects, Procedures, Properties 
• Prepare: Reagents, species/specimen/cell type, preparation details -> Entity IDs 
• Execute: Direct settings on equipment, circumstances of measurement -> Raw Data 
• Analyze: Mathematical/computational processes and analytics -> Processed Data 
Record Metadata: 
DOI, Date, Author, Institute, etc. 
Validation Metadata: 
Reproduction, Curation, Selection, Citation, Usage, Metrics
Minimize your metadata footprint! 
Reuse: 
• ‘The good thing about standards is that there are 
so many to choose from’ 
• Haendel et al. looked at 54 (!!) data standards: 
many have only been used once or by one group 
• Prefer a common element set plus modular 
additions over a whole new schema 
Recycle: 
• Make sure you design upstream metadata 
with downstream processes in mind 
• Useful exercise: ‘buy a tag’ where 
users/systems that will store/query/cite data 
say what they need to do their job 
• Learn from genetics: one datum can play 
several different roles! 
Reduce: 
• Every tag needs to be added and read by 
someone/thing: this adds cost and waste 
• Consider ‘return on investment’ per metadata item 
• TBL: what if “http://” was “h/”?
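The "common element set + modular additions" point under Reuse can be sketched as a small validation check. This is an illustration only: the common elements loosely follow the record metadata named earlier (identifier, date, author, institute), while the module name, its fields and the sample record are hypothetical.

```python
# Hypothetical common element set shared by every record.
COMMON_ELEMENTS = {"identifier", "date", "author", "institute"}


def validate(record, module_fields=frozenset()):
    """Return (missing common elements, fields covered by neither the common set nor the module)."""
    fields = set(record)
    missing = sorted(COMMON_ELEMENTS - fields)
    unexpected = sorted(fields - COMMON_ELEMENTS - set(module_fields))
    return missing, unexpected


# Hypothetical domain module adding two optional fields on top of the common set.
ELECTROPHYSIOLOGY_MODULE = {"cell_type", "recording_mode"}

# Made-up record for illustration.
record = {
    "identifier": "example-0001",
    "date": "2014-01-01",
    "author": "J. Doe",
    "institute": "Example Lab",
    "cell_type": "pyramidal",
    "favourite_colour": "blue",
}
missing, unexpected = validate(record, ELECTROPHYSIOLOGY_MODULE)
```

Fields that land in `unexpected` are candidates for the Reduce step: each one either earns a place in a module or is dropped.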

 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 

Ten Habits of Highly Effective Data

  • 1. Ten Habits of Highly Effective Data Anita de Waard VP Research Data Collaborations a.dewaard@elsevier.com http://researchdata.elsevier.com/
  • 2. The Maslow Hierarchy for humans: 9. Usable (allow tools to run on it) 8. Citable (able to point & track citations) 7. Trusted (validated/checked by reviewers) 6. Reproducible (others can redo experiments) 5. Discoverable (can be indexed by a system) 4. Comprehensible (others can understand data & processes) 3. Accessible (can be accessed by others) 2. Archived (long-term & format-independent) 1. Preserved (existing in some form)
  • 3. A Maslow Hierarchy for Data: 9. Usable (allow tools to run on it) 8. Citable (able to point & track citations) 7. Trusted (validated/checked by reviewers) 6. Reproducible (others can redo experiments) 5. Discoverable (can be indexed by a system) 4. Comprehensible (others can understand data & processes) 3. Accessible (can be accessed by others) 2. Archived (long-term & format-independent) 1. Preserved (existing in some form)
  • 4. 1. Preserve: Data Rescue Challenge
      • With IEDA/Lamont: award successful data rescue attempts
      • Awarded at AGU 2013
      • 23 submissions of data that was digitized, preserved, and made available
      • Winner: NIMBUS Data Rescue:
          – Recovery, reprocessing and digitization of the infrared and visible observations along with their navigation and formatting.
          – Over 4000 7-track tapes of global infrared satellite data were read and reprocessed.
          – Nearly 200,000 visible light images were scanned, rectified and navigated.
          – All the resultant data was converted to HDF-5 (NetCDF) format and freely distributed to users from NASA and NSIDC servers.
          – This data was then used to calculate monthly sea ice extents for both the Arctic and the Antarctic.
      • Conclusion: we (collectively) need to do more of this! How can we fund it?
  • 5. 2. Archive: Olive Project
      • CMU CS & Library: funded by a grant from the IMLS; Elsevier is a partner
      • Goal: preservation of executable content, which is nowadays a large part of intellectual output and very fragile
      • Identified a series of software packages and prepared VMs to preserve them
      • Does it work? Yes – see video (1:24)
  • 6. 3. Access: Urban Legend
      • Part 1: Metadata acquisition
      • Step through the experimental process in a series of dropdown menus in a simple web UI
      • Can be tailored to the workflow of the individual researcher
      • Connected to shared ontologies through a lookup table, managed centrally in the lab
      • Connects to the data input console (Igor Pro)
  • 7. 4. Comprehend: Urban Legend
      • Part 2: Data Dashboard
      • Access, select and manipulate data (calculate properties, sort and plot)
      • Final goal: interactive figures linked to data
      • Plan to expand to more labs and other data
  • 8. 5. Discover: Data Discovery Index
      • NIH interested in creating a DDI consortium
      • Three places where data is deposited:
          1. Curated sources for a single data type (e.g. Protein Data Bank, VentDB, Hubble Space Data)
          2. Non- or semi-curated sources for different data types (e.g. DataDryad, Dataverse, Figshare)
          3. Tables in papers
      • Ways to find this:
          – Cross-domain query tools, e.g. NIF, DataOne, etc.
          – Search for papers -> link to data
          – How to find data in papers?
      • Propose to build prototypes across all of these data sources:
          – Needs NLP, models of data patterns? What else?
      • (Diagram labels: Papers, Non-curated DBs, Curated DBs)
  • 9. 6. Reproduce: Resource Identifier Initiative
      • Force11 Working Group to add data identifiers to articles that are:
          – 1) Machine readable;
          – 2) Free to generate and access;
          – 3) Consistent across publishers and journals.
      • Authors publishing in participating journals will be asked to provide RRIDs for their resources; these are added to the keyword field
      • RRIDs will be drawn from:
          – The Antibody Registry
          – Model Organism Databases
          – NIF Resource Registry
      • So far, Springer, Wiley, Biomednet, Elsevier journals have signed up with 11 journals, more to come
      • Wide community adoption!
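
To illustrate what "machine readable" could mean for the RRIDs on slide 9, here is a minimal sketch that pulls identifiers out of a keyword field. The pattern (the literal `RRID:` followed by a registry prefix, an underscore and an accession) is an assumption about the identifier syntax rather than something stated on the slide, and the accessions in the sample string are invented placeholders.

```python
import re

# Assumed syntax: "RRID:" + registry prefix + "_" + accession (not taken from the slide).
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9_-]+)")


def extract_rrids(text):
    """Return the RRID accessions found in text, in first-seen order, without duplicates."""
    matches = RRID_PATTERN.findall(text)
    return list(dict.fromkeys(matches))


# Invented keyword field; the accessions are placeholders, not real registry entries.
keywords = "antibody (RRID:AB_000001); software (RRID:SCR_000002); RRID:AB_000001"
print(extract_rrids(keywords))
```

Because the identifiers follow one consistent pattern across publishers, a single expression like this is enough to collect them; that consistency is the property the working group's third requirement asks for.
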
  • 10. 7. Trust: Moonrocks
      • How can we scale up data curation? Pilot project with IEDA:
      • Lunar geochemistry database: leapfrog & improve curation time
      • 1-year pilot, funded by Elsevier
      • If spreadsheet columns/headers map to the RDB schema, we can scale up the curation process and move from tables to curated databases!
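
The header-mapping idea on slide 10 can be sketched roughly as follows. This is only an illustration under stated assumptions: the lookup table, the schema column names and the sample spreadsheet are all hypothetical and do not come from the IEDA pilot.

```python
import csv
import io

# Hypothetical lookup from spreadsheet headers (lower-cased) to database columns.
HEADER_TO_COLUMN = {
    "sample": "sample_id",
    "sample id": "sample_id",
    "sio2 (wt%)": "sio2_wt_pct",
    "mgo (wt%)": "mgo_wt_pct",
}


def map_headers(headers):
    """Split headers into those that map to the schema and those needing a curator."""
    mapped, unmapped = {}, []
    for header in headers:
        key = header.strip().lower()
        if key in HEADER_TO_COLUMN:
            mapped[header] = HEADER_TO_COLUMN[key]
        else:
            unmapped.append(header)
    return mapped, unmapped


def to_records(csv_text):
    """Convert spreadsheet rows into schema-keyed records; report unmapped headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapped, unmapped = map_headers(reader.fieldnames or [])
    records = [{mapped[h]: row[h] for h in mapped} for row in reader]
    return records, unmapped


# Invented sample data, for illustration only.
sheet = "Sample,SiO2 (wt%),Notes\nS-1,45.2,checked\n"
records, needs_review = to_records(sheet)
print(records)       # rows keyed by schema column
print(needs_review)  # headers a curator still has to look at
```

Columns that map automatically go straight into the database, so a human curator only needs to look at the leftovers. That split is where the time saving would come from.
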
  • 11. 8. Cite: Force11 Data Citation Principles
      • Another Force11 Working group
      • Defined 8 principles:
          1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
          2. Credit and attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
          3. Evidence: Where a specific claim rests upon data, the corresponding data citation should be provided.
          4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
          5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
          6. Persistence: Metadata describing the data, and unique identifiers should persist, even beyond the lifespan of the data they describe.
          7. Versioning and granularity: Data citations should facilitate identification and access to different versions and/or subsets of data. Citations should include sufficient detail to verifiably link the citing work to the portion and version of data cited.
          8. Interoperability and flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities.
      • Now seeking endorsement/working on implementation
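
As a rough illustration of principles 4 (unique identification) and 7 (versioning and granularity), a data citation can be modelled as a small record that always carries a persistent identifier and optionally a version and subset. The field names and the rendered layout below are assumptions made for this sketch; the principles do not prescribe a citation format, and the example values are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record; the field layout is an assumption of this sketch."""
    creators: List[str]
    year: int
    title: str
    repository: str
    identifier: str                  # persistent identifier, e.g. a DOI (principle 4)
    version: Optional[str] = None    # which version was cited (principle 7)
    subset: Optional[str] = None     # which portion was cited (principle 7)

    def render(self) -> str:
        """Format the citation as one human-readable line."""
        parts = [f"{'; '.join(self.creators)} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        if self.subset:
            parts.append(f"Subset: {self.subset}.")
        parts.append(f"{self.repository}. {self.identifier}")
        return " ".join(parts)


# Invented example values; the identifier is a placeholder, not a real DOI.
citation = DataCitation(
    creators=["Doe, J."],
    year=2014,
    title="Example dataset",
    repository="Example Repository",
    identifier="doi:10.0000/example",
    version="2",
)
print(citation.render())
```

Keeping the identifier mandatory and the version explicit means the same record can serve both a human reader and a machine that needs to resolve exactly what was cited.
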
  • 12. 9. Use: Executable Papers
      • Result of a challenge to come up with cyberinfrastructure components to enable executable papers
      • Pilot in Computer Science journals:
          – See all code in the paper
          – Save it, export it
          – Change it and rerun on data set
  • 13. 10: Let’s allow our data to be happy!
      • Experimental Metadata: Objects, Procedures, Properties
          – Prepare: Reagents, species/specimen/cell type, preparation details [Entity IDs]
          – Execute: Direct settings on equipment, circumstances of measurement [Raw Data]
          – Analyze: Mathematical/computational processes and analytics [Processed Data]
      • Record Metadata: DOI, Date, Author, Institute, etc.
      • Validation Metadata: Reproduction, Curation; Selection, Citation, Usage, Metrics
  • 14. Minimize your metadata footprint!
      • Reuse:
          – ‘The good thing about standards is that there are so many to choose from’
          – Haendel et al. looking at 54 (!!) data standards: many have only been used once/for one group
          – Employ a common element set + modular additions over whole new schema
      • Recycle:
          – Make sure you design upstream metadata with downstream processes in mind
          – Useful exercise: ‘buy a tag’ where users/systems that will store/query/cite data say what they need to do their job
          – Learn from genetics: one datum can play several different roles!
      • Reduce:
          – Every tag needs to be added and read by someone/thing: this adds cost and waste
          – Consider ‘return on investment’ per metadata item
          – TBL: what if “http://” was “h/”?