Leonore Reiser and Lisa Harper
UC Berkeley
February 14, 2018
Good Data Stewardship
• Publish data with the paper
• Describe data to your fullest ability
• Use the right words to identify Data
• Deposit data in the right Data Repository
• Budget time for Data Management
• Don’t think of it as YOUR data
What’s in it for YOU?
We all benefit from data sharing.
More citations of YOUR work, increasing
your visibility in the research community.
Easily comply with journal and
funding requirements
Less time spent fulfilling requests for data.
Publications are increasing exponentially
http://bar.utoronto.ca/50YearsOfArabidopsis/
Idea
Funding
Experiments
Analysis
Publication
Reuse
Data
Lifecycle
Don’t THROW it away!
Recycle!
Data re-use leads to new insights
Data Processing
Quality Control
Validation
503 datasets 314 datasets
Statistical Analysis
Additional Experiments
Yu Zhang et al. PNAS doi:10.1073/pnas.1716300115
NOVEL DISCOVERY
MET1 and CMT3 are independently required for the maintenance of
asymmetric CHH methylation at CMT2 target sites
Credit: Melissa Haendel
Wilkinson, et al., (2016) The FAIR Guiding Principles for scientific data management and stewardship
10.1038/sdata.2016.18. https://www.nature.com/articles/sdata201618
• Findable means data is human and machine readable
and attached to persistent identifiers
• Accessible means data can be found and retrieved by
humans and machines using standard formats
• Interoperable means data can be exchanged and used
between systems.
• Reusable means data can be used by others
How to Make Your Published Data FAIR
• Use standard formats
• Supply complete metadata
• Embrace Ontologies
• Use persistent and unambiguous identifiers
• Put your data in a long term stable repository
• Cite, share freely and encourage others
CHROM  POS    REF  ALT  Line 1  Line 2
1      12345  A    C    A       A
3      67891  C    T    H       C
10     23456  G    T    T       U

CHROM  POS    REF  ALT  Line 1  Line 2
Gm01   12345  A    C    0/0     0/0
Gm03   67891  C    T    0/1     0/0
Gm10   23456  G    T    1/1     ./.

CHROM  POS    REF  ALT  Line 1  Line 2
Chr01  12345  A    C    AA      AA
Chr03  67891  C    T    C/T     CC
Chr10  23456  G    T    TT      NN
ALL MEAN THE SAME!
BUT ARE NOT THE SAME
Use Standard formats: SNP example
SNP (Single Nucleotide Polymorphism): A base, a chromosome
number and genome position, and a reference to the genome
assembly used, and the genotypes of lines tested.
VCF: Variant Call Format
Is the STANDARD
Use the File format
STANDARD
for your data type
DOI: 10.3389/fpls.2017.01812
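A minimal sketch, in Python, of how the three ad hoc genotype tables on the SNP slide could be translated into VCF-style GT codes (0 = REF allele, 1 = ALT allele, . = missing). The function name to_vcf_gt is hypothetical, and reading "H" as heterozygous and "U"/"N" as unknown is an assumption about the shorthand; a real conversion should go through an established VCF toolchain.

```python
import re


def to_vcf_gt(call: str, ref: str, alt: str) -> str:
    """Translate one ad hoc genotype call into a VCF-style GT string."""
    call = call.strip().upper()
    # Already VCF-style, e.g. "0/1" or "./."
    if re.fullmatch(r"[0-9.]/[0-9.]", call):
        return call
    # Assumed shorthand: H = heterozygous, U or N = unknown
    if call == "H":
        return "0/1"
    if call in {"U", "N", "NN", "N/N"}:
        return "./."
    alleles = call.replace("/", "")
    if len(alleles) == 1:
        alleles = alleles * 2  # a single base is read as homozygous
    index = {ref.upper(): "0", alt.upper(): "1"}
    if len(alleles) != 2 or any(a not in index for a in alleles):
        raise ValueError(f"cannot interpret genotype {call!r} for {ref}>{alt}")
    return "/".join(sorted(index[a] for a in alleles))


# Second row of each table on the slide: REF=C, ALT=T, calls for Line 1 and Line 2
rows = [("H", "C"), ("0/1", "0/0"), ("C/T", "CC")]
for line1, line2 in rows:
    print(to_vcf_gt(line1, "C", "T"), to_vcf_gt(line2, "C", "T"))
```

All three spellings reduce to the same pair of GT codes, which is the point of the slide: the meaning is the same, but only one encoding is the standard.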
Use Standard formats: Data in
images is NOT accessible
Data in PDF (image) format
is not findable or
accessible.
Leave tabular data in tables
If you use EXCEL, look out for data corruption and hidden
Microsoft characters that impede parsing
Ziemann et al., 2016
10.1186/s13059-016-1044-7
Use Standard formats: Beware of Excel
Fig. 1: Prevalence of gene name errors in Supplementary Excel files
Percentage of papers with gene lists affected
Increase in supplementary files with gene name errors per year
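A minimal sketch, in Python, of a check for the kind of corruption described above: gene symbols that a spreadsheet has silently turned into dates (SEPT2 becoming 2-Sep) or into floating-point numbers (2310009E13 becoming 2.31E+13). The function name and the two patterns are illustrative assumptions, not an exhaustive detector.

```python
import re

# Day-month strings such as "2-Sep" or "1-Mar"
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$",
    re.IGNORECASE,
)
# Scientific notation such as "2.31E+13"
FLOAT_LIKE = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)


def looks_excel_mangled(value: str) -> bool:
    """Return True if a gene-name cell looks like a converted date or number."""
    value = value.strip()
    return bool(DATE_LIKE.match(value) or FLOAT_LIKE.match(value))


gene_column = ["SEPT2", "2-Sep", "1-Mar", "2.31E+13", "MARCH1"]
suspicious = [g for g in gene_column if looks_excel_mangled(g)]
print(suspicious)
```

Running a check like this on a supplementary table before submission is cheaper than discovering the damage after publication.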
How to Make Your Published Data FAIR
• Use standard formats
• Supply complete metadata
• Embrace Ontologies
• Use persistent and unambiguous identifiers
• Put your data in a long term stable repository
• Cite, share freely and encourage others
Metadata: Species = xxx
Germplasm = xxx
Field location = xxx
Environment = xxx
Measurement method = xxx
Phenotype (Data): Plant is 170cm tall
Metadata is data about the data,
and allows understanding of the data
Supply Complete Metadata
• Write your Materials and Methods as if you wanted
someone else to be able to reproduce your work.
• Be accurate and complete about your bench and field
work; include samples/stocks/lines used, accession
numbers, sources of materials, exact measuring
techniques etc.
• Be AS accurate and complete about your computational
pipelines. Include the raw data files you created and their
versions. If you use reference data (e.g., a sequence
assembly), include the version number, download dates,
and download source.
• Include names of software applications, versions,
platforms and source. If you use CyVerse, use their
metadata reporting tools.
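A minimal sketch, in Python, of recording the computational details listed above (reference data version, download date and source, software names and versions, platform) in a machine-readable file alongside the data. The function name, the field names, and every example value are hypothetical placeholders, not a community standard.

```python
import json
import platform
from datetime import date


def build_provenance(reference: dict, tools: list) -> dict:
    """Collect pipeline details into one JSON-serialisable record."""
    return {
        "recorded_on": date.today().isoformat(),
        "platform": platform.platform(),
        "python_version": platform.python_version(),
        "reference_data": reference,
        "software": tools,
    }


# Placeholder values for illustration only
record = build_provenance(
    reference={
        "name": "example_assembly",
        "version": "v1.0",
        "download_date": "2018-01-15",
        "source": "https://example.org/assemblies",
    },
    tools=[
        {"name": "example_aligner", "version": "0.0.1",
         "source": "https://example.org/tools"},
    ],
)
print(json.dumps(record, indent=2))
```

Writing this record at analysis time is far easier than reconstructing versions and download dates when the paper is being drafted.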
Supply Complete Metadata
Supply Complete Metadata
Pretty Good!
Supply Complete Metadata
Pretty Good!
Supply Complete Metadata
Not so good
Supply Complete Metadata
597 Possible Attributes
At least 50 Attributes
Genome Sequence Assembly At least 100 Attributes
Budget TIME
to provide Metadata
The metadata in public databases is often confusing; a test case
with Zea mays mRNA seq data reveals a high proportion of
missing, misleading or incomplete metadata. 2018.
https://doi.org/10.1016/j.plantsci.2017.10.014
Supply Complete Metadata
• Established: Genomic Standards Consortium
(http://gensc.org)
• Minimum Information about any (x) Sequence (MIxS)
• Emerging
• Minimum Information About a Plant Phenotyping Experiment
(MIAPPE)
Metadata Standards for Various Data Types
Supply Complete Metadata
Ask For Help from Database People
How to Make Your Published Data FAIR
• Use standard formats
• Supply complete and deep metadata
• Embrace Ontologies
• Use persistent and unambiguous identifiers
• Put your data in a long term stable repository
• Cite, share freely and encourage others
Cell
Same word,
different meanings
Different words,
same concept
Eggplant
Aubergine
Melongene
Embrace Ontologies
An Ontology is:
A set of precisely defined terms
in a logical hierarchy, where the
relationships between terms can be
understood by computers
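A minimal sketch, in Python, of the "different words, same concept" problem: each synonym is resolved to one identifier, so a computer treats Eggplant, Aubergine and Melongene as the same thing. The identifier EX:0000001 and the function names are made up for illustration; a real project would use terms from an existing ontology.

```python
# Hypothetical term table: one identifier, one preferred label, many synonyms
TERMS = {
    "EX:0000001": {
        "label": "eggplant",
        "synonyms": {"eggplant", "aubergine", "melongene"},
    },
}


def build_lookup(terms: dict) -> dict:
    """Map every lower-cased synonym to its term identifier."""
    lookup = {}
    for term_id, term in terms.items():
        for name in term["synonyms"]:
            lookup[name.lower()] = term_id
    return lookup


def resolve(name: str, lookup: dict):
    """Return the identifier for a free-text name, or None if unknown."""
    return lookup.get(name.strip().lower())


lookup = build_lookup(TERMS)
for word in ["Eggplant", "Aubergine", "Melongene", "Tomato"]:
    print(word, "->", resolve(word, lookup))
```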
PO:0020105
ligule
Ontologies: Hierarchy of terms and
explicit relationship among terms
Plant Ontology (PO)
Ligule: PO:0020105
Vascular leaf: PO:0009025
Leaf sheath: PO:0020104
Flag leaf: PO:0020103
Adult vascular leaf: PO:0020103
Leaf: PO:0025034
Data from diverse types of experiments and organisms
can be compared
Franssen, H. J., et al. (2015)
doi: 10.1242/dev.120774
(Medicago)
Li, S., et al. (2016)
doi: 10.1016/j.devcel.2016.10.012
(Arabidopsis)
Zhou, X.-F., et al. (2014) doi: 10.1104/pp.114.243808
Embracing ontologies
• Ontologies provide a POWERFUL, MACHINE READABLE utility
for data
• Find and use existing ontologies
(http://www.obofoundry.org/, Planteome)
• Gene Function = Gene Ontology (GO)
• Sequences = Sequence Ontology (SO)
• Plant Anatomy and Development = Plant Ontology (PO)
• Phenotypes = Phenotype and Trait Ontology (PATO)
• ...and many others
• Apply them consistently
• To datasets (e.g. in metadata)
• In publications (e.g. TAIR GO/PO submission)
• Ask Questions!
How to Make Your Published Data FAIR
• Use standard formats
• Supply complete and deep metadata
• Embrace Ontologies
• Use persistent and unambiguous identifiers
• Put your data in a long term stable repository
• Cite, share freely and encourage others
Use persistent, unambiguous
identifiers
Example: Gene names
GOOD!
Identifiers also resolve confusion over
species
Is this Arabidopsis? Maize? Tomato?
DOI: 10.1104/pp.17.00021
One gene- many names
GOOD
OK
(history)
One name- many genes
Solution: Community Standards and
Nomenclature Resources
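A minimal sketch, in Python, of checking a gene list for the two problems named above: one gene with many names, and one name used for many genes. The identifiers and names in the example are invented placeholders, and find_ambiguity is a hypothetical helper.

```python
from collections import defaultdict


def find_ambiguity(records):
    """Return (ids with several names, names shared by several ids)."""
    names_by_id = defaultdict(set)
    ids_by_name = defaultdict(set)
    for gene_id, name in records:
        names_by_id[gene_id].add(name)
        ids_by_name[name].add(gene_id)
    many_names = {g: sorted(n) for g, n in names_by_id.items() if len(n) > 1}
    many_genes = {n: sorted(g) for n, g in ids_by_name.items() if len(g) > 1}
    return many_names, many_genes


# Placeholder (identifier, name) pairs for illustration only
records = [
    ("GENE:0001", "ABC1"),
    ("GENE:0001", "XYZ3"),
    ("GENE:0002", "ABC1"),
    ("GENE:0003", "DEF2"),
]
many_names, many_genes = find_ambiguity(records)
print("One gene, many names:", many_names)
print("One name, many genes:", many_genes)
```

Names found in the second group are the ones that most need a stable identifier next to them in the paper.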
How to Make Your Published Data FAIR
• Use standard formats
• Supply complete and deep metadata
• Embrace Ontologies
• Use persistent and unambiguous identifiers
• Put your data in a long term stable repository
• Cite, share freely and encourage others
Problem: Data is not findable because
it is not available
Piwowar HA, Vision TJ. (2013) Data reuse and the open data
citation advantage. PeerJ 1:e175.
https://doi.org/10.7717/peerj.175
Gibney and Van Noorden
doi:10.1038/nature.2013.14416
Put your data in a stable public
repository
Large International Repositories for many data
types for all species. ALL sequence data goes here
Large but specialized databases serving many species
Soybase
Specialized databases serving specific communities
Submitting to a repository: SNP example
As of 9/2017, all non-human SNPs are
processed through EMBL-EBI’s European
Variation Archive (EVA,
https://www.ebi.ac.uk/eva/).
NCBI’s dbSNP will only process human SNPs.
EVA will require:
• Data in (standard) Variant Call Format
(VCF) including allele frequencies
• SUBMITTED Genome or Transcriptome
assembly
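A minimal sketch, in Python, of deriving an ALT allele frequency from VCF-style genotype calls, since allele frequencies are part of what a submission needs. The function name is hypothetical, the example covers only biallelic sites, and the exact fields a repository expects should be checked against its own submission guide.

```python
import re


def alt_allele_frequency(genotypes):
    """Fraction of called alleles that are ALT ("1"); None if nothing was called."""
    alleles = []
    for gt in genotypes:
        # Split on unphased "/" or phased "|" separators and skip missing calls
        alleles.extend(a for a in re.split(r"[/|]", gt) if a != ".")
    if not alleles:
        return None
    return sum(a == "1" for a in alleles) / len(alleles)


# Genotypes for Line 1 and Line 2 at three example sites
sites = {
    "Gm01:12345": ["0/0", "0/0"],
    "Gm03:67891": ["0/1", "0/0"],
    "Gm10:23456": ["1/1", "./."],
}
for site, calls in sites.items():
    print(site, alt_allele_frequency(calls))
```

Missing calls are left out of the denominator, so a site with one uncalled line is not reported as having a lower frequency than it really has.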
What if there is no specialized database?
Or no recommendations from journals?
You should get a Digital Object Identifier (DOI)
http://datadryad.org
** Curated, metadata
https://zenodo.org/
https://figshare.com/
https://datashare.ucsf.edu/stash
And just for you folks at UC……
But.. please, don’t forget to actually complete
your submission*...
*And you never have to spend time fielding requests
or transferring huge data files again
https://xkcd.com/1909/
How to Make Your Published Data FAIR
• Use standard formats
• Supply complete and deep metadata
• Embrace Ontologies
• Use persistent and unambiguous identifiers
• Put your data in a long term stable repository
• Cite, share freely and encourage others
Cite, share freely and encourage others to be FAIR
Include searchable and citable identifiers for your data in
your papers
Release your data with clearly defined terms of use
e.g. Creative Commons (CC) CC-0, CC-BY
If you do not specify terms, restrictions are implied, limiting reuse
Cite all of your data sources
Enhances reproducibility….. and also shows value to funders!
When reviewing papers check them for FAIRness
Good data practices benefit everyone
(and help you get funded)
A few simple things to remember when
preparing your paper
• Include unambiguous identifiers
• Format data according to defined standards
• Keep data in (parseable) tables or text
• Include meaningful metadata
• Deposit data in a long term stable public repository and get a
DOI
• It is never too early to think about (meta)data; the best time to
start is BEFORE you are writing
You can get help structuring,
organizing and managing your data
● Contact your Community Database
● Don’t have one? Contact a curator
(Leonore, Lisa… we live amongst you)
● UCB Research Data Management Librarians
(http://researchdata.berkeley.edu/)
Thank you!
AgBioData
What YOU can do right now to
support FAIR data
Ask your funders for increased access to FAIR data
When you review papers, look at the data and be
sure it is well described (metadata is great)
Change your attitude a little: your data will be more
cited and more important if you make it FAIR
Deposit your Data and get a DOI
Ask your institution to value good data submission,
and good data recycling

More Related Content

What's hot

Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
CEDAR: Center for Expanded Data Annotation and Retrieval
 
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific ExperimentsAn Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
CEDAR: Center for Expanded Data Annotation and Retrieval
 
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
CEDAR: Center for Expanded Data Annotation and Retrieval
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
Carole Goble
 
Connecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked DataConnecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked Data
Tomasz Adamusiak
 
Measuring electronic resource availability final version
Measuring electronic resource availability final versionMeasuring electronic resource availability final version
Measuring electronic resource availability final versionSanjeet Mann
 
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Alison Hitchens
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
Carole Goble
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-jupp
Simon Jupp
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
Prof. Wim Van Criekinge
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
Carole Goble
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Carole Goble
 
Reuse of Repository Data
Reuse of Repository DataReuse of Repository Data
Reuse of Repository Data
Valerie Enriquez
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
Prof. Wim Van Criekinge
 
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Michel Dumontier
 
schema.org and biomedical ontologies
schema.org and biomedical ontologies schema.org and biomedical ontologies
schema.org and biomedical ontologies
Simon Jupp
 
Model Organism Linked Data
Model Organism Linked DataModel Organism Linked Data
Model Organism Linked Data
Michel Dumontier
 
Martin Rasmussen: Ensuring availability and quality of research data through ...
Martin Rasmussen: Ensuring availability and quality of research data through ...Martin Rasmussen: Ensuring availability and quality of research data through ...
Martin Rasmussen: Ensuring availability and quality of research data through ...
"Open Access - Open Data" conference, 13th/14th December, 2010
 
Data retriveal ,srg and dbget
Data retriveal ,srg and dbgetData retriveal ,srg and dbget
Data retriveal ,srg and dbget
SurendraKumar338
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for Biopharma
Tom Plasterer
 

What's hot (20)

Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
 
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific ExperimentsAn Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
 
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
 
Connecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked DataConnecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked Data
 
Measuring electronic resource availability final version
Measuring electronic resource availability final versionMeasuring electronic resource availability final version
Measuring electronic resource availability final version
 
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-jupp
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
Reuse of Repository Data
Reuse of Repository DataReuse of Repository Data
Reuse of Repository Data
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
Semantic approaches for biomedical knowledge discovery - Discovery Science 20...
 
schema.org and biomedical ontologies
schema.org and biomedical ontologies schema.org and biomedical ontologies
schema.org and biomedical ontologies
 
Model Organism Linked Data
Model Organism Linked DataModel Organism Linked Data
Model Organism Linked Data
 
Martin Rasmussen: Ensuring availability and quality of research data through ...
Martin Rasmussen: Ensuring availability and quality of research data through ...Martin Rasmussen: Ensuring availability and quality of research data through ...
Martin Rasmussen: Ensuring availability and quality of research data through ...
 
Data retriveal ,srg and dbget
Data retriveal ,srg and dbgetData retriveal ,srg and dbget
Data retriveal ,srg and dbget
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for Biopharma
 

Similar to How to make your published data findable, accessible, interoperable and reusable

NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
Susanna-Assunta Sansone
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Carole Goble
 
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
Susanna-Assunta Sansone
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and Sharing
C. Tobin Magle
 
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
SC CTSI at USC and CHLA
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Susanna-Assunta Sansone
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
C. Tobin Magle
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
GigaScience, BGI Hong Kong
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014Susanna-Assunta Sansone
 
The challenge of sharing data well, how publishers can help
The challenge of sharing data well, how publishers can helpThe challenge of sharing data well, how publishers can help
The challenge of sharing data well, how publishers can help
Varsha Khodiyar
 
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP Students
Carly Strasser
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
Stuart Chalk
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Susanna-Assunta Sansone
 
A Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data RepositoriesA Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data Repositories
LIBER Europe
 
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
sesrdm
 
GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant Science
David Johnson
 
TAIR ICAR 2010 Presentation
TAIR ICAR 2010 PresentationTAIR ICAR 2010 Presentation
TAIR ICAR 2010 Presentation
Phoenix Bioinformatics
 
FAIR BioData Management
FAIR BioData ManagementFAIR BioData Management
FAIR BioData Management
Ulrike Wittig
 
Data sharing as part of the research workflow
Data sharing as part of the research workflowData sharing as part of the research workflow
Data sharing as part of the research workflow
Varsha Khodiyar
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
FAIRDOM
 

Similar to How to make your published data findable, accessible, interoperable and reusable (20)

NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
High quality data publications: drives and needs - Sansone, BDebate, 12 Nov 2014
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and Sharing
 
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr...
 
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
 
The challenge of sharing data well, how publishers can help
The challenge of sharing data well, how publishers can helpThe challenge of sharing data well, how publishers can help
The challenge of sharing data well, how publishers can help
 
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP Students
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
 
A Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data RepositoriesA Data Citation Roadmap for Scholarly Data Repositories
A Data Citation Roadmap for Scholarly Data Repositories
 
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
 
GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant Science
 
TAIR ICAR 2010 Presentation
TAIR ICAR 2010 PresentationTAIR ICAR 2010 Presentation
TAIR ICAR 2010 Presentation
 
FAIR BioData Management
FAIR BioData ManagementFAIR BioData Management
FAIR BioData Management
 
Data sharing as part of the research workflow
Data sharing as part of the research workflowData sharing as part of the research workflow
Data sharing as part of the research workflow
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
 

How to make your published data findable, accessible, interoperable and reusable

  • 1. Leonore Reiser and Lisa Harper, UC Berkeley, February 14, 2018
  • 2. Good Data Stewardship
      • Publish data with the paper
      • Describe data to your fullest ability
      • Use the right words to identify Data
      • Deposit data in the right Data Repository
      • Budget time for Data Management
      • Don’t think of it as YOUR data
  • 3. What’s in it for YOU? We all benefit from data sharing.
      • More citations of YOUR work, increasing your visibility in the research community.
      • Easily comply with journal and funding requirements.
      • Less time spent fulfilling requests for data.
  • 4. Publications are increasing exponentially http://bar.utoronto.ca/50YearsOfArabidopsis/
  • 7. Data re-use leads to new insights (Yu Zhang et al., PNAS, doi:10.1073/pnas.1716300115)
      • Figure labels: Data Processing, Quality Control, Validation; 503 datasets; 314 datasets; Statistical Analysis; Additional Experiments
      • NOVEL DISCOVERY: MET1 and CMT3 are independently required for the maintenance of asymmetric CHH methylation at CMT2 target sites
  • 8. Credit: Melissa Haendel. Wilkinson et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. doi:10.1038/sdata.2016.18. https://www.nature.com/articles/sdata201618
      • Findable means data is human and machine readable and attached to persistent identifiers
      • Accessible means data can be found and retrieved by humans and machines using standard formats
      • Interoperable means data can be exchanged and used between systems
      • Reusable means data can be used by others
  • 9. How to Make Your Published Data FAIR
      • Use standard formats
      • Supply complete metadata
      • Embrace Ontologies
      • Use persistent and unambiguous identifiers
      • Put your data in a long term stable repository
      • Cite, share freely and encourage others
  • 10. Use Standard formats: SNP example
      SNP (Single Nucleotide Polymorphism): a base, a chromosome number and genome position, a reference to the genome assembly used, and the genotypes of the lines tested.
      The three tables below ALL MEAN THE SAME, BUT ARE NOT THE SAME:

      CHROM  POS    REF  ALT  Line 1  Line 2
      1      12345  A    C    A       A
      3      67891  C    T    H       C
      10     23456  G    T    T       U

      CHROM  POS    REF  ALT  Line 1  Line 2
      Gm01   12345  A    C    0/0     0/0
      Gm03   67891  C    T    0/1     0/0
      Gm10   23456  G    T    1/1     ./.

      CHROM  POS    REF  ALT  Line 1  Line 2
      Chr01  12345  A    C    AA      AA
      Chr03  67891  C    T    C/T     CC
      Chr10  23456  G    T    TT      NN

      • VCF (Variant Call Format) is the STANDARD
      • Use the file format STANDARD for your data type
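To illustrate why a single standard helps, here is a minimal Python sketch that maps the ad hoc genotype calls from the SNP example onto VCF-style `GT` strings. It assumes biallelic sites, that `H` means heterozygous, and that `U`, `N` and `NN` mean missing; those readings and the function name `to_vcf_gt` are assumptions for illustration, not part of the VCF specification or the original slides.

```python
# Calls treated as "missing"; this set is an assumption based on the example tables.
MISSING = {"", "N", "NN", "U", "./."}


def to_vcf_gt(call, ref, alt):
    """Convert one ad hoc genotype call to a VCF GT string (biallelic sites only)."""
    call = call.strip().upper()
    if call in MISSING:
        return "./."
    if all(ch in "01/" for ch in call):
        return call  # already written in VCF style, e.g. "0/1"
    if call == "H":
        return "0/1"  # assumed shorthand for a heterozygous call
    bases = list(call.replace("/", ""))
    if len(bases) == 1:
        bases = bases * 2  # a single base is read as homozygous (assumption)
    codes = {ref.upper(): "0", alt.upper(): "1"}
    try:
        return "/".join(sorted(codes[base] for base in bases))
    except KeyError as err:
        raise ValueError(f"allele {err.args[0]!r} is neither REF nor ALT") from None


# Rows from the first and third example tables: (REF, ALT, Line 1, Line 2).
table_one = [("A", "C", "A", "A"), ("C", "T", "H", "C"), ("G", "T", "T", "U")]
table_three = [("A", "C", "AA", "AA"), ("C", "T", "C/T", "CC"), ("G", "T", "TT", "NN")]


def normalize(rows):
    """Return the (Line 1, Line 2) genotypes of each row as VCF GT strings."""
    return [(to_vcf_gt(l1, ref, alt), to_vcf_gt(l2, ref, alt)) for ref, alt, l1, l2 in rows]
```

Under these assumptions, `normalize(table_one)` and `normalize(table_three)` both give the genotypes written in the second table, which is the point of the slide: the information is the same, but only one encoding can be read without guessing.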
  • 11. Use Standard formats: Data in images is NOT accessible (DOI: 10.3389/fpls.2017.01812)
      • Data in PDF (image) format is not findable or accessible.
      • Leave tabular data in tables
  • 12. Use Standard formats: Beware of Excel
      • If you use EXCEL, look out for data corruption and hidden Microsoft characters that impede parsing
      • Ziemann, 2016, doi:10.1186/s13059-016-1044-7, Fig. 1: Prevalence of gene name errors in Supplementary Excel files (percentage of papers with gene lists affected; increase in supplementary files with gene name errors per year)
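A quick check for this kind of corruption can be sketched in Python. The example below flags values that look like dates or scientific notation, the two conversions Excel is known to apply to gene symbols such as SEPT2 or to numeric-looking identifiers. The patterns and the function name `suspicious_gene_names` are illustrative assumptions and will not catch every case.

```python
import re

# Values such as "2-Sep" or "1-Mar": the form Excel gives to symbols like SEPT2 or MARCH1.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$", re.IGNORECASE
)
# Values such as "2.31E+13": identifiers that were converted to floating point.
SCI_LIKE = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)


def suspicious_gene_names(names):
    """Return the entries that look like Excel-converted dates or numbers."""
    flagged = []
    for name in names:
        value = name.strip()
        if DATE_LIKE.match(value) or SCI_LIKE.match(value):
            flagged.append(name)
    return flagged


column = ["SEPT2", "2-Sep", "1-Mar", "2.31E+13", "AT1G01010"]
print(suspicious_gene_names(column))
```

Running a check like this on a gene list before submission is cheap; the safer habit is still to keep the list in plain text (TSV or CSV) and import columns as text.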
  • 13. How to Make Your Published Data FAIR
      • Use standard formats
      • Supply complete metadata
      • Embrace Ontologies
      • Use persistent and unambiguous identifiers
      • Put your data in a long term stable repository
      • Cite, share freely and encourage others
  • 14. Supply Complete Metadata
      • Metadata is data about the data, and allows understanding of the data
      • Phenotype (Data): Plant is 170cm tall
      • Metadata: Species = xxx; Germplasm = xxx; Field location = xxx; Environment = xxx; Measurement = xxx method
  • 15. Supply Complete Metadata
      • Write your Materials and Methods as if you wanted someone else to be able to reproduce your work.
      • Be accurate and complete about your bench and field work; include samples/stocks/lines used, accession numbers, sources of materials, exact measuring techniques, etc.
      • Be AS accurate and complete about your computational pipelines. Include the raw data files you created and their versions. If you use reference data (e.g., a sequence assembly), include the version number, download dates, and download source.
      • Include names of software applications, versions, platforms and source. If you use CyVerse, use their metadata reporting tools.
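One way to make this routine is to write a small machine-readable metadata file next to each dataset. The Python sketch below is a minimal illustration, assuming a JSON sidecar file: the field names follow the metadata example above (species, germplasm, field location, environment, measurement method), while the function names and the record layout are hypothetical and do not follow any particular metadata standard.

```python
import json
import platform

# Sample-level fields taken from the metadata example; treating them as required is an assumption.
REQUIRED = ("species", "germplasm", "field_location", "environment", "measurement_method")


def missing_fields(sample):
    """List required sample fields that are absent or empty."""
    return [key for key in REQUIRED if not str(sample.get(key, "")).strip()]


def build_record(sample, reference_data, software):
    """Combine sample metadata with computational provenance in one record."""
    return {
        "sample": sample,
        # Version number, download date and download source of any reference data.
        "reference_data": reference_data,
        # Name and version of each software application used.
        "software": software,
        "platform": platform.platform(),
        "python_version": platform.python_version(),
    }


def write_record(record, path):
    """Save the record as a JSON file alongside the data file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

Calling `missing_fields` before `write_record` gives a simple completeness check; a community standard, where one exists, should take precedence over an ad hoc layout like this one.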
  • 19. Supply Complete Metadata. Figure labels: 597 Possible Attributes; At least 50 Attributes; Genome Sequence Assembly; At least 100 Attributes
  • 20. Budget TIME to provide Metadata. The metadata in public databases is often confusing; a test case with Zea mays mRNA seq data reveals a high proportion of missing, misleading or incomplete metadata (2018, https://doi.org/10.1016/j.plantsci.2017.10.014).
  • 22. Supply Complete Metadata: Metadata Standards for Various Data Types
      • Established: Genomic Standards Consortium (http://gensc.org)
          • Minimum Information about any (x) Sequence (MIxS)
      • Emerging:
          • Minimum Information About a Plant Phenotyping Experiment (MIAPPE)
      • Ask for help from database people
  • 23. How to Make Your Published Data FAIR
      • Use standard formats
      • Supply complete and deep metadata
      • Embrace Ontologies
      • Use persistent and unambiguous identifiers
      • Put your data in a long term stable repository
      • Cite, share freely and encourage others
  • 24. Same word, different meanings: Cell. Different words, same concept: Eggplant, Aubergine, Melongene
  • 25. Embrace Ontologies. An Ontology is a set of precisely defined terms in a logical hierarchy, in which the relationships between terms can be understood by computers
  • 26. Ontologies: Hierarchy of terms and explicit relationship among terms. Plant Ontology (PO) terms shown:
      • Ligule PO:0020105
      • Vascular leaf PO:0009025
      • Leaf sheath PO:0020104
      • Flag leaf PO:0020103
      • Adult vascular leaf PO:0020103
      • Leaf PO:0025034
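The value of an explicit hierarchy is that software can follow the relationships. The Python sketch below illustrates the idea with some of the term identifiers listed above. The `is_a` and `part_of` edges, the dataset names and the function names are assumptions made for this example only; they are not copied from a Plant Ontology release, so check the ontology itself before relying on any of them.

```python
# Term labels keyed by identifier (identifiers as listed on the slide).
TERMS = {
    "PO:0025034": "leaf",
    "PO:0009025": "vascular leaf",
    "PO:0020104": "leaf sheath",
    "PO:0020103": "flag leaf",
    "PO:0020105": "ligule",
}

# child -> [(relationship, parent)]; these edges are illustrative assumptions.
EDGES = {
    "PO:0009025": [("is_a", "PO:0025034")],
    "PO:0020103": [("is_a", "PO:0009025")],
    "PO:0020104": [("part_of", "PO:0009025")],
    "PO:0020105": [("part_of", "PO:0009025")],
}


def ancestors(term_id):
    """Return every term reachable by following is_a / part_of edges upward."""
    found = []
    stack = [term_id]
    while stack:
        current = stack.pop()
        for _relationship, parent in EDGES.get(current, []):
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


def find_annotated(query_id, annotations):
    """Return datasets annotated with the query term or with any term beneath it."""
    return sorted(
        name
        for name, term_id in annotations.items()
        if term_id == query_id or query_id in ancestors(term_id)
    )


# Hypothetical datasets, each annotated with one PO term.
annotations = {"dataset_1": "PO:0020105", "dataset_2": "PO:0020103"}
print(find_annotated("PO:0025034", annotations))
```

With these assumed edges, a search for "leaf" also returns data annotated to "ligule" or "flag leaf", which free-text labels alone would miss.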
  • 27. Data from diverse types of experiments and organisms can be compared
      • Henk J. Franssen et al. (2015) doi:10.1242/dev.120774 (Medicago)
      • Li, S. et al. (2016) doi:10.1016/j.devcel.2016.10.012 (Arabidopsis)
      • Zhou, X-F. et al. (2014) doi:10.1104/pp.114.243808
  • 28. Embracing ontologies
      • Ontologies provide a POWERFUL, MACHINE READABLE utility for data
      • Find and use existing ontologies (http://www.obofoundry.org/, Planteome)
          • Gene Function = Gene Ontology (GO)
          • Sequences = Sequence Ontology (SO)
          • Plant Anatomy and Development = Plant Ontology (PO)
          • Phenotypes = Phenotype and Trait Ontology (PATO)
          • ... and many others
      • Apply them consistently
          • To datasets (e.g. in metadata)
          • In publications (e.g. TAIR GO/PO submission)
      • Ask Questions!
  • 29. How to Make Your Published Data FAIR
      • Use standard formats
      • Supply complete and deep metadata
      • Embrace Ontologies
      • Use persistent and unambiguous identifiers
      • Put your data in a long term stable repository
      • Cite, share freely and encourage others
  • 31. Identifiers also resolve confusion over species: Is this Arabidopsis? Maize? Tomato?
  • 32. One gene, many names (DOI:10/24/pp.17.00021). Figure labels: GOOD; OK (history)
  • 33. One name, many genes
  • 34. Solution: Community Standards and Nomenclature Resources
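The "one name, many genes" problem is easy to detect once every record carries an unambiguous identifier. The Python sketch below is a minimal illustration; the gene symbols, the `LOCUS_*` identifiers and the function name `ambiguous_names` are hypothetical placeholders, not real database entries.

```python
from collections import defaultdict


def ambiguous_names(records):
    """Map each name that points to more than one identifier to its identifiers.

    `records` is an iterable of (name, identifier) pairs; names are compared
    case-insensitively because symbols are often capitalized inconsistently.
    """
    ids_by_name = defaultdict(set)
    for name, identifier in records:
        ids_by_name[name.strip().upper()].add(identifier)
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical records: the same symbol is used for two different loci.
records = [("abc1", "LOCUS_A"), ("ABC1", "LOCUS_B"), ("xyz2", "LOCUS_C")]
print(ambiguous_names(records))
```

Reporting the stable identifier alongside the symbol in a paper removes this ambiguity for readers and for text-mining tools.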
  • 35. How to Make Your Published Data FAIR
      • Use standard formats
      • Supply complete and deep metadata
      • Embrace Ontologies
      • Use persistent and unambiguous identifiers
      • Put your data in a long term stable repository
      • Cite, share freely and encourage others
  • 36. Problem: Data is not findable because it is not available
      • Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175. https://doi.org/10.7717/peerj.175
      • Gibney and Van Noorden, doi:10.1038/nature.2013.14416
  • 37. Put your data in a stable public repository
      • Large International Repositories for many data types for all species. ALL sequence data goes here
      • Large but specialized databases serving many species
      • Specialized databases serving specific communities (e.g. Soybase)
  • 38. Submitting to a repository: SNP example
      • As of 9/2017, all NON-human SNPs are processed through EMBL in the European Variation Archive (EVA, https://www.ebi.ac.uk/eva/). NCBI’s dbSNP will only process human SNPs.
      • EVA will require:
          • Data in (standard) Variant Call Format (VCF), including allele frequencies
          • SUBMITTED genome or transcriptome assembly
  • 39. What if there is no specialized database? Or no recommendations from journals? You should get a Digital Object Identifier (DOI)
      • http://datadryad.org ** Curated, metadata
      • https://zenodo.org/
      • https://figshare.com/
      • And just for you folks at UC: https://datashare.ucsf.edu/stash
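Once a dataset has a DOI, it helps to cite it in one consistent, resolvable form. The Python sketch below normalizes the common ways a DOI gets written and builds a `https://doi.org/` link. The syntax check is a simplified assumption (a `10.` prefix, a 4 to 9 digit registrant code, a slash and a suffix), not the full DOI specification, and the function names are illustrative.

```python
import re

# Simplified DOI shape: "10.", 4-9 digits, "/", then a non-empty suffix (assumption).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Prefixes people commonly put in front of a DOI.
PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*/?)", re.IGNORECASE)


def normalize_doi(text):
    """Return the bare DOI, or None if the string does not look like a DOI."""
    value = PREFIX.sub("", text.strip())
    return value if DOI_PATTERN.match(value) else None


def doi_url(text):
    """Return a resolvable https://doi.org/ link, or None for malformed input."""
    doi = normalize_doi(text)
    return f"https://doi.org/{doi}" if doi else None


for raw in ["doi:10.1073/pnas.1716300115", "https://doi.org/10.7717/peerj.175", "10/24/example"]:
    print(raw, "->", doi_url(raw))
```

The first two inputs resolve to `https://doi.org/...` links, while the last one lacks the `10.` registrant prefix and returns `None`, which makes it a cheap check to run over a reference list before submission.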
  • 40. But please, don’t forget to actually complete your submission*. *And you never have to spend time fielding requests or transferring huge data files again
  • 42. How to Make Your Published Data FAIR
      • Use standard formats
      • Supply complete and deep metadata
      • Embrace Ontologies
      • Use persistent and unambiguous identifiers
      • Put your data in a long term stable repository
      • Cite, share freely and encourage others
  • 43. Cite, share freely and encourage others to be FAIR
      • Include searchable and citable identifiers for your data in your papers
      • Release your data with clearly defined terms of use, e.g. Creative Commons (CC) CC-0, CC-BY. If you do not specify terms, restrictions are implied, limiting reuse
      • Cite all of your data sources. This enhances reproducibility and also shows value to funders!
      • When reviewing papers, check them for FAIRness
  • 44. Good data practices benefit everyone (and help you get funded)
  • 45. A few simple things to remember when preparing your paper
      • Include unambiguous identifiers
      • Format data according to defined standards
      • Keep data in (parseable) tables or text
      • Include meaningful metadata
      • Deposit data in a long term stable public repository and get a DOI
      • It is never too early to think about (meta)data; the best time to start is BEFORE you are writing
  • 46. You can get help structuring, organizing and managing your data
      • Contact your Community Database
      • Don’t have one? Contact a curator (Leonore, Lisa… we live amongst you)
      • UCB Research Data Management Librarians (http://researchdata.berkeley.edu/)
  • 48. What YOU can do right now to support FAIR data
      • Ask your funders for increased access to FAIR data
      • When you review papers, look at the data, and be sure it is well described (Metadata is great)
      • Change your attitude a little: Your data will be more cited and more important if you make it FAIR
      • Deposit your Data and get a DOI
      • Ask your institution to value good data submission, and good data recycling