Consultant, Honorary Academic Editor
Associate Director, Principal Investigator
http://www.slideshare.net/SusannaSansone 
High quality data publications: drives and needs

Susanna-Assunta Sansone, PhD
@biosharing
@isatools
@scientificdata
BBSRC DTP, Oxford, 15 December, 2014
Credit to: 
https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/
Plagued by selective reporting of data and methods
• Over 50% of completed studies in biomedicine do not appear in the published literature
• Often because results do not conform to the authors' hypotheses
“Only half the health-related studies funded by the European Union between 1998 and 2006 - an expenditure of €6 billion - led to identifiable reports”
Incentivizing individual contributors to share data
• Big science efforts
o data is often better organized, reported and shared
• Small independent efforts, yielding a rich variety of specialty data sets
o Most of these data (such as null findings) are unpublished
o These dark data hold a potential wealth of knowledge
A community mobilization for “openness” 
http://discovery.urlibraries.org/ https://okfn.org 
image by Greg Emmerich 
Open data is a means to do better science more efficiently
http://opendefinition.org/licenses/ 
http://pantonprinciples.org 
https://creativecommons.org
Open access is not enough on its own 
http://www.theguardian.com/higher-education-network/blog/2014/jun/26 
If your research has been funded by 
the taxpayer, there's a good chance 
you'll be encouraged to publish your 
results on an open access basis….. 
This final article makes publicly 
available the hypotheses, 
interpretations and conclusions of your 
research. 
But what about the data that led you 
to those results and conclusions?
Even open data is not always enough
http://www.theguardian.com/higher-education-network/blog/2014/jun/26
So data that is in theory open and free to access
• may still be hard to get hold of;
• it may not have been stored or cited in the appropriate manner;
• it may not be interoperable with related data because it is not formatted appropriately; or
• it may not be reusable because it may not contain enough information for others to understand it.
Movement for FAIR (Findable, Accessible, Interoperable, Reusable) data in the life and medical sciences
http://bd2k.nih.gov/workshops.html#ADDS
Because, in all fairness, not much data is FAIR!
Responsibilities lie across several stakeholder groups 
Understand the benefits of sharing 
FAIR datasets and enact them 
Engage and assist researchers to 
enable them to share FAIR datasets 
Release or endorse practices and policies, but also incentive and credit mechanisms for researchers, curators and developers
Publishers occupy a leverage point 
Because of the importance of formal publications in the academic incentive structure
Role of publishers as “agents of change” 
Serve as the implementation and/or enforcement arm at the point of publication
Publishers and data/reproducibility 
• Policies on access (to data, code, reagents etc.)
o Supporting funder & community needs
• Format and amount of content
o Methodological details, supplementary info, data integration and links to repositories
• Licensing for reuse
• Incentives to share
o Data citations
o Data journals and articles
• Quality assurance through peer review
Credit to: 
Iain Hrynaszkiewicz
Nature Publishing Group: the changing landscape 
Human Genome (2001): 62 pages, 150 authors, 49 figures, 27 tables
ENCODE Project (2012): 30 papers in 3 journals
2013 
Credit to: 
Iain Hrynaszkiewicz
Data/reproducibility at NPG 
Wang et al., Nature, 2013
doi:10.1038/nature12730 
• Figure source data 
o putting data behind figures/graphs
• Data citation 
o tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
• Code reproducibility 
o peer review, availability and reuse 
• NPG’s Linked Data release – CC0 
• A new data journal
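Figure source data, the first practice listed, amounts to publishing the numbers behind each plotted panel. A minimal sketch, assuming the plotted series are held as Python lists and that one CSV file per panel is acceptable; the function name and example values are hypothetical, not an NPG requirement:

```python
import csv
import io


def figure_source_csv(x_label, y_label, xs, ys):
    """Return CSV text holding the data points behind one figure panel."""
    if len(xs) != len(ys):
        raise ValueError("x and y series must have the same length")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([x_label, y_label])  # header row names both axes
    writer.writerows(zip(xs, ys))        # one row per plotted point
    return buf.getvalue()


# Hypothetical panel: a growth curve shown in a figure
source = figure_source_csv("time_h", "od600", [0, 2, 4], [0.05, 0.21, 0.64])
print(source)
```

The returned text can be saved next to the figure, for example as a supplementary CSV, so readers can replot or reanalyse the values rather than estimating them from the image.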
Data journals everywhere? 
Credit to: 
Iain Hrynaszkiewicz
A new open-access, online-only publication for descriptions of scientifically valuable datasets
• Get credit for sharing your data
o Publications will be listed in the major indexes and will be citeable
• Focused on data reuse
o All the information others need to reuse the data; no interpretative analysis or hypothesis testing
• Open access
o Authors select from three Creative Commons licences for the main Data Descriptor; each publication is supported by curated CC0 metadata
• Peer reviewed
o Rigorous peer review, managed by our Editorial Board of academic researchers, ensures data quality and standards
• Promoting community data repositories
o Data stored in community data repositories
Introducing a new content type: the Data Descriptor
• Designed to make data more discoverable, interpretable and reusable
• Concerned with the facts behind the methodology of data generation/collection and processing
• Complements a journal article
Figure: the journal article carries the narrative (synthesis, analysis, interpretation, conclusions); the Data Descriptor summarizes the facts (What is the sample? What did I do to generate the data? How was the data processed? Where is the data? Who did what when?)
Data Descriptor: narrative and structure
• Experimental metadata or structured component (in-house curated, machine-readable formats)
• Article or narrative component (PDF and HTML)
Data Descriptor: narrative
Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses
In traditional publications, this information is not provided in sufficient detail. However, this information is essential for understanding, reusing and reproducing datasets.
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
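Because the narrative follows a fixed set of sections, a draft can be checked against that list mechanically. A minimal sketch, assuming the draft's headings have already been extracted as a list of strings; `check_sections` is a hypothetical helper, and flagging extra headings reflects the point that a Data Descriptor does not test new hypotheses:

```python
# Section list taken from the slide; treat it as an illustration of the
# template, not an authoritative copy of the journal's author guidelines.
REQUIRED_SECTIONS = [
    "Title", "Abstract", "Background & Summary", "Methods",
    "Technical Validation", "Data Records", "Usage Notes",
    "Figures & Tables", "References", "Data Citations",
]


def check_sections(headings):
    """Return (missing, unexpected) headings relative to REQUIRED_SECTIONS."""
    required = set(REQUIRED_SECTIONS)
    missing = [s for s in REQUIRED_SECTIONS if s not in headings]
    # Headings outside the template, e.g. a "Discussion" testing a hypothesis
    unexpected = [h for h in headings if h not in required]
    return missing, unexpected


draft = ["Title", "Abstract", "Background & Summary", "Methods", "Discussion"]
missing, unexpected = check_sections(draft)
print("Missing:", missing)
print("Unexpected:", unexpected)
```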
Joint Declaration of Data Citation Principles by the 
Data Citation Synthesis Group
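A data citation treats a dataset like any other citable work, giving credit to its creators and pointing to a persistent identifier. A minimal sketch of a formatter, assuming a DataCite-style pattern (Creators (Year): Title. Repository. Identifier); the record fields and example values are hypothetical, and each journal sets its own reference style:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    creators: List[str]  # people or organisations credited for the data
    year: int            # publication year of the dataset
    title: str
    repository: str      # where the data are archived
    doi: str             # persistent identifier, without the resolver prefix


def format_data_citation(rec: DatasetRecord) -> str:
    """Format as: Creators (Year): Title. Repository. https://doi.org/DOI"""
    if not rec.creators:
        raise ValueError("a data citation needs at least one creator")
    authors = "; ".join(rec.creators)
    title = rec.title.rstrip(".")  # avoid a doubled full stop
    return f"{authors} ({rec.year}): {title}. {rec.repository}. https://doi.org/{rec.doi}"


# Hypothetical record; the DOI below is a placeholder, not a real dataset
rec = DatasetRecord(["Doe, J.", "Roe, R."], 2014,
                    "Example drought indicator dataset", "figshare",
                    "10.1234/example.5678")
print(format_data_citation(rec))
```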
Data Descriptor: structure - content
In-house editorial curator:
• assists users to submit the structured content via simple templates and an internal authoring tool
• performs value-added semantic annotation of the experimental metadata
For advanced users/service providers willing to export ISA-Tab for direct submission, we have released a technical specification.
Figure: the structured metadata links to each data file or database record and to the analysis method or script.
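ISA-Tab expresses experimental metadata as tab-delimited tables in which each row traces a source through protocols to a sample. A minimal sketch of such a sample table, assuming the column headers below (modelled on ISA-Tab study-file conventions as we understand them) and hypothetical values; real files should be produced or checked with the ISA tools rather than this helper:

```python
import csv
import io

# Headers modelled on an ISA-Tab study file; check them against the ISA-Tab
# specification before any real submission.
HEADERS = ["Source Name", "Characteristics[organism]", "Protocol REF", "Sample Name"]


def study_table(rows):
    """Serialise sample rows (dicts keyed by HEADERS) as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=HEADERS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        empty = [h for h in HEADERS if not row.get(h)]
        if empty:
            raise ValueError(f"row is missing values for {empty}")
        writer.writerow(row)  # unknown keys raise ValueError in DictWriter
    return buf.getvalue()


# Hypothetical samples
rows = [
    {"Source Name": "plant1", "Characteristics[organism]": "Arabidopsis thaliana",
     "Protocol REF": "sample collection", "Sample Name": "plant1.leaf"},
    {"Source Name": "plant2", "Characteristics[organism]": "Arabidopsis thaliana",
     "Protocol REF": "sample collection", "Sample Name": "plant2.leaf"},
]
print(study_table(rows))
```

In a curated file, values such as the organism would also carry ontology term references (the semantic annotation mentioned above), which this sketch omits.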
Workflow overview (colour key: green, author; purple, repository; blue, Scientific Data; red, production)
Publish your data early
Collect data → Publish data → Follow-up experiments → Publish findings
Scientific Data’s prior-publication policy with other NPG journals protects your ability to publish the screen data now and the hits later
Credit to: 
Andrew Hufton
Hao et al.: Environmental
8 citations 
Data sets from the Global Integrated 
Drought Monitoring and Prediction 
System (GIDMaPS), which provides 
drought information based on multiple 
drought indicators
New Dataset 
• Data in figshare 
• Code in figshare 
• Cited in Science
Hanke: Neuroscience
New Dataset
• Data in OpenfMRI
• Code in GitHub
Or publish your data and findings simultaneously
Collect data → Submit data (publication on hold) → Follow-up experiments → Publish findings
Scientific Data will hold an accepted Data Descriptor while your other related research publications clear peer review
Credit to: 
Andrew Hufton
Or after the findings, but…
Collect data → Follow-up experiments → Publish findings → Publish data
• A fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step
• And/or additional tutorial-like information for scientists interested in reusing or integrating the data with their own data
Messina et al.: Epidemiology
4 citations
The most comprehensive geographic collection of human dengue virus occurrence data (1960–2012), linked to point or polygon locations, derived from peer-reviewed literature and case reports as well as informal online sources
Scientific hypotheses: synthesis, analysis, conclusions
Associated Nature Article
• Data in figshare
Methods and technical analyses supporting the quality of the measurements: What did I do to generate the data? How was the data processed? Where is the data? Who did what when?
Adding value to research articles and data records
Figure: research papers, Data Descriptors and data records
Helping authors find the right place for the data
• We currently recognize over 60 public data repositories and provide advice on the best place for authors to archive their data
• We have integrated systems with both:
Figure: “Omics” is emphasized among basic life-sciences repositories. Repository categories: DNA and protein sequence; functional genomics; genetic association and genome variation; metagenomics; molecular interactions; organism- or disease-specific; proteomics; taxonomy and species diversity; traces and sequencing reads (chart source: Big data, CSE 2014)
Repository criteria
1. Broad support and recognition within their scientific community
2. Ensure long-term persistence and preservation of datasets
3. Provide expert curation
4. Implement relevant, community-endorsed reporting requirements
Progressively monitor this via
5. Provide for confidential review of submitted datasets
6. Provide stable identifiers for submitted datasets
7. Allow public access to data without unnecessary restrictions
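Read as a checklist, these criteria lend themselves to a simple screening step. A minimal sketch, assuming a hypothetical repository profile recorded as yes/no answers; the labels are ours, not an official schema, and real assessments still need human judgement for criteria such as community recognition:

```python
# The seven criteria from the slide, keyed by short labels of our own choosing.
CRITERIA = {
    "community_recognition": "Broad support and recognition within their scientific community",
    "preservation": "Ensure long-term persistence and preservation of datasets",
    "expert_curation": "Provide expert curation",
    "reporting_standards": "Implement relevant, community-endorsed reporting requirements",
    "confidential_review": "Provide for confidential review of submitted datasets",
    "stable_identifiers": "Provide stable identifiers for submitted datasets",
    "open_access": "Allow public access to data without unnecessary restrictions",
}


def unmet_criteria(profile):
    """Return the criteria a repository profile does not satisfy, in slide order.

    `profile` maps criterion labels to True/False; absent labels count as unmet.
    """
    return [text for label, text in CRITERIA.items() if not profile.get(label, False)]


# Hypothetical repository that meets every criterion except confidential review
profile = {label: True for label in CRITERIA}
profile["confidential_review"] = False
for text in unmet_criteria(profile):
    print("Unmet:", text)
```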
Citations of and links to data files and databases
Peer review process focused on quality and reuse
Evaluation is not based on the perceived impact or novelty of the findings or the size of the data
• Experimental rigour and technical data quality
o Methodologically sound
o Technical validation experiments and statistical analyses
o Depth, coverage, size and/or completeness of data sufficient for the intended types of applications
• Completeness of the description
o Sufficient details to allow others to reproduce the results, reuse the data or integrate them with other data
o Compliance with relevant minimum information or reporting standards
• Integrity of the data files and repository record
o Data files match the descriptions in the Data Descriptor
o Deposited in the most appropriate available databases
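The integrity check that data files match their description can be partly automated. A minimal sketch, assuming the authors supply a hypothetical manifest mapping each described file name to its SHA-256 checksum; it illustrates the idea and is not Scientific Data's actual review tooling:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(data_dir, manifest):
    """Compare the files in data_dir with a manifest {filename: expected sha256}.

    Returns sorted lists: (missing, mismatched, undescribed).
    """
    data_dir = Path(data_dir)
    on_disk = {p.name for p in data_dir.iterdir() if p.is_file()}
    missing = sorted(name for name in manifest if name not in on_disk)
    mismatched = sorted(
        name for name, expected in manifest.items()
        if name in on_disk and sha256_of(data_dir / name) != expected
    )
    undescribed = sorted(on_disk - set(manifest))  # deposited but not described
    return missing, mismatched, undescribed


# Usage (hypothetical directory and manifest):
# missing, mismatched, undescribed = check_manifest("deposit/", manifest)
```

Files that are missing or mismatched point to a Data Descriptor that no longer describes the deposit; undescribed files suggest the description is incomplete.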
Current content is diverse, with bimonthly releases
• Neuroscience, ecology, epidemiology, environmental science, functional genomics, metabolomics, toxicology etc.
• New and previously published individual datasets, curated aggregations and citizen science
• Datasets in figshare, Dryad and domain-specific databases
• Code deposited in figshare and GitHub
• First collection:
Supported by:
Advisory Panel including senior researchers, funders, librarians and curators 
Michael Huerta ● National Institutes of Health, USA ● Mark Thorley ● Natural Environment Research 
Council, UK ● Patricia Cruse ● University of California, USA ● Susan Gregurick ● Office of 
Biological and Environmental Research, Department of Energy, USA ● Ioannis Xenarios ● Swiss 
Institute of Bioinformatics, Switzerland ● Chris Bowler ● IBENS, France ● Mark Forster ● Syngenta, 
UK ● Anthony Rowe ● Johnson & Johnson, USA ● Stephen Chanock ● National Cancer Institute, 
USA ● Weida Tong ● National Center for Toxicological Research, FDA, USA ● Albert J. R. Heck ● 
Utrecht University, The Netherlands ● Johanna McEntyre ● EMBL-EBI, European Bioinformatics 
Institute, UK ● Simon Hodson ● CODATA, France ● Joseph R. Ecker ● Howard Hughes Medical 
Institute & Salk Institute, USA ● Stephen Friend ● Sage Bionetworks, USA ● Jessica Tenenbaum ● 
Duke Translational Medicine Institute, USA ● Anne-Claude Gavin ● EMBL, Germany ● David Carr ● 
Wellcome Trust, UK ● Wolfram Horstmann ● Göttingen State and University Library, Germany ● 
Piero Carninci ● RIKEN Omics Science Center, Japan ● Pascale Gaudet ● Swiss Institute of 
Bioinformatics, Switzerland ● Judith A. Blake ● The Jackson Laboratory, USA ● Richard H. 
Scheuermann ● J. Craig Venter Institute, USA ● Caroline Shamu ● Harvard Medical School, USA 
Susanna-Assunta Sansone 
Honorary Academic Editor 
(University of Oxford, UK) 
Andrew L Hufton 
Managing Editor 
Varsha Khodiyar 
Editorial Curator 
Iain Hrynaszkiewicz 
Publisher 
An open access, peer-reviewed publication for 
descriptions of scientifically valuable datasets
Launched May 2014

Oxford DTP - Sansone - Data publications and Scientific Data - Dec 2014

  • 1. High quality data publications: drives and needs. Susanna-Assunta Sansone, PhD (Consultant; Honorary Academic Editor; Associate Director; Principal Investigator). http://www.slideshare.net/SusannaSansone @biosharing @isatools @scientificdata. BBSRC DTP, Oxford, 15 December 2014
  • 3. Plagued by selective reporting of data and methods • Over 50% of completed studies in biomedicine do not appear in the published literature • Often because results do not conform to the authors' hypotheses “Only half the health-related studies funded by the European Union between 1998 and 2006 - an expenditure of €6 billion - led to identifiable reports”
  • 4. Incentivizing individual contributors to share data • Big science efforts o Data are often better organized, reported and shared • Small independent efforts, yielding a rich variety of specialty datasets o Most of these data (such as null findings) are unpublished o These dark data hold a potential wealth of knowledge
  • 5. A community mobilization for “openness” http://discovery.urlibraries.org/ https://okfn.org (image by Greg Emmerich) Open data is a means to do better science more efficiently. http://opendefinition.org/licenses/ http://pantonprinciples.org https://creativecommons.org
  • 6. Open access is not enough on its own http://www.theguardian.com/higher-education-network/blog/2014/jun/26 If your research has been funded by the taxpayer, there's a good chance you'll be encouraged to publish your results on an open access basis… This final article makes publicly available the hypotheses, interpretations and conclusions of your research. But what about the data that led you to those results and conclusions?
  • 7. Open data is not always enough either http://www.theguardian.com/higher-education-network/blog/2014/jun/26 So data that is in theory open and free to access: • may still be hard to get hold of; • may not have been stored or cited in the appropriate manner; • may not be interoperable with related data because it is not formatted appropriately; or • may not be reusable because it does not contain enough information for others to understand it.
  • 8. Movement for FAIR data in life and medical sciences http://bd2k.nih.gov/workshops.html#ADDS
  • 9. Because, in all fairness, not much data is FAIR!
  • 10. Responsibilities lie across several stakeholder groups Understand the benefits of sharing FAIR datasets and enact them Engage and assist researchers to enable them to share FAIR datasets Release or endorse practices and policies, as well as incentive and credit mechanisms for researchers, curators and developers
  • 11. Publishers occupy a leverage point Because of the importance of formal publications in the academic incentive structure
  • 12. Role of publishers as “agents of change” Serve as the implementation and/or enforcement arm at the point of publication
  • 13. Publishers and data/reproducibility • Policies on access (to data, code, reagents etc.) o Supporting funder & community needs • Format and amount of content o Methodological details, supplementary info, data integration and links to repositories • Licensing for reuse • Incentives to share o Data citations o Data journals and articles • Quality assurance through peer review Credit to: Iain Hrynaszkiewicz
  • 14. Nature Publishing Group: the changing landscape. Human Genome (2001): 62 pages, 150 authors, 49 figures, 27 tables. ENCODE Project (2012): 30 papers in 3 journals
  • 15. 2013 Credit to: Iain Hrynaszkiewicz
  • 16. Data/reproducibility at NPG Wang et al, Nature, 2013 doi:10.1038/nature12730 • Figure source data o putting data behind figures/graphs
  • 17. Data/reproducibility at NPG • Figure source data o putting data behind figures/graphs • Data citation o tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group • Code reproducibility o peer review, availability and reuse • NPG’s Linked Data release – CC0 • A new data journal
  • 18. Data journals everywhere? Credit to: Iain Hrynaszkiewicz
  • 19. A new open-access, online-only publication for descriptions of scientifically valuable datasets
  • 20. • Get Credit for Sharing Your Data • Publications will be listed in the major indexes and will be citeable • Focused on Data Reuse • All the information others need to reuse the data; no interpretative analysis or hypothesis testing • Open-access • Authors select from three Creative Commons licences for the main Data Descriptor. Each publication is supported by curated CC0 metadata • Peer-reviewed • Rigorous peer review managed by our Editorial Board of academic researchers ensures data quality and standards • Promoting Community Data Repositories • Data stored in community data repositories
  • 21. Introducing a new content type: the Data Descriptor • Designed to make data more discoverable, interpretable and reusable • Concerned with the facts behind the methodology of data generation/collection and processing • Complements a journal article [Figure: the journal article carries the narrative of synthesis, analysis, interpretation and conclusions; the Data Descriptor summarizes the facts: What is the sample? What did I do to generate the data? How was the data processed? Where is the data? Who did what when?]
  • 22. Data Descriptor: narrative and structure. Experimental metadata or structured component (in-house curated, machine-readable formats); article or narrative component (PDF and HTML)
  • 23. Data Descriptor: narrative. Focus on data reuse: detailed descriptions of the methods and technical analyses supporting the quality of the measurements. Does not contain tests of new scientific hypotheses. In traditional publications this information is not provided in sufficient detail; however, it is essential for understanding, reusing and reproducing datasets. Sections: • Title • Abstract • Background & Summary • Methods • Technical Validation • Data Records • Usage Notes • Figures & Tables • References • Data Citations
  • 24. Data Descriptor: narrative. Focus on data reuse: detailed descriptions of the methods and technical analyses supporting the quality of the measurements. Does not contain tests of new scientific hypotheses. Sections: • Title • Abstract • Background & Summary • Methods • Technical Validation • Data Records • Usage Notes • Figures & Tables • References • Data Citations
  • 25. Data Descriptor: narrative. Focus on data reuse: detailed descriptions of the methods and technical analyses supporting the quality of the measurements. Does not contain tests of new scientific hypotheses. Sections: • Title • Abstract • Background & Summary • Methods • Technical Validation • Data Records • Usage Notes • Figures & Tables • References • Data Citations. Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group
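Because the Data Descriptor narrative has a fixed set of sections, an author could check a draft against that list before submission. The sketch below is a hypothetical helper, not a Scientific Data tool: it assumes the draft's headings have already been extracted as plain strings, and the names `missing_sections` and `sections_in_order` are invented for illustration.

```python
from typing import List

# Section headings as listed on the slides above, in the order given there.
DATA_DESCRIPTOR_SECTIONS: List[str] = [
    "Title",
    "Abstract",
    "Background & Summary",
    "Methods",
    "Technical Validation",
    "Data Records",
    "Usage Notes",
    "Figures & Tables",
    "References",
    "Data Citations",
]


def missing_sections(headings: List[str]) -> List[str]:
    """Return the listed sections absent from a draft (case-insensitive match)."""
    present = {h.strip().lower() for h in headings}
    return [s for s in DATA_DESCRIPTOR_SECTIONS if s.lower() not in present]


def sections_in_order(headings: List[str]) -> bool:
    """True if the recognised sections appear in the order listed above."""
    index = {s.lower(): i for i, s in enumerate(DATA_DESCRIPTOR_SECTIONS)}
    positions = [index[h.strip().lower()] for h in headings
                 if h.strip().lower() in index]
    return positions == sorted(positions)


if __name__ == "__main__":
    draft = ["Title", "Abstract", "Background & Summary", "Methods",
             "Data Records", "Technical Validation", "References"]
    print(missing_sections(draft))   # Usage Notes, Figures & Tables, Data Citations
    print(sections_in_order(draft))  # False: Data Records precedes Technical Validation
```

A check like this only covers presence and order; whether each section contains enough detail for reuse remains a matter for the editors and referees.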
  • 26. Data Descriptor: structure - content. In-house editorial curator: • assists users to submit the structured content via simple templates and an internal authoring tool • performs value-added semantic annotation of the experimental metadata. For advanced users/service providers willing to export ISA-Tab for direct submission, we have released a technical specification. [Diagram labels: data file or record in a database; analysis method; script]
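To give a feel for what a tab-delimited, machine-readable structured component looks like, here is a minimal sketch that writes a sample table in the style of an ISA-Tab study file. It is a simplified illustration only: the column set is an assumption, it is not validated against the ISA-Tab specification or the journal's technical specification, and `study_table` is a hypothetical helper. The `Term Source REF` and `Term Accession Number` columns show where an ontology annotation (here an NCBI Taxonomy term) would sit, echoing the curator's semantic annotation step.

```python
import csv
import io
from typing import Dict, List

# Illustrative columns in the style of an ISA-Tab study table (assumed set).
STUDY_COLUMNS = [
    "Source Name",
    "Characteristics[organism]",
    "Term Source REF",
    "Term Accession Number",
    "Protocol REF",
    "Sample Name",
]


def study_table(rows: List[Dict[str, str]]) -> str:
    """Serialise sample rows as a tab-delimited, ISA-Tab-like study table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=STUDY_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample record annotated with an NCBI Taxonomy term.
    rows = [{
        "Source Name": "subject_01",
        "Characteristics[organism]": "Homo sapiens",
        "Term Source REF": "NCBITaxon",
        "Term Accession Number": "http://purl.obolibrary.org/obo/NCBITaxon_9606",
        "Protocol REF": "sample collection",
        "Sample Name": "subject_01_blood",
    }]
    print(study_table(rows), end="")
```

Keeping one row per sample and one column per attribute is what lets such tables be validated and annotated automatically, in contrast to the free-text narrative component.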
  • 27. Workflow overview! Green: author; Purple: repository; Blue: SciData; Red: production
  • 28. Publish your data early. [Workflow diagram: Collect Data; Follow-up experiments; Publish Findings; Publish Data] Scientific Data’s prior publication policy with other NPG journals protects your ability to publish the screen data and the hits later. Credit to: Andrew Hufton
  • 29. Hao et al.: Environmental (8 citations). Datasets from the Global Integrated Drought Monitoring and Prediction System (GIDMaPS), which provides drought information based on multiple drought indicators
  • 30. Hao et al.: Environmental (8 citations). New Dataset • Data in figshare • Code in figshare
  • 31. Hao et al.: Environmental (8 citations). New Dataset • Data in figshare • Code in figshare • Cited in Science
  • 32. Hanke: Neuroscience. New Dataset • Data in OpenfMRI • Code in GitHub
  • 33. Or publish your data and findings simultaneously. [Workflow diagram: Collect Data; Follow-up experiments; Publish Findings; Submit Data; Hold publication] Scientific Data will hold a Data Descriptor that has been accepted for publication while your other related research publications clear peer review. Credit to: Andrew Hufton
  • 34. Or publish your data after the findings, but… [Workflow diagram: Collect Data; Follow-up experiments; Publish Findings; Publish Data] • A fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step • And/or additional tutorial-like information for scientists interested in reusing or integrating the data with their own
  • 35. Messina et al.: Epidemiology (4 citations). The most comprehensive geographic collection of human dengue virus occurrence data (1960-2012), linked to point or polygon locations, derived from peer-reviewed literature and case reports as well as informal online sources
  • 36. Messina et al.: Epidemiology (4 citations). Associated Nature article • Data in figshare. [Diagram: the article covers the scientific hypotheses (synthesis, analysis, conclusions); the Data Descriptor covers the methods and technical analyses supporting the quality of the measurements: What did I do to generate the data? How was the data processed? Where is the data? Who did what when?]
  • 37. Adding value to research articles and data records [Diagram: Data Descriptors linking research papers and data records]
  • 38. Helping authors find the right place for their data • We currently recognize over 60 public data repositories, and provide advice on the best place for authors to archive their data • We have integrated systems with both: [repository logos not reproduced] [Chart: “Omics” is emphasized among basic life-sciences repositories. Categories: DNA and protein sequence; Functional genomics; Genetic association and genome variation; Metagenomics; Molecular interactions; Organism- or disease-specific; Proteomics; Taxonomy and species diversity; Traces and sequencing reads]
  • 39. Repository criteria: 1. Broad support and recognition within their scientific community 2. Ensure long-term persistence and preservation of datasets 3. Provide expert curation 4. Implement relevant, community-endorsed reporting requirements (progressively monitored via [logo not reproduced]) 5. Provide for confidential review of submitted datasets 6. Provide stable identifiers for submitted datasets 7. Allow public access to data without unnecessary restrictions
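The seven repository criteria lend themselves to a simple checklist. The sketch below is a hypothetical self-assessment record, not the journal's actual evaluation procedure: each criterion becomes a boolean field (field names are invented), and `unmet_criteria` reports, in the slide's order, which criteria a candidate repository does not yet satisfy.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class RepositoryProfile:
    """Hypothetical yes/no assessment of a repository against the seven criteria."""
    community_recognition: bool
    long_term_preservation: bool
    expert_curation: bool
    community_reporting_standards: bool
    confidential_review: bool
    stable_identifiers: bool
    open_public_access: bool


# Criterion wording taken from the slide, keyed by the hypothetical field names.
CRITERIA_TEXT: Dict[str, str] = {
    "community_recognition": "Broad support and recognition within their scientific community",
    "long_term_preservation": "Ensure long-term persistence and preservation of datasets",
    "expert_curation": "Provide expert curation",
    "community_reporting_standards": "Implement relevant, community-endorsed reporting requirements",
    "confidential_review": "Provide for confidential review of submitted datasets",
    "stable_identifiers": "Provide stable identifiers for submitted datasets",
    "open_public_access": "Allow public access to data without unnecessary restrictions",
}


def unmet_criteria(profile: RepositoryProfile) -> List[str]:
    """List unmet criteria in declaration order (dataclasses.fields keeps order)."""
    return [CRITERIA_TEXT[f.name] for f in fields(profile)
            if not getattr(profile, f.name)]


if __name__ == "__main__":
    candidate = RepositoryProfile(True, True, False, True, False, True, True)
    for item in unmet_criteria(candidate):
        print("-", item)
```

Real assessments are rarely binary (curation depth and reporting-standard compliance vary), so a graded score may suit better; booleans keep the sketch minimal.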
  • 40. Citations of and links to data files - databases
  • 41. Peer review process focused on quality and reuse. Evaluation is not based on the perceived impact or novelty of the findings or the size of the data. • Experimental rigour and technical data quality o Methodologically sound o Technical validation experiments and statistical analyses o Depth, coverage, size, and/or completeness of data sufficient for the types of applications • Completeness of the description o Sufficient details to allow others to reproduce the results, or to reuse the data or integrate it with other data o Compliance with relevant minimum information or reporting standards • Integrity of the data files and repository record o Data files match the descriptions in the Data Descriptor o Deposited in the most appropriate available databases
  • 42. Current content is diverse - bimonthly releases • Neuroscience, ecology, epidemiology, environmental science, functional genomics, metabolomics, toxicology etc. • New and previously published individual datasets, curated aggregations and citizen science • Datasets in figshare, Dryad and domain-specific databases • Code deposited in figshare and GitHub • First collection
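One concrete way to support the review criterion "data files match the descriptions in the Data Descriptor" is to list a checksum for each deposited file and verify it. The slides do not say how integrity is checked; the sketch below assumes a hypothetical manifest mapping file names to SHA-256 digests and reports files that are missing or whose contents differ. `check_manifest` and `sha256_of` are illustrative names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(root: Path, manifest: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare files under `root` with a {relative name: SHA-256 hex} manifest.

    Returns (missing, mismatched), each sorted by file name.
    """
    missing: List[str] = []
    mismatched: List[str] = []
    for name, expected in sorted(manifest.items()):
        path = root / name
        if not path.is_file():
            missing.append(name)
        elif sha256_of(path) != expected.lower():
            mismatched.append(name)
    return missing, mismatched
```

In practice the manifest would be generated by the authors at deposition time; a referee or curator re-running the check against the repository copy then confirms that the described files are the ones actually archived.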
  • 43.
  • 44. Supported by: Advisory Panel including senior researchers, funders, librarians and curators Michael Huerta ● National Institutes of Health, USA ● Mark Thorley ● Natural Environment Research Council, UK ● Patricia Cruse ● University of California, USA ● Susan Gregurick ● Office of Biological and Environmental Research, Department of Energy, USA ● Ioannis Xenarios ● Swiss Institute of Bioinformatics, Switzerland ● Chris Bowler ● IBENS, France ● Mark Forster ● Syngenta, UK ● Anthony Rowe ● Johnson & Johnson, USA ● Stephen Chanock ● National Cancer Institute, USA ● Weida Tong ● National Center for Toxicological Research, FDA, USA ● Albert J. R. Heck ● Utrecht University, The Netherlands ● Johanna McEntyre ● EMBL-EBI, European Bioinformatics Institute, UK ● Simon Hodson ● CODATA, France ● Joseph R. Ecker ● Howard Hughes Medical Institute & Salk Institute, USA ● Stephen Friend ● Sage Bionetworks, USA ● Jessica Tenenbaum ● Duke Translational Medicine Institute, USA ● Anne-Claude Gavin ● EMBL, Germany ● David Carr ● Wellcome Trust, UK ● Wolfram Horstmann ● Göttingen State and University Library, Germany ● Piero Carninci ● RIKEN Omics Science Center, Japan ● Pascale Gaudet ● Swiss Institute of Bioinformatics, Switzerland ● Judith A. Blake ● The Jackson Laboratory, USA ● Richard H. Scheuermann ● J. Craig Venter Institute, USA ● Caroline Shamu ● Harvard Medical School, USA Susanna-Assunta Sansone Honorary Academic Editor (University of Oxford, UK) Andrew L Hufton Managing Editor Varsha Khodiyar Editorial Curator Iain Hrynaszkiewicz Publisher An open access, peer-reviewed publication for descriptions of scientifically valuable datasets. Launched May 2014