The challenge of sharing data well, how publishers can help (Varsha Khodiyar)
Researchers, academic institutes and funders are increasingly recognizing the importance of data sharing for reproducible science. However, it is not always straightforward and clear to researchers as to how best to share data in a useful way. At Springer Nature we are working on several initiatives to help facilitate the sharing of research data in a reusable way, with our overarching goal being to publish research that is robust and reproducible. I will talk about the effort that goes into our flagship data journal, Scientific Data, to facilitate best practices in publication and sharing of research data, and share some of our experiences publishing Challenge datasets. I will also describe some of the newer Research Data Services that are now available to help all researchers (not only Springer Nature authors) to share their data in a useful way.
Transparency and reproducibility in research (Louise Corti)
Talk given at the ESS Summer School: An introduction to using big data in the social sciences, 20-24 July 2020, University of Essex, Colchester, UK.
In the morning we look at publishing and sharing data and the importance of research replication and code sharing, examining what methodological issues peer reviewers might look for in a published paper using big data. An increasing number of journals in the sciences and social sciences expect a high degree of transparency, and knowing how best to publish high quality raw (or processed) data, methodology and code is a useful skill. We show how ‘data papers’ help to elucidate how datasets were constructed, compiled and processed, and help to showcase the value of data beyond the original research.
Presentation by Ruth Wilson on Nature Publishing Group's Scientific Data journal given at the Now and Future of Data Publishing Symposium, 22 May 2013, Oxford, UK
This presentation was provided by Patricia Payton of Proquest during the NISO webinar, Engineering Access Under the Hood, Part Two, held on November 15, 2017.
This presentation was provided by Clara Llebot of Oregon State University, during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
Identifying and tracking research resources using RRIDs: a practical approach (dkNET)
In this presentation, you will learn (1) why you need to use Research Resource Identifiers (RRIDs), (2) what the Resource Identification Initiative is, (3) how dkNET.org supports RRIDs, and (4) what you can do with RRIDs.
This presentation was provided by Carly Strasser of the Chan Zuckerberg Initiative during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
Introduction to the Research Integrity Advisor Data Management Workshop, Bris... (ARDC)
Dr Jacobs' introduction to the RIA Data Management Workshop in Brisbane on 31 March 2017. The RIA Data Management Workshop series is a joint collaboration of the Australian Research Council, the National Health and Medical Research Council, the Australasian Research Management Society and the Australian National Data Service.
Presentation slides on Open Science and research reproducibility. Presented by Gareth Knight (LSHTM Research Data Manager) on 18th September 2018, as part of an Open Science event for LSHTM Week 2018.
Preparing your data for sharing and publishing (Varsha Khodiyar)
Talk given as part of the MRC Cognition and Brain Sciences Unit Open Science Day on 20th November 2018, University of Cambridge (https://www.eventbrite.co.uk/e/open-science-day-at-the-mrc-cbu-tickets-50363553745)
Research Data Sharing and Re-Use: Practical Implications for Data Citation Pr... (SC CTSI at USC and CHLA)
Date: Apr 4, 2018
Speaker: Hyoungjoo Park, PhD candidate, School of Information Studies, University of Wisconsin-Milwaukee, and Dietmar Wolfram, PhD
Overview: It is increasingly common for researchers to make their data freely available. This is often a requirement of funding agencies but also consistent with the principles of open science, according to which all research data should be shared and made available for reuse. Once data is reused, the researchers who have provided access to it should be acknowledged for their contributions, much as authors are recognised for their publications through citation. Hyoungjoo Park and Dietmar Wolfram have studied characteristics of data sharing, reuse, and citation and found that current data citation practices do not yet benefit data sharers, with little or no consistency in their format. More formalised citation practices might encourage more authors to make their data available for reuse.
SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific ... (Susanna-Assunta Sansone)
Part of the SciDataCon14 workshop on "Data Papers and their applications" run by myself and Brian Hole to help attendees understand current data-publishing journals and trends and help them understand the editorial processes on NPG's Scientific Data and Ubiquity's Open Health Data.
Lesson 8 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
INSERM Workshop 246 - Management and reuse of health data: methodological issues: https://ateliersinserm.dakini.fr/en/workshop.246.management.and.reuse.of.health.data.methodological.issues-66-22.php
How can we ensure research data is re-usable? The role of Publishers in Resea... (LEARN Project)
How can we ensure research data is re-usable? The role of Publishers in Research Data Management, by Catriona MacCallum. 2nd LEARN Workshop, Vienna, 6th April 2016
The Dryad Digital Repository: Published data as part of the greater data ecos... (Hilmar Lapp)
Presented at the M3 and Biosharing Special Interest Group (SIG) meeting at ISMB 2010 in Boston, MA: http://gensc.org/gc_wiki/index.php/M3_%26_BioSharing
Digital transformation to enable a FAIR approach for health data science (Varsha Khodiyar)
Invited talk for ConTech Pharma on 1st March 2022
Abstract
Health Data Research UK is the UK’s national institute for health data science, with a mission to unite the UK’s health data to enable discoveries that improve people’s lives. In this talk, Dr Varsha Khodiyar will outline how HDR UK is bringing together disparate health data from all four countries of the United Kingdom, creating the infrastructure to enable discovery of and access to health data, and convening standards-making bodies to improve data linkage and data reuse. Varsha will also discuss how HDR UK is moving beyond the traditional confines of FAIR data to also ensure that data sharing and data use are transparent and ‘fair’ for the patients and lay public who are the subjects of these datasets.
Lessons from the UK: Data access, patient trust & real-world impact with heal... (Varsha Khodiyar)
Slides supporting presentation given at the virtual Beilstein Open Science Symposium in October 2021.
Abstract:
Health Data Research UK’s mission is to unite the UK’s health data to enable discoveries that improve people’s lives. Our 20-year vision is for large scale data and advanced analytics to benefit every patient interaction, clinical trial, biomedical discovery and enhance public health. A key part of HDR UK’s vision is our data portal, the Innovation Gateway. The Gateway facilitates discovery of healthcare data and simplifies data request procedures across multiple data custodians. The Gateway contains metadata on a variety of datasets, including those related to COVID-19, cardiovascular, maternal health, emergency care, primary care, secondary care, acute care, palliative care, biobanks, research cohorts and deeply phenotyped patient cohorts.
From the outset HDR UK has sought the voices, views and experiences of patient and lay-public groups to ensure there is transparency and clear public benefit in the use of the UK’s health data. Patient and public involvement is key to making the Gateway accessible, transparent and to ensure public confidence in research access to health data. The importance of public outreach combined with providing research access to data is illustrated with HDR UK’s contribution to the UK’s coronavirus pandemic response. HDR UK was tasked by the UK’s Chief Scientific Office to build and facilitate the infrastructure to support the National Core Studies, providing key insights on the evolving situation to UK policy makers during the course of the pandemic.
In this talk, I will show how HDR UK is enabling open science by facilitating the discovery of health data, and simplifying the process of requesting access to multiple datasets. I’ll discuss HDR UK’s approach to embedding transparency on research data usage for patients and public, and summarise some of the key ways in which HDR UK has contributed to the coronavirus pandemic.
The information in this slide deck was presented at the Covid Crisis in India - Information & Appeal on Sunday 23rd May 2021.
If you find the information in this slide deck useful, please donate to https://justgiving.com/fundraising/covidcrisisinindia
Data citation and sharing during article publication (Varsha Khodiyar)
Deck presented to CHORUS forum on 21st Jan 2021, as part of panel on Data Citations & Sharing (https://www.chorusaccess.org/events/chorus-forum-new-connections/)
What role can publishers play in the open data ecosystem? (Varsha Khodiyar)
Presentation at session 3 of the NIH workshop 'Role of Generalist Repositories to Enhance Data Discoverability and Reuse' on Feb 11th, at the NIH Main Campus.
New approaches to data management: supporting FAIR data sharing at Springer N... (Varsha Khodiyar)
Presentation given at Biocuration 2019 Session 5 (Data standards and ontologies: Making data FAIR)
Abstract:
Since 2016, academic publishers including Springer Nature, Elsevier and Taylor & Francis have been providing standard research data policies to journal authors, reflecting key aspects of the FAIR Principles’ practical applications: sharing data in repositories, using persistent identifiers and citing data appropriately. In spite of the rise of FAIR and good data management practice, recent surveys found that nearly 60% of researchers had never heard of the FAIR Principles, and 46% are not sure how to organise their data in a presentable and useful way. In this presentation we will analyse the results of a white paper which assessed the key challenges faced by researchers in sharing their data, and discuss current initiatives and approaches to support researchers to adopt good data sharing practice.
These include the roll-out of research data policies since 2016, as well as the launch of a Helpdesk service which has provided support to authors and allowed the research data team to capture more granular information on the challenges they face in sharing their data. We will also discuss the development of a third-party curation service which assists authors in depositing their data into appropriate repositories, and drafting data availability statements.
Finally we will assess the impacts of some of these interventions, including an analysis of data availability statements and an overview of the methods authors are currently using to share their data, and how these align with FAIR.
The value of data curation as part of the publishing process (Varsha Khodiyar)
Presentation given at Biocuration 2019 Session 5 (Interacting with the Research Community)
Abstract:
Journals and publishers have an important role to play in the drive to increase the reproducibility of published science. Since its launch in 2014, the Nature Research journal Scientific Data has established a reputation for publishing data papers (‘Data Descriptors’) that are highly reusable, as evidenced by a strong citation record. One of the ways in which Scientific Data ensures maximum reusability of published data is via the in-house data curation workflow applied to every Data Descriptor. In 2017, Springer Nature launched its Research Data Support (RDS) service to provide data curation expertise to researchers publishing at other Springer Nature journals.
During curation at Scientific Data and RDS, our data editors familiarise themselves with the related manuscript and perform a thorough check of each data archive. This ensures the descriptions in the manuscript match the metadata and data at the data repositories. The curation process facilitates the identification of any discrepancies between the manuscript text and the information held at the data repository.
Over the last year, the curation team has been recording the types of discrepancies rectified as a direct result of our curation process. At Scientific Data, approximately 10% of the discrepancies the team finds are significant enough to potentially have warranted a formal correction had the issue not been resolved prior to publication.
In this presentation we give an overview of our observed outcomes from embedding data curation within the publishing process. We describe how we are monitoring the value of our curation work, and show examples of the types of discrepancy most commonly identified through curation at Scientific Data and RDS.
Facilitating good research data management practice as part of scholarly publ... (Varsha Khodiyar)
Presentation given to the SciDataCon #IDW2018 session: Democratising Data Publishing: A Global Perspective, on Tuesday 6th November 2018, Gaborone, Botswana
Practical challenges for researchers in data sharing (Varsha Khodiyar)
Presentation given at the Research Data Alliance Plenary 12 session: IG Open Questionnaire for Research Data Sharing Survey, on Tuesday 6th November 2018, Gaborone, Botswana
Update from Data policy standardisation and implementation IG (Varsha Khodiyar)
Update given to the Research Data Alliance Plenary 12 joint meeting session: WG FAIRSharing Registry and Data Policy Standardisation and Implementation IG, on Monday 5th November 2018, Gaborone, Botswana
Data Publishing and Institutional Repositories (Varsha Khodiyar)
Slides presented at the Force16 panel discussion on 18th April 2016 "Libraries united in opening new scholarly platforms" https://www.force11.org/meetings/force2016/program/agenda/concurrent-session-libraries-united-opening-new-scholarly
Presentation given at Open Science question and answer session hosted by the Institute for Quantitative Social Science (IQSS), and the Office for Scholarly Communication (OSC) at Harvard University, on July 16th 2014.
Slides shown to BOSC2014 (Bioinformatics Open Source Conference 2014) attendees as an introduction to the open science journal F1000Research, prior to a panel discussion on reproducibility.
Data sharing as part of the research workflow
1. Data sharing as part of the research workflow
Varsha Khodiyar, PhD
Data Curation Editor, Scientific Data
Nature Publishing Group
@varsha_khodiyar
@scientificdata
Perspective from Scientific Data
Data Perspective beyond Alliances, 3rd March 2016
2. Why the push to share data?
Research conduct
Publication bias – what is submitted
Experimental design
Statistics
Lab supervision and training
Research reporting and sharing
Gels, microscopy images
Statistical reporting
Methods description
Data deposition and availability
3. Generating research data is expensive
Just 18.1% of NIH grant applications funded in 2014*
• Hours spent writing grants?
• Hours spent reviewing grants?
Resources are finite/expensive
• Modified animals
• Specialized reagents
Time and effort taken in the laboratory to generate good, valid data
* report.nih.gov/success_rates/Success_ByIC.cfm
4. Data needs to be…
• Discoverable: need to know it’s there
• Accessible: must be able to get to the data
• Usable: require sufficient information about how the data was generated
• Persistent: historical data access as part of the scientific record, as well as for new research
• Reliable: data provenance informs data reuse decisions
Joint Declaration of Data Citation Principles. www.force11.org/group/joint-declaration-data-citation-principles-final
Achieving human and machine accessibility of cited data in scholarly publications. Starr et al. PeerJ Computer Science (2015). doi:10.7717/peerj-cs.1
Making data count. Kratz & Strasser. Sci. Data (2015). doi:10.1038/sdata.2015.39
The FAIR guiding principles for scientific data management and stewardship. Wilkinson et al. Sci. Data (in press)
5. Researchers already share data
• Most researchers are sharing data, and using the data of others
• Direct contact between researchers (on request) is a common way of sharing data
• Repositories are the second most common method of sharing
Kratz and Strasser (2015) doi: 10.1371/journal.pone.0117619
6. But…
Sharing of data upon request from published articles
• relies heavily on trust
• when stored informally, disappears at a rate of ~17% per year (Vines et al. 2014; doi: 10.1016/j.cub.2013.11.014)
Data shared in a repository
• often not reusable due to insufficient context
• may not be possible to determine reliability (peer review?)
• may not be easily findable, if not referenced in a scholarly article
• no scholarly credit for data producers
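As a rough illustration of what a loss rate of ~17% per year implies, the sketch below compounds that rate over several years. It assumes a constant proportional loss each year, which is a simplification made here for the arithmetic rather than the model used in the cited study, and the function name is hypothetical.

```python
def fraction_remaining(years: int, annual_loss: float = 0.17) -> float:
    """Fraction of informally stored datasets still obtainable after `years`.

    Assumes the same proportion (annual_loss) of the remaining datasets
    is lost every year; this is an illustrative assumption only.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 - annual_loss) ** years


if __name__ == "__main__":
    # Print the compounded effect at a few time horizons.
    for years in (1, 5, 10, 20):
        print(f"after {years:2d} years: {fraction_remaining(years):.1%} remaining")
```

Under this assumption the loss compounds, so well under half of informally stored datasets would still be obtainable five years after publication.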
7. Data papers and journals
• Ensure formal storage in repository
• Allow space for authors to include sufficient context for reuse
• Peer reviewers often specifically requested to comment on data archive reusability
• Data papers are formal works, giving scholarly credit to data producers
• Formal data citations enabling data discovery via bibliographic indexes that researchers are used to using
8. Data journals and multidisciplinary research
Cross-domain data sharing is vital for solving the most pressing world issues:
• Public health (social science, epidemiology & molecular biology)
• Resource management & sustainability (energy research, policy, ecology & climate science)
Differences between researchers in vocabulary and in how reliability is expressed mean that clear descriptions of data become even more essential for cross-domain data sharing.
Multidisciplinary data journals (e.g. Data Science Journal, Scientific Data):
• provide a data sharing outlet to researchers in all domains
• help datasets cross domain boundaries, so data is more visible and searchable, i.e. less siloing
9. Data reuse by the research community
“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”
Professor Daniele Marinazzo
10. Data reuse by the non-research community
http://www.nytimes.com/interactive/2014/12/30/science/history-of-ebola-in-24-outbreaks.html
11. Increasing the discoverability of data
• Is data truly discoverable by researchers outside the original authors’ domain?
• Too many papers to read in each person’s own field.
• Could increasing the machine accessibility of data result in increased data reuse?
12. Data Descriptors have human and machine readable components
• Human readable representation of study, i.e. article (HTML & PDF)
• Machine readable representation of study, i.e. metadata
13. Metadata at Scientific Data
• We capture metadata about the data being described in each Data Descriptor
• The manuscript captures human readable metadata needed for data reuse
• The curated metadata records capture machine readable metadata needed for machine based data discovery
14. ISA format for machine readable metadata
• Study workflow
• Key sample characteristics needed for data discovery
• Relates samples to data files
• Shows location of dataset
• Uses controlled vocabularies and ontologies (where possible)
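To make the list above more concrete, here is a minimal sketch of a machine readable study record that links samples to data files and records where the dataset lives. It is not the actual ISA specification: the class names, field names, repository name, identifier and file name are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    name: str
    # Key characteristics; values would ideally come from a controlled vocabulary.
    characteristics: dict = field(default_factory=dict)
    # Data files derived from this sample.
    data_files: list = field(default_factory=list)


@dataclass
class StudyRecord:
    title: str
    workflow: list      # ordered steps of the study workflow
    repository: str     # where the dataset is deposited
    dataset_id: str     # identifier of the dataset at that repository
    samples: list = field(default_factory=list)

    def files_for(self, sample_name: str) -> list:
        """Return the data files linked to the named sample (empty if unknown)."""
        for sample in self.samples:
            if sample.name == sample_name:
                return list(sample.data_files)
        return []


# Hypothetical example record; none of these values refer to a real dataset.
record = StudyRecord(
    title="Example imaging study",
    workflow=["sample collection", "imaging", "preprocessing"],
    repository="ExampleRepository",
    dataset_id="example-0001",
    samples=[
        Sample(
            name="sample-1",
            characteristics={"organism": "Homo sapiens"},
            data_files=["sample-1_scan.nii.gz"],
        )
    ],
)

print(record.files_for("sample-1"))
```

Keeping the sample-to-file links explicit is what would let a machine answer a question such as "which files belong to this sample?" without reading the article text.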
15. Metadata for data discovery
Search by:
• Data Repositories
• Experiment design
• Measurements made
• Technologies used
• Factor types
• Sample Characteristics
• Organism
• Environment types
• Geographic locations
scientificdata.isa-explorer.org
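The faceted search listed above can be sketched as a simple filter over metadata records. This is an illustration only, not how the explorer site is implemented: the records, field names and the `search` helper are hypothetical, and matching is by exact value.

```python
# Hypothetical metadata records; field names loosely mirror the facets listed above.
records = [
    {"title": "Study A", "repository": "RepoOne", "organism": "Homo sapiens",
     "technology": "MRI", "location": "Belgium"},
    {"title": "Study B", "repository": "RepoTwo", "organism": "Mus musculus",
     "technology": "RNA sequencing", "location": "Japan"},
    {"title": "Study C", "repository": "RepoOne", "organism": "Homo sapiens",
     "technology": "RNA sequencing", "location": "Botswana"},
]


def search(items, **criteria):
    """Return the records whose fields equal every requested facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in criteria.items())
    ]


# Combine two facets: organism and technology.
for hit in search(records, organism="Homo sapiens", technology="RNA sequencing"):
    print(hit["title"])
```

Exact matching of this kind only works when records use shared terms for the same concept, which is why controlled vocabularies and ontologies matter for machine based discovery.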
16. Data as part of the publication workflow
Data Descriptors can be submitted and published at any point in the research workflow:
• Before the analysis has been published
• Publication alongside analysis article
• After data analysis has been published
• Authors not intending to analyse data
17. Data as part of the research workflow?
Papers are usually written after analyses, so key details can be forgotten
• Ideally metadata would be captured during the data generation process
• Takes time and effort to capture adequate metadata of sufficient quality for data reuse
Machine readable metadata
• Metadata format needs to be decided prospectively
• Researchers require professional expertise and guidance to use ontologies (essential for machine readability and discovery)
How to ensure data generators are able to capture metadata easily and in sufficient detail for reuse?
18. Building infrastructure to promote data sharing as part of the research workflow
Discoverable
• Machine based data discovery
• Implement data citations
• Use community ontologies
Accessible & Persistent
• Encourage use of repositories
• Use persistent identifiers for data
Usable
• Metadata capture during data generation process
• Encourage use of minimal reporting standards
Reliable
• Encourage peer reviewers to evaluate data archive (structure, format) alongside the article
Researcher incentives
• Recognise data as a first class scholarly work
• Provide tools for data visualization and discovery
19. Scientific Data at RDA
Working groups
• Publishing Data Workflows (co-chair)
• BioSharing Registry (Susanna Sansone is co-chair)
Interest groups
• Publishing Data
• Data Fabric
• Data in Context
• Metadata
• Certification of Digital Repositories
20. Visit nature.com/sdata
Email scientificdata@nature.com
Tweet @ScientificData
Honorary Academic Editor
Susanna-Assunta Sansone
Managing Editor
Andrew L. Hufton
Data Curation Editor
Varsha K. Khodiyar
Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators
Supported by