Making your data good enough for sharing.

Written and presented by Wolfgang Müller (HITS) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.

On Golf and Data
Wolfgang Müller
Making your data good enough for sharing

Challenge of data sharing
• Most data never gets shared
– Wrong experimental method
– Hidden parameter discovered
– Faulty experiment
• How to prepare data in this situation?
– Don’t want to waste time
– Want to be prepared if we share
• Propose useful way forward

80-20 rule
Voltaire: “The best is the enemy of the good”
80-20 rule: Often you can get 80% of the benefits using 20% of the effort.
[Figure: golf analogy with the stages Tee-off, Approach and Putting; label: “Biggest approach in one shot”]

What to share?
• Raw data (sometimes)
• Condensed, interpreted data
• Metadata: Data about the data
– Conditions of the measurements
– Information about the samples
• What was sampled?
• How was it prepared?
• How was it treated after sampling?

Levels of detail
• Action guidelines (e.g. SOP)
• Structure guidelines (e.g. F1000 data preparation guidelines)
• Semantics guidelines (metadata + content, e.g. some MIBBIs)
• File format standards (e.g. ISA-TAB, SBML)
• Ontologies + vocabularies (e.g. ChEBI)

Standardisation scales
• Self
• Group
• Collaborative project
• Field scale
Increased usability for others

Self-standardisation
• Store same things in same structure
– Test question: “Does Excel cell (e.g.) A2 have the same meaning in all files about the same experiment type?”
• Name same things the same way
– Test question: “Does ‘gl’ mean exactly the same in all occurrences?”
• Uniquely identify the things that you reference
Benefit:
Automatic adaptation of your data becomes much easier
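
The first test question can be checked automatically. The following is a minimal sketch, assuming the files are CSV exports of the spreadsheets and that "same structure" means "same header row"; the file names and column names are hypothetical.

```python
import csv
import io


def header_of(csv_text):
    """Return the first row (the header) of a CSV document as a list of strings."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def inconsistent_files(files):
    """Given {file name: CSV text}, return the names whose header differs from the first file's."""
    names = list(files)
    if not names:
        return []
    reference = header_of(files[names[0]])
    return [name for name in names[1:] if header_of(files[name]) != reference]


# Hypothetical data: two runs of the same experiment type.
# The second file abbreviates a column name to "gl".
files = {
    "run_01.csv": "sample_id,glucose_mM,time_min\nS1,5.0,0\n",
    "run_02.csv": "sample_id,gl,time_min\nS2,4.8,0\n",
}

print(inconsistent_files(files))  # ['run_02.csv']
```

A check like this only compares names and positions; whether ‘gl’ really means the same thing everywhere still needs a human decision.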

Identify uniquely
(e.g. McMurry et al. preprint)
1. If you create identifiers, do not DIY (Do Identifiers by Yourself)
2. Help identifiers travel well: don’t let them leave home without a Prefix and a Namespace
3. Make Local Resource Identifiers rugged to real-world use
4. Make the full URI simple and durable
5. Carefully consider whether to embed meaning
6. Make the full URI and CURIE clear and easy to find
7. Implement a version management policy
8. Manage complex lifecycles without deletion
9. Document the identifiers you issue and use
10. Reference responsibly and rely on full URIs
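
Rules 2 and 10 can be illustrated with a small sketch that expands a compact identifier (CURIE, `prefix:local_id`) into a full URI. The prefix map and the function name `expand_curie` are assumptions made for this example, not part of the slides or of any particular library; the ChEBI namespace shown is the OBO PURL form.

```python
# Hypothetical prefix map for illustration: each prefix is bound to one namespace URI.
PREFIXES = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'PREFIX:local_id' into a full URI using a prefix-to-namespace map."""
    prefix, separator, local_id = curie.partition(":")
    if not separator or not prefix or not local_id:
        # A bare local identifier has "left home" without its prefix.
        raise ValueError(f"not a CURIE (expected 'prefix:local_id'): {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local_id


print(expand_curie("CHEBI:15377"))  # http://purl.obolibrary.org/obo/CHEBI_15377
```

Storing the prefix map alongside the data lets a later reader resolve every reference, even if the data only contains the short form.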

Standardisation within group or project
Same as before, but in addition:
• Needs agreeing on how to do things the same way
• Needs looking into standards for your domain
– Inspiration how to proceed
– Clear insight into migration paths

e.g. F1000 data preparation guidelines
• Give each column a descriptive heading
• Use a single header row
• Ensure you have used the first cell, i.e. A1
• Include Title & Legend for each spreadsheet
• Save each data file with a telling name
• Submit each table as a separate file
• Submit each worksheet as a separate file
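
Some of these guidelines lend themselves to a quick automated check before submission. The sketch below assumes the table is available as CSV text and covers only what can be tested mechanically (cell A1 used, every column has a heading, headings are unique); whether a heading is descriptive still needs a human reader. The function name `check_table` is hypothetical.

```python
import csv
import io


def check_table(csv_text):
    """Return a list of problems found in the header row of a CSV table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0] if rows else []
    problems = []

    # Guideline: ensure the first cell (A1) is used.
    if not header or not header[0].strip():
        problems.append("cell A1 is empty")

    # Guideline: give each column a heading (column 1 is covered by the A1 check).
    for index, name in enumerate(header[1:], start=2):
        if not name.strip():
            problems.append(f"column {index} has no heading")

    # Assumption beyond the slide: headings should also be unique.
    names = [name.strip() for name in header if name.strip()]
    if len(names) != len(set(names)):
        problems.append("duplicate column headings")

    return problems


print(check_table("sample_id,glucose_mM\nS1,5.0\n"))  # []
print(check_table(",glucose_mM,\nS1,5.0,x\n"))
```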

Systems Biology Markup Language
• XML-based format
– Levels and Versions
– Packages
• Model of relations within SBML files as UML
• Library implementations
• MIRIAM guidelines for proper annotation of SBML files
• MIRIAM resources, MIRIAM resolver for providing identifiers and links
• ...
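
To make "XML-based format" and "Levels and Versions" concrete, here is a minimal sketch that reads the level, version and species identifiers from an SBML document using only Python's standard library. The embedded document is a hypothetical, stripped-down fragment written for this example (it is not claimed to be a complete valid model); for real work a dedicated library implementation would normally be used instead of raw XML parsing.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 core.
SBML_L3V1 = "http://www.sbml.org/sbml/level3/version1/core"

# Hypothetical, stripped-down SBML fragment for illustration only.
DOC = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_model">
    <listOfSpecies>
      <species id="glucose" compartment="cell"/>
      <species id="g6p" compartment="cell"/>
    </listOfSpecies>
  </model>
</sbml>
"""


def summarize(sbml_text):
    """Return level, version and species ids of an SBML Level 3 Version 1 document."""
    root = ET.fromstring(sbml_text.strip())
    ns = {"sbml": SBML_L3V1}
    species = root.findall("sbml:model/sbml:listOfSpecies/sbml:species", ns)
    return {
        "level": root.get("level"),
        "version": root.get("version"),
        "species": [s.get("id") for s in species],
    }


print(summarize(DOC))
# {'level': '3', 'version': '1', 'species': ['glucose', 'g6p']}
```

Because the namespace encodes level and version, the same code would find no species in a document of a different level; that is one reason migration paths between versions matter.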


Data Management for Graduate Students

Rebekah Cummings

This document provides an overview of data management best practices for graduate students presented in a workshop. It discusses what constitutes research data, the importance of managing data, how to create a data management plan, file naming conventions, metadata, data storage and backup strategies, and archiving options. The workshop covers topics like using a structured folder system, creating codebooks and documentation to describe data, and ensuring long-term access and preservation of research data. University librarians are available to help students with all aspects of responsible data management.

Data management (newest version)

Graça Gabriel

This document provides guidance on managing research data. It discusses planning ahead by considering data needs, formats, volume and ethics. It also covers organizing data through file naming, metadata, references, remote access and safekeeping. Preserving data involves determining what to keep/delete and using long-term storage such as repositories. Reasons for sharing data include scientific integrity, funding mandates and increasing impact, while reasons for not sharing include financial or sensitive personal information.

Documentation and Metdata - VA DM Bootcamp

Sherry Lake

This document discusses documentation and metadata for research data. It begins with an overview of why documentation is important at different stages of the research data lifecycle from collection through archiving. Key elements to document include how the data was created, its content and structure, who created and maintains it, and how it can be accessed and cited. The document then discusses common documentation formats like readmes, data dictionaries, and codebooks. It also introduces metadata as structured information that describes resources and explains common metadata standards and tools for creating structured metadata files. Exercises guide creating documentation in these formats for a weather dataset example.

Managing Your Research Data

Kristin Briney

Introduction to Research Data Management for postgraduate students

Marieke Guy

The document provides an introduction to research data management for postgraduate students, outlining what research data is, the research process, what research data management involves and why it is important, and how students can start thinking about good research data management practices. It discusses defining and organizing data, storage and security, and maintaining findable and understandable data throughout the research lifecycle. The goal is to explain the importance of research data management and the roles students play in effective data management.

Research Data Mangagement Essentials, 5th July 2017

Research Data Leeds

FAIR BioData Management

Ulrike Wittig

The state of global research data initiatives: observations from a life on th...

Projeto RCAAP

The document discusses research data management and provides guidance on best practices. It defines research data management as the active management of data over its lifecycle. It recommends writing a data management plan to document how data will be created, stored, shared, and preserved. It also provides tips for making data accessible and reusable through use of metadata standards, documentation, open licensing, and depositing data in repositories with persistent identifiers. The goal is to help researchers manage and share their data effectively to increase access and reuse.

My Dissertation Journey

jlposton

The document provides guidance for completing a dissertation journey, including: - Deciding between a qualitative or quantitative study approach, each with their own advantages and disadvantages. - Creating a proposal presentation for the dissertation committee on chapters 1-3 that includes frameworks, definitions, variables, and methodology. - Working with the research ethics committee and following submission guidelines. - Undergoing multiple revisions of all chapters based on committee feedback.

Similar to Making your data good enough for sharing. (20)

Best practices data collection

Best practices data management

Planning for Research Data Management: 26th January 2016

Take control of your PhD journey: Manage your research data according to best...

Data Archiving and Sharing

Planning for Research Data Management

Data presentation and transfer

Best Practice in Data Management and Sharing

File_Organization_112014

20170222 ku-librarians勉強会 #211 :海外研修報告：英国大学図書館を北から南へ巡る旅

Managing data throughout the research lifecycle

Data Management for Graduate Students

Data management (newest version)

Documentation and Metdata - VA DM Bootcamp

Managing Your Research Data

Introduction to Research Data Management for postgraduate students

Research Data Mangagement Essentials, 5th July 2017

FAIR BioData Management

The state of global research data initiatives: observations from a life on th...

My Dissertation Journey


Making your data good enough for sharing.

1. On Golf and Data
Wolfgang Müller
Making your data good enough for sharing

2. Challenge of data sharing
• Most data never gets shared
– Wrong experimental method
– Hidden parameter discovered
– Faulty experiment
• How to prepare data in this situation?
– Don't want to waste time
– Want to be prepared if we share
• Propose useful way forward

4. 80-20 rule
Voltaire: "The best is the enemy of the good"
80-20 rule: Often you can get 80% of the benefits using 20% of the effort.
[Figure: golf hole with the stages Tee-off, Approach and Putting; annotation: "Biggest approach in one shot"]

5. What to share?
• Raw data (sometimes)
• Condensed, interpreted data
• Metadata: Data about the data
– Conditions of the measurements
– Information about the samples
  • What was sampled?
  • How was it prepared?
  • How was it treated after sampling?

6. Levels of detail
• Action guidelines (e.g. SOP)
• Structure guidelines (e.g. F1000 data preparation guidelines)
• Semantics guidelines (metadata + content, e.g. some MIBBIs)
• File format standards (e.g. ISA-TAB, SBML)
• Ontologies + vocabularies (e.g. ChEBI)

7. Standardisation scales
• Self
• Group
• Collaborative project
• Field scale
Increased usability for others

8. Self-standardisation
• Store same things in same structure
– Test question: "Does Excel cell (e.g.) A2 have the same meaning in all files about the same experiment type?"
• Name same things the same way
– Test question: "Does 'gl' mean exactly the same in all occurrences?"
• Uniquely identify the things that you reference.
Benefit: Automatic adaptation of your data becomes much easier
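The first test question on this slide can be checked mechanically. The following is a minimal sketch, assuming the files are CSV exports that carry their column names in the first row; the file names, column names and the helper functions `header_of` and `inconsistent_files` are hypothetical and not part of the talk.

```python
import csv
import io


def header_of(csv_text):
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def inconsistent_files(files):
    """Return {file name: header} for every file whose header differs
    from the header of the first file in the mapping."""
    names = list(files)
    if not names:
        return {}
    reference = header_of(files[names[0]])
    return {
        name: header_of(files[name])
        for name in names[1:]
        if header_of(files[name]) != reference
    }


# Hypothetical example data for three runs of the same experiment type.
files = {
    "run1.csv": "sample,time_min,gl\nA,0,1.2\n",
    "run2.csv": "sample,time_min,gl\nB,0,1.4\n",
    "run3.csv": "sample,glucose,time_min\nC,1.1,0\n",
}

print(inconsistent_files(files))
```

Here only `run3.csv` is reported, because it renames one column and changes the column order. A check like this only compares structure; whether 'gl' means the same thing in every file still needs a human answer.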

9. Identify uniquely (e.g. McMurry et al. preprint)
1. If you create identifiers, do not DIY (Do Identifiers by Yourself)
2. Help identifiers travel well: don't let them leave home without a Prefix and a Namespace
3. Make Local Resource Identifiers rugged to real-world use
4. Make the full URI simple and durable
5. Carefully consider whether to embed meaning
6. Make the full URI and CURIE clear and easy to find
7. Implement a version management policy
8. Manage complex lifecycles without deletion
9. Document the identifiers you issue and use
10. Reference responsibly and rely on full URIs
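Points 2 and 6 rest on the relationship between a compact identifier (CURIE, written prefix:local-id) and its full URI. A minimal sketch of that expansion follows; the prefix table, the choice of the OBO PURL pattern for ChEBI and the function name `expand_curie` are assumptions made for illustration, not something the slides prescribe.

```python
# Assumed prefix table: maps a CURIE prefix to the namespace it abbreviates.
PREFIXES = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a CURIE such as 'CHEBI:15377' into a full URI.

    Raises ValueError if the string has no prefix or no local identifier,
    and KeyError if the prefix is not in the table.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE (expected prefix:local-id): {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local_id


print(expand_curie("CHEBI:15377"))
```

A bare local identifier such as "15377" is rejected rather than guessed at, which is the point of rule 2: without its prefix, the identifier cannot be resolved once it leaves its home database.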

10. Standardisation within group or project
Same as before, but in addition:
• Needs agreeing on how to do things the same way
• Needs looking into standards for your domain
– Inspiration how to proceed
– Clear insight into migration paths

11. e.g. F1000 data preparation guidelines
• Give each column a descriptive heading
• Use a single header row
• Ensure you have used the first cell, i.e. A1
• Include Title & Legend for each spreadsheet
• Save each data file with a telling name
• Submit each table as a separate file
• Submit each work sheet as a separate file
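Some of these guidelines can be checked automatically before a table is shared. The sketch below assumes the table has been exported as CSV and covers only three checks: cell A1 is used, every column has a heading, and headings are unique. The function name `check_table` and the wording of the messages are hypothetical; whether a heading is descriptive, or whether there is exactly one header row, still needs a human reader.

```python
import csv
import io


def check_table(csv_text):
    """Return a list of guideline problems found in a CSV table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = [cell.strip() for cell in rows[0]] if rows else []
    problems = []

    # "Ensure you have used the first cell, i.e. A1"
    if not header or not header[0]:
        problems.append("cell A1 is empty")

    # "Give each column a descriptive heading": at least require a heading.
    if any(not heading for heading in header[1:]):
        problems.append("a column has no heading")

    # Repeated headings make columns ambiguous for automatic processing.
    named = [heading for heading in header if heading]
    if len(set(named)) != len(named):
        problems.append("duplicate column headings")

    return problems


print(check_table("sample,time_min,gl\nA,0,1.2\n"))
print(check_table(",time_min,time_min\nA,0,1.2\n"))
```

The first call reports no problems; the second reports the empty A1 cell and the repeated heading.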

12. JERM templates

13. Systems Biology Markup Language
• XML-based format
– Levels and Versions
– Packages
• Model of relations within SBML files as UML
• Library implementations
• MIRIAM guidelines for proper annotation of SBML files
• MIRIAM resources, MIRIAM resolver for providing identifiers and links
• ...

14. biosharing.org

15. Modify reproducibly
