Workshop - finding and accessing data - Cambridge August 22 2016 (Fiona Nielsen)
Finding and accessing human genomic data for research
University of Cambridge, United Kingdom | Seminar Room G
Monday, 22 August 2016 from 10:00 to 12:00 (BST)
Charlotte, Nadia and Fiona presented an overview of data sources around the world where you can find genomics data for your research and gave examples of the data access application for dbGaP and EGA with specific details relevant for University of Cambridge researchers.
The webinar discussed FAIRDOM services that can help applicants to the ERACoBioTech call with their data management plans and requirements. FAIRDOM offers webinars on developing data management plans, and their platform and tools can help with organizing, storing, sharing, and publishing research data and models in a FAIR manner by utilizing metadata standards. Different levels of support are available, from general community resources through their hub, to premium customized support for individual projects. Consortia can include FAIRDOM as a subcontractor within the guidelines of the ERACoBioTech call.
A Big Picture in Research Data Management (Carole Goble)
A personal view of the big picture in Research Data Management, given at the GFBio - de.NBI Summer School 2018 "Riding the Data Life Cycle!", Braunschweig Integrated Centre of Systems Biology (BRICS), 03-07 September 2018.
The state of global research data initiatives: observations from a life on th... (Projeto RCAAP)
The document discusses research data management and provides guidance on best practices. It defines research data management as the active management of data over its lifecycle. It recommends writing a data management plan to document how data will be created, stored, shared, and preserved. It also provides tips for making data accessible and reusable through use of metadata standards, documentation, open licensing, and depositing data in repositories with persistent identifiers. The goal is to help researchers manage and share their data effectively to increase access and reuse.
Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.
Based on the article:
Carole Goble, "Better Software, Better Research", IEEE Internet Computing, vol. 18, no. 5 (Sept.-Oct. 2014), pp. 4-8, IEEE Computer Society.
http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88
http://www.software.ac.uk/resources/publications/better-software-better-research
FAIRy stories: tales from building the FAIR Research Commons (Carole Goble)
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Title: FAIRy stories: tales from building the FAIR Research Commons
Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any kind of Research Object are a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down, like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVeL), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
Research Data (and Software) Management at Imperial: (Everything you need to ... (Sarah Anna Stewart)
A presentation on research data management tools, workflows and best practices at Imperial College London with a focus on software management. Presented at the 2017 session of the HPC Summer School (Dept. of Computing).
Data Access & Storage @ UWA - UWA Research Week September 2017Katina Toufexis
The document discusses research data management services provided by the University of Western Australia (UWA) Library. It notes that funders like the Australian Research Council (ARC) and National Health and Medical Research Council (NHMRC) require research data to be managed and shared. UWA policies also require research data related to publications to be available through the UWA Research Repository. The document provides guidance on creating data management plans, using appropriate licenses, and securely storing data long-term using the Institutional Research Data Storage (IRDS) system rather than third-party cloud services like Dropbox.
As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adoption of the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.
S. Venkataraman (DCC) talks about the basics of Research Data Management and how to apply this when creating or reviewing a Data Management Plan (DMP). He discusses data formats and metadata standards, persistent identifiers, licensing, controlled vocabularies and data repositories.
Link to: dcc.ac.uk/resources
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ... (Amanda Whitmire)
A workshop as part of the International Digital Curation Conference 2016 on DMP development and support. This presentation demonstrates how we can use data management plans as a source of information to better understand researcher data stewardship practices and how to support them. Be sure to see the slide notes to better understand the presentation (most slides are just photos/icons).
Data repositories -- Xiamen University 2012 06-08 (Jian Qin)
The document discusses data repositories and services. It begins by defining what a data repository is, noting that it is a logical and sometimes physical partitioning of data where multiple databases reside. It then outlines some key aspects of data repositories, including technical features like standards, software, and staffing requirements. The document also discusses functions of repositories like content management, archiving, dissemination and system maintenance. It provides examples of institutional repositories and data repositories, highlighting characteristics of each. Finally, it provides a case study on Dryad, an international repository for data and publications in biosciences.
FAIR Workflows and Research Objects get a Workout (Carole Goble)
So, you want to build a pan-national digital space for bioscience data and methods? That works with a bunch of pre-existing data repositories and processing platforms? So you can share FAIR workflows and move them between services? Package them up with data and other stuff (or just package up data for that matter)? How? WorkflowHub (https://workflowhub.eu) and RO-Crate Research Objects (https://www.researchobject.org/ro-crate) that’s how! A step towards FAIR Digital Objects gets a workout.
Presented at DataVerse Community Meeting 2021
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ... (Tom Plasterer)
As scientists in the life sciences we are trained to pursue singular goals around a publication, a validated target or a drug submission. Our failure rates are exceedingly high, especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as time for patients depending upon us for the next breakthrough.
Edge Informatics is an approach to ameliorate these failures. By using technical and social solutions together, knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions that enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. Taken together, these foster more accurate, timely and inclusive decision-making.
This slideshow was used in a Preparing Your Research Data for the Future course taught in the Medical Sciences Division, University of Oxford, on 2015-06-08. It provides an overview of some key issues, focusing on long-term data management, sharing, and curation.
The first workshop of the series "Services to support FAIR data" took place in Prague during the EOSC-hub week (on April 12, 2019).
Speaker: Kostas Repanas (EC DG RTD)
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources (Pistoia Alliance)
The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reuse of digital resources. Using recently developed software and metrics to assess FAIRness and supported through an ELIXIR Implementation Study, Michel worked with a subset of ELIXIR Core Data Resources to apply these technologies. In this webinar, he will discuss their approach, findings, and lessons learned towards the understanding and promotion of the FAIR principles.
Our regular Introduction to Data Management (DM) workshop (90 minutes). Covers very basic DM topics and concepts. The audience is graduate students from all disciplines. Most of the content is in the NOTES FIELD.
How Portable Are the Metadata Standards for Scientific Data? (Jian Qin)
The one-covers-all approach in current metadata standards for scientific data has serious limitations in keeping up with the ever-growing data. This paper reports the findings from a survey of metadata standards in the scientific data domain and argues for the need for a metadata infrastructure. The survey collected 4400+ unique elements from 16 standards and categorized these elements into 9 categories. The findings included that the highest counts of elements occurred in the descriptive category, and many of them overlapped with DC elements. This pattern was also repeated in the elements that co-occurred across different standards. A small number of semantically general elements appeared across the largest number of standards, while the rest of the element co-occurrences formed a long tail with a wide range of specific semantics. The paper discussed the implications of these findings in the context of metadata portability and infrastructure, and pointed out that large, complex standards and widely varied naming practices are the major hurdles for building a metadata infrastructure.
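The core of the survey's co-occurrence analysis is simple to picture. The sketch below is a hypothetical miniature, not the paper's actual data or code: three illustrative stand-in standards replace the 16 surveyed ones, and we count how many standards share each element name.

```python
from collections import Counter

# Illustrative stand-ins only; the real survey covered 4400+ elements
# drawn from 16 metadata standards.
standards = {
    "DC":       {"title", "creator", "subject", "date"},
    "DataCite": {"title", "creator", "publisher", "identifier"},
    "EML":      {"title", "creator", "geographicCoverage"},
}

# Count, for each element name, how many standards define it.
spread = Counter(el for elements in standards.values() for el in elements)

# A few general elements ("title", "creator") lead; the rest form the
# long tail of standard-specific semantics described in the paper.
for element, n in spread.most_common():
    print(f"{element}: appears in {n} standard(s)")
```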
Dataset Catalogs as a Foundation for FAIR* Data (Tom Plasterer)
BioPharma and the broader research community are faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using web-centric data cataloguing principles as described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data CATalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at summary, version and distribution levels. Further, we've described datasets using a limited set of well-vetted public vocabularies, focused on cross-omics analytes and clinical features of the catalogued datasets.
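To make the DCAT approach concrete, here is a minimal sketch of a catalog entry built with the rdflib library. The dataset IRI, titles and file names are invented for illustration; only the DCAT, DCT and VoID namespace URIs are the standard ones, and this is not the talk's actual catalog.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Standard W3C / DCMI namespace URIs for the vocabularies named above.
DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")
VOID = Namespace("http://rdfs.org/ns/void#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCT)
g.bind("void", VOID)

# Hypothetical dataset and distribution IRIs, purely for illustration.
ds = URIRef("https://example.org/datasets/rnaseq-study-42")
dist = URIRef("https://example.org/datasets/rnaseq-study-42/v1.csv")

g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCT.title, Literal("RNA-seq study 42 (example)")))
g.add((ds, DCT.description, Literal("Summary-level catalog entry for an omics dataset.")))
g.add((ds, DCAT.distribution, dist))
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.mediaType, Literal("text/csv")))

print(g.serialize(format="turtle"))
```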
This slideshow was used at a lunchtime session delivered at the Humanities Division, University of Oxford, on 2014-05-12. It provides a general overview of some key data management topics, plus some pointers on where to find further information.
This document discusses computational workflows and FAIR principles. It begins by providing background on computational workflows and their increasing importance. It then discusses challenges around finding, accessing, and sharing workflows. Next, it explores how applying FAIR principles to workflows could help address these challenges by making workflows and their associated objects findable, accessible, interoperable, and reusable. This includes applying metadata standards, using persistent identifiers, and developing principles for FAIR workflows and FAIR software. The document concludes by examining the roles and responsibilities of different stakeholders in working towards FAIR workflows.
Being FAIR: FAIR data and model management, SSBSS 2017 Summer School (Carole Goble)
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face, ranging across European programmes (the SysMO and ERASysAPP ERA-Nets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE), as well as in PIs' labs and centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training, and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al. (2016), The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3, doi:10.1038/sdata.2016.18
FAIRDOM data management support for ERACoBioTech Proposals (FAIRDOM)
This document provides information about a webinar from the FAIRDOM Consortium on data management for ERACoBioTech full proposals. It includes:
- Details on how to budget for and include a data management plan in proposals
- A checklist for developing a data management plan covering topics like the types and volumes of data, data sharing and reuse, and making data FAIR
- An overview of the FAIRDOM services and software platform that can help with project data management and stewardship
Data Access & Storage @ UWA - UWA Research Week September 2017Katina Toufexis
The document discusses research data management services provided by the University of Western Australia (UWA) Library. It notes that funders like the Australian Research Council (ARC) and National Health and Medical Research Council (NHMRC) require research data to be managed and shared. UWA policies also require research data related to publications to be available through the UWA Research Repository. The document provides guidance on creating data management plans, using appropriate licenses, and securely storing data long-term using the Institutional Research Data Storage (IRDS) system rather than third-party cloud services like Dropbox.
As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators the ability to link data is critical for dynamic interoperability. Adoption of linked data paradigm allows BioPharma to focus on core business: delivering valuable therapeutics in a timely manner.
S. Venkataraman (DCC) talks about the basics of Research Data Management and how to apply this when creating or reviewing a Data Management Plan (DMP). He discusses data formats and metadata standards, persistent identifiers, licensing, controlled vocabularies and data repositories.
link to : dcc.ac.uk/resources
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...Amanda Whitmire
A workshop as part of the International Digital Curation Conference 2016 on DMP development and support. This presentation demonstrates how we can use data management plans as a source of information to better understand researcher data stewardship practices and how to support them. Be sure to see the slide notes to better understand the presentation (most slides are just photos/icons).
Data repositories -- Xiamen University 2012 06-08Jian Qin
The document discusses data repositories and services. It begins by defining what a data repository is, noting that it is a logical and sometimes physical partitioning of data where multiple databases reside. It then outlines some key aspects of data repositories, including technical features like standards, software, and staffing requirements. The document also discusses functions of repositories like content management, archiving, dissemination and system maintenance. It provides examples of institutional repositories and data repositories, highlighting characteristics of each. Finally, it provides a case study on Dryad, an international repository for data and publications in biosciences.
FAIR Workflows and Research Objects get a Workout Carole Goble
So, you want to build a pan-national digital space for bioscience data and methods? That works with a bunch of pre-existing data repositories and processing platforms? So you can share FAIR workflows and move them between services? Package them up with data and other stuff (or just package up data for that matter)? How? WorkflowHub (https://workflowhub.eu) and RO-Crate Research Objects (https://www.researchobject.org/ro-crate) that’s how! A step towards FAIR Digital Objects gets a workout.
Presented at DataVerse Community Meeting 2021
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...Tom Plasterer
As scientists in the life sciences we are trained to pursue singular goals around a publication or a validated target or a drug submission. Our failure rates are exceedingly high especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as time for patients depending upon us for the next breakthrough.
Edge Informatics is an approach to ameliorate these failures. Using both technical and social solutions together knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions to enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. When taken together more accurate, timely and inclusive decision-making is fostered.
This slideshow was used in a Preparing Your Research Data for the Future course taught in the Medical Sciences Division, University of Oxford, on 2015-06-08. It provides an overview of some key issues, focusing on long-term data management, sharing, and curation.
The first workshop of the series "Services to support FAIR data" took place in Prague during the EOSC-hub week (on April 12, 2019).
Speaker: Kostas Repanas (EC DG RTD)
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesPistoia Alliance
The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reuse of digital resources. Using recently developed software and metrics to assess FAIRness and supported through an ELIXIR Implementation Study, Michel worked with a subset of ELIXIR Core Data Resources to apply these technologies. In this webinar, he will discuss their approach, findings, and lessons learned towards the understanding and promotion of the FAIR principles.
Our regular Introduction to Data Management (DM) workshop (90-minutes). Covers very basic DM topics and concepts. Audience is graduate students from all disciplines. Most of the content is in the NOTES FIELD.
How Portable Are the Metadata Standards for Scientific Data?Jian Qin
The one-covers-all approach in current metadata standards for scientific data has serious limitations in keeping up with the ever-growing data. This paper reports the findings from a survey to metadata standards in the scientific data domain and argues for the need for a metadata infrastructure. The survey collected 4400+ unique elements from 16 standards and categorized these elements into 9 categories. Findings from the data included that the highest counts of element occurred in the descriptive category and many of them overlapped with DC elements. This pattern also repeated in the elements co-occurred in different standards. A small number of semantically general elements appeared across the largest numbers of standards while the rest of the element co-occurrences formed a long tail with a wide range of specific semantics. The paper discussed implications of the findings in the context of metadata portability and infrastructure and pointed out that large, complex standards and widely varied naming practices are the major hurdles for building a metadata infrastructure.
Dataset Catalogs as a Foundation for FAIR* DataTom Plasterer
BioPharma and the broader research community is faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge-generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using web-centric data cataloguing principles as described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data CATalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at summary, version and distribution levels. Further, we’ve described datasets using a limited set of well-vetted public vocabularies, focused on cross-omics analytes and clinical features of the catalogued datasets.
This slideshow was used at a lunchtime session delivered at the Humanities Division, University of Oxford, on 2014-05-12. It provides a general overview of some key data management topics, plus some pointers on where to find further information.
This document discusses computational workflows and FAIR principles. It begins by providing background on computational workflows and their increasing importance. It then discusses challenges around finding, accessing, and sharing workflows. Next, it explores how applying FAIR principles to workflows could help address these challenges by making workflows and their associated objects findable, accessible, interoperable, and reusable. This includes discussing applying metadata standards, using persistent identifiers, and developing principles for FAIR workflows and FAIR software. The document concludes by examining the roles and responsibilities of different stakeholders in working towards FAIR workflows.
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will show explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http:// http://www.elixir-europe.org/) the European Research Infrastructure of 21 national nodes and a hub funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al, The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
FAIRDOM data management support for ERACoBioTech ProposalsFAIRDOM
This document provides information about a webinar from the FAIRDOM Consortium on data management for ERACoBioTech full proposals. It includes:
- Details on how to budget for and include a data management plan in proposals
- A checklist for developing a data management plan covering topics like the types and volumes of data, data sharing and reuse, and making data FAIR
- An overview of the FAIRDOM services and software platform that can help with project data management and stewardship
Aim: to show how research data management can contribute to the success of your PhD.
- What is research data and why is it important?
- The research data lifecycle
- Research data: more than just your results
- FAIR data and Open Research
- The DMP online tool
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...OSTHUS
During SmartLab Exchange 2015, Allotrope Foundation and OSTHUS presented the latest update on the Allotrope Framework. To learn more, please view the slides below.
Presented by:
Dana Vanderwall (BMS Research IT & Automation), Patrick Chin (Merck Research Laboratories IT), Wolfgang Colsman (OSTHUS)
Pemanfaatan Big Data Dalam Riset [Utilization of Big Data in Research] 2023.pptx (elisarosa29)
1. The document discusses the use of big data in various fields such as education, bioinformatics, and genomics. It provides examples of studies that have used big data analytics for student performance monitoring, genomic repeats detection, and understanding trends in educational research.
2. Methodologies for big data analysis discussed include using Apache Spark for efficient processing of large genomic datasets and building predictive models from multiple educational variables.
3. Key applications highlighted are automatic grading of MOOC assignments using machine learning and analyzing program learning outcomes in outcome-based education systems.
Data Management Planning for researchers (Sarah Jones)
This document provides information about creating a data management plan (DMP) for researchers. It begins with defining what a DMP is - a short plan that outlines what data will be created, how it will be managed and stored, and plans for sharing and preservation. It then discusses the common components of a DMP, including describing the data, standards and methodologies, ethics and intellectual property, data sharing plans, and preservation strategies. The document provides examples of DMP requirements and recommendations from funders. It offers tips for creating a good DMP, including thinking about the needs of future data re-users, consulting stakeholders, grounding plans in reality, and planning for sharing from the outset. Finally, it discusses tools and resources.
This document discusses reproducible research and provides guidance on how to conduct research in a reproducible manner. It covers:
1. The importance of reproducible research due to large datasets, computational analyses, and the potential for human error. Ensuring reproducibility requires new expertise and infrastructure.
2. Key aspects of reproducible research include data management plans, version control, use of file formats and software/tools that allow reproducibility, and publishing data and code to allow others to replicate results.
3. Reproducible research benefits the scientific community by increasing transparency and allows researchers to re-analyze their own data in the future. Journals and funders are increasingly requiring reproducibility.
A presentation I gave at the 2018 Molecular Med Tri-Con in San Francisco, February 2018. This addresses the general challenge of biomedical data management, some of the things to consider when evaluating solutions in this space, and concludes with a brief summary of some of the tools and platforms in this space.
This document introduces FAIRDOM, a consortium that provides a platform and services to help researchers organize, manage, share, and preserve research outputs according to FAIR principles. FAIRDOM has been in operation for 10 years and has over 50 installations supporting over 118 projects. It provides tools and services to help researchers collaborate better and integrate their data, models, publications and other research objects. FAIRDOM also works with other organizations and infrastructure providers to support broader research initiatives.
FAIR - Working Data - it's not just about FAIR publishing. Presented by John Morrissey from CSIRO at the C3DIS post-conference workshop "Managed data – trusted research: an introduction to Research Data Management", 31 May 2018, Melbourne.
This document summarizes a seminar on data management for undergraduate researchers. It discusses what data is, why it needs to be managed, and key aspects of the data management process such as data organization, metadata, storage, and archiving. Topics covered include file naming best practices, version control, documentation, metadata standards, storage options, and long-term archiving. The goal is to help researchers organize and document their data so it can be understood, preserved, and reused.
Elsevier's RDM Program: Habits of Effective Data and the Bourne Ultimatum (Anita de Waard)
Elsevier's RDM Program: Ten Habits of Highly Effective Data
The document outlines Elsevier's research data management (RDM) program and efforts to support the effective management of research data. It discusses a "Maslow hierarchy" with 10 aspects of highly effective research data from stored to integrated. It provides examples of Elsevier's RDM tools and services like Hivebench, Mendeley Data, and DataSearch that help support storing, sharing, citing, and discovering research data. It also discusses collaborative RDM efforts like Force11, Research Data Alliance, and Crossref as well as journal initiatives to improve reproducibility. The document concludes with a proposed partnership where an institution could pilot and provide feedback on Elsevier's
This workshop aims at gathering together practitioners of all levels and from a variety of research areas (agronomy, plant biology, food, life sciences, etc.) to compare best practices, points of view and projects about producing and consuming data in the agrifood field.
As happens for digital data in general, current trends in this arena include the integration of "traditional" semantic-based approaches (e.g., ontologies, RDF-based linked data) with lightweight schemas (e.g., Bioschemas/schema.org), the use of JSON-based APIs, and the development of data lakes and knowledge graphs based on NoSQL technologies and property-graph databases (e.g., Neo4j, TinkerPop/Gremlin).
Workshop participants will get an opportunity to discuss how those approaches and technologies are being used in the agrifood field, for the purpose of realising the FAIR data principles and making data sharing a powerful tool for research, industry or socio-economic investigation. In particular, we will propose an interactive session to outline how participant-proposed datasets can be encoded through Bioschemas or similar approaches.
This document summarizes Rob Grim's presentation on e-Science, research data, and the role of libraries. It discusses the Open Data Foundation's work in promoting metadata standards like DDI and SDMX. It also outlines the research data lifecycle and how metadata management can help libraries support research through services like data registration, archiving, discovery and access. Finally, it provides examples of how Tilburg University library supports research data through services aligned with data availability, discovery, access and delivery.
This presentation was provided by Carly Strasser of the Chan Zuckerberg Initiative during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024.
2. Why do you need data management?
- Where do you store your experimental data?
- What happens with data when a PhD student leaves the group?
- Are all data complete for a publication?
- Do you make regular backups of your local machine?
- Do you send emails to share data with your colleagues?
- Do you always store email attachments in your local directory?
- Do you store all different versions of a data file together in the same place?
- Which protocol was used for the experiment?
...
3. How well is your experiment documented?
(Vahan Simonyan, Center for Biologics Evaluation and Research, Food and Drug Administration, USA)
4. Purpose of Project Data Management
• Track collection of raw and processed (secondary) data, models & metadata
• Maintain experimental context
• Organise and link assets
• Choose what to keep and what to ditch
• Report consistently
• Reproducible publications
• Promote standardised metadata practices
• Exchange among colleagues
• How and when to share and publish
• Get and give credit
• Retain and find beyond the project
• Integrate with legacy, home-grown, external systems
• Reuse tools and community archives
• Support automation and analytics workflows. Support curation
[Data lifecycle diagram: creating data → processing data → analysing data → preserving data → access to data → re-using data]
5. Purpose of Project Data Management
[Diagram: Organisation, Communication and Dissemination, towards Partners, Funders and the Public]
6. The FAIR Guiding Principles for scientific data management and stewardship
https://www.nature.com/articles/sdata201618 (2016)
8. FAIR Checklists
Making Data Findable (documentation and metadata management)
• What documentation and metadata will accompany the data to assist its discoverability? (Details on methodology, definitions, procedures, SOPs, vocabularies, units, dependencies, etc.)
• What information is needed for the data to be read and interpreted in the future?
• What naming conventions will be used?
• How will you approach versioning your data?
• How will you capture / create this documentation and metadata?
• How do you ensure the completeness of the captured data?
Making Data Accessible
Specify which data will be made openly available, taking into consideration:
• What ethics and legal compliance issues do you have, if any? Do you need consent for data preservation and sharing? Do you have to protect certain data? Is any data sensitive?
• Do you think you might have Intellectual Property Rights issues? Have you considered ownership of the data, licensing, restrictions on use?
• Do you think you will need to embargo any data?
• How will you make the data available? (Consider the platforms you will use: databases, repositories, etc.)
• What methods or software tools are needed to access the data? Should you include documentation detailing how to use/access the software that is needed for accessing the data? Is it possible to include this software with the data (e.g. source code, Docker, etc.)?
• If there are any restrictions on accessibility, how will you provide access?
Making Data Interoperable
• What standards (metadata vocabularies, formats, checklists) or methodologies will you use?
• How do you address data and model quality? What validation steps do you foresee?
• Will you use standardised vocabulary for all data types to allow inter-disciplinary interoperability?
• Where you cannot use standardised vocabulary for all types of data, can you map to more commonly used ontologies?
Making Data Re-usable
• How will you licence your data to permit the widest re-use possible?
• When will the data be made available for re-use? Does this include an embargo period? (If so, why?)
• Which data will be available for re-use during/after the project? If not, why?
• What are your data quality assurance processes?
• How long do you expect your data to remain re-usable?
9. FAIRDOM Initiative
- develop a community
- establish an internationally sustained Data and Model Management service
- a joint action of the ERA-Net ERASysAPP and the European Research Infrastructure ISBE
10. A bit of history: 11-year anniversary
[Timeline: 2008, 2010, 2012, 2014, 2016, 2018, 2020]
- Standards-based asset management (data, models, workflows, SOPs…) for multi-party projects
- Sensitive sharing
- Self-deposit / curation
- Mixed stewardship skills
- Legacy local systems
- Community resources
- Started in Systems Biology. Now widened.
11. SEEK Software
- Open source web platform for sharing scientific research assets, processes and outcomes
- Associations between data, along with information about the people and organisations involved (yellow pages)
- ISA (Investigation, Study, Assay) structure for describing how individual experiments are aggregated into studies and investigations
- Flexible and detailed sharing permissions
- DOIs can be generated for individual items or entire aggregates
- Semantic technology, allowing sophisticated queries over the content
- Collection of metadata
https://seek4science.org/
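SEEK instances such as the FAIRDOMHub also expose their catalogue programmatically. As a minimal sketch, assuming the JSON:API-style read interface described in the SEEK documentation (endpoint names and response shape may differ between SEEK versions), listing the public models could look like this:

```python
import requests

# Public FAIRDOMHub SEEK instance; the /models endpoint and the
# JSON:API response layout are assumptions based on the SEEK API docs.
BASE = "https://fairdomhub.org"

resp = requests.get(
    f"{BASE}/models",
    headers={"Accept": "application/vnd.api+json"},
)
resp.raise_for_status()

for item in resp.json().get("data", []):
    # Each JSON:API resource carries a type, an id and attributes.
    print(item["type"], item["id"], item["attributes"].get("title"))
```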
18. Data Files, SOPs, Documents
- no file format restrictions
- some formats allow the content to be viewed in SEEK: e.g. Excel, Word, PDF, XML, PNG
19. Models
- SBML model simulation
- Model comparison
- Model versioning
- Reproducing simulations
[Jacky Snoep, Dagmar Waltemath, Martin Peters, Martin Scharm]
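The models here are typically SBML files. As a small illustration (not part of the original slides), the standard python-libsbml bindings can inspect a model before simulation; the file name is a placeholder for any SBML model, e.g. one downloaded from a SEEK entry:

```python
import libsbml  # pip install python-libsbml

# "model.xml" is a placeholder path for any SBML model file.
doc = libsbml.readSBML("model.xml")
if doc.getNumErrors() > 0:
    doc.printErrors()  # report any parse/validation problems

model = doc.getModel()
print("Model:", model.getId())
print("Species:", model.getNumSpecies())
print("Reactions:", model.getNumReactions())
```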
21. Tracking model versions smartly
Scharm, M., Wolkenhauer, O., & Waltemath, D. (2015). An algorithm to detect and communicate the differences in computational models describing biological systems. Bioinformatics.
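The cited algorithm (implemented in the BiVeS tool) computes a structured, tree-based diff between model versions. The sketch below is a deliberately simpler, set-based stand-in for the same idea, reporting entities added to or removed from an SBML model between two hypothetical local files:

```python
import libsbml  # pip install python-libsbml

def model_parts(path):
    """Return the sets of species and reaction identifiers in an SBML file."""
    model = libsbml.readSBML(path).getModel()
    species = {model.getSpecies(i).getId() for i in range(model.getNumSpecies())}
    reactions = {model.getReaction(i).getId() for i in range(model.getNumReactions())}
    return species, reactions

# Hypothetical file names for two versions of the same model.
old_species, old_reactions = model_parts("model_v1.xml")
new_species, new_reactions = model_parts("model_v2.xml")

print("species added:    ", sorted(new_species - old_species))
print("species removed:  ", sorted(old_species - new_species))
print("reactions added:  ", sorted(new_reactions - old_reactions))
print("reactions removed:", sorted(old_reactions - new_reactions))
```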
29. More than simple supplementary materials
- 16 data files (kinetic, flux inhibition, runout)
- 19 models (kinetics, validation)
- 13 SOPs
- 3 studies (model analysis, construction, validation)
- 24 assays/analyses (simulations, model characterisations)
Penkler, G., du Toit, F., Adams, W., Rautenbach, M., Palm, D. C., van Niekerk, D. D. and Snoep, J. L. (2015), Construction and validation of a detailed kinetic model of glycolysis in Plasmodium falciparum. FEBS J, 282: 1481–1511. doi:10.1111/febs.13237
30. Packaging: COMBINE archive
Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D
SEMS, University of Rostock
A zip-like file with a manifest & metadata:
- Bundling files
- Keeping provenance
- Exchanging data
- Shipping results
Bergmann, F. T., Adams, R., Moodie, S., Cooper, J., Glont, M., Golebiewski,
M., ... & Olivier, B. G. (2014). COMBINE archive and OMEX format: one file to
share all information to reproduce a modeling project. BMC Bioinformatics,
15(1), 1.
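
Because a COMBINE archive is just a zip with a manifest, a minimal one can be
written with the standard library alone. A sketch: the format URIs follow the
OMEX specification cited above, while the model content is a placeholder:

import zipfile

# Format URIs below follow the OMEX specification (Bergmann et al. 2014);
# the SBML content is a placeholder, not a real model.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <content location="./manifest.xml"
           format="http://identifiers.org/combine.specifications/omex-manifest"/>
  <content location="./model.xml"
           format="http://identifiers.org/combine.specifications/sbml"/>
</omexManifest>
"""

# A COMBINE archive is a zip file; manifest.xml describes every entry.
with zipfile.ZipFile("project.omex", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("manifest.xml", MANIFEST)
    zf.writestr("model.xml", "<sbml><!-- placeholder SBML model --></sbml>")
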
31. Packaging: Research Objects
Standards-based metadata framework for bundling (scattered) resources with
context and citation
http://researchobject.org
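
In its later RO-Crate incarnation (an assumption here, since the slide names
Research Objects in general), the bundle centres on a single JSON-LD metadata
file. A minimal illustrative sketch with hypothetical file names:

import json

# Minimal RO-Crate style metadata for a bundle holding one data file.
# The structure follows the RO-Crate specification (https://w3id.org/ro/crate);
# file names and descriptions are illustrative.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example study bundle",
            "hasPart": [{"@id": "data.csv"}],
        },
        {"@id": "data.csv", "@type": "File", "name": "Raw measurements"},
    ],
}

with open("ro-crate-metadata.json", "w") as fh:
    json.dump(crate, fh, indent=2)
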
32. SEEK as project-specific local
instances or as central FAIRDOMHub
Service hosted at HITS
(Institutional Guarantee at least until 2029)
33. FAIRDOMHub Statistics
1st July 2019
Programmes 60
Projects 144
Institutions 274
People 1291
Data files 2280
Models 487
SOPs 301
Sample types 63
Presentations 729
Publications 370
Events 178
34. FAIRDOM Platform
Free and Open Source
Front end (Project(s) Hub):
• Web-based portal
• Project controlled spaces
• Metadata catalogue & yellow pages
• Results repository, dissemination and collaboration
• Tool gateway
Back end (onsite storage & analytics):
• Tracking, data analytic pipelines
• Extract, Transform and Load direct from the instruments
• Large data management
• LIMS, auto-archiving
35. Back end: Instrument Data Management, LIMS, ELN
• samples
• protocols
• instruments
• data management
• experimental description
NeLS, Norway's national e-infrastructure for Life Science:
https://nels.bioinfo.no/
openBIS Electronic Laboratory Notebook and Laboratory Information Management
System (ELN-LIMS):
https://csb.ethz.ch/tools/software/openbis-lims-eln.html
36. FAIR collaboration from the ERA-Net ERASysAPP
[Adapted from Ursula Klingmüller, Martin Böhm]
Excemplify
Antibody Database
37. Programme: overarching research theme (The Digital Salmon)
Project: research grant (DigiSal, GenoSysFat)
Investigation: a particular biological process, phenomenon or thing
(typically corresponds to [plans for] one or more closely related papers)
Study: experiment whose design reflects a specific biological research
question
Assay: standardized measurement or diagnostic experiment using a specific
protocol (applied to material from a study)
Jon Olav Vik,
Norwegian University of Life Sciences
Integration with Norway's national e-infrastructure for Life Science (NeLS)
38. Front End: Find, Access and Organise assets
• Project controlled protected spaces
– Working space, show space for results
– Supp. materials space for publications
– Yellow pages and collaboration
– Upload or link to data
• One place catalogue
– Regardless of physical store
– ISA with shared metadata
– Standards-compliant
• Linked with other systems
– Project on-site (secure) repositories
– Public deposition archives
– Integrated with JWS Online modelling tools
“Using FAIRDOMHub my own lab colleagues saw what I was doing and called to
collaborate!”
https://fairdomhub.org/
39. Catalogue across repositories regardless of location
Upload or reference assets from: in-house stores, external databases,
publishing services, secure stores, model resources
42. PALs - Project Area Liaisons
The DM Team provides data management training to the PALs; the PALs return
requirements & suggestions:
• Training needs for users
• Suggestions to improve SEEK
• Requirements for new SEEK features and DM services
43. PALs - Project Area Liaisons
- our user focus group
- post docs, postgrads and techs
- experimentalists, modellers and bioinformaticians
- act as advocates and communicate our progress back to their projects
44. Data Stewards
function, profession, cultural shift
• 500,000 needed in Europe*
• Specialist skills
• Career pathways
• Recognition
Curation and management
• Supported, Resourced
• Recognised, Rewarded
Sharing policy and practice embedded
* Realising the European Open Science Cloud (2016)
47. LiSyM (Liver Systems Medicine)
German Research Network on
Systems Medicine for Liver Disease
Supported by
The German Federal Ministry of Education and Research 2016-2020
Multiple disciplines
• Medicine
• Biology, Biochemistry
• Pharmacology
• Physics
• Bioinformatics
• Data management
• Industry
38 independent research groups:
• Bayer AG
• Max Planck (Dresden and Berlin)
• MEVIS Fraunhofer (Bremen)
• Leibniz Institute IfaDo (Dortmund)
• Charité (Berlin)
• DKFZ (Heidelberg)
• Hospitals: Dresden, Kiel, Aachen, Homburg, Berlin, Heidelberg, Munich
• + 18 universities
http://www.lisym.org/
49. Clinical data sharing concept
Goal:
• Spread descriptions of the data throughout the consortium
Challenge:
• Some partners cannot share their data
Solution:
• Share the table structure
• Create & share common code
• Create summaries in a distributed fashion (see the sketch below)
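
A minimal sketch of that "share code, not data" pattern: every partner runs
the same script over a locally held table that follows the agreed structure,
and only the aggregate summary leaves the site. File and column names are
hypothetical:

import csv
from statistics import mean, stdev

# Every partner runs this same script locally over a table that follows the
# agreed, shared structure; only the aggregate summary leaves the site.
# "patients.csv" and its "age" column are hypothetical.
def summarise(path="patients.csv"):
    with open(path, newline="") as fh:
        ages = [float(row["age"]) for row in csv.DictReader(fh)]
    return {"n": len(ages), "age_mean": mean(ages), "age_sd": stdev(ages)}

if __name__ == "__main__":
    print(summarise())  # share this summary, never the row-level data
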
50. NMTrypI
Trypanosomatid parasites cause sleeping sickness, leishmaniasis and Chagas
disease, occurring in Africa, South America and India
EU-funded project 2014 – 2017
Goal: new candidate drugs against
Trypanosomatidic infections
Consortium: 12 partners (3 SMEs and 9
academics) in Europe and in disease-
endemic countries (Italy, Greece, Portugal,
Spain, Germany, UK, Sudan, Brazil)
https://fp7-nmtrypi.eu
51. NMTrypI specific challenges
• New visualizations of spreadsheet data
• Cross-references with external databases
• Chemical compound specific features
– show structure
– allow (sub)structure search (see the sketch after this list)
– create compound summary reports
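
A sketch of one way to meet the (sub)structure-search requirement, using the
open-source RDKit toolkit. The slides do not name a specific library, so
RDKit is our assumption, and the compound list is illustrative:

from rdkit import Chem  # third-party: pip install rdkit

# Substructure search over a small compound list. The SMILES strings are
# illustrative stand-ins for the project's compound collection.
compounds = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
    "ethanol": "CCO",
}
pattern = Chem.MolFromSmarts("c1ccccc1")  # query: an aromatic six-ring

for name, smiles in compounds.items():
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None and mol.HasSubstructMatch(pattern):
        print(name, "matches the substructure")
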
55. de.NBI - The German Network for
Bioinformatics Infrastructure
de.NBI consortium
• 39 project partners
• 30 institutions
• 8 service centers
https://www.denbi.de/
Mission
• Provide, expand and improve specialized
bioinformatics tools
• Provide access to computing and storage
capacities
• Provide regular training events and workshops
• Maintain and develop specific high-quality data
resources
56. Research and service topics of de.NBI service centers
HD-HuB
Bioinformatics Infrastructures in Biomedical Research
• Human genetics and genomics
• Metagenomics
• Systematic phenotyping of human cells
• Epigenetics
BiGi
Microbial Research for Biotechnology and Medicine
• High performance computing services
• Repository of reusable workflows
• Comparative genomics and meta-omics
• Post-genomics data integration
BioData
Reference Databases, Services and Tools
• Ribosomal RNAs (SILVA)
• Environmental data (PANGAEA)
• Taxon-associated metadata (BacDive)
• Enzymes & Ligands (BRENDA/EnzymeStructures)
CIBI
Tools for omics data and imaging
• Open-source libraries (OpenMS, SeqAn, FIJI)
• Tools for NGS, mass spec, and imaging
• Workflow engine (KNIME) for automation
• (Multi-)omics data analysis workflows
RBC
RNA Bioinformatics
• Analysis of RNA-related data
• Life science data analysis with Galaxy
• Meta-transcriptomics
• Epigenetic research
de.NBI-SysBio
Standards-based Systems Biology
• Data and model management tools
• SABIO-RK reaction kinetics data
• Methods and tools for modeling in Systems Biology
• Standards & tools for model search and management
GCBN
Crops and BioGreenformatics
• Plant genetic resources and traits
• Bridging genotypes to phenotypes
• Plant gene and genome annotation
• Enabling technologies to improve crops
BioInfra.Prot
Bioinformatics for Proteomics
• Comprehensive proteomics workflow
• Data publication, analysis & tool services
• Quality standards for targeted proteomics
• Lipidomics
57. Current Actions in de.NBI
• Goal: Make Data FAIRness part of all de.NBI centers
• Idea: Have service centers collect more metadata. No metadata, no
service.
• Approach: Build use cases that involve data management and service
centers
Two example use cases from the medical proteomics center:
• Statistical advice service
– tracking of advice given
– making reports FAIR
• From data to PRIDE
– Catalogue links to PRIDE in SEEK/FAIRDOMHub (see the sketch below)
– Store and standardise intermediate files
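
A hedged sketch of cataloguing a PRIDE entry as a remote data file via the
SEEK write API. The payload shape and token-based authentication follow our
reading of the SEEK JSON API documentation (https://docs.seek4science.org/)
and should be verified against your instance; the project id and token are
placeholders:

import requests  # third-party: pip install requests

# PXD000001 is a real public PRIDE accession; the project id and API token
# are placeholders. Payload shape and token authentication follow our reading
# of the SEEK JSON API docs and must be verified for your instance.
payload = {
    "data": {
        "type": "data_files",
        "attributes": {
            "title": "PRIDE dataset PXD000001 (external link)",
            "content_blobs": [
                {"url": "https://www.ebi.ac.uk/pride/archive/projects/PXD000001"}
            ],
        },
        "relationships": {"projects": {"data": [{"id": "1", "type": "projects"}]}},
    }
}

resp = requests.post(
    "https://fairdomhub.org/data_files",
    json=payload,
    headers={
        "Accept": "application/vnd.api+json",
        "Authorization": "Token YOUR_API_TOKEN",
    },
    timeout=30,
)
print(resp.status_code)
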
58. Summary FAIRDOM
FAIRDOM Software Platform + Tools:
• A central public hub for projects (144 projects)
• Customised project installations (30+ installations)
• Project stewardship consultancy services
• Community activities
59. Summary FAIRDOM
Find & Access:
• Central catalogue
• Link to original files and external resources
• Search
• Metadata tagging and standards
• Yellow pages of projects and people
• Access control to spaces
• Embedded tools
Interoperate:
• Rich metadata, standards compliance
• Consistent reporting (ISA)
• Curation support
• Integration with other resources, archives, tools
• Export packages
Reuse:
• Secure sharing space
• Long term retention
• Reproducible publication
60. Why do you need data management?
- Where do you store your experimental data?
- What happens with the data when a PhD student leaves the group?
- Are all data complete for a publication?
- Do you make regular backups of your local machine?
- Do you send emails to share data with your colleagues?
- Do you always store email attachments in your local directory?
- Do you store all different versions of a data file together in the same place?
- Which protocol was used for the experiment?
...
61. What can you do? Be FAIR!
1. make a Data Management Plan
2. use standard identifiers
3. use metadata standards
4. catalogue / register data with metadata
5. define and share your SOPs
6. use data (assets) management platforms and tools that work
together
7. deposit into public archives
8. have a sustainability / end project plan
9. provide resources and support, and that means people too
10. embed data management into work practices and do some
training
11. give credit
12. check if you have sensitive data issues
13. educate your supervisors, institutions and peers
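
Several of these items (standard identifiers, metadata standards, licensing,
credit) can be seen together in even a tiny metadata record. An illustrative
sketch; all values except the licence URL are hypothetical placeholders:

import json

# All values are hypothetical placeholders except the licence URL.
record = {
    "title": "Kinetic measurements, strain X, run 3",
    "identifier": "https://doi.org/10.5281/zenodo.0000000",  # placeholder DOI
    "creators": [
        {"name": "A. Researcher", "orcid": "https://orcid.org/0000-0000-0000-0000"}
    ],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["glycolysis", "kinetics"],
    "dateCreated": "2019-07-01",
}
print(json.dumps(record, indent=2))
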