We live in an era of cloud computing. Many services in the life sciences are keenly planning cloud transformations, seeking to create globally distributed ecosystems of harmonised data based on standards from organisations like GA4GH. CINECA faces similar challenges, gathering cohort datasets from all over the globe, many of which are pinned in place due to their size, legal restrictions, or other considerations. But is “bringing compute to the data” always the right choice? In this webinar, based on experiences from the Human Cell Atlas Data Coordination Platform and other projects at EMBL-EBI, we will explore the concept of “data gravity”: the idea that whilst some forces hold data in one place, others require it to be mobile. We’ll consider how planning a cloud strategy effectively requires weighing the gravity of datasets, and the impact that gravity has on the team skills required, incentives for good practice, and storage and compute costs.
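The storage-and-compute trade-off the webinar raises can be sketched as a back-of-envelope calculation. The prices and function names below are illustrative assumptions, not real cloud list prices; the point is only the shape of the comparison.

```python
# Back-of-envelope comparison: copy a dataset to the compute, or run
# the compute next to the data? All prices below are illustrative
# placeholders, not real cloud list prices.

def egress_cost(dataset_tb: float, price_per_gb: float = 0.09) -> float:
    """Cost of moving the dataset out of its current home, in USD."""
    return dataset_tb * 1000 * price_per_gb

def duplicate_storage_cost(dataset_tb: float, months: float,
                           price_per_gb_month: float = 0.023) -> float:
    """Cost of keeping a second copy near the compute, in USD."""
    return dataset_tb * 1000 * price_per_gb_month * months

def cheaper_to_move_compute(dataset_tb: float, months: float,
                            compute_setup_cost: float) -> bool:
    """True if standing up compute next to the data beats copying it."""
    move_data = egress_cost(dataset_tb) + duplicate_storage_cost(dataset_tb, months)
    return compute_setup_cost < move_data

# A 100 TB cohort held for 12 months: moving it costs ~9,000 USD egress
# plus ~27,600 USD in duplicate storage, so a 20,000 USD compute setup wins.
print(cheaper_to_move_compute(100, 12, 20_000))  # True
```

With these (assumed) prices, data gravity dominates well before the petabyte scale; the crossover point shifts as egress pricing, retention time, and re-use frequency change.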
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as to disseminate CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for viewing afterwards.
This webinar took place on 12th November 2020 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
Presentation on the work we've done within BeSTGRID as it relates to bioinformatics in NZ, for the 2010 Bioinformatics Symposium https://www.bestgrid.org/NZ-Bioinformatics-Symposium-2010
An update on BeSTGRID activity and plans, in particular in preparation for the planned future developments of a unified approach to high performance and distributed computing in NZ.
A presentation by Rachel Bruce, director of open science and research lifecycle, Jisc, and Matthew Spitzer, community manager, Centre for Open Science (COS).
UK e-Infrastructure for Research - UK/USA HPC Workshop, Oxford, July 2015 - Martin Hamilton
A briefing on UK e-Infrastructure for research from Jisc and the UK research councils, presented at the UK/USA HPC workshop in July 2015, organized by HPC-SIG (UK) and CASC (USA).
Turning FAIR into Reality: Briefing on the EC’s report on FAIR data - dri_ireland
DRI Director Natalie Harrower, a member of the European Commission's Expert Group on FAIR (Findable, Accessible, Interoperable and Re-usable) data, delivered a lunchtime briefing on the recently published 'Turning FAIR into Reality' report on Tuesday 26 February in the Royal Irish Academy, Dublin.
In 2016 the FAIR Data Principles were developed to support the position that effective research data management is ‘not a goal in itself but rather is the key conduit leading to knowledge discovery and innovation’. The new publication is both a report and an action plan for turning FAIR into reality. It offers a survey and analysis of what is needed to implement FAIR and it provides a set of concrete recommendations and actions for stakeholders in Europe and beyond.
The briefing provided an overview of the contents of the report, which include the principles of FAIR, as well as the elements required to implement FAIR data.
This webinar will focus on practical applications of the FAIR data principles, particularly in the context of clinical bioinformatics. We will highlight several example projects that have put the FAIR principles into practice, and discuss the advantages and some of the challenges involved. The ELIXIR Galaxy community (elixir-europe.org/communities/galaxy) promotes the use of Galaxy projects that enhance FAIRness in data analysis. We will demonstrate the Galaxy services that deliver practical FAIR data analysis with “Single Sign-On” capability provided by ELIXIR-AAI. The aim is to equip (medical) researchers with the practicalities of implementing and using FAIR principles in the context of the CINECA project as applied to translational research at Erasmus University Medical Center.
The “How FAIR are you” webinar series and hackathon aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as to disseminate CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for viewing afterwards.
This webinar took place on 4th March 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ... - Edward Curry
Cyber-Physical Energy Systems (CPES) exploit the potential of information technology to boost energy efficiency while minimising environmental impacts. CPES can help manage energy more efficiently by providing a functional view of the entire energy system so that energy activities can be understood, changed, and reinvented to better support sustainable practices. CPES can be applied at different scales, from Smart Grids and Smart Cities to Smart Enterprises and Smart Buildings. Significant technical challenges exist in terms of information management, leveraging real-time sensor data, and coordinating the various stakeholders to optimize energy usage.
In this talk I describe an approach to overcome these challenges by reusing Web standards to quickly connect the required systems within a CPES. The resulting lightweight architecture leverages Web technologies including Linked Data, the Web of Things, and Social Media. The talk describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Management scenario in a smart building.
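As a flavour of the Linked Data side of this approach, a single smart-building observation might be published as JSON-LD along these lines. The vocabulary terms (`sosa:`, `saref:`) gesture at the W3C SOSA/SSN and ETSI SAREF ontologies, but treat the exact IRIs, identifiers, and property choices here as illustrative rather than a definitive CPES data model.

```python
import json

# A minimal JSON-LD description of one smart-building power reading.
# The vocabulary terms loosely follow the SOSA/SSN and SAREF ontologies;
# the meter identifiers and the value are made up for illustration.
reading = {
    "@context": {
        "sosa": "http://www.w3.org/ns/sosa/",
        "saref": "https://saref.etsi.org/core/",
    },
    "@id": "urn:building-a:meter-12:obs-001",
    "@type": "sosa:Observation",
    "sosa:madeBySensor": {"@id": "urn:building-a:meter-12"},
    "sosa:observedProperty": {"@id": "saref:Power"},
    "sosa:hasSimpleResult": 4.2,   # kW, by convention of this sketch
    "sosa:resultTime": "2015-06-01T12:00:00Z",
}

doc = json.dumps(reading, indent=2)
print(doc)
```

Because the payload is plain JSON that also parses as RDF, ordinary Web infrastructure (HTTP, caches, message brokers) can carry it between the energy-management systems without a bespoke integration layer.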
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup - Edward Curry
Data management efforts such as Master Data Management and Data Curation are a popular approach to high-quality enterprise data. However, Data Curation can be heavily centralised and labour intensive, and the cost and effort can become prohibitively high. The concentration of data management and stewardship onto a few highly skilled individuals, like developers and data experts, can be a significant bottleneck. This talk explores how to effectively involve a wider community of users in big data management activities. The bottom-up approach of involving crowds in the creation and management of data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. The talk discusses how crowdsourcing data management techniques can be applied within an enterprise context.
Topics covered include:
- Data Quality And Data Curation
- Crowdsourcing
- Case Studies on Crowdsourced Data Curation
- Setting up a Crowdsourced Data Curation Process
- Linked Open Data Example
- Future Research Challenges
Data accessibility and the role of informatics in predicting the biosphere - Alex Hardisty
The variety, distinctiveness and complexity of life – biodiversity in other words and by implication the ecosystems in which it is situated – is our life support system. It is absolutely essential and more important than almost everything else but it is typically taken for granted. Today’s big societal challenges – food and water security, coping with environmental change and aspects of human health – are beyond the abilities of any one individual or research group to solve. Solving them depends not only on collaboration to deliver the appropriate scientific evidence but increasingly on vast amounts of data from multiple sources (environmental, taxonomic, genomic and ecological) gathered by manual observation and automated sensors, digitisation, remote sensing, and genetic sequencing. In April 2012 we called the biodiversity and ecosystems research communities to arms to formulate a consensus view on establishing an infrastructure to improve the accessibility of the ever-increasing volumes of biological data. We published the whitepaper: “A decadal view of biodiversity informatics: challenges and priorities” that has since been viewed more than 24,000 times. We envisage a shared and maintained multi-purpose network of computationally-based processing services sitting on top of an open data domain. By open data domain we mean data that is accessible i.e., published, registered and linked. BioVeL, pro-iBiosphere, ViBRANT and other FP7 funded projects have all explored aspects of this vision.
Results from the FAIR Expert Group Stakeholder Consultation on the FAIR Data ... - EOSCpilot.eu
Turning FAIR into Reality report and action plan by Simon Hodson, Executive Director of CODATA, delivered during the FAIR Data Session at the EOSC Stakeholders Forum 2018
RDMkit, a Research Data Management Toolkit. Built by the Community for the ... - Carole Goble
https://datascience.nih.gov/news/march-data-sharing-and-reuse-seminar 11 March 2022
Starting in 2023, the US National Institutes of Health (NIH) will require institutes and researchers receiving funding to include a Data Management Plan (DMP) in their grant applications, including making their data publicly available. Similar mandates are already in place in Europe; for example, a DMP is mandatory in Horizon Europe projects involving data.
Policy is one thing - practice is quite another. How do we provide the necessary information, guidance and advice for our bioscientists, researchers, data stewards and project managers? There are numerous repositories and standards. Which is best? What are the challenges at each step of the data lifecycle? How should different types of data be handled? What tools are available? Research Data Management advice is often too general to be useful, and specific information is fragmented and hard to find.
ELIXIR, the pan-national European Research Infrastructure for Life Science data, aims to enable research projects to operate “FAIR data first”. ELIXIR supports researchers across their whole RDM lifecycle, navigating the complexity of a data ecosystem that bridges from local cyberinfrastructures to pan-national archives and across bio-domains.
The ELIXIR RDMkit (https://rdmkit.elixir-europe.org) is a toolkit built by the biosciences community, for the biosciences community to provide the RDM information they need. It is a framework for advice and best practice for RDM and acts as a hub of RDM information, with links to tool registries, training materials, standards, and databases, and to services that offer deeper knowledge for DMP planning and FAIR-ification practices.
Since its launch in March 2021, over 120 contributors have provided nearly 100 pages of content and links to more than 300 tools. Content covers the data lifecycle and specialized domains in biology, national considerations and examples of “tool assemblies” developed to support RDM. It has been accessed from over 123 countries, and the top of the access list is … the United States.
The RDMkit is already a recommended resource of the European Commission. The platform, editorial, and contributor methods helped build a specialized sister toolkit for infectious diseases as part of the recently launched BY-COVID project. The toolkit’s platform is the simplest we could manage - built on plain GitHub - and the whole development and contribution approach is tailored to be as lightweight and sustainable as possible.
In this talk, Carole and Frederik will present the RDMkit: its aims and context, content, community management, how folks can contribute, and our future plans and potential prospects for trans-Atlantic cooperation.
Data policy must be partnered with data practice. Our researchers need to be the best informed in order to meet these new data management and data sharing mandates.
If Big Data is data that exceeds the processing capacity of conventional systems, thereby necessitating alternative processing measures, we are looking at an essentially technological challenge that IT managers are best equipped to address.
The DCC is currently working with 18 HEIs to support and develop their capabilities in the management of research data and, whilst the aforementioned challenge is not usually core to their expressed concerns, are there particular issues of curation inherent to Big Data that might force a different perspective?
We have some understanding of Big Data from our contacts in the Astronomy and High Energy Physics domains, and the scale and speed of development in Genomics data generation is well known, but the inability to provide sufficient processing capacity is not one of their more frequent complaints.
That’s not to say that Big Science and its Big Data are free of challenges in data curation; only that they are shared with their lesser cousins, where one might say that the real challenge is less one of size than diversity and complexity.
This brief presentation explores those aspects of data curation that go beyond the challenges of processing power but which may lend a broader perspective to the technology selection process.
Data Harmonization for a Molecularly Driven Health System - Warren Kibbe
Maximizing the value of data, computing and data science in an academic medical center, or ‘towards a molecularly informed Learning Health System’. Given in October at the University of Florida in Gainesville.
At the heart of this DataBench webinar is the goal of sharing a benchmarking process that helps European organisations developing Big Data Technologies reach for excellence and constantly improve their performance, by measuring their technology development activity against parameters of high business relevance.
The webinar aims to provide the audience with a framework and tools to assess the performance and impact of Big Data and AI technologies, drawing on real insights from DataBench. In addition, representatives from other BDV PPP projects, such as DeepHealth and They-Buy-for-You, will participate to share the challenges and opportunities they have identified in the use of Big Data, Analytics and AI. The perspective of other projects that have also looked into benchmarking, such as Track&Now and I-BiDaaS, will be introduced.
BioIT 2018 'Easier integration and enrichment of your data by making public d... - Hans Constandt
Joint presentation by Hans Constandt, CEO, ONTOFORCE, and Chris Evelo, Ph.D., Maastricht University and ELIXIR.
Public data has different levels of FAIRness. The higher the FAIRness level of a data source, the easier it is to use that source for data integration and linking. One of the goals of the intergovernmental organisation ELIXIR is to facilitate better finding and sharing of data and the exchange of expertise in the life sciences. ONTOFORCE focusses on integrating and linking public and private data by bringing data to a higher level of FAIRness. In this joint presentation, we will discuss what ELIXIR is doing to make public data more FAIR, and show examples of the direct benefits for data searching, browsing and visual analytics on the DISQOVER platform when internal, private or third-party data is made more FAIR.
Data Harmonization for a Molecularly Driven Health System - Warren Kibbe
Seminar for Dr. Min Zhang's Purdue Bioinformatics Seminar Series. Touched on learning health systems, the Gen3 Data Commons, the NCI Genomic Data Commons, Data Harmonization, FAIR, and open science.
In the age of Big Data, filtering mechanisms have to be professionalized to increase accessibility to data. This presentation, held at the Knowledge Management Academy in Vienna, shows how technologies derived from the Semantic Web can help to establish more efficient means to manage data and information.
LIBER Webinar: Turning FAIR Data Into Reality - LIBER Europe
These slides relate to a LIBER Webinar given on 23 April 2018. Turning FAIR Data Into Reality — Progress and Plans from the European Commission FAIR Data Expert Group.
In this webinar, Simon Hodson, Executive Director of CODATA and Chair of the FAIR Data Expert Group, and Sarah Jones, Associate Director at the Digital Curation Centre and Rapporteur, reported on the Group’s progress.
Delivering Faster Insights with a Logical Data Fabric - Denodo
Watch full webinar here: https://bit.ly/38B5yOW
We will learn from our speakers how a logical data fabric helps organisations realise faster insights. They will touch on the recent Forrester Total Economic Impact report, as well as discuss real-life customer use cases where a demonstrably faster time to insight helped achieve better decision making, supporting improved business goals.
Presentation investigating the state of FAIR practice and what is needed to turn FAIR data into reality, given at the Danish FAIR conference in Copenhagen on 20th November 2018. https://vidensportal.deic.dk/en/Programme/FAIR_Toolbox_Nov2018 The presentation reflects on recent FAIR studies and international initiatives and outlines the recommendations emerging from the European Commission's FAIR Data Expert Group report - http://tinyurl.com/FAIR-EG
Similar to CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects
CINECA webinar slides: Modular and reproducible workflows for federated molec... - CINECAProject
Genetic analysis of molecular traits such as gene expression, splicing and chromatin accessibility requires a number of complex analysis steps that can easily take weeks or months for an analyst to implement from scratch. In the CINECA project, we have developed a number of modular Nextflow workflows that standardise and automate these steps. In this webinar, we will give an overview of the CINECA workflows for genotype imputation, gene expression and splicing quantification, data normalisation and association testing, and demonstrate how these workflows can be used in a federated setting without transferring identifiable personal data between partners.
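The federated pattern described above can be sketched in a few lines: each site runs association testing locally and shares only summary statistics (effect sizes and standard errors), never individual-level genotypes, and the summaries are then combined centrally. This is an illustrative Python sketch of fixed-effect meta-analysis, not the CINECA Nextflow code itself; the cohort names and numbers are made up.

```python
import math

def meta_analyse(site_results):
    """Fixed-effect inverse-variance meta-analysis of per-site effects.

    site_results: list of (site_name, beta, standard_error) tuples,
    each computed independently at a partner site.
    """
    weights = [1 / se ** 2 for _, _, se in site_results]
    betas = [beta for _, beta, _ in site_results]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled_beta, pooled_se

# Only these summary tuples ever leave each site.
site_results = [
    ("cohort-A", 0.30, 0.10),
    ("cohort-B", 0.24, 0.15),
    ("cohort-C", 0.35, 0.12),
]
beta, se = meta_analyse(site_results)
print(f"pooled beta={beta:.3f} se={se:.3f}")
```

The pooled standard error is smaller than any single site's, which is the statistical payoff of federation: cohorts too small to be informative alone contribute to a well-powered joint estimate without their data ever moving.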
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as to disseminate CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for viewing afterwards.
This webinar took place on 10th November 2020 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
Beacon v2 Reference Implementation: An Overview - CINECAProject
The Beacon v2 Reference Implementation (B2RI) is free, open-source, Linux-based software created by the Centre for Genomic Regulation (Barcelona, Spain) that allows lighting up a Beacon v2 out of the box. In this training session, a B2RI developer will give an overview of how to use the software to “beaconize” your data (from the user’s perspective).
At the end of this training session, participants will be familiar with the input and output requirements of the B2RI, as well as with the types of queries allowed.
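To give a feel for the kind of query a Beacon accepts, here is an example Beacon v2 request body assembled in Python. The field names follow the Beacon v2 request schema as commonly deployed, but the endpoint path and the specific variant are placeholders; check the B2RI documentation for the exact requirements of a given deployment.

```python
import json

# An illustrative Beacon v2 variant query: "does any record carry a
# G>A change at this position on chromosome 17?" The variant and the
# endpoint mentioned below are placeholders, not from a real dataset.
query = {
    "meta": {"apiVersion": "2.0"},
    "query": {
        "requestParameters": {
            "assemblyId": "GRCh38",
            "referenceName": "17",
            "start": [7577120],
            "referenceBases": "G",
            "alternateBases": "A",
        },
        "requestedGranularity": "boolean",  # yes/no answer only
        "filters": [],
    },
}

body = json.dumps(query)
print(body)
# A deployment would POST this body to its genomic-variations
# endpoint, e.g. https://<beacon-host>/api/g_variants
```

The `requestedGranularity` field is what makes Beacon privacy-friendly: a "boolean" response discloses only existence, while "count" and "record" granularities reveal progressively more and can be gated behind authorisation.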
This training session was delivered on 17 February 2022 as part of the CINECA GA4GH Beacon series.
You can learn about the CINECA project on https://www.cineca-project.eu/
Lighting a Beacon: training for (future) implementers - CINECAProject
The Beacon project has received a lot of attention and already has several implementations (e.g., CINECA cohorts, Beacon+). Beacon purposely leaves a lot of room for freedom in the options to be implemented, and its versatility is part of its success. It also means that implementing Beacon can sometimes be quite challenging: this training session presents the steps of a Beacon implementation in general terms, and leaves room for an extended Q&A session with the audience. Questions may be sent in advance so that the use cases can be turned into training exercises.
At the end of this training session, participants should have the tools to get started with an implementation autonomously, and should have identified the resource persons who can answer their questions during the implementation process.
This training session was delivered on 15 February 2022 as part of the CINECA GA4GH Beacon series.
You can learn about the CINECA project on https://www.cineca-project.eu/
CINECA webinar slides: Ethics/ELSI considerations - From FAIR to fair data sh... (CINECA Project)
The FAIR principles – standing for Findability, Accessibility, Interoperability, and Reusability – have become the guiding principles for the wider sharing of research data in the life sciences. While FAIR provides guidance for the management of data as well as tools and workflows, the institutional conditions and organizational challenges associated with data sharing need to be taken into account to ensure responsible and fair data practices. This requires considering the context of legal requirements, for instance the principle of fairness and transparency in GDPR, expectations of research participants/data subjects, societal aspects and the “ethics work” that is an integral part of data flows, as well as fairness, equity and benefit sharing within transnational collaborations, which is of utmost importance. This webinar will, from the perspective of ethical, legal and societal implications (ELSI), discuss this broader context of responsible and fair data sharing associated with FAIR.
The “How FAIR are you” webinar series and hackathon aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 15th April 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
CINECA webinar slides: How to make training FAIR (CINECA Project)
A wealth of training content is developed and delivered across the globe each year; there will be many similar sessions on similar topics, all delivered to similar audiences. At the same time, there will be trainers looking for inspiration and ideas on how to approach new topics or new ways to teach old topics, as well as trainees looking for materials to further their own knowledge. Many trainers (or lecturers, educators, etc.) do not share their materials, or, if materials are shared, they are not easily found or reused by others.
During this webinar, we will give you some tips and suggestions on how you can make more of the training materials you produce and encourage others to do the same. FAIR is not just for data - we can make our training materials FAIR too. Join us to find out the benefits of sharing your FAIR materials and some simple ways you can make it easy for others to use your materials in their teaching, or as aids for individuals to learn more.
The “How FAIR are you” webinar series and hackathon aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 18th March 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
In this talk, I will discuss the importance of the FAIR principles for the software tools we use to process data. Ranging from small analysis scripts to full-fledged data processing pipelines, software needs to be FAIR to enable other researchers to reproduce our experiments and reuse our software. However, software and data are fundamentally different – software is executable in nature and may have intricate dependencies. FAIR principles apply differently to software than they do to data, and we must be aware of these differences. Existing initiatives such as the RDA FAIR for Research Software (FAIR4RS) working group (https://www.rd-alliance.org/groups/fair-4-research-software-fair4rs-wg) and http://fair-software.eu/ are already focused on addressing these differences and raising awareness of the importance of FAIR for software.
The “How FAIR are you” webinar series and hackathon aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 24th February 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
CINECA webinar slides: Making cohort data FAIR (CINECA Project)
Cohort studies, which recruit groups of individuals who share common characteristics and follow them over a period of time, are a robust and essential method in biomedical research for understanding the links between risk factors and diseases. Through questionnaires, medical assessments, and other interactions, voluminous and complex data are collected about the study participants. While cohort studies present a treasure trove of data, the data are often not FAIR (findable, accessible, interoperable and reusable). First, due to the sensitive and private nature of medical information, cohort data are often access controlled, and due to the lack of information about the studies (metadata), one often needs to dig deep to find out what data a cohort study holds; many cohort datasets therefore suffer from findability and accessibility issues. Second, data collection is often performed with instruments and data specifications tailored to the study. As a result, combining data across cohorts, even ones with similar characteristics, is difficult, making interoperability and reusability a challenge. In this presentation, we will explore several informatics techniques, such as the use of ontologies, to make cohort data more FAIR. We will also consider the implications of making cohort data more open and the ethical and governance issues associated with open-science benefit sharing.
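To make the ontology technique concrete, here is a minimal, hypothetical sketch: two cohorts code smoking status differently, and mapping both cohort-specific variables to one shared ontology identifier lets the same concept be queried across cohorts. The variable names, codebooks, and the EFO-style identifier below are illustrative examples, not verified terms from any real cohort.

```python
# Hypothetical codebooks from two cohorts that record the same concept differently.
COHORT_A = {"smk_ever": {1: "yes", 0: "no"}}
COHORT_B = {"smoking_status": {"current": "yes", "former": "yes", "never": "no"}}

# Map each cohort-specific variable to a shared ontology term
# (illustrative EFO-style ID) so the concept is queryable across cohorts.
ONTOLOGY_MAP = {
    ("A", "smk_ever"): "EFO:0006527",
    ("B", "smoking_status"): "EFO:0006527",
}

def harmonise(cohort, variable, raw_value):
    """Return (ontology_term, harmonised_value) for one cohort record."""
    term = ONTOLOGY_MAP[(cohort, variable)]
    codebook = (COHORT_A if cohort == "A" else COHORT_B)[variable]
    return term, codebook[raw_value]

print(harmonise("A", "smk_ever", 1))
print(harmonise("B", "smoking_status", "never"))
```

Once every variable is annotated this way, a federated query can ask for "EFO:0006527 = yes" and each cohort resolves it against its own local coding.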
This webinar is part of the “How FAIR are you” webinar series and hackathon, which aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 17th February 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
CINECA webinar slides: Open science through fair health data networks dream o... (CINECA Project)
Since the FAIR data principles were published in 2016, many organizations, including science funders and governments, have adopted these principles to promote and foster true open science collaborations. However, defining a vision and creating a video of a Personal Health Train that leverages worldwide FAIR health data in a federated manner is one step. Actually making this happen at scale, and being able to show new scientific and medical insights from it, is quite another!
In this webinar, we will dive into the basics of FAIR health data, but also take stock of the current situation in health data networks: after a year of frantic research and collaborations and many open datasets and hackathons on COVID-19, has the situation actually improved? Are we sharing health data on a global scale to improve medical practice, or is quality medical data still only accessible to researchers with the right credentials and deep pockets?
This webinar is part of the “How FAIR are you” webinar series and hackathon, which aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 21st January 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
CINECA webinar slides: Status Update Code of Conduct: Teaming up & Talking ab... (CINECA Project)
Committed to the drafting of a Code of Conduct for the sector of health research according to Art. 40 GDPR, our initiative is advancing slowly but steadily. Throughout Europe, national jurisdictions differ a great deal in their interpretations of the GDPR, especially with regard to its application in health research. This is due to some quite vague provisions (public interest, the not-incompatible clause) as well as to numerous exemption/derogation clauses concerning the use of health data for research purposes, which encourage States to set up national rules – enhancing fragmentation. Notably, a Code of Conduct can help to bridge the harmonization gaps that may exist between Member States in their application of data protection law. On a practical level, a code is potentially a cost-effective method to achieve greater levels of consistency of protection as well as a mechanism to demonstrate compliance with the GDPR. By spring 2020, several hundred individuals representing around 90 organizations in the field of health research had indicated their interest in and support for the Code of Conduct for Health Research. At this stage, this does not yet indicate an endorsement, but it means that they see a benefit in the development of such a code and are interested in partaking in the process. Additionally, several exchanges are taking place with national and sectoral codes in order to use synergies and find ways to collaborate. This webinar is intended to inform you about the latest results.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 1st October 2020 and is part of the CINECA webinar series. It is best viewed in full screen mode using Google Chrome.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
CINECA webinar slides: H3ABioNet Experiences in Phenotype Data Harmonisation ... (CINECA Project)
H3ABioNet supports the H3Africa Consortium in its research data collection and analysis efforts. This includes both genomic and phenotypic data. With the completion of data collection from some of the projects in H3Africa cycle 1 of funding, it became clear midway that some data standardisation/harmonisation would be needed to facilitate meta-analyses, especially with regard to the clinical data collected by the various studies. H3ABioNet, working with the H3Africa Phenotype Harmonisation Working Group, has developed an H3Africa Standard CRF, which newer studies in cycle 2 of funding have taken up for their clinical data collection. We have also supported post-data-collection harmonisation efforts for the H3Africa Cardiovascular Disease Working Group, assisting with harmonising data across 6 different studies. We will be talking about experiences in both standards development in Africa and the data harmonisation methods used.
The CINECA (Common Infrastructure for National Cohorts in Europe, Canada, and Africa) project aims to develop a federated, cloud-enabled infrastructure to make population-scale genomic and biomolecular data accessible across international borders, to accelerate research and improve the health of individuals across continents. CINECA will leverage international investment in human cohort studies from Europe, Canada, and Africa to deliver a paradigm shift of federated research and clinical applications.
This webinar took place on 12th December 2019. Recording of the webinar is available through the CINECA website.
https://www.cineca-project.eu/news-events-all/h3abionet-experiences-in-phenotype-data-harmonisation-and-standards-development
For upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
CINECA webinar slides: Ethical, legal and societal issues in international da... (CINECA Project)
The CINECA webinar series continues with a presentation by Dr. Éloïse Gennet (INSERM) and Dr. Melanie Goisauf (BBMRI-ERIC) on Ethical, Legal and Societal Issues in international data sharing.
The goal of this webinar will be to present the first findings of the ELSI activities in the CINECA project, ranging from questions of the ethics of data sharing across continents to the legal basis of secondary processing of personal data, consent requirements and vulnerable groups, and public and stakeholder attitudes toward sharing of genomic and health-related data for research.
The CINECA (Common Infrastructure for National Cohorts in Europe, Canada, and Africa) project aims to develop a federated, cloud-enabled infrastructure to make population-scale genomic and biomolecular data accessible across international borders, to accelerate research and improve the health of individuals across continents. CINECA will leverage international investment in human cohort studies from Europe, Canada, and Africa to deliver a paradigm shift of federated research and clinical applications.
This webinar took place on 24th January 2020. Recording of the webinar is available through the CINECA website.
https://www.cineca-project.eu/news-events-all/ethical-legal-and-societal-issues-in-international-data-sharing
For upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects
1. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 825775
Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects
Presenter: Tony Burdett (EMBL-EBI)
Host: Marta Lloret Llinares (EMBL-EBI)
4. The challenges:
Common Infrastructure for National Cohorts in Europe, Canada and Africa
The vision: Accelerating disease research and improving health by facilitating transcontinental human data exchange
Stay informed: @CinecaProject · www.cineca-project.eu
This project has received funding from the Canadian Institute of Health Research under grant agreement #404896
5. Today’s presenter
Tony Burdett leads the Archival Infrastructure and Technology team, which develops services and provides technology to support the activities of EMBL-EBI’s molecular archives, including data submission, storage, validation, coordination and presentation.
Tony joined EMBL-EBI in 2005 and has personally built and led development teams for many resources, such as the GWAS Catalog, ArrayExpress, the Expression Atlas and BioSamples. His team now develops the ingestion service for the Human Cell Atlas Data Coordination Platform, EMBL-EBI’s Unified Submission Interface, and the BioSamples database.
6. Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects
Tony Burdett, EMBL-EBI
12th November, 2020
8. A bit about me…
• I joined EBI in 2005
• I have a biological and medical background
• My career has been heavily focused on service engineering in bioinformatics
• I’ve built, helped develop, or run the development teams for…
• ArrayExpress
• Expression Atlas
• BioSamples
• Ontology tooling
• GWAS Catalog
• Human Cell Atlas DCP
9. Data Gravity
I didn’t coin the term...
https://datagravitas.com/2010/12/07/data-gravity-in-the-clouds/
10. Data Gravity
G = VR / BC
“Let data gravity of a given dataset, G, be the product of data volume, V, and the regulatory restrictions of the region in which the data was generated, R, over the bandwidth at the location of the data, B, and the cost of compute in that location, C”
Background photo created by rawpixel.com - www.freepik.com
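Read as an equation, the quoted definition is G = (V × R) / (B × C). A toy calculation (arbitrary units, purely illustrative) shows how the formula behaves:

```python
def data_gravity(volume, regulation, bandwidth, compute_cost):
    """G = VR / BC, per the slide's definition: gravity grows with data
    volume and regulatory restriction, and shrinks when bandwidth is
    plentiful and compute is cheap at the data's location."""
    return (volume * regulation) / (bandwidth * compute_cost)

# Toy comparison: a large, heavily regulated dataset on a slow link is far
# "heavier" than a small open dataset in a well-connected cloud region.
heavy = data_gravity(volume=500, regulation=8, bandwidth=1, compute_cost=2)
light = data_gravity(volume=5, regulation=1, bandwidth=10, compute_cost=1)
print(heavy, light)
```

When G is large, moving compute to the data wins; when G is small, the data itself is cheap to move.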
16. Changing Genomic Data Generation Landscape
Percentage of whole genomes and exomes that are funded solely by healthcare systems:
2012: ~1% · 2017: ~20% · 2022: >80%
25. Bottleneck: FAIR Data
● Data and data sciences are core elements of health research and innovation and of all elements of biopharma research
● The impact and reuse of data is rapidly growing – but nearly 80% of investment is spent assembling and harmonizing data
(Forbes article on the 2016 Data Scientist Report)
26. Impact on innovation
Cost of not having FAIR research data: €26bn/yr in Europe
https://dx.doi.org/10.2777/02999
27. Bottleneck: Data Federation
• National genomics initiatives in most European countries; primary goal: healthcare diagnostics and personalised medicine
• Federated EGA is a harmonised platform for human data discovery, access and distribution, coordinated via the ELIXIR human data community
• Central EGA: international submissions + helpdesk
• Local EGA: host data locally, share metadata; national node for submissions and/or helpdesk
• EGA community: host data locally, share metadata
29. CINECA - Federated Analysis (@CinecaProject)
Data sources: EGA, Biobanks, CHILD, H3ABioNet, ...
WP1 - Federated data discovery: phenotype, genotype, data use
WP2 - AAI: Europe, Canada and Africa interoperability
WP3 - Cohort-level metadata representation
WP4 - Federated research: federated GWAS, federated genomic analyses
30. Sending Compute to Data… Globally?
• Global data storage and analysis infrastructures required
• Generating truly portable analysis workflows is complex - and we don’t have good solutions yet
• Some high powered spacecraft still need building!
33. Human Cell Atlas - profiling millions of human cells
Global effort requiring:
• Hundreds of labs
• Organ-specific data
• Disparate experimental techniques and data types
Integrating data at this scale requires next generation technology and infrastructure
34. Human Cell Atlas Data Coordination Platform
To bridge disparate data, tools and research from all over the world, we must bring them together in a public platform (the “HCA DCP”) that is: Comprehensive, Inclusive, Organized, Dynamic, Accessible.
(Image: Tom Deerinck, NIGMS, NIH)
How it works: the DCP data flow
1. Labs contribute single-cell data
2. DCP pipelines upload authors’ data and process it
3. Researchers access data on the portal
4. Researchers find community tools to work with the data
38. Lessons Learned: “Cloud native” engineering is not enough to change behaviour
• The DCP adopted a heavily “cloud native” engineering approach
• Services are somewhat traditional: a data archive (both raw and summary results) and an analysis pipeline
• Engineered with cloud technology (no impact on users)
• All the data lives in AWS or GCP, in US-East (expensive to download)
• Analysis platform available (but underused)
41-44. Strategic Implications
Data Gravity in the life sciences tells us we need a culture change
Federating data and analysis requires:
1. Standards
2. Data provider adoption
3. Data consumer adoption
4. Understanding and considering data gravity
SKILLS · INCENTIVES · COSTS
46. FAIR as enabler for the digital transformation
Credit to: Ian Harrow, FAIR & OM projects; slide credit: Susanna Sansone
● Data providers improve their own returns by implementing the FAIR Principles – gathering traction in big pharma
● FAIR enables powerful new AI analytics to access data for machine learning and prediction
● Requirements: financial, technical, training
● Challenges: change the culture, show business value, achieve ‘FAIR enough’, sustain FAIR solutions and activities
48. Top Tips: Driving Data Consumer Adoption
1. Identify good measures of value
• What can I do faster, cheaper, better?
• How many people are using your cloud platform vs downloading data?
2. Start small and expand
• Big re-engineering efforts are costly, risky, and too slow to keep up with the rate of change in the field
3. Find some exemplars
• Are there smaller sets of data that are high value?
• Can you pilot approaches within communities?
4. Invest in training and outreach
• Even if data is federated and the cloud platform exists, many bioinformaticians do not have the skills to exploit them
50. Data Gravity
G = VR / BC
“Let data gravity of a given dataset, G, be the product of data volume, V, and the regulatory restrictions of the region in which the data was generated, R, over the bandwidth at the location of the data, B, and the cost of compute in that location, C”
53. Questions?
Title: Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects
Presenter: Tony Burdett
Please write your questions in the questions window of the GoToWebinar application.