This project has received funding from the European Union’s Horizon 2020 research and
Innovation programme under grant agreement No. 825775
Data Gravity in the Life Sciences: Lessons learned
from the Human Cell Atlas and other federated data
projects
Presenter: Tony Burdett (EMBL-EBI)
Host: Marta Lloret Llinares (EMBL-EBI)
This webinar is being recorded
Audience Q&A Session
Please write your questions in the questions window of the GoToWebinar application
The challenges:
Stay informed: @CinecaProject, www.cineca-project.eu
Common Infrastructure for National Cohorts in Europe, Canada and Africa
The vision: Accelerating disease research and improving health by facilitating transcontinental human data exchange
This project has received funding from the Canadian Institute of Health
Research under grant agreement #404896
Today’s presenter
Tony Burdett leads the Archival Infrastructure and Technology team,
which develops services and provides technology to support the
activities of EMBL-EBI’s molecular archives, including data submission,
storage, validation, coordination and presentation.
Tony joined EMBL-EBI in 2005 and has personally built and led
development teams for many resources such as the GWAS Catalog,
ArrayExpress, the Expression Atlas and BioSamples. His team now
develops the ingestion service for the Human Cell Atlas Data
Coordination Platform, EMBL-EBI’s Unified Submission Interface, and the
BioSamples database.
Data Gravity in the Life Sciences
Lessons learned from the Human Cell Atlas and other federated data projects
Tony Burdett, EMBL-EBI
12th November, 2020
A bit about me…
• I joined EBI in 2005
• I have a biological and medical background
• My career has been heavily focused on service engineering in bioinformatics
• I’ve built, helped develop, or run the development teams for…
  • ArrayExpress
  • Expression Atlas
  • BioSamples
  • Ontology tooling
  • GWAS Catalog
  • Human Cell Atlas DCP
Data Gravity
I didn’t coin the term...
https://datagravitas.com/2010/12/07/data-gravity-in-the-clouds/
$$G = \frac{V R}{B C}$$
“Let data gravity of a given dataset, G, be the product of data volume, V and the regulatory restrictions of the
region in which the data was generated, R, over the bandwidth at the location of the data, B, and the cost of
compute in that location, C”
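As a rough illustration of the definition above, the following sketch computes G for two made-up datasets. The units, the restriction scores and the example values are all assumptions chosen for illustration, not figures from the talk; because the units are arbitrary, only the relative comparison between datasets is meaningful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    volume: float        # V: data volume (hypothetical unit: TB)
    restriction: float   # R: regulatory restriction score (hypothetical scale, 1 = open)
    bandwidth: float     # B: bandwidth at the data's location (hypothetical unit: Gbit/s)
    compute_cost: float  # C: relative cost of compute at that location (hypothetical)


def data_gravity(d: Dataset) -> float:
    """Return G = (V * R) / (B * C), following the informal definition quoted above."""
    if d.bandwidth <= 0 or d.compute_cost <= 0:
        raise ValueError("bandwidth and compute_cost must be positive")
    return (d.volume * d.restriction) / (d.bandwidth * d.compute_cost)


# Two hypothetical datasets of equal size: one open and well connected,
# one under tighter regulation and behind a slower link.
open_reference = Dataset("open reference set", 100.0, 1.0, 10.0, 1.0)
controlled_cohort = Dataset("controlled-access cohort", 100.0, 5.0, 1.0, 1.0)

for d in (open_reference, controlled_cohort):
    print(f"{d.name}: G = {data_gravity(d):.1f}")
```

Under these assumed values the controlled-access cohort has the larger G, which matches the intuition that regulated data behind a slow link is harder to move.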
Data Gravity
Background photo created by rawpixel.com - www.freepik.com
Why does “data gravity” matter?
Who uses EMBL-EBI services?
Changing Genomic Data Generation Landscape
Percentage of whole genomes and exomes that are funded solely by healthcare systems:
  2012: ~1%
  2017: ~20%
  2022: >80%
Data Gravity
Big Data in Digital Biology: EMBL-EBI 2015-2019
Public Web Infrastructure
• Web Requests: 27M → 40M/day
• Unique Host IPs: 1.1M → 2.4M/month
• Web Jobs: 138M → 145M/year
• Search Requests: 272M → 551M/year
6.2 PB → 22.7 PB
1600 VMs → 3100 VMs
450 TB → 973 TB
Slide acknowledgment: Steven Newhouse
Data Gravity
Collating Data for Analysis
Data being analysed
Cohort datasets
Reference annotation datasets
Proprietary, firewalled datasets
Bottlenecks and Barriers
FEDERATED DATA
FEDERATED WORKFLOW EXECUTION
GLOBAL FEDERATED RESEARCH PLATFORM
Bottleneck: FAIR Data
● Data and data science are core elements of health research and innovation, and of all areas of biopharma research
● The impact and reuse of data are growing rapidly, but nearly 80% of investment is spent assembling and harmonizing data (source: Forbes article on 2016 Data Scientist Report)
Cost of not having FAIR research data: €26bn/yr in Europe
https://dx.doi.org/10.2777/02999
Impact on innovation
Bottleneck: Data Federation
• National genomics initiatives in most European countries
• Primary goal: healthcare diagnostics and personalised medicine
• Federated EGA is a harmonised platform for human data discovery, access and distribution, coordinated via the ELIXIR human data community
• Central EGA: International submissions and helpdesk
• Local EGA: Host data locally, share metadata, national node for submissions and/or helpdesk
• EGA community: Host data locally, share metadata
Bottleneck: Reproducible Research and Analysis
Figure courtesy of: https://esciencelab.org.uk/projects/eosclife/
CINECA - Federated Analysis
Data sources: EGA, Biobanks, CHILD, H3ABioNet, ..
WP1: Federated data discovery
  - Phenotype
  - Genotype
  - Data use
WP2: AAI
  - Europe, Canada, Africa interoperability
WP3: Cohort Level Meta Data Representation
WP4: Federated research
  - Federated GWAS
  - Federated Genomic Analyses
Sending Compute to Data… Globally?
• Global data storage and analysis
infrastructures required
• Generating truly portable analysis
workflows is complex - and we
don’t have good solutions yet
• Some high-powered spacecraft still need building!
Overcoming Data Gravity
DEPENDS ON...
Costs of compute
Network bandwidth
Data sharing regulations
Data volumes
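Two of these factors, data volume and network bandwidth, combine into a simple back-of-envelope estimate of how long it takes to move a dataset to the compute. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Gbit/s = 10^9 bits per second) and a link that sustains its nominal rate, which real transfers rarely do; the function name and example values are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(volume_tb: float, bandwidth_gbit_s: float) -> float:
    """Days needed to move volume_tb terabytes over a link that sustains
    bandwidth_gbit_s gigabits per second (decimal units, no protocol overhead)."""
    if volume_tb < 0:
        raise ValueError("volume_tb must be non-negative")
    if bandwidth_gbit_s <= 0:
        raise ValueError("bandwidth_gbit_s must be positive")
    bits = volume_tb * 1e12 * 8            # terabytes -> bits
    seconds = bits / (bandwidth_gbit_s * 1e9)
    return seconds / SECONDS_PER_DAY


# Hypothetical example: a 1000 TB (1 PB) dataset over 10 Gbit/s and 1 Gbit/s links.
for gbit_s in (10.0, 1.0):
    print(f"1000 TB at {gbit_s:g} Gbit/s: {transfer_days(1000.0, gbit_s):.1f} days")
```

Even under these optimistic assumptions a petabyte takes more than a week on a 10 Gbit/s link, which is one reason sending compute to the data becomes attractive as volumes grow.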
“Cloud native” is the answer!
Human Cell Atlas - profiling millions of human cells
Global effort requiring:
• Hundreds of labs
• Organ-specific data
• Disparate experimental
techniques and data types
Integrating data at this scale
requires next generation
technology and infrastructure
Genetics
Tom Deerinck, NIGMS, NIH
Human Cell Atlas Data Coordination Platform
To bridge disparate data, tools and research from all over the world, we must bring them together in a public platform (the “HCA DCP”) that is:
• Comprehensive
• Inclusive
• Organized
• Dynamic
• Accessible
How it works: the DCP data flow
1. Labs contribute single-cell data
2. DCP pipelines upload authors’ data and process it
3. Researchers access data on the portal
4. Researchers find community tools to work with the data
HCA DCP Architecture
Outcomes
• Downloads (Metadata)
• Downloads (Raw and Analysed Data)
• Checkout to Terra (to work on in analysis platform)
HCA DCP Data Browser statistics from Q3 2020, from a total of 2671 data access requests
“Cloud native” engineering is not enough to change behaviour
Lessons Learned
• The DCP adopted a heavily “cloud native” engineering approach
• Services are somewhat traditional
  • Data archive (both raw and summary results)
  • Analysis pipeline
• Engineered with cloud technology (has no impact on users)
• All the data lives in AWS or GCP, in US-East (expensive to download)
• Analysis platform available (but underused)
Strategic Implications
Data Gravity in the life sciences tells us we need a culture change
Federating data and analysis requires:
1. Standards
2. Data provider adoption
3. Data consumer adoption
4. Understanding and considering data gravity
SKILLS
INCENTIVES
COSTS
Credit to: Ian Harrow, FAIR & OM projects
FAIR as enabler for the digital transformation
Slide credit: Susanna Sansone
● Data providers improve their own returns
by implementing the FAIR Principles -
gathering traction in big pharma
● FAIR enables powerful new AI analytics to
access data for machine learning and
prediction
● Requirements
○ financial, technical, training
● Challenges
○ change the culture, show business value,
achieve the ‘FAIR enough’
○ Sustain FAIR solutions and activities
COVID-19 Data Portals
https://www.covid19dataportal.org/
https://covidhub.psnc.pl/
https://covid19dataportal.se/sv/
https://covid19dataportal.jp/
Top Tips: Driving Data Consumer Adoption
1. Identify good measures of value
• What can I do faster, cheaper, better?
• How many people are using your cloud platform vs downloading data?
2. Start small and expand
• Big re-engineering efforts are costly, risky, and too slow to keep up with
the rate of change in the field
3. Find some exemplars
• Are there smaller sets of data that are high value?
• Can you pilot approaches within communities?
4. Invest in training and outreach
• Even if data is federated and the cloud platform exists, many
bioinformaticians do not have the skills to exploit them
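The first tip asks how many people use the cloud platform compared with downloading data. One minimal way to turn that into a tracked measure is to compute the share of each access mode from an access log. The sketch below assumes a log reduced to a list of mode labels; the labels and counts are invented for illustration and are not the HCA DCP statistics.

```python
from collections import Counter


def access_mode_shares(events: list[str]) -> dict[str, float]:
    """Return the fraction of access events per mode (empty dict if no events)."""
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {mode: n / total for mode, n in counts.items()}


# Hypothetical log: nine downloads for every checkout to a cloud analysis platform.
events = ["download"] * 9 + ["cloud_checkout"] * 1
shares = access_mode_shares(events)

for mode, share in shares.items():
    print(f"{mode}: {share:.0%}")
```

Tracking this ratio over time gives a concrete signal of whether training and outreach are shifting consumers towards the platform.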
Data Gravity
$$G = \frac{V R}{B C}$$
“Let data gravity of a given dataset, G, be the product of data volume, V and the regulatory restrictions of the
region in which the data was generated, R, over the bandwidth at the location of the data, B, and the cost of
compute in that location, C”
Data Gravity
The AIT Team at EMBL-EBI
Acknowledgements
Questions?
Title: Data Gravity in the Life Sciences: Lessons learned from the
Human Cell Atlas and other federated data projects
Presenter: Tony Burdett
Please write your questions in the
questions window of the GoToWebinar
application

More Related Content

What's hot

Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Edward Curry
 
Ticer summer school_24_aug06
Ticer summer school_24_aug06Ticer summer school_24_aug06
Ticer summer school_24_aug06
SayDotCom.com
 
SKA NZ R&D BeSTGRID Infrastructure
SKA NZ R&D BeSTGRID InfrastructureSKA NZ R&D BeSTGRID Infrastructure
SKA NZ R&D BeSTGRID Infrastructure
Nick Jones
 
Supporting the community-owned open scholarly communications ecosystem
Supporting the community-owned open scholarly communications ecosystemSupporting the community-owned open scholarly communications ecosystem
Supporting the community-owned open scholarly communications ecosystem
Jisc
 
Jisc's new shared data centre
Jisc's new shared data centreJisc's new shared data centre
Jisc's new shared data centre
Jisc
 
Supporting Research Data Management at the University of Stirling
Supporting Research Data Management at the University of StirlingSupporting Research Data Management at the University of Stirling
Supporting Research Data Management at the University of StirlingLisa Haddow
 
Global Research Data Initiatives
Global Research Data InitiativesGlobal Research Data Initiatives
Global Research Data Initiatives
Sarah Jones
 
Creating a Data Management Plan for your Grant Application
Creating a Data Management Plan for your Grant ApplicationCreating a Data Management Plan for your Grant Application
Creating a Data Management Plan for your Grant Application
Historic Environment Scotland
 
Data quality supporting AI in Life Sciences webinar 10 dec 2018
Data quality supporting AI in Life Sciences webinar 10 dec 2018Data quality supporting AI in Life Sciences webinar 10 dec 2018
Data quality supporting AI in Life Sciences webinar 10 dec 2018
Pistoia Alliance
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Edward Curry
 
091020 E Research Otago
091020 E Research Otago091020 E Research Otago
091020 E Research Otago
Nick Jones
 
Data accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphereData accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphere
Alex Hardisty
 
Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016
Pistoia Alliance
 
Data discovery and sharing at UCLH
Data discovery and sharing at UCLHData discovery and sharing at UCLH
Data discovery and sharing at UCLH
Jisc
 
FAIR data
FAIR dataFAIR data
FAIR data
Sarah Jones
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, London
Paul Agapow
 
David Park APAN Slid..
David Park APAN Slid..David Park APAN Slid..
David Park APAN Slid..Videoguy
 
Results from the FAIR Expert Group Stakeholder Consultation on the FAIR Data ...
Results from the FAIR Expert Group Stakeholder Consultation on the FAIR Data ...Results from the FAIR Expert Group Stakeholder Consultation on the FAIR Data ...
Results from the FAIR Expert Group Stakeholder Consultation on the FAIR Data ...
EOSCpilot .eu
 
Building and Operating National Open Science Research Infrastructures - the e...
Building and Operating National Open Science Research Infrastructures - the e...Building and Operating National Open Science Research Infrastructures - the e...
Building and Operating National Open Science Research Infrastructures - the e...
African Open Science Platform
 
Application of Assent in the safe - Networkshop44
Application of Assent in the safe -  Networkshop44Application of Assent in the safe -  Networkshop44
Application of Assent in the safe - Networkshop44
Jisc
 

What's hot (20)

Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
 
Ticer summer school_24_aug06
Ticer summer school_24_aug06Ticer summer school_24_aug06
Ticer summer school_24_aug06
 
SKA NZ R&D BeSTGRID Infrastructure
SKA NZ R&D BeSTGRID InfrastructureSKA NZ R&D BeSTGRID Infrastructure
SKA NZ R&D BeSTGRID Infrastructure
 
Supporting the community-owned open scholarly communications ecosystem
Supporting the community-owned open scholarly communications ecosystemSupporting the community-owned open scholarly communications ecosystem
Supporting the community-owned open scholarly communications ecosystem
 
Jisc's new shared data centre
Jisc's new shared data centreJisc's new shared data centre
Jisc's new shared data centre
 
Supporting Research Data Management at the University of Stirling
Supporting Research Data Management at the University of StirlingSupporting Research Data Management at the University of Stirling
Supporting Research Data Management at the University of Stirling
 
Global Research Data Initiatives
Global Research Data InitiativesGlobal Research Data Initiatives
Global Research Data Initiatives
 
Creating a Data Management Plan for your Grant Application
Creating a Data Management Plan for your Grant ApplicationCreating a Data Management Plan for your Grant Application
Creating a Data Management Plan for your Grant Application
 
Data quality supporting AI in Life Sciences webinar 10 dec 2018
Data quality supporting AI in Life Sciences webinar 10 dec 2018Data quality supporting AI in Life Sciences webinar 10 dec 2018
Data quality supporting AI in Life Sciences webinar 10 dec 2018
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
 
091020 E Research Otago
091020 E Research Otago091020 E Research Otago
091020 E Research Otago
 
Data accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphereData accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphere
 
Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016
 
Data discovery and sharing at UCLH
Data discovery and sharing at UCLHData discovery and sharing at UCLH
Data discovery and sharing at UCLH
 
FAIR data
FAIR dataFAIR data
FAIR data
 
eTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, LondoneTRIKS at Pharma IT 2017, London
eTRIKS at Pharma IT 2017, London
 
David Park APAN Slid..
David Park APAN Slid..David Park APAN Slid..
David Park APAN Slid..
 
Results from the FAIR Expert Group Stakeholder Consultation on the FAIR Data ...
Results from the FAIR Expert Group Stakeholder Consultation on the FAIR Data ...Results from the FAIR Expert Group Stakeholder Consultation on the FAIR Data ...
Results from the FAIR Expert Group Stakeholder Consultation on the FAIR Data ...
 
Building and Operating National Open Science Research Infrastructures - the e...
Building and Operating National Open Science Research Infrastructures - the e...Building and Operating National Open Science Research Infrastructures - the e...
Building and Operating National Open Science Research Infrastructures - the e...
 
Application of Assent in the safe - Networkshop44
Application of Assent in the safe -  Networkshop44Application of Assent in the safe -  Networkshop44
Application of Assent in the safe - Networkshop44
 

Similar to CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects

RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
Carole Goble
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
Eduserv
 
EMBL Australian Bioinformatics Resource AHM - Data Commons
EMBL Australian Bioinformatics Resource AHM   - Data CommonsEMBL Australian Bioinformatics Resource AHM   - Data Commons
EMBL Australian Bioinformatics Resource AHM - Data Commons
Vivien Bonazzi
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
African Open Science Platform
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
Warren Kibbe
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon Hodson
African Open Science Platform
 
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for BenchmarkingVirtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Big Data Value Association
 
BioIT 2018 'Easier integration and enrichment of your data by making public d...
BioIT 2018 'Easier integration and enrichment of your data by making public d...BioIT 2018 'Easier integration and enrichment of your data by making public d...
BioIT 2018 'Easier integration and enrichment of your data by making public d...
Hans Constandt
 
I o dav data workshop prof wafula final 19.9.17
I o dav data workshop prof wafula final 19.9.17I o dav data workshop prof wafula final 19.9.17
I o dav data workshop prof wafula final 19.9.17
Tom Nyongesa
 
Open data pilot
Open data pilotOpen data pilot
Open data pilot
Sarah Jones
 
Cyber Infrastructure for Research & Education in Canada. What is Canada's vis...
Cyber Infrastructure for Research & Education in Canada. What is Canada's vis...Cyber Infrastructure for Research & Education in Canada. What is Canada's vis...
Cyber Infrastructure for Research & Education in Canada. What is Canada's vis...
CANARIE Inc.
 
Why manage research data?
Why manage research data?Why manage research data?
Why manage research data?
Graham Pryor
 
Turning FAIR data into reality
Turning FAIR data into realityTurning FAIR data into reality
Turning FAIR data into reality
Sarah Jones
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
African Open Science Platform
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
Warren Kibbe
 
Data Strategies: Metadata, Open Data, Linked Data
Data Strategies: Metadata, Open Data, Linked DataData Strategies: Metadata, Open Data, Linked Data
Data Strategies: Metadata, Open Data, Linked Data
Semantic Web Company
 
LIBER Webinar: Turning FAIR Data Into Reality
LIBER Webinar: Turning FAIR Data Into RealityLIBER Webinar: Turning FAIR Data Into Reality
LIBER Webinar: Turning FAIR Data Into Reality
LIBER Europe
 
Delivering Faster Insights with a Logical Data Fabric
Delivering Faster Insights with a Logical Data FabricDelivering Faster Insights with a Logical Data Fabric
Delivering Faster Insights with a Logical Data Fabric
Denodo
 
FAIR play?
FAIR play? FAIR play?
FAIR play?
Sarah Jones
 

Similar to CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects (20)

RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
 
EMBL Australian Bioinformatics Resource AHM - Data Commons
EMBL Australian Bioinformatics Resource AHM   - Data CommonsEMBL Australian Bioinformatics Resource AHM   - Data Commons
EMBL Australian Bioinformatics Resource AHM - Data Commons
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon Hodson
 
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for BenchmarkingVirtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
 
BioIT 2018 'Easier integration and enrichment of your data by making public d...
BioIT 2018 'Easier integration and enrichment of your data by making public d...BioIT 2018 'Easier integration and enrichment of your data by making public d...
BioIT 2018 'Easier integration and enrichment of your data by making public d...
 
I o dav data workshop prof wafula final 19.9.17
I o dav data workshop prof wafula final 19.9.17I o dav data workshop prof wafula final 19.9.17
I o dav data workshop prof wafula final 19.9.17
 
Open data pilot
Open data pilotOpen data pilot
Open data pilot
 
Cyber Infrastructure for Research & Education in Canada. What is Canada's vis...
Cyber Infrastructure for Research & Education in Canada. What is Canada's vis...Cyber Infrastructure for Research & Education in Canada. What is Canada's vis...
Cyber Infrastructure for Research & Education in Canada. What is Canada's vis...
 
Why manage research data?
Why manage research data?Why manage research data?
Why manage research data?
 
Turning FAIR data into reality
Turning FAIR data into realityTurning FAIR data into reality
Turning FAIR data into reality
 
Wiser2009 Luis Martinez
Wiser2009 Luis MartinezWiser2009 Luis Martinez
Wiser2009 Luis Martinez
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Data Strategies: Metadata, Open Data, Linked Data
Data Strategies: Metadata, Open Data, Linked DataData Strategies: Metadata, Open Data, Linked Data
Data Strategies: Metadata, Open Data, Linked Data
 
LIBER Webinar: Turning FAIR Data Into Reality
LIBER Webinar: Turning FAIR Data Into RealityLIBER Webinar: Turning FAIR Data Into Reality
LIBER Webinar: Turning FAIR Data Into Reality
 
Delivering Faster Insights with a Logical Data Fabric
Delivering Faster Insights with a Logical Data FabricDelivering Faster Insights with a Logical Data Fabric
Delivering Faster Insights with a Logical Data Fabric
 
FAIR play?
FAIR play? FAIR play?
FAIR play?
 

More from CINECAProject

CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECAProject
 
Beacon v2 Reference Implementation: An Overview
Beacon v2 Reference Implementation: An OverviewBeacon v2 Reference Implementation: An Overview
Beacon v2 Reference Implementation: An Overview
CINECAProject
 
Lighting a Beacon: training for (future) implementers
Lighting a Beacon: training for (future) implementersLighting a Beacon: training for (future) implementers
Lighting a Beacon: training for (future) implementers
CINECAProject
 
CINECA webinar slides: Ethics/ELSI considerations - From FAIR to fair data sh...
CINECA webinar slides: Ethics/ELSI considerations - From FAIR to fair data sh...CINECA webinar slides: Ethics/ELSI considerations - From FAIR to fair data sh...
CINECA webinar slides: Ethics/ELSI considerations - From FAIR to fair data sh...
CINECAProject
 
CINECA webinar slides: How to make training FAIR
CINECA webinar slides: How to make training FAIRCINECA webinar slides: How to make training FAIR
CINECA webinar slides: How to make training FAIR
CINECAProject
 
CINECA webinar slides: FAIR software tools
CINECA webinar slides: FAIR software toolsCINECA webinar slides: FAIR software tools
CINECA webinar slides: FAIR software tools
CINECAProject
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIR
CINECAProject
 
CINECA webinar slides: Open science through fair health data networks dream o...
CINECA webinar slides: Open science through fair health data networks dream o...CINECA webinar slides: Open science through fair health data networks dream o...
CINECA webinar slides: Open science through fair health data networks dream o...
CINECAProject
 
CINECA webinar slides: Status Update Code of Conduct: Teaming up & Talking ab...
CINECA webinar slides: Status Update Code of Conduct: Teaming up & Talking ab...CINECA webinar slides: Status Update Code of Conduct: Teaming up & Talking ab...
CINECA webinar slides: Status Update Code of Conduct: Teaming up & Talking ab...
CINECAProject
 
CINECA webinar slides: H3ABioNet Experiences in Phenotype Data Harmonisation ...
CINECA webinar slides: H3ABioNet Experiences in Phenotype Data Harmonisation ...CINECA webinar slides: H3ABioNet Experiences in Phenotype Data Harmonisation ...
CINECA webinar slides: H3ABioNet Experiences in Phenotype Data Harmonisation ...
CINECAProject
 
CINECA webinar slides: Ethical, legal and societal issues in international da...
CINECA webinar slides: Ethical, legal and societal issues in international da...CINECA webinar slides: Ethical, legal and societal issues in international da...
CINECA webinar slides: Ethical, legal and societal issues in international da...
CINECAProject
 

More from CINECAProject (11)

CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...
 
Beacon v2 Reference Implementation: An Overview
Beacon v2 Reference Implementation: An OverviewBeacon v2 Reference Implementation: An Overview
Beacon v2 Reference Implementation: An Overview
 
Lighting a Beacon: training for (future) implementers
CINECA webinar slides: Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects

  • 1. This project has received funding from the European Union’s Horizon 2020 research and Innovation programme under grant agreement No. 825775 Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects Presenter: Tony Burdett (EMBL-EBI) Host: Marta Lloret Llinares (EMBL-EBI)
  • 2. This webinar is being recorded
  • 3. Audience Q&A Session Please write your questions in the questions window of the GoToWebinar application
  • 4. Common Infrastructure for National Cohorts in Europe, Canada and Africa The vision: Accelerating disease research and improving health by facilitating transcontinental human data exchange The challenges: Stay informed: @CinecaProject www.cineca-project.eu This project has received funding from the European Union’s Horizon 2020 research and Innovation programme under grant agreement No. 825775 This project has received funding from the Canadian Institute of Health Research under grant agreement #404896
  • 5. Today’s presenter Tony Burdett leads the Archival Infrastructure and Technology team, which develops services and provides technology to support the activities of EMBL-EBI’s molecular archives, including data submission, storage, validation, coordination and presentation. Tony joined EMBL-EBI in 2005 and has personally built and led development teams for many resources such as the GWAS Catalog, ArrayExpress, the Expression Atlas and BioSamples. His team now develops the ingestion service for the Human Cell Atlas Data Coordination Platform, EMBL-EBI’s Unified Submission Interface, and the BioSamples database.
  • 6. Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects Tony Burdett, EMBL-EBI 12th November, 2020
  • 8. A bit about me… • I joined EBI in 2005 • I have a biological and medical background • My career has been heavily focused on service engineering in bioinformatics • I’ve built, helped develop, or run the development teams for… • ArrayExpress • Expression Atlas • BioSamples • Ontology tooling • GWAS Catalog • Human Cell Atlas DCP
  • 9. Data Gravity I didn’t coin the term... https://datagravitas.com/2010/12/07/data-gravity-in-the-clouds/
  • 10. Data Gravity $G = \frac{V \cdot R}{B \cdot C}$ “Let data gravity of a given dataset, G, be the product of data volume, V, and the regulatory restrictions of the region in which the data was generated, R, over the bandwidth at the location of the data, B, and the cost of compute in that location, C” Background photo created by rawpixel.com - www.freepik.com
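The definition on this slide can be sketched as a small function. This is only an illustration: the slide gives no units or scales for V, R, B or C, so the function name, the parameter names and every number below are hypothetical, and the result is best read as a relative score for comparing datasets rather than a measured quantity.

```python
def data_gravity(volume: float, regulation: float,
                 bandwidth: float, compute_cost: float) -> float:
    """Return G = (V * R) / (B * C), following the slide's definition.

    All four inputs are unitless illustrative scores (an assumption;
    the slide does not define units or scales).
    """
    if bandwidth <= 0 or compute_cost <= 0:
        raise ValueError("bandwidth and compute_cost must be positive")
    return (volume * regulation) / (bandwidth * compute_cost)


# Hypothetical comparison: a large, tightly regulated cohort dataset
# versus a small, open reference dataset hosted at the same site.
cohort = data_gravity(volume=500, regulation=3, bandwidth=10, compute_cost=1.5)
reference = data_gravity(volume=5, regulation=1, bandwidth=10, compute_cost=1.5)

if __name__ == "__main__":
    print(f"cohort gravity:    {cohort:.2f}")
    print(f"reference gravity: {reference:.2f}")
```

With these made-up inputs the cohort dataset scores far higher than the reference dataset, which matches the slide's intuition that large, heavily regulated data is the hardest to move.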
  • 14. Why does “data gravity” matter?
  • 15. Who uses EMBL-EBI services?
  • 16. Changing Genomic Data Generation Landscape Percentage of whole genomes and exomes that are funded solely by healthcare systems: • 2012: ~1% • 2017: ~20% • 2022: >80%
  • 20. Big Data in Digital Biology: EMBL-EBI 2015-2019 Public Web Infrastructure • Web Requests: 27M → 40M/day • Unique Host IPs: 1.1M → 2.4M/month • Web Jobs: 138M → 145M/year • Search Requests: 272M → 551M/year 6.2PB → 22.7PB 1600VMs → 3100VMs (TB) 450TB → 973TB Slide acknowledgment: Steven Newhouse
  • 23. Collating Data for Analysis • Data being analysed • Cohort datasets • Reference annotation datasets • Proprietary, firewalled datasets
  • 25. Bottleneck: FAIR Data ● Data and Data Sciences are core elements of Health Research and Innovation and in all elements of Biopharma Research ● The impact and reuse of data is rapidly growing - but nearly 80% of investment is spent assembling and harmonizing data Source: Forbes article on 2016 Data Scientist Report
  • 26. Cost of not having FAIR research data: €26bn/yr in Europe https://dx.doi.org/10.2777/02999 Impact on innovation
  • 27. Bottleneck: Data Federation • National genomics initiatives in most European countries • Primary goal healthcare diagnostics and personalised medicine • Federated EGA is a harmonised platform for human data discovery, access, distribution, coordinated via ELIXIR human data community • Central EGA: International submissions+helpdesk • Local EGA: Host data locally, share metadata, national node for submissions and/or helpdesk • EGA community: Host data locally, share metadata
  • 28. Bottleneck: Reproducible Research and Analysis Figure courtesy of: https://esciencelab.org.uk/projects/eosclife/
  • 29. CINECA - Federated Analysis @CinecaProject • Data sources: EGA, Biobanks, CHILD, H3ABioNet, .. • WP1 Federated data discovery: Phenotype, Genotype, Data use • WP2 AAI: Europe, Canada, Africa interoperability • WP3 Cohort Level Meta Data Representation • WP4 Federated research: Federated GWAS, Federated Genomic Analyses
  • 30. Sending Compute to Data… Globally? • Global data storage and analysis infrastructures required • Generating truly portable analysis workflows is complex - and we don’t have good solutions yet • Some high powered spacecraft still need building!
  • 31. Overcoming Data Gravity DEPENDS ON... • Costs of compute • Network bandwidth • Data sharing regulations • Data volumes
  • 32. “Cloud native” is the answer!
  • 33. Human Cell Atlas - profiling millions of human cells Global effort requiring: • Hundreds of labs • Organ-specific data • Disparate experimental techniques and data types Integrating data at this scale requires next generation technology and infrastructure
  • 34. Human Cell Atlas Data Coordination Platform To bridge disparate data, tools and research from all over the world, we must bring them together in a public platform (the “HCA DCP”) that is: • Comprehensive • Inclusive • Organized • Dynamic • Accessible Image: Tom Deerinck, NIGMS, NIH
  • 35. How it works: the DCP data flow • Labs contribute single-cell data • DCP pipelines upload and process authors’ data • Researchers access data on the portal • Researchers find community tools to work with the data
  • 37. Outcomes HCA DCP Data Browser statistics from Q3 2020, from a total of 2671 data access requests: • Downloads (Metadata) • Downloads (Raw and Analysed Data) • Checkout to Terra (to work on in analysis platform)
  • 38. Lessons Learned: “Cloud native” engineering is not enough to change behaviour • The DCP adopted a heavily “cloud native” engineering approach • Services are somewhat traditional • Data archive (both raw and summary results) • Analysis pipeline • Engineered with cloud technology (has no impact on users) • All the data lives in AWS or GCP, in US-East (expensive to download) • Analysis platform available (but underused)
  • 44. Strategic Implications Data Gravity in the life sciences tells us we need a culture change Federating data and analysis requires: 1. Standards 2. Data provider adoption 3. Data consumer adoption 4. Understanding and considering data gravity SKILLS INCENTIVES COSTS
  • 46. FAIR as enabler for the digital transformation ● Data providers improve their own returns by implementing the FAIR Principles - gaining traction in big pharma ● FAIR enables powerful new AI analytics to access data for machine learning and prediction ● Requirements ○ Financial, technical, training ● Challenges ○ Change the culture, show business value, achieve the ‘FAIR enough’ ○ Sustain FAIR solutions and activities Credit to: Ian Harrow, FAIR & OM projects Slide credit: Susanna Sansone
  • 48. Top Tips: Driving Data Consumer Adoption 1. Identify good measures of value • What can I do faster, cheaper, better? • How many people are using your cloud platform vs downloading data? 2. Start small and expand • Big re-engineering efforts are costly, risky, and too slow to keep up with the rate of change in the field 3. Find some exemplars • Are there smaller sets of data that are high value? • Can you pilot approaches within communities? 4. Invest in training and outreach • Even if data is federated and the cloud platform exists, many bioinformaticians do not have the skills to exploit them
  • 50. Data Gravity $G = \frac{V \cdot R}{B \cdot C}$ “Let data gravity of a given dataset, G, be the product of data volume, V, and the regulatory restrictions of the region in which the data was generated, R, over the bandwidth at the location of the data, B, and the cost of compute in that location, C”
  • 51. Acknowledgements The AIT Team at EMBL-EBI
  • 53. Questions? Title: Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects Presenter: Tony Burdett Please write your questions in the questions window of the GoToWebinar application