Biomedical Data Sciences – New
Name with New Opportunities for
Change?
Philip E. Bourne PhD, FACMI
Stephenson Dean, School of Data Science
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
7-Feb-2020
4TH IIC/Open Health Systems Colloquium
Delhi
@pebourne
Objectives:
To say something useful as India
contemplates a biomedical data
sharing policy and a central biomedical
data source
AKA
What would I do differently than what
has gone before?
Disclaimer:
I am speaking for myself and my
academic institution not for a
federal agency
Perspective
• Researcher
• Database provider
• Chief Data Officer/Funder,
US NIH
• Dean, School of Data
Science
Why? – The Human Factor
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013, ISBN: 978-3-642-37186-8
Josh Sommer
Chordoma Foundation
Why? – The Economic Factor
Data Availability Declines Over Time
A Waste of Taxpayers' Money
Probability of finding the data associated with a paper declined by 17% every year
Vines, Timothy et al. “The Availability of Research Data Declines Rapidly with Article Age.” Current Biology (June 1, 2014)
Image: Nature doi:10.1038/nature.2013.14416
ALMOST ALL DATA LOST 10-15 YRS AFTER PUBLICATION
[Adapted from Emma Ganley @ PLOS]
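A quick way to see what a 17% yearly decline implies is to compound it. The sketch below is an illustration only: it assumes the decline applies to the probability itself at a constant rate, as the slide words it (the cited paper reports the figure for the odds of the data still being available), and the function name is hypothetical.

```python
# Assumption: a constant 17% yearly decline applied to the probability itself.
# This follows the slide's wording; it is not a re-analysis of the cited paper.
ANNUAL_DECLINE = 0.17


def fraction_remaining(years: int, annual_decline: float = ANNUAL_DECLINE) -> float:
    """Fraction of the starting availability left after `years` of compounding decline."""
    return (1.0 - annual_decline) ** years


if __name__ == "__main__":
    for years in (1, 5, 10, 15):
        share = fraction_remaining(years)
        print(f"{years:>2} years: {share:.1%} of the starting availability")
```

Under that assumption roughly 16% of the starting availability is left after 10 years and about 6% after 15, which is in line with the claim that almost all data are lost 10-15 years after publication.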
Why? – Science is Changing
The interface between biology and
computer science will be different
going forward than that which saw the
bioinformatics revolution
First The History of Bioinformatics
According to Bourne
1980s 1990s 2000s 2010s 2020
Discipline:
Unknown Expt. Driven Emergent Over-sold A Service A Partner A Driver
The Raw Material:
Non-existent Limited /Poor More/Ontologies Big Data/Siloed Open/Integrated
The People:
No name Technicians Industry recognition data scientists Academics
Searls (ed) The Roots in Bioinformatics Series PLOS Comp Biol
https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf
https://twitter.com/aip_publishing/status/856825353645559808
We Have Reached the 4th
Paradigm
Which means …
Data Will Have Greater Volume & Variety
DNA Sequence Data Since the Human Genome
http://synbio.info/display/synbio/Genetic+data+likely+to+become+the+biggest+big+data+in+2025
https://slideplayer.com/slide/7795050/
Data Will Be More Complex
Contents of the Protein Data Bank
The Problems to be Tackled Will Require Integration Across Scales
[Figure: horizontal integration across species (human, mouse, zebrafish; model transportability) and multi-scale integration across levels (DNA, gene/protein, network, cell, tissue, organ, body, population), with data types ranging from CNV, SNP and methylation through 3D structure, gene expression, proteomics and metabolomics to GWAS, population dynamics and microbiota]
From Harnessing Big Data for Systems Pharmacology 2017
https://doi.org/10.1146/annurev-pharmtox-010716-104659
Gohlke et al. 2019
Real World Evidence for Preventive Effects of Statins on
Cancer Incidence: A Transatlantic Analysis
EHR
Animal Models
Pathways
Data Science will Drive Change
Which Means:
• Biomedical data science will need to be:
– Less siloed
– More collaborative
– More open
– More transparent
– More humble
– More attentive to other disciplines
• Consequently …
Take Home
Science is/will be different than when
NCBI/EBI were formed and NIH data
sharing went into effect. This calls for a
somewhat different response.
Let's Start with Data Sharing
Many good things have happened, but
more is needed…
Laying the Foundation:
HGP, Bermuda, 1996
“The HGP changed the norms around data sharing
in biomedical research.”
Data Sharing: An Essential Component
A Culture of Sharing
1999  Research Tools Policy
2003  NIH Data Sharing Policy
2004  Model Organism Policy
2007  Genome-wide Association (GWAS) Policy
2008  NIH Public Access Policy (Publications)
2012  Big Data to Knowledge (BD2K) Initiative
2013  White House Initiative (2013 “Holdren Memo”)
2014  Genomic Data Sharing (GDS) Policy
Also on the timeline: Modernization of NIH Clinical Trials
US NIH Current Model
Options of scaled implementation for sharing datasets
PubMed Central (datasets up to 2 gigabytes)
• PMC stores publication-related supplemental materials and datasets directly associated with publications. Up to 2 GB.
• Generate Unique Identifiers for the stored supplementary materials and datasets.
Use of commercial and non-profit repositories (datasets up to 20* gigabytes)
• Assign Unique Identifiers to datasets associated with publications and link to PubMed.
• Store and manage datasets associated with publication, up to 20* GB.
STRIDES Cloud Partners (high priority datasets, petabytes)
• Store and manage large scale, high priority NIH datasets. (Partnership with STRIDES)
• Assign Unique Identifiers, implement authentication, authorization and access control.
NIH strongly encourages open access Data Sharing Repositories as a first choice.
https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
[Adapted from Susan Gregurick]
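The three tiers on this slide can be read as a simple routing rule. The sketch below is only an illustration of that reading, not an NIH tool: the function name and return strings are hypothetical, a gigabyte is assumed to be 10^9 bytes, and the 20 GB limit carries an asterisk on the slide that is not explained there, so it should be treated as approximate.

```python
GB = 10**9  # assumption: decimal gigabytes


def suggest_storage_option(size_bytes: int, high_priority: bool = False) -> str:
    """Map a dataset to the tier named on the slide (hypothetical helper)."""
    if high_priority:
        # Large scale, high priority NIH datasets go to the STRIDES partners.
        return "STRIDES cloud partner"
    if size_bytes <= 2 * GB:
        return "PubMed Central"
    if size_bytes <= 20 * GB:
        return "commercial or non-profit repository"
    # The slide lists no option for larger datasets that are not high priority.
    return "not covered by the slide"


if __name__ == "__main__":
    print(suggest_storage_option(500 * 10**6))
    print(suggest_storage_option(12 * GB))
    print(suggest_storage_option(3 * 10**15, high_priority=True))
```

Whatever the size, the slide's own advice still applies first: an open access data sharing repository is the preferred choice where one exists.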
Data Sharing Goes Global: GA4GH
Global Alliance for Genomics and Health
• Accelerating the potential of genomic medicine to
advance human health, by:
– Establishing common framework of approaches to enable
effective, responsible sharing of genomic and clinical data
– Catalyzing data sharing projects that drive and
demonstrate value of data sharing
• Alliance*: >350 leading institutions (healthcare,
research, advocacy, life science, IT) representing 35
countries
• Working groups (Clinical, Data, Security, Regulatory &
Ethics) assess, prioritize needs
– Form task teams to produce tools, solutions,
demonstration projects
*Statistics as of October 5, 2015
Data Sharing - What I Would Do
Differently
• Do not make it a non-funded mandate
• Create a data sharing policy with teeth
• Pull the levers – publishers, scientists and
funders
• Make it easy to conform
• Provide incentives to conform
• Think more carefully about consent
• Plan for patient reidentification
Funders Must Provide Incentives to
Conform …
• Funding agencies must support the policy and
create incentives to conform – no data
deposition – no money
• Data must include appropriate metadata
• Tools to support data quality must be funded
• Data curators must be academically
celebrated
• Data must be considered a first class research
product…
https://www.force11.org/datacitationprinciples
An Indian NCBI/EBI - What I Would Do
Differently
• Move from pipes to platforms
– Have a Commons
– A research infrastructure that builds upon itself
• Be sure the community thinks they own it
• Push the work out from the center to the
community
• Be sustainable
• Be FAIR from the start – not in retrospect
Have a Commons
• Databases organize data around a project.
• Data warehouses organize the data for an organization.
• Data commons organize the data for a scientific discipline or field.
[Figure labels: Data Warehouse, Data Ecosystems]
[Adapted from Bob Grossman]
If platforms are the answer we could
ask the question…
Will biomedical research become more
like Airbnb?
Vivien Bonazzi
Should biomedical research be Like Airbnb?
doi: 10.1371/journal.pbio.2001818
I am Not Crazy, Hear Me Out
• Airbnb is a platform that supports a trusted relationship
between consumer (renter) and supplier (host)
• The platform focuses on maximizing the exchange of services
between supplier and consumer and maximizing the amount
of trust associated with a given stakeholder
• It seems to be working:
– 60 million users searching 2 million listings in 192 countries
– Average of 500,000 stays per night.
– Valuation of US $25bn
Should biomedical research be Like Airbnb?
doi: 10.1371/journal.pbio.2001818
Platforms will ultimately digitally
integrate the scholarly workflow for
human and machine analysis
Should biomedical research be Like Airbnb?
doi: 10.1371/journal.pbio.2001818
Supplier            Consumer
Paper Author        Paper Reader
Data Provider       Data Consumer
Employer            Employee
Reagent Provider    Reagent Consumer
Software Provider   Software Consumer
Grant Writer        Grant Reviewer
Educator            Student
Platform: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio
Open Data Lab (ODL) Pilot Underway
NIH Data Commons Architecture
NCI Genomic Data Commons*
• The GDC makes over 2.5 PB
of data available for access
via an API, analysis by cloud
resources on public clouds,
and downloading.
• In an average month, the
GDC is used by over 22,000
users and over 2 PB of data
are downloaded.
• The GDC is based upon an
open source software stack
that can be used to build
other data commons.
*See: NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for cancer
genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112.
The GDC consists of 1) a data exploration & visualization portal (DAVE and
cDAVE), 2) a data submission portal, 3) a data analysis and harmonization
system (GPAS), 4) an API so third parties can build applications.
[Adapted from Bob Grossman]
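To make "available for access via an API" concrete, here is a minimal sketch of how a third party might list projects from the GDC. It assumes the public endpoint at https://api.gdc.cancer.gov and its `projects` resource with `size` and `fields` query parameters, none of which appear on the slide; the helper names are hypothetical and the JSON layout (`data` then `hits`) is an assumption about the response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public GDC API base URL (not stated on the slide).
GDC_API = "https://api.gdc.cancer.gov"


def projects_url(size: int = 5, fields=("project_id", "name")) -> str:
    """Build a query URL for the projects resource (hypothetical helper)."""
    query = urlencode({"size": size, "fields": ",".join(fields)})
    return f"{GDC_API}/projects?{query}"


def list_projects(size: int = 5) -> list:
    """Fetch project records; needs network access."""
    with urlopen(projects_url(size), timeout=30) as response:
        payload = json.load(response)
    # Assumption: matching records are returned under data -> hits.
    return payload["data"]["hits"]
```

Calling `list_projects()` from a machine with network access should return a short list of records, each carrying the requested `project_id` and `name` fields. Building the URL separately keeps the query easy to inspect before any request is sent.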
Impediments to a Biomedical Platform
• Current work practices by all stakeholders
• Entrenched business models
• Size of the undertaking aka resources
needed
• Trust
• Incentives to use the platform
http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
An Indian NCBI/EBI - What I Would Do
Differently
• Move from pipes to platforms
– Have a Commons
– A research infrastructure that builds upon itself
• Be sure the community thinks they own it
• Push the work out from the center to the
community
• Be sustainable
• Be FAIR from the start – not in retrospect
Community
• Implement what the community tells you
• Push as much of the data curation as possible
back to the depositor through incentives
An Indian NCBI/EBI - What I Would Do
Differently
• Move from pipes to platforms
– Have a Commons
– A research infrastructure that builds upon itself
• Be sure the community thinks they own it
• Push the work out from the center to the
community
• Be sustainable
• Be FAIR from the start – not in retrospect
https://doi.org/10.1371/journal.pbio.2002617
• The goal of the Open Science Prize is to stimulate the
development of novel and ground-breaking tools and platforms to
enable the reuse and repurposing of open digital research objects
(e.g. data, publications, other research outputs) relevant to
biomedical or health applications. Two-phase prize competition.
• International Teams will compete for 6 awards of $80K each in
Phase I, & a final prize of $230K for the best prototype in Phase II.
• Information webinar to be held on December 10, 2015.
• Phase I applications are due February 29, 2016 (Leap Day!)
www.OpenSciencePrize.org
Global Biodata Coalition Update
1st Interim Board Meeting held October 2019
• Definitive discussions of governance, mission, etc – leading to…
GBC Funders Agreement developed (an informal document, no legal status)
• Consultation on document under way; signing begins March 2020 (at Singapore
meeting…)
Next Board meeting March 2020 in Singapore
• To finalize Funders Agreement and obtain initial signatures
Scientific activities underway
• Developing selection process for identifying Global Core Data Resources
• Planning to undertake a global inventory of data resources and their funding to inform
GBC strategy
Stakeholder outreach and communications with multiple presentations
• Biocuration 2019 (Cambridge April 2019), CODATA Workshop (Beijing Sep 2019), Plant
and Animal Genomes (San Diego Jan 2020), ELIXIR Open Day (Cambridge Feb 2020),
Biocuration 2020 (Bar Harbor May 2020)
GBC Secretariat established
• Two full-time staff hired, branding and logo completed, and website to be launched soon
Interim funding supports GBC until end of 2020
• Budget of 1.3M USD proposed for 2021, rising to 1.9M at end of 5-year Strategic Plan
[From Eric Green, NHGRI, NIH]
An Indian NCBI/EBI - What I Would Do
Differently
• Move from pipes to platforms
– Have a Commons
– A research infrastructure that builds upon itself
• Be sure the community thinks they own it
• Push the work out from the center to the
community
• Be sustainable
• Be FAIR from the start – not in retrospect
Guiding Principles for Scientific Data
Management and Stewardship
Sci Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
FAIR: Findable, Accessible, Interoperable, Reusable
Why the FAIR Principles?
• Being Fair
• Data Driven Bioscience
• Usual data sharing bleating
• The norms of science*
• Modern scholarship
• Knowledge exchange,
pooling
• Team Science
• Reproducibility,
accountability,
transparency, best practice
• A rallying cry
* Merton’s CUDOS - four norms of scientific behaviour (1942)
* Shapin, Pump and Circumstance (1984)
[Adapted from Carole Goble]
In 2005 I Had a Dream
0. Paper is but one view
1. User clicks on thumbnail
2. FAIR data provide a rendered image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
PLoS Comp. Biol. 2005 1(3) e34
Thank You!
Questions?
peb6a@virginia.edu @pebourne
