Implications of the Fourth Paradigm
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
10/16/18 ASHG 2018 1
@pebourne
http://hlwiki.slais.ubc.ca/images/b/bc/Fourth_paradigm.png
Big data and data science are like the Internet…
If I asked you to define them you would all say something different, yet you use them every day…
http://vadlo.com/cartoons.php?id=357
Big Data and Data Science Exemplify the Fourth Paradigm Yet Definitions Are Elusive
Big Data/Data Science – A Working Definition
• Use of the ever-increasing amount of open, complex, diverse digital data, frequently in ubiquitous cloud environments
• Finding ways to ask and then answer relevant questions by combining such diverse data sets
• Arriving at statistically significant conclusions not otherwise obtainable
• Sharing such findings in a useful way
• Translating such findings into actions that improve the human condition
The Virtuous Data Science Cycle
Figure: Open, complex, diverse digital data
Integration axes: model transportability (human, mouse, zebrafish), horizontal integration, multi-scale integration
• DNA: CNV, SNP, methylation
• Gene/Protein: 3D structure, gene expression, proteomics, metabolomics
• Network: metabolic, signaling transduction, gene regulation
• Cell: hepatic, myoepithelial, erythrocyte
• Tissue: epithelial, muscle, nervous
• Organ: liver, kidney, pancreas, heart
• Body: physiologically based pharmacokinetics
• Population: GWAS, population dynamics, microbiota
Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262
This Is Not the Future: It Is Now
What of the Future?
Example - Photography
Stages (axes: Time; Volume, Velocity, Variety): Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization
• Digital camera invented by Kodak but shelved
• Megapixels & quality improve slowly; Kodak slow to react
• Film market collapses; Kodak goes bankrupt
• Phones replace cameras
• Instagram, Flickr become the value proposition
• Digital media becomes a bona fide form of communication
From a presentation to the Advisory Board to the NIH Director
From Eric Green NHGRI
By 2022:
• >80% from healthcare (as opposed to research)
• ~40-50M human genome sequences generated
Adapted from Eric Green, NHGRI
Precision Medicine
More Precise Accounting for Individual Variability
Genomics, Lifestyle, Environment, Physiology
Diverse, Complex, Integrated, Translatable
Adapted from Eric Green, NHGRI
What Are the Drivers of Change Beyond the Data Itself?
• Machine learning, e.g., image analysis, predictive modeling
– Amount and quality of data available for training
– Open source: R and Python
– Algorithmic efficiency
• Advances in computing
– GPUs
– Cloud computing
• The private sector
Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
The National Institute of Standards and Technology (NIST) states the following:
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
Or more simply:
Endless computer-related services on demand from anywhere on any device
Profit margin is 3x that of retail
Fig 1. Conceptual cloud-based platform with different data types that flow between producers and consumers requiring variable data level needs.
Navale V, Bourne PE (2018) Cloud computing applications for biomedical science: A perspective. PLOS Computational Biology 14(6): e1006144.
https://doi.org/10.1371/journal.pcbi.1006144
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006144
A cloud infrastructure facilitates a move from pipes to platform…
which raises the question...
Will biomedical research become more like Airbnb?
Vivien Bonazzi
Bonazzi & Bourne 2017, PLoS Biol. 15(4):e2001818.
We Currently Operate as Pipes in Diverse Compute Environments
Should biomedical research be like Airbnb?
doi: 10.1371/journal.pbio.2001818
Supplier / Consumer
• Paper Author / Paper Reader
• Data Provider / Data Consumer
• Employer / Employee
• Reagent Provider / Reagent Consumer
• Software Provider / Software Consumer
• Grant Writer / Grant Reviewer
• Educator / Student
Platform: MS Project, Google Drive, Coursera, ResearchGate, Academia.edu, Open Science Framework, Synapse, F1000, Rio
Clouds will ultimately digitally integrate the scholarly workflow for human and machine analysis
Open Data Lab
Lest we forget, and a segue into Bartha’s talk…
Going forward there is much more to consider than technology…
Why a More Open Process?
Use case: Diffuse Intrinsic Pontine Gliomas (DIPG)
• Occur in 1:100,000 individuals
• Peak incidence 6-8 years of age
• Median survival 9-12 months
• Surgery is not an option
• Chemotherapy is ineffective and radiotherapy only transitory
From Adam Resnick
Timeline of genomic studies in DIPG
• Landmark studies identify histone mutations as recurrent driver mutations in DIPG ~2012
• Almost 3 years later, in largely the same datasets, but partially expanded, the same two groups and two others identify ACVR1 mutations as a secondary, co-occurring mutation
From Adam Resnick
What do we need to do differently to reveal ACVR1?
• ACVR1 is a targetable kinase
• Inhibition of ACVR1 inhibited tumor progression in vitro
• ~300 DIPG patients a year
• ~60 are predicted to have ACVR1 mutations
• If only large-scale data sets had been integrated with TCGA and/or rare disease data in 2012, ACVR1 mutations would have been identified
• 60 patients/year × 3 years = 180 children’s lives (who likely succumbed to the disease during that time) could have been impacted if only data were FAIR
From Adam Resnick
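The arithmetic on this slide can be reproduced in a few lines. The sketch below is illustrative only: the variable names are hypothetical, and the inputs are the approximate figures quoted above (~300 DIPG patients a year, ~60 predicted to carry ACVR1 mutations, a delay of about 3 years).

```python
# Approximate figures quoted on the slide; variable names are hypothetical.
dipg_patients_per_year = 300   # ~300 DIPG patients a year
acvr1_patients_per_year = 60   # ~60 predicted to have ACVR1 mutations
delay_years = 3                # almost 3 years between the two sets of studies

# Share of DIPG patients predicted to carry an ACVR1 mutation.
acvr1_fraction = acvr1_patients_per_year / dipg_patients_per_year

# Patients who might have been affected by the delay in identifying ACVR1.
patients_affected = acvr1_patients_per_year * delay_years

print(f"Predicted ACVR1 share of DIPG patients: {acvr1_fraction:.0%}")
print(f"Patients potentially affected over the delay: {patients_affected}")
```

Because every input is an estimate, the result is best read as an order of magnitude rather than an exact count.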
Conclusion:
• The fourth paradigm changes the way we think about research
• Cloud computing technology is and will likely remain an integral part of this paradigm shift
• Human genetics is not immune to this change
• The opportunities for human health in embracing the fourth paradigm are profound
• The technical challenges (beyond cybersecurity) are the easy part
Acknowledgements
The BD2K Team at NIH
My Colleagues at UVA
The 150 folks who have passed through my laboratory
https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0
Vivien Bonazzi
Thank You
peb6a@virginia.edu

Editor's Notes

  • #7 Model integration in systems pharmacology. Diverse models need to be integrated across multiple methodologies, multiple heterogeneous data sets, organismal hierarchy, and species (transportability).