Implications of the Fourth Paradigm


The 4th paradigm of research is manifest in the rising popularity of data science. Data science developments relevant to human genetics are discussed with particular reference to cloud computing and data accessibility.

American Society for Human Genetics, October 16, 2018, San Diego

Published in: Education
  • Great talk at ASHG. Glad to see the slides here.

Implications of the Fourth Paradigm

  1. Implications of the Fourth Paradigm. Philip E. Bourne, PhD, FACMI; Stephenson Chair of Data Science; Director, Data Science Institute; Professor of Biomedical Engineering. ASHG 2018, 10/16/18. @pebourne
  3. Big Data and Data Science Exemplify the Fourth Paradigm Yet Definitions are Evasive. Big data and data science are like the Internet… If I asked you to define them you would all say something different, yet you use them every day…
  4. Big Data/Data Science – A Working Definition
      • Use of the ever-increasing amount of open, complex, diverse digital data, frequently in ubiquitous cloud environments
      • Finding ways to ask and then answer relevant questions by combining such diverse data sets
      • Arriving at statistically significant conclusions not otherwise obtainable
      • Sharing such findings in a useful way
      • Translating such findings into actions that improve the human condition
  5. The Virtuous Data Science Cycle
  6. Open, complex, diverse digital data. [Figure labels: Model Transportability (human, mouse, zebrafish); Horizontal Integration; Multi-scale Integration (DNA, Gene/Protein, Network, Cell, Tissue, Organ, Body, Population). Other labels: CNV, SNP, methylation, 3D structure, Gene expression, Proteomics, Metabolomics, Metabolic, Signaling transduction, Gene regulation, Hepatic, Myoepithelial, Erythrocyte, Epithelial, Muscle, Nervous, Liver, Kidney, Pancreas, Heart, Physiologically based pharmacokinetics, GWAS, Population dynamics, Microbiota.] Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262
  7. This is Not the Future, it is Now
  8. What of the Future?
  9. Example - Photography. From a presentation to the Advisory Board to the NIH Director. [Figure axes: Time; Volume, Velocity, Variety]
      • Digitization: Digital camera invented by Kodak but shelved
      • Deception: Megapixels & quality improve slowly; Kodak slow to react
      • Disruption: Film market collapses; Kodak goes bankrupt
      • Demonetization: Phones replace cameras
      • Dematerialization: Instagram, Flickr become the value proposition
      • Democratization: Digital media becomes bona fide form of communication
  10. From Eric Green, NHGRI
  11. By 2022:
      • ~40-50M human genome sequences generated
      • >80% from healthcare (as opposed to research)
      Adapted from Eric Green, NHGRI
  13. Precision Medicine: More Precise Accounting for Individual Variability. Genomics, Lifestyle, Environment, Physiology. Adapted from Eric Green, NHGRI
  14. Precision Medicine: More Precise Accounting for Individual Variability. Genomics, Lifestyle, Environment, Physiology. Diverse, Complex, Integrated, Translatable. Adapted from Eric Green, NHGRI
  15. What Are the Drivers of Change Beyond the Data Itself?
      • Machine learning, e.g. image analysis, predictive modeling
        – Amount and quality of data available for training
        – Open source: R and Python
        – Algorithmic efficiency
      • Advances in computing
        – GPUs
        – Cloud computing
      • The private sector
      Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
  16. The National Institute of Standards and Technology (NIST) states the following: Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
  17. Or more simply: Endless computer-related services on demand from anywhere on any device. Profit margin is 3x that of retail
  18. Fig 1. Conceptual cloud-based platform with different data types that flow between producers and consumers requiring variable data level needs. Navale V, Bourne PE (2018) Cloud computing applications for biomedical science: A perspective. PLOS Computational Biology 14(6): e1006144.
  19. A cloud infrastructure facilitates a move from pipes to platform… which raises the question... Will biomedical research become more like Airbnb? Vivien Bonazzi. Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818.
  20. We Currently Operate as Pipes in Diverse Compute Environments. Should biomedical research be like Airbnb? doi: 10.1371/journal.pbio.2001818
  21. Clouds will ultimately digitally integrate the scholarly workflow for human and machine analysis
      • Supplier / Consumer: Paper Author / Paper Reader; Data Provider / Data Consumer; Employer / Employee; Reagent Provider / Reagent Consumer; Software Provider / Software Consumer; Grant Writer / Grant Reviewer; Educator / Student
      • Platform: MS Project, Google Drive, Coursera, ResearchGate, Open Science Framework, Synapse, F1000, Rio
  22. Open Data Lab
  23. Lest we forget, and a segue into Bartha’s talk… Going forward there is much more to consider than technology…
  24. Why a More Open Process? Use case: Diffuse Intrinsic Pontine Gliomas (DIPG)
      • Occur in 1 in 100,000 individuals
      • Peak incidence at 6-8 years of age
      • Median survival 9-12 months
      • Surgery is not an option
      • Chemotherapy is ineffective and the benefit of radiotherapy is only transient
      From Adam Resnick
  25. Timeline of genomic studies in DIPG
      • Landmark studies identify histone mutations as recurrent driver mutations in DIPG (~2012)
      • Almost 3 years later, in largely the same (partially expanded) datasets, the same two groups and two others identify ACVR1 mutations as secondary, co-occurring mutations
      From Adam Resnick
  26. What do we need to do differently to reveal ACVR1?
      • ACVR1 is a targetable kinase
      • Inhibition of ACVR1 inhibited tumor progression in vitro
      • ~300 DIPG patients a year
      • ~60 are predicted to have ACVR1 mutations
      • If only large-scale data sets had been integrated with TCGA and/or rare disease data in 2012, ACVR1 mutations would have been identified
      • 60 patients/year × 3 years = 180 children’s lives (who likely succumbed to the disease during that time) could have been impacted if only data were FAIR
      From Adam Resnick
  27. Conclusion:
      • The fourth paradigm changes the way we think about research
      • Cloud computing technology is and will likely remain an integral part of this paradigm shift
      • Human genetics is not immune to this change
      • The opportunities for human health in embracing the fourth paradigm are profound
      • The technical challenges (beyond cybersecurity) are the easy part
  28. Acknowledgements
      • The BD2K Team at NIH
      • My Colleagues at UVA
      • The 150 folks who have passed through my laboratory
      • Vivien Bonazzi
  29. Thank You
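
The estimate on slide 26 can be reproduced as a short calculation. This is a minimal sketch in Python that uses only the approximate figures quoted on that slide (about 300 DIPG patients a year, about 60 predicted to carry an ACVR1 mutation, and a delay of roughly 3 years). The variable names are illustrative, and the fraction it prints is simply the ratio implied by those two figures, not a number stated in the talk.

```python
# Approximate figures quoted on slide 26 (attributed there to Adam Resnick).
dipg_patients_per_year = 300    # ~300 DIPG patients a year
acvr1_patients_per_year = 60    # ~60 predicted to have ACVR1 mutations
delay_years = 3                 # ~3 years between the histone and ACVR1 findings

# Ratio implied by the two patient counts (an inference, not stated on the slide).
acvr1_fraction = acvr1_patients_per_year / dipg_patients_per_year

# Patients who might have been affected by the delay: 60 patients/year x 3 years.
patients_affected = acvr1_patients_per_year * delay_years

print(f"Implied ACVR1 fraction: {acvr1_fraction:.0%}")
print(f"Patients over the delay: {patients_affected}")
```

Changing `delay_years` or the patient counts shows how sensitive the 180 figure is to the rough inputs behind it.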