
What Can Happen when Genome Sciences Meets Data Sciences?

Presentation to the Genome Sciences Group at University of Virginia, Feb 14, 2018.

Published in: Education

  1. 1. What Can Happen when Genome Sciences Meets Data Sciences? Philip E. Bourne PhD, FACMI Stephenson Chair of Data Science Director, Data Science Institute Professor of Biomedical Engineering peb6a@virginia.edu https://www.slideshare.net/pebourne 02/14/18 UVA Genome Sciences 1
  2. 2. I am more interested in having a discussion than giving a lecture … This is not about my research specifically but what is happening more broadly
  3. 3. Agenda • Some context – My definition of data science – What drives my thinking – What is the NIH thinking? • Relevant examples • The DSI and what is happening at UVA • Together, where do we go from here?
  4. 4. What Do I Mean by Big Data/Data Science? • Use of the ever increasing amount of open, complex, diverse digital data • Finding ways to ask and then answer relevant questions by combining such diverse data sets • Arriving at statistically significant conclusions not otherwise obtainable • Sharing such findings in a useful way • Translating such findings into actions that improve the human condition
  5. 5. What Drives my Thinking? Disruption (figure: Time plotted against Volume, Velocity, Variety) • Stages, in order: Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization • Example - Photography, in order: Digital camera invented by Kodak but shelved; Megapixels & quality improve slowly, Kodak slow to react; Film market collapses, Kodak goes bankrupt; Phones replace cameras; Instagram, Flickr become the value proposition; Digital media becomes bona fide form of communication • From a presentation to the Advisory Board to the NIH Director
  6. 6. A Few Random Data {Science} Facts • There are ~2.7 Zettabytes (2.7 x 10^6 PB) of digital data currently – = US population tweeting 3x/min for 26,976 years • Big data currently estimated as a $50bn business – could save $3.1tn • 40% growth in data/yr; 5% growth in IT expenditure • US: 140,000-190,000 unfilled deep data analytics jobs • DSI has 600 applicants this year for 50 spots; MSDS/MBA highly sought
  7. 7. A Few Random Data {Science} Facts • There are ~2.7 Zettabytes (2.7 x 10^6 PB) of digital data currently – = US population tweeting 3x/min for 26,976 years • Big data currently estimated as a $50bn business – could save $3.1tn – private sector research • 40% growth in data/yr; 5% growth in IT expenditure – undervalued • US: 140,000-190,000 unfilled deep data analytics jobs – competition for skilled researchers high • DSI has 600 applicants this year for 50 spots; MSDS/MBA highly sought – large human capital
  8. 8. How Much Biomedical Data? • Big Data – Total data from NIH-funded research in 2016 estimated at 650 PB* – 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB in 2016 • Dark Data – Only 12% of data described in published papers is in recognized archives – 88% is dark data^ • Cost – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives * In 2012 Library of Congress was 3 PB ^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
  9. 9. Consider Some Current High Profile NIH Examples Where Data Science is Being Applied • Moonshot – Bringing together 5 petabytes of homogenized data within the Genomic Data Commons (GDC) to explore genotype-phenotype relationships • MODs – Multiple high-value, high-cost genomic resources • Human Microbiome Project – Microbe characterization and analysis • TOPMed – Genomic, proteomic, metabolomic, image and EHR data • All-of-Us Precision Medicine – Building a platform to support data on >1M individuals with extensive and constantly updated health profiles • ECHO (Effects of Environmental Exposures on Child Health and Development) – Integration of child health and environmental data • BRAIN – Temporal and spatial analysis of neural circuits
  10. 10. How is Data Science Being Applied? • Moonshot – new ways to analyze genotype-phenotype associations • MODs – new curation and integration tools • Human Microbiome Project – new cloud based tools • TOPMed – large scale storage and analysis; data harmonization • All-of-Us Precision Medicine – security; analysis of sensor data; EHR integration • ECHO – metadata descriptions of health and environmental data; application of geospatial methods • BRAIN – methods for network analysis, visualization • All: Analytics, the Commons, FAIR, sustainability, workforce. Wilkinson et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Mar 15;3:160018. https://datascience.nih.gov/TheCommons
  11. 11. Some underlying concerns at NIH… Reproducibility… Conformance to data sharing policies & governance more generally
  12. 12. Why a More Open Process? Use case: Diffuse Intrinsic Pontine Gliomas (DIPG) • Occur in 1:100,000 individuals • Peak incidence 6-8 years of age • Median survival 9-12 months • Surgery is not an option • Chemotherapy is ineffective and radiotherapy offers only transient benefit • From Adam Resnick
  13. 13. Timeline of genomic studies in DIPG • Landmark studies identify histone mutations as recurrent driver mutations in DIPG ~2012 • Almost 3 years later, in largely the same datasets, only partially expanded, the same two groups and two others identify ACVR1 mutations as a secondary, co-occurring mutation • From Adam Resnick
  14. 14. What do we need to do differently to reveal ACVR1? • ACVR1 is a targetable kinase • Inhibition of ACVR1 inhibited tumor progression in vitro • ~300 DIPG patients a year • ~60 are predicted to have ACVR1 mutations • If only large-scale data sets had been integrated with TCGA and/or rare disease data in 2012, ACVR1 mutations would have been identified then • 60 patients/year x 3 years = 180 children's lives (who likely succumbed to the disease during that time) could have been impacted if only data were FAIR • From Adam Resnick
  15. 15. Both funders and some institutions see the need to move from pipes to platforms to accelerate research… https://blog.lexicata.com/wp-content/uploads/2015/03/platform-model-750x410.png
  16. 16. If platforms are the answer we could ask the question… Will biomedical research become more like Airbnb? Vivien Bonazzi. Should biomedical research be like Airbnb? doi: 10.1371/journal.pbio.2001818
  17. 17. I am not crazy, hear me out • Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host) • The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder • It seems to be working: – 60 million users searching 2 million listings in 192 countries – Average of 500,000 stays per night – Valuation of US $25bn. Should biomedical research be like Airbnb? doi: 10.1371/journal.pbio.2001818
  18. 18. Platforms will ultimately digitally integrate the scholarly workflow for human and machine analysis. Should biomedical research be like Airbnb? doi: 10.1371/journal.pbio.2001818
  19. 19. Why a comparison to Airbnb is not fair • Airbnb was born digital • The exchange of services on Airbnb is simple compared to what is required of a platform to support biomedical research • Nevertheless there is much to be learnt
  20. 20. Impediments to a biomedical platform • Current work practices by all stakeholders • Entrenched business models • Size of the undertaking aka resources needed • Trust • Incentives to use the platform http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
  21. 21. In summary there is not currently a widely adopted single platform for the exchange of services in biomedical research. Either there is a platform per service or no platform at all…. Funders and the institutions they fund need to work more closely to implement platforms
  22. 22. Example: NSF and NIH Approaches
  23. 23. How is the DSI responding to these various needs?
  24. 24. Working across the grounds to break down traditional silos
  25. 25. Hallmarks • Currently sustainable • Planning for where the academical village meets Google – an ecosystem in which students, faculty, staff, visitors, private sector reps, entrepreneurs live and work • Open UVA and open data • Not owning anything; only working through collaboration e.g. – Dual degrees – Research projects across disciplines • MS DS focusing on practical training • Dual degrees • Soon PhD and undergraduate major • Wikimedian in residence (March, 2018)
  26. 26. Emergent DSI Organization • Data Integration & Engineering • Machine Learning & Analytics • Visualization • Data Acquisition & Dissemination • Ethics, Law, Policy, Social Implications
  27. 27. Emergent DSI Organization • Data Integration & Engineering • Machine Learning & Analytics • Visualization • Data Acquisition & Dissemination • Ethics, Law, Policy, Social Implications • Biomedical Data Sciences
  28. 28. Data Acquisition & Dissemination • Pilot Open Data Lab underway • Supplier / Consumer pairs: Paper Author / Paper Reader; Data Provider / Data Consumer; Employer / Employee; Reagent Provider / Reagent Consumer; Software Provider / Software Consumer; Grant Writer / Grant Reviewer; Educator / Student • Platforms: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio
  29. 29. Data Integration and Engineering • Ontologies • Object identifiers • Indexing schemes • Common data models
  30. 30. Machine Learning & Analytics • Neural nets • Deep learning • NLP • Gene expression & neurological disease (Kipnis) • Predicting opioid overdose (VA Health) • Predicting escalating care and mortality risk of cirrhosis patients (UVA HS) • Human microbiome & mental health in maternal health (Psychology & Nursing)
  31. 31. Visualization • VR • Networks • Sonics • Visualizing microbial stability (Biology & Systems)
  32. 32. Ethics, Law, Policy & Social Implications • Data sharing • Privacy • Normativity • Wendy Novicoff, Ph.D.
  33. 33. Points of Interaction • Dual degrees with an MSDS • Specific projects for: – Presidential fellows (due March 19, 2018) – Capstones (due June 29, 2018) • Thoughts on biomedical data science cluster hires • Data Science Internship program with NIH, Inova, GMU, VT, GWU, UMD… • Join the DSI faculty • Join the mailing list – Lunch and learn – Distinguished lectures – Special events
  34. 34. References • Dunn and Bourne. Building the Biomedical Data Science Workforce. PLoS Biol. 2017 Jul 17;15(7):e2003082. • Bonazzi and Bourne. Should Biomedical Research be like Airbnb? PLoS Biol. 2017 Apr 7;15(4):e2001818. • McKiernan et al. How Open Science Helps Researchers Succeed. Elife. 2016 Jul 7;5. pii: e16800. • Wilkinson et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Mar 15;3:160018. • https://datascience.nih.gov/TheCommons
  35. 35. Acknowledgements • The BD2K Team at NIH • My New Colleagues at UVA • The 150 folks who have passed through my laboratory https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0 • Scott and Beth Stephenson • Anonymous donors for the DSI endowment
  36. 36. Thank You peb6a@virginia.edu
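
The figures quoted on slides 6, 8 and 14 can be checked with a few lines of arithmetic. The sketch below is not part of the presentation. It assumes decimal (SI) units for petabytes and zettabytes, which the slides do not state, and the function and variable names are invented for illustration.

```python
# Sanity check of figures quoted on slides 6, 8 and 14.
# Assumption: decimal (SI) units, 1 PB = 10**15 bytes and 1 ZB = 10**21 bytes.
PB_PER_ZB = 10**21 // 10**15  # 1,000,000 PB in one ZB


def zb_to_pb(zettabytes: float) -> float:
    """Convert zettabytes to petabytes (SI units assumed)."""
    return zettabytes * PB_PER_ZB


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Slide 6: ~2.7 ZB of digital data expressed in PB.
total_digital_pb = zb_to_pb(2.7)

# Slide 8: 20 PB held in NCBI/NLM out of an estimated 650 PB.
ncbi_share = percent(20, 650)

# Slide 8: 12% of data in recognized archives leaves the rest as dark data.
dark_share = 100 - 12

# Slide 14: ~60 of ~300 DIPG patients per year, over a 3-year delay.
acvr1_share = percent(60, 300)
patients_over_delay = 60 * 3

if __name__ == "__main__":
    print(f"2.7 ZB = {total_digital_pb:.2e} PB")
    print(f"NCBI/NLM share of NIH-funded data: {ncbi_share:.1f}%")
    print(f"Dark data: {dark_share}%")
    print(f"ACVR1 share of DIPG patients: {acvr1_share:.0f}%")
    print(f"Patients over the 3-year delay: {patients_over_delay}")
```

Under these assumptions the conversion on slide 6 and the 180-patient figure on slide 14 come out as stated, and the 3% on slide 8 is the rounded value of 20/650.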
