Data Science Meets Structural Biology

How structural biology can influence data science and vice versa. Based on a forthcoming paper in Current Opinion in Structural Biology (https://arxiv.org/abs/1807.09247) and presented as part of the University of Virginia Data Science Institute Lunch and Learn Series, August 31, 2018.

Published in: Education

  1. Data Science Meets Structural Biology. Philip E. Bourne, Cam Mura & Eli Draizen (Open Team Science). https://www.slideshare.net/pebourne https://arxiv.org/abs/1807.09247 08/31/18 DSI Lunch & Learn
  2. We are more interested in having a discussion than giving a lecture …
  3. Let’s start with a couple of definitions…
  4. What Do We Mean by Data Science? • Use of the ever-increasing amount of open, complex, diverse digital data • Finding ways to ask and then answer relevant questions by combining such diverse data sets • Arriving at statistically significant conclusions not otherwise obtainable • Sharing such findings in a useful way • Translating such findings into actions that improve the human condition
  5. What Do We Mean by Structural Biology?
  6. Structure… What’s it good for? Classic structural biology example: a point mutation (E6→V) in the Hb β-globin chain results in sickle cell anemia.
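
The E6→V notation on the slide above can be made concrete with a small sketch. The function name `apply_point_mutation` and the constant `HBB_PREFIX` are hypothetical names introduced for illustration. The eight-residue prefix is assumed to be the start of the mature human β-globin chain (initiator methionine removed), which is the numbering under which the glutamate sits at position 6.

```python
def apply_point_mutation(sequence: str, position: int, wild_type: str, mutant: str) -> str:
    """Return `sequence` with the residue at 1-based `position` replaced by `mutant`.

    Raises ValueError if the position is out of range or the residue found
    there does not match the expected wild-type residue.
    """
    index = position - 1  # convert 1-based biological numbering to 0-based indexing
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# Assumption: first eight residues of the mature human beta-globin chain.
HBB_PREFIX = "VHLTPEEK"

# E6V: glutamate (E) at position 6 replaced by valine (V).
sickle_prefix = apply_point_mutation(HBB_PREFIX, 6, "E", "V")
print(sickle_prefix)
```

Checking the wild-type residue before substituting guards against off-by-one numbering mistakes, which are common when sequences are numbered with or without the initiator methionine.
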
  7. Structural biology success stories. Atomic-resolution studies of cellular-scale systems have become increasingly possible: immense explanatory power! Figure labels: microtubule; mid-1990s, 1960-70s, early 1990s, ~2002, 1986.
  8. Why Do We Care About this Intersection? Stepping back… Data are transforming how we think about everything, including biomedical research… Most folks just do not realize it yet… Your reading of this slide relies on structural biology (a photoreceptor called rhodopsin!)
  9. Example: Photography. Stages: Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization. Axes: Time vs. Volume, Velocity, Variety. Timeline: digital camera invented by Kodak but shelved; megapixels & quality improve slowly, Kodak slow to react; film market collapses, Kodak goes bankrupt; phones replace cameras; Instagram, Flickr become the value proposition; digital media becomes a bona fide form of communication. From a presentation to the Advisory Board to the NIH Director.
  10. How is the DSI Responding to this Change? • Societal good • Interdisciplinary • Practical experience • Ethical conduct • Openness and transparency. Figure: surge in publications involving machine learning in the biosciences ('J-curve')
  11. Example of Why More Openness: Diffuse Intrinsic Pontine Gliomas (DIPG) • Occur in 1:100,000 individuals • Peak incidence 6-8 years of age • Median survival 9-12 months • Surgery is not an option • Chemotherapy ineffective and radiotherapy only transiently effective. From Adam Resnick
  12. Timeline of genomic studies in DIPG • Landmark studies identify histone mutations as recurrent driver mutations in DIPG (~2012) • Almost 3 years later, in largely the same but partially expanded datasets, the same two groups and two others identify ACVR1 mutations as a secondary, co-occurring mutation. From Adam Resnick
  13. What do we need to do differently to reveal ACVR1? • ACVR1 is a targetable kinase • Inhibition of ACVR1 inhibited tumor progression in vitro • ~300 DIPG patients a year • ~60 are predicted to have ACVR1 mutations • If only large-scale data sets had been integrated with TCGA and/or rare disease data in 2012, ACVR1 mutations would have been identified • 60 patients/year × 3 years = 180 children’s lives (who likely succumbed to the disease during that time) could have been impacted if only data were FAIR. From Adam Resnick
  14. Working across the Grounds to break down traditional silos
  15. Hallmarks Reflecting Those Principles • Sustainable • Designing for where the academical village meets Google – an ecosystem in which students, faculty, staff, visitors, private sector reps, entrepreneurs live and work • Open UVA and open data – Wikimedian in Residence • Collaboration – Dual degrees – Research projects across disciplines – Sister institutions • MS DS focusing on practical training • PhD program • Undergraduate major • Undergraduate certificate. Label: Under development
  16. DSI Organization: Structural Biology is one of Many Cross-Cutting Initiatives. Pillars: Data Acquisition; Data Integration & Engineering; Machine Learning & Analytics; Visualization & Dissemination; Ethics, Law, Policy, Social Implications. Cross-cutting: Structural Biology
  17. DSI Organization: Structural Biology is one of Many Cross-Cutting Initiatives. Structural Biology mapped onto the five pillars of Data Science
  18. Let’s Briefly Focus on those Five Points of Intersection in the Context of Structural Biology …
  19. Data Acquisition. The data production issue (the V’s of Big Data): Experimentally • Estimated (2017) that ≈2.5 quintillion (2.5×10^18) bytes of data are generated daily, with 90% of all the world’s data having been created in the past two years. • Plaintext PDB files are typically ≈ a few 100s of KB (…but that’s just the start!)
  20. Data Acquisition. The data production issue (the V’s of Big Data): Computationally • Here are some 2D RMSD matrices from a µs-scale biomolecular simulation. • Half a mole (6.02×10^23) of calculations!
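
To illustrate what a 2D RMSD matrix is, here is a minimal pure-Python sketch. It assumes the trajectory frames are already superimposed (no alignment step is performed) and that each frame is a list of (x, y, z) coordinates for the same atoms in the same order. The names `rmsd`, `rmsd_matrix`, and the toy frames are hypothetical and are not taken from the slides.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Sequence[Coord]


def rmsd(a: Frame, b: Frame) -> float:
    """Root-mean-square deviation between two frames with matching atom order."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def rmsd_matrix(frames: Sequence[Frame]) -> List[List[float]]:
    """Symmetric N x N matrix of pairwise RMSD values between all frames."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # compute the upper triangle, mirror to the lower
            matrix[i][j] = matrix[j][i] = rmsd(frames[i], frames[j])
    return matrix


# Toy trajectory: three frames of a two-atom system (made-up coordinates).
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
toy_matrix = rmsd_matrix(toy_frames)
for row in toy_matrix:
    print(row)
```

For N frames the loop performs N(N-1)/2 frame comparisons, each touching every atom, which is one reason long simulations turn into a data-volume problem.
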
  21. Data Acquisition. The data reduction issue (the V’s of Big Data): Computationally • The produce/spawn/consume idiom (MapReduce)
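
The slide names the MapReduce idiom without showing an implementation, so the following is only a toy sketch using the Python standard library. It assumes the data being reduced are per-frame scalar values (for example, one RMSD per frame) and that the goal is their mean. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Iterator, List, Sequence, Tuple

Partial = Tuple[int, float]  # (number of values, sum of values)


def produce_chunks(values: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Produce: split the input into fixed-size chunks of work."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def map_chunk(chunk: Sequence[float]) -> Partial:
    """Map: summarize one chunk independently of all the others."""
    return len(chunk), float(sum(chunk))


def combine(left: Partial, right: Partial) -> Partial:
    """Reduce: merge two partial summaries into one."""
    return left[0] + right[0], left[1] + right[1]


def mean_via_mapreduce(values: Sequence[float], chunk_size: int = 4, workers: int = 2) -> float:
    """Compute the mean by mapping over chunks in worker threads, then reducing."""
    with ThreadPoolExecutor(max_workers=workers) as pool:  # spawn workers
        partials: List[Partial] = list(pool.map(map_chunk, produce_chunks(values, chunk_size)))
    count, total = reduce(combine, partials, (0, 0.0))  # consume the partial results
    if count == 0:
        raise ValueError("cannot take the mean of an empty input")
    return total / count


per_frame_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up stand-in data
print(mean_via_mapreduce(per_frame_values))
```

Because each partial result is a (count, sum) pair, chunks can be processed in any order and merged associatively, which is the property that lets this pattern scale out to many workers.
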
  22. Data Integration and Engineering • Data are structured – Ontologies – Object identifiers – Indexing schemes – Common data models
  23. Machine Learning & Analytics. Structure->Function; Sequence->Structure; Protein•Protein; Protein•Ligand; Binding sites
  24. Machine Learning & Analytics • Neural nets • Deep learning
  25. Machine Learning & Analytics • Deep Learning for Object Recognition/Segmentation. Panels: Features in Image Slice; Predicted Classes. Badrinarayanan, et al. 2016. arXiv:1511.00561v3
  26. Machine Learning & Analytics • Deep Learning for Object Recognition/Segmentation. Panels: Features in Volume Slice; Predicted Classes. Badrinarayanan, et al. 2016. arXiv:1511.00561v3
  27. Visualization • VR • Networks • Sonics. Starting point: Structure of a bacterial protein involved in RNA-associated regulatory circuits (e.g., virulence)
  28. Visualization
  29. Visualization • VR • Networks • Sonics. What about dynamics (life not at T=0)?
  30. Visualization • VR • Networks • Sonics. What about dynamics (life not at T=0)?
  31. Visualization. What about physics (of RNA-binding)?
  32. Visualization. What about statistics (log-odds here)?
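
The slide does not say which log-odds statistic is being visualized. As one common possibility, the sketch below scores each symbol by the base-2 logarithm of its observed frequency over a background frequency, with a pseudocount to avoid taking the log of zero. The function name `log_odds_scores`, the uniform RNA background, and the example column are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict


def log_odds_scores(
    observed: str, background: Dict[str, float], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Per-symbol log2(observed frequency / background frequency).

    `observed` is a string of symbols (e.g. one alignment column);
    `background` maps every allowed symbol to its expected frequency.
    """
    unknown = set(observed) - set(background)
    if unknown:
        raise ValueError(f"symbols missing from background: {sorted(unknown)}")
    counts = Counter(observed)
    # Pseudocounts are added once per alphabet symbol, so frequencies still sum to 1.
    total = len(observed) + pseudocount * len(background)
    return {
        symbol: math.log2(((counts[symbol] + pseudocount) / total) / expected)
        for symbol, expected in background.items()
    }


# Assumed uniform background over the RNA alphabet, and a made-up column.
uniform_rna = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
column_scores = log_odds_scores("UUUUUUAC", uniform_rna)
for symbol, score in sorted(column_scores.items()):
    print(symbol, round(score, 3))
```

Positive scores mark symbols seen more often than the background predicts and negative scores mark depleted ones, which is what makes log-odds a natural quantity to map onto color or height in a visualization.
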
  33. Visualization. What about cellular-scale systems? Kozlikova et al., 2016; Comp Graph Forum
  34. Visualization. What about cellular-scale systems? Kozlikova et al., 2016; Comp Graph Forum
  35. Ethics, Law, Policy & Social Implications • A Story of Fraud
  36. Thank You. peb6a@virginia.edu
