
Big Data and its Role in Biomedical Research


American Conference of Pharmacometrics (ACoP9) San Diego CA, October 10 2018 http://www.go-acop.org/conference-program

Published in: Education

  1. Big Data and its Role in Biomedical Research. Philip E. Bourne PhD, FACMI; Stephenson Chair of Data Science; Director, Data Science Institute; Professor of Biomedical Engineering. peb6a@virginia.edu, https://www.slideshare.net/pebourne, @pebourne. ACoP 2018, 10/10/18
  2. Bias • Can't help but be influenced by my time as Associate Director for Data Science (ADDS) at NIH • Now very much engaged in data science across disciplines: a broader but shallower perspective • Knowing that my long-time colleague Prof. Lei Xie and others will follow me with a deeper perspective
  3. Let's start with a definition …
  4. Big data and data science are like the Internet… If I asked you to define them you would all say something different, yet you use them every day… (Cartoon: http://vadlo.com/cartoons.php?id=357)
  5. So what do I mean by big data/data science? • Use of the ever-increasing amount of open, complex, diverse digital data • Finding ways to ask and then answer relevant questions by combining such diverse data sets • Arriving at statistically significant conclusions not otherwise obtainable • Sharing such findings in a useful way • Translating such findings into actions that improve the human condition
  6. QSP: open, complex, diverse digital data (figure) • Model transportability and horizontal integration: human, mouse, zebrafish • Multi-scale integration: DNA, gene/protein, network, cell, tissue, organ, body, population • Labels in the figure: CNV, SNP, methylation; 3D structure, gene expression, proteomics, metabolomics; metabolic, signaling transduction, gene regulation; hepatic, myoepithelial, erythrocyte; epithelial, muscle, nervous; liver, kidney, pancreas, heart; physiologically based pharmacokinetics; GWAS, population dynamics, microbiota. Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262
  7. Machine learning has been around for over 20 years, so why the fuss now? • Amount of data available for training • Open source: R and Python • Advances in computing (e.g., GPUs) allow for deeper neural nets (deep learning) • Algorithmic efficiency gains (e.g., in backpropagation) • Success promotes further research • Commercialization. Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
  8. The NIH view • Big Data – Total data from NIH-funded research in 2016 estimated at 650 PB (for comparison, in 2012 the Library of Congress was 3 PB) – 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB in 2016 • Dark Data – Only 12% of data described in published papers is in recognized archives – 88% is dark data (http://www.ncbi.nlm.nih.gov/pubmed/26207759) • Cost – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives
  9. NIH strategic plan for data • Support a Highly Efficient and Effective Biomedical Research Data Infrastructure • Promote Modernization of the Data-Resources Ecosystem • Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools • Enhance Workforce Development for Biomedical Data Science • Enact Appropriate Policies to Promote Stewardship and Sustainability. https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
  10. A research data infrastructure requires that we move from pipes to platforms… which raises the question: will biomedical research become more like Airbnb? Vivien Bonazzi; Bonazzi & Bourne 2017, PLoS Biol. 15(4):e2001818.
  11. I am not crazy, hear me out • Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host) • The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder • It seems to be working: – 60 million users searching 2 million listings in 192 countries – Average of 500,000 stays per night – Valuation of US $25bn. Bonazzi & Bourne 2017, PLoS Biol. 15(4):e2001818.
  12. These plans require moving from pipes to platforms (figure, panels A to C) • Cloud computing environment: data, metadata, software, model, container • Model Commons: metadata, recommendation system, model registry, user interface • Other labels: ontology, model, data, algorithm, software
  13. The pillars of data science operate within this platform environment (figure label: QSP)
  14. Let's briefly focus on those five pillars in the context of QSP …
  15. Data acquisition: the data production issue (the V's of Big Data), experimentally • Estimated (2017) that ≈2.5 quintillion (2.5×10^18) bytes of data are generated daily, with 90% of all the world's data having been created in the past two years • Plaintext PDB files are typically ≈ a few hundred KB (…but that's just the start!). Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
  16. Data integration and engineering • Generic – Ontologies – Object identifiers – Indexing schemes – Common data models
  17. Data analytics • Generic – SVMs – Neural nets – Deep learning – Random forest
  18. Visualization • Generic – VR – Networks – Sonics
  19. Ethics, law & policy • Landmark studies identify histone mutations as recurrent driver mutations in Diffuse Intrinsic Pontine Glioma (DIPG), ~2012 • Almost 3 years later, in largely the same but partially expanded datasets, the same two groups and 2 others identify ACVR1 mutations as a secondary, co-occurring mutation. From Adam Resnick
  20. Conclusion: Driven by large amounts of open digital data of different types, and by new algorithms and approaches, biomedical researchers are destined to follow the private sector towards the fourth paradigm
  21. Acknowledgements • The BD2K Team at NIH • My colleagues at UVA • The 150 folks who have passed through my laboratory: https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0 • Zheng Zhao • Lei Xie
  22. Thank You. peb6a@virginia.edu
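
As a rough sanity check on the quantities quoted on slides 8 and 15, the short Python sketch below recomputes the NCBI/NLM share of NIH-funded data, the dark-data fraction, and the daily global data volume in petabytes. It only restates numbers from the slides. The decimal petabyte (10^15 bytes) and all variable names are assumptions made for illustration, not part of the talk.

```python
# Figures quoted on slides 8 and 15 of the talk; nothing here is new data.
PB = 10**15  # bytes per petabyte (decimal convention, an assumption)

nih_total_pb = 650        # estimated NIH-funded research data, 2016
ncbi_nlm_pb = 20          # portion held at NCBI/NLM
archived_fraction = 0.12  # share of published data in recognized archives

# Share of NIH-funded data held at NCBI/NLM (the slide rounds this to 3%).
ncbi_share = ncbi_nlm_pb / nih_total_pb

# Everything not in a recognized archive is counted as dark data.
dark_fraction = 1 - archived_fraction

# Slide 15: roughly 2.5 quintillion bytes generated per day (2017 estimate).
daily_bytes = 2.5e18
daily_pb = daily_bytes / PB

# How many days of global output would equal the NIH 2016 total.
days_to_match_nih = nih_total_pb / daily_pb

print(f"NCBI/NLM share of NIH data: {ncbi_share:.1%}")
print(f"Dark data fraction: {dark_fraction:.0%}")
print(f"Global daily output: {daily_pb:.0f} PB")
print(f"Days of global output equal to NIH total: {days_to_match_nih:.2f}")
```

Read this way, the two estimates are on very different scales: under these assumptions the 2016 NIH total corresponds to well under one day of the estimated global daily output.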
