Big Data and its Role in
Biomedical Research
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
10/10/18 ACoP 2018 1
@pebourne
Bias
• Can't help but be influenced by my time as Associate
Director for Data Science (ADDS) at NIH
• Now very much engaged in data science across disciplines
– broader but shallower perspective
• Knowing my long-time colleague Prof. Lei Xie and others
will follow me with a deeper perspective
Let's start with a definition…
Big data and data
science are like
the Internet…
If I asked you to
define them you
would all say
something
different, yet you
use them every
day…
http://vadlo.com/cartoons.php?id=357
So what do I mean by big data/data
science?
• Use of the ever-increasing amount of open, complex, diverse
digital data
• Finding ways to ask and then answer relevant questions by
combining such diverse data sets
• Arriving at statistically significant conclusions not otherwise
obtainable
• Sharing such findings in a useful way
• Translating such findings into actions that improve the human
condition
[Figure labels: model transportability; horizontal integration; multi-scale integration.
Species: human, mouse, zebrafish.
Scales: DNA, gene/protein, network, cell, tissue, organ, body, population.
Data and model types: CNV, SNP, methylation; 3D structure, gene expression, proteomics, metabolomics; metabolic, signaling transduction, gene regulation; hepatic, myoepithelial, erythrocyte, epithelial, muscle, nervous; liver, kidney, pancreas, heart; physiologically based pharmacokinetics; GWAS, population dynamics; microbiota.]
QSP - Open, complex, diverse digital data
Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262
Machine learning has been around for over 20
years – why the fuss now?
• Amount of data available for training
• Open source - R and Python
• Advances in computing (e.g., GPUs) allow for deeper neural nets (deep
learning)
• Algorithmic efficiency gains (e.g., in back propagation)
• Success promotes further research
• Commercialization
Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
The NIH view
• Big Data
– Total data from NIH-funded research in 2016 estimated at 650 PB*
– 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10
PB in 2016
• Dark Data
– Only 12% of data described in published papers is in recognized
archives – 88% is dark data^
• Cost
– 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data
archives
* In 2012 Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
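The proportions quoted on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative and not part of the talk.

```python
# Figures as quoted on the slide, in petabytes (PB) unless noted.
# Variable names are illustrative assumptions, not NIH terminology.
total_pb = 650             # estimated total data from NIH-funded research, 2016
ncbi_pb = 20               # portion held at NCBI/NLM
archived_fraction = 0.12   # share of published data in recognized archives
library_of_congress_pb = 3 # Library of Congress size in 2012, per the footnote

ncbi_share = ncbi_pb / total_pb                   # about 0.03, the slide's "3%"
dark_fraction = 1 - archived_fraction             # the slide's "88% dark data"
loc_multiple = total_pb / library_of_congress_pb  # total expressed in 2012 Libraries of Congress

print(f"NCBI/NLM share of total: {ncbi_share:.1%}")
print(f"Dark data fraction: {dark_fraction:.0%}")
print(f"Total as multiples of the 2012 Library of Congress: {loc_multiple:.0f}")
```

On these numbers the NCBI/NLM holdings round to the quoted 3%, and the estimated total is a little over two hundred times the 2012 Library of Congress figure.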
NIH strategic plan for data
• Support a Highly Efficient and Effective
Biomedical Research Data
Infrastructure
• Promote Modernization of the Data-
Resources Ecosystem
• Support the Development and
Dissemination of Advanced Data
Management, Analytics, and
Visualization Tools
• Enhance Workforce Development for
Biomedical Data Science
• Enact Appropriate Policies to Promote
Stewardship and Sustainability
https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
A research data infrastructure requires
we move from pipes to platform…
which begs the question ...
Vivien Bonazzi
Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818.
Will biomedical research become more like Airbnb?
I am not crazy, hear me out
• Airbnb is a platform that supports a trusted relationship between consumer
(renter) and supplier (host)
• The platform focuses on maximizing the exchange of services between supplier and
consumer and maximizing the amount of trust associated with a given stakeholder
• It seems to be working:
– 60 million users searching 2 million listings in 192 countries
– Average of 500,000 stays per night.
– Valuation of US $25bn
Bonazzi & Bourne 2017, PLoS Biol. 7;15(4):e2001818.
[Figure: Model Commons, panels (A), (B), (C). Labels: cloud computing environment; container (data, metadata, software, model); metadata; Model Commons recommendation system; model registry; user interface; ontology; model; data; algorithm; software.]
These plans require moving from pipes to platforms
The pillars of data science operate
within this platform environment
QSP
Let's briefly focus on those five pillars
in the context of QSP…
Data acquisition
The data production issue (the V’s of Big Data): experimentally
• Estimated (2017) that ≈2.5 quintillion (2.5×10^18) bytes of data are generated daily, with 90%
of all the world’s data having been created in the past two years.
• Plain-text PDB files are typically ≈ a few 100s of KB (…but that’s just the start!)
Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
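To put the daily estimate on a more familiar scale, the short sketch below converts it to exabytes per day and bytes per second. The 300 KB value for a PDB file is an assumed stand-in for "a few 100s of KB", chosen only for illustration.

```python
# Daily volume as quoted on the slide (2017 estimate).
BYTES_PER_DAY = 2.5e18
EXABYTE = 10**18            # decimal (SI) exabyte
SECONDS_PER_DAY = 24 * 60 * 60

# ASSUMPTION: 300 KB is an illustrative size for one plain-text PDB file;
# the slide only says "a few 100s of KB".
pdb_file_bytes = 300 * 10**3

exabytes_per_day = BYTES_PER_DAY / EXABYTE
bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
pdb_files_per_day = BYTES_PER_DAY / pdb_file_bytes

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{bytes_per_second / 1e12:.1f} TB per second")
print(f"about {pdb_files_per_day:.1e} PDB-sized files per day")
```

Under that assumption, a single structure file is roughly thirteen orders of magnitude smaller than one day of global data production.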
Data integration and engineering
• Generic
– Ontologies
– Object identifiers
– Indexing schemes
– Common data models
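As a minimal sketch of how shared object identifiers and a common data model support integration, the example below merges two sources keyed by the same identifier into one record type. The `GeneRecord` class, the `integrate` function, the identifiers, and all values are hypothetical and made up for illustration; they do not come from the talk or from any real resource.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical common data model: one record per shared identifier."""
    gene_id: str
    expression: Optional[float] = None
    variant_count: Optional[int] = None


# Two made-up sources that happen to use the same identifier scheme.
expression_source = {"GENE:0001": 7.2, "GENE:0002": 1.4}
variant_source = {"GENE:0002": 3, "GENE:0003": 1}


def integrate(expr: Dict[str, float], variants: Dict[str, int]) -> Dict[str, GeneRecord]:
    """Build an index over the union of identifiers; missing values stay None."""
    index = {}
    for gene_id in sorted(set(expr) | set(variants)):
        index[gene_id] = GeneRecord(gene_id, expr.get(gene_id), variants.get(gene_id))
    return index


merged = integrate(expression_source, variant_source)
for record in merged.values():
    print(record)
```

The join works only because both sources agree on the identifier, which is the point of the generic items listed above; without that agreement a mapping step (for example via an ontology) would be needed first.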
Data analytics
• Generic
– SVMs
– Neural nets
– Deep learning
– Random forest
Visualization
• Generic
– VR
– Networks
– Sonics
Ethics, law & policy
• Landmark studies identify
histone mutations as
recurrent driver mutations in
DIPG ~2012
• Almost 3 years later, in
largely the same datasets,
but partially expanded, the
same two groups and 2
others identify ACVR1
mutations as a secondary,
co-occurring mutation
From Adam Resnick
Diffuse Intrinsic Pontine Glioma (DIPG)
Conclusion:
Driven by large amounts of open
digital data of different types, and by new
algorithms and approaches, biomedical
researchers are destined to follow the
private sector towards the fourth
paradigm
Acknowledgements
The BD2K Team at NIH
My Colleagues at UVA
The 150 folks who have passed through my laboratory
https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0
Zheng Zhao Lei Xie
Thank You
peb6a@virginia.edu

More Related Content

What's hot

Bioethics in clinical research
Bioethics in clinical researchBioethics in clinical research
Bioethics in clinical research
Dr. Shazia Afreen
 
Research Methodology - Study Designs
Research Methodology - Study DesignsResearch Methodology - Study Designs
Research Methodology - Study Designs
Azmi Mohd Tamil
 
Future of Biological Drugs
Future of Biological DrugsFuture of Biological Drugs
Future of Biological Drugs
Sujay Iyer
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
Sohail Bajammal
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
Dr.Bhuvaneswari Velumani
 
Precision Medicine - The Future of Healthcare
Precision Medicine - The Future of HealthcarePrecision Medicine - The Future of Healthcare
Precision Medicine - The Future of Healthcare
Data Science Thailand
 
Nanotechnology overview final
Nanotechnology overview finalNanotechnology overview final
Nanotechnology overview final
Manoranjan Ghosh
 
Nanotechnology in diagnostic Pathology
Nanotechnology in diagnostic PathologyNanotechnology in diagnostic Pathology
Nanotechnology in diagnostic Pathology
Aamirlone47
 
Genetics ethics
Genetics ethicsGenetics ethics
Genetics ethics
tas11244
 
RISK MANAGEMENT OF NANOMATERIALS
RISK MANAGEMENT OF NANOMATERIALS RISK MANAGEMENT OF NANOMATERIALS
RISK MANAGEMENT OF NANOMATERIALS
Oeko-Institut
 
Nanobiotechnology lecture 1
Nanobiotechnology lecture 1Nanobiotechnology lecture 1
Nanobiotechnology lecture 1
Ibad khan
 
Research Grant
Research GrantResearch Grant
Research Grant
Bhaumik Bavishi
 
Ethical Issues in Human Subjects Research - Department of Supportive Care
Ethical Issues in Human Subjects Research -  Department of Supportive CareEthical Issues in Human Subjects Research -  Department of Supportive Care
Ethical Issues in Human Subjects Research - Department of Supportive Care
Global Institute GIPPEC
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
Dung Nguyen
 
History of Cloning and Ethical Issues of Human Cloning
History of Cloning and Ethical Issues of Human CloningHistory of Cloning and Ethical Issues of Human Cloning
History of Cloning and Ethical Issues of Human Cloning
Dr. Arman Firoz, Ph.D., MRSB
 
NANO TECHNOLOGY
NANO TECHNOLOGYNANO TECHNOLOGY
NANO TECHNOLOGY
Ankur Prakash Singh
 
Machine Learning in Healthcare and Life Science
Machine Learning in Healthcare and Life ScienceMachine Learning in Healthcare and Life Science
Machine Learning in Healthcare and Life Science
IDEAS - Int'l Data Engineering and Science Association
 

What's hot (20)

Bioethics defined
Bioethics definedBioethics defined
Bioethics defined
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioethics in clinical research
Bioethics in clinical researchBioethics in clinical research
Bioethics in clinical research
 
Research Methodology - Study Designs
Research Methodology - Study DesignsResearch Methodology - Study Designs
Research Methodology - Study Designs
 
Future of Biological Drugs
Future of Biological DrugsFuture of Biological Drugs
Future of Biological Drugs
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Precision Medicine - The Future of Healthcare
Precision Medicine - The Future of HealthcarePrecision Medicine - The Future of Healthcare
Precision Medicine - The Future of Healthcare
 
Ethical issues in research 2
Ethical issues in research 2Ethical issues in research 2
Ethical issues in research 2
 
Nanotechnology overview final
Nanotechnology overview finalNanotechnology overview final
Nanotechnology overview final
 
Nanotechnology in diagnostic Pathology
Nanotechnology in diagnostic PathologyNanotechnology in diagnostic Pathology
Nanotechnology in diagnostic Pathology
 
Genetics ethics
Genetics ethicsGenetics ethics
Genetics ethics
 
RISK MANAGEMENT OF NANOMATERIALS
RISK MANAGEMENT OF NANOMATERIALS RISK MANAGEMENT OF NANOMATERIALS
RISK MANAGEMENT OF NANOMATERIALS
 
Nanobiotechnology lecture 1
Nanobiotechnology lecture 1Nanobiotechnology lecture 1
Nanobiotechnology lecture 1
 
Research Grant
Research GrantResearch Grant
Research Grant
 
Ethical Issues in Human Subjects Research - Department of Supportive Care
Ethical Issues in Human Subjects Research -  Department of Supportive CareEthical Issues in Human Subjects Research -  Department of Supportive Care
Ethical Issues in Human Subjects Research - Department of Supportive Care
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
History of Cloning and Ethical Issues of Human Cloning
History of Cloning and Ethical Issues of Human CloningHistory of Cloning and Ethical Issues of Human Cloning
History of Cloning and Ethical Issues of Human Cloning
 
NANO TECHNOLOGY
NANO TECHNOLOGYNANO TECHNOLOGY
NANO TECHNOLOGY
 
Machine Learning in Healthcare and Life Science
Machine Learning in Healthcare and Life ScienceMachine Learning in Healthcare and Life Science
Machine Learning in Healthcare and Life Science
 

Similar to Big Data and its Role in Biomedical Research

UK data management environment and support
UK data management environment and supportUK data management environment and support
UK data management environment and support
Jisc
 
Data Science BD2K Update for NIH
Data Science BD2K Update for NIH Data Science BD2K Update for NIH
Data Science BD2K Update for NIH
Philip Bourne
 
Big Data and Data Science: Opportunities for Biomedical Engineering
Big Data and Data Science: Opportunities for Biomedical EngineeringBig Data and Data Science: Opportunities for Biomedical Engineering
Big Data and Data Science: Opportunities for Biomedical Engineering
Philip Bourne
 
Are Funders and Academic Institutions Approaches to Data Science Aligned
Are Funders and Academic Institutions Approaches to Data Science AlignedAre Funders and Academic Institutions Approaches to Data Science Aligned
Are Funders and Academic Institutions Approaches to Data Science Aligned
Philip Bourne
 
Research Data, or: How I Learned to Stop Worrying and Love the Policy
Research Data, or: How I Learned to Stop Worrying and Love the PolicyResearch Data, or: How I Learned to Stop Worrying and Love the Policy
Research Data, or: How I Learned to Stop Worrying and Love the Policy
Torsten Reimer
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...
Robin Rice
 
Implications of the Fourth Paradigm
Implications of the Fourth ParadigmImplications of the Fourth Paradigm
Implications of the Fourth Paradigm
Philip Bourne
 
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
datacite
 
BD2K @ NIH - A Vision Through 2020
BD2K @ NIH - A Vision Through 2020BD2K @ NIH - A Vision Through 2020
BD2K @ NIH - A Vision Through 2020
Philip Bourne
 
Towards a Data Commons
Towards a Data CommonsTowards a Data Commons
Towards a Data Commons
Michael Becich
 
How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?
Philip Bourne
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Juan Antonio Vizcaino
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and Solutions
Martin Donnelly
 
Magle data curation in libraries
Magle data curation in librariesMagle data curation in libraries
Magle data curation in libraries
C. Tobin Magle
 
The PDB An Exemplar for Data Science To Date, But What About the Future?
The PDB An Exemplar for Data Science To Date, But What About the Future?The PDB An Exemplar for Data Science To Date, But What About the Future?
The PDB An Exemplar for Data Science To Date, But What About the Future?
Philip Bourne
 
Open Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon HodsonOpen Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon Hodson
Academy of Science of South Africa (ASSAf)
 
How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?
Philip Bourne
 
Research Data Management: Pushing the Frontiers of Good Research Practice
Research Data Management: Pushing the Frontiers of Good Research PracticeResearch Data Management: Pushing the Frontiers of Good Research Practice
Research Data Management: Pushing the Frontiers of Good Research Practice
Yasar Tonta
 
Institutional Data Management Blueprint
Institutional Data Management BlueprintInstitutional Data Management Blueprint
Institutional Data Management Blueprint
Eduserv
 

Similar to Big Data and its Role in Biomedical Research (20)

UK data management environment and support
UK data management environment and supportUK data management environment and support
UK data management environment and support
 
Data Science BD2K Update for NIH
Data Science BD2K Update for NIH Data Science BD2K Update for NIH
Data Science BD2K Update for NIH
 
Big Data and Data Science: Opportunities for Biomedical Engineering
Big Data and Data Science: Opportunities for Biomedical EngineeringBig Data and Data Science: Opportunities for Biomedical Engineering
Big Data and Data Science: Opportunities for Biomedical Engineering
 
Are Funders and Academic Institutions Approaches to Data Science Aligned
Are Funders and Academic Institutions Approaches to Data Science AlignedAre Funders and Academic Institutions Approaches to Data Science Aligned
Are Funders and Academic Institutions Approaches to Data Science Aligned
 
Research Data, or: How I Learned to Stop Worrying and Love the Policy
Research Data, or: How I Learned to Stop Worrying and Love the PolicyResearch Data, or: How I Learned to Stop Worrying and Love the Policy
Research Data, or: How I Learned to Stop Worrying and Love the Policy
 
Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...Services, policy, guidance and training: Improving research data management a...
Services, policy, guidance and training: Improving research data management a...
 
Implications of the Fourth Paradigm
Implications of the Fourth ParadigmImplications of the Fourth Paradigm
Implications of the Fourth Paradigm
 
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
 
BD2K @ NIH - A Vision Through 2020
BD2K @ NIH - A Vision Through 2020BD2K @ NIH - A Vision Through 2020
BD2K @ NIH - A Vision Through 2020
 
Towards a Data Commons
Towards a Data CommonsTowards a Data Commons
Towards a Data Commons
 
How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?
 
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and Solutions
 
Magle data curation in libraries
Magle data curation in librariesMagle data curation in libraries
Magle data curation in libraries
 
The PDB An Exemplar for Data Science To Date, But What About the Future?
The PDB An Exemplar for Data Science To Date, But What About the Future?The PDB An Exemplar for Data Science To Date, But What About the Future?
The PDB An Exemplar for Data Science To Date, But What About the Future?
 
Open Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon HodsonOpen Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon Hodson
 
Simon hodson
Simon hodsonSimon hodson
Simon hodson
 
How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?How Does Data Science Impact the Semantic Web?
How Does Data Science Impact the Semantic Web?
 
Research Data Management: Pushing the Frontiers of Good Research Practice
Research Data Management: Pushing the Frontiers of Good Research PracticeResearch Data Management: Pushing the Frontiers of Good Research Practice
Research Data Management: Pushing the Frontiers of Good Research Practice
 
Institutional Data Management Blueprint
Institutional Data Management BlueprintInstitutional Data Management Blueprint
Institutional Data Management Blueprint
 

More from Philip Bourne

Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
Philip Bourne
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
Philip Bourne
 
AI in Medical Education A Meta View to Start a Conversation
AI in Medical Education A Meta View to Start a ConversationAI in Medical Education A Meta View to Start a Conversation
AI in Medical Education A Meta View to Start a Conversation
Philip Bourne
 
AI+ Now and Then How Did We Get Here And Where Are We Going
AI+ Now and Then How Did We Get Here And Where Are We GoingAI+ Now and Then How Did We Get Here And Where Are We Going
AI+ Now and Then How Did We Get Here And Where Are We Going
Philip Bourne
 
Thoughts on Biological Data Sustainability
Thoughts on Biological Data SustainabilityThoughts on Biological Data Sustainability
Thoughts on Biological Data Sustainability
Philip Bourne
 
What is FAIR Data and Who Needs It?
What is FAIR Data and Who Needs It?What is FAIR Data and Who Needs It?
What is FAIR Data and Who Needs It?
Philip Bourne
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything Change
Philip Bourne
 
Data Science Meets Drug Discovery
Data Science Meets Drug DiscoveryData Science Meets Drug Discovery
Data Science Meets Drug Discovery
Philip Bourne
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not Alone
Philip Bourne
 
BIMS7100-2023. Social Responsibility in Research
BIMS7100-2023. Social Responsibility in ResearchBIMS7100-2023. Social Responsibility in Research
BIMS7100-2023. Social Responsibility in Research
Philip Bourne
 
AI from the Perspective of a School of Data Science
AI from the Perspective of a School of Data ScienceAI from the Perspective of a School of Data Science
AI from the Perspective of a School of Data Science
Philip Bourne
 
What Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewWhat Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's View
Philip Bourne
 
Novo Nordisk 080522.pptx
Novo Nordisk 080522.pptxNovo Nordisk 080522.pptx
Novo Nordisk 080522.pptx
Philip Bourne
 
Towards a US Open research Commons (ORC)
Towards a US Open research Commons (ORC)Towards a US Open research Commons (ORC)
Towards a US Open research Commons (ORC)
Philip Bourne
 
COVID and Precision Education
COVID and Precision EducationCOVID and Precision Education
COVID and Precision Education
Philip Bourne
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data Science
Philip Bourne
 
Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?
Philip Bourne
 
Data Science Meets Open Scholarship – What Comes Next?
Data Science Meets Open Scholarship – What Comes Next?Data Science Meets Open Scholarship – What Comes Next?
Data Science Meets Open Scholarship – What Comes Next?
Philip Bourne
 
Data to Advance Sustainability
Data to Advance SustainabilityData to Advance Sustainability
Data to Advance Sustainability
Philip Bourne
 
Frontiers of Computing at the Cellular and Molecular Scales
Frontiers of Computing at the Cellular and Molecular ScalesFrontiers of Computing at the Cellular and Molecular Scales
Frontiers of Computing at the Cellular and Molecular Scales
Philip Bourne
 

More from Philip Bourne (20)

Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
AI in Medical Education A Meta View to Start a Conversation
AI in Medical Education A Meta View to Start a ConversationAI in Medical Education A Meta View to Start a Conversation
AI in Medical Education A Meta View to Start a Conversation
 
AI+ Now and Then How Did We Get Here And Where Are We Going
AI+ Now and Then How Did We Get Here And Where Are We GoingAI+ Now and Then How Did We Get Here And Where Are We Going
AI+ Now and Then How Did We Get Here And Where Are We Going
 
Thoughts on Biological Data Sustainability
Thoughts on Biological Data SustainabilityThoughts on Biological Data Sustainability
Thoughts on Biological Data Sustainability
 
What is FAIR Data and Who Needs It?
What is FAIR Data and Who Needs It?What is FAIR Data and Who Needs It?
What is FAIR Data and Who Needs It?
 
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything ChangeData Science Meets Biomedicine, Does Anything Change
Data Science Meets Biomedicine, Does Anything Change
 
Data Science Meets Drug Discovery
Data Science Meets Drug DiscoveryData Science Meets Drug Discovery
Data Science Meets Drug Discovery
 
Biomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not AloneBiomedical Data Science: We Are Not Alone
Biomedical Data Science: We Are Not Alone
 
BIMS7100-2023. Social Responsibility in Research
BIMS7100-2023. Social Responsibility in ResearchBIMS7100-2023. Social Responsibility in Research
BIMS7100-2023. Social Responsibility in Research
 
AI from the Perspective of a School of Data Science
AI from the Perspective of a School of Data ScienceAI from the Perspective of a School of Data Science
AI from the Perspective of a School of Data Science
 
What Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewWhat Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's View
 
Novo Nordisk 080522.pptx
Novo Nordisk 080522.pptxNovo Nordisk 080522.pptx
Novo Nordisk 080522.pptx
 
Towards a US Open research Commons (ORC)
Towards a US Open research Commons (ORC)Towards a US Open research Commons (ORC)
Towards a US Open research Commons (ORC)
 
COVID and Precision Education
COVID and Precision EducationCOVID and Precision Education
COVID and Precision Education
 
One View of Data Science
One View of Data ScienceOne View of Data Science
One View of Data Science
 
Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?
 
Data Science Meets Open Scholarship – What Comes Next?
Data Science Meets Open Scholarship – What Comes Next?Data Science Meets Open Scholarship – What Comes Next?
Data Science Meets Open Scholarship – What Comes Next?
 
Data to Advance Sustainability
Data to Advance SustainabilityData to Advance Sustainability
Data to Advance Sustainability
 
Frontiers of Computing at the Cellular and Molecular Scales
Frontiers of Computing at the Cellular and Molecular ScalesFrontiers of Computing at the Cellular and Molecular Scales
Frontiers of Computing at the Cellular and Molecular Scales
 

Recently uploaded

BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Big Data and its Role in Biomedical Research

  • 1. Big Data and its Role in Biomedical Research Philip E. Bourne PhD, FACMI Stephenson Chair of Data Science Director, Data Science Institute Professor of Biomedical Engineering peb6a@virginia.edu https://www.slideshare.net/pebourne 10/10/18 ACoP 2018 1 @pebourne
  • 2. Bias • Can't help but be influenced by my time as Associate Director for Data Science (ADDS) at NIH • Now very much engaged in data science across disciplines – broader but shallower perspective • Knowing my long-time colleague Prof. Lei Xie and others will follow me with a deeper perspective
  • 3. Let's start with a definition…
  • 4. Big data and data science are like the Internet… If I asked you to define them you would all say something different, yet you use them every day… http://vadlo.com/cartoons.php?id=357
  • 5. So what do I mean by big data/data science? • Use of the ever-increasing amount of open, complex, diverse digital data • Finding ways to ask and then answer relevant questions by combining such diverse data sets • Arriving at statistically significant conclusions not otherwise obtainable • Sharing such findings in a useful way • Translating such findings into actions that improve the human condition
  • 6. QSP - Open, complex, diverse digital data. [Figure: model integration along three axes. Model Transportability: human, mouse, zebrafish. Multi-scale Integration: DNA, Gene/Protein, Network, Cell, Tissue, Organ, Body, Population. Horizontal Integration: CNV, SNP, methylation; 3D structure, gene expression, proteomics, metabolomics; metabolic, signaling transduction, gene regulation; hepatic, myoepithelial, erythrocyte, epithelial, muscle, nervous; liver, kidney, pancreas, heart; physiologically based pharmacokinetics; GWAS, population dynamics, microbiota.] Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262
  • 7. Machine learning has been around for over 20 years – why the fuss now? • Amount of data available for training • Open source - R and Python • Advances in computing (e.g., GPUs) allow for deeper neural nets (deep learning) • Algorithmic efficiency gains (e.g., in backpropagation) • Success promotes further research • Commercialization. Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
  • 8. The NIH view • Big Data – Total data from NIH-funded research in 2016 estimated at 650 PB* – 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB in 2016 • Dark Data – Only 12% of data described in published papers is in recognized archives – 88% is dark data^ • Cost – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives. (* In 2012 the Library of Congress was 3 PB. ^ http://www.ncbi.nlm.nih.gov/pubmed/26207759)
  • 9. NIH strategic plan for data • Support a Highly Efficient and Effective Biomedical Research Data Infrastructure • Promote Modernization of the Data-Resources Ecosystem • Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools • Enhance Workforce Development for Biomedical Data Science • Enact Appropriate Policies to Promote Stewardship and Sustainability. https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
  • 10. A research data infrastructure requires we move from pipes to platform… which begs the question... Will biomedical research become more like Airbnb? (Pictured: Vivien Bonazzi.) Bonazzi & Bourne 2017, PLoS Biol. 15(4):e2001818.
  • 11. I am not crazy, hear me out • Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host) • The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder • It seems to be working: – 60 million users searching 2 million listings in 192 countries – Average of 500,000 stays per night – Valuation of US $25bn. Bonazzi & Bourne 2017, PLoS Biol. 15(4):e2001818.
  • 12. These plans require moving from pipes to platforms. [Figure labels: Cloud computing environment; data, metadata, software, model, container; Metadata; Model Commons; Recommendation System; Model registry; User interface; panels (A), (B), (C); ontology, model, data, algorithm, software.]
  • 13. The pillars of data science operate within this platform environment. [Figure label: QSP]
  • 14. Let's briefly focus on those five pillars in the context of QSP…
  • 15. Data acquisition. The data production issue (the V's of Big Data): Experimentally • An estimated (2017) ≈2.5 quintillion (2.5×10^18) bytes of data are generated daily, with 90% of all the world's data having been created in the past two years. • Plain-text PDB files are typically a few hundred KB (…but that's just the start!) Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
  • 16. Data integration and engineering • Generic – Ontologies – Object identifiers – Indexing schemes – Common data models
  • 17. Data analytics • Generic – SVMs – Neural nets – Deep learning – Random forest
  • 18. Visualization • Generic – VR – Networks – Sonics
  • 19. Ethics, law & policy. Diffuse Intrinsic Pontine Glioma (DIPG) • Landmark studies identify histone mutations as recurrent driver mutations in DIPG ~2012 • Almost 3 years later, in largely the same but partially expanded datasets, the same two groups and 2 others identify ACVR1 mutations as a secondary, co-occurring mutation. From Adam Resnick
  • 20. Conclusion: Driven by large amounts of open digital data of different types, and by new algorithms and approaches, biomedical researchers are destined to follow the private sector towards the fourth paradigm
  • 21. Acknowledgements • The BD2K Team at NIH • My Colleagues at UVA • The 150 folks who have passed through my laboratory https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0 • Zheng Zhao • Lei Xie

Editor's Notes

  1. Model integration in systems pharmacology. Diverse models need to be integrated across multiple methodologies, multiple heterogeneous data sets, organismal hierarchy, and species (transportability).
  2. $1.25bn per year to capture all data. After a significant effort at reduction, intramural data is still spread across more than 60 data centers; imagine the extramural situation.
  3. Distribution of kinases and the number of covalent small-molecule kinase inhibitors (CSKIs) for every targeted kinase across the human kinome