Data Science Meets Biomedicine, Does Anything Change?
Philip E. Bourne PhD
peb6a@virginia.edu
https://www.slideshare.net/pebourne
October 3, 2023
Bias/Shortcomings/Perspective
• Member of the BSC for ~4 years – I know something of what is under
the hood
• Most of my recent work is administrative – this is more of a meta
perspective
• Academia has additional perspectives, different priorities
• Know a little about the inner workings of the NIH
• Strong propensity towards open scholarship
Disclaimer
I am privileged to be
helping build a new
kind of school within a
traditional institution. I
have drunk my own
Kool-Aid
https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf
https://twitter.com/aip_publishing/status/856825353645559808
Science Drivers Over the
Millennia
The Human Genome was the Tipping Point
and Led the Way
http://www.ornl.gov/hgmis
• High-throughput DNA digital data changed how
we think about biomedicine
• Spawned a new field – bioinformatics /
computational biology/ systems biology /
biomedical data science
• Spawned a multibillion-dollar industry
Is Bioinformatics Dead? PLOS Biology 2021
Bourne’s Timeline
1980s 1990s 2000s 2010s 2020s
The Discipline (Whatever it is Called)
Unknown Expt. Driven Emergent Over-sold A Service A Partner The Driver
Digital Data
4 Pillars of Data Science
• Systems – HPC, Cloud, GPUs
• Analytics – HMMs, SVMs, NNs, CNNs, LLMs
• Value – HIPAA, Privacy, Security, HITECH
• Design – Mol Graphics, Web 2.0, Dashboards
Basic Premise …..
We are at a new tipping point
Basic Premise …
“We need to be more aware than
ever of developments that may be
far outside our discipline that fall
under the broad topic of data
science. In short, we need to
become biomedical data
scientists.”
Stated another way, the
leadership role in data/informatics
afforded by the human genome
project no longer applies.
Data Science –
In 45+ Years in Academia I Have Never Seen Anything Like It
• It is a response to the digital transformation of
society
• It is touching every discipline (aka vertical)
• We can’t keep the students out of our classes
• Cause – large amounts of digital data
• Effect – interdisciplinarity, openness, translation,
search for responsibility and more
In summary, it is disruptive to current modes of biomedical research
Data Science
As a Driver It’s Just the Beginning….
https://zenodo.org/record/6497693
45 Members
Data scientist jobs are predicted to experience 36
percent growth between 2021 and 2031, according
to the US Bureau of Labor Statistics.
The global data science platform market size was
valued at USD 64.14 billion in 2021 and is projected
to grow from USD 81.47 billion in 2022 to USD
484.17 billion by 2029, exhibiting a CAGR of 29.0%
during the forecast period.
Data science is the fastest emerging field around the
globe.
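The growth figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `cagr` is hypothetical, and treating the 36 percent jobs figure as a compound rate spread evenly over ten years is an assumption, not something the source states.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by growing from start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market projection quoted on the slide: USD 81.47 billion (2022) to USD 484.17 billion (2029).
market_cagr = cagr(81.47, 484.17, years=2029 - 2022)

# Jobs projection quoted on the slide: 36 percent total growth between 2021 and 2031.
# Assumption: the growth compounds evenly over the ten years.
jobs_annual = cagr(1.00, 1.36, years=2031 - 2021)

print(f"Implied market CAGR: {market_cagr:.1%}")
print(f"Implied annual job growth: {jobs_annual:.1%}")
```

The market figure should come out close to the 29.0% CAGR stated on the slide, while the jobs figure corresponds to roughly 3 percent per year.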
Given these precedents about data and data
science we should start with a definition/framework
Big data and data science are like the Internet…
If I asked you to define them you would all say
something different, yet you use them every day…
http://vadlo.com/cartoons.php?id=357
One Definition of Data Science –
The 4+1 Model (aka domains)
• Value – assuring societal
benefit
• Design – communication
of the value of data
• Systems – the means to
communicate and
convey benefit
• Analytics – models and
methods
• Practice – where
everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
The Data Science Interplay
• Value + Design = Openness,
responsibility
• Value + Analytics = Human-
centered AI, algorithmic bias
• Value + Systems =
sustainability, access,
environmental impact
• Design + Analytics = literate
programming, visualization
• Design + Systems =
dashboards, engineering
design
• Analytics + Systems = ML
engineering
Thinking of data as a science unto itself is novel and controversial
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
With this definition let’s explore the
implications for biomedical research …
The 4+1 Model - Systems
Systems….
Science, 377 (6603).
DOI: 10.1126/science.abo5947
Systems….
• Need something akin to the electricity grid or banking system
• Need to consider data and methods as first-class data objects
• Examples: European Open Science Cloud (EOSC), the CS3MESH4EOSC Science
Mesh, the China Science and Technology (CST) Cloud, the African Open Science
Platform, the South African National Integrated Cyber Infrastructure System, the
Malaysia Open Science Platform, the Global Open Science Cloud (GOSC), the
Australian Research Data Commons (ARDC) Nectar Research Cloud, the Digital
Research Alliance of Canada (formerly known as the New Digital Research
Infrastructure Organization), and the Arab States Research and Education
Network.
• Problems span funding agencies; solutions do not
• There is a lack of public-private partnership
Analytics ….
AlphaGo – Take Home Messages
https://www.alphagomovie.com/
1. Even the programmers were
disquieted by creating
something better than any
human
2. AlphaGo made a move that
neither human Go experts nor
its programmers anticipated
3. It takes a lot of resources to
defeat the world champion
Go has more possible board positions than there are atoms in the universe
It’s Not Over Yet….
A 300-residue protein has ~20**300 possible sequences, also more than the
number of atoms in the universe
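The two size comparisons on these slides can be checked on a log scale. The sketch below is a rough illustration, not a precise count: the figure of about 10**80 atoms is a commonly quoted order-of-magnitude estimate assumed here, `3 ** 361` is only an upper bound on Go board configurations because it includes illegal positions, and the helper name `log10_count` is hypothetical.

```python
import math

# Assumption: ~10**80 atoms in the observable universe (a commonly quoted order of magnitude).
LOG10_ATOMS = 80


def log10_count(states_per_site, sites):
    """log10 of states_per_site ** sites, computed without building the huge number."""
    return sites * math.log10(states_per_site)


# 20 amino acids at each of 300 positions (sequence space of a 300-residue protein).
protein_log10 = log10_count(20, 300)

# 3 states (empty, black, white) at each of 19 * 19 points: an upper bound on Go
# board configurations, since it also counts illegal positions.
go_log10 = log10_count(3, 19 * 19)

print(f"protein sequences ~ 10^{protein_log10:.0f}")
print(f"Go configurations <= 10^{go_log10:.0f}")
print("both exceed the atom estimate:", min(protein_log10, go_log10) > LOG10_ATOMS)
```

Working with logarithms keeps the comparison readable; Python integers could also hold `20 ** 300` exactly if the full number were needed.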
Science Games….
https://medium.com/proteinqure/welcome-into-the-fold-bbd3f3b19fdd
AlphaFold2 Makes Significant Leap
AlphaFold2
Numerical optimization – differentiable programming
Overall gradient descent trained to win CASP
Jumper et al., 2021. Nature, 596 (7873),
pp. 583-589
Transformer models using attention
Geometry invariant to
translation/rotation
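To make "geometry invariant to translation/rotation" concrete, the sketch below shows one quantity with that property: the distances between pairs of points do not change when the whole structure is rotated and shifted. This is only an illustration of the invariance idea, not AlphaFold2's code; the coordinates are made-up toy values and the helper names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy coordinates; these are not real atom positions.
toy_points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (0.2, 2.0, 1.1)]


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(p + d for p, d in zip(point, shift))


def pairwise_distances(points):
    """Distances between every pair of points, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


# Apply a rigid motion: rotate by 40 degrees, then translate.
moved = [translate(rotate_z(p, math.radians(40)), (5.0, -3.0, 2.0)) for p in toy_points]

before = pairwise_distances(toy_points)
after = pairwise_distances(moved)

# The coordinates change, but the internal geometry does not.
unchanged = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
print("coordinates changed:", moved != toy_points)
print("pairwise distances unchanged:", unchanged)
```

A model whose inputs or losses depend only on such invariant quantities gives the same answer regardless of how the structure happens to be positioned in space.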
Logistics Behind the Win
● Nothing fundamentally new from an AI perspective
● Data Integration
● Collaboration not competition
● Engineering challenge beyond most labs
● Compute power beyond most labs
● Team size beyond most labs
● Worked with protein structure specialists
Downstream Implications
• Cooperation rather than competition
• Public-private partnership
• Translational possibilities are endless
• Made possible by curated open data
• Appreciate engineering
Scientific Implications
Exploration of Latent Space
Rethink fold space? Rethink classification schemes?
AI Analytics Across the Scientific Discovery
Process
From Yolanda Gil 2023 AI for Science Eds. Choudhary, Fox & Hey p699
The 4+1 Model - Design
Beyond data science the academic landscape
is changing….
https://doi.org/10.1038/sdata.2016.18
https://www.heliosopen.org/
The One Lever Left in Open
Scholarship is Academia Itself
Openness/FAIR
Data Science would not exist if it were not for open
data and methods. It would be wrong for us to take
and not give back
https://sparcopen.org/
https://datascience.virginia.edu/policies
Questions I Leave You With ….
• Are we indeed at a change point?
• Will biomedicine continue to lead data science?
• Do we need new models for doing science?
• Are we placing the right emphasis on our research
products, notably data and methods vs papers?
Questions?
Databases organize data around a project.
Data warehouses organize the data for an organization.
Data commons organize the data for a scientific discipline or field.
Data Warehouse
Data Ecosystems
How we think about our
infrastructure is important
Challenges: fixed level of funding
Opportunities: data commons
Data commons co-locate data
with cloud computing
infrastructure and commonly
used software services, tools &
apps for managing, analyzing and
sharing data to create an
interoperable resource for the
research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons: Toward Data Science as a Service, IEEE
Computing in Science & Engineering, 2016. Source of image: The CDIS, GDC, & OCC data commons infrastructure at a University of Chicago data center.
Bonazzi VR, Bourne PE (2017) Should biomedical research be like Airbnb? PLoS Biol 15(4): e2001818.
Systems
[Adapted from Bob Grossman]
But wait, the picture is more complicated….
Data Science versus Data Engineering – How
Much Emphasis Where?
A Data Integration Poster Child
Researcher and Assistant Professor of
Medicine Dr. Thomas Hartka, also a
current online Masters in Data Science
student, is combining two disparate
data sets—electronic health records
and DMV crash data—to save lives
after motor vehicle crashes.
“I enrolled in the MSDS program to
expand my research on automotive
safety. I have already used
techniques from classes in my work.
I hope to expand my research to
real-time analytics to improve
emergency room care.”
— Dr. Thomas Hartka, UVA School
of Medicine
Coming back to the question…
So we have a definition of data science and we
have a set of guiding principles. Where does this
take us?
Stated another way, what do we want to be
recognized for in 10 years?
https://pebourne.wordpress.com/
Research ethics
committees (RECs) review
the ethical acceptability
of research involving
human participants.
Historically, the principal
emphases of RECs have
been to protect
participants from physical
harms and to provide
assurance as to
participants’ interests and
welfare.*
[The Framework] is
guided by, Article 27
of the 1948 Universal
Declaration of Human
Rights. Article 27
guarantees the rights
of every individual in
the world "to share in
scientific
advancement and its
benefits" (including to
freely engage in
responsible scientific
inquiry)…*
Protect human
subject data
The right of human
subjects to benefit
from research.
*GA4GH Framework for Responsible Sharing of Genomic and Health-Related Data, see goo.gl/CTavQR
Data sharing with protections provides the evidence
so patients can benefit from advances in research.
Balance protecting human subject data
with open research that benefits
patients
[Adapted from Bob Grossman]
Value
Why Responsible Data Science?
• A defining feature
• A partnership between STEM, social
sciences and the humanities
• Where UVA has strength
[Figure labels]
Model Transportability
Horizontal Integration: human, mouse, zebrafish
Multi-scale Integration: DNA, Gene/Protein, Network, Cell, Tissue, Organ, Body, Population
• DNA and Gene/Protein – CNV, SNP, methylation, 3D structure, Gene expression, Proteomics, Metabolomics
• Network – Metabolic, Signaling transduction, Gene regulation
• Cell – Hepatic, Myoepithelial, Erythrocyte
• Tissue – Epithelial, Muscle, Nervous
• Organ – Liver, Kidney, Pancreas, Heart
• Body – Physiologically based pharmacokinetics
• Population – GWAS, Population dynamics, Microbiota
From Harnessing Big Data for Systems Pharmacology 2017
https://doi.org/10.1146/annurev-pharmtox-010716-104659
Current roadblocks are more cultural than technical
The Fifth Paradigm: Integration Across Scales?
Gohlke et al. 2022
https://onlinelibrary.wiley.com/doi/10.1002/ctm2.726
Real World Evidence for Preventive Effects of Statins on
Cancer Incidence: A Transatlantic Analysis
EHR
Animal Models
Pathways
Daily Challenges
• Deciding what not to do
• Competition for the best team members (faculty and staff)
• Establishing a diverse team
• Lack of a comprehensive enterprise-wide data infrastructure
• It’s easier to conform


Editor's Notes

  1. I will introduce the concept of data science with a story that illustrates citizen engagement, the merging of unexpected data, and societal benefit