Data Science Meets Biomedicine, Does Anything Change?
1. Data Science Meets Biomedicine, Does Anything Change?
Philip E. Bourne PhD
peb6a@virginia.edu
https://www.slideshare.net/pebourne
October 3, 2023
3. Bias/Shortcomings/Perspective
• Member of the BSC for ~4 years – I know something of what is under
the hood
• Most of my recent work is administrative – this is more of a meta
perspective
• Academia has additional perspectives, different priorities
• Know a little about the inner workings of the NIH
• Strong propensity towards open scholarship
4. Disclaimer
I am privileged to be
helping build a new
kind of school within a
traditional institution. I
have drunk my own
Kool-Aid
6. The Human Genome was the Tipping Point
and Led the Way
http://www.ornl.gov/hgmis
• High throughput DNA digital data changed how
we think about biomedicine
• Spawned a new field – bioinformatics /
computational biology / systems biology /
biomedical data science
• Spawned a multibillion-dollar industry
Is Bioinformatics Dead? PLOS Biology 2021
7. Bourne’s Timeline
1980s 1990s 2000s 2010s 2020s
The Discipline (Whatever it is Called): Unknown, Expt. Driven, Emergent, Over-sold, A Service, A Partner, The Driver
Digital Data
4 Pillars of Data Science: Systems, Analytics, Design, Value
• Systems – HPC, Cloud, GPUs
• Analytics – HMMs, SVMs, NNs, CNNs, LLMs
• Value – HIPAA, Privacy, Security, HITECH
• Design – Mol Graphics, Web 2.0, Dashboards
9. Basic Premise …
“We need to be more aware than
ever of developments that may be
far outside our discipline that fall
under the broad topic of data
science. In short, we need to
become biomedical data
scientists.”
Stated another way, the
leadership role in data/informatics
afforded by the human genome
project no longer applies.
10. Data Science –
In 45+ Years in Academia I Have Never Seen Anything Like It
• It is a response to the digital transformation of
society
• It is touching every discipline (aka vertical)
• We can’t keep the students out of our classes
• Cause – large amounts of digital data
• Effect – interdisciplinarity, openness, translation,
search for responsibility and more
In summary, it is disruptive to current modes of biomedical research
11. Data Science
As a Driver It's Just the Beginning….
https://zenodo.org/record/6497693
45 Members
Data scientist jobs are predicted to experience 36
percent growth between 2021 and 2031, according
to the US Bureau of Labor Statistics.
The global data science platform market size was
valued at USD 64.14 billion in 2021 and is projected
to grow from USD 81.47 billion in 2022 to USD
484.17 billion by 2029, exhibiting a CAGR of 29.0%
during the forecast period.
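The market figures quoted above can be sanity-checked with the standard compound annual growth rate formula. The sketch below is only an arithmetic check, assuming the 2022 and 2029 values are the endpoints and that 2022 to 2029 counts as seven compounding years; the function name `cagr` is illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.29 means 29%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on the slide, in USD billions.
# Assumption: 2022 -> 2029 is treated as 7 compounding years.
rate = cagr(81.47, 484.17, 2029 - 2022)
print(f"Implied CAGR: {rate:.1%}")
```

Under those assumptions the implied rate comes out at roughly 29%, consistent with the quoted CAGR.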
Data science is the fastest emerging field around the
globe.
12. Given these precedents about data and data
science we should start with a definition/framework
13. Big data and data science are like the Internet…
If I asked you to define them you would all say
something different, yet you use them every day…
http://vadlo.com/cartoons.php?id=357
14. One Definition of Data Science –
The 4+1 Model (aka domains)
• Value – assuring societal
benefit
• Design – communication
of the value of data
• Systems – the means to
communicate and
convey benefit
• Analytics – models and
methods
• Practice – where
everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
15. The Data Science Interplay
• Value + Design = Openness,
responsibility
• Value + Analytics = Human
centered AI, algorithmic bias
• Value + Systems =
sustainability, access,
environmental impact
• Design + Analytics = literate
programming, visualization
• Design + Systems =
dashboards, engineering
design
• Analytics + Systems = ML
engineering
Thinking of data as a science unto itself is novel and controversial
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
17. The 4+1 Model - Systems
• Value – assuring societal
benefit
• Design – communication
of the value of data
• Systems – the means to
communicate and
convey benefit
• Analytics – models and
methods
• Practice – where
everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
19. Systems….
• Need something akin to the electricity grid or banking system
• Need to consider data and methods as first-class data objects
• Examples: European Open Science Cloud (EOSC), the CS3MESH4EOSC Science
Mesh, the China Science and Technology (CST) Cloud, the African Open Science
Platform, the South African National Integrated Cyber Infrastructure System, the
Malaysia Open Science Platform, the Global Open Science Cloud (GOSC), the
Australian Research Data Commons (ARDC) Nectar Research Cloud, the Digital
Research Alliance of Canada (formerly known as the New Digital Research
Infrastructure Organization), and the Arab States Research and Education
Network.
• Problems span funding agencies; solutions do not
• There is a lack of public-private partnership
21. AlphaGo – Take Home Messages
https://www.alphagomovie.com/
1. Even the programmers were
disquieted by creating
something better than any
human
2. AlphaGo made a move that no
human Go expert nor
programmer anticipated
3. It takes a lot of resources to
defeat the world champion
Go has more possible board positions than there are atoms in the observable universe
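The comparison with atoms in the universe can be checked with a rough count. The sketch below computes a simple upper bound on 19x19 board configurations (each point empty, black, or white) and compares it with 10^80, a commonly quoted order-of-magnitude estimate for atoms in the observable universe. Both numbers are assumptions made for illustration; the count of strictly legal positions is smaller than this bound but still far above 10^80.

```python
BOARD_POINTS = 19 * 19  # intersections on a standard Go board

# Upper bound: every point is empty, black, or white.
# This over-counts, because it includes configurations that are illegal in Go.
upper_bound = 3 ** BOARD_POINTS

# Assumption: rough order-of-magnitude estimate, not a measured value.
atoms_in_universe = 10 ** 80

digits = len(str(upper_bound))
print(f"Board configurations (upper bound): a {digits}-digit number")
print(f"Exceeds the atom estimate: {upper_bound > atoms_in_universe}")
```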
27. AlphaFold2
Numerical optimization – differentiable programming
End-to-end gradient descent, trained to win CASP
Jumper et al., 2021. Nature, 596 (7873),
pp. 583-589
Transformer models using attention
Geometry invariant to
translation/rotation
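To make "invariant to translation/rotation" concrete, here is a minimal sketch of the general property, not of AlphaFold2's architecture: pairwise distances between points do not change when the whole structure is rotated and shifted. The coordinates, angle, shift, and function names are made up for illustration.

```python
import math


def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of 3D points."""
    return [[math.dist(p, q) for q in points] for p in points]


def rotate_z_then_translate(points, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


# Hypothetical toy coordinates standing in for atom positions.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.5)]
moved = rotate_z_then_translate(coords, math.radians(40), (3.0, -2.0, 7.0))

before = pairwise_distances(coords)
after = pairwise_distances(moved)

# Absolute coordinates change, but every pairwise distance is preserved.
distances_match = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(before, after)
    for a, b in zip(row_a, row_b)
)
print(distances_match)
```

A representation built only from such distances gives the same description of a structure wherever it sits in space, which is the sense of invariance named on the slide.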
28. Logistics Behind the Win
● Nothing fundamentally new from an AI perspective
● Data Integration
● Collaboration not competition
● Engineering challenge beyond most labs
● Compute power beyond most labs
● Team size beyond most labs
● Worked with protein structure specialists
29. Downstream Implications
• Cooperation rather than competition
• Public-private partnership
• Translational possibilities are endless
• Made possible by curated open data
• Appreciate engineering
32. AI Analytics Across the Scientific Discovery
Process
From Yolanda Gil (2023), in AI for Science, eds. Choudhary, Fox & Hey, p. 699
33. The 4+1 Model - Design
• Value – assuring societal
benefit
• Design – communication
of the value of data
• Systems – the means to
communicate and
convey benefit
• Analytics – models and
methods
• Practice – where
everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
37. Openness/FAIR
Data Science would not exist if it were not for open
data and methods. It would be wrong for us to take
and not give back
https://sparcopen.org/
https://datascience.virginia.edu/policies
38. Questions I Leave You With ….
• Are we indeed at a change point?
• Will biomedicine continue to lead data science?
• Do we need new models for doing science?
• Are we placing the right emphasis on our research
products, notably data and methods vs papers?
40. Databases organize data around a project.
Data warehouses organize the data for an organization.
Data commons organize the data for a scientific discipline or field.
Data Warehouse
Data Ecosystems
How we think about our
infrastructure is important
41. Challenges
Fixed level of funding
Opportunities
data commons
Data commons co-locate data
with cloud computing
infrastructure and commonly
used software services, tools &
apps for managing, analyzing and
sharing data to create an
interoperable resource for the
research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons: Toward Data Science as a Service, IEEE
Computing in Science & Engineering, 2016. Source of image: The CDIS, GDC, & OCC data commons infrastructure at a University of Chicago data center.
Bonazzi VR, Bourne PE (2017) Should biomedical research be like Airbnb? PLoS Biol 15(4): e2001818.
Systems
[Adapted from Bob Grossman]
44. A Data Integration Poster Child
Researcher and Assistant Professor of
Medicine Dr. Thomas Hartka, also a
current online Masters in Data Science
student, is combining two disparate
data sets—electronic health records
and DMV crash data—to save lives
after motor vehicle crashes.
“I enrolled in the MSDS program to
expand my research on automotive
safety. I have already used
techniques from classes in my work.
I hope to expand my research to
real-time analytics to improve
emergency room care.”
— Dr. Thomas Hartka, UVA School
of Medicine
45. Coming back to the question…
So we have a definition of data science and we
have a set of guiding principles. Where does this
take us?
Stated another way, what do we want to be
recognized for in 10 years?
https://pebourne.wordpress.com/
46. Research ethics
committees (RECs) review
the ethical acceptability
of research involving
human participants.
Historically, the principal
emphases of RECs have
been to protect
participants from physical
harms and to provide
assurance as to
participants’ interests and
welfare.*
[The Framework] is
guided by Article 27
of the 1948 Universal
Declaration of Human
Rights. Article 27
guarantees the rights
of every individual in
the world "to share in
scientific
advancement and its
benefits" (including to
freely engage in
responsible scientific
inquiry)…*
Protect human
subject data
The right of human
subjects to benefit
from research.
*GA4GH Framework for Responsible Sharing of Genomic and Health-Related Data, see goo.gl/CTavQR
Data sharing with protections provides the evidence
so patients can benefit from advances in research.
Balance protecting human subject data
with open research that benefits
patients
[Adapted from Bob Grossman]
Value
47. Why Responsible Data Science?
• A defining feature
• A partnership between STEM, social
sciences and the humanities
• Where UVA has strength
49. Gohlke et al. 2022
https://onlinelibrary.wiley.com/doi/10.1002/ctm2.726
Real World Evidence for Preventive Effects of Statins on
Cancer Incidence: A Transatlantic Analysis
EHR
Animal Models
Pathways
50. Daily Challenges
• Deciding what not to do
• Competition for the best team members (faculty and staff)
• Establishing a diverse team
• Lack of a comprehensive enterprise-wide data infrastructure
• It's easier to conform
Editor's Notes
I will introduce the concept of data science with a story that illustrates citizen engagement, the merging of unexpected data, and societal benefit.