Biomedical Engineering in a
Changing Scholarly Landscape
Philip E. Bourne, PhD, FACMI
Stephenson Chair of Data Science
Director Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
Celebrating the 50th Anniversary of the University of Virginia’s Biomedical
Engineering Department
https://www.slideshare.net/pebourne
BME 50th Anniversary 1
The past 50 years have seen science and
technology bring about profound
change…
What can we learn from that and how
can we (BME) be part of the even
more profound change yet to come?
Here are a few answers from my own
biased view
I was 14 when BME started …
The subsequent 50 years of science …
The best of times …
~1975: 3 months; 170 MB; ~10^3 atoms
2017: 118 ms (10^7); 256 GB (10^3); ~10^7 atoms
Life is 3-D and it begins with molecules
doi: 10.1371/journal.pbio.2002041
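A quick check of the factors shown in parentheses, as a minimal sketch. It assumes "3 months" means roughly 90 days and that MB and GB are decimal units; neither is stated on the slide.

```python
# Figures as printed on the slide; "3 months" is taken as 90 days (an assumption).
SECONDS_PER_DAY = 24 * 60 * 60

time_1975_s = 90 * SECONDS_PER_DAY   # ~3 months
time_2017_s = 0.118                  # 118 ms
storage_1975_bytes = 170e6           # 170 MB (decimal units assumed)
storage_2017_bytes = 256e9           # 256 GB (decimal units assumed)

time_ratio = time_1975_s / time_2017_s
storage_ratio = storage_2017_bytes / storage_1975_bytes

print(f"time ratio:    {time_ratio:.2e}")
print(f"storage ratio: {storage_ratio:.2e}")
```

Under these assumptions the time ratio falls between 10^7 and 10^8 and the storage ratio between 10^3 and 10^4.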
We now have a usable structural
proteome of model organisms
Brunk et al. 2016 Systems Biology of the Structural Proteome
doi: 10.1186/s12918-016-0271-6
Zhang Zhao
All available PDB structures mapped to
the network of E. coli metabolism
Brunk et al. 2016 Systems Biology of the Structural Proteome doi: 10.1186/s12918-016-0271-6
The worst of times …
Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
On November 6, 2012, Donald Trump tweeted: "The concept of global warming was created by and for the Chinese in order to make U.S. manufacturing non-competitive."
Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
Source: Washington Post
Message 1.
Going forward we have a responsibility
to promote good science not only
through our own work but through
what we do collectively…
This action can come in many forms …
My own recent effort (excuse the self-promotion)
Famous scientists
Scientists known by those who care about science
Average scientists
Illustrations by Jason McDermott
Message 2.
I believe upcoming changes in science
will be profound
Disruption: Example - Photography
Digitization
Deception
Disruption
Demonetization
Dematerialization
Democratization
Axes: Time; Volume, Velocity, Variety
• Digital camera invented by Kodak but shelved
• Megapixels & quality improve slowly; Kodak slow to react
• Film market collapses; Kodak goes bankrupt
• Phones replace cameras
• Instagram, Flickr become the value proposition
• Digital media becomes bona fide form of communication
From a presentation to the Advisory Board to the NIH Director
Disruption: Biomedical Research
Digitization of Basic & Clinical Research & EHRs
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient-centered health care
1667
WOS: 123,763/1,839
2017
Daniel Mietchen
Disruption because…
• We can’t keep up with the literature, let alone available data, analytical tools, predictive models, etc.
• In a digital world there are new (and better?) ways to encode knowledge and learn from it
Consider an example:
Small beta barrels - a structural building block
SCOP folds: b.38, b.34, b.87, b.36, b.40, b.136, b.137, b.35, b.55, b.41, b.138, b.39
Grouped by: pseudo-symmetry of the framework; no pseudo-symmetry of the framework
• Chromatin restructuring
• RNA splicing
• Signal transduction in kinases
• RNA interference (RNAi)
• pre-tRNA processing
• Genome integrity: RPA, TEBP
• Signal transduction (various pathways)
• Transcriptional regulation
• RNA processing and degradation
Same structural framework, lots of structural and functional variations
Knowledge is spread over thousands of papers
Those papers use variable nomenclature

Folds shown: SH3-like (b.34), SM-like (b.38), OB (b.40)
Functions shown: Splicing, Signal transduction, Genome integrity

Naming of loops
β-strands        SH3-like (b.34)   SM-like (b.38)   OB (b.40)*
α/β0-helix-β1    N-term loop       L1
β1-β2            RT                L2               L12
β2-β3            n-Src             L3               L23
β3-β4            Distal            L4               L3α*, Lα4*
β4-β5            3-10 helix        L5               L45

Description of the structure
• Strongly bent 5-stranded antiparallel β-sheet
• 2 antiparallel β-sheets packed against each other
• 5-stranded β-sheet that is coiled to form a closed β-barrel
• Two 3-stranded β-sheets packed orthogonally to form somewhat flattened β-barrel
SCOP
• Barrel, partly open n=4, S=8
• Barrel, open n=4, S=8
• Barrel, closed or partly open n=5, S=10 or S=8
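One way to make the loop names on this slide machine-usable is a small crosswalk keyed by the β-strands each loop connects. The sketch below is illustrative only: the loop names come from the slide (footnote asterisks dropped), while the dictionary layout and the `translate` helper are hypothetical.

```python
# Loop names per fold, keyed by the pair of β-strands each loop connects.
# Names are copied from the slide; the data layout is an assumption.
SH3, SM, OB = "SH3-like (b.34)", "SM-like (b.38)", "OB (b.40)"

LOOP_NAMES = {
    "β1-β2": {SH3: ["RT"], SM: ["L2"], OB: ["L12"]},
    "β2-β3": {SH3: ["n-Src"], SM: ["L3"], OB: ["L23"]},
    "β3-β4": {SH3: ["Distal"], SM: ["L4"], OB: ["L3α", "Lα4"]},
    "β4-β5": {SH3: ["3-10 helix"], SM: ["L5"], OB: ["L45"]},
}


def translate(loop_name, source_fold, target_fold):
    """Return the target fold's name(s) for the structurally equivalent loop."""
    for names in LOOP_NAMES.values():
        if loop_name in names[source_fold]:
            return names[target_fold]
    raise KeyError(f"{loop_name!r} is not a known loop name for {source_fold}")


print(translate("RT", SH3, OB))
print(translate("L4", SM, OB))
```

A lookup like this lets a reader or a program move between the SH3, SM and OB vocabularies without rereading each paper's conventions.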
It is years of work to pull all this together …
Hard to publish …
When published, the collective knowledge is not very usable
Stella Veretnik
Philippe Youkharibache
Message 3.
Platforms will emerge that enable
better semantic reasoning across the
scientific knowledge base
Platforms will ultimately digitally
integrate the scholarly workflow for
human and machine analysis
Should biomedical research be like Airbnb?
doi: 10.1371/journal.pbio.2001818
Vivien Bonazzi
Supplier            Consumer
Paper Author        Paper Reader
Data Provider       Data Consumer
Employer            Employee
Reagent Provider    Reagent Consumer
Software Provider   Software Consumer
Grant Writer        Grant Reviewer
Educator            Student

Platform: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio
gDOC
Pilot Open Data Lab Underway
Message 4.
New tools will take advantage of such
platforms and accelerate discovery
At DeepMind, which is based in London,
AlphaGo Zero is working out how proteins
fold, a massive scientific challenge that
could give drug discovery a sorely needed
shot in the arm.
Engineering proteins nature has
missed?
There are ~20^300 possible proteins
>>>> all the atoms in the Universe
96M protein sequences from
73,000 species (source RefSeq)
135,000 protein structures
yield 1221 folds (SCOPe 2.06)
Are there new scaffolds out there that Nature has yet to discover but that AI could?
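To put the sequence-space figure in perspective, here is a short arithmetic sketch. It assumes the figure refers to a chain of 300 residues drawn from 20 amino acids, and it uses 10^80 as the atom count of the observable universe, which is a commonly quoted rough estimate rather than a number from the slide.

```python
import math

AMINO_ACIDS = 20
CHAIN_LENGTH = 300          # assumed chain length behind the 20^300 figure
ATOMS_IN_UNIVERSE = 10**80  # rough, commonly quoted estimate (assumption)

sequences = AMINO_ACIDS ** CHAIN_LENGTH  # exact, thanks to Python big integers
digits = len(str(sequences))
exponent = CHAIN_LENGTH * math.log10(AMINO_ACIDS)

print(f"20^300 has {digits} decimal digits (about 10^{exponent:.1f})")
print("Exceeds the atom estimate:", sequences > ATOMS_IN_UNIVERSE)
```

Even the 96M known sequences cover a vanishingly small fraction of a space this large.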
Example: Can deep neural networks
be used on protein structures?
Typical use cases involve segmenting 2D images to find which pixels belong to a certain class, e.g. dog
Can 3D image segmentation be used to find binding sites on a protein structure?
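Segmentation amounts to classifying every pixel. A minimal sketch in plain Python, assuming a model has already produced per-class scores for each pixel; the class names and score values are made up for illustration.

```python
CLASSES = ["background", "dog"]  # hypothetical class names


def segment(scores):
    """Return, for every pixel, the index of its highest-scoring class."""
    return [
        [max(range(len(pixel)), key=lambda c: pixel[c]) for pixel in row]
        for row in scores
    ]


# Hypothetical 2x3 image: each pixel holds [background score, dog score].
scores = [
    [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]],
    [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]],
]
mask = segment(scores)
for row in mask:
    print([CLASSES[c] for c in row])
```

The 3D case is the same idea with voxels in place of pixels and "binding site" in place of "dog".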
H2B Binding site in H2B:H4 PPI (3WKJ.H)
https://m2dsupsdlclass.github.io/lectures-labs/slides/04_conv_nets_2/images/dog_segment.jpg
Eli Draizen
Example: Histone H2B binding site
for histone H4
H2B
H4 H2B:H4 Binding Site
Nucleosome Core
Particle
3WKJ
3WKJ.H:3WKJ.F
Can we predict the binding site
given the structure of only one
partner?
H2B H2B:H4 Binding Site
Idea: Voxelize protein to find binding sites with 3D convolutional neural networks
1) Convert the structure into a “3D image” in which each atom is a 1×1×1 Å box, then perform image segmentation
H2B H2B:H4 Binding Site
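The voxelization step can be sketched as follows. This is a simplified illustration, not the method used in the talk: it marks the 1 Å voxel containing each atom center, uses plain Python lists rather than an array library, and the coordinates are invented.

```python
import math


def voxelize(atoms, voxel_size=1.0):
    """Map (x, y, z) atom centers in Å onto an occupancy grid.

    Returns (grid, origin); grid[i][j][k] is 1 if at least one atom
    center falls inside that voxel, else 0.
    """
    xs, ys, zs = zip(*atoms)
    origin = (min(xs), min(ys), min(zs))
    dims = [
        int(math.floor((max(axis) - low) / voxel_size)) + 1
        for axis, low in zip((xs, ys, zs), origin)
    ]
    grid = [[[0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for atom in atoms:
        i, j, k = (
            int(math.floor((coord - low) / voxel_size))
            for coord, low in zip(atom, origin)
        )
        grid[i][j][k] = 1
    return grid, origin


# Hypothetical coordinates, not taken from 3WKJ.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.2, 0.0), (2.9, 2.1, 1.0)]
grid, origin = voxelize(atoms)
occupied = sum(v for plane in grid for row in plane for v in row)
print(len(grid), len(grid[0]), len(grid[0][0]), occupied)
```

In practice each occupied voxel would carry a feature vector rather than a single 0/1 value.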
Convolutional Neural Networks
Downsample Information (Channels
or Features) to make it more
interpretable
Convolutional
Layers
Max Pooling
Layers
2) “Convolve” around the image or volume: take small regions, multiply each value in the region by the filter, and add up all the resulting values in the region
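The multiply-and-sum step and the max pooling step can be written out directly. This is a plain-Python sketch for a single channel with no padding and stride 1; a real network would use a deep learning library and many learned filters.

```python
def conv3d(volume, kernel):
    """Slide the kernel over the volume; at each position multiply
    element-wise and sum (no kernel flip, as is usual in CNNs)."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    d, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - d + 1):
        plane = []
        for y in range(H - h + 1):
            row = []
            for x in range(W - w + 1):
                total = 0.0
                for dz in range(d):
                    for dy in range(h):
                        for dx in range(w):
                            total += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


def max_pool3d(volume, size=2):
    """Non-overlapping max pooling; voxels that do not fill a window are dropped."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [
            [
                max(
                    volume[z + dz][y + dy][x + dx]
                    for dz in range(size)
                    for dy in range(size)
                    for dx in range(size)
                )
                for x in range(0, W - size + 1, size)
            ]
            for y in range(0, H - size + 1, size)
        ]
        for z in range(0, D - size + 1, size)
    ]


ones = lambda n: [[[1.0] * n for _ in range(n)] for _ in range(n)]
features = conv3d(ones(3), ones(2))   # 3x3x3 volume, 2x2x2 filter
pooled = max_pool3d(features, size=2)
print(features[0][0][0], pooled)
```

With an all-ones volume and filter, every output value is simply the number of voxels under the filter.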
Features
For each voxel, create a 52-vector:
● Atom (Boolean, One-hot 12-vector)
● VDW
● Atom charge, +, - (Boolean)
● Hydrophobicity (KD)
● Accessible Surface Area
● Residue (Boolean, One-hot 20-vector)
● SS (E/H/X; Boolean, One-hot 3-vector)
● Training label: is binding site (Boolean)
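A partial sketch of how such a per-voxel vector might be assembled. It covers only the residue and secondary-structure one-hot blocks plus the scalar features; the 12 atom classes are not listed on the slide, so that block is omitted and the result is shorter than 52. Treating charge as a value plus two Boolean flags is an assumption, and the example values are invented.

```python
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]
SS_CLASSES = ["E", "H", "X"]  # strand, helix, other


def one_hot(value, vocabulary):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in vocabulary:
        raise ValueError(f"unknown value: {value!r}")
    return [1 if value == item else 0 for item in vocabulary]


def voxel_features(residue, ss, vdw_radius, charge, hydrophobicity, asa):
    """Concatenate scalar features with one-hot residue and SS blocks.

    The ordering of the blocks is hypothetical.
    """
    is_positive = 1 if charge > 0 else 0
    is_negative = 1 if charge < 0 else 0
    scalars = [vdw_radius, charge, is_positive, is_negative, hydrophobicity, asa]
    return scalars + one_hot(residue, AMINO_ACIDS) + one_hot(ss, SS_CLASSES)


# Invented example values for a lysine side-chain nitrogen in a helix.
features = voxel_features(
    "LYS", "H", vdw_radius=1.55, charge=1.0, hydrophobicity=-3.9, asa=42.0
)
print(len(features), features)
```

The training label (binding site or not) would be stored separately as the target for each voxel.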
Training Data: Clustered binding sites from one taxonomic branch, using the LUCA structure as the representative
# of Eukaryotic clusters (n>1): 4578
Use representative sequence of cluster (LUCA) and train for 2 classes (0=not binding site, 1=binding site)
Goncearenco A, Shaytan AK, Shoemaker BA, Panchenko AR. Biophysical Journal. 2015
Overall message for the coming years: BME can lead change
• Engage with the Data Science Institute
• Experiment with platforms - participate in the Open Data Lab
• Use the SIF fund to drive change
• Use the cluster hires to drive a focus on deep learning and other emergent approaches
Thank You
peb6a@virginia.edu
