SlideShare a Scribd company logo
1 of 40
Download to read offline
Free University of Bolzano - Bozen
Faculty of Engineering
Alan Ianeselli
SFSCON 2023
Machine learning-driven simulation of
protein folding atomistic trajectories
Protein structure
~20 amino acids
They link by peptide bonds to form a polymer
aa1 aa2
peptide bond
A protein is a polymer of tens, hundreds or
thousands of amino acids
Protein structure
→ Unique conformation given a specific aminoacidic sequence
= the protein folding problem
Primary structure
(aa sequence)
Secondary structure
(local structures)
Tertiary structure
(3D conformation)
Quaternary structure
(protein complexes)
Protein structure
All-atom
representation
Ribbon
representation
Protein synthesis
→ How does the newly synthesized disordered
protein achieve its final conformation?
Author: Juan Gaertner
mRNA
newly synthesized protein
(disordered)
tRNA
carried
single aa
ribosome
Molecular dynamics (MD)
simulations
‣ Numerically solve Newton’s equations of motion over time
for each atom of the system
‣ Forces are computed from force fields
Non-bonded interactions
Coulomb
Van der Waals
Molecular dynamics (MD)
simulations
Bonded interactions
Bond stretching Angle bending Dihedral torsions
‣ Forces are computed from force fields
Molecular dynamics (MD)
simulations
Molecular dynamics (MD)
simulations
Time
Unfolded
Folded
Folding time ~ microseconds (for small proteins)
= Weeks of simulation on supercomputers
DE Shaw et al., 2009
Anton Supercomputer
~50 µs/day for ~100’000 atoms
Molecular dynamics (MD)
simulations
~100 years of simulation!!
DE Shaw et al., 2009
Anton Supercomputer
~50 µs/day for ~100’000 atoms
For example, Lysozyme in water (~100’000 atoms)
requires SECONDS to fold
Molecular dynamics (MD)
simulations
~100 years of simulation!!
‣ Conventional MD approaches are unfeasible
DE Shaw et al., 2009
Anton Supercomputer
~50 µs/day for ~100’000 atoms
For example, Lysozyme in water (~100’000 atoms)
requires SECONDS to fold
Molecular dynamics (MD)
simulations
Timescales of macromolecules
ps ns µs ms s m h
Classical MD simulations “Relevant” biology
Timescales of macromolecules
ps ns µs ms s m h
Classical MD simulations “Relevant” biology
Coarse grained and approximated models
Timescales of macromolecules
ps ns µs ms s m h
Classical MD simulations “Relevant” biology
Coarse grained and approximated models
→ Sometimes unrealistic or unfalsifiable
What I have in mind?
A smart algorithm to study protein folding trajectories
What I have in mind?
How do you measure if it went “forward”?
A smart algorithm to study protein folding trajectories
Reaction Coordinate and
Collective Variables
Folded
Unfolded
Transition
State 1 Transition
State 2
Reaction coordinate (ideal)
Free
energy
(a.u.)
Reaction Coordinate and
Collective Variables
Folded
Unfolded
Transition
State 1 Transition
State 2
Reaction coordinate (ideal)
Free
energy
(a.u.)
Collective variable (measurable)
„Best“ collective variables (CVs)
for protein folding
CVs can quantify the difference
between two conformations
„Best“ collective variables (CVs)
for protein folding
1. Dihedral angles deviation from the native state
Syzonenko et al,
J. Chem. Inf. Model. 2020
CVs can quantify the difference
between two conformations
„Best“ collective variables (CVs)
for protein folding
1. Dihedral angles deviation from the native state
2. Inter-aa contact deviation from the native state
residue #
residue
#
Syzonenko et al,
J. Chem. Inf. Model. 2020
Beccara et al,
Phys. Rev. Lett. 2015
CVs can quantify the difference
between two conformations
„Best“ collective variables (CVs)
for protein folding
1. Dihedral angles deviation from the native state
2. Inter-aa contact deviation from the native state
3. Geometrical difference from the native state
residue #
residue
#
Syzonenko et al,
J. Chem. Inf. Model. 2020
Beccara et al,
Phys. Rev. Lett. 2015
CVs can quantify the difference
between two conformations
„Best“ collective variables (CVs)
for protein folding
1. Dihedral angles deviation from the native state
2. Inter-aa contact deviation from the native state
3. Geometrical difference from the native state
4. Deviation from Google’s Deepmind AlphaFold tensor
residue #
residue
#
Syzonenko et al,
J. Chem. Inf. Model. 2020
Beccara et al,
Phys. Rev. Lett. 2015
CVs can quantify the difference
between two conformations
Alphafold
4. Deviation from Google’s Deepmind
AlphaFold tensor
Google’s Deepmind Alphafold:
Latest AI milestone in the protein folding field
Training set: 170k protein structures
Able to predict more than 200 million of
structures
Unprecedented accuracy
→ able to predict the final conformation of any
aminoacidic sequence
4. Deviation from Google’s Deepmind
AlphaFold tensor
Google’s Deepmind Alphafold:
-Input = aa sequence (text string)
...
-Sequence alignment (database comparison)
-Prediction of distance and angle between aa pairs
...
-Output = 3D protein structure
Alphafold
4. Deviation from Google’s Deepmind AlphaFold tensor
One of the outputs of AlphaFold is the so-called distogram:
Tensor of distance bins x aa x aa
→ probability over distance between pairs of aa
example below: aa at position 5 (Asparagine) vs aa at position 19 (Aspartic acid)
→ machine-learned 170k protein
conformations
→ corresponds to a quasi-
chemical potential
Alphafold
Identify the optimal CV
Training set:
Very long folding trajectories obtained by the most powerful supercomputer (Anton)
→ 200µs of trajectories (Villin and Fip35 proteins)
Anton
UNFOLDED
FOLDED
Training set:
Very long folding trajectories obtained by the most powerful supercomputer (Anton)
→ 200µs of trajectories (Villin and Fip35 proteins)
Anton
Identify the optimal CV
Training set:
Very long folding trajectories obtained by the most powerful supercomputer (Anton)
→ 200µs of trajectories (Villin and Fip35 proteins)
Training set:
396k rows
4 features
Machine learning (PCA) Optimal CV identified
Principal Component 1
Principal
Component
2
Identify the optimal CV
Run MD folding algorithm
Towards the native state = along the optimal CV
NOTE: trajectories are unbiased
Results
Obtained the folding trajectories of 4 small proteins
My simulation time = 1 day per trajectory on a weak laptop
(Anton supercomputer would need weeks)
Brute force MD
~weeks
VS
Smart algorithm
~hours
Results
Fip35
Frame 87
Frame 30
Frame 2
Frame 0
Fip35 (35 aa, 562 atoms): 16 trajectories, 1 folded
Results
Beta Hairpin
Frame 0 Frame 5 Frame 20
Beta Hairpin (16 aa, 247 atoms): 3 trajectories, 1 folded
Frame 27
Results
TrpCage
Frame 0 Frame 15
Frame 40
Frame 74
TrpCage (20 aa, 304 atoms): 12 trajs, 4 folded
Results
Villin
Frame 45
Frame 0 Frame 10 Frame 25
Villin (35 aa, 583 atoms): 16 trajs, 4 “almost” folded
Results
Villin
Frame 45
Frame 0 Frame 10 Frame 25
Villin (35 aa, 583 atoms): 16 trajs, 4 “almost” folded
→ High energy barrier for the last helix?
Future works
Kmiecik et al,
Chem. Rev. 2016,
Coarse graining Explicit solvent
Method applications
Conformational
transitions and
point mutations
Misfolding
Identify intermediate
conformations
Thanks for the attention!
Prof. Pietro Faccioli
Master‘s supervisor
University of Milan
Prof. Emiliano Biasini
Collaborator
University of Trento
Prof. Diego Calvanese
Smart Data Factory
University of Bolzano

More Related Content

Similar to SFSCON23 - Alan Ianeselli - Machine learning driven simulation of protein folding atomistic trajectories

Molecular Dynamic: Basics
Molecular Dynamic: BasicsMolecular Dynamic: Basics
Molecular Dynamic: BasicsAjay Murali
 
Urop poster 2014 benson and mat 5 13 14 (2)
Urop poster 2014 benson and mat 5 13 14 (2)Urop poster 2014 benson and mat 5 13 14 (2)
Urop poster 2014 benson and mat 5 13 14 (2)Mathew Shum
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformaticsAbhishek Vatsa
 
Using Machine Learning to Measure the Cross Section of Top Quark Pairs in the...
Using Machine Learning to Measure the Cross Section of Top Quark Pairs in the...Using Machine Learning to Measure the Cross Section of Top Quark Pairs in the...
Using Machine Learning to Measure the Cross Section of Top Quark Pairs in the...m.a.kirn
 
Bits protein structure
Bits protein structureBits protein structure
Bits protein structureBITS
 
Genetic algorithms
Genetic algorithmsGenetic algorithms
Genetic algorithmszamakhan
 
Echo state networks and locomotion patterns
Echo state networks and locomotion patternsEcho state networks and locomotion patterns
Echo state networks and locomotion patternsVito Strano
 
Protein Structure, Databases and Structural Alignment
Protein Structure, Databases and Structural AlignmentProtein Structure, Databases and Structural Alignment
Protein Structure, Databases and Structural AlignmentSaramita De Chakravarti
 
Areejit Samal Emergence Alaska 2013
Areejit Samal Emergence Alaska 2013Areejit Samal Emergence Alaska 2013
Areejit Samal Emergence Alaska 2013Areejit Samal
 
BP219 class 4 04 2011
BP219 class 4 04 2011BP219 class 4 04 2011
BP219 class 4 04 2011waddling
 
Mpp Rsv 2008 Public
Mpp Rsv 2008 PublicMpp Rsv 2008 Public
Mpp Rsv 2008 Publiclab13unisa
 
Cytoskeleton based molecular machines/motors and their varieties
Cytoskeleton based molecular machines/motors and their varietiesCytoskeleton based molecular machines/motors and their varieties
Cytoskeleton based molecular machines/motors and their varietiesShivraj Nile
 
An Overview to Protein bioinformatics
An Overview to Protein bioinformaticsAn Overview to Protein bioinformatics
An Overview to Protein bioinformaticsJoel Ricci-López
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological dataKrish_ver2
 
Current Research on Quantum Algorithms.ppt
Current Research on Quantum Algorithms.pptCurrent Research on Quantum Algorithms.ppt
Current Research on Quantum Algorithms.pptDefiantTones
 
Presentation1
Presentation1Presentation1
Presentation1firesea
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentRai University
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentRai University
 

Similar to SFSCON23 - Alan Ianeselli - Machine learning driven simulation of protein folding atomistic trajectories (20)

Molecular Dynamic: Basics
Molecular Dynamic: BasicsMolecular Dynamic: Basics
Molecular Dynamic: Basics
 
Urop poster 2014 benson and mat 5 13 14 (2)
Urop poster 2014 benson and mat 5 13 14 (2)Urop poster 2014 benson and mat 5 13 14 (2)
Urop poster 2014 benson and mat 5 13 14 (2)
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
Using Machine Learning to Measure the Cross Section of Top Quark Pairs in the...
Using Machine Learning to Measure the Cross Section of Top Quark Pairs in the...Using Machine Learning to Measure the Cross Section of Top Quark Pairs in the...
Using Machine Learning to Measure the Cross Section of Top Quark Pairs in the...
 
Bits protein structure
Bits protein structureBits protein structure
Bits protein structure
 
Genetic algorithms
Genetic algorithmsGenetic algorithms
Genetic algorithms
 
Echo state networks and locomotion patterns
Echo state networks and locomotion patternsEcho state networks and locomotion patterns
Echo state networks and locomotion patterns
 
Protein Structure, Databases and Structural Alignment
Protein Structure, Databases and Structural AlignmentProtein Structure, Databases and Structural Alignment
Protein Structure, Databases and Structural Alignment
 
Areejit Samal Emergence Alaska 2013
Areejit Samal Emergence Alaska 2013Areejit Samal Emergence Alaska 2013
Areejit Samal Emergence Alaska 2013
 
BP219 class 4 04 2011
BP219 class 4 04 2011BP219 class 4 04 2011
BP219 class 4 04 2011
 
Mpp Rsv 2008 Public
Mpp Rsv 2008 PublicMpp Rsv 2008 Public
Mpp Rsv 2008 Public
 
Homology modeling
Homology modelingHomology modeling
Homology modeling
 
Cytoskeleton based molecular machines/motors and their varieties
Cytoskeleton based molecular machines/motors and their varietiesCytoskeleton based molecular machines/motors and their varieties
Cytoskeleton based molecular machines/motors and their varieties
 
An Overview to Protein bioinformatics
An Overview to Protein bioinformaticsAn Overview to Protein bioinformatics
An Overview to Protein bioinformatics
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data
 
Genetic Algorithm
Genetic AlgorithmGenetic Algorithm
Genetic Algorithm
 
Current Research on Quantum Algorithms.ppt
Current Research on Quantum Algorithms.pptCurrent Research on Quantum Algorithms.ppt
Current Research on Quantum Algorithms.ppt
 
Presentation1
Presentation1Presentation1
Presentation1
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 

More from South Tyrol Free Software Conference

SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...
SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...
SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...South Tyrol Free Software Conference
 
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...South Tyrol Free Software Conference
 
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSouth Tyrol Free Software Conference
 
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...South Tyrol Free Software Conference
 
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...South Tyrol Free Software Conference
 
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...South Tyrol Free Software Conference
 
SFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelines
SFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelinesSFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelines
SFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelinesSouth Tyrol Free Software Conference
 
SFSCON23 - Charles H. Schulz - Why open digital infrastructure matters
SFSCON23 - Charles H. Schulz - Why open digital infrastructure mattersSFSCON23 - Charles H. Schulz - Why open digital infrastructure matters
SFSCON23 - Charles H. Schulz - Why open digital infrastructure mattersSouth Tyrol Free Software Conference
 
SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...
SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...
SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...South Tyrol Free Software Conference
 
SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...
SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...
SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...South Tyrol Free Software Conference
 
SFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free software
SFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free softwareSFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free software
SFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free softwareSouth Tyrol Free Software Conference
 
SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...
SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...
SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...South Tyrol Free Software Conference
 
SFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changer
SFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changerSFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changer
SFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changerSouth Tyrol Free Software Conference
 
SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...
SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...
SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...South Tyrol Free Software Conference
 
SFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation Internet
SFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation InternetSFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation Internet
SFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation InternetSouth Tyrol Free Software Conference
 
SFSCON23 - Davide Vernassa - Empowering Insights Unveiling the latest innova...
SFSCON23 - Davide Vernassa - Empowering Insights  Unveiling the latest innova...SFSCON23 - Davide Vernassa - Empowering Insights  Unveiling the latest innova...
SFSCON23 - Davide Vernassa - Empowering Insights Unveiling the latest innova...South Tyrol Free Software Conference
 

More from South Tyrol Free Software Conference (20)

SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...
SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...
SFSCON23 - Rufai Omowunmi Balogun - SMODEX – a Python package for understandi...
 
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...
SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...
 
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
 
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...
SFSCON23 - Marianna d'Atri Enrico Zanardo - How can Blockchain technologies i...
 
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...
SFSCON23 - Lucas Lasota - The Future of Connectivity, Open Internet and Human...
 
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...
SFSCON23 - Giovanni Giannotta - Intelligent Decision Support System for trace...
 
SFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelines
SFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelinesSFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelines
SFSCON23 - Elena Maines - Embracing CI/CD workflows for building ETL pipelines
 
SFSCON23 - Christian Busse - Free Software and Open Science
SFSCON23 - Christian Busse - Free Software and Open ScienceSFSCON23 - Christian Busse - Free Software and Open Science
SFSCON23 - Christian Busse - Free Software and Open Science
 
SFSCON23 - Charles H. Schulz - Why open digital infrastructure matters
SFSCON23 - Charles H. Schulz - Why open digital infrastructure mattersSFSCON23 - Charles H. Schulz - Why open digital infrastructure matters
SFSCON23 - Charles H. Schulz - Why open digital infrastructure matters
 
SFSCON23 - Andrea Vianello - Achieving FAIRness with EDP-portal
SFSCON23 - Andrea Vianello - Achieving FAIRness with EDP-portalSFSCON23 - Andrea Vianello - Achieving FAIRness with EDP-portal
SFSCON23 - Andrea Vianello - Achieving FAIRness with EDP-portal
 
SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...
SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...
SFSCON23 - Thomas Aichner - How IoT and AI are revolutionizing Mass Customiza...
 
SFSCON23 - Stefan Mutschlechner - Smart Werke Meran
SFSCON23 - Stefan Mutschlechner - Smart Werke MeranSFSCON23 - Stefan Mutschlechner - Smart Werke Meran
SFSCON23 - Stefan Mutschlechner - Smart Werke Meran
 
SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...
SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...
SFSCON23 - Mirko Boehm - European regulators cast their eyes on maturing OSS ...
 
SFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free software
SFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free softwareSFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free software
SFSCON23 - Marco Pavanelli - Monitoring the fleet of Sasa with free software
 
SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...
SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...
SFSCON23 - Marco Cortella - KNOWAGE and AICS for 2030 agenda SDG goals monito...
 
SFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changer
SFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changerSFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changer
SFSCON23 - Lina Ceballos - Interoperable Europe Act - A real game changer
 
SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...
SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...
SFSCON23 - Johannes Näder Linus Sehn - Let’s monitor implementation of Free S...
 
SFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation Internet
SFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation InternetSFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation Internet
SFSCON23 - Gabriel Ku Wei Bin - Why Do We Need A Next Generation Internet
 
SFSCON23 - Edoardo Scepi - The Brand-New Version of IGis Maps
SFSCON23 - Edoardo Scepi - The Brand-New Version of IGis MapsSFSCON23 - Edoardo Scepi - The Brand-New Version of IGis Maps
SFSCON23 - Edoardo Scepi - The Brand-New Version of IGis Maps
 
SFSCON23 - Davide Vernassa - Empowering Insights Unveiling the latest innova...
SFSCON23 - Davide Vernassa - Empowering Insights  Unveiling the latest innova...SFSCON23 - Davide Vernassa - Empowering Insights  Unveiling the latest innova...
SFSCON23 - Davide Vernassa - Empowering Insights Unveiling the latest innova...
 

Recently uploaded

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceIES VE
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxFIDO Alliance
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Recently uploaded (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptx
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

SFSCON23 - Alan Ianeselli - Machine learning driven simulation of protein folding atomistic trajectories

  • 1. Free University of Bolzano - Bozen Faculty of Engineering Alan Ianeselli SFSCON 2023 Machine learning-driven simulation of protein folding atomistic trajectories
  • 2. Protein structure ~20 amino acids They link by peptide bonds to form a polymer aa1 aa2 peptide bond A protein is a polymer of tens, hundreds or thousands of amino acids
  • 3. Protein structure → Unique conformation given a specific aminoacidic sequence = the protein folding problem Primary structure (aa sequence) Secondary structure (local structures) Tertiary structure (3D conformation) Quaternary structure (protein complexes)
  • 5. Protein synthesis → How does the newly synthesized disordered protein achieve its final conformation? Author: Juan Gaertner mRNA newly synthesized protein (disordered) tRNA carried single aa ribosome
  • 6. Molecular dynamics (MD) simulations ‣ Numerically solve Newton’s equations of motion over time for each atom of the system
  • 7. ‣ Forces are computed from force fields Non-bonded interactions Coulomb Van der Waals Molecular dynamics (MD) simulations
  • 8. Bonded interactions Bond stretching Angle bending Dihedral torsions ‣ Forces are computed from force fields Molecular dynamics (MD) simulations
  • 9. Molecular dynamics (MD) simulations Time Unfolded Folded Folding time ~ microseconds (for small proteins) = Weeks of simulation on supercomputers
  • 10. DE Shaw et al., 2009 Anton Supercomputer ~50 µs/day for ~100’000 atoms Molecular dynamics (MD) simulations
  • 11. ~100 years of simulation!! DE Shaw et al., 2009 Anton Supercomputer ~50 µs/day for ~100’000 atoms For example, Lysozyme in water (~100’000 atoms) requires SECONDS to fold Molecular dynamics (MD) simulations
  • 12. ~100 years of simulation!! ‣ Conventional MD approaches are unfeasible DE Shaw et al., 2009 Anton Supercomputer ~50 µs/day for ~100’000 atoms For example, Lysozyme in water (~100’000 atoms) requires SECONDS to fold Molecular dynamics (MD) simulations
  • 13. Timescales of macromolecules ps ns µs ms s m h Classical MD simulations “Relevant” biology
  • 14. Timescales of macromolecules ps ns µs ms s m h Classical MD simulations “Relevant” biology Coarse grained and approximated models
  • 15. Timescales of macromolecules ps ns µs ms s m h Classical MD simulations “Relevant” biology Coarse grained and approximated models → Sometimes unrealistic or unfalsifiable
  • 16. What I have in mind? A smart algorithm to study protein folding trajectories
  • 17. What I have in mind? How do you measure if it went “forward”? A smart algorithm to study protein folding trajectories
  • 18. Reaction Coordinate and Collective Variables Folded Unfolded Transition State 1 Transition State 2 Reaction coordinate (ideal) Free energy (a.u.)
  • 19. Reaction Coordinate and Collective Variables Folded Unfolded Transition State 1 Transition State 2 Reaction coordinate (ideal) Free energy (a.u.) Collective variable (measurable)
  • 20. „Best“ collective variables (CVs) for protein folding CVs can quantify the difference between two conformations
  • 21. „Best“ collective variables (CVs) for protein folding 1. Dihedral angles deviation from the native state Syzonenko et al, J. Chem. Inf. Model. 2020 CVs can quantify the difference between two conformations
  • 22. „Best“ collective variables (CVs) for protein folding 1. Dihedral angles deviation from the native state 2. Inter-aa contact deviation from the native state residue # residue # Syzonenko et al, J. Chem. Inf. Model. 2020 Beccara et al, Phys. Rev. Lett. 2015 CVs can quantify the difference between two conformations
  • 23. „Best“ collective variables (CVs) for protein folding 1. Dihedral angles deviation from the native state 2. Inter-aa contact deviation from the native state 3. Geometrical difference from the native state residue # residue # Syzonenko et al, J. Chem. Inf. Model. 2020 Beccara et al, Phys. Rev. Lett. 2015 CVs can quantify the difference between two conformations
  • 24. „Best“ collective variables (CVs) for protein folding 1. Dihedral angles deviation from the native state 2. Inter-aa contact deviation from the native state 3. Geometrical difference from the native state 4. Deviation from Google’s Deepmind AlphaFold tensor residue # residue # Syzonenko et al, J. Chem. Inf. Model. 2020 Beccara et al, Phys. Rev. Lett. 2015 CVs can quantify the difference between two conformations
  • 25. Alphafold 4. Deviation from Google’s Deepmind AlphaFold tensor Google’s Deepmind Alphafold: Latest AI milestone in the protein folding field Training set: 170k protein structures Able to predict more than 200 million of structures Unprecedented accuracy → able to predict the final conformation of any aminoacidic sequence
  • 26. 4. Deviation from Google’s Deepmind AlphaFold tensor Google’s Deepmind Alphafold: -Input = aa sequence (text string) ... -Sequence alignment (database comparison) -Prediction of distance and angle between aa pairs ... -Output = 3D protein structure Alphafold
  • 27. 4. Deviation from Google’s Deepmind AlphaFold tensor One of the outputs of AlphaFold is the so-called distogram: Tensor of distance bins x aa x aa → probability over distance between pairs of aa example below: aa at position 5 (Asparagine) vs aa at position 19 (Aspartic acid) → machine-learned 170k protein conformations → corresponds to a quasi- chemical potential Alphafold
  • 28. Identify the optimal CV Training set: Very long folding trajectories obtained by the most powerful supercomputer (Anton) → 200µs of trajectories (Villin and Fip35 proteins) Anton
  • 29. UNFOLDED FOLDED Training set: Very long folding trajectories obtained by the most powerful supercomputer (Anton) → 200µs of trajectories (Villin and Fip35 proteins) Anton Identify the optimal CV
  • 30. Training set: Very long folding trajectories obtained by the most powerful supercomputer (Anton) → 200µs of trajectories (Villin and Fip35 proteins) Training set: 396k rows 4 features Machine learning (PCA) Optimal CV identified Principal Component 1 Principal Component 2 Identify the optimal CV
  • 31. Run MD folding algorithm Towards the native state = along the optimal CV NOTE: trajectories are unbiased
  • 32. Results Obtained the folding trajectories of 4 small proteins My simulation time = 1 day per trajectory on a weak laptop (Anton supercomputer would need weeks) Brute force MD ~weeks VS Smart algorithm ~hours
  • 33. Results Fip35 Frame 87 Frame 30 Frame 2 Frame 0 Fip35 (35 aa, 562 atoms): 16 trajectories, 1 folded
  • 34. Results Beta Hairpin Frame 0 Frame 5 Frame 20 Beta Hairpin (16 aa, 247 atoms): 3 trajectories, 1 folded Frame 27
  • 35. Results TrpCage Frame 0 Frame 15 Frame 40 Frame 74 TrpCage (20 aa, 304 atoms): 12 trajs, 4 folded
  • 36. Results Villin Frame 45 Frame 0 Frame 10 Frame 25 Villin (35 aa, 583 atoms): 16 trajs, 4 “almost” folded
  • 37. Results Villin Frame 45 Frame 0 Frame 10 Frame 25 Villin (35 aa, 583 atoms): 16 trajs, 4 “almost” folded → High energy barrier for the last helix?
  • 38. Future works Kmiecik et al, Chem. Rev. 2016, Coarse graining Explicit solvent
  • 39. Method applications Conformational transitions and point mutations Misfolding Identify intermediate conformations
  • 40. Thanks for the attention! Prof. Pietro Faccioli Master‘s supervisor University of Milan Prof. Emiliano Biasini Collaborator University of Trento Prof. Diego Calvanese Smart Data Factory University of Bolzano