Alphafold2 - Protein Structural
Bioinformatics After CASP14
Daisuke Kihara
Department of Biological Sciences, Computer Science
Purdue University
https://kiharalab.org
How it started: CASP13 – CASP14
Alphafold1 (CASP13) & Alphafold2 (CASP14)
Sum of the Z-score of GDT-TS
November 30, 2020
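The ranking metric named on this slide can be illustrated with a minimal sketch. GDT-TS averages the fraction of Cα atoms that fall within 1, 2, 4, and 8 Å of the native structure; the official score searches over many superpositions, whereas this sketch assumes the model is already superimposed on the native. The `floor` applied to Z-scores before summing is a hypothetical parameter, not necessarily the assessors' convention.

```python
import numpy as np

def gdt_ts(model_ca, native_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS (percent) for pre-superimposed C-alpha coordinates of shape (N, 3)."""
    dist = np.linalg.norm(np.asarray(model_ca) - np.asarray(native_ca), axis=1)
    # Fraction of residues under each distance cutoff, averaged over the four cutoffs.
    return 100.0 * float(np.mean([(dist <= c).mean() for c in cutoffs]))

def summed_z_scores(scores, floor=0.0):
    """scores: (n_groups, n_targets) table of GDT-TS values.

    Z-scores are computed per target across groups, clipped from below at
    `floor` (an assumption of this sketch), and summed per group.
    """
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean(axis=0)
    std = scores.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero when all groups tie on a target
    z = (scores - mean) / std
    return np.maximum(z, floor).sum(axis=1)
```

A group that is consistently far above the per-target mean accumulates a large sum, which is what the bar chart on this slide displays.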
Alphafold2:
Differences and Novelty
Alphafold1 (CASP13)
(Senior AW et al., Nature 2020)
Jan. 15, 2020
Incomplete code release on GitHub
(From CASP13 presentations by Jumper & Senior)
trRosetta
Submitted: Aug. 22, 2019
Appeared: Jan. 2, 2020
(Yang et al., PNAS, 2020)
AttentiveDist (Kiharalab)
Submitted (& bioRxiv) : Nov. 25, 2020
Appeared: April 7, 2021
(Jain A, et al., Scientific Reports, 2021)
1D Features:
1-hot encoding (20 aa)
PSSM, HMM
Predicted secondary structures & ASA (SPOT-1D)
2D Features:
Predicted contacts (CCMpred)
Mutual Information
Statistical pairwise potential
CASP13, 43 FM and FM/TBM targets
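Two of the features listed above can be sketched in a few lines: the 1-hot encoding (1D) and the mutual information between MSA columns (2D). This is an illustrative sketch, not the AttentiveDist implementation; the function names and the treatment of gaps as a 21st symbol are assumptions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(sequence):
    """1D feature: (L, 20) one-hot encoding; unknown residues stay all-zero."""
    out = np.zeros((len(sequence), 20))
    for pos, aa in enumerate(sequence):
        if aa in AA_INDEX:
            out[pos, AA_INDEX[aa]] = 1.0
    return out

def mutual_information(msa):
    """2D feature: (L, L) mutual information (in nats) between MSA columns.

    msa is a list of equal-length aligned sequences; gaps and unknown
    letters are pooled into symbol 20 (an assumption of this sketch).
    """
    cols = np.array([[AA_INDEX.get(aa, 20) for aa in seq] for seq in msa])
    n_seq, length = cols.shape
    mi = np.zeros((length, length))
    for i in range(length):
        for j in range(i + 1, length):
            joint = np.zeros((21, 21))
            np.add.at(joint, (cols[:, i], cols[:, j]), 1.0)  # count symbol pairs
            joint /= n_seq
            p_i = joint.sum(axis=1, keepdims=True)
            p_j = joint.sum(axis=0, keepdims=True)
            nz = joint > 0
            mi[i, j] = mi[j, i] = np.sum(joint[nz] * np.log(joint[nz] / (p_i @ p_j)[nz]))
    return mi
```

Perfectly co-varying columns give the maximum value for their entropy, while independent columns give zero, which is why this feature hints at residue pairs in contact.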
Alphafold2
July 15, 2021
(Jumper J et al., Nature 2021)
Pipeline stages: Input, Sequence Embedding, Structure building
(Inference) code released on GitHub with well-documented Supplementary Information
End-to-End
Input MSA
◉ DB Search
○ JackHMMER > MGnify (Metagenome DB)
○ JackHMMER > UniRef90
○ HHblits > Uniclust30 + BFD (Metagenome DB)
○ The resulting alignments are simply stacked.
◉ MSA processing (to reduce the memory usage)
○ MSA block deletion (MSA-level dropout)
○ Nseq sequences are randomly selected as MSA cluster centers
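○ 15% of the MSA positions are masked
○ Among masked positions: 10% replaced with random amino acids; 10%
replaced with amino acids sampled from the MSA profile; 10% unchanged;
70% replaced with a mask token. All masked positions are targets for
prediction.
○ Nextra_seq sequences are selected as extra MSA sequences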
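The masking step in the list above can be sketched as follows. This is a simplified illustration, not the AF2 code: it assumes an integer-encoded MSA, assumes that the 70% share is replaced with a mask token while every selected position serves as a prediction target, and uses a hypothetical `MASK_TOKEN` index.

```python
import numpy as np

N_AA = 20        # amino-acid classes 0..19
MASK_TOKEN = 21  # hypothetical index for the mask token (20 would be the gap)

def mask_msa(msa, mask_rate=0.15, seed=0):
    """msa: (n_seq, length) integer array. Returns (corrupted_msa, target_positions)."""
    rng = np.random.default_rng(seed)
    msa = np.asarray(msa)
    # Per-column amino-acid profile for the "replaced from MSA profile" branch.
    profile = np.stack([(msa == a).mean(axis=0) for a in range(N_AA)], axis=-1)
    profile = profile + 1e-8
    profile /= profile.sum(axis=-1, keepdims=True)

    selected = rng.random(msa.shape) < mask_rate  # about 15% of positions
    branch = rng.random(msa.shape)                # decides the corruption type
    corrupted = msa.copy()
    for s, p in zip(*np.nonzero(selected)):
        u = branch[s, p]
        if u < 0.10:                              # 10%: random amino acid
            corrupted[s, p] = rng.integers(N_AA)
        elif u < 0.20:                            # 10%: sampled from the profile
            corrupted[s, p] = rng.choice(N_AA, p=profile[p])
        elif u < 0.30:                            # 10%: left unchanged
            pass
        else:                                     # 70%: mask token
            corrupted[s, p] = MASK_TOKEN
    return corrupted, selected
```

The network is then asked to recover the original residues at the `selected` positions, which is the BERT-style objective behind the "Masked MSA" loss term.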
Processing Input Embedding
Evoformer Block (Sequence Embedding)
“tri-amino acid contact (potential)” among residues i, j, k
“MSA Transformer”
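The three-residue idea on this slide presumably corresponds to the Evoformer's triangle updates, where the pair feature for (i, j) is refreshed from the features of (i, k) and (j, k) over every third residue k. A heavily simplified sketch of the "outgoing" multiplicative variant follows; layer normalization and gating are omitted, and the weight names `w_a`, `w_b`, and `w_out` are hypothetical.

```python
import numpy as np

def triangle_update_outgoing(pair, w_a, w_b, w_out):
    """pair: (L, L, C) pair representation. Returns an update of the same shape.

    Edge (i, j) is updated from edges (i, k) and (j, k) for every third residue k.
    """
    a = pair @ w_a                          # (L, L, H) projection used for edge i-k
    b = pair @ w_b                          # (L, L, H) projection used for edge j-k
    tri = np.einsum("ikh,jkh->ijh", a, b)   # sum over the third residue k
    return tri @ w_out                      # project back to (L, L, C)
```

Because each (i, j) entry sees all triangles that contain it, the pair representation is nudged toward geometrically consistent distances, in the spirit of the triangle inequality.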
Structure Module (IPA)
8 blocks (shared weights)
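The key property of invariant point attention (IPA) can be checked numerically with a small sketch: the point-distance term between residues i and j does not change when all residue frames are moved by the same global rotation and translation. This is only that one term, under the assumption of one query point and one key point per residue; the full attention, the frame updates, and the repeated application of the same weights over 8 blocks are not modeled.

```python
import numpy as np

def random_rotation(rng):
    """Random proper rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]        # ensure det = +1
    return q

def point_distance_term(rots, trans, q_pts, k_pts):
    """Squared distances ||T_i q_i - T_j k_j||^2 for all residue pairs (i, j).

    rots: (L, 3, 3) and trans: (L, 3) are per-residue frames T = (R, t);
    q_pts and k_pts: (L, 3) are points expressed in each residue's local frame.
    """
    q_glob = np.einsum("lab,lb->la", rots, q_pts) + trans
    k_glob = np.einsum("lab,lb->la", rots, k_pts) + trans
    diff = q_glob[:, None, :] - k_glob[None, :, :]
    return (diff ** 2).sum(-1)
```

Applying a global transform means replacing each frame (R_i, t_i) with (R_g R_i, R_g t_i + t_g); the returned matrix is unchanged, so the predicted structure does not depend on the arbitrary orientation of the input.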
Model Training & Loss Function
◉ Recycling (training & inference)
◉ MSA resampling (training & inference)
◉ Noisy Student Self-distillation
◉ Loss Function
○ Atom coordinates
○ Main-chain atom coordinates & angles
○ Distogram
○ Predicted error
○ Whether atoms are experimentally determined
○ Crystal structure statistics
○ Masked MSA
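The loss terms annotated on this slide combine into a single weighted sum. The sketch below uses the weights from the loss equation in the AF2 paper's Methods; treat them as values to verify against the paper rather than as authoritative. The dictionary keys are hypothetical names, and their mapping to the slide's annotations (for example, "confidence" for predicted error and "violation" for crystal structure statistics) is this sketch's reading of the figure.

```python
# Weights for the initial training stage (to be checked against the AF2 paper).
TRAIN_WEIGHTS = {
    "fape": 0.5,         # atom coordinates
    "aux": 0.5,          # main-chain atom coordinates & angles
    "distogram": 0.3,    # distogram
    "masked_msa": 2.0,   # masked MSA
    "confidence": 0.01,  # predicted error
}
# Extra terms that are added during fine-tuning.
FINETUNE_EXTRA = {
    "exp_resolved": 0.01,  # whether atoms are experimentally determined
    "violation": 1.0,      # crystal structure statistics
}

def total_loss(terms, fine_tuning=False):
    """terms: dict mapping loss names to scalar values. Returns the weighted sum."""
    weights = dict(TRAIN_WEIGHTS)
    if fine_tuning:
        weights.update(FINETUNE_EXTRA)
    return sum(w * terms[name] for name, w in weights.items())
```

The comparatively large weight on the masked-MSA term reflects how much of the training signal comes from the sequence side rather than from coordinates alone.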
RoseTTAFold
July 15, 2021
(Baek M et al., Science 2021)
Preparing input of SE(3)
“MSA transformer”
Time Course
CASP13
December 1-4, 2018
Alphafold US Patent
June 3, 2021
Alphafold2 Paper & Code Release
July 15, 2021
(Jumper J et al., Nature 2021)
Pipeline stages: Input, Sequence Embedding, Structure building
Alphafold Database
July 22, 2021
(Tunyasuvunakool K. et al., Nature 2021)
Database of protein models of 21 model organisms
Alphafold Use Case Reports (Twitter)
◉ Installation of the software
◉ Surprisingly accurate models of specific
proteins
◉ Modeling of protein complexes
○ With a Gly-linker or X-linker, or by manipulating the residue
number index
◉ Peptide docking
◉ Indication of disordered regions, mutations, etc.
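The two complex-modeling tricks reported above can be sketched as input preparation for a single-chain predictor. Both helpers are hypothetical illustrations: the linker length of 30 and the index gap of 200 are assumed defaults, not values taken from the slides.

```python
def join_with_linker(chains, linker="G" * 30):
    """Concatenate chain sequences into one pseudo-monomer using a flexible linker."""
    return linker.join(chains)

def offset_residue_index(chain_lengths, gap=200):
    """Residue indices with a jump of `gap` between consecutive chains (no linker).

    The jump makes the relative-position encoding treat the chains as far
    apart in sequence, so they are not modeled as one continuous chain.
    """
    index, start = [], 0
    for length in chain_lengths:
        index.extend(range(start, start + length))
        start += length + gap
    return index
```

For example, `join_with_linker(["ACD", "EF"], linker="GG")` gives `"ACDGGEF"`, and `offset_residue_index([3, 2])` gives `[0, 1, 2, 203, 204]`.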
July - August, 2021
Alphafold-Related Papers (166 papers on bioRxiv
since 12/01/2020)
July - August, 2021
Peptide docking
Protein-protein docking
Mutation
Protein Design
Folding Pathway
Molecular Replacement
Application
ColabFold
August 15, 2021
(Mirdita M et al., bioRxiv 2021)
Future Directions …
Future Directions for Structural
Bioinformatics Work after Alphafold2
◉ Reuse the AF2 code
○ Docking, interaction
○ Data-assisted protein structure modeling
○ Protein design
○ Structure modeling of other molecules
◉ Application of AF2 structure models
○ Drug screening
○ Function (binding pocket classification, etc.)
◉ Explore new topics
○ Cryo-ET
○ Dynamics
Alphafold2 as BLAST
◉ Use Alphafold2 as the baseline tool, just like
BLAST in sequence analysis
◉ Use it in a pipeline
◉ Modify the code, concepts, techniques
Kihara Lab Research Team
https://kiharalab.org
@kiharalab @d_kihara
Sai Raghavendra Maddhuri Venkata Subramaniya
Zicong Zhang
Special Thanks to Sai and Zicong for discussion
