Bioinformatics for beginners
Homology modeling
Michael A. Dolan, Ph.D.
Source: AzaToth
Myoglobin
Common questions
There is no known structure for my protein. What can I do?
How can I see which portions of my macromolecule are charged?
Solvent accessible? Hydrophobic?
I found a mutation in a protein causing drug resistance in a patient.
How does this change affect function?
How are two proteins interacting with each other?
Which amino acid residue should I change to alter protein stability?
How can I create pretty pictures for publication?
Computational results must be
verified with real-world
experiments.
molecular biologist/
medicinal chemist
bioinformatician/
computational biologist
There is no known structure for my
protein. What can I do?
X-ray crystallography
Source: http://bit.ly/2k4pgZg
NMR
Source: http://www.langelab.ch.tum.de/
Protein homology (comparative) modeling
- constructing an atomic-resolution model of the "target" protein from its amino
acid sequence and an experimental three-dimensional structure of a
related homologous protein (the "template").
Source: https://www.mpibpc.mpg.de/9607405/Dynasome
Source: https://www.unil.ch/pmf/en/home/menuinst/technologies/homology-modeling.html
I-TASSER
Phyre2: Good, fast homology models
www.sbg.bio.ic.ac.uk/phyre2/
Hands-on exercise: Phyre2
http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/f3a1760696e74a26/summary.html
http://bit.ly/2EjbpnF
Pre-computed homology models
ModBase - database of comparative protein structure models
https://modbase.compbio.ucsf.edu
Uses ModPipe, automated modeling pipeline relying on the programs
PSI-BLAST and MODELLER
>30% sequence ID, >4 million models, >1 million sequences
Genomic Threading Database - for detecting remote homology
between protein sequences and known folds
http://bioinf.cs.ucl.ac.uk/GTD
seq ID 10-30%, > 1 million sequences
Iterative Threading ASSEmbly Refinement (I-TASSER)
• online platform for protein structure and function prediction (a
downloadable version is also available)
• a hierarchical approach
- structural templates are first identified from the PDB by the multiple
threading approach LOMETS
- full-length atomic models are then constructed by iterative
template fragment assembly simulations
- functional insights into the target are derived by threading the 3D
models through the protein function database BioLiP
• Consistently ranked at or near the top in the community-wide Critical
Assessment of protein Structure Prediction (CASP)
- I-TASSER was ranked as the No. 1 server for protein structure
prediction in CASP7, CASP8, CASP9, CASP10, CASP11, and
CASP12
I-TASSER pipeline
Check out this video: http://www.jove.com/video/3259
Hands-on exercise: I-TASSER
https://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S379018/
Examine the results
C-score is a confidence score for estimating the quality of models.
• calculated based on the significance of threading template alignments
and the convergence parameters of the structure assembly simulations
• C-score is typically in the range [-5, 2]; a higher C-score
signifies a model with higher confidence, and vice versa.
TM-score - a length-normalized measure of global structural similarity that is
less sensitive than RMSD to local errors
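To make the contrast with RMSD concrete, here is a minimal sketch of the TM-score sum for one fixed superposition. It assumes the usual published definition, in which each aligned residue pair contributes 1 / (1 + (d/d0)^2) and d0 depends on the target length. The function names tm_d0 and tm_score are hypothetical, and a real TM-score program also searches for the superposition that maximizes this sum, which the sketch does not attempt.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstroms) used by TM-score."""
    if target_length <= 15:
        raise ValueError("this sketch needs a target longer than 15 residues")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        # Simplification of this sketch: refuse very short chains
        # instead of applying a special-case d0.
        raise ValueError("target too short for this simplified d0 formula")
    return d0


def tm_score(distances, target_length):
    """TM-score contribution for ONE given superposition.

    distances: distances (Angstroms) between aligned residue pairs,
               measured after the two structures were superimposed.
    target_length: number of residues in the target protein.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalizing by the full target length penalizes low coverage.
    return total / target_length
```

Because every term is bounded by 1, a single badly placed loop lowers the score only slightly, whereas the same loop can dominate an RMSD value.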
Factors determining model quality
• % sequence identity to templates
• coverage
• steric or electrostatic clashes
• agreement with bench data
• agreement with general protein structure knowledge
• scoring (RMSD, C-score, TM-score, others...)
% ID      Confidence?
> 30      good to great
25 - 30   low to maybe?
< 25      low
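The rule of thumb above can be written as a short sketch. The helper names percent_identity and identity_confidence are hypothetical, and the identity is computed here over columns where both sequences have a residue, which is only one of several conventions in use. The thresholds are the ones in the table, with exactly 30% placed in the middle bin as an assumption.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Both strings must have the same length; gap characters mark
    insertions/deletions. Columns containing a gap are ignored.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned_columns = 0
    matches = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        aligned_columns += 1
        if a == b:
            matches += 1
    if aligned_columns == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * matches / aligned_columns


def identity_confidence(pct_identity):
    """Map % sequence identity to the rough confidence bins in the table."""
    if pct_identity > 30:
        return "good to great"
    if pct_identity >= 25:
        return "low to maybe?"
    return "low"
```

For example, identity_confidence(percent_identity("ACDE-G", "ACDFHG")) looks at five aligned columns, four of which match. Percent identity is only one of the quality factors listed, so the label is a starting point rather than a verdict.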
root-mean-square deviation (RMSD)
the root-mean-square deviation of atomic positions is a measure of the average
distance between corresponding atoms (usually backbone or C-alpha atoms) of
superimposed proteins; lower values indicate more similar structures
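As a worked illustration of the definition, the following minimal sketch computes RMSD from two lists of matched atomic coordinates. It assumes the structures have already been superimposed and that atom i in one list corresponds to atom i in the other. It does not perform the optimal superposition itself. The function name rmsd is chosen for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as the input) between matched atom lists.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples,
    assumed to be already superimposed.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))
```

Because the distances are squared before averaging, a few large deviations (for example, a misplaced loop) raise the RMSD disproportionately.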
An aside: Other I-TASSER features
I-TASSER accepts three types of user-specified restraints:
• inter-residue contact and distance restraints
• template structures and template-target alignment
• secondary structure assignment
* Special algorithm for GPCR modeling
Homology modeling of Fab fragments
http://rosie.rosettacommons.org/antibody
Hands-on exercise: Antibody
modeling
http://rosie.rosettacommons.org/antibody/viewjob/42648
PDB: Protein Data Bank
The Protein Data Bank (PDB) archive is the single
worldwide repository of information about the 3D
structures of large biological molecules, including
proteins and nucleic acids.
www.rcsb.org
