Presentation 2007 Journal Club Azhar Ali Shah
ASAP-IOL Journal Club Talk 2007

Published in: Education, Technology
Transcript

  • 1. “Rapid Methods for Comparing Protein Structures and Scanning Structure Databases” [Oliviero Carugo, Current Bioinformatics, 1(1), 2006]. Presented by Azhar Ali Shah, Computational Foundations of Nanoscience Journal Club (CFNJC), October 19, 2007
  • 2. Overview
    • Introduction
      • About the author
      • Problem
      • Requirements
      • Motivations
      • Background
    • Classification of methods
    • Summary
    • Observations
  • 3. Introduction: about the author 1/2
    • Name : Oliviero Carugo
    • Nationality : Italian and French
    • Education:
      • PhD (Chemistry), Univ. of Pavia, Italy, (1985 - 1986)
      • Post Doc (Structural Biology Program), EMBL, Heidelberg, Germany, (1995-2000)
    • Current Position:
      • AP, Dept. of General Chemistry, Univ. of Pavia, Italy (2000 --)
      • Visiting Professor, Dept. of Biomolecular Structural Chemistry, University of Vienna, Austria (2005 --)
  • 4. Introduction: about the author 2/2
    • Research interests:
      • Structural bioinformatics:
        • Estimation of protein structure similarity,
        • prediction of inter-molecular interactions,
        • prediction of crystallizability of gene products
    • DBLP: Carugo
      • CX, DPX and PRIDE: WWW servers for the analysis and comparison of protein 3D structures. Nucleic Acids Research 33(Web-Server-Issue): 252-254 (2005)
      • DPX: for the analysis of the protein core. Bioinformatics 19(2): 313-314 (2003)
      • Prediction of protein polypeptide fragments exposed to the solvent. In Silico Biology 3: 35 (2003)
      • CX, an algorithm that identifies protruding atoms in proteins. Bioinformatics 18(7): 980-984 (2002)
  • 5. Introduction: problem 1/2
    • The complexity of structural biological information is increasing more rapidly than computer performance
      • Consider:
        • Number of PDB entries as a measure of structural biological information ( PDB Graph )
        • Number of transistors per IC as a measure of computer performance ( Moore’s Law )
          • Evaluation for 3 decades (1971 to 2003) gives:
  • 6. Introduction: problem 2/2
    • [Figure: number of PDB structures and number of transistors per IC (× 100,000)]
    Confusing description! Total structures in 2003: 20,000. Yearly growth in 2003: 5,000.
  • 7. Introduction: requirement
    • Fast algorithms and protocols to measure similarity between the protein 3D structures available in large-scale databases
  • 8. Introduction: motivations
    • The estimation of similarity between protein 3D structures helps in:
      • Molecular evolution
      • Molecular modelling
      • Function prediction
      • Database scanning
  • 9. Introduction: background 1/3
    • So many algorithms:
      • Each biological problem requires its own comparison method
        • Different problems need different logical approaches
  • 10. Introduction: background 2/3
      • Slow methods
        • Careful examination of proximity among two or more proteins using structural alignment
        • Too slow for large databases
        • Often use two step strategy
          • Coarse structure representation (e.g. SSE)
          • Fine structure representation (e.g. positions of Cα atoms)
  • 11. Introduction: background 3/3
      • Fast methods
        • Used for large scale databases
        • Work on coarse representation of protein structures
        • Results are less accurate and detailed (e.g. no structural alignment)
  • 12. Introduction: focus of the paper
    • Fast comparison methods that can handle large scale structural databases
    • “Rapid Methods for Comparing Protein Structures and Scanning Structure Databases”
  • 13. Classification of methods
    • Based on the representation of protein’s 3D structure:
      • String
      • Array
      • Secondary structure elements (SSEs)
      • Backbone
  • 14. String representation 1/4
    • Uncommon but appealing
      • Allows sequence alignment methods to be used to compare 3D structures
    • 3D structure of n residues/SSEs (or other structural units) is represented by n characters
      • Characters are chosen from an alphabet
      • Each character has associated structural features
  • 15. String representation 2/4
    • Problem:
      • Difficult to design an appropriate alphabet that can well describe the 3D structural features
    • Comparison methods based on strings:
      • TOPSCAN (Martin ACR, Protein Eng, 2000), UCL
        • Uses the STRIDE program to identify SSEs
        • Builds the vectors between the endpoints of SSEs
        • Each SSE is assigned one of 12 characters on the basis of the largest component of its vector
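The encoding idea on this slide can be sketched as follows. This is a minimal illustration and not TOPSCAN's actual alphabet: it assumes that the 12 characters arise from two SSE types (helix, strand) combined with six signed axis directions, and the letters, function names and example coordinates are all hypothetical.

```python
# Hypothetical 12-letter alphabet: (SSE type, signed dominant axis) -> character.
# "H" = helix, "E" = strand; the letters themselves are illustrative only.
AXES = ["+x", "-x", "+y", "-y", "+z", "-z"]
ALPHABET = {("H", axis): ch for axis, ch in zip(AXES, "ABCDEF")}
ALPHABET.update({("E", axis): ch for axis, ch in zip(AXES, "abcdef")})


def dominant_axis(start, end):
    """Return the signed axis along which the endpoint vector is largest."""
    vec = [e - s for s, e in zip(start, end)]
    idx = max(range(3), key=lambda i: abs(vec[i]))
    sign = "+" if vec[idx] >= 0 else "-"
    return sign + "xyz"[idx]


def encode_sse(sse_type, start, end):
    """Map one SSE (type plus endpoint coordinates) to a single character."""
    return ALPHABET[(sse_type, dominant_axis(start, end))]


def encode_structure(sses):
    """Encode a list of (type, start, end) SSEs as a string."""
    return "".join(encode_sse(t, s, e) for t, s, e in sses)


# Made-up SSE endpoints, in the order in which the SSEs occur along the chain.
example = [
    ("H", (0.0, 0.0, 0.0), (9.0, 1.0, 2.0)),    # helix running mostly along +x
    ("E", (9.0, 1.0, 2.0), (8.0, -6.0, 2.5)),   # strand running mostly along -y
    ("E", (8.0, -6.0, 2.5), (8.5, -5.0, 9.0)),  # strand running mostly along +z
]
print(encode_structure(example))
```

Once every structure is reduced to such a string, any sequence alignment routine can compare two structures without touching the 3D coordinates again.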
  • 16. String representation 3/4
  • 17. String representation 4/4
    • Uses the Needleman and Wunsch algorithm on the string representations of two 3D structures and calculates the percentage similarity score using the following scheme
    Should be 10? How fast is TOPSCAN?
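The alignment step can be sketched with a plain Needleman and Wunsch implementation. The scoring scheme from the slide is not reproduced in this transcript, so the match, mismatch and gap values below are hypothetical, and the percentage is assumed here to be the share of identical positions in the alignment.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment of two strings; returns (score, aligned_a, aligned_b).

    The default scores are placeholders, not TOPSCAN's scoring scheme.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


best, top, bottom = needleman_wunsch("Ade", "Ae")
print(top, bottom, best, round(percent_identity(top, bottom), 1))
```

The table has (n+1) × (m+1) cells, so the cost grows with the product of the two string lengths; with one character per SSE the strings are short, which is what makes this representation attractive for database scans.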
  • 18. Array representation 1/4
    • 3D structure represented as a fixed length array of real numbers
    • Benefits:
      • For the comparison of equal-length arrays there are well-established mathematical tools based on proximity detection
        • E.g. Euclidean distance between two points in an orthogonal space
    • Problems
      • Definition of the array
        • No obvious way to describe an object by means of a predefined set of variables
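As a small illustration of the proximity tools mentioned above, the sketch below ranks database entries by Euclidean distance to a query array. The descriptor values and entry names are made up; how to define the array for a real protein is exactly the open problem noted on this slide.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length arrays of real numbers."""
    if len(u) != len(v):
        raise ValueError("arrays must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_database(query, database):
    """Return (name, distance) pairs sorted from nearest to farthest."""
    hits = [(name, euclidean_distance(query, vec)) for name, vec in database.items()]
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical fixed-length descriptors for three database entries.
database = {
    "entry_a": [0.30, 0.10, 0.60],
    "entry_b": [0.05, 0.55, 0.40],
    "entry_c": [0.28, 0.12, 0.60],
}
query = [0.29, 0.11, 0.60]
for name, dist in rank_database(query, database):
    print(name, round(dist, 3))
```

Because every structure has the same number of variables, one comparison costs a fixed, small number of arithmetic operations regardless of protein size.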
  • 19. Array representation 2/4
    • Comparison methods based on arrays:
      • PRIDE (Carugo and Pongor, J Mol Biol 2002)
        • Uses distances between Cα atoms to represent the 3D structure
        • 28 histograms are computed for each structure
    Fold similarity of two structures is estimated as the average of the probability of identity scores obtained from the pairwise comparison of the 28 histograms.
    Two histograms are compared through a contingency table and a χ² test to obtain the probability of identity score.
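The procedure described above can be sketched as follows. This is an illustration in the spirit of PRIDE, not its implementation: the sequence separations n = 3 to 30 (which give 28 histograms) are assumed rather than stated on the slide, and the bin width, number of bins, function names and idealized coordinates are hypothetical.

```python
import math


def ca_distance_histogram(coords, n, bin_width=2.0, n_bins=25):
    """Histogram of distances between C-alpha atoms i and i+n (last bin is open)."""
    hist = [0] * n_bins
    for i in range(len(coords) - n):
        d = math.dist(coords[i], coords[i + n])
        hist[min(int(d / bin_width), n_bins - 1)] += 1
    return hist


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form start, then the recurrence Q(a+1, y) = Q(a, y) + y^a e^-y / Gamma(a+1).
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def histogram_identity_probability(h1, h2):
    """Chi-square test on the 2 x k contingency table formed by two histograms."""
    cols = [(a, b) for a, b in zip(h1, h2) if a + b > 0]
    t1, t2 = sum(a for a, _ in cols), sum(b for _, b in cols)
    if t1 == 0 or t2 == 0:
        raise ValueError("chain too short for this sequence separation")
    if len(cols) < 2:
        return 1.0  # all counts fall in one bin, so the histograms cannot differ
    chi2 = 0.0
    for a, b in cols:
        e1 = (a + b) * t1 / (t1 + t2)  # expected count in row 1
        e2 = (a + b) * t2 / (t1 + t2)  # expected count in row 2
        chi2 += (a - e1) ** 2 / e1 + (b - e2) ** 2 / e2
    return chi2_sf(chi2, len(cols) - 1)


def pride_like_score(coords1, coords2, separations=range(3, 31)):
    """Average probability of identity over all sequence separations."""
    probs = [
        histogram_identity_probability(
            ca_distance_histogram(coords1, n), ca_distance_histogram(coords2, n)
        )
        for n in separations
    ]
    return sum(probs) / len(probs)


# Idealized test chains: a helix-like curve and a fully extended chain.
helix = [(2.3 * math.cos(math.radians(100 * i)),
          2.3 * math.sin(math.radians(100 * i)), 1.5 * i) for i in range(60)]
extended = [(0.0, 0.0, 3.3 * i) for i in range(60)]
print(pride_like_score(helix, helix), pride_like_score(helix, extended))
```

No alignment is computed at any point: each structure is reduced to a fixed set of histograms, which is why this kind of comparison is fast but yields only a similarity score.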
  • 20.  
  • 21. Array representation 4/4
      • PRIDE results agree with CATH
        • Fast comparison
          • 1000 comparisons per second
            • On an SGI R10000 system at 200 MHz
  • 22. Secondary structural elements (SSEs) 1/6
    • Simplified description of 3D structure
      • i.e. a few tens of SSEs as compared to several tens or hundreds of residues
      • A smaller number of variables makes comparison easier
  • 23. Secondary structural elements (SSEs) 2/6
    • Different ways to represent protein 3D structure by means of SSEs
      • Secondary structural assignments
      • SSE approximation
  • 24. Secondary structural elements (SSEs) 3/6
    • Secondary structural assignments
      • Different assignments with different programs
        • Due to variable torsion angles along the backbone
      • Common methods:
        • DSSP (Kabsch and Sander, Biopolymers 1983)
          • Dictionary of protein secondary structure
          • Looks for hydrogen bonds between main-chain atoms and assigns each residue one of eight types of secondary structure conformation
        • STRIDE (Frishman and Argos, Proteins 1995)
          • Uses both hydrogen bonds and torsion angles to assign secondary structures
  • 25. Secondary structural elements (SSEs) 4/6
      • Other methods for SSE assignments
        • P-Curve
        • DEFINE
        • SSA
        • VADAR
        • Voronoi Tessellations
      • Contradiction in results
        • DSSP and STRIDE agree in 96% (for 707 proteins)
        • DSSP, STRIDE, DEFINE agree in 71% (for 126 proteins)
        • DSSP, DEFINE, P-Curve agree in 63% (for 154 proteins)
    Secondary structure assignments are quite ambiguous and inconsistent! (A consensus based on majority vote is needed.)
    This is a serious limitation of the methods that compare 3D structures based on SSE arrangements.
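The agreement figures and the majority-vote consensus mentioned above can be illustrated with a short sketch. The three assignment strings are made up and use a simplified three-letter labelling (H = helix, E = strand, C = coil); the function names and the "?" marker for undecided residues are hypothetical.

```python
from collections import Counter


def percent_agreement(*assignments):
    """Percentage of residues on which all given assignment strings agree."""
    length = len(assignments[0])
    if any(len(s) != length for s in assignments):
        raise ValueError("assignments must cover the same residues")
    agree = sum(len(set(column)) == 1 for column in zip(*assignments))
    return 100.0 * agree / length


def majority_consensus(*assignments, undecided="?"):
    """Per-residue majority vote; residues without a strict majority are marked."""
    consensus = []
    for column in zip(*assignments):
        label, count = Counter(column).most_common(1)[0]
        consensus.append(label if count > len(column) / 2 else undecided)
    return "".join(consensus)


# Made-up per-residue assignments from three programs for the same chain.
program_1 = "CHHHHHCCEEEEC"
program_2 = "CHHHHHHCEEEEC"
program_3 = "CCHHHHHCCEEEC"

print(round(percent_agreement(program_1, program_2), 1))
print(round(percent_agreement(program_1, program_2, program_3), 1))
print(majority_consensus(program_1, program_2, program_3))
```

Adding a third program can only keep or lower the all-agree percentage, which matches the trend in the figures quoted on the slide.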
  • 26. Secondary structural elements (SSEs) 5/6
    • SSE approximations
      • As a vector from N to C terminus
        • Differ from arrays in terms of variable length
        • Well assessed mathematical tools cannot be used
      • Different ways
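One simple way to approximate an SSE by a vector, as described above, is to take the vector from its first to its last Cα atom. The sketch below assumes that definition and uses made-up coordinates; the function names are hypothetical.

```python
import math


def sse_vector(ca_coords):
    """Vector from the first (N-terminal) to the last (C-terminal) C-alpha of an SSE."""
    first, last = ca_coords[0], ca_coords[-1]
    return tuple(l - f for f, l in zip(first, last))


def angle_between(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    cosine = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def describe_structure(sses):
    """One vector per SSE; the list length depends on the protein."""
    return [sse_vector(ca) for ca in sses]


# Two made-up SSEs given as lists of C-alpha coordinates.
strand_1 = [(0.0, 0.0, 0.0), (3.3, 0.2, 0.0), (6.6, 0.0, 0.1), (9.9, 0.0, 0.0)]
strand_2 = [(9.9, 4.8, 0.0), (6.6, 4.9, 0.0), (3.3, 4.8, 0.1), (0.0, 4.8, 0.0)]
vectors = describe_structure([strand_1, strand_2])
print(round(angle_between(vectors[0], vectors[1]), 1))
```

Two proteins generally yield lists of different lengths, so the fixed-length array tools no longer apply and some form of matching between the two lists is needed.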
  • 27. Secondary structural elements (SSEs) 6/6
    • Two-step methods based on SSEs
        • SSM (Krissinel and Henrick, EMBL 2003)
          • Secondary Structure Matching
            • http://www.ebi.ac.uk/msd-srv/ssm/
          • Protein 3D structures are represented as graphs
            • Nodes are SSEs
          • Graph comparison results in identification of equivalent residues
            • Subsequent minimization of RMSD between equivalent residues
        • DEJAVU (http://xray.bmc.uu.se/usf/)
        • Matras (http://biunit.naist.jp/matras/)
        • VAST (http://www.ncbi.nlm.nih.gov/Structure/VAST)
    Statistical performance of SSM or other methods? Are two-step methods slow?
  • 28. Backbone representations
    • Uses vector based profiles to describe trajectories from N to C terminus of backbone
      • Trajectory could be described as a simple curve
        • Each residue is associated with the curvature and torsion of the curve
        • Differences of these parameters are used to compare two 3D structures
      • Useful when comparing the same protein in two different states (e.g. with or without a substrate, inhibitors, cofactors etc.)
    • Gaps and insertions are hard to handle
    Hardly used in the general case for similarity evaluation, and hence no public web servers are available. However?
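The profile idea can be sketched on a Cα trace. Two assumptions are made here that are not in the slides: the curvature at a residue is approximated by the Menger curvature of three consecutive points, and the torsion is approximated by the dihedral angle of four consecutive points. All function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def curvature(p0, p1, p2):
    """Menger curvature: 1 / radius of the circle through three points."""
    twice_area = math.hypot(*_cross(_sub(p1, p0), _sub(p2, p0)))
    return 2.0 * twice_area / (math.dist(p0, p1) * math.dist(p1, p2) * math.dist(p0, p2))


def torsion_angle(p0, p1, p2, p3):
    """Dihedral angle (radians) defined by four consecutive points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_unit = tuple(x / math.hypot(*b2) for x in b2)
    return math.atan2(_dot(_cross(n1, b2_unit), n2), _dot(n1, n2))


def profile(coords):
    """(curvature, torsion) for every residue that has enough neighbours."""
    return [
        (curvature(coords[i - 1], coords[i], coords[i + 1]),
         torsion_angle(coords[i - 1], coords[i], coords[i + 1], coords[i + 2]))
        for i in range(1, len(coords) - 2)
    ]


def profile_difference(coords1, coords2):
    """Mean absolute curvature and torsion differences of two equal-length traces."""
    p1, p2 = profile(coords1), profile(coords2)
    if len(p1) != len(p2) or not p1:
        raise ValueError("traces must have the same length (at least 4 points)")
    d_curv = sum(abs(a[0] - b[0]) for a, b in zip(p1, p2)) / len(p1)
    d_tors = sum(min(abs(a[1] - b[1]), 2 * math.pi - abs(a[1] - b[1]))
                 for a, b in zip(p1, p2)) / len(p1)
    return d_curv, d_tors


# Two idealized traces with the same number of points.
circle = [(2.0 * math.cos(0.3 * i), 2.0 * math.sin(0.3 * i), 0.0) for i in range(12)]
spiral = [(2.0 * math.cos(0.3 * i), 2.0 * math.sin(0.3 * i), 0.5 * i) for i in range(12)]
print(profile_difference(circle, spiral))
```

The residue-by-residue subtraction requires the two traces to have the same length, which is why gaps and insertions are hard to handle and why the approach suits comparisons of the same protein in two states.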
  • 29. Comparison b/w various methods
    • For 86 queries, DALI gives the best quality of results as compared to:
      • CE, Matras, PRIDE, SGM, Structal and VAST
        • (Sierk and Pearson, Protein Sci 2004)
    • For 70 queries, CE, DALI, VAST and Matras provide better quality of results with high speed as compared to:
      • DEJAVU, Lock, PRIDE, SSM, TOP, TOPS, TOPSCAN
        • (Novotny et al., Proteins 2004)
    Strange! Speed also depends on the power of computing environment the algorithm runs on.
  • 30. Summary
    • Rapid methods may use coarse representation of 3D structures in following forms:
      • Strings
        • E.g. TOPSCAN
      • Arrays
        • E.g. PRIDE
      • SSEs
        • Two-step methods: SSM, DEJAVU, Matras, VAST
      • Backbone
        • Algorithmic level studies: no public web servers
    • Comparison on same collection of data on same computing environment is useful:
      • To benchmark the state of the art of fast procedures
  • 31. Observations:
    • Actual benchmarking of rapid methods on large scale databases
    • Proper evaluation of methods based on different representations of protein’s 3D structure
    • Full classification of methods based on structure representation
  • 32. Source: www.intel.com/research/silicon/mooreslaw.htm
  • 33. Source: www.ncsb.org (figure legend: Total, Yearly)
