Benchmarking and Extension of Protein3DFit Nasir Mahmood
Benchmarking and Extension of Protein3DFit <ul><li>Introduction </li></ul><ul><ul><li>Why Align Structures? </li></ul></ul...
Why Align Structures?   <ul><li>Fundamental step in: </li></ul><ul><ul><li>Analysis & Visualization </li></ul></ul><ul><ul...
Structure Alignment: Issues <ul><li>Theoretical Issues </li></ul><ul><ul><li>NP-hard geometric problem </li></ul></ul><ul>...
Simple case – two closely related proteins with the same number of amino acids. Structure alignment Problem A B Introducti...
Structure Alignment Problem Two sets of Points in 3D space:   A = (a1, a2, …, an)   B = (b1, b2, …, bm) A(P) ,  B(Q):   Op...
Two Subproblems <ul><li>Find correspondence set </li></ul><ul><li>Find optimal transformation </li></ul><ul><ul><li>(prote...
Correspondence set <ul><li>Correspondence is a key problem. </li></ul><ul><li>Exhaustive search intractable. </li></ul><ul...
Superposition <ul><li>Two sets of points  A= {a1,…,an} and B = {b1,…,bn} </li></ul><ul><li>Correspondence pairs (ai, bi) <...
Aim of the project <ul><li>To make algorithm more efficient  </li></ul><ul><ul><li>debugging </li></ul></ul><ul><ul><li>op...
Benchmarking and Extension of Protein3DFit <ul><li>Introduction </li></ul><ul><ul><li>Why Align Structures? </li></ul></ul...
Algorithm  Intra-Molecular distance Seed Alignments (Fragments & Quartets) Next Best Frag Quartet 1 Quartet 2 ...............
Intra-molecular Distance Matrices 1  2  3  4  5 . . . . . N  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Structure A Structure...
Seed Alignments Extend Fragment Make Fragment Quartet Introduction  Method  Results  Outlook
Algorithm Intra-Molecular distance Seed Alignments (Fragments & Quartets) Next Best Frag Quartet 1 Quartet 2 ................
Algorithm- Pitfalls <ul><li>Simple Similarity measure </li></ul><ul><li>1: Match </li></ul><ul><li>0: No match </li></ul>I...
Algorithm- Pitfalls <ul><li>Problems in aligning two structures low similarity. </li></ul><ul><li>Alignments sometimes fra...
Algorithm- Pitfalls Reference: 10 ‘difficult‘ structures from (Fischer et al.) obtained by Dali (Holm & Sander, 1993)   , ...
Algorithm - Extension Introduction  Method  Results  Outlook
C α  Match increased Yes Intra-Mol distance based Seed Alignments (Fragments & Quartets) Next No Best Frag Quartet 1 Quart...
Similarity Score : similarity score used in DP   :distance :distance resulting in half-maximal  similarity M   :maximum sc...
Quality Function Q score, ratio of no.  of aligned residues & RMSD :number of matches (equivalent residues) :length of str...
Benchmarking and Extension of Protein3DFit <ul><li>Introduction </li></ul><ul><ul><li>Why Align Structures? </li></ul></ul...
Results source: 10 ‘difficult‘ structures from (Fischer et al.) obtained by Dali (Holm & Sander, 1993)   , VAST (Madej et ...
Results Introduction  Method  Results   Outlook
CE   Dali 3Dfit-old 3Dfit-ext Introduction  Method  Results   Outlook
CE   Dali 3Dfit-old 3Dfit-ext
3Dfit-ext 3Dfit-old
Outlook / Possible Extensions <ul><li>Use of secondary structure information in alignment step. </li></ul><ul><li>Update P...
Acknowlegdments <ul><li>Prof. Dietmar Schomburg </li></ul><ul><li>Pascal Benkert </li></ul><ul><li>Structural Biology Grou...
Questions? Thanks a lot for your patience!
Upcoming SlideShare
Loading in …5
×

Protein Structure Alignment

2,241 views

Published on

MS bioinformatics - Project talk held on 01.04.2006

Published in: Technology, Business
  • Be the first to comment

Protein Structure Alignment

  1. 1. Benchmarking and Extension of Protein3DFit Nasir Mahmood
  2. 2. Benchmarking and Extension of Protein3DFit <ul><li>Introduction </li></ul><ul><ul><li>Why Align Structures? </li></ul></ul><ul><ul><li>Structure Alignment: Issues & problem </li></ul></ul><ul><li>Method </li></ul><ul><ul><li>Algorithm </li></ul></ul><ul><ul><li>Pitfalls </li></ul></ul><ul><ul><li>Extension </li></ul></ul><ul><li>Results </li></ul><ul><li>Outlook </li></ul>Introduction Method Results Outlook
  3. 3. Why Align Structures? <ul><li>Fundamental step in: </li></ul><ul><ul><li>Analysis & Visualization </li></ul></ul><ul><ul><li>Comparison </li></ul></ul><ul><ul><li>Design </li></ul></ul><ul><li>Useful for: </li></ul><ul><ul><li>Structure classification </li></ul></ul><ul><ul><ul><li>To predict protein folds. </li></ul></ul></ul><ul><ul><ul><li>Gold standard for sequence alignment. </li></ul></ul></ul><ul><ul><li>Structure prediction: </li></ul></ul><ul><ul><ul><li>Identifying “structural core”. </li></ul></ul></ul><ul><ul><li>Function prediction </li></ul></ul><ul><ul><ul><li>Function of protein is determined by its structure. </li></ul></ul></ul><ul><ul><li>Drug discovery </li></ul></ul><ul><ul><ul><li>To pinpoint active sites accurately. </li></ul></ul></ul>Introduction Method Results Outlook
  4. 4. Structure Alignment: Issues <ul><li>Theoretical Issues </li></ul><ul><ul><li>NP-hard geometric problem </li></ul></ul><ul><ul><li>Heuristics needed </li></ul></ul><ul><ul><li>No unique solution </li></ul></ul><ul><li>Methodological Issues </li></ul><ul><li>Choices: </li></ul><ul><ul><li>• Structure Representation </li></ul></ul><ul><ul><li>• Scoring function </li></ul></ul><ul><ul><li>• Search algorithm </li></ul></ul><ul><li>Goals Desirable </li></ul><ul><li>Automatic </li></ul><ul><li>Discriminating </li></ul><ul><li>Fast </li></ul>Introduction Method Results Outlook
  5. 5. Simple case – two closely related proteins with the same number of amino acids. Structure alignment Problem A B Introduction Method Results Outlook T Find a transformation to achieve the best superposition
  6. 6. Structure Alignment Problem Two sets of Points in 3D space: A = (a1, a2, …, an) B = (b1, b2, …, bm) A(P) , B(Q): Optimal subsets with | A(P) |=| B(Q) |, Introduction Method Results Outlook Optimal rigid body transformation (calculated by Diamond (1998) method) Distance { d( A(P) - T( B(Q) ) ) } min T
  7. 7. Two Subproblems <ul><li>Find correspondence set </li></ul><ul><li>Find optimal transformation </li></ul><ul><ul><li>(protein superposition problem) </li></ul></ul>Introduction Method Results Outlook
  8. 8. Correspondence set <ul><li>Correspondence is a key problem. </li></ul><ul><li>Exhaustive search intractable. </li></ul><ul><li>Heuristics are applied. </li></ul>Introduction Method Results Outlook
  9. 9. Superposition <ul><li>Two sets of points A= {a1,…,an} and B = {b1,…,bn} </li></ul><ul><li>Correspondence pairs (ai, bi) </li></ul><ul><li>Find T </li></ul><ul><li> min { RMSD(A,T(B)) } </li></ul>T Introduction Method Results Outlook
  10. 10. Aim of the project <ul><li>To make algorithm more efficient </li></ul><ul><ul><li>debugging </li></ul></ul><ul><ul><li>optimization </li></ul></ul><ul><ul><li>extension </li></ul></ul><ul><ul><li>imporve quality of alignments </li></ul></ul><ul><ul><li>organization of code </li></ul></ul>Introduction Method Results Outlook
  11. 11. Benchmarking and Extension of Protein3DFit <ul><li>Introduction </li></ul><ul><ul><li>Why Align Structures? </li></ul></ul><ul><ul><li>Structure Alignment: Issues & problem </li></ul></ul><ul><li>Method </li></ul><ul><ul><li>Algorithm </li></ul></ul><ul><ul><li>Pitfalls </li></ul></ul><ul><ul><li>Extension </li></ul></ul><ul><li>Results </li></ul><ul><li>Outlook </li></ul>Introduction Method Results Outlook
  12. 12. Algorithm Intra-Molecular distance Seed Alignments (Fragments & Quartets) Next Best Frag Quartet 1 Quartet 2 .............. .............. Alignment Dynamic Programming i<N No Stop C α Match? increased Yes No Sup. Archive Yes Optimization Best Supp. Display Best Alignment Introduction Method Results Outlook Structures Superimpose
  13. 13. Intra-molecular Distance Matrices 1 2 3 4 5 . . . . . N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Structure A Structure B 1.4 3.0 0.7 1.9 0.0 4.7 0.0 2.9 0.0 0.0 2.1 0.3 0.0 3.1 0.6 2.9 0.0 0.0 0.0 0.4 0.0 5.9 0.0 2.3 1.4 3.0 0.7 1.9 0.0 4.7 0.0 2.9 2.1 0.3 0.0 3.1 0.6 2.9 0.0 0.0 0.4 0.0 5.9 0.0 2.3 0.4 0.0 5.9 0.0 2.3 3.1 0.6 2.9 0.0 0.6 2.9 0.0 1 2 3 4 5 ........................... N 0.0 0.0 0.0 1.4 3.0 0.0 1.9 0.0 4.7 0.0 2.9 0.0 0.0 2.1 0.3 0.0 3.1 0.6 2.9 0.0 0.0 0.0 0.4 0.0 5.9 0.0 2.3 1.4 1.4 1.4 1 2 3 4 5 ............................ M 1 2 3 4 5 . . . . . M i i+1 i+2 i+3 i+4 i+5 i+6 i+7 Introduction Method Results Outlook i i+1 i+2 i+3 i+4 i+5 i+6 i+7
  14. 14. Seed Alignments Extend Fragment Make Fragment Quartet Introduction Method Results Outlook
  15. 15. Algorithm Intra-Molecular distance Seed Alignments (Fragments & Quartets) Next Best Frag Quartet 1 Quartet 2 .............. .............. Alignment Dynamic Programming i<N No Stop C α Match? increased Yes No Sup. Archive Yes Optimization Best Supp. Display Best Alignment Introduction Method Results Outlook Structures Superimpose
  16. 16. Algorithm- Pitfalls <ul><li>Simple Similarity measure </li></ul><ul><li>1: Match </li></ul><ul><li>0: No match </li></ul>Introduction Method Results Outlook
  17. 17. Algorithm- Pitfalls <ul><li>Problems in aligning two structures low similarity. </li></ul><ul><li>Alignments sometimes fragmented. </li></ul><ul><li>Single matching residues not in sequential order (letters in the alignment). </li></ul><ul><li>No optimization after dynamic programming. </li></ul>Introduction Method Results Outlook
  18. 18. Algorithm- Pitfalls Reference: 10 ‘difficult‘ structures from (Fischer et al.) obtained by Dali (Holm & Sander, 1993) , VAST (Madej et al., 1995) & CE (Shindyalov & Bourne, 1998) 187/3.2 211/3.4 100/1.2 1EDE:_ 1CRL:_ 7 116/2.9 108/2.0 82/1.7 70/1.1 4FGF:_ 1TIE:_ 10 94/4.1 98/3.5 74/2.5 52/1.1 2GMF:A 1BGE:B 9 264/3.0 286/3.8 284/3.8 129/1.2 1NSB:A 2SIM:_ 8 94/2.7 95/3.3 85/2.2 59/1.1 2RHE:_ 1CID:_ 6 69/1.9 81/2.3 71/1.9 54/1.2 1MOL:A 1CEW:I 5 85/2.9 _ 74/2.2 52/1.2 1PAZ:_ 2AZA:A 4 85/3.5 63/2.5 _ 42/1.1 2RHE:_ 3HLA:B 3 87/1.9 86/1.9 78/1.6 71/1.0 3HHR:B 1TEN:_ 2 _ _ 48/2.1 39/1.0 1UBQ:_ 1FXI:A 1 CE DALI VAST Protein3DFit Chain 2 Chain 1 No
  19. 19. Algorithm - Extension Introduction Method Results Outlook
  20. 20. C α Match increased Yes Intra-Mol distance based Seed Alignments (Fragments & Quartets) Next No Best Frag Quartet 1 Quartet 2 .............. .............. Alignment Dynamic Programming Optimization C α Matches ? Increase Yes No Improved? i<N Sup. Archive Yes No i<M Yes Yes No Optimization Stop Get Supp. Display Best Alignment Next No Superimpose Structures Superimpose
  21. 21. Similarity Score : similarity score used in DP :distance :distance resulting in half-maximal similarity M :maximum score for a match Source: Gerstein Levitt, 1998 Introduction Method Results Outlook
  22. 22. Quality Function Q score, ratio of no. of aligned residues & RMSD :number of matches (equivalent residues) :length of structure 1 & 2 respectively :empirical parameter measuring Relative significance of & RMSD Source: Krissinel & Henrich (2004) Introduction Method Results Outlook
  23. 23. Benchmarking and Extension of Protein3DFit <ul><li>Introduction </li></ul><ul><ul><li>Why Align Structures? </li></ul></ul><ul><ul><li>Structure Alignment: Issues & problem </li></ul></ul><ul><li>Method </li></ul><ul><ul><li>Algorithm </li></ul></ul><ul><ul><li>Pitfalls </li></ul></ul><ul><ul><li>Extension </li></ul></ul><ul><li>Results </li></ul><ul><li>Outlook </li></ul>Introduction Method Results Outlook
  24. 24. Results source: 10 ‘difficult‘ structures from (Fischer et al.) obtained by Dali (Holm & Sander, 1993) , VAST (Madej et al., 1995) & CE (Shindyalov & Bourne, 1998). 39/1.0 71/1.0 42/1.1 52/1.2 54/1.2 59/1.1 100/1.2 129/1.2 52/1.1 70/1.1 Introduction Method Results Outlook 187/3.2 211/3.4 181/3.2 1EDE:_ 1CRL:_ 7 116/2.9 108/2.0 82/1.7 105/2.6 4FGF:_ 1TIE:_ 10 94/4.1 98/3.5 74/2.5 81/3.0 2GMF:A 1BGE:B 9 264/3.0 286/3.8 284/3.8 265/3.2 1NSB:A 2SIM:_ 8 94/2.7 95/3.3 85/2.2 91/2.6 2RHE:_ 1CID:_ 6 69/1.9 81/2.3 71/1.9 76/2.1 1MOL:A 1CEW:I 5 85/2.9 _ 74/2.2 80/2.3 1PAZ:_ 2AZA:A 4 85/3.5 63/2.5 _ 69/3.4 2RHE:_ 3HLA:B 3 87/1.9 86/1.9 78/1.6 82/1.7 3HHR:B 1TEN:_ 2 _ _ 48/2.1 59/2.9 1UBQ:_ 1FXI:A 1 CE DALI VAST Protein3DFit Chain 2 Chain 1 No
  25. 25. Results Introduction Method Results Outlook
  26. 26. CE Dali 3Dfit-old 3Dfit-ext Introduction Method Results Outlook
  27. 27. CE Dali 3Dfit-old 3Dfit-ext
  28. 28. 3Dfit-ext 3Dfit-old
  29. 29. Outlook / Possible Extensions <ul><li>Use of secondary structure information in alignment step. </li></ul><ul><li>Update Protein3DFit webserver </li></ul><ul><li>Multiple structure superposition </li></ul><ul><li>Alignment of a protein structure against PDB </li></ul><ul><li>...... </li></ul>
  30. 30. Acknowlegdments <ul><li>Prof. Dietmar Schomburg </li></ul><ul><li>Pascal Benkert </li></ul><ul><li>Structural Biology Group & CUBIC </li></ul>
  31. 31. Questions? Thanks a lot for your patience!

×