Predicting structural disruption caused by crossover: a machine learning approach. Denis C. Bauer, talk, CIBCB 2005

STAR: Recombination site prediction



This presentation was given at CIBCB 2005 in San Diego. It describes our approach to predicting recombination sites in protein sequences. Recombination is the method of choice for designing new proteins with desired new or enhanced properties.

The publication is:
Bauer, D. C., Bodén, M., Thier, R. and Gillam, E. M. "STAR: Predicting recombination sites from amino acid sequence." BMC Bioinformatics, 2006 Oct 8; 7:437. PMID: 17026775

  • STAR: Recombination site prediction

    1. Predicting structural disruption caused by crossover: a machine learning approach. Denis C. Bauer, talk, CIBCB 2005
    2. Outline
        • Introduction to Protein Design
        • Theory of SCHEMA
        • Our Approach
        • Results
        • Summary
    3. Protein (Introduction)
        • Biological functions
            • Proteins are fundamental components of all living cells
                • Messenger function (e.g. hormones)
                • Catalytic function (e.g. enzymes)
                • Regulatory function (e.g. antibodies)
        • Protein design for industry and medicine
            • Better adjusted
            • New function
    4. Protein Structure (Introduction)
        • Primary structure
        • Secondary structure
        • Tertiary structure
        • Quaternary structure
        Pictures from: Principles of Biochemistry, Horton, Moran, Ochs, Rawn, Scrimgeour
    5. Protein Design (Introduction)
        • Creating new amino acid sequences
            • Huge sequence space: 20^100 possible amino acid sequences
            • Not every possible sequence is stable
        Solution: use sequences that already exist
        [Figure: example amino acid chains built from Gly, Ala, Glu, Thr, Pro, Val, Asp]
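To give a feel for the "20^100" figure on this slide, here is a minimal sketch of the arithmetic. It assumes that the 100 stands for a chain length of 100 residues over the 20 standard amino acids; the function name is illustrative and not from the talk.

```python
# Assumption: "20^100" means 20 amino acids at each of 100 positions.
AMINO_ACIDS = 20
CHAIN_LENGTH = 100


def sequence_space_size(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet ** length


size = sequence_space_size(CHAIN_LENGTH)
digits = len(str(size))  # Python integers are arbitrary precision
print(f"20^100 has {digits} decimal digits")
```

Exhaustively testing a space of this size is out of the question, which is why the slide falls back on sequences that already exist.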
    6. Benefit of Recombination (Introduction)
        [Figure: parent sequence fragments (MKIPDELGLIFKFEAPGRVTRALSSQ…, KEMHQPLTFGELENLPLLNTDKPVQAL) and hybrids built from them. Labels: "Mayfly: lives where it's hot", "More resistant to heat", "Higher performance"]
        Problem: how to identify recombination sites?
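The recombination step pictured on this slide can be sketched as a single-point crossover between two aligned parents. This is only an illustration: the two strings are fragments that appear in the slide figure, treating them as an aligned parent pair is an assumption, and the crossover site is arbitrary.

```python
def crossover(parent_a: str, parent_b: str, site: int) -> str:
    """Single-point crossover: head of parent_a, tail of parent_b."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    if not 0 < site < len(parent_a):
        raise ValueError("site must fall strictly inside the sequence")
    return parent_a[:site] + parent_b[site:]


# Fragments taken from the slide figure; pairing them is an assumption.
parent_a = "MKIPDELGLIFKFEAPGRVTRALSSQ"
parent_b = "MKIADELGEIFKFEAPGRVTRYLSSQ"

child = crossover(parent_a, parent_b, site=10)
print(child)
```

The open question the slide raises is which value of `site` to pick; the remaining slides are about scoring candidate sites.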
    7. SCHEMA
        • Research group of Prof. Frances Arnold
        • Idea: recombine at positions where the fewest interactions are disrupted
        [Figure: SCHEMA profile]
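The idea of counting disrupted interactions can be sketched as follows. This is a simplified illustration, not the SCHEMA implementation: it assumes a precomputed list of contacting residue pairs (which SCHEMA derives from the 3D structure) and counts a contact as disrupted when the hybrid's residue pair at those positions occurs in neither parent. The toy sequences and contact list are hypothetical.

```python
def disrupted_contacts(parent_a, parent_b, contacts, site):
    """Count contacts whose residue pair in the hybrid matches no parent."""
    parents = (parent_a, parent_b)
    hybrid = parent_a[:site] + parent_b[site:]
    broken = 0
    for i, j in contacts:
        pair = (hybrid[i], hybrid[j])
        if all((p[i], p[j]) != pair for p in parents):
            broken += 1
    return broken


def disruption_profile(parent_a, parent_b, contacts):
    """Disruption count for every interior crossover site."""
    return [disrupted_contacts(parent_a, parent_b, contacts, s)
            for s in range(1, len(parent_a))]


# Hypothetical toy data: two aligned parents and three contacting pairs.
toy_a = "ACDEFG"
toy_b = "AKDEYG"
toy_contacts = [(0, 5), (1, 4), (2, 3)]

profile = disruption_profile(toy_a, toy_b, toy_contacts)
print(profile)
```

Sites with a low count are the candidates a method of this kind would favour.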
    8. Limitations (SCHEMA)
        • 3D structure necessary
            • Problem: hard to derive for some proteins
                • Time consuming
                • Expensive
        Solution: remove the dependence on the 3D structure
    9. Our approach
    10. Alternative to SCHEMA (Our Approach)
        [Diagram: 3D structure information → SCHEMA algorithm → SCHEMA score; alternative: sequence → prediction → SCHEMA score]
        Benefit: all proteins can be processed
    11. Predicting the SCHEMA Profile (Our Approach)
        [Diagram: sequence → predictive model* → predicted SCHEMA score]
        Models:
        • Support Vector Regression
        • Bidirectional Recurrent Network
        • Feed Forward Neural Network
        * Bodén, M., Yuan, Z. and Bailey, T. L. Prediction of protein continuum secondary structure with probabilistic models. Submitted.
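The slide does not say how the sequence is presented to the models. One common choice for per-residue regression is a sliding window of one-hot encoded residues, sketched below. The window size, the zero padding at the ends, and all names are assumptions made for illustration, not details from the talk.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def one_hot(residue):
    """20-dimensional indicator vector; all zeros for unknown symbols."""
    vec = [0.0] * len(ALPHABET)
    idx = ALPHABET.find(residue)
    if idx >= 0:
        vec[idx] = 1.0
    return vec


def window_features(sequence, half_width=2):
    """One feature row per residue, built from its neighbours."""
    rows = []
    for center in range(len(sequence)):
        row = []
        for pos in range(center - half_width, center + half_width + 1):
            if 0 <= pos < len(sequence):
                row.extend(one_hot(sequence[pos]))
            else:
                row.extend([0.0] * len(ALPHABET))  # pad past the ends
        rows.append(row)
    return rows


features = window_features("MKIPDELG", half_width=2)
print(len(features), len(features[0]))
```

Each row could then be paired with the SCHEMA score at the same position as the regression target for a model such as support vector regression.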
    12. Results
        Table 1: Results for all approaches. r = correlation coefficient (ideally 1), devA = Root Mean Square Error (RMSE) normalized by the standard deviation (ideally 0).

        Method     r      devA
        FFNN       0.86   0.57
        BRNN       0.88   0.52
        SVR eps    0.82   0.63
        SVR nu     0.83   0.62
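The two measures in Table 1 can be computed as in the sketch below. It assumes that r is the Pearson correlation and that devA divides the RMSE by the population standard deviation of the target values; the slide does not state which standard deviation was used, and the function names and sample numbers are illustrative.

```python
import math
import statistics


def pearson_r(predicted, target):
    """Pearson correlation between predictions and targets."""
    mp = statistics.fmean(predicted)
    mt = statistics.fmean(target)
    cov = sum((p - mp) * (t - mt) for p, t in zip(predicted, target))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_t = sum((t - mt) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


def dev_a(predicted, target):
    """RMSE divided by the standard deviation of the targets."""
    rmse = math.sqrt(statistics.fmean(
        (p - t) ** 2 for p, t in zip(predicted, target)))
    # Assumption: population standard deviation of the target values.
    return rmse / statistics.pstdev(target)


target = [1.0, 2.0, 3.0, 4.0]
print(pearson_r([2.0, 4.0, 6.0, 8.0], target))
print(dev_a([2.5, 2.5, 2.5, 2.5], target))
```

Under this definition, always predicting the mean of the targets gives devA = 1, so values below 1 indicate a model that does better than that baseline.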
    13. Results
    14. Results
    15. Refinements (Results)
        [Diagram: several ML models feed a predicting model that outputs the predicted SCHEMA score. Predicted input features: contact numbers, solvent accessibility]
        Score (CC): 0.88, 0.88, 0.6; ensemble: 0.88
    16. However…
        • Only a limited number of connections are considered
        • Broken connections are reconnected after recombination
    17. Summary
        • Design proteins with recombination rather than from scratch
            • Identify recombination sites
            • Idea: find the sites where the fewest interactions are disrupted (SCHEMA)
        • Predicting the SCHEMA score to overcome the limitation
        • SCHEMA is too limited to be the only means for recombination site prediction
        • Future work
            • All interactions
            • Actual recombination process
    18. Acknowledgments
        • Supervisors Dr. Mikael Bodén and Dr. Ricarda Thier
        • Dr. Zheng Yuan
        • Prof. Frances Arnold's research group
    19. Thank you
        References:
        • Voigt, C. A., Martinez, C., Wang, Z.-G., Mayo, S. L. and Arnold, F. H. Protein building blocks preserved by recombination. Nat Struct Biol. 2002 Jul; 9(7):553-558.
        • Meyer, M. M., Silberg, J. J., Voigt, C. A., Endelman, J. B., Mayo, S. L., Wang, Z.-G. and Arnold, F. H. Library analysis of SCHEMA-guided protein recombination. Protein Sci. 2003 Aug; 12(8):1686-1693.
        • Bodén, M., Yuan, Z. and Bailey, T. L. Prediction of protein continuum secondary structure with probabilistic models. Submitted.
    20. PDB 1zg4
    21. Recombination Site Identification
        • Recombination vs. mutagenesis or design from scratch
            • Higher fraction of functional proteins
            • Higher diversity → higher chance of finding a better hybrid
        • Requirements
            • Identify recombination sites
            • Identify which segments are useful
            • Identify beneficial segment combinations
        • Existing methods
            • SCHEMA (hybrid evaluation: avoid breaking connections)
            • FamClash (hybrid evaluation: avoid changing properties of residue pairs)
            • STAR (site suggestion according to structural compactness)
        • Known methods are too limited to be a good means for recombination site prediction
        http://www.che.caltech.edu/groups/fha/
    22. Possible approaches
        • Identify a new measure for evaluating hybrids (derived from datasets of biologically produced hybrids)
        • Include more information in the decision process
            • Sequence/structure (SCHEMA)
            • Chemical features (FamClash)
            • Predicting important residues for structure and/or function
            • Predicting enzyme function from protein sequence
            • Substitution tolerance
            • Hydrophobic patterning
            • Surface clefts or binding sites
            • Solvent accessibility
            • Domains/motifs of parents
