STAR: Recombination site prediction



This presentation was given at CIBCB 2005 in San Diego and describes our approach to predicting recombination sites in protein sequences. Recombination is the method of choice for designing new proteins with desired new or enhanced properties.

The publication is:
Bauer, D.C., Bodén, M., Thier, R. and Gillam, E. M. “STAR: Predicting recombination sites from amino acid sequence.” BMC Bioinformatics, 2006 Oct 8; 7:437. PMID: 17026775

Usage Rights

© All Rights Reserved


Presentation Transcript

  • Predicting structural disruption caused by crossover: a machine learning approach
    Denis C. Bauer, talk at CIBCB 2005
  • Outline
    • Introduction to Protein Design
    • Theory of SCHEMA
    • Our Approach
    • Results
    • Summary
  • Protein
    • Biological Functions
      • Proteins are fundamental components of all living cells
        • Messenger Function (e.g. Hormones)
        • Catalytic Function (e.g. Enzymes)
        • Regulatory Function (e.g. Antibodies)
    • Protein Design for Industry and Medicine
      • Better-adapted properties
      • New functions
    Introduction
  • Protein Structure
    • Primary Structure
    • Secondary Structure
    • Tertiary Structure
    • Quaternary Structure
    Pictures from: Principles of Biochemistry, Horton, Moran, Ochs, Rawn, Scrimgeour
  • Protein Design
    • Creating new amino acid sequences
      • Huge sequence space
      • Not every possible sequence is stable
    Solution: use sequences that already exist
    [Figure: aligned amino acid sequence fragments with gaps (e.g. Gly Ala – Glu Thr Pro …); 20^100 possible amino acid sequences (20 amino acids at each of 100 positions)]
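The 20^100 figure is simply the number of choices per position raised to the sequence length. A minimal Python sketch, assuming a 20-letter amino acid alphabet and a hypothetical helper `sequence_space`, makes the scale concrete:

```python
import math


def sequence_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of `length` residues over `alphabet_size` letters."""
    # Each position is chosen independently, so the count is alphabet_size ** length.
    return alphabet_size ** length


n = sequence_space(100)
# Python integers are exact, so the digits can be counted directly.
print(f"20^100 has {len(str(n))} digits, roughly 10^{math.log10(n):.1f}")
```

Even for a 100-residue protein the count is about 1.3 × 10^130, which is why designing from scratch means searching a space that cannot be enumerated.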
  • Benefit of Recombination
    [Figure: one parent comes from an organism that lives where it is hot and is more resistant to heat; the other has higher performance. Recombining segments of the parents (e.g. MKIPDELGLIFKFEAPGRVTRALSSQ… and KEMHQPLTFGELENLPLLNTDKPVQAL) yields hybrids that combine heat resistance and higher performance.]
    Problem: how to identify recombination sites?
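To make the notion of a recombination site concrete, here is a minimal sketch of how a hybrid could be assembled from two parents. It assumes the parents are already aligned to equal length and that each crossover simply switches from one parent to the other; the function name `recombine` and the toy sequences are hypothetical, not taken from the talk.

```python
def recombine(parent_a: str, parent_b: str, sites: list[int]) -> str:
    """Build a hybrid that starts with parent_a and switches parent at each site.

    Sites are 0-based positions in the aligned sequences: residues before the
    first site come from parent_a, then parent_b, then parent_a again, and so on.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to equal length")
    if any(not 0 < s < len(parent_a) for s in sites):
        raise ValueError("crossover sites must lie strictly inside the sequence")
    parents = (parent_a, parent_b)
    segments = []
    current, start = 0, 0
    for site in sorted(sites) + [len(parent_a)]:
        segments.append(parents[current][start:site])  # copy segment from current parent
        start = site
        current = 1 - current  # switch parent at the crossover
    return "".join(segments)


print(recombine("AAAAAA", "BBBBBB", [2, 4]))  # AABBAA
```

The hard part, as the slide says, is not building the hybrid but choosing `sites` so that the hybrid still folds and functions.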
  • SCHEMA
    • Research group of Prof. Frances Arnold
    • Idea: choose crossover positions where the fewest interactions are disrupted
    [Figure: SCHEMA profile]
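The SCHEMA idea can be sketched as counting broken contacts. Following our reading of Voigt et al. (2002), a contact (i, j) is counted as disrupted when the residue pair the hybrid carries at those positions occurs in no parent; the sketch computes this for every single-crossover position of two parents to give a disruption profile. The contact list is a hypothetical input (in SCHEMA it is derived from the 3D structure), and all function names are illustrative.

```python
def single_crossover(parent_a: str, parent_b: str, site: int) -> str:
    # Residues before `site` come from parent_a, the rest from parent_b (0-based).
    return parent_a[:site] + parent_b[site:]


def disruption(hybrid: str, parents: tuple[str, ...], contacts: list[tuple[int, int]]) -> int:
    # A contact is broken if the hybrid's residue pair at (i, j) occurs in no parent.
    broken = 0
    for i, j in contacts:
        pair = (hybrid[i], hybrid[j])
        if all((p[i], p[j]) != pair for p in parents):
            broken += 1
    return broken


def disruption_profile(parent_a: str, parent_b: str, contacts: list[tuple[int, int]]) -> list[int]:
    # Disruption for every crossover site 1 .. L-1 (parent_a first, then parent_b).
    parents = (parent_a, parent_b)
    return [
        disruption(single_crossover(parent_a, parent_b, s), parents, contacts)
        for s in range(1, len(parent_a))
    ]


# Hypothetical toy example: one contact between positions 1 and 3.
print(disruption_profile("MKVLA", "MRVIA", [(1, 3)]))  # [0, 1, 1, 0]
```

Low values in such a profile mark crossover positions that break few contacts, which is the criterion SCHEMA uses.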
  • Limitations
    • 3D structure required
      • Problem: hard to determine for some proteins
        • time-consuming
        • expensive
    Solution: remove the dependence on the 3D structure
  • Our approach
  • Alternative to SCHEMA
    [Figure: SCHEMA computes the SCHEMA score from 3D structure information with the SCHEMA algorithm; our approach predicts the score from the sequence alone]
    Benefit: any protein of known sequence can be processed
  • Predicting the SCHEMA Profile
    [Figure: sequence → predictive model* → predicted SCHEMA score; models: support vector regression, bidirectional recurrent neural network, feed-forward neural network]
    * Bodén, M., Yuan, Z. and Bailey, T. L. Prediction of protein continuum secondary structure with probabilistic models. Submitted.
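The slide does not say how the sequence is presented to the models. One common choice for per-residue regression with SVR or feed-forward networks is a sliding window of one-hot encoded residues; the sketch below assumes that encoding, a 20-letter alphabet plus one slot for padding or unknown residues, and an arbitrary half-width of 3. It only builds the feature vectors; any regressor could then be trained on them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SLOT_COUNT = len(AMINO_ACIDS) + 1  # last slot marks padding or unknown residues


def one_hot(residue: str) -> list[float]:
    # 21-dimensional indicator vector for a single residue.
    vec = [0.0] * SLOT_COUNT
    idx = AMINO_ACIDS.find(residue)
    vec[idx if idx >= 0 else len(AMINO_ACIDS)] = 1.0
    return vec


def window_features(sequence: str, half_width: int = 3) -> list[list[float]]:
    """One feature row per residue: concatenated one-hot codes of its window."""
    features = []
    for center in range(len(sequence)):
        row: list[float] = []
        for pos in range(center - half_width, center + half_width + 1):
            # Positions beyond either end of the sequence are padded with "-".
            residue = sequence[pos] if 0 <= pos < len(sequence) else "-"
            row.extend(one_hot(residue))
        features.append(row)
    return features


rows = window_features("MKVLA", half_width=3)
print(len(rows), len(rows[0]))  # 5 147
```

Each residue becomes a fixed-length vector (here 7 × 21 = 147 values), which is the kind of input an SVR or feed-forward network expects; a bidirectional recurrent network can instead consume the residue sequence directly.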
  • Results
    Table 1. Results for all approaches. r = correlation coefficient (ideally 1); devA = root mean square error (RMSE) normalized by the standard deviation (ideally 0).
      Method    r     devA
      FFNN      0.86  0.57
      BRNN      0.88  0.52
      SVR eps   0.82  0.63
      SVR nu    0.83  0.62
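The two columns of Table 1 can be computed as follows. This is a sketch under the definitions in the caption: r is the Pearson correlation between predicted and SCHEMA-derived scores, and devA is the RMSE divided by the standard deviation of the target values. The original evaluation does not state whether the population or sample standard deviation was used; the sketch assumes the population version, under which always predicting the mean gives devA = 1.

```python
import math


def pearson_r(y_true: list[float], y_pred: list[float]) -> float:
    # Pearson correlation coefficient (1 = perfect linear agreement).
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def dev_a(y_true: list[float], y_pred: list[float]) -> float:
    # RMSE normalized by the (population) standard deviation of the targets; 0 is perfect.
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mean_t = sum(y_true) / n
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true) / n)
    return rmse / std_t
```

Normalizing by the standard deviation makes devA comparable across proteins whose SCHEMA scores span different ranges: a value below 1 means the model beats the trivial predictor that always outputs the mean.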
  • Refinements
    [Figure: predicted contact numbers and predicted solvent accessibility scores are added as input features to several ML models, which are combined in an ensemble to predict the SCHEMA score; correlation coefficients (CC) shown: 0.88, 0.88, 0.6; ensemble: 0.88]
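The slide does not specify how the ensemble combines the individual models. The simplest option, shown here purely as an assumption, is an unweighted average of the per-residue predictions; `ensemble_average` is a hypothetical helper.

```python
def ensemble_average(predictions: list[list[float]]) -> list[float]:
    """Equal-weight mean of per-residue predictions from several models."""
    if not predictions:
        raise ValueError("need at least one model's predictions")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must predict the same number of residues")
    # zip(*predictions) groups the models' outputs residue by residue.
    return [sum(column) / len(predictions) for column in zip(*predictions)]


print(ensemble_average([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Averaging tends to help most when the member models make different errors; models trained on similar inputs may give little gain, which would be consistent with the ensemble CC matching the best single model.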
  • However…
    • Only a limited number of interactions are considered
    • Broken connections may re-form after recombination
  • Summary
    • Design proteins by recombination rather than from scratch
      • Identify recombination sites
      • Idea: find the sites where the fewest interactions are disrupted (SCHEMA)
    • Predict the SCHEMA score from sequence to overcome the need for a 3D structure
    • SCHEMA is too limited to be the only means of recombination site prediction
    • Future work
      • Consider all interactions
      • Model the actual recombination process
  • Acknowledgments
    • Supervisors: Dr. Mikael Bodén and Dr. Ricarda Thier
    • Dr. Zheng Yuan
    • Prof. Frances Arnold’s research group
  • Thank you
    References:
    • C. A. Voigt, C. Martinez, Z.-G. Wang, S. L. Mayo, and F. H. Arnold, "Protein building blocks preserved by recombination," Nat Struct Biol, vol. 9, no. 7, pp. 553-558, Jul. 2002.
    • M. M. Meyer, J. J. Silberg, C. A. Voigt, J. B. Endelman, S. L. Mayo, Z.-G. Wang, and F. H. Arnold, "Library analysis of SCHEMA-guided protein recombination," Protein Sci, vol. 12, no. 8, pp. 1686-1693, Aug. 2003.
    • M. Bodén, Z. Yuan, and T. L. Bailey, "Prediction of protein continuum secondary structure with probabilistic models," submitted.
  • PDB 1zg4
  • Recombination Site Identification
    • Recombination vs. mutagenesis or design from scratch
      • Higher fraction of functional proteins
      • Higher diversity → higher chance of finding a better hybrid
    • Requirements
      • Identify recombination sites
      • Identify which segments are useful
      • Identify beneficial segment combinations
    • Existing methods
      • SCHEMA (hybrid evaluation: avoid breaking connections)
      • FamClash (hybrid evaluation: avoid changing properties of residue pairs)
      • STAR (site suggestion according to structural compactness)
    • Known methods are too limited to be a good means of recombination site prediction
    http://www.che.caltech.edu/groups/fha/
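The last requirement, identifying beneficial segment combinations, grows quickly: with p parents and b segments (blocks) there are p^b possible chimeras. The sketch below counts and enumerates them, assuming aligned parents of equal length and block boundaries given as 0-based cut positions; the helper names are hypothetical.

```python
from itertools import product


def library_size(n_parents: int, n_blocks: int) -> int:
    # Each block can independently come from any parent.
    return n_parents ** n_blocks


def enumerate_chimeras(parents: list[str], boundaries: list[int]):
    """Yield every chimera that takes each block from exactly one aligned parent."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    cuts = [0] + sorted(boundaries) + [length]
    blocks = list(zip(cuts[:-1], cuts[1:]))  # (start, end) of each block
    # product() walks through every assignment of a parent index to each block.
    for choice in product(range(len(parents)), repeat=len(blocks)):
        yield "".join(parents[c][start:end] for c, (start, end) in zip(choice, blocks))


print(list(enumerate_chimeras(["AAAA", "BBBB"], [2])))  # ['AAAA', 'AABB', 'BBAA', 'BBBB']
```

For example, three parents split into eight blocks already give 3^8 = 6561 chimeras, so a method that ranks segment combinations is needed well before libraries become large.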
  • Possible approaches
    • Identify a new measure for evaluating hybrids (derived from datasets of biologically produced hybrids)
    • Include more information in the decision process
      • Sequence/Structure (SCHEMA)
      • Chemical features (FamClash)
      • Predicting important residues for structure and/or function
      • Predicting enzyme function from protein sequence
      • Substitution tolerance
      • Hydrophobic patterning
      • Surface clefts or binding sites
      • Solvent accessibility
      • Domains/motifs of parents
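As an example of one of the listed information sources, hydrophobic patterning is often summarized as a sliding-window hydropathy profile. The sketch below uses the Kyte–Doolittle scale with a window of 9 residues; both the scale and the window size are our assumptions, not choices made in the talk.

```python
# Kyte–Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]  # KeyError on non-standard residues
    # One value per window start; sequences shorter than the window give [].
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


print(hydropathy_profile("IIIII", window=3))  # [4.5, 4.5, 4.5]
```

Such a profile could be computed for each parent and fed to a predictor as an extra per-position feature alongside sequence-derived inputs.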