STAR: Recombination site prediction - Presentation Transcript
Predicting structural disruption caused by crossover: a machine learning approach. Denis C. Bauer. Talk, CIBCB 2005
Outline
Introduction to Protein Design
Theory of SCHEMA
Our Approach
Results
Summary
Protein
Biological Functions
Proteins are fundamental components of all living cells
Messenger Function (e.g. Hormones)
Catalytic Function (e.g. Enzymes)
Regulatory Function (e.g. Antibodies)
Protein Design for Industry and Medicine
Better adjusted
New function
Introduction
Protein Structure
Primary Structure
Secondary Structure
Tertiary Structure
Quaternary Structure
Pictures from: Principles of Biochemistry, Horton, Moran, Ochs, Rawn, Scrimgeour
Protein Design
Creating new amino acid sequences
Huge sequence space
Not every possible sequence is stable
Solution: use sequences that already exist
[Figure: example peptide fragments built from Gly, Ala, Glu, Thr, Pro, Val and Asp, aligned with gaps]
20^100 possible amino acid sequences
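To give a feel for the size of the sequence space mentioned on this slide, the short sketch below computes 20^100 exactly. It assumes the 20 standard amino acids at each of 100 positions; the function name is illustrative only.

```python
# Size of the sequence space, assuming 20 standard amino acids per position.
NUM_AMINO_ACIDS = 20


def sequence_space_size(length: int) -> int:
    """Number of distinct amino acid sequences of the given length."""
    return NUM_AMINO_ACIDS ** length


size = sequence_space_size(100)
# Python integers are arbitrary precision, so the value is exact.
print(f"20^100 has {len(str(size))} decimal digits")  # 131 digits, about 1.27e130
```

Exhaustive search over a space of this size is out of reach, which is the motivation for starting from sequences that already exist.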
Benefit of Recombination
[Figure: two parent sequences, one more resistant to heat and one with higher performance, are recombined into a chimera that combines both properties. Example sequences shown: MKIPDELGLIFKFEAPGRVTRALSSQ… and KEMHQPLTFGELENLPLLNTDKPVQAL. Labels: "Mayfly", "Lives where it's hot".]
Problem: how to identify recombination sites?
SCHEMA
Research group of Prof. Frances Arnold
Idea: recombine at positions where the fewest interactions are disrupted
SCHEMA profile
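The idea of counting disrupted interactions can be sketched as follows. This is a simplified single-crossover illustration, not the published SCHEMA algorithm: it assumes two aligned parents of equal length and a list of residue contacts (0-based index pairs) that would normally be derived from the 3D structure. A contact counts as disrupted when the residue pair in the chimera does not occur at those positions in either parent. The sequences and contacts below are made-up toy data.

```python
from typing import List, Sequence, Tuple

Contact = Tuple[int, int]  # pair of 0-based residue positions in contact


def disruption(parent_a: str, parent_b: str,
               contacts: Sequence[Contact], crossover: int) -> int:
    """Count contacts broken by a single crossover before position `crossover`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    chimera = parent_a[:crossover] + parent_b[crossover:]
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        # The contact survives if either parent has the same residue pair.
        if pair != (parent_a[i], parent_a[j]) and pair != (parent_b[i], parent_b[j]):
            broken += 1
    return broken


def disruption_profile(parent_a: str, parent_b: str,
                       contacts: Sequence[Contact]) -> List[int]:
    """Disruption for every possible single crossover position."""
    return [disruption(parent_a, parent_b, contacts, k)
            for k in range(1, len(parent_a))]


# Toy example (hypothetical sequences and contacts).
profile = disruption_profile("ACDEFG", "MKLQPW", [(0, 3), (1, 4), (2, 5)])
print(profile)  # [1, 2, 3, 2, 1]
```

In this toy profile the crossover positions near the ends break the fewest contacts, so they would be the preferred recombination sites under this simplified score.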
Limitations
3D Structure necessary
Problem: hard to derive for some proteins
time consuming
expensive
Solution: remove the dependence on the 3D structure
Our approach
Alternative to SCHEMA
[Diagram: SCHEMA: 3D structure information -> SCHEMA algorithm -> SCHEMA score. Our approach: sequence -> prediction -> SCHEMA score.]
Benefit: all proteins can be processed
Predicting the SCHEMA profile
[Diagram: sequence -> predictive model * -> predicted SCHEMA score]
Models:
Support Vector Regression
Bidirectional Recurrent Network
Feed Forward Neural Network
* Bodén, M., Yuan, Z. and Bailey, T. L. Prediction of protein continuum secondary structure with probabilistic models. Submitted.
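Any of the listed models needs a numeric representation of the sequence as input. The slides do not state the encoding that was used, so the following is only a plausible sketch: each residue is described by a one-hot encoded window of its neighbours, with zero padding beyond the sequence ends. The window size, the one-hot scheme and all function names are assumptions made for illustration.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue: str) -> List[float]:
    """20-dimensional indicator vector; unknown residues map to all zeros."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def window_features(sequence: str, center: int, half_width: int = 2) -> List[float]:
    """Concatenate one-hot vectors for the window around `center`."""
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.extend(one_hot(sequence[pos]))
        else:
            # Positions outside the sequence are zero padded.
            features.extend([0.0] * len(AMINO_ACIDS))
    return features


def encode_sequence(sequence: str, half_width: int = 2) -> List[List[float]]:
    """One feature vector per residue, ready for a regression model."""
    return [window_features(sequence, i, half_width) for i in range(len(sequence))]


X = encode_sequence("MKIPDELG")
print(len(X), len(X[0]))  # 8 feature vectors of length 100
```

Each row of `X` would be paired with the SCHEMA score of the corresponding residue as the regression target when training a model such as support vector regression.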
Results
Table 1: Results for all approaches. r = correlation coefficient (ideally 1); devA = root mean square error (RMSE) normalized by the standard deviation (ideally 0).
Method    r     devA
FFNN      0.86  0.57
BRNN      0.88  0.52
SVR eps   0.82  0.63
SVR nu    0.83  0.62
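The two measures from the table caption can be computed as in the sketch below. The caption does not say which standard deviation is used for the normalization; this sketch assumes the population standard deviation of the observed (target) values. The function names are illustrative.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and targets."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def dev_a(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """RMSE divided by the standard deviation of the observed values (assumed)."""
    n = len(observed)
    mean_o = sum(observed) / n
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    return rmse / std_o


observed = [1.0, 2.0, 3.0, 4.0]
print(pearson_r([2.0, 4.0, 6.0, 8.0], observed))  # perfectly correlated
print(dev_a(observed, observed))                  # perfect prediction
```

Under this definition a predictor that always outputs the mean of the targets has devA = 1, so values below 1 indicate an improvement over that baseline.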
Refinements
Predicted input features: contact numbers, solvent accessibility
[Diagram: ML models supply the predicted input features to the predicting model, which outputs the predicted SCHEMA score.]
CC: 0.88, 0.88, 0.6; Ensemble: 0.88
However…
Only a limited number of connections are considered
Broken connections are reconnected after recombination
Summary
Design proteins with recombination rather than from scratch
Identify recombination sites
Idea: finding the sites where the least interactions are disrupted (SCHEMA)
Predicting the SCHEMA score from sequence to overcome the need for a 3D structure
SCHEMA too limited to be the only means for recombination site prediction
Future work
All interactions
Actual recombination process
Acknowledgments
Supervisors: Dr. Mikael Bodén and Dr. Ricarda Thier
Dr. Zheng Yuan
Prof. Frances Arnold's research group
Thank you
References:
C. A. Voigt, C. Martinez, Z.-G. Wang, S. L. Mayo, and F. H. Arnold. Protein building blocks preserved by recombination. Nat Struct Biol, 9(7):553-558, Jul 2002.
M. M. Meyer, J. J. Silberg, C. A. Voigt, J. B. Endelman, S. L. Mayo, Z.-G. Wang, and F. H. Arnold. Library analysis of SCHEMA-guided protein recombination. Protein Sci, 12(8):1686-1693, Aug 2003.
M. Bodén, Z. Yuan, and T. L. Bailey. Prediction of protein continuum secondary structure with probabilistic models. Submitted.
This presentation, given at CIBCB 2005 in San Diego, describes our approach to predicting recombination sites in protein sequences. Recombination is the method of choice for designing new proteins with desired new or enhanced properties.
The publication is: Bauer, D.C., Bodén, M., Thier, R. and Gillam, E. M. "STAR: Predicting recombination sites from amino acid sequence." BMC Bioinformatics, 2006 Oct 8; 7:437. PMID: 17026775