SUMOylation-site Prediction. Denis C. Bauer, Fabian A. Buske, Mikael Bodén
SUMOylation site prediction

This presentation is about predicting the sites within the primary sequence of a protein that are involved in the SUMOylation process.

Published in: Education, Technology
  • Citation of the corresponding paper:
    Bauer, D.C., Buske, F.A., Bodén, M. “Predicting SUMOylation Sites.” Lecture Notes in Bioinformatics, Volume 5265/2008, pp 28-40, Springer (PRIB 2008)
    http://www.springerlink.com/content/m635n39133134764/

  1. SUMOylation-site Prediction. Denis C. Bauer, Fabian A. Buske, Mikael Bodén
  2. Overview
       • Background
           - SUMOylation: what is that?
       • Published predictors
       • Our approach
       • What makes SUMO hard to tackle
  3. SUMO is not 相撲 (sumo wrestling)
       • Small Ubiquitin-related Modifier (SUMO) is a small protein of 97 amino acids.
       • 20% homology to ubiquitin
       • Post-translational modification
       • Covalently attached to lysines
       • Involved in many pathways/mechanisms
           - Transcriptional regulation
           - Compartmentalisation
  4. SUMOylation pathway [figure]
  5. SUMOylation motif
       • One consensus motif, [ILV]K.E, accounts for about 60% of known sites
       • However:
           - Not all [ILV]K.E sites are SUMOylated (false positives, FP)
           - Not all SUMOylated sites have the consensus motif (false negatives, FN)
       • Figure labels: TP, FP, FN
  6. Baseline prediction

       Method                        CC
       Regular Expression scanner    0.68
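
A regular-expression scanner of the kind named on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the consensus pattern [ILV]K.E from the motif slide and 1-based residue numbering; the function name `scan_consensus_sites` and the toy sequence are hypothetical and not taken from the authors' implementation.

```python
import re

# Consensus motif from the slides: [ILV]K.E.
# The candidate lysine is the second residue of each match.
CONSENSUS = re.compile(r"[ILV]K.E")


def scan_consensus_sites(sequence):
    """Return 1-based positions of lysines that sit in an [ILV]K.E motif."""
    # m.start() is the 0-based index of the [ILV] residue, so the lysine
    # is at 0-based index m.start() + 1, i.e. 1-based m.start() + 2.
    return [m.start() + 2 for m in CONSENSUS.finditer(sequence.upper())]


# Hypothetical toy sequence, not a real protein.
toy = "MAIKQEPLVKTEGKKAD"
print(scan_consensus_sites(toy))
```

Every lysine reported by such a scanner is predicted as SUMOylated and every other lysine as not SUMOylated, which is what makes it a usable baseline.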
  7. Comparison with existing predictors

       Method                        CC
       Regular Expression scanner    0.68
       SUMOpre +                     0.64
       SUMOsp ‡                      0.26
       SUMOplot †                    0.48

       + Xu J., BMC Bioinformatics 2008, 9:8
       ‡ Xue Y., Nucleic Acids Res 2006, W254-W257
       † http://www.abgent.com/doc/sumoplot (commercial)
  8. Case study: core histones in yeast
       • Identified SUMOylation sites +
           - H2B: K6/7, K16/17
           - H2A: K2, K126
           - H4: somewhere in the tail
       • No SUMOylation consensus site
       • Predictors to date are not able to predict even a single SUMOylation site in the histone sequences

       + Nathan D., Genes Dev 2006, 20(8):966-76
  9. Our approach
       • Identify
           - the window size
           - which ML method is best
       • Voilà: better predictor!
       • Diagram: sequence window (xxxx K xxxx) → ML → SUMOylation 1/0
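  10. Training in more detail
       • Diagram: for each lysine (K) in the protein sequence, the flanking windows w_U and w_D are passed to the ML method. Labels in the figure: T 0 1 0, P 1 1 0; legend: SUMOylated K, not SUMOylated K.
       • Imbalance in the dataset: more negatives than positives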
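  11. Prediction in more detail
       • Diagram: the flanking windows w_U and w_D around each K in the protein sequence are passed to the trained ML model, which outputs one label per K (1 1 0 in the figure); legend: SUMOylated K, not SUMOylated K.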
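
The window extraction shown in the training and prediction diagrams might look roughly like the following sketch. It assumes that w_U and w_D are the numbers of residues taken before and after the central lysine, and that windows running past the ends of the sequence are padded with "X"; neither detail is stated on the slides. The function names, the toy sequence and the annotated position are hypothetical.

```python
def lysine_windows(sequence, w_up, w_down, pad="X"):
    """Yield (position, window) for every lysine in the sequence.

    position is 1-based. The window holds w_up residues before the
    lysine, the lysine itself, and w_down residues after it. Sequence
    ends are filled with the pad character (an assumption).
    """
    sequence = sequence.upper()
    padded = pad * w_up + sequence + pad * w_down
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In the padded string the lysine sits at index i + w_up,
            # so its window starts at index i.
            yield i + 1, padded[i:i + w_up + 1 + w_down]


def labelled_examples(sequence, known_sites, w_up, w_down):
    """Pair each window with a 1/0 target (1 = SUMOylated lysine)."""
    return [(window, 1 if pos in known_sites else 0)
            for pos, window in lysine_windows(sequence, w_up, w_down)]


# Hypothetical toy sequence and annotation, for illustration only.
examples = labelled_examples("MAIKQEPLVKTEGK", known_sites={4}, w_up=3, w_down=3)
for window, target in examples:
    print(window, target)
```

At prediction time the same `lysine_windows` call would feed a trained model, which returns one 1/0 label per lysine.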
  12. ML methods
       • Bidirectional Recurrent Neural Network (BRNN)
           - Uses information from flanking windows
           - Influence decays with distance from the center window
           - Prone to overfitting
       • Support Vector Machine (SVM)
           - Regularized
           - Requires a suitable kernel and feature representation
           - Standard kernels
               * Linear, polynomial, RBF
           - String kernels
               * P-kernel, local-alignment kernel
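
To make "kernel and feature representation" concrete, here is a small sketch of the three standard kernels applied to one-hot encoded sequence windows. The one-hot encoding and the parameter values (`degree`, `coef0`, `gamma`) are assumptions chosen for illustration; the slides do not say which encoding or settings were used, and the string kernels (P-kernel, local-alignment kernel) are not covered here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(window):
    """Encode a window as a flat 0/1 vector, 20 entries per residue.

    Characters outside the 20 standard amino acids (e.g. padding)
    become all-zero blocks.
    """
    vector = []
    for residue in window:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def linear_kernel(x, y):
    """Dot product; on one-hot windows this counts identical positions."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.1):
    """exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two hypothetical 7-residue windows centred on a lysine.
a, b = one_hot("MAIKQEP"), one_hot("PLVKTEG")
print(linear_kernel(a, b), polynomial_kernel(a, b), rbf_kernel(a, b))
```

With this encoding the linear kernel simply counts the positions at which two windows carry the same residue, which is why richer kernels or string kernels are worth trying.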
  13. Data set
       • Training/testing data
           - 144 proteins with
           - 241 SUMOylation sites
           - 5,741 non-SUMOylated lysines
           - 68% of the SUMOylated sites conform to the consensus motif
       • Hold-out
           - 13 proteins with
           - 27 SUMOylation sites
           - 48% consensus motif
       Source: Xu J., BMC Bioinformatics 2008, 9:8
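
The class imbalance mentioned on the training slide follows directly from the counts on this slide. The short calculation below uses only those two numbers; the variable names are illustrative.

```python
positives = 241    # SUMOylation sites in the training/testing data
negatives = 5741   # non-SUMOylated lysines

total = positives + negatives
positive_fraction = positives / total
negatives_per_positive = negatives / positives

# A classifier that labels every lysine as "not SUMOylated" is right
# on all negatives and wrong on all positives.
always_negative_accuracy = negatives / total

print(f"lysines in total:        {total}")
print(f"positive fraction:       {positive_fraction:.1%}")
print(f"negatives per positive:  {negatives_per_positive:.1f}")
print(f"accuracy of 'always 0':  {always_negative_accuracy:.1%}")
```

Roughly 4% of the lysines are positives, so a predictor that never predicts a site already reaches about 96% accuracy. Accuracy alone therefore says little on this dataset.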
  14. Evaluation
       • 5-fold cross-validation
       • Matthews correlation coefficient (CC)
       • Sensitivity, specificity, accuracy
       • Area under the curve (AUC)
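
The threshold-based measures listed here can all be computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `evaluate` and the example counts are hypothetical and are not results from the talk. AUC is left out because it needs ranked scores, not 1/0 predictions.

```python
import math


def evaluate(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and CC from counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention CC is set to 0 when the denominator vanishes,
    # e.g. when the classifier never predicts a positive.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "cc": cc}


# Hypothetical counts for illustration only.
print(evaluate(tp=8, fp=2, tn=88, fn=2))
```

Unlike accuracy, CC stays at 0 for a classifier that labels everything as negative, which makes it a sensible headline number for an imbalanced dataset.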
  15. Performance overview: SUMOsvm [figure]
  16. Comparison with existing methods [figure]
  17. Quest to improve performance
       • Protein structural features and evolutionary features
       • Separating SUMOylation sites by species or compartment
       • Clustering for other motifs using kernel hierarchical clustering
  18. Summary
       • The Regular Expression scanner is still the best classifier.
       • SUMO is more versatile than expected!
       • The road to better predictions
           - Are there other motifs?
           - Which features can discriminate?
           - Is the dataset biased?
       Image: http://spot.colorado.edu/~colemab/Theatre_Resources/SumoBallerina.jpg
  19. Acknowledgment
       • Predictor/Analysis
           - Mikael Bodén
           - Fabian Buske
       • Dataset
           - Xu et al.
       • PhD supervisors
           - Tim Bailey
           - Andrew Perkins
           - Mikael Bodén
       Other bioinformatics tools: STREAM, a practical workbench for modeling transcriptional regulation. www.bioinformatics.org.au/stream/
