SUMOylation site prediction


SUMOylation site prediction - Presentation Transcript

  1. SUMOylation-site Prediction
    Denis C. Bauer, Fabian A. Buske, Mikael Bodén
  2. Overview
    • Background
      • SUMOylation: what is it?
    • Published predictors
    • Our approach
    • What makes SUMO hard to tackle
  3. SUMO is not 相撲 (sumo wrestling)
    • Small Ubiquitin-related Modifier (SUMO) is a small protein of 97 amino acids.
    • 20% homology to ubiquitin
    • Post-translational modification
    • Covalently attached to lysines
    • Involved in many pathways/mechanisms
      • Transcriptional regulation
      • Compartmentisation
  4. SUMOylation pathway
  5. SUMOylation motif
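    • A single consensus motif, [ILV]K.E (I, L or V, then the lysine, then any residue, then E), covers about 60% of known sites
    • However:
      • Not all [ILV]K.E sites are SUMOylated
      • Not all SUMOylated sites match the consensus motif
    [Figure: true positives (TP), false positives (FP) and false negatives (FN) of the motif scan]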
  6. Baseline prediction
    Method                        CC
    Regular expression scanner    0.68
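The baseline scanner only checks whether a lysine sits inside the consensus motif. A minimal Python sketch of such a scanner, assuming the motif is matched on the one-letter sequence and every lysine outside a match is called negative (the function names are illustrative, not taken from the authors' code):

```python
import re

# Consensus motif from the slides: I, L or V, then the lysine, then any
# residue, then glutamate (E).
CONSENSUS = re.compile(r"[ILV]K.E")


def scan_consensus(sequence: str) -> list[int]:
    """Return 0-based positions of lysines that lie in an [ILV]K.E match."""
    # Each match starts at the hydrophobic residue; the lysine follows it.
    return [m.start() + 1 for m in CONSENSUS.finditer(sequence.upper())]


def predict_lysines(sequence: str) -> dict[int, int]:
    """Label every lysine 1 (motif match, predicted SUMOylated) or 0."""
    hits = set(scan_consensus(sequence))
    return {i: int(i in hits) for i, aa in enumerate(sequence.upper()) if aa == "K"}


print(predict_lysines("MAVKQEPLKAEK"))  # {3: 1, 8: 1, 11: 0}
```

The last lysine (position 11) has no motif around it, so the scanner calls it negative; this is exactly the kind of site the motif-only approach misses when it is in fact SUMOylated.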
  7. Comparison with existing predictors
    Method                        CC
    Regular expression scanner    0.68
    SUMOpre +                     0.64
    SUMOsp ‡                      0.26
    SUMOplot †                    0.48
    + Xu J., BMC Bioinformatics 2008, 9:8
    ‡ Xue Y., Nucleic Acids Res 2006, W254-W257
    † http://www.abgent.com/doc/sumoplot (commercial)
  8. Case study: core histones in yeast
    • Identified SUMOylation sites +
      • H2B: K6/7, K16/17
      • H2A: K2, K126
      • H4: somewhere in the tail
    • None of these sites matches the SUMOylation consensus motif
    • Predictors to date cannot predict even a single SUMOylation site in the histone sequences
    + Nathan D., Genes Dev 2006, 20(8):966-76
  9. Our approach
    • Identify
      • the best window size
      • which ML method is best
    • Voilà: a better predictor!
    [Diagram: sequence window xxxx K xxxx → ML → SUMOylation 1/0]
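The sequence-window idea above can be illustrated with a small helper that cuts a fixed window around every lysine. This sketch assumes window sizes are counted in residues (w_U upstream and w_D downstream, as on the next slides) and that positions beyond the protein termini are padded with '-'; the helper name and padding character are hypothetical choices:

```python
def lysine_windows(sequence: str, w_up: int, w_down: int, pad: str = "-"):
    """Yield (position, window) for every lysine in the sequence (hypothetical helper)."""
    seq = sequence.upper()
    # Pad both termini so lysines near the ends still get full-length windows.
    padded = pad * w_up + seq + pad * w_down
    for i, aa in enumerate(seq):
        if aa == "K":
            # In the padded string this lysine sits at index i + w_up, so the
            # window spans w_up residues before it and w_down residues after it.
            yield i, padded[i : i + w_up + 1 + w_down]


for pos, window in lysine_windows("KAVKQE", w_up=2, w_down=2):
    print(pos, window)
# 0 --KAV
# 3 AVKQE
```

Each window, paired with a 1/0 SUMOylation label, becomes one training example for the ML method; choosing w_up and w_down is the window-size question raised on this slide.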
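  10. Training in more detail
    • Windows of w_U residues upstream and w_D residues downstream of each lysine (K) in the protein sequence
    • Each lysine is labelled SUMOylated or not SUMOylated
    • Imbalance in the dataset: more negatives than positives
    [Diagram: lysine windows fed to the ML method; targets T = 0 1 0, predictions P = 1 1 0]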
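Because non-SUMOylated lysines vastly outnumber SUMOylated ones, a classifier can be biased towards the negative class. One common remedy, shown only as an illustration and not necessarily what was done for this predictor, is to randomly undersample the negatives before training (function name and parameters are hypothetical):

```python
import random


def undersample(examples, ratio=1.0, seed=0):
    """Keep all positives and a random subset of negatives (illustrative only).

    `examples` is a list of (window, label) pairs, label 1 = SUMOylated lysine,
    0 = not SUMOylated. At most `ratio` negatives are kept per positive.
    """
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    n_keep = min(len(negatives), round(ratio * len(positives)))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    balanced = positives + rng.sample(negatives, n_keep)
    rng.shuffle(balanced)  # mix classes so positives are not all listed first
    return balanced


data = [("AVKQE", 1), ("GAKAA", 0), ("PPKSS", 0), ("LLKGG", 0)]
print(len(undersample(data)))  # 2
```

Undersampling discards data; class weights in the SVM loss are an alternative that keeps every example.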
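  11. Prediction in more detail
    • Windows of w_U residues upstream and w_D residues downstream of each lysine (K) in the protein sequence
    • The trained ML method classifies each lysine as SUMOylated (1) or not SUMOylated (0)
    [Diagram: three lysines classified 1, 1, 0 by the trained ML method]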
  12. ML methods
    • Bidirectional Recurrent Neural Network (BRNN)
      • Uses information from flanking windows
      • Their influence decays with distance from the center window
      • Prone to overfitting
    • Support Vector Machine (SVM)
      • Regularized
      • Requires a suitable kernel and feature representation
      • Standard kernels
        • Linear, polynomial, RBF
      • String kernels
        • P-kernel, local-alignment kernel
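String kernels let an SVM compare sequence windows directly instead of a hand-built feature vector. Assuming "P-kernel" refers to the p-spectrum kernel (a common string kernel that scores two sequences by their shared substrings of length p), a minimal sketch on two lysine-centred windows:

```python
from collections import Counter


def spectrum(window: str, p: int) -> Counter:
    """Count every contiguous substring (p-mer) of length p in the window."""
    return Counter(window[i : i + p] for i in range(len(window) - p + 1))


def p_spectrum_kernel(x: str, y: str, p: int = 2) -> int:
    """Assumed p-spectrum kernel: inner product of the p-mer count vectors."""
    sx, sy = spectrum(x, p), spectrum(y, p)
    # Only p-mers present in both windows contribute to the sum.
    return sum(count * sy[kmer] for kmer, count in sx.items() if kmer in sy)


print(p_spectrum_kernel("VKQE", "IKQE"))  # 2 (shared 2-mers: KQ, QE)
```

This version ignores where in the window a p-mer occurs; a local-alignment kernel, also listed above, instead scores similarity through alignments of the two windows.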
  13. Data set
    • Training/Testing data
      • 144 proteins with
      • 241 SUMOylation sites
      • 5,741 non-SUMOylated Lysines
      • 68% of the SUMOylated sites conform to the consensus motif
    • Hold-out
      • 13 proteins with
      • 27 SUMOylation sites
      • 48% match the consensus motif
    Xu J., BMC Bioinformatics 2008, 9:8
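The class imbalance in this dataset can be quantified directly from the counts above; the short calculation below (plain Python, using only the numbers on this slide) shows why a trivial all-negative predictor already looks accurate:

```python
# Counts taken from the training/testing set on this slide.
positives = 241    # SUMOylated lysines
negatives = 5741   # non-SUMOylated lysines
total = positives + negatives

print(f"share of positives: {positives / total:.1%}")          # 4.0%
print(f"negatives per positive: {negatives / positives:.1f}")  # 23.8
# A predictor that calls every lysine "not SUMOylated" reaches this accuracy
# without finding a single site.
print(f"all-negative accuracy: {negatives / total:.1%}")       # 96.0%
```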
  14. Evaluation
    • 5-fold cross-validation
    • Matthews correlation coefficient (CC)
    • Sensitivity, Specificity, Accuracy
    • Area under the curve (AUC)
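All of these measures derive from the confusion matrix or from a ranking of prediction scores. A self-contained sketch using the standard definitions; it assumes the AUC on this slide is the area under the ROC curve and computes it as the probability that a randomly chosen SUMOylated lysine scores higher than a randomly chosen negative one (ties count half). Function names are illustrative:

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Matthews correlation coefficient (CC), sensitivity, specificity, accuracy."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: CC = 0 when any row or column of the matrix is empty.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "CC": cc,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(scores, labels) -> float:
    """ROC AUC (assumed) as P(score of a random positive > score of a random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairwise wins; ties contribute one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# All-negative baseline on the dataset counts (241 positive, 5,741 negative lysines).
print(confusion_metrics(tp=0, fp=0, tn=5741, fn=241))
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The first call gives an accuracy of about 0.96 but a CC of 0, which is one reason CC is a better yardstick than accuracy for data this imbalanced.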
  15. Performance overview: SUMOsvm
  16. Comparison with existing methods
  17. Quest to improve performance
    • Protein structural features and evolutionary features
    • Separating SUMOylation sites from different species or compartments
    • Searching for other motifs using kernel hierarchical clustering
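Kernel hierarchical clustering only needs a kernel: any kernel k induces a distance d(x, y) = sqrt(k(x, x) + k(y, y) - 2k(x, y)) in feature space, which can drive ordinary agglomerative clustering of lysine windows. A naive sketch, assuming the p-spectrum kernel and average linkage (both illustrative choices, not necessarily those used in the project):

```python
import math
from collections import Counter
from itertools import combinations


def spectrum_kernel(x: str, y: str, p: int = 2) -> float:
    """p-spectrum kernel (an assumed choice): inner product of p-mer counts."""
    cx = Counter(x[i : i + p] for i in range(len(x) - p + 1))
    cy = Counter(y[i : i + p] for i in range(len(y) - p + 1))
    return float(sum(n * cy[kmer] for kmer, n in cx.items()))


def kernel_distance(x: str, y: str, kernel=spectrum_kernel) -> float:
    """Euclidean distance between x and y in the kernel's feature space."""
    d2 = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative round-off


def average_linkage(windows, n_clusters, kernel=spectrum_kernel):
    """Naive agglomerative clustering (average linkage) on kernel distances."""
    clusters = [[w] for w in windows]

    def linkage(a, b):
        # Mean pairwise kernel distance between two clusters.
        return sum(kernel_distance(x, y, kernel) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair (i < j, so deleting j keeps i valid).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


print(average_linkage(["VKQE", "IKQE", "AAAA", "AAAG"], n_clusters=2))
# [['VKQE', 'IKQE'], ['AAAA', 'AAAG']]
```

Clusters of SUMOylated windows that do not match [ILV]K.E would be candidates for the "other motifs" asked about in the summary.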
  18. Summary
    • The regular expression scanner is still the best classifier.
    • SUMO is more versatile than expected!
    • The road to better predictions
      • Are there other motifs?
      • Which features can discriminate?
      • Is the dataset biased?
    http://spot.colorado.edu/~colemab/Theatre_Resources/SumoBallerina.jpg
  19. Acknowledgment
    • Predictor/Analysis
      • Mikael Bodén
      • Fabian Buske
    • Dataset
      • Xu et al.
    • PhD Supervisors
      • Tim Bailey
      • Andrew Perkins
      • Mikael Bodén
    Other bioinformatics tools: STREAM – a practical workbench for modeling transcriptional regulation. www.bioinformatics.org.au/stream/

Denis Bauer, 2 years ago


This presentation is about predicting the sites with …


© All Rights Reserved
