Published in: Technology

  1. 1. Protein-Protein Interactions Prediction Sergey Knyazev November 21, 2012
  2. 2. Outline 1) Introduction 2) Protein-Protein interaction 3) Protein-Protein interaction databases 4) Protein-Protein interaction prediction
  3. 3. Introduction
  4. 4. There Is a Huge Number of Interactions in a Cell. Example: possible molecular interactions in a spreading cell.
  5. 5. There Are Many Ways Biomolecules Interact in a Cell
  6. 6. Protein-Protein Interaction
  7. 7. Protein-Protein Interaction ● Physical contacts with molecular docking between proteins that occur in a cell or in a living organism. ● Not just a "functional contact": the many other types of functional links between biomolecular entities (genes, proteins, metabolites, etc.) in living organisms should not be confused with physical protein interactions. ● A "specific contact", not just any proteins that bump into each other by chance. ● Interactions that a protein experiences while it is being made, folded, quality checked, or degraded should be excluded.
  8. 8. Protein-Protein Interaction (PPI) detection
  9. 9. PPIs Detection Methods
  10. 10. Protein-Protein Interaction Databases
  11. 11. Protein-Protein Interaction Databases ● BIND - Biomolecular Interaction Network Database; ● BioGRID - Biological General Repository for Interaction Datasets; ● DIP - Database of Interacting Proteins; ● IntAct - IntAct Molecular Interaction Database; ● HPRD - Human Protein Reference Database ● MINT - Molecular INTeraction database; ● PIPs - Human PPI Prediction database; ● STRING - Known and Predicted Protein-Protein Interactions.
  12. 12. PPI Network Derived from Databases
  13. 13. PIPs human PPIs database ● Contains predictions of 37,000 high-probability interactions, of which 34,000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. ● Interactions are predicted by a naive Bayesian model. The method combines information from gene co-expression, orthology, co-occurrence of domains, post-translational modifications, co-localization of the proteins within the cell and analysis of the local topology of the predicted PPI network. ● Based on the prediction algorithm described below.
  14. 14. Protein-Protein Interaction Prediction
  15. 15. Protein-Protein Interaction Prediction ● The prediction of human protein-protein interactions was investigated in a Bayesian framework by considering combinations of individual protein features known to be indicative of interaction. ● Seven individual features are used. ● The features are grouped into five distinct modules: Expression (E), Orthology (O), Combined (C), Disorder (D), Transitive (T).
  16. 16. Expression Module ● Data Source: – GDS596 from the Gene Expression Omnibus ● Description: – Gene co-expression profiles from 79 physiologically normal tissues obtained from various sources ● Scoring function: – Pearson correlation of coexpression over all conditions ● Bins: – 20 of equal size covering the correlation value range (-1 to +1)
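The expression module's scoring function can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the names `pearson` and `correlation_bin` are hypothetical, and "20 bins of equal size" is assumed here to mean equal-width intervals over the range -1 to +1.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_bin(r, n_bins=20):
    """Map a correlation r in [-1, 1] to an equal-width bin index 0 .. n_bins - 1."""
    index = int((r + 1.0) * n_bins / 2.0)
    # Clamp so that r == 1.0 (or tiny rounding overshoot) lands in the last bin.
    return min(max(index, 0), n_bins - 1)
```

In a setup like this, each protein pair would be assigned the bin of the correlation between the two genes' profiles across the tissues, and a likelihood ratio would then be estimated per bin.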
  17. 17. Orthology Module ● Data Source: – InParanoid, BIND, DIP and GRID databases ● Description: – Interactions of homologous protein pairs from yeast, fly, worm and human ● Scoring function: – Organism-based using InParanoid score ● 13 Bins: – High, medium and low confidence bins were defined for human protein pairs that have interacting orthologs in either yeast, fly or worm (for a total of 9 bins) – two bins for human pairs that have interacting paralogs in human (medium and low confidence) – one bin for human pairs that have interacting homologs in more than one organism – one bin for human pairs that have only noninteracting orthologs
  18. 18. Combined Module ● This module incorporates three distinct features in a non-naïve Bayesian framework: subcellular localization, domain co-occurrence and post-translational modification co-occurrence. ● Localization: – Data source: ● PSLT predictions – Description: ● PSLT is a human subcellular localization predictor that considers nine different compartments (ER, Golgi, cytosol, nucleus, peroxisome, plasma membrane, lysosome, mitochondria and extracellular) – Scoring function: ● Qualitative score: proximity of compartments – 4 bins: ● same, neighboring, different compartments, or not localized
  19. 19. Combined module ● Domain co-occurrence – Data source: ● InterPro and Pfam – Description: ● Protein domains and motifs – Scoring function: ● The chi-square test was used as a measure of the likelihood of co-occurrence of specific InterPro domains and motifs in protein pairs ● Chi-square scores were calculated for all pairs of domains/motifs that occurred in the training data – Bins: ● 5 covering range of Chi-square scores
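One common way to score co-occurrence with a chi-square test is Pearson's statistic on a 2x2 contingency table. The slide does not say how the table was built, so the layout below is an assumption: the four counts are the numbers of training protein pairs that contain both domains, only the first, only the second, or neither. The function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(both, only_a, only_b, neither):
    """Pearson chi-square statistic for a 2x2 table of domain co-occurrence counts.

    Returns 0.0 when a row or column total is zero (the statistic is undefined).
    """
    n = both + only_a + only_b + neither
    row_a_present = both + only_a
    row_a_absent = only_b + neither
    col_b_present = both + only_b
    col_b_absent = only_a + neither
    denominator = row_a_present * row_a_absent * col_b_present * col_b_absent
    if denominator == 0:
        return 0.0
    return n * (both * neither - only_a * only_b) ** 2 / denominator
```

A score of zero means the two domains occur together exactly as often as chance would predict; larger scores indicate a stronger departure from independence.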
  20. 20. Combined module ● PTM co-occurrence – Data source: ● HPRD and UniProt – Description: ● Post-translational modifications – Scoring function: – Bins: ● 4 covering range of PTM scores
  21. 21. Disorder Module ● Data source: – VSL2 predictions ● Description: – Prediction of protein intrinsic disorder ● Scoring function: – Sum of the percent disorder for each protein in a pair ● Bins: – 6 covering range of scoring function (0 to 200%)
  22. 22. Transitive Module ● Description: – Module that considers local topology of underlying network predicted using combinations of above features ● Scoring function: ● Bins: – 5 covering range of scoring function
  23. 23. Independence of the Modules ● The final likelihood ratio output by the predictor is only representative of the true likelihood of interaction of a protein pair if the modules considered are independent. If the modules were not independent, some likelihood ratios would likely be overestimated. ● Previous studies have demonstrated that some of the features considered here are indeed independent. ● Independence of all modules used in our predictor was verified by calculating Pearson correlation coefficients for all pairs of modules.
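The independence check described above can be sketched as a table of pairwise Pearson correlations between module scores. This is an illustrative sketch only: the function names and the input layout (one list of scores per module, aligned over the same protein pairs) are assumptions, not the authors' implementation.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def pairwise_module_correlations(scores):
    """scores maps a module name to its list of scores, one per protein pair.

    Returns a dict keyed by (module_a, module_b) in alphabetical order.
    """
    return {
        (a, b): pearson(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }
```

Coefficients close to zero for every pair of modules would support treating the modules as independent when their likelihood ratios are multiplied.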
  24. 24. Architecture of the Predictor and Likelihoods of the Modules
  25. 25. Posterior Odds Ratio Estimation ● f1, … , fn — features ● I — interaction ● ~I — non-interaction
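The formula on this slide did not survive in the transcript. Under the naive Bayes assumption that the features are conditionally independent, the standard form is: posterior odds = prior odds × the product over i of P(fi | I) / P(fi | ~I), where prior odds = P(I) / P(~I). A minimal Python sketch of that calculation follows; the function names and the numbers in the usage note are hypothetical.

```python
import math

def posterior_odds(prior_odds, likelihood_ratios):
    """Prior odds times the product of per-feature likelihood ratios.

    Each likelihood ratio is P(f_i | I) / P(f_i | ~I) for the bin that
    feature f_i falls into. Valid only if the features are independent.
    """
    return prior_odds * math.prod(likelihood_ratios)

def odds_to_probability(odds):
    """Convert odds o to the probability o / (1 + o)."""
    return odds / (1.0 + odds)
```

For example, with illustrative prior odds of 0.001 and likelihood ratios of 10, 5 and 2, the posterior odds are 0.001 × 100 = 0.1.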
  26. 26. Accuracy of the Predictors ● In order to analyze the predictions, five-fold cross validation experiments were performed and the area under partial ROC (receiver operator characteristic) curves (partial AUCs) measured. ● T is the total number of positives in the test set ● Ti is the number of positives that score higher than the ith highest scoring negative
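The definitions of T and Ti above match the standard ROC_n score, ROC_n = (1 / (n · T)) × (T1 + T2 + ... + Tn), which is the area under the ROC curve up to the n-th false positive, scaled to lie between 0 and 1. The slide's own formula is missing from the transcript, so treating it as ROC_n is an assumption. A minimal sketch, with a hypothetical function name:

```python
def partial_auc(positive_scores, negative_scores, n):
    """ROC_n score computed from the scores of test-set positives and negatives.

    For each of the n highest-scoring negatives, count the positives that
    score strictly higher (T_i), then normalise the sum by n * T.
    """
    if n < 1 or n > len(negative_scores):
        raise ValueError("n must be between 1 and the number of negatives")
    if not positive_scores:
        raise ValueError("at least one positive is required")
    total_positives = len(positive_scores)
    top_negatives = sorted(negative_scores, reverse=True)[:n]
    summed_hits = 0
    for threshold in top_negatives:
        summed_hits += sum(1 for s in positive_scores if s > threshold)
    return summed_hits / (n * total_positives)
```

A value of 1.0 means every positive outranks each of the n top-scoring negatives; a value of 0.0 means none does.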
  27. 27. Prediction Accuracy of Different Combinations of Modules
  28. 28. PPI Prediction by Single Module
  29. 29. PPI Prediction by Combination of Modules
  30. 30. Receiver Operator Characteristic (ROC)
  31. 31. Comparison with Other Interaction Datasets ● Estimated datasets: – Rhodes probabilistic dataset – LR400 (derived from our predictors) – Lehner orthology-derived dataset ● The false positive rates: ● Reference datasets: – Literature-mined Ramani dataset – Human Protein Reference Database (HPRD)
  32. 32. Comparison with Other Interaction Datasets
  33. 33. Independent Validation
  34. 34. Conclusion ● Predicted over 37000 human protein interactions ● Explored a subspace of the human interactome that has not been investigated by previous large interaction datasets.
  35. 35. References ● Javier De Las Rivas, Celia Fontanillo (2010). Protein–Protein Interactions Essentials: Key Concepts to Building and Analyzing Interactome Networks. ● Mark D. McDowall, Michelle S. Scott, Geoffrey J. Barton (2008). PIPs: human protein–protein interaction prediction database. ● Michelle S. Scott, Geoffrey J. Barton (2007). Probabilistic prediction and ranking of human protein-protein interactions.
  36. 36. Thank you!