probabilistic ranking

  1. Integration of Full-Coverage Probabilistic Functional Networks with Relevance to Specific Biological Processes. James, K., Wipat, A. & Hallinan, J., School of Computing Science, Newcastle University. Data Integration in the Life Sciences 2009
  2. Integrated functional networks <ul><li>Bring together data from a wide range of sources </li></ul><ul><li>High-throughput data are: </li></ul><ul><ul><li>Large (one node per gene; multiple interactions per node) </li></ul></ul><ul><ul><li>Noisy (false-positive rate 20–90%) </li></ul></ul><ul><ul><li>Incomplete (to different extents) </li></ul></ul><ul><li>Assess the quality of each dataset against a Gold Standard </li></ul><ul><li>Weighted edges reflect the combined probability that an edge actually exists </li></ul><ul><li>The network can be thresholded to draw attention to the most probable edges </li></ul><ul><li>Suitable for manual (interactive) or computational analysis </li></ul>
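The gold-standard assessment above can be made concrete with the log-likelihood score (LLS) of Lee et al. (2004), the integration method cited on slide 10: for a dataset E, LLS = ln[(P(L|E) / P(¬L|E)) / (P(L) / P(¬L))], where L means a gene pair is linked in the gold standard. The following is a minimal sketch, assuming the gold standard is supplied as explicit sets of positive and negative gene pairs; the helper names (`norm`, `log_likelihood_score`, `threshold`) and the toy genes are hypothetical, not taken from the authors' code.

```python
import math

Pair = tuple[str, str]

def norm(a: str, b: str) -> Pair:
    """Undirected edge: order the two gene names so that A-B == B-A."""
    return (a, b) if a <= b else (b, a)

def log_likelihood_score(dataset: set[Pair], gold_pos: set[Pair],
                         gold_neg: set[Pair]) -> float:
    """LLS = ln[(P(L|E) / P(~L|E)) / (P(L) / P(~L))], estimated by counting
    how many of the dataset's pairs fall in each gold-standard class."""
    pos_hits = len(dataset & gold_pos)
    neg_hits = len(dataset & gold_neg)
    if pos_hits == 0 or neg_hits == 0:
        raise ValueError("dataset must overlap both gold-standard classes")
    posterior_odds = pos_hits / neg_hits
    prior_odds = len(gold_pos) / len(gold_neg)
    return math.log(posterior_odds / prior_odds)

def threshold(edges: dict[Pair, float], cutoff: float) -> dict[Pair, float]:
    """Keep only edges whose integrated weight is at least `cutoff`."""
    return {e: w for e, w in edges.items() if w >= cutoff}

# Toy gold standard (hypothetical): 4 linked pairs, 8 unlinked pairs.
gold_pos = {norm("A", "B"), norm("C", "D"), norm("E", "F"), norm("G", "H")}
gold_neg = {norm("A", x) for x in "CDEFGHIJ"}
ds = {norm("B", "A"), norm("C", "D"), norm("A", "C"), norm("X", "Y")}
print(log_likelihood_score(ds, gold_pos, gold_neg))  # ln(4) ≈ 1.386
```

A positive LLS means the dataset links gold-standard pairs more often than the prior odds would predict; pairs absent from both gold-standard sets (here X-Y) do not affect the score.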
  3. Dataset bias <ul><li>Different experiment types provide different types of information </li></ul><ul><li>Overlap between datasets is usually low </li></ul><ul><ul><li>1% of synthetic lethal pairs physically interact </li></ul></ul><ul><li>Genes involved in the same process may be transcribed together </li></ul><ul><ul><li>e.g. ribosome biogenesis in yeast </li></ul></ul><ul><li>Some types of interaction may provide more information about a particular biological process than others </li></ul><ul><ul><li>Complex formation: Y2H </li></ul></ul><ul><ul><li>Signal transduction: phosphorylation </li></ul></ul>
  4. Bias in HTP datasets (figure from Myers and Troyanskaya, Bioinformatics 2007)
  5. Bias & Relevance <ul><li>Most network analyses relate to a Process of Interest (PoI) </li></ul><ul><li>PFINs tend to be very large </li></ul><ul><li>Interactions with equal probability will differ in utility for a given PoI </li></ul><ul><li>There have been several attempts to eliminate bias </li></ul><ul><ul><li>These result in loss of data </li></ul></ul><ul><li>We aim to exploit bias instead </li></ul><ul><ul><li>Relevance </li></ul></ul>
  6. Hypothesis <ul><li>Functional annotations can be applied to probabilistic integrated functional networks to identify interactions relevant to a biological process of interest </li></ul>
  7. Network integration
  8. Network integration
  9. Effect of D value
  10. Relevance scoring <ul><li>GO annotations </li></ul><ul><li>A one-tailed Fisher’s exact test scores each dataset for over-representation of genes related to the PoI </li></ul><ul><li>PoI genes: those annotated to the term of interest or any of its descendants, excluding annotations inferred from electronic annotation (IEA) </li></ul><ul><li>Control network integrated in order of confidence </li></ul><ul><li>Relevance network integrated in order of relevance </li></ul><ul><li>We use the integration method of Lee et al. (2004), but the approach can be applied to any network and any data integration algorithm </li></ul>
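The relevance score and the two integration orders can be sketched as follows. A one-tailed Fisher's exact test for over-representation is the upper tail of the hypergeometric distribution; the integration step uses the weighted sum of Lee et al. (2004), WS = Σ_i L_i / D^(i-1), where L_i are the LLS values of the datasets supporting a gene pair, taken in order, and D is presumably the parameter examined on slides 9 and 13. Assumptions, all hypothetical: relevance is tested on the genes a dataset covers, datasets are ranked by ascending p-value, and "integrated in order of relevance" means the relevance ranking replaces the confidence ranking when ordering the L_i. Function names and numbers are illustrative only.

```python
from math import comb

def fisher_one_tailed(k: int, n: int, K: int, N: int) -> float:
    """Over-representation p-value P(X >= k), X ~ Hypergeometric: N genes in
    total, K of them PoI genes, n genes covered by the dataset, k of those PoI."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

def weighted_sum(lls_in_order: list[float], D: float) -> float:
    """WS = sum_i L_i / D**(i-1). D = 1 gives a plain sum; as D grows the
    result approaches the first score in the order."""
    return sum(s / D ** i for i, s in enumerate(lls_in_order))

def integrate_edge(evidence: dict[str, float], D: float, order: list[str]) -> float:
    """Integrate one gene pair's evidence {dataset: LLS}, taking the supporting
    datasets in the given order (most confident or most relevant first)."""
    return weighted_sum([evidence[d] for d in order if d in evidence], D)

# Toy example (hypothetical): 10 genes in total, 4 of them PoI genes.
pvals = {
    "ds1": fisher_one_tailed(k=1, n=3, K=4, N=10),  # 1 of its 3 genes in the PoI
    "ds2": fisher_one_tailed(k=3, n=3, K=4, N=10),  # all 3 genes in the PoI
}
relevance_order = sorted(pvals, key=pvals.get)            # smallest p-value first
lls = {"ds1": 3.0, "ds2": 2.0}                            # per-dataset confidence
confidence_order = sorted(lls, key=lls.get, reverse=True)  # highest LLS first
print(integrate_edge(lls, 2.0, confidence_order))  # control:   3.0 + 2.0/2 = 4.0
print(integrate_edge(lls, 2.0, relevance_order))   # relevance: 2.0 + 3.0/2 = 3.5
```

Because D down-weights every score after the first, changing only the order of the same LLS values changes the edge weight, which is how the relevance ordering can shift the network towards the PoI.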
  11. Relevance scoring
  12. Datasets <ul><li>Saccharomyces cerevisiae data from BioGRID v.38 </li></ul><ul><li>Split by PMID, duplicates removed </li></ul><ul><li>Datasets with > 100 interactions treated individually </li></ul><ul><ul><li>50 datasets, max 14,421 interactions </li></ul></ul><ul><li>Datasets with < 100 interactions grouped by BioGRID experimental category </li></ul><ul><ul><li>22 datasets, min 33 interactions </li></ul></ul><ul><li>Gene Ontology terms </li></ul><ul><ul><li>Telomere Maintenance (GO:0000723) </li></ul></ul><ul><ul><li>Ageing (GO:0007568) </li></ul></ul>
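The dataset preparation above can be sketched as a grouping step over interaction records. This is a minimal sketch, assuming each BioGRID record has already been reduced to (gene A, gene B, PMID, experimental system); the function name, record shape and toy records are hypothetical. The slide does not say where a publication with exactly 100 interactions goes, so sending it to the pooled group via the `> min_size` test is also an assumption.

```python
from collections import defaultdict

Record = tuple[str, str, int, str]  # (gene_a, gene_b, pmid, experimental_system)
Edge = tuple[str, str]

def build_datasets(records: list[Record], min_size: int = 100) -> dict[str, set[Edge]]:
    """Split interactions by PMID, dropping undirected duplicates; keep large
    publications as their own dataset and pool small ones by experimental system."""
    by_pmid: dict[int, set[tuple[Edge, str]]] = defaultdict(set)
    for a, b, pmid, system in records:
        pair = (a, b) if a <= b else (b, a)   # A-B and B-A are the same edge
        by_pmid[pmid].add((pair, system))

    datasets: dict[str, set[Edge]] = {}
    pooled: dict[str, set[Edge]] = defaultdict(set)
    for pmid, entries in by_pmid.items():
        pairs = {pair for pair, _ in entries}
        if len(pairs) > min_size:             # large study: its own dataset
            datasets[f"PMID:{pmid}"] = pairs
        else:                                 # small study: pool by system
            for pair, system in entries:
                pooled[system].add(pair)
    datasets.update(pooled)
    return datasets

# Toy records (hypothetical); min_size lowered so the split is visible.
records = [("A", "B", 1, "Two-hybrid"), ("B", "A", 1, "Two-hybrid"),
           ("A", "C", 1, "Two-hybrid"), ("C", "D", 1, "Two-hybrid"),
           ("E", "F", 2, "Synthetic Lethality"), ("G", "H", 3, "Synthetic Lethality")]
for name, pairs in build_datasets(records, min_size=2).items():
    print(name, sorted(pairs))
```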
  13. Choice of D value <ul><li>GO annotations </li></ul><ul><li>Assign function to each node based on the annotation of the neighbour joined by its highest-weighted edge </li></ul><ul><li>Leave-one-out cross-validation on the full network </li></ul><ul><li>Construct a Receiver Operating Characteristic (ROC) curve </li></ul><ul><ul><li>Area Under Curve (AUC) </li></ul></ul><ul><ul><li>Standard error SE(W) from the Wilcoxon statistic </li></ul></ul>
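The evaluation used to choose D can be sketched in three steps: score each node from its highest-weighted neighbour, compute the AUC as the Wilcoxon (Mann-Whitney) statistic, and attach the Hanley-McNeil standard error SE(W). This is a minimal sketch for a single GO term, assuming a node's score is the weight of its heaviest edge when that neighbour carries the term and 0 otherwise; the function names and the toy network are hypothetical.

```python
import math

Network = dict[str, dict[str, float]]  # node -> {neighbour: edge weight}

def top_neighbour_scores(network: Network, annotated: set[str]) -> dict[str, float]:
    """Score each node from its single highest-weighted neighbour: the edge
    weight if that neighbour carries the GO term, else 0.0. A node's own label
    is never read, so each prediction is already leave-one-out for this rule."""
    scores = {}
    for node, nbrs in network.items():
        if not nbrs:
            scores[node] = 0.0
            continue
        best, weight = max(nbrs.items(), key=lambda kv: kv[1])
        scores[node] = weight if best in annotated else 0.0
    return scores

def auc_wilcoxon(pos: list[float], neg: list[float]) -> float:
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ordered correctly, ties counting 1/2."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def se_hanley_mcneil(auc: float, n_pos: int, n_neg: int) -> float:
    """Standard error SE(W) of the AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(max(var, 0.0))

# Toy network (hypothetical): a and b carry the term, c and d do not.
net = {"a": {"b": 0.9, "c": 0.2}, "b": {"a": 0.9},
       "c": {"a": 0.2, "d": 0.5}, "d": {"c": 0.5}}
annotated = {"a", "b"}
scores = top_neighbour_scores(net, annotated)
pos = [scores[v] for v in net if v in annotated]
neg = [scores[v] for v in net if v not in annotated]
auc = auc_wilcoxon(pos, neg)
print(auc, se_hanley_mcneil(auc, len(pos), len(neg)))  # 1.0 0.0
```

To choose D under these assumptions, one would rebuild the integrated network for each candidate D, repeat this evaluation, and compare the AUCs together with their standard errors.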
  14. Classifier output [figure: a classification threshold splits positives and negatives into TP, TN, FP and FN; moving the threshold one way increases specificity, the other way increases sensitivity]
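The threshold picture on this slide translates directly into counts. This minimal sketch assumes that every node scoring at or above the threshold is called positive, and uses toy scores; both are hypothetical choices. Sweeping the threshold over all observed scores gives the ROC curve, and the trapezoidal area under those points equals the Wilcoxon AUC for the same scores.

```python
def confusion(pos: list[float], neg: list[float], threshold: float) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling a node positive when score >= threshold."""
    tp = sum(s >= threshold for s in pos)
    fp = sum(s >= threshold for s in neg)
    return tp, fp, len(neg) - fp, len(pos) - tp

def sensitivity_specificity(pos: list[float], neg: list[float],
                            threshold: float) -> tuple[float, float]:
    tp, fp, tn, fn = confusion(pos, neg, threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_points(pos: list[float], neg: list[float]) -> list[tuple[float, float]]:
    """(1 - specificity, sensitivity) for each distinct score used as the
    threshold, strictest first, preceded by the origin."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        sens, spec = sensitivity_specificity(pos, neg, t)
        points.append((1 - spec, sens))
    return points

def trapezoid_area(points: list[tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

pos, neg = [0.9, 0.8, 0.4], [0.7, 0.3]  # toy scores (hypothetical)
print(sensitivity_specificity(pos, neg, 0.5))  # (0.666..., 0.5)
print(trapezoid_area(roc_points(pos, neg)))    # 0.833... (= 5/6)
```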
  15. ROC Curves
  16. D value
  17. D value
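  18. Ranking
<table>
<tr><th>Dataset</th><th>Conf. Score</th><th>Conf. Rank</th><th>Ageing Rank</th><th>Telomere Rank</th><th>A&T Rank</th></tr>
<tr><td>1</td><td>6.6937</td><td>1</td><td>4</td><td>6</td><td>6</td></tr>
<tr><td>2</td><td>5.7054</td><td>2</td><td>8</td><td>7</td><td>8</td></tr>
<tr><td>3</td><td>5.7040</td><td>3</td><td>6</td><td>2</td><td>2</td></tr>
<tr><td>4</td><td>5.0842</td><td>4</td><td>3</td><td>4</td><td>4</td></tr>
<tr><td>5</td><td>4.9335</td><td>5</td><td>7</td><td>5</td><td>5</td></tr>
<tr><td>6</td><td>4.5212</td><td>6</td><td>5</td><td>8</td><td>7</td></tr>
<tr><td>7</td><td>4.4641</td><td>7</td><td>1</td><td>1</td><td>1</td></tr>
<tr><td>8</td><td>4.2253</td><td>8</td><td>2</td><td>3</td><td>3</td></tr>
</table>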
  19. Results
  20. Evaluation: Clustering <ul><li>MCL (Markov Cluster) algorithm </li></ul><ul><li>Takes both network topology and edge weights into account </li></ul>
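MCL alternates expansion (squaring the column-stochastic matrix, which spreads random-walk flow) with inflation (raising each entry to a power and renormalising each column, which strengthens strong flows and weakens weak ones) until the matrix stops changing; edge weights enter through the initial matrix. The following is a minimal dense, pure-Python sketch for toy networks, assuming symmetric non-negative weights, added self-loops of weight 1 and inflation 2.0. These parameter choices are illustrative, not those used in the study, and real networks need a sparse implementation with pruning such as van Dongen's `mcl` program.

```python
def _normalize_columns(m: list[list[float]]) -> list[list[float]]:
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]

def _matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mcl(adj: list[list[float]], inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-9, eps: float = 1e-6) -> list[list[int]]:
    """Cluster a weighted adjacency matrix; returns sorted lists of node indices."""
    n = len(adj)
    # Add self-loops, then make every column sum to 1 (random-walk matrix).
    m = _normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                            for i in range(n)])
    for _ in range(max_iter):
        expanded = _matmul(m, m)                                      # expansion
        inflated = _normalize_columns([[v ** inflation for v in row]  # inflation
                                       for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each attractor row (non-zero diagonal) collects the nodes that flow to it.
    clusters = {frozenset(j for j in range(n) if m[i][j] > eps)
                for i in range(n) if m[i][i] > eps}
    return sorted(sorted(c) for c in clusters)

# Two disconnected triangles plus an isolated node (toy data).
adj = [[0.0] * 7 for _ in range(7)]
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    adj[i][j] = adj[j][i] = 1.0
print(mcl(adj))  # [[0, 1, 2], [3, 4, 5], [6]]
```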
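  21. Results
<table>
<tr><th>Net</th><th>Bias</th><th>Clusters</th><th>% COI</th><th>>2 nodes</th><th>>3 nodes</th><th>>4 nodes</th></tr>
<tr><td>A</td><td>C</td><td>573</td><td>21.29</td><td>26.14</td><td>28.86</td><td>35.19</td></tr>
<tr><td>A</td><td>R</td><td>523</td><td>22.37</td><td>27.73</td><td>31.75</td><td>36.92</td></tr>
<tr><td>T</td><td>C</td><td>573</td><td>5.06</td><td>6.14</td><td>7.02</td><td>6.53</td></tr>
<tr><td>T</td><td>R</td><td>508</td><td>6.50</td><td>7.73</td><td>8.90</td><td>8.59</td></tr>
<tr><td>C</td><td>C</td><td>573</td><td>24.26</td><td>29.55</td><td>33.80</td><td>37.98</td></tr>
<tr><td>C</td><td>R</td><td>523</td><td>24.67</td><td>29.83</td><td>33.33</td><td>38.35</td></tr>
</table>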
  22. Cluster annotation
  23. Conclusions <ul><li>Function assignment is statistically significantly better, but probably not of practical use </li></ul><ul><ul><li>Simplistic algorithm </li></ul></ul><ul><ul><li>Dependent on existing annotation </li></ul></ul><ul><li>Clustering </li></ul><ul><ul><li>Fewer, larger clusters </li></ul></ul><ul><ul><li>Clusters draw together genes of interest </li></ul></ul><ul><ul><li>Different GO terms perform differently </li></ul></ul><ul><li>Relevance networks are better for interactive exploration </li></ul><ul><ul><li>Related PoIs </li></ul></ul>
  24. Future work <ul><li>Which GO terms work best with relevance? </li></ul><ul><li>Why? </li></ul><ul><li>Further exploration of experiment types and relevance </li></ul><ul><li>Implement algorithms in Ondex </li></ul><ul><li>Optimize function assignment and clustering algorithms </li></ul><ul><li>Extend technique to edges </li></ul>
  25. Acknowledgements <ul><li>Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN) </li></ul><ul><li>Newcastle Systems Biology Resource Centre </li></ul><ul><li>Research Councils of the UK </li></ul><ul><li>BBSRC SABR Ondex Project </li></ul><ul><li>Integrative Bioinformatics Research Group </li></ul>
