probabilistic ranking


  1. Integration of Full-Coverage Probabilistic Functional Networks with Relevance to Specific Biological Processes
     James, K., Wipat, A. & Hallinan, J.
     School of Computing Science, Newcastle University
     Data Integration in the Life Sciences 2009
  2. Integrated functional networks
     • Bring together data from a wide range of sources
     • High-throughput data is
         • Large (one node per gene; multiple interactions per node)
         • Noisy (20–90% false positives)
         • Incomplete (to different extents)
     • Assess the quality of each dataset against a Gold Standard
     • Weighted edges reflect the summed probability that the edge actually exists
     • The network can be thresholded to draw attention to the most probable edges
     • Suitable for manual (interactive) or computational analysis
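As a minimal sketch of the thresholding step, assuming the network is held as a mapping from gene pairs to edge weights. The gene names, weights and the `threshold_network` helper are hypothetical and do not come from the slides.

```python
# Hypothetical toy network: (gene_a, gene_b) -> edge weight (confidence score).
edges = {
    ("geneA", "geneB"): 0.92,
    ("geneA", "geneC"): 0.15,
    ("geneB", "geneD"): 0.61,
    ("geneC", "geneD"): 0.08,
}


def threshold_network(edges, cutoff):
    """Return a new edge mapping holding only edges with weight >= cutoff."""
    return {pair: weight for pair, weight in edges.items() if weight >= cutoff}


# Keep only the most probable edges for interactive inspection.
strong_edges = threshold_network(edges, 0.5)
```

Raising the cutoff trades coverage for confidence: fewer edges remain, but each is better supported.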
  3. Dataset bias
     • Different experiment types provide different types of information
     • Overlap between datasets is usually low
         • 1% of synthetic lethal pairs physically interact
     • Genes involved in the same process may be transcribed together
         • Ribosomal biogenesis in yeast
     • Some types of interaction may provide more information about a particular biological process
         • Complex formation: yeast two-hybrid (Y2H)
         • Signal transduction: phosphorylation
  4. Bias in HTP datasets
     From Myers and Troyanskaya, Bioinformatics 2007.
  5. Bias & Relevance
     • Most network analyses are related to a Process of Interest (PoI)
     • PFINs tend to be very large
     • Interactions with equal probability will have different utility
     • Several attempts to eliminate bias
         • Loss of data
     • We aim to use bias
         • Relevance
  6. Hypothesis
     • Functional annotations can be applied to probabilistic integrated functional networks to identify interactions relevant to a biological process of interest
  7. Network integration
  8. Network integration
  9. Effect of D value
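The transcript does not reproduce the integration formula. Assuming the weighted sum of Lee et al. (2004), in which the i-th ranked dataset score is divided by D^(i-1), the effect of D and of the integration order can be sketched as below. The function names, the (confidence, relevance) tuple layout and the numbers are illustrative assumptions.

```python
def weighted_sum(ordered_scores, d):
    """Sum scores in the given order, dividing the i-th (0-based) score by d**i."""
    return sum(score / d ** i for i, score in enumerate(ordered_scores))


def integrate_edge(supporting_datasets, d, order_key):
    """Combine the confidence scores of the datasets supporting one edge.

    supporting_datasets: list of (confidence, relevance) tuples (assumed layout).
    order_key: function choosing the value that decides the integration order.
    """
    ordered = sorted(supporting_datasets, key=order_key, reverse=True)
    return weighted_sum([confidence for confidence, _ in ordered], d)


# Hypothetical edge supported by two datasets: (confidence, relevance).
datasets = [(6.0, 0.2), (4.0, 0.9)]

control = integrate_edge(datasets, 2.0, order_key=lambda ds: ds[0])    # by confidence
relevance = integrate_edge(datasets, 2.0, order_key=lambda ds: ds[1])  # by relevance
unweighted = integrate_edge(datasets, 1.0, order_key=lambda ds: ds[0])
```

With D = 1 every dataset counts fully and the order does not matter. Larger D values down-weight lower-ranked datasets, so the order of integration changes the edge score.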
  10. Relevance scoring
      • Gene Ontology (GO) annotations
      • One-tailed Fisher’s exact test scores the over-representation of genes related to the PoI
      • PoI: the term of interest plus any descendants, excluding annotations inferred from electronic annotation (IEA)
      • Control network: datasets integrated in order of confidence
      • Relevance network: datasets integrated in order of relevance
      • We use the method of Lee et al. (2004), but the approach can be applied to any network and any data integration algorithm
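One way to compute a one-tailed over-representation score is the upper tail of the hypergeometric distribution. The sketch below assumes each dataset is summarised by four counts. The function name, argument names and example counts are hypothetical, and the slides do not say how the resulting p-value is turned into a relevance ranking.

```python
from math import comb


def fisher_one_tailed(hits, dataset_size, annotated_total, population):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated_total, dataset_size).

    hits:            genes in the dataset annotated to the PoI
    dataset_size:    genes in the dataset
    annotated_total: genes in the whole population annotated to the PoI
    population:      all genes considered
    """
    upper = min(dataset_size, annotated_total)
    total = comb(population, dataset_size)
    tail = sum(
        comb(annotated_total, i) * comb(population - annotated_total, dataset_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes overall, 4 annotated to the PoI,
# a dataset covering 3 genes of which 2 are annotated.
p_value = fisher_one_tailed(hits=2, dataset_size=3, annotated_total=4, population=10)
```

A smaller p-value indicates stronger over-representation of PoI genes in the dataset, which under this reading would place it earlier in the relevance order.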
  11. Relevance scoring
  12. Datasets
      • Saccharomyces cerevisiae data from BioGRID v.38
      • Split by PMID, duplicates removed
      • Datasets with > 100 interactions treated individually
          • 50 datasets, max 14,421 interactions
      • Datasets with < 100 interactions grouped by BioGRID experimental categories
          • 22 datasets, min 33 interactions
      • Gene Ontology terms
          • Telomere Maintenance (GO:0000723)
          • Ageing (GO:0007568)
  13. Choice of D value
      • GO annotations
      • Assign function to nodes based on the annotation of the neighbour with the highest weighted edge
      • Leave-one-out on the full network
      • Construct a Receiver Operating Characteristic (ROC) curve
          • Area Under Curve (AUC)
          • SE(W) using the Wilcoxon statistic
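The evaluation steps above can be sketched as follows. The neighbour lookup follows the slide directly. The AUC is computed as the Wilcoxon (Mann-Whitney) statistic, and the standard error uses the Hanley and McNeil (1982) approximation, which is an assumption about what "SE(W)" refers to. All function names, gene names and scores are hypothetical.

```python
from math import sqrt


def assign_from_best_neighbour(gene, edges, annotations):
    """Return (annotation set, edge weight) of the neighbour joined to `gene`
    by its highest-weighted edge. The gene's own annotation is never consulted,
    which is what a leave-one-out evaluation needs."""
    best_weight, best_neighbour = None, None
    for (a, b), weight in edges.items():
        if gene not in (a, b):
            continue
        other = b if a == gene else a
        if best_weight is None or weight > best_weight:
            best_weight, best_neighbour = weight, other
    if best_neighbour is None:
        return set(), 0.0
    return annotations.get(best_neighbour, set()), best_weight


def auc_wilcoxon(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def se_hanley_mcneil(auc, n_pos, n_neg):
    """Standard error of the AUC (Hanley & McNeil approximation; assumed here)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return sqrt(variance)


# Hypothetical toy data.
edges = {("g1", "g2"): 0.9, ("g1", "g3"): 0.4}
annotations = {"g2": {"GO:0000723"}, "g3": {"GO:0007568"}}
predicted, weight = assign_from_best_neighbour("g1", edges, annotations)

auc = auc_wilcoxon([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
se = se_hanley_mcneil(auc, 3, 3)
```

Repeating the evaluation for several D values and comparing AUCs with their standard errors gives a basis for choosing D.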
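  14. Classifier output
      [Figure: classifier output divided by a classification threshold into positives and negatives (TP, TN, FP, FN); moving the threshold in one direction increases specificity, in the other increases sensitivity.]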
  15. ROC Curves
  16. D value
  17. D value
  18. Ranking

      Dataset  Conf. Score  Conf. Rank  Ageing Rank  Telomere Rank  A&T Rank
      1        6.6937       1           4            6              6
      2        5.7054       2           8            7              8
      3        5.7040       3           6            2              2
      4        5.0842       4           3            4              4
      5        4.9335       5           7            5              5
      6        4.5212       6           5            8              7
      7        4.4641       7           1            1              1
      8        4.2253       8           2            3              3
  19. Results
  20. Evaluation - Clustering
      • MCL (Markov Cluster) algorithm
      • Considers network topology and edge weights
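To illustrate how a Markov-based clustering uses both topology and edge weights, here is a simplified pure-Python sketch of the expansion and inflation loop. It is not the reference MCL implementation (no pruning or convergence check), and the function names, parameters and toy network are assumptions made for the example.

```python
def normalise_columns(matrix):
    """Scale each column in place so that it sums to 1."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        for i in range(n):
            matrix[i][j] /= total
    return matrix


def mcl(weights, inflation=2.0, iterations=50, tol=1e-6):
    """Simplified Markov clustering on a symmetric weight matrix."""
    n = len(weights)
    # Add self-loops so every column has a non-zero sum.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    normalise_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: raise entries to a power, then renormalise the columns.
        m = normalise_columns([[x ** inflation for x in row] for row in expanded])
    # Each row with non-negligible entries defines one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > tol)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


def to_matrix(edges, n):
    """Build a symmetric weight matrix from {(i, j): weight}."""
    w = [[0.0] * n for _ in range(n)]
    for (i, j), weight in edges.items():
        w[i][j] = w[j][i] = weight
    return w


# Hypothetical network: two weighted triangles with no edges between them.
toy_edges = {(0, 1): 0.9, (0, 2): 0.9, (1, 2): 0.9,
             (3, 4): 0.8, (3, 5): 0.8, (4, 5): 0.8}
clusters = mcl(to_matrix(toy_edges, 6))
```

Higher inflation values generally produce more, smaller clusters, so the parameter would need tuning on a real network.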
  21. Results

      Net  Bias  Clusters  % COI  >2 nodes  >3 nodes  >4 nodes
      A    C     573       21.29  26.14     28.86     35.19
      A    R     523       22.37  27.73     31.75     36.92
      T    C     573       5.06   6.14      7.02      6.53
      T    R     508       6.50   7.73      8.90      8.59
      C    C     573       24.26  29.55     33.80     37.98
      C    R     523       24.67  29.83     33.33     38.35
  22. Cluster annotation
  23. Conclusions
      • Function assignment is statistically significantly better, but probably not practically useful
          • Simplistic algorithm
          • Dependent upon existing annotation
      • Clustering
          • Fewer, larger clusters
          • Clusters draw together genes of interest
          • Different GO terms perform differently
      • Relevance networks are better for interactive exploration
          • Related PoIs
  24. Future work
      • Which GO terms work best with relevance?
      • Why?
      • Further exploration of experimental types and relevance
      • Implement algorithms in Ondex
      • Optimize function assignment / clustering algorithms
      • Extend technique to edges
  25. Acknowledgements
      • Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN)
      • Newcastle Systems Biology Resource Centre
      • Research Councils of the UK
      • BBSRC SABR Ondex Project
      • Integrative Bioinformatics Research Group
