
Pizza club - March 2017 - Gaia



Á. Tényi et al., ChainRank, a chain prioritisation method for contextualisation of biological networks, BMC Bioinformatics 2016, 17:17, DOI: 10.1186/s12859-015-0864-x

Published in: Science


  1. 22 March 2017
  2. Background & Aim
     • More and more (genome-wide) data is available, but it is still not used optimally.
     • Genome-wide networks are too big and complex to be interpreted in a meaningful way.
     • Knowledge-based networks are in general non-specific, e.g. canonical pathways, PPI networks…
     Aim: develop a flexible method to identify context-specific subnetworks.
  3. Approach
     • Model the flow of information using chains of interactions.
     • Chains = simple paths: a sequence of interactions (e.g. protein modifications) that connects one start point and one end point.
     • Multiple chains can exist between a given pair of start and end proteins: which subnetwork is the most meaningful?
     • Prioritisation of the chains can be based on many possible scores: gene expression, functional module identification, …
     • Here the authors present a general tool for combining multiple kinds of biological information as chain scores: ChainRank.
  4. Methods
     1. Search for all chains between user-defined start and end nodes in the network.
     2. Annotate the nodes with scores in order to calculate each chain's score and p-value.
  5. Subnetwork
     Restrict the network by a heuristic breadth-first search from the fixed initial proteins to the final one, using 2 criteria:
     1. Maximal length allowed = length of the shortest path between the initial and final node.
     2. Prefer the integration of highly connected proteins (canonical signalling interactors).
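
The length criterion on the Subnetwork slide can be sketched as follows: a breadth-first search gives the shortest-path length between the start and end protein, and a depth-first enumeration then collects every simple path (chain) that is no longer than that. This is only a minimal sketch, not the ChainRank implementation. The toy network, the function names `shortest_len` and `chains`, and the decision to ignore the connectivity preference (criterion 2) are all assumptions made for illustration.

```python
from collections import deque

def shortest_len(graph, start, end):
    """Number of edges on a shortest path from start to end (None if unreachable)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            return dist[node]
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

def chains(graph, start, end, max_len):
    """All simple paths from start to end with at most max_len edges."""
    found = []

    def walk(path):
        node = path[-1]
        if node == end:
            found.append(list(path))
            return
        if len(path) - 1 >= max_len:  # length budget used up
            return
        for nxt in graph.get(node, ()):
            if nxt not in path:  # simple path: never revisit a node
                path.append(nxt)
                walk(path)
                path.pop()

    walk([start])
    return found

# Hypothetical toy network (directed adjacency lists), for illustration only.
toy = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["D"], "D": []}
limit = shortest_len(toy, "A", "D")
result = chains(toy, "A", "D", limit)
```

In this toy network the longer route through `E` is discarded because it exceeds the shortest-path length.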
  6. Scoring scheme
     • Chain score = (sum of node scores) / (number of nodes)
     • Node scores used
       1. Localisation: mean expression variability across the studied tissue vs. mean expression variability across all others -> gene expression
       2. Relevance: occurrence of each protein among the significant ones across studies -> gene expression, protein modifications, metabolism…
       3. Connectivity: degree centrality -> topology
     • Combination of scores
       1. Weighted product of normalised scores
       2. Filtering: pre-filter chains by score S1 and rank them by score S2
       3. Intersection: keep only chains that pass the filter on all scores
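
A small sketch of the scoring scheme may help make the combinations concrete. It assumes the chain score is the mean of the node scores along the chain, that node scores are already normalised to (0, 1], and that the filter is a simple threshold on the S1 chain score. The function names, the toy scores, and the threshold are hypothetical and not taken from the paper.

```python
from math import prod

def chain_score(chain, node_scores):
    """Mean of the node scores along one chain."""
    return sum(node_scores[n] for n in chain) / len(chain)

def weighted_product(scores, weights):
    """Combine normalised scores as the product of score ** weight."""
    return prod(s ** w for s, w in zip(scores, weights))

def filter_then_rank(chains, s1, s2, keep):
    """Keep chains whose S1 chain score passes `keep`, then rank by S2 (highest first)."""
    passed = [c for c in chains if keep(chain_score(c, s1))]
    return sorted(passed, key=lambda c: chain_score(c, s2), reverse=True)

# Hypothetical node scores for two toy chains, for illustration only.
localisation = {"A": 0.5, "B": 0.25, "C": 0.75, "D": 1.0}
connectivity = {"A": 0.5, "B": 1.0, "C": 0.25, "D": 0.75}
toy_chains = [["A", "B", "D"], ["A", "C", "D"]]

# Pre-filter on connectivity (assumed threshold 0.5), then rank by localisation.
ranked = filter_then_rank(toy_chains, connectivity, localisation, lambda s: s >= 0.5)
combined = weighted_product([0.5, 0.25], [1, 2])
```

The intersection strategy would apply a `keep` test per score and retain only chains that pass all of them.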
  7. Results
     • Application to chronic obstructive pulmonary disease (COPD)
     • Network used: experimental interactions from different public databases + COPD knowledge base (10k nodes, 62k interactions)
     • Significance: comparison to chains in random networks
     • Evaluation: enrichment of the top-ranked chains in proteins from gold-standard pathways
     • Improvement metric = (precision of ranking) / (precision of random ranking)
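
The evaluation quantities on the Results slide can be illustrated with a short sketch. Precision and the improvement ratio follow the slide directly. The empirical p-value is an assumption about how "comparison to chains in random networks" might be computed (the share of random-network chain scores at least as high as the observed one, with an add-one correction). All names and numbers below are hypothetical.

```python
def precision(predicted, gold):
    """Fraction of predicted proteins that belong to the gold standard."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(gold)) / len(predicted)

def improvement(ranked_precision, random_precision):
    """Improvement metric: precision of the ranking over precision of a random ranking."""
    return ranked_precision / random_precision

def empirical_p(observed, random_scores):
    """Assumed significance test: add-one corrected share of random scores >= observed."""
    hits = sum(1 for s in random_scores if s >= observed)
    return (hits + 1) / (len(random_scores) + 1)

# Hypothetical proteins and scores, for illustration only.
top_chain_proteins = ["A", "B", "C", "D"]
gold_standard = {"A", "B", "X"}

p_ranked = precision(top_chain_proteins, gold_standard)
ratio = improvement(p_ranked, 0.25)
p_value = empirical_p(0.9, [0.1, 0.5, 0.95])
```

A ratio above 1 means the ranking recovers gold-standard proteins better than a random ranking would.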
  8. Score definitions and combinations tested (figure slide)
     • Localisation: expression variability across the studied tissue vs. across all others
     • Relevance: occurrence of each protein among the significant ones across COPD-related studies
     • Connectivity: degree centrality
     • Combination by weighted product: no improvement
     • Filtering: connectivity < 0.05, ranked by localisation
     • Intersection: connectivity and localisation
     • Filtering: top quartile of localisation, ranked by relevance
     • Intersection: localisation and relevance
     • Panels: IGF-Akt proximity subnetwork; MAPK proximity subnetwork
  9. Results for the best 50 chains
     • Other methods: recall 50-85%, precision 18-42%
     • Here (max): recall 67%, precision 30%
  10. Conclusions and claims
     • 50% improvement in finding gold-standard proteins (compared to random); combining scores does even better (x2.5)
     • 11% improvement of the AUC (compared to random)
     • Generic tool applicable to different network types (GRN, metabolic networks)
     • The scores should be selected according to the scientific question
     • Applications
       • Causal, mechanistic connection?
       • Common mechanisms driving different diseases
       • Reducing computational models
       • Synthetic lethality