Network and pathway analysis in systems biology - Melissa Davis


Published in: Technology

  1. 1. Network and pathway analysis in systems biology Dr. Melissa Davis The University of Queensland Institute for Molecular Bioscience
  2. 2. • Regulated by complex cellular circuitry – Extra- and intra-cellular signals are transacted through networks of interacting molecules – Changes in cellular signalling result in the activation or repression of programs of gene expression • Protein interactions, metabolic network, signalling pathways, gene regulatory networks Source: Cell, Volume 144, Issue 5, Pages 646-674
  3. 3. Interpret similarities in large cohorts – Statistics of feature selection – Interpretation of single ‘omics results – Discovery of biomarkers Interpret individual differences for patient-specific treatment – Robust n=1 analysis methods – Interpret multiple ‘omics results simultaneously – Discover diagnostic features Reductionist biology – List of molecules implicated in condition – Selection of molecule of interest – Hypothesis generation – Experiment to determine role of molecule Systems biology – Networks, pathways implicated in condition – Identify perturbed or deregulated systems – Hypothesis generation – Experiment to determine responses of the system
  4. 4. So… • We need new methods for data interpretation • We need better knowledge bases – Context specificity – Molecular resolution – Biological complexity • All this requires knowledge engineering: • Computational biology can enrich biological context, improve molecular resolution and capture missing biological complexity • New informatic methods in cancer research that enable comparative systems analysis for individuals
  5. 5. Computational biology of molecular interaction networks
  6. 6. Protein interaction networks • Advantages – Increasing coverage – Powerful insights – Increasing quality – Visualisation • Disadvantages – Inadequate metadata – Poor molecular resolution – Aggregated conditions – Flattened, generic PPI network – Little evidence of the biological specificity – Meaning of interactions missing
  7. 7. Thakur et al., (1997) • Protein-protein interactions are useful – Understand subcellular localisation (Chin, et al., 2009) – Perform comparative mammalian systems analysis (Chin, Davis and Ragan, 2009) – Interpret proteomics data in prostate cancer (Inder et al., 2011, Inder, Davis and Hill, 2012) • PPI data are usually assigned to a reference protein, or even gene • Previously characterised the impact of alternative splicing on subcellular localisation (Davis et al. 2006) • Little or no isoform specificity exists in most PPI datasets • Does alternative splicing generate protein isoforms that have different interaction potential? Guo and Qui, (2011)
  8. 8. • Davis et al., Mol. BioSyst., 2012: Alternative splicing of domains rewires protein-protein interactions – Observations are not tissue specific, no analysis of disordered regions • Buljan et al., Mol. Cell, 2012: Tissue-specific exons enriched for disordered regions favouring binding • Ellis et al., Mol. Cell, 2012: Cassette exons regulated by neural-specific splicing regulator modulate PPIs – Exon-based studies do not address splicing of protein domains explicitly, use exons not isoforms as unit of analysis, limited expression data
  9. 9. Rewiring the dynamic interactome Davis et al. Mol. BioSyst., 2012,8, 2054-2066 8860 transcriptional units (genes) with both alternative isoforms and protein interaction domains: 3DID Stein et al., (2010) H-Invitational + Fantom 3 PPI Shin et al., (2009)
  10. 10. What is happening with the interactions? – 1787 genes involved in known interactions 1287 215 644 • STAT1 (isoform cannot bind CREBBP) • AKT1 (isoform retains kinase domains but loses PH) • PTPN11 (isoform lacks an SH2 domain and cannot bind JAK2) • GRB2 (isoform GRB3-3 participates in distinct interactions and signalling)
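The rewiring idea on this slide can be sketched as a simple lookup: an isoform pair is kept only if the two isoforms carry at least one pair of domains known to interact. This is a minimal illustration, not the published pipeline; the gene, isoform and domain names (`A-1`, `DOM_X`, and so on) and the function `isoform_interactions` are hypothetical placeholders rather than entries from 3DID or Ensembl.

```python
from itertools import product


def isoform_interactions(gene_ppis, isoforms_of, domains_of, ddi_pairs):
    """Return isoform pairs whose domain content supports a gene-level PPI.

    gene_ppis   : iterable of (gene_a, gene_b) known interactions
    isoforms_of : dict gene -> list of isoform identifiers
    domains_of  : dict isoform -> set of domain names
    ddi_pairs   : set of frozensets, each an interacting domain pair
    """
    supported = set()
    for gene_a, gene_b in gene_ppis:
        for iso_a, iso_b in product(isoforms_of[gene_a], isoforms_of[gene_b]):
            # Keep the pair if any domain of iso_a can bind any domain of iso_b.
            if any(
                frozenset((dom_a, dom_b)) in ddi_pairs
                for dom_a in domains_of[iso_a]
                for dom_b in domains_of[iso_b]
            ):
                supported.add((iso_a, iso_b))
    return supported


# Toy data (assumed for illustration): isoform A-2 has lost DOM_X.
gene_ppis = [("A", "B")]
isoforms_of = {"A": ["A-1", "A-2"], "B": ["B-1"]}
domains_of = {"A-1": {"DOM_X", "DOM_Y"}, "A-2": {"DOM_Y"}, "B-1": {"DOM_Z"}}
ddi_pairs = {frozenset(("DOM_X", "DOM_Z"))}

result = isoform_interactions(gene_ppis, isoforms_of, domains_of, ddi_pairs)
print(result)
```

In this toy case only the isoform that retains `DOM_X` keeps the interaction, which mirrors the PTPN11 example where loss of an SH2 domain removes the ability to bind JAK2.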
  11. 11. Current work: Tissue specific interactions • Updated the DDI data and PPI data • Illumina Bodymap 2.0: RNA seq data produced using the HiSeq 2000 (2010) • 16 phenotypically normal tissues: adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells 16 human tissues RNA-seq mapped (two replicates) diagnostic features AlexaSeq Cufflinks ? expressed
  12. 12. (69255 Isoform Interactions) (3627 Gene Interactions) PPI network: 14528 interactions Protein interaction domains (3DID): 2622 Isoform domain annotation (Ensembl): 151664 + + Ribosome Complex Neurotransmitter Complex
  13. 13. Domain and process results
      (a) Domain         Count
          Pkinase        393
          Pkinase_Tyr    389
          SH2            217
          Ras            183
          7tm_1          164
      (b) GO Biological Process (variable genes)       q-value
          Intracellular signaling cascade              6.78e-42
          Response to organic substance                6.26e-31
          Positive regulation of molecular function    7.98e-29
          Positive regulation of catalytic activity    7.76e-27
          Regulation of apoptosis                      3.55e-25
      Information regarding genes with variable domain architecture within the maximal PID network. (a) Five most common domain classes present in the variable genes showing an enrichment for signalling domain. (b) GO biological process enrichment scores for the same genes.
  14. 14. Pathway level analysis
  15. 15. • Protein isoforms functionally diverse • Interaction network is rewired by splicing of interaction domains • Identify interaction networks for specific tissues • Isoform variability: emerging theme of opposing function • Very strong enrichment for signalling proteins • Part of normal phenotypic diversity BUT also has a role in cancer and disease: – Isoforms of Gli1 (from Shh pathway) – MST1R (RON) isoforms (upstream of MAPK pathway) – P53 isoforms with dominant negative effect – Switch to developmentally restricted isoforms – Transcript variants and protein isoforms as potential diagnostic and therapeutic targets
  16. 16. Modelling information flow in pathways • Pathways contain a richer representation of biological information than PPI networks • Mechanistic models are desirable -> hypothesis generation • Kinetic parameters aren’t available for all reactions • Does network topology contain sufficient information for predicting system-level responses? HIF1A VHL CUL2
  17. 17. Topology matters Grouping of molecules into sets breaks connectivity and eliminates real crosstalk
  18. 18. Representation of multi-cellular interactions
  19. 19. Is there such a thing as a pathway at all? • Our concept of a pathway as a linear series of events is largely a fiction • Signalling proteins may be active in many pathways • More correct to think of this as a network AKT1 EGFR FGFR NGF PDGF SCF-KIT ERBB2 GPCR Immune system, Membrane trafficking, Gene expression, Hemostasis, Apoptosis, and Metabolism
  20. 20. PATHLOGIC-S • Extract the reaction network from REACTOME (BioPAX L3 model) • Convert to a Boolean logical model of the signal transduction network • No parameterisation • Enumerate the capabilities of the network Fearnley et al., 2012, PLOS One
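To make the idea of an unparameterised Boolean reading of a reaction network concrete, here is a minimal sketch. It is not the PATHLOGIC-S implementation: it assumes each reaction fires when all of its inputs are active (logical AND), that a molecule is active if any reaction produces it (logical OR), and it ignores inhibition. The reaction list and molecule names are hypothetical and are not extracted from Reactome.

```python
def propagate(reactions, active_inputs):
    """Return every molecule that becomes active from the given inputs.

    reactions     : list of (inputs, outputs) tuples of molecule names
    active_inputs : iterable of molecules switched on at the start
    """
    active = set(active_inputs)
    changed = True
    while changed:  # repeat until no reaction adds anything new
        changed = False
        for inputs, outputs in reactions:
            if set(inputs) <= active and not set(outputs) <= active:
                active |= set(outputs)
                changed = True
    return active


# Toy signalling cascade (assumed for illustration).
reactions = [
    (("ligand", "receptor"), ("ligand:receptor",)),
    (("ligand:receptor", "adaptor"), ("ligand:receptor:adaptor",)),
    (("ligand:receptor:adaptor",), ("kinase_active",)),
]

with_ligand = propagate(reactions, {"ligand", "receptor", "adaptor"})
without_ligand = propagate(reactions, {"receptor", "adaptor"})
print(sorted(with_ligand - without_ligand))
```

Because activation only ever adds molecules, the loop is guaranteed to terminate, and the set difference between two runs lists what the network is capable of producing only when the ligand is present.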
  21. 21. Comparison to phosphoproteomic data Novel differential signalling predicted Observed differential signalling predicted Observed molecules accurately predicted Observed molecules inaccurately predicted Observed molecules fail to map • Adapt the PATHLOGIC system to model different experimental states • Evaluate predictions against published experimental results • Performance depends on pathway coverage • Good sensitivity • Small true negative datasets -> difficult to calculate accurate specificity • Novel, mechanistic predictions Steen et al, 2002 Osinalde et al, 2011
  22. 22. • EGFR signalling • Gold-standard map (Oda et al., MSB, 1: 2005.0010) compared with representation in Reactome – Overlap (red) – Equivalency (purple) – Green (present in different pathways) – White (not present) – Limited crosstalk found that was not captured in the gold standard Bauer-Mehren et al., (2009) MSB, 5:290
  23. 23. EGF Signalling results • Validation data: EGF phospho-proteomic experiment characterising downstream activity resulting from EGF stimulation (Steen et al, 2002) • Model two conditions: – Without EGF present -> EGF and related molecules switched off – In presence of EGF -> EGF as an input is switched on, and related molecules are set as undetermined • Correct predictions for 6 positive and 3 negative results • 38 additional proteins are predicted to have altered signalling
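The comparison of two simulated conditions against experimental observations can be sketched as set arithmetic. This is an illustration under stated assumptions rather than the evaluation code used in the study: the function `score_predictions`, the molecule names `p1` to `p5`, and the observed sets are all hypothetical, and a prediction counts as "changed" simply when a molecule is active in one condition but not the other.

```python
def score_predictions(active_without, active_with, observed_changed, observed_unchanged):
    """Compare predicted differential signalling with observed results."""
    predicted_changed = active_with ^ active_without  # symmetric difference
    tp = len(predicted_changed & observed_changed)
    fn = len(observed_changed - predicted_changed)
    tn = len(observed_unchanged - predicted_changed)
    fp = len(observed_unchanged & predicted_changed)
    return {
        "tp": tp, "fn": fn, "tn": tn, "fp": fp,
        # Ratios are undefined when there are no observations of that kind.
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        # Predicted changes with no experimental measurement either way.
        "novel": predicted_changed - observed_changed - observed_unchanged,
    }


# Toy inputs (assumed for illustration).
scores = score_predictions(
    active_without={"p1"},
    active_with={"p1", "p2", "p3", "p4"},
    observed_changed={"p2", "p3", "p5"},
    observed_unchanged={"p1"},
)
print(scores)
```

With only a handful of observed negatives, the specificity estimate rests on very few data points, which is the difficulty noted for small true-negative datasets.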
  24. 24. • Advanced generation pathway analysis techniques use pathway topology and rich biological information attached to interactions • Screen for the consequences of mutations, knock-outs, altered connectivity • Identify the effects of drugs targeting specific molecules or pathways • Models are untrained and unfitted – more accurate models in specific cells or tissues to improve predictive power
  25. 25. • Using CNV data from Pancreatic cancer to identify patient specific gene deletions and simulate the effects on signal transduction using PATHLOGIC • Bone-specific interactions and secreted factors to identify candidate systems implicated in prostate cancer metastasis to bone • Simulations on DNA repair pathways to identify synthetic-lethal genes in breast cancer • Protein interactions to develop network-based biomarkers in Medulloblastoma Applications in cancer research
  26. 26. Cancer heterogeneity • Recent work in a number of cancers has characterised genetic heterogeneity within tumours – Subsection or single cell sequencing coupled with phylogenetic inference to infer clonal populations • What can we tell about heterogeneity in existing data by using new informatic approaches? • Not interested in what is the same between tumours, but what is different • mCOPA: analysis of heterogeneous features in cancer expression data (Wang, Taciroglu, Maetschke, Nelson, Ragan and Davis, J.Clin.Bioinf. 2012) – identifies over- and under-expressed outliers in individual tumour samples
  27. 27. Why are outliers interesting? • Tumours have diverse molecular characteristics • Not all interesting genes have a biomarker-like profile • Need a statistical method to detect outliers in gene expression data Tomlins et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 2005, 310:644-648. • mCOPA – stand-alone method for detection of over- and under-expressed outliers • COPA transformation – COPA Score = (expression value - median) / median absolute deviation • Improved outlier detection • Filters – Fold change calculation – No normal samples contain outliers for the feature of interest • Generation of outlier feature list – Over-expressed and under-expressed feature list for each individual sample • -1 (under-expressed outlier) • 0 (not an outlier) • 1 (over-expressed outlier)
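The COPA transformation and the -1 / 0 / 1 outlier coding can be sketched for a single gene as follows. This is a minimal illustration, not the mCOPA software: it assumes the median absolute deviation as the scale (as in the original COPA method of Tomlins et al.), and the fixed `cutoff` of 3 is a hypothetical threshold, since the slide does not state how mCOPA chooses its cut-offs. The fold-change and normal-sample filters are not reproduced.

```python
from statistics import median


def copa_scores(values):
    """Median-centre one gene's expression values and scale by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; COPA score is undefined for this gene")
    return [(v - med) / mad for v in values]


def call_outliers(values, cutoff=3.0):
    """Code each sample as 1 (over-expressed), -1 (under-expressed) or 0."""
    calls = []
    for score in copa_scores(values):
        if score > cutoff:
            calls.append(1)
        elif score < -cutoff:
            calls.append(-1)
        else:
            calls.append(0)
    return calls


# Toy expression values for one gene across nine samples (assumed).
expression = [10, 11, 9, 10, 12, 8, 10, 30, 1]
print(copa_scores(expression))
print(call_outliers(expression))
```

Using the median and MAD rather than the mean and standard deviation keeps the centre and scale from being dragged by the very outliers the method is trying to find.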
  28. 28. Applications: Unsupervised clustering with mCOPA Data and sample annotation from Tomlins et al. (2007)
  29. 29. Feature selection and clustering analysis methodology 1. Select features from 12 datasets based on one of three methods – Variance (top 1000 most variable genes) – Differential expression (p value < 0.01) – Outliers 2. Apply different clustering algorithms on each feature set 3. Compare resulting clusters to clinical annotation and generate Rand index 4. Evaluate performance on clinically defined cancer subtypes DE analysis Variance analysis mCOPA analysis PAM K-means Sil CH Dataset annotation Rand calculation Evaluation mCOPA features produce the best clustering in 7 cases, compared with 2 for DE and 3 when using the original COPA method
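Step 3 of the methodology compares cluster assignments with clinical annotation. A minimal sketch of the plain (unadjusted) Rand index is shown below; whether the study used this form or the adjusted Rand index is not stated on the slide, so treat the choice as an assumption. The cluster labels and subtype labels are made-up toy values.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two labelings agree.

    A pair agrees when both labelings put the two samples together,
    or both labelings put them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Toy example (assumed): five tumours, two clusters, two clinical subtypes.
clusters = [0, 0, 1, 1, 1]
subtypes = ["A", "A", "B", "B", "A"]
print(rand_index(clusters, subtypes))
```

The index only looks at co-membership, so the cluster labels do not need to match the subtype names; a value of 1.0 means the two partitions are identical.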
  30. 30. Feature selection approaches • Distinct biology – Usually, minimal overlap between Variable, Differentially Expressed and Outlier genes – Functional analysis reveals distinct functions and processes for selected genes
      GO analysis                       UP                                            DOWN
      mCOPA Outlier Genes               Cell cycle, cell division                     Apoptosis, positive regulation of kinase cascade, and signalling
      Differentially Expressed Genes    Cell adhesion, Wnt and Cadherin signalling    Oxidative metabolism, Cholesterol metabolism
  31. 31. Can we use under-expressed outliers to identify tumour suppressors? mCOPA analysis Gene Ontology Database 223 Under-expressed outliers 727 Cell cycle regulators 12 Potential tumour suppressors RBL2 CDK6 TP63 BIRC2 SON PAFAH1B1 PDCD4 RBBP8 DBC1 FZR1 CDC14B HEXIM1 Known prostate cancer tumour suppressors Potential new prostate cancer tumour suppressors Known cancer tumour suppressors Cancer Gene Index: Potential novel tumour suppressors
  32. 32. Evidence for novel tumour suppressors FZR1 (Degrades positive regulators of cell cycle, prevents entry into mitosis following DNA damage.) TCGA Prostate CNV: Also • Significant loss in TCGA: Ovarian, Lung, Gastric, Endometrium, Breast • Expression: Significantly under-expressed in 46 experiments • Under-expressed outlier in 17% of cancer experiments Also • Significant loss in TCGA: Breast, Ovarian, Renal, Lung, Endometrium, but not Prostate • Expression: Significantly under-expressed in 84 experiments • Under-expressed outlier in 22% of cancer experiments CDC14B (Regulates the G2 DNA damage checkpoint following DNA damage.) TCGA Ovarian CNV: TCGA Ovarian CNV: HEXIM1 (transcriptional regulator via RNA Polymerase II transcription inhibition.) Also • Significant loss in TCGA: Colorectal, Ovarian, Breast, Prostate and Endometrium • Expression: Significantly under-expressed in 101 experiments • Under-expressed outlier in 11% of cancer experiments
  33. 33. Pathway analysis for individuals • Most outliers are present in only one sample • We can treat the set of outliers for a given sample as input to a pathway analysis for each tumour • Some pathways affected only in a single patient • Some pathways show disruption in multiple tumours
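Treating each tumour's outlier set as the input to its own pathway analysis can be sketched with a simple over-representation test. The slide does not say which pathway method was used, so the hypergeometric test here is a stand-in chosen for brevity; the pathway names, gene identifiers, universe size and the `alpha` threshold are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(k, n_outliers, n_pathway, n_universe):
    """P(overlap >= k) when n_outliers genes are drawn from the universe."""
    upper = min(n_outliers, n_pathway)
    total = comb(n_universe, n_outliers)
    tail = sum(
        comb(n_pathway, i) * comb(n_universe - n_pathway, n_outliers - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def pathways_per_sample(outliers_by_sample, pathways, n_universe, alpha=0.05):
    """List, for each tumour, the pathways enriched among its outliers."""
    hits = {}
    for sample, outliers in outliers_by_sample.items():
        hits[sample] = sorted(
            name
            for name, members in pathways.items()
            if outliers & members
            and hypergeom_pvalue(
                len(outliers & members), len(outliers), len(members), n_universe
            ) < alpha
        )
    return hits


# Toy data (assumed): a universe of 100 genes and two five-gene pathways.
pathways = {
    "P1": {"g1", "g2", "g3", "g4", "g5"},
    "P2": {"g6", "g7", "g8", "g9", "g10"},
}
outliers_by_sample = {
    "tumour_1": {"g1", "g2", "g3", "g50"},
    "tumour_2": {"g6", "g60", "g61", "g62"},
}
hits = pathways_per_sample(outliers_by_sample, pathways, n_universe=100)
print(hits)
```

Running the test per sample rather than per cohort is what lets a pathway show up as disrupted in a single patient, and tallying the per-sample hits afterwards shows which pathways recur across tumours.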
  34. 34. • Using outlier profiles to understand heterogeneity in multi-focal prostate cancer • Exploring outlier profiles to improve gene regulatory network inference • Tumour-specific outliers as input to pathway modelling and simulation Applications in cancer research
  35. 35. • Systems biology needs to move beyond simple networks to representations that are rich in biology • Formal, machine-readable mechanistic models of biological knowledge • Knowledge-based analytical methods will enable n=1 scale analysis • Computational analyses can generate networks rich in biology and with great predictive power Data can be generated by machines, but to generate knowledge from data, we need to start with what we know
  36. 36. Acknowledgements The Institute for Molecular Bioscience: Mark Ragan, Rohan Teasdale, Sean Grimmond, and Brandon Wainwright UQ: Lars Nielsen, Michelle Hill, and Nicholas Saunders QIMR: Nicole Cloonan QUT: Colleen Nelson Stefan Maetschke (UQ) Chenwei Wang (QUT) Current students: David Wood Liam Fearnley Josha Inglis Akash Boda Previous students: ChangJin Shin Piyush Mathamshettiwar Ning Jing Alperen Taciroglu Anna-Belle Beau Yoann Glougen Chang Liu Anh Phuong Le [grant number DP110103384]