Network and pathway analysis in systems biology - Melissa Davis
Published in: Technology

Transcript

  • 1. Network and pathway analysis in systems biology Dr. Melissa Davis The University of Queensland Institute for Molecular Bioscience m.davis@imb.uq.edu.au
  • 2. • Regulated by complex cellular circuitry – Extra- and intra-cellular signals are transacted through networks of interacting molecules – Changes in cellular signalling result in the activation or repression of programs of gene expression • Protein interactions, metabolic networks, signalling pathways, gene regulatory networks (Source: Cell, Volume 144, Issue 5, Pages 646-674)
  • 3. • Interpret similarities in large cohorts – Statistics of feature selection – Interpretation of single ‘omics results – Discovery of biomarkers • Interpret individual differences for patient-specific treatment – Robust n=1 analysis methods – Interpret multiple ‘omics results simultaneously – Discover diagnostic features • Reductionist biology – List of molecules implicated in condition – Selection of molecule of interest – Hypothesis generation – Experiment to determine role of molecule • Systems biology – Networks, pathways implicated in condition – Identify perturbed or deregulated systems – Hypothesis generation – Experiment to determine responses of the system
  • 4. So… • We need new methods for data interpretation • We need better knowledge bases – Context specificity – Molecular resolution – Biological complexity • All this requires knowledge engineering: • Computational biology can enrich biological context, improve molecular resolution and capture missing biological complexity • New informatic methods in cancer research that enable comparative systems analysis for individuals
  • 5. Computational biology of molecular interaction networks
  • 6. Protein interaction networks • Advantages – Increasing coverage – Powerful insights – Increasing quality – Visualisation • Disadvantages – Inadequate metadata – Poor molecular resolution – Aggregated conditions – Flattened, generic PPI network – Little evidence of the biological specificity – Meaning of interactions missing
  • 7. • Protein-protein interactions are useful – Understand subcellular localisation (Chin et al., 2009) – Perform comparative mammalian systems analysis (Chin, Davis and Ragan, 2009) – Interpret proteomics data in prostate cancer (Inder et al., 2011; Inder, Davis and Hill, 2012) • PPI data are usually assigned to a reference protein, or even gene • Previously characterised the impact of alternative splicing on subcellular localisation (Davis et al., 2006) • Little or no isoform specificity exists in most PPI datasets • Does alternative splicing generate protein isoforms that have different interaction potential? [Figure sources: Thakur et al. (1997); Guo and Qui (2011)]
  • 8. • Davis et al., Mol. BioSyst., 2012: Alternative splicing of domains rewires protein-protein interactions – Limitations: observations are not tissue specific, no analysis of disordered regions • Buljan et al., Mol. Cell, 2012: Tissue-specific exons enriched for disordered regions favoring binding • Ellis et al., Mol. Cell, 2012: Cassette exons regulated by neural-specific splicing regulator modulate PPIs – Limitations (Buljan et al. and Ellis et al.): do not address splicing of protein domains explicitly, use exons not isoforms as unit of analysis, limited expression data
  • 9. Rewiring the dynamic interactome (Davis et al., Mol. BioSyst., 2012, 8, 2054-2066) • 8860 transcriptional units (genes) with both alternative isoforms and protein interaction domains • Data sources shown: 3DID (Stein et al., 2010); H-Invitational + Fantom 3; PPI (Shin et al., 2009)
  • 10. What is happening with the interactions? – 1787 genes involved in known interactions [Figure values: 1287, 215, 644] • STAT1 (isoform cannot bind CREBBP) • AKT1 (isoform retains kinase domains but loses PH) • PTPN11 (isoform lacks an SH2 domain and cannot bind JAK2) • GRB2 (isoform GRB3-3 participates in distinct interactions and signalling)
  • 11. Current work: Tissue specific interactions • Updated the DDI data and PPI data • Illumina Bodymap 2.0: RNA-seq data produced using the HiSeq 2000 (2010) • 16 phenotypically normal tissues: adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells [Figure labels: 16 human tissues; RNA-seq mapped (two replicates); diagnostic features; AlexaSeq; Cufflinks; ? expressed]
  • 12. • PPI network: 14528 interactions + Protein interaction domains (3DID): 2622 + Isoform domain annotation (Ensembl): 151664 • 3627 gene interactions, 69255 isoform interactions [Figure panels: Ribosome Complex; Neurotransmitter Complex]
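The idea of combining a gene-level PPI network with domain-domain interactions (DDIs) and per-isoform domain annotation can be sketched as below. This is a minimal illustration, not the authors' pipeline: the domain sets, isoform names and the SH2/Pkinase_Tyr DDI pair are hypothetical toy data chosen to mirror the PTPN11/JAK2 example from the slides, and `isoform_interactions` is an assumed helper name.

```python
from itertools import product

# Hypothetical toy inputs. A real analysis would load these from 3DID,
# Ensembl and a curated PPI dataset; nothing here is taken from those sources.
DDI = {frozenset(("SH2", "Pkinase_Tyr"))}  # known domain-domain interactions
GENE_PPI = [("PTPN11", "JAK2")]            # gene-level interactions
ISOFORM_DOMAINS = {
    "PTPN11": {
        "PTPN11-full": {"SH2", "Y_phosphatase"},
        "PTPN11-short": {"Y_phosphatase"},  # isoform lacking the SH2 domain
    },
    "JAK2": {"JAK2-full": {"Pkinase_Tyr"}},
}


def isoform_interactions(gene_ppi, isoform_domains, ddi):
    """Expand gene-level interactions to isoform pairs.

    An isoform pair is kept only if at least one domain from each isoform
    forms a known domain-domain interaction.
    """
    supported = set()
    for gene_a, gene_b in gene_ppi:
        pairs = product(isoform_domains[gene_a].items(),
                        isoform_domains[gene_b].items())
        for (iso_a, doms_a), (iso_b, doms_b) in pairs:
            if any(frozenset((da, db)) in ddi for da in doms_a for db in doms_b):
                supported.add((iso_a, iso_b))
    return supported


print(isoform_interactions(GENE_PPI, ISOFORM_DOMAINS, DDI))
```

In this toy case the isoform without the SH2 domain drops out of the network, which is the kind of rewiring the slides describe.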
  • 13. Domain and process results: information regarding genes with variable domain architecture within the maximal PID network. (a) Five most common domain classes present in the variable genes, showing an enrichment for signalling domains (domain: count): Pkinase: 393; Pkinase_Tyr: 389; SH2: 217; Ras: 183; 7tm_1: 164. (b) GO biological process enrichment scores for the same genes (process: q-value): Intracellular signaling cascade: 6.78e-42; Response to organic substance: 6.26e-31; Positive regulation of molecular function: 7.98e-29; Positive regulation of catalytic activity: 7.76e-27; Regulation of apoptosis: 3.55e-25.
  • 14. Pathway level analysis
  • 15. • Protein isoforms functionally diverse • Interaction network is rewired by splicing of interaction domains • Identify interaction networks for specific tissues • Isoform variability: emerging theme of opposing function • Very strong enrichment for signalling proteins • Part of normal phenotypic diversity BUT also has a role in cancer and disease: – Isoforms of Gli1 (from Shh pathway) – MST1R (RON) isoforms (upstream of MAPK pathway) – P53 isoforms with dominant negative effect – Switch to developmentally restricted isoforms – Transcript variants and protein isoforms as potential diagnostic and therapeutic targets
  • 16. Modelling information flow in pathways • Pathways contain a richer representation of biological information than PPI networks • Mechanistic models are desirable -> hypothesis generation • Kinetic parameters aren’t available for all reactions • Does network topology contain sufficient information for predicting system-level responses? [Figure labels: HIF1A, VHL, CUL2]
  • 17. Topology matters Grouping of molecules into sets breaks connectivity and eliminates real crosstalk
  • 18. Representation of multi-cellular interactions
  • 19. Is there such a thing as a pathway at all? • Our concept of a pathway as a linear series of events is largely a fiction • Signalling proteins may be active in many pathways • More correct to think of this as a network [Figure labels: AKT1; EGFR, FGFR, NGF, PDGF, SCF-KIT, ERBB2, GPCR; Immune system, Membrane trafficking, Gene expression, Hemostasis, Apoptosis, and Metabolism]
  • 20. PATHLOGIC-S • Extract the reaction network from REACTOME (BioPAX L3 model) • Convert to a Boolean logical model of the signal transduction network • No parameterisation • Enumerate the capabilities of the network Fearnley et al., 2012, PLOS One
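A parameter-free logical model of a reaction network can be sketched roughly as follows. This is an illustrative toy, not PATHLOGIC-S itself: the reaction list is hypothetical (not extracted from Reactome or a BioPAX file), inhibition is left out, and `propagate` is an assumed name. Each reaction fires when all of its required inputs are on, and the network is iterated until nothing changes.

```python
# Toy reaction list (hypothetical): each reaction is
# (set of required inputs, set of products).
REACTIONS = [
    ({"EGF", "EGFR"}, {"EGF:EGFR"}),
    ({"EGF:EGFR", "GRB2"}, {"EGF:EGFR:GRB2"}),
    ({"EGF:EGFR:GRB2", "SOS1"}, {"active_SOS1"}),
]


def propagate(reactions, inputs_on):
    """Switch on products of every reaction whose inputs are all on.

    Repeats until a fixed point is reached. No kinetic parameters are used:
    only the topology of the reaction network matters.
    """
    on = set(inputs_on)
    changed = True
    while changed:
        changed = False
        for required, products in reactions:
            if required <= on and not products <= on:
                on |= products
                changed = True
    return on


with_egf = propagate(REACTIONS, {"EGF", "EGFR", "GRB2", "SOS1"})
without_egf = propagate(REACTIONS, {"EGFR", "GRB2", "SOS1"})
print(sorted(with_egf - without_egf))
```

Running the same network under different input sets is one simple way to enumerate what the network is capable of producing.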
  • 21. Comparison to phosphoproteomic data • Adapt the PATHLOGIC system to model different experimental states • Evaluate predictions against published experimental results • Performance depends on pathway coverage • Good sensitivity • Small true negative datasets -> difficult to calculate accurate specificity • Novel, mechanistic predictions [Figure legend: Novel differential signalling predicted; Observed differential signalling predicted; Observed molecules accurately predicted; Observed molecules inaccurately predicted; Observed molecules fail to map] (Steen et al., 2002; Osinalde et al., 2011)
  • 22. • EGFR signalling • Gold-standard map (Oda et al., MSB, 1: 2005.0010) compared with representation in Reactome – Overlap (red) – Equivalency (purple) – Greens (present in different pathways) – White (not present) – Limited crosstalk found that was not captured in the gold standard Bauer-Mehren et al., (2009) MSB, 5:290
  • 23. EGF Signalling results • Validation data: EGF phospho-proteomic experiment characterising downstream activity resulting from EGF stimulation (Steen et al, 2002) • Model two conditions: – Without EGF present -> EGF and related molecules switched off – In presence of EGF -> EGF as an input is switched on, and related molecules are set as undetermined • Correct predictions for 6 positive and 3 negative results • 38 additional proteins are predicted to have altered signalling
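The two-condition setup (input off versus input on, with some molecules left undetermined) can be illustrated with three-valued logic, where `None` stands for "undetermined". The sketch below is an assumption-laden toy: the rules, the node names `pSHC1` and `pSTAT3`, and the functions `simulate` and `differential` are hypothetical and are not the published model.

```python
def k_and(values):
    """Three-valued AND: False dominates, None means undetermined."""
    values = list(values)
    if any(v is False for v in values):
        return False
    if all(v is True for v in values):
        return True
    return None


def k_or(values):
    """Three-valued OR: True dominates, None means undetermined."""
    values = list(values)
    if any(v is True for v in values):
        return True
    if all(v is False for v in values):
        return False
    return None


# Hypothetical rules: a node is on if ANY listed input tuple is fully on.
RULES = {
    "EGF:EGFR": [("EGF", "EGFR")],
    "pSHC1": [("EGF:EGFR", "SHC1")],
    "pSTAT3": [("EGF:EGFR", "STAT3")],  # STAT3 is left undetermined below
}


def simulate(rules, inputs):
    """Update derived nodes until stable; unknown molecules stay None."""
    state = {node: None for node in rules}
    state.update(inputs)
    changed = True
    while changed:
        changed = False
        for node, alternatives in rules.items():
            new = k_or(k_and(state.get(m) for m in conj) for conj in alternatives)
            if new != state[node]:
                state[node] = new
                changed = True
    return state


def differential(off_state, on_state):
    """Nodes that are determined in both conditions and differ."""
    return {n for n, v in on_state.items()
            if v is not None and off_state.get(n) is not None
            and off_state.get(n) != v}


off = simulate(RULES, {"EGF": False, "EGFR": True, "SHC1": True})
on = simulate(RULES, {"EGF": True, "EGFR": True, "SHC1": True})
print(sorted(differential(off, on)))
```

Nodes that remain `None` in either condition are excluded from the differential set, so a prediction is only made where the topology settles the outcome.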
  • 24. • Advanced generation pathway analysis techniques use pathway topology and rich biological information attached to interactions • Screen for the consequences of mutations, knock-outs, altered connectivity • Identify the effects of drugs targeting specific molecules or pathways • Models are untrained and unfitted – more accurate models in specific cells or tissues to improve predictive power
  • 25. Applications in cancer research • Using CNV data from pancreatic cancer to identify patient-specific gene deletions and simulate the effects on signal transduction using PATHLOGIC • Bone-specific interactions and secreted factors to identify candidate systems implicated in prostate cancer metastasis to bone • Simulations on DNA repair pathways to identify synthetic-lethal genes in breast cancer • Protein interactions to develop network-based biomarkers in medulloblastoma
  • 26. Cancer heterogeneity • Recent work in a number of cancers has characterised genetic heterogeneity within tumours – Subsection or single cell sequencing coupled with phylogenetic inference to infer clonal populations • What can we tell about heterogeneity in existing data by using new informatic approaches? • Not interested in what is the same between tumours, but what is different • mCOPA: analysis of heterogeneous features in cancer expression data (Wang, Taciroglu, Maetschke, Nelson, Ragan and Davis, J. Clin. Bioinf. 2012) – identifies over- and under-expressed outliers in individual tumour samples
  • 27. Why are outliers interesting? • Tumours have diverse molecular characteristics • Not all interesting genes have a biomarker-like profile • Need a statistical method to detect outliers in gene expression data • Tomlins et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 2005, 310:644-648. • mCOPA – stand-alone method for detection of over- and under-expressed outliers • COPA transformation – COPA score = (expression value - median) / median absolute deviation • Improved outlier detection • Filters – Fold change calculation – No normal samples contain outliers for the feature of interest • Generation of outlier feature list – Over-expressed and under-expressed feature list for each individual sample, coded as -1 (under-expressed outlier), 0 (not an outlier) or 1 (over-expressed outlier)
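The COPA transformation and the -1/0/1 outlier coding can be sketched as follows, with values centred on the median and scaled here by the median absolute deviation, as in the COPA method of Tomlins et al. This is a minimal sketch rather than the mCOPA implementation: the fixed cutoff of 5 is an assumption made for illustration, and the fold-change and normal-sample filters listed on the slide are omitted.

```python
from statistics import median


def copa_scores(values):
    """Median-centre one gene's expression values and scale by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: treat every sample as unremarkable.
        return [0.0 for _ in values]
    return [(v - med) / mad for v in values]


def call_outliers(values, cutoff=5.0):
    """Code each sample as 1 (over), -1 (under) or 0 (not an outlier).

    The cutoff is a hypothetical fixed threshold on the COPA score.
    """
    calls = []
    for score in copa_scores(values):
        if score >= cutoff:
            calls.append(1)
        elif score <= -cutoff:
            calls.append(-1)
        else:
            calls.append(0)
    return calls


# Toy expression values for one gene across eight samples.
expression = [10, 11, 9, 10, 12, 10, 50, 1]
print(call_outliers(expression))
```

Because the median and the median absolute deviation are insensitive to a few extreme samples, a handful of outlying tumours does not mask itself by inflating the scale.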
  • 28. Applications: Unsupervised clustering with mCOPA. Data and sample annotation from Tomlins et al. (2007)
  • 29. Feature selection and clustering analysis methodology 1. Select features from 12 datasets based on one of three methods – Variance (top 1000 most variable genes) – Differential expression (p value < 0.01) – Outliers 2. Apply different clustering algorithms to each feature set 3. Compare resulting clusters to clinical annotation and generate Rand index 4. Evaluate performance on clinically defined cancer subtypes [Figure labels: DE analysis; Variance analysis; mCOPA analysis; PAM; K-means; Sil; CH; Dataset annotation; Rand calculation; Evaluation] • mCOPA features produce the best clustering in 7 cases, compared with 2 for DE and 3 when using the original COPA method
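Step 3, comparing cluster assignments with clinical annotation, can be illustrated with a plain Rand index: the fraction of sample pairs on which the two labelings agree about "same group" versus "different group". This sketch assumes the unadjusted index (the slides do not say whether an adjusted variant was used), and the labels below are toy data.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs treated consistently by both labelings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Toy example: cluster ids from a clustering run versus clinical subtypes.
clusters = [0, 0, 1, 1]
subtypes = ["A", "A", "B", "B"]
print(rand_index(clusters, subtypes))
```

The index ignores the label names themselves, so a clustering that recovers the subtypes under different cluster ids still scores 1.0.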
  • 30. Feature selection approaches • Distinct biology – Usually, minimal overlap between Variable, Differentially Expressed and Outlier genes – Functional analysis reveals distinct functions and processes for selected genes • GO analysis – mCOPA Outlier Genes: UP: Cell cycle, cell division; DOWN: Apoptosis, positive regulation of kinase cascade, and signalling – Differentially Expressed Genes: UP: Cell adhesion, Wnt and Cadherin signalling; DOWN: Oxidative metabolism, Cholesterol metabolism
  • 31. Can we use under-expressed outliers to identify tumour suppressors? • mCOPA analysis: 223 under-expressed outliers • Gene Ontology Database: 727 cell cycle regulators • Overlap: 12 potential tumour suppressors: RBL2, CDK6, TP63, BIRC2, SON, PAFAH1B1, PDCD4, RBBP8, DBC1, FZR1, CDC14B, HEXIM1 [Figure legend: Known prostate cancer tumour suppressors; Potential new prostate cancer tumour suppressors; Known cancer tumour suppressors; Potential novel tumour suppressors] • Cancer Gene Index: http://wiki.nci.nih.gov/display/cageneindex
  • 32. Evidence for novel tumour suppressors • FZR1 (Degrades positive regulators of cell cycle, prevents entry into mitosis following DNA damage.) – TCGA Prostate CNV: [figure] – Also significant loss in TCGA: Ovarian, Lung, Gastric, Endometrium, Breast – Expression: Significantly under-expressed in 46 experiments – Under-expressed outlier in 17% of cancer experiments • CDC14B (Regulates the G2 DNA damage checkpoint following DNA damage.) – TCGA Ovarian CNV: [figure] – Also significant loss in TCGA: Breast, Ovarian, Renal, Lung, Endometrium, but not Prostate – Expression: Significantly under-expressed in 84 experiments – Under-expressed outlier in 22% of cancer experiments • HEXIM1 (Transcriptional regulator via RNA Polymerase II transcription inhibition.) – TCGA Ovarian CNV: [figure] – Also significant loss in TCGA: Colorectal, Ovarian, Breast, Prostate and Endometrium – Expression: Significantly under-expressed in 101 experiments – Under-expressed outlier in 11% of cancer experiments
  • 33. Pathway analysis for individuals • Most outliers are present in only one sample • We can treat the set of outliers for a given sample as input to a pathway analysis for each tumour • Some pathways affected only in a single patient • Some pathways show disruption in multiple tumours
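One way to treat each tumour's outlier set as input to a pathway analysis is a per-sample over-representation test. The slides do not name the method used, so the hypergeometric test below is an assumption, and the gene identifiers, pathway memberships and background size are hypothetical toy values.

```python
from collections import Counter
from math import comb


def enrichment_pvalue(overlap, n_outliers, pathway_size, background_size):
    """Hypergeometric P(X >= overlap) for outliers landing in a pathway."""
    upper = min(n_outliers, pathway_size)
    favourable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, n_outliers - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background_size, n_outliers)


def affected_pathways(outliers_by_sample, pathways, background_size, alpha=0.05):
    """Map each sample to the pathways enriched among its own outliers."""
    result = {}
    for sample, outliers in outliers_by_sample.items():
        hits = set()
        for name, members in pathways.items():
            shared = outliers & members
            if shared and enrichment_pvalue(
                len(shared), len(outliers), len(members), background_size
            ) < alpha:
                hits.add(name)
        result[sample] = hits
    return result


# Hypothetical toy data: 100 background genes, two pathways, three tumours.
PATHWAYS = {
    "cell_cycle": {"g1", "g2", "g3", "g4", "g5"},
    "apoptosis": {"g6", "g7", "g8", "g9"},
}
OUTLIERS = {
    "T1": {"g1", "g2", "g3"},
    "T2": {"g1", "x1", "x2"},
    "T3": {"g1", "g2", "g6", "g7", "g8"},
}

per_sample = affected_pathways(OUTLIERS, PATHWAYS, background_size=100)
recurrence = Counter(p for hits in per_sample.values() for p in hits)
print(per_sample)
print(recurrence)
```

Counting how many tumours hit each pathway separates pathways disrupted in a single patient from those disrupted across several, which is the distinction the slide draws.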
  • 34. Applications in cancer research • Using outlier profiles to understand heterogeneity in multi-focal prostate cancer • Exploring outlier profiles to improve gene regulatory network inference • Tumour-specific outliers as input to pathway modelling and simulation
  • 35. • Systems biology needs to move beyond simple networks to representations that are rich in biology • Formal, machine-readable mechanistic models of biological knowledge • Knowledge-based analytical methods will enable n=1 scale analysis • Computational analyses can generate networks rich in biology and with great predictive power • Data can be generated by machines, but to generate knowledge from data, we need to start with what we know
  • 36. Acknowledgements • The Institute for Molecular Bioscience: Mark Ragan, Rohan Teasdale, Sean Grimmond, and Brandon Wainwright • UQ: Lars Nielsen, Michelle Hill, and Nicholas Saunders • QIMR: Nicole Cloonan • QUT: Colleen Nelson • Stefan Maetschke (UQ), Chenwei Wang (QUT) • Current students: David Wood, Liam Fearnley, Josha Inglis, Akash Boda • Previous students: ChangJin Shin, Piyush Mathamshettiwar, Ning Jing, Alperen Taciroglu, Anna-Belle Beau, Yoann Glougen, Chang Liu, Anh Phuong Le • [grant number DP110103384]
