NetBioSIG2014-Talk by Traver Hart


Published on

NetBioSIG2014 at ISMB in Boston, MA, USA on July 11, 2014

Published in: Science
  • Assume the NetBio SIG audience is familiar with this network…

    Key finding of the yeast GI network is that correlated GI profiles imply cofunctionality

    Question driving my research is, how can we get this kind of information for humans?
  • In yeast, the network is generated by systematic assay of double knockout mutants or TS alleles for essential genes

    Systematic work like this has been very difficult in humans, though it has been attempted on modest scales by a couple of labs.

    However, there does exist a significant body of gene essentiality data for human cancer cell lines. Our hypothesis was that correlated gene essentiality profiles across these cell lines would produce similar information about shared biological function, and further that we may be able to discover novel patterns of shared genetic vulnerability across the cell lines assayed.

    Here the data represent either a binary measure of essentiality or quantitative measure of gene sensitivity to perturbation
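
As a rough illustration of the hypothesis above (not the code behind the talk), coessentiality between two genes can be sketched as the Pearson correlation of their essentiality profiles across cell lines. The gene names and scores below are hypothetical, and Pearson correlation is an assumption, since the notes only say "correlated".

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length score profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical essentiality scores: one profile per gene, one value per cell line.
profiles = {
    "gene_a": [2.0, 1.5, -1.0, -0.5, 1.8],
    "gene_b": [1.8, 1.2, -0.8, -0.7, 2.1],
    "gene_c": [-1.0, 0.5, 1.9, 1.4, -0.9],
}

def coessentiality(profiles):
    """Correlate every pair of gene profiles; keys are sorted gene-name pairs."""
    genes = sorted(profiles)
    return {
        (g1, g2): pearson(profiles[g1], profiles[g2])
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
    }

pairs = coessentiality(profiles)
```

Here gene_a and gene_b are essential in the same cell lines and correlate strongly, while gene_c shows the opposite pattern and correlates negatively with both.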
  • The data come from pooled library shRNA screens where multiple shRNA hairpins targeting nearly every protein coding gene are introduced into a cell line. Hairpins targeting essential genes drop out of the population and register as strong negative fold change using either a custom microarray or sequencing readout. The initial data are from screens of 72 cancer cell lines of pancreatic, ovarian, and breast origin.
  • We have recently completed a reanalysis of this initial data set; it was just published at the beginning of this month and so I won’t go into great detail here. Briefly, we derived gold standard reference sets of essential and nonessential genes that we can use to train a Bayesian classifier—the observed fold changes of hairpins targeting a given gene are judged to be drawn from either the reference essential or nonessential distributions and the posterior log odds ratio is used as an essentiality score. Just as importantly, we can evaluate the quality of each screen using withheld reference data and identify which screens should be removed from downstream work.
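
A minimal sketch of the scoring idea described above, under stated assumptions: the two reference fold-change distributions are modeled here as Gaussians with made-up parameters, and the prior log odds default to zero. The published method may estimate these distributions differently; this only shows how per-hairpin evidence adds up to a log odds score.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution (assumed form of the reference distributions)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def log_odds_essential(fold_changes, ess=(-2.0, 1.0), non=(0.0, 1.0), prior_log_odds=0.0):
    """Sum per-hairpin log likelihood ratios (essential vs nonessential) for one gene.

    ess and non are hypothetical (mean, sd) pairs for the reference
    essential and nonessential fold-change distributions.
    """
    llr = sum(normal_logpdf(fc, *ess) - normal_logpdf(fc, *non) for fc in fold_changes)
    return prior_log_odds + llr

# Hairpins that drop out strongly give a positive score...
dropout_score = log_odds_essential([-2.5, -1.8, -2.2])
# ...while hairpins with fold changes near zero give a negative score.
neutral_score = log_odds_essential([0.1, -0.2, 0.3])
```

Because the score is a sum over hairpins, genes with several consistent hairpins accumulate more evidence than genes supported by a single strong hairpin.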
  • This work led us to the ‘daisy model’ of essentiality, where the essential genes in each cell line or tissue are represented by a petal. Petals overlap to varying degrees but they all share the same set of core essential genes; the degree to which a screen recapitulates this core is then a measure of its quality. This model is supported by whole organism data: when separated into core and peripheral essentials, only peripheral essentials are enriched for disease genes. Likewise core essentials show much lower rates of putative deleterious mutation in human population genetic data, indicating that at an organismal level we can’t stand even minor perturbation of these core essential genes.
  • A surprising finding was that, in general, essentiality is not correlated with gene expression, as measured by RNAseq. Even more surprising was the exception to this rule: core essentials show a strong negative correlation. This may be explained by there being some minimum expression requirement for these genes; at this minimum, the genes are very sensitive to perturbation. At higher expression levels, this sensitivity is buffered. Importantly, this led us to the conclusion that our scoring scheme was yielding a quantitative measure of gene essentiality (or sensitivity; I’ll use those terms interchangeably). So we used these scores directly in our matrix.
  • Despite our improved scoring scheme, the essentiality data are still quite noisy. While each cell line had a fairly modest FDR of around 15%, across over 100 cell lines false positives accumulated rapidly. Here we leaned on the lessons from the yeast network to create a high-confidence coessentiality network. Using a log likelihood enrichment score against the KEGG database as a scoring scheme, we tweaked several parameters – what quality threshold to apply to the screens, what minimum number of cell lines a gene was called essential in, and the overall variance of a gene’s essentiality profile across all cell lines – to maximize that score.
  • By rank ordering correlations at each threshold, binning in groups of 1000, and calculating the log likelihood score (LLS) vs KEGG, we get a measure of how each threshold performed. In the end, the top 107 cell lines with matching RNAseq data and genes essential in at least 4 cell lines yielded top correlations that showed about an 18-fold enrichment in KEGG pairs, and that’s after excluding the big bias-inducing sets like ribosome, proteasome, spliceosome. Taking all positive pairwise correlations with FDR < 1% yielded a correlation network with 1,086 genes…
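
The rank, bin, and score procedure can be sketched roughly as follows. The exact LLS formula is not given in these notes, so this assumes a common form: the log ratio of same-pathway odds within a bin to the background odds. The `pathway` table is a hypothetical stand-in for KEGG, and the 0.5 pseudocount is an added assumption to keep sparse bins finite.

```python
import math
from itertools import combinations

# Hypothetical pathway annotation standing in for KEGG (gene -> pathway).
pathway = {"A": "p1", "B": "p1", "C": "p1", "D": "p2", "E": "p2", "F": "p3"}

def same_pathway(pair):
    """True/False if both genes are annotated, None if either is unannotated."""
    g1, g2 = pair
    if g1 not in pathway or g2 not in pathway:
        return None
    return pathway[g1] == pathway[g2]

def prior_odds(genes):
    """Background odds that a random annotated gene pair shares a pathway."""
    labels = [same_pathway(p) for p in combinations(sorted(genes), 2)]
    pos = sum(1 for x in labels if x is True)
    neg = sum(1 for x in labels if x is False)
    return pos / neg

def binned_lls(ranked_pairs, background_odds, bin_size=1000):
    """Log2 enrichment of same-pathway pairs per bin of rank-ordered pairs."""
    scores = []
    for start in range(0, len(ranked_pairs), bin_size):
        labels = [same_pathway(p) for p in ranked_pairs[start:start + bin_size]]
        pos = sum(1 for x in labels if x is True)
        neg = sum(1 for x in labels if x is False)
        # Pseudocount (assumption) avoids division by zero and log(0).
        odds = (pos + 0.5) / (neg + 0.5)
        scores.append(math.log2(odds / background_odds))
    return scores

# Toy ranking: pairs sorted by decreasing correlation, scored in bins of 4.
ranked = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"),
          ("A", "D"), ("B", "E"), ("C", "F"), ("D", "F")]
background = prior_odds(pathway)
scores = binned_lls(ranked, background, bin_size=4)
```

In this toy ranking the first bin holds only same-pathway pairs and scores well above zero, while the second bin holds only cross-pathway pairs and scores below zero, which is the shape one hopes to see when a threshold setting works.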
  • And here it is. The giant connected component of the human coessentiality network. 866 genes, with an average degree of a little over 4. Four major annotations are highlighted, to show that it is not dominated by the Big Three.

    Naturally, when you build a network, the first thing you want to do is tear it apart to see what’s in it.
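
For readers who want to check the network summary, here is a small sketch: the average degree of an undirected network is twice the edge count divided by the node count, and the giant component can be found by breadth-first search. The node and edge counts come from the slides (866 genes, 1,877 edges); the toy edge list is hypothetical.

```python
from collections import defaultdict, deque

def average_degree(n_nodes, n_edges):
    """Each undirected edge contributes to the degree of two nodes."""
    return 2 * n_edges / n_nodes

def giant_component(edges):
    """Return the node set of the largest connected component."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, best = set(), set()
    for start in list(adjacency):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Counts reported on the network slide: 866 genes, 1,877 edges.
network_degree = average_degree(866, 1877)

# Hypothetical toy network with one triangle and one isolated pair.
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g1"), ("g4", "g5")]
largest = giant_component(toy_edges)
```

With the slide's counts the average degree works out to roughly 4.3, consistent with "a little over 4".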
  • Unsupervised clustering of the network yields clusters that are obviously functionally coherent (generally monochromatic). This is a very nice validation of the idea that coessentiality implies cofunctionality. The obvious next question here is, what groups of cell lines drive these clusters? Does each of these represent a specific class of genetic vulnerability? I’ll go into detail on these top two clusters; the first has no annotation of note and the second is clearly enriched for genes involved in the mitochondrial process of oxidative phosphorylation.
  • This is the first cluster, comprising over 30 genes. Each gene’s essentiality profile is shown in this heatmap (left), with green or better representing high confidence essential.

    This column represents the tissue or subtype of the cell line. Light and medium blue are luminal and HER2 amplified breast cancer cell lines. So this cluster accurately discriminates these subtypes from the other data. A closer look at the genes in the cluster shows SPDEF, TFAP2C, FOXA1, CDK4, CCND1: the known oncogenes driving this subtype. Discovery of this cluster is a strong validation of our approach.
  • It is worth noting that, despite the general trend that essentiality is not correlated with expression level, the genes in this BrCa cluster show strong positive correlation (note far out on tail of blue curve). These genes show subtype-specific expression (or overexpression) as well as essentiality but unfortunately do not indicate a general trend.
  • The second cluster is enriched for oxphos genes. These genes are essential in a mix of breast and ovarian cancer cell lines, showing that these clusters aren’t merely differentiating tissues of origin. A closer look shows that nearly all of the genes show mitochondrial localization…
  • Going into a bit more detail, we see that the entire oxphos pathway as well as its biogenesis is recovered. There are genes involved in mitochondrial import and mitochondrial genome transcription, the mitochondrial ribosome, and elements of four of the five oxphos complexes, plus cytochrome c and the enzyme that covalently attaches the heme group to the cytochrome. This is exactly what you’d expect from a classical genetic screen.

    This cluster implies that these cell lines are critically dependent on oxphos for proliferation. This is a very surprising finding, since the Warburg effect involves switching to glycolysis and away from oxphos dependence. Moreover, these genes show no correlation between essentiality and expression level; basically there is no molecular signature that predicts this dependence among these cell types.
  • I’ve focused on the cancer implications of the major clusters in this network. In fact one of the minor clusters fine tunes the BrCa cluster; the ERBB2/ERBB3 cluster differentiates Her2-amplified cell lines from other luminal BrCa lines. Turning to a more functional genomics oriented view, we can see other evidence that coessentiality implies cofunctionality. Here we have three subunits of the COP9 signalosome complex, and all three subunits of the KGDH enzyme, a key step in the TCA cycle.

    This last example shows a putative connection between an Hsp70 chaperone and its associated nucleotide exchange factor. This specific interaction is not annotated in BioGrid, CORUM, or GO, but the cofactor is a homolog of both yeast and bacterial genes with the same function. [if time permits: Moreover, in a global assay of protein complexes by large-scale coelution that we performed in collaboration with Andrew Emili’s lab, we identified these two proteins as having a moderate probability of interacting.] This example represents only one of many strong predictions of cofunctionality that come out of this network.
  • A more focused approach uses synthetic lethality to find candidate therapeutic targets. In this example, Gene B is essential only in the context of the somatic mutation in allele a, not wildtype A, and a therapy that targets B should in principle preferentially destroy cancer cells. As noted in this review, the ability to screen for these context-specific essentials has grown in concert with the availability of reagents to do systematic perturbation screens in human cells.
  • NetBioSIG2014-Talk by Traver Hart

    1. Functional genomics and cancer subtyping with a human cancer coessentiality network. Traver Hart, Laboratory of Jason Moffat, Donnelly Centre, U. Toronto. NetBio SIG, 11 July 2014
    2. GI correlation networks. Costanzo & Baryshnikova, et al., 2010
    3. GI correlation networks. [Figure: yeast GI matrix of query strains vs. array genes/strains (Dixon et al., 2009), and a matrix of Genes a to e vs. Cell lines 1 to 12, marked Essential or Nonessential.] In yeast, highly correlated profiles imply shared gene function. Can be used to infer function of unknown genes. Hypothesis: Correlated essentiality profiles across human cancer cell lines (bottom) are analogous to correlated GI profiles and imply shared function, even if we don’t know the query strain. Corollary: Gene clusters can help identify cell lines with similar vulnerabilities, possibly leading to novel classification.
    4. Pooled library shRNA screens. Marcotte et al., 2012
    5. Bayesian essentiality scoring. Hart et al., 2014
    6. The “daisy model” & core essential genes. Hart et al., 2014
    7. A quantitative measure of sensitivity to RNAi perturbation. [Figure: density of the correlation between essentiality score and expression; matrix of Genes a to e vs. Cell lines 1 to 12 (query cell lines), marked Essential or Nonessential.] Hart et al., 2014
    8. [Figure panels; axis labels: mean essentiality score, std. dev. of essentiality score, F-measure, number of cell lines, number of essential genes.]
    9. Optimize LLS vs KEGG. Filtered data: 107 total screens (30 BrCa Luminal + Her2, 24 BrCa Basal, 34 OvCa, 15 PDAC, 4 other), 2,842 genes. Correlations at 1% FDR: 1,086 genes. [Figure: log likelihood score (vs KEGG) against correlation pair rank; series: F65, F50, No hairpin norm., Achilles.]
    10. The Human Coessentiality Network: 866 genes, 1,877 edges. Highlighted annotations: Ribosome, Proteasome, Spliceosome, OxPhos.
    11. Network Clustering. Highlighted annotations: Ribosome, Proteasome, Spliceosome, OxPhos.
    12. BRCA Luminal/HER2. [Heatmap legend: BrCa/LUM+Her2, BrCa/Basal, OvCa, PDAC.]
    13. Expression vs. Essentiality (correlation coefficient)
        GENE    CORR   P-VAL
        SPDEF   0.716  2.00e-17
        FOXA1   0.624  6.73e-13
        ERBB2   0.583  4.57e-11
        MDM2    0.529  7.87e-09
        TFAP2C  0.463  5.12e-07
        FUBP1   0.428  4.33e-06
        ESR1    0.411  9.39e-05
        CCND1   0.410  1.17e-05
    14. OxPhos Cluster. [Heatmap legend: BrCa/LUM+Her2, BrCa/Basal, OvCa, PDAC.]
    16. Functional genomics. (MLL2); Homolog of GrpE, NEF of Hsp70-type ATPases; Mitochondrial Hsp70 family
    17. Conclusions:
        • The human cancer coessentiality network
            – Depends critically on the new scoring scheme derived from Hart et al, 2014
            – Optimized by lessons learned from the yeast GI network
        • Clusters identify cell lines with common genetic vulnerabilities
            – Known and novel
        • Co-essentiality implies Co-functionality
            – A unique functional genomics resource
        Open questions:
        • Identify genomic drivers of validated clusters?
        • Improve coverage?
        • Improve accuracy? CRISPR?
    18. Acknowledgements: Essentiality Screens in Cancer Cell Lines. Robert Rottapel, Fabrice Sirculomb, Fernando Suarez, Mauricio Medrano, Josee Normand, Jason Moffat, Troy Ketela, Kevin Brown, Judice Koh, Glauber Brito, Azin Sayid, Dina Karamboulas, Dewald Van Dyk, Dahlia Kasimer, Christine Misquitta, Yaroslav Fedyshyn, Marianna Luhova, Bohdana Fedyshyn, Patricia Mero, Christine Misquitta, Franco Vizeacoumar, Benjamin Neel, Richard Marcotte, Azin Sayad
    19. CoEssential + CoElution. [Figure counts, in slide order: 850 PPI, 218 neg, 313 ?; 17 PPI, 60 neg, 236 ?; labels: vs CORUM, vs GO_CC.]
        GENE1     GENE2   CoEss  CoElu  GO_CC?  Notes
        DLST      OGDH    0.75   0.97   ?       aKG dehydrogenase
        MRPL22    MRPL23  0.66   0.52   0       Mitochondrial
        EIF2B2    EIF2B3  0.62   0.91   1       EIF2B complex
        MRPL23    SSBP1   0.59   0.70   ?       Mitochondrial
        HCCS      SSBP1   0.57   0.42   ?       Mitochondrial
        NACA      RPLP2   0.56   0.61   ?       Translation
        ICT1      MRPL22  0.55   0.68   0       Mitochondrial
        ATP5B     CYC1    0.53   0.59   ?       Mitochondrial
        ICT1      PTCD3   0.53   0.70   ?       Mitochondrial
        EML4      MAU2    0.52   0.41   0       Microtubule associated protein / sister chromatid cohesion factor
        NSMCE1    SMC5    0.52   0.51   1       SMC5/6 complex
        COPB2     COPG1   0.51   0.98   1       Coatomer complex
        NUTF2     RAN     0.50   0.73   1       Nuclear pore/transport
        NUP205    NUP93   0.50   0.61   1       Nuclear pore/transport
        ATP5A1    CYC1    0.50   0.79   ?       Mitochondrial
        ARCN1     COPB1   0.49   0.99   1       Coatomer complex
        EBNA1BP2  NIFK    0.49   0.60   ?       Ribosome biogenesis? Mitosis?
        BRIX1     UTP15   0.49   0.80   ?       Ribosome biogenesis
        EIF3C     PTBP3   0.48   0.47   ?       EIF3 / Polypyrimidine (RNA) binding
        GRPEL1    HSPA9   0.47   0.48   ?       HSP70 + nucleotide exchange factor
        CYC1      PTCD3   0.47   0.68   ?       Mitochondrial
        EIF5A     HNRNPK  0.46   0.68   0       Translation / Splicing
        CYC1      UQCRC1  0.45   0.41   ?       Mitochondrial
        CYC1      ECSIT   0.45   0.50   ?       Mitochondrial
        ATP5B     UQCRC1  0.44   0.57   0       Mitochondrial
    20. Why Gene Essentiality?
        • Context-sensitive essentials are candidate therapeutic targets
        [Figure: Wildtype A, Oncogenic a, Targeted b.] Kaelin WG, Nat Rev Cancer, 2005