Assume the NetBio SIG audience is familiar with this network…
The key finding of the yeast GI network is that correlated GI profiles imply cofunctionality.
The question driving my research is: how can we get this kind of information for humans?
In yeast, the network is generated by systematic assay of double knockout mutants or TS alleles for essential genes
Systematic work like this has been very difficult in humans, though it has been attempted on modest scales by a couple of labs.
However, there does exist a significant body of gene essentiality data for human cancer cell lines. Our hypothesis was that correlated gene essentiality profiles across these cell lines would produce similar information about shared biological function, and further that we may be able to discover novel patterns of shared genetic vulnerability across the cell lines assayed.
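The core computation behind this hypothesis, correlating each pair of gene essentiality profiles across cell lines, can be sketched as follows. This is a minimal illustration only: the gene names and scores are hypothetical toy data, and Pearson correlation is assumed as the similarity measure.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a flat profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def profile_correlations(profiles):
    """Correlate every unordered pair of gene profiles."""
    genes = sorted(profiles)
    return {
        (g, h): pearson(profiles[g], profiles[h])
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }

# Hypothetical essentiality scores: 3 genes across 5 cell lines.
profiles = {
    "GENE_A": [2.0, 1.5, -1.0, -2.0, 0.5],
    "GENE_B": [1.8, 1.2, -0.8, -2.2, 0.4],
    "GENE_C": [-1.0, 0.5, 2.0, 1.0, -0.5],
}
pairs = profile_correlations(profiles)
```

In this toy example GENE_A and GENE_B are essential in the same cell lines and correlate strongly, while GENE_C follows a different pattern and correlates negatively with GENE_A.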
Here the data represent either a binary measure of essentiality or a quantitative measure of gene sensitivity to perturbation.
The data come from pooled library shRNA screens, in which multiple shRNA hairpins targeting nearly every protein-coding gene are introduced into a cell line. Hairpins targeting essential genes drop out of the population and register as a strong negative fold change, read out by either a custom microarray or sequencing. The initial data are from screens of 72 cancer cell lines of pancreatic, ovarian, and breast origin.
We have recently completed a reanalysis of this initial data set; it was just published at the beginning of this month and so I won’t go into great detail here. Briefly, we derived gold standard reference sets of essential and nonessential genes that we can use to train a Bayesian classifier: the observed fold changes of hairpins targeting a given gene are judged to be drawn from either the reference essential or the reference nonessential distribution, and the posterior log odds ratio is used as an essentiality score. Just as importantly, we can evaluate the quality of each screen using withheld reference data and identify which screens should be removed from downstream work.
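A minimal sketch of this kind of log odds scoring is shown below. It is not the published implementation: the Gaussian shape of the two reference distributions, their means and standard deviations, and the `prior_odds` parameter are all assumptions made for illustration.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Natural log of the normal density at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def essentiality_log_odds(fold_changes, essential=(-2.0, 1.0),
                          nonessential=(0.0, 1.0), prior_odds=1.0):
    """Posterior log2 odds that a gene is essential.

    fold_changes: log fold changes of the hairpins targeting one gene.
    essential / nonessential: (mean, sd) of the ASSUMED Gaussian
    reference distributions; real reference sets would be empirical.
    """
    log_odds = math.log(prior_odds)
    for fc in fold_changes:
        log_odds += normal_logpdf(fc, *essential) - normal_logpdf(fc, *nonessential)
    return log_odds / math.log(2)  # convert natural log to log2

# Hairpins that drop out strongly give a positive score.
dropout_score = essentiality_log_odds([-2.5, -1.8, -2.2])
# Hairpins with fold changes near zero give a negative score.
neutral_score = essentiality_log_odds([0.1, -0.2, 0.3])
```

Summing per-hairpin log likelihood ratios treats hairpins as independent observations, which is itself a simplifying assumption.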
This work led us to the ‘daisy model’ of essentiality, where the essential genes in each cell line or tissue are represented by a petal. Petals overlap to varying degrees but they all share the same set of core essential genes; the degree to which a screen recapitulates this core is then a measure of its quality. This model is supported by whole organism data: when separated into core and peripheral essentials, only peripheral essentials are enriched for disease genes. Likewise core essentials show much lower rates of putative deleterious mutation in human population genetic data, indicating that at an organismal level we can’t stand even minor perturbation of these core essential genes.
A surprising finding was that, in general, essentiality is not correlated with gene expression, as measured by RNAseq. Even more surprising was the exception to this rule: core essentials show a strong negative correlation. This may be explained by there being some minimum expression requirement for these genes; at this minimum, the genes are very sensitive to perturbation. At higher expression levels, this sensitivity is buffered. Importantly, this led us to the conclusion that our scoring scheme was yielding a quantitative measure of gene essentiality – or sensitivity – I’ll use those terms interchangeably. So we used these scores directly in our matrix.
Despite our improved scoring scheme, the essentiality data are still quite noisy. While each cell line had a fairly modest FDR of around 15%, across over 100 cell lines false positives accumulated rapidly. Here we leaned on the lessons from the yeast network to create a high-confidence coessentiality network. Using a log likelihood enrichment score against the KEGG database as a scoring scheme, we tweaked several parameters – what quality threshold to apply to the screens, what minimum number of cell lines a gene was called essential in, and the overall variance of a gene’s essentiality profile across all cell lines – to maximize that score.
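The gene-level part of this filtering can be sketched as below. The cutoff values, the convention that a score above zero means "called essential", and the toy profiles are all hypothetical; the screen quality threshold is not shown.

```python
from statistics import pvariance

def filter_genes(profiles, score_cutoff=0.0, min_lines=4, min_variance=0.0):
    """Keep genes called essential in enough cell lines and whose
    essentiality profile varies enough across cell lines.

    profiles: dict mapping gene -> list of essentiality scores,
    one score per cell line. All thresholds are tunable parameters.
    """
    kept = {}
    for gene, scores in profiles.items():
        n_essential = sum(s > score_cutoff for s in scores)
        if n_essential >= min_lines and pvariance(scores) >= min_variance:
            kept[gene] = scores
    return kept

# Hypothetical scores for 3 genes across 5 cell lines.
toy_profiles = {
    "GENE_A": [3.0, 2.0, 4.0, 1.0, -2.0],    # essential in 4 lines, variable
    "GENE_B": [1.0, -1.0, -2.0, -3.0, -1.0],  # essential in only 1 line
    "GENE_C": [2.0, 2.0, 2.0, 2.0, 2.0],      # essential everywhere, no variance
}
kept = filter_genes(toy_profiles, min_lines=4, min_variance=0.5)
```

A gene with a flat profile, like GENE_C here, carries no information for correlation, which is one reason to filter on variance.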
By rank ordering correlations at each threshold, binning them in groups of 1000, and calculating the log likelihood score (LLS) against KEGG for each bin, we get a measure of how each threshold performed. In the end, the top 107 cell lines with matching RNAseq data and genes essential in at least 4 cell lines yielded top correlations that showed about an 18-fold enrichment in KEGG pairs, and that’s after excluding the big bias-inducing sets like ribosome, proteasome, and spliceosome. Taking all gene pairs with positive correlations at FDR < 1% yielded a correlation network with 1,086 genes…
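The rank, bin, and score procedure can be sketched as follows. The exact LLS formula is not given in these notes, so this sketch assumes a commonly used form: the log2 ratio of the odds of a same-pathway pair within a bin to the odds among all annotated pairs. The pseudocount and the toy labels are also assumptions.

```python
import math

def binned_lls(labels, bin_size=1000, pseudocount=1.0):
    """Log likelihood score per bin of rank-ordered gene pairs.

    labels: booleans in descending order of correlation; True means
    the pair shares an annotated pathway, False means both genes are
    annotated but share no pathway. Unannotated pairs are assumed to
    have been dropped beforehand.
    """
    pos_all = sum(labels)
    neg_all = len(labels) - pos_all
    background_odds = (pos_all + pseudocount) / (neg_all + pseudocount)
    scores = []
    for start in range(0, len(labels), bin_size):
        chunk = labels[start:start + bin_size]
        pos = sum(chunk)
        neg = len(chunk) - pos
        odds = (pos + pseudocount) / (neg + pseudocount)
        scores.append(math.log2(odds / background_odds))
    return scores

# Toy ranking of 12 pairs, scored in bins of 4 instead of 1000.
toy_labels = [True, True, True, False,
              True, False, False, False,
              False, False, False, False]
lls = binned_lls(toy_labels, bin_size=4)
```

A well-behaved ranking shows a high score in the top bins that decays toward or below zero further down the list, as in this toy case.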
And here it is. The giant connected component of the human coessentiality network. 866 genes, with an average degree of a little over 4. Four major annotations are highlighted, to show that it is not dominated by the Big Three.
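For reference, extracting a giant connected component and its average degree from an edge list can be sketched like this. The edges are hypothetical toy data, and a plain breadth-first search stands in for whatever network software was actually used.

```python
from collections import defaultdict, deque

def giant_component(edges):
    """Return (nodes of the largest connected component, adjacency map)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), set()
    for start in list(adj):
        if start in seen:
            continue
        # Breadth-first search to collect one component.
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in comp:
                    comp.add(neighbor)
                    queue.append(neighbor)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best, adj

def average_degree(nodes, adj):
    """Mean number of neighbors per node, counting only edges inside `nodes`."""
    return sum(len(adj[n] & nodes) for n in nodes) / len(nodes)

# Hypothetical coessentiality edges: one 4-gene component and one isolated pair.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("X", "Y")]
component, adjacency = giant_component(toy_edges)
mean_degree = average_degree(component, adjacency)
```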
Naturally, when you build a network, the first thing you want to do is tear it apart to see what’s in it.
Unsupervised clustering of the network yields clusters that are obviously functionally coherent (generally monochromatic). This is a very nice validation of the idea that coessentiality implies cofunctionality. The obvious next question here is, what groups of cell lines drive these clusters? Does each of these represent a specific class of genetic vulnerability? I’ll go into detail on these top two clusters; the first has no annotation of note and the second is clearly enriched for genes involved in the mitochondrial process of oxidative phosphorylation.
This is the first cluster, comprising over 30 genes. Each gene’s essentiality profile is shown in this heatmap (left), with green or better representing high confidence essential.
This column represents the tissue or subtype of the cell line. Light and medium blue are luminal and HER2 amplified breast cancer cell lines. So this cluster accurately discriminates these subtypes from the rest of the cell lines. A closer look at the genes in the cluster shows SPDEF, TFAP2C, FOXA1, CDK4, and CCND1, the known oncogenes driving this subtype. Discovery of this cluster is a strong validation of our approach.
It is worth noting that, despite the general trend that essentiality is not correlated with expression level, the genes in this BrCa cluster show strong positive correlation (note far out on tail of blue curve). These genes show subtype-specific expression (or overexpression) as well as essentiality but unfortunately do not indicate a general trend.
The second cluster is enriched for oxphos genes. These genes are essential in a mix of breast and ovarian cancer cell lines, showing that these clusters aren’t merely differentiating tissues of origin. A closer look shows that nearly all of the genes show mitochondrial localization…
Going into a bit more detail, we see that the entire oxphos pathway as well as its biogenesis is recovered. There are genes involved in mitochondrial import and mitochondrial genome transcription, the mitochondrial ribosome, and elements of four of the five oxphos complexes, plus cytochrome c and the enzyme that covalently attaches the heme group to the cytochrome. This is exactly what you’d expect from a classical genetic screen.
This cluster implies that these cell lines are critically dependent on oxphos for proliferation. This is a very surprising finding, since the Warburg effect involves switching to glycolysis and away from oxphos dependence. Moreover, these genes show no correlation between essentiality and expression level; basically there is no molecular signature that predicts this dependence among these cell types.
I’ve focused on the cancer implications of the major clusters in this network. In fact one of the minor clusters fine tunes the BrCa cluster; the ERBB2/ERBB3 cluster differentiates Her2-amplified cell lines from other luminal BrCa lines. Turning to a more functional genomics oriented view, we can see other evidence that coessentiality implies cofunctionality. Here we have three subunits of the COP9 signalosome complex, and all three subunits of the KGDH enzyme, which catalyzes a key step in the TCA cycle.
This last example shows a putative connection between an Hsp70 chaperone and its associated nucleotide exchange factor. This specific interaction is not annotated in BioGRID, CORUM, or GO, but the cofactor is a homolog of both yeast and bacterial genes with the same function. [if time permits: Moreover, in a global assay of protein complexes by large-scale coelution that we performed in collaboration with Andrew Emili’s lab, we identified these two proteins as having a moderate probability of interacting.] This example represents only one of many strong predictions of cofunctionality that come out of this network.
A more focused approach uses synthetic lethality to find candidate therapeutic targets. In this example, Gene B is essential only in the context of the somatic mutation in allele a, not wildtype A, and a therapy that targets B should in principle preferentially destroy cancer cells. As noted in this review, the ability to screen for these context-specific essentials has grown in concert with the availability of reagents to do systematic perturbation screens in human cells.
NetBioSIG2014-Talk by Traver Hart
Functional genomics and cancer subtyping
with a human cancer coessentiality network
Laboratory of Jason Moffat
Donnelly Centre, U. Toronto
NetBio SIG, 11 July 2014
GI correlation networks
Costanzo & Baryshnikova, et al., 2010
In yeast, highly correlated profiles imply shared gene function. Can be used to infer function of unknown genes.
Hypothesis: Correlated essentiality profiles across human cancer cell lines (bottom) are analogous to correlated GI profiles and imply shared function, even if we don’t know the query strain
Corollary: Gene clusters can help identify cell lines with similar vulnerabilities, possibly leading to novel
Dixon et al., 2009
Pooled library shRNA screens
Marcotte et al., 2012
Bayesian essentiality scoring
Hart et al., 2014
The “daisy model” & core essential genes
Hart et al., 2014
A quantitative measure of sensitivity
to RNAi perturbation
Correlation, Essentiality Score vs Expression
Query cell lines
Hart et al., 2014
Mean essentiality score
Number of Cell Lines
Optimize LLS vs KEGG
107 total screens
Correlations at 1% FDR:
No hairpin norm.
Correlation pair rank
Homolog of GrpE,
NEF of Hsp70-type
• The human cancer coessentiality network
– Depends critically on the new scoring scheme derived from Hart et al., 2014
– Optimized by lessons learned from the yeast GI network
• Clusters identify cell lines with common genetic vulnerabilities
– Known and novel
• Co-essentiality implies Co-functionality
– A unique functional genomics resource
• Identify genomic drivers of validated clusters?
• Improve coverage?
• Improve accuracy? CRISPR?
Dewald Van Dyk
Essentiality Screens in Cancer Cell Lines